[Haskell-cafe] two problems with Data.Binary and Data.ByteString

Tim Newsham newsham at lava.net
Wed Aug 13 23:40:56 EDT 2008


> I think he might be saying that decodeFile is not the place for
> application code, because the lower level cannot possibly know whether
> it makes sense for there to be residual data in the ByteString. There
> are plenty of file formats that consist of back-to-back concatenated
> chunks of data, in which reading just one chunk does not by any means
> require that a file can only contain one.

Right, but because of the way decodeFile works, whenever your data
type does not explicitly check for EOF in its Get definition,
decodeFile will leak a file handle.  There is no way to check
whether there is residual data, to access it, or to close the file
handle.  Since this is the normal state of affairs (are there any
Get definitions in the current library that check for EOF when
done?), I would suggest that this is an API bug.
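For illustration, here is a small in-memory sketch of the residual-data
point. It assumes a release of binary recent enough to provide
decodeOrFail. decode quietly drops the bytes it did not consume, while
decodeOrFail hands them back. When the input comes from a lazy
readFile, that unread tail is what keeps the file handle open.

```haskell
import Data.Binary (decode, decodeOrFail)
import Data.Binary.Get (ByteOffset)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)

-- Three bytes of input; a Word8 decoder consumes only the first one.
input :: BL.ByteString
input = BL.pack [1, 2, 3]

-- decode runs the parser and silently drops the two unconsumed bytes.
firstByte :: Word8
firstByte = decode input

-- decodeOrFail also returns the unconsumed input and the bytes consumed.
withLeftover
  :: Either (BL.ByteString, ByteOffset, String) (BL.ByteString, ByteOffset, Word8)
withLeftover = decodeOrFail input
```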

I would suggest that "decodeFile" should check for EOF when done.
A second wrapper function, "decodePartialFile", could return the
unconsumed data for situations where the EOF behavior is not
desired, or otherwise provide some way for the file to be closed.
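A minimal sketch of such a wrapper, assuming binary's decodeOrFail.
The names decodePartial and decodePartialFile are hypothetical,
following the suggestion above: the caller gets the leftover input
back and decides for itself whether trailing data is an error.

```haskell
import Data.Binary (Binary, decodeOrFail)
import qualified Data.ByteString.Lazy as BL

-- Hypothetical helper: decode one value and return it together with the
-- unconsumed input, or an error message that includes the byte offset.
decodePartial :: Binary a => BL.ByteString -> Either String (a, BL.ByteString)
decodePartial bs = case decodeOrFail bs of
  Left (_, off, msg) -> Left ("at byte " ++ show off ++ ": " ++ msg)
  Right (rest, _, x) -> Right (x, rest)

-- Hypothetical decodePartialFile: because the file is read lazily, the
-- handle may stay open until the caller consumes the leftover input.
decodePartialFile :: Binary a => FilePath -> IO (Either String (a, BL.ByteString))
decodePartialFile fn = decodePartial <$> BL.readFile fn
```

A caller that is done with the file can force `BL.length rest` to
read to EOF, at which point the lazy readFile closes its handle.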

Additionally, I would suggest that the Data.Binary library provide
a combinator for consuming data fully (i.e. checking for EOF), e.g.:

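    import Control.Monad (unless)
    import Data.Binary (Binary, get)
    import Data.Binary.Get (Get, runGet, isEmpty)
    import qualified Data.ByteString.Lazy as B

    -- Run a parser, then fail unless all input has been consumed.
    fully :: Get a -> Get a
    fully a = do
       x <- a
       e <- isEmpty
       unless e $ fail "expected EOF"
       return x

    decodeFully :: Binary a => B.ByteString -> a
    decodeFully = runGet $ fully get

    -- Forcing the result reads the file to EOF, so the lazy handle closes.
    decodeFile :: Binary a => FilePath -> IO a
    decodeFile fn = decodeFully <$> B.readFile fn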

to make it easy for developers who do not use the decodeFile
primitive to add EOF checking to their marshalling functions.

As it currently stands, the most obvious use of the Data.Binary
API leads to subtle, confusing errors that may go unnoticed for a
while.  (This would be a good point for the documentation to
address, to keep others from falling into the same hole.)
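One way to make such errors harder to miss is to report them as
values rather than calling error from pure code. The sketch below
assumes binary's runGetOrFail; requireEOF and decodeExact are
hypothetical names for the same idea as fully, with both parse
failures and trailing input surfacing as a Left.

```haskell
import Control.Monad (unless)
import Data.Binary (Binary, get)
import Data.Binary.Get (Get, isEmpty, runGetOrFail)
import qualified Data.ByteString.Lazy as BL

-- Run a parser, then fail inside Get if any input is left over.
requireEOF :: Get a -> Get a
requireEOF p = do
  x <- p
  done <- isEmpty
  unless done (fail "expected EOF")
  return x

-- Decode exactly one value, reporting parse errors and trailing data
-- as a Left instead of throwing from pure code.
decodeExact :: Binary a => BL.ByteString -> Either String a
decodeExact bs = case runGetOrFail (requireEOF get) bs of
  Left (_, off, msg) -> Left ("at byte " ++ show off ++ ": " ++ msg)
  Right (_, _, x) -> Right x
```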

I'm currently using definitions like these, together with
(`using` rnf), and have a server that can repeatedly read and
write the state file.  Many thanks to Dons, Brian, Duncan and
everyone else who helped me out...
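A sketch of the read/write cycle such a server performs, assuming
the deepseq package (its force fully evaluates a value, much as
(`using` rnf) does) and binary's decodeOrFail; saveState and
loadState are hypothetical names. Reading the file strictly closes
the handle before decoding, so rewriting the same file right
afterwards does not run into a handle that is still open.

```haskell
import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)
import Data.Binary (Binary, decodeOrFail, encodeFile)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

-- Write the state file; encodeFile closes the file when it is done.
saveState :: Binary a => FilePath -> a -> IO ()
saveState = encodeFile

-- Read the whole file strictly (closing the handle immediately), reject
-- trailing bytes, and fully evaluate the decoded value.
loadState :: (Binary a, NFData a) => FilePath -> IO a
loadState fn = do
  bytes <- BS.readFile fn
  case decodeOrFail (BL.fromStrict bytes) of
    Left (_, off, msg) -> ioError (userError ("at byte " ++ show off ++ ": " ++ msg))
    Right (rest, _, x)
      | BL.null rest -> evaluate (force x)
      | otherwise -> ioError (userError "trailing data after state")
```

The trade-off is that the whole file is held in memory at once; for
large files, lazy reading combined with an EOF check may be the
better fit.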

Tim Newsham
http://www.thenewsh.com/~newsham/

