[Haskell-beginners] More Deserialization Woes

Yitzchak Gale gale at sefer.org
Tue Jul 6 06:07:29 EDT 2010


Tom Hobbs wrote:
> I've been reading through various tutorials and they all put IO as the
> outermost monad, like you suggest.  However, I don't think that's what I
> want.

It is definitely what you want.

> ...am I in a niche where my requirement makes sense

No, you are doing something quite routine.

> or does my requirement make no sense

Your requirement is fine, and you would have no trouble
satisfying it with IO as your outer type.

But there are two basic approaches to dealing with failure:
returning a pure value that indicates the failure, like Maybe
or Either, or throwing an exception in IO that is not reflected
in the type. You are using Data.Binary for deserialization, and
that library is designed to use the second method. So rather than
spending more time on how to structure your types to indicate
failure, let's leave that aside for now and focus on how to do
deserialization. Error processing will automatically happen the
way you say - if anything goes really wrong in the middle, an
exception will be thrown and the entire operation will terminate
immediately. Later on, you can learn how to catch the exception
and do something other than end your program with the
standard error message.
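
For when you get to that point, here is a minimal sketch of one way
to catch such a failure. The helper name decodeLength is made up for
illustration, and I am assuming a version of the binary package where
runGet signals failure by calling error. Catching SomeException is
broader than you would want in real code; it just keeps the sketch
short.

```haskell
import Control.Exception (SomeException, evaluate, try)
import Data.Binary.Get (getWord32be, runGet)
import qualified Data.ByteString.Lazy as L
import Data.Word (Word32)

-- Hypothetical helper: decode one big-endian Word32, turning a decoding
-- exception into a Left value instead of letting it end the program.
decodeLength :: L.ByteString -> IO (Either String Word32)
decodeLength bytes = do
  -- evaluate forces the decode to run here, inside try, rather than
  -- later when the lazy result is first used.
  result <- try (evaluate (runGet getWord32be bytes))
  return $ case result of
    Left err -> Left (show (err :: SomeException))
    Right n  -> Right n
```

With four bytes of input this should give you a Right value; with
fewer than four it should give you a Left carrying the error message.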

> readNames 0 _ = []
> readNames n h = do
>   length <- fmap (fromIntegral . runGet getWord32be) $ L.hGet h 4
>   name <- L.hGet h length
>   (UTF.toString name) : readNames (n-1) h

Besides the type errors, which others have been helping you with,
let's look at the whole approach. (One minor point first: avoid
using "length" as a variable name. It is the name of one of the
most commonly used Prelude functions.)

You are ping-ponging back and forth here between the Get monad
and manually reading ByteStrings from the handle.

The idea of the Get monad is to give a complete description
of your serialization format. Then, reading the ByteStrings will be
driven by your serialization format - just the right number of bytes
will automatically be read off the wire at each stage.

Here is the serialization format (note that we're not reading anything
here, just describing the format):

readNames :: Int -> Get [String]
readNames n = replicateM n $ do
  len <- fmap fromIntegral getWord32be  -- getByteString expects an Int
  name <- getByteString len
  return $ UTF8.toString name

Now, in your "main" function (whose type is IO ()), you can
write:

  names <- fmap (runGet $ readNames n) $ L.hGetContents h

That will read bytes off the wire lazily, just the right number
of bytes to deserialize n names.
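
If you want to try the format description without a socket, here is
a small self-contained sketch that runs it against bytes built in
memory. Some assumptions on my part: putName, sampleBytes and
decodedNames are names invented for this example, the sample data is
made up, and I use decodeUtf8 from the text package as a stand-in for
UTF8.toString so that nothing beyond the libraries shipped with GHC
is needed.

```haskell
import Control.Monad (replicateM)
import Data.Binary.Get (Get, getByteString, getWord32be, runGet)
import Data.Binary.Put (Put, putByteString, putWord32be, runPut)
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- The format: n names, each a 32-bit big-endian byte count
-- followed by that many bytes of UTF-8 text.
readNames :: Int -> Get [String]
readNames n = replicateM n $ do
  len <- fmap fromIntegral getWord32be
  name <- getByteString len
  return $ T.unpack (TE.decodeUtf8 name)

-- Hypothetical writer for the same format, used only to make test input.
putName :: String -> Put
putName s = do
  let bytes = TE.encodeUtf8 (T.pack s)
  putWord32be (fromIntegral (B.length bytes))
  putByteString bytes

-- Made-up sample data: two names serialized back to back.
sampleBytes :: L.ByteString
sampleBytes = runPut (mapM_ putName ["Tom", "Yitz"])

-- Decoding the sample should give back the two names.
decodedNames :: [String]
decodedNames = runGet (readNames 2) sampleBytes
```

Nothing here touches a handle. The same readNames works unchanged
whether the lazy ByteString comes from runPut or from L.hGetContents.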

Of course, that will leave your handle in an unusable state.
If you have more to read after that, you have a few options.
Best is to combine everything you need to read out of
that handle into a single Get monad object that describes
the entire deserialization. Another (messier) approach is to use
runGetState instead of runGet - that gives you, in addition
to the deserialized data, a lazy ByteString that represents
additional bytes that can later be read off the handle.
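
Here is a rough sketch of both options. The message layout (a Word32
followed by one trailing byte) is invented purely for illustration,
as are the names wholeMessage, decodeWhole and decodeFirst. For the
second option I use runGetOrFail rather than runGetState, on the
assumption that you have a version of the binary package that
provides it; like runGetState, it hands back the unconsumed bytes,
and it also reports failure as a pure Either value.

```haskell
import Data.Binary.Get (Get, getWord32be, getWord8, runGet, runGetOrFail)
import qualified Data.ByteString.Lazy as L
import Data.Word (Word32, Word8)

-- Option 1: one Get value describing the whole (hypothetical) message.
wholeMessage :: Get (Word32, Word8)
wholeMessage = do
  n <- getWord32be
  flag <- getWord8
  return (n, flag)

decodeWhole :: L.ByteString -> (Word32, Word8)
decodeWhole = runGet wholeMessage

-- Option 2: decode only the first part and return the leftover bytes,
-- which a later decoding step can pick up.
decodeFirst :: L.ByteString -> Either String (Word32, L.ByteString)
decodeFirst input =
  case runGetOrFail getWord32be input of
    Left (_, _, msg)   -> Left msg
    Right (rest, _, n) -> Right (n, rest)
```

The first option keeps all knowledge of the wire format in one place,
which is why I prefer it.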

Regards,
Yitz

