[Haskell-cafe] The state of binary (de)serialization

Wed Feb 27 07:49:53 CET 2013

On Mon, Feb 25, 2013 at 01:30:40PM +0100, Nicolas Trangez wrote:
> All,
> 
> In order to implement some network protocol clients recently, I needed
> binary serialization of commands and deserialization of responses
> ('Command -> ByteString' and 'ByteString -> Response' functions,
> preferably for both strict as well as lazy ByteStrings).
> 
> My go-to packages have always been 'binary' and 'cereal', but I was
> wondering about the current (and future) state/goals:
> 
> - cereal supports chunk-based 'partial' parsing (runGetPartial). It
> looks like support for this is introduced in recent versions of 'binary'
> as well (runGetIncremental)
> - cereal can output a strict bytestring (runPut) or a lazy one
> (runPutLazy), whilst binary only outputs lazy ones (runPut)
> - Next to binary and cereal, there's bytestring's Builder interface for
> serialization, and Simon Meier's "blaze-binary" prototype
> 
> There are some blog posts and comments out there about merging cereal
> and binary, is this what's the goal/going on (cfr runGetIncremental)?
> 
> In my use-case I think using Builder instead of binary/cereal's PutM
> monad shouldn't be a major problem. Is this advisable performance-wise?
> 
> Overall: what's the advised future-proof strategy of handling binary
> (de)serialization?
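
For reference, the chunk-based parsing mentioned above (runGetIncremental in
'binary') can be sketched roughly as follows. The response format (a
big-endian Word32 length followed by that many payload bytes) and the names
getResponse and decodeChunks are assumptions made up for illustration, not
part of any real protocol.

```haskell
import qualified Data.ByteString as B
import Data.Binary.Get
  ( Decoder (..), Get, getByteString, getWord32be
  , pushChunk, pushEndOfInput, runGetIncremental )

-- Hypothetical response: big-endian Word32 length, then that many bytes.
getResponse :: Get B.ByteString
getResponse = do
  len <- getWord32be
  getByteString (fromIntegral len)

-- Feed strict chunks one at a time, as they might arrive from a socket.
-- Stops as soon as the decoder is done or has failed.
decodeChunks :: [B.ByteString] -> Either String B.ByteString
decodeChunks = go (runGetIncremental getResponse)
  where
    go (Done _ _ r)   _        = Right r
    go (Fail _ _ err) _        = Left err
    go d              (c : cs) = go (pushChunk d c) cs
    go d              []       = finish (pushEndOfInput d)

    finish (Done _ _ r)   = Right r
    finish (Fail _ _ err) = Left err
    finish (Partial _)    = Left "decoder still wants input"
```

cereal's runGetPartial follows the same idea, with its own result type.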

I've been looking at the same thing lately, and I've been quite surprised, to
say the least, by the usual go-to packages (cereal, binary). Performance-wise
this is hard to summarize, but if you serialize something small whose size is
easy to compute (e.g. a fixed-size structure), I would advise against using any
kind of builder structure (builder, cereal, binary) and going directly to the
Storable level instead, if performance needs to be on par with other languages.

My initial interpretation is that the builder's initial cost is quite high, and
only gets amortized if the number of operations is quite high (and fewer
bytestrings are involved). So if you have many structures encoded in one
encoding operation, it's probably ok-ish.
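
A minimal sketch of what going "directly at the Storable level" could look
like for a fixed-size structure. The Header type, its field layout and the
big-endian wire format are assumptions for illustration only. unsafeCreate
comes from Data.ByteString.Internal and allocates exactly the requested
number of bytes, which the callback must then fill completely.

```haskell
import qualified Data.ByteString as B
import Data.ByteString.Internal (unsafeCreate)
import Data.Bits (shiftR)
import Data.Word (Word8, Word16, Word32)
import Foreign.Storable (pokeByteOff)

-- Hypothetical fixed-size header: 1 + 2 + 4 = 7 bytes on the wire.
data Header = Header
  { hOpcode :: Word8
  , hFlags  :: Word16
  , hLength :: Word32
  }

headerSize :: Int
headerSize = 7

-- Allocate headerSize bytes once and poke every byte in place.
-- Writing byte by byte keeps the output big-endian regardless of the
-- host's endianness and avoids unaligned multi-byte pokes.
encodeHeader :: Header -> B.ByteString
encodeHeader (Header op fl len) = unsafeCreate headerSize $ \p -> do
  pokeByteOff p 0 op
  pokeByteOff p 1 (fromIntegral (fl  `shiftR` 8)  :: Word8)
  pokeByteOff p 2 (fromIntegral fl                :: Word8)
  pokeByteOff p 3 (fromIntegral (len `shiftR` 24) :: Word8)
  pokeByteOff p 4 (fromIntegral (len `shiftR` 16) :: Word8)
  pokeByteOff p 5 (fromIntegral (len `shiftR` 8)  :: Word8)
  pokeByteOff p 6 (fromIntegral len               :: Word8)
```

There is no builder state and no intermediate chunk here, only one allocation
of a size known up front. The trade-off is that the size and the offsets have
to be kept in sync by hand.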

I made the following benchmark while doing my experiments; it shows basic
serialization of bytestring-y data structures. The implementations compared:

* "bclass" is a simple function that uses bytestring concat or append
* "bclass+io" is a simple function that uses a mutable bytestring + poke to create the bytestring
* "cereal" is cereal's encode function
* "binary" is binary's encode function
* "builder" is bytestring's Builder

The cases, with their labels:

* simple bytestring of constant size: <sz>
* n bytestrings of the same size: n*<sz>
* n bytestrings of different sizes: <sz>+<sz2>+..
* n bytestrings plus a w32 size prefix: len+n*<sz>
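
To make the last case concrete, here is a rough sketch of two of the
approaches side by side, using only the bytestring package. This is not the
benchmark's source. The function names are made up, and the prefix is assumed
to be the total payload length in bytes, encoded as a big-endian Word32. The
Builder module is named Data.ByteString.Builder in current bytestring
releases.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Lazy as L
import Data.Bits (shiftR)
import Data.Word (Word32)

-- Big-endian Word32 as a 4-byte strict ByteString.
w32be :: Word32 -> B.ByteString
w32be w = B.pack [fromIntegral (w `shiftR` s) | s <- [24, 16, 8, 0]]

-- Assumed meaning of the prefix: total number of payload bytes.
payloadLen :: [B.ByteString] -> Word32
payloadLen = fromIntegral . sum . map B.length

-- "bclass" style: build the prefix, then concatenate everything.
encodeConcat :: [B.ByteString] -> B.ByteString
encodeConcat chunks = B.concat (w32be (payloadLen chunks) : chunks)

-- "builder" style: same bytes, assembled through bytestring's Builder
-- and forced back to a strict ByteString at the end.
encodeBuilder :: [B.ByteString] -> B.ByteString
encodeBuilder chunks =
  L.toStrict . BB.toLazyByteString $
    mconcat (BB.word32BE (payloadLen chunks) : map BB.byteString chunks)
```

Both functions produce identical output, so any difference measured between
them comes from how the bytes are assembled rather than from the format.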

Obviously, caveat emptor:

http://tab.snarc.org/others/benchmark-bytestring-serialization.html

Let me know if anyone wants the source file.

-- 
Vincent