[Haskell-cafe] Using Data.Binary for compression

David Roundy droundy at darcs.net
Thu Nov 15 10:52:42 EST 2007


On Wed, Nov 14, 2007 at 10:03:52PM -0800, Chad Scherrer wrote:
> I'd like to be able to use Data.Binary (or similar) for compression.
> Say I have an abstract type Symbol, and for each value of Symbol I
> have a representation in terms of some number of bits. For compression
> to be efficient, commonly-used Symbols should have very short
> representations, while less common ones can be longer.

I agree with others that it's probably not worth your effort to do
compression yourself except for fun, but it *is* fun, and interpret my
advice below in that light.  (Also, bitwise operations could be useful for
other things, like interacting with standard formats.  e.g. writing IEEE
doubles portably, something that Data.Binary doesn't do.)

...

> (2) This seems like it will work ok, but the feel is not as clean as
> the current Data.Binary interface. Is there something I'm missing that
> might make it easier to integrate this?

I would write this as a monad, analogous to PutM.  Make bit monads GetBits
and PutBitsM (with PutBits = PutBitsM ()), and then write functions like

writeBits :: PutBits -> Put
readBits :: GetBits a -> Get a

where writeBits and readBits would pad their reading/writing to the next
byte boundary (or perhaps Word32 boundary, for better performance?) as they
must.  So now this would have two main uses: users could use it to
serialize data (e.g. writing an IEEE Double serialization), or could use it
to write their own data compression, but putting *all* the serialization
into the Bits level.

This approach, of course, also would allow you to copy much of the
infrastructure of Binary into your new bit-level interface.

> (3) Right now this is just proof of concept, but eventually I'd like
> to do some performance tuning, and it would be nice to have a
> representation that's amenable to this. Any thoughts on speeding this
> up while keeping the interface reasonably clean would be much
> appreciated.

I think a monad as above would have the advantage of separating the
implementation from the interface, which should make it tuneable.
-- 
David Roundy
Department of Physics
Oregon State University


More information about the Haskell-Cafe mailing list