[Haskell-cafe] Data.Binary poor read performance

Don Stewart dons at galois.com
Tue Feb 24 18:17:49 EST 2009


jnf:
> 
> 
> wren ng thornton wrote:
> > 
> > If you have many identical strings then you will save lots by memoizing 
> > your strings into Integers, and then serializing that memo table and the 
> > integerized version of your data structure. The amount of savings 
> > decreases as the number of duplications decreases, though since you don't 
> > need the memo table itself you should be able to serialize it in a way 
> > that doesn't have much overhead.
> > 
> 
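
A minimal sketch of the memo-table idea described above, assuming the data is
just a list of Strings. The names intern, unintern, encodeInterned and
decodeInterned are hypothetical and not part of the binary package; only
encode and decode come from Data.Binary.

```haskell
import           Data.Binary          (decode, encode)
import qualified Data.ByteString.Lazy as BL
import qualified Data.IntMap.Strict   as IM
import           Data.List            (foldl')
import qualified Data.Map.Strict      as M

-- | Replace every string by an index into a table of distinct strings.
-- The table lists the distinct strings in order of first appearance.
intern :: [String] -> ([String], [Int])
intern xs = (reverse table, reverse ixs)
  where
    (_, table, ixs) = foldl' step (M.empty, [], []) xs
    step (seen, tbl, acc) s =
      case M.lookup s seen of
        Just i  -> (seen, tbl, i : acc)            -- already in the table
        Nothing ->
          let i = M.size seen                      -- next free index
          in (M.insert s i seen, s : tbl, i : acc)

-- | Inverse of 'intern'. Every occurrence of an index is resolved to the
-- same table entry, so duplicates share one heap object after loading.
-- Partial: an index that is not in the table raises an error.
unintern :: ([String], [Int]) -> [String]
unintern (table, ixs) = map (m IM.!) ixs
  where
    m = IM.fromList (zip [0 ..] table)

-- | Serialize the table followed by the integerized list.
encodeInterned :: [String] -> BL.ByteString
encodeInterned = encode . intern

-- | Read the table and the indices back and rebuild the list.
decodeInterned :: BL.ByteString -> [String]
decodeInterned = unintern . decode
```

For example, intern ["a","b","a"] gives (["a","b"],[0,1,0]). As noted in the
quoted text, the saving shrinks as duplicates become rarer, because every
element still costs one serialized Int.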
> I had problems with the size of the allocated heap space after serializing 
> and loading data with the binary package. The reason was that binary does 
> not support sharing of identical elements. I considered a restricted 
> solution for strings and certain other data types first, but came up with 
> a generic solution in the end (I did it just last weekend).

And this is exactly the intended path -- that people will release their
own special instances for doing more elaborate parsing/printing tricks!

  
> I put the Binary monad in a state transformer with maps for memoization:
> type PutShared = St.StateT (Map Object Int, Int) PutM ()
> type GetShared = St.StateT (IntMap Object) Bin.Get
> 
> In addition to the standard get and put methods:
> class (Typeable α, Ord α, Eq α) ⇒ BinaryShared α  where
>     put :: α  →  PutShared
>     get :: GetShared α
> I added putShared and getShared methods with memoization:
>     putShared :: (α →  PutShared) →  α →  PutShared
>     getShared :: GetShared α →  GetShared α 
> 
> For types for which I don't want memoization, I can either refer to the 
> underlying binary monad for primitive types, e.g.:
> instance BinaryShared Int where
>     put = lift∘Bin.put
>     get = lift Bin.get
> or stay in the BinaryShared monad for types of which I may memoize
> components, e.g.:
> instance (BinaryShared a, BinaryShared b) ⇒ BinaryShared (a,b) where
>     put (a,b)          = put a ≫ put b
>     get                 = liftM2 (,) get get
> 
> And for types for which I want memoization, I wrap it with putShared and
> getShared, e.g.:
> instance BinaryShared a ⇒ BinaryShared [a] where
>     put    = putShared (λl →  lift (Bin.put (length l)) ≫ mapM_ put l)
>     get    = getShared (do
>                 n ←  lift (Bin.get :: Bin.Get Int)
>                 replicateM n get)
>     
> This saved 1/3 of the heap space in my application. I didn't measure time.
> Maybe it would be useful to have something like this in the binary module.
> 
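
A minimal sketch of the state-transformer approach quoted above, simplified to
share Strings only. It is an assumption-laden illustration, not the code from
the quoted message: putSharedString, getSharedString, encodeShared and
decodeShared are hypothetical names, the one-byte tag format is made up for
the example, and the reader state carries an extra counter so that the next
index is available without calling IntMap's size.

```haskell
import           Control.Monad                    (replicateM)
import           Control.Monad.Trans.Class        (lift)
import qualified Control.Monad.Trans.State.Strict as St
import qualified Data.Binary                      as Bin
import           Data.Binary.Get                  (Get, runGet)
import           Data.Binary.Put                  (PutM, runPut)
import qualified Data.ByteString.Lazy             as BL
import qualified Data.IntMap.Strict               as IM
import qualified Data.Map.Strict                  as M
import           Data.Word                        (Word8)

-- Writer state: strings seen so far and the index assigned to each.
type PutShared = St.StateT (M.Map String Int) PutM ()
-- Reader state: next free index and the strings decoded so far.
type GetShared = St.StateT (Int, IM.IntMap String) Get

-- | Write a string in full the first time (tag 1), and as a
-- back-reference to its index on every later occurrence (tag 0).
putSharedString :: String -> PutShared
putSharedString s = do
  seen <- St.get
  case M.lookup s seen of
    Just i  -> lift (Bin.put (0 :: Word8) >> Bin.put i)
    Nothing -> do
      St.put (M.insert s (M.size seen) seen)
      lift (Bin.put (1 :: Word8) >> Bin.put s)

-- | Mirror of 'putSharedString'. A back-reference returns the value that
-- is already in the table, so repeated strings share one heap object.
getSharedString :: GetShared String
getSharedString = do
  tag <- lift (Bin.get :: Get Word8)
  case tag of
    0 -> do
      i        <- lift (Bin.get :: Get Int)
      (_, tbl) <- St.get
      maybe (lift (fail "dangling back-reference")) return (IM.lookup i tbl)
    _ -> do
      s <- lift Bin.get
      St.modify (\(n, tbl) -> (n + 1, IM.insert n s tbl))
      return s

-- | Length prefix followed by the elements, as in the list instance above.
encodeShared :: [String] -> BL.ByteString
encodeShared xs = runPut (St.evalStateT body M.empty)
  where
    body = lift (Bin.put (length xs)) >> mapM_ putSharedString xs

decodeShared :: BL.ByteString -> [String]
decodeShared = runGet (St.evalStateT body (0, IM.empty))
  where
    body = do
      n <- lift (Bin.get :: Get Int)
      replicateM n getSharedString
```

The writer and the reader assign indices in the same order (first appearance),
so no table has to be stored in the output. The generic version in the quoted
message would additionally need a way to key the map on values of different
types, which this sketch leaves out.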

Very nice. Maybe even upload these useful instances in a little
binary-extras package?

