getting a Binary module into the standard libs

Tue, 17 Dec 2002 08:45:27 -0800 (PST)

Hi, another update wrt speed of bit writes; it makes me think it might not
be worthwhile to split the instances or anything like that.  I'm still
looking for some comments though (hint hint).

A small program writes 1 million maybe ints to a binmem, 1/5 of which are
nothings, 4/5s of which are just ints.  Note that both the nothing and the
just int take 1 bit (mod 1 byte), so the bit offset is increasing by one
every write, independent of which we're writing.  the best case in terms
of speed occurs when we write just int and we're already written 7 bits
(so writing the int doesn't have to much around with bit stuff).

Anyway, here are the speed results, averaged over 3 runs, with
optimization on and off:

                       Optimization
                  level 0        level 2
                  -------        -------
bits    read        2446            195
        write      23157           1729

bytes   read        1822            177
        write      13769           1563

So, moral #1 is: never use binary without optimization or you will be
waiting a long time.  Moral #2 is that when optimization is on, bit-based
writes take about 11% longer to read and write.

We can pretty easily calculate the space savings, directly.  An Int is 4
bytes, we write 800000 of them = 3200000 bytes.  When we use bits, we also
write 1 million bits = 125000 bytes.  When we use bytes, we also write 1
million bytes.  Thus, the difference is:

  bytes:   4200000
   bits: - 1125000
         ---------
           3075000 byte savings using bits

so using bits we save about 26% space (for writing Maybe Int, as always
YMMV).

So again, the question is how should we divide the bit-based instances and
the byte-based instances?

 - Hal

--
Hal Daume III

 "Computer science is no more about computers    | hdaume@isi.edu
  than astronomy is about telescopes." -Dijkstra | www.isi.edu/~hdaume

On Tue, 10 Dec 2002, Hal Daume III wrote:

> Hi again all,
> 
> I've got a somewhat functioning new Binary module for GHC which supports
> bit writes.  I've written a small test script to make sure simple things
> work...it essentially looks like this:
> 
> data BinItem = ByteItem Word8
>              | BitItem  Int Word8   -- num_bits bits
>              | FlushItem            -- flush byte
>              | forall a . (Eq a, Binary a, Show a) => BinaryItem a  --
> some arbi
> trary data
>              | forall a . (Eq a, Show a) => PutItem a (BinHandle -> a ->
> IO ()) 
> (BinHandle -> IO a)
> 
> on which Eq is defined (I use unsafePerformIO to cast the forall'd
> elements).
> 
> then we have test functions which essentially open up a binMem or a binIO
> and write a list of BinItems to it, then read them back.  We can then
> check if what we read back is the same as what we wrote.
> 
> The tests I have run are:
> 
> byteTest = map ByteItem [1,2,3,4,5]
> 
> bitTest1 = [BitItem 3 6, FlushItem]
> bitTest2 = [BitItem 3 6, BitItem 4 9, BitItem 1 0, FlushItem]
> bitTest3 = [BitItem 6 10, BitItem 6 10, FlushItem]
> 
> flushTest1 = [BitItem 3 6, FlushItem, BitItem 4 9, BitItem 1 0, FlushItem]
> flushTest2 = [ByteItem 1, FlushItem, FlushItem, FlushItem, FlushItem,
> ByteItem 2
> , FlushItem]
> 
> comboTest1 = [ByteItem 5, BitItem 3 6, FlushItem, ByteItem 9, BitItem 4 9,
> BitIt
> em 1 0, FlushItem, ByteItem 84, BitItem 3 2, FlushItem]
> comboTest2 = [ByteItem 5, BitItem 3 6, ByteItem 9, BitItem 7 9, BitItem 4
> 0, Byt
> eItem 84, BitItem 3 2, FlushItem]
> 
> maybeTest1 = [BinaryItem (Just 5::Maybe Int), BinaryItem (Nothing::Maybe
> Int), B
> inaryItem (Just 0::Maybe Int), BinaryItem (Just 1::Maybe Int), BinaryItem
> (Nothi
> ng :: Maybe Int), FlushItem]
> maybeTest2 = map (\i -> PutItem i putMaybeInt getMaybeInt) [Just 5,
> Nothing, Jus
> t 0, Just 1, Nothing] ++ [FlushItem]
> 
> 
> Which all work fine.  putMaybeInt uses a bit to put the Just/Nothing,
> while the normal Binary instance for Maybe uses a byte.
> 
> 
> ...this brings up another question...since we don't have named instances,
> I'm thinking of separating the actual instance definitions into two
> modules, one which uses bit-based instances and one which uses byte-based
> instances.  You could then import whichever makes more sense for your
> application.  The problem with this is that sometimes it might make sense
> to use bytes in one aspect and bits in another.
> 
> Another solution would be to extend the Binary class to:
> 
> class Binary a where
>     put_   :: BinHandle -> a -> IO ()
>     put    :: BinHandle -> a -> IO (Bin a)
>     get    :: BinHandle -> IO a
>     putWithBits_ :: BinHandle -> a -> IO ()
>     putWithBits  :: BinHandle -> a -> IO (Bin a)
>     getWithBits  :: BinHandle -> IO a
> 
> so that each instance specifies how to write itself both with bits and
> bytes.  of course they could default off eachother, so the user only has
> to write one put and one get if they're lazy.
> 
> Thoughts?
> 
>  - Hal
> 
> _______________________________________________
> Libraries mailing list
> Libraries@haskell.org
> http://www.haskell.org/mailman/listinfo/libraries
>