Using the NHC.Binary library
This document sketches the York NHC.Binary library. (See also the BinArray library for an example of the use of Binary to build other abstractions.) For fuller details, see this paper.
The NHC.Binary library
module NHC.Binary where data BinPtr a = ... data BinLocation = Memory | File FilePath BinIOMode data BinIOMode = RO | RW | WO data BinHandle = ... stdmem :: BinHandle openBin :: BinLocation -> IO BinHandle freezeBin :: BinHandle -> IO () -- changes BinIOMode to RO closeBin :: BinHandle -> IO () putBits :: BinHandle -> Int -> Int -> IO (BinPtr a) getBits :: BinHandle -> Int -> IO Int getBitsF :: BinHandle -> Int -> BinPtr a -> (Int, BinPtr b) seekBin :: BinHandle -> BinPtr a -> IO () tellBin :: BinHandle -> IO (BinPtr a) isEOFBin :: BinHandle -> IO Bool copyBin :: BinHandle -> BinLocation -> IO BinHandle copyBits :: BinHandle -> BinPtr a -> BinHandle -> BinPtr a -> Int -> IO () copyBytes :: BinHandle -> BinHandle -> Int -> IO (BinPtr a) class Binary a where put :: BinHandle -> a -> IO (BinPtr a) get :: BinHandle -> IO a getF :: BinHandle -> BinPtr a -> (a, BinPtr b) sizeOf :: a -> Int putAt :: Binary a => BinHandle -> BinPtr a -> a -> IO () getAt :: Binary a => BinHandle -> BinPtr a -> IO a getFAt :: Binary a => BinHandle -> BinPtr a -> a
Both in-heap data compression and binary I/O can be achieved using the York Binary library. The basic model is rather like file I/O: binary data resides in a separate space which is accessed only through a BinHandle acting like a buffering file descriptor. Each item of binary data lies at a particular position within the space, the position being denoted by a BinPtr. Data can be written and read sequentially just as with ordinary files. Also, like ordinary files, we allow random-access reading and writing. However, the particular beauty of this scheme is the ability to engage in pure, lazy, random-access reading when a BinHandle is in the appropriate RO (read-only) mode. (A BinHandle which is already open for writing can be changed to RO mode with the freezeBin call.)
BinHandles do not just denote files - they can also refer to areas of heap memory. One such area is available by default - called stdmem - but new areas can be opened in just the same way as files. They are opened in the default mode RW. Binary heap areas grow automatically to fit the data placed in them, and, like files, they are naturally garbage-collected when they are no longer in use. (The closeBin operation is an explicit means to close a file or discard some memory.)
There are in principle two layers to the library functions. At the lower level, functions like getBits and putBits deal with raw bounded integers in the bit-stream. At the higher level, a type class abstracts these operations across arbitrary datatypes, providing overloaded functions put and get.
Low-level raw bit-stream functions
Each BinHandle has a notion of its ``current'' position. This is the position at which a subsequent read or write operation will start. You can think of it as a bit-offset from the start of the file/memory. The function putBits writes some bits into the bit-stream. Its first argument is the number of bits to write, and the second is an int value representing those bits. (Hence there is a maximum of 32 bits that can be written or read in one operation.) The function getBits similarly reads a number of bits from the bit-stream, returning an int which represents their value. Both operations update the ``current'' position in the stream.
putBits :: BinHandle -> Int -> Int -> IO (BinPtr a) getBits :: BinHandle -> Int -> IO Int getBitsF :: BinHandle -> Int -> BinPtr a -> (Int, BinPtr b)
The pure lazy function getBitsF is slightly different - because its result depends only on its arguments, you must tell it what position to start reading from. (It also returns the position immediately following the value read as part of its result.)
In order to get full control of the bit-stream, there are various other operations available, to move the current position, to report the current position, and so on.
seekBin :: BinHandle -> BinPtr a -> IO () tellBin :: BinHandle -> IO (BinPtr a) isEOFBin :: BinHandle -> IO Bool
Higher-level typed binary functions
The Binary class is derivable for any datatype defined in a program except functions. (Please note however that cyclic or infinite values will cause the compressing function to diverge.)
The class member functions and their derivatives come in two varieties, one for sequential access, the other for random access. A BinHandle contains a hidden state, including the current position in the file or memory. Understanding the notion of the current position is important for using the sequential operations correctly. put and get always start reading or writing from the current position. All operations including the random-access ones, when they return, set the current position to the end of the value which has just been read or written.
Transferring bits in bulk
The easiest way to transfer bits in bulk is with the copyBin operation. It takes an active BinHandle and copies its entire contents into the given BinLocation, returning a fresh BinHandle denoting the copy. As an alternative, copyBytes copies just a section of a bit-stream from the current position in one BinHandle to the current position in another - the copied section must be entirely byte-aligned. Finally, the least efficient but most flexible bulk transfer operation is copyBits, which allows any number of bits to be copied without alignment constraints - it even allows the source and destination bitstreams to overlap within the same BinHandle.
Defining your own compression
If you want to play with defining your own instances of Binary, have a look at some of the instances for standard types like Int and Lists in src/prelude/Binary/Instances.hs to see how things work.
The lower-level tools used in defining instances are:
getBits :: BinHandle -> Int -> BinPtr a -> IO Int putBits :: BinHandle -> Int -> Int -> IO (BinPtr a) getBitsF :: BinHandle -> Int -> BinPtr a -> (Int, BinPtr b) (<<) :: ((a->b),c) -> (c->(a,d)) -> (b,d)
Read and write modes
A file BinHandle can be opened in one of three modes: read-only (RO), write-only (WO), or read-write (RW). A memory BinHandle is always opened in RW mode, but may be changed to RO mode by the freezeBin operation. These modes differ from those of ordinary textual files:
The latest updates to these pages are available on the WWW from http://www.cs.york.ac.uk/fp/nhc98/