[Haskell] reading binary files

Bulat Ziganshin bulat.ziganshin at gmail.com
Thu Apr 6 04:17:09 EDT 2006


Hello minh,

Wednesday, April 5, 2006, 10:41:02 PM, you wrote:

> but in 1/, i have to choose between different kind of array
> representation (and i dont know which one is better) and it seems to
> me that the resulting code (compiled) would have to be the same.

no, the code will be slightly different. IOUArray allocates its space
in GHC's heap, while malloc allocates in the C heap (GHC's heap is an
additional layer on top of the C heap)

btw, `getElems` is a VERY INEFFICIENT way - it converts the entire array
to a list before returning
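
a minimal sketch of the alternative, assuming the data sits in an
`IOUArray Int Word8`: walk the indices with `readArray` so no
intermediate list is built. the name `sumBytes` is only an
illustration, and the `array` package that ships with GHC is assumed.

```haskell
import Data.Array.IO (IOUArray, getBounds, newListArray, readArray)
import Data.Word (Word8)

-- Sum all bytes by indexing; no intermediate list is created.
sumBytes :: IOUArray Int Word8 -> IO Int
sumBytes arr = do
  (lo, hi) <- getBounds arr
  let go i acc
        | i > hi    = return acc
        | otherwise = do
            x <- readArray arr i
            -- force the accumulator so no chain of thunks builds up
            go (i + 1) $! acc + fromIntegral x
  go lo 0

main :: IO ()
main = do
  arr <- newListArray (0, 3) [1, 2, 3, 250]
  total <- sumBytes arr
  print total  -- prints 256
```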

> for example, the couples (hGet*,peek/readArray) could be written in one line;
> also, one line for the reading/reconstructing more-than-one-Word8 value.

> is it already possible ?
> would it be interesting to add such capabilities to haskell ? (i think so)
> i can try to add it but i need some pointers about how to do it.

i don't see much of a problem here, just add peek16LE and other procedures
like it and you can use trivial code:

idLength <- peek8 a 1
x <- peek16LE a 8

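-- read one byte at offset i and widen it to the result type
peek8 a i = do x <- peekByteOff a i :: IO Word8
               return (fromIntegral x)

-- read the bytes at offsets i and i+1 as a little-endian 16-bit value
peek16LE a i = do x <- peekByteOff a i     :: IO Word8
                  y <- peekByteOff a (i+1) :: IO Word8
                  return (fromIntegral x + fromIntegral y * 256)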
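
for reference, a self-contained version of the same idea that can be
compiled as is. it uses only `Foreign` from base; the result types are
fixed to Word8/Word16 here for simplicity, and `peek32LE` is a
hypothetical extension of the same pattern, not something from the
original code.

```haskell
import Data.Word (Word16, Word32, Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- Read one byte at the given offset.
peek8 :: Ptr a -> Int -> IO Word8
peek8 = peekByteOff

-- Read two bytes as a little-endian 16-bit value.
peek16LE :: Ptr a -> Int -> IO Word16
peek16LE p i = do
  lo <- peek8 p i
  hi <- peek8 p (i + 1)
  return (fromIntegral lo + fromIntegral hi * 256)

-- Hypothetical helper: four bytes as a little-endian 32-bit value.
peek32LE :: Ptr a -> Int -> IO Word32
peek32LE p i = do
  lo <- peek16LE p i
  hi <- peek16LE p (i + 2)
  return (fromIntegral lo + fromIntegral hi * 65536)

main :: IO ()
main = allocaBytes 4 $ \p -> do
  -- fill the buffer with the bytes 78 56 34 12 (hex)
  mapM_ (\(off, b) -> pokeByteOff p off (b :: Word8))
        (zip [0 ..] [0x78, 0x56, 0x34, 0x12])
  x <- peek16LE p 0
  y <- peek32LE p 0
  print (x == 0x5678, y == 0x12345678)  -- prints (True,True)
```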


there are a couple of binary I/O libs (including my own one :) ), i
just don't think you need such power here. of course, if you want to
read data sequentially, a binary i/o lib will be preferable. with my lib
you can write something like this:

          -- Create new MemBuf filled with data from file
          h <- readFromFile "test"
          -- Read header fields sequentially
          idLength <- getWord8 h
          x <- getWord16le h
          ....
          

i attached here a part of my library docs where this is described in much
more detail :)

the lib itself is at http://freearc.narod.ru/Streams.tar.gz

-- 
Best regards,
 Bulat                            mailto:Bulat.Ziganshin at gmail.com
-------------- next part --------------
The AltBinary library provides 4 levels of binary I/O, built on top of
each other:

- Byte I/O              (vGetByte and vPutByte)
- Integral values I/O   (getWordXX and putWordXX)
- Data structures I/O   (over 100 operations :) )
- Serialization API     (get and put_)

We will study them all sequentially, starting from the lowest level.

* Byte I/O

The lowest level, byte I/O, doesn't differ significantly from Char I/O.
All Streams support the vGetByte and vPutByte operations, either directly
or via a buffering transformer. These operations have rather generalized
types:

vGetByte :: (Stream m h, Enum a) => h -> m a
vPutByte :: (Stream m h, Enum a) => h -> a -> m ()

This allows you to read/write any integral or enumeration value without
additional type conversions (of course, these values should belong to
the 0..255 range)

Together with other Stream operations, such as vIsEOF, vTell/vSeek,
vGetBuf/vPutBuf, this allows you to write any program that operates upon
binary data. You can freely mix byte and text I/O on one Stream:

main = do vPutByte stdout (1::Int)
          vPutStrLn stdout "text"
          vPutBuf stdout buf bufsize


* Integral values / bit sequences I/O

The core of this API is two generalized operations:

getBits bits h
putBits bits h value

`getBits` reads a certain number of bits from the given BinaryStream and
returns them as a value of any integral type (Int, Word8, Integer and so on).
`putBits` writes the given value as a certain number of bits. The `value`,
again, may be of any integral type.

These two operations can be implemented in one of 4 ways, depending on
the answers to two questions:
- are integral values written as big- or little-endian?
- are the written values bit-aligned or byte-aligned?

The library lets you choose any combination of answers to these questions.
The `h` parameter in these operations represents a BinaryStream, and there
are 4 ways to open a BinaryStream on top of a plain Stream:

binaryStream <- openByteAligned stream      -- big-endian
binaryStream <- openByteAlignedLE stream    -- little-endian
binaryStream <- openBitAligned stream       -- big-endian
binaryStream <- openBitAlignedLE stream     -- little-endian

Moreover, to simplify your work, a Stream by itself can also be used as a
BinaryStream - in this case the byte-aligned big-endian representation is
used. So, you can write, for example:
   putBits 16 stdout (0::Int)
or
   bh <- openByteAlignedLE stdout
   putBits 16 bh (0::Int)
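
To make the difference between the two byte orders concrete, here is a
small sketch that is independent of the library. It only shows how a
16-bit value splits into bytes in each order; `bytesBE` and `bytesLE`
are illustrative names, not AltBinary functions.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word8)

-- Low `n` bytes of a value, most significant byte first.
bytesBE :: Int -> Integer -> [Word8]
bytesBE n v =
  [ fromIntegral ((v `shiftR` (8 * k)) .&. 0xFF) | k <- [n - 1, n - 2 .. 0] ]

-- Same bytes, least significant byte first.
bytesLE :: Int -> Integer -> [Word8]
bytesLE n = reverse . bytesBE n

main :: IO ()
main = do
  print (bytesBE 2 0x0102)  -- prints [1,2]
  print (bytesLE 2 0x0102)  -- prints [2,1]
```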

There is also the operation `flushBits h`, which aligns a BinaryStream on
a byte boundary. It fills the rest of the byte with zero bits on output and
skips the remaining bits of the current byte on input. Of course, this
operation does nothing on byte-aligned BinaryStreams.

There are also "shortcut" operations that read/write a fixed number of bits:

getBit h
getWord8 h
getWord16 h
getWord32 h
getWord64 h
putBit h value
putWord8 h value
putWord16 h value
putWord32 h value
putWord64 h value

Although these operations look like mere shortcuts for partial
application of getBits/putBits, they work somewhat faster.
In contrast to other binary I/O libraries, each of these operations
can accept/return values of any integral type.

You can freely mix text I/O, byte I/O and bits I/O as long as you
don't forget to call `flushBits` after bit-aligned chunks of I/O:

main = do putWord32 stdout (1::Int)  -- byte-aligned big-endian

          stdoutLE <- openByteAlignedLE stdout
          putWord32  stdoutLE (1::Int)  -- byte-aligned little-endian
          putBits 15 stdoutLE (1::Int)  -- byte-aligned little-endian

          stdoutBitsLE <- openBitAlignedLE stdout
          putBit     stdoutBitsLE (1::Int)  -- bit-aligned little-endian
          putBits 15 stdoutBitsLE (1::Int)  -- bit-aligned little-endian
          flushBits stdoutBitsLE

          vPutStrLn stdout "text"

          stdoutBits <- openBitAligned stdout
          putBit     stdoutBits (1::Int)  -- bit-aligned big-endian
          putBits 15 stdoutBits (1::Int)  -- bit-aligned big-endian
          flushBits stdoutBits

When you request to write, say, 15 bits to a byte-aligned BinaryStream,
a whole number of bytes is written. In particular, each `putBit`
operation on a byte-aligned BinaryStream writes a whole byte to the
stream, while the same operation on a bit-aligned stream fills one bit at
a time.
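
The effect on output size can be sketched with a little arithmetic. This
is not library code: it assumes each write to a byte-aligned stream is
rounded up to whole bytes, and that a bit-aligned stream packs bits back
to back and is padded once by a final `flushBits`. Both function names
are made up for the illustration.

```haskell
-- Bytes used when every write is padded to a whole number of bytes.
byteAlignedSize :: [Int] -> Int
byteAlignedSize = sum . map (\bits -> (bits + 7) `div` 8)

-- Bytes used when bits are packed together and padded once at the end.
bitAlignedSize :: [Int] -> Int
bitAlignedSize widths = (sum widths + 7) `div` 8

main :: IO ()
main = do
  let writes = [1, 15]            -- putBit followed by putBits 15
  print (byteAlignedSize writes)  -- prints 3
  print (bitAlignedSize writes)   -- prints 2
```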

But that is not yet the whole story! There are also operations that
allow you to intermix little-endian and big-endian I/O:

getWord16le h
getWord32le h
getWord64le h
putWord16le h value
putWord32le h value
putWord64le h value
getWord16be h
getWord32be h
getWord64be h
putWord16be h value
putWord32be h value
putWord64be h value

For example, you can write:

main = do putWord32le stdout (1::Int)  -- byte-aligned little-endian
          putWord16be stdout (1::Int)  -- byte-aligned big-endian

Please note that `h` in these operations is a Stream, not a
BinaryStream. Actually, these operations just perform several fixed
vGetByte or vPutByte operations and, strictly speaking, they belong in
the previous section.
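
As an illustration of "several fixed byte operations", here is a sketch
of how such readers could be composed from any action that yields the
next byte. It does not use the library: `getWord16leWith`,
`getWord16beWith` and the IORef-backed `nextByte` source are all
hypothetical stand-ins for vGetByte on a real Stream.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Word (Word16, Word8)

-- Two byte reads, first byte is the low half.
getWord16leWith :: Monad m => m Word8 -> m Word16
getWord16leWith getByte = do
  lo <- getByte
  hi <- getByte
  return (fromIntegral lo + fromIntegral hi * 256)

-- Two byte reads, first byte is the high half.
getWord16beWith :: Monad m => m Word8 -> m Word16
getWord16beWith getByte = do
  hi <- getByte
  lo <- getByte
  return (fromIntegral hi * 256 + fromIntegral lo)

-- Toy byte source: pops bytes from a list held in an IORef.
nextByte :: IORef [Word8] -> IO Word8
nextByte ref = do
  bytes <- readIORef ref
  case bytes of
    []       -> ioError (userError "end of input")
    (b : bs) -> writeIORef ref bs >> return b

main :: IO ()
main = do
  ref <- newIORef [0x34, 0x12, 0x12, 0x34]
  a <- getWord16leWith (nextByte ref)
  b <- getWord16beWith (nextByte ref)
  print (a == 0x1234, b == 0x1234)  -- prints (True,True)
```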

There are also combinator versions of the `open*` operations that
automatically perform `flushBits` at the end:

    withBitAlignedLE stdout $ \h -> do
        putBit     h (1::Int)  -- bit-aligned little-endian
        putBits 15 h (1::Int)  -- bit-aligned little-endian

I should also say that you can perform all the Stream operations on
any BinaryStream, and bit-aligned streams will flush themselves before
performing any I/O or seeking operation. For example:

    h <- openBitAligned stdout
    vPutStr h "text"
    putBit h (1::Int)
    vPutByte h (1::Int)     -- `flushBits` will be automatically
                            --   called before this operation
    putWord16le h (1::Int)  -- little-endian format will be used here despite
                            --   big-endianness of the BinaryStream itself



* Serialization API

This part is really small! :) There are just two operations:

get h
put_ h a

where `h` is a BinaryStream. These operations read and write the binary
representation of any value belonging to the class Binary.

