Library/Data encoding

Revision as of 01:28, 20 November 2007 by EricSessoms (Talk | contribs)

Data Encodings (dataenc): A collection of data encoding algorithms.

1 Data encodings library

The data encodings library strives to provide implementations in Haskell of every major data encoding, and a few minor ones as well. Currently the following encodings are implemented:

  • Base16 (Codec.Binary.Base16)
  • Base32 (Codec.Binary.Base32)
  • Base32Hex (Codec.Binary.Base32Hex)
  • Base64 (Codec.Binary.Base64)
  • Base64Url (Codec.Binary.Base64Url)
  • Uuencode (Codec.Binary.Uu)

2 The API

2.1 Main API

The module Codec.Binary.DataEncoding provides a type that collects the functions for an individual encoding:

data DataCodec = DataCodec {
    encode :: [Word8] -> String,
    decode :: String -> Maybe [Word8],
    decode' :: String -> [Maybe Word8],
    chop :: Int -> String -> [String],
    unchop :: [String] -> String
}

It also exposes instances of this type for each encoding:

base16 :: DataCodec
base32 :: DataCodec
base32Hex :: DataCodec
base64 :: DataCodec
base64Url :: DataCodec
uu :: DataCodec
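
Collecting the functions in a record means that code can be written once against the record type and then handed any codec value. The sketch below illustrates that idea with a self-contained stand-in rather than the library itself: MiniCodec, hexCodec and roundTrips are hypothetical names invented for this example, and the record is trimmed to two fields.

```haskell
import Data.Char (toUpper)
import Data.Word (Word8)
import Numeric (readHex, showHex)

-- Hypothetical, trimmed-down stand-in for DataCodec (two fields only).
-- It is NOT the type exported by dataenc.
data MiniCodec = MiniCodec
  { mcEncode :: [Word8] -> String
  , mcDecode :: String -> Maybe [Word8]
  }

-- Toy hexadecimal codec, present only so that there is a value to pass around.
hexCodec :: MiniCodec
hexCodec = MiniCodec { mcEncode = concatMap byteToHex, mcDecode = go }
  where
    -- Two upper-case hex digits per octet, zero padded on the left.
    byteToHex b = map toUpper (pad (showHex b ""))
    pad s = replicate (2 - length s) '0' ++ s
    -- Consume two characters at a time; fail on odd length or non-hex input.
    go [] = Just []
    go (a:b:rest) = case readHex [a, b] of
      [(n, "")] -> (fromIntegral (n :: Int) :) <$> go rest
      _         -> Nothing
    go _ = Nothing

-- Written once against the record type, so it works for any codec value.
roundTrips :: MiniCodec -> [Word8] -> Bool
roundTrips c bytes = mcDecode c (mcEncode c bytes) == Just bytes
```

Assuming the fields of DataCodec behave as their types suggest, the same kind of function could be written against DataCodec and applied to base16, base64 and the other values listed above.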

2.2 Secondary API

Each individual encoding module is also exposed and offers four functions:

encode :: [Word8] -> String
decode :: String -> [Word8]
chop :: Int -> String -> [String]
unchop :: [String] -> String

3 Description of the encodings

3.1 Base16

Implemented as it's defined in RFC 4648.

Each four-bit nibble of an octet is encoded as one character from the set 0-9, A-F.
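
The nibble split can be sketched in a few lines. This is a minimal illustration of the scheme described above, not the code in Codec.Binary.Base16; the names alphabet16, encodeOctet and encodeBase16 are made up for the example.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word8)

-- The 16-character alphabet named in the text: 0-9 followed by A-F.
alphabet16 :: String
alphabet16 = "0123456789ABCDEF"

-- Encode one octet as two characters: high nibble first, then low nibble.
encodeOctet :: Word8 -> String
encodeOctet w = [digit (w `shiftR` 4), digit (w .&. 0x0F)]
  where
    digit n = alphabet16 !! fromIntegral n

-- Encode a whole octet sequence (hypothetical helper, encoding only).
encodeBase16 :: [Word8] -> String
encodeBase16 = concatMap encodeOctet
```

For instance, encodeBase16 [0x66, 0x6F, 0x6F] gives "666F6F", which matches the RFC 4648 test vector for the ASCII string "foo".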

3.2 Base32

Implemented as it's defined in RFC 4648.

Five octets (40 bits) are expanded into eight values, each of which uses only its five least significant bits. Each value is then encoded as one character from a 32-character alphabet.
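
A minimal sketch of this expansion follows, assuming the RFC 4648 alphabet (A-Z then 2-7) and '=' padding for a short final group. The function names are hypothetical and the code is not taken from Codec.Binary.Base32.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8, Word64)

-- RFC 4648 Base32 alphabet: A-Z followed by 2-7.
alphabet32 :: String
alphabet32 = ['A'..'Z'] ++ ['2'..'7']

-- Encode a group of 1 to 5 octets as 8 characters, padded with '='.
encodeGroup :: [Word8] -> String
encodeGroup octets = take used chars ++ replicate (8 - used) '='
  where
    n      = length octets
    -- Number of characters that carry data: ceiling (8 * n / 5).
    used   = (8 * n + 4) `div` 5
    -- Zero-fill to five octets and pack them into one 40-bit value.
    padded = take 5 (octets ++ repeat 0)
    bits40 = foldl (\acc o -> (acc `shiftL` 8) .|. fromIntegral o) (0 :: Word64) padded
    -- The i-th 5-bit value, counting from the most significant end.
    quintet i = fromIntegral ((bits40 `shiftR` (35 - 5 * i)) .&. 0x1F) :: Int
    chars  = [alphabet32 !! quintet i | i <- [0 .. 7]]

-- Encode an arbitrary octet sequence five octets at a time.
encodeBase32 :: [Word8] -> String
encodeBase32 [] = []
encodeBase32 bs = encodeGroup g ++ encodeBase32 rest
  where
    (g, rest) = splitAt 5 bs
```

With this sketch the single octet 0x66 encodes to "MY======" and the octets of the ASCII string "fooba" encode to "MZXW6YTB", in line with the RFC 4648 test vectors.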

3.3 Base32Hex

Implemented as it's defined in RFC 4648.

Just like Base32 but with a different encoding alphabet. Unlike Base64 and Base32, data encoded with Base32Hex maintains its sort order when the encoded data is compared bitwise.
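
The sort-order property comes down to the alphabet itself: the Base32Hex characters are in ascending character-code order, while the Base32 ones are not. The short check below illustrates this, using the two alphabets from RFC 4648; isAscending is a helper invented for the example.

```haskell
-- RFC 4648 alphabets for Base32 and Base32Hex.
base32Alphabet, base32HexAlphabet :: String
base32Alphabet    = ['A'..'Z'] ++ ['2'..'7']
base32HexAlphabet = ['0'..'9'] ++ ['A'..'V']

-- True when each character has a strictly larger code than its predecessor.
isAscending :: String -> Bool
isAscending s = and (zipWith (<) s (drop 1 s))
```

Here isAscending base32HexAlphabet is True and isAscending base32Alphabet is False. Concretely, the values 25 and 26 become 'Z' and '2' in Base32, which compare in the wrong order, but 'P' and 'Q' in Base32Hex.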

3.4 Base64

Implemented as it's defined in RFC 4648.

Three octets (24 bits) are expanded into four values, each of which uses only its six least significant bits. Each value is then encoded as one character from a 64-character alphabet.
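
Written out with explicit bit operations, the three-to-four expansion might look like the sketch below. It assumes the standard RFC 4648 alphabet and '=' padding; expand, char64 and encodeBase64 are hypothetical names, not the functions in Codec.Binary.Base64.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- RFC 4648 Base64 alphabet.
alphabet64 :: String
alphabet64 = ['A'..'Z'] ++ ['a'..'z'] ++ ['0'..'9'] ++ "+/"

-- Spread 24 bits from three octets over four 6-bit values.
expand :: Word8 -> Word8 -> Word8 -> [Word8]
expand a b c =
  [ a `shiftR` 2
  , ((a .&. 0x03) `shiftL` 4) .|. (b `shiftR` 4)
  , ((b .&. 0x0F) `shiftL` 2) .|. (c `shiftR` 6)
  , c .&. 0x3F
  ]

-- Look up the character for one 6-bit value.
char64 :: Word8 -> Char
char64 v = alphabet64 !! fromIntegral v

-- Full groups give four characters; a short final group is zero-filled,
-- truncated to the characters that carry data, and padded with '='.
encodeBase64 :: [Word8] -> String
encodeBase64 (a:b:c:rest) = map char64 (expand a b c) ++ encodeBase64 rest
encodeBase64 [a, b]       = take 3 (map char64 (expand a b 0)) ++ "="
encodeBase64 [a]          = take 2 (map char64 (expand a 0 0)) ++ "=="
encodeBase64 []           = ""
```

With this sketch the octets of the ASCII strings "f", "fo" and "foo" encode to "Zg==", "Zm8=" and "Zm9v" respectively, as in the RFC 4648 test vectors.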

3.5 Base64Url

Implemented as it's defined in RFC 4648.

Just like Base64 but with a different encoding alphabet. The encoding alphabet is made URL and filename safe by replacing + and / with - and _ respectively.
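
Because only two characters of the alphabet differ, Base64 text can be turned into Base64Url text by a character-for-character translation. The sketch below shows that relationship; toUrlSafe and fromUrlSafe are hypothetical helpers, and the library itself may well encode directly with the second alphabet instead.

```haskell
-- Translate standard Base64 text into the URL and filename safe alphabet.
toUrlSafe :: String -> String
toUrlSafe = map swap
  where
    swap '+' = '-'
    swap '/' = '_'
    swap c   = c

-- Translate URL-safe text back into the standard alphabet.
fromUrlSafe :: String -> String
fromUrlSafe = map swap
  where
    swap '-' = '+'
    swap '_' = '/'
    swap c   = c
```

For example, the Base64 text "+/8=" (the two octets 0xFB 0xFF) becomes "-_8=" under toUrlSafe, and fromUrlSafe restores the original.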

3.6 Uuencode

Unfortunately uuencode is badly specified and there are in fact several differing implementations of it. This implementation attempts to encode data in the same way as the uuencode utility found in GNU's sharutils. The workings of chop and unchop also follow how sharutils split and unsplit encoded lines.

4 Downloading

At the moment there is no prepared package of the source; the only option is to get the development version (see below).

5 Contributing

Retrieve the source code using darcs like this:

   darcs get http://code.haskell.org/dataenc/devo dataenc

Send any patches to [email protected]