Difference between revisions of "Library/Data encoding"

From HaskellWiki
(Development has been moved to github)
(Page moved to readme.md in source code on github)
 
[[Category:Libraries]]
 
Data Encodings (dataenc): A collection of data encoding algorithms.
 
 
== Data encodings library ==
 
 
The data encodings library strives to provide implementations in Haskell of every major data encoding, and a few minor ones as well. Currently the following encodings are implemented:
 
 
* Base16 (<hask>Codec.Binary.Base16</hask>)
* Base32 (<hask>Codec.Binary.Base32</hask>)
* Base32Hex (<hask>Codec.Binary.Base32Hex</hask>)
* Base64 (<hask>Codec.Binary.Base64</hask>)
* Base64Url (<hask>Codec.Binary.Base64Url</hask>)
* Base85 (<hask>Codec.Binary.Base85</hask>)
* Python string escaping (<hask>Codec.Binary.PythonString</hask>)
* Quoted-Printable (<hask>Codec.Binary.QuotedPrintable</hask>)
* URL encoding (<hask>Codec.Binary.Url</hask>)
* Uuencode (<hask>Codec.Binary.Uu</hask>)
* Xxencode (<hask>Codec.Binary.Xx</hask>)
* yEncode (<hask>Codec.Binary.Yenc</hask>)
 
 
Some encodings also specify headers and footers for the encoded data. Implementing those is left to the user of the library.
 
 
== The API ==
 
 
=== Main API ===
 
 
The module <hask>Codec.Binary.DataEncoding</hask> provides a type that collects the functions for an individual encoding:
 
 
<haskell>
data DataCodec = DataCodec {
    encode  :: [Word8] -> String,
    decode  :: String -> Maybe [Word8],
    decode' :: String -> [Maybe Word8],
    chop    :: Int -> String -> [String],
    unchop  :: [String] -> String
}
</haskell>
 
 
It also exposes instances of this type for each encoding:
 
 
<haskell>
base16    :: DataCodec
base32    :: DataCodec
base32Hex :: DataCodec
base64    :: DataCodec
base64Url :: DataCodec
uu        :: DataCodec
</haskell>
 
 
<b>NB</b> There is no instance for yEncoding since the functions in that module have slightly different type signatures.
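
Because every codec value has the same type, code can be written once and applied to any encoding. The following is a minimal sketch, assuming the dataenc package is installed and that <hask>Codec.Binary.DataEncoding</hask> exports the record fields and codec values exactly as listed above. The helper <hask>roundTrips</hask> is a hypothetical name introduced for illustration and is not part of the library.

<haskell>
import Codec.Binary.DataEncoding
import Data.Word (Word8)

-- | Hypothetical helper: True when decoding the encoded bytes
-- gives back the original bytes for the given codec.
roundTrips :: DataCodec -> [Word8] -> Bool
roundTrips codec bytes = decode codec (encode codec bytes) == Just bytes

main :: IO ()
main = do
  let payload = [0, 1, 127, 128, 255] :: [Word8]
  -- Print the same payload in two encodings by selecting the codec value.
  mapM_ (\(name, codec) -> putStrLn (name ++ ": " ++ encode codec payload))
        [("base16", base16), ("base64", base64)]
  -- Check the round trip for both codecs.
  print (all (`roundTrips` payload) [base16, base64])
</haskell>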
 
 
=== Secondary API ===
 
 
Each individual encoding module is also exposed and offers five functions:
 
 
<haskell>
encode  :: [Word8] -> String
decode  :: String -> Maybe [Word8]
decode' :: String -> [Maybe Word8]
chop    :: Int -> String -> [String]
unchop  :: [String] -> String
</haskell>
 
 
== Description of the encodings ==
 
 
=== Base16 ===
 
 
Implemented as it's defined in [http://tools.ietf.org/html/rfc4648 RFC 4648].
 
 
Each four-bit nibble of an octet is encoded as one character from the set 0-9, A-F, so every octet becomes two characters.
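
The nibble mapping can be sketched in a few lines using only the base library. This is an illustration of the scheme, not the library's own code, and <hask>encodeHex</hask> is a hypothetical name.

<haskell>
import Data.Bits (shiftR, (.&.))
import Data.Char (intToDigit, toUpper)
import Data.Word (Word8)

-- | Encode each octet as two characters: high nibble first, then low nibble.
encodeHex :: [Word8] -> String
encodeHex = concatMap octet
  where
    octet w = [digit (w `shiftR` 4), digit (w .&. 0x0F)]
    -- intToDigit yields lowercase a-f, so convert to the uppercase alphabet.
    digit   = toUpper . intToDigit . fromIntegral
</haskell>

For example, <hask>encodeHex [0xDE, 0xAD]</hask> evaluates to <hask>"DEAD"</hask>.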
 
 
=== Base32 ===
 
 
Implemented as it's defined in [http://tools.ietf.org/html/rfc4648 RFC 4648].
 
 
Each group of five octets (40 bits) is expanded into eight octets so that only the five least significant bits of each are used. Each of these 5-bit values is then mapped to a character in a 32-character encoding alphabet.
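
The regrouping of one complete five-octet group can be sketched as follows, using only the base library. The names <hask>quintets</hask>, <hask>base32Alphabet</hask> and <hask>encodeGroup</hask> are hypothetical, and the sketch assumes the RFC 4648 alphabet (A-Z followed by 2-7). It does not handle padding of incomplete groups.

<haskell>
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8, Word64)

-- | Split exactly five octets into eight 5-bit values, most significant first.
-- Returns Nothing for any other input length.
quintets :: [Word8] -> Maybe [Word8]
quintets ws
  | length ws /= 5 = Nothing
  | otherwise      = Just [ fromIntegral ((n `shiftR` s) .&. 0x1F)
                          | s <- [35, 30 .. 0] ]
  where
    -- Pack the five octets into one 40-bit number.
    n :: Word64
    n = foldl (\acc w -> (acc `shiftL` 8) .|. fromIntegral w) 0 ws

-- | RFC 4648 Base32 alphabet (assumed here).
base32Alphabet :: String
base32Alphabet = ['A' .. 'Z'] ++ ['2' .. '7']

-- | Encode one complete group of five octets as eight characters.
encodeGroup :: [Word8] -> Maybe String
encodeGroup = fmap (map ((base32Alphabet !!) . fromIntegral)) . quintets
</haskell>

With the octets of the ASCII string "fooba", <hask>encodeGroup [102, 111, 111, 98, 97]</hask> gives <hask>Just "MZXW6YTB"</hask>, matching the test vector in RFC 4648.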
 
 
=== Base32Hex ===
 
 
Implemented as it's defined in [http://tools.ietf.org/html/rfc4648 RFC 4648].
 
 
Just like Base32 but with a different encoding alphabet. Unlike Base64 and Base32, data encoded with Base32Hex maintains its sort order when the encoded data is compared bitwise.
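
The sort-order property follows from the alphabet itself: a larger 5-bit value always maps to a character with a larger code. A small check, assuming the RFC 4648 alphabets and using the hypothetical helper <hask>isAscending</hask>:

<haskell>
-- | RFC 4648 Base32 alphabet: letters first, then digits 2-7.
base32Alphabet :: String
base32Alphabet = ['A' .. 'Z'] ++ ['2' .. '7']

-- | RFC 4648 Base32Hex alphabet: digits first, then letters A-V.
base32HexAlphabet :: String
base32HexAlphabet = ['0' .. '9'] ++ ['A' .. 'V']

-- | True when every character is strictly greater than the one before it.
isAscending :: String -> Bool
isAscending cs = and (zipWith (<) cs (drop 1 cs))

main :: IO ()
main = do
  print (isAscending base32HexAlphabet)
  print (isAscending base32Alphabet)
</haskell>

The Base32Hex alphabet is ascending, while the Base32 alphabet is not, because its digits come after its letters but have smaller character codes.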
 
 
=== Base64 ===
 
 
Implemented as it's defined in [http://tools.ietf.org/html/rfc4648 RFC 4648].
 
 
Each group of three octets (24 bits) is expanded into four octets so that only the six least significant bits of each are used. Each of these 6-bit values is then mapped to a character in a 64-character encoding alphabet.
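
For one complete group of three octets, the expansion can be written with explicit shifts and masks. This is a sketch using only the base library, with the RFC 4648 alphabet assumed. The name <hask>encodeTriple</hask> is hypothetical, and padding of incomplete groups is not covered.

<haskell>
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- | RFC 4648 Base64 alphabet (assumed here).
base64Alphabet :: String
base64Alphabet = ['A' .. 'Z'] ++ ['a' .. 'z'] ++ ['0' .. '9'] ++ "+/"

-- | Encode one complete group of three octets as four characters.
encodeTriple :: Word8 -> Word8 -> Word8 -> String
encodeTriple a b c = map (base64Alphabet !!) [i0, i1, i2, i3]
  where
    a', b', c' :: Int
    a' = fromIntegral a
    b' = fromIntegral b
    c' = fromIntegral c
    i0 = a' `shiftR` 2                                  -- top 6 bits of a
    i1 = ((a' .&. 0x03) `shiftL` 4) .|. (b' `shiftR` 4) -- 2 bits of a, 4 of b
    i2 = ((b' .&. 0x0F) `shiftL` 2) .|. (c' `shiftR` 6) -- 4 bits of b, 2 of c
    i3 = c' .&. 0x3F                                    -- low 6 bits of c
</haskell>

With the octets of the ASCII string "foo", <hask>encodeTriple 102 111 111</hask> gives <hask>"Zm9v"</hask>, matching the test vector in RFC 4648.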
 
 
=== Base64Url ===
 
 
Implemented as it's defined in [http://tools.ietf.org/html/rfc4648 RFC 4648].
 
 
Just like Base64 but with a different encoding alphabet. The encoding alphabet is made URL and filename safe by replacing <tt>+</tt> and <tt>/</tt> with <tt>-</tt> and <tt>_</tt> respectively.
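
Since only two characters of the alphabet differ, Base64 output can be converted to the URL-safe form (replacing <tt>+</tt> with <tt>-</tt> and <tt>/</tt> with <tt>_</tt>, as in RFC 4648) and back by a character-wise translation. The sketch below illustrates this; <hask>toUrlSafe</hask> and <hask>fromUrlSafe</hask> are hypothetical names and not part of the library.

<haskell>
-- | Translate standard Base64 text to the URL and filename safe alphabet.
toUrlSafe :: String -> String
toUrlSafe = map swap
  where
    swap '+' = '-'
    swap '/' = '_'
    swap c   = c

-- | Translate URL-safe Base64 text back to the standard alphabet.
fromUrlSafe :: String -> String
fromUrlSafe = map swap
  where
    swap '-' = '+'
    swap '_' = '/'
    swap c   = c
</haskell>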
 
 
=== Base85 ===
 
 
Implemented as described in the [http://en.wikipedia.org/wiki/Ascii85 Wikipedia article].
 
 
=== Python string escaping ===
 
 
Implementation of Python's string escaping.
 
 
=== Quoted-Printable ===
 
 
Implemented as defined in [http://tools.ietf.org/html/rfc2045 RFC 2045].
 
 
=== URL encoding ===
 
 
Implemented as defined in [http://tools.ietf.org/html/rfc3986 RFC 3986].
 
 
=== Uuencode ===
 
 
Unfortunately uuencode is badly specified and there are in fact several differing implementations of it. This implementation attempts to encode data in the same way as the <tt>uuencode</tt> utility found in [http://www.gnu.org/software/sharutils/ GNU's sharutils]. The workings of <hask>chop</hask> and <hask>unchop</hask> also follow how sharutils splits and rejoins encoded lines.
 
 
=== Xxencode ===
 
 
Implemented as described in the [http://en.wikipedia.org/wiki/Xxencode Wikipedia article].
 
 
=== yEncoding ===
 
 
Implemented as it's defined in [http://yence.sourceforge.net/docs/protocol/version1_3_draft.html the 1.3 draft].
 
 
== Downloading ==
 
 
The current release is available from [http://hackage.haskell.org/cgi-bin/hackage-scripts/package/dataenc HackageDB].
 
 
See [[#Contributing]] below for how to get the development version.
 
 
== Example of use ==
 
 
The package [http://hackage.haskell.org/cgi-bin/hackage-scripts/package/omnicodec omnicodec] contains two command line tools for encoding and decoding data.
 
 
== Contributing ==
 
 
The source is hosted on [https://github.com/magthe/dataenc/ GitHub] and can be downloaded using git:
 
 
git clone https://github.com/magthe/dataenc.git
 
 
Patches can be sent to magnus@therning.org, but I suggest using GitHub's pull requests if possible.
 

Latest revision as of 16:52, 17 April 2014