Difference between revisions of "Library/Data encoding"

From HaskellWiki
(Adding note about the lack of a maintainer.)
(Page moved to readme.md in source code on github)
 
[[Category:Libraries]]
 
Data Encodings (dataenc): A collection of data encoding algorithms.
 
 
'''This library is currently without maintainer'''
 
 
----
 
 
== Data encodings library ==
 
 
The data encodings library strives to provide implementations in Haskell of every major data encoding, and a few minor ones as well. Currently the following encodings are implemented:
 
 
* Base16 (<hask>Codec.Binary.Base16</hask>)
 
* Base32 (<hask>Codec.Binary.Base32</hask>)
 
* Base32Hex (<hask>Codec.Binary.Base32Hex</hask>)
 
* Base64 (<hask>Codec.Binary.Base64</hask>)
 
* Base64Url (<hask>Codec.Binary.Base64Url</hask>)
 
* Base85 (<hask>Codec.Binary.Base85</hask>)
 
* Python string escaping (<hask>Codec.Binary.PythonString</hask>)
 
* Quoted-Printable (<hask>Codec.Binary.QuotedPrintable</hask>)
 
* URL encoding (<hask>Codec.Binary.Url</hask>)
 
* Uuencode (<hask>Codec.Binary.Uu</hask>)
 
* Xxencode (<hask>Codec.Binary.Xx</hask>)
 
* yEncode (<hask>Codec.Binary.Yenc</hask>)
 
 
Some encodings also specify headers and footers for the encoded data. Implementing those is left to the user of the library.
 
 
== The API ==
 
 
=== Main API ===
 
 
The module <hask>Codec.Binary.DataEncoding</hask> provides a type that collects the functions for an individual encoding:
 
 
<haskell>
data DataCodec = DataCodec {
    encode  :: [Word8] -> String,
    decode  :: String -> Maybe [Word8],
    decode' :: String -> [Maybe Word8],
    chop    :: Int -> String -> [String],
    unchop  :: [String] -> String
}
</haskell>
 
 
It also exposes instances of this type for each encoding:
 
 
<haskell>
base16    :: DataCodec
base32    :: DataCodec
base32Hex :: DataCodec
base64    :: DataCodec
base64Url :: DataCodec
uu        :: DataCodec
</haskell>
 
 
<b>NB</b> There is no instance for yEncoding since the functions in that module have slightly different type signatures.
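Collecting the functions in a record makes it possible to write code that works with any codec value. The following is a minimal sketch, not part of the library: it declares a local copy of the record so that it compiles without the package installed, and <hask>encodeLines</hask> and <hask>decodeLines</hask> are hypothetical helper names.

<haskell>
import Data.Word (Word8)

-- Local mirror of the DataCodec record, declared here only so the sketch
-- compiles on its own; with the package you would import the real type.
data DataCodec = DataCodec
  { encode  :: [Word8] -> String
  , decode  :: String -> Maybe [Word8]
  , decode' :: String -> [Maybe Word8]
  , chop    :: Int -> String -> [String]
  , unchop  :: [String] -> String
  }

-- Hypothetical helper: encode, split into lines with the codec's own
-- chop, and join the lines with newlines.
encodeLines :: DataCodec -> Int -> [Word8] -> String
encodeLines codec width = unlines . chop codec width . encode codec

-- Hypothetical helper: the reverse direction. Nothing is returned when
-- the codec's strict decoder rejects the input.
decodeLines :: DataCodec -> String -> Maybe [Word8]
decodeLines codec = decode codec . unchop codec . lines
</haskell>

Because both helpers take the codec as an ordinary argument, switching encodings only means passing a different record value.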
 
 
=== Secondary API ===
 
 
Each individual encoding module is also exposed and offers five functions:
 
 
<haskell>
encode  :: [Word8] -> String
decode  :: String -> Maybe [Word8]
decode' :: String -> [Maybe Word8]
chop    :: Int -> String -> [String]
unchop  :: [String] -> String
</haskell>
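The signatures alone do not say how the functions relate to each other. The sketch below shows one plausible reading and is an assumption, not the library's code: it assumes that <hask>decode'</hask> yields one element per decoded octet with <hask>Nothing</hask> marking a failure, and that <hask>chop</hask> splits encoded text into lines of at most the given length. The real modules may align line lengths to the encoding's block size or, as for uuencode, add per-line framing. All names are hypothetical.

<haskell>
import Data.Word (Word8)

-- Assumed relationship: a strict decoder can be derived from a lenient
-- one by requiring that every element decoded successfully.
strictFrom :: (String -> [Maybe Word8]) -> String -> Maybe [Word8]
strictFrom lenient = sequence . lenient

-- Assumed chop behaviour: split text into lines of at most n characters.
-- A non-positive n leaves the text as a single line.
chopEvery :: Int -> String -> [String]
chopEvery n s
  | n <= 0    = [s]
  | null s    = []
  | otherwise = line : chopEvery n rest
  where
    (line, rest) = splitAt n s

-- Assumed unchop behaviour: join the lines back together.
unchopAll :: [String] -> String
unchopAll = concat
</haskell>

Under these assumptions <hask>unchopAll . chopEvery n</hask> returns its input unchanged.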
 
 
== Description of the encodings ==
 
 
=== Base16 ===
 
 
Implemented as it's defined in [http://tools.ietf.org/html/rfc4648 RFC 4648].
 
 
Each four-bit nibble of an octet is encoded as one character from the set 0-9, A-F.
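The nibble mapping can be sketched as follows. This is an illustration only, not the code in <hask>Codec.Binary.Base16</hask>; <hask>encodeHex</hask> and <hask>decodeHex</hask> are hypothetical names, and the decoder here accepts upper-case digits only.

<haskell>
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.List (elemIndex)
import Data.Word (Word8)

-- The 16-character alphabet; a character's position is its nibble value.
hexAlphabet :: String
hexAlphabet = "0123456789ABCDEF"

-- Encode each octet as two characters: high nibble first, then low nibble.
encodeHex :: [Word8] -> String
encodeHex = concatMap octet
  where
    octet w = [ hexAlphabet !! fromIntegral (w `shiftR` 4)
              , hexAlphabet !! fromIntegral (w .&. 0x0F) ]

-- Decode pairs of characters back into octets. Returns Nothing for a
-- character outside the alphabet or for an odd number of characters.
decodeHex :: String -> Maybe [Word8]
decodeHex [] = Just []
decodeHex (h : l : rest) = do
  hi <- elemIndex h hexAlphabet
  lo <- elemIndex l hexAlphabet
  (fromIntegral ((hi `shiftL` 4) .|. lo) :) <$> decodeHex rest
decodeHex _ = Nothing
</haskell>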
 
 
=== Base32 ===
 
 
Implemented as it's defined in [http://tools.ietf.org/html/rfc4648 RFC 4648].
 
 
Each group of five octets (40 bits) is expanded into eight octets, each of which uses only its five least significant bits. Each of these 5-bit values is then mapped to a character in a 32-character encoding alphabet.
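The regrouping of one full five-octet group can be sketched as below. This is a simplified illustration rather than the library's implementation: the names are hypothetical, the functions expect exactly five octets, and the padding that RFC 4648 defines for a shorter final group is left out.

<haskell>
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8, Word64)

-- The Base32 alphabet from RFC 4648: A-Z followed by 2-7.
base32Alphabet :: String
base32Alphabet = ['A' .. 'Z'] ++ ['2' .. '7']

-- Pack five octets into one 40-bit number, then read it back as eight
-- 5-bit values, most significant first. Expects exactly five octets.
quintets :: [Word8] -> [Int]
quintets octets =
  [ fromIntegral ((packed `shiftR` s) .&. 0x1F) | s <- [35, 30 .. 0] ]
  where
    packed :: Word64
    packed = foldl (\acc w -> (acc `shiftL` 8) .|. fromIntegral w) 0 octets

-- Map each 5-bit value to its character in the alphabet.
encodeGroup :: [Word8] -> String
encodeGroup = map (base32Alphabet !!) . quintets
</haskell>

For the octets of the ASCII string <tt>fooba</tt> this yields <tt>MZXW6YTB</tt>, which matches the test vector in RFC 4648.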
 
 
=== Base32Hex ===
 
 
Implemented as it's defined in [http://tools.ietf.org/html/rfc4648 RFC 4648].
 
 
Just like Base32 but with a different encoding alphabet. Unlike Base64 and Base32, data encoded with Base32Hex maintains its sort order when the encoded data is compared bitwise.
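The sort-order property follows from the alphabet itself: the Base32Hex characters appear in ascending ASCII order, so a larger 5-bit value always maps to a larger character, whereas the plain Base32 alphabet places the digits after the letters. A small check, with hypothetical names:

<haskell>
-- The two alphabets as given in RFC 4648.
base32Alphabet, base32HexAlphabet :: String
base32Alphabet    = ['A' .. 'Z'] ++ ['2' .. '7']
base32HexAlphabet = ['0' .. '9'] ++ ['A' .. 'V']

-- True when the characters appear in strictly increasing code-point order.
isAscending :: String -> Bool
isAscending cs = and (zipWith (<) cs (drop 1 cs))
</haskell>

Here <hask>isAscending base32HexAlphabet</hask> is <hask>True</hask> and <hask>isAscending base32Alphabet</hask> is <hask>False</hask>.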
 
 
=== Base64 ===
 
 
Implemented as it's defined in [http://tools.ietf.org/html/rfc4648 RFC 4648].
 
 
Each group of three octets (24 bits) is expanded into four octets, each of which uses only its six least significant bits. Each of these 6-bit values is then mapped to a character in a 64-character encoding alphabet.
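The expansion of one full three-octet group can be sketched as below. As with any such sketch it is illustrative only: the names are hypothetical, and the <tt>=</tt> padding that RFC 4648 defines for a final group of one or two octets is not handled.

<haskell>
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8, Word32)

-- The Base64 alphabet from RFC 4648.
base64Alphabet :: String
base64Alphabet = ['A' .. 'Z'] ++ ['a' .. 'z'] ++ ['0' .. '9'] ++ "+/"

-- Pack three octets into one 24-bit number, then read it back as four
-- 6-bit values, most significant first.
sextets :: Word8 -> Word8 -> Word8 -> [Int]
sextets a b c =
  [ fromIntegral ((packed `shiftR` s) .&. 0x3F) | s <- [18, 12, 6, 0] ]
  where
    packed :: Word32
    packed = (fromIntegral a `shiftL` 16)
         .|. (fromIntegral b `shiftL` 8)
         .|. fromIntegral c

-- Map each 6-bit value to its character in the alphabet.
encodeTriple :: Word8 -> Word8 -> Word8 -> String
encodeTriple a b c = map (base64Alphabet !!) (sextets a b c)
</haskell>

For the octets of the ASCII string <tt>foo</tt> this yields <tt>Zm9v</tt>, which matches the test vector in RFC 4648.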
 
 
=== Base64Url ===
 
 
Implemented as it's defined in [http://tools.ietf.org/html/rfc4648 RFC 4648].
 
 
Just like Base64 but with a different encoding alphabet. The encoding alphabet is made URL and filename safe by replacing <tt>+</tt> and <tt>/</tt> with <tt>-</tt> and <tt>_</tt> respectively.
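Since the two alphabets differ only in the characters used for the values 62 and 63, already encoded text can be converted between them character by character. A minimal sketch with hypothetical helper names (the library itself provides a separate module rather than a conversion function):

<haskell>
-- Rewrite standard Base64 text into the URL and filename safe alphabet.
toUrlSafe :: String -> String
toUrlSafe = map swap
  where
    swap '+' = '-'
    swap '/' = '_'
    swap c   = c

-- Rewrite URL-safe text back into the standard Base64 alphabet.
fromUrlSafe :: String -> String
fromUrlSafe = map swap
  where
    swap '-' = '+'
    swap '_' = '/'
    swap c   = c
</haskell>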
 
 
=== Base85 ===
 
 
Implemented as described in the [http://en.wikipedia.org/wiki/Ascii85 Wikipedia article].
 
 
=== Python string escaping ===
 
 
Implementation of Python's string escaping.
 
 
=== Quoted-Printable ===
 
 
Implemented as defined in [http://tools.ietf.org/html/rfc2045 RFC 2045].
 
 
=== URL encoding ===
 
 
Implemented as defined in [http://tools.ietf.org/html/rfc3986 RFC 3986].
 
 
=== Uuencode ===
 
 
Unfortunately uuencode is badly specified and there are in fact several differing implementations of it. This implementation attempts to encode data in the same way as the <tt>uuencode</tt> utility found in [http://www.gnu.org/software/sharutils/ GNU's sharutils]. The workings of <hask>chop</hask> and <hask>unchop</hask> also follow how sharutils splits and rejoins encoded lines.
 
 
=== Xxencode ===
 
 
Implemented as described in the [http://en.wikipedia.org/wiki/Xxencode Wikipedia article].
 
 
=== yEncoding ===
 
 
Implemented as it's defined in [http://yence.sourceforge.net/docs/protocol/version1_3_draft.html the 1.3 draft].
 
 
== Downloading ==
 
 
The current release is available from [http://hackage.haskell.org/cgi-bin/hackage-scripts/package/dataenc HackageDB].
 
 
See [[#Contributing]] below for how to get the development version.
 
 
== Example of use ==
 
 
The package [http://hackage.haskell.org/cgi-bin/hackage-scripts/package/omnicodec omnicodec] contains two command line tools for encoding and decoding data.
 
 
== Contributing ==
 
 
'''This section is outdated, but left intact for a future maintainer to update.'''
 
 
The source is hosted on [https://github.com/magthe/dataenc/ GitHub] and can be downloaded using git:
 
 
git clone https://github.com/magthe/dataenc.git
 
 
Patches can be sent to magnus@therning.org, but I suggest using GitHub's pull requests if possible.
 

Latest revision as of 16:52, 17 April 2014