[Haskell-cafe] Toy compression algorithms [was: A very edgy language]

Andrew Coppin andrewcoppin at btinternet.com
Sun Jul 8 07:10:04 EDT 2007


Bulat Ziganshin wrote:
> Hello Donald,
>   
>> Good work. Probably worth benchmarking against the other compression
>> libraries
>>     
>
> are you really want to totally discredit Haskell? :)  they should be
> hundreds of times slower than any practical compression algorithm
> (and btw, zlib/bzlib isn't good performers anyway, my own algorithm is
> several times faster with the same compression ratio)
>
> Haskell isn't a good tool to develop compression algorithms because
> it's the very well studied area where it has meaning to use all the
> sorts of optimizations. so it falls in the "low-level algorithms"
> category where using Haskell means at least 3x slower development and
> 3x worse performance - or faster development with 100x worse
> performance. Andrew's code should fall into later category
>   

Indeed. I'm more interested in which algorithms produce the best 
compression ratios than how to implement them fast. Currently the code 
uses very general, flexible, polymorphic types all over the place, 
everything is normal lists, I'm using monadic parsing rather than 
bit-twiddling, etc etc etc. It goes alright with 100 KB files on my 
monster PC at home, but toss a 4 MB file over there and it takes a 
minute or two to compress. Hardly a match for a "practical" compression 
solution that's been optimised to within inches of its life in C or even 
assembly. ;-)

I mentioned that my LZW algorithm isn't even as efficient as it could be 
- it uses 16 bits per symbol rather than being variable. Partly that's 
because it's easier to code. But partly that's so that I can write a 
16-bit Huffman compressor and run it over the top. (LZW + Huffman being 
a very common combination.) And that's really what this is - a toolbox 
for throwing algorithms together to see what they do.

OTOH, if the Gods behind GHC want to take a look and figure out whether 
there's any magic that can be added to the optimiser, I wouldn't 
complain... ;-)

(Realistically though. My program takes a [Word8] and turns it into a 
[Bool] before running a parser over it. The GHC optimiser doesn't really 
stand a hope in hell of optimising that into a program that reads a 
machine word into a CPU register and starts playing with bit flips on it...)

PS. Are those zlib libraries actually written in Haskell? Or are they a 
thin layer on top of a C library?

PPS. Does GHC make use of MMX, SSE, et al?



More information about the Haskell-Cafe mailing list