[Haskell-beginners] Fwd: Implementing a spellchecker - problem with Data.HashTable performance

umptious umptious at gmail.com
Sat Apr 21 18:16:03 CEST 2012


On 20 April 2012 17:58, Lorenzo Bolla <lbolla at gmail.com> wrote:

> Maybe you could also time with datasets of increasing size (up to 1M),
> and see if the execution time grows like O(n^2), in which case I'd say
> it's a hashing problem...
>

Testing with 1/4 and 1/2 the original data is definitely an excellent thing
to do.
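A rough harness for that kind of scaling test might look like the sketch below. The original post used Data.HashTable, whose API has varied across GHC versions, so this uses Data.Map.Strict as a stand-in container - the sizes (250k/500k/1M) and the key type are assumptions; the shape of the measurement is what matters. If the times roughly quadruple when n doubles, you are looking at O(n^2) behaviour.

```haskell
import qualified Data.Map.Strict as M
import System.CPUTime (getCPUTime)
import Text.Printf (printf)

-- Build a table of n entries and report elapsed CPU seconds.
-- Data.Map stands in for the hash table here; swap in the real
-- container to test the actual hashing path.
timeInserts :: Int -> IO Double
timeInserts n = do
  t0 <- getCPUTime
  let m = M.fromList [(k, ()) | k <- [1 .. n]]
  M.size m `seq` return ()   -- force the map before stopping the clock
  t1 <- getCPUTime
  return (fromIntegral (t1 - t0) / 1e12)   -- picoseconds -> seconds

main :: IO ()
main = mapM_ report [250000, 500000, 1000000]
  where
    report n = do
      s <- timeInserts n
      printf "%7d inserts: %.3fs\n" n s
```

Comparing the three printed times against each other is more informative than any single absolute number, since it factors out constant overheads.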

Also, in the original post it sounded as if, **with the same data set**,

1 - creating the hash with the data already in memory is fast

2 - reading the data is fast

3 - doing both is very, very slow

...but was this true? Or was the in-memory data set different? If it was
different, then HOW was it different?

Other thought: you printed the last line to force the reading of the set in
the file-read test, ***but would this have forced the entire set to be
converted into ByteStrings***? With lazy I/O, evaluating the last line need
not evaluate everything before it. Even more than that - I'm no Haskell
expert, but I can claim to have debugged a fair amount of weird code, and
such enormously long read times are more typical of I/O or, possibly, GC
weirdness than of any flaw in a hashing algorithm. In fact, it's almost
impossible to see how an algorithmic flaw could generate such a vast
runtime without provoking GC weirdness. And even then it would be pretty
amazing.

So I'd try reading in the strings, converting them to ByteStrings, and then
doing something simple to them that doesn't involve hashing - say, XORing
every byte within each ByteString, and then XORing the per-string results
together: some operation that definitely forces every ByteString to be
fully present in ByteString form.
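A minimal sketch of that idea, assuming the dictionary is one word per line (the sample words here are placeholders; with the real file you would read the list with B.readFile and BC.lines):

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Bits (xor)
import Data.List (foldl')
import Data.Word (Word8)

-- XOR every byte of a ByteString together; the strict fold forces
-- the whole string to be materialised in memory.
xorBytes :: B.ByteString -> Word8
xorBytes = B.foldl' xor 0

-- Combine the per-string results, so every ByteString in the list
-- has to be fully evaluated - no hashing involved.
xorAll :: [B.ByteString] -> Word8
xorAll = foldl' (\acc bs -> acc `xor` xorBytes bs) 0

main :: IO ()
main = do
  -- stand-in for the real dictionary; with the actual file:
  --   ws <- BC.lines <$> B.readFile "dictionary.txt"
  let ws = map BC.pack ["hello", "world", "spellchecker"]
  print (xorAll ws)
```

If this runs quickly over the full data set while the hash-table version crawls, the problem is in the hashing path; if it is just as slow, the time is going into I/O or conversion, not the table.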

You might open a system monitor and watch memory consumption, too.

