[Haskell-beginners] Fwd: Implementing a spellchecker - problem with Data.HashTable performance

Chaddaï Fouché chaddai.fouche at gmail.com
Mon Apr 23 17:38:37 CEST 2012


On Sun, Apr 22, 2012 at 4:58 PM, Radosław Szymczyszyn <lavrin at gmail.com> wrote:
> Thank you all for thoughts and suggestions. I've been tracking them as
> they appeared, but being busy with university assignments, couldn't
> try them out yet.
>
> In the meantime, however, Karol Samborski investigated the issue
> further and found the cause of poor performance -- somehow using
> HashTable.newHint doesn't play well with HashTable.update. Simply
> changing newHint to hint gives me execution time of about 12s (on my
> rather slowish AMD Fusion 1.6GHz laptop) with a dataset of ~1.35mln
> words.
>

While it's good that you have found a bug that explain the very poor
performance you got, I must reiterate Brent Yorgey's advice to use
Data.HashMap from unordered-containers or some other performance
oriented implementation : Data.HashTable is a legacy datastructure
that was mainly introduced to have a hashtable in Haskell core
libraries, it ran into some performance problem of GHC (only reduced
in very recent release I believe) and was never written to be very
fast in the first place... Avoiding it is generally a very good idea !
Using another structure (HashMap or some trie implementation) would
probably get you much better performance without hassle. I must admit
that having Data.HashTable in the base library without warning as to
its unsuitability is very misleading to the beginner coming from Perl
or Python.

-- 
Jedaï



More information about the Beginners mailing list