[Haskell-cafe] Re: Optimizing spelling correction program

Don Stewart dons at galois.com
Mon Jun 22 17:22:46 EDT 2009


kamil:
> On Jun 22, 9:10 am, Ketil Malde <ke... at malde.org> wrote:
> > Kamil Dworakowski <ka... at dworakowski.name> writes:
> > > Right... Python uses hashtables while here I have a tree with log n
> > > access time. I did not want to use Data.HashTable because it would
> > > pervade my program with IO. The alternative is an immutable hash map
> > > that never changes once built. This program builds its dictionary
> > > once at start-up and afterwards only reads from it: an ideal
> > > application for Data.PerfectHash by Mark Wotton, available on
> > > Hackage [3].
> >
> > If you are considering alternative data structures, you might want to
> > look at tries or Bloom filters; both offer lookup time that depends
> > only on the length of the key, not on the size of the dictionary, and
> > both have Haskell implementations.  The latter is probably faster but
> > probabilistic (i.e. it will occasionally fail to detect a
> > misspelling - which you can of course check against a "real"
> > dictionary).
> 
> Using Bryan O'Sullivan's fantastic BloomFilter I got it down below
> Python's run time! Now it is 35.56s; 28% of the time is spent on GC,
> which I think means there is still some room for improvement.
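
(For reference, a rough sketch of what the BloomFilter-based dictionary
lookup might look like. This assumes the bloomfilter package's
Data.BloomFilter.Easy / Data.BloomFilter interface with easyList and
elem - older releases use B-suffixed names such as elemB - and a
made-up word-list file name:

 import Data.BloomFilter.Easy (easyList)
 import qualified Data.BloomFilter as B

 main :: IO ()
 main = do
   -- "wordlist.txt" is a placeholder: one known-good word per line.
   ws <- fmap lines (readFile "wordlist.txt")
   -- easyList sizes the filter for the requested false-positive
   -- rate (1% here) and the number of elements supplied.
   let dict = easyList 0.01 ws
   -- A False answer is always correct; a True answer can very
   -- occasionally be a false positive.
   print ("spelling" `B.elem` dict)
   print ("speling"  `B.elem` dict)

easyList picks the bit-array size and the number of hash functions from
the requested false-positive rate, so there is nothing else to tune.)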

One easy way to reduce the time spent in GC is to increase the size of
the allocation area (the nursery), which the -A RTS flag controls, for
example:

 ./a.out +RTS -A200M
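
To check whether the larger allocation area actually brings the GC share
down, add the RTS statistics flag as well:

 ./a.out +RTS -A200M -sstderr

-sstderr prints a summary to stderr after the run, including the total
elapsed time and the fraction of it spent in the garbage collector.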

