[Haskell-beginners] histogram over large data

Radosław Szymczyszyn lavrin at gmail.com
Tue Jun 5 20:41:42 CEST 2012


Hi Ian,

In case you were looking for an example to get your teeth into, you
might be interested in these: https://gist.github.com/2876666

These two scripts serve the same purpose: building a map of word
counts from a text file. Both use Data.Text for Unicode IO, but each
tests a different structure. Though the unordered-containers package
with its Data.HashMap is often suggested as an efficient mapping
structure, in my case (with these two scripts) Data.HashTable from the
standard library wins, taking circa half the time to run on the same
dataset (though it's not purely functional, as its actions operate in
the IO monad).
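
For readers who don't want to open the gist, here is a minimal sketch
of the pure Data.HashMap approach. It is not the code from the gist:
the name wordCounts and the inline sample text are assumptions made
for illustration, and it assumes the text and unordered-containers
packages are installed.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.HashMap.Strict as HM
import           Data.List (foldl')
import qualified Data.Text as T

-- | Count how often each whitespace-separated word occurs.
-- (wordCounts is a hypothetical name, not taken from the gist.)
wordCounts :: T.Text -> HM.HashMap T.Text Int
wordCounts = foldl' bump HM.empty . T.words
  where
    -- Insert the word with count 1, or add 1 to its existing count.
    bump acc w = HM.insertWith (+) w 1 acc

main :: IO ()
main = do
  -- Inline sample text keeps the sketch self-contained.
  let counts = wordCounts "a b a c b a"
  print (HM.lookup "a" counts)  -- prints: Just 3
```

For a real dataset, the input would come from Data.Text.IO.readFile
rather than a literal. The strict left fold (foldl') together with
Data.HashMap.Strict keeps the counts evaluated as the fold proceeds
instead of piling up unevaluated (+) thunks.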

Finally, what puzzles me the most is that a roughly equivalent script
in Python, which just reads the same data file into a standard dict,
runs in about 1/3 of the time of the faster of the two scripts above,
and Python's hardly a fast language... Bewildering, indeed.

Hope I didn't put you off :)