[Haskell-cafe] How on Earth Do You Reason about Space?

Aleksandar Dimitrov aleks.dimitrov at googlemail.com
Wed Jun 1 00:14:54 CEST 2011


Hi Johan,

> Here's how I would do it:

I implemented your method, with these minimal changes (i.e. just using a main
driver in the same file.)

> countUnigrams :: Handle -> IO (M.Map S.ByteString Int)
> countUnigrams = foldLines (\ m s -> M.insertWith (+) s 1 m) M.empty
> 
> main :: IO ()
> main = do (f:_) <- getArgs
>           openFile f ReadMode >>= countUnigrams >>= print . M.toList

It seems to perform about 3x worse than the iteratee method in terms of time,
and worse in terms of space :-( On Brandon's War & Peace example, hGetLine uses
1.565 seconds for the small file, whereas my iteratee method uses 1.085s for the
small file, and around 2 minutes for the large file.

For the large file, the code above starts consuming around 2.5GB of RAM,
so it clearly has a space leak somewhere. Where, I don't know.

If you want to try it out, here's a short command line to make a test corpus the
way Brandon made one:

+++

wget 'http://www.gutenberg.org/files/2600/2600.zip';
unzip 2600.zip;
touch wnp100.txt;
for i in {1..100}; do echo -n "$i "; cat 2600.txt >> wnp100.txt; done;
echo "Done.

+++

Note, that, as I detailed in my prior email to Brandon, even if you do end up
with a (supposedly) non-leaking program for this example corpus, that doesn't
mean it'll scale well to real world data.

I also tried sprinkling strictness annotations throughout your above code, but I
failed to produce good results :-(

> We definitely need more accessible material on how to reliably write
> fast Haskell code. There are those among us who can, but it shouldn't
> be necessary to learn it in the way they did (i.e. by lots of
> tinkering, learning from the elders, etc). I'd like to write a 60 (or
> so) pages tutorial on the subject, but haven't found the time.

I'd be an eager reader :-) Please do announce it on -cafe or the "usual places"
should you ever come around to writing it!

I, unfortunately, don't really have any contact to "the elders," apart from what
I read on their respective blogs…

> In addition to RWH, perhaps the slides from the talk on
> high-performance Haskell I gave could be useful:
> 
>     http://blog.johantibell.com/2010/09/slides-from-my-high-performance-haskell.html

Thanks, I'll give it a look later tomorrow!

Regards,
Aleks

PS: Sorry I didn't answer you in #haskell, I ended up having to go afk for a
short while. Thanks for all your help!
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 490 bytes
Desc: Digital signature
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20110601/730dcbb6/attachment.pgp>


More information about the Haskell-Cafe mailing list