[Haskell-cafe] zip-archive performance/memory usage

Jason Dagit dagit at codersbase.com
Mon Aug 9 22:10:42 EDT 2010


On Mon, Aug 9, 2010 at 4:29 PM, Pieter Laeremans <pieter at laeremans.org> wrote:

> Hello,
>
> I'm trying some haskell scripting. I'm writing a script to print some
> information
> from a zip archive.  The zip-archive library does look nice but the
> performance of zip-archive/lazy bytestring
> doesn't seem to scale.
>
> Executing :
>
>    eRelativePath $ head $ zEntries archive
>
> on an archive of around 12 MB with around 20 files yields
>
> Stack space overflow: current size 8388608 bytes.
>

So it's a stack overflow at about 8 megs.  I don't have a strong sense of
what is normal, but that seemed like a small stack to me.  Oh, actually I
just checked and that is the default maximum stack size :)

I looked at Zip.hs (included as an example).  The closest I see to your
example is some code for listing the files in the archive.  Perhaps you
should try the supplied program on your archive and see if it too has a
stack overflow.

The line the author uses to list files is:
    List        -> mapM_ putStrLn $ filesInArchive archive

But, you're taking the head of the entries, so I don't see how you'd be
holding on to too much data.  I just don't see anything wrong with your
program.  Did you remember to compile with optimizations?  Perhaps try the
author's way of listing entries and see if performance changes?
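
For comparison, here is a minimal sketch of a complete program that does
both: it prints the path of the first entry (your approach, but without the
partial head) and then lists every file the way Zip.hs does.  It assumes the
zip-archive package is installed.  firstEntryPath is a name I made up for
illustration, and I have not run this against your archive, so it says
nothing about whether the overflow goes away.

```haskell
module Main (main) where

-- From zip-archive; this sketch uses toArchive, zEntries, eRelativePath
-- and filesInArchive.
import Codec.Archive.Zip
import qualified Data.ByteString.Lazy as BL
import System.Environment (getArgs)

-- | Path of the first entry, or Nothing for an empty archive.
--   A total version of: eRelativePath $ head $ zEntries archive
--   (firstEntryPath is a hypothetical helper, not part of the library.)
firstEntryPath :: Archive -> Maybe FilePath
firstEntryPath archive =
  case zEntries archive of
    []          -> Nothing
    (entry : _) -> Just (eRelativePath entry)

main :: IO ()
main = do
  args <- getArgs
  case args of
    [path] -> do
      -- The file is read lazily and parsed into an Archive.
      archive <- toArchive <$> BL.readFile path
      putStrLn ("first entry: " ++ show (firstEntryPath archive))
      -- Same listing call the example Zip.hs uses for its List option.
      mapM_ putStrLn (filesInArchive archive)
    _ -> putStrLn "usage: ZipList FILE.zip"
```

Building it with something like ghc -O2 --make ZipList.hs would also answer
the optimization question.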


>
> The script in question can be found at :
>
> http://github.com/plaeremans/HaskellSnipplets/blob/master/ZipList.hs
>
> I'm using the latest version of haskell platform.  Are these libraries not
> production ready,
> or am I doing something terribly wrong ?
>

Not production ready would be my assumption.  I think an iteratee style
might be more appropriate for these sorts of nested streams of potentially
large size anyway.  I'm skeptical of anything that depends on lazy
bytestrings or lazy IO.  In this case, the performance would appear to
depend on lazy bytestrings.

You might want to experiment with increasing the stack size.  Something like
this:
    ./ZipList +RTS -K100M -RTS foo.zip

Jason