[Haskell-cafe] zip-archive performance/memory usage

Tue Aug 10 07:30:53 EDT 2010

I was interested to see if I could determine what was happening with this.
After some playing around, I noticed the code was running significantly
faster if I *didn't* compile it, but ran it with 'runghc' instead (running
under ghci was also fast).

Here are the running times I found.  The 'Zip.hs' program comes with the
zip-archive package.  The runtime of the compiled version didn't seem to be
affected by optimisations.  Regardless, I'm quite surprised running
interpreted was significantly faster than compiled.

> time runghc ./Zip.hs -l ~/jdk1.6.0_05-src.zip
 1.48s user 0.17s system 97% cpu 1.680 total

> time ./dist/build/Zip/Zip -l ~/jdk1.6.0_05-src.zip
 89.00s user 1.06s system 98% cpu 1:31.84 total

The file 'jdk1.6.0_05-src.zip' was just an 18MB zip file I had lying
around.  I'm using ghc 6.12.1.
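
A minimal sketch for reproducing the comparison without the full Zip.hs
follows.  It assumes that toArchive and filesInArchive from
Codec.Archive.Zip are enough to exercise the same listing path as the -l
option; the file name ListZip.hs and the helper listEntries are
hypothetical and not part of the package.

```haskell
import qualified Data.ByteString.Lazy as B
import Codec.Archive.Zip (toArchive, filesInArchive)
import System.Environment (getArgs)

-- Hypothetical helper: list the entry paths of a zip archive
-- given its raw bytes (read lazily).
listEntries :: B.ByteString -> [FilePath]
listEntries = filesInArchive . toArchive

main :: IO ()
main = do
  args <- getArgs
  case args of
    -- Read the archive and print one entry path per line.
    [path] -> B.readFile path >>= mapM_ putStrLn . listEntries
    _      -> putStrLn "usage: ListZip FILE.zip"
```

Timing 'runghc ListZip.hs FILE.zip' against a binary built with
'ghc -O2 --make ListZip.hs' should show whether the slowdown comes from
the library or from something specific to Zip.hs.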

Cheers,

-- 
David Powell

On Tue, Aug 10, 2010 at 12:10 PM, Jason Dagit <dagit at codersbase.com> wrote:

>
>
> On Mon, Aug 9, 2010 at 4:29 PM, Pieter Laeremans <pieter at laeremans.org> wrote:
>
>> Hello,
>>
>> I'm trying some Haskell scripting.  I'm writing a script to print some
>> information from a zip archive.  The zip-archive library does look nice,
>> but the performance of zip-archive/lazy bytestring doesn't seem to scale.
>>
>> Executing :
>>
>>    eRelativePath $ head $ zEntries archive
>>
>> on an archive of around 12 MB with around 20 files yields
>>
>> Stack space overflow: current size 8388608 bytes.
>>
>
> So it's a stack overflow at about 8 megs.  I don't have a strong sense of
> what is normal, but that seems like a small stack to me.  Oh, actually I
> just checked and that is the default stack size :)
>
> I looked at Zip.hs (included as an example).  The closest I see to your
> example is some code for listing the files in the archive.  Perhaps you
> should try the supplied program on your archive and see if it too has a
> stack overflow.
>
> The line the author uses to list files is:
> List        -> mapM_ putStrLn $ filesInArchive archive
>
> But, you're taking the head of the entries, so I don't see how you'd be
> holding on to too much data.  I just don't see anything wrong with your
> program.  Did you remember to compile with optimizations?  Perhaps try the
> author's way of listing entries and see if performance changes?
>
>
>>
>> The script in question can be found at :
>>
>> http://github.com/plaeremans/HaskellSnipplets/blob/master/ZipList.hs
>>
>> I'm using the latest version of the Haskell Platform.  Are these libraries
>> not production ready, or am I doing something terribly wrong?
>>
>
> Not production ready would be my assumption.  I think an iteratee style
> might be more appropriate for these sorts of nested streams of potentially
> large size anyway.  I'm skeptical of anything that depends on lazy
> bytestrings or lazy io.  In this case, the performance would appear to
> depend on lazy bytestrings.
>
> You might want to experiment with increasing the stack size.  Something
> like this:
> ./ZipList +RTS -K100M -RTS foo.zip
>
> Jason