Map folds in containers: why no worker/wrapper split?

Milan Straka fox at ucw.cz
Thu Jun 24 06:03:37 EDT 2010


Hi,

I agree the strictness is undocumented and inconsistent (I was bitten by
it several times).

I did not mention it much in the paper, maybe I should have.

I want to work on the patches now with the knowledge of the benchmark
results, and the strictness info is another thing I want to deal with.

BTW, the paper speaks of sets only, but mentions that it applied to maps
too. The benchmarks contains both Set/Map and IntSet/IntMap and the
initial patches were also to all there four containers.

Cheers,
Milan

>> I did quite a lot of benchmarking, strictness analysis and performance
>> improvements to the containers package recently, see the draft of my
>> paper http://fox.ucw.cz/papers/containers/containers.pdf.
>>
>> Next week I will start finalizing the patches and submit them. The 'not
>> generating a worker/wrapper' issue is one of several on my radar, so
>> this should be resolved.
>
> That is good to hear. It is great that someone is doing the work
> needed to pin down, measure, document, and perhaps even cure some of the 
> performance issues in containers. If nothing else, a published summary of 
> the performance issues that have popped up over the years would be a 
> start.
>
> Btw, the first thing I'd look for in a paper on containers performance,  
> and the one issue that tends to trip up most users, is strictness. So I  
> was surprised not to see a discussion of this in your draft (at least
> not in a quick browse). Could it be that your work has focussed on
> the Sets rather than the Maps in containers?
>
> Lots of performance problems with containers are about
> pragmatics, not representation, eg, one knows that Maps are
> strict in keys but available operations happen to be defined
> in such a way that one cannot easily ensure other desirable
> strictness properties.
>
> For the worker/wrapper split, there are non-strict definitions
> of higher-order functions, which the compiler could make strict
> by using strictness information about the calling context if the
> definitions were written differently.
>
> For other operations, turning the non-strict definitions into
> strict versions (when strict versions are more appropriate;
> ideally, both variants should be well supported and documented) is often 
> difficult for users of containers. 
>
> The issues are also non-obvious, which is perhaps worse, because 
> Map/IntMap is often used to attempt to speed up a loop accumulator, with 
> the result that thunks pile up behind the scene and performance goes down 
> until large problem sizes cannot be handled. At this point, users are no 
> longer aware that they built in the performance problem right at the 
> start, by using a supposedly efficient data structure
> in an unfortunate way - they think it is about Haskell and large 
> datasets.
>
> Some examples that I happen to recall (there are more in the list 
> archives of cafe and libraries, under various keywords, as the original 
> poster doesn't realize that the root cause is in containers; there are 
> also some related tickets in the GHC trac):
>
> - IntMap and Map expose different APIs wrt strictness
>    http://www.haskell.org/pipermail/libraries/2009-March/011399.html
>    http://www.mail-archive.com/haskell-cafe@haskell.org/msg48925.html
>
> - to work around the non-strict Map fold, users have to
>    resort to nested folds (non-strict fold to convert to list,
>    then strict foldl' over list) and other tricks:
>    http://www.haskell.org/pipermail/haskell-cafe/2010-June/079098.html
>
>    similarly non-trivial thinking is required to work around other
>    no-control-over-strictness issues in the APIs, bypassing the    
> obvious functions and using non-obvious ones based on
>    knowledge about their implementation
>
> - strictness is used inconsistently:
>    http://hackage.haskell.org/trac/ghc/ticket/4109
>
> - strictness is not documented: !!!
>    search for "strict" in the Data.Map/IntMap documentation..
>
>    this is really bad - the only way for users to become aware
>    of strictness problems is by running into them, and the only    way to 
> fix strictness-related performance issues is by referring
>    to the implementation (in theory, an implementation change
>    could invalidate lots of performance fixes in production code)
>
> It seems that strictness documentation and control would make for a large 
> section in a paper on containers performance.
>
> Claus
>


More information about the Libraries mailing list