[Haskell-beginners] Memory tuning

Daniel Fischer daniel.is.fischer at web.de
Mon Oct 11 18:52:07 EDT 2010


On Monday 11 October 2010 23:29:30, Thorsten Hater wrote:
>
> Thank you for pointing me to Unboxed Arrays and accumarray, it works
> like a charm.
> The new statistics:
>
>       35,180,328 bytes allocated in the heap
>           13,608 bytes copied during GC
>       12,027,912 bytes maximum residency (2 sample(s))
>          594,520 bytes maximum slop
>               13 MB total memory in use (0 MB lost due to fragmentation)
>
>   Generation 0:    43 collections,     0 parallel,  0.00s,  0.00s
> elapsed Generation 1:     2 collections,     0 parallel,  0.00s,  0.00s
> elapsed
>
>   INIT  time    0.00s  (  0.00s elapsed)
>   MUT   time    0.14s  (  0.17s elapsed)
>   GC    time    0.00s  (  0.00s elapsed)
>   EXIT  time    0.00s  (  0.00s elapsed)
>   Total time    0.14s  (  0.17s elapsed)
>
>   %GC time       0.0%  (0.6% elapsed)
>
>   Alloc rate    251,288,057 bytes per MUT second
>
>   Productivity 100.0% of total user, 82.9% of total elapsed

Yes, that's more like it :)

>
> I don't think there is much left to do in terms of optimization.

Generate the primitive triangles by a more efficient method, all those 
gcd's take a lot of time.

> So I'm left with the question of when to use Data.Map, if accumarray
> is so much better even for heavy number crunching as this.

That's not really heavy number crunching, it's a pique-nique.

There are several considerations.

- sparse (irregular) data
If only one in 100 possible data points actually occurs, an array is a huge 
waste of space. You quickly lose cache-locality then, and Maps are often 
faster in those cases.

- unknown bounds
That rules out arrays.

- pattern of computation
For this problem, accumArray is perfectly suited, but that is not always 
the case. Consider problems where you update your Map using previously 
stored data, then accumArray isn't applicable. In such cases Maps give you 
nice code and usually decent performance, although in most cases if your 
data is not sparse, using an unboxed mutable array (STUArray) if possible 
gives you much better performance - but at the cost of uglier code. A boxed 
mutable array (STArray) will in many cases also perform better than a Map, 
but usually not as well as an unboxed one.


Of course, arrays can't be used (or only with great convolutions) for stuff 
like Map String a, where the key can't be mapped easily to an array index.

> Especially when Data.Map advertises as efficient.

It is efficient for what it does (of course, there's always room for 
improvement), but it's not the appropriate data structure for everything.
By default, Haskell's data structures are immutable, thus if you modify a 
Map, you get a new one, which typically requires copying of O(log size) 
nodes (and the bulk of the Map is shared, otherwise performance would be 
abysmal).
But still, that is a nontrivial operation, much more costly than modifying 
a mutable map in an impure language (where such a modification is often 
just relinking a few pointers).
So if you have an algorithm that requires frequent updates, the cost of 
these operations makes itself felt.

But what happens if you use an array instead of a Map?
If the keys map easily to array indices, that is very easy to do, but if 
you use immutable arrays, each update requires copying the entire array (in 
case of boxed arrays only the pointers to the values are copied, but still, 
allocating a new memory block and copying 10 million pointers likely takes 
much longer than copying 20-25 nodes of a Map). If you use a mutable array, 
such a modification is cheap, only one array slot needs to be changed. But 
that ties you to a mutability monad (ST s or IO, mostly), which has its 
downsides too.
And if data is sparse, you again waste a lot of space.

And if the keys don't map easily to indices, well, you can implement a Map 
as an Array Int (Key,Value) or a pair (Array Int Key, Array Int Value), 
keeping the entries sorted by the keys.
In fact, if you build the map once and use it later for a lot of lookups, 
that is often better than a Map (regarding performance).
Both give you O(log size) lookup, but the array solution can have less 
space overhead.
However, inserting a new element or deleting an element becomes an O(size) 
operation, so if you have to do that a lot, this is a terrible solution, 
you should use a Map.


> (Even without Unboxed, normal Arrays beat Maps by a factor of 2)
> Best regards.



More information about the Beginners mailing list