[Haskell-cafe] uvector package appendU: memory leak?

Manlio Perillo manlio_perillo at libero.it
Sun Mar 29 17:17:50 EDT 2009


Claus Reinke ha scritto:
>>>     IntMap (UArr (Word16 :*: Word8))
>>>
>>> I was adding elements to the map using something like:
>>>
>>>     v = singletonU (a :*: b)
>>>
>>>     insertWith appendU k v m
>>>
>>> However doing this eats a *lot* of memory.
> 
> Since 'insertWith' doesn't actually do the 'appendU', the appends
> will also be compressed in time, at point of use, rather than spread
> out in time, over construction. Which might be inconvenient if large
> volumes of data are involved.
> 

Ah, ok; that may explain the problem, but I'm still not sure.
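
Just to make the point concrete, here is a sketch of a stricter insert
that forces the append when the map entry is updated, instead of leaving
a suspended appendU in the map.  It assumes Data.IntMap.insertWith' from
containers and the usual Data.Array.Vector exports from uvector:

    import Data.Array.Vector
    import qualified Data.IntMap as IM
    import Data.Word (Word16, Word8)

    type Ratings = IM.IntMap (UArr (Word16 :*: Word8))

    -- insertWith' evaluates the combined value before storing it, so the
    -- map holds an (at least partially) evaluated array instead of a
    -- chain of suspended appendU calls.  Demanding the length pushes the
    -- append further; for the pair representation of UArr some work may
    -- still be deferred.
    insertRating :: Int -> Word16 -> Word8 -> Ratings -> Ratings
    insertRating k a b = IM.insertWith' combine k (singletonU (a :*: b))
      where
        combine new old = let v = old `appendU` new
                          in lengthU v `seq` v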


With this program:
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3063

Memory usage seems ok:
955 MB, against 622 MB when the ratings are grouped by movie
(using [(MovieID, UArr (Word32 :*: Word8))]).


The version using `insertWith appendU` is here:
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3063#a3065


I still can't explain such a large difference in memory usage.

>>> So the question is: why appending an array of only one element to an  
>>> existing array causes memory problems?
>>
>> It must copy the entire array.
> 
> And doing this repeatedly, with arrays of increasing length, would
> not be a good idea. 

I can't really see any other solution.
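
The only alternative I can see is to accumulate ordinary lists per key
while parsing, and build each UArr once, at the end.  A sketch (assuming
uvector's toU):

    import Data.Array.Vector
    import qualified Data.IntMap as IM
    import Data.Word (Word16, Word8)

    -- Phase 1: constant work per rating, no array copying while parsing.
    -- (The lists are built lazily, so the usual care about forcing the
    -- map entries still applies.)
    type Acc     = IM.IntMap [Word16 :*: Word8]
    type Ratings = IM.IntMap (UArr (Word16 :*: Word8))

    addRating :: Int -> Word16 -> Word8 -> Acc -> Acc
    addRating k a b = IM.insertWith (++) k [a :*: b]

    -- Phase 2: one final pass converts every list into an unboxed array.
    finalize :: Acc -> Ratings
    finalize = IM.map toU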

> For single-threaded use, one might use fusion
> to avoid the construction/recopying of the intermediate arrays.

Fusion is not possible in my case, IMHO.
I'm parsing movie ratings, where the customer ratings for each movie
are stored in separate files (17770 movies in total).

Each file has format
<movie1>:
<customer1>,<rating1>,<date1>
<customer2>,<rating2>,<date2>
...
<movie2>:
<customer3>,<rating3>,<date3>
...
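
To make this concrete, a rough sketch (plain String parsing, dates and
blank lines ignored) of turning one such file into (customer, movie,
rating) triples, which then have to be regrouped by customer:

    -- Walk the lines of one ratings file, tracking the current movie id.
    -- Lines ending in ':' start a new movie block; the other lines are
    -- "customer,rating,date" records.
    parseFile :: String -> [(Int, Int, Int)]   -- (customer, movie, rating)
    parseFile = go 0 . lines
      where
        go _ [] = []
        go cur (l:ls)
          | null l        = go cur ls
          | last l == ':' = go (read (init l)) ls
          | otherwise     =
              let (c, rest)  = break (== ',') l
                  (r, _date) = break (== ',') (drop 1 rest)
              in (read c, cur, read r) : go cur ls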


I want to group the ratings by customer instead of by movie.
Stream fusion is only possible in the latter case (but then I don't even
need an IntMap).


Am I missing something?

 > [...]


Manlio

