[Haskell-cafe] Trying to reduce memory costs of String duplicates

Günther Schmidt gue.schmidt at web.de
Sat Sep 5 19:35:08 EDT 2009


Hi Luke,

Thanks, this is very good advice, as I find many duplicates in the
data I have to iterate over.

However, so far I'm unable to tell whether this actually works or not.
I tried it a couple of times under different settings, but it showed no
difference in memory consumption; the same memory peaks as before.

Do you have some code where one could see a before and after?

Günther


On 05.09.2009, 19:38, Luke Palmer <lrpalmer at gmail.com> wrote:

> 2009/9/5 Günther Schmidt <gue.schmidt at web.de>:
>> Hi all,
>>
>> I'm reading a data set of 216k records into a map of key/value
>> pairs, the values being strings.
>>
>> As it happens, out of 216k String values there really are only about
>> 6.6k distinct string values, so I could save a lot of RAM if I were
>> able to "insert" only actually *new* string values into the map and
>> use references to (string) values that are already in memory instead.
>>
>> Is there a container that would, if I wanted to insert an element,
>> return a pair of either the previously inserted, equal value and the
>> container unchanged, or the new, previously unknown value and the new
>> container amended by that element?
>
> I believe a memoization of the identity function will do what you want:
>
> import qualified Data.MemoCombinators as Memo
>
> share = Memo.list Memo.char id
>
> Then pass any string through share to make/get a cached version.
>
> You might want to limit the scope of share -- eg. put it in a where
> clause for the function where you're using it -- so that it doesn't
> eat memory for the lifetime of your program, only for when you need
> it.
>
> Luke
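
A minimal hand-rolled sketch of the container described in the quoted
question, using Data.Map from the containers package (the name intern
is just for illustration and does not appear in the original thread):

import qualified Data.Map as Map

-- Inserting a value returns either the previously stored, equal value
-- (with the pool unchanged) or the new value together with the pool
-- extended by it, so later occurrences share one copy in memory.
intern :: Ord a => a -> Map.Map a a -> (a, Map.Map a a)
intern x pool =
    case Map.lookup x pool of
        Just shared -> (shared, pool)
        Nothing     -> (x, Map.insert x x pool)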
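
A rough sketch of Luke's suggestion in use, with share kept in a where
clause as he recommends; buildMap, the Int key type and the record list
are hypothetical, and Data.MemoCombinators comes from the
data-memocombinators package:

import qualified Data.Map as Map
import qualified Data.MemoCombinators as Memo

-- Build the map, passing every value through share so that equal
-- strings end up as the same in-memory string.
buildMap :: [(Int, String)] -> Map.Map Int String
buildMap records = Map.fromList [ (k, share v) | (k, v) <- records ]
  where
    -- Scoped to this function, per Luke's advice, so the memo table is
    -- not retained for the lifetime of the whole program.
    share :: String -> String
    share = Memo.list Memo.char id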


