[Haskell-cafe] Re: generalized shuffle

Wed Mar 25 08:23:46 EDT 2009

oleg at okmij.org ha scritto:
> Hello!
> 
>>     The reason is that in a project I need to extract random elements
>>     from a list (and I need to extract a lot of them), and using "normal"
>>     methods [1] seems to be rather inefficient.

Note that I have used an incorrect term, sorry.
What I need, in detail, is to:
Given a population of 100480507 elements, extract 17770 random samples, 
where each sample has a size between 1 and 480189.

Yes, it is again my Netflix Prize project; here I'm trying to write a 
program to generate the entire training data set using random data.

>> [1] by normal method I mean: extract a random number, in the range
>>      [0, n] (where n is the length of the sequence), and get the element
>>      indexed by that number.
> 
> BTW, have you tried to use Data.IntMap? That is, put the sequence into
> an IntMap (each element being indexed by its index) and then extract
> elements at random; IntMap seems to be quite efficient...
> 

This should be ok if I need independent random choices from a 
population, but I need random samples, instead, sorry for the confusion.

Do you think that it is possible to further optimize your tree 
structure? Or to use IntMap, instead?

> 	Cheers,
> 	Oleg

Thanks   Manlio