[Haskell-cafe] How to select n random words from a file ...

Jerzy Karczmarczuk jerzy.karczmarczuk at unicaen.fr
Mon Jun 11 11:29:02 CEST 2012


KC:
> An interesting related problem is if you are only allowed one pass 
> through the data how would you randomly choose one word.

Let's choose  n items.

You must know the length of the sequence, of course, otherwise the 
'probability' loses its sense. So, for lists it might not be "just one 
pass"...

Suppose the length of the sequence be m.

Suppose you have a random generator  called rg, just a simple function 
which transforms: seed -> seed' , between 0 and 1

Make n and m real to make the typechecker happy.
Then the most straightforward solution for lists is:

nran l n = nr l m n seed where
   m = fromIntegral(length(l))
   nr [] _ _ _ = []
   nr (x:q) m n seed =
     let seed'=rg seed
     in  if    seed' < n/m then x:nr q (m-1) (n-1) seed'
         else  nr q (m-1) n seed'

-- =========

Now, you may make it tail-recursive, use a different random generation 
protocol, or whatever. I believe that this solution is known for years...

Jerzy Karczmarczuk




More information about the Haskell-Cafe mailing list