genBits: small addition to random API + largely new implementation that needs code review

Wed Jun 29 04:55:05 CEST 2011

Well, my main motivation in the work over the last few days was to fix
correctness problems in System.Random (and performance as secondary).  I
think this matters whether or not it is shipped only for Haskell 98 or not.
 I've been focused mainly on the Random instances.  Does anyone know good
points of comparison for these?  I've seen lots of PRNGs on Hackage but not
lots of code that maps that randomness onto the usual types (Double,
Integer, Char, etc).

My hope is to improve random to the point that it provides a good, simple
default library.  From a pedagogic perspective I imagine many new users want
randomness but don't want to scour hackage and learn the more complex
interfaces.  So as long as 'random' is a clear default I am happy, whether
or not it's part of the GHC distro or Haskell Platform.

In my mind a good default implementation has decent-to-good performance,
generates uniform distributions (#5278, #5280), and has statistically sound
splitting.  If we had a those three things maybe there would not have been
as much motivation to provide new RNGs!

*Re "nextBits":*
Thomas, could we flesh out the performance argument for bulk random bit
generation?  It seems to me that if we do a good enough job with Data.Bits
that an interface that produces a word at a time is not that bad in terms of
throughput.  (On the bit producer side, producing in bulk and then doling
out word-at-a-time seemed to work pretty well for me when I was testing it
in intel-aes.)  The biggest problem I see is that a kind of redundant
buffering could occur if both the bit-producer and the bit-consumer want
bulk randomness but have to communicate it through individual words.
   That said, if bytestring dependency is ok -- why not?  It then becomes a
backwards-compatible change that may open an extra opportunity to some RNGs.

*Re "newGen":*
I like Thomas's interface and I've been thinking about the "newGen" issue as
well:
   newGen :: ByteString -> g
Here ByteString does seem most appropriate, anything else -- Int or [Int] --
seems unsatisfactory.

Just for arguments sake I'd like to reexamine the "SplittableGen" topic in
light of "newGen":
   http://www.haskell.org/pipermail/libraries/2010-September/014252.html
It seems that with newGen we could provide a default implementation of split
and keep it in RandomGen.  Yes, it would be poor.  But it would mean that
things like Mersenne don't have to throw an error.  Further, if stdGen were
to, say, provide cryptographically strong RNG with sound splitting -- would
the quality of the default split be as much of a concern?  Someone who
chooses anything other than stdGen in that case would have to have a good
reason for doing so, and I think can be saddled with the consequences.

Adding newGen instead of subtracting split would not be Haskell-98 backwards
compatible.  However, it would create a *different* kind of breakage than
subtracting split.  Subtracting split breaks the client code, whereas adding
newGen just breaks the instances of RandomGen (hopefully few).

Don't we want to be pushing splittable tree RNG *more* in the future?  GHC
for parallelism, right!?  Let me give an example of the feature's value --
the Cilk  folks are making significant changes to their runtime just to get
deterministic parallel RNG:

http://groups.csail.mit.edu/sct/wiki/index.php?title=Other_Projects#Deterministic_Parallel_Random-Number_Generation

My feeling is that if we could get the best of what's out there into one
library with a simple interface and made it default that it would have a
positive impact and avoid unnecessary balkanization and do-it-yourself in
the random number department.  In order to accomplish this maybe we would
need two stdGens.  One maximally fast and one maximally sound --  perhaps
Mersenne Twister and crypto-based splittable RNG?

By the way, perhaps that "genBits" method should probably be rename
"genRangeBits"?

  -Ryan

On Tue, Jun 28, 2011 at 6:01 PM, Thomas DuBuisson <
thomas.dubuisson at gmail.com> wrote:

> Ryan Newton <rrnewton at gmail.com> wrote:
> > There implementation has two major pieces:
> >     (1) Sources of random bits (class RandomGen)
> >     (2) Instances of class Random that map use random bits to create
> Haskell
> > types
>
> I'd say there are three pieces, initial entropy (clock, external seed,
> system crypto random generator), deterministic generator interface
> (PRNG, the RandomGen class), and the instances of that class.
>
> I tried to answer the first item in System.Crypto.Random.  The 'random'
> package
> never really had an answer for entropy and I'm not sure what the
> community thinks about that.  Perhaps answering this problem in slightly
> obscure packages is OK.
>
> > The issue is
> > that the legacy RandomGen API isn't very good at generating random bits.
>
> Agreed.  This is why CryptoRandomGen uses ByteString (so we can
> generate any number of BYTES of random values).
>
> As I tried to argue when I make RandomGen more polymorphic, this is an
> issue with hardcoding the type as Int.  Unfortunately, people didn't
> accept that alteration and I feel continuing to generating Int while
> allowing devs to discover the amount of entropy isn't taking things
> far enough.
>
> > I'd also be happy to hear other proposals for API changes and additions.
>
> As I say below, I don't understand why we can't use ByteString.  There
> are (or should be) rather fast decodings of all the popular primitive
> types from bytestrings, so this takes care of our polymorphism issue.
>
> I accept that, unlike the CryptoRandomGen, we don't want an explicit
> failure for RandomGen.  But how about a common interface for
> instantiating generators ('newGen')?  Is there a reason not to have that?
>
> --- CODE ---
> class RandomGen g where
>    next :: g -> (Int, g)
>    nextBits :: g -> BitLength -> (ByteString, g)
>    genBits :: g -> Int
>    newGen :: ByteString -> g
> --- END CODE ---
>
> Also, perhaps we could still have an explicit reseed?
>
> --- CODE ---
> class Reseedable g where
>    reseed :: g -> Bytestring -> g
> --- END CODE ---
>
> I'll stop here before I entirely recreate CryptoRandomGen, but without
> the explicit errors and in more classes (the next step would be a method
> for querying how much entropy is needed for instantiation).
>
> >  System.Random can't depend on bytestring, so I doubt we want block RNG
> in
> > the RandomGen class.
>
> Why can't it?  System.Ranomd isn't part of H2010 and H98 already needs
> its own random module.
>
> I'll try to make time to review the code, you'll hear if I do.
>
> Cheers,
> Thomas
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/libraries/attachments/20110628/8a2dbad6/attachment.htm>