<div>[CC'ing some people involved in the relevant ticket discussions or showing up in git blame]</div><div><br></div><div>Hello all,</div><div><br></div><div>So it seems random was due for an overhaul. I would like to initiate a discussion period to talk about what changes should happen before the next major release. I think this is timely because there is <b>already</b> a pending backwards-incompatible change in the API<a href="http://hackage.haskell.org/trac/ghc/ticket/4314"> (factoring out the SplittableGen class)</a>. We might as well make any other fixes now and make all the changes at once. For reference, I've attached a list of the open tickets I know about pertaining to System.Random at the end of this email. </div>
<meta http-equiv="content-type" content="text/html; charset=utf-8"><div><br></div><div>There implementation has two major pieces:</div>
<div><br></div><div> (1) Sources of random bits (class RandomGen)</div><div> (2) Instances of class Random that map use random bits to create Haskell types</div><div><br></div><div>For now I've left (1) alone and have made heavy changes to part (2) in a branch to fix correctness and performance problems:</div>
<div> <a href="https://github.com/haskell/random/tree/new_api" target="_blank">https://github.com/haskell/random/tree/new_api</a></div><div><br></div><div>These new changes require a small API change to RandomGen. The issue is that the legacy RandomGen API isn't very good at generating random <b><i>bits</i></b>. Rather, it <font class="Apple-style-span" face="'courier new', monospace">next</font> creates numbers in an arbitrary range: <font class="Apple-style-span" face="'courier new', monospace">genRange</font>. First of all, I would argue that -- completely aside from performance considerations -- this extra flexibility for RandomGens increases the likelihood of errors in the clients of the API. As supporting evidence I would cite tickets <a href="http://hackage.haskell.org/trac/ghc/ticket/5278">#5278</a> and <a href="http://hackage.haskell.org/trac/ghc/ticket/5280">#5280</a> -- it looks like even use of the API in System/Random.hs itself was incorrect!</div>
<div><br></div><div><b>My proposed API addition/restriction:</b></div><div><br></div><div>I propose that we add a "genBits" function that reports how many bits of randomness are created by a generator.</div><div>
<br></div><div><font class="Apple-style-span" face="'courier new', monospace"> class RandomGen g where </font></div><div><font class="Apple-style-span" face="'courier new', monospace"> ...</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> genBits :: g -> Int</font></div><div><meta http-equiv="content-type" content="text/html; charset=utf-8"><div><br></div><div>I added a default implementation of genBits that computes the answer based on genRange. This need not be a backwards-incompatible change.</div>
</div><div><br></div><div>What about generators that have a genRange which is not a power of two? We could consider making genBits return a Maybe value to distinguish these. But I would prefer that we<b> </b>instead<b> restrict RandomGen instances, requiring that they generate a clean number of random bits</b>. This requires that genRange be (0,2^N-1) or (-2^N,2^N-1). But we can tolerate slight deviations; for example, in the <a href="https://github.com/haskell/random/tree/new_api">new_api branch</a>, stdGen has a genRange of (0,2^31 - 86). Probably close enough. (In any case, in the future I'd like to see stdGen replaced with something better anyway.)</div>
<div><br></div><div><b>An alternative would be to go even further and require every RandomGen instances to generate the full range of Ints.</b> I think any future replacement for stdGen will probably meet this additional requirement, though I would be interested in important generators that do not, and in any references on the original API design of RandomGen. As long as we have <font class="Apple-style-span" face="'courier new', monospace">genBits</font> I don't feel strongly about it but am interested in other opinions.</div>
<div><br></div><div>Anyway, I've used genBits to reimplement the random and randomR methods. Let me know what you think. This new version should fix tickets <a href="http://hackage.haskell.org/trac/ghc/ticket/5278">#5278</a>, <a href="http://hackage.haskell.org/trac/ghc/ticket/5280">#5280</a>, <a href="http://hackage.haskell.org/trac/ghc/ticket/5133">#5133</a>. But give me a code review and help me head off future new bugs!</div>
<div><br></div><div>Regarding other tickets:</div><div> <a href="http://hackage.haskell.org/trac/ghc/ticket/3575">#3575</a> and <a href="http://hackage.haskell.org/trac/ghc/ticket/3620">#3620</a> will have to wait for stdGen to get replaced. Whereas <a href="http://hackage.haskell.org/trac/ghc/ticket/427">#427</a> and <a href="http://hackage.haskell.org/trac/ghc/ticket/2280">#2280</a> are complaints about the performance; they will also require a new stdGen to be adequately addressed. However, the current batch of changes <i>does</i> change the performance of some Random instances significantly. (See appendix.) There's plenty of room for improvement, and in particular, the new random approach would work better if <a href="http://hackage.haskell.org/trac/ghc/ticket/4102">#4102</a> were implemented in Data.Bits.</div>
<div> The excessive laziness problems are still there. (<a href="http://hackage.haskell.org/trac/ghc/ticket/4218">#4218</a> is still unfixed.) Perhaps that's something we can fix in this audit.</div><div><br></div>
<div>I'd also be happy to hear other proposals for API changes and additions. System.Random can't depend on bytestring, so I doubt we want block RNG in the RandomGen class. I could, however, imagine having nextWord16, nextWord32, etc. </div>
<div> Something worthwhile if we're doing an API audit is to look at some of the other Hackage packages. Here are some that I am familiar with:</div><div> <a href="http://hackage.haskell.org/package/DRBG-0.2.2">http://hackage.haskell.org/package/DRBG-0.2.2</a><meta http-equiv="content-type" content="text/html; charset=utf-8"></div>
<div><meta http-equiv="content-type" content="text/html; charset=utf-8"> <a href="http://hackage.haskell.org/package/mersenne-random-pure64">http://hackage.haskell.org/package/mersenne-random-pure64</a></div><div><meta http-equiv="content-type" content="text/html; charset=utf-8"><div>
<a href="http://hackage.haskell.org/package/mwc-random">http://hackage.haskell.org/package/mwc-random</a></div></div><div><meta http-equiv="content-type" content="text/html; charset=utf-8"> <a href="http://hackage.haskell.org/package/intel-aes">http://hackage.haskell.org/package/intel-aes</a></div>
<div><br></div><div>And some that I'm not:</div><div><meta http-equiv="content-type" content="text/html; charset=utf-8"> <a href="http://hackage.haskell.org/package/random-fu">http://hackage.haskell.org/package/random-fu</a></div>
<div><meta http-equiv="content-type" content="text/html; charset=utf-8"> <a href="http://hackage.haskell.org/package/gsl-random">http://hackage.haskell.org/package/gsl-random</a></div><div><meta http-equiv="content-type" content="text/html; charset=utf-8"> <a href="http://hackage.haskell.org/package/cprng-aes">http://hackage.haskell.org/package/cprng-aes</a></div>
<div><br></div><div>Pardon the long email!</div><div><br></div><div>Cheers,</div><div> -Ryan</div><div><br></div><div><b><u>Appendix A: List of tickets</u></b></div><div><br></div><div>#5278<span style="white-space:pre-wrap">        </span>System.Random.randomIvalInteger makes invalid assumptions about RandomGen<span style="white-space:pre-wrap">        </span></div>
<div>#5280<span style="white-space:pre-wrap">        </span>System.Random commits (rand `mod` base) error.<span style="white-space:pre-wrap">        </span></div><div>#5133<span style="white-space:pre-wrap">        </span>Random instance for Float can generate values out of requested range</div>
<div>#4218<span style="white-space:pre-wrap">        </span>System.Random is way too lazy<span style="white-space:pre-wrap">        </span></div><div>#427<span style="white-space:pre-wrap">        </span>Random.StdGen slowness<span style="white-space:pre-wrap">        </span></div>
<div>#2280<span style="white-space:pre-wrap">        </span>randomR too slow<span style="white-space:pre-wrap">        </span></div><div>#3575<span style="white-space:pre-wrap">        </span>mkStdGen and split conspire to make some programs predictable<span style="white-space:pre-wrap">        </span></div>
<div>#3620<span style="white-space:pre-wrap">        </span>The seeds generated by split are not independent<span style="white-space:pre-wrap">        </span>libraries/rando</div><div>#3352<span style="white-space:pre-wrap">        </span>Data.Word should define instances of Random for each type</div>
<div><br></div><div><b><u>Appendix B: Performance data</u></b></div><div><br></div><div>Here's an "S curve" showing the speedup or slowdown of the new code when generating different types of data with random or randomR (the R labeled ones are randomR). They range from from some massive speedups on floats and doubles as well as a hefty performance regression when Integer is restricted to a small range by randomR. (I'm looking into it.)</div>
<div><br></div><div>speedup of new over old:</div><div>new_api revision f85c6a55b731d5e380c1d6e880105c9933aa9d48</div><div>old, revision 130e421e912d394653a43c987be99</div><div> <font class="Apple-style-span" face="'courier new', monospace"> 161.87843858371693 "Float"</font></div>
<div><div><font class="Apple-style-span" face="'courier new', monospace"> 120.29838090036381 "Float R"</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> 54.66538723769602 "CDouble"</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> 41.01849643134359 "CDouble R"</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> 5.546651773234119 "Int"</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> 5.250176145938512 "Integer"</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> 2.4963231822924556 "Word16"</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> 2.340365231384218 "Bool R"</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> 2.2810769806267412 "Bool"</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> 2.1576446209293216 "Double R"</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> 2.059792819214518 "Double"</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> 1.4283216783216783 "Integer R Big"</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> 1.235562774008646 "Word16 R"</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"><b> 1.008346493029544 "stdGen/next"</b></font></div><div><font class="Apple-style-span" face="'courier new', monospace"> 0.713868706483777 "Int R"</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> 0.6738365001610773 "Char R"</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> 0.5974339051794343 "Char"</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> 0.16546783206240073 "Integer R"</font></div></div><div><br></div><div>The baseline stdGen/next hasn't been changed. Thus it's "speedup" is 1.0.</div>
<div><br></div><div><br></div>