Dan Weston westondan at imageworks.com
Tue Jul 24 19:47:07 EDT 2007

``` > But this simple modification allows us to use only O(sqrt(n)) space at
> the point we print the nth prime:

I wouldn't call your modification simple. It appears that you are trying
to put smarts into the garbage collector and memoization logic, the
first step towards a priority queue of memoized results.

data Nat = Zero | Succ Nat

firstNprimes :: Nat -> [Integer]
firstNprimes                Zero  = []
firstNprimes (       Succ \$ Zero) = 2 : firstNprimes         Zero
firstNprimes (Succ . Succ \$ Zero) = 3 : firstNprimes (Succ \$ Zero)
...

The resulting sublists should be shared, so that each memoized partial
evaluation is just a head and a pointer, with space O(2*n).

Suppose further you could tell the garbage collector to discard the
highest Nat firstNprimes sublists first, forcing a recomputation
whenever needed again.

Then, assuming you use only the one (outer) primes function, your primes
function (which needs all the firstNprimes) has the lowest priority and
gets recalculated on memory exhaustion, but only back to the highest
known prime, which will eventually (and forever thereafter) be the
highest firstNprimes that fits in memory.

The code uses the most memory it can for efficiency, then continues on
maximally efficiently from there on the fly.

This is the sort of control you are getting on the cheap with your
non-trivial use of two primes functions. It is the kind of logic that
might be difficult to automate.

Dan Weston

Melissa O'Neill wrote:
> I've tended to make is that referential transparency allows the language
> to be much more aggressive about things like common subexpression
> elimination (CSE) than traditional imperative languages (which need to
> worry about preserving proper side-effect sequencing).
>
> But a recent example has left me thinking that maybe I've gone too far
> in my claims.
>
> First, lets consider a simple consumer program, such as:
>
>> printEveryNth c l n  = do    print (c', x)
>>                              printEveryNth c' xs n
>>                        where c'   = c+n
>>                              x:xs = drop (n-1) l
>
> Note that we can pass this function an infinite list, such as [1..], and
> it won't retain the whole list as it prints out every nth element of the
> list.
>
> Now let's consider two possible infinite lists we might pass to our
> consumer function.  We'll use a list of primes (inspired by the recent
> discussion of primes, but you can ignore the exact function being
> computed).  Here's the first version:
>
>> primes = 2 : [x | x <- [3,5..], all (\p -> x `mod` p > 0)
>> (factorsToTry x)]
>>     where
>>         factorsToTry x = takeWhile (\p -> p*p <= x) primes
>
> As you might expect, at the point where we print the nth prime from our
> infinite list, we will be retaining a list that requires O(n) space.
>
> But this simple modification allows us to use only O(sqrt(n)) space at
> the point we print the nth prime:
>
>> primes =
>>     2 : [x | x <- [3,5..], all (\p -> x `mod` p > 0) (factorsToTry x)]
>>     where
>>         slowerPrimes =
>>             2 : [x | x <- [3,5..], all (\p -> x `mod` p > 0)
>> (factorsToTry x)]
>>         factorsToTry x = takeWhile (\p -> p*p <= x) slowerPrimes
>
> Notice the gigantic common subexpression -- both primes and slowerPrimes
> define exactly the same list, but at the point where we're examining the
> nth element of primes, we'll only have advanced to the sqrt(n)th element
> of slowerPrimes.
>
> Clearly, "simplifying" the second version of primes into the first by
> performing CSE actually makes the code much *worse*.   This
> "CSE-makes-it-worse" property strikes me as "interesting".
>
> So, is it "interesting"...?  Has anyone worked on characterizing CSE
> space leaks (and avoiding CSE in those cases)?  FWIW, it looks like
> others have run into the same problem, since bug #947 in GHC (from
> October 2006) seems to be along similar lines.
>
>     Melissa.
>
> P.S.  These issues do make massive difference in practice.  There is a
> huge difference between taking O(n) and O(sqrt(n)) space -- the
> difference between a couple of megabytes for the heap and tens or
> hundreds of megabytes.
>
> _______________________________________________