[Haskell-cafe] Re: FASTER primes

Heinrich Apfelmus apfelmus at quantentunnel.de
Thu Jan 7 05:43:44 EST 2010


Will Ness wrote:
> Heinrich Apfelmus writes:
>
>> Concerning lists as producer/consumer, I think that's exactly what lazy
>> evaluation is doing. Neither  filter ,  map  nor  span  evaluates and
>> stores more list elements than strictly necessary.
> 
> I always suspected as much, but was once told that Chris Okasaki had shown 
> that any filter etc. must allocate its own storage. With the peek/pull 
> interface they don't have to, if they are nested, and the deepest one 
> eventually gets pulled from the real storage through some pointer chasing. 
> Span isn't so easily compiled out either, or is it? But that might be a 
> minor point.

Hm? In my world view, there is only reduction to normal form and I don't
see how "allocate its own storage" fits in there. Okasaki having shown
something to that end would be new to me?

> I did that once in Scheme, as per SICP, with 'next' hanging in a stream's tail. 
> Put filters and maps on top of that (inside the new 'next' actually). But that 
> used Scheme's lists as storage. Another one was actual producers/modifiers 
> with a {pull,peek} interface. It even produced some primes, and some Hamming 
> numbers. Then I saw Haskell, and thought I'd get all that for free with its 
> equational reasoning.
> 
> But I get the impression that GHC isn't working through equational reasoning?.. 
> I see all this talk about thunks etc.

Sure it does. Concerning the thunks, they're part of the implementation
of the reduction model (call-by-need, a.k.a. lazy evaluation).
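
A tiny example of what call-by-need buys you -- the example is mine, just
for illustration. The thunk for  x  is forced once and the result shared,
so the trace message appears only once even though  x  is used twice:

    import Debug.Trace (trace)

    main :: IO ()
    main = let x = trace "forcing x" (1 + 2 :: Int)
           in  print (x + x)    -- prints "forcing x" once, then 6

Under call-by-name you'd see the message twice; under call-by-value you'd
see it before knowing whether  x  is needed at all.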

>> Concerning the sieves, there is a fundamental difference between the
>> imperative sieve and the functional sieves, regardless of whether the
>> latter start at p or p^2 or use a priority queue. [...]
> 
> We can directly jump to the next multiple too, it is called (+). :) :)
>
> But seriously, the real issue is that we have to merge the produced streams of 
> multiples, while the mutable-storage code works on the same one, so its 
> "merging cost" is zero. And even if we are smart enough to merge them in a 
> tree-like fashion, we still have no (or little) control over the compiler's 
> representation of lists and retention of their content, and whether it 
> performs stream fusion or not (if we use lists).

Not quite, I think there are two things going on:

1. In contrast to the standard imperative version, we are implementing
an on-line algorithm that produces arbitrarily many primes on demand.
Any imperative on-line version will need to think about storage for
pending prime filters as well.

2. Even the imperative algorithm can be interpreted as "merging" arrays,
just that the composites are magically merged at the right place (at
distance p from each other) because arrays offer O(1) "jumps". In
contrast, the functional version needs O(log something) time to
calculate where to put the next composite.
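
Both points can be seen in miniature in the well-known priority-queue
formulation, sketched here with  Data.Map  standing in for the queue.
(This is just an illustration of the costs, not the tree-merging code
under discussion.) The map is exactly the storage for "pending prime
filters" from point 1, and advancing a prime to its next multiple is a
delete-and-reinsert, i.e. the O(log something) from point 2, where the
array version simply jumps ahead by p:

    import qualified Data.Map as Map

    primes :: [Integer]
    primes = 2 : sieve [3,5..] Map.empty
      where
        sieve []     _     = []   -- never reached, the input is infinite
        sieve (x:xs) table = case Map.lookup x table of
            -- x is not a pending composite, hence prime; schedule its
            -- first composite x^2 (step 2x, skipping the evens)
            Nothing    -> x : sieve xs (Map.insert (x*x) [2*x] table)
            -- x is composite; move each prime's step ahead to x+step
            Just steps -> sieve xs (foldl move (Map.delete x table) steps)
              where
                move t step = Map.insertWith (++) (x + step) [step] t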



> If you could take a look at the tree-merging primes and why it uses too much 
> memory, it would be great.

Fortunately, Daniel already took a detailed look. :) But I also have a
few remarks.

> The code is in Daniel's post to which I replied, or on the haskellwiki 
> Prime_numbers page (there in its rudimentary form). It's a tangent to your 
> VIP code, where instead of the People structure an ordered list is just 
> maintained as a split pair of its known (so far finite) prefix and the rest 
> of it.

Aww, why remove the cutesy name? The VIPs will be angry for being ignored!

Jest aside, note that putting the crowd at the end of the list where it
belongs has a reason: you have a guarantee that crowds won't take any
memory until they actually appear after all the VIPs. If you put both
VIPs and the crowd into a pair, it's easier to get space leaks.
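
(As a reminder, the structure was, in essence,

    data People a = VIP a (People a) | Crowd [a]

so a  Crowd  can only ever sit at the very end of the chain of VIPs, and
no part of it is demanded before all the VIPs have been consumed.)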

In particular, I think that your  mergeSP  has a laziness bug; it should be

    mergeSP ~(a,b) ~(c,d) = ...

This is a cure for the (potential) problem that  spMerge  first performs
a pattern match / comparison and then returns a pair constructor instead
of the other way round. For instance, your second case

    spMerge u [] = ([], u)

should actually behave like

    spMerge u v  = (a,b)
        where
        a = if null v then [] else ...
        b = if null v then u else ...

(I don't recommend rewriting  spMerge  in this fashion, the lazy pattern
in  mergeSP  will do the trick.)

Now, I'm not sure whether this is actually a problem or not. But the
version shown here, with two lazy patterns instead of just one, is much
closer to how the original  People  structure behaves.
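
Spelled out on the split-pair representation -- my reconstruction of the
wiki code, so treat it as a sketch -- the cure is just the two twiddles
on the first line:

    mergeSP :: Ord a => ([a],[a]) -> ([a],[a]) -> ([a],[a])
    mergeSP ~(a,b) ~(c,d) = (a ++ bc, merge b' d)
      where
        -- merge the second known prefix c into the first rest b,
        -- splitting off the part b' of b that extends past c
        (bc, b') = spMerge b c

        spMerge u@(x:xs) w@(y:ys) = case compare x y of
            LT -> let (l,r) = spMerge xs w  in (x:l, r)
            EQ -> let (l,r) = spMerge xs ys in (x:l, r)
            GT -> let (l,r) = spMerge u  ys in (y:l, r)
        spMerge u [] = ([], u)
        spMerge [] w = ([], w)

        -- plain ordered union of the two rests
        merge u@(x:xs) w@(y:ys) = case compare x y of
            LT -> x : merge xs w
            EQ -> x : merge xs ys
            GT -> y : merge u  ys
        merge u [] = u
        merge [] w = w

With the twiddles in place, something like

    take 2 . fst $ mergeSP ([2,4],[6,8..]) undefined

can deliver [2,4] without ever touching the second argument.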

> Then under merging these split pairs form a monoid, so they can be 
> rearranged in a tree. If you haven't seen it yet, it uses a different 
> folding structure to your code, with a lower total cost of multiples 
> production (estimated as Sum (1/p)*depth):
> 
>  tfold f (x:y:z:xs) = (x `f` (y `f` z)) `f` tfold f (pairwise f xs)
>  comps = tfold mergeSP $ pairwise mergeSP multips

The idea being that each prime produces  1/p  composites each "turn" and
it takes time  depth  to trickle it to the top? Sounds reasonable, but
remember that a prime  p  does not start to generate composites until
the "turn count" has reached p^2, so it might occupy a place of low
depth in the tree that is better spent on the previous primes. But your
experiments seem to show that it works anyway.


By the way, I think that both  tfold  and  pairwise  need to be lazier.

    tfold f ~(x: ~(y: ~(z:zs))) = (x `f` (y `f` z)) `f` tfold f (pairwise f zs)

    pairwise f ~(x: ~(y:ys))    = f x y : pairwise f ys

Otherwise, large parts of the tree might be generated too early and hog
memory.


Also, a small remark on style: I don't like the repetition of  mergeSP  in

    compos = fst $ tfold mergeSP (pairwise mergeSP (multip ps))

After all, the right hand side is just another tree fold and I would
clearly mark it as such:

    compos = fst . superSmartTreeFold mergeSP . multip $ ps

    superSmartTreeFold f = tfold f . pairwise f

;)

> I think I'll have to try out your code (amended with a new folding structure)
> and see if it has fewer memory problems, maybe. I assume it is your code on 
> the haskellwiki page. (?)

Yes, I put the code snippet on the wiki. And never tested it on large
numbers. ;)


Regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com


