'any' and 'all' compared with the rest of the Report

Jan-Willem Maessen jmaessen@mit.edu
Wed, 24 Jan 2001 14:25:12 -0500


(Note, I've kept this on Haskell-cafe)
Bjorn Lisper wrote (several messages back):

> ...  What I in turn would like to add is that specifications like
> 
> any p = or . map p
> 
> are on a higher level of abstraction 
...
> The first specification is, for instance, directly data parallel 
> which facilitates an implementation on a parallel machine or in hardware.

As was pointed out in later discussion, the specification of "or" uses
foldr and is thus not amenable to data parallelism.  
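
For reference, the Report's definitions are essentially:

  any :: (a -> Bool) -> [a] -> Bool
  any p = or . map p

  or :: [Bool] -> Bool
  or = foldr (||) False

The right-nested foldr pins down a sequential traversal of the list
spine, which is exactly what rules out the data-parallel reading.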

On which subject Jerzy Karczmarczuk <karczma@info.unicaen.fr> later noted:
>   On the other hand, having an even more generic logarithmic iterator
>   for associative operations seems to me a decent idea.

In my master's thesis, "Eliminating Intermediate Lists in pH using
Local Transformations", I proposed doing just that, and called the
resulting operator "reduce" in line with Bird's list calculus.  I then
gave rules for performing deforestation so that sublists could be
traversed in parallel.  The compiler can choose from many possible
definitions of reduce, including:

reduce = foldr

reduce f z = foldl (flip f) z

reduce = divideAndConquer

(most of the complexity of the analysis focuses on making this choice
intelligently; the rest could be re-stated RULES-style just as the GHC
folks have done with foldr/build.)  We've been using this stuff for
around 6-7 years now.
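
For concreteness, a divideAndConquer along these lines might read as
follows (my illustrative sketch, not the thesis's actual definition;
it agrees with the other two choices only when f is associative and z
is its identity):

  -- Illustrative sketch only: assumes f associative, z its identity,
  -- and a finite input list.
  divideAndConquer :: (a -> a -> a) -> a -> [a] -> a
  divideAndConquer _ z []  = z
  divideAndConquer _ _ [x] = x
  divideAndConquer f z xs  =
      f (divideAndConquer f z l) (divideAndConquer f z r)
    where
      -- balanced splitting yields a logarithmic-depth tree of
      -- independent subreductions
      (l, r) = splitAt (length xs `div` 2) xs

The two recursive calls are independent of one another, which is
where a parallel implementation gets its leverage.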

Fundamentally, many associative operators have a "handedness" which we
disregard at our peril.  For example, defining "or" in terms of foldl
as above gives the right answer on every finite list (it diverges on
infinite ones), but because it can never stop short at the first True
it performs badly regardless of the input.  Thus, we need a richer
set of primitives (and a lot more complexity) if we really want to
capture "what is most commonly a good idea" as opposed to "what is
safe".  For example, our deforestation pass handles "concat" as part
of its internal representation, since no explicit definition using
reduction is satisfactory:

  concat = foldr (++) []    -- least work, no parallelism
  concat = reduce (++) []   -- parallelizes well, but could pessimize
                            -- the program
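
To see the handedness problem concretely, compare the two sequential
choices for "or" (the names orR and orL are mine, for illustration):

  orR, orL :: [Bool] -> Bool
  orR = foldr (||) False          -- stops at the first True
  orL = foldl (flip (||)) False   -- always walks the whole list

  -- orR (True : repeat False)  ==>  True
  -- orL (True : repeat False)  ==>  diverges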

For these reasons, it's hard to use "reduce" directly in more than a
handful of Prelude functions.  We use reduce a lot, but only after
uglifying the Prelude definitions to convince the compiler to "guess
right" about the appropriate handedness.

There is also the problem of error behavior: if we unfold an "or"
eagerly to exploit its data parallelism, we may uncover errors even
when the list is finite:

> or [True, error "This should never be evaluated"]
True
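
Lazily, the error is never touched.  A hypothetical eager
divide-and-conquer version (a sketch only; pH's actual mechanism
differs) forces both halves of the list, and so uncovers it:

  -- Hypothetical eager version, not pH's actual implementation.
  orEager :: [Bool] -> Bool
  orEager []  = False
  orEager [x] = x
  orEager xs  = l `seq` r `seq` (l || r)
    where
      (ls, rs) = splitAt (length xs `div` 2) xs
      l        = orEager ls
      r        = orEager rs

  -- orEager [True, error "This should never be evaluated"]
  --   ==> *** Exception: This should never be evaluated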

My current work includes (among other things) ways to eliminate this
problem: we may perform a computation eagerly, then defer or discard
any errors it raises.

-Jan-Willem Maessen


My master's thesis is at:
ftp://csg-ftp.lcs.mit.edu/pub/papers/theses/maessen-sm.ps.gz

A shorter version (with some refinements) is found in:
ftp://csg-ftp.lcs.mit.edu/pub/papers/csgmemo/memo-370.ps.gz