Lists representations (was: What does FP do well? (was ...))

C.Reinke C.Reinke@ukc.ac.uk
Fri, 31 May 2002 19:01:42 +0100


Long away and far ago (or something like that;), there was a
discussion on Lists implemented as arrays rather than linked
structures, during which Jerzy Karczmarczuk commented:

> What bothers me quite strongly is the algorithmic side of operations
> upon such objects. 
> 
> Typical iterations map- (or zip-) style: do something with the head, pass
> recursively to the tail, would demand "intelligent" arrays, with the indexing
> header detached from the bulk data itself. The "consumed" part could not be
> garbage collected. In a lazy language this might possibly produce a considerable
> amount of rubbish which otherwise would be destroyed quite fast. The
> concatenation of (parts of) such lists might also have very bad behaviour.
> 
> Can you calm my anxiety?
> 
> Jerzy Karczmarczuk

The reason I wanted to reply is that I can offer one data point on
this. An even longer time ago, there were the various reduction
systems developed in Kiel, implementing the Kiel Reduction Language
KiR, a variant of Berkling's reduction languages (the Berkling
pointed to in Backus' Turing Award Lecture). KiR lacked lots of
useful features Haskell has, and Haskell's implementations still
lack lots of useful features KiR's had. I dearly miss those
features, but that is not the topic here (I don't know whether any
of the systems still install or even run, but see the Manual at
http://www.informatik.uni-kiel.de/~base/ for more info).

The topic here was list representations. KiR's implementations moved
from interpreted graph-reduction, through compiled graph-reduction
with a code interpreter, to compiled graph-reduction with compilation
via C, all more or less with the same high-level front end. All of
these represented lists as vectors (KiR was dynamically and
implicitly typed, btw), and memory was divided into two areas: one
for fixed-size descriptors, pointing to each other or into the second
area, the heap, which held variable-sized data blocks. The descriptor
area was reference-counted and simply reused free space (most KiR
variants were call-by-value); the heap needed compaction when space
grew fragmented.

A list's elements (or pointers to their descriptors) went into a
contiguous block in the heap, and the descriptors made it possible
to share subsequences of elements between different lists
(descriptors were large enough to hold a pointer to the start of the
sequence and its length). Quite as Jerzy suspected. Supported
operations included both array and list operations; append required
allocating a new heap block and copying *both* lists, but it was
provided as a primitive (the standard approach for systems that
started out as interpreters: a good mix of efficient primitives and
interpreted user-defined constructs).
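
For illustration only, here is a rough Haskell sketch of that kind of
representation (my own names and simplifications, not KiR's actual
structures): the record plays the role of the fixed-size descriptor,
the Array that of the contiguous heap block, so taking the tail is
O(1) and shares storage, while append has to allocate a fresh block
and copy both arguments.

    import Data.Array

    -- A list is a view (offset + length) into a contiguous block of
    -- elements; several lists can share the same block.
    data VList a = VList { block :: Array Int a  -- contiguous element block
                         , off   :: Int          -- index of the first element
                         , len   :: Int }        -- number of elements

    hd :: VList a -> a
    hd (VList b o _) = b ! o

    tl :: VList a -> VList a                     -- O(1): only the view changes,
    tl (VList b o n) = VList b (o + 1) (n - 1)   -- the block stays shared

    app :: VList a -> VList a -> VList a         -- O(n+m): copies *both* lists
    app (VList b1 o1 n1) (VList b2 o2 n2) =
      VList (listArray (0, n1 + n2 - 1)
                       ([b1 ! i | i <- [o1 .. o1 + n1 - 1]] ++
                        [b2 ! i | i <- [o2 .. o2 + n2 - 1]]))
            0 (n1 + n2)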

As others have pointed out, this looks rather inefficient,
especially for longer lists, so when we set out to make measurements
for a JFP paper [1], comparing with the likes of ghc, we expected to
be beaten, but hoped not to be too far behind, at least with the
latest via-C implementation.

Benchmarks are always difficult, but especially so between such
different languages: in KiR, we could easily and correctly execute
programs that, in Haskell, either wouldn't even compile, or wouldn't
terminate, or wouldn't show any result (with similar problems the
other way round). And even after restricting ourselves to the common
subset of algorithms, a translation to Haskell might mean that a
complex program execution returned immediately, because the compiler
and runtime system lazily determined that none of it was needed for
the program output (compiled Haskell programs report almost no
reduction results, only explicit program output or show-able
results).

With all these preliminaries and caveats, and the standard
disclaimer that all benchmarks are useless, but interesting, the
relevant benchmark is the infamous "pretty quicksort" for some 4000
elements (qusort in the paper - lots of finite lists, traversals,
fragments and appends, just like the typical Haskell program written
without any concern for efficiency; Haskell programs concerned with
efficiency tend to look rather different).
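
For reference, the "pretty quicksort" in question is presumably
something along these lines (the textbook version, quoted from memory
rather than from the benchmark sources); note the append and the two
intermediate lists in every recursive call:

    qusort :: Ord a => [a] -> [a]
    qusort []     = []
    qusort (p:xs) = qusort [x | x <- xs, x < p]
                    ++ [p] ++
                    qusort [x | x <- xs, x >= p]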

To our astonishment, even the code-interpreting implementation
(which should otherwise be in the ballpark of Hugs) outperformed
ghc, hbc, and Clean on this example (call-by-value also played a
role: compiled sml was in the same range as compiled KiR, but both
were only slightly faster than code-interpreted KiR, so data
representation and primitives seemed to play the main role). This
prompted us to
include Augustsson's sanitized variant of quicksort (qusortbin in
the paper - from the hbc libs) as well, which gave the results
everyone expected (it substantially modifies the algorithm to a
profile better supported by the current list representation, e.g.,
no appends). [and before anyone accuses me of advocating functional
quicksort: the naive quicksort is useless, and even the real one
isn't the best choice in many cases;-]
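
To give an idea of the kind of rewrite involved, here is a generic
accumulator-passing quicksort (my own sketch, not necessarily what
the hbc libs actually do): the sorted pieces are consed directly onto
an accumulator, so no (++) is needed to join the results:

    qusortbin :: Ord a => [a] -> [a]
    qusortbin xs = go xs []
      where
        -- go ys acc sorts ys and prepends the result to acc
        go []     acc = acc
        go (p:ys) acc = go [y | y <- ys, y < p]
                           (p : go [y | y <- ys, y >= p] acc)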

But the moral for the current discussion: a more intelligent list
representation could bring substantially greater benefits for the
average Haskell program than any compiler optimization twiddling,
and I'd really like to see someone (a PhD student?) investigate that
topic seriously, as the answers are unlikely to be obvious.

The representation chosen in the reduction systems could be a first
hint, but as Jerzy points out, things may be more complicated in the
context of Haskell.  For comparison, Haskell array performance was
somewhere between non-existent and terrible in those days (another
clear win for both the compiled and the interpreted reduction
systems) and has only recently improved somewhat. That needs to
continue and, please, someone do the same for lists!

Just my old 2 Pfennige (former currency;-),
Claus

[1] D. Gaertner, W. Kluge: pi-RED+: An Interactive Compiling Graph 
    Reduction System for an Applied Lambda-Calculus 
    Journal of Functional Programming, 6 (5), 1996.