[Haskell-cafe] Reference for technique wanted

Thu Nov 4 05:56:21 EDT 2010

> The bottom line is that
> 
> - in logic programming languages, building a list by working on
>   a pair of arguments representing a segment of the list is the
>   NORMAL way to build a list; it's as fast as it gets, and the
>   list is inspectable during construction.

modulo usage patterns: e.g., mostly linear use.

> - at least in SML, the (fn ys => xs @ ys)  [Haskell: \ys -> xs ++ ys]
>   approach is stunningly slow, so slow as to make even naive use of
>   append more attractive for most uses of; it is not the normal way
>   to build lists, and for good reason; you can do nothing with a list
>   so represented except compose and apply.

Even in your SML code, the "boring old plain lists" seemed to 
be list_flatten, which uses difference lists in disguise, and won
your little test, right? Using Haskell notation:

 flat (LEAF x) ys = x : ys
 flat (FORK(a,b)) ys = flat a (flat b ys)
-->
 flat (LEAF x)  = \ys->x : ys
 flat (FORK(a,b)) = \ys->flat a (flat b ys)
-->
 flat (LEAF x)  = (x :)
 flat (FORK(a,b)) = flat a . flat b

As in Prolog, it is often better not to make the structure
explicit, though I am slightly disappointed that GHC's
optimizer doesn't give the same performance for the
two versions (when summing a flattened strict tree in 
Haskell, I get roughly a factor of 2 between naive list 
append, explicit diff lists, as in the lower version of flat, 
and hidden diff lists, as in your list_flatten, the latter 
being fastest).

Of course, one has to be careful about usage patterns and
efficiency considerations. For instance, head and tail with
Haskell diff lists are not O(n), as you mentioned, because
of non-strictness. And building up lots of nested operations
under a lambda means that many of those operations are 
not likely to be shared, but repeated every time the lambda
is applied (such as converting back to plain lists), so one 
has to be careful about not doing that too often. etc.

> The practical consequence of this is that calling both techniques by
> the same name is going to mislead people familiar with one kind of
> language when they use the other:  logic programmers will wrongly
> expect dlists to be practically useful in functional languags,
> functional programmers will expect them to be impractical in logic
> programming languages.

I do tend to expect a little more insight from Haskellers, but
perhaps you're right. Having a pre-canned library instead of
writing out the near-trivial code patterns oneself, one might 
be surprised when expected miracles don't happen (diffarrays
were one such case for me;-).

I'd still call both patterns difference lists, because it can be
useful to know about the connections, but if you could suggest
text for the DList documentation that warns of the differences,
and of performance pitfalls, I'm sure the package author would
be happy to add such improvements.

If we ever get per-package wikis for hackage, you could add 
such comments yourself.

Claus