[Haskell] Re: sequences

Fri Apr 9 12:48:41 EDT 2004

hi Ross and others,

On Wed, 7 Apr 2004, Robert Will wrote:
>
> On Mon, 5 Apr 2004 ross at soi.city.ac.uk wrote:
>
> > To have something concrete to discuss, I've placed a structure based on
> > Edison at
> > docs: http://www.soi.city.ac.uk/~ross/seq/
> > code: http://www.soi.city.ac.uk/~ross/seq.tar.gz

Really, since you have read the comparison of my Abstract Collections with
Edison, you should have said one word, why you choose Edison as a basis
and not mine.  Well, I gues the reason is just that you don't want MPTC
and that you're perhaps afraid of all my new terminology while Edisons
names sound so familiar.

So here is a short argument in favour of my choices:

1. The class "OperationalSequence" in the Abstract Collections is an MPTC
that makes no restrictions on the element type.  I dubbed it
"Operational", because without an equality over Sequences we can't give
executable algebraic specifications.

It uses MPTC for a debatable reason: to be part of the framework of
Abstract Collections.  (The source file explains why we can't mix SPTC and
MPTC code.)  But is that really so bad?  MPTC work on all major compilers
and no-one doubts they will be in a future standard.  Furthermore, since
it uses Constructor Classes (just like Edison) there are no ambiguity
problems that would involve FunDeps!

2. Also the names are choosen not to be compatible with the past, but to
be combatible with the future: the names of the Sequence operations fit
well with those of all other Collections.  Many names are shared and have
the same semantics on all Collections.

> PS: Dessy also started with (list-compatible) Sequences and than added the
> rest.  We should certainly not make that detour on the large scale.

with "list-compatible" I mean starting with class members (cons head tail
foldr foldl...) and then add many more.  Really one should be looking for
the common base of all sequence implementations and what's more all
sequence applications.

The absurdness of the "list-compatible" approach shows in the ridiculous
default running times for 'cons' O(1) and 'snoc' O(n), really what's the
big difference between those operations?  The names alone are a nightmare:
(cons snoc init tail head last left right) (as in lview, rview) and many
functions that only work from the front.  How many different names should
we have for the front (with the first element) and the back (with the last
element)?  My proposal shows that those four terms (first last front
back) suffice to name all useful functions on lists.  Surely the naming
approach taken for my Abstract Collections is no fundamental breaking with
conventions: all the names and operators are either tradition in Haskell
or tradition in Scheme or some consistent extension of this (e.g. (<:)
and (>:) for 'add_first' and 'add_last').

Clearly my new names are easier to learn for novices, but I also claim
that learning the whole library (i.e. the "compatible" functions and the
new ones) is easier for the Abstract Collections than for Edison.  Just
because the names are more consistent, and still many names are like in
standard Haskell.

So I'd suggest, that a standard Sequence class should be based on the
Abstract Collections.  (And by the way, why not also think about Sets and
Maps, you can use them now: "import Prelude (); import Collections" is not
so hard to write!)

By the way I think that the documentation for the Sequence instances in
Ross' proposal is too redundant.  The only thing we should need is a
specification of the performance and of additional operations.  I would
propose the following time bounds:

Defaults for Sequences:
  O(logN) for all operations, except
  O(1) for "atomic queries" (is_empty, first, last, size (or length))

This is the simplest to remember and a "democratic" implementation will
just have those bounds.  (And balanced tree does, Dessy has five of them.)

Next thing needed are Deques:
O(1) for (empty single <: >: first last but_first but_last is_empty)
O(N) for the rest
(JoinLists are like that with additional O(1) (++), so they're perhaps
better called 'AppendableDeques'.  Likewise CatenableLists are really
AppendableStacks.)

Write-only Sequences: (OrdList)
O(1) for (empty single <: >: ++)
O(N) for the rest

One-side flexible Arrays or Random Access lists: are another very
"algorithmic" structure, which support efficient 'update', a function that
my Sequences don't even specify.

This reminds me, that I haven't really seen any applications of Sequence
data structures and that I decided anyway to get some examples before
fixing the details of the class.  By the way, there are already some
examples of Maps and Sets at work:
http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/examples/

The only examples for Sequences is Okasakis Tree Enumeration using Deques
and that doesn't look nice without views :-(

Robert