AVL Trees available

Adrian Hey ahey at iee.org
Thu May 20 08:50:27 EDT 2004


On Wednesday 19 May 2004 10:47 am, Simon Marlow wrote:

> I'm happy for this to be "the" Data.Trees.AVL library.  I'm sure others
> will want to comment on the details.  I wonder whether it would make
> sense to have a specialised AVLMap type, which would let you optimise
> the representation of AVL-based finite maps?

Do you mean a newtype wrapper for the current AVL type or distinct
algebraic type which contains both the key and the associated value
as separate fields (like the Map type folks were talking about
recently)?

If you mean the former, one thing I was thinking about doing was creating
an AVL based clone of the current Data.FiniteMap library so folk could
experiment with the effect of using both on real programs just by
changing their imports.

If you mean the latter, I'm not keen on this because it would mean
duplication of a lot of code for the new type, and I'm not sure that
it would be an optimisation at all. I considered this but decided
against it on the grounds that I suspect a major contributor to
the true cost of tree operations is heap burn rate. The AVL tree
type has only 3 fields in each constructor (not 5 as with the Map
type for example), and the code was carefully written to avoid
unnecessary heap construction wherever possible. Doing this would
result in 4 fields in each constructor.

But I may be wrong about this, and I guess it wouldn't be to hard
to find out for sure by doing a cut and paste job once the current
AVL code is a bit more stable. Keeping pairs separately has it's
own overheads, like an additional level of indirection during
searches and more heap records (so possibly slower gc). But if
the tree is so stable that lookup times dominate costs it may
be better to convert the entire thing into an array (or maybe
an unboxed array of Ints and a corresponding array of values)
and do binary searches on that instead.

> > I'd also like to book "Data.COrdering" too, if that's OK.
>
> Data.COrdering is less obviously the right thing.  Can you give some
> motivation for the design here?

Because you you generally do end up combining values which turn out
to be "equal", in some manner. If you don't use a function which
returns a COrdering, then you have to either..

 Pass the comparison and combining function separately down your
 recursive search or whatever. But this will have extra runtime
 overhead, which is often unnecessary since the only time the
 combining function is ever used is when the comparison returns EQ.

or..

 Use some default combining function (like pairing) and apply the
 real combining function later. But pairing is the only reasonable
 default combining function I can think of, and this not always
 the right type (e.g. for pushing elements in a tree or set union),
 and even when it's OK type wise you've still got the overhead of an
 additional map operation (e.g for set intersection). You could get
 around the type problem using lists I suppose, and use (++) as the
 default combining function, but this doesn't seem very attractive
 to me.

Anyway, if that doesn't persuade you I can give a more pragmatic
reason :-)

The AVL library depends on COrdering, so if it's going to exist
then COrdering has to exist too. I could make it part of the AVL
library, but since COrdering is not in any way dependent on
the AVL library (and potentially useful in it's own right elsewhere)
then I think it should exist independently of the AVL library.
This will stop people having to reinvent their own (incompatible)
COrdering equivalents.

Of course, I'm open to persuasion regarding the type name and the
support functions provided in the module, if anybody has strong
objections to what I've chosen.

Regards
--
Adrian Hey



More information about the Libraries mailing list