Representing cyclic data structures efficiently in Haskell

08 Jul 2003 12:37:14 +0100

Hi Sarah,

On Mon, 2003-07-07 at 16:04, Sarah Thompson wrote:
> What is the best way to represent cyclic data structures in Haskell?
> 
> Lists are trivial. Trees are trivial. But what if the thing you're 
> trying to represent is shaped like an electronic circuit, ...
> ...
> How do people typically solve this problem at the moment? I know a few 
> people out there have written hardware compilers in Haskell and similar 
> languages, ...

You're right, this is a longstanding problem!

I've experimented with several solutions to it in Hydra, a pure
functional hardware description language whose aims include preserving
referential transparency, and also keeping within a natural functional
style.  My view is that if you have to resort to encoding a C-style
solution in a functional language, then you might as well use C instead.

Anyway, there are several papers on Hydra that are relevant.  The first
solution, described in a paper from 1988, is to build the circular
graphs in the obvious way and then to use a pointer-equality predicate
to traverse the graphs safely.  Of course, this makes the hardware
description language impure, it breaks referential transparency, and it
has other drawbacks.  You can find that paper on my web page, but it was
written before Haskell appeared, and it uses an older functional
language with different syntax.

I wrote a paper about the problem in 1992,
www.dcs.gla.ac.uk/~jtod/papers/1992-Netlist/  This paper discusses the
problems with the pointer-equality solution and introduces naming as a
pure alternative.  The idea of naming is to give unique labels to some
of the nodes, so that you can build the circular graph in the usual way,
and you can also traverse the graph safely.  This is a pure functional
solution, and it has a lot of advantages over adjacency lists, but it
does require that the labels are introduced correctly.  The best way to
put the labels in is by a transformation.

Currently, Hydra uses an automated transformation implemented with
Template Haskell to introduce the names.  I think this is the ideal
solution: it results in a very clean notation for specifying the
circuit, and it supports traversal of the graph, yet it preserves simple
equational reasoning and all the supporting algorithms are in a pure
functional style. I'm currently writing a paper on the details of this
solution.  It won't be finished for some time, but there already exists
a brief survey of the solution, which you can find at:
www.dcs.gla.ac.uk/~jtod/drafts/ ("Embedding a hardware description
language in Template Haskell")  This paper focuses on the way Hydra is
implemented as a domain specific language embedded in Haskell, and the
last part of it talks about the graph traversal issues.  This is a
draft, submitted for publication, and any comments on it would be
welcome!  (However, the actual method used in Hydra is considerably more
complex than what's presented in the draft paper, because there are a
number of other issues that need to be addressed in addition to just
traversing the circular graph safely.)

There are a lot of alternative approaches to the problem.  They will be
compared in more detail in the paper-in-progress, but briefly they
include: (1) Using adjacency lists, which have the disadvantages that
you pointed out. (2) Using a monadic approach, which can be used in
several distinct ways to solve the problem, but which have a bad impact
on the specification style and the ease of reasoning about the circuit.
(3) Using impure solutions, such as the pointer equality method used in
the 1988 Hydra paper.  Observable sharing (Claessen and Sands, 1999) is
identical to the solution that was implemented in Hydra in the mid 80's
and described in the 1988 paper; the difference is that they call the
approach "observable sharing" instead of "pointer equality".

Best wishes,
John O'Donnell