[Haskell-cafe] Canonical Graphviz code

Mon Jul 5 22:26:34 EDT 2010

Graphviz (http://graphviz.org/) has the option to convert provided Dot
code for visualising a graph into a canonical form.  For example, take
the sample Dot code:

----
| digraph foo {
|     a [color=blue];
|     subgraph cluster_bar {
|         a [label="hi!"]
|         b [color=blue, label="bye!"];
|         b -> c;
|     }
|     a -> b;
|     a -> c;
|     c -> d;
|     d [color=red]
`----

Calling "dot -Tcanon" on this produces:

,----
| digraph foo {
|         node [label="\N"];
|         subgraph cluster_bar {
|                 a [label="hi!", color=blue];
|                 b [label="bye!", color=blue];
|                 b -> c;
|                 a -> b;
|                 a -> c;
|         }
|         d [color=red];
|         c -> d;
| }
`----

Note in particular the following:

* The addition of the top-level "node [label="\N"]" attribute; this
happens automatically even if all nodes have their own custom label.

* The two `a' nodes are merged into the inner-most cluster/subgraph
that they are defined in (since they represent the same node).

* The attributes for `b' are re-ordered.

* The "a -> b" and "a -> c" edges are shifted into the cluster, since
all of the related nodes are in that cluster (note that the `c' node
is implicitly defined there since an edge involving it is defined
there).

* The "c -> d" edge is shifted to being _after_ the definition of the
`d' node (in fact, in each grouping all nodes are defined before
edges).

I've recently thought up a way that I can duplicate this functionality
(in terms of what it does, not necessarily the actual output) in my
graphviz library (http://hackage.haskell.org/package/graphviz),
however I'm not sure how closely to follow what Graphviz does.  What
would the community prefer out of the following (including a mixture,
such as options 2 and 3):

1) Match Graphviz's output as much as possible (note that the ordering
of the attributes won't happen this way, since I'll be sorting them
alphabetically rather than working out Graphviz's arcane rules).

1a) As with 1), but don't worry about defining extra attributes such
as "node [label="\N"]".

2) Explicitly define nodes that are only mentioned in edges (e.g.
actually define a `c' node, even if it doesn't have any extra
attributes).  Note that this already happens if such an edge is
specified inside a different cluster than the already known node is
in; in that case then the stand-alone node is defined in the cluster
its edge is defined in, and the actual edge is moved into the outer
context that combines the two clusters.

2a) As with 2), but define all edges in the overall graph rather than
grouping them internally inside clusters, etc.

3) Group common attributes together as much as possible (e.g. inside
the bar cluster, define another subgroup which has a top-level
attribute "node [color=blue]" and define the `a' and `b' nodes in
there).  Not sure how well this will work with edges though, and is
probably not possible in conjunction with option 2a).  This is a bit
tricky and subtle: if `a' and `b' are already in such a subgraph, but
an edge specifying one of them is defined _before_ the subgraph is
defined, then that node will have an explicit color attribute defined
of "" (i.e. empty string), though I would classify this as a bug.

4) As an alternate to option 3), remove all explicit top-level node
and edge attributes and move them all to the affected node and edge
definitions themselves; this may require option 2).

I'm also willing to hear and consider other possible variants;
however, I'm only going to code (at most) one.

-- 
Ivan Lazar Miljenovic
Ivan.Miljenovic at gmail.com
IvanMiljenovic.wordpress.com