[Haskell-cafe] Canonical Graphviz code

Tue Jul 6 10:54:45 EDT 2010

On Mon, 05 Jul 2010 19:26:34 -0700, Ivan Miljenovic <ivan.miljenovic at gmail.com> wrote:

> Graphviz (http://graphviz.org/) has the option to convert provided Dot
> code for visualising a graph into a canonical form.  For example, take
> the sample Dot code:
[snip]
> I've recently thought up a way that I can duplicate this functionality
> (in terms of what it does, not necessarily the actual output) in my
> graphviz library (http://hackage.haskell.org/package/graphviz),
> however I'm not sure how closely to follow what Graphviz does.

There doesn't seem to be a clear definition of "canon" output is, and the implication in the documentation is that this mode might better have been named "pprint" (thus my hesitance to refer to it as "canonical form").  Based on this, I'd suggest that you don't need strict adherence to graphviz.

It generally appears that "canon" output follows the two guidelines of "Atomic" and "Minimal":

* "Atomic" meaning specify all attributes that apply to an element instead of relying on inheritance
* "Minimal" meaning don't specify too much
    - No default attributes
    - Each element (edge or node) on its own, no combining (see also "Atomic")
    - Elements only in necessary scope (thus move edges/nodes that only appear in a subgraph context into that subgraph)

Based on this I would anticipate your options (1), (2), and (4) below.  The advantages of "canon" form would seem to be that removing, or adding elements to the graph is as "safe" as possible because the impact to the graph is only as explicitly stated.  For example, removal of a node requires only removing that node statement and any edge statement specifying that node---no concern about removing it from multi-node or multi-edge elements---and that the locality of the removal is fairly evident (i.e. which subgraph the element is part of).  Likewise adding an element means that the element will appear pretty much as-specified without inheriting attributes.

Interestingly you could envision providing a dual of this mode named "compact".  The compact mode would attempt to specify the graph as compactly as possible, using inheritance for attributes and multi-element statements (interestingly it's an attribute-oriented approach to the graph as opposed to the element-oriented approach of "canon" form).

-KQ

>  What
> would the community prefer out of the following (including a mixture,
> such as options 2 and 3):
>
> 1) Match Graphviz's output as much as possible (note that the ordering
> of the attributes won't happen this way, since I'll be sorting them
> alphabetically rather than working out Graphviz's arcane rules).
>
> 1a) As with 1), but don't worry about defining extra attributes such
> as "node [label="\N"]".
>
> 2) Explicitly define nodes that are only mentioned in edges (e.g.
> actually define a `c' node, even if it doesn't have any extra
> attributes).  Note that this already happens if such an edge is
> specified inside a different cluster than the already known node is
> in; in that case then the stand-alone node is defined in the cluster
> its edge is defined in, and the actual edge is moved into the outer
> context that combines the two clusters.
>
> 2a) As with 2), but define all edges in the overall graph rather than
> grouping them internally inside clusters, etc.
>
> 3) Group common attributes together as much as possible (e.g. inside
> the bar cluster, define another subgroup which has a top-level
> attribute "node [color=blue]" and define the `a' and `b' nodes in
> there).  Not sure how well this will work with edges though, and is
> probably not possible in conjunction with option 2a).  This is a bit
> tricky and subtle: if `a' and `b' are already in such a subgraph, but
> an edge specifying one of them is defined _before_ the subgraph is
> defined, then that node will have an explicit color attribute defined
> of "" (i.e. empty string), though I would classify this as a bug.
>
> 4) As an alternate to option 3), remove all explicit top-level node
> and edge attributes and move them all to the affected node and edge
> definitions themselves; this may require option 2).
>
> I'm also willing to hear and consider other possible variants;
> however, I'm only going to code (at most) one.
>

-- 
-KQ