[Haskell-cafe] What do _you_ want to see in FGL?

Mon May 10 08:27:10 EDT 2010

Henning Thielemann <lemming at henning-thielemann.de> writes:

> On Wed, 28 Apr 2010, Ivan Miljenovic wrote:
>
>> So you don't want the labels to be part of the actual datatype?  And
>> for users to then have to deal with any labels they want themselves?
>
> Recently I wrote cabal-sort using FGL
>   http://hackage.haskell.org/package/cabal-sort
>
> It sorts cabal packages topologically according to their
> dependencies. However, I was neither happy with the way FGL currently
> works, 

In what sense?

> nor with the way I proposed recently (splitting into unlabelled and
> labelled graphs). I like to use the package name as node identifier. I
> do not need any label, but I need a node type different from Int.
> With current FGL I need to maintain a Map PkgName Int. Would it be
> sensible to generalize the Node type to any Ord type or does FGL use
> optimizations specific to Int? In another example I used FGL for
> finding all topological orderings of a set of database
> transactions. In this case I used an enumeration type as node
> type. Are there other applications for alternative Node types?

We're considering doing this.

Pros for allowing you to use a custom node type:
* Matches your data better
* No need for extra lookup maps when converting your data to FGL form

Cons:
* Makes type-sigs uglier/more verbose
* Restricts the ability to optimise

Using Int gives us a fixed-size data type with known good comparison
performance and with a range that should suit most purposes.

As an alternative, see how I have Graphalyze create an FGL graph from
[n] and [(n,n,e)]:
http://hackage.haskell.org/packages/archive/Graphalyze/0.9.0.0/doc/html/Data-Graph-Analysis.html#v%3AimportData

We're also considering using MPTCs or ATs to let you place restrictions
on the label types (when defining a new graph instance).  However,
before you again state how you don't want labels, consider this:

* It's easier to make a labelled graph act as an unlabelled one than the
  other way around.

* We don't want to implement two "types" of graphs (labelled and
  unlabelled) as that would involve duplicate work and there would be
  difficulty in switching between the two.

* Graph labels provide you with a great deal of flexibility: how else
  would you do graph colouring (both vertex and edge)?  Clustering? etc.

The current "type Node = Int" alias is only there to provide a unique
index type for referencing nodes; the actual Ints don't really represent
the nodes IMHO, the labels do (in conjunction with the index for
colouring, etc.).  In your example case of using PkgName (I assume you
mean PackageName or PackageIdentifier?), can you guarantee that each
PkgName value is unique before you go blindly using it (what about
having the same library installed in both the global and user
database?)?  It's a lot easier to do this kind of stuff (and do generic
algorithms on FGL graphs) when you know what the type of the index type
is.

That's not to say that there aren't cases where having a generic type
wouldn't be useful, especially for quick-and-dirty hacks.  But in
general, I think that using Int for the index type makes more sense
(unless you have more vertices than the number of Int values allowed).

-- 
Ivan Lazar Miljenovic
Ivan.Miljenovic at gmail.com
IvanMiljenovic.wordpress.com