[Haskell-cafe] FGL/Haskell and Hierarchical Clustering/dendograms

Nikolas Borrel-Jensen nikolasborrel at gmail.com
Mon Dec 28 20:15:26 EST 2009


Thank you very much for your reply! I have been looking at the code, and
there are two problems, as I can see. First, trying with the example

t1 :: Tree (Id, Cost)
t1 = Node (4,0)
     [Node (3,2) [Node (1,12) []]
     ,Node (2,3) [Node (5,1) [Node (6,2) [Node (7,2) [] ]]]]

printed as

(4,0)
|
+- (3,2)
|  |
|  `- (1,12)
|
`- (2,3)
   |
   `- (5,1)
      |
      `- (6,2)
         |
         `- (7,2)

your function 'cluster fst snd t1' returns

Many [Many [Many [Many [Many [One (0,[4]),One (1,[5])],One (2,[3])],One
(2,[6,7])],One (3,[2])],One (12,[1])]

I can't see how this representation is giving the hierarchical clusters. The
example above should resolve into

level 1: [[(2,3),(5,1)],[(6,2)],[(7,2)],[(4,0)], [(3,2)], [(1,12)]]

level 2: [[(2,3),(5,1),(6,2),(7,2)], [(4,0),(3,2)], [(1,12)]]

level 3: [[(2,3),(5,1),(6,2),(7,2),(4,0),(3,2)], [(1,12)]]

level 4 (or (cost) level 12): [[(2,3),(5,1),(6,2),(7,2),(4,0),(3,2),(1,12)]]

By doing it this way, we cluster all nodes connected with edges less than or
equal x at (cost) level x. Clearly, we can have level 1:
[[(1,1),(2,1)],[(3,1),(4,1)],...] if the edges between [(1,1),(2,1)] and
[(3,1),(4,1)] are greater than 1.

Second, I don't think it is trivial to tree-i-fy the root path tree. I have
done the function treeifyMST, which surely isn't efficient, since the list
encounteredNodes is traversed as many times as the number of nodes (a binary
search tree would be more efficient). But more important, the tree isn't
correct, since each path is connected at the root of the tree.

Example (LRTree Int): [ [(1,0)],[(5,1),(1,0) ], [(2,2),(1,0)] ,
[(3,3),(2,2),(1,0)] , [(4,4),(2,2),(1,0)] ] -> [ [(5,1),(1,0) ] ,
[(3,3),(2,2),(1,0)] , [(4,4),(2,2),(1,0)] ]

In my code, all 3 paths are branching at the root (1,0), but should for the
last two paths branch at node (2,2). How should I cope with that in an
efficient way?

I wonder if if it is easier to implement it from the ground using the
approach given at
http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/hierarchical.html?

---------------------------------------------------------------------
--TO DO: now all paths are connected at the root of the tree. Should be
patched at the right places inside the tree. The search in the list
encounteredNodes is not efficient.
treeifyMST :: LRTree Int -> Tree (Id,Cost)
treeifyMST rootpathtree =
let
(LP rpt:rpts) = rootpathtree
root = head rpt
revrootpathtree = reverse rootpathtree
in
Node root (constructTree [] revrootpathtree)
where
constructTree :: [Int] -> LRTree Int -> [Tree (Id,Cost)]
constructTree encounteredNodes (LP x:[]) = []
constructTree encounteredNodes (LP x:xs) =
let
path1 = x !! 0
path2 = x !! 1
id1 = fst path1
id2 = fst path2
in
case (L.find (==id1) encounteredNodes) of
-- because we have encountered an already processed id, we can skip this
sublist
Just _ -> constructTree (id2:encounteredNodes) xs
-- new id, meaning that we have encountered a new path
Nothing -> let
lenpath = length x
revpath = reverse $ take (lenpath-1) x
tree = listToNode revpath
  in
tree:constructTree (id2:encounteredNodes) xs
constructTree _ _ = []

listToNode (p:ps:[]) = Node p [Node ps []]
listToNode (p:ps) = Node p [listToNode ps]
---------------------------------------------------------------------

Nikolas
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20091228/f1e585bd/attachment.html


More information about the Haskell-Cafe mailing list