<div dir="ltr"><div><div><div>Dear all,<br><br></div>I wrote some clustering code using Data.Clustering.Hierarchical, but it is slow.<br><br></div>I profiled it and changed some of the code, but I don't understand why zipWith takes so much time, even after switching from lists to vectors.<br>
<br></div>My code is below. Any advice would be appreciated. <span></span><div>======================<br>main = do<br> ....<br> let cluster = dendrogram SingleLinkage vectorList getVectorDistance <br>
....<br><br>getExp2 v1 v2 = d*d<br> where<br> d = v1 - v2<br><br>getExp v1 v2 <br> | v1 == v2 = 0<br> | otherwise = getExp2 v1 v2<br><br>tfoldl d = DV.foldl1' (+) d<br><br>changeDataType:: Int -> Double<br>
changeDataType d = fromIntegral d<br><br>getVectorDistance :: (a, DV.Vector Int) -> (a, DV.Vector Int) -> Double<br>getVectorDistance v1 v2 = fromIntegral $ tfoldl dat<br> where<br> l1 = snd v1<br> l2 = snd v2<br>
dat = DV.zipWith getExp l1 l2<br><br>=======================================<br><br></div><div>Built with: ghc -prof -fprof-auto -rtsopts -O2 log_cluster.hs<br><br></div><div>Run with: log_cluster.exe +RTS -p<br><br>
</div>The profiling result is:<br><br> log_cluster.exe +RTS -p -RTS<br><br> total time = 8.43 secs (8433 ticks @ 1000 us, 1 processor)<br> total alloc = 1,614,252,224 bytes (excludes profiling overheads)<br>
<br>COST CENTRE MODULE %time %alloc<br><br>getVectorDistance.dat Main 49.4 37.8<br>tfoldl Main 5.7 0.0<br>getExp Main 4.5 0.0<br>getExp2 Main 0.5 1.5</div>