[Haskell-cafe] [darcs #222] performance problem with massive changes in a single file

David Roundy via RT bugs at darcs.net
Sun Feb 20 09:31:16 EST 2005


I'm cc'ing haskell-cafe on this darcs bug, since I imagine there might
be a student (or professor) hanging around there who might be interested
in the implementation of an efficient LCS algorithm in haskell.

[markjugg - Sun Feb 20 09:15:57 2005]:
> Typically in software, 1000's of line aren't changed at the time, which
> is why this hasn't been an big issue before. 

Indeed.  The issue here is that darcs uses the exact Hunt-Szymanski LCS
algorithm when it computes the diff, while GNU diff uses an approximate
--but faster--algorithm, which works "almost" as well.  The approximate
algorithm is reasonably well-documented, but more complex to implement
than the exact one, just because it's harder to understand.

This would be a reasonable project for an interested haskell CS student.
 The LCS is implemented in pure haskell 98, and is of signature

lcs :: Ord a => [a] -> [a] -> [a]

It's an interesting problem that's easily testable, and has different
scaling behavior in different limits (finite alphabet, infinite
alphabet, permutations of unique objects).

David (ever optimistic for new contributors to darcs!)
http://darcs.net


More information about the Haskell-Cafe mailing list