diff in Haskell: clarification

Andrew J Bromage ajb@spamcop.net
Fri, 22 Nov 2002 13:49:13 +1100


G'day all.

On Fri, Nov 22, 2002 at 05:13:07AM +1100, Fergus Henderson wrote:

> Would a Mercury version help?  The Mercury distribution includes a
> Mercury version of Myer's algorithm: it's in the directory `samples/diff'.

Disclaimer: I wrote the Mercury version.

That particular algorithm heavily relies on destructively updated
arrays, which don't map neatly onto Haskell lists.  In addition,
it's pretty complicated (all the caching between passes, mostly).  It's
also optimised for very long sequences, which may not help you here.

Just for jollies, here's a Haskell version of Hirschberg's LCSS
algorithm.  It's O(N^2) time but O(N) space at any given point in
time, assuming eager evaluation.  You should be able to make diff out
of this.  You should also be able to find many opportunities for
optimisation here.

@article{360861,
 author = {D. S. Hirschberg},
 title = {A linear space algorithm for computing maximal common subsequences},
 journal = {Communications of the ACM},
 volume = {18},
 number = {6},
 year = {1975},
 issn = {0001-0782},
 pages = {341--343},
 doi = {http://doi.acm.org/10.1145/360825.360861},
 publisher = {ACM Press},
 }

Cheers,
Andrew Bromage


module Lcss ( lcss ) where

algb :: (Eq a) => [a] -> [a] -> [Int]
algb xs ys
  = 0 : algb1 xs [ (y,0) | y <- ys ]
  where
    algb1 [] ys' = map snd ys'
    algb1 (x:xs) ys'
      = algb1 xs (algb2 0 0 ys')
      where
	algb2 _ _ [] = []
	algb2 k0j1 k1j1 ((y,k0j):ys)
	  = let kjcurr = if x == y then k0j1+1 else max k1j1 k0j
	    in (y,kjcurr) : algb2 k0j kjcurr ys

algc :: (Eq a) => Int -> Int -> [a] -> [a] -> [a] -> [a]
algc m n xs []  = id
algc m n [x] ys = if x `elem` ys then (x:) else id
algc m n xs ys
  = algc m2 k xs1 (take k ys) . algc (m-m2) (n-k) xs2 (drop k ys)
  where
    m2 = m `div` 2

    xs1 = take m2 xs
    xs2 = drop m2 xs

    l1 = algb xs1 ys
    l2 = reverse (algb (reverse xs2) (reverse ys))

    k = findk 0 0 (-1) (zip l1 l2)

    findk k km m [] = km
    findk k km m ((x,y):xys)
      | x+y >= m  = findk (k+1) k  (x+y) xys
      | otherwise = findk (k+1) km m     xys

lcss :: (Eq a) => [a] -> [a] -> [a]
lcss xs ys = algc (length xs) (length ys) xs ys []