[Haskell-cafe] Need for speed: the Burrows-Wheeler Transform

David Roundy droundy at darcs.net
Fri Jun 22 17:01:15 EDT 2007


On Fri, Jun 22, 2007 at 09:26:40PM +0100, Andrew Coppin wrote:
> OK, so I *started* writing a request for help, but... well I answered my 
> own question! See the bottom...

...

> module BWT3 (bwt) where
> 
> import Data.List
> import qualified Data.ByteString as Raw
> 
> rotate :: Int -> Raw.ByteString -> Int -> Raw.ByteString
> rotate l xs n = (Raw.drop (l-n) xs) `Raw.append` (Raw.take (l-n) xs)
> 
> bwt xs =
>  let l  = Raw.length xs
>      ys = rotate l xs
>  in  Raw.pack $
>      map (Raw.last . rotate l xs) $
>      sortBy (\n m -> compare (ys n) (ys m)) [0..(l-1)]
> 
> 
> Now I can transform 52 KB in 54 seconds + 30 MB RAM. Still nowhere near 
> C++, but a big improvement none the less.

The trouble is that Raw.append is an O(N) operation, making the computation
O(N^2) where it ought to be O(NlogN).

> Woah... What the hell? I just switched to Data.ByteString.Lazy and WHAM! 
> Vast speed increases... Jeepers, I can transform 52 KB so fast I can't 
> even get to Task Manager fast enough to *check* the RAM usage! Blimey...
> 
> OK, just tried the 145 KB test file that Mr C++ used. That took 2 
> seconds + 43 MB RAM. Ouch.

In this case append is an O(1) operation.  But you're still getting killed
on prefactors, because you're generating a list of size N and then sorting
it.  Lists are just not nice data structures to sort, nor are they nice to
have for large N.

To get better speed and memory use, I think you'd want to avoid the
intermediate list in favor of some sort of strict array, but that'd be
ugly.
-- 
David Roundy
Department of Physics
Oregon State University


More information about the Haskell-Cafe mailing list