[Haskell-cafe] Need for speed: the Burrows-Wheeler Transform

Fri Jun 22 16:26:40 EDT 2007

OK, so I *started* writing a request for help, but... well I answered my 
own question! See the bottom...

I was reading Wikipedia, and I found this:

  http://en.wikipedia.org/wiki/Burrows-Wheeler_transform

I decided to sit down and see what that looks like in Haskell. I came up 
with this:

module BWT where

import Data.List

rotate1 :: [x] -> [x]
rotate1 [] = []
rotate1 xs = last xs : init xs

rotate :: [x] -> [[x]]
rotate xs = take (length xs) (iterate rotate1 xs)

bwt :: (Ord x) => [x] -> [x]
bwt = map last . sort . rotate

step :: (Ord x) => [x] -> [[x]] -> [[x]]
step xs = zipWith (:) xs . sort

inv_bwt :: (Ord x) => x -> [x] -> [x]
inv_bwt mark xs =
  head . filter (\xs -> head xs == mark) .
  head . drop ((length xs) - 1) . iterate (step xs) .
  map (\x -> [x]) $ xs

My my, isn't that SO much shorter? I love Haskell! :-D

Unfortunately, the resident C++ expert fails to grasp the concept of 
"example code", and insists on comparing the efficiency of this program 
to the C one on the website.

Fact is, he's translated the presented C into C++, and it can apparently 
transform a 145 KB file in 8 seconds using only 3 MB of RAM. The code 
above, however, took about 11 seconds to transform 4 KB of text, and 
that required about 60 MB of RAM. (I tried larger, but the OS killed the 
process for comsuming too much RAM.)

Well anyway, the code was written for simplicity, not efficiency. I've 
tried to explain this, but apparently that is beyond his comprehension. 
So anyway, it looks like we have a race on. >:-D

The first thing I did was the optimisation mentioned on Wikipedia: you 
don't *need* to build a list of lists. You can just throw pointers 
around. So I arrived at this:

module BWT2 (bwt) where

import Data.List

rotate :: Int -> [x] -> Int -> [x]
rotate l xs n = (drop (l-n) xs) ++ (take (l-n) xs)

bwt xs =
  let l  = length xs
      ys = rotate l xs
  in  map (last . rotate l xs) $
      sortBy (\n m -> compare (ys n) (ys m)) [0..(l-1)]

This is indeed *much* faster. With this, I can transform 52 KB of text 
in 9 minutes + 60 MB RAM. The previous version seemed to have quadratic 
memory usage, whereas this one seems to be linear. 52 KB would have 
taken many months with the first version!

Still, 9 minutes (for a file 3 times smaller) is nowhere near 8 seconds. 
So we must try harder... For my next trick, ByteStrings! (Never used 
them before BTW... this is my first try!)

module BWT3 (bwt) where

import Data.List
import qualified Data.ByteString as Raw

rotate :: Int -> Raw.ByteString -> Int -> Raw.ByteString
rotate l xs n = (Raw.drop (l-n) xs) `Raw.append` (Raw.take (l-n) xs)

bwt xs =
  let l  = Raw.length xs
      ys = rotate l xs
  in  Raw.pack $
      map (Raw.last . rotate l xs) $
      sortBy (\n m -> compare (ys n) (ys m)) [0..(l-1)]

Now I can transform 52 KB in 54 seconds + 30 MB RAM. Still nowhere near 
C++, but a big improvement none the less.

Woah... What the hell? I just switched to Data.ByteString.Lazy and WHAM! 
Vast speed increases... Jeepers, I can transform 52 KB so fast I can't 
even get to Task Manager fast enough to *check* the RAM usage! Blimey...

OK, just tried the 145 KB test file that Mr C++ used. That took 2 
seconds + 43 MB RAM. Ouch.