# [Haskell-cafe] Re: Need for speed: the Burrows-Wheeler Transform

apfelmus apfelmus at quantentunnel.de
Sat Jun 23 06:17:06 EDT 2007

```Andrew Coppin wrote:
> apfelmus wrote:
>> Note that the one usually adds an "end of string" character \$ in the
>> Burrows-Wheeler transform for compression such that sorting rotated
>> strings becomes sorting suffices.
>
> Yeah, I noticed that the output from by program can never actually be
> reverted to its original form.

Well it can, but that's a different story told in

Richard S. Bird and Shin-Cheng Mu.
Inverting the Burrows-Wheeler transform.
http://web.comlab.ox.ac.uk/oucl/work/richard.bird/publications.html
#DBLP:journals/jfp/BirdM04

Oh, and you had a function inv_bwt, right?

>> Concerning the algorithm at hand, you can clearly avoid calculating
>> Raw.append over and over:
>>
>>   bwt :: Raw.ByteString -> Raw.ByteString
>>   bwt xs = Raw.pack . map (Raw.last) . sort \$ rotations
>>     where
>>     n         = length xs
>>     rotations = take n . map (take n) . tails \$ xs `Raw.append` xs
>>
>> assuming that take n is O(1).
>
> I was trying to avoid O(n^2) RAM usage. :-}

Note that for ByteStrings, this takes only O(n) RAM because the
substrings are shared. But for lists, this would take O(n^2) RAM since
(take n) cannot share hole sublists. An O(n) choice for lists that
doesn't recalculate ++ all the time would be

bwt :: Ord a => [a] -> [a]
bwt xs = map last . sortBy (compare `on` (take n)) \$ rotations
where
n         = length xs
rotations = take n . tails \$ xs ++ xs

with the well-known

on :: (a -> a -> c) -> (b -> a) -> (b -> b -> c)
on g f x y = g (f x) (f y)

Regards,
apfelmus

```