[Haskell-cafe] Re: Space Efficiency When Sorting a List of Many Lists

Thu Dec 31 11:27:27 EST 2009

Felipe Lessa wrote:
> Luke Palmer wrote:
>> But if you're serious, you can probably do better than just generating
>> them all and passing them to sort.  I get the impression that there is
>> some structure here that can be taken advantage of.
> 
> Isn't what he wants a trie?  In particular, a Patricia trie?

Peter, this is a very nice problem. Is this a programming exercise or
did you encounter it in the "real world"?

There is indeed a structure that can be taken advantage of and it
involves tries.

The key point is that thanks to the lexicographic ordering, you can
*interleave* exploding and sorting the rows. In other word, we can
exploit the fact that for example

  (sort . cartesian) ([8,12]:[11]:[7,13]:[10]:[])
  =   [8  : (sort . cartesian) ([11]:[7,13]:[10]:[]) ]
   ++ [12 : (sort . cartesian) ([11]:[7,13]:[10]:[]) ]

where  cartesian  denotes the cartesian product. The code will mainly
work with functions like

   type Row a = [a]

   headsIn :: [Row a] -> [(a, [Row a])]

which groups rows by their first element. The result type is best
understood as a finite map from  a  to  [Row a]

   headsIn :: [Row a] -> Map a [Row a]

And unsurprisingly, the fixed point of the  (Map a)  functor is  the
trie for  [a] .

Without much explanation, here the full formulation in terms of
catamorphisms and anamorphisms.

    {-# LANGUAGE NoMonomorphismRestriction #-}
    import qualified Data.Map
    import Control.Arrow (second)

        -- the underlying structure
    newtype Map a b = Map { unMap :: [(a,b)] } deriving (Eq, Show)

        -- category theory: bananas and lenses
    instance Functor (Map a) where
        fmap f = Map . (map . second) f . unMap

    newtype Fix f = In { out :: f (Fix f) }

    cata  f = In . fmap (cata f) . f
    ana   f = f . fmap (ana f) . out

        -- very useful type synonym to keep track of rows and colums
    type Row a = [a]

        -- grouping and "ungrouping" by the first elements of each row
    headsIn :: [Row a] -> Map a [Row a]
    headsIn xss = Map [(x,[xs]) | x:xs <- xss]

    headsOut :: Map a [Row a] -> [Row a]
    headsOut (Map []) = [[]]
    headsOut xxs      = [x:xs | (x,xss) <- unMap xxs, xs <- xss]

        -- cartesian product
    cartesian1 :: Row [a] -> Map a (Row [a])
    cartesian1 []       = Map []
    cartesian1 (xs:xss) = Map [(x,xss) | x <- xs]

    cartesian = ana headsOut . cata cartesian1

        -- sorting
    sort1 :: Ord a => Map a [b] -> Map a [b]
    sort1 = Map . Data.Map.toList . Data.Map.fromListWith (++) . unMap

    sortRows = ana headsOut . cata (sort1 . headsIn)

        -- and cold fusion!
        --    sortCartesian = sortRows . cartesian
        -- best written as hylomorphism
    sortCartesian = ana headsOut . cata (sort1 . cartesian1)

This is readily extended to handle the  explode  function as well. And
thanks to lazy evaluation, I expect this to run with a much better
memory footprint.

Regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com