[Haskell-cafe] Reifying case expressions [was: Re: On stream processing, and a new release of timeplot coming]

Heinrich Apfelmus apfelmus at quantentunnel.de
Sun Dec 25 11:25:22 CET 2011


Eugene Kirpichov wrote:
> In the last couple of days I completed my quest of making my graphing
> utility timeplot ( http://jkff.info/software/timeplotters ) not load the
> whole input dataset into memory and consequently be able to deal with
> datasets of any size, provided however that the amount of data to *draw* is
> not so large. On the go it also got a huge speedup - previously visualizing
> a cluster activity dataset with a million events took around 15 minutes and
> a gig of memory, now it takes 20 seconds and 6 Mb max residence.
> (I haven't yet uploaded to hackage as I have to give it a bit more testing)
> 
> The refactoring involved a number of interesting programming patterns that
> I'd like to share with you and ask for feedback - perhaps something can be
> simplified.
> 
> The source is at http://github.com/jkff/timeplot
> 
> The datatype of incremental computations is at
> https://github.com/jkff/timeplot/blob/master/Tools/TimePlot/Incremental.hs .
> Strictness is extremely important here - the last memory leak I eliminated
> was lack of bang patterns in teeSummary.

Your  StreamSummary  type has a really nice interpretation: it's a 
reification of  case  expressions.

For instance, consider the following simple function from lists to integers

     length :: [a] -> Int
     length xs = case xs of
         []     -> 0
         (y:ys) -> 1 + length ys

We want to reify the case expression as constructor of a data type. What 
type should it have? Well, a case expression maps a list  xs  to a 
result, here of type Int, via two cases: the first case gives a result 
and the other maps a value of type  a  to a function from lists to 
results again. This explanation was probably confusing, so I'll just go 
ahead and define a data type that represents functions from lists  [a] 
to some result of type  r

     data ListTo a r = CaseOf r (a -> ListTo a r)

     interpret :: ListTo a r -> ([a] -> r)
     interpret (CaseOf nil cons) xs =
         case xs of
             []     -> nil
             (y:ys) -> interpret (cons y) ys

As you can see, we are just mapping each  CaseOf  constructor back to a 
built-in case expression.

Likewise, each function from lists can be represented in terms of our 
new data type: simply replace all built-in case expressions with the new 
constructor

     length' :: ListTo a Int
     length' = CaseOf
         (0)
         (\x -> fmap (1+) length')

     length = interpret length'

The CaseOf may look a bit weird, but it's really just a straightforward 
translation of the case expression you would use to define the function 
  go  instead.

Ok, this length function is really inefficient because it builds a huge 
expression of the form  (1+(1+...)). Let's implement a strict variant 
instead

     lengthL :: ListTo a Int
     lengthL = go 0
         where
         go !n = CaseOf (n) (\x -> go (n+1))

While we're at it, let's translate two more list functions

     foldL' :: (b -> a -> b) -> b -> ListTo a b
     foldL' f b = Case b (\a -> foldL' f $! f b a)

     sumL    :: ListTo Int Int
     sumL    = foldL' (\b a -> a+b) 0


And now we can go for the point of this message: unlike ordinary 
functions from lists, we can compose these in lock-step! In particular, 
the following applicative instance

     instance Applicative (ListTo a) where
         pure b = CaseOf b (const $ pure b)
         (CaseOf f fs) <*> (CaseOf x xs) =
             CaseOf (f x) (\a -> fs a <*> xs a)

allows us to write a function

     average :: ListTo Int Double
     average = divide <$> sumL <*> lengthL
         where
         divide a b = fromIntegral a / fromIntegral b

that runs in constant space! Why does this work? Well, since we can now 
inspect case expressions, we can choose to evaluate them in lock-step, 
essentially computing  sum  and  length  with just one pass over the 
input list. Remember that the original definition

     average xs = sum xs / length xs

has a space leak because the input list xs is being shared.


Remarks:
1. Reified case expressions are, of course, the same thing as Iteratees, 
modulo chunking and weird naming.

2. My point is topped by scathing irony: if Haskell had a form of 
*partial evaluation*, we could write applicative combinators for 
*ordinary* functions  [a] -> r  and express  average  in constant space. 
  In other words, partial evaluation would make it unnecessary to reify 
case expressions for the purpose of controlling performance / space leaks.


Best regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com




More information about the Haskell-Cafe mailing list