Proposal: Make intersperse lazier

Fri Sep 17 14:17:45 EDT 2010

On Friday 17 September 2010 14:21:36, Bas van Dijk wrote:
> On Fri, Sep 17, 2010 at 10:48 AM, Christian Maeder
> > ...
> > Therefore I proposed the following implementation:
> >
> > intersperse :: a -> [a] -> [a]
> > intersperse s l = case l of
> >  [] -> l
> >  x : r -> x : if null r then r else
> >    s : intersperse s r
>
> One additional benefit about the original proposed implementation is
> that it applies the static argument transformation: the 'sep' argument
> is brought into scope of the worker function 'go' which then doesn't
> need to pass it to each recursion as in the original implementation.
>
> It would be interesting to see if this provides a noticeable
> performance benefit. Criterion anyone?
>
> Bas

Okay, if you don't then I do :)
I've benchmarked a couple of variants:

module Interspersing where

isgo :: a -> [a] -> [a]
isgo _ [] = []
isgo s (x:xs) = x : go xs
  where
    go [] = []
    go (y:ys) = s : y: go ys

isrec :: a -> [a] -> [a]
isrec s l = case l of
              [] -> l
              (x:r) -> x : if null r then r else (s : isrec s r)

ispreplc :: a -> [a] -> [a]
ispreplc s (x:xs) =  x : concat [[s,y] | y <- xs]
ispreplc _ _ = []

isprepM :: a -> [a] -> [a]
isprepM s (x:xs) = x : (xs >>= \y -> [s,y])
isprepM _ _ = []

isprepT :: a -> [a] -> [a]
isprepT _ [] = []
isprepT s xs = tail $ concatMap (\x -> [s,x]) xs

isprepD :: a -> [a] -> [a]
isprepD s xs = case concatMap (\x -> [s,x]) xs of
                (_:ys) -> ys
                _ -> []

I've also tried with ((s:) . (:[])) instead of the lambdas, but that gives 
poor code (calls to (++), much slower).

The explicit worker/wrapper isgo and the list comprehension version 
ispreplc give identical core with optimisations (-O and -O2, quite 
different with -O0), isprepD gets almost the same core with the obvious 
case after the worker, the other isprepX versions get the same worker loop 
inside, but with more wrapping. The direct recursion isrec produces 
different core, but also with a local function for the recursion.

Benchmark:
intersperse a '.' in a text, then compute the length.
Two texts were tried, one ~1.4 million characters, the other ~6.5 million, 
the text was kept in memory for the entire benchmark .

Results:
With -O2:
unsurprisingly, isgo and ispreplc have nearly identical means in each run, 
about 33.6 ms for the small benchmark and 153 ms for the large.
isprepD is slightly slower, 33.9 ms resp 155 ms.
isprepM and isprepT are a little slower again, 34.4 ms resp 157 ms.
isrec lags behind, 43.4 ms resp. 193 ms.

With -O1, the outcome is almost identical, isprepD is slightly slower than 
with -O2, now identical to isprepM/T.

With -O0, the outcome is drastically different.
isgo produces core almost identical to the optimised and the performance is 
indistinguishable.
The top-level recursive isrec suffers camparatively little, the times are 
51.3 ms resp. 238.8 ms.
isprepD is again slightly faster than isprepM/T, but now much slower than 
isrec, 72.5 ms / 73.5 ms resp. 340 ms / 344 ms.
The list comprehension suffers most and is now the slowest, 85.9 ms resp. 
404 ms.

Conclusion:
isprepD/M/T were only included out of curiosity, they didn't give a reason 
to choose any of them in the benchmarks.

The top-level recursion is clearly slower than the manual worker/wrapper, 
hence it's not the way to go.

Unless a further good implementation is offered, the choice is between the 
manual worker/wrapper, isgo, and the list comprehension, ispreplc.

If there's a realistic chance that base is ever compiled without 
optimisations, it has to be the explicit worker.

If we can safely assume that base is always compiled with optimisations, 
it's a matter of taste.