[Haskell-cafe] Unnecessarily strict implementations

Thu Sep 2 07:41:47 EDT 2010

On Thursday 02 September 2010 09:25:59, Jan Christiansen wrote:
> Hi,
>
> On 02.09.2010, at 01:35, Daniel Fischer wrote:
> > It's not that it's not as non-strict as possible per se. (Sorry, had
> > to :)
> > It's that intersperse's current definition (in GHC at least) can
> > cause a
> > space leak. In this case, making the function less strict can cure
> > it, in
> > other cases, more strictness might be the solution.
>
> I would be very happy if you would share this example with me. I am
> looking for an example where the current implementation of intersperse
> or inits causes a space leak for quite a while now.
>

I don't see how the current implementation of inits or tails could cause a 
space leak that the lazier versions wouldn't, so you'd have to wait longer 
for such an example.

For intersperse,

$ cabal update && cabal install stringsearch

You need the new version 0.3.1, Data.ByteString.Lazy.Search[.DFA].splitXXX
had their own space leak in 0.3.0 [caused by too much laziness].

Then

===========================================
{-# LANGUAGE BangPatterns #-}
module Main (main) where

import System.Environment (getArgs)
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Char8 as C

import Data.ByteString.Lazy.Search (split)

main :: IO ()
main = do
    (file : pat : sub : _ ) <- getArgs
    let !spat = C.pack pat
        !ssub = L.fromChunks [C.pack sub]
        work = ical ssub . split spat
    L.readFile file >>= L.putStrLn . L.take 100 . work

ical :: L.ByteString -> [L.ByteString] -> L.ByteString
ical new = L.concat . intersperse new

intersperse :: a -> [a] -> [a]
intersperse sep [] = []
intersperse sep (x:xs) = x : go xs
    where
        go [] = []
        go (y:ys) = sep : y : go ys
============================================

has no space leak, if you replace the local intersperse with 
Data.List.intersperse (equivalent, ical = L.intercalate), you have a space 
leak.

To expose the leak, take a sufficiently large file (say 10MB or larger) and 
replace a pattern that does not occur in the file or occurs late in the 
file,

$ ./noleak file pat sub

runs fast in small memory, 

$ ./leak file pat sub

takes a little to run and keeps the entire file until the first occurrence 
of pat in memory.

Note that the above implementation of intersperse has different semantics 
from Data.List.intersperse,
Data.List.intersperse ',' ('a':_|_) = _|_
intersperse ',' ('a':_|_) = 'a':_|_
Data.List.intersperse ',' ('a':'b':_|_) = 'a' : ',' : _|_
intersperse ',' ('a':'b':_|_) = 'a' : ',' : 'b' : _|_
etc.

> > On the other hand, we currently have
> >
> > intersect [] _|_ = []
> >
> > and one of intersect _|_ [] and intersect [] _|_ must give _|_.
> > Which one is a matter of choice.
>
> I am sorry for not being precise. You are right. But right now we have
> intersect xs [] = _|_ for every list xs terminated by _|_. But I
> suffices to evaluate xs to head normal to decide that the result
> should be []. That is, we could have
>
>    intersect [] _|_ = []   and   intersect (_|_:_|_) [] = []
>
> or
>
>    intersect [] (_|_:_|_) = []   and   intersect _|_ [] = []
>
> and the current implementation satisfies neither.
>

Right. So the question is, has the current implementation advantages over 
either of these? (I don't see any.) If not, which of these two behaviours 
is preferable?

> > And before that, the rule intersect [] _ = [] if the current
> > behaviour of
> > intersect [] should be retained.
>
> That's a deal.
>
> >> The implication (<=) :: Bool -> Bool -> Bool is too strict as well.
> >> We
> >> have False <= _|_ = _|_ as well as _|_ <= True = _|_ while one of
> >> these cases could yield True.
> >
> > I'm not convinced either should (nor that they shouldn't).
>
> I think this is a matter of elegance rather than a matter of
> efficiency. In the same way as I prefer
>
>    False && _|_ = False
>
> over
>
>    False && _|_ = _|_
>
> I prefer
>
>    False <= _|_ = True
>
> over
>
>    False <= _|_ = _|_
>

I have mixed feelings about those. Part of me dislikes breaking the 
symmetry between (<=), (==) and compare.

> > The last slide lists among the problems
> > "proposes undesirably inefficient functions (reverse)".
> > I wouldn't equate 'not minimally strict' with 'too strict'.
> > Minimal strictness also can have negative effects, one must look at
> > each
> > case individually.
>
> I second this but in my opinion the minimally strict implementation
> should be the default if there is no reason against it.

Agreed - except I have to object to your use of the definite article, some 
functions have several minimally strict implementations.
(Ambiguity of minimal strictness *can* be a reason for a stricter choice, 
though probably rarely.)

>
> Cheers, Jan

Cheers,
Daniel