Jan Christiansen jac at informatik.uni-kiel.de
Thu Sep 2 18:22:14 EDT 2010

```Hi,

On 02.09.2010, at 13:41, Daniel Fischer wrote:

> takes a little to run and keeps the entire file until the first
> occurrence
> of pat in memory.

first of all thanks very much for the detailed instructions.

I have rewritten the example slightly using Strings instead of
Bytestrings. Replacing all occurrences of 'ä' by "&auml;" in the
collected works of Shakespeare ; ) has a maximum memory usage of
around 65MB with the current implementation of intersperse while it
has a maximum memory usage of only around 5KB with the less strict
implementation.

replaceBy :: Eq alpha => alpha -> [alpha] -> [alpha] -> [alpha]
replaceBy x sep = concat . intersperse sep . splitBy (==x)

splitBy :: (alpha -> Bool) -> [alpha] -> [[alpha]]
splitBy _ []  = []
splitBy p xs  =
case break p xs of
(l,ys) -> l : case ys of
[]     -> []
(_:zs) -> splitBy p zs

This function only runs in constant space if I use the strict pattern
matching on the result of break. If I use the following implementation
I observe a linear memory usage instead.

splitBy' :: (alpha -> Bool) -> [alpha] -> [[alpha]]
splitBy' _ []  = []
splitBy' p xs  =
l : case  ys of
[]     -> []
(_:zs) -> splitBy' p zs
where
(l,ys) = break p xs

I think this is due to the Wadler tuple space leak. The same would
apply to the current implementation of lines. I wonder whether an
implementation of lines analogous to splitBy has any disadvantages.

>> That is, we could have
>>
>>   intersect [] _|_ = []   and   intersect (_|_:_|_) [] = []
>>
>> or
>>
>>   intersect [] (_|_:_|_) = []   and   intersect _|_ [] = []
>>
>> and the current implementation satisfies neither.

> Right. So the question is, has the current implementation advantages
> over
> either of these? (I don't see any.) If not, which of these two
> behaviours
> is preferable?

I'd prefer the first one as it is in line with the left to right

> I have mixed feelings about those. Part of me dislikes breaking the
> symmetry between (<=), (==) and compare.

I think you should not blame (<=) for the existence of a function that
yields a superset of the information that (<=) yields ; )

Cheers, Jan
```