Pre-proposal discussion: add a version of dropWhileEnd with different laziness properties to Data.List

David Feuer david.feuer at gmail.com
Sun Sep 28 20:39:40 UTC 2014


BACKGROUND:

A somewhat common idiom I discovered digging around the GHC tree is the use
of

    reverse . dropWhile p . reverse

to remove some unwanted things from the end of a list. The lists involved
tend to be short (e.g., file names, package names, lines of user input,
etc.) and the amount removed from the end of the list tends to be short (a
trailing newline, the last part of a filename, extra spaces, etc.). I
initially intended to replace all of these with Data.List.dropWhileEnd p.
Unfortunately, my benchmarking showed that this had a substantial negative
impact on performance. Data.List.dropWhileEnd is defined like this:

    dropWhileEnd p = foldr (\x r -> if p x && null r then [] else x:r) []

This is lazy in the *spine* of the list--it can "flush its buffer" and
produce list elements any time p x is found to be false. This is the best
you can do if you need to process a very large list and don't want to have
to load the whole thing into memory, and/or your predicate is *very* cheap.
Unfortunately, it pays a steep price: for non-trivial p, it's strict in the
*elements* of the list. This makes it unsuitable, from a performance
standpoint, for most of the purposes for which I've seen reverse .
dropWhile p . reverse used.

CONCEPT:

Add (by some name) a function that is lazy in the elements while strict in
the spine:

dropWhileEndLE :: (a -> Bool) -> [a] -> [a]
-- specification:  dropWhileEndLE p = reverse . dropWhile p . reverse
dropWhileEndLE p = foldr (\x r -> if null r && p x then [] else x:r) []

This is a good bit faster than the double-reversing version, presumably
because it presumably allocates its buffer stack on the GHC stack rather
than spraying junk across the heap.

If I were a Time Lord, I'd go back in time and add to the library, instead
of dropWhileEnd as it is, a function with a name like

    takeUntilLastSatisfying p = dropWhileEnd (not . p)

which has a name that explains better what it does, and I'd define
dropWhileEnd to be dropWhileEndLE. But I'm not a Time Lord, so I'd like to
hear what name or naming approach other people would recommend.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/libraries/attachments/20140928/fdccd5f0/attachment-0001.html>


More information about the Libraries mailing list