Proposal #1464: add dropPrefix to Data.List

Fri Jul 6 15:30:08 EDT 2007

Ketil Malde wrote:
> On Sun, 2007-07-01 at 12:45 +0200, apfelmus wrote:
>
>> Here's an (admittedly crazy) approach
>
> Why is it so crazy?  The orthogonality issues with the different ways of
> breaking up lists (split/break/span/take/drop), and the multitude of
> possible predicates (either too complicated, or too specific) has always
> been an annoyance to me.  I thought your solution was quite nice!

One problem is that you have to use

    drop (first 2)  instead of   drop 2

now. This can be remedied with some type-class hackery. Another problem
is that performance will suffer a bit with the general approach. So, the
specialized versions are likely to be kept around anyway. (Type classes
can help with specialization, too.)

Other than that, the general approach to drop & friends is not so crazy.
  But I don't like my implementation, so let's build a better one:

One problem of the implementation is that I think it doesn't handle
nicely the different semantics of  dropPrefix  compared to  drop  or
dropWhile : whereas the latter don't fail on a premature end of the
list, the dropPrefix version should fail. (The question whether it
should fail with an error or with Nothing can be delegated by providing
different variants of drop).

The solution comes automatically when pondering what a Dropper really
is: it's a *parser*. In other words,  drop & friends are just functions
that parse the beginning of a string and return how much has been
parsed. Put differently, their feature is to ignore the "AST" resulting
from a parse.

    type Dropper a  = Parser a () -- token type a, result type ()

Here, I don't mean the usual (s -> (a,s)) parsers, but an implementation
that fits the stream-like nature of our dropper: either a determinstic

    data Parser c r = Get (c -> Dropper c r) | Result r | Fail

or a non-deterministic parser

    data Parser c a
        = Get (c -> Dropper c r)
        | Result r (Dropper c r)
        | Fail

The latter are, of course, Koen Classen's parallel parsing processes
(http://www.cs.chalmers.se/~koen/pubs/jfp04-parser.ps). Now, which ones
to choose? With deterministic parsers, we loose the normal behavior of
drop  and  dropWhile  to accept lists that are too small. Thus, we
choose non-deterministic parsers and implement drop with  a "maximum
munch" behavior

        -- drop as much as we can parse, but not more
    drop :: Dropper a -> [a] -> [a]
    drop p xs = case drop' p xs of
            Nothing -> error "drop: parse failed"
            Just xs -> xs
        where
        drop' Fail         _      = Nothing
        drop' (Result _ p) xs     = drop' p xs `mplus` Just xs
        drop' (Get f)      (x:xs) = drop' (f x) xs
        drop' (Get _)      []     = Nothing

Here, the second equation of drop tries to drop more but jumps back via
Maybe's `mplus` if that fails.

With the usual Monad and MonadPlus instances for Parser c a, we can now
write

        -- take while the condition is satisfied
    while :: (a -> Bool) -> Dropper a
    while = many' . satisfy
        where
        many' p = return () `mplus` p >> many'

        -- accept the first n characters or less
    first :: Int -> Dropper a
    first 0 = return ()
    first n = return () `mplus` (get >> first (n-1))

        -- parse a given String
    prefix :: Eq a => [a] -> Dropper a
    prefix []     = eaten
    prefix (x:xs) = get >>= \c -> if c == x then prefix xs else mzero

By returning successes early,  while  and  first  accept an unexpected
end of input. An alternative version of  first  that complains when not
enough characters are available to drop would be

    exactly :: Int -> Dropper a
    exactly 0 = return ()
    exactly n = get >> exactly (n-1)

or

    exactly n = sequence_ (replicate n get)

Regards,
apfelmus