Policy change for regex libraries

mail at justinbogner.com
Mon Jan 12 18:36:03 EST 2009


Chris Kuklewicz <haskell at list.mightyreason.com> writes:
> The authors of sed are in agreement with your intuition.  But I think
> policy 2, as a recursive definition, is unusual.  And I see policy 2
> as very hard to summarize.
>
> The single-match is always easy to summarize: the leftmost longest match.
>
> Policy 1 for multiple matches can be summarized as:
>
>> All leftmost longest non-overlapping matches, where overlapping means sharing
>> the same matching characters.
>
> Policy 2 for multiple matches can be summarized as:
>
>> All leftmost longest non-overlapping matches, where overlapping means sharing
>> the same matching characters, excluding zero-length matches that coincide with
>> the end of a non-zero-length match.
>
> Policy 0 has been:
>
>> All leftmost longest non-overlapping matches, where overlapping means sharing
>> the same matching characters, stopping with the first zero-length match.

Given this point, policy 1 does indeed seem the most consistent
behaviour. Thanks for the explanation!
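
To make the difference between the three summaries concrete, here is a
rough sketch in Python. It is not the code of any of the regex
libraries under discussion. The helper names (longest_match_at_or_after,
all_matches) are made up for illustration, leftmost-longest matching is
emulated by brute force on top of Python's re module, and I am assuming
that policy 0 still reports the first zero-length match before it stops.

```python
import re


def longest_match_at_or_after(pattern, text, pos):
    """Return (start, end) of the leftmost-longest match at or after pos.

    Brute force: try each start position from left to right and, for each
    one, each end position from longest to shortest. Returns None if the
    pattern matches nowhere.
    """
    compiled = re.compile(pattern)
    for start in range(pos, len(text) + 1):
        for end in range(len(text), start - 1, -1):
            if compiled.fullmatch(text, start, end):
                return (start, end)
    return None


def all_matches(pattern, text, policy):
    """Collect all matches under policy 0, 1 or 2 as summarized above."""
    matches = []
    pos = 0
    prev_end = None  # end offset of the most recent non-empty match
    while pos <= len(text):
        found = longest_match_at_or_after(pattern, text, pos)
        if found is None:
            break
        start, end = found
        if start == end:
            # Zero-length match.
            if policy == 2 and start == prev_end:
                # Policy 2: drop an empty match that sits exactly at the
                # end of the previous non-empty match.
                pos = start + 1
                continue
            matches.append(found)
            if policy == 0:
                # Policy 0 (assumed reading): report it, then stop.
                break
            pos = start + 1  # step past one character to make progress
        else:
            matches.append(found)
            prev_end = end
            pos = end  # the next match may not share these characters
    return matches


if __name__ == "__main__":
    for policy in (0, 1, 2):
        print(policy, all_matches("a*", "baaab", policy))
```

Running "a*" against "baaab", policy 0 gives only the empty match at
offset 0. Policy 1 gives the empty match at 0, "aaa" at 1-4, and empty
matches at 4 and 5. Policy 2 gives the same list minus the empty match
at 4, which is what sed's s/a*/X/g produces ("XbXbX").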


