[Haskell-beginners] remove XML tags using Text.Regex.Posix

Tom Tobin korpios at korpios.com
Wed Sep 30 13:30:31 EDT 2009


On Wed, Sep 30, 2009 at 11:11 AM, Jan Jakubuv <jakubuv at gmail.com> wrote:
> This is so simple that I would not recommend anything other than regular
> expressions. Use the following pattern:
>
>    pat = "<tag>(.*)</tag>"

Don't use this; the * operator is greedy by default, meaning that it
will match all of "<tag>foo</tag>bar<tag>baz</tag>" in one go, and your
captured data will end up being "foo</tag>bar<tag>baz".  In other
words, a greedy operator tries to consume as much of the string as it
possibly can while still matching.  If that regex module supports
non-greedy operators, you want something like this:

pat = "<tag>(.*?)</tag>"

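A "?" after a greedy operator makes it non-greedy, meaning it will try
to match while consuming as little of the string as it can.  POSIX
regexes use leftmost-longest matching and don't define non-greedy
operators, so Text.Regex.Posix most likely won't support this; the
PCRE-based one (Text.Regex.PCRE, from the regex-pcre package) should.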
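A minimal sketch of the difference, assuming the regex-pcre package (which provides Text.Regex.PCRE) is installed; `input` and `contents` are hypothetical names used only for illustration. It runs both patterns from this thread over the example string and prints the capture group of every match:

```haskell
import Text.Regex.PCRE ((=~))

-- Example string from this thread: two <tag> elements in one line.
input :: String
input = "<tag>foo</tag>bar<tag>baz</tag>"

-- All matches of pat in input; each match comes back as
-- [fullMatch, group1], and we keep only group 1.
contents :: String -> [String]
contents pat = [ g | (_ : g : _) <- (input =~ pat :: [[String]]) ]

main :: IO ()
main = do
  print (contents "<tag>(.*)</tag>")   -- greedy: one big match
  print (contents "<tag>(.*?)</tag>")  -- non-greedy: one match per element
```

The greedy pattern should give a single capture, ["foo</tag>bar<tag>baz"], while the non-greedy one should give ["foo","baz"].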

