[Haskell-cafe] is this a bug ?

Sat Jul 17 11:10:37 EDT 2010

On Saturday 17 July 2010 05:39:00, gate03 at landcroft.co.uk wrote:
> On Sat 17/07/10 04:17 , Alexander Solla ajs at 2piix.com sent:
> > Why are you performing unsafe IO actions?  They don't play nice
> > with laziness.
>
> OK, fair cop, but without the unsafe IO action, it still misbehaves.
>
> http://hpaste.org/fastcgi/hpaste.fcgi/view?id=27650
>
> Michael.

Source-diving reveals: it's a bug.
Text.Regex.Posix.ByteString.Lazy is just a thin wrapper around the strict 
variant; lazy ByteStrings are converted to strict ones before the 
functions of Text.Regex.Posix.ByteString are called.
To avoid copying twice, if the lazy ByteString does not end with a '\0', a 
'\0' is snoc'ed to the end before the conversion to a strict ByteString.
Thus the regexec of Text.Regex.Posix.ByteString takes its slices from a 
ByteString that is one byte longer than the original, and nothing chops the 
trailing '\0' off again.
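
To illustrate the slicing problem, here is a minimal sketch using only the 
bytestring package. It is not the actual regex-posix code: toStrictNul, 
splitAround, splitPadded and splitTrimmed are hypothetical names, and the 
match offset and length are passed in by hand instead of coming from regexec.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as L

type Parts = (B.ByteString, B.ByteString, B.ByteString)

-- Hypothetical stand-in for the wrapper's conversion step: make sure the
-- lazy ByteString ends in '\0', then flatten it into one strict chunk.
-- The Bool records whether a '\0' had to be appended.
toStrictNul :: L.ByteString -> (Bool, B.ByteString)
toStrictNul l
  | not (L.null l) && L.last l == '\0' = (False, flatten l)
  | otherwise                          = (True, flatten (L.snoc l '\0'))
  where
    flatten = B.concat . L.toChunks

-- Split a strict ByteString into (before, match, after) for a match at
-- the given offset and length.
splitAround :: Int -> Int -> B.ByteString -> Parts
splitAround off len s =
  (B.take off s, B.take len (B.drop off s), B.drop (off + len) s)

-- Mimics the reported behaviour: slices are taken from the padded string,
-- so the "after" part keeps the appended '\0'.
splitPadded :: Int -> Int -> L.ByteString -> Parts
splitPadded off len = splitAround off len . snd . toStrictNul

-- One possible remedy (an assumption, not the library's fix): drop the
-- appended '\0' again before slicing.
splitTrimmed :: Int -> Int -> L.ByteString -> Parts
splitTrimmed off len l =
  let (added, s) = toStrictNul l
      s'         = if added then B.init s else s
  in  splitAround off len s'
```

For the input "foobar" and a match at offset 0 of length 3, splitPadded 
gives ("", "foo", "bar\0") while splitTrimmed gives ("", "foo", "bar").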

A related problem is that ByteStrings (and Strings) may legitimately 
contain '\0's, but regex-posix (and probably [almost] all other regex 
packages) hands them to C as CStrings, so the regex functions stop 
processing at the first '\0' (naturally, since they call C), even though on 
the Haskell side that may be only a small part of the string.
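
The truncation can be seen without any regex package at all. The sketch 
below round-trips a ByteString through a CString using useAsCString and 
packCString from bytestring; asSeenByC and safeForCString are hypothetical 
helper names, and the guard is only one possible way to detect the 
situation before matching.

```haskell
import qualified Data.ByteString.Char8 as B

-- What a C function taking a NUL-terminated char* would see:
-- everything up to, but not including, the first '\0'.
asSeenByC :: B.ByteString -> IO B.ByteString
asSeenByC s = B.useAsCString s B.packCString

-- Hypothetical guard: True if the whole ByteString would survive the
-- trip through a CString, i.e. it contains no embedded '\0'.
safeForCString :: B.ByteString -> Bool
safeForCString = not . B.elem '\0'
```

For example, asSeenByC (B.pack "abc\0def") yields "abc", although the 
Haskell-side ByteString is seven bytes long, and safeForCString returns 
False for it.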