[web-devel] xml-types IsString instance for Name causes crashes

Mon Jun 13 15:47:19 CEST 2011

I agree with Michael here. My 2 cents:

I think the OverloadedStrings extension is very valuable, for
readability as well as for improved performance (as Simon Meier said).
Hence, I am certainly opposed to removing it.

As for the ByteString and Text instances, I think their benefits
outweigh the fact that they break on some inputs. This can be compared
to the `head` function, which is certainly useful as well -- even
though it is a partial function. The case in which a developer puts
invalid Unicode in his source file seems very unlikely. The ByteString
case is more dangerous, but then again, not less dangerous than using
Data.ByteString.Char8.pack!
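
To illustrate the ByteString point, here is a minimal sketch of what
Data.ByteString.Char8.pack does with characters above 255. The helper
name packedBytes is made up for this example; only pack and unpack
come from the bytestring package.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C
import Data.Word (Word8)

-- | Hypothetical helper: the raw bytes Char8.pack produces for a String.
-- Each Char is truncated to its low 8 bits, with no error raised.
packedBytes :: String -> [Word8]
packedBytes = B.unpack . C.pack
```

In GHCi, packedBytes "abc" gives [97,98,99], while packedBytes "\955"
(U+03BB, a Greek lambda) gives [187]: the high bits are silently
dropped instead of being reported.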

The OverloadedStrings extension seems to be a very good fit for types
that can be converted from "almost all" strings. Text is obvious, but
also, for example, the Html type in blaze-html, and the YamlScalar
type in data-object-yaml.
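
As a rough sketch of what "converted from almost all strings" could
look like for a name type, here is a total IsString instance that
understands Clark notation. QName and its fields are hypothetical and
are not the xml-types definitions; in particular, this sketch keeps a
malformed "{ns" prefix as part of the local name, whereas the thread
says xml-types raises an exception in that case.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.String (IsString (..))

-- | Hypothetical name type: optional namespace plus local name.
data QName = QName
  { qnNamespace :: Maybe String
  , qnLocal     :: String
  } deriving (Eq, Show)

-- | Total conversion: never throws. A literal of the form "{ns}local"
-- is split into namespace and local name; anything else, including a
-- literal with a missing '}', is kept verbatim as the local name.
instance IsString QName where
  fromString s@('{' : rest) =
    case break (== '}') rest of
      (ns, '}' : local) -> QName (Just ns) local
      _                 -> QName Nothing s
  fromString s = QName Nothing s

-- | A string literal used directly at type QName.
example :: QName
example = "{http://www.w3.org/1999/xhtml}body"
```

The design choice here is to accept possibly invalid names rather than
fail at runtime; no character-level validation against the XML spec is
attempted.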

In cases where the conversion from strings is less straightforward, or
a more complicated "syntax" is involved, I would argue to switch the
relevant code to QuasiQuoting -- but only when OverloadedStrings
causes too many runtime errors/invalid markup for developers.
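
For comparison, a quasi-quoter can reject a malformed literal at
compile time. The following is only a sketch under assumptions: Name,
parseClark and the quoter called name are invented for this example
and are not part of xml-types, and the check covers Clark notation
only, not the character rules of the XML spec.

```haskell
{-# LANGUAGE TemplateHaskell #-}
import Language.Haskell.TH (Q)
import Language.Haskell.TH.Quote (QuasiQuoter (..))

-- | Hypothetical name type: optional namespace plus local name.
data Name = Name (Maybe String) String
  deriving (Eq, Show)

-- | Parse "{namespace}local" or a bare "local", reporting malformed
-- input as a Left instead of throwing.
parseClark :: String -> Either String Name
parseClark "" = Left "empty name"
parseClark ('{' : rest) =
  case break (== '}') rest of
    (_, "}")          -> Left "empty local name"
    (ns, '}' : local) -> Right (Name (Just ns) local)
    _                 -> Left "missing '}' in Clark notation"
parseClark local = Right (Name Nothing local)

-- | Expression-only quasi-quoter: a parse failure becomes a compile
-- error at the use site.
name :: QuasiQuoter
name = QuasiQuoter
  { quoteExp = \s -> case parseClark s of
      Left err              -> fail ("invalid name: " ++ err)
      Right (Name ns local) -> [| Name ns local |]
  , quotePat  = unsupported "pattern"
  , quoteType = unsupported "type"
  , quoteDec  = unsupported "declaration"
  }

-- | Reject uses of the quoter outside expression context.
unsupported :: String -> String -> Q a
unsupported ctx _ = fail ("name quoter cannot be used in a " ++ ctx)
```

Because of the Template Haskell stage restriction, the quoter has to be
used from a different module, with QuasiQuotes enabled there:
[name|{http://example.com/ns}foo|] would compile, while [name|{oops|]
would be rejected by the compiler.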

Cheers,
Jasper

On Sun, Jun 12, 2011 at 2:23 PM, Michael Snoyman <michael at snoyman.com> wrote:
> On Sun, Jun 12, 2011 at 2:27 PM, Aristid Breitkreuz
> <aristidb at googlemail.com> wrote:
>> I don't think there is any kind of consensus for removing it.
>
> I agree. If I can suddenly join the fray, let's take a step back for a
> second and reanalyze the issue here. We have this incredibly useful
> IsString instance for Name, which in all honesty is a lie: the type of
> fromString is "String -> Name", when not all Strings can be properly
> converted into XML names. There are two *separate* reasons for this:
>
> 1) Not all characters can be used in a name. Simple example: a
> less-than sign (<) is not allowed. For full information, see [1] and
> [2].
> 2) xml-types implements Clark notation, which allows a very convenient
> way to define namespaces. (This is a feature I use a lot in my own
> code.) But missing a right brace invalidates the Clark notation.
>
> Ideally, the compiler would catch invalid Names and error out.
> Unfortunately, due to the way OverloadedStrings works, this isn't
> possible currently. (Though I think such a solution would be ideal,
> and is something we should consider separately.) I think we have three
> possible responses to the dilemma:
>
> a) Ignore invalid Names, and simply allow invalid XML to be generated.
> b) Throw an asynchronous exception.
> c) Realize that the IsString instance is not correct, and therefore remove it.
>
> Currently, xml-types follows option (a) for (1) above, and (b) for (2)
> above. I personally don't think either option is obviously better or
> worse, but I do in general prefer consistency. And I think that
> writing the validation rules for an XML name is outside the scope of
> xml-types, so I prefer option (a)... but not by any great margin.
>
> The one thing I'd hate to see happen is option (c). It's true that the
> instance of IsString is not really "correct," but the same argument
> could be made for ByteStrings as well, where characters over 255 get
> truncated (I believe). The fact is that invalid input here should be
> *incredibly* rare.
>
> I suppose a fourth option would be to force the String into a valid
> name, either through some escape mechanism or removing characters. But
> again, I personally think it's outside the scope of xml-types.
>
> Michael
>
> PS: Quasi-quoting is actually a great fit here as well, but it's just
> not nearly as convenient as OverloadedStrings.
>
> [1] http://www.w3.org/TR/xml/#NT-NameStartChar
> [2] http://www.w3.org/TR/xml/#NT-NameChar
>
>> Am 12.06.2011 13:21 schrieb "Yitzchak Gale" <gale at sefer.org>:
>>> John Millikin wrote:
>>>> To me, the choice is between raising an exception
>>>> or removing IsString.
>>>
>>> That would be a shame, but removing it may be the
>>> only way out of this conundrum.
>>>
>>>> IsString without namespaces is pointless.
>>>
>>> I am making good use of it in a project that doesn't
>>> involve namespaces at all. It would actually be
>>> a lot of work to back out at this point.
>>>
>>>> IsString without input checking is dangerous. If fromString cannot
>>>> fail on invalid input, then it shouldn't be defined.
>>>
>>> I appreciate your concerns, but Haskell has other means of
>>> providing such guarantees. Raising an asynchronous exception
>>> is just not an option in an IsString instance.
>>>
>>>>> The Name type already produces invalid XML.
>>>
>>>> You're right -- it is already possible for Names to be invalid. There
>>>> should probably be stricter input checking on names, to ensure they
>>>> match the XML spec. Something like this...
>>>
>>> Yes, as I mentioned earlier, newtype wrappers with hidden
>>> constructors is the way we would do that if we wanted to
>>> guarantee those kinds of things at the type level.
>>> You could then provide several constructor functions that
>>> either do or do not raise exceptions. See, for example,
>>> Data.Text.Encoding, Neil Mitchell's Safe library, Michael's
>>> xml-enumerator.
>>>
>>> But you certainly could not use the version that raises an
>>> exception for an IsString instance.
>>>
>>> In fact, I don't think an IsString instance makes sense at
>>> all for a validating type. So maybe just removing it
>>> really is the right thing to do after all.
>>>
>>> Thanks,
>>> Yitz
>>>
>>> _______________________________________________
>>> web-devel mailing list
>>> web-devel at haskell.org
>>> http://www.haskell.org/mailman/listinfo/web-devel