[Haskell-cafe] Why so many strings in Network.URI, System.Posix and similar libraries?

Wed Mar 14 07:43:12 CET 2012

2012/3/12 Jeremy Shaw <jeremy at n-heptane.com>:
> On Sun, Mar 11, 2012 at 1:33 PM, Jason Dusek <jason.dusek at gmail.com> wrote:
>> Well, to quote one example from RFC 3986:
>>
>>  2.1.  Percent-Encoding
>>
>>   A percent-encoding mechanism is used to represent a data octet in a
>>   component when that octet's corresponding character is outside the
>>   allowed set or is being used as a delimiter of, or within, the
>>   component.
>
> Right. This describes how to convert an octet into a sequence of characters,
> since the only thing that can appear in a URI is sequences of characters.
>
>> The syntax of URIs is a mechanism for describing data octets,
>> not Unicode code points. It is inconsistent to describe URIs in
>> terms of Unicode code points.
>
>
> Not sure what you mean by this. As the RFC says, a URI is defined entirely
> by the identity of the characters that are used. There is definitely no
> single, correct byte sequence for representing a URI. If I give you a
> sequence of bytes and tell you it is a URI, the only way to decode it is to
> first know what encoding the byte sequence represents: ASCII, UTF-16, etc.
> Once you have decoded the byte sequence into a sequence of characters, only
> then can you parse the URI.

Mr. Shaw,

Thanks for taking the time to explain all this. It has really
helped me understand many parts of the URI spec much
better. I have deprecated my module in the latest release

  http://hackage.haskell.org/package/URLb-0.0.1

because a URL parser working on bytes instead of characters
stands out to me now as a confused idea.
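
To make the distinction concrete, here is a minimal sketch (not taken
from URLb or Network.URI; every name in it is hypothetical). It shows
percent-encoding turning one data octet into three characters, per
RFC 3986 section 2.1, and then the same characters serialized under
two different encodings. The two serializers assume ASCII-range input
only, which is all a percent-encoded triplet contains.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (intToDigit, ord, toUpper)
import Data.Word (Word8)

-- | Percent-encode one data octet as three characters:
-- '%' followed by two uppercase hexadecimal digits.
percentEncode :: Word8 -> String
percentEncode w = ['%', hexDigit (w `shiftR` 4), hexDigit (w .&. 0x0F)]
  where
    hexDigit = toUpper . intToDigit . fromIntegral

-- | Hypothetical serializer: one byte per character.
-- Assumes every character is in the ASCII range.
encodeAscii :: String -> [Word8]
encodeAscii = map (fromIntegral . ord)

-- | Hypothetical serializer: two bytes per character, big-endian.
-- Assumes every character is in the ASCII range, so the high byte is 0.
encodeUtf16BE :: String -> [Word8]
encodeUtf16BE = concatMap (\c -> [0, fromIntegral (ord c)])
```

With these definitions, percentEncode 0x20 is the three characters
"%20". encodeAscii gives three bytes for them and encodeUtf16BE gives
six, so the characters are the fixed part and the bytes depend on the
encoding chosen.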

--
Jason Dusek
pgp  ///  solidsnack  1FD4C6C1 FED18A2B