<div dir="ltr">On Mon, Mar 26, 2012 at 06:08, Christian Siefkes <span dir="ltr"><<a href="mailto:christian@siefkes.net">christian@siefkes.net</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">On 03/26/2012 02:39 AM, Gabriel Dos Reis wrote:<br>
> True, but should the language definition default to a string type<br>
> that is one of the most unsuited for text processing in the 21st<br>
> century where global multilingualism abounds? Even C has qualms<br>
> about that.<br>
</div>...<br>
<div class="im">> I have no doubt believing that if all texts my students have to<br>
> process are US ASCII, [Char] is more than sufficient. So, I have<br>
> sympathy for your position. However, I doubt [Char] would be<br>
> adequate if I ask them to share texts from their diverse cultures.<br>
<br>
</div>Uh, while a C char is (usually) just a byte (8 bits, i.e. 2^8 possible values, like<br>
Word8 in Haskell), a Haskell Char is a Unicode character (about 21 bits of<br>
information). A single C char cannot contain an arbitrary Unicode character,<br>
while a Haskell Char can, and does. Hence [Char] is (efficiency issues<br>
aside) perfectly adequate for dealing with texts written in arbitrary languages.<br></blockquote><div><br></div><div>...as long as you ignore combining characters and the like. I claim that ignoring them this way just continues the same "good enough for my language" attitude that has plagued text handling ever since someone got the notion that maybe text processing should consider more than just ISO 8859-1 and got roundly pooh-poohed by the community.</div>
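<div><br></div><div>A minimal sketch of the combining-character point, assuming GHC's base library (isMark from Data.Char). The names precomposed, decomposed and baseCount are made up for illustration, and dropping marks is only a rough stand-in for counting user-perceived characters, not a full grapheme-cluster algorithm:</div>
<pre>
import Data.Char (isMark)

-- "e with acute accent" as one precomposed code point (U+00E9).
precomposed :: String
precomposed = "\x00E9"

-- The same visible character as 'e' followed by
-- COMBINING ACUTE ACCENT (U+0301): two Chars, one glyph.
decomposed :: String
decomposed = "e\x0301"

-- Hypothetical helper: count code points that are not combining marks.
baseCount :: String -&gt; Int
baseCount = length . filter (not . isMark)
</pre>
<div>In GHCi, length precomposed is 1 while length decomposed is 2, precomposed == decomposed is False, and reverse decomposed puts the accent in front of the letter it belonged to. baseCount gives 1 for both, which hints at the extra work a list of code points leaves to the programmer.</div>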
<div><br></div></div>-- <br>brandon s allbery <a href="mailto:allbery.b@gmail.com" target="_blank">allbery.b@gmail.com</a><br>wandering unix systems administrator (available) (412) 475-9364 vm/sms<br>
<br>
</div>