<div dir="ltr">On Mon, Mar 26, 2012 at 06:08, Christian Siefkes <span dir="ltr">&lt;<a href="mailto:christian@siefkes.net">christian@siefkes.net</a>&gt;</span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="im">On 03/26/2012 02:39 AM, Gabriel Dos Reis wrote:<br>

&gt; True, but should the language definition default to a string type<br>

&gt; that is one of the most unsuited for text processing in the 21st<br>

&gt; century where global multilingualism abounds?  Even C has qualms<br>

&gt; about that.<br>

</div>...<br>

<div class="im">&gt; I have no doubt believing that if all texts my students have to<br>

&gt; process are US ASCII, [Char] is more than sufficient.  So, I have<br>

&gt; sympathy for your position.  However,  I doubt [Char] would be<br>

&gt; adequate if I ask them to share texts from their diverse cultures.<br>

<br>

</div>Uh, while a C char is (usually) just a byte (8 bits of information, like<br>

Word8 in Haskell), a Haskell Char is a Unicode character (21 bits of<br>

information). A single C char cannot contain an arbitrary Unicode character,<br>

while a Haskell Char can, and does. Hence [Char] is (efficiency issues<br>

aside) perfectly adequate for dealing with texts written in arbitrary languages.<br></blockquote><div><br></div><div>...as long as you ignore combining characters and the like: a base letter followed by a combining accent is two Chars but one user-perceived character.  I claim that ignoring them is just continuing the same &quot;good enough for my language&quot; attitude that has plagued text handling ever since someone first suggested that text processing should consider more than just ISO 8859-1 and got roundly pooh-poohed by the community.</div>
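<div><br></div><div>A minimal sketch of what goes wrong, using only Data.Char from base. The names precomposed, decomposed and clusters are made up for illustration, and the grouping below is a rough approximation, not the grapheme cluster rules of UAX #29:</div>

```haskell
import Data.Char (isMark)

-- "é" as a single precomposed code point (U+00E9).
precomposed :: String
precomposed = "\x00E9"

-- "é" as 'e' followed by U+0301 COMBINING ACUTE ACCENT.
decomposed :: String
decomposed = "e\x0301"

-- Naive grouping: attach every run of combining marks to the
-- character that precedes it. Illustrative only; real grapheme
-- segmentation has many more rules than this.
clusters :: String -> [String]
clusters []     = []
clusters (c:cs) = (c : marks) : clusters rest
  where
    (marks, rest) = span isMark cs
```

<div>In GHCi, length precomposed is 1 while length decomposed is 2, the two strings compare unequal with ==, and reverse decomposed puts the accent in front of the 'e'. Grouping with clusters at least keeps the accent attached to its base letter: clusters decomposed yields a single cluster of two Chars.</div>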

<div><br></div></div>-- <br>brandon s allbery                                      <a href="mailto:allbery.b@gmail.com" target="_blank">allbery.b@gmail.com</a><br>wandering unix systems administrator (available)     (412) 475-9364 vm/sms<br>

<br>

</div>