<div>Fortunately, the bytewise encoding of '\n' is sufficient to recognize a newline. Any other attempted representation in UTF-8 (e.g. the two-byte sequence 0xc0 0x8a) would be non-canonical, and per RFC 3629 it should be rejected anyway. Every byte of a valid multi-byte UTF-8 sequence also has its high bit set, so 0x0a can never occur inside another character.</div>
<div> </div>
<div>So if you view ByteString as a stream of bytes that may or may not be UTF-8 encoded, scanning for 0x0a gives you the correct behavior in both scenarios.</div>
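<div>To make the byte-level argument concrete, here is a minimal sketch, assuming only the bytestring package. The names splitLines and sample are made up for illustration, and the sample bytes are a hand-encoded UTF-8 string rather than anything from the library.</div>

```haskell
import qualified Data.ByteString as B

-- | Split on the single byte 0x0a without decoding anything.
-- (Hypothetical helper name; it is just B.split specialised to '\n'.)
splitLines :: B.ByteString -> [B.ByteString]
splitLines = B.split 0x0a

-- | "héllo\nwörld" encoded as UTF-8 by hand:
-- 'é' is 0xc3 0xa9 and 'ö' is 0xc3 0xb6, so neither contains 0x0a.
sample :: B.ByteString
sample = B.pack
  [ 0x68, 0xc3, 0xa9, 0x6c, 0x6c, 0x6f   -- h é l l o
  , 0x0a                                 -- newline
  , 0x77, 0xc3, 0xb6, 0x72, 0x6c, 0x64   -- w ö r l d
  ]

main :: IO ()
main = mapM_ (print . B.unpack) (splitLines sample)
```

<div>The split happens only at the 0x0a byte. The two-byte sequences stay intact because each of their bytes is 0x80 or above, which is the property the argument above relies on.</div>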
<div> </div>
<div>-Edward Kmett<br></div>
<div class="gmail_quote">On Fri, May 15, 2009 at 7:02 AM, Simon Marlow <span dir="ltr">&lt;<a href="mailto:marlowsd@gmail.com">marlowsd@gmail.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">
<div class="im">On 15/05/2009 03:07, Bryan O'Sullivan wrote:<br></div>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">
<div class="im">On Thu, May 14, 2009 at 4:23 PM, Simon Michael &lt;<a href="mailto:simon@joyful.com" target="_blank">simon@joyful.com</a>&gt; wrote:<br></div>
<div>
<div></div>
<div class="h5"><br> I'd like to request that utf8-string be added to the haskell<br> platform, so that HP users can work with non-ascii text.<br>
<br><br>I'd rather this wasn't added. It's an acceptable crutch for the short<br>term, but we shouldn't be using String for text manipulation, and<br>bundling utf8-string implicitly blesses that approach. The text library<br>
needs a few weeks of polish and some more testing work for QA, but it'll<br>be the right answer well before the end of this year.<br></div></div></blockquote><br>We ought to think about the interaction between text (and bytestring) and the new Unicode IO library. What does text have in the way of IO operations?<br>
<br>I've been wondering about what bytestring's hGetLine should do. Right now I have it doing decoding and then taking the low 8 bits, but that's not right. OTOH, looking for '\n' in a stream of bytes doesn't seem right. Maybe it should just be deprecated.<br>
<br>Cheers,<br><font color="#888888"> Simon</font>
<div>
<div></div>
<div class="h5"><br>_______________________________________________<br>Libraries mailing list<br><a href="mailto:Libraries@haskell.org" target="_blank">Libraries@haskell.org</a><br><a href="http://www.haskell.org/mailman/listinfo/libraries" target="_blank">http://www.haskell.org/mailman/listinfo/libraries</a><br>
</div></div></blockquote></div><br>