Proposal: Add concatMapM function (#2042)

Johan Tibell johan.tibell at gmail.com
Fri Feb 1 03:56:09 EST 2008


> I think the Haskell Way of interpreting bytes as Latin-1 - while
> unfortunate in today's multi-everything environment - is something we
> just have to live with.  Too much code expects this behavior, and too
> many tasks require just reading ASCII to be burdened with
> complications of locales and character sets.

Why do we have to live with it?  I understand how we ended up in the
situation we have today. Most languages have had the same problem,
and some, like Python, are fixing it. Not fixing it makes it a huge
pain to deal with Strings from different sources (e.g. libraries),
since you can't tell whether a String contains Unicode code points
(which is what String is defined to contain) or raw bytes because the
programmer used the wrong type.
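
To make the ambiguity concrete, here is a minimal sketch. It assumes a
plain [Word8] list stands in for raw bytes, and the helpers
encodeUtf8 and decodeLatin1 are hypothetical names defined here, not
taken from any library. Encoding U+00E9 (é) as UTF-8 gives two bytes;
reading those bytes back the Latin-1 way gives a two-Char String that
looks just like genuine text:

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Encode one code point as UTF-8. Sketch only: it assumes a valid
-- Unicode scalar value and does not reject surrogates.
encodeCharUtf8 :: Char -> [Word8]
encodeCharUtf8 c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [fromIntegral (0xC0 .|. (n `shiftR` 6)), cont n]
  | n < 0x10000 = [ fromIntegral (0xE0 .|. (n `shiftR` 12))
                  , cont (n `shiftR` 6), cont n ]
  | otherwise   = [ fromIntegral (0xF0 .|. (n `shiftR` 18))
                  , cont (n `shiftR` 12), cont (n `shiftR` 6), cont n ]
  where
    n = ord c
    -- Continuation byte 10xxxxxx carrying the low six bits of x.
    cont x = fromIntegral (0x80 .|. (x .&. 0x3F))

-- Hypothetical helper: a String of code points to UTF-8 bytes.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeCharUtf8

-- The "Latin-1 way": each byte becomes the Char with the same
-- number (0..255), whatever encoding the bytes were really in.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)
```

Loaded into GHCi, decodeLatin1 (encodeUtf8 "\233") evaluates to
"\195\169" (that is, "Ã©"): a perfectly valid String of two code
points, so nothing in the type says it is really mis-decoded bytes.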

> .... But how about a 'withDefaultEncoding' modifier that
> inspects the first two (or four?) bytes for a Unicode BOM, and either
> sets decoding accordingly and continues, or sets encoding according
> to locale *and* lets the user read the first bytes when reading from
> the handle.

The BOM is not always present, and it is not enough to decide which
encoding was used. You could invent an encoding of Unicode that
doesn't use one.
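
As an illustration of why a BOM check alone can't settle the
question, here is a minimal sketch of BOM sniffing over a [Word8]
list. The BomEncoding type and sniffBom function are hypothetical
names for this example only. When no BOM is found it can only return
Nothing; the bytes themselves don't say what to do next:

```haskell
import Data.Word (Word8)

-- Hypothetical encoding tags for this sketch.
data BomEncoding = Utf8Bom | Utf16BE | Utf16LE | Utf32BE | Utf32LE
  deriving (Eq, Show)

-- Look for a byte order mark at the start of the input. Returns the
-- encoding the BOM suggests plus the remaining bytes, or Nothing when
-- there is no BOM (the common case for UTF-8 and legacy encodings).
sniffBom :: [Word8] -> Maybe (BomEncoding, [Word8])
sniffBom bs = case bs of
  (0x00:0x00:0xFE:0xFF:rest) -> Just (Utf32BE, rest)
  -- Checked before UTF-16LE, since FF FE is a prefix of FF FE 00 00.
  (0xFF:0xFE:0x00:0x00:rest) -> Just (Utf32LE, rest)
  (0xEF:0xBB:0xBF:rest)      -> Just (Utf8Bom, rest)
  (0xFE:0xFF:rest)           -> Just (Utf16BE, rest)
  (0xFF:0xFE:rest)           -> Just (Utf16LE, rest)
  _                          -> Nothing
```

Even a match is only a guess: FF FE 00 00 is both the UTF-32LE BOM
and a UTF-16LE BOM followed by U+0000, and a Latin-1 file that happens
to begin with "ÿþ" would be misdetected as UTF-16LE.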

Some recommended reading (for everyone):
http://www.joelonsoftware.com/articles/Unicode.html

-- Johan

