Picky details about Unicode (was RE: Haskell 98 Report possible errors, part one)

Mark P Jones mpj@cse.ogi.edu
Mon, 23 Jul 2001 11:23:30 -0700


| 2.2. Identifiers can use small and large Unicode letters ...

If we're picking on the report's handling of Unicode, here's
another minor quibble to add to the list.  In describing the
lexical syntax of operator symbols, the report uses:

   varsym    -> (symbol {symbol | :})_<reservedop>
   symbol    -> ascSymbol | uniSymbol
   uniSymbol -> any Unicode symbol or punctuation

The last line seems to include more characters than I'd expect.
Specifically:

  ()[]{}  are punctuation, open/close (Unicode types Ps, Pe)
  `       is a symbol, modifier (Unicode type Sk)
  "':;,   are punctuation, other (Unicode type Po)
  _       is punctuation, connector (Unicode type Pc)

And so, if I read the report correctly, I should be able to
define :-) as a consym and [] as a varsym, and the delimiters
of `div` and "hello" could themselves be read as operator
symbols!  (Not to mention some altogether more bizarre choices!)
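
For anyone who wants to check the categories above, here is a
minimal sketch using Data.Char from GHC's base library.  The
names uniSymbolLiteral, categories and allLiteralSymbols are
made up for illustration, and the results depend on the Unicode
tables shipped with your compiler.  Note that only the delimiter
characters qualify; letters in between do not.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isPunctuation, isSymbol)

-- Literal reading of the report's rule:
--   uniSymbol -> any Unicode symbol or punctuation
-- (hypothetical helper name, not taken from the report)
uniSymbolLiteral :: Char -> Bool
uniSymbolLiteral c = isSymbol c || isPunctuation c

-- The characters singled out above, paired with their
-- Unicode general categories as reported by Data.Char.
categories :: [(Char, GeneralCategory)]
categories = [(c, generalCategory c) | c <- "()[]{}`\"':;,_"]

-- True when every character of the string would count as a
-- symbol character under the literal reading.
allLiteralSymbols :: String -> Bool
allLiteralSymbols = all uniSymbolLiteral
```

Loading this in GHCi and evaluating categories, or
allLiteralSymbols "-)" and allLiteralSymbols "[]", shows which
characters the literal reading would admit.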

I guess the intention here is that:

  symbol  -> ascSymbol | uniSymbol_<special | _ | : | " | '>

In fact, since all the characters in ascSymbol are either
punctuation or symbols in Unicode, the inclusion of ascSymbol
is redundant, and a better specification might be:

  symbol  -> uniSymbol_<special | _ | : | " | '>
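
As a rough sketch of how the suggested rule behaves (again
using Data.Char, with the special and ascSymbol sets copied
from the report's lexical syntax; the function names are only
illustrative, not a definitive implementation):

```haskell
import Data.Char (isPunctuation, isSymbol)

-- special and ascSymbol as listed in the Haskell 98 lexical syntax.
special, ascSymbol :: String
special   = "(),;[]`{}"
ascSymbol = "!#$%&*+./<=>?@\\^|-~"

-- uniSymbol -> any Unicode symbol or punctuation
uniSymbol :: Char -> Bool
uniSymbol c = isSymbol c || isPunctuation c

-- symbol -> uniSymbol_<special | _ | : | " | '>
symbol :: Char -> Bool
symbol c = uniSymbol c && c `notElem` (special ++ "_:\"'")
```

If the claim about redundancy is right, all symbol ascSymbol
should evaluate to True, and any symbol (special ++ "_:\"'")
to False.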

All the best,
Mark

P.S.  A caveat: I'm not a Unicode expert!  Perhaps Marcin can
advise ...