[Haskell-i18n] unicode notation \uhhhh implementation

Sven Moritz Hallberg pesco@gmx.de
16 Aug 2002 15:02:54 +0200


On Fri, 2002-08-16 at 14:04, Ketil Z. Malde wrote:
> Readability is one thing, however, I'm not quite sure how layout would
> be affected with this.  I'm often surprised to hear about the problems
> people experience with layout, it just seems to work for me.  (Using
> Emacs and auto-indent; there's rarely any problem pressing TAB until
> the right indentation is reached.)  
> 
> However, now it appears that indentation might change, according to
> encoding used.  How do we solve that?  
> 
> The simple solution is to count one Unicode character as one
> indentation character, but that would mean having alignments visually
> distorted if we are using other notations.  Emacs could probably
> handle this and display things correctly, but do we want that extra
> complexity?  
> 
>         case t of Rad _ -> foo
>                   Deg _ -> bar
> 
> --                ^visual alignment
> 
>         case &theta of Rad _ -> foo
>                   Deg _ -> bar              
> 
> --                ^aligned, but only by counting
> 
> (Ditto for \uXXXXXXXX, of course)
> 
> After all, isn't layout intended to make things *easier* to read?

Oh, you're right, I hadn't even thought of that. This would be a pain to
use, I suppose. I'm starting to feel this whole Unicode preprocessing
idea brings more problems than it solves. I'll try to summarize:

The problem we're trying to address is this: Alice develops in Unicode.
Bob's system doesn't support Unicode yet, but he wants to help Alice.

Unicode escapes allow Bob to recode Alice's code before using it. When
he sends it back to her, she recodes it into her Unicode encoding
again. The problems with the current approach are:

  - Pitfalls from ambiguous-looking escapes.
  - Tiresome recoding between Alice and Bob.
  - Poorly readable indentation if Alice uses layout as you describe.
  - Poorly readable code if Alice uses characters for which no
    shorthands exist.
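
To make the recoding step and the layout problem concrete, here is a
minimal sketch in Haskell. The names escapeUnicode and unescapeUnicode
are hypothetical, and I'm assuming the escape is exactly \u followed by
four hex digits, which is not a settled syntax. It is a naive
illustration, not a proposed implementation.

```haskell
import Data.Char (chr, isAscii, isHexDigit, ord)
import Numeric (readHex, showHex)

-- Hypothetical helper: replace each non-ASCII character with a \uhhhh
-- escape. Assumption: only code points up to U+FFFF get escaped;
-- anything above that is passed through unchanged.
escapeUnicode :: String -> String
escapeUnicode = concatMap esc
  where
    esc c
      | isAscii c       = [c]
      | ord c <= 0xFFFF = "\\u" ++ pad (showHex (ord c) "")
      | otherwise       = [c]
    -- Left-pad the hex digits with zeros to a width of four.
    pad s = replicate (4 - length s) '0' ++ s

-- Hypothetical inverse: turn every \uhhhh back into its character.
-- It cannot tell an escape apart from ASCII text that merely looks
-- like one, which is the "ambiguous-looking escape" pitfall.
unescapeUnicode :: String -> String
unescapeUnicode ('\\' : 'u' : rest)
  | (hs, rest') <- splitAt 4 rest
  , length hs == 4
  , all isHexDigit hs
  , [(n, "")] <- readHex hs
  = chr n : unescapeUnicode rest'
unescapeUnicode (c : cs) = c : unescapeUnicode cs
unescapeUnicode []       = []
```

With a theta (U+03B8) as the scrutinee, the prefix "case \x3b8 of " is
10 characters long in Alice's encoding, but escapeUnicode turns it into
"case \u03b8 of ", which is 15 characters. The first alternative
therefore starts five columns further right for Bob than for Alice, so
the two of them see different alignments for the same program.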

I think it will be better if we drop \uhhhh escapes, as Simon suggested
in the first place. That would force Alice and Bob to agree on a common
source format but save us from all of these problems. If Alice wants Bob
to help her, she won't have a big problem with dropping Unicode for this
particular program (also, no code using \uhhhh escapes exists in the
real world to date, so we can assume people are able to plan for this
beforehand). In the long run, Bob might even manage to get a
Unicode-capable system, solving everyone's problem.


Sven Moritz