digit groups

Wed Oct 25 10:37:33 EDT 2006

On 2006-10-24 at 12:43PDT Ashley Yakeley wrote:
> Ketil Malde wrote:
> > Tempting to use B8 Cedilla, since it looks somewhat like
> > a comma, and is less useful for other purposes -- but
> > perhaps it would be too easily confused with a real
> > comma?

I have some dim recollection that there is an ISO (or
possibly some other standards body) standard that says that,
rather than commas or points, we should use narrow spaces
between groups of digits in numbers.  I can't find it now,
though -- can anyone? If true, this would suggest using one
of the Unicode space characters U+2006, U+2009, U+200A ...
but this would of course be a bad idea in a language that
uses whitespace for function application. Underline is much
better.

> I would advise against this until we have a bit more of a
> plan for extended characters in Haskell source. [...]

I think the original proposal -- of allowing underlines in
lieu of spaces in numbers -- is far better than using an
operator. This is a piece of light-weight convenience syntax
at a purely lexical level, and is exactly the sort of thing
that is easy to do in a language definition/compiler but
thorny if done post-hoc.

If it's an operator, what happens to hexadecimal numbers?

0xffff_3729 makes perfect sense as hex and the "_" does a
nice job of separating the digits into readable groups.

0xffff~~3729 looks similar, but doesn't mean the same thing
at all: the 3729 is an ordinary decimal literal.

0xffff~~0x3729 is ugly and probably less readable than the
unbroken form.
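To make the hexadecimal problem concrete, here is a minimal
sketch of the kind of operator under discussion. The
definition of (~~) below, a left-associative operator that
shifts its left operand by three decimal digits, is an
assumption for illustration; the proposal being answered may
have defined it differently.

```haskell
infixl 5 ~~

-- Hypothetical digit-group operator (an assumption, not the
-- proposal's actual definition): shift by three decimal digits.
(~~) :: Integer -> Integer -> Integer
a ~~ b = a * 1000 + b

-- What the programmer meant: a single hexadecimal literal.
intended :: Integer
intended = 0xffff3729

-- 3729 is a decimal literal, so this is 65535 * 1000 + 3729.
groupedDecimalTail :: Integer
groupedDecimalTail = 0xffff ~~ 3729

-- Writing both halves in hex does not help: the operator still
-- multiplies by 1000, not by 0x10000.
groupedHexTail :: Integer
groupedHexTail = 0xffff ~~ 0x3729
```

Loaded into GHCi, intended is 4294915881, while the two
grouped forms give 65538729 and 65549121. A hex-aware variant
would need a different shift, i.e. a second operator.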

There's also the (perhaps unlikely, but truly grotesque)
possibility of wanting a number like 0x3864_face, entering
0x3864~~face, and having face = 42 elsewhere in the code. Or
take decimal 124~~l24, with a lowercase "l" typed for a "1":
if you are lucky you'll get an undefined-variable message,
the same as for 124l24, but if l24 happens to be a number in
scope, you'll get no error message at all, where 124l24
would at least give "No instance for (Num (Integer -> a))".
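A sketch of how both slips compile silently, again assuming a
hypothetical operator defined as a ~~ b = a * 1000 + b. The
bindings face and l24 are invented stand-ins for unrelated
variables that happen to be in scope.

```haskell
infixl 5 ~~

-- Hypothetical digit-group operator (an assumption, not the
-- proposal's actual definition): shift by three decimal digits.
(~~) :: Integer -> Integer -> Integer
a ~~ b = a * 1000 + b

-- Unrelated bindings that happen to be in scope elsewhere.
face :: Integer
face = 42

l24 :: Integer
l24 = 7

-- Meant as the hex literal 0x3864face, but "face" is read as the
-- variable above, so this typechecks: 14436 * 1000 + 42.
oops :: Integer
oops = 0x3864 ~~ face

-- Meant as 124_124 with an "l" typed for a "1": also no error.
oops2 :: Integer
oops2 = 124 ~~ l24
```

Here oops is 14436042 rather than 946141902 (0x3864face), and
oops2 is 124007; the compiler has no reason to complain about
either.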

Furthermore, there's no way for an operator to distinguish
between a group of three digits and a group of some other
length (at compile time!), leading to such misleading-looking
presentations as 22~~40~~65.
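A minimal sketch of why, once more assuming a hypothetical
(~~) that shifts by three decimal digits: the operator only
receives values, so the written width of each group is gone
before it runs.

```haskell
infixl 5 ~~

-- Hypothetical digit-group operator (an assumption, not the
-- proposal's actual definition): shift by three decimal digits.
(~~) :: Integer -> Integer -> Integer
a ~~ b = a * 1000 + b

-- Reads like three two-digit groups, but each right operand is
-- silently widened to three digits:
-- (22 * 1000 + 40) * 1000 + 65 = 22040065.
misleading :: Integer
misleading = 22 ~~ 40 ~~ 65

-- The literals 5 and 005 denote the same value, so (~~) cannot
-- tell which one was written.
indistinguishable :: Bool
indistinguishable = (1 ~~ 5) == (1 ~~ 005)
```

A range check such as b < 1000 inside (~~) could at best fail
at run time, and would still accept 40 and 5 without comment.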

No. A small alteration to the lexical syntax for the sake of
improved readability seems perfectly justifiable as long as
it doesn't make the lexical syntax /significantly/ more
complicated or harder to learn.

So in the simplest form, we would have:

        decimal      ->  digit{[_]digit}
        octal        ->  octit{[_]octit}
        hexadecimal  ->  hexit{[_]hexit}

        integer      ->  decimal
                     |   0o octal | 0O octal
                     |   0x hexadecimal | 0X hexadecimal
        float        ->  decimal . decimal [exponent]
                     |   decimal exponent
        exponent     ->  (e | E) [+ | -] decimal
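A minimal sketch of a recognizer for the integer production
above, for illustration only. The names digitsWithSeparators,
valueIn and readInteger are hypothetical, and float is left
out. Each "_" must sit between two digits, so leading,
trailing and doubled separators are rejected.

```haskell
import Data.Char (digitToInt, isDigit, isHexDigit, isOctDigit)

-- Hypothetical helper: accept  d{[_]d}  for one class of digits,
-- i.e. one or more digits where a single '_' may appear only
-- between two digits. Returns the digits without separators.
digitsWithSeparators :: (Char -> Bool) -> String -> Maybe String
digitsWithSeparators isD (c : cs)
  | isD c = go [c] cs
  where
    go acc [] = Just (reverse acc)
    go acc (d : rest)
      | isD d = go (d : acc) rest
    go acc ('_' : d : rest)
      | isD d = go (d : acc) rest
    go _ _ = Nothing
digitsWithSeparators _ _ = Nothing

-- Value of a separator-free digit string in the given base
-- (digitToInt also handles the hex digits a-f and A-F).
valueIn :: Integer -> String -> Integer
valueIn base = foldl (\acc d -> acc * base + fromIntegral (digitToInt d)) 0

-- integer -> decimal | 0o octal | 0O octal | 0x hexadecimal | 0X hexadecimal
readInteger :: String -> Maybe Integer
readInteger ('0' : p : rest)
  | p `elem` "oO" = valueIn 8 <$> digitsWithSeparators isOctDigit rest
  | p `elem` "xX" = valueIn 16 <$> digitsWithSeparators isHexDigit rest
readInteger s = valueIn 10 <$> digitsWithSeparators isDigit s
```

For example, readInteger "12_000_000" gives Just 12000000 and
readInteger "0xffff_3729" gives Just 4294915881, while
"1__000", "1000_" and "_1" all give Nothing.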

although my preference would be something a bit more
restrictive, requiring every group after each "_" to have the
same number of digits, and the first group to be no longer
than the others (i.e. 12_000_000 and 1200_0000 would be valid
but 1247_000 would not). I'm not wedded to this requirement
(and it would take a more sophisticated grammar to formalise).
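The stricter rule is awkward as a grammar but easy to state
as a check on the groups. A sketch, with hypothetical names
groups and wellGrouped, reading the rule as "all groups after
the first have the same length, and the first is no longer
than they are", which is the reading that accepts both
12_000_000 and 1200_0000.

```haskell
-- Hypothetical helper: split a digit string at each '_'.
groups :: String -> [String]
groups s = case break (== '_') s of
  (g, [])       -> [g]
  (g, _ : rest) -> g : groups rest

-- Hypothetical check for the stricter rule: every group after
-- the first has the same length n, and the first group is
-- non-empty and no longer than n. Assumes the input already
-- matches digit{[_]digit}.
wellGrouped :: String -> Bool
wellGrouped s = case groups s of
  [_] -> True
  firstGroup : rest@(g : _) ->
    let n = length g
    in not (null firstGroup)
         && length firstGroup <= n
         && all ((== n) . length) rest
  [] -> False
```

With this reading, wellGrouped accepts "12_000_000",
"1200_0000" and an ungrouped "1234567", and rejects
"1247_000" and "1_00_000".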

I have another dim recollection that something like this was
discussed (verbally) at one of the early Haskell meetings,
but no idea what became of it. Does anyone remember?

 Jón

-- 
Jón Fairbairn                              Jon.Fairbairn at cl.cam.ac.uk