8 bit characters?

George Russell ger@tzi.de
Thu, 08 Aug 2002 17:14:41 +0200


Patryk Zadarnowski wrote
[snip]
> > 
> > But there's no guarantee that char is exactly 8 bits wide, is there? So 
> > it's appropriate to have separate types, CChar/CSChar/CUChar and 
> > Word8/Int8.
> 
> No, and neither are there any guarantees about there being *any*
> specific-width integer types at all! If you're worried about char not
> being 8 bit, it's likely that the C compiler won't provide an 8 bit
> integer, either, so Word8/Int8 doesn't solve the problem.
[snip]

There is no reason why a Haskell implementation should not provide Word8 or Int8
even if that is not supported by the underlying hardware.

Actually I remember doing something like this for MLj (ML for Java).  8 bit words
are not provided by the Java Virtual Machine (or at least, weren't then).  However
they can be provided fairly cheaply.  I think I stored them as chars, but for
operations treated them as shorts or ints with extra high junk bits.  You need to
put in extra work to clear the bits when comparing, or doing right shifts, or dividing,
but not for addition, subtraction, multiplication, other bit operations, or loading or
storing.  So it's pretty cheap.
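
A minimal sketch of that trick, written in C rather than as the MLj code
(which is not shown here).  The names w8, w8_clean and so on are made up
for illustration.  It assumes the value lives in an unsigned int whose
bits above the low 8 are junk, and masks only where the junk could leak
into the result:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical carrier type: only the low 8 bits are meaningful,
   everything above them is junk that may hold any value. */
typedef unsigned int w8;

/* Clear the junk bits, leaving a value in the range 0..255. */
static unsigned int w8_clean(w8 x) { return x & 0xFFu; }

/* No masking needed: the low 8 bits of the result depend only on
   the low 8 bits of the operands (unsigned arithmetic wraps). */
static w8 w8_add(w8 a, w8 b) { return a + b; }
static w8 w8_sub(w8 a, w8 b) { return a - b; }
static w8 w8_mul(w8 a, w8 b) { return a * b; }
static w8 w8_xor(w8 a, w8 b) { return a ^ b; }

/* Masking needed: junk bits would otherwise affect the answer. */
static bool w8_eq(w8 a, w8 b) { return w8_clean(a) == w8_clean(b); }

/* n is assumed to be less than 8. */
static w8 w8_shr(w8 a, unsigned int n) { return w8_clean(a) >> n; }

static w8 w8_div(w8 a, w8 b)
{
    assert(w8_clean(b) != 0);   /* caller must not divide by zero */
    return w8_clean(a) / w8_clean(b);
}
```

For example, w8_add(200, 100) leaves 300 in the carrier, and w8_clean
of that is 44, which is 300 mod 256.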

Manuel wrote
[snip]
> As I understand it, in ANSI C, the only freedom that an
> implementation has in choosing a concrete representation for
> "char" is to decide whether it is signed or unsigned.  In
> any case, it is going to be an 8 bit entity.
[snip]

This is false.  Look at section 5.2.4.2.1
   http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n843.htm
This says only "Their implementation-defined values shall be
equal or greater in magnitude (absolute value) to those
shown, with the same sign."  The value shown for CHAR_BIT is 8,
so CHAR_BIT must be at least 8.  It may be greater than 8.
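
As a rough illustration of what a C program can and cannot rely on, here
is a small sketch using the macros from <limits.h>.  The function names
are made up for this example.  The assertions hold on any conforming
implementation.  Whether char is exactly 8 bits has to be tested.

```c
#include <assert.h>
#include <limits.h>

/* Guaranteed minimums: these can never fail on a conforming
   implementation, whatever the actual width of char is. */
static void check_char_limits(void)
{
    assert(CHAR_BIT >= 8);
    assert(UCHAR_MAX >= 255);
    assert(SCHAR_MAX >= 127);
    assert(sizeof(char) == 1);   /* true by definition of sizeof */
}

/* Not guaranteed: returns 1 only where char happens to be an octet. */
static int char_is_octet(void)
{
    return CHAR_BIT == 8;
}
```

Code that needs an exact 8 bit quantity can check char_is_octet() (or
use an #if on CHAR_BIT) and fall back to masking where it is false.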

Also see section 5.2.1.

This is the C revision, but I am very sure it was the same with ANSI C.