New back end for GHC
simonpj at microsoft.com
Wed Feb 4 15:12:50 EST 2004
Dear GHC developers (and Cambridge CPRG members, for interest)
You may remember a message (5 Jan) from me describing the outlines of a
new back end for GHC.
| * Combine Stix and Abstract C into a single new data type Cmm (short
| for C minus minus).
| * Cmm becomes the central back end data type:
| - it can be printed as C (both registerised .hc, and
| unregisterised .c)
| - it can be printed as C--
| - it is consumed by the native code generator
| * The current run-time-system .hc files will be converted modestly to
| new syntax (basically a subset of C--) which can be read in by GHC
| * All the stuff in includes/StgMacros will become "smart constructors"
| for Cmm.
| This will mean that all routes (e.g. native code gen) will support
| all ways (e.g. profiling)
This message is just to say that the major surgery is now done (in the
sense that the compiler builds and runs), though there is lots still to do.
* We have the new data type, Cmm, settled in its own directory cmm/.
* We have a new code generator (in codeGen/ as before) that produces Cmm.
* The old absSyn/ directory is dead and gone.
* Simon M is working hard on a new native code generator + linear scan
register allocator and x86 emitter
* Wolfgang Thaller is working on the PowerPC emitter for the native code
generator.
* Don Stewart and Mark Wotton are working on the C-- printer, the C
printer (so we can still go via C), and a C-- parser (for a subset of
C-- at least).
If anyone else wants to join in, we'd be very happy. The new Cmm data
type is small, with hardly any GHC-specific parts, so it may be of more
general interest. I've sketched the data type below not because you'll
want the details (see the source for that) but because it'll give a
visceral idea of what it's like.
     [CmmStaticLit]        -- Info table, may be empty
     CLabel                -- Used to generate both info & entry labels
     [LocalReg]            -- Argument locals live on entry (C-- procedure
                           -- parameters)
     [CmmBasicBlock]       -- Code, may be empty. The first block is
                           -- the entry point. The order is otherwise
                           -- unimportant, but at some point the code gen
                           -- will fix the order.
                           --
                           -- The BlockId of the first block does not give
                           -- rise to a label. To jump to the first block
                           -- in a procedure, use the appropriate CLabel.

  -- some static data.
  | CmmData Section [CmmStaticLit]   -- constant values only
type CmmBasicBlock = BasicBlock CmmStmt
  | CmmComment FastString
  | CmmAssign CmmReg CmmExpr        -- Assign to register
  | CmmStore CmmExpr CmmExpr        -- Assign to memory location. Size is
                                    -- given by cmmExprRep of the rhs.
  | CmmBranch BlockId               -- branch to another BB in this fn
  | CmmCondBranch CmmExpr BlockId   -- conditional branch
  | CmmSwitch CmmExpr [Maybe BlockId]   -- Table branch
        -- The scrutinee is zero-based;
        --      zero -> first block
        --      one  -> second block etc
        -- Undefined outside range, and when there's a Nothing
  | CmmJump CmmExpr [LocalReg]      -- Jump to another function, with
                                    -- these parameters.
  | CmmCall                         -- A foreign call, with
     [CmmReg]                       -- zero or more results
     [CmmExpr]                      -- zero or more arguments
     (Maybe [GlobalReg])            -- Global regs that may need to be
                                    -- saved if they will be clobbered by
                                    -- the call.
                                    -- Nothing <=> save *all* globals that
                                    -- might be clobbered.
data CmmExpr
  = CmmLit CmmLit               -- Literal
  | CmmLoad CmmExpr MachRep     -- Read memory location
  | CmmReg CmmReg               -- Contents of register
  | CmmMachOp MachOp [CmmExpr]  -- Machine operation (+, -, *, etc.)
  | CmmRegOff CmmReg Int
data CmmReg
  = CmmLocal LocalReg
  | CmmGlobal GlobalReg
  deriving( Eq )
data CmmLit
  = CmmInt Integer MachRep
        -- Interpretation: the 2's complement representation of the value
        -- is truncated to the specified size. This is easier than trying
        -- to keep the value within range, because we don't know whether
        -- it will be used as a signed or unsigned value (the MachRep
        -- doesn't distinguish between signed & unsigned).
  | CmmFloat Rational MachRep
  | CmmLabel CLabel             -- Address of label
  | CmmLabelOff CLabel Int      -- Address of label + byte offset
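
Two comments in the sketch describe behaviour that may be easier to see
in code: the CmmSwitch table lookup and the truncation of CmmInt
literals. What follows is a minimal, self-contained Haskell sketch, not
code from GHC. BlockId and Width are hypothetical stand-ins (the real
BlockId and MachRep are in the GHC source), and switchTarget,
truncateLit and asSigned are names invented for this illustration.
Nothing is used here to model the cases the comments call undefined.

```haskell
import Data.Bits (shiftL, (.&.))
import Data.List (genericDrop)

-- Hypothetical stand-in for GHC's BlockId.
newtype BlockId = BlockId Int deriving (Eq, Show)

-- Resolve a zero-based scrutinee against a CmmSwitch-style table.
-- Returns Nothing when the index is outside the table or when the
-- table entry itself is Nothing (both undefined per the comments).
switchTarget :: [Maybe BlockId] -> Integer -> Maybe BlockId
switchTarget table n
  | n < 0     = Nothing
  | otherwise = case genericDrop n table of
      (entry : _) -> entry
      []          -> Nothing

-- Hypothetical width tag; an assumption standing in for MachRep.
data Width = W8 | W16 | W32 | W64 deriving (Eq, Show)

widthInBits :: Width -> Int
widthInBits W8  = 8
widthInBits W16 = 16
widthInBits W32 = 32
widthInBits W64 = 64

-- Keep only the low-order bits of the two's complement representation.
-- Integer behaves as an infinite two's complement value under (.&.).
truncateLit :: Integer -> Width -> Integer
truncateLit n w = n .&. ((1 `shiftL` widthInBits w) - 1)

-- Read the truncated bit pattern back as a signed value.
asSigned :: Integer -> Width -> Integer
asSigned n w
  | u >= half = u - 2 * half
  | otherwise = u
  where
    u    = truncateLit n w
    half = 1 `shiftL` (widthInBits w - 1)
```

Under these assumptions, truncateLit (-1) W8 and truncateLit 255 W8 give
the same bit pattern, so the literal need not commit to a signed or
unsigned reading; asSigned recovers -1 from that pattern only if a
consumer chooses the signed interpretation.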