New back end for GHC

Simon Peyton-Jones simonpj at microsoft.com
Wed Feb 4 15:12:50 EST 2004


Dear GHC developers (and Cambridge CPRG members, for interest)

You may remember a message (5 Jan) from me describing the outlines of a
new back end for GHC.

| * Combine Stix and Abstract C into a single new data type Cmm (short
|   for C minus minus).
| 
| * Cmm becomes the central back end data type:
| 	- it can be printed as C (both registerised .hc, and
| 		unregisterised .c)
| 	- it can be printed as C--
| 	- it is consumed by the native code generator
| 
| * The current run-time-system .hc files will be converted modestly to
|   new syntax (basically a subset of C--) which can be read in by GHC
|   as Cmm
| 
| * All the stuff in includes/StgMacros will become "smart constructors"
| 	for Cmm.
|    This will mean that all routes (e.g. native code gen) will support
| 	all ways (e.g. profiling)

This message is just to say that the major surgery is now done (in the
sense that the compiler builds and runs), though there is lots still to
do.
 
*  We have the new data type, Cmm, settled in its own directory cmm/.  

*  We have a new code generator (in codeGen/ as before) that produces Cmm.

*  The old absSyn/ directory is dead and gone.  

*  Simon M is working hard on a new native code generator + linear scan
   register allocator and x86 emitter

*  Wolfgang Thaller is working on the PowerPC emitter for the native code
   gen

*  Don Stewart and Mark Wotton are working on the C-- printer, the C
   printer (so we can still go via C), and a C-- parser (for a subset of
   C-- at least).


If anyone else wants to join in, we'd be very happy.  The new Cmm data
type is small, with hardly any GHC-specific parts, so it may be of more
general interest.   I've sketched the data type below not because you'll
want the details (see the source for that) but because it'll give a
visceral idea of what it's like.

Simon


data CmmTop
  = CmmProc
     [CmmStaticLit]    -- Info table, may be empty
     CLabel            -- Used to generate both info & entry labels
     [LocalReg]        -- Argument locals live on entry (C-- procedure
                       -- params)
     [CmmBasicBlock]   -- Code, may be empty.  The first block is
                       -- the entry point.  The order is otherwise
                       -- initially unimportant, but at some point the
                       -- code gen will fix the order.

                       -- The BlockId of the first block does not give
                       -- rise to a label.  To jump to the first block
                       -- in a Proc, use the appropriate CLabel.

  -- Some static data.
  | CmmData Section [CmmStaticLit]  -- constant values only

type CmmBasicBlock = BasicBlock CmmStmt

data CmmStmt
  = CmmNop
  | CmmComment FastString

  | CmmAssign CmmReg CmmExpr	 -- Assign to register

  | CmmStore CmmExpr CmmExpr     -- Assign to memory location.  Size is
                                 -- given by cmmExprRep of the rhs.

  | CmmBranch BlockId             -- branch to another BB in this fn

  | CmmCondBranch CmmExpr BlockId -- conditional branch

  | CmmSwitch CmmExpr [Maybe BlockId]   -- Table branch
	-- The scrutinee is zero-based; 
	--	zero -> first block
	--	one  -> second block etc
	-- Undefined outside range, and when there's a Nothing

  | CmmJump CmmExpr [LocalReg]    -- Jump to another function, with
                                  -- these parameters.

  | CmmCall                       -- A foreign call, with
     CmmCallTarget
     [CmmReg]                     -- zero or more results
     [CmmExpr]                    -- zero or more arguments
     (Maybe [GlobalReg])          -- Global regs that may need to be
                                  -- saved if they will be clobbered by
                                  -- the call.  Nothing <=> save *all*
                                  -- globals that might be clobbered.
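
The table-branch rule for CmmSwitch (zero-based scrutinee, undefined
outside the range or on a Nothing entry) can be sketched as a small
lookup.  This is only an illustration: BlockId here is a stand-in
newtype, switchTarget is a hypothetical helper that is not part of the
code described in this message, and Nothing in the result models the
"undefined" cases.

```haskell
import Data.List (genericDrop)

-- Stand-in for the real BlockId type (an assumption for this sketch).
newtype BlockId = BlockId Int
  deriving (Eq, Show)

-- Hypothetical helper: resolve a zero-based scrutinee against the jump
-- table carried by CmmSwitch.  A result of Nothing models the undefined
-- cases: a scrutinee outside the table, or a slot that is itself Nothing.
switchTarget :: [Maybe BlockId] -> Integer -> Maybe BlockId
switchTarget table n
  | n < 0     = Nothing
  | otherwise = case genericDrop n table of
      (slot : _) -> slot      -- in range: the slot decides
      []         -> Nothing   -- past the end of the table
```

With the table [Just (BlockId 10), Nothing, Just (BlockId 12)], a
scrutinee of 0 selects BlockId 10 and 2 selects BlockId 12, while 1, 3
and -1 all fall into the undefined case.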


data CmmExpr
  = CmmLit CmmLit               -- Literal
  | CmmLoad CmmExpr MachRep     -- Read memory location
  | CmmReg CmmReg		-- Contents of register
  | CmmMachOp MachOp [CmmExpr]  -- Machine operation (+, -, *, etc.)
  | CmmRegOff CmmReg Int	
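
CmmRegOff carries no comment above.  A minimal sketch, assuming it
denotes "contents of the register plus a constant offset", is to expand
it into the equivalent CmmMachOp addition.  All types below are cut-down
stand-ins so the fragment compiles on its own; MO_Add, the I8..I64
constructors, and expandRegOff are assumptions for illustration, and the
width is passed in explicitly rather than being derived from the
register.

```haskell
-- Cut-down stand-ins for the types in the sketch (assumptions).
data MachRep = I8 | I16 | I32 | I64 deriving (Eq, Show)
data MachOp  = MO_Add MachRep       deriving (Eq, Show)
newtype CmmReg = CmmLocal Int       deriving (Eq, Show)
data CmmLit  = CmmInt Integer MachRep deriving (Eq, Show)

data CmmExpr
  = CmmLit CmmLit
  | CmmLoad CmmExpr MachRep
  | CmmReg CmmReg
  | CmmMachOp MachOp [CmmExpr]
  | CmmRegOff CmmReg Int
  deriving (Eq, Show)

-- Hypothetical pass: rewrite every CmmRegOff into an explicit addition
-- of the register and a literal offset, at the given width.
expandRegOff :: MachRep -> CmmExpr -> CmmExpr
expandRegOff rep expr = case expr of
  CmmRegOff reg off ->
    CmmMachOp (MO_Add rep) [CmmReg reg, CmmLit (CmmInt (toInteger off) rep)]
  CmmLoad addr r    -> CmmLoad (expandRegOff rep addr) r
  CmmMachOp op args -> CmmMachOp op (map (expandRegOff rep) args)
  other             -> other   -- literals and plain registers are unchanged
```

Under that assumption, CmmRegOff is a compact form for a very common
address computation, and a consumer that does not want to handle it
specially can expand it away first.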

data CmmReg 
  = CmmLocal  LocalReg
  | CmmGlobal GlobalReg
  deriving( Eq )

data CmmLit
  = CmmInt Integer  MachRep
        -- Interpretation: the 2's complement representation of the
        -- value is truncated to the specified size.  This is easier
        -- than trying to keep the value within range, because we don't
        -- know whether it will be used as a signed or unsigned value
        -- (the MachRep doesn't distinguish between signed & unsigned).
  | CmmFloat  Rational MachRep
  | CmmLabel    CLabel          -- Address of label
  | CmmLabelOff CLabel Int      -- Address of label + byte offset
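
The CmmInt interpretation (truncate the 2's complement representation
to the given size, leaving signedness to the consumer) can be sketched
as follows.  The MachRep constructors, repBits, narrowU and narrowS are
assumptions made up for this illustration, not names taken from the
source.

```haskell
-- Stand-in for MachRep: integer widths only (an assumption).
data MachRep = I8 | I16 | I32 | I64 deriving (Eq, Show)

-- Width in bits of each stand-in representation.
repBits :: MachRep -> Int
repBits I8  = 8
repBits I16 = 16
repBits I32 = 32
repBits I64 = 64

-- Truncate to the width and read the bits as an unsigned value.
-- `mod` with a positive divisor always yields a non-negative result.
narrowU :: MachRep -> Integer -> Integer
narrowU rep x = x `mod` (2 ^ repBits rep)

-- Truncate to the width and read the same bits as a signed value.
narrowS :: MachRep -> Integer -> Integer
narrowS rep x
  | u >= half = u - 2 * half
  | otherwise = u
  where
    u    = narrowU rep x
    half = 2 ^ (repBits rep - 1)
```

For example, narrowU I8 (-1) is 255 and narrowS I8 255 is -1: both
Integers describe the same 8-bit pattern, which is why the literal can
store either one and defer the signed/unsigned choice to the operation
that consumes it.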

