10. Porting GHC

This section describes how to port GHC to a currently unsupported platform. There are two distinct possibilities:

10.1. Booting/porting from C (.hc) files

Bootstrapping GHC on a system without GHC already installed is achieved by taking the intermediate C files (known as HC files) from another GHC compilation and compiling them with gcc to get a working GHC.

NOTE: GHC versions 5.xx were hard to bootstrap from C. We recommend using GHC 6.0.1 or later.

HC files are platform-dependent, so you have to get a set that was generated on the same platform. There may be some supplied on the GHC download page; otherwise you'll have to compile some up yourself, or start from unregisterised HC files - see Section 10.2, “Porting GHC to a new architecture”.

The following steps should result in a working GHC build with full libraries:

  • Unpack the HC files on top of a fresh source tree (make sure the source tree version matches the version of the HC files exactly!). This will place matching .hc files next to the corresponding Haskell source (.hs or .lhs) in the compiler subdirectory ghc/compiler and in the libraries (subdirectories of libraries).

  • The actual build process is fully automated by the hc-build script located in the distrib directory. If you eventually want to install GHC into the directory dir, the following command will execute the whole build process (it won't install yet):

    $ distrib/hc-build --prefix=dir

    By default, the installation directory is /usr/local. If that is what you want, you may omit the argument to hc-build. Generally, any option given to hc-build is passed through to the configuration script configure. If hc-build successfully completes the build process, you can install the resulting system, as normal, with

    $ make install
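
Before running hc-build, it can be worth checking that the .hc files really did land next to the Haskell sources. The following is a minimal sketch, not part of the GHC tree: the function name missing_hc_files is an assumption made for this example. Some sources may legitimately have no .hc file, so treat its output as a hint rather than an error.

```shell
# Hypothetical helper (not shipped with GHC): list Haskell sources under a
# directory that have no .hc file with the same base name beside them.
# Prints one path per line; prints nothing if every source has a .hc file.
missing_hc_files() {
    dir=$1
    find "$dir" -type f \( -name '*.hs' -o -name '*.lhs' \) | sort |
    while IFS= read -r src; do
        hc="${src%.*}.hc"            # Foo.hs or Foo.lhs -> Foo.hc
        [ -f "$hc" ] || printf '%s\n' "$src"
    done
}
```

For example, running missing_hc_files ghc/compiler from the root of the source tree lists the compiler modules that still lack a .hc file.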

10.2. Porting GHC to a new architecture

The first step in porting to a new architecture is to get an unregisterised build working. An unregisterised build is one that compiles via vanilla C only. By contrast, a registerised build uses the following architecture-specific hacks for speed:

  • Global register variables: certain abstract machine “registers” are mapped to real machine registers, depending on how many machine registers are available (see ghc/includes/MachRegs.h).

  • Assembly-mangling: when compiling via C, we feed the assembly generated by gcc through a Perl script known as the mangler (see ghc/driver/mangler/ghc-asm.lprl). The mangler rearranges the assembly to support tail-calls and various other optimisations.

In an unregisterised build, neither of these hacks is used: the idea is that the C code generated by the compiler should compile using gcc only. The lack of these optimisations costs about a factor of two in performance, but since unregisterised compilation is usually just a step on the way to a full registerised port, we don't mind too much.

Notes on GHC portability in general: we've tried to stick to writing portable code in most parts of the system, so it should compile on any POSIXish system with gcc, but in our experience most systems differ from the standards in one way or another. Deal with any problems as they arise - if you get stuck, ask the experts on .

Lots of useful information about the innards of GHC is available in the GHC Commentary, which might be helpful if you run into some code which needs tweaking for your system.

10.2.1. Cross-compiling to produce an unregisterised GHC

NOTE! These instructions apply to GHC 6.4 and (hopefully) later. If you need instructions for an earlier version of GHC, try to get hold of the version of this document that was current at the time. It should be available from the appropriate download page on the GHC homepage.

In this section, we explain how to bootstrap GHC on a new platform, using unregisterised intermediate C files. We haven't put a great deal of effort into automating this process, for two reasons: it is done very rarely, and the process usually requires human intervention to cope with minor porting issues anyway.

The following step-by-step instructions should result in a fully working, albeit unregisterised, GHC. Firstly, you need a machine that already has a working GHC (we'll call this the host machine), in order to cross-compile the intermediate C files that we will use to bootstrap the compiler on the target machine.

  • On the target machine:

    • Unpack a source tree (preferably a released version). We will call the path to the root of this tree T.

    • $ cd T
      $ ./configure --enable-hc-boot --enable-hc-boot-unregisterised

      You might need to update configure.in to recognise the new architecture, and re-generate configure with autoreconf.

    • $ cd T/ghc/includes
      $ make
  • On the host machine:

    • Unpack a source tree (same released version). Call this directory H.

    • $ cd H
      $ ./configure
    • Create H/mk/build.mk, with the following contents:

      GhcUnregisterised = YES
      GhcLibHcOpts = -O -fvia-C -keep-hc-files
      GhcRtsHcOpts = -keep-hc-files
      GhcLibWays =
      SplitObjs = NO
      GhcWithNativeCodeGen = NO
      GhcWithInterpreter = NO
      GhcStage1HcOpts = -O
      GhcStage2HcOpts = -O -fvia-C -keep-hc-files
      SRC_HC_OPTS += -H32m
      GhcBootLibs = YES
    • Edit H/mk/config.mk:

      • change TARGETPLATFORM appropriately, and set the variables involving TARGET to the correct values for the target platform. This step is necessary because currently configure doesn't cope with specifying different values for the --host and --target flags.

      • Copy the LeadingUnderscore setting from the target.

    • Copy T/ghc/includes/ghcautoconf.h, T/ghc/includes/DerivedConstants.h, and T/ghc/includes/GHCConstants.h to H/ghc/includes. Note that we are building on the host machine, using the target machine's configuration files. This is so that the intermediate C files generated here will be suitable for compiling on the target system.

    • Touch the generated configuration files, just to make sure they don't get replaced during the build:

      $ cd H/ghc/includes
      $ touch ghcautoconf.h DerivedConstants.h GHCConstants.h mkDerivedConstants.c
      $ touch mkDerivedConstantsHdr mkDerivedConstants.o mkGHCConstants mkGHCConstants.o

      Note: it has been reported that these files still get overwritten during the next stage. We have installed a fix for this in GHC 6.4.2, but if you are building a version before that you need to watch out for these files getting overwritten by the Makefile in ghc/includes. If your system supports it, you might be able to prevent it by making them immutable:

      $ chflags uchg  ghc/includes/{ghcautoconf.h,DerivedConstants.h,GHCConstants.h}
    • Now build the compiler:

      $ cd H/glafp-utils && make boot && make
      $ cd H/ghc && make boot && make

      Don't worry if the build falls over in the RTS; we don't need the RTS yet.

    • $ cd H/libraries
      $ make boot && make
    • $ cd H/ghc/compiler
      $ make boot stage=2 && make stage=2
    • $ cd H/ghc/lib/compat
      $ make clean
      $ rm .depend
      $ make boot UseStage1=YES
      $ make -k UseStage1=YES EXTRA_HC_OPTS='-O -fvia-C -keep-hc-files'
      $ cd H/ghc/utils
      $ make clean
      $ make -k UseStage1=YES EXTRA_HC_OPTS='-O -fvia-C -keep-hc-files'
    • $ cd H
      $ make hc-file-bundle Project=Ghc
    • copy H/*-hc.tar.gz to T/...

  • On the target machine:

    At this stage we simply need to bootstrap a compiler from the intermediate C files we generated above. The process of bootstrapping from C files is automated by the script in distrib/hc-build, and is described in Section 10.1, “Booting/porting from C (.hc) files”.

    $ ./distrib/hc-build --enable-hc-boot-unregisterised

    However, since this is a bootstrap on a new machine, the automated process might not run to completion the first time. For that reason, you might want to treat the hc-build script as a list of instructions to follow, rather than as a fully automated script. This way you'll be able to restart the process part-way through if you need to fix anything on the way.

    Don't bother with running make install in the newly bootstrapped tree; just use the compiler in that tree to build a fresh compiler from scratch, this time without booting from C files. Before doing this, you might want to check that the bootstrapped compiler is generating working binaries:

    $ cat >hello.hs
    main = putStrLn "Hello World!\n"
    ^D
    $ T/ghc/compiler/ghc-inplace hello.hs -o hello
    $ ./hello
    Hello World!

    Once you have the unregisterised compiler up and running, you can use it to start a registerised port. The following sections describe the various parts of the system that will need architecture-specific tweaks in order to get a registerised build going.
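
The host-side steps above depend on the three headers copied from the target staying untouched during the build. The sketch below is one way to check that, assuming you keep a pristine copy of the target's headers in a separate directory on the host. The function name check_target_headers and both directory arguments are assumptions made for this example, not part of the GHC build system.

```shell
# Hypothetical helper (not shipped with GHC): compare the configuration
# headers taken from the target tree with the copies in the host tree.
# Prints the name of each header that is missing or differs, and returns
# non-zero if any header is reported.
check_target_headers() {
    t_inc=$1   # directory holding the pristine target headers
    h_inc=$2   # the host tree's include directory, e.g. H/ghc/includes
    status=0
    for f in ghcautoconf.h DerivedConstants.h GHCConstants.h; do
        if ! cmp -s "$t_inc/$f" "$h_inc/$f"; then
            printf '%s\n' "$f"
            status=1
        fi
    done
    return "$status"
}
```

Running it after each make step on the host shows straight away whether a Makefile has regenerated one of the headers, in which case the file needs to be copied over again before continuing.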

10.2.2. Porting the RTS

The following files need architecture-specific code for a registerised build:

ghc/includes/MachRegs.h

Defines the STG-register to machine-register mapping. You need to know your platform's C calling convention, and which registers are generally available for mapping to global register variables. There are plenty of useful comments in this file.

ghc/includes/TailCalls.h

Macros that cooperate with the mangler (see Section 10.2.3, “The mangler”) to make proper tail-calls work.

ghc/rts/Adjustor.c

Support for foreign import "wrapper" (aka foreign export dynamic). Not essential for getting GHC bootstrapped, so this file can be deferred until later if necessary.

ghc/rts/StgCRun.c

The little assembly layer between the C world and the Haskell world. See the comments and code for the other architectures in this file for pointers.

ghc/rts/MBlock.h, ghc/rts/MBlock.c

These files are really OS-specific rather than architecture-specific. MBlock.h specifies the absolute location at which the RTS should try to allocate memory on your platform (try to find an area which doesn't conflict with code or dynamic libraries). In MBlock.c you might need to tweak the call to mmap() for your OS.

10.2.3. The mangler

The mangler is an evil Perl-script (ghc/driver/mangler/ghc-asm.lprl) that rearranges the assembly code output from gcc to do two main things:

  • Remove function prologues and epilogues, and all movement of the C stack pointer. This is to support tail-calls: every code block in Haskell code ends in an explicit jump, so we don't want the C-stack overflowing while we're jumping around between code blocks.

  • Move the info table for a closure next to the entry code for that closure. In unregisterised code, info tables contain a pointer to the entry code, but in registerised compilation we arrange that the info table is shoved right up against the entry code, and addressed backwards from the entry code pointer (this saves a word in the info table and an extra indirection when jumping to the closure entry code).

The mangler is abstracted to a certain extent over some architecture-specific things such as the particular assembler directives used to herald symbols. Take a look at the definitions for other architectures and use these as a starting point.
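
As a rough illustration of the first point only, the toy filter below drops typical i386 frame setup and teardown lines from AT&T-syntax assembly read on standard input. The function name strip_frame_code and its patterns are assumptions made for this sketch; they are not taken from ghc-asm.lprl, which is a far larger script and also handles the info-table rearrangement.

```shell
# Toy illustration (not the real mangler): delete lines that set up or
# tear down a C stack frame, or that adjust the C stack pointer, from
# i386 AT&T-syntax assembly. All patterns are illustrative assumptions.
strip_frame_code() {
    grep -v \
        -e '^[[:space:]]*pushl[[:space:]]*%ebp' \
        -e '^[[:space:]]*movl[[:space:]]*%esp,[[:space:]]*%ebp' \
        -e '^[[:space:]]*popl[[:space:]]*%ebp' \
        -e '^[[:space:]]*leave' \
        -e '^[[:space:]]*subl[[:space:]]*\$[0-9]*,[[:space:]]*%esp' \
        -e '^[[:space:]]*addl[[:space:]]*\$[0-9]*,[[:space:]]*%esp'
}
```

Piping a small function's gcc -S output through the filter leaves only the body and the final jump, which is the shape the mangler aims for when it removes prologues and epilogues.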

10.2.4. The splitter

The splitter is another evil Perl script (ghc/driver/split/ghc-split.lprl). It cooperates with the mangler to support object splitting. Object splitting is what happens when the -split-objs option is passed to GHC: the object file is split into many smaller objects. This feature is used when building libraries, so that a program statically linked against the library will pull in less of the library.

The splitter has some platform-specific stuff; take a look and tweak it for your system.

10.2.5. The native code generator

The native code generator isn't essential to getting a registerised build going, but it's a desirable thing to have because it can cut compilation times in half. The native code generator is described in some detail in the GHC Commentary.

10.2.6. GHCi

To support GHCi, you need to port the dynamic linker (fptools/ghc/rts/Linker.c). The linker currently supports the ELF and PEi386 object file formats - if your platform uses one of these then things will be significantly easier. The majority of Unix platforms use the ELF format these days. Even so, there are some machine-specific parts of the ELF linker: for example, the code for resolving particular relocation types is machine-specific, so some porting of this code to your architecture will probably be necessary.

If your system uses a different object file format, then you have to write a linker — good luck!
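
To find out whether your platform's object files are ELF, you can look at the first four bytes of any object file, which for ELF are 0x7f followed by the letters "ELF". The following is a minimal sketch; the function name is_elf is an assumption made for this example, and it checks for ELF only.

```shell
# Hypothetical helper: succeed if the file starts with the ELF magic
# number (bytes 0x7f 'E' 'L' 'F'), fail otherwise.
is_elf() {
    # od prints the first four bytes in hex; tr strips the whitespace.
    magic=$(od -An -tx1 -N4 "$1" | tr -d '[:space:]')
    [ "$magic" = "7f454c46" ]
}
```

For example, compile any C file with gcc -c on the target and run is_elf on the resulting .o file; a zero exit status means the ELF parts of the linker are the place to start.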