3.8. Optimisation (code improvement)

The -O* options specify convenient ``packages'' of optimisation flags; the -f* options described later on specify individual optimisations to be turned on/off; the -m* options specify machine-specific optimisations to be turned on/off.

3.8.1. -O*: convenient ``packages'' of optimisation flags.

There are many options that affect the quality of code produced by GHC. Most people only have a general goal, something like ``Compile quickly'' or ``Make my program run like greased lightning.'' The following ``packages'' of optimisations (or lack thereof) should suffice.

Once you choose a -O* ``package,'' stick with it; don't chop and change. Switching to a different -O* option changes modules' interfaces, so you may have to recompile a large number of importing modules before your program can safely be run again (see Section 3.7.4).

No -O*-type option specified:

This is taken to mean: ``Please compile quickly; I'm not over-bothered about compiled-code quality.'' So, for example: ghc -c Foo.hs

-O or -O1:

Means: ``Generate good-quality code without taking too long about it.'' Thus, for example: ghc -c -O Main.lhs

-O2:

Means: ``Apply every non-dangerous optimisation, even if it means significantly longer compile times.''

The avoided ``dangerous'' optimisations are those that can make runtime or space worse if you're unlucky. They are normally turned on or off individually.

At the moment, -O2 is unlikely to produce better code than -O.

-O2-for-C:

Says to run GCC with -O2, which may be worth a few percent in execution speed. Don't forget -fvia-C, lest you use the native-code generator and bypass GCC altogether!

-Onot:

This option will make GHC ``forget'' any -Oish options it has seen so far. Sometimes useful; for example: make all EXTRA_HC_OPTS=-Onot.

-Ofile <file>:

For those who need absolute control over exactly what options are used (e.g., compiler writers, sometimes :-), a list of options can be put in a file and then slurped in with -Ofile.

In that file, comments run from # to the end of the line; blank lines and most whitespace are ignored.

Please ask if you are baffled and would like an example of -Ofile!

At Glasgow, we don't use a -O* flag for day-to-day work. We use -O to get respectable speed; e.g., when we want to measure something. When we want to go for broke, we tend to use -O -fvia-C -O2-for-C (and we go for lots of coffee breaks).

The easiest way to see what -O (etc.) ``really mean'' is to run with -v, then stand back in amazement. Alternatively, just look at the HsC_minus<blah> lists in the GHC driver script.

3.8.2. -f*: platform-independent flags

Flags can be turned off individually. (NB: I hope you have a good reason for doing this…) To turn off the -ffoo flag, just use the -fno-foo flag. So, for example, you can say -O2 -fno-strictness, which will then skip the strictness analyser altogether.
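As an illustration of what -fno-strictness gives up, here is a small sketch (sumAcc is a made-up name, not part of GHC). Its accumulating parameter is always demanded in the end, which is the kind of fact the strictness analyser can discover under -O; with the analyser switched off, the compiler may instead build up a chain of unevaluated additions and only collapse it when the result is finally needed. The definitions can be loaded into GHCi or compiled as part of a larger program.

```haskell
-- Hypothetical example: summing a list with an accumulating parameter.
-- The result of go is always its final accumulator, so go is strict in
-- acc. With strictness analysis (e.g. under -O) the compiler can
-- evaluate acc at each step; with -fno-strictness it may instead build
-- a chain of suspended (+) applications.
sumAcc :: [Int] -> Int
sumAcc = go 0
  where
    go acc []       = acc
    go acc (x : xs) = go (acc + x) xs
```

The program computes the same answer either way; only the amount of intermediate heap allocation is expected to differ.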

The options you are most likely to want to turn off are:

Should you wish to turn individual flags on, you are advised to use the -Ofile option, described above. Because the order in which optimisation passes are run is sometimes crucial, it's quite hard to do with command-line options.

Here are some ``dangerous'' optimisations you might want to try:

-fvia-C:

Compile via C, and don't use the native-code generator. (There are many cases when GHC does this on its own.) You might pick up a little bit of speed by compiling via C. If you use _ccall_gc_s or _casm_s, you probably have to use -fvia-C.

The lower-case incantation, -fvia-c, is synonymous.

Compiling via C will probably be slower (in compilation time) than using GHC's native code generator.

-funfolding-interface-threshold<n>:

(Default: 30) By raising or lowering this number, you can raise or lower the amount of pragmatic junk that gets spewed into interface files. (An unfolding has a ``size'' that reflects the cost in terms of ``code bloat'' of expanding that unfolding in another module. A bigger function would be assigned a bigger cost.)

-funfolding-creation-threshold<n>:

(Default: 30) This option is similar to -funfolding-interface-threshold, except that it governs unfoldings within a single module. Increasing this figure is more likely to result in longer compile times than faster code. The next option is more useful:

-funfolding-use-threshold<n>:

(Default: 8) This is the magic cut-off figure for unfolding: below this size, a function definition will be unfolded at the call-site; any bigger, and it won't. The size computed for a function is the actual size of the expression minus any discounts that apply (see -funfolding-con-discount).
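To make ``unfolded at the call-site'' concrete, here is a sketch with two made-up functions. square is about as small as a function body gets, so it is a likely candidate to fall below the use threshold; we are not claiming any particular size figure for it.

```haskell
-- Hypothetical example: a very small function. Its body is a single
-- multiplication, so its computed size is likely to be below the
-- -funfolding-use-threshold cut-off.
square :: Double -> Double
square x = x * x

-- If square is unfolded here, the compiler effectively sees
-- (r * r) * pi rather than a call to square, which avoids the call
-- and exposes the arithmetic to further optimisation.
circleArea :: Double -> Double
circleArea r = square r * pi
```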

-funfolding-con-discount<n>:

(Default: 2) If the compiler decides that it can eliminate some computation by performing an unfolding, then this is a discount that it applies to the function's size before deciding whether or not to unfold it.
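The sort of situation this discount is meant to reward looks roughly like the following sketch (orDefault and answer are hypothetical names). orDefault scrutinises its Maybe argument, and at the call in answer that argument is a known constructor, so unfolding orDefault there would let the case expression be resolved at compile time. How GHC actually scores this call is not shown here.

```haskell
-- Hypothetical example: a function that scrutinises its argument.
orDefault :: Int -> Maybe Int -> Int
orDefault d m = case m of
  Nothing -> d
  Just x  -> x

-- The argument here is a known constructor (Just 42). If orDefault is
-- unfolded at this call, the case on m can be eliminated, leaving
-- just 42; that saved computation is what the discount accounts for.
answer :: Int
answer = orDefault 0 (Just 42)
```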

OK, folks, these magic numbers `30', `8', and `2' are mildly arbitrary; they are of the ``seem to be OK'' variety. The `8' is the most critical one; it's what determines how eager GHC is about expanding unfoldings.

-funbox-strict-fields:

This option causes all constructor fields which are marked strict (i.e. ``!'') to be unboxed or unpacked if possible. For example:

data T = T !Float !Float

will create a constructor T containing two unboxed floats if the -funbox-strict-fields flag is given. This may not always be an optimisation: if, for example, the T constructor is scrutinised and the floats are passed to a non-strict function, they will have to be reboxed (this is done automatically by the compiler).

This option should only be used in conjunction with -O, in order to expose unfoldings to the compiler so the reboxing can be removed as often as possible. For example:

f :: T -> Float
f (T f1 f2) = f1 + f2

The compiler will avoid reboxing f1 and f2 by inlining + on floats, but only when -O is on.
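The two cases can be set side by side in a small sketch (sumT and fieldsT are made-up names). The first consumer uses the fields with (+) on Float, as in f above; the second hands them to a lazy container, which can only hold boxed values, so unboxed fields would have to be reboxed at that point. Compile with -O -funbox-strict-fields to get the unboxing described above.

```haskell
-- The type from the text: two strict Float fields, which
-- -funbox-strict-fields may store unboxed.
data T = T !Float !Float

-- Fields consumed by (+) on Float: with -O the compiler can inline (+)
-- and work on the unboxed values directly, so no reboxing is needed.
sumT :: T -> Float
sumT (T f1 f2) = f1 + f2

-- Hypothetical counter-example: the fields are stored in a list, which
-- is lazy and holds boxed values, so unboxed fields must be reboxed here.
fieldsT :: T -> [Float]
fieldsT (T f1 f2) = [f1, f2]
```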

Any single-constructor data type is eligible for unpacking; for example

data T = T !(Int,Int)

will store the two Ints directly in the T constructor, by flattening the pair. Multi-level unpacking is also supported:

data T = T !S
data S = S !Int !Int

will store two unboxed Int#s directly in the T constructor.
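The pair and multi-level cases can be combined into one sketch. To keep both in a single module, the pair-holding type is renamed P here (the text calls it T); sumP and sumT are made-up names. Whether the fields actually end up flattened depends on compiling with -O -funbox-strict-fields; the program's results are the same either way.

```haskell
-- Pair case (renamed from T to P for this sketch): the strict pair
-- field may be flattened so that its two Ints live directly in P.
data P = P !(Int, Int)

-- Multi-level case: S's two strict Int fields may be unboxed, and the
-- strict S field of T may itself be unpacked, putting both Int#s in T.
data S = S !Int !Int
data T = T !S

sumP :: P -> Int
sumP (P (a, b)) = a + b

sumT :: T -> Int
sumT (T (S a b)) = a + b
```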

-fsemi-tagging:

This option (which does not work with the native-code generator) tells the compiler to add extra code to test for already-evaluated values. You win if you have lots of such values during a run of your program, you lose otherwise. (And you pay in extra code space.)

We have not played with -fsemi-tagging enough to recommend it. (For all we know, it doesn't even work anymore… Sigh.)

3.8.3. -m*: platform-specific flags

Some flags only make sense for particular target platforms.

-mv8:

(SPARC machines) Means to pass the like-named option to GCC; it says to use the Version 8 SPARC instructions, notably integer multiply and divide. The similar -m* GCC options for SPARC also work, actually.

-mlong-calls:

(HPPA machines) Means to pass the like-named option to GCC. Required for Very Big modules, maybe. (Probably means you're in trouble…)

-monly-[32]-regs:

(iX86 machines) GHC tries to ``steal'' four registers from GCC, for performance reasons; it almost always works. However, when GCC is compiling some modules with four stolen registers, it will crash, probably saying:
Foo.hc:533: fixed or forbidden register was spilled.
This may be due to a compiler bug or to impossible asm
statements or clauses.
Just give some registers back with -monly-N-regs. Try `3' first, then `2'. If `2' doesn't work, please report the bug to us.

3.8.4. Code improvement by the C compiler.

The C compiler (GCC) is run with -O turned on. (It has to be, actually.)

If you want to run GCC with -O2, which may be worth a few percent in execution speed, you can give a -O2-for-C option (together with -fvia-C, so that GCC is actually used rather than the native-code generator).