Difference between revisions of "Performance/GHC"

From HaskellWiki

Revision as of 16:38, 12 January 2006

Haskell Performance Resource

Constructs:
Data Types - Functions
Overloading - FFI - Arrays
Strings - Integers - I/O
Floating point - Concurrency
Modules - Monads

Techniques:
Strictness - Laziness
Avoiding space leaks
Accumulating parameter

Implementation-Specific:
GHC - nhc98 - Hugs
Yhc - JHC

Please report any overly-slow GHC-compiled programs. Since GHC doesn't have any credible competition in the performance department these days, it's hard to say what overly-slow means, so just use your judgement! Of course, if a GHC-compiled program runs slower than the same program compiled by another compiler, then it's definitely a bug.

Use Optimisation

Optimise, using -O or -O2: this is the most basic way to make your program go faster. Compilation time will be slower, especially with -O2.

At present, -O2 is nearly indistinguishable from -O.

GHCi cannot optimise interpreted code, so when using GHCi, compile critical modules using -O or -O2, then load them into GHCi.

Measuring Performance

The first thing to do is measure the performance of your program, and find out whether all the time is being spent in the garbage collector or not. Run your program with the +RTS -sstderr option:

$ ./clausify 20 +RTS -sstderr
42,764,972 bytes allocated in the heap
 6,915,348 bytes copied during GC (scavenged)
   360,448 bytes copied during GC (not scavenged)
    36,616 bytes maximum residency (7 sample(s))
        81 collections in generation 0 (  0.07s)
         7 collections in generation 1 (  0.00s)
         2 Mb total memory in use
 INIT  time    0.00s  (  0.00s elapsed)
 MUT   time    0.65s  (  0.94s elapsed)
 GC    time    0.07s  (  0.06s elapsed)
 EXIT  time    0.00s  (  0.00s elapsed)
 Total time    0.72s  (  1.00s elapsed)
 %GC time       9.7%  (6.0% elapsed)
 Alloc rate    65,792,264 bytes per MUT second
 Productivity  90.3% of total user, 65.1% of total elapsed

This tells you how much time is being spent running the program itself (MUT time), and how much time is being spent in the garbage collector (GC time).

If your program is doing a lot of GC, then your first priority should be to check for Space Leaks using heap profiling, and then to try to reduce allocations by time and allocation profiling.

If you can't reduce the GC cost any further, then using more memory by tweaking the GC options will probably help. For example, increasing the default heap size with +RTS -H128m will reduce the number of GCs.

If your program isn't doing too much GC, then you should proceed to time and allocation profiling to see where the big hitters are.
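
To have something concrete to measure, here is a minimal sketch (not part of the original page) of two ways to sum a list. The names sumLazy and sumStrict are made up for illustration; the only library function assumed is foldl' from Data.List.

```haskell
import Data.List (foldl')

-- Lazy left fold: without optimisation this can build a long chain of
-- unevaluated thunks before the final result is demanded.
sumLazy :: [Int] -> Int
sumLazy = foldl (+) 0

-- Strict left fold: the accumulator is forced at every step, so no
-- chain of thunks is built up.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0
```

Calling either function from a main on a large list, then running the compiled program with +RTS -sstderr, should show how the allocation and GC figures differ. The exact numbers depend on the GHC version and flags, and with -O the strictness analyser may well make the two versions behave the same.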

Unboxed types

When you are really desperate for speed and want to get right down to the “raw bits”, see GHC Primitives for some information about using unboxed types.

This should be a last resort, however, since unboxed types and primitives are non-portable. Fortunately, it is usually not necessary to resort to using explicit unboxed types and primitives, because GHC's optimiser can do the work for you by inlining operations it knows about, and unboxing strict function arguments (see Performance:Strictness). Strict and unpacked constructor fields can also help a lot (see Performance:Data Types). Sometimes GHC needs a little help to generate the right code, so you might have to look at the Core output to see whether your tweaks are actually having the desired effect.
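
As an illustration of the strict and unpacked constructor fields mentioned above, here is a small sketch. The Point type and norm2 function are hypothetical names chosen for the example. The UNPACK pragma is a request to GHC rather than a guarantee, and it generally only has an effect when optimisation is on.

```haskell
-- Both fields are strict (!) and marked for unpacking, so GHC may store
-- the two Doubles directly in the constructor instead of behind pointers.
data Point = Point {-# UNPACK #-} !Double {-# UNPACK #-} !Double

-- Squared distance from the origin; ordinary, portable Haskell code.
norm2 :: Point -> Double
norm2 (Point x y) = x * x + y * y
```

The code that uses Point stays ordinary Haskell; only the data declaration carries the annotations.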

One thing that can be said for using unboxed types and primitives is that you know you're writing efficient code, rather than relying on GHC's optimiser to do the right thing, and being at the mercy of changes in GHC's optimiser down the line. This may well be important to you, in which case go for it.
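
For comparison, this is roughly what explicit unboxed code looks like. It is a minimal sketch, assuming a GHC that provides the MagicHash extension and the GHC.Exts module; sumSquares# and sumSquares are made-up names, and the code is not portable to other compilers.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#, (*#), (+#))

-- Works directly on raw machine integers (Int#); no boxes, no thunks.
sumSquares# :: Int# -> Int# -> Int#
sumSquares# x y = (x *# x) +# (y *# y)

-- Boxed wrapper: unwrap the I# constructors, compute, and re-box the result.
sumSquares :: Int -> Int -> Int
sumSquares (I# x) (I# y) = I# (sumSquares# x y)
```

Callers use sumSquares like any other function, for example sumSquares 3 4. In many cases GHC with -O would produce similar code from the plain version on Int, which is why this style is best kept as a last resort.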


Looking at the Core

(ToDo) -ddump-simpl
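
Until this section is written, here is a small function that could serve as a first thing to inspect. It is only a sketch: sumTo is a hypothetical example, and the BangPatterns extension is assumed to be available.

```haskell
{-# LANGUAGE BangPatterns #-}

-- A tight loop with a strict accumulator: sums the integers from 1 to n.
sumTo :: Int -> Int
sumTo n = go 0 1
  where
    go !acc i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)
```

Compiling the module with -O and -ddump-simpl prints the Core that GHC generates for it. For a loop like this one would hope to see the inner loop working on unboxed Int# values, but the output differs between GHC versions, so treat that as something to check rather than a promise.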