Performance

From HaskellWiki

Revision as of 08:34, 17 May 2012

Haskell Performance Resource

Constructs:
Data Types - Functions
Overloading - FFI - Arrays
Strings - Integers - I/O
Floating point - Concurrency
Modules - Monads

Techniques:
Strictness - Laziness
Avoiding space leaks
Accumulating parameter

Implementation-Specific:
GHC - nhc98 - Hugs
Yhc - JHC

Welcome to the Haskell Performance Resource, the collected wisdom on how to make your Haskell programs go faster.

1 Introduction

One question that often comes up is along the general lines of "Can I write this program in Haskell so that it performs as well as, or better than, the same program written in some other language?"

This is a difficult question to answer in general because Haskell is a language, not an implementation. Performance can only be measured relative to a specific language implementation.

Moreover, it's often not clear whether two programs which supposedly have the same functionality really do the same thing. Different languages sometimes require very different ways of expressing the same intent. Certain types of bug that are common in other languages are rare in typical Haskell programs, and vice versa, due to strong typing, automatic memory management and lazy evaluation.

Nonetheless, it is usually possible to write a Haskell program that performs as well as, or better than, the same program written in any other language. The main caveat is that you may have to modify your code significantly in order to improve its performance. Compilers such as GHC are good at eliminating layers of abstraction, but they aren't perfect, and often need some help.

There are many non-invasive techniques: compiler options, for example. Then there are techniques that require adding some small amounts of performance cruft to your program: strictness annotations, for example. If you still don't get the best performance, though, it might be necessary to resort to larger refactorings.
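As a rough illustration of the "small amounts of performance cruft" mentioned above, the sketch below shows two kinds of strictness annotation: strict constructor fields and a bang pattern on an accumulator. The names Stats, mean and sumTo are made up for this example, and the BangPatterns extension is GHC-specific. The non-invasive step would simply be compiling the same file with ghc -O2.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

import Data.List (foldl')

-- Hypothetical accumulator type. The strict fields (!) are evaluated whenever
-- a Stats value is built, so no chain of unevaluated additions accumulates.
data Stats = Stats !Int !Double

-- Mean of a list in one pass, using a strict left fold.
-- (For an empty list this divides 0 by 0 and yields NaN.)
mean :: [Double] -> Double
mean xs = total / fromIntegral count
  where
    Stats count total = foldl' step (Stats 0 0) xs
    step (Stats n s) x = Stats (n + 1) (s + x)

-- A bang pattern on the accumulator does the same job in a hand-written loop.
sumTo :: Int -> Int
sumTo limit = go 0 1
  where
    go !acc i
      | i > limit = acc
      | otherwise = go (acc + i) (i + 1)

main :: IO ()
main = do
  print (mean [1, 2, 3, 4])  -- prints 2.5
  print (sumTo 100)          -- prints 5050
```

Without the annotations GHC's strictness analysis will often reach the same result at -O2, which is why it is worth measuring before adding them.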

Sometimes the code tweaks required to get the best performance are non-portable, perhaps because they require language extensions that aren't implemented in all compilers (e.g. unboxing), or because they require using platform-specific features or libraries. This might not be acceptable in your setting.

If the worst comes to the worst, you can always write your critical code in C and use the FFI to call it. Beware of the boundaries though - marshaling data across the FFI can be expensive, and multi-language memory management can be complex and error-prone. It's usually better to stick to Haskell if possible.
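A minimal sketch of what such an FFI call can look like, assuming the standard C math library is available at link time. It binds C's cos purely for illustration; the wrapper name fastCos is made up, and the realToFrac conversions stand in for the marshalling cost the paragraph warns about.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Foreign.C.Types (CDouble (..))

-- Bind the C library's cos function. "unsafe" keeps the call overhead low and
-- is appropriate here only because cos is short and never calls back into
-- Haskell.
foreign import ccall unsafe "math.h cos"
  c_cos :: CDouble -> CDouble

-- Wrapper that converts at the boundary; for richer data (strings, structs,
-- arrays) this conversion step is where the marshalling cost appears.
fastCos :: Double -> Double
fastCos = realToFrac . c_cos . realToFrac

main :: IO ()
main = print (fastCos 0)  -- prints 1.0
```

For a function this small the Haskell version is just as fast; the technique pays off only when the C side does a substantial amount of work per call.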

2 Basic techniques

The key tool to use in making your Haskell program run faster is profiling. Profiling is provided by GHC and nhc98. There is no substitute for finding where your program's time/space is really going, as opposed to where you imagine it is going.
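The sketch below shows one way to set a program up for GHC's time profiler. The workload and the cost-centre names "squares" and "total" are made up for this example, and the build commands in the comments assume a reasonably recent GHC with the profiling libraries installed.

```haskell
module Main (main) where

import Data.List (foldl')

-- Build and run with profiling (file name Profiled.hs is an assumption):
--
--   ghc -O2 -prof -fprof-auto -rtsopts Profiled.hs
--   ./Profiled +RTS -p -RTS      (writes the time profile to Profiled.prof)
--   ./Profiled +RTS -hc -RTS     (writes a heap profile to Profiled.hp)

-- Hypothetical workload: two stages labelled with cost centres so that each
-- shows up under its own name in the profile.
squares :: Int -> [Int]
squares n = {-# SCC "squares" #-} map (\x -> x * x) [1 .. n]

total :: [Int] -> Int
total xs = {-# SCC "total" #-} foldl' (+) 0 xs

main :: IO ()
main = print (total (squares 1000))
```

The resulting .prof file lists each cost centre with its share of time and allocation, which is the evidence to consult before changing any code.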

Another point to bear in mind: By far the best way to improve a program's performance dramatically is to use better algorithms. Once profiling has thrown the spotlight on the guilty time-consumer(s), it may be better to re-think your program than to try all the tweaks listed below.
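To make the point about algorithms concrete, here is a small sketch that counts distinct elements in two ways. The function names are made up for this example; Data.Set comes from the containers package that ships with GHC.

```haskell
module Main (main) where

import Data.List (nub)
import qualified Data.Set as Set

-- Quadratic in the worst case: nub compares each element against every
-- distinct element kept so far.
distinctSlow :: [Int] -> Int
distinctSlow = length . nub

-- O(n log n): build a balanced tree of the elements and ask for its size.
distinctFast :: [Int] -> Int
distinctFast = Set.size . Set.fromList

main :: IO ()
main = do
  let xs = [x `mod` 1000 | x <- [1 .. 20000]]
  print (distinctSlow xs)  -- prints 1000
  print (distinctFast xs)  -- prints 1000
```

No amount of strictness tuning on the first version is likely to beat the second on large inputs with many distinct values, because the difference is in the complexity class rather than in the constant factor.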

Another extremely efficient way to make your program snappy is to use library code that has been Seriously Tuned By Someone Else. You might be able to write a better sorting function than the one in Data.List, but it will take you much longer than typing import Data.List.

We have chosen to organise the rest of this resource first by Haskell construct (data types, pattern matching, integers), and then within each category to describe techniques that apply across implementations, and also techniques that are specific to a certain Haskell implementation (e.g. GHC). There are some implementation-specific techniques that apply in general - those are linked from the General Implementation-Specific Techniques section below.

3 Haskell constructs

4 General techniques

5 Compiler specific techniques

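6 More information

  • There are plenty of good examples of Haskell code written for performance in The Computer Language Shootout Benchmarks (http://shootout.alioth.debian.org/).
  • And many alternatives, with discussion, on the old Haskell wiki (http://web.archive.org/web/20060209215702/http://haskell.org/hawiki/ShootoutEntry).
  • There are ~100 slides on High-Performance Haskell (http://blog.johantibell.com/2010/09/slides-from-my-high-performance-haskell.html) from the 2010 CUFP tutorial on that topic.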

7 Specific comparisons of data structures

7.1 Data.Sequence vs. lists

Data.Sequence has complexity O(log(min(i,n-i))) for access, insertion and update to position i of a sequence of length n.

A list has complexity O(i) for the same operations.

Lists are faster by a non-trivial constant factor for operations at the head (cons and head), which makes them the more efficient choice for stack-like and stream-like access patterns. Data.Sequence is faster for every other access pattern, such as queue and random access.
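As an illustration of the queue and random-access patterns where Data.Sequence is the better fit, here is a minimal sketch. The helper names enqueue and dequeue are made up for this example; the operators and view type are from Data.Sequence in the containers package.

```haskell
module Main (main) where

import Data.Sequence (Seq, ViewL (..), viewl, (|>))
import qualified Data.Sequence as Seq

-- Enqueue at the right end.
enqueue :: a -> Seq a -> Seq a
enqueue x q = q |> x

-- Dequeue from the left end; Nothing when the queue is empty.
dequeue :: Seq a -> Maybe (a, Seq a)
dequeue q = case viewl q of
  EmptyL    -> Nothing
  x :< rest -> Just (x, rest)

main :: IO ()
main = do
  let q = enqueue 3 (enqueue 2 (enqueue 1 Seq.empty)) :: Seq Int
  print (fmap fst (dequeue q))  -- oldest element first: prints Just 1
  print (Seq.index q 2)         -- random access by position: prints 3
```

A plain list can only do one end cheaply, so a list-based queue needs either an O(n) append or the two-list trick with periodic reversal.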

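The following micro-benchmark builds a million-element collection with each structure. Note that it plays to the list's strength: the list version only conses at the head and reverses once at the end, while the sequence version appends at the right end.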

import Data.Sequence

-- Append n, n-1, ..., 1 to the right end of the sequence.
insert_million 0 sequence = sequence
insert_million n sequence = insert_million (n - 1) (sequence |> n)

main = print (Data.Sequence.length (insert_million 1000000 empty))

 $ ghc -O2 --make InsertMillionElements.hs && time ./InsertMillionElements +RTS -K100M
1000000
real 0m7.238s
user 0m6.804s
sys 0m0.228s

-- Cons n, n-1, ..., 1 onto the head of the list, then reverse once at the end.
insert_million 0 list = reverse list
insert_million n list = insert_million (n - 1) (n:list)

main = print (length (insert_million 1000000 []))

 $ ghc -O2 --make InsertMillionElementsList.hs && time ./InsertMillionElementsList +RTS -K100M
1000000
real 0m0.588s
user 0m0.528s
sys 0m0.052s

Lists are substantially faster on this micro-benchmark: about 0.6 s versus 7.2 s of wall-clock time, roughly a factor of 12.

A sequence uses between 5/6 and 4/3 times as much space as the equivalent list (assuming an overhead of one word per node, as in GHC). If only deque operations are used, the space usage will be near the lower end of the range, because all internal nodes will be ternary. Heavy use of split and append will result in sequences using approximately the same space as lists. In detail:

  • a list of length n consists of n cons nodes, each occupying 3 words.
  • a sequence of length n has approximately n/(k-1) nodes, where k is the average arity of the internal nodes (each 2 or 3). There is a pointer, a size and overhead for each node, plus a pointer for each element, i.e. n(3/(k-1) + 1) words.
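The bounds quoted above follow directly from the two word counts. The sketch below evaluates the ratio for the extreme and middle values of k, using exact rational arithmetic; the function names are made up for this example.

```haskell
module Main (main) where

-- Words used by a list of n elements: n cons cells of 3 words each.
listWords :: Rational -> Rational
listWords n = 3 * n

-- Words used by a sequence of n elements with average node arity k,
-- following the estimate n(3/(k-1) + 1) from the text.
seqWords :: Rational -> Rational -> Rational
seqWords k n = n * (3 / (k - 1) + 1)

-- Space of a sequence relative to the equivalent list (n cancels out).
ratio :: Rational -> Rational
ratio k = seqWords k 1 / listWords 1

main :: IO ()
main = mapM_ (print . ratio) [3, 2.5, 2]
-- prints 5 % 6, then 1 % 1, then 4 % 3
```

All-ternary nodes (k = 3) give the lower bound of 5/6, all-binary nodes (k = 2) give the upper bound of 4/3, and an even mix (k = 2.5) matches the list exactly.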

8 Additional Tips

  • Use strict returns ( return $! ...) unless you absolutely need them lazy.
  • Profile, profile, profile - understand who is hanging on to the memory (+RTS -hc) and how it's being used (+RTS -hb).
  • Use +RTS -p to understand who's doing all the allocations and where your time is being spent.
  • Approach profiling like a science experiment: make one change and observe whether anything is different, then roll back, make another change and observe again. Keep notes!
  • Use ThreadScope to visualize GHC eventlog traces.
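The first tip can be sketched as follows, assuming a simple monadic fold; the function name sumStrict is made up for this example.

```haskell
module Main (main) where

import Control.Monad (foldM)

-- Sum a list inside IO. "return $!" forces each partial sum before the next
-- step, so the accumulator never grows into a chain of unevaluated additions.
-- Note that "return $! acc + x" parses as "return $! (acc + x)".
sumStrict :: [Int] -> IO Int
sumStrict = foldM step 0
  where
    step acc x = return $! acc + x

main :: IO ()
main = sumStrict [1 .. 1000000] >>= print  -- prints 500000500000
```

With a plain return the partial sums may be left as thunks until the final print, depending on what the optimiser manages to infer; a heap profile (+RTS -hc) is the way to check whether that is happening in a given program.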