Difference between revisions of "Performance"

From HaskellWiki
(Do not remove this benchmark code, you can move it somewhere else, but the point of it is to show actual performance, not complexity. The two data structures both can be used to cons in constant time.)
(note on Data.Sequence space usage)

Revision as of 10:05, 24 June 2008

Haskell Performance Resource

Constructs:
Data Types - Functions
Overloading - FFI - Arrays
Strings - Integers - I/O
Floating point - Concurrency
Modules - Monads

Techniques:
Strictness - Laziness
Avoiding space leaks
Accumulating parameter

Implementation-Specific:
GHC - nhc98 - Hugs
Yhc - JHC

Welcome to the Haskell Performance Resource, the collected wisdom on how to make your Haskell programs go faster.

Introduction

In most cases it is possible to write a Haskell program that performs as well as, or better than, the same program written in [insert language here]. There's a big caveat though: you may have to modify your code significantly in order to improve its performance. Compilers such as GHC are good at eliminating layers of abstraction, but they aren't perfect, and often need some help.

There are many non-invasive techniques: compiler options, for example. Then there are techniques that require adding some small amounts of performance cruft to your program: strictness annotations, for example. If you still don't get the best performance, though, it might be necessary to resort to larger refactorings.
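As an illustration of the "small amounts of performance cruft" mentioned above, here is a minimal sketch of strictness annotations. The Stats type and the accumulate and mean functions are hypothetical names made up for this example; the sketch assumes only strict constructor fields and seq, both of which are standard Haskell.

```haskell
module Main where

-- Hypothetical running-statistics record. The '!' annotations make both
-- fields strict, so they are evaluated whenever a Stats value is built.
data Stats = Stats { count :: !Int, total :: !Double }

-- Walk the list once, forcing the new accumulator with seq before
-- recursing so that no chain of unevaluated thunks builds up.
accumulate :: [Double] -> Stats
accumulate = go (Stats 0 0)
  where
    go acc []       = acc
    go acc (x : xs) =
      let acc' = Stats (count acc + 1) (total acc + x)
      in  acc' `seq` go acc' xs

-- Arithmetic mean; only meaningful for a non-empty input.
mean :: Stats -> Double
mean s = total s / fromIntegral (count s)

main :: IO ()
main = print (mean (accumulate [1 .. 1000000]))
```

Whether annotations like these actually help depends on the program and on what the optimiser already does, so it is worth measuring before and after.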

Sometimes the code tweaks required to get the best performance are non-portable, perhaps because they require language extensions that aren't implemented in all compilers (e.g. unboxing), or because they require using platform-specific features or libraries. This might not be acceptable in your setting.

If the worst comes to the worst, you can always write your critical code in C and use the FFI to call it. Beware of the boundaries though - marshaling data across the FFI can be expensive, and multi-language memory management can be complex and error-prone. It's usually better to stick to Haskell if possible.
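For a sense of what an FFI call looks like, the following sketch binds the C standard library's cos function. It assumes GHC, and it assumes the call is short and never calls back into Haskell (which is what makes the unsafe annotation appropriate here); cosViaC is a hypothetical wrapper name. The conversions through realToFrac are a very small instance of the marshaling cost mentioned above.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

import Foreign.C.Types (CDouble (..))

-- Import cos from the C math library. It is declared pure (no IO)
-- because it has no side effects.
foreign import ccall unsafe "math.h cos"
  c_cos :: CDouble -> CDouble

-- Hypothetical wrapper that converts between Haskell and C doubles.
cosViaC :: Double -> Double
cosViaC = realToFrac . c_cos . realToFrac

main :: IO ()
main = print (cosViaC 0)
```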

Basic techniques

The key tool to use in making your Haskell program run faster is profiling. Profiling is provided by GHC and nhc98. There is no substitute for finding where your program's time/space is really going, as opposed to where you imagine it is going.

Another point to bear in mind: By far the best way to improve a program's performance dramatically is to use better algorithms. Once profiling has thrown the spotlight on the guilty time-consumer(s), it may be better to re-think your program than to try all the tweaks listed below.

Another extremely efficient way to make your program snappy is to use library code that has been Seriously Tuned By Someone Else. You might be able to write a better sorting function than the one in Data.List, but it will take you much longer than typing import Data.List.

We have chosen to organise the rest of this resource first by Haskell construct (data types, pattern matching, integers). Within each category we describe techniques that apply across implementations, as well as techniques that are specific to a particular Haskell implementation (e.g. GHC). Some implementation-specific techniques apply in general; those are linked from the General Implementation-Specific Techniques section below.

Haskell constructs

General techniques

Compiler specific techniques

More information

Specific comparisons of data structures

Data.Sequence vs. lists

Data.Sequence has complexity O(log(min(i,n-i))) for access, insertion and update to position i of a sequence of length n.

A list has complexity O(i) for the same operations.

Lists are faster by a non-trivial constant factor for operations at the head (cons and head), which makes them the more efficient choice for stack-like and stream-like access patterns. Data.Sequence is faster for every other access pattern, such as queue and random access.
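The queue and random-access patterns mentioned above might look like the following sketch. It assumes the containers package that ships with GHC; enqueue and dequeue are hypothetical helper names, not part of Data.Sequence.

```haskell
module Main where

import Data.Sequence (Seq, ViewL (..), (|>))
import qualified Data.Sequence as Seq

-- Add an element at the right end of the queue.
enqueue :: a -> Seq a -> Seq a
enqueue x q = q |> x

-- Remove the element at the left end, or Nothing if the queue is empty.
dequeue :: Seq a -> Maybe (a, Seq a)
dequeue q = case Seq.viewl q of
  EmptyL    -> Nothing
  x :< rest -> Just (x, rest)

main :: IO ()
main = do
  let q = foldl (flip enqueue) Seq.empty [1 .. 10 :: Int]
  print (fmap fst (dequeue q))            -- element at the front
  print (Seq.index q 5)                   -- random access by position
  print (Seq.index (Seq.update 5 0 q) 5)  -- update, then read back
```

With a plain list, the enqueue step would need an O(n) append (or a reversal trick), and index and update would each walk the list from the head.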

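The following two programs illustrate the constant-factor difference. The first appends a million elements to the right end of a sequence; the second conses a million elements onto a list and reverses it once at the end: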

import Data.Sequence

-- Append the numbers n, n-1, ..., 1 to the right end of a sequence.
insert_million 0 sequence = sequence
insert_million n sequence = insert_million (n - 1) (sequence |> n)

main = putStrLn (show (Data.Sequence.length (insert_million 1000000 empty)))

ghc -O2 --make InsertMillionElements.hs
time ./InsertMillionElements +RTS -K100M
1000000
real 0m7.238s
user 0m6.804s
sys 0m0.228s

-- Cons the numbers n, n-1, ..., 1 onto a list, then reverse it once.
insert_million 0 list = reverse list
insert_million n list = insert_million (n - 1) (n:list)

main = putStrLn (show (length (insert_million 1000000 [])))

ghc -O2 --make InsertMillionElementsList.hs
time ./InsertMillionElementsList +RTS -K100M
1000000
real 0m0.588s
user 0m0.528s
sys 0m0.052s

Lists are substantially faster on this micro-benchmark: about 0.59 s of real time against 7.24 s for the sequence, roughly a factor of twelve.

A sequence uses between 0.75 and 1.5 times as much space as the equivalent list (assuming an overhead of one word per node, as in GHC). If only deque operations are used, the space usage will be near the lower end of the range, because all internal nodes will be ternary.

Additional Tips

  • Use strict returns (return $! ...) unless you absolutely need them lazy.
  • Prefer foldl' over foldr unless you have to use foldr.
  • Profile, profile, profile - understand who is hanging on to the memory (+RTS -hc) and how it's being used (+RTS -hb).
  • Use +RTS -p to understand who's doing all the allocations and where your time is being spent.
  • Approach profiling like a science experiment - make one change, observe whether anything is different, roll back, make another change, and observe again. Keep notes!
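The first two tips can be sketched as follows. The function names sumStrict and meanIO are hypothetical, chosen only for this illustration.

```haskell
module Main where

import Data.List (foldl')

-- foldl' forces the accumulator at every step, so the sum runs in
-- constant stack space instead of building a chain of thunks.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0

-- return $! evaluates the result to weak head normal form before it is
-- returned, so the caller receives a number rather than a suspended
-- computation that still refers to the input list.
meanIO :: [Int] -> IO Double
meanIO xs = return $! fromIntegral (sumStrict xs) / fromIntegral (length xs)

main :: IO ()
main = meanIO [1 .. 1000000] >>= print
```

Note that $! only forces the outermost constructor. For a Double that is the whole value, but for a structure such as a list or a record with lazy fields, more of it may remain unevaluated.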