Personal tools

Performance/Strings

From HaskellWiki

Jump to: navigation, search
Haskell Performance Resource

Constructs:
Data Types - Functions
Overloading - FFI - Arrays
Strings - Integers - I/O
Floating point - Concurrency
Modules - Monads

Techniques:
Strictness - Laziness
Avoiding space leaks
Accumulating parameter

Implementation-Specific:
GHC - nhc98 - Hugs
Yhc - JHC

Contents

1 Strings

Sometimes the cost of representing strings as lists of Char can be too much. In this case, you can instead use packed strings. There are a number of options:

  • The standard Data.PackedString type
  • One of the newer packed string libraries, for example Data.ByteString
  • Unboxed arrays of Word8 or Char
  • Ptrs to foreign malloced Word8 buffers

The packed string libraries have the benefit over arrays of Word8 or Char types, in that they provide the usual list-like operations.

Some interesting results for Data.ByteString are documented here. In particular, it compares FPS against the existing PackedString and [Char] functions, and is used successfully with 1 terabyte strings.

2 Example

Pete Chown asked the question:

I want to read a text file.  As an example, let's use
/usr/share/dict/words and try to print out the last line of the file.

The python version completes in around 0.05s.

2.1 Attempt 1 : [Char]

import System.IO                                                                                  
main = readFile "/usr/share/dict/words" >>= putStrLn.last.lines

Run in hugs, this program took several seconds to complete. Problem: interpreted (solution, use a Haskell compiler). Compiled, the program completes in a fairly quick 0.2s. Still, we can do better.

2.2 Attempt 2 : Packed Strings

Using fast, packed strings, we get:

import qualified Data.FastPackedString as P                                                     
import IO                                                                                       
main = P.readFile "/usr/share/dict/words" >>= P.putStr . last . P.lines

Runs in 0.063s

2.3 Attempt 3 : No Lists

Avoid splitting the file into lists at all, and just keep a single buffer (as a C programmer would perhaps do):

import qualified Data.ByteString as P
import Maybe
import IO
 
main = P.readFile "/usr/share/dict/words" >>= P.putStrLn . snd . fromJust . P.breakLast '\n'

Runs in 0.013s

2.4 Related work

An extended tutorial on using PackedStrings/ByteStrings for high performance string manipulating code is here.

A discussion of the fastest way to parse a file of numbers, comparing various approaches using ByteStrings.