Difference between revisions of "Performance/Strings"

From HaskellWiki
Jump to navigation Jump to search
m
(Take string/packed string problem from haskell-cafe@ as example)
Line 13: Line 13:
 
The packed string libraries have the benefit over arrays of Word8 or
 
The packed string libraries have the benefit over arrays of Word8 or
 
Char types, in that they provide the usual list-like operations.
 
Char types, in that they provide the usual list-like operations.
  +
  +
==Example==
  +
  +
Pete Chown asked the question:
  +
  +
I want to read a text file. As an example, let's use
  +
/usr/share/dict/words and try to print out the last line of the file.
  +
  +
The python version completes in around 0.05s.
  +
  +
=== Attempt 1 : [Char] ===
  +
  +
<haskell>
  +
import System.IO
  +
main = readFile "/usr/share/dict/words" >>= putStrLn.last.lines
  +
</haskell>
  +
  +
Run in hugs, this program took several seconds to complete. Problem:
  +
interpreted (solution, use a Haskell compiler). Compiled, the program
  +
completes in a fairly quick 0.2s. Still, we can do better.
  +
  +
=== Attempt 2 : Packed Strings ===
  +
  +
Using [http://www.cse.unsw.edu.au/~dons/fps.html fast, packed strings], we get:
  +
  +
<haskell>
  +
import qualified Data.FastPackedString as P
  +
import IO
  +
main = P.readFile "/usr/share/dict/words" >>= P.hPut stdout . last . P.lines
  +
</haskell>
  +
  +
Runs in 0.063s
  +
  +
=== Attempt 3 : No Lists ===
  +
  +
Avoid splitting the file into lists at all, and just keep a single
  +
buffer (as a C programmer would perhaps do):
  +
  +
<haskell>
  +
import qualified Data.FastPackedString as P
  +
import IO
  +
  +
main = do P.readFile "/usr/share/dict/words" >>= P.hPut stdout . snd . P.spanEnd (/='\n') . P.init
  +
putChar '\n'
  +
</haskell>
  +
  +
Runs in 0.013s

Revision as of 15:11, 19 March 2006

Haskell Performance Resource

Constructs:
Data Types - Functions
Overloading - FFI - Arrays
Strings - Integers - I/O
Floating point - Concurrency
Modules - Monads

Techniques:
Strictness - Laziness
Avoiding space leaks
Accumulating parameter

Implementation-Specific:
GHC - nhc98 - Hugs
Yhc - JHC

Strings

Sometimes the cost of representing strings as lists of Char can be too much. In this case, you can instead use packed strings. There are a number of options:

  • The standard Data.PackedString type
  • One of the newer packed string libraries, for example FastPackedString
  • Unboxed arrays of Word8 or Char
  • Ptrs to foreign malloced Word8 buffers

The packed string libraries have the benefit over arrays of Word8 or Char types, in that they provide the usual list-like operations.

Example

Pete Chown asked the question:

I want to read a text file.  As an example, let's use
/usr/share/dict/words and try to print out the last line of the file.

The python version completes in around 0.05s.

Attempt 1 : [Char]

import System.IO                                                                                  
main = readFile "/usr/share/dict/words" >>= putStrLn.last.lines

Run in hugs, this program took several seconds to complete. Problem: interpreted (solution, use a Haskell compiler). Compiled, the program completes in a fairly quick 0.2s. Still, we can do better.

Attempt 2 : Packed Strings

Using fast, packed strings, we get:

import qualified Data.FastPackedString as P                                                     
import IO                                                                                       
main = P.readFile "/usr/share/dict/words" >>= P.hPut stdout . last . P.lines

Runs in 0.063s

Attempt 3 : No Lists

Avoid splitting the file into lists at all, and just keep a single buffer (as a C programmer would perhaps do):

import qualified Data.FastPackedString as P                                                     
import IO

main = do P.readFile "/usr/share/dict/words" >>= P.hPut stdout . snd . P.spanEnd (/='\n') . P.init
          putChar '\n'

Runs in 0.013s