Personal tools

Performance/Strings

From HaskellWiki

< Performance(Difference between revisions)
Jump to: navigation, search
(Packed strings are a performance-centric alternative to Strings)
 
(Change Data.FastPackedString do Data.ByteString)
 
(13 intermediate revisions by 6 users not shown)
Line 1: Line 1:
''Strings''
+
{{Performance infobox}}
  +
[[Category:Performance|Strings]]
  +
==Strings==
   
 
Sometimes the cost of representing strings as lists of ''Char'' can be
 
Sometimes the cost of representing strings as lists of ''Char'' can be
 
too much. In this case, you can instead use packed strings. There are a
 
too much. In this case, you can instead use packed strings. There are a
 
number of options:
 
number of options:
* The standard Data.PackedString type
+
* One of the newer packed string libraries, for example [http://www.cse.unsw.edu.au/~dons/fps.html FastPackedString]
+
* One of the packed string libraries, for example [http://www.cse.unsw.edu.au/~dons/fps.html Data.ByteString]
* Unboxed arrays of Word8 or Char
+
* Unboxed arrays of Word8 or Char
* Ptrs to foreign malloced Word8 buffers
+
* Ptrs to foreign malloced Word8 buffers
   
 
The packed string libraries have the benefit over arrays of Word8 or
 
The packed string libraries have the benefit over arrays of Word8 or
Char types, in that the provide the usual list-like operations.
+
Char types, in that they provide the usual list-like operations.
  +
  +
Some interesting results for Data.ByteString are documented
  +
[http://www.cse.unsw.edu.au/~dons/code/fps/README here]. In particular,
  +
it compares FPS against the existing PackedString and [Char] functions,
  +
and is used successfully with 1 terabyte strings.
  +
  +
==Example==
  +
  +
Pete Chown asked the question:
  +
  +
I want to read a text file. As an example, let's use
  +
/usr/share/dict/words and try to print out the last line of the file.
  +
  +
The python version completes in around 0.05s.
  +
  +
=== Attempt 1 : [Char] ===
  +
  +
<haskell>
  +
import System.IO
  +
main = readFile "/usr/share/dict/words" >>= putStrLn.last.lines
  +
</haskell>
  +
  +
Run in hugs, this program took several seconds to complete. Problem:
  +
interpreted (solution, use a Haskell compiler). Compiled, the program
  +
completes in a fairly quick 0.2s. Still, we can do better.
  +
  +
=== Attempt 2 : Data.ByteString ===
  +
  +
Using [http://www.cse.unsw.edu.au/~dons/fps.html ByteString], we get:
  +
  +
<haskell>
  +
import qualified Data.ByteString as B
  +
import IO
  +
main = B.readFile "/usr/share/dict/words" >>= B.putStr . last . B.lines
  +
</haskell>
  +
  +
Runs in 0.063s
  +
  +
=== Attempt 3 : No Lists ===
  +
  +
Avoid splitting the file into lists at all, and just keep a single
  +
buffer (as a C programmer would perhaps do):
  +
  +
<haskell>
  +
import qualified Data.ByteString as P
  +
import Maybe
  +
import IO
  +
  +
main = P.readFile "/usr/share/dict/words" >>= P.putStrLn . snd . fromJust . P.breakLast '\n'
  +
</haskell>
  +
  +
Runs in 0.013s
  +
  +
=== Related work ===
  +
  +
An extended tutorial on using PackedStrings/ByteStrings for high performance string manipulating code is [[Wc|here]].
  +
  +
[http://groups.google.com/group/fa.haskell/browse_thread/thread/4133fa71ce97eb0e/fef34d1c3943bbe0#fef34d1c3943bbe0 A discussion] of the fastest way to parse a file of numbers, comparing various approaches using ByteStrings.

Latest revision as of 15:59, 31 August 2009

Haskell Performance Resource

Constructs:
Data Types - Functions
Overloading - FFI - Arrays
Strings - Integers - I/O
Floating point - Concurrency
Modules - Monads

Techniques:
Strictness - Laziness
Avoiding space leaks
Accumulating parameter

Implementation-Specific:
GHC - nhc98 - Hugs
Yhc - JHC

Contents

[edit] 1 Strings

Sometimes the cost of representing strings as lists of Char can be too much. In this case, you can instead use packed strings. There are a number of options:

  • One of the packed string libraries, for example Data.ByteString
  • Unboxed arrays of Word8 or Char
  • Ptrs to foreign malloced Word8 buffers

The packed string libraries have the benefit over arrays of Word8 or Char types, in that they provide the usual list-like operations.

Some interesting results for Data.ByteString are documented here. In particular, it compares FPS against the existing PackedString and [Char] functions, and is used successfully with 1 terabyte strings.

[edit] 2 Example

Pete Chown asked the question:

I want to read a text file.  As an example, let's use
/usr/share/dict/words and try to print out the last line of the file.

The python version completes in around 0.05s.

[edit] 2.1 Attempt 1 : [Char]

import System.IO                                                                                  
main = readFile "/usr/share/dict/words" >>= putStrLn.last.lines

Run in hugs, this program took several seconds to complete. Problem: interpreted (solution, use a Haskell compiler). Compiled, the program completes in a fairly quick 0.2s. Still, we can do better.

[edit] 2.2 Attempt 2 : Data.ByteString

Using ByteString, we get:

import qualified Data.ByteString as B                                                     
import IO                                                                                       
main = B.readFile "/usr/share/dict/words" >>= B.putStr . last . B.lines

Runs in 0.063s

[edit] 2.3 Attempt 3 : No Lists

Avoid splitting the file into lists at all, and just keep a single buffer (as a C programmer would perhaps do):

import qualified Data.ByteString as P
import Maybe
import IO
 
main = P.readFile "/usr/share/dict/words" >>= P.putStrLn . snd . fromJust . P.breakLast '\n'

Runs in 0.013s

[edit] 2.4 Related work

An extended tutorial on using PackedStrings/ByteStrings for high performance string manipulating code is here.

A discussion of the fastest way to parse a file of numbers, comparing various approaches using ByteStrings.