[Haskell-cafe] Re: String vs ByteString

Daniel Fischer daniel.is.fischer at web.de
Fri Aug 13 12:55:49 EDT 2010


On Friday 13 August 2010 17:57:36, Bryan O'Sullivan wrote:
>    3. Some commonly used functions, such as substring searching, are
> *way* faster than their ByteString counterparts.

That's an unfortunate example. Using the stringsearch package, substring 
searching in ByteStrings was considerably faster than in Data.Text in my 
tests.
Replacing substrings was more lopsided still: the ByteString version beat 
Data.Text by a factor of 10-65 and had a much smaller memory footprint.
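
The sources of the two benchmark programs are not part of this message. As a
rough sketch only, a replace benchmark of this kind could look like the
following. It assumes the stringsearch, bytestring and text packages; the
function names and the "bytes"/"text" mode argument are hypothetical, and the
runs below used two separate executables rather than one with a mode switch.

```haskell
module Main (main) where

import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Search as Search -- stringsearch package
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.IO as TLIO
import System.Environment (getArgs)

-- | Replace every occurrence of the pattern in a lazy ByteString.
-- The pattern and substitution are packed as strict ByteStrings;
-- BC.pack truncates characters above 255, so ASCII arguments are assumed.
replaceBytes :: String -> String -> BL.ByteString -> BL.ByteString
replaceBytes pat sub = Search.replace (BC.pack pat) (BC.pack sub)

-- | The same operation on lazy Text. TL.replace rejects an empty pattern,
-- so a non-empty pattern is assumed here.
replaceText :: String -> String -> TL.Text -> TL.Text
replaceText pat sub = TL.replace (TL.pack pat) (TL.pack sub)

main :: IO ()
main = do
  args <- getArgs
  case args of
    ["bytes", file, pat, sub] ->
      BL.readFile file >>= BL.putStr . replaceBytes pat sub
    ["text", file, pat, sub] ->
      TLIO.readFile file >>= TLIO.putStr . replaceText pat sub
    _ -> putStrLn "usage: bench (bytes|text) FILE PATTERN SUBSTITUTION"
```

Both branches stream the file lazily and write the result to stdout, so
redirecting to /dev/null measures the search and replace work rather than
terminal output.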

stringsearch (Data.ByteString.Lazy.Search):

$ ./bmLazy +RTS -s -RTS ../../bigfile Gutenberg Hutzenzwerg > /dev/null
./bmLazy ../../bigfile Gutenberg Hutzenzwerg +RTS -s
      92,045,816 bytes allocated in the heap
          31,908 bytes copied during GC
         103,368 bytes maximum residency (1 sample(s))
          39,992 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:   158 collections,     0 parallel,  0.01s,  0.00s elapsed
  Generation 1:     1 collections,     0 parallel,  0.00s,  0.00s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.07s  (  0.17s elapsed)
  GC    time    0.01s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    0.08s  (  0.17s elapsed)

  %GC time      10.5%  (2.1% elapsed)

  Alloc rate    1,353,535,321 bytes per MUT second

  Productivity  89.5% of total user, 40.1% of total elapsed

Data.Text.Lazy:

$ ./textLazy +RTS -s -RTS ../../bigfile Gutenberg Hutzenzwerg > /dev/null
./textLazy ../../bigfile Gutenberg Hutzenzwerg +RTS -s
   4,916,133,652 bytes allocated in the heap
       6,721,496 bytes copied during GC
      12,961,776 bytes maximum residency (58 sample(s))
      12,788,968 bytes maximum slop
              39 MB total memory in use (1 MB lost due to fragmentation)

  Generation 0:  8774 collections,     0 parallel,  0.70s,  0.73s elapsed
  Generation 1:    58 collections,     0 parallel,  0.03s,  0.03s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    9.87s  ( 10.23s elapsed)
  GC    time    0.73s  (  0.75s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   10.60s  ( 10.99s elapsed)

  %GC time       6.9%  (6.9% elapsed)

  Alloc rate    497,956,181 bytes per MUT second

bigfile is a ~75 MB file.
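
For reference, the ratios implied by the two runs above can be worked out
directly from the reported figures. A small sketch (the helper name is
illustrative; the numbers are copied from the +RTS -s output):

```haskell
import Text.Printf (printf)

-- | How many times larger the Data.Text figure is than the ByteString one.
ratio :: Double -> Double -> Double
ratio textFigure bsFigure = textFigure / bsFigure

main :: IO ()
main = do
  printf "elapsed time:  %.1fx\n" (ratio 10.99 0.17)           -- Total time, elapsed
  printf "CPU time:      %.1fx\n" (ratio 10.60 0.08)           -- Total time, user
  printf "allocation:    %.1fx\n" (ratio 4916133652 92045816)  -- bytes allocated in the heap
  printf "memory in use: %.1fx\n" (ratio 39 2)                 -- MB total memory in use
```

On these figures the elapsed-time ratio comes out at roughly 65, the
allocation ratio at roughly 53, and the memory-in-use ratio at just under 20.
The CPU-time ratio is larger still, but 0.08s is too short a run to read
precisely.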


The point that Text offers the more adequate API for text manipulation
stands, of course.

Cheers,
Daniel

