[Haskell-cafe] Streaming bytes and performance

Don Stewart dons00 at gmail.com
Tue Mar 19 21:32:58 CET 2013


Oh, I forgot the technique of inlining the lazy bytestring chunks, and
processing each chunk seperately.

$ time ./fast
4166680
./fast  1.25s user 0.07s system 99% cpu 1.325 total

Essentially inline Lazy.foldlChunks and specializes is (the inliner should
really get that).
And now we have a nice unboxed inner loop, which llvm might spot:

$ ghc -O2 -funbox-strict-fields fast.hs  --make -fllvm
$ time ./fast
4166680
./fast  1.07s user 0.06s system 98% cpu *1.146 total*

So about 8x faster. Waiting for some non-lazy bytestring benchmarks... :)

{-# LANGUAGE BangPatterns #-}

import Data.ByteString.Internal
import Data.ByteString.Unsafe
import qualified Data.ByteString.Char8      as S
import qualified Data.ByteString.Lazy.Char8 as L
import qualified Data.ByteString.Lazy.Internal as L
import System.IO.Posix.MMap.Lazy

main = do
    f <- unsafeMMapFile "test.txt"
    print . new 0 $ L.toChunks f

new :: Int -> [ByteString] -> Int
new i [] = i
new i (x:xs) = new (add i x) xs

-- jump into the fast path
{-# INLINE add #-}
add :: Int -> ByteString -> Int
add !i !s | S.null s   = i
          | isSpace' x = add (i+1) xs
          | otherwise  = add i     xs
  where T x xs = uncons s

data T = T !Char ByteString

uncons s = T (w2c (unsafeHead s)) (unsafeTail s)

isSpace' c = c == '\n'    || c == ' '
{-# INLINE isSpace' #-}




On Tue, Mar 19, 2013 at 7:36 PM, Don Stewart <dons00 at gmail.com> wrote:

> Just for fun. Here's some improvements. about 6x faster.
> I'd be interested to see what io-streams could do on this.
>
> Using a 250M test file.
>
> -- strict state monad and bang patterns on the uncons and accumulator
> argument:
>
> $ time ./A
> 4166680
> ./A  8.42s user 0.57s system 99% cpu 9.037 total
>
> -- just write a loop
>
> $ time ./A
> 4166680
> ./A  3.84s user 0.26s system 99% cpu 4.121 total
>
> -- switch to Int
>
> $ time ./A
> 4166680
> ./A  1.89s user 0.23s system 99% cpu 2.134 total
>
> -- custom isSpace function
>
> $ time ./A
> 4166680
> ./A  1.56s user 0.24s system 99% cpu 1.808 total
>
> -- mmap IO
>
> $ time ./A
> 4166680
> ./A  1.54s user 0.09s system 99% cpu 1.636 total
>
> Here's the final program:
>
>
> {-# LANGUAGE BangPatterns #-}
>
> import qualified Data.ByteString        as S
> import qualified Data.ByteString.Lazy.Char8 as L
> import System.IO.Posix.MMap.Lazy
>
> main = do
>     f <- unsafeMMapFile "test.txt"
>     print $ go 0 f
>   where
>     go :: Int -> L.ByteString -> Int
>     go !a !s = case L.uncons s of
>         Nothing     -> a
>         Just (x,xs) | isSpaceChar8 x -> go (a+1) xs
>                     | otherwise      -> go a     xs
>
> isSpaceChar8 c = c == '\n'    || c == ' '
> {-# INLINE isSpaceChar8 #-}
>
>
> On Mon, Mar 18, 2013 at 8:53 AM, Konstantin Litvinenko <
> to.darkangel at gmail.com> wrote:
>
>> Hi All!
>>
>> I tune my toy project for performance and hit the wall on simple, in
>> imperative world, task. Here is the code that model what I'm trying to
>> achieve
>>
>> import qualified Data.ByteString.Lazy as L
>> import Data.Word8(isSpace)
>> import Data.Word
>> import Control.Monad.State
>>
>> type Stream = State L.ByteString
>>
>> get_byte :: Stream (Maybe Word8)
>> get_byte = do
>>     s <- get
>>     case L.uncons s of
>>         Nothing -> return Nothing
>>         Just (x, xs) -> put xs >> return (Just x)
>>
>> main = do
>>     f <- L.readFile "test.txt"
>>     let r = evalState count_spaces f
>>     print r
>>   where
>>     count_spaces = go 0
>>       where
>>         go a = do
>>             x <- get_byte
>>             case x of
>>                 Just x' ->  if isSpace x' then go (a + 1) else go a
>>                 Nothing -> return a
>>
>> It takes the file and count spaces, in imperative way, consuming bytes
>> one by one. The problem is: How to rewrite this to get rid of constant
>> allocation of state but still working with stream of bytes? I can rewrite
>> this as one-liner L.foldl, but that doesn't help me in any way to optimize
>> my toy project where all algorithms build upon consuming stream of bytes.
>>
>> PS. My main lang is C++ over 10 years and I only learn Haskell :)
>>
>>
>> ______________________________**_________________
>> Haskell-Cafe mailing list
>> Haskell-Cafe at haskell.org
>> http://www.haskell.org/**mailman/listinfo/haskell-cafe<http://www.haskell.org/mailman/listinfo/haskell-cafe>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20130319/d42d3289/attachment.htm>


More information about the Haskell-Cafe mailing list