[Haskell-cafe] Streaming bytes and performance

Don Stewart dons00 at gmail.com
Tue Mar 19 20:36:03 CET 2013


Just for fun. Here's some improvements. about 6x faster.
I'd be interested to see what io-streams could do on this.

Using a 250M test file.

-- strict state monad and bang patterns on the uncons and accumulator
argument:

$ time ./A
4166680
./A  8.42s user 0.57s system 99% cpu 9.037 total

-- just write a loop

$ time ./A
4166680
./A  3.84s user 0.26s system 99% cpu 4.121 total

-- switch to Int

$ time ./A
4166680
./A  1.89s user 0.23s system 99% cpu 2.134 total

-- custom isSpace function

$ time ./A
4166680
./A  1.56s user 0.24s system 99% cpu 1.808 total

-- mmap IO

$ time ./A
4166680
./A  1.54s user 0.09s system 99% cpu 1.636 total

Here's the final program:


{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString        as S
import qualified Data.ByteString.Lazy.Char8 as L
import System.IO.Posix.MMap.Lazy

main = do
    f <- unsafeMMapFile "test.txt"
    print $ go 0 f
  where
    go :: Int -> L.ByteString -> Int
    go !a !s = case L.uncons s of
        Nothing     -> a
        Just (x,xs) | isSpaceChar8 x -> go (a+1) xs
                    | otherwise      -> go a     xs

isSpaceChar8 c = c == '\n'    || c == ' '
{-# INLINE isSpaceChar8 #-}


On Mon, Mar 18, 2013 at 8:53 AM, Konstantin Litvinenko <
to.darkangel at gmail.com> wrote:

> Hi All!
>
> I tune my toy project for performance and hit the wall on simple, in
> imperative world, task. Here is the code that model what I'm trying to
> achieve
>
> import qualified Data.ByteString.Lazy as L
> import Data.Word8(isSpace)
> import Data.Word
> import Control.Monad.State
>
> type Stream = State L.ByteString
>
> get_byte :: Stream (Maybe Word8)
> get_byte = do
>     s <- get
>     case L.uncons s of
>         Nothing -> return Nothing
>         Just (x, xs) -> put xs >> return (Just x)
>
> main = do
>     f <- L.readFile "test.txt"
>     let r = evalState count_spaces f
>     print r
>   where
>     count_spaces = go 0
>       where
>         go a = do
>             x <- get_byte
>             case x of
>                 Just x' ->  if isSpace x' then go (a + 1) else go a
>                 Nothing -> return a
>
> It takes the file and count spaces, in imperative way, consuming bytes one
> by one. The problem is: How to rewrite this to get rid of constant
> allocation of state but still working with stream of bytes? I can rewrite
> this as one-liner L.foldl, but that doesn't help me in any way to optimize
> my toy project where all algorithms build upon consuming stream of bytes.
>
> PS. My main lang is C++ over 10 years and I only learn Haskell :)
>
>
> ______________________________**_________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/**mailman/listinfo/haskell-cafe<http://www.haskell.org/mailman/listinfo/haskell-cafe>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20130319/d7edeede/attachment.htm>


More information about the Haskell-Cafe mailing list