[Haskell-cafe] Fast number parsing with strict bytestrings [Was: Re: Seemingly subtle change causes large performance variation]

Donald Bruce Stewart dons at cse.unsw.edu.au
Thu Jun 7 21:59:14 EDT 2007


dons:
> mdanish:
> > Hello,
> > 
> > I've been playing with the INTEST problem on SPOJ which demonstrates
> > the ability to write a program which processes large quantities of
> > input data.  http://www.spoj.pl/problems/INTEST/
>   
> > But when I make a slight modification, the program chews up a ton more memory
> > and takes more time:
> > 
> > import Control.Monad
> > import Data.Maybe
> > import qualified Data.ByteString.Char8 as B
> > 
> > divisibleBy :: Int -> Int -> Bool
> > a `divisibleBy` n = a `rem` n == 0
> > 
> > main :: IO ()
> > main = do
> >     [n,k] <- (map int . B.split ' ') `fmap` B.getLine :: IO [Int]
> > 
> >     let
> >         doLine :: Int -> Int -> IO Int
> >         doLine r _ = B.getLine >>= return . testDiv r
> >         -- 'return' moved here      ^^
> 


And just following up with some GC statistics:

Original,

    95% cpu 1.668 total

    <<ghc: 258766440 bytes,
           452 GCs,
           3036/3036 avg/max bytes residency (1 samples), 
           3M in use, 0.00 INIT (0.00 elapsed), 
           1.51 MUT (1.63 elapsed), 
           0.01 GC (0.03 elapsed) :ghc>>

Too lazy:

    96% cpu 4.219 total

    <<ghc: 278683532 bytes,
           495 GCs,
   -->     14729345/52642396 avg/max bytes residency (7 samples),
   -->     85M in use,
           0.00 INIT (0.00 elapsed),
           1.68 MUT (1.81 elapsed),
   -->     2.07 GC (2.36 elapsed) :ghc>>

    (clear space leak)

Fixing above program with $!:

    94% cpu 1.656 total
    <<ghc: 257394052 bytes
           451 GCs,
    -->    2288/2288 avg/max bytes residency (1 samples),
    -->    1M in use,
           0.00 INIT (0.00 elapsed),
           1.49 MUT (1.64 elapsed),
           0.01 GC (0.01 elapsed) :ghc>>

Using lazy bytestrings for pure processing:

    90% cpu 1.424 total
    <<ghc: 219403252 bytes,
           410 GCs,
           70527/74236 avg/max bytes residency (10 samples),
           2M in use,
           0.00 INIT (0.00 elapsed), 
           1.25 MUT (1.40 elapsed),
           0.01 GC (0.01 elapsed) :ghc>>

And the killer strict chunk parser:

    78% cpu 0.327 total
    <<ghc: 20685092 bytes,
    -->    38 GCs,
    -->    81348/81348 avg/max bytes residency (1 samples),
           2M in use,
           0.00 INIT (0.00 elapsed),
    -->    0.21 MUT (0.32 elapsed),
           0.00 GC (0.00 elapsed) :ghc>>

Very little data shuffled around in the last one.

-- Don


More information about the Haskell-Cafe mailing list