[Haskell-cafe] performance question

Bob Ippolito bob at redivi.com
Wed Feb 13 06:16:49 CET 2013


On Tuesday, February 12, 2013, wrote:

> On Tue, 12 Feb 2013 15:57:37 -0700
> Nicolas Bock <nicolasbock at gmail.com <javascript:;>> wrote:
>
> > >  Here is haskell version that is faster than python, almost as fast as
> c++.
> > > You need to install bytestring-lexing package for readDouble.
>
>
> I was hoping Branimir could comment on how the improvements were allocated.
>
> how much is due to text.regex.pcre (which looks to be a wrapper to
> libpcre) ?
>
> how much can be attributed to using data.bytestring ?
>
> you have to admit, it's amazing how well a byte-compiled, _dynamically
> typed_ interpreter can do against an actualy native code compiler.  Can't
> regex be done effectively in haskell ?  Is it something that can't be done,
> or is it just such minimal effort to link to pcre that it's not worth the
> trouble ?


I think that there are two bottlenecks: the regex engine, and converting a
bytestring to a double. There doesn't appear to be a fast and accurate
strtod implementation for Haskell, and the faster regex implementations
that I could find appear to be unmaintained.


>
>
> Brian
>
> > > import Text.Regex.PCRE
> > > import Data.Maybe
> > > import Data.Array.IO
> > > import Data.Array.Unboxed
> > > import qualified Data.ByteString.Char8 as B
> > > import Data.ByteString.Lex.Double (readDouble)
> > >
> > > strataBounds :: UArray Int Double
> > > strataBounds = listArray (0,10) [ 0.0, 1.0e-8, 1.0e-7, 1.0e-6, 1.0e-5,
> > > 1.0e-4, 1.0e-3, 1.0e-2, 1.0e-1, 1.0, 2.0 ]
> > >
> > > newStrataCounts :: IO(IOUArray Int Int)
> > > newStrataCounts = newArray (bounds strataBounds) 0
> > >
> > > main = do
> > >     l <- B.getContents
> > >     let a = B.lines l
> > >     strataCounts <- newStrataCounts
> > >     n <- calculate strataCounts a 0
> > >     let
> > >         printStrataCounts :: IO ()
> > >         printStrataCounts = do
> > >             let s = round $ sqrt (fromIntegral n::Double) :: Int
> > >             printf "read %d matrix elements (%dx%d = %d)\n" n s s n
> > >             printStrataCounts' 0 0
> > >         printStrataCounts' :: Int -> Int -> IO ()
> > >         printStrataCounts' i total
> > >             | i < (snd $ bounds strataBounds) = do
> > >                 count <- readArray strataCounts i
> > >                 let
> > >                     p :: Double
> > >                     p = (100.0*(fromIntegral count) ::
> > > Double)/(fromIntegral n :: Double)
> > >                 printf "[%1.2e, %1.2e) = %i (%1.2f%%) %i\n"
> (strataBounds
> > > ! i) (strataBounds ! (i+1))
> > >                                                                 count p
> > > (total + count)
> > >                 printStrataCounts' (i+1) (total+count)
> > >             | otherwise = return ()
> > >     printStrataCounts
> > >
> > > calculate :: IOUArray Int Int -> [B.ByteString] -> Int -> IO Int
> > > calculate _ [] n = return n
> > > calculate counts (l:ls) n = do
> > >     let
> > >         a = case getAllTextSubmatches $ l =~ B.pack "matrix.*=
> > > ([0-9eE.+-]+)$" :: [B.ByteString] of
> > >                 [_,v] -> Just (readDouble v) :: Maybe (Maybe
> > > (Double,B.ByteString))
> > >                 _ -> Nothing
> > >         b = (fst.fromJust.fromJust) a
> > >         loop :: Int -> IO()
> > >         loop i
> > >             | i < (snd $ bounds strataBounds) =
> > >                 if (b >= (strataBounds ! i)) && (b < (strataBounds !
> > > (i+1)))
> > >                 then do
> > >                     c <- readArray counts i
> > >                     writeArray counts i (c+1)
> > >                 else
> > >                     loop (i+1)
> > >             | otherwise = return ()
> > >     if isNothing a
> > >         then
> > >             calculate counts ls n
> > >         else do
> > >             loop 0
> > >             calculate counts ls (n+1)
> > >
> > >
> > > ------------------------------
> > > From: nicolasbock at gmail.com <javascript:;>
> > > Date: Fri, 8 Feb 2013 12:26:09 -0700
> > > To: haskell-cafe at haskell.org <javascript:;>
> > > Subject: [Haskell-cafe] performance question
> > >
> > > Hi list,
> > >
> > > I wrote a script that reads matrix elements from standard input, parses
> > > the input using a regular expression, and then bins the matrix
> elements by
> > > magnitude. I wrote the same script in python (just to be sure :) ) and
> find
> > > that the python version vastly outperforms the Haskell script.
> > >
> > > To be concrete:
> > >
> > > $ time ./createMatrixDump.py -N 128 | ./printMatrixDecay
> > > real    0m2.655s
> > > user    0m2.677s
> > > sys     0m0.095s
> > >
> > > $ time ./createMatrixDump.py -N 128 | ./printMatrixDecay.py -
> > > real    0m0.445s
> > > user    0m0.615s
> > > sys     0m0.032s
> > >
> > > The Haskell script was compiled with "ghc --make printMatrixDecay.hs".
> > >
> > > Could you have a look at the script and give me some pointers as to
> where
> > > I could improve it, both in terms of performance and also generally,
> as I
> > > am very new to Haskell.
> > >
> > > Thanks already,
> > >
> > > nick
> > >
> > >
> > > _______________________________________________ Haskell-Cafe mailing
> list
> > > Haskell-Cafe at haskell.org <javascript:;>
> > > http://www.haskell.org/mailman/listinfo/haskell-cafe
> > >
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20130212/d2756498/attachment.htm>


More information about the Haskell-Cafe mailing list