Benchmarks Game/Parallel/RegexDNA

From HaskellWiki
< Benchmarks Game‎ | Parallel
Revision as of 18:04, 21 September 2008 by Newsham (talk | contribs) (created page for regex-dna bench)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search
The printable version is no longer supported and may have rendering errors. Please update your browser bookmarks and please use the default browser print function instead.

Regex-DNA

Old submission is:

Code:

This is almost identical to the old code, with a parallel map used to perform the first phase of the benchmark (counting matches of each variant). I had trouble compiling the original with Text.Regex.Posix so I used Text.Regex.PCRE (which is probably faster, are hackage packages fair game?). I see a speedup from 46 seconds to 33 seconds when running this case -N2 vs the original case (with PCRE) unthreaded.

I see some weirdness in the shootout page's numbers. They list haskell running at 70 seconds (with Posix regex) and Python running as 25 seconds. I have a comparable system (roughly), and when I run the Python test case I get 46 seconds (nearly identical to the Haskell PCRE case). This probably means their machine has better memory bandwidth and that moving to PCRE is a big win (I believe python uses PCRE), but I'm not sure.

-- The Computer Language Benchmarks Game
-- http://shootout.alioth.debian.org/
-- Contributed by: Sergei Matusevich 2007

import Control.Parallel.Strategies
import List
import Text.Regex.PCRE -- requires regex-pcre-builtin
import qualified Data.ByteString.Char8 as B

variants = [
  "agggtaaa|tttaccct",
  "[cgt]gggtaaa|tttaccc[acg]",
  "a[act]ggtaaa|tttacc[agt]t",
  "ag[act]gtaaa|tttac[agt]ct",
  "agg[act]taaa|ttta[agt]cct",
  "aggg[acg]aaa|ttt[cgt]ccct",
  "agggt[cgt]aa|tt[acg]accct",
  "agggta[cgt]a|t[acg]taccct",
  "agggtaa[cgt]|[acg]ttaccct" ]

main = do
  file <- B.getContents
  let [s1,s2,s3] = map (B.concat . tail) $ groupBy notHeader $ B.split '\n' file
      showVars r = r ++ ' ' : show ((s2 =~ r :: Int) + (s3 =~ r :: Int))
  mapM_ putStrLn $ parMap rnf showVars  variants
  putChar '\n'
  print (B.length file)
  print (B.length s1 + B.length s2 + B.length s3)
  print (B.length s1 + B.length s3 + length (B.unpack s2 >>= substCh))
  where notHeader _ s = B.null s || B.head s /= '>'
        substCh 'B' = "(c|g|t)"
        substCh 'D' = "(a|g|t)"
        substCh 'H' = "(a|c|t)"
        substCh 'K' = "(g|t)"
        substCh 'M' = "(a|c)"
        substCh 'N' = "(a|c|g|t)"
        substCh 'R' = "(a|g)"
        substCh 'S' = "(c|g)"
        substCh 'V' = "(a|c|g)"
        substCh 'W' = "(a|t)"
        substCh 'Y' = "(c|t)"
        substCh etc = [etc]