Benchmarks Game/Parallel/RegexDNA
From HaskellWiki
Revision as of 17:58, 22 September 2008
Regex-DNA
Old submission is:
Running:
- ghc --make -O2 -fglasgow-exts -package regex-posix -optc-O3 -threaded regexdna3.hs
- ./regexdna3 +RTS -N2 < big.txt
Code:
This is almost identical to the old code, with a parallel map used to perform the first phase of the benchmark (counting matches of each variant). I had trouble compiling the original with Text.Regex.Posix, so I used Text.Regex.PCRE instead (which is probably faster; are Hackage packages fair game?). Running this version with -N2 takes 33 seconds, versus 46 seconds for the original code (also with PCRE) run unthreaded.
The numbers on the shootout page look odd to me. They list Haskell at 70 seconds (with Posix regex) and Python at 25 seconds. I have a roughly comparable system, and when I run the Python test case I get 46 seconds (nearly identical to the Haskell PCRE case). This probably means that their machine has better memory bandwidth and that moving to PCRE is a big win (I believe Python uses PCRE), but I'm not sure.
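To illustrate the parallel map in isolation, here is a minimal sketch rather than the benchmark code itself. It assumes a recent version of the parallel package, where rdeepseq plays the role that rnf plays as a strategy in the 2008 code. The helpers countOccurrences and showCounts are hypothetical names, and plain substring counting stands in for the regex match so that no regex package is needed.

```haskell
-- Sketch only: literal substring counting stands in for the regex match (=~),
-- so the example needs nothing beyond the `parallel` package.
import Control.Parallel.Strategies (parMap, rdeepseq)
import Data.List (isPrefixOf, tails)

-- Hypothetical helper: count (possibly overlapping) occurrences of a
-- literal pattern. The real program counts regex matches instead.
countOccurrences :: String -> String -> Int
countOccurrences pat = length . filter (pat `isPrefixOf`) . tails

-- Build one output line per pattern, evaluating each line in parallel.
-- rdeepseq forces every String completely, so the counting work is done
-- during the parallel evaluation and not later, when the line is printed.
showCounts :: String -> [String] -> [String]
showCounts input = parMap rdeepseq showOne
  where
    showOne pat = pat ++ ' ' : show (countOccurrences pat input)

main :: IO ()
main = mapM_ putStrLn (showCounts "agggtaaattaccctagggtaaa" ["agggtaaa", "tttaccct"])
```

As with the real program, this only runs in parallel when built with -threaded and started with +RTS -N2. I believe recent GHC versions also need -rtsopts at link time before they accept -N on the command line.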
-- The Computer Language Benchmarks Game
-- http://shootout.alioth.debian.org/
-- Contributed by: Sergei Matusevich 2007

import Control.Parallel.Strategies
import List
import Text.Regex.PCRE -- requires regex-pcre-builtin
import qualified Data.ByteString.Char8 as B

variants = [
  "agggtaaa|tttaccct",
  "[cgt]gggtaaa|tttaccc[acg]",
  "a[act]ggtaaa|tttacc[agt]t",
  "ag[act]gtaaa|tttac[agt]ct",
  "agg[act]taaa|ttta[agt]cct",
  "aggg[acg]aaa|ttt[cgt]ccct",
  "agggt[cgt]aa|tt[acg]accct",
  "agggta[cgt]a|t[acg]taccct",
  "agggtaa[cgt]|[acg]ttaccct" ]

main = do
  file <- B.getContents
  let [s1,s2,s3] = map (B.concat . tail) $ groupBy notHeader $ B.split '\n' file
      showVars r = r ++ ' ' : show ((s2 =~ r :: Int) + (s3 =~ r :: Int))
  mapM_ putStrLn $ parMap rnf showVars variants
  putChar '\n'
  print (B.length file)
  print (B.length s1 + B.length s2 + B.length s3)
  print (B.length s1 + B.length s3 + length (B.unpack s2 >>= substCh))
  where
    notHeader _ s = B.null s || B.head s /= '>'
    substCh 'B' = "(c|g|t)"
    substCh 'D' = "(a|g|t)"
    substCh 'H' = "(a|c|t)"
    substCh 'K' = "(g|t)"
    substCh 'M' = "(a|c)"
    substCh 'N' = "(a|c|g|t)"
    substCh 'R' = "(a|g)"
    substCh 'S' = "(c|g)"
    substCh 'V' = "(a|c|g)"
    substCh 'W' = "(a|t)"
    substCh 'Y' = "(c|t)"
    substCh etc = [etc]
The changes should be directly applicable to the non-PCRE version as well. Just change the import to use the Posix library. Here are the parallelizing changes:
--- regexdna2.hs 2008-09-22 07:56:49.000000000 -1000
+++ regexdna3.hs 2008-09-21 07:50:32.000000000 -1000
@@ -2,6 +2,7 @@
 -- http://shootout.alioth.debian.org/
 -- Contributed by: Sergei Matusevich 2007
 
+import Control.Parallel.Strategies
 import List
 import Text.Regex.PCRE -- requires regex-pcre-builtin
 import qualified Data.ByteString.Char8 as B
@@ -20,7 +21,8 @@
 main = do
   file <- B.getContents
   let [s1,s2,s3] = map (B.concat . tail) $ groupBy notHeader $ B.split '\n' file
-  mapM_ (printVars s2 s3) variants
+      showVars r = r ++ ' ' : show ((s2 =~ r :: Int) + (s3 =~ r :: Int))
+  mapM_ putStrLn $ parMap rnf showVars variants
   putChar '\n'
   print (B.length file)
   print (B.length s1 + B.length s2 + B.length s3)
@@ -38,8 +40,4 @@
     substCh 'W' = "(a|t)"
     substCh 'Y' = "(c|t)"
     substCh etc = [etc]
-    printVars s2 s3 r = do
-      putStr r
-      putChar ' '
-      print ((s2 =~ r :: Int) + (s3 =~ r :: Int))
