Was this after dropping the partial-sums code into the optimised bangs program? Weird. I wonder whether profiling would help explain why. In any case, if nobody comes up with any other tweaks, I'll probably submit the optimised bangs version to the shootout this weekend.
<br><br>--S<br><br><div class="gmail_quote">On Nov 30, 2007 1:30 PM, Richard Kelsall &lt;<a href="mailto:r.kelsall@millstream.com">r.kelsall@millstream.com</a>&gt; wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div class="Ih2E3d">Sterling Clover wrote:<br>> I'm still curious if the pre-calculation of partial sums that I did<br>> works well across processors, as I don't see why it shouldn't. My<br>> less-strictified version of Don's code is attached, and below are the
<br>> functions you'll need to insert/replace to make the partial-sums<br>> optimization work.<br><br></div>Hello Sterling, I've timed your new Fasta with optimised bangs - it's<br>the fastest so far. But the pre-calculated partial-sums version seems
<br>to go a bit slower for some unknown reason.<br><br>
<table>
<tr><th>Program</th><th>Seconds</th><th>Compiled with</th></tr>
<tr><td>Optimised bangs program</td><td>11.20</td><td>ghc --make</td></tr>
<tr><td>Optimised bangs program</td><td>10.73</td><td>-O -fglasgow-exts -optc-mfpmath=sse -optc-msse2 -optc-march=pentium4</td></tr>
<tr><td>Partial-sums program</td><td>11.97</td><td>ghc --make</td></tr>
<tr><td>Partial-sums program</td><td>11.14</td><td>-O -fglasgow-exts -optc-mfpmath=sse -optc-msse2 -optc-march=pentium4</td></tr>
</table>
<br>This is on my GHC 6.6.1, W2K, Intel Core 2 Duo 2.33GHz machine, the same one<br>I used for the previous timings I gave in this thread.<br><font color="#888888"><br><br>Richard.<br><br></font></blockquote></div><br>
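For readers following the thread, here is a minimal sketch of what "pre-calculating the partial sums" can mean in the fasta benchmark. It is not Sterling's attached code. The names `cumulative`, `pick`, `nextSeed`, `toUnit` and `generate` are hypothetical, and the uniform four-symbol table in the usage example is made up for illustration. The generator constants (139968, 3877, 29573) are those of fasta's linear congruential generator. The probability table is turned into cumulative sums once with `scanl1`, so each lookup only compares against stored thresholds. Bang patterns keep the loop counter and seed strict, in the spirit of the "optimised bangs" version.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Hypothetical sketch: pre-computed cumulative probabilities for fasta.

-- | Convert (symbol, probability) pairs into (symbol, cumulative probability)
-- once, up front, instead of re-adding probabilities on every lookup.
cumulative :: [(Char, Double)] -> [(Char, Double)]
cumulative table = zip syms (scanl1 (+) probs)
  where
    (syms, probs) = unzip table

-- | Pick the first symbol whose cumulative probability exceeds r.
-- The last symbol is returned unconditionally, to guard against the
-- final partial sum falling slightly short of 1 due to rounding.
pick :: [(Char, Double)] -> Double -> Char
pick [] _ = error "pick: empty table"
pick [(c, _)] _ = c
pick ((c, p) : rest) r
  | r < p     = c
  | otherwise = pick rest r

-- | fasta's linear congruential generator:
-- seed' = (seed * 3877 + 29573) `mod` 139968.
nextSeed :: Int -> Int
nextSeed s = (s * 3877 + 29573) `mod` 139968

-- | Scale a seed into the range [0, 1).
toUnit :: Int -> Double
toUnit s = fromIntegral s / 139968

-- | Generate n symbols from a cumulative table, returning the symbols
-- and the final seed. The counter and seed are forced with bang patterns.
generate :: [(Char, Double)] -> Int -> Int -> (String, Int)
generate table n seed0 = go n seed0 []
  where
    go :: Int -> Int -> String -> (String, Int)
    go !k !s acc
      | k <= 0    = (reverse acc, s)
      | otherwise =
          let s' = nextSeed s
          in go (k - 1) s' (pick table (toUnit s') : acc)
```

In GHCi, `fst (generate (cumulative [('a',0.25),('c',0.25),('g',0.25),('t',0.25)]) 2 42)` evaluates to `"cg"`. The trade-off being measured in this thread is whether doing the additions once up front actually beats summing on the fly, which depends on how GHC compiles the lookup loop.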