<html>
<head>
<style><!--
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
font-size: 10pt;
font-family:Tahoma
}
--></style></head>
<body class='hmmessage'><div dir='ltr'>
<br>I don't see this behavior either.<div>Ubuntu 12.10, GHC 7.4.2.</div><div><br></div><div>Perhaps this has to do with how malloc allocates memory and the resulting cache</div><div>behavior. If you reuse an existing array instead of allocating a new </div><div>one, perhaps the inconsistency would go away?</div><div>It looks to me like this is about CPU cache performance.</div><div><br></div><div>Branimir</div><div><br><div><div id="SkyDrivePlaceholder"></div>> <br>> I'm using GHC 7.4.2 on x86_64 openSUSE Linux, kernel 2.6.37.6. <br>> <br>> Janek<br>> <br>> On Friday, 23 November 2012, Edward Z. Yang wrote:<br>> > Running the sample code on GHC 7.4.2, I don't see the "one<br>> > fast, rest slow" behavior. What version of GHC are you running?<br>> ><br>> > Edward<br>> ><br>> > Excerpts from Janek S.'s message of Fri Nov 23 13:42:03 -0500 2012:<br>> > > > What happens if you do the benchmark without unsafePerformIO involved?<br>> > ><br>> > > I removed unsafePerformIO, changed copy to have type Vector Double -> IO<br>> > > (Vector Double) and modified benchmarks like this:<br>> > ><br>> > > bench "C binding" $ whnfIO (copy signal)<br>> > ><br>> > > I see no difference - one benchmark runs fast, the remaining ones run slow.<br>> > ><br>> > > Janek<br>> > ><br>> > > > Excerpts from Janek S.'s message of Fri Nov 23 10:44:15 -0500 2012:<br>> > > > > I am using the Criterion library to benchmark C code called via FFI<br>> > > > > bindings and I've run into a problem that looks like a bug.<br>> > > > ><br>> > > > > The first benchmark that uses FFI runs correctly, but subsequent<br>> > > > > benchmarks run much longer. I created demo code (about 50 lines,<br>> > > > > available on GitHub: https://gist.github.com/4135698 ) in which a C<br>> > > > > function copies a vector of doubles. I benchmark that function a<br>> > > > > couple of times. The first run results in an average time of about 17us,<br>> > > > > subsequent runs take about 45us. 
In my real code the additional time was<br>> > > > > about 15us and it seemed to be a constant overhead, not relative to<br>> > > > > "correct" run time. The surprising thing is that if my C function<br>> > > > > only allocates memory and does no copying:<br>> > > > ><br>> > > > > double* c_copy( double* inArr, int arrLen ) {<br>> > > > > double* outArr = malloc( arrLen * sizeof( double ) );<br>> > > > ><br>> > > > > return outArr;<br>> > > > > }<br>> > > > ><br>> > > > > then all is well - all runs take a similar amount of time. I also<br>> > > > > noticed that sometimes in my demo code all runs take about 45us, but<br>> > > > > this does not seem to happen in my real code - the first run is always<br>> > > > > shorter.<br>> > > > ><br>> > > > > Does anyone have an idea what is going on?<br>> > > > ><br>> > > > > Janek<br>> <br>> <br>> <br>> _______________________________________________<br>> Haskell-Cafe mailing list<br>> Haskell-Cafe@haskell.org<br>> http://www.haskell.org/mailman/listinfo/haskell-cafe<br></div></div>                                            </div></body>
</html>