<div dir="ltr">On Fri, Mar 8, 2013 at 6:36 PM, Simon Marlow <span dir="ltr"><<a href="mailto:marlowsd@gmail.com" target="_blank">marlowsd@gmail.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">1GB/s for copying a file is reasonable: it's around half the memory bandwidth, so copying the data twice would give that result (assuming no actual I/O is taking place, which is what you want, because actual I/O will swamp any differences at the software level).<br>
<br>
The Handle overhead should be negligible if you're only using hGetBufSome and hPutBuf, because those functions basically just call read() and write() when the amount of data is larger than the buffer size.<br>
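For concreteness, here is a minimal sketch of a Handle-based copy that moves data only with hGetBufSome and hPutBuf, with output buffering switched off as io-streams is said to do in withFileAsOutput. It is not the benchmark's code: copyViaHandle is a hypothetical name, and the buffer size is left to the caller. Per Simon's point, when the requested size is larger than the Handle's own buffer, these calls should mostly reduce to read() and write().

```haskell
import Control.Monad (unless)
import Foreign.Marshal.Alloc (allocaBytes)
import System.IO
  ( BufferMode (..), IOMode (..), hGetBufSome, hPutBuf
  , hSetBuffering, withBinaryFile )

-- Hypothetical helper (not from the benchmark): copy src to dst through
-- Handles, moving bytes only with hGetBufSome and hPutBuf.
copyViaHandle :: Int -> FilePath -> FilePath -> IO ()
copyViaHandle bufSize src dst =
  withBinaryFile src ReadMode $ \hIn ->
  withBinaryFile dst WriteMode $ \hOut -> do
    -- Turn off output buffering, mirroring withFileAsOutput in io-streams.
    hSetBuffering hOut NoBuffering
    allocaBytes bufSize $ \buf -> do
      let loop = do
            -- Returns as soon as some data is available; 0 means end of file.
            n <- hGetBufSome hIn buf bufSize
            unless (n == 0) $ do
              hPutBuf hOut buf n
              loop
      loop
```

Calling `copyViaHandle 65536 "in.bin" "out.bin"` copies the file in chunks of at most 64 KiB.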
<br>
There's clearly something suspicious going on here. Unfortunately I don't have time right now to investigate, but I'll keep an eye on the thread.<br></blockquote><div><br></div><div>Possibly disk caching/syncing issues? If some of the tests can read entirely from cache (on the 1MB test) or don't completely sync after the write, they could finish much faster than others that actually have to hit the disk. For the 60MB test, it's almost guaranteed that actual I/O would take place and dominate the timings.<br>
</div><div><br></div><div>John L.<br></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Cheers,<br>
Simon<div class="im"><br>
<br>
On 08/03/13 08:36, Gregory Collins wrote:<br>
</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">
+Simon Marlow<br>
A couple of comments:<br>
<br></div>
* maybe we shouldn't back the file by a Handle. io-streams does this<div class="im"><br>
by default out of the box; I had a POSIX file interface for Unix<br>
(guarded by CPP) for a while but decided to ditch it for simplicity.<br>
If your results are correct, given how slow going through a Handle seems<br>
to be, I may revisit this; I had figured it would be "good enough".<br></div>
* io-streams turns Handle buffering off in withFileAsOutput, so the<div class="im"><br>
difference shouldn't be the result of buffering. Simon: is this an<br>
expected result? I presume you did some Handle debugging?<br></div>
* the IO manager should not have any bearing here, because the file code<div class="im"><br>
never actually uses it (epoll() doesn't work on regular files)<br></div>
* does the difference persist when the file size gets bigger?<br>
* your file descriptor code doesn't handle EINTR properly, although<div class="im"><br>
you said you checked that the file copy is being done?<br></div>
* Copying a 1MB file in 1ms gives a throughput of ~1GB/s. The other<div class="im"><br>
methods have a more believable ~70MB/s throughput.<br>
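Spelled out, taking 1MB as 10^6 bytes for round numbers:

```latex
\frac{1\,\text{MB}}{1\,\text{ms}}
  = \frac{10^{6}\,\text{B}}{10^{-3}\,\text{s}}
  = 10^{9}\,\text{B/s} \approx 1\,\text{GB/s},
\qquad
t_{1\,\text{MB}} = \frac{1\,\text{MB}}{70\,\text{MB/s}} \approx 14\,\text{ms}.
```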
<br>
G<br>
<br>
<br>
On Fri, Mar 8, 2013 at 7:30 AM, Michael Snoyman <<a href="mailto:michael@snoyman.com" target="_blank">michael@snoyman.com</a>> wrote:<br></div><div><div class="h5">
<br>
Hi all,<br>
<br>
I'm turning to the community for help understanding some<br>
benchmark results [1]. I was curious to see how the new io-streams<br>
would work with conduit, as it looks like a far saner low-level<br>
approach than Handles. In fact, the API is so simple that the entire<br>
wrapper is just a few lines of code [2].<br>
<br>
I then added in some basic file copy benchmarks, comparing<br>
conduit+Handle (with ResourceT or bracket), conduit+io-streams,<br>
straight io-streams, and lazy I/O. All approaches fell into the same<br>
ballpark, with conduit+bracket and conduit+io-streams taking a<br>
slight lead. (I haven't analyzed that enough to know if it means<br>
anything, however.)<br>
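One rough way to turn a single copy into an MB/s figure, without assuming anything about the harness actually used here, is to time it with GHC's monotonic clock (GHC.Clock.getMonotonicTime, available in base 4.11 and later). timeAction and throughputMBps are hypothetical names, and 1MB is taken as 10^6 bytes. A single timed run is noisy, and as noted above, a small file may be served entirely from the page cache.

```haskell
import GHC.Clock (getMonotonicTime)

-- Hypothetical helper: run an action once and return its result together
-- with the elapsed wall-clock time in seconds.
timeAction :: IO a -> IO (a, Double)
timeAction act = do
  t0 <- getMonotonicTime
  r  <- act
  t1 <- getMonotonicTime
  return (r, t1 - t0)

-- Hypothetical helper: bytes moved in the given number of seconds,
-- reported in MB/s with 1 MB = 10^6 bytes.
throughputMBps :: Int -> Double -> Double
throughputMBps bytes secs = fromIntegral bytes / 1e6 / secs
```

For example, `timeAction (readFile src >>= writeFile dst)` times one lazy-I/O copy, and `throughputMBps 1000000 0.001` gives roughly 1000 MB/s, matching the ~1GB/s figure discussed in the thread.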
<br>
Then I decided to pull up the NoHandle code I wrote a while ago for<br>
conduit. This code was written initially for Windows only, to work<br>
around the fact that System.IO.openFile does some file locking. To<br>
avoid using Handles, I wrote a simple FFI wrapper exposing open,<br>
read, and close system calls, ported it to POSIX, and hid it behind<br>
a Cabal flag. Out of curiosity, I decided to expose it and include<br>
it in the benchmark.<br>
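As an illustration of the kind of descriptor-level copy described here, and of the EINTR handling raised in Gregory's reply, the sketch below binds read(2) and write(2) directly through the FFI, retries on EINTR with Foreign.C.Error.throwErrnoIfMinus1Retry, and loops on short writes. It is not the code in System/PosixFile.hsc [3] and assumes a POSIX system. To stay within base, the hypothetical copyFileViaFds borrows descriptors from Handles via GHC.IO.Handle.FD.handleToFd (base 4.10 or later) instead of calling open() and close() itself; readRetry, writeAll, copyFd, and copyFileViaFds are illustrative names.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Monad (unless)
import Data.Word (Word8)
import Foreign.C.Error (throwErrnoIfMinus1Retry)
import Foreign.C.Types (CInt (..), CSize (..))
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr, plusPtr)
import GHC.IO.FD (fdFD)
import GHC.IO.Handle.FD (handleToFd)
import System.IO (IOMode (..), withBinaryFile)
import System.Posix.Types (CSsize (..))

-- Illustrative bindings to read(2) and write(2). "unsafe" keeps each call
-- cheap but holds the current capability until the call returns, which is
-- usually acceptable for regular files.
foreign import ccall unsafe "unistd.h read"
  c_read :: CInt -> Ptr Word8 -> CSize -> IO CSsize
foreign import ccall unsafe "unistd.h write"
  c_write :: CInt -> Ptr Word8 -> CSize -> IO CSsize

-- read() that restarts when interrupted by a signal (errno == EINTR);
-- any other failure is thrown as an IOError.
readRetry :: CInt -> Ptr Word8 -> Int -> IO Int
readRetry fd buf n =
  fromIntegral <$> throwErrnoIfMinus1Retry "read"
    (c_read fd buf (fromIntegral n))

-- write() may be interrupted or may write fewer bytes than requested,
-- so keep going until the whole chunk has been written.
writeAll :: CInt -> Ptr Word8 -> Int -> IO ()
writeAll fd buf n = unless (n <= 0) $ do
  w <- fromIntegral <$> throwErrnoIfMinus1Retry "write"
         (c_write fd buf (fromIntegral n))
  writeAll fd (buf `plusPtr` w) (n - w)

-- Copy from one descriptor to another until read() reports end of file.
copyFd :: Int -> CInt -> CInt -> IO ()
copyFd bufSize from to = allocaBytes bufSize loop
  where
    loop buf = do
      n <- readRetry from buf bufSize
      unless (n == 0) $ writeAll to buf n >> loop buf

-- Hypothetical wrapper: borrow the descriptors behind two Handles. Nothing
-- goes through the Handles' buffers, so closing them flushes nothing.
copyFileViaFds :: FilePath -> FilePath -> IO ()
copyFileViaFds src dst =
  withBinaryFile src ReadMode $ \hIn ->
  withBinaryFile dst WriteMode $ \hOut -> do
    inFd  <- fdFD <$> handleToFd hIn
    outFd <- fdFD <$> handleToFd hOut
    copyFd 65536 inFd outFd
```

Without throwErrnoIfMinus1Retry, a signal arriving during read() would surface as a -1 return with errno set to EINTR. That would either be treated as an error or, if the return value were not checked, silently truncate the copy.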
<br>
The results are extreme. I've confirmed multiple times that the copy<br>
algorithm is in fact copying the file, so I don't think the test<br>
itself is cheating somehow. But I don't know how to explain the<br>
massive gap. I've run this on two different systems. The results you<br>
see linked are from my local machine. On an EC2 instance, the gap<br>
was a bit smaller, but the NoHandle code was still 75% faster than<br>
the others.<br>
<br>
My initial guess is that I'm not properly tying into the IO manager,<br>
but I wanted to see if the community had any thoughts. The relevant<br>
pieces of code are [3][4][5].<br>
<br>
Michael<br>
<br>
[1] <a href="http://static.snoyman.com/streams.html" target="_blank">http://static.snoyman.com/streams.html</a><br>
[2] <a href="https://github.com/snoyberg/conduit/blob/streams/io-streams-conduit/Data/Conduit/Streams.hs" target="_blank">https://github.com/snoyberg/conduit/blob/streams/io-streams-conduit/Data/Conduit/Streams.hs</a><br>
[3] <a href="https://github.com/snoyberg/conduit/blob/streams/conduit/System/PosixFile.hsc" target="_blank">https://github.com/snoyberg/conduit/blob/streams/conduit/System/PosixFile.hsc</a><br>
[4] <a href="https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L54" target="_blank">https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L54</a><br>
[5] <a href="https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L167" target="_blank">https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L167</a><br>
<br>
______________________________<u></u>_________________<br>
Haskell-Cafe mailing list<br></div></div>
<a href="mailto:Haskell-Cafe@haskell.org" target="_blank">Haskell-Cafe@haskell.org</a> <mailto:<a href="mailto:Haskell-Cafe@haskell.org" target="_blank">Haskell-Cafe@haskell.<u></u>org</a>><br>
<a href="http://www.haskell.org/mailman/listinfo/haskell-cafe" target="_blank">http://www.haskell.org/<u></u>mailman/listinfo/haskell-cafe</a><br>
<br>
<br>
<br><span class="HOEnZb"><font color="#888888">
<br>
--<br>
Gregory Collins <<a href="mailto:greg@gregorycollins.net" target="_blank">greg@gregorycollins.net</a>><br>
</font></span></blockquote><div class="HOEnZb"><div class="h5">
</div></div></blockquote></div><br></div></div>