[Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

Fri Mar 8 07:59:57 CET 2013

I would have expected sourceFileNoHandle to make the most difference, since
that's one location (write) where you've obviously removed a copy. Does
sourceFileNoHandle allocate less?

Incidentally, I've recently been making similar changes to IO code
(removing buffer copies) and getting similar speedups, although the
results tend to be less pronounced in code that isn't strictly IO-bound.
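
For anyone following along, a Handle-free copy loop of the kind being
discussed might look roughly like the sketch below. It is not the code
from conduit's System/PosixFile.hsc; the names (copyFileFd, openRaw,
closeRaw, writeAll, bufSize) and the 64 KiB buffer size are hypothetical
choices made for illustration. It assumes a POSIX system and GHC's CApiFFI
extension, and it reads into one reusable buffer and writes straight from
it, so no data passes through a Handle's internal buffer.

```haskell
{-# LANGUAGE CApiFFI #-}
-- Illustrative sketch only: a file copy built directly on the POSIX
-- open/read/write/close calls. All names here are hypothetical.
import Control.Exception (bracket)
import Data.Bits ((.|.))
import Data.Word (Word8)
import Foreign.C.Error (throwErrnoIfMinus1Retry, throwErrnoIfMinus1_)
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CInt(..), CSize(..))
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr, plusPtr)
import System.Posix.Types (CMode(..), CSsize(..))

foreign import capi "fcntl.h open"
  c_open :: CString -> CInt -> CMode -> IO CInt
foreign import capi "unistd.h read"
  c_read :: CInt -> Ptr Word8 -> CSize -> IO CSsize
foreign import capi "unistd.h write"
  c_write :: CInt -> Ptr Word8 -> CSize -> IO CSsize
foreign import capi "unistd.h close"
  c_close :: CInt -> IO CInt

foreign import capi "fcntl.h value O_RDONLY" oRDONLY :: CInt
foreign import capi "fcntl.h value O_WRONLY" oWRONLY :: CInt
foreign import capi "fcntl.h value O_CREAT"  oCREAT  :: CInt
foreign import capi "fcntl.h value O_TRUNC"  oTRUNC  :: CInt

-- Assumed buffer size; the thread does not say what size was benchmarked.
bufSize :: Int
bufSize = 65536

-- Open a file descriptor, throwing an IOError on failure.
-- withCString uses the locale encoding, which is good enough for a sketch.
openRaw :: FilePath -> CInt -> CMode -> IO CInt
openRaw path flags mode =
  withCString path $ \cpath ->
    throwErrnoIfMinus1Retry "open" (c_open cpath flags mode)

closeRaw :: CInt -> IO ()
closeRaw fd = throwErrnoIfMinus1_ "close" (c_close fd)

-- write(2) may write fewer bytes than requested, so loop until done.
writeAll :: CInt -> Ptr Word8 -> Int -> IO ()
writeAll _ _ 0 = return ()
writeAll fd p len = do
  w <- throwErrnoIfMinus1Retry "write" (c_write fd p (fromIntegral len))
  writeAll fd (p `plusPtr` fromIntegral w) (len - fromIntegral w)

-- Copy src to dst through a single reusable buffer.
copyFileFd :: FilePath -> FilePath -> IO ()
copyFileFd src dst =
  bracket (openRaw src oRDONLY 0) closeRaw $ \inFd ->
  bracket (openRaw dst (oWRONLY .|. oCREAT .|. oTRUNC) 0o644) closeRaw $ \outFd ->
  allocaBytes bufSize $ \buf ->
    let loop = do
          n <- throwErrnoIfMinus1Retry "read"
                 (c_read inFd buf (fromIntegral bufSize))
          if n == 0
            then return ()            -- end of file
            else writeAll outFd buf (fromIntegral n) >> loop
    in loop
```

Calling copyFileFd "input.dat" "output.dat" copies the file. The foreign
calls are left as the default "safe" imports, so a blocking read or write
does not stall other Haskell threads, at the cost of some per-call
overhead; whether that matters for a benchmark like this one is a separate
question from the buffer copy.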

On Fri, Mar 8, 2013 at 2:50 PM, Michael Snoyman <michael at snoyman.com> wrote:

> One clarification: it seems that sourceFile and sourceFileNoHandle have
> virtually no difference in speed. The gap comes exclusively from sinkFile
> vs sinkFileNoHandle. This makes me think that it might be a buffer copy
> that's causing the slowdown, in which case the benchmark may in fact be
> accurate.
>
> On Mar 8, 2013 8:30 AM, "Michael Snoyman" <michael at snoyman.com> wrote:
>
>> Hi all,
>>
>> I'm turning to the community for some help understanding some benchmark
>> results[1]. I was curious to see how the new io-streams would work with
>> conduit, as it looks like a far saner low-level approach than Handles. In
>> fact, the API is so simple that the entire wrapper is just a few lines of
>> code[2].
>>
>> I then added in some basic file copy benchmarks, comparing conduit+Handle
>> (with ResourceT or bracket), conduit+io-streams, straight io-streams, and
>> lazy I/O. All approaches fell into the same ballpark, with conduit+bracket
>> and conduit+io-streams taking a slight lead. (I haven't analyzed that
>> enough to know if it means anything, however.)
>>
>> Then I decided to pull up the NoHandle code I wrote a while ago for
>> conduit. This code was written initially for Windows only, to work around
>> the fact that System.IO.openFile does some file locking. To avoid using
>> Handles, I wrote a simple FFI wrapper exposing open, read, and close system
>> calls, ported it to POSIX, and hid it behind a Cabal flag. Out of
>> curiosity, I decided to expose it and include it in the benchmark.
>>
>> The results are extreme. I've confirmed multiple times that the copy
>> algorithm is in fact copying the file, so I don't think the test itself is
>> cheating somehow. But I don't know how to explain the massive gap. I've run
>> this on two different systems. The results you see linked are from my local
>> machine. On an EC2 instance, the gap was a bit smaller, but the NoHandle
>> code was still 75% faster than the others.
>>
>> My initial guess is that I'm not properly tying into the IO manager, but
>> I wanted to see if the community had any thoughts. The relevant pieces of
>> code are [3][4][5].
>>
>> Michael
>>
>> [1] http://static.snoyman.com/streams.html
>> [2]
>> https://github.com/snoyberg/conduit/blob/streams/io-streams-conduit/Data/Conduit/Streams.hs
>> [3]
>> https://github.com/snoyberg/conduit/blob/streams/conduit/System/PosixFile.hsc
>> [4]
>> https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L54
>> [5]
>> https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L167
>>