I/O overhead in opening and writing files

Austin Seipp mad.one at gmail.com
Mon Aug 27 22:52:37 CEST 2012


In this vein, you may be interested in trying out the unix-bytestring
package. It contains ByteString-based bindings for POSIX I/O, but
you'll still need the unix package to get at the underlying file
descriptor.

http://hackage.haskell.org/packages/archive/unix-bytestring/0.3.5.4/doc/html/System-Posix-IO-ByteString.html
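
To make the idea concrete, here is a minimal sketch of writing a small
payload to a raw file descriptor without going through String. It is an
illustration under stated assumptions, not the benchmark code from this
thread: it uses only the unix and bytestring packages (createFile,
closeFd and fdWriteBuf from System.Posix.IO, plus unsafeUseAsCStringLen),
and the names writeBS and writeIntFile and the /tmp paths are
hypothetical. I would expect the ByteString-taking write in
unix-bytestring to fill the same role as writeBS here, but I have not
checked how it is implemented.

```haskell
import Control.Exception (bracket)
import qualified Data.ByteString.Char8 as B
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Foreign.Ptr (castPtr)
import System.Posix.IO (closeFd, createFile, fdWriteBuf)
import System.Posix.Types (ByteCount, Fd)

-- Hypothetical helper: hand the ByteString's buffer straight to write(2).
-- The buffer is only borrowed for the duration of the call and is not copied.
-- A single call may write fewer bytes than requested; for a payload of a few
-- bytes on a regular file that is not expected, so this sketch does not loop.
writeBS :: Fd -> B.ByteString -> IO ByteCount
writeBS fd bs =
  unsafeUseAsCStringLen bs $ \(ptr, len) ->
    fdWriteBuf fd (castPtr ptr) (fromIntegral len)

-- Hypothetical helper: create (or truncate) a file with mode 0644, write the
-- decimal form of an integer into it, and close the descriptor even if the
-- write throws.
writeIntFile :: FilePath -> Int -> IO ()
writeIntFile path n =
  bracket (createFile path 0o644) closeFd $ \fd -> do
    _ <- writeBS fd (B.pack (show n))
    return ()

-- Small driver; the directory and the file count are placeholders.
main :: IO ()
main = mapM_ (\i -> writeIntFile ("/tmp/bench-" ++ show i) i) [1 .. 10 :: Int]
```

Note that the file name here is still a String, so the path is still
marshalled on every open; only the payload avoids the String round trip.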

On Mon, Aug 27, 2012 at 3:48 PM, Johan Tibell <johan.tibell at gmail.com> wrote:
> On Mon, Aug 27, 2012 at 1:43 PM, J Baptist <arc38813 at hotmail.com> wrote:
>> I'm looking into high-performance I/O, particularly on a tmpfs (in-memory)
>> filesystem. This involves creating lots of little files. Unfortunately, it
>> seems that Haskell's performance in this area is not comparable to that of
>> C. I assume that this is because of the overhead involved in opening and
>> closing files. Some cursory profiling confirmed this: most of the runtime of
>> the program is taken up by openFile, hPutStr, and hClose.
>>
>> I thought that it might be faster to call the C library functions exposed as
>> foreign imports in System.Posix.Internals, and thereby cut out some of
>> Haskell's overhead. This indeed improved performance, but the program is
>> still nearly twice as slow as the corresponding C program.
>>
>> I took some benchmarks. I wrote a program to create 500,000 files on a tmpfs
>> filesystem and write an integer into each of them. I did this once in C, using
>> open, and twice in Haskell, using openFile and c_open. Here are the
>> results:
>>
>> C program, using open and friends (gcc 4.4.3)
>> real    0m4.614s
>> user    0m0.380s
>> sys     0m4.200s
>>
>> Haskell, using System.IO.openFile and friends (ghc 7.4.2)
>> real    0m14.892s
>> user    0m7.700s
>> sys     0m6.890s
>>
>> Haskell, using System.Posix.Internals.c_open and friends (ghc 7.4.2)
>> real    0m7.372s
>> user    0m2.390s
>> sys     0m4.570s
>>
>> My question is: why is this so slow? Could the culprit be the marshaling
>> necessary to pass the parameters to the foreign functions? If I'm calling
>> the low-level function c_open anyway, shouldn't performance be closer to C?
>> Does anyone have suggestions for how to improve this?
>>
>> If anyone is interested, I can provide the code I used for these benchmarks.
>
> Please do. You can paste them at http://hpaste.org/
>
> Could you try using the Data.ByteString API? I don't have the code in
> front of me so I don't know if the System.Posix API uses Strings. If
> it does, that's most likely the issue.
>
> -- Johan
>
> _______________________________________________
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users at haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users



-- 
Regards,
Austin


