new i/o library

Bulat Ziganshin bulatz at HotPOP.com
Sat Jan 28 13:40:07 EST 2006


Hello Duncan,

Saturday, January 28, 2006, 3:08:04 PM, you wrote:

>> yes, I want to save exactly this bit of performance - after I have
>> optimized all the other costs on the text I/O path

DC> There is a trade-off: using mmap gives you zero-copy access to the page
DC> cache, but there is a not-insignificant performance overhead in
DC> setting up and tearing down memory mappings. This is true on Unix and
DC> Win32. So for small writes (e.g. 4k blocks) it is likely to be cheaper to
DC> just use read()/write() on page-aligned buffers rather than to use mmap.

DC> You would need to do benchmarks on each platform to see which method is
DC> quicker. Given the code complexity that other people have mentioned, I
DC> do not think it would be worth it.

I use 64k buffers, and I tried memory-mapped files last night. It's not
easy to implement this properly and then to ensure good speed. At the
very least, Windows flushes buffers that were filled via mmap very
lazily: when I wrote a 1 GB file in this mode, Windows tried to swap
out all the programs (and itself!) while still delaying the writes of
already-unmapped data!

DC> Using page aligned and sized buffers can help read()/write() performance
DC> on some OSes like some of the BSDs.

I will try to carve an aligned 64k buffer out of a 128k block, and I
will publish the code here so that anyone can test it on their OS; a
sketch of the idea follows.
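
Something along these lines (just a rough sketch - the helper name is
mine, and real code must remember the raw pointer, since free() has to
be called on it, not on the aligned one):

    import Data.Word (Word8)
    import Foreign.Marshal.Alloc (mallocBytes)
    import Foreign.Ptr (Ptr, alignPtr)

    -- allocate a 128k block and carve out of it a 64k buffer aligned
    -- to a 64k boundary (hence also page-aligned); the raw pointer is
    -- returned too, because that is what must eventually be free()d
    mallocAligned64k :: IO (Ptr Word8, Ptr Word8)
    mallocAligned64k = do
        raw <- mallocBytes (128 * 1024)
        return (raw, raw `alignPtr` (64 * 1024))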

>> in other words, I am interested in having zero-wait operation both
>> for reading and for writing,

DC> As I said, that is not possible with either read() or mmapped reads.
DC> Conversely, it works automatically with write() and mmapped writes.

DC> Zero-copy and zero-wait are not the same thing.

I mean that mmap guarantees us zero-copy operation, and I want to use
mmap in such a way that zero-wait operation is ensured as well

DC> An important factor for optimising IO performance is using sufficiently
DC> large block sizes to avoid making frequent kernel calls. That includes
DC> read()/write() calls and mmap()/munmap() calls.

that's true, and it's easy to implement
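
For illustration, with the current GHC library the block size is just
a parameter of hSetBuffering (a trivial example, assuming 64k is the
right size):

    import System.IO

    main :: IO ()
    main = do
        h <- openBinaryFile "test" WriteMode
        -- one write() kernel call per 64k of output
        hSetBuffering h (BlockBuffering (Just 65536))
        hPutStr h (replicate 1000000 'x')
        hClose h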

DC> Perhaps it is possible to move the complexity needed for the lazy
DC> hPutStr case into the hPutStr implementation rather than the Handle
DC> implementation. For example perhaps it'd be possible for the Handle to
DC> just have one buffer but to have a method for writing out an external
DC> buffer that is passed to it. Then hPutStr would allocate its own
DC> buffer, evaluate the string, copying it into the buffer. Then it would
DC> call on the Handle to write out the buffer. The Handle would flush its
DC> existing internal buffer and write out the extra buffer.

1) "lazy hPutStr" is not some rare case. we can't distinguish strict
and lazy strings with current GHC and in any hPutStr invocation we
should assume that evaluation of its argument can lead to side
effects. that is the whole problem - we want to optimize hPutStr for
the fast work with strict strings, but need to ensure that it will
work correctly even with slow lazy strings having any side effects

2) the scheme above can be implemented by using hPutBuf to write this
additional buffer. It's just less efficient (although not by much -
memcpy runs about 10 times faster than traversing a [Char]); a sketch
follows.
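
A rough sketch of that scheme (assuming Latin-1 output, a fixed 64k
buffer, and a helper name of my own invention):

    import Data.Word (Word8)
    import Foreign.Marshal.Alloc (allocaBytes)
    import Foreign.Storable (pokeByteOff)
    import System.IO (Handle, hPutBuf)

    -- evaluate the (possibly lazy) string into a private buffer and
    -- write the buffer out with hPutBuf; any side effects of forcing
    -- the string happen outside the Handle's internal machinery
    hPutStrViaBuf :: Handle -> String -> IO ()
    hPutStrViaBuf h str = allocaBytes bufSize (fill 0 str)
      where
        bufSize = 65536
        fill n []     buf = hPutBuf h buf n
        fill n cs     buf | n == bufSize
                          = hPutBuf h buf n >> fill 0 cs buf
        fill n (c:cs) buf = do
            pokeByteOff buf n (fromIntegral (fromEnum c) :: Word8)
            fill (n+1) cs buf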

On the other hand, Simon didn't take into account that locking itself
is rather slow, and that using two locks instead of one makes his
scheme somewhat slower, especially on small strings

DC> Perhaps a better solution for your single-threaded operation case is to
DC> have a handle type that is a bit specialised and does not have to deal
DC> with the general case. If we're going to get a I/O system that supports
DC> various layers and implementations then perhaps you could have one
DC> that implements only the minimal possible I/O class. That could not use
DC> any thread locks (i.e. it'd not work predictably with multiple
DC> Haskell threads)

moreover, we can implement locking as a special "converter" type that
can be applied to any mutable object - a stream, a collection, a
counter. That allows us to simplify the implementations and to add
locking only to those Streams where we really need it, like this:

h <- openFD "test"
       >>= addUsingOfSelect
       >>= addBuffering 65536
       >>= addCharEncoding utf8
       >>= attachUserData dictionary
       >>= addLocking
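
A rough sketch of such a converter (the wrapper and helper names here
are placeholders, not the real interface):

    import Control.Concurrent.MVar

    -- a stream packed together with its lock
    data WithLocking h = WithLocking (MVar ()) h

    addLocking :: h -> IO (WithLocking h)
    addLocking h = do
        lock <- newMVar ()
        return (WithLocking lock h)

    -- every operation of the inner stream is then re-exported through
    -- this combinator, which serializes concurrent callers
    withStreamLock :: WithLocking h -> (h -> IO a) -> IO a
    withStreamLock (WithLocking lock h) op = withMVar lock (\_ -> op h)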

DC> and use mmap on the entire file. So you wouldn't get the normal
DC> feature that a file extends at the end as it's written to, it'd need a
DC> method for determining the size at the beginning or extending it in
DC> large chunks. On the other hand it would not need to manage any buffers
DC> since reads and writes would just be reads to/from memory.

yes, I have done that. But a simple MapViewOfFile/UnmapViewOfFile
doesn't work well enough, at least for writing. Windows is in no hurry
to flush these buffers, even after they are unmapped, and using
FlushViewOfFile results in a synchronous flush of the buffer to the
cache. So I need to try calling FlushViewOfFile in a separate thread,
like GHC does for its i/o
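
Something like this is what I plan to try (just a sketch - the FFI
binding is my own, and the Word32 length assumes a 32-bit SIZE_T):

    import Control.Concurrent (forkIO)
    import Data.Word (Word32)
    import Foreign.Ptr (Ptr)

    -- 'safe' so that the blocking system call doesn't stop the other
    -- Haskell threads
    foreign import stdcall safe "windows.h FlushViewOfFile"
        c_FlushViewOfFile :: Ptr a -> Word32 -> IO Bool

    -- start the flush in a background thread; the writer can go on
    -- filling the next mapped view immediately
    backgroundFlush :: Ptr a -> Word32 -> IO ()
    backgroundFlush addr len = do
        _ <- forkIO (c_FlushViewOfFile addr len >> return ())
        return ()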

DC> So it'd depend on what the API of the low level layers of the new I/O
DC> system are like as to whether such a simple and limited implementation
DC> would be possible.

that's no problem. The memory-mapped file just implements an API that
tells the user the address/size of the next buffer to fill or read.
This interface is also used for plain memory buffers and for
interprocess communication via shared memory:

    -- | Receive next buffer which contains data / must be filled with data
    vReceiveBuf :: (Integral size) => h -> ReadWrite -> m (Ptr a, size)
    -- | Release buffer after reading `len` bytes / Send buffer filled with `len` bytes
    vSendBuf    :: (Integral size, Integral len) => h -> Ptr a -> size -> len -> m ()
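
For example, a copy loop over this interface could look like the
following (the ReadWrite constructors, the monad, and the end-of-stream
convention are assumptions here):

    import Foreign.Marshal.Utils (copyBytes)

    -- copy one buffer's worth of data from stream 'inp' to stream 'out'
    copyChunk inp out = do
        (src, srcSize) <- vReceiveBuf inp ReadMode    -- data ready to read
        (dst, dstSize) <- vReceiveBuf out WriteMode   -- space to be filled
        let n = fromIntegral (min srcSize dstSize)
        copyBytes dst src n
        vSendBuf inp src srcSize (fromIntegral n)     -- released after reading n bytes
        vSendBuf out dst dstSize (fromIntegral n)     -- sends n bytes of data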


-- 
Best regards,
 Bulat                            mailto:bulatz at HotPOP.com




