new i/o library

Sat Jan 28 11:34:05 EST 2006

Hello Simon,

Friday, January 27, 2006, 7:25:44 PM, you wrote:

>> i'm now writing a new i/o library. one area where it currently
>> falls short of the existing Handles implementation in GHC is
>> asynchronous i/o operations. can you please briefly describe how
>> this is done in GHC and, in particular, why multiple buffers are used?

SM> Multiple buffers were introduced to cope with the semantics we wanted 
SM> for hPutStr.

thank you. i had read the hPutStr comments, but didn't realize that this
problem is the only reason multiple buffers were introduced

SM> The problem is that you don't want hPutStr to hold a lock 
SM> on the Handle while it evaluates its argument list, because that could 
SM> take arbitrary time.  Furthermore, things like this:

SM>    putStr (trace "foo" "bar")

SM> used to cause deadlocks, because putStr holds the lock, evaluates its 
SM> argument list, which causes trace to also attempt to acquire the lock on 
SM> stdout, leading to deadlock.

SM> So, putStr first grabs a buffer from the Handle, then unlocks the Handle 
SM> while it fills up the buffer, then it takes the lock again to write the 
SM> buffer.  Since another thread might try to putStr while the lock is 
SM> released, we need multiple buffers.

i don't understand the last sentence. you were talking about problems
with performing I/O while evaluating the putStr argument, not about
another thread?

i understand that locks are basically needed because multiple threads
can try to do i/o on the same Handle simultaneously
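
to make the scheme concrete, here is a minimal sketch of the
"fill a private buffer unlocked, then lock only to commit" idea. it is
not GHC's actual code: ToyHandle, newToyHandle, readToyHandle and
toyPutStr are hypothetical names, the MVar stands in for the Handle
lock, and "grabbing a buffer" is reduced to forcing the String before
the lock is taken.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, modifyMVar_, readMVar)
import Control.Exception (evaluate)
import System.IO.Unsafe (unsafePerformIO)

-- Toy handle (hypothetical, not GHC's Handle): the MVar is the lock
-- and holds everything written so far.
newtype ToyHandle = ToyHandle (MVar String)

newToyHandle :: IO ToyHandle
newToyHandle = ToyHandle <$> newMVar ""

readToyHandle :: ToyHandle -> IO String
readToyHandle (ToyHandle lock) = readMVar lock

toyPutStr :: ToyHandle -> String -> IO ()
toyPutStr (ToyHandle lock) s = do
  -- 1. Fill the private buffer with the lock released: forcing the
  --    string may take arbitrary time, or write to the same handle.
  _ <- evaluate (foldr seq () s)
  -- 2. Take the lock only for the short commit step.
  modifyMVar_ lock (\out -> return (out ++ s))

main :: IO ()
main = do
  h <- newToyHandle
  -- Like putStr (trace "foo" "bar"): evaluating the argument writes
  -- to the same handle.
  let arg = unsafePerformIO (toyPutStr h "foo" >> return "bar")
  toyPutStr h arg
  readToyHandle h >>= putStrLn
```

if step 1 were moved inside modifyMVar_, the nested write would block
on the lock that the outer call already holds, which is the deadlock
described above. and since the lock is free during step 1, any other
writer (a nested call or another thread) needs a buffer of its own.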

SM> For async IO on Unix, we use non-blocking read() calls, and if read() 
SM> indicates that we need to block, we send a request to the IO Manager 
SM> thread (see GHC.Conc) which calls select() on behalf of all the threads 
SM> waiting for I/O.  For async IO on Windows, we either use the threaded 
SM> RTS's blocking foreign call mechanism to invoke read(), or the 
SM> non-threaded RTS has a similar mechanism internally.

so, async I/O in GHC has nothing in common with "zero-wait
operation" in a single-threaded environment, and can only help to
overlap i/o in one thread with the execution of other threads?
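
the kind of overlap meant here can be sketched as follows. this is
only an illustration under assumptions: slowRead and overlapped are
hypothetical names, and threadDelay stands in for a read that blocks
the calling Haskell thread.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import Data.List (foldl')

-- Stand-in for a blocking read (assumption: real code would read
-- from a Handle or a socket here).
slowRead :: IO String
slowRead = threadDelay 200000 >> return "data"

overlapped :: IO (String, Int)
overlapped = do
  box <- newEmptyMVar
  -- The reader thread blocks on its own ...
  _ <- forkIO (slowRead >>= putMVar box)
  -- ... while this thread keeps computing.
  n <- evaluate (foldl' (+) 0 [1 .. 1000000])
  -- Only here does the computing thread wait for the i/o result.
  s <- takeMVar box
  return (s, n)

main :: IO ()
main = overlapped >>= print
```

within a single thread nothing is gained: the read still blocks its
caller, and the benefit comes only from other threads running in the
meantime.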

SM> We ought to be using the various alternatives to select(), but we 
SM> haven't got around to that yet.

yes, i read these threads and even remember the Trac ticket about this.
btw, in a typeclass-based i/o library this facility can be added
as an additional middle layer, in the same way as buffering and Char
encoding. i even think that it can be done as a third-party library,
without any changes to the main library itself

>> moreover, i have an idea how to implement async i/o without complex
>> bureaucracy: use mmapped files, maybe together with multiple buffers.

SM> I don't think we should restrict the implementation to mmap'd files, for 
SM> all the reasons that Einar gave.  Lots of things aren't mmapable, mainly.

i'm interested because mmap can be used to speed up i/o-bound
programs. but it seems that memory-mapped files can't be used to
overlap i/o in multi-threaded applications. anyway, i use a class-based
design, so at least we can provide memory-mapped files as one of the
Stream instances

SM> My vision for an I/O library is this:

SM>    - a single class supporting binary input (resp. output) that is
SM>      implemented by various transports: files, sockets, mmap'd files,
SM>      memory and arrays.  Windowed mmap is an option here too.

i don't consider fully-mapped files a separate instance, because
they can be simulated by window-mapped files with a large window

SM>    - layers of binary filters on top of this: you could add buffering,
SM>      and compression/decompression.

SM>    - a layer of text translation at the top.

SM> This is more or less how the Stream-based I/O library that I was working 
SM> on is structured.

SM> The binary I/O library would talk to a binary transport, perhaps with a 
SM> layer of buffering, whereas text-based applications talk to the text layer.

that's fairly close to what i do. no wonder: i was substantially
influenced by the design of your "new i/o" library. the only
difference is that i use a single Stream class for all streams
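
a minimal sketch of the layering with a single class, to show the
shape of the idea. this is not the actual library: Stream, sPut,
sFlush, MemStream, Buffered, withBuffer and putText are hypothetical
names, only output is covered, and the text layer is a plain function
that handles Latin-1 characters only.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef, modifyIORef)
import Data.Word (Word8)

-- One class for every stream (hypothetical, output side only).
class Stream s where
  sPut   :: s -> [Word8] -> IO ()
  sFlush :: s -> IO ()

-- Transport layer: an in-memory sink.
newtype MemStream = MemStream (IORef [Word8])

instance Stream MemStream where
  sPut (MemStream r) bs = modifyIORef r (++ bs)
  sFlush _ = return ()

-- Middle layer: buffering over any stream, itself a Stream.
data Buffered s = Buffered Int (IORef [Word8]) s

withBuffer :: Int -> s -> IO (Buffered s)
withBuffer limit inner = do
  r <- newIORef []
  return (Buffered limit r inner)

instance Stream s => Stream (Buffered s) where
  sPut b@(Buffered limit r _) bs = do
    modifyIORef r (++ bs)
    pending <- readIORef r
    if length pending >= limit then sFlush b else return ()
  sFlush (Buffered _ r inner) = do
    pending <- readIORef r
    writeIORef r []
    sPut inner pending
    sFlush inner

-- Top layer: text translation (Latin-1 only, a simplification).
putText :: Stream s => s -> String -> IO ()
putText s = sPut s . map (fromIntegral . fromEnum)

demo :: IO ([Word8], [Word8])
demo = do
  ref <- newIORef []
  out <- withBuffer 4 (MemStream ref)
  putText out "abc"           -- 3 bytes, below the limit: stays buffered
  before <- readIORef ref
  putText out "de"            -- 5 bytes pending: flushed to the transport
  after <- readIORef ref
  return (before, after)

main :: IO ()
main = demo >>= print
```

because Buffered is itself a Stream instance, further layers can be
stacked on top of it without touching the transport, and a new layer
can live in a separate package.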

-- 
Best regards,
 Bulat                            mailto:bulatz at HotPOP.com