[Haskell-cafe] How to make asynchronous I/O composable and safe?

Tue Jan 17 21:20:26 CET 2012

I'd say use of asynchronous exceptions should be a last resort. Developers
should be encouraged to explicitly model any event notification system they
use.

Regards,

Dave

On Tue, Jan 17, 2012 at 1:42 AM, Simon Marlow <marlowsd at gmail.com> wrote:

> This is an interesting problem, I think I might incorporate parts of it
> into the next revision of my Concurrent Haskell tutorial.
>
> It sounds like you're getting overwhelmed by several different problems,
> and dealing with them separately would probably help.  e.g. you want some
> infrastructure to run two threads and send an exception to one whenever the
> other one dies.  You also want to be able to avoid a thread being
> interrupted while performing an operation that should be atomic, like
> sending a message - this is slightly tricky, because there's a tradeoff
> between keeping the thread responsive and not interrupting an operation.
>  The biggest hammer is maskUninterruptible, which can be used if all else
> fails.
>
> Whether Network.TLS supports simultaneous read and write I don't know, but
> you can examine the code or talk to the maintainer.  If it doesn't, adding
> a layer of locking is straightforward, and doesn't increase overall
> complexity (it's localised).
>
> Cheers,
>        Simon
>
>
> On 14/01/2012 05:24, Joey Adams wrote:
>
>> I'm not happy with asynchronous I/O in Haskell.  It's hard to reason
>> about, and doesn't compose well.  At least in my code.
>>
>> I'm currently trying to build a networking layer for my application
>> using Network.TLS.  Here is a rather minimalist API:
>>
>>    newtype Connection = Connection (TLSCtx Handle)
>>
>>    connectClient :: Handle         -- ^ Connection handle, as returned
>> by 'connectTo'
>>                  ->  X509           -- ^ TLS certificate (i.e. public key)
>>                  ->  IO Connection
>>
>>    connectServer :: Handle         -- ^ Connection handle, as returned
>> by 'accept'
>>                  ->  X509           -- ^ TLS certificate (i.e. public key)
>>                  ->  TLS.PrivateKey -- ^ TLS private key
>>                  ->  IO Connection
>>
>>    close :: Connection ->  IO ()
>>
>>    sendMessage :: Connection ->  Message ->  IO ()
>>
>>    recvMessage :: Connection ->  ByteString ->  IO (Message, ByteString)
>>
>> The module provides little more than connection initialization and
>> message serialization.  I don't try to use locks or STM to multiplex
>> the connection or, in the case of recvMessage, hide connection state.
>> I just be sure to only use sendMessage in one thread at a time, only
>> use recvMessage in one thread at a time, and marshal the "extra bytes"
>> parameter of recvMessage from call to call (with the help of StateT).
>>
>> I wrote a simple "chat server" to test it.  The client turned out okay:
>>
>>    main :: IO ()
>>    main = do
>>        cert<- getCertificate
>>        handle<- connectTo "localhost" (PortNumber 1337)
>>        conn<- connectClient handle cert
>>        _<- forkIO $ forever $ do
>>            s<- getLine
>>            sendMessage conn $ TestMessage s
>>        forever $ flip runStateT B.empty $ do
>>            msg<- StateT $ recvMessage conn
>>            case msg of
>>                TestMessage s ->
>>                    liftIO $ putStrLn s
>>                _ ->
>>                    liftIO $ hPrintf stderr
>>                        "Warning: unrecognized message from server: %s\n"
>>                        (messageTypeName msg)
>>
>> The only glaring problem is that, if the user presses Ctrl+D, the
>> forked (sending) thread dies, but the main (receiving) thread lingers.
>>  I'd have to add exception handlers to ensure that when one thread
>> dies, the other thread dies too.
>>
>> However, the server is an abomination (see attachment).
>>
>> Unfortunately, it's not as simple as "spawn one thread per client".
>> We need at least two threads, one to listen for messages from the
>> client, and another to send messages to the client.  GHC won't let us
>> simultaneously, in the same thread, wait for input from a connection
>> and wait for an STM transaction to succeed.
>>
>> Another source of complexity is: what if we throw an exception at a
>> thread while it is in the middle of sending a packet?  Then we can't
>> shut down the connection properly (i.e. Network.TLS.bye), because the
>> receiver might think the close_notify packet is part of the
>> interrupted packet.
>>
>> Having a thread for each client is good, as it:
>>
>>  * Lets us think about each client separately.  No need to turn our
>> code inside out or write one big loop that juggles all the clients.
>>
>>  * Isolates exceptions.  If sendMessage or recvMessage throws an
>> exception, it doesn't bring the whole server down.
>>
>> On the other hand, having multiple threads interact with a single
>> client is hard to think about:
>>
>>  * We have to synchronize the threads (e.g. when one dies, kill the other
>> one)
>>
>>  * Multiple places where an exception can arise
>>
>>  * Can multiple threads interact with the connection handle
>> simultaneously?
>>
>> So why don't I make my connection API handle some of this?  Well, I
>> tried.  There are so many ways to do it, and I couldn't find a way
>> that simplified usage much.  The approach used by Handle and by
>> Network.TLS is to use MVars and IORefs to ensure that, if two threads
>> access the same connection, the connection doesn't become totally
>> corrupt.  If I do the same, then I'll have *three* layers of locking
>> under the hood.
>>
>> Worse, the locking done by Handle and Network.TLS doesn't guarantee
>> much.  I don't know if it's safe to have one thread sending and
>> another thread receiving.  Especially in the case of Network.TLS,
>> where 'recvData' automatically handshakes in some cases, which sends
>> packets.  Since I don't know how much thread safety to expect, I can't
>> write networking code and know for sure that it is safe.
>>
>> I'm certainly not protected from interleaved data if multiple threads
>> send on the same handle.  For example:
>>
>>     import Control.Concurrent
>>     import System.IO
>>
>>     main :: IO ()
>>     main = do
>>         hSetBuffering stdout NoBuffering
>>         _<- forkIO $ putStrLn "One sentence."
>>         putStrLn "Another sentence."
>>
>> produces:
>>
>>     AnOonteh esre nsteenntceen.c
>>     e.
>>
>> That is, I can't rely on putStrLn being "atomic".  To produce
>> intelligible output (without changing the buffering mode), I have to
>> "lock" the output each time I write something.  putStrLn doesn't do it
>> for me.
>>
>> === Summary ===
>>
>> In Haskell, sound logic and a great type system lead to elegant,
>> composable code in a variety of domains, such as:
>>
>>  * Expression evaluation
>>  * Parsing
>>  * Concurrent programming (thanks to STM)
>>
>> Asynchronous I/O is tricky.  However, Haskell currently does little to
>> alleviate the complexity (at least for me).
>>
>> How can we structure network protocol APIs so that they stack well
>> (e.g. only lock once, rather than locking each layer's connection
>> state)?  How can we deal with I/O errors without having to think about
>> them at every turn?
>>
>> For now, how can I structure my application's communication API so
>> it's less messy to use?
>>
>> Thanks,
>> - Joey
>>
>>
>>
>> ______________________________**_________________
>> Haskell-Cafe mailing list
>> Haskell-Cafe at haskell.org
>> http://www.haskell.org/**mailman/listinfo/haskell-cafe<http://www.haskell.org/mailman/listinfo/haskell-cafe>
>>
>
>
> ______________________________**_________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/**mailman/listinfo/haskell-cafe<http://www.haskell.org/mailman/listinfo/haskell-cafe>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20120117/672509e2/attachment.htm>