On Wed, Feb 10, 2010 at 1:15 PM, Bardur Arantsson <span dir="ltr">&lt;<a href="mailto:spam@scientician.net">spam@scientician.net</a>&gt;</span> wrote:<br><div class="gmail_quote"><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


I&#39;ve also been contemplating some solutions, but I cannot see any solutions to this problem which could reasonably be implemented outside of GHC itself. GHC lacks a &quot;threadWaitError&quot;, so there&#39;s no way to detect the problem except by timeout or polling. Solutions involving timeouts and polling are bad in this case because they arbitrarily restrict the client connection rate.<br>


<br>

Cheers,</blockquote><div><br></div><div>I believe that polling and timeouts may be the *only* solution, due to the way TCP works. There are two cases to consider here:</div><div><br></div><div> 1. What happens when the remote client disconnects properly (by sending a FIN packet, etc.)</div>

<div> 2. What happens when the remote client just drops the connection</div><div><br></div><div>Case #1 - Proper Disconnect </div><div><br></div><div>I believe that in this case we are OK. select() may not wake up just because the socket was closed, but something will eventually cause select() to wake up, and the next time through the loop the call to select() will fail with EBADF. This will cause every blocked thread to wake up. We can test this case by writing a client that purposely (and correctly) terminates the connection while threadWaitWrite is blocked, and checking whether that causes it to wake up. To ensure that the IO manager does eventually wake up, the server can run a thread that just does <code>forever $ threadDelay (1*10^6)</code>.</div>
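<div><br></div><div>For reference, a minimal C sketch of the EBADF behaviour described above, assuming a POSIX system such as Linux: select() fails immediately when one of its descriptor sets contains a descriptor that is no longer open. The helper name select_on_closed_fd is hypothetical and only illustrates the failure mode that the IO manager loop relies on.</div>

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a socket, close it, then ask select() whether the
   now-closed descriptor is writable. Returns the errno set by select(),
   0 if select() unexpectedly succeeded, or -1 if socket() itself failed. */
int select_on_closed_fd(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    close(fd);                      /* fd no longer refers to an open file */

    fd_set wfds;
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    struct timeval tv = { 1, 0 };   /* not expected to matter: select() fails at once */

    if (select(fd + 1, NULL, &wfds, NULL, &tv) < 0)
        return errno;               /* expected: EBADF */
    return 0;
}
```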

<div><br></div><div>Look here for more details: <a href="http://darcs.haskell.org/packages/base/GHC/Conc.lhs">http://darcs.haskell.org/packages/base/GHC/Conc.lhs</a></div><div><br></div><div>Case #2 - Sudden Death</div><div>

<br></div><div>In this case, there is no way to tell whether the client is still there without trying to send or receive data. A TCP connection is not a &#39;tangible&#39; link. It is just an agreement to send packets to/from certain ports with certain sequence numbers. It&#39;s much closer to snail mail than a telephone call. </div>

<div><br></div><div>If you set the SO_KEEPALIVE socket option, the TCP layer will automatically send keepalive probes to make sure the peer is still alive. However, I believe the default idle time before the first keepalive probe is 2 hours, and that it can only be changed on a system-wide basis?</div>
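<div><br></div><div>A minimal C sketch of enabling keepalive on a single socket. At least on Linux, the timings can also be set per socket through the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options, so there they need not be changed system-wide; other systems may only honour SO_KEEPALIVE with their global defaults. The function name enable_keepalive and the example values are illustrative assumptions.</div>

```c
#define _DEFAULT_SOURCE         /* glibc: expose TCP_KEEPIDLE and friends */
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable TCP keepalive on one socket. On Linux, also override the idle time,
   probe interval and probe count for this socket only; elsewhere just set
   SO_KEEPALIVE and rely on the system-wide defaults.
   Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int enable_keepalive(int fd, int idle_s, int interval_s, int count)
{
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
#ifdef __linux__
    /* Idle seconds before the first probe (Linux default: 7200). */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0)
        return -1;
    /* Seconds between unanswered probes. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof interval_s) < 0)
        return -1;
    /* Unanswered probes before the connection is reported dead. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count) < 0)
        return -1;
#else
    (void)idle_s; (void)interval_s; (void)count;
#endif
    return 0;
}
```

<div>For example, enable_keepalive(fd, 60, 10, 5) would, on Linux, start probing after one minute of idleness and report the connection dead after five unanswered probes sent 10 seconds apart.</div>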

<div><br></div><div><a href="http://www.unixguide.net/network/socketfaq/2.8.shtml">http://www.unixguide.net/network/socketfaq/2.8.shtml</a></div><div><br></div><div>The other option is to try to send some data. There are at least two cases that can happen here.</div>

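<div><br></div><div> 1. The network cable is unplugged: this is not an &#39;error&#39;. The write buffer will fill up while TCP waits until it can deliver the data. Once the write buffer is full, write() will either block or return EAGAIN, depending on whether the socket is in blocking or non-blocking mode. Eventually, once the retransmission timeout expires, it might give up, but that can take a long time (on Linux, roughly 15 minutes with the default tcp_retries2 setting).</div>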

<div><br></div><div> 2. The remote client has terminated the connection as far as it is concerned, but has not notified the server: when you try to send data, the client&#39;s TCP stack will reject it with a RST, and subsequent calls to send/write/sendfile/etc. will raise SIGPIPE (or fail with EPIPE if SIGPIPE is ignored).</div>
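<div><br></div><div>The two outcomes above can be told apart with a single non-blocking send. This is a minimal C sketch assuming Linux, where MSG_DONTWAIT and MSG_NOSIGNAL are available; try_send and its result names are hypothetical. ECONNRESET is grouped with EPIPE because, at least on Linux, the first send after a RST may report the reset itself rather than EPIPE.</div>

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef MSG_NOSIGNAL
#define MSG_NOSIGNAL 0  /* not available here: ignore SIGPIPE before calling */
#endif

enum send_result { SENT, WOULD_BLOCK, PEER_GONE, OTHER_ERROR };

/* Try one send() and classify what happened. MSG_DONTWAIT makes this call
   non-blocking even on a blocking socket; MSG_NOSIGNAL (where supported)
   turns the SIGPIPE into a plain EPIPE return. */
enum send_result try_send(int fd, const char *buf, size_t len)
{
    ssize_t n = send(fd, buf, len, MSG_DONTWAIT | MSG_NOSIGNAL);
    if (n >= 0)
        return SENT;                 /* some or all of buf was queued */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return WOULD_BLOCK;          /* case 1: the send buffer is full */
    if (errno == EPIPE || errno == ECONNRESET)
        return PEER_GONE;            /* case 2: the peer has closed or reset */
    return OTHER_ERROR;
}
```

<div>Note that WOULD_BLOCK alone cannot distinguish an unplugged cable from a client that is merely slow to read; only PEER_GONE is a definite answer.</div>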

<div><br></div><div>Looking at your debug output, we see the SIGPIPE / Broken Pipe error most of the time, but occasionally we get stuck in threadWaitWrite.</div><div><br></div><div>threadWaitWrite is ultimately implemented by adding the file descriptor to the set of write descriptors passed to select(). It seems, however, that select() does not wake up just because calling write() on that file descriptor *would* raise SIGPIPE.</div>
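<div><br></div><div>A simplified, single-descriptor analogue of the wait that threadWaitWrite relies on, assuming a POSIX select(): the descriptor goes into the write set, and the call returns only when the kernel reports it writable or the timeout expires. The real IO manager multiplexes many descriptors in one select() loop; wait_writable is a hypothetical name.</div>

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Wait until fd is writable, as a single threadWaitWrite would if it had
   select() to itself. Returns 1 if writable, 0 on timeout, -1 on error.
   A negative timeout_ms waits forever. */
int wait_writable(int fd, int timeout_ms)
{
    for (;;) {
        fd_set wfds;
        struct timeval tv, *tvp = NULL;

        FD_ZERO(&wfds);             /* select() overwrites the set: rebuild it */
        FD_SET(fd, &wfds);
        if (timeout_ms >= 0) {
            tv.tv_sec = timeout_ms / 1000;
            tv.tv_usec = (timeout_ms % 1000) * 1000;
            tvp = &tv;
        }

        int n = select(fd + 1, NULL, &wfds, NULL, tvp);
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted by a signal: wait again */
        if (n < 0)
            return -1;
        return n > 0 ? 1 : 0;
    }
}
```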

<div><br></div><div>The easiest way to confirm this is probably to write a small, pure C program and see what really happens.</div><div><br></div><div>If this is the case, then the only way to tell whether the client has abruptly dropped the connection is to actually try sending the data and see whether the sending function raises SIGPIPE. And that means doing some sort of polling or timeout?</div>
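<div><br></div><div>One possible shape for the polling/timeout approach, sketched in C under the assumption of a Linux-like socket API: send without blocking, and whenever the buffer is full, wait in select() for at most timeout_ms before giving up on the client. The function send_all_timeout and its ETIMEDOUT convention are hypothetical.</div>

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef MSG_NOSIGNAL
#define MSG_NOSIGNAL 0  /* not available here: ignore SIGPIPE before calling */
#endif

/* Hypothetical helper: send all of buf, but never wait more than timeout_ms
   at a time for the socket to become writable again. Returns 0 on success,
   ETIMEDOUT if the client stopped draining the buffer (slow, unplugged or
   gone), or the errno of a hard failure such as EPIPE / ECONNRESET. */
int send_all_timeout(int fd, const char *buf, size_t len, int timeout_ms)
{
    while (len > 0) {
        ssize_t n = send(fd, buf, len, MSG_DONTWAIT | MSG_NOSIGNAL);
        if (n > 0) {
            buf += n;
            len -= (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
            return errno;                 /* e.g. EPIPE: the peer is gone */

        /* Send buffer full: wait, but only for a bounded time. */
        fd_set wfds;
        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };
        int r = select(fd + 1, NULL, &wfds, NULL, &tv);
        if (r == 0)
            return ETIMEDOUT;             /* give up on this client */
        if (r < 0 && errno != EINTR)
            return errno;
    }
    return 0;
}
```

<div>The trade-off is the one raised in the quoted message: the timeout is arbitrary, so it has to be generous enough not to cut off slow but healthy clients.</div>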

<div><br></div><div>What do you think?</div><div><br></div><div>I do not have a good explanation for why the portable version does not fail, except that maybe it is so slow that it never fills up the buffer, and hence never gets stuck in threadWaitWrite?</div>

<div><br></div><div>Anyway, the fundamental question is:</div><div><br></div><div> When your write buffer is full, and you call select() on that file descriptor, will select() return in the case where calling write() again would raise SIGPIPE?</div>
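<div><br></div><div>A rough sketch of such a pure C experiment, assuming Linux and TCP over the loopback interface; the function name select_wakes_after_peer_close is hypothetical. It fills the server side&#39;s send buffer towards a client that never reads, closes the client (on Linux, closing with unread data makes the client answer with a RST rather than a FIN), and then checks whether select() reports the server socket in its write set within 2 seconds. It only covers a peer that actively resets the connection; a client that silently disappears, e.g. behind an unplugged cable, cannot be reproduced on loopback, and results may differ between kernels.</div>

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef MSG_NOSIGNAL
#define MSG_NOSIGNAL 0  /* not available here: ignore SIGPIPE before calling */
#endif

/* Over TCP loopback: fill the server's send buffer towards a client that never
   reads, close the client, then ask select() whether the server socket became
   writable. Returns 1 if select() reported it within 2 s, 0 on timeout and -1
   on error. *send_errno receives the errno of one more send() afterwards
   (0 if that send succeeded). Descriptors leak on setup errors: it is a test. */
int select_wakes_after_peer_close(int *send_errno)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* let the kernel pick a port */

    int lst = socket(AF_INET, SOCK_STREAM, 0);
    if (lst < 0 || bind(lst, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lst, 1) < 0 ||
        getsockname(lst, (struct sockaddr *)&addr, &alen) < 0)
        return -1;

    int cli = socket(AF_INET, SOCK_STREAM, 0);
    if (cli < 0 || connect(cli, (struct sockaddr *)&addr, sizeof addr) < 0)
        return -1;
    int srv = accept(lst, NULL, NULL);
    if (srv < 0)
        return -1;

    /* The client never reads, so this stops with EAGAIN once the client's
       receive buffer and the server's send buffer are both full. */
    char chunk[4096];
    memset(chunk, 'x', sizeof chunk);
    while (send(srv, chunk, sizeof chunk, MSG_DONTWAIT | MSG_NOSIGNAL) > 0)
        ;

    close(cli);                              /* unread data: expect a RST */

    fd_set wfds;
    FD_ZERO(&wfds);
    FD_SET(srv, &wfds);
    struct timeval tv = { 2, 0 };
    int r = select(srv + 1, NULL, &wfds, NULL, &tv);

    *send_errno = 0;
    if (send(srv, chunk, sizeof chunk, MSG_DONTWAIT | MSG_NOSIGNAL) < 0)
        *send_errno = errno;

    close(srv);
    close(lst);
    return r > 0 ? 1 : (r == 0 ? 0 : -1);
}
```

<div>Running it on each platform you care about is the point: the return value answers the question above for an actively reset peer, and the recorded errno shows whether the follow-up send() reports EPIPE or ECONNRESET.</div>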

<div><br></div><div>- jeremy</div></div>