On Wed, Feb 17, 2010 at 1:27 PM, Bardur Arantsson <span dir="ltr">&lt;<a href="mailto:spam@scientician.net" target="_blank">spam@scientician.net</a>&gt;</span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div><br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

(Obviously, if people are using sendfile with something other than happstack,<br>

it does not help them, but it sounds like trying to fix things in<br>

sendfile is misguided anyway.)<br>

</blockquote>

<br></div>

How so? As a user I expect sendfile to work and not semi-randomly block threads indefinitely.<br>

<br></blockquote><div><br></div><div>Because it only addresses *one* case when this type of blocking can happen.</div><div><br></div><div>Shouldn&#39;t hPut and friends also block indefinitely since they also use threadWaitWrite? If so, what good is just fixing sendfile, when all other network I/O will still block indefinitely?</div>

<div><br></div><div>If things are &#39;fixed&#39; at a higher level, by using SO_KEEPALIVE, then does sendfile really need a hack to deal with it?</div><div><br></div><div>With your proposed fix, if the user unplugs the network cable, then won&#39;t you get a polling loop that never terminates? That doesn&#39;t sound any better than the current situation.</div>
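<div><br></div><div>For reference, a minimal sketch of turning on SO_KEEPALIVE from Haskell. It assumes the network package (Network.Socket), which is not something this thread has settled on, and it only sets the option on a fresh, unconnected socket. How long the kernel waits before probing an idle connection is a system setting (often on the order of hours by default), so this alone does not give timely detection.</div>

```haskell
import Network.Socket

-- Sketch only: create a TCP socket, enable SO_KEEPALIVE, and read the
-- option back. Assumes the `network` package is installed.
main :: IO ()
main = do
  sock <- socket AF_INET Stream defaultProtocol
  -- A non-zero value enables the option.
  setSocketOption sock KeepAlive 1
  -- getSocketOption returns the raw integer value of the option.
  enabled <- getSocketOption sock KeepAlive
  putStrLn ("SO_KEEPALIVE enabled: " ++ show (enabled /= 0))
  close sock
```

<div><br></div><div>In a server, the same setSocketOption call would be made on each accepted socket before handing it to the code that writes the response.</div>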

<div><br></div><div>You said that you have not seen this issue when using the code that uses hPut, only the code that uses sendfile(). But my research indicates that we *should* see the error there too. So, I am not very comfortable fixing just sendfile and ignoring the fact that all network I/O might be borked.</div>

<div><br></div><div>I am also not 100% pleased by the SO_KEEPALIVE solution. There are really two errors which can occur:</div><div><br></div><div>  1. the remote end drops the connection in such a manner that we immediately get notified of it by seeing that a read select() on the socket is successful but there are 0 bytes available to read. This happens because the remote end sent a notification to us that they have terminated the connection.</div>

<div><br></div><div>  2. the remote end drops off the network (for example, the network cable is disconnected). In this case, we will not get any notification via read select(), because the remote end is not there to send the notification. The only solution is to eventually time out.</div>

<div><br></div><div>By using a timeout to handle #2, we implicitly handle #1, but in a very untimely manner.</div><div><br></div><div>Ideally, we would like to handle these two cases separately. In case #1, we know immediately that the connection is dead, and can therefore clean things up. With case #2, the remote client might actually come back online (someone plugs the cable back in), and the transfer can resume. Perhaps in some applications we want infinite timeouts for case #2. That does not mean we do not want case #1 handled.</div>
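<div><br></div><div>To make the timeout approach concrete, here is a rough sketch using only base. The names SocketWait, WaitResult and waitWritable are hypothetical and exist in neither sendfile nor happstack. Passing Nothing gives the infinite wait some applications may want for case #2. Note that a timeout by itself cannot tell case #1 from case #2; it only bounds how long the thread stays blocked.</div>

```haskell
module SocketWait (WaitResult (..), waitWritable) where

import Control.Concurrent (threadWaitWrite)
import System.Posix.Types (Fd)
import System.Timeout (timeout)

-- Outcome of waiting on a file descriptor (hypothetical type).
data WaitResult
  = Writable  -- threadWaitWrite returned before the limit
  | TimedOut  -- the limit elapsed first
  deriving (Eq, Show)

-- Hypothetical wrapper around threadWaitWrite.
-- The limit is in microseconds; Nothing means wait indefinitely.
waitWritable :: Maybe Int -> Fd -> IO WaitResult
waitWritable Nothing fd = threadWaitWrite fd >> pure Writable
waitWritable (Just micros) fd = do
  -- timeout interrupts the blocked thread with an asynchronous
  -- exception and returns Nothing if the limit is reached.
  r <- timeout micros (threadWaitWrite fd)
  pure (maybe TimedOut (const Writable) r)
```

<div><br></div><div>A caller would treat TimedOut as the signal to close the socket and free its resources. The choice of limit is left to the application, since nothing in this discussion fixes a value.</div>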

<div><br></div><div>However, I do not really see a good way to handle #1 right now that works for all network code, not just sendfile.</div><div><br></div><div>The issue seems to be that select() was designed as a way to *avoid* using threads. There seems to be the assumption in the network code that you are going to do a select on both the read and write aspects of the socket. When the select returns, you then look at what happened and take the correct action.</div>

<div><br></div><div>But, in Haskell, we are using multiple threads. So the code that is looking to read data and the code that is looking to write data don&#39;t really know about each other. So even if the read thread detects the closed socket, it has no idea that some other thread needs to be killed. </div>

<div><br></div><div>So, what to do? Perhaps it is wrong to use a socket in more than one thread? Obviously, having multiple threads trying to read the same socket, or write to the same socket, would be a mess. So why do we expect it to be OK to have one thread reading and a different thread writing? But, even if we do restrict ourselves to only accessing a socket from one thread at a time, we still have the issue that every place which uses threadWaitWrite needs to handle the disconnect case. We could, of course, write a wrapper function that does the check, and call that instead. But we still have not really solved the problem. The code in the I/O libraries that eventually implements hPut calls threadWaitWrite. But it has no idea that the file descriptor it is waiting on is a socket which has special requirements. That code is also used for writing to plain old files, etc., so it probably wouldn&#39;t make sense for it to behave that way by default.</div>

<div><br></div><div>- jeremy</div><div><br></div>

</div>