[Haskell-cafe] Re: sendfile leaking descriptors on Linux?

Wed Feb 17 23:44:02 EST 2010

On Wed, Feb 17, 2010 at 3:54 PM, Jeremy Shaw <jeremy at n-heptane.com> wrote:

> On Wed, Feb 17, 2010 at 1:27 PM, Bardur Arantsson <spam at scientician.net> wrote:
>
>>
>>> (Obviously, if people are using sendfile with something other than
>>> happstack, it does not help them, but it sounds like trying to fix things
>>> in sendfile is misguided anyway.)
>>
>> How so? As a user I expect sendfile to work and not semi-randomly block
>> threads indefinitely.
>>
>>
> Because it only addresses *one* case when this type of blocking can happen.
>
> Shouldn't hPut and friends also block indefinitely since they also use
> threadWaitWrite? If so, what good is just fixing sendfile, when all other
> network I/O will still block indefinitely?
>
> If things are 'fixed' at a higher-level, by using SO_KEEPALIVE, then does
> sendfile really need a hack to deal with it?
>
>
I think I understand the SO_KEEPALIVE + SO_ERROR solution, and that does not
really fix things either.

Setting SO_KEEPALIVE by itself does not cause the write select() to behave
any differently. What it does do is cause the TCP stack to eventually send
an empty packet to the remote host and hopefully get a response back. The
response might be an error, or it might just be an ACK. Either way, I
believe it is intended to cause the read select() to wake up. But in the
case that started this discussion, we are already getting this information,
so this won't help with that at all.
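A minimal sketch of the SO_KEEPALIVE half, assuming the `network` package's Network.Socket API (`setSocketOption`, `getSocketOption`, `KeepAlive`). The helper names `enableKeepAlive` and `keepAliveEnabled` are hypothetical, not part of happstack or sendfile. Note that on Linux the first probe is only sent after the connection has been idle for tcp_keepalive_time (7200 seconds by default), and nothing here changes how a pending threadWaitWrite behaves.

```haskell
import Network.Socket

-- Hypothetical helper: ask the kernel to send keepalive probes on an idle
-- connection. This sets SO_KEEPALIVE only; it does not make a blocked
-- select()/threadWaitWrite on the socket return any sooner.
enableKeepAlive :: Socket -> IO ()
enableKeepAlive sock = setSocketOption sock KeepAlive 1

-- Hypothetical helper: read SO_KEEPALIVE back to confirm it took effect.
keepAliveEnabled :: Socket -> IO Bool
keepAliveEnabled sock = (/= 0) <$> getSocketOption sock KeepAlive
```

In a server like happstack this would presumably be called on each accepted socket before it is handed to the per-connection thread.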

The second part of the solution is to poll SO_ERROR to determine whether
something went wrong. This is an alternative to doing a read() on the socket
and checking whether it returns 0 bytes. It is a nice alternative *because*
it does not require a read(). However, it is still problematic: reading
SO_ERROR clears the error value, so there is a potential race condition if
multiple threads poll it.
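A sketch of polling SO_ERROR with the same `network` API, assuming its `SoError` option; `pendingSocketError` is a hypothetical helper name. Per socket(7), getsockopt(SO_ERROR) both returns and clears the pending error, which is exactly where the race comes from.

```haskell
import Network.Socket

-- Hypothetical helper: read SO_ERROR from a socket. Reading it also clears
-- it, so if two threads call this on the same socket, only the first one
-- will see a failure; the second will get Nothing.
pendingSocketError :: Socket -> IO (Maybe Int)
pendingSocketError sock = do
  err <- getSocketOption sock SoError
  -- 0 means no error is pending; anything else is an errno value.
  return (if err == 0 then Nothing else Just err)
```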

In happstack, we fork a new thread to handle each incoming connection. So at
first it seems like we could just fork a second thread that polls the
SO_ERROR option on the socket and kills the first thread if an error
happens. Unfortunately, it is not that simple. The first thread might fork
another thread that is actually doing the threadWaitWrite. Killing the
parent thread will not kill that child thread.
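The watchdog idea, and why it falls short, can be sketched with plain Control.Concurrent from base. `watchdog` and `orphanedChildDemo` are hypothetical names; the `isBroken` action stands in for an SO_ERROR check, and the forked child stands in for the thread blocked in threadWaitWrite (simulated here with threadDelay so the example terminates).

```haskell
import Control.Concurrent
import Control.Exception (AsyncException (ThreadKilled), fromException)
import Data.IORef
import Data.Maybe (isJust)
import System.Timeout (timeout)

-- Hypothetical watchdog: every intervalUs microseconds, run isBroken
-- (for a socket, this could be an SO_ERROR check) and kill the handler
-- thread as soon as it reports a failure.
watchdog :: Int -> IO Bool -> ThreadId -> IO ThreadId
watchdog intervalUs isBroken handler = forkIO loop
  where
    loop = do
      threadDelay intervalUs
      broken <- isBroken
      if broken then killThread handler else loop

-- The handler forks a child. The watchdog kills the handler, but the
-- child is untouched and keeps running to completion.
-- Returns (handlerWasKilled, childSurvived).
orphanedChildDemo :: IO (Bool, Bool)
orphanedChildDemo = do
  broken        <- newIORef False
  childStarted  <- newEmptyMVar
  childFinished <- newEmptyMVar
  handlerDied   <- newEmptyMVar
  handler <- forkFinally
    (do _ <- forkIO $ do
          putMVar childStarted ()
          threadDelay 100000          -- keeps running after the handler dies
          putMVar childFinished ()
        threadDelay 10000000)         -- the handler itself just blocks
    (\r -> putMVar handlerDied (either isKilled (const False) r))
  takeMVar childStarted
  _ <- watchdog 1000 (readIORef broken) handler
  writeIORef broken True              -- simulate SO_ERROR becoming non-zero
  killed <- takeMVar handlerDied
  survived <- timeout 1000000 (takeMVar childFinished)
  return (killed, isJust survived)
  where
    isKilled e = fromException e == Just ThreadKilled
```

In the real case the child would be stuck in threadWaitWrite rather than sleeping, so it would never finish at all; the point is only that killThread on the parent does not propagate to threads it forked.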

So, at present, I don't see a solution that will fix the problem in the
rest of the IO code. There are multiple ways to hack around it in sendfile
alone, but that is only one place where this error can happen.

If this error truly never happens with hPut, then we should figure out why.
If there is a solution that works for write(), it should work for sendfile()
too, because the real issue is with the select() call anyway.

- jeremy