[Haskell-cafe] Re: sendfile leaking descriptors on Linux?

Thu Feb 11 14:57:28 EST 2010

Jeremy Shaw wrote:
> On Wed, Feb 10, 2010 at 1:15 PM, Bardur Arantsson <spam at scientician.net> wrote:
> 
>> I've also been contemplating some solutions, but I cannot see any solutions
>> to this problem which could reasonably be implemented outside of GHC itself.
>> GHC lacks a "threadWaitError", so there's no way to detect the problem
>> except by timeout or polling. Solutions involving timeouts and polling are
>> bad in this case because they arbitrarily restrict the client connection
>> rate.
>>
>> Cheers,
> 
> 
> I believe solutions involving polling and timeouts may be the *only*
> solution due to the way TCP works. There are two cases to consider here:
> 

True, but my point was rather that a solution in the sendfile library 
would incur an _extra_ timeout on top of the timeout which is handled by 
the OS. It's very hard to come up with a "proper" timeout here because 
apps will have different requirements depending on the expected 
connection rate, etc. This is what I see as unacceptable since it would 
have to be a completely arbitrary timeout -- there's no way for the 
application to specify a timeout to the sendfile library since the API 
doesn't permit it.

[--snip--]
> Case #1 - Proper Disconnect
> 
> I believe that in case we are ok. select() may not wakeup due to the socket
> being closed -- but something will eventually cause select() to wakeup, and
> then next time through the loop, the call to select will fail with EBADF.
> This will cause everyone to wakeup. We can test this case by writing a
> client that purposely (and correctly) terminates the connection while
> threadWaitWrite is blocking and see if that causes it to wakeup. To ensure
> that the IOManager is eventually waking up, the server can have an IO thread
> that just does, forever $ threadDelay (1*10^6)
> 
> Look here for more details:
> http://darcs.haskell.org/packages/base/GHC/Conc.lhs
> 

I don't have time to write a C test program right now. I'm actually not 
100% convinced that this case is *not* problematic, but my limited 
testing with "well-behaved" clients (wget, curl) hasn't turned up any 
problems so far.

> Case #2 - Sudden Death
> 
> In this case, there is no way to tell if the client is still there with out
> trying to send / recv data. A TCP connection is not a 'tangible' link. It is
> just an agreement to send packets to/from certain ports with certain
> sequence numbers. It's much closer to snail mail than a telephone call.
> 
> If you set the keepalive socket option, then the TCP layer will
> automatically ping the connection to make sure it is still alive. However, I
> believe the default time between keepalive packets is 2 hours, and can only
> be changed on a system wide basis?
> 
> http://www.unixguide.net/network/socketfaq/2.8.shtml

There are some options you can set via setsockopt(), see man 7 tcp. The 
relevant system-wide defaults are the sysctls

    tcp_keepalive_intvl    (default: 75s)
    tcp_fin_timeout        (default: 60s)

(The latter is the amount of time to wait for the final FIN before 
forcing the socket to close.)

These can be overridden per-socket with the TCP_KEEPINTVL and TCP_LINGER2 
socket options, respectively.
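
As a rough illustration of the per-socket route, here is a minimal C 
sketch. It assumes Linux, where tcp(7) documents the TCP_KEEPIDLE, 
TCP_KEEPINTVL and TCP_KEEPCNT options. The helper name enable_keepalive 
and its parameters are made up for this example; they are not part of 
the sendfile library or of GHC's network package.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable keepalive probing on one TCP socket (Linux).
 * idle_s:  seconds of silence before the first probe  (TCP_KEEPIDLE)
 * intvl_s: seconds between unanswered probes          (TCP_KEEPINTVL)
 * count:   unanswered probes before the kernel gives up (TCP_KEEPCNT)
 * Returns 0 on success, -1 on failure with errno set by setsockopt(). */
int enable_keepalive(int fd, int idle_s, int intvl_s, int count)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof intvl_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count) < 0)
        return -1;
    return 0;
}
```

Something like enable_keepalive(fd, 60, 10, 5) would ask the kernel to 
start probing after 60 idle seconds and give up after 5 unanswered 
probes sent 10 seconds apart. The numbers are arbitrary and would have 
to be chosen by the application.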

> 
> The other option is to try to send some data. There are at least two cases
> that can happen here.

This is what I tried. The trouble here is that you have to force the 
thread doing threadWaitWrite to wake up periodically... and how do you 
decide how often? Too often and you're burning CPU doing nothing, too 
seldom and you're letting threads (and by implication 
used-but-really-disconnected-as-far-as-the-OS-is-concerned file 
descriptors) pile up. The overhead of memcpy (avoidance of which is 
sendfile's raison-d'être) is probably much less than the overhead of 
doing all this administration in userspace instead of just letting the 
kernel do its thing.

Even waking up very seldom (~1/s IIRC) incurred a lot of CPU overhead in 
my test case... but I suppose I could give it another try to see if I'd 
made some mistake in my code which caused it to use more CPU than necessary.

> 
>  1. the network cable is unplugged -- this is not an 'error'. The write
> buffer will fill up and it will wait until it can send the data. If the
> write buffer is full, it will either block or return EAGAIN depending on the
> mode. Eventually, after 2 hours, it might give up.

I believe the socket is actually in non-blocking mode in my application. 
I'm not putting it into non-blocking mode, so I'm guessing that the 
"accept" call is doing that -- or maybe it's just the default behavior 
of accept() on Linux. Converting a socket to a Handle (which is what the 
portable sendfile does) automatically puts it into blocking mode.

Actually, I think this whole issue could be avoided if the socket could 
just be forced into blocking mode. In that case, there would be no need 
to call threadWaitWrite: The native sendfile() call could never return 
EAGAIN (it would block instead), and so there'd be no need to call 
threadWaitWrite to avoid busy-waiting.

>  2. the remote client has terminated the connection as far as it is
> concerned but not notified the server -- when you try to send data it will
> reject it, and send/write/sendfile/etc will raise sigPIPE.
> 
> Looking at your debug output, we are seeing the sigPIPE / Broken Pipe error
> most of the time. But then there is the case where we get stuck on the
> threadWaitWrite.
> 
> threadWaitWrite is ultimately implemented by passing the file descriptor to
> the list of write descriptors in a call to select(). It seems, however, that
> select() is not waking up just because calling write() on a file descriptor
> *would* cause sigPIPE.

That's what I'd expect select() with the FD in its "exceptfds" fd_set to do.

> 
> The easiest way to confirm this case is probably to write a small, pure C
> program and see what really happens.
> 
> If this is the case, then it means the only way to tell if the client has
> abruptly dropped the connection is to actually try sending the data and see
> if the sending function calls sigPIPE. And that means doing some sort of
> polling/timeout?

Correct, but the trouble is deciding how often to poll and/or how long 
the timeout should be.

I don't see any easy answer to that. That's why my suggested "solution" 
is to simply punt it to the OS (by using portable mode) and suck up the 
extra overhead of the portable solution. Hopefully the new GHC I/O 
manager will make it possible to have a proper solution.
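
For completeness, the "try to send and see what happens" probe 
discussed above could look roughly like this in C. It is a sketch 
assuming Linux, where send() accepts MSG_NOSIGNAL so that a dead peer 
is reported as an errno value instead of a SIGPIPE signal. The name 
probe_send is hypothetical.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: try to send len bytes without blocking and
 * without raising SIGPIPE.
 * Returns 0 if the kernel accepted some data or the send buffer is
 * merely full (EAGAIN/EWOULDBLOCK); otherwise returns the errno from
 * send(), e.g. EPIPE or ECONNRESET when the peer is gone. */
int probe_send(int fd, const void *buf, size_t len)
{
    ssize_t n = send(fd, buf, len, MSG_DONTWAIT | MSG_NOSIGNAL);

    if (n >= 0)
        return 0;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;                       /* buffer full: says nothing about the peer */
    return errno;
}
```

Note the middle case: while the send buffer is full the probe learns 
nothing, which is precisely the situation where the thread is stuck.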

> 
> I do not have a good explanation as to why the portable version does not
> fail. Except maybe it is just so slow that it does not ever fill up the
> buffer, and hence does not get stuck in threadWaitWrite?

The portable version doesn't call threadWaitWrite. It simply turns the 
Socket into a Handle (which causes it to become blocking) and so the 
kernel is tasked with handling all the gritty details.

> 
> Any way, the fundamental question is:
> 
>  When your write buffer is full, and you call select() on that file
> descriptor, will select() return in the case where calling write() again
> would raise sigPIPE?
> 

I believe so, *if* you give it the FD in the exceptfds FD_SET parameter. 
Let's face it, any other behavior doesn't make any sense since it's the 
equivalent of forcing all timeout handling onto the user, just like 
threadWaitWrite currently does. I've written my fair share of networking 
code in various languages (including C/C++) and I've never seen this 
problem of "missing wakeups" before.
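
The question can be checked empirically with a small C program along 
these lines. This is a sketch under loud assumptions: it uses a local 
AF_UNIX socketpair() as a stand-in for a TCP connection and an orderly 
close() of the peer, so it does not reproduce the "sudden death" case. 
The names probe_select_after_close and struct wakeup are made up for 
this example. select(2) describes exceptfds in terms of "exceptional 
conditions" such as out-of-band data, so the sketch records which set 
actually fires rather than assuming one.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* What select() reported for the descriptor under test. */
struct wakeup {
    int ready;      /* select() return value: >0 woke up, 0 timed out */
    int in_write;   /* descriptor was flagged in writefds  */
    int in_except;  /* descriptor was flagged in exceptfds */
};

/* Hypothetical probe: fill the send buffer of one end of a socketpair,
 * close the other end, then ask select() whether the first end is
 * write-ready or exceptional. Waits at most timeout_s seconds.
 * Returns 0 on success, -1 if setup or select() failed. */
int probe_select_after_close(int timeout_s, struct wakeup *out)
{
    int sv[2];
    char chunk[4096];
    fd_set wfds, efds;
    struct timeval tv = { timeout_s, 0 };

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (fcntl(sv[0], F_SETFL, O_NONBLOCK) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* Write until the kernel refuses more data: the buffer is now full. */
    memset(chunk, 'x', sizeof chunk);
    while (write(sv[0], chunk, sizeof chunk) > 0) {
    }

    close(sv[1]);   /* the peer goes away while our buffer is full */

    FD_ZERO(&wfds);
    FD_SET(sv[0], &wfds);
    FD_ZERO(&efds);
    FD_SET(sv[0], &efds);
    out->ready = select(sv[0] + 1, NULL, &wfds, &efds, &tv);
    out->in_write = out->ready > 0 && FD_ISSET(sv[0], &wfds);
    out->in_except = out->ready > 0 && FD_ISSET(sv[0], &efds);

    close(sv[0]);
    return out->ready < 0 ? -1 : 0;
}
```

Printing the three fields of struct wakeup after a call such as 
probe_select_after_close(2, &w) shows whether select() woke up at all 
and through which set. Repeating the experiment over real TCP with a 
peer that vanishes without sending FIN or RST would be the next step.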

Cheers,