[Haskell-cafe] Re: sendfile leaking descriptors on Linux?

Jeremy Shaw jeremy at n-heptane.com
Sun Feb 21 16:45:27 EST 2010


On Feb 21, 2010, at 11:50 AM, Donn Cave wrote:
>
> The problem is that this definition of `closed' is, precisely,
> `failed to respond within 2 seconds.'  If there is no observable
> difference between a connection that has been abandoned by the PS3,
> and a connection that just suffered a momentary lapse, then there's
> no way to catch the former without making connections more fragile.

No (I think).

What happens is the PS3 has closed the connection, and if you attempt  
to send any more packets the PS3 will tell you it has closed the  
connection and the write() / sendfile() call will raise SIGPIPE.
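
A minimal sketch of that failure mode, using a local pipe as a stand-in for the PS3's socket (an assumption made so the example needs no network; a TCP socket may accept one more write before failing). As far as I know the GHC runtime does not let SIGPIPE kill the process by default, so the failed write surfaces as an IOException carrying errno EPIPE. The names isBrokenPipe and writeToClosedPeer are hypothetical.

```haskell
import Control.Exception (try)
import Foreign.C.Error (Errno (..), ePIPE)
import GHC.IO.Exception (IOException (..))
import System.Posix.IO (closeFd, createPipe, fdWrite)

-- | True when the exception records errno EPIPE ("broken pipe").
isBrokenPipe :: IOException -> Bool
isBrokenPipe e = fmap Errno (ioe_errno e) == Just ePIPE

-- | Write to a pipe whose read end is already closed and report
-- whether the write failed with EPIPE. The pipe stands in for a
-- connection that the peer has closed.
writeToClosedPeer :: IO Bool
writeToClosedPeer = do
  (readEnd, writeEnd) <- createPipe
  closeFd readEnd                        -- the "peer" goes away
  r <- try (fdWrite writeEnd "hello")    -- we only find out by writing
  closeFd writeEnd
  return (either isBrokenPipe (const False) r)
```

The point is that the error only shows up once a write is actually attempted.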

The problem is that we never try to send those packets, because we are
sitting in threadWaitWrite waiting to write, and nothing is going to
happen that will cause the call to select() (made by threadWaitWrite)
to actually wake up.

I believe the proposal is to add a 2 second timeout to the
threadWaitWrite call. If it wakes up and can't write (because the
remote side has lost its connection, etc.) then it will just go back to
sleep. But if it wakes up, tries to write, and then gets SIGPIPE, then
it knows the connection is actually dead and will clean up after itself.
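
A rough sketch of how I read that proposal, not the actual sendfile code. The names Attempt, writeWithWakeups and demoAttempts are hypothetical, and the caller is assumed to supply a non-blocking write attempt that throws (for example on EPIPE) when the connection is dead, so that the exception escapes the loop and the caller can clean up.

```haskell
import Control.Concurrent (threadWaitWrite)
import Data.IORef (modifyIORef', newIORef, readIORef)
import System.Posix.IO (closeFd, createPipe)
import System.Posix.Types (Fd)
import System.Timeout (timeout)

-- | Outcome of one non-blocking write attempt (hypothetical type).
data Attempt = Finished | WouldBlock

-- | Wait until the descriptor is writable or the timeout (in
-- microseconds) expires, then try the write either way. WouldBlock
-- sends us back to sleep; an exception from the attempt propagates
-- to the caller.
writeWithWakeups :: Int -> Fd -> IO Attempt -> IO ()
writeWithWakeups micros fd attempt = loop
  where
    loop = do
      _ <- timeout micros (threadWaitWrite fd)  -- Nothing means timed out
      result <- attempt
      case result of
        Finished   -> return ()
        WouldBlock -> loop

-- | Demo with a fake attempt that reports WouldBlock twice before
-- finishing; returns how many attempts were made.
demoAttempts :: IO Int
demoAttempts = do
  (readEnd, writeEnd) <- createPipe
  calls <- newIORef (0 :: Int)
  let attempt = do
        modifyIORef' calls (+ 1)
        n <- readIORef calls
        return (if n < 3 then WouldBlock else Finished)
  writeWithWakeups 2000000 writeEnd attempt
  closeFd readEnd
  closeFd writeEnd
  readIORef calls
```

The 2000000 passed in the demo is the 2 seconds from the proposal; the right value, if any, is exactly what is under discussion.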

The problem is that we have not successfully figured out what is
causing this issue in the first place.

I wrote a Haskell server and a C client to try to emulate the
situation which causes threadWaitWrite to never wake up, but I could
not actually get that to happen. So far the PS3 client is the only
thing that causes it.

I think that applying a fix without really understanding the problem
is asking for trouble.

Among other things, since the problem is with threadWaitWrite (not
sendfile), the same issue ought to exist when we are calling
hPutStr, etc., since they ultimately call threadWaitWrite as well. If
hPut never has this problem, then we should understand why and use the
same solution for sendfile. If hPut does have this problem, then
fixing just sendfile isn't much of a solution.

So far there is:

  - no way for anyone besides Bardur to reproduce the problem
  - no sound explanation for why the PS3 client causes the error, but  
nothing else does
  - no proof that this error does or does not affect all the normal I/O
functions in Haskell (hPut, etc.).

- jeremy 