I/O manager: relying solely upon kqueue is not a safe way to go

Andreas Voellmy andreas.voellmy at gmail.com
Sat Mar 16 16:08:40 CET 2013


On Fri, Mar 15, 2013 at 3:54 PM, PHO <pho at cielonegro.org> wrote:

> I found the HEAD stopped working on MacOS X 10.5.8 since the parallel
> I/O manager got merged to HEAD. Stage-2 compiler successfully builds
> (including Language.Haskell.TH.Syntax contrary to the report by Kazu
> Yamamoto) but the resulting binary is very unstable especially for
> ghci:
>
>   % inplace/bin/ghc-stage2  --interactive
>   GHCi, version 7.7.20130313: http://www.haskell.org/ghc/  :? for help
>   Loading package ghc-prim ... linking ... done.
>   Loading package integer-gmp ... linking ... done.
>   Loading package base ... linking ... done.
>   Prelude>
>   <stdin>: hGetChar: failed (Operation not supported)
>
> So I took a dtruss log and found it was kevent(2) that returned
> ENOTSUP. GHC.Event.KQueue was just registering the stdin for
> EVFILT_READ, whose type was of course tty, and then kevent(2) said
> "tty is not supported". Didn't the old I/O manager do the same thing?
> Why was it working then?
>
> After a hard investigation, I concluded that the old I/O manager was
> not really working. It just looked fine but in fact wasn't. Here's an
> explanation: If a fd to be registered is unsupported by kqueue,
> kevent(2) returns -1 iff no incoming event buffer is passed
> together. Otherwise it successfully returns with an incoming kevent
> whose "flags" is EV_ERROR and "data" contains an errno. The I/O
> manager has always been passing a non-empty event buffer until the
> commit e5f5cfcd, while it wasn't (and still isn't) checking if a
> received event in fact represents an error. That is, the KQueue
> backend asks the kernel to monitor the stdin's readability. The kernel
> then immediately delivers an event saying ENOTSUP. The KQueue backend
> thinks "Hey, the stdin is now readable!" so it invokes a callback
> associated with the fd. The thread which called "threadWaitRead" is
> now awakened and performs a supposedly non-blocking read on the fd,
> which in fact blocks but works anyway.
>
> However the situation has changed since the commit e5f5cfcd. The I/O
> manager now registers fds without passing an incoming event buffer, so
> kevent(2) no longer successfully delivers an error event; instead it
> directly returns -1 with errno set to ENOTSUP, hence the "Operation
> not supported" exception.
>

One thing we can easily do is have the new IO manager pass in an incoming
event buffer so we can distinguish this case and treat it exactly as the
old IO manager did. Then this exception would not occur and the waiting
thread would just continue to retry the read until it succeeded. This is
inefficient, but is no worse than the old IO manager.

Note that there is nothing about the IO manager that would cause the
awakened thread to make a blocking read call - that is determined entirely
by how the thread performs the read.  For example, if you take a look at
the code in the network package, you will see that whenever a socket is
created, the socket is put in non-blocking mode. The code that receives
from a socket then calls recv(), which no longer blocks, and calls
threadWaitRead only when recv() reports that it would block.
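
To make the pattern described above concrete, here is a minimal C sketch.
It is not code from the network package or from the RTS. The helper names
set_nonblocking, wait_until_readable and read_when_ready are made up for
illustration, read() stands in for recv(), and a plain select() call stands
in for threadWaitRead.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: put fd into non-blocking mode, as the network
 * package does when it creates a socket. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical stand-in for threadWaitRead: block until fd is readable. */
static int wait_until_readable(int fd)
{
    if (fd < 0 || fd >= FD_SETSIZE) {   /* select() cannot watch this fd */
        errno = EINVAL;
        return -1;
    }
    for (;;) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        int n = select(fd + 1, &readfds, NULL, NULL, NULL);
        if (n > 0)
            return 0;
        if (n < 0 && errno != EINTR)
            return -1;
    }
}

/* Try the read first; wait for readiness only if it would have blocked. */
ssize_t read_when_ready(int fd, void *buf, size_t len)
{
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;                   /* data, or 0 at end of file */
        if (errno == EINTR)
            continue;                   /* interrupted, just retry */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                  /* a real error */
        if (wait_until_readable(fd) < 0)
            return -1;
    }
}
```

If the descriptor is left in blocking mode, the read() call itself blocks
and the wait is never reached, which is the sense in which blocking is
decided by the reader rather than by the IO manager.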

Going beyond this immediate fix, we can try to really tackle the problem.
The simplest and arguably safest approach is probably to just use select
for everything (on OS X). That would have the downside of limiting the
number of files that programs can wait on to 1024 per capability.

A better approach would be to try to register with kqueue and then if it
doesn't work, register it with an IO manager thread that is using select
for the backend. We can probably reuse the IO manager thread that is
watching timers for this purpose. With the parallel IO manager, we no
longer use it to wait on files, but we certainly could do that. That would
save us from adding more threads.  By failing over to the
manager-thread-using-select-backend only if kqueue fails, we don't need to
maintain a list of file types that kqueue works for, which might be a pain
to maintain reliably.
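
A rough C sketch of the fail-over idea, under some loud assumptions: the
function names (wait_readable, wait_readable_kqueue, wait_readable_select)
are hypothetical, it waits on a single fd with a throwaway kqueue rather
than a long-lived manager thread, and it falls back to select on any kqueue
failure, not only on ENOTSUP. On platforms without kqueue the first step is
stubbed out so the sketch still compiles. The kqueue step passes an event
buffer and checks both failure modes discussed above: a -1 return, and an
EV_ERROR event carrying the errno in its data field.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#if defined(__APPLE__) || defined(__FreeBSD__) || defined(__NetBSD__) || \
    defined(__OpenBSD__) || defined(__DragonFly__)
#define HAVE_KQUEUE 1
#include <sys/event.h>
#endif

/* Hypothetical: wait for fd via kqueue. Returns 0 once fd is readable,
 * or -1 with errno set if kqueue is unavailable or rejects the fd. */
static int wait_readable_kqueue(int fd)
{
#ifdef HAVE_KQUEUE
    int kq = kqueue();
    if (kq < 0)
        return -1;

    struct kevent change, result;
    EV_SET(&change, fd, EVFILT_READ, EV_ADD | EV_ONESHOT, 0, 0, NULL);

    /* With room in the event list, a rejected registration comes back
     * as an EV_ERROR event instead of a -1 return value. */
    int n;
    do {
        n = kevent(kq, &change, 1, &result, 1, NULL);
    } while (n < 0 && errno == EINTR);

    int rc = -1;
    if (n > 0 && (result.flags & EV_ERROR))
        errno = (int)result.data;       /* e.g. ENOTSUP for a tty */
    else if (n > 0)
        rc = 0;                         /* a genuine readiness event */

    int saved = errno;
    close(kq);
    errno = saved;
    return rc;
#else
    (void)fd;
    errno = ENOTSUP;                    /* no kqueue on this platform */
    return -1;
#endif
}

/* Hypothetical: the select-based fallback, limited to fds < FD_SETSIZE. */
static int wait_readable_select(int fd)
{
    if (fd < 0 || fd >= FD_SETSIZE) {
        errno = EINVAL;
        return -1;
    }
    for (;;) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        int n = select(fd + 1, &readfds, NULL, NULL, NULL);
        if (n > 0)
            return 0;
        if (n < 0 && errno != EINTR)
            return -1;
    }
}

/* Try kqueue first; fall back to select only when kqueue fails. */
int wait_readable(int fd)
{
    if (wait_readable_kqueue(fd) == 0)
        return 0;
    return wait_readable_select(fd);
}
```

Nothing here needs to know in advance which file types kqueue accepts; the
kernel's answer at registration time decides which path is taken.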

-Andi