[Haskell-cafe] Re: Hugs vs GHC (again) was: Re: Some random newbie questions

Wed Jan 19 16:56:03 EST 2005

Glynn Clements <glynn at gclements.plus.com> writes:

> They're similar, but not identical. Traditionally, Unix non-blocking
> I/O (along with asynchronous I/O, select() and poll()) were designed
> for "slow" streams such as pipes, terminals, sockets etc. Regular
> files and block devices are assumed to return the data "immediately".

Indeed. Reading from a slow block device is also not interruptible by
a signal, whereas a signal usually causes a read from a pipe, socket
or terminal to fail with EINTR.

There is no non-blocking interface to functions such as readdir,
mkdir and stat.

OTOH close() is interruptible.

It seems that the only way to parallelize calls like readdir, mkdir
and stat is to run them in a separate OS thread.
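
A minimal sketch of the separate-thread approach for stat(), in C
with pthreads. The struct and function names (stat_request,
stat_start, stat_finish) are hypothetical, and a real runtime would
probably be notified of completion rather than join the worker.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/stat.h>

/* Hypothetical request record shared between the caller and the worker. */
struct stat_request {
    const char *path;    /* input */
    struct stat result;  /* output, valid when error == 0 */
    int error;           /* output: 0, or the errno reported by stat() */
};

/* Runs on its own OS thread, so the blocking call stalls nobody else. */
static void *stat_worker(void *arg)
{
    struct stat_request *req = arg;
    req->error = (stat(req->path, &req->result) == 0) ? 0 : errno;
    return NULL;
}

/* Start the blocking call; returns 0 or a pthread error number. */
static int stat_start(struct stat_request *req, pthread_t *thread)
{
    return pthread_create(thread, NULL, stat_worker, req);
}

/* Wait for the worker to finish; the request fields are valid afterwards. */
static int stat_finish(pthread_t thread)
{
    return pthread_join(thread, NULL);
}
```

The caller fills in path, calls stat_start, carries on with other
work, and reads result and error only after stat_finish returns.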

gethostbyname, gethostbyaddr, getservbyname and getservbyport are
mostly superseded by getaddrinfo and getnameinfo. They are all
blocking and non-interruptible by signals (they restart their loops
on receiving EINTR from low-level calls).

Glibc provides getaddrinfo_a, which is non-blocking (implemented using
pthreads). Contrary to the documentation it's not interruptible by a
signal: its implementation expects pthread_cond_wait to fail with
EINTR, which is not possible. It's also not cancellable in a useful
way: the interface allows a cancellation request, but the answer may
be that the request cannot be cancelled, and the glibc implementation
can cancel a request only if the thread pool hasn't yet started
processing it. There is no non-blocking counterpart of getnameinfo.

Since asynchronous name resolution is quite important, the
implementation of my language uses pthreads together with
getaddrinfo / getnameinfo when pthreads are available. For simplicity
I just create one thread per request.
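
A sketch of what one thread per request could look like in C. This is
not the code of the implementation mentioned above; resolve_request
and resolve_start are hypothetical names, and getnameinfo could be
wrapped the same way.

```c
#include <netdb.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical record describing one name-resolution request. */
struct resolve_request {
    const char *host;         /* input: node name or numeric address */
    const char *service;      /* input: service name or port number */
    struct addrinfo hints;    /* input: zero-initialise, then set fields */
    struct addrinfo *result;  /* output: release with freeaddrinfo() */
    int status;               /* output: getaddrinfo() return value */
};

/* The blocking getaddrinfo() call runs on a thread of its own. */
static void *resolve_worker(void *arg)
{
    struct resolve_request *req = arg;
    req->status = getaddrinfo(req->host, req->service,
                              &req->hints, &req->result);
    return NULL;
}

/* One new thread per request; returns 0 or a pthread error number. */
static int resolve_start(struct resolve_request *req, pthread_t *thread)
{
    return pthread_create(thread, NULL, resolve_worker, req);
}
```

After pthread_join on the returned thread, status is 0 on success
(gai_strerror describes other values) and result holds the address
list.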

A tricky API to parallelize is waitpid. Pthreads are supposed to be
able to wait for child processes started by any thread, but according
to the man pages this was broken in Linux before version 2.4.
Fortunately it's easy to avoid blocking other threads indefinitely
without OS threads, because waitpid *is* interruptible by signals: it
will either finish, or the timer signal will interrupt it and control
can be passed to other threads. The price is wasted time (though not
CPU cycles): a thread waiting for a process uses up its time slice
just as if it were doing some useful work.

Letting the timer signal interrupt syscalls can break libraries
which don't expect EINTR. For example the Python runtime doesn't
handle EINTR specially; it is translated into a Python exception.
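
For comparison, code that does expect EINTR usually retries the call.
A minimal sketch in C, with write_all as a hypothetical helper name:

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write the whole buffer, retrying when a signal
   interrupts write(). Returns len on success, -1 on any other error. */
static ssize_t write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing anything: retry */
            return -1;          /* a real error; errno is left for the caller */
        }
        p += n;                 /* partial write: advance and keep going */
        left -= (size_t)n;
    }
    return (ssize_t)len;
}
```

A library lacking such loops sees EINTR as an ordinary failure, which
is how a periodic timer signal can break it.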

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/