[Haskell-cafe] Re: Bound threads

Wed Mar 2 16:04:15 EST 2005

"Simon Marlow" <simonmar at microsoft.com> writes:

>> I've now implemented a threaded runtime in my language Kogut, based
>> on the design of Haskell. The main thread is bound. The thread which
>> holds the capability performs I/O multiplexing itself, without a
>> separate service thread.
>
> We found that doing this was excessively complex (well, I thought so
> anyway).

Indeed, my brain is melting, but I did it :-)

I think our approaches are incomparable in terms of additional
overhead; which one is cheaper depends on the program. I have added
some optimizations:

If a thread which wants to perform a safe C call sees that no other
threads are running, waiting for I/O, or waiting for a timeout, and
that it is itself the thread which handles Unix signals, then it
neither notifies nor starts another thread to enter the scheduler.
In this case the returning C call doesn't have to wake up the
scheduler either.

Even if other threads are running, if no scheduler is currently
blocked in epoll/poll/select, then a returning C call doesn't wake up
the scheduler. It only links itself into a list which the scheduler
will examine.
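A sketch of the returning side, assuming a hypothetical scheduler record with a mutex-protected list, an `in_poll` flag, and a self-pipe used to interrupt poll. The structure and function names are illustrative; only the pthread and pipe calls are real APIs.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical thread record and scheduler state (illustrative names). */
struct tcb {
    int id;
    struct tcb *next;
};

struct sched {
    pthread_mutex_t lock;
    struct tcb *returned;  /* threads whose C calls have returned */
    bool in_poll;          /* set by the scheduler around epoll/poll/select */
    int wakeup[2];         /* wakeup[0] is in the poll set; write to wakeup[1] */
};

int sched_init(struct sched *s)
{
    int rc = pthread_mutex_init(&s->lock, NULL);
    assert(rc == 0);
    (void)rc;
    s->returned = NULL;
    s->in_poll = false;
    return pipe(s->wakeup);
}

/* Called by a thread whose safe C call has returned.  It always links
   itself into the list; it writes to the wakeup pipe only if the
   scheduler is blocked in poll.  Returns true if it woke the scheduler. */
bool on_c_call_return(struct sched *s, struct tcb *self)
{
    pthread_mutex_lock(&s->lock);
    self->next = s->returned;
    s->returned = self;
    bool must_wake = s->in_poll;
    pthread_mutex_unlock(&s->lock);

    if (must_wake) {
        char byte = 0;
        ssize_t n;
        do
            n = write(s->wakeup[1], &byte, 1);
        while (n < 0 && errno == EINTR);
    }
    return must_wake;
}

/* Scheduler side: detach the whole list (most recent return first). */
struct tcb *take_returned(struct sched *s)
{
    pthread_mutex_lock(&s->lock);
    struct tcb *list = s->returned;
    s->returned = NULL;
    pthread_mutex_unlock(&s->lock);
    return list;
}
```

For this to be race-free, the scheduler should check that the list is empty and set `in_poll` while holding the same lock; a thread returning between that point and the actual poll still sees `in_poll` and writes the wakeup byte, so poll returns immediately.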

* * *

There are interesting complications with fork. POSIX provides only a
fork which makes all other pthreads evaporate in the child process.
This is exactly what is wanted if the fork is soon followed by exec,
but it can be disastrous if the program tries to use other threads in
the meantime.
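The safe pattern for "fork soon followed by exec" in a threaded program is to prepare everything before fork and call only async-signal-safe functions in the child. A minimal sketch; the helper name `run_shell` is invented for the example, and it assumes `/bin/sh` exists.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs `/bin/sh -c cmd` in a child process and returns its exit status,
   or -1 on error.  Everything the child needs (argv, envp) is prepared
   before fork: in a multithreaded parent the child holds only a copy of
   the calling thread, and locks held by other threads (e.g. inside
   malloc) may be stuck, so it calls nothing but execve and _exit, both
   async-signal-safe. */
int run_shell(const char *cmd)
{
    assert(cmd != NULL);
    char *const argv[] = {"sh", "-c", (char *)cmd, NULL};
    char *const envp[] = {NULL};

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve("/bin/sh", argv, envp);
        _exit(127);  /* exec failed; skip atexit handlers and stdio flushing */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0)
        if (errno != EINTR)
            return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```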

Depending on the system, pthread_join on a thread which existed
before the fork either reports that the thread has returned, hangs,
or fails with ESRCH or EINVAL. And there is no way to fork while
keeping other threads running (there was a proposal for forkall, but
it was rejected).

This means that a fork in an unfortunate state, e.g. while some
thread was holding a mutex, will leave the mutex permanently locked
in the child; pthread_atfork is supposed to protect against that. It
also means that if our language tries to continue running its threads
after the fork, there is no way to do so for threads bound to other
OS threads. And the worker pool is useless in the child; it is better
to empty it before the fork to reduce resource leaks.
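The classic pthread_atfork pattern for a single mutex: the prepare handler takes the lock, so no other thread can be inside the critical section at the moment of fork, and both parent and child handlers release it. A sketch assuming one hypothetical global lock `heap_lock`; strictly, POSIX only guarantees async-signal-safe functions in the child of a multithreaded process, so the child's trylock below relies on common implementations such as glibc.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;  /* hypothetical */

/* Taken by the forking thread just before fork ... */
static void atfork_prepare(void) { pthread_mutex_lock(&heap_lock); }
/* ... and released on both sides; in the child the forking thread,
   which owns the lock, is the only thread left. */
static void atfork_parent(void) { pthread_mutex_unlock(&heap_lock); }
static void atfork_child(void) { pthread_mutex_unlock(&heap_lock); }

/* Registrations can never be removed, so register exactly once:
   a second registration would make prepare lock the mutex twice. */
static void register_handlers(void)
{
    int rc = pthread_atfork(atfork_prepare, atfork_parent, atfork_child);
    assert(rc == 0);
    (void)rc;
}

static atomic_bool stop;

/* Keeps taking the lock, so an unprotected fork could catch it held. */
static void *churn(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop)) {
        pthread_mutex_lock(&heap_lock);
        pthread_mutex_unlock(&heap_lock);
    }
    return NULL;
}

/* Forks while another thread is using heap_lock; returns the child's
   exit status, which is 0 if the child found the lock free. */
int fork_with_protected_lock(void)
{
    static pthread_once_t once = PTHREAD_ONCE_INIT;
    pthread_once(&once, register_handlers);

    pthread_t t;
    atomic_store(&stop, false);
    if (pthread_create(&t, NULL, churn, NULL) != 0)
        return -1;

    pid_t pid = fork();
    if (pid == 0)
        _exit(pthread_mutex_trylock(&heap_lock) == 0 ? 0 : 1);

    atomic_store(&stop, true);
    pthread_join(t, NULL);
    int status;
    if (pid < 0 || waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

This protects only locks that someone remembered to register, which is part of why a language runtime wants its own fork semantics.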

No semantics of fork with respect to threads would be correct in all
cases.

Shortly before implementing bound threads, I designed and implemented
semantics for three variants of fork, which were easy while I had full
control over what happens to my threads in the child process (well,
the third was a challenge to implement):

- ForkProcessCloneThreads - easiest to describe, but the least useful.
  Threads continue to run in both processes.

- ForkProcessKillThreads - other threads are atomically killed in the
  child process, similarly to raw POSIX. This is used before exec.
  If the program attempts to wait for the threads, the behavior is
  defined: they look as if they had failed with a ThreadKilled
  exception, even though they were killed without a chance to recover
  (this is a different exception from the one used for cancellation,
  which signifies that the thread could recover).

- ForkProcess - the safest default: all threads are sent "signals" (in
  the sense of asynchronous communication in my language) which cause
  them to be suspended when they have signal handling unblocked (this
  roughly corresponds to Haskell's blocking of asynchronous exceptions,
  but e.g. a thread holding a mutex has signals blocked by default).
  Newly created threads are chased as well. When all threads are
  suspended, we do ForkProcessCloneThreads. Then the threads are
  resumed in the parent process and politely cancelled in the child,
  so that they can release resources.
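The three variants can be modelled as state transitions on a table of user-level threads. This is only an illustrative model, not Kogut's implementation: the enums, `struct green_thread` and the functions are hypothetical, and actual suspension at safe points is reduced to a state check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a user-level thread table; the names are
   illustrative, not Kogut's API. */
enum fork_mode { FORK_CLONE_THREADS, FORK_KILL_THREADS, FORK_POLITE };
enum thread_state { T_RUNNING, T_SUSPENDED, T_FINISHED };
enum thread_result { R_NONE, R_THREAD_KILLED };

struct green_thread {
    enum thread_state state;
    enum thread_result result;  /* what a join would observe */
    int cancel_pending;         /* polite cancellation delivered */
};

/* FORK_POLITE, before forking: every other live thread must have
   reached a point with signals unblocked and been suspended.
   Returns 1 when the forking thread may proceed. */
int all_others_suspended(const struct green_thread *ts, size_t n, size_t self)
{
    for (size_t i = 0; i < n; i++)
        if (i != self && ts[i].state == T_RUNNING)
            return 0;
    return 1;
}

/* Applies the chosen semantics to the other threads in the child. */
void after_fork_in_child(struct green_thread *ts, size_t n, size_t self,
                         enum fork_mode mode)
{
    for (size_t i = 0; i < n; i++) {
        if (i == self || ts[i].state == T_FINISHED)
            continue;
        switch (mode) {
        case FORK_CLONE_THREADS:
            break;                          /* threads keep running */
        case FORK_KILL_THREADS:
            ts[i].state = T_FINISHED;       /* no chance to recover, */
            ts[i].result = R_THREAD_KILLED; /* but joins see ThreadKilled */
            break;
        case FORK_POLITE:
            ts[i].state = T_RUNNING;        /* resumed only to unwind */
            ts[i].cancel_pending = 1;
            break;
        }
    }
}

/* In the parent, FORK_POLITE simply resumes the suspended threads. */
void after_fork_in_parent(struct green_thread *ts, size_t n, enum fork_mode mode)
{
    if (mode != FORK_POLITE)
        return;
    for (size_t i = 0; i < n; i++)
        if (ts[i].state == T_SUSPENDED)
            ts[i].state = T_RUNNING;
}
```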

Bound threads introduced problems. Some of them can be solved: e.g.
the worker pool, the wakeup pipe, and the epoll descriptor are
correctly recreated. But there is simply no way to return from
callbacks, because the corresponding C contexts no longer exist. So I
made them behave as follows:

All threads except the thread performing the fork become unbound.
They have a chance to handle the thread cancellation exception until
they return from their innermost callbacks; at that point they are
killed. If ForkProcessAllThreads is done while some threads were
executing non-blocking foreign code, they are killed as well.
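These rules can be modelled per thread. A hypothetical sketch: `struct lthread`, `unbind_after_fork` and `return_from_callback` are invented names, and the callback stack is reduced to a depth counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread record (illustrative names). */
struct lthread {
    int bound;             /* tied to a particular OS thread */
    int callback_depth;    /* nested callbacks from C into the language */
    int in_foreign_call;   /* inside non-blocking foreign code at fork time */
    int c_context_lost;    /* its C callers did not survive the fork */
    int cancel_pending;
    int killed;
};

/* Child side of a fork: every thread but the forking one loses its
   binding.  Threads caught in non-blocking foreign code are killed at
   once; the others get a cancellation they may handle, but any whose
   stack holds callbacks can never return into C. */
void unbind_after_fork(struct lthread *ts, size_t n, size_t self)
{
    for (size_t i = 0; i < n; i++) {
        if (i == self)
            continue;
        ts[i].bound = 0;
        if (ts[i].in_foreign_call) {
            ts[i].killed = 1;
            continue;
        }
        ts[i].c_context_lost = ts[i].callback_depth > 0;
        ts[i].cancel_pending = 1;
    }
}

/* Called when a thread tries to return from its innermost callback.
   Returns 1 if the return into C may proceed; otherwise the thread is
   killed here, since the C frame it would return to is gone. */
int return_from_callback(struct lthread *t)
{
    assert(t->callback_depth > 0);
    if (t->c_context_lost) {
        t->killed = 1;
        return 0;
    }
    t->callback_depth--;
    return 1;
}
```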

Besides this, there are "at fork" handlers, similar to pthread_atfork
but scoped over the forking action.
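One possible reading of "scoped" handlers, assumed here: a handler is active only while a given action runs, unlike pthread_atfork, whose registrations can never be removed. The sketch uses invented names (`with_at_fork`, `scoped_fork`) and borrows pthread_atfork's ordering rules; it is single-threaded and does no locking.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical scoped at-fork handler (illustrative, not Kogut's API). */
struct at_fork {
    void (*prepare)(void *ctx);
    void (*parent)(void *ctx);
    void (*child)(void *ctx);
    void *ctx;
    struct at_fork *outer;      /* enclosing scope's handler, if any */
};

static struct at_fork *innermost;   /* single-threaded sketch: no locking */

/* Runs action(arg) with h active; h is removed again afterwards. */
void with_at_fork(struct at_fork *h, void (*action)(void *), void *arg)
{
    h->outer = innermost;
    innermost = h;
    action(arg);
    innermost = h->outer;
}

static void run_outer_first(struct at_fork *h, int in_child)
{
    if (h == NULL)
        return;
    run_outer_first(h->outer, in_child);
    void (*fn)(void *) = in_child ? h->child : h->parent;
    if (fn != NULL)
        fn(h->ctx);
}

/* fork() that runs the active handlers: prepare innermost-first, then
   parent or child handlers outermost-first (pthread_atfork's ordering).
   Parent handlers also run if fork fails, undoing prepare. */
pid_t scoped_fork(void)
{
    for (struct at_fork *h = innermost; h != NULL; h = h->outer)
        if (h->prepare != NULL)
            h->prepare(h->ctx);
    pid_t pid = fork();
    run_outer_first(innermost, pid == 0);
    return pid;
}

/* Usage example: the child reports whether its handler ran. */
static int child_ran;
static void note_child(void *ctx) { (void)ctx; child_ran = 1; }

void fork_and_wait(void *out)
{
    int *result = out;
    pid_t pid = scoped_fork();
    if (pid == 0)
        _exit(child_ran);
    int status;
    if (pid < 0 || waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        *result = -1;
    else
        *result = WEXITSTATUS(status);
}
```

Calling `fork_and_wait` inside `with_at_fork` runs `note_child` in the child; calling it afterwards does not.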

* * *

I measured the cost of some system calls and library functions on my
system (time per call), to see what is worth optimizing:

- pthread_mutex_lock + unlock (NPTL)   0.1 us
- pthread_sigmask                      0.3 us
- setitimer                            0.3 us
- read + write through a pipe          2.5 us
- gettimeofday                         1.9 us
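A simple loop-and-divide micro-benchmark reproduces this kind of measurement. A sketch; the helper names are hypothetical, and numbers from a current machine will likely differ a lot from these 2005 figures (for example, gettimeofday is now often served without entering the kernel).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t bench_lock = PTHREAD_MUTEX_INITIALIZER;
static sigset_t saved_mask;
static int bench_pipe[2];

static void op_mutex(void)
{
    pthread_mutex_lock(&bench_lock);
    pthread_mutex_unlock(&bench_lock);
}
static void op_sigmask(void) { pthread_sigmask(SIG_SETMASK, &saved_mask, NULL); }
static void op_setitimer(void)
{
    struct itimerval off = {{0, 0}, {0, 0}};   /* disarm: harmless to repeat */
    setitimer(ITIMER_REAL, &off, NULL);
}
static void op_pipe(void)
{
    char c = 0;
    if (write(bench_pipe[1], &c, 1) == 1) {
        ssize_t r = read(bench_pipe[0], &c, 1);
        (void)r;
    }
}
static void op_gettimeofday(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
}

/* Average wall-clock cost of one call to op, in microseconds. */
static double usec_per_call(void (*op)(void), long iters)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++)
        op();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    return ns / 1e3 / (double)iters;
}

/* Fills out[0..4] in the order of the list above; -1 on setup failure. */
int measure_all(double out[5], long iters)
{
    if (pthread_sigmask(SIG_BLOCK, NULL, &saved_mask) != 0 || pipe(bench_pipe) != 0)
        return -1;
    void (*ops[5])(void) = {op_mutex, op_sigmask, op_setitimer, op_pipe,
                            op_gettimeofday};
    for (int i = 0; i < 5; i++)
        out[i] = usec_per_call(ops[i], iters);
    close(bench_pipe[0]);
    close(bench_pipe[1]);
    return 0;
}
```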

A producer/consumer test in my language (which uses mutexes and
condition variables) needs 1.4 us for one iteration if both threads
are unbound.
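For comparison, here is a sketch of the same kind of test with two plain pthreads passing values through a one-slot channel guarded by a mutex and two condition variables. This measures hand-off between OS threads, which is closer to two bound threads than to Kogut's unbound ones, so it gives a baseline rather than a reproduction; the channel and function names are invented for the example.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* One-slot channel guarded by a mutex and two condition variables. */
struct slot {
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
    bool full;
    long value;
};

static struct slot ch = {PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER,
                         PTHREAD_COND_INITIALIZER, false, 0};

static void put(long v)
{
    pthread_mutex_lock(&ch.lock);
    while (ch.full)                       /* loop guards against spurious wakeups */
        pthread_cond_wait(&ch.not_full, &ch.lock);
    ch.value = v;
    ch.full = true;
    pthread_cond_signal(&ch.not_empty);
    pthread_mutex_unlock(&ch.lock);
}

static long take(void)
{
    pthread_mutex_lock(&ch.lock);
    while (!ch.full)
        pthread_cond_wait(&ch.not_empty, &ch.lock);
    long v = ch.value;
    ch.full = false;
    pthread_cond_signal(&ch.not_full);
    pthread_mutex_unlock(&ch.lock);
    return v;
}

static void *producer(void *arg)
{
    long n = *(long *)arg;
    for (long i = 1; i <= n; i++)
        put(i);
    return NULL;
}

/* Passes 1..n from a producer thread to the calling thread; returns the
   sum, and stores the average time per item in microseconds. */
long run_pc(long n, double *usec_per_item)
{
    struct timespec t0, t1;
    pthread_t t;
    long sum = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (pthread_create(&t, NULL, producer, &n) != 0)
        return -1;
    for (long i = 0; i < n; i++)
        sum += take();
    pthread_join(t, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    *usec_per_item = ns / 1e3 / (double)n;
    return sum;
}
```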

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/