FFI: number of worker threads?

Wed Jun 21 12:55:12 EDT 2006

On Wed, 2006-06-21 at 12:31 -0400, Li, Peng wrote:
> On 6/21/06, Simon Peyton-Jones <simonpj at microsoft.com> wrote:
> > New worker threads are spawned as needed.  You'll need as many of
> > them as you have simultaneously-blocked foreign calls. If you have 2000
> > simultaneously-blocked foreign calls, you'll need 2000 OS threads to
> > support them, which probably won't work.
> 
> 2000 OS threads definitely sound scary, but they can be made to work.
> Linux NPTL threads can scale well up to 10K threads, and the stack
> address space would be sufficient on 64-bit systems.
> 
> I am thinking about some p2p applications where each peer is
> maintaining a huge number of TCP connections to other peers, but most
> of these connections are idle. Unfortunately the default GHC RTS is
> multiplexing I/O using "select", which is O(n) and seems to have an
> fd_set size limit (FD_SETSIZE) of 1024.
> 
> That makes me wonder if the current design of the GHC RTS is optimal
> in the long run. As software and hardware evolve, we will have
> efficient OS threads (like NPTL) and huge (64-bit) address spaces.
> My guess is
> 
> (1) It is always a good idea to multiplex GHC user-level threads on OS
> threads, because it improves performance.

Indeed.

> (2) It may not be optimal to multiplex nonblocking I/O inside the GHC
> RTS, because it is unrealistic to have an event-driven I/O interface
> that is both efficient (like AIO/epoll) and portable (like
> select/poll). What is worse, nonblocking I/O still blocks on disk
> accesses. On the other hand, POSIX threads are portable and can
> be implemented efficiently on many systems. At least on Linux, NPTL
> easily beats "select"!

On Linux, epoll scales very well with minimal overhead. Using multiple
OS threads to do blocking IO would not scale in the case of lots of idle
socket connections, since you'd need one OS thread per socket.
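
To make the idle-connection case concrete, here is a minimal sketch in
Python (an illustration only, not GHC RTS code). A single thread registers
many idle sockets with one readiness-notification object and is told only
about the one that has data. The function name wait_for_ready and its
parameters are hypothetical; selectors.DefaultSelector is assumed to pick
epoll on Linux and another mechanism elsewhere.

```python
import selectors
import socket


def wait_for_ready(num_idle=50, timeout=1.0):
    """Watch num_idle idle sockets plus one active socket from a single thread.

    Returns the labels of the sockets reported as readable.
    """
    # DefaultSelector chooses the most efficient mechanism available
    # (epoll on Linux, kqueue on BSD/macOS, otherwise poll or select).
    sel = selectors.DefaultSelector()
    pairs = []
    try:
        for label in range(num_idle + 1):
            ours, theirs = socket.socketpair()
            ours.setblocking(False)
            pairs.append((ours, theirs))
            sel.register(ours, selectors.EVENT_READ, data=label)

        # Only the last connection becomes active; the rest stay idle.
        pairs[-1][1].send(b"ping")

        # One call reports the ready sockets; no thread is parked per socket.
        events = sel.select(timeout)
        return [key.data for key, _mask in events]
    finally:
        sel.close()
        for ours, theirs in pairs:
            ours.close()
            theirs.close()
```

Calling wait_for_ready(50) keeps 51 connections open but occupies only the
calling thread; a blocking-IO design would need one OS thread parked on
each of them.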

The IO is actually no longer done inside the RTS; it's done by a Haskell
worker thread. So it should be easier now to use platform-specific
select() replacements. It's already different between unix/win32.
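
As a rough sketch of what "platform-specific select() replacements" could
look like, the Python below (again an illustration, not the GHC IO
manager) probes for the most scalable readiness API the platform offers
and falls back to plain select. The function name pick_backend is
hypothetical, and the preference order is an assumption.

```python
import select


def pick_backend():
    """Return the name of the most scalable readiness API available here."""
    # Assumed preference order: epoll (Linux), kqueue (BSD/macOS),
    # poll (most Unix systems), then select as the portable fallback.
    for name in ("epoll", "kqueue", "poll"):
        if hasattr(select, name):
            return name
    return "select"
```

The rest of the IO code can stay the same and only the backend chosen
here changes per platform.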

So I'd suggest the best approach is to keep the existing multiplexing
non-blocking IO system and start to take advantage of more scalable IO
APIs on the platforms we really care about (either select/poll
replacements or AIO).

Duncan