Simple Servers

From HaskellWiki

Revision as of 13:07, 22 June 2011 by Benmachine (Talk | contribs)

This page is a little out of date, and since it was written:

  • GHC's IO manager has been rewritten to use epoll, which should mean all the forkIO examples now run faster.
  • network-bytestring has been merged into the network package, so you don't need to get the two libraries separately.

Some examples of simple web server designs in Haskell, using either preemptive concurrency or event-driven approaches. Requirements:

Some more context on the background to this problem is available.

Benchmarks were run with httperf:

   $ httperf --server=localhost --port=5002 --uri=/ --num-conns=10000

Author: dons

1 Results

[Figure: req/sec with different IO and event mechanisms]

2 Basic concurrent server

A concurrent server using String IO. On each accept in the main thread we get a new Handle and forkIO a lightweight Haskell thread that writes a string back to the client. This relies on the runtime scheduler to wake up the main thread in a timely fashion (i.e. via the current 'select' mechanism).

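import Network
import Control.Concurrent
import System.IO

main :: IO ()
main = withSocketsDo $ do
    sock <- listenOn $ PortNumber 5002   -- listen on TCP port 5002
    loop sock

-- Accept connections forever, giving each one its own lightweight thread.
loop :: Socket -> IO ()
loop sock = do
   (h,_,_) <- accept sock                -- Network.accept yields a Handle
   forkIO $ body h
   loop sock
  where
   -- Write the response, flush it, and close the connection.
   body h = do
       hPutStr h msg
       hFlush h
       hClose h

-- The fixed HTTP response sent to every client.
msg :: String
msg = "HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n"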

Measurements:

  • $ ghc -O2 --make A.hs
  • Request rate: 6569.1 req/s (0.2 ms/req)
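
A side note on the response string: msg declares Content-Length: 5, but the literal carries a further "\r\n" after "Pong!", so two bytes more than advertised are written. That is unlikely to matter for this benchmark, since the connection is closed straight away. The sketch below shows one way to compute the header from the body instead of typing it by hand. mkResponse is a hypothetical helper introduced only for illustration, and it uses ByteString rather than String.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Char8 as B

-- | Build a minimal HTTP/1.0 response whose Content-Length header is
-- derived from the body. (mkResponse is a hypothetical helper, not part
-- of the original page.)
mkResponse :: B.ByteString -> B.ByteString
mkResponse body = B.concat
    [ "HTTP/1.0 200 OK\r\n"
    , "Content-Length: ", B.pack (show (B.length body)), "\r\n"
    , "\r\n"                      -- blank line separating headers and body
    , body
    ]

main :: IO ()
main = B.putStr (mkResponse "Pong!")
```

The result of mkResponse "Pong!" could be used in place of msg in any of the ByteString-based servers on this page.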

3 Concurrent, with network-bytestring

Now using ByteString IO (via the network-bytestring package), but still with the RTS's select-based preemptive threads. This means we allocate nothing in the body and avoid a couple of copies when doing the IO.

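{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString.Char8 (ByteString)

import Network hiding (accept)
import Network.Socket
import Network.Socket.ByteString (sendAll)
import Control.Concurrent

main :: IO ()
main = withSocketsDo $ do
    sock <- listenOn $ PortNumber 5002
    loop sock

-- Accept on the raw Socket (Network.Socket.accept) instead of a Handle.
loop :: Socket -> IO ()
loop sock = do
   (conn, _) <- accept sock
   forkIO $ body conn
   loop sock
  where
   -- Send the whole response as one ByteString, then close the socket.
   body c = do sendAll c msg
               sClose c

-- With OverloadedStrings this literal is a ByteString, not a String.
msg :: ByteString
msg = "HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n"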

Measurements:

  • $ ghc -O2 --make H.hs
  • Request rate: 9901.7 req/s (0.1 ms/req)
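
Since network-bytestring has been merged into the network package, the same design can also be written directly against Network.Socket and Network.Socket.ByteString, without the higher-level Network module. The following is a minimal sketch under that assumption, not a benchmarked replacement. The backlog of 1024 and the use of finally to close the connection are choices made here, not taken from the original program.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Concurrent (forkIO)
import Control.Exception (finally)
import Control.Monad (forever, void)
import Data.ByteString (ByteString)
import Network.Socket
import Network.Socket.ByteString (sendAll)

main :: IO ()
main = withSocketsDo $ do
    sock <- socket AF_INET Stream defaultProtocol
    setSocketOption sock ReuseAddr 1
    bind sock (SockAddrInet 5002 0)   -- port 5002; address 0 means any interface
    listen sock 1024                  -- backlog size is an arbitrary assumption
    forever $ do
        (conn, _) <- accept sock
        -- One lightweight thread per connection, as in the version above.
        void . forkIO $ sendAll conn msg `finally` close conn

msg :: ByteString
msg = "HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n"
```

It can be exercised with the same httperf command as the other servers.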

4 Epoll-based event callbacks

Now, instead of using the RTS's select mechanism to wake up threads, we use a custom epoll handler, combining epoll-based event handling with ByteString IO. The epoll approach will replace GHC's select model soon (design here, showing how the concurrent Haskell primitives may be implemented in terms of epoll).

{-# LANGUAGE OverloadedStrings #-}
 
-- A simple example of an epoll based http server in Haskell.
--
-- Uses two libraries:
--   * network-bytestring, bytestring-based socket IO.
--      - cabal install network-bytestring
--
--   * haskell-event, epoll-based scalable IO events
--      - git clone git://github.com/tibbe/event.git
--      - autoreconf ; then cabal install
 
import Network hiding (accept)
import Network.Socket (fdSocket, accept)
import Network.Socket.ByteString
import Data.ByteString.Char8
import System.Event
import System.Posix
import System.Posix.IO
 
main = withSocketsDo $ do
    sock <- listenOn $ PortNumber 5002
    let fd = fromIntegral (fdSocket sock)
    mgr <- new
    registerFd mgr (client sock) fd evtRead
    loop mgr
 
client sock _ _ = do
    (c,_) <- accept sock
    sendAll c msg
    sClose c
 
msg = "HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n"

Measurements:

  • $ ghc -O2 --make Epoll.hs
  • Request rate: 15042.6 req/s (0.1 ms/req)

This is significantly better than either forkIO-based version. For comparison, under the same conditions this Python epoll version achieves about 10k req/sec.

Further work: there are still traditional calls to accept and sendAll, which go via the Haskell concurrent IO layer and make redundant threading calls, so a fair bit of additional performance may be untapped.

5 Notes

Simon Marlow states: The Haskell program as it stands won’t scale up on a multicore because it only has a single accept loop, and the subtasks are too small. The cost of migrating a thread for load-balancing is too high compared to the cost of completing the request, so it’s impossible to get a speedup this way. If you create one accept loop per CPU then in principle it ought to scale, but in practice it won’t at the moment because there is only one IO manager thread calling select(). Hopefully this will be fixed as part of the ongoing epoll() work that was mentioned earlier.

Regarding the slowdown you see with -threaded, this is most likely because you’re running the accept loop in the main thread. The main thread is special – it is a “bound thread”, which means it is effectively a fully-fledged OS thread rather than a lightweight thread, and hence communication with the main thread is very expensive. Fork a subthread for the accept loop, and you should see a speedup with -threaded.
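
The two suggestions above (keep the accept loop off the bound main thread, and run one accept loop per CPU) can be sketched roughly as follows. This is an illustrative sketch only, written against Network.Socket from the network package. The helper name acceptLoop, the backlog of 1024 and the use of an MVar to park the main thread are assumptions made here. No speedup is claimed: that depends on building with -threaded, on the +RTS -N setting, and on the state of the IO manager.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Concurrent
    (forkFinally, forkIO, getNumCapabilities, newEmptyMVar, putMVar, takeMVar)
import Control.Exception (finally)
import Control.Monad (forever, replicateM_, void)
import Data.ByteString (ByteString)
import Network.Socket
import Network.Socket.ByteString (sendAll)

main :: IO ()
main = withSocketsDo $ do
    sock <- socket AF_INET Stream defaultProtocol
    setSocketOption sock ReuseAddr 1
    bind sock (SockAddrInet 5002 0)   -- port 5002; address 0 means any interface
    listen sock 1024                  -- backlog size is an arbitrary assumption
    n    <- getNumCapabilities        -- 1 unless run with +RTS -N
    done <- newEmptyMVar
    -- Run the accept loops in forked (unbound) threads, one per capability,
    -- so that the bound main thread is not on the hot path.
    replicateM_ n $ forkFinally (acceptLoop sock) (\_ -> putMVar done ())
    takeMVar done                     -- main only waits; returns if a loop dies

-- Hypothetical helper: accept forever, one lightweight thread per connection.
acceptLoop :: Socket -> IO ()
acceptLoop sock = forever $ do
    (conn, _) <- accept sock
    void . forkIO $ sendAll conn msg `finally` close conn

msg :: ByteString
msg = "HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n"
```

Compiled with ghc -O2 -threaded and run with +RTS -N, this starts one accept loop per capability; without -N it degenerates to a single forked accept loop.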

More background on a similar benchmark in this ticket: http://hackage.haskell.org/trac/ghc/ticket/3758