WebApplicationInterface

Revision as of 09:45, 12 April 2008

Note: This page is currently being written and is in an intermediate state.

Abstract

This document specifies a proposed standard interface between web servers and Haskell web applications or frameworks, to promote web application portability across a variety of web servers.

Rationale and Goals

As Haskell becomes more widely known and used, more people want to write web applications in it, a domain that has until now been dominated by dynamic languages like Python and Ruby. To write a web application that can be put into production and used by real users, you generally need two things:

A production quality web server. Production web servers need to be stable, have good performance and be easy to configure and manage. Writing such a web server takes considerable effort.
A framework. A framework helps the user render HTML pages, persist state between requests, etc. There are several web frameworks available for Haskell, such as HAppS, WASH, HSP and others.

Currently, picking a framework limits the choice of usable web servers, and vice versa. By creating a standardized interface between web servers and frameworks we would separate the choice of framework from the choice of web server. This frees the server and framework developers to focus on working on their preferred part.

This specification proposes a standardized interface between web servers and web applications or frameworks: the Haskell Web Application Interface (WAI).

A standardized interface is of no use if no one uses it. Therefore, this interface must be simple so that the cost of implementing it is very low.

Specification Overview

The WAI interface has two sides: the "server" side, and the "application" or "framework" side. The server calls a function that is provided by the application side.

The Application/Framework Side

An application is simply a function of one argument representing some kind of environment (specified later) in which the application is to be run. A simple application might look like this:

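import Data.ByteString.Char8 (pack)

simpleApp _ = return (status, responseHeaders, enumerator)
    where
      status = 200
      responseHeaders = [(pack "Content-type", pack "text/plain")]
      -- Feed the single body chunk to the consumer and return its final seed.
      enumerator f z = do
        r <- f z (pack "Hello world!\n")
        case r of
          Left z'  -> return z'
          Right z' -> return z'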

This simple application completely ignores the environment passed to it. A real application would do things like examine the request URL, etc.

The Server Side

The server calls the application function once for each request that it receives from an HTTP client and that is directed at the application. Here is a simple CGI web server:

-- TODO: Add example.
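
Until that example is written, the following is only a rough sketch of how a server might drive an application; it is not the CGI server this section intends to show. The names Response, Application, serveOnce and helloApp are hypothetical and do not come from this specification. In particular, a record is assumed in place of the (status, headers, enumerator) triple so that the rank-2 Enumerator can be stored without further language extensions, and the environment type is left abstract.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C

type Enumerator = forall a. (a -> ByteString -> IO (Either a a)) -> a -> IO a

-- Assumption: a record stands in for the (status, headers, enumerator) triple.
data Response = Response
    { status          :: Int
    , responseHeaders :: [(ByteString, ByteString)]
    , body            :: Enumerator
    }

-- Assumption: the environment type is a parameter, since it is specified later.
type Application env = env -> IO Response

-- Call the application once for one request and pass the CGI-style output,
-- piece by piece, to the given send action.
serveOnce :: (ByteString -> IO ()) -> Application env -> env -> IO ()
serveOnce send app env = do
    resp <- app env
    send (C.pack ("Status: " ++ show (status resp) ++ "\r\n"))
    mapM_ sendHeader (responseHeaders resp)
    send (C.pack "\r\n")
    -- Stream the body chunk by chunk; the seed carries no information here.
    body resp sendChunk ()
  where
    sendHeader (name, value) =
        send (B.concat [name, C.pack ": ", value, C.pack "\r\n"])
    sendChunk () chunk = send chunk >> return (Right ())

-- The "Hello world" application from above, adapted to the assumed record.
helloApp :: Application env
helloApp _ =
    return (Response 200 [(C.pack "Content-type", C.pack "text/plain")] enum)
  where
    enum :: Enumerator
    enum f z = either id id <$> f z (C.pack "Hello world!\n")
```

Under these assumptions, serveOnce B.putStr helloApp () would write the response to standard output. Taking the send action as a parameter keeps the sketch independent of how the server actually talks to the client.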

Specification Details

A Data Type for Representing Bytes

To represent an HTTP message we need a type for bytes. Although some parts of a message can be represented by other types, such as integers or strings, some parts are properly viewed as a sequence of bytes (e.g. the message body). Haskell has three different types that could be, and are, used for this purpose:

String - Used both to represent bytes (e.g. in the Socket API) and text. Has an inefficient memory representation giving it a larger memory footprint and poor cache behavior. Using it for storing bytes is also considered bad style since it is intended to represent Unicode code points.

[Word8] - Has the same properties as String except for being explicitly intended to only contain binary data.

ByteString - A fast and memory-efficient representation. Used in this proposal because it can easily be converted to the above two types, but the opposite conversion is not possible without a performance penalty.

In this specification we use the strict, Word8 flavor:

import Data.ByteString (ByteString)
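
As a small illustration of the conversions mentioned above, the sketch below wraps the standard pack and unpack functions from the bytestring package. The wrapper names (toWord8s, toString, fromWord8s, fromAscii) are hypothetical and exist only to make the direction of each conversion explicit.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C
import Data.Word (Word8)

-- Bytes as a plain list: copies the data out of the packed representation.
toWord8s :: ByteString -> [Word8]
toWord8s = B.unpack

-- Bytes as Chars, one Char per byte (no Unicode decoding is performed).
toString :: ByteString -> String
toString = C.unpack

-- Packing a list walks the whole list and copies it into a fresh buffer.
fromWord8s :: [Word8] -> ByteString
fromWord8s = B.pack

-- C.pack truncates each Char to its low 8 bits, so this is only
-- appropriate for text that is known to fit in one byte per character.
fromAscii :: String -> ByteString
fromAscii = C.pack
```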

The Enumerator Type

The web server needs to provide the web application with the data in the request body, and the web application needs to provide the web server with a response body (e.g. an HTML page). They could do so using bytestrings. However, if the amount of data to send is large (e.g. a big file), all of it would have to be kept in memory, leading to unnecessarily high memory usage. A way to stream data between the server and the client is needed.

Streams can be represented in Haskell using lists or some optimized representation like lazy bytestrings. However, using either of these two options is problematic in a web server serving hundreds or even thousands of requests per second, for the following reason: when the web application opens a file (or some other resource) for sending to the client, it needs to free this resource when it is no longer needed. Stream I/O using lists or lazy bytestrings both use unsafeInterleaveIO together with a finalizer that gets run by the garbage collector to free the resource (i.e. the file) when it's no longer needed. But since the garbage collector runs at some unpredictable time in the future, the server might run out of resources (e.g. file handles) before the finalizer is run, causing the server to crash or become unresponsive.

To avoid this problem resources need to be freed as soon as they are no longer needed. There are (at least) two different ways to achieve this. The first is to use an iterator type interface that provides an explicit closing of the underlying resource:

import Data.ByteString (ByteString)
import Data.Word (Word8)

class InputStream s where
    read  :: s -> IO Word8
    readN :: s -> Int -> IO ByteString  -- ^ efficient block read
    close :: s -> IO ()

This is the solution used in most imperative languages e.g. Python and Java. The other option is to use an enumerator (e.g. for-each) style interface and have the enumerator free the resource automatically when the iteration is finished. Oleg showed how this can be implemented using a left fold.

type Enumerator = forall a. (a -> ByteString -> IO (Either a a)) -> a -> IO a

This particular enumerator, which will be used for all data streaming in this specification, is a normal, strict left fold with some support for early termination. If the consumer wants to signal that it doesn't want to consume more data, it can return Left seed. If it wants to continue, it returns Right seed instead. The consumer likely wants to perform some I/O during each iteration, e.g. to send a part of a large file over the network.
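
To make the Left/Right protocol concrete, here is a minimal sketch of two producers and two consumers for this type. All four function names are hypothetical. enumFile assumes that bracket from Control.Exception is an acceptable way to release the handle as soon as the fold ends, and the 4096-byte block size is an arbitrary choice.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Exception (bracket)
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import System.IO (IOMode (ReadMode), hClose, openBinaryFile)

type Enumerator = forall a. (a -> ByteString -> IO (Either a a)) -> a -> IO a

-- Hypothetical producer: enumerate an in-memory list of chunks.
enumChunks :: [ByteString] -> Enumerator
enumChunks []       _ z = return z
enumChunks (c : cs) f z = do
    r <- f z c
    case r of
      Left z'  -> return z'           -- consumer asked to stop early
      Right z' -> enumChunks cs f z'  -- consumer wants more data

-- Hypothetical producer: enumerate a file in 4096-byte blocks. bracket
-- closes the handle as soon as the fold ends, whether it reached the end of
-- the file, was stopped early by the consumer, or threw an exception.
enumFile :: FilePath -> Enumerator
enumFile path f z0 = bracket (openBinaryFile path ReadMode) hClose (loop z0)
  where
    loop z h = do
      chunk <- B.hGet h 4096
      if B.null chunk
        then return z
        else f z chunk >>= either return (\z' -> loop z' h)

-- Hypothetical consumer: total number of bytes; never stops early.
countBytes :: Enumerator -> IO Int
countBytes enum = enum step 0
  where
    step n chunk = let n' = n + B.length chunk
                   in n' `seq` return (Right n')

-- Hypothetical consumer: collect whole chunks and stop as soon as at least
-- `limit` bytes have been seen.
takeAtLeast :: Int -> Enumerator -> IO ByteString
takeAtLeast limit enum = B.concat . reverse <$> enum step []
  where
    step acc chunk
      | sum (map B.length acc') >= limit = return (Left acc')
      | otherwise                        = return (Right acc')
      where acc' = chunk : acc
```

Because the producer owns the loop, the consumer never sees the file handle, so it cannot forget to close it or keep it open longer than the iteration lasts.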

The Environment Type

The Environment type provides information about the request.

data Environment = Environment
    { requestMethod   :: Method
    , scriptName      :: ByteString
    , pathInfo        :: ByteString
    , queryString     :: Maybe ByteString
    , requestProtocol :: (Int, Int)
    , headers         :: Headers
    , input           :: Enumerator
    , errors          :: String -> IO ()
    }
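
The record above refers to Method and Headers, which this page does not define yet. The sketch below fills them in with assumed definitions purely for illustration and shows what an environment for a request such as "GET /app/hello?name=world HTTP/1.1" might look like. The split between scriptName and pathInfo, the query string without its leading "?", and the names exampleEnv and describe are all assumptions modeled loosely on WSGI, not part of this specification.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as C

type Enumerator = forall a. (a -> ByteString -> IO (Either a a)) -> a -> IO a

-- Assumptions: placeholder definitions for the two undefined types.
data Method = GET | POST | OtherMethod ByteString deriving (Eq, Show)
type Headers = [(ByteString, ByteString)]

data Environment = Environment
    { requestMethod   :: Method
    , scriptName      :: ByteString
    , pathInfo        :: ByteString
    , queryString     :: Maybe ByteString
    , requestProtocol :: (Int, Int)
    , headers         :: Headers
    , input           :: Enumerator
    , errors          :: String -> IO ()
    }

-- Hypothetical environment for a GET request with an empty body.
exampleEnv :: Environment
exampleEnv = Environment
    { requestMethod   = GET
    , scriptName      = C.pack "/app"
    , pathInfo        = C.pack "/hello"
    , queryString     = Just (C.pack "name=world")
    , requestProtocol = (1, 1)
    , headers         = [(C.pack "Host", C.pack "example.org")]
    , input           = \_ z -> return z   -- no body chunks to enumerate
    , errors          = \_ -> return ()    -- discard error messages
    }

-- Hypothetical helper: the kind of inspection a real application might do.
describe :: Environment -> ByteString
describe env
    | requestMethod env == GET && pathInfo env == C.pack "/hello" =
        C.pack "greeting page"
    | otherwise = C.pack "not found"
```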

Acknowledgments

This specification is heavily influenced by the excellent work done in the Python community during the development of the Python Web Server Gateway Interface (WSGI).