[Haskell-cafe] Yet another top-level state proposal

Fri May 25 20:29:07 EDT 2007

Hi all,

Given the recent discussion about adding top-level mutable state to
Haskell, I thought it might be a good time to throw my own proposal
into the ring.  If enough people think it's worth considering, I can
add it to the wiki page.
(http://www.haskell.org/haskellwiki/Top_level_mutable_state)

In contrast to recent proposals, this one requires no extra syntax or
use of unsafe functions by the programmer.  Any nonstandard "magic"
that might occur is kept within the compiler internals.  Furthermore,
top-level initializations are only executed when needed; merely
importing a module does not cause any additional actions to be run at
startup.

The core idea, similar to that of "type-based execution contexts" on
the above wiki page, is to associate each top-level action with its
own type.  For example, the current way to declare a source for unique
integers is:

------------------------------------------------
{-# NOINLINE uniqueRef #-}
uniqueRef :: IORef Integer
uniqueRef = unsafePerformIO $ newIORef 0

uniqueInt :: IO Integer
uniqueInt = do
    n <- readIORef uniqueRef
    writeIORef uniqueRef (n+1)
    return n
------------------------------------------------
Under this proposal, we would write instead:
------------------------------------------------
newtype UniqueRef = UniqueRef (IORef Integer)
                        deriving OnceIO

instance OnceInit UniqueRef where
    onceInit = liftM UniqueRef (newIORef 0)

uniqueInt :: IO Integer
uniqueInt = do
    UniqueRef uniqueRef <- runOnceIO
    n <- readIORef uniqueRef
    writeIORef uniqueRef (n+1)
    return n
------------------------------------------------

The above code uses two classes:

class OnceInit a where
    onceInit :: IO a

class OnceInit a => OnceIO a where
    runOnceIO :: IO a

The OnceInit class lets the programmer specify how a type is
initialized; above, it just allocates a new IORef, but it could also
read a configuration file or parse command-line arguments, for
example.
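
As a rough sketch of the command-line case, an initializer might look
like the following.  The ProgramArgs type, argCount and demo are names
made up for this illustration; they are not part of the proposal.  The
class is repeated so that the snippet compiles on its own, and since
"deriving OnceIO" does not exist today, the sketch calls onceInit
directly.

```haskell
import System.Environment (getArgs, withArgs)

-- Class from the proposal, repeated so this sketch compiles on its own.
class OnceInit a where
    onceInit :: IO a

-- Hypothetical wrapper type for the program's command-line arguments.
-- Under the proposal it would also carry a "deriving OnceIO" clause.
newtype ProgramArgs = ProgramArgs [String]

instance OnceInit ProgramArgs where
    onceInit = fmap ProgramArgs getArgs

-- Calling onceInit directly re-runs the action on every call; under the
-- proposal, callers would use runOnceIO here to get the cached value.
argCount :: IO Int
argCount = do
    ProgramArgs args <- onceInit
    return (length args)

-- Runs argCount with a fixed, made-up argument list.
demo :: IO Int
demo = withArgs ["--verbose", "input.txt"] argCount
```

Loading this in GHCi and running demo should report two arguments.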

In contrast, instances of the OnceIO class are not written by the
programmer; instead, they are generated automatically by a "deriving
OnceIO" clause.  Each type for which OnceIO is derived will have a
special top-level action associated with it, which is accessed through
the runOnceIO function.  Its semantics are:

- The first time that "runOnceIO" is called, it runs the corresponding
"onceInit" action and caches and returns the result.
- Every subsequent time that "runOnceIO" is called, it returns the
cached result.
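
These two rules can be tried out without any global state.  The sketch
below uses a made-up helper, newOnce, which is not part of the
proposal: it caches in a locally created MVar, whereas the proposal
would key the cache on the type.  The demo counts how often the
initializer actually runs.

```haskell
import Control.Concurrent.MVar (newMVar, modifyMVar)
import Control.Monad (replicateM)
import Data.IORef (newIORef, readIORef, modifyIORef)

-- Hypothetical helper: wrap an initializer so that it runs at most once
-- and every later call returns the cached result.
newOnce :: IO a -> IO (IO a)
newOnce initialize = do
    cache <- newMVar Nothing
    return $ modifyMVar cache $ \cached -> case cached of
        Just x  -> return (Just x, x)   -- later calls: reuse cached result
        Nothing -> do                   -- first call: run the initializer
            x <- initialize
            return (Just x, x)

-- Calls the wrapped action three times and reports how many times the
-- initializer ran, together with the three results.
demo :: IO (Int, [String])
demo = do
    runs <- newIORef (0 :: Int)
    getValue <- newOnce $ do
        modifyIORef runs (+1)
        return "initialized"
    results <- replicateM 3 getValue
    count <- readIORef runs
    return (count, results)
```

Running demo should show a run count of 1 alongside three identical
results.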

This behavior is safe precisely because runOnceIO is an IO action.
Even though one can't guarantee when in the program an initialization
will occur, when the initialization does happen it will be sequenced
among other IO actions.

To illustrate this behavior, here are a couple of sample
implementations in plain Haskell.  These do use unsafePerformIO, but
in practice any such details would be hidden in the instance derived
by the compiler (along with any related NOINLINE/NOCSE pragmas):

instance OnceIO UniqueRef where
    runOnceIO = return $! unsafePerformIO onceInit

or (less efficient, but thread-safe):

instance OnceIO UniqueRef where
    runOnceIO = modifyMVar onceUniqueRef $ \mx -> case mx of
        Just x -> return (Just x, x)
        Nothing -> do {x <- onceInit; return (Just x, x)}

-- NOINLINE keeps the compiler from duplicating the MVar.
{-# NOINLINE onceUniqueRef #-}
onceUniqueRef :: MVar (Maybe UniqueRef)
onceUniqueRef = unsafePerformIO $ newMVar Nothing
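
For anyone who wants to experiment, here is the whole example as one
module that should compile today.  It assumes the MVar-based variant
and writes the OnceIO instance by hand as a stand-in for whatever
"deriving OnceIO" would generate; demo is a made-up name for this
illustration.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, modifyMVar)
import Control.Monad (liftM, replicateM)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafePerformIO)

class OnceInit a where
    onceInit :: IO a

class OnceInit a => OnceIO a where
    runOnceIO :: IO a

newtype UniqueRef = UniqueRef (IORef Integer)

instance OnceInit UniqueRef where
    onceInit = liftM UniqueRef (newIORef 0)

-- Hand-written stand-in for the compiler-generated cache.
{-# NOINLINE onceUniqueRef #-}
onceUniqueRef :: MVar (Maybe UniqueRef)
onceUniqueRef = unsafePerformIO $ newMVar Nothing

-- Hand-written stand-in for the derived instance.
instance OnceIO UniqueRef where
    runOnceIO = modifyMVar onceUniqueRef $ \mx -> case mx of
        Just x  -> return (Just x, x)
        Nothing -> do { x <- onceInit; return (Just x, x) }

-- Same as in the proposal; the read/write pair is not atomic.
uniqueInt :: IO Integer
uniqueInt = do
    UniqueRef uniqueRef <- runOnceIO
    n <- readIORef uniqueRef
    writeIORef uniqueRef (n+1)
    return n

-- Draws three integers in sequence from the shared counter.
demo :: IO [Integer]
demo = replicateM 3 uniqueInt
```

In a fresh process, the first call to demo should yield 0, 1 and 2, and
a second call should continue from 3, since both calls share the one
cached IORef.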

Finally, note that the deriving clause can easily check whether the
type in question is monomorphic (as is necessary for type-safety),
since it already has access to the type definition.
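
To see why monomorphism matters, consider the well-known problem with
polymorphic top-level references under GHC, sketched below with
today's unsafePerformIO idiom.  A polymorphic type with a derived
OnceIO instance would presumably run into the same hazard.  All names
here are made up for the illustration, and badCast is deliberately
never called.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafePerformIO)

-- A polymorphic top-level reference: one cell shared at every element
-- type.
{-# NOINLINE polyRef #-}
polyRef :: IORef [a]
polyRef = unsafePerformIO $ newIORef []

-- Type-checks, yet hands back Ints labelled as Strings.  Never called
-- here: using its result is undefined behaviour.
badCast :: IO [String]
badCast = do
    writeIORef polyRef [1 :: Int, 2, 3]
    readIORef polyRef

-- The monomorphic counterpart cannot be misused this way.
{-# NOINLINE monoRef #-}
monoRef :: IORef [Int]
monoRef = unsafePerformIO $ newIORef []

-- Writes and reads back at the single permitted type.
demo :: IO [Int]
demo = do
    writeIORef monoRef [1, 2, 3]
    readIORef monoRef
```

With the monomorphic reference, the type checker rejects any attempt to
read the contents at a different type, which is the property the
deriving clause would need to enforce.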

Anyway, that's the gist of my proposal; I hope I've explained it well,
but please let me know if you have questions, suggestions or
criticisms.

Best,
-Judah