DataDriven

From HaskellWiki

Revision as of 19:08, 26 November 2007

1 Abstract

Warning: The Haddock docs are not ready yet. I'm trying to get a working Haddock 2.0 running (on my Windows machine).

DataDriven is a library for functional events and time-varying values ("sources"). The ideas and interface come mainly from functional reactive programming (FRP). Most FRP implementations I'm aware of have a demand-driven implementation, while the implementation of DataDriven is data-driven (surprise). This library is a resurrection of some ideas from an old, incomplete Fran reimplementation that also became the basis of Meurig Sage's FranTk. This time around, I've been particularly interested in using standard classes as much as possible, most centrally Applicative and Monoid.

Besides this wiki page, here are more ways to find out about DataDriven:

Please leave comments at the Talk page.

2 Events

2.1 Background

The heart of the library is a notion of functional, composable events, with a data-driven implementation. Most of the ideas and vocabulary are borrowed from Fran, as of when Fran's events came to mean multiple occurrences (see Declarative Event-Oriented Programming, rather than the initial ICFP '97 publication). As in Fran, you can think of an event as a stream of "occurrences", each of which has a time and a value. The implementation, however, is radically different from Fran's, being data-driven rather than demand-driven. And in some cases, the functions are not pure. There are also several event-related functions for creating time-varying values.

2.2 A first look at the interface

Some of the useful event operations come through standard classes.

  • Functor: fmap f e is the event that occurs whenever e occurs, but whose occurrence values come from applying f to the values from e. (Fran's (==>).)
  • Monoid: mempty is the event that never occurs, and e `mappend` e' is the event that combines occurrences from e and e'. (Fran's neverE and (.|.).)
  • Monad: return a is an event with a single occurrence. This one doesn't quite fit the original semantics, as the occurrence is delivered immediately on "listening" to an event (discussed later). In e >>= f, each occurrence of e leads, through f, to a new event. Similarly for join ee, which is somehow simpler for me to think about. The occurrences of e >>= f (or join ee) correspond to the union of the occurrences of all such events. For example, suppose we're playing Asteroids and tracking collisions. Each collision can break an asteroid into more of them, each of which has to be tracked for more collisions. Another example: a chat room has an "enter" event, whose occurrences contain new events like "speak".

As a simple example, the following function transforms and combines two events:

show2 :: (Show a, Show b) => Event a -> Event b -> Event String
show2 ea eb = showE ea `mappend` showE eb
 where
   showE e = fmap show e
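To experiment with show2 outside the library, here is a minimal, self-contained sketch. It assumes a stand-in Event type that is just a subscription function (a simplified form of the continuation view described in the next section) and two hypothetical helpers, occurrences and collect, which are not part of DataDriven.

```haskell
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Stand-in event type (an assumption for this sketch, not DataDriven's
-- definition): an event is a way to subscribe a listener.
newtype Event a = Event { subscribe :: (a -> IO ()) -> IO () }

-- fmap f e: every occurrence of e, transformed by f.
instance Functor Event where
  fmap f (Event s) = Event (\listener -> s (listener . f))

-- e <> e': subscribing to the combination subscribes to both.
instance Semigroup (Event a) where
  Event s <> Event s' = Event (\listener -> s listener >> s' listener)

-- mempty: the event that never occurs, so subscribing does nothing.
instance Monoid (Event a) where
  mempty = Event (\_ -> return ())

-- Hypothetical helper: an event that delivers the given values, in order,
-- to each listener at the moment it subscribes.
occurrences :: [a] -> Event a
occurrences xs = Event (\listener -> mapM_ listener xs)

-- Hypothetical helper: subscribe a recording listener and return
-- everything delivered during subscription, oldest first.
collect :: Event a -> IO [a]
collect e = do
  ref <- newIORef []
  subscribe e (\x -> modifyIORef ref (x :))
  reverse <$> readIORef ref

-- show2, as on this page.
show2 :: (Show a, Show b) => Event a -> Event b -> Event String
show2 ea eb = showE ea `mappend` showE eb
  where
    showE e = fmap show e
```

With these definitions, collect (show2 (occurrences [1, 2 :: Int]) (occurrences "ab")) yields ["1", "2", "'a'", "'b'"]; the first event's occurrences come first because this mappend subscribes to ea before eb.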

2.3 Events as continuations

The Event type is not actually a new type, but merely a specialization of the familiar type of continuation-based computations, Cont:

newtype Cont o a = Cont { runCont :: (a -> o) -> o }

The Functor and Monad instances come from Cont. The Monoid instance for Cont is missing (as of 2007-09-08), so it is defined in this module (and thus is an "orphan") simply by deriving. The more specialized event type is simply

type Event = Cont (IO ())

Why does it make sense to think of continuation-based computations as events? Because an event is something that one can subscribe to. Subscription provides a "listener" (a continuation) to be invoked on every occurrence of the event. If the occurrence value has type a, and the result of the listener and of registration has type o, then subscribing has type (a -> o) -> o, which is the type wrapped by Cont. The Monoid, Functor, and Monad operations are simple. Given a listener l :: a -> o,

  • Subscribing l to mempty has no effect, since mempty is guaranteed never to occur.
  • Subscribing l to ea `mappend` eb subscribes l to each of ea and eb.
  • Subscribing l to fmap f e subscribes l . f to e.
  • Subscribing l to return a immediately invokes l a.
  • Subscribing l to join e subscribes to e a listener that subscribes l to every event generated by e. (Similarly for e >>= f == join (fmap f e).)

The functions in the Event module operate on this general notion of events (Cont) or on something more specialized. I expect the most common use to be the IO-based Event specialization, and the signatures are often much easier to read at that type. General functions are given general signatures, with the Event specializations as comments.
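The subscription rules above can be checked with a small standalone copy of Cont. This is a sketch, not the library's source: the instances are written by hand from the bullet points, and occurrences, enter, and chatter are hypothetical names for a chat-room example in which each "enter" occurrence carries a "speak" event. Following the convention just described, definitions get general signatures with Event specializations as comments.

```haskell
import Control.Monad (join)

-- Continuation-based computations, spelled out as on this page so the
-- sketch is self-contained (today's transformers package defines Cont
-- via ContT instead).
newtype Cont o a = Cont { runCont :: (a -> o) -> o }

-- Subscribing l to fmap f e subscribes l . f to e.
instance Functor (Cont o) where
  fmap f (Cont e) = Cont (\l -> e (l . f))

-- Subscribing l to pure a (i.e. return a) immediately invokes l a.
instance Applicative (Cont o) where
  pure a = Cont (\l -> l a)
  Cont ef <*> Cont ea = Cont (\l -> ef (\f -> ea (l . f)))

-- Subscribing l to e >>= f subscribes, to e, a listener that
-- subscribes l to each event f a.
instance Monad (Cont o) where
  Cont e >>= f = Cont (\l -> e (\a -> runCont (f a) l))

-- Pointwise on listeners: subscribe l to both events.
instance Semigroup o => Semigroup (Cont o a) where
  Cont ea <> Cont eb = Cont (\l -> ea l <> eb l)

-- Subscribing to mempty has no effect (for IO (), it does nothing).
instance Monoid o => Monoid (Cont o a) where
  mempty = Cont (const mempty)

type Event = Cont (IO ())

-- Hypothetical helper: replay a fixed list of occurrences to each listener.
-- occurrences :: [a] -> Event a
occurrences :: Monoid o => [a] -> Cont o a
occurrences xs = Cont (\l -> mconcat (map l xs))

-- Hypothetical chat room: each "enter" occurrence carries that user's
-- "speak" event.
-- enter :: Event (String, Event String)
enter :: Monoid o => Cont o (String, Cont o String)
enter = occurrences
  [ ("alice", occurrences ["hi", "bye"])
  , ("bob", occurrences ["hello"])
  ]

-- Every utterance of every user who enters, tagged with the speaker.
-- chatter :: Event String
chatter :: Monoid o => Cont o String
chatter = join (fmap tag enter)
  where
    tag (name, speak) = fmap ((name ++ ": ") ++) speak
```

At the Event type, runCont chatter putStrLn prints "alice: hi", "alice: bye", and "bob: hello". Because the definitions are general, choosing o = [String] and the listener (\s -> [s]) collects the same three strings without any IO.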

3 Sources

Sources are time-varying values, akin to Fran's "behaviors". They are built up mainly from constant values and application (via the Applicative interface), as well as reaction to events.

3.1 Composing Sources

Like events, sources have a more general, and surprisingly simple, form:

type SourceG change m = ((,) change) `O` m

The change type parameter provides a description of everything that can affect a source (cause it to change). The m parameter is a way to sample the value when changed. Here g `O` h means the composition of two type constructors, and is defined in TypeCompose. Without the fancy type constructors,

type SourceG' change m a = (change, m a)

One of the delightful properties of functors and of applicative functors (AFs) is that they compose. That is, two functors compose to form a functor, and two AFs compose to form an AF. For any monoid o, ((,) o) is an AF (corresponding to the writer monad). So, when change is a monoid and m is an AF, SourceG change m is an AF. There are many possible monoid choices for change. One especially useful one is a continuation/event:

type SourceC o m = SourceG (Cont (m o) ()) m

Still more specifically:

type Source = SourceC () IO

A source, then, is simply a change event together with an IO sampler. Given an AF application f <*> a for sources f and a, the change event combines (mappend) the change events for f and a, and sampling just applies a sampling of f to a sampling of a.
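The "free" Applicative instance can be seen directly with base's Data.Functor.Compose standing in for TypeCompose's O. In this sketch, the change type is a minimal stand-in Event () (playing the role of Cont (IO ()) ()), and refSource, changesS, and sampleS are hypothetical helpers, not DataDriven functions. No Applicative instance for sources is written by hand.

```haskell
import Data.Functor.Compose (Compose (..))
import Data.IORef

-- Minimal stand-in for the IO-based event type (an assumption for this sketch).
newtype Event a = Event { subscribe :: (a -> IO ()) -> IO () }

instance Semigroup (Event a) where
  Event s <> Event s' = Event (\l -> s l >> s' l)

instance Monoid (Event a) where
  mempty = Event (\_ -> return ())

-- Compose from base plays the role of TypeCompose's `O` here.
type SourceG change m = Compose ((,) change) m

-- change = Event () (a monoid) and m = IO (an AF), so Source is an AF
-- by composition alone.
type Source = SourceG (Event ()) IO

-- Hypothetical helper: a source read from an IORef, whose changes are
-- announced by the given event.
refSource :: IORef a -> Event () -> Source a
refSource ref changed = Compose (changed, readIORef ref)

-- The change event of a source.
changesS :: Source a -> Event ()
changesS = fst . getCompose

-- Sample the current value of a source.
sampleS :: Source a -> IO a
sampleS = snd . getCompose
```

For sources sx and sy, (+) <$> sx <*> sy has the change event changesS sx <> changesS sy and samples by adding fresh samples of sx and sy, exactly the combination described above.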

3.2 Sources and events

As an example of event-based sources, the following function makes a source with an initial value that changes at every occurrence of an event. The resulting source remembers the event's most recent occurrence value.

mkStepper :: a -> Event a -> IO (Source a)

The result of mkStepper a e is an IO action because it starts reacting to occurrences of e only after it is executed. The semantic difference is clearer with the following function, which accumulates event occurrence values:

mkAccumS :: a -> Event (a -> a) -> IO (Source a)

Sources are also used to make events. For instance, the snapshot function samples a source whenever an event occurs, and pairs the occurrence and source values.

snapshot :: Event a -> Source b -> Event (a,b)
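One way these three signatures could be realized is sketched below, reconstructed only from the descriptions above and not from DataDriven's code (it also ignores ephemerality). It assumes a stand-in Event type, a Source modeled as a plain record of change event and IO sampler, and a hypothetical newEvent that returns an event together with the action that makes it occur.

```haskell
import Data.IORef

-- Stand-in event type (an assumption for this sketch).
newtype Event a = Event { subscribe :: (a -> IO ()) -> IO () }

instance Functor Event where
  fmap f (Event s) = Event (\l -> s (l . f))

-- A source: a change event plus an IO sampler (cf. SourceG' above).
data Source a = Source { changes :: Event (), sample :: IO a }

-- Hypothetical: a fresh event and the action that makes it occur.
-- Listeners are notified in registration order.
newEvent :: IO (Event a, a -> IO ())
newEvent = do
  listeners <- newIORef []
  let event = Event (\l -> modifyIORef listeners (++ [l]))
      fire a = readIORef listeners >>= mapM_ ($ a)
  return (event, fire)

-- Start with a and apply each occurrence's function to the current value.
mkAccumS :: a -> Event (a -> a) -> IO (Source a)
mkAccumS a e = do
  ref <- newIORef a
  -- Subscribing happens here, which is why the result is in IO:
  -- occurrences before this action runs are never seen.
  subscribe e (modifyIORef' ref)
  return (Source (fmap (const ()) e) (readIORef ref))

-- Start with a and remember the most recent occurrence value.
mkStepper :: a -> Event a -> IO (Source a)
mkStepper a e = mkAccumS a (fmap const e)

-- Pair each occurrence of e with the source's value at that moment.
snapshot :: Event a -> Source b -> Event (a, b)
snapshot e src = Event (\l -> subscribe e (\a -> sample src >>= \b -> l (a, b)))
```

Defining mkStepper as mkAccumS with const shows the relationship between the two: remembering the latest value is accumulation with a function that discards the old value.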

4 Ephemeral values

4.1 GC favors demand-driven computation

The purpose of garbage collection is to keep services alive as long as they are useful to clients and then free up the services' computational resources (effort and memory). Conventional garbage collection works very well for demand-driven (pull-based) computation, but not for data-driven (push-based) computation.

Consider a piece of information supplied by a service and used by a client. In a demand-driven scenario, the client has a pointer to the service and uses that pointer to get more of the information. The client keeps the service alive. When the client gets GC'd, its pointer to the service goes away. If there are no more pointers to the service, then it will also get GC'd. Both the computational effort and the memory are freed up for other uses. GC did its job.

The situation is reversed for data-driven computation. Here, the service pushes information to the client, so the service has a pointer to the client. This pointer means that the service keeps the client alive and keeps computing even when the client is no longer of any use. GC fails to satisfy its purpose.

4.2 Caching and weak pointers

Various forms of caching have this same problem. Suppose we use a hash table to memoize an expensive function. Even though the hash table is in service to the function's arguments, the table's key/value entries keep the key values (function arguments) from ever getting reclaimed. Ideally, the situation would be reversed: the key would keep its table entry alive, and when the key was GC'd, the entry would shortly follow. Unfortunately, the direction of pointers, from entry to key, means that the entries keep the keys alive, wasting space and slowing down search for useful entries.

The classic solution to this problem is to use "weak" pointers, i.e., pointers that the GC disregards in its decision about whether to retain a previously allocated object. For more discussion of these issues, see the System.Mem.Weak documentation and "Stretching the storage manager". (These ideas were a motivating application for the "Stretching" paper, but the description got cut for lack of space.)

4.3 Ephemeral listeners

What does all this have to do with events and sources? Recall that an event is simply a means of registering a "listener" (continuation) to be invoked on every occurrence. The event must have some kind of reference to the listener in order to invoke it. It must not keep the listener alive, however, since the event is the service and the listener is the client. The solution is for events to hold their clients weakly, i.e., point to them via weak references. Once a listener's strong pointers all disappear, the GC nulls out ("tombstones") the event's weak pointer. The next time the event occurs, it finds that it no longer has a live pointer, so it stops notifying the listener.
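The scheme just described can be sketched with GHC's weak pointers. This is an illustration under stated assumptions, not DataDriven's implementation: WeakEvent, newWeakEvent, listen, and occur are hypothetical names, and each listener is stored in a client-owned IORef so that the weak pointer can be keyed on that IORef via mkWeakIORef.

```haskell
import Data.IORef
import Data.Maybe (catMaybes)
import System.Mem.Weak (Weak, deRefWeak)

-- Hypothetical event type (not DataDriven's): its registry holds each
-- listener only through a weak pointer. The listener lives in an IORef
-- owned by the client, and the weak pointer is keyed on that IORef,
-- since weak pointers keyed on primitive objects such as IORefs are
-- more reliable than ones keyed on ordinary closures.
newtype WeakEvent a = WeakEvent (IORef [Weak (IORef (a -> IO ()))])

newWeakEvent :: IO (WeakEvent a)
newWeakEvent = WeakEvent <$> newIORef []

-- Register a listener. The returned handle is the only strong reference:
-- once the client drops it, the GC may tombstone the event's weak pointer.
listen :: WeakEvent a -> (a -> IO ()) -> IO (IORef (a -> IO ()))
listen (WeakEvent registry) listener = do
  handle <- newIORef listener
  weak <- mkWeakIORef handle (return ())
  modifyIORef registry (++ [weak])
  return handle

-- Deliver an occurrence to every live listener, and forget listeners
-- whose weak pointers have been tombstoned (deRefWeak gives Nothing).
occur :: WeakEvent a -> a -> IO ()
occur (WeakEvent registry) x = do
  weaks <- readIORef registry
  found <- mapM (\w -> fmap (\h -> (w, h)) <$> deRefWeak w) weaks
  let alive = catMaybes found
  writeIORef registry (map fst alive)
  mapM_ (\(_, h) -> readIORef h >>= \l -> l x) alive
```

The event (service) never keeps a listener (client) alive; only the client's handle does. A dead listener is noticed and dropped lazily, at the next occurrence, matching the description above.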

One of the (currently four) functions in Data.Ephemeral converts a value into an ephemeral one:

ephemeral :: (WeakM m, Monoid o) => o -> m o

WeakM refers to monads having weak pointers, currently just IO. The result of ephemeral o is an "ephemeral" monadic version mo'. Initially, mo' just returns o. Once o is GC'd, mo' instead returns mempty. The ephemeral function is a special case of the more general ephemeralWith, which drops the Monoid constraint and takes an explicit fallback value. The implementation of DataDriven takes care of ephemerality automatically, so client code doesn't have to worry about it. The only sign of this issue is the WeakM monad constraint in the most general forms of Event and Source functions. In fact, in the implementation of DataDriven, only one primitive worries about ephemerality, and all of the others inherit the benefits.
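The idea behind ephemeralWith and ephemeral can be approximated in plain IO with System.Mem.Weak. This sketch is hypothetical: ephemeralWithSketch and ephemeralSketch are not Data.Ephemeral's functions, and their IO-only types (returning an IO action) differ from the WeakM-based signature above. Also, GHC's System.Mem.Weak documentation cautions that weak pointers keyed on ordinary values can be tombstoned earlier than expected after optimization, which is one reason a real implementation might key on a primitive object instead.

```haskell
import Data.Maybe (fromMaybe)
import System.Mem.Weak (deRefWeak, mkWeakPtr)

-- Hypothetical, IO-only sketch of the idea behind ephemeralWith: the
-- returned action yields x while x is still alive, and the fallback once
-- the GC has reclaimed it. The action refers to x only through a weak
-- pointer, so it does not keep x alive by itself.
ephemeralWithSketch :: a -> a -> IO (IO a)
ephemeralWithSketch fallback x = do
  w <- mkWeakPtr x Nothing
  return (fromMaybe fallback <$> deRefWeak w)

-- The Monoid-based variant uses mempty as the fallback, like ephemeral.
ephemeralSketch :: Monoid o => o -> IO (IO o)
ephemeralSketch = ephemeralWithSketch mempty
```

While the caller still holds the original value, running the returned action gives that value back; after the value becomes unreachable and is collected, the action returns mempty, so a listener built this way quietly turns into a no-op.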