Unsafe hGetContents

Duncan Coutts duncan.coutts at googlemail.com
Sun Oct 11 10:05:41 EDT 2009


On Sat, 2009-10-10 at 10:59 -0700, Iavor Diatchki wrote:
> Hello,
> 
> well, I think that the fact that we seem to have a program context
> that can distinguish "f1" from "f2" is worth discussing because I
> would have thought that in a pure language they are interchangable.

Crucially they are contexts in an IO program.

> The question is, does the context in Oleg's example really distinguish
> between "f1" and "f2"?  You seem to be saying that this is not the
> case:  in both cases you end up with the same non-deterministic
> program that reads two numbers from the standard input and subtracts
> them but you can't assume anything about the order in which the
> numbers are extracted from the input---it is merely an artifact of the
> GHC implementation that with "f1" the subtraction always happens the
> one way, and with "f2" it happens the other way.

Right.

> I can (sort of) buy this argument, after all, it is quite similar to
> what happens with asynchronous exceptions (f1 (error "1") (error "2")
> vs f2 (error "1") (error "2")).  Still, the whole thing does not
> "smell right":  there is some impurity going on here,

No, there's no impurity.

> and trying to offload the problem onto the IO monad only makes
> reasoning about IO computations even harder (and it is petty hard to
> start with).

Sure, reasoning about non-deterministic IO programs is tricky. But then
nobody here is advocating writing non-deterministic IO programs. Lazy IO
is sensible and useful when the non-determinism doesn't make any
difference to the results.

Lets look at a simplified case, instead of general IO and the whole OS
API at our disposal, lets look at the case of a single thread of control
and mutable variables, specifically the ST monad.

We can construct a semantics for this based on a sequence of read /
write events for the mutable variables. The ST monad bind gives
guarantees about the ordering of the events. So the ST programs are
deterministic.

do x <- readSTRef v
   writeSTRef v (x+1)
   writeSTRef v (x+2)

The semantics of this ST program is the trace

read(v,x)
write(v,x+1)
write(v,x+2)

We could introduce non-determinism to this system by allowing read /
write events to be arbitrarily interleaved with other subsequent events:

do x <- readSTRef v
   unsafeInterleaveST $ writeSTRef v (x+1)
   writeSTRef v (x+2)

now we can have two traces:

read(v,x)
write(v,x+1)
write(v,x+2)

or

read(v,x)
write(v,x+2)
write(v,x+1)

The semantics is the set of traces, in this case just the two.

Of course with this modified ST system we cannot allow a pure runST
because we've got non-deterministic ST programs (or we could make it
pure by returning the full set of traces). But it'd be ok for IO.

Now working with and reasoning about these non-deterministic ST programs
is tricky. Depending on the implementation choice for the interleaving
we'll get different results and under some implementation choices we'll
be able to influence the result by coding pure bits of the program
differently. None of this changes the semantics since the semantics just
says any possible interleaving is OK.

Another interesting thing to note is that we can limit the interleaving
somewhat by forcing deferred events to come before other subsequent
events:

do writeSTRef v 1
   x <- unsafeInterleaveST $ readSTRef v
   writeSTRef v 2
   evaluate x
   writeSTRef v 3

So in the traces for this program, x can have the value 1 or 2 but not 3
because of the partial order on events that we impose using evaluate.

We can also do something like Oleg's example (simplified to only a
single getChar rather than reading the whole input stream)

do
  let fileContent = "hello"
  seekPoint <- newSTRef 0
  let getChar = do
        s <- readIORef seekPoint
        writeIORef seekPoint (s+1)
        return (fileContent !! s)

  s1 <- unsafeInterleaveST getChar
  s2 <- unsafeInterleaveST getChar
  
  --evaluate (f1 s1 s2)
  evaluate (f2 s1 s2)

Under some implementations of the interleaving we can expect to get
different event interleavings for the f1 program vs the f2 program. So
we apparently have a pure function influencing the event ordering. Of
course the semantics says we have both event orderings anyway.

It is also possible to write ST programs that produce the same result
irrespective of the event interleaving. These programs might actually be
useful. For example:

do writeSTRef v 1
   x <- unsafeInterleaveST $ readSTRef v
   ...
   -- no more writes to v

So here we allow the read from v do be performed any time. We still have
loads of different possible traces, but the value of x is the same in
each, because the v variable is never written to again.

In IO with the full OS API and other programs running concurrently it is
harder to reason about. But we can see similar possibilities for
non-deterministic primitives where we can still get a deterministic
result. One of those is if we read from a mutable variable (a file) and
can be sure that there are no other writes to that mutable variable.
It's further complicated by the fact that we don't directly read from
files but rather from a file descriptor which is a mutable variable
containing a seek point into a file. When we read from a file we mutate
the seek point. So we also have to guarantee that we are not doing that
more than once on the same buffer/seek point (OS FD). If we can assure
ourselves of all that then we should be able to show that the result is
the same irrespective of the event interleavings (because we've banned
the "interfering" events that would influence the result).

Of course if we do not maintain those side conditions then we just get
back to the non-deterministic hard-to-reason-about IO programs.

Duncan



More information about the Haskell-prime mailing list