[Haskell-cafe] On the purity of Haskell

AUGER Cédric sedrikov at gmail.com
Wed Dec 28 20:38:32 CET 2011


Le Wed, 28 Dec 2011 17:39:52 +0000,
Steve Horne <sh006d3592 at blueyonder.co.uk> a écrit :

> This is just my view on whether Haskell is pure, being offered up for 
> criticism. I haven't seen this view explicitly articulated anywhere 
> before, but it does seem to be implicit in a lot of explanations - in 
> particular the description of Monads in SBCs "Tackling the Awkward 
> Squad". I'm entirely focused on the IO monad here, but aware that
> it's just one concrete case of an abstraction.
> 
> Warning - it may look like trolling at various points. Please keep
> going to the end before making a judgement.

It is not yet a troll for me as I am a newbie ^^

> To make the context explicit, there are two apparently conflicting 
> viewpoints on Haskell...
> 
>  1. The whole point of the IO monad is to support programming with
>     side-effecting actions - ie impurity.
>  2. The IO monad is just a monad - a generic type (IO actions), a
> couple of operators (primarily return and bind) and some rules -
> within a pure functional language. You can't create impurity by
> taking a subset of a pure language.
> 
> My view is that both of these are correct, each from a particular
> point of view. Furthermore, by essentially the same arguments, C is
> also both an impure language and a pure one.
> 
> See what I mean about the trolling thing? I'm actually quite serious 
> about this, though - and by the end I think Haskell advocates will 
> generally approve.
> 
> First assertion... Haskell is a pure functional language, but only
> from the compile-time point of view. The compiler manipulates and
> composes IO actions (among other things). The final resulting IO
> actions are finally swallowed by unsafePerformIO or returned from
> main. However, Haskell is an impure side-effecting language from the
> run-time point of view - when the composed actions are executed.
> Impurity doesn't magically spring from the ether - it results from
> the translation by the compiler of IO actions to executable code and
> the execution of that code.
> 
> In this sense, IO actions are directly equivalent to the AST nodes in
> a C compiler. A C compiler can be written in a purely functional way
> - in principle it's just a pure function that accepts a string
> (source code) and returns another string (executable code). I'm
> fudging issues like separate compilation and #include, but all of
> these can be resolved in principle in a pure functional way.
> Everything a C compiler does at compile time is therefore, in
> principle, purely functional.
> 
> In fact, in the implementation of Haskell compilers, IO actions
> almost certainly *are* ASTs. Obviously there's some interesting
> aspects to that such as all the partially evaluated and unevaluated
> functions. But even a partially evaluated function has a
> representation within a compiler that can be considered an AST node,
> and even AST nodes within a C compiler may represent partially
> evaluated functions.
> 
> Even the return and bind operators are there within the C compiler in
> a sense, similar to the do notation in Haskell. Values are converted
> into actions. Actions are sequenced. Though the more primitive form
> isn't directly available to the programmer, it could easily be
> explicitly present within the compiler.
> 
> What about variables? What about referential transparency?
> 
> Well, to a compiler writer (and equally for this argument) an
> identifier is not the same thing as the variable it references.
> 
> One way to model the situation is that for every function in a C 
> program, all explicit parameters are implicitly within the IO monad. 
> There is one implicit parameter too - a kind of IORef to the whole 
> system memory. Identifiers have values which identify where the
> variable is within the big implicit IORef. So all the manipulation of
> identifiers and their reference-like values is purely functional.
> Actual handling of variables stored within the big implicit IORef is
> deferred until run-time.
> 
> So once you accept that there's an implicit big IORef parameter to
> every function, by the usual definition of referential transparency,
> C is as transparent as Haskell. The compile-time result of each
> function is completely determined by its (implicit and explicit)
> parameters - it's just that that result is typically a way to look up
> the run-time result within the big IORef later.

Now as you ask here is my point of view:

IO monad doesn't make the language impure for me, since you can give
another implementation which is perfectly pure and which has the same
behaviour (although completely unrealistic):

-- An alternative to IO monad

data IO_ = IO_
  { systemFile :: String -> String -- ^ get the contents of the file
  , handlers :: Handler -> (Int, String)
    -- ^ get the offset of the handler
  }

type IO a = IO { trans :: IO_ -> (a, IO_) }

instance Monad IO where
  bind m f = IO { trans = \io1 -> let (a, io2) = trans m io1 in
                                  trans (f a) io2
                }
  return a = IO { trans = \io_ -> (a, io_) }

-- An example: hGetChar

hGetChar ::  Handler -> IO Char
hGetChar h =
  IO { trans = \io_ -> let (pos, file) = handlers io_ h
                           c = (systemFile file) !! pos
                           newHandlers = \k -> if h==k
                                               then (1+pos, file)
					       else handlers io_ k
                       in (c, IO_ {systemFile = systemFile io_
                                  ,handlers = newHandlers})
     }

Now how would this work?
In a first time, you load all your system file before running the
program (a "side-effect" which does not modify already used structures;
it is just initialization), then you run the program in a perfectly
pure way, and at the end you commit all to the system file (so you
modify structures the running program won't access as it has
terminated).

I am too lazy (like many Haskellers) to show how to implement the other
IO functions; but if you believe me you can do it, you can see that I
haven't done any side effect.
Now the "main" function has type: IO (), that is it is an encapsulated
function of type : IO_ -> ((), IO_); in other words, simply an
environment transformation. main really is a pure function.

It is quite different from C, as in Haskell you need to provide the
environment in which it is executed (stored in the type IO_); in C, it
is not the case
"read(5/*file descriptor, that is an int which cannot contain an
         environment*/,
      buff/*buffer (int*), cannot contain an environment*/,
      len/*size (int) to be read, cannot contain an environment*/);
 /*return type: void, cannot contain an environment*/"
but this expression modifies an environment, which is the side effect.
To make C pure, we need to say that all function calls are implicitly
given an environment, but in C we have no way to expose this
environment, and the typing system doesn't tell us if the call of the
function only access or also modifies the environment.

In haskell, the environment is explicitely passed to each function
(although the 'do' notations tends to hide it, but it is only syntactic
sugar), so by reading the signature and unfolding all
"data/type/newtype" definitions, we can tell if there is or not side
effect; for C, we must unfold 'terms' to understand that.

In Haskell,
'hGetChar h >>= \c -> hPutChar i' always has the same value, but
'trans (hGetChar h >>= \c -> hPutChar i) (IO_ A)'
'trans (hGetChar h >>= \c -> hPutChar i) (IO_ B)'
may have different values according to A and B.

In C, you cannot express this distinction, since you only have:
'read(h, &c, 1); write(i, &c, 1);' and cannot pass explicitely the
environment.

As you can see my point of view is really different, as I do not think
in the "AST way" (even if I have always heard of it, and think it is
another good way to think of the monads; in fact the 2 point of view
are the same for me, but express differently). 

> What's different about Haskell relative to C therefore...
> 
>  1. The style of the "AST" is different. It still amounts to the same
>     thing in this argument, but the fact that most AST nodes are
> simply partially-evaluated functions has significant practical
>     consequences, especially with laziness mixed in too. There's a
> deep connection between the compile-time and run-time models (contrast
>     C++ templates).
>  2. The IO monad is explicit in Haskell - side-effects are only
>     permitted (even at run-time) where the programmer has explicitly
>     opted to allow them.
>  3. IORefs are explicit in Haskell - instead of always having one you
>     can have none, one or many. This is relevant to an alternative
>     definition of referential transparency. Politicians aren't
>     considered transparent when they bury the relevant in a mass of
> the irrelevant, and even pure functions can be considered to lack
>     transparency in that sense. Haskell allows (and encourages) you to
>     focus in on the relevant - to reference an IORef Bool or an IORef
>     Int rather than dealing with an IORef Everything.
> 
> That last sentence of the third point is my most recent eureka - not
> so long ago I posted a "Haskell is just using misleading definitions
> - it's no more transparent than C" rant, possibly on Stack Overflow.
> Wrong again :-(
> 
> So - what do you think?
> 

my 2 cents

NB. I encountered a situation where my vision of things was broken:

hGetContents :: Handle -> IO ByteString

in Data.ByteString.Lazy as BS

The following two programs

BS.hGetContents h >>= \b -> close h >> let x = BS.length b in print x
-- ^ Dangerous, never do it!

BS.hGetContents h >>= \b -> let x = BS.length b in close h >> print x
-- ^ The right way to do

is a problem for me, as BS.length doesn't receive an environment as a
parameter, so as they are given the SAME b, they should give the same
result. (For me all BS.ByteString operations should be done inside a
Monad; the non-lazy ByteString doesn't have this problem of course)



More information about the Haskell-Cafe mailing list