[Haskell-beginners] "pure" versus "impure" code

mike.w.meyer@gmail.com mwm at mired.org
Fri May 20 17:59:15 CEST 2011


Paul Sargent <psarge at gmail.com> wrote:
>On 19 May 2011, at 21:12, Costello, Roger L. wrote:
>> My understanding is that:
>> 
>> - Pure code is code that does no I/O
>> 
>> - Impure code is code that does I/O
>
>I don't particularly like that definition. It's true, but the
>definition of I/O has to be very broad.

I would say so! I'm used to "Pure functions don't change state", where IO is pretty much a change of state by definition.

>The definition I use is that a pure function is one that
>	- takes input only from its parameters.
>	- outputs only via its return value.

Not quite mine, but I think it's better for reasoning about Haskell code. Maybe - I have a question at the end.

>If x is an external variable in memory
>Pure  : f(b,c) = b + c
>Impure: f(b)   = b + x;         (violates first rule - uses value in x)

Right, but ... see the question at the end.
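To put those examples in Haskell terms - a rough sketch, the function names and the IORef are mine, just for illustration:

    import Data.IORef

    -- Pure: the result depends only on the arguments.
    f :: Int -> Int -> Int
    f b c = b + c

    -- Impure: reading a mutable cell means the result can change
    -- between calls, so the function has to live in IO.
    g :: IORef Int -> Int -> IO Int
    g r b = do
        v <- readIORef r
        return (b + v)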

>Impure: f(b,c) = 1 ; x = b + c; (violates second rule - changes value in x)
>
>Pure functions can combine to make larger pure functions. Impure
>functions combined with pure functions make larger impure functions.
>
>> Also, it is my understanding that good software design is to
>isolate/separate the impure code from the pure code. Is that correct?
>As the result of pure code is dependent only on its inputs you can
>make statements like:
>	f(1,2) is always equal to f(1,2) at any time
>...without knowing what function f() is. That's quite powerful, and can
>be useful in optimisation.

Among other places! The point of this kind of distinction is that it's easier to reason about "pure" code - because you can make statements like that about it.
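For example, using the pure f from the sketch above - because f is pure you can substitute equals for equals wherever it appears:

    -- These two are interchangeable; the compiler (or a reader) can
    -- share the call or duplicate it without changing the program.
    twice1 = f 1 2 + f 1 2
    twice2 = let y = f 1 2 in y + y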

>Another way of phrasing it - A pure piece of code causes no
>side-effects, and it is not affected by other code's side-effects. This
>property is very useful in parallel computations.

Yup. And reasoning about parallel computation is hard enough as it is.
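A small sketch of that, using the parallel package's Control.Parallel.Strategies (the expensive function is made up):

    import Control.Parallel.Strategies (parMap, rdeepseq)

    -- expensive is pure, so the results can't depend on which order
    -- the parallel sparks happen to run in.
    expensive :: Int -> Int
    expensive n = sum [1 .. n]

    results :: [Int]
    results = parMap rdeepseq expensive [1000000, 2000000, 3000000]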

>Separating pure code out can be useful, but most programs in most
>languages don't. That's mainly due to the use of shared state variables
>between functions. Object orientation is inherently impure as every
>object has retained state, modified and used by the methods. This isn't
>necessarily a bad thing. At some point all programs have to be impure
>otherwise they'll always calculate the same result.
>
>> Does that principle apply to all programming languages, or just
>Haskell?
>
>Yes, it applies to all languages.

But in different ways to different languages.

Look at the "doesn't change state" definition - basically, yours without the "depends on state that other code can change" restriction. That means you can safely ignore the function when analyzing the calling code - you know it doesn't change any state. If the language is greedy, an optimizer can safely inline the function at that point. If the language is lazy, the optimizer can't do that inlining, because the external state may change between the function invocation and actually evaluating the function, which would change the result of the call.

In other words, in a greedy language you might get more leverage out of the looser definition of "pure."

In your example, x - a value that's not a parameter - is conventionally known as a free variable. IIUC, in Haskell, names can't change their value unless they're in a monad. That means the two definitions you gave for "pure" describe different types of functions: a function which takes input only from its parameters has no free variables. On the other hand, a function which uses free variables that aren't in a monad can't be affected by other code's side-effects, even though it takes input other than its parameters (at least if you have sane scoping rules).
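A tiny example of that second kind - a sketch, where x stands in for your external variable:

    -- x is a free variable of h, so h takes input that isn't a parameter...
    x :: Int
    x = 42

    -- ...but nothing can ever change x, so h is still unaffected by other
    -- code's side-effects: h 1 is 43 forever.
    h :: Int -> Int
    h b = b + x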

Hence the question: which of these is more useful for Haskell?
-- 
Sent from my Android tablet with K-9 Mail. Please excuse my brevity.


