# Correctness of short cut fusion

(Difference between revisions)
 Revision as of 14:41, 6 July 2006 (edit) (commit additions)← Previous diff Revision as of 15:02, 6 July 2006 (edit) (undo)Next diff → Line 59: Line 59: We can distinguish two situations, depending on whether g is defined using seq or not. We can distinguish two situations, depending on whether g is defined using seq or not. - ===In absence of seq=== + ===In the absence of seq=== + + ====foldr/build==== If g does not use seq, then the foldr/build-rule really is a semantic equivalence, that is, it holds that If g does not use seq, then the foldr/build-rule really is a semantic equivalence, that is, it holds that Line 68: Line 70: The two sides are semantically interchangeable. The two sides are semantically interchangeable. + + ====destroy/unfoldr==== The destroy/unfoldr-rule, however, is not a semantic equivalence. The destroy/unfoldr-rule, however, is not a semantic equivalence. Line 119: Line 123: - ===In presence of seq=== + ===In the presence of seq=== + + This is the more interesting setting, given that in Haskell there is no way to restrict the use of seq, so in any given program we must be prepared for the possibility that the g appearing in the foldr/build- or the destroy/unfoldr-rule is defined using seq. + Unsurprisingly, it is also the setting in which more can go wrong than above. + + ====foldr/build==== + + In the presence of seq, the foldr/build-rule is not anymore a semantic equivalence. + The instance + + + g = seq + c = undefined + n = 0 + + + shows, via similar "evaluations" as above, that the right-hand side (g c n) can be strictly less defined than the right-hand side (foldr c n (build g)). + + The converse cannot happen, because the following always holds: + + + foldr c n (build g) [itex]\sqsupseteq[/itex] g c n + + + Moreover, semantic equivalence can again be recovered by putting restrictions on the involved functions. + More precisely, if (c [itex]\bot~\bot)\neq\bot[/itex] and n [itex]\neq\bot[/itex], then even in the presence of seq: + + + foldr c n (build g) = g c n + - This is the more interesting setting, given that in Haskell there is no way to retrict the use of seq, so in any given program we must be prepared for the possibility that the g appearing in the foldr/build- or the destroy/unfoldr-rule is defined using seq. + ====destroy/unfoldr====

## 1 Short cut fusion

Short cut fusion allows elimination of intermediate data structures using rewrite rules that can also be performed automatically during compilation.

The two most popular instances are the
foldr
/
build
- and the
destroy
/
unfoldr

### 1.1 foldr/build

The
foldr
/
build
-rule eliminates intermediate lists produced by
build
and consumed by
foldr
, where these functions are defined as follows:
```foldr :: (a -> b -> b) -> b -> [a] -> b
foldr c n []     = n
foldr c n (x:xs) = c x (foldr c n xs)

build :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build g = g (:) []```
Note the rank-2 polymorphic type of
build
. The
foldr
/
build
-rule now means the following replacement for appropriately typed
g
,
c
, and
n
:
`foldr c n (build g) <nowiki>&rarr;</nowiki> g c n`

### 1.2 destroy/unfoldr

The
destroy
/
unfoldr
-rule eliminates intermediate lists produced by
unfoldr
and consumed by
destroy
, where these functions are defined as follows:
```destroy :: (forall b. (b -> Maybe (a,b)) -> b -> c) -> [a] -> c
destroy g = g listpsi

listpsi :: [a] -> Maybe (a,[a])
listpsi []     = Nothing
listpsi (x:xs) = Just (x,xs)

unfoldr :: (b -> Maybe (a,b)) -> b -> [a]
unfoldr p e = case p e of Nothing     -> []
Just (x,e') -> x:unfoldr p e'```
Note the rank-2 polymorphic type of
destroy
. The
destroy
/
unfoldr
-rule now means the following replacement for appropriately typed
g
,
p
, and
e
:
`destroy g (unfoldr p e) <nowiki>&rarr;</nowiki> g p e`

## 2 Correctness

If the
foldr
/
build
- and the
destroy
/
unfoldr
-rule are to be automatically performed during compilation, as is possible using GHC's RULES pragmas, we clearly want them to be equivalences.

That is, the left- and right-hand sides should be semantically the same for each instance of either rule. Unfortunately, this is not so in Haskell.

We can distinguish two situations, depending on whether
g
is defined using
seq
or not.

### 2.1 In the absence of seq

#### 2.1.1 foldr/build

If
g
does not use
seq
, then the
foldr
/
build
-rule really is a semantic equivalence, that is, it holds that
`foldr c n (build g) = g c n`

The two sides are semantically interchangeable.

#### 2.1.2 destroy/unfoldr

The
destroy
/
unfoldr
-rule, however, is not a semantic equivalence.

To see this, consider the following instance:

```g = \x y -> case x y of Just z -> 0
p = \x -> if x==0 then Just undefined else Nothing
e = 0```
These values have appropriate types for being used in the
destroy
/
unfoldr
-rule. But with them, that rule's left-hand side "evaluates" as follows:
```destroy g (unfoldr p e) = g listpsi (unfoldr p e)
= case listpsi (unfoldr p e) of Just z -> 0
= case listpsi (case p e of Nothing     -> []
Just (x,e') -> x:unfoldr p e') of Just z -> 0
= case listpsi (case Just undefined of Nothing     -> []
Just (x,e') -> x:unfoldr p e') of Just z -> 0
= undefined```

while its right-hand side "evaluates" as follows:

```g p e = case p e of Just z -> 0
= case Just undefined of Just z -> 0
= 0```
Thus, by applying the
destroy
/
unfoldr
-rule, a nonterminating (or otherwise failing) program can be transformed into a safely terminating one.

The obvious questions now are:

1. Can the converse also happen, that is, can a safely terminating program be transformed into a failing one?
2. Can a safely terminating program be transformed into another safely terminating one that gives a different value as result?

There is no formal proof yet, but strong evidence supporting the conjecture that the answer to both questions is "No!".

The conjecture goes that if
g
does not use
seq
, then the
destroy
/
unfoldr
-rule is a semantic approximation from left to right, that is, it holds that
`destroy g (unfoldr p e) <math>\sqsubseteq</math> g p e`

What is known is that semantic equivalence can be recovered here by putting moderate restrictions on p.

More precisely, if
g
does not use
seq
and
p
is a strict function that never returns
Just <math>\bot</math>
(where $\bot$ denotes any kind of failure or nontermination), then indeed:
`destroy g (unfoldr p e) = g p e`

### 2.2 In the presence of seq

This is the more interesting setting, given that in Haskell there is no way to restrict the use of
seq
, so in any given program we must be prepared for the possibility that the
g
appearing in the
foldr
/
build
- or the
destroy
/
unfoldr
-rule is defined using
seq
.

Unsurprisingly, it is also the setting in which more can go wrong than above.

#### 2.2.1 foldr/build

In the presence of
seq
, the
foldr
/
build
-rule is not anymore a semantic equivalence.

The instance

```g = seq
c = undefined
n = 0```
shows, via similar "evaluations" as above, that the right-hand side (
g c n
) can be strictly less defined than the right-hand side (
foldr c n (build g)
).

The converse cannot happen, because the following always holds:

`foldr c n (build g) <math>\sqsupseteq</math> g c n`

Moreover, semantic equivalence can again be recovered by putting restrictions on the involved functions.

More precisely, if
(c <math>\bot~\bot)\neq\bot</math>
and
n <math>\neq\bot</math>
, then even in the presence of
seq
:
`foldr c n (build g) = g c n`