# Infinity and efficiency

### From HaskellWiki

(link to non-strict semantics) |
m (minor edit) |
||

(2 intermediate revisions by 2 users not shown) | |||

Line 1: | Line 1: | ||

[[Category:Idioms]] |
[[Category:Idioms]] |
||

− | In this article we want to demonstrate how to check efficiency of an implementation by checking for proper results for infinite input. |
+ | In this article we demonstrate how to check the efficiency of an implementation by checking for proper results for infinite input. |

− | In general it is harder to reason about time and memory complexity of an implementation than on its correctness. |
+ | In general, it is harder to reason about time and memory complexity of an implementation than about its correctness. |

− | In fact in Haskell inefficient implementations sometimes turn out to be wrong implementations. |
+ | In fact, in Haskell inefficient implementations sometimes turn out to be wrong implementations. |

− | A very simple examples is the function <code>reverse . reverse</code> which seems to be an inefficient implementation of <code>id</code>. |
+ | |

+ | A very simple example is the function <code>reverse . reverse</code> which seems to be an inefficient implementation of <code>id</code>. |
||

In a language with [[strict semantics]], these two functions are the same. |
In a language with [[strict semantics]], these two functions are the same. |
||

− | But since the [[non-strict semantics]] of Haskell allows infinite data structures, there is subtle difference, |
+ | But since the [[non-strict semantics]] of Haskell allows infinite data structures, there is a subtle difference, |

− | because for infinite input list, the function <code>reverse . reverse</code> is undefined, whereas <code>id</code> is defined. |
+ | because for an infinite input list, the function <code>reverse . reverse</code> is undefined, whereas <code>id</code> is defined. |

− | Now lets consider a more complicated example. |
+ | Now let's consider a more complicated example. |

− | Say we want to program a function, that removes elements from the end of a list, |
+ | Say we want to program a function that removes elements from the end of a list, |

just like <code>dropWhile</code> removes elements from the beginning of a list. |
just like <code>dropWhile</code> removes elements from the beginning of a list. |
||

We want to call it <code>dropWhileRev</code>. |
We want to call it <code>dropWhileRev</code>. |
||

− | (As a more concrete example imagine a function which removes trailing spaces.) |
+ | (As a more concrete example, imagine a function which removes trailing spaces.) |

A simple implementation is |
A simple implementation is |
||

<haskell> |
<haskell> |
||

Line 19: | Line 19: | ||

dropWhileRev p = reverse . dropWhile p . reverse |
dropWhileRev p = reverse . dropWhile p . reverse |
||

</haskell> |
</haskell> |
||

− | You will already guess, that it also does not work for infinite input lists. |
+ | You probably have already guessed that it also does not work for infinite input lists. |

− | And actually it is also inefficient, because <code>reverse</code> requires to compute all list nodes |
+ | Incidentally, it is also inefficient, because <code>reverse</code> requires to compute all list nodes |

− | (but it does not require to compute the values stored in the nodes). |
+ | (although it does not require to compute the values stored in the nodes). |

− | Thus the full list skeleton must be hold in memory. |
+ | Thus the full list skeleton must be held in memory. |

− | However it is possible to implement <code>dropWhileRev</code> in a way that works for more kinds of input. |
+ | However, it is possible to implement <code>dropWhileRev</code> in a way that works for more kinds of inputs. |

<haskell> |
<haskell> |
||

dropWhileRev :: (a -> Bool) -> [a] -> [a] |
dropWhileRev :: (a -> Bool) -> [a] -> [a] |
||

Line 30: | Line 30: | ||

</haskell> |
</haskell> |
||

Here, <code>foldr</code> formally inspects the list from right to left, |
Here, <code>foldr</code> formally inspects the list from right to left, |
||

− | but actually it processes data from left to right. |
+ | but it actually processes data from left to right. |

− | Whenever a run of elements occurs, which matches the condition <code>p</code>, |
+ | Whenever a run of elements that matches the condition <code>p</code> occurs, these elements are held until the end of the list is encountered (then they are dropped), |

− | then these elements are hold until the end of the list is encountered (then they are dropped), |
+ | or when a non-matching list element is found (then they are emitted). |

− | or a non-matching list element is found (then they are emitted). |
||

The crux is the part <code>null xs</code>, which requires to do recursive calls within <code>foldr</code>. |
The crux is the part <code>null xs</code>, which requires to do recursive calls within <code>foldr</code>. |
||

− | This works in many cases, but fails if the number of matching elements becomes too large. |
+ | This works in many cases, but it fails if the number of matching elements becomes too large. |

The maximum memory consumption depends on the length of the runs of non-matching elements, |
The maximum memory consumption depends on the length of the runs of non-matching elements, |
||

which is much more efficient than the naive implementation above. |
which is much more efficient than the naive implementation above. |

## Latest revision as of 18:41, 25 April 2009

In this article we demonstrate how to check the efficiency of an implementation by checking for proper results for infinite input.
In general, it is harder to reason about time and memory complexity of an implementation than about its correctness.
In fact, in Haskell inefficient implementations sometimes turn out to be wrong implementations.

A very simple example is the function `reverse . reverse`

which seems to be an inefficient implementation of `id`

.
In a language with strict semantics, these two functions are the same.
But since the non-strict semantics of Haskell allows infinite data structures, there is a subtle difference,
because for an infinite input list, the function `reverse . reverse`

is undefined, whereas `id`

is defined.

Now let's consider a more complicated example.
Say we want to program a function that removes elements from the end of a list,
just like `dropWhile`

removes elements from the beginning of a list.
We want to call it `dropWhileRev`

.
(As a more concrete example, imagine a function which removes trailing spaces.)
A simple implementation is

dropWhileRev :: (a -> Bool) -> [a] -> [a] dropWhileRev p = reverse . dropWhile p . reverse

You probably have already guessed that it also does not work for infinite input lists.
Incidentally, it is also inefficient, because `reverse`

requires to compute all list nodes
(although it does not require to compute the values stored in the nodes).
Thus the full list skeleton must be held in memory.
However, it is possible to implement `dropWhileRev`

in a way that works for more kinds of inputs.

dropWhileRev :: (a -> Bool) -> [a] -> [a] dropWhileRev p = foldr (\x xs -> if p x && null xs then [] else x:xs) []

Here, `foldr`

formally inspects the list from right to left,
but it actually processes data from left to right.
Whenever a run of elements that matches the condition `p`

occurs, these elements are held until the end of the list is encountered (then they are dropped),
or when a non-matching list element is found (then they are emitted).
The crux is the part `null xs`

, which requires to do recursive calls within `foldr`

.
This works in many cases, but it fails if the number of matching elements becomes too large.
The maximum memory consumption depends on the length of the runs of non-matching elements,
which is much more efficient than the naive implementation above.