PR Stanley prstanley at ntlworld.com
Tue May 6 17:34:44 EDT 2008

```After you grok induction over the naturals, you can start to think
about structural induction, which is usually what we use in
understand the other (structural induction actually made more sense to
me when I was learning, because I started as a programmer and then
became a mathematician, so I thought in terms of data structures).
Paul: I was hoping that understanding the classic mathematical
concept would help me appreciate the structural computer science)
variation better. I don't know what it is about induction that I'm
not seeing. It's so frustrating! Deduction in spite of the complexity
in some parts makes perfect sense. This, however, is a different beast!

So let's say you have a tree, and we want to count the number of
leaves in the tree.

data Tree = Leaf Int | Branch Tree Tree

countLeaves :: Tree -> Int
countLeaves (Leaf _) = 1
countLeaves (Branch l r) = countLeaves l + countLeaves r

We want to prove that countLeaves is correct.   So P(x) means
"countLeaves x == the number of leaves in x".
Paul: By 'correct' presumably you mean sound.

First we prove P(Leaf i), since leaves are the trees that have no
subtrees.  This is analogous to proving P(0) over the naturals.
Paul: I'd presume 'proof' here means applying the function to one
leaf to see if it returns 1. If I'm not mistaken this is establishing
the base case.

countLeaves (Leaf i) = 1, by definition of countLeaves.
"Leaf i"  has exactly one leaf, obviously.
So countLeaves (Leaf i) is correct.

Now to prove P(Branch l r), we get to assume that P holds for all of
its subtrees, namely we get to assume P(l) and P(r).
Paul: How did you arrive at this decision? Why can we assume that P
holds fr all its subtrees?

You can think of this as constructing an algorithm to prove P for
any tree, and we have
"already proved" it for l and r in our algorithm.
Paul: Is this the function definition for countLeaves?

This is analogous to proving P(n) => P(n+1) in the case of naturals.
Paul:I thought P(n) was the induction hypothesis and P(n+1) was the
proof that the formula/property holds for the subsequent element in
the sequence if P(n) is true. I don't see how countLeaves l and
countLeaves r are analogous to P(n) and P(n+1).

So:

Assume P(l) and P(r).
P(l) means "countLeaves l == the number of leaves in l"
P(r) means "countLeaves r == the number of leaves in r"
countLeaves (Branch l r) = countLeaves l + countLeaves r, by definition.
And that is the number of leaves in "Branch l r", the sum of the
number of leaves in its two subtress.
Therefore P(Branch l r).

Now that we have those two cases, we are done; P holds for any Tree whatsoever.

In general you will have to do one such proof for each case of your
data structure in order to prove a property for the whole thing.  At
each case you get to assume the property holds for all substructures.

Generally not so many steps are written down.  In fact in this
example, most people who do this kind of thing a lot would write
"straightforward induction", and that would be the whole proof :-)

The analogy between structural induction and induction over the
naturals is very strong; in fact induction over the naturals is just
induction over this data structure:

data Nat = Zero | Succ Nat

Hope this helps.

Luke

```