Deforestration harmful?

Shin-Cheng Mu scm@comlab.ox.ac.uk
Tue, 7 May 2002 20:51:49 +0100


Dear Simon and Olaf,

Thank you very much for your replies.

Monday, May 6, 2002, 02:37  PM, Olaf Chitil wrote:
> Shin-Cheng Mu wrote:
>> The intermediate datatype Tree1 was thus eliminated.
>> The problem was: the fused program ran slower than the
>> original one!
> Note that you have done more than just deforestation. You have moved f
> through join, obtaining join'. Because as you say (fg s2) is much =
longer
> than (g s2), join' is applied more often than join was applied before. =
I
> don't know how expensive your join and join' are.

I will be a bit more precise. Let f and g have type:

   g :: Seed -> [Tree]
   f :: Tree -> [Expr]

where Tree1 is a tip-valued binary tree, and Tree2 is also a
binary tree but the forks are arithmetic operators:

   data Tree =3D Tip Int | Bin Tree1 Tree1
   data Expr =3D Val Int | Add Tree2 Tree2 | Sub Tree2 Tree2 | Mul ....

The function g is defined like:

     g seed =3D [ z | (s1, s2) <- split seed,
                     x <- g s1, y <- g s2, z <- Bin x y ]

The function f converts a Tree to many Exprs by replacing
Bin with one of Add, Sub, Mul or Div:

     f (Tip i) =3D [Val i]
     f (Bin x y) =3D [op x' y' | x' <- f x, y' <- f y, op <- join x y]

The program before fusion looks like:

     concat . map f . g

while the fused program simply:

     fg seed =3D [ z | (s1, s2) <- split seed,
                      x <- fg s1, y <- fg s2, z <- join x y ]
     join x y =3D [op x y | op <- [Add, Sub, ... ]]

And the surprise was the fused program ran slower than the
naive one before fusion.

The time profiling showed that, in one run, the naive
program itself took 2.00 sec to finish, with 9% extra GC time. The
fused program itself, on the other hand, took only 1.84 sec but
spent an extra 67% of time on GC (the heap size is the default
64M).

Considering their heap profiles, both of them look like several
steep, slanted triangles standing next to each other. The highest peak
in the heap profile of the naive program reaches only 50K of memory,
while that for the fused program reaches as much as 1500K. In
the former case, the heap is mostly occupied by chunks created
by g. In the latter case, the heap is mostly occupied by join.

> Furthermore, deforestation itself does not reduce runtime very much.

Indeed. Looks like it helps to speed up the mutator a bit but
end up spending more time on GC, due to increased heap demand.

> More important is the secondary effect that deforstation moves =
formerly
> separated expressions close to each other, so that further =
optimisations
> can take place. Maybe in your example there are not many further
> optimisations?

I think so. As you can see there are not much optimisations
to be done, other than the ones done by hand. :)

Monday, May 6, 2002, 12:02  PM, Simon Marlow wrote=A1G
 > It is entirely plausible that as you increase the heap size the =
mutator
 > time increases: this is because you get fewer cache hits with a =
larger
 > heap.  GHC starts off with a small heap size so as to try to make the
 > most of the cache.  However, on most examples we've seen the benefits =
of
 > reducing garbage collections by increasing the heap size tend to
 > outweigh the benefits from staying in the cache.
 > Did you try heap profiling to discover what structure was filling up
 > the heap?

Thanks for the explanation. :)

More info. about the heap was given above. By the way, could there be
situations where increasing the heap size also increases the GC time?

sincerely,
Shin