[Haskell-cafe] How can we detect and fix memory leaks due to laziness?

Ahn, Ki Yung kyagrd at gmail.com
Mon Aug 7 20:39:26 EDT 2006


Recently, I've been facing the dark side of laziness
-- memory leaks caused by lazy evaluation.

The typical pattern in which I run into the problem is this.

My code was working fine and I was happy.
I just wanted to inspect some properties of my code, so
I made a slight change to it, such as adding a counter
argument or attaching an auxiliary data field to the original data
to trace how the data had been constructed.
All of a sudden the program runs out of memory or overflows
the stack.

One problem is that it comes up unexpectedly. An even
worse problem is that sometimes I have no idea where exactly
the leak is coming from!

It is really frightening to face this dark side of lazy evaluation.
Just a small, innocent-looking change for inspection or tracing
blows things up, sometimes with no clue as to why.

When we implement a debugging or tracing option in a piece of
software and let the user toggle those features, how can
we be sure that turning on those features won't crash the
software written in Haskell?

Are there standardized approaches for detecting and fixing
these kinds of problems?

Haskell may be type safe, but it is not at all safe from unexpected
divergence, which comes not from the programmer's mistakes
but simply from laziness.


I have posted a wiki article including one example of adding
a counter to count the number of basic operations in a sorting algorithm.

http://www.haskell.org/haskellwiki/Physical_equality

This was a rather simple situation, and we figured out how to
cure it with a self-equality check ( x==x ) that forces evaluation.
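
To make the pattern concrete, here is a boiled-down sketch of the kind of
counter I mean (it is not the code from the wiki article; the names
countLazy and countForced are made up just for this sketch):

countLazy :: [a] -> Int
countLazy = go 0
  where
    -- n + 1 is never forced here, so a long chain of (+1) thunks
    -- piles up, and collapsing it at the very end can blow the stack
    go n []     = n
    go n (_:xs) = go (n + 1) xs

countForced :: [a] -> Int
countForced = go 0
  where
    -- forcing the counter at each step keeps it evaluated as we go;
    -- a self-equality guard (n == n) does the same job as seq here
    go n []     = n
    go n (_:xs) = n `seq` go (n + 1) xs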



There are worse cases where I cannot figure out the cure.
I wrote a function for analyzing a certain property of a graph,
which worked fine.

fixOnBy t p f x = if t x' `p` t x then x else fixOnBy t p f x' where x' = f x

fixSize f x = fixOnBy Set.size (==) f x

sctAnal gs = null cgs || all (not . null) dcs
 where
   gs' = fixSize compose $ Set.fromList [(x,y,cs) | To _ x y cs<-Set.toList gs]
   cgs = [z | z@(x,y,cs)<-Set.toList gs', x==y]
   dcs = [ [c| c@(a,D,b)<-Set.toList cs , a==b] | (_,_,cs)<-cgs]
   compose gs = trace ("## "++show (Set.size gs)) $ foldr Set.insert gs $ do
     (x1,y1,cs1) <- Set.toList gs
     (_,y2,cs2)  <- takeWhileFst y1 $ Set.toList $ setGT (y1, Al "" (-1), Set.empty) gs
     return (x1,y2,cs1 `comp` cs2)
   takeWhileFst y = takeWhile (\(y',_,_) -> y==y')

This function computes the transitive closure of the given set of relations
by fixpoint iteration on the size of the set of weighted edges.
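
For comparison, here is a self-contained sketch of the same fixpoint idea
on plain (from,to) pairs, without the weighted edges and without the setGT
trick; the names closure and step are made up just for this sketch.

import qualified Data.Set as Set

fixOnBy t p f x = if t x' `p` t x then x else fixOnBy t p f x' where x' = f x
fixSize f x = fixOnBy Set.size (==) f x

-- iterate one composition step until the set stops growing
closure :: Ord a => Set.Set (a,a) -> Set.Set (a,a)
closure = fixSize step
  where
    -- add (x,z) whenever (x,y) and (y,z) are already present
    step es = foldr Set.insert es
                [ (x,z) | (x,y)  <- Set.toList es
                        , (y',z) <- Set.toList es
                        , y == y' ]

For example, closure (Set.fromList [(1,2),(2,3)]) gives
fromList [(1,2),(1,3),(2,3)].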

Sample output looks like this.

*Main> main
## 170
## 400
## 1167
## 2249
## 2314
False


When I add an extra data field for tracing how each new relation was
constructed (e.g. the tag [a,b,c] on a->c if it came from a->b and b->c),
it suddenly overflows the stack even before printing out the trace.
The following is the code that leaks memory.

sctAnal gs = null cgs || all (not . null) dcs
 where
   gs' = fixSize compose $ Set.fromList [TT (x,y,cs) [] | To _ x y cs <- Set.toList gs]
   cgs = [z | z@(TT (x,y,cs) _)<-Set.toList gs', x==y]
   dcs = [[c| c@(a,D,b)<-Set.toList cs , a==b] | TT (_,_,cs) _<-cgs]
   compose gs = trace ("## "++show (Set.size gs)) $ foldr checkInsert gs $ do
     TT (x1,y1,cs1) l1 <- Set.toList gs
     TT (_,y2,cs2) l2 <- takeWhileTTfrom y1 . Set.toList $ setGT (TT (y1, Al "" (-1), Set.empty) []) gs
     return $ TT (x1,y2,cs1 `comp` cs2) (l1++y1:l2)
   takeWhileTTfrom y = takeWhile (\(TT (y',_,_) _) -> y==y')
   checkInsert x s
                   | Set.member x s = s
                   | otherwise      = Set.insert x s

data TT a b = TT a b deriving (Show)
-- the lx==lx and ly==ly self-comparisons are intentional: they are meant
-- to force the tags, in the spirit of the x==x trick mentioned above
instance (Eq a, Eq b) => Eq (TT a b) where
  (TT x lx) == (TT y ly) = lx==lx && ly==ly && x == y
instance (Ord a, Ord b) => Ord (TT a b) where
  (TT x lx) < (TT y ly) = lx==lx && ly==ly && x < y


The really interesting thing happens when I simply use the derived Ord:
the stack does not overflow and the trace starts printing out.
(It is not the result that I want, though. My intention is to ignore the
tags in the set operations.)

data TT a b = TT a b deriving (Show,Eq,Ord)

I believe the Eq and Ord instances I defined above are even
stricter than the derived ones. Is there some magic in "deriving"
that prevents the memory leak?

I've even written instance declarations like the following,
which should behave the same as the derived ones, but it still leaks memory.

data TT a b = TT a b deriving (Show)
instance (Eq a, Eq b) => Eq (TT a b) where
  (TT x lx) == (TT y ly) = x == y && lx == ly
instance (Ord a, Ord b) => Ord (TT a b) where
  (TT x lx) < (TT y ly) = x < y || x == y && lx < ly
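
For reference, something like the sketch below is closer to what I actually
want: equality and ordering that ignore the tag entirely, defining compare
directly (Data.Set compares elements through compare). I have not convinced
myself whether this avoids the leak, so take it only as a sketch of the intent.

data TT a b = TT a b deriving (Show)

-- look only at the payload, never at the tag
instance Eq a => Eq (TT a b) where
  (TT x _) == (TT y _) = x == y

instance Ord a => Ord (TT a b) where
  compare (TT x _) (TT y _) = compare x y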


This is really making me panic.

--
Ahn, Ki Yung

