GHC lazy eval optimization bug

Sun Feb 3 14:37:42 EST 2008

The haskell program below demonstrates an optimization bug with the
GHC compiler.  I have tried this with ghc 6.8.1 on my MacBook Pro and
on a Linux i686 machine.  This email should build as a literate haskell
program.  Build the program as

ghc --make -O lazyevalbug.lhs

and run it with no arguments.  An IO action runs 4 times, printing out
how long it took each time.  It takes 1 to 2 seconds to run each time,
but it should only take 1 second or 2 the first time, and finish
almost instantaneously the rest of the times.  If you run the program
with "fast" as a cmdline arg, as in

   ./lazyevalbug fast

then it will behave how I'd expect it to: the first IO action takes a
couple of seconds tand the rest finish instantaneously.  If you build  
the
program without the -O switch, then it behaves the same with our
without the "fast" commandline arg.

DETAILS:

The bug involves lazy evalutation behaving in an unexpected way when
the program is compiled with the -O switch.  If I create an IO action
which refers to a variable which was defined outside of the IO action,
and then run that IO action multiple times, then in some circumstances
the variable value will be re-computed each time the IO action is run.
This seems to violate the behavior I expect from lazy evaluation.
I.e. if I have code like this:

let val = really_expensive_computation
in map (\x -> val + x) [1, 2, 3]

I would expect val to only be computed once.  And it is.  However if
instead of referring to val in a lambda function I refer to val from
an IO action, then in some cases it seems to be re-evaluated every
time the action is performed.  I.e.

let val = really_expensive_computation
in replicateM_ 3 (print val)

When compiled with -O, the above will recompute val 3 times.  However
adding a second reference to "val" seems to make the problem go away.

Another thing I noticed is, the problem seems to only occur if
really_expensive_computation involves the IO monad somehow, for
example if it's doing a computation based on some input read from
stdin.  If really_expensive_computation only involves data generated
within the program then the bug never occurs.

The program below demonstrates this problem with the doSlow and doFast
functions.  See the comment above these functions.  Also, in main, if
you comment out the getStdGen line and uncomment the mkStdGen line,
then the problem goes away.  This demonstrates the fact that if the
computation uses data from an IO action the problem occurs, otherwise
it doesn't.

 > import System (getArgs)
 > import qualified System.Time as ST
 > import Control.Monad (when, replicateM_)
 > import System.Random (getStdGen, mkStdGen, randoms)

Takes an IO action, runs it, and prints line saying how long it took
to run

 > timeIt :: IO a -> IO ()
 > timeIt action =
 >     do tic <- ST.getClockTime
 >        action
 >        toc <- ST.getClockTime
 >        putStrLn $ "Took " ++ (timeDiffToString (ST.diffClockTimes  
toc tic))
 >     -- System.Time.timeDiffToString doesn't handle fractions of a
 >     -- second correctly, so I implement my own version
 >     where timeDiffToString d =
 >               let secs = ST.tdMin d * 60 + ST.tdSec d
 >                   frac = (realToFrac $ abs $ ST.tdPicosec d) /  
(realToFrac $ 10^12)
 >               in show ((realToFrac secs) + frac) ++ " secs"

True if a number is prime.  Just a function meant to use up some
time

 > isPrime :: (Integral n) => n -> Bool
 > isPrime n =
 >     let lim = round (sqrt $ realToFrac n) in
 >     not $ any (\x -> n `rem` x == 0) [2..lim]

doSlow and doFast are exactly the same, except that doFast has an
extra "when" statement which references "primes".  The mere existence of
this extra reference to "primes" in doFast seems to cause the primes
list to only be generated once, so the first time action runs
takes around 1.6 seconds, but subsequent actions finish in a small
fraction of a second.  In doSlow, the action takes 1.6 seconds each
time it runs, not just the first time, indicating that primes is
getting re-created each time the action is run.

 > doSlow, doFast :: (Integral a) => [a] -> IO ()

 > doSlow lst = let primes = filter isPrime lst
 >                  action = timeIt $ print (length primes)
 >              in do replicateM_ 4 action

 > doFast lst = let primes = filter isPrime lst
 >                  action = timeIt $ print (length primes)
 >              in do replicateM_ 4 action
 >                    when (null primes) (print "No primes")

In main, comment out the getStdGen line and uncomment the mkStdGen
line and the problem goes away as well. The same is true if I read
data from stdin vs generating a list in memory.

 > main = do args <- getArgs
 >           let doFunc = if (length args) > 0 && args!!0 == "fast"
 >                        then doFast
 >                        else doSlow
 >           --gen <- return $ mkStdGen 42
 >           gen <- getStdGen
 >           let lst = take 10000 $ randoms gen :: [Integer]
 >           doFunc lst