writing a function to build a list of ints from user input

Galen Menzel galen@alumni.utexas.net
Tue, 6 May 2003 08:14:46 -0500


On Tue, May 06, 2003 at 07:22:20AM -0400, David Roundy wrote:
> On Tue, May 06, 2003 at 05:02:56AM -0500, Galen Menzel wrote:
> > 
> > makeList :: IO [Int]
> > makeList = do s <- getLine
> > 	      case read s of
> > 		0 -> return []
> > 		x -> do xs <- makeList
> > 			return (x:xs)
> > 
> > I changed the if to a case so that (read s) doesn't get called twice.
> 
> Does the compiler not optimize away identical calls? Or is it not allowed
> to, for some reason?

This is certainly possible, particularly with a referentially
transparent language.  There's a technique called memoization that
basically keeps a table of values that a function has already
computed.  When a memoized function is called on a value, a lookup is
performed to see if the function has already been called on this value
before.  If so, the value recorded in the table is returned.  If not,
the function is called and its returned value is stored in the table.

This is not the default way of handling functions in Haskell, I guess
to keep function-call overhead low.  However, GHC and hugs both
provide memo functions that do this for you.  Unfortunately, they're
not always useful.  From the GHC memo documentation:

    memo :: (a -> b) -> a -> b

    So, for example, memo f is a version of f that caches the results of
    previous calls.

    The searching is very fast, being based on pointer equality. One
    consequence of this is that the caching will only be effective if
    exactly the same argument is passed again to the memoised function.
    This means not just a copy of a previous argument, but the same
    instance. It's not useful to memoise integer functions using this
    interface, because integers are generally copied a lot and two
    instances of '27' are unlikely to refer to the same object.

Thus they don't work for the prototypical memoized function, fib:

memo-fib = memo fib

fib 0 = 0
fib 1 = 1
fib x = memo-fib (x-1) + memo-fib (x-2)

Comparing pointers is a reasonable thing to do, since it can be
inefficient and impractical to check parameter equality for functions
of more complex data structures.  But not being able to memoize
functions of integers seems kind of silly.

> I agree that the case looks nicer, and would do it in general, but have
> often wondered how hard the compilers try (or are allowed to try... as it
> might mess up space usage) to optimize away multiple identical calls.
>
> Basically, if I write:
> 
> thrice_length s = length s + length s + length s
> 
> (A very stupid example...)
> 
> Is my program really going to run length s three times, or will it just
> call it once (assuming I have full optimizations on)? I would have thought
> that one of the major advantages of a purely functional language would be
> that the compiler could safely eliminate common subexpressions, since it
> knows that they will always be identical.

As far as I know, this actually calls length s three times (although
this is a case where memo would work).

> On the other hand, if I write a function like:
> 
> long_list = [1..100000] ++ [1..100000] ++ [1..100000] ++ [1..100000]
> 
> And then run a
> 
>  putStr $ unlines $ map show long_list
>
> I would expect this to take very little memory, which would mean that the
> [1..100000] would need to be calculated four separate times rather than
> being calculated once and stored...

After I ran this my ghci process had an 82MB footprint.  I don't know
how many times [1..100000] gets calculated.  My guess is four, but
don't quote me on that.

galen