[commit: ghc] master: Re-do the "function application discount" (fixes Trac #6048) (980372f)

Simon Peyton Jones simonpj at microsoft.com
Wed May 9 18:56:23 CEST 2012

Repository : ssh://darcs.haskell.org//srv/darcs/ghc

On branch  : master

>---------------------------------------------------------------

commit 980372f357667c1ba63b28acbf5798826890b7a5
Author: Simon Peyton Jones <simonpj at microsoft.com>
Date:   Wed May 9 16:22:49 2012 +0100

Re-do the "function application discount" (fixes Trac #6048)

* Undoes Max's very aggressive function-inlining change

* Resticts function application discount to functions
that occur just once in the body. It was the multiple
occurrences that led to the exponential behavour in
Trac #6048.

See Note [Function application discount] in CoreUnfold.

Module binary sizes are down 2% on average, which is good.
Allocations wobble about a bit, but only on a few benchmarks
and not by much, so it seems a price worth paying to avoid
exponential behaviour!

Allocs
Min           -1.2%
Max           +2.8%
Geometric Mean           +0.0%

>---------------------------------------------------------------

compiler/coreSyn/CoreUnfold.lhs |  120 +++++++++++----------------------------
1 files changed, 34 insertions(+), 86 deletions(-)

diff --git a/compiler/coreSyn/CoreUnfold.lhs b/compiler/coreSyn/CoreUnfold.lhs
index ce729b4..4ab1bec 100644
--- a/compiler/coreSyn/CoreUnfold.lhs
+++ b/compiler/coreSyn/CoreUnfold.lhs
@@ -163,7 +163,7 @@ mkUnfolding src top_lvl is_bottoming expr
, not (exprIsTrivial expr)
= NoUnfolding    -- See Note [Do not inline top-level bottoming functions]
| otherwise
-  = CoreUnfolding { uf_tmpl   	    = occurAnalyseExpr expr,
+  = CoreUnfolding { uf_tmpl   	    = occ_anald_expr,
uf_src          = src,
uf_arity        = arity,
uf_is_top 	    = top_lvl,
@@ -173,7 +173,8 @@ mkUnfolding src top_lvl is_bottoming expr
uf_is_work_free = exprIsWorkFree   expr,
uf_guidance     = guidance }
where
-    (arity, guidance) = calcUnfoldingGuidance expr
+    occ_anald_expr    = occurAnalyseExpr expr
+    (arity, guidance) = calcUnfoldingGuidance occ_anald_expr
-- Sometimes during simplification, there's a large let-bound thing
-- which has been substituted, and so is now dead; so 'expr' contains
-- two copies of the thing while the occurrence-analysed expression doesn't
@@ -501,80 +502,6 @@ sizeExpr bOMB_OUT_SIZE top_args expr
d2  -- Ignore d1
\end{code}

-Note [Function application discount]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-I noticed that the output of the supercompiler generates a lot of code
-with this form:
-
-"""
-module Inlining where
-
-h1 k = k undefined undefined undefined
-        undefined undefined undefined
-        undefined undefined undefined
-        undefined undefined undefined
-        undefined undefined undefined
-        undefined undefined undefined
-
-a = h1 (\x _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ -> x)
-b = h1 (\_ x _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ -> x)
-c = h1 (\_ _ x _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ -> x)
-d = h1 (\_ _ _ x _ _ _ _ _ _ _ _ _ _ _ _ _ _ -> x)
-e = h1 (\_ _ _ _ x _ _ _ _ _ _ _ _ _ _ _ _ _ -> x)
-f = h1 (\_ _ _ _ _ x _ _ _ _ _ _ _ _ _ _ _ _ -> x)
-g = h1 (\_ _ _ _ _ _ x _ _ _ _ _ _ _ _ _ _ _ -> x)
-h = h1 (\_ _ _ _ _ _ _ x _ _ _ _ _ _ _ _ _ _ -> x)
-i = h1 (\_ _ _ _ _ _ _ _ x _ _ _ _ _ _ _ _ _ -> x)
-j = h1 (\_ _ _ _ _ _ _ _ _ x _ _ _ _ _ _ _ _ -> x)
-k = h1 (\_ _ _ _ _ _ _ _ _ _ x _ _ _ _ _ _ _ -> x)
-l = h1 (\_ _ _ _ _ _ _ _ _ _ _ x _ _ _ _ _ _ -> x)
-m = h1 (\_ _ _ _ _ _ _ _ _ _ _ _ x _ _ _ _ _ -> x)
-n = h1 (\_ _ _ _ _ _ _ _ _ _ _ _ _ x _ _ _ _ -> x)
-o = h1 (\_ _ _ _ _ _ _ _ _ _ _ _ _ _ x _ _ _ -> x)
-p = h1 (\_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ x _ _ -> x)
-q = h1 (\_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ x _ -> x)
-r = h1 (\_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ x -> x)
-"""
-
-With GHC head the applications of h1 are not inlined, which hurts the
-quality of the generated code a bit. I was wondering why h1 wasn't
-getting inlined into each of "a" to "i" - after all, it has a manifest
-lambda argument.
-
-It turns out that the code in CoreUnfold gives a fixed discount of
-opt_UF_FunAppDiscount to a function argument such as "k" if it applied
-to any arguments. This is enough to ensure that h1 is inlined if the number
-of arguments applied to k is below a certain limit, but if many arguments are
-applied to k then the fixed discount can't overcome the size of the
-chain of apps, and h1 is never inlined.
-
-My proposed solution is to change CoreUnfold.funSize so that longer
-chains of arguments being applied to a lambda-bound function give a
-bigger discount. The motivation for this is that we would *generally*
-expect that the lambda at the callsite has enough lambdas such that
-all of the applications within the body can be beta-reduced away. This
-change might lead to over eager inlining in cases like this, though:
-
-{{{
-h1 k = k x y z
-
-{-# NOINLINE g #-}
-g = ...
-
-main = ... h1 (\x -> g x) ...
-}}}
-
-In this case we aren't able to beta-reduce away all of the
-applications in the body of h1 because the lambda at the call site
-only binds 1 argument, not the 3 allowed by the type. I don't expect
-this case to be particularly common, however.
-
-I chose the bonus to be (size - 20) so that application to 1 arg got
-same bonus as the old fixed bonus (i.e. opt_UF_FunAppDiscount, which is 60).
-If you have the bonus being (size - 40) then $fMonad[]_$c>>= with interesting
-2nd arg doesn't inline in cryptarithm2 so we lose some deforestation, and
-overall binary size hardly falls.

\begin{code}
-- | Finds a nominal size of a string literal.
@@ -615,23 +542,29 @@ funSize top_args fun n_val_args
where
some_val_args = n_val_args > 0

-        -- See Note [Function application discount]
-    arg_discount | some_val_args && fun elem top_args
-    		 = unitBag (fun, opt_UF_FunAppDiscount + (size - 20))
+    size | some_val_args = 10 * (1 + n_val_args)
+         | otherwise     = 0
+	-- The 1+ is for the function itself
+	-- Add 1 for each non-trivial arg;
+	-- the allocation cost, as in let(rec)
+
+        --                  DISCOUNTS
+        --  See Note [Function application discounts]
+    arg_discount | some_val_args && one_call fun top_args
+    		 = unitBag (fun, opt_UF_FunAppDiscount)
| otherwise = emptyBag
-- If the function is an argument and is applied
-- to some values, give it an arg-discount

-        -- See Note [Function application discount]
-    res_discount | idArity fun > n_val_args = opt_UF_FunAppDiscount + (size - 20)
+    res_discount | idArity fun > n_val_args = opt_UF_FunAppDiscount
| otherwise   	 	    = 0
-- If the function is partially applied, show a result discount
-    size | some_val_args = 10 * (1 + n_val_args)
-         | otherwise     = 0
-	-- The 1+ is for the function itself
-	-- Add 1 for each non-trivial arg;
-	-- the allocation cost, as in let(rec)

+    one_call _   []                     = False
+    one_call fun (arg:args) | fun==arg  = case idOccInfo arg of
+                                           OneOcc _ one_branch _ -> one_branch
+                                           _                     -> False
+                            | otherwise = one_call fun args

conSize :: DataCon -> Int -> ExprSize
conSize dc n_val_args
@@ -648,6 +581,21 @@ conSize dc n_val_args
-- [SDM, 25/5/11]
\end{code}

+Note [Function application discount]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We want a discount if the function is applied. A good example is
+monadic combinators with continuation arguments, where inlining is
+quite important.
+
+But we don't want a big discount when a function is called many times
+(see the detailed comments with Trac #6048) because if the function is
+big it won't be inlined at its many call sites and no benefit results.
+Indeed, we can get exponentially big inlinings this way; that is what
+
+So, we only give a function-application discount when the function appears
+textually once, albeit possibly inside a lambda.
+
Note [Literal integer size]
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Literal integers *can* be big (mkInteger [...coefficients...]), but