patch applied (ghc): The Big INLINE Patch: totally reorganise way that INLINE pragmas work

Simon Peyton Jones simonpj at microsoft.com
Thu Oct 29 11:58:48 EDT 2009


Thu Oct 29 07:30:51 PDT 2009  simonpj at microsoft.com
  * The Big INLINE Patch: totally reorganise way that INLINE pragmas work
  Ignore-this: b85cb41c6dc703597e137f438a2636d8
  
  This patch has been a long time in gestation and has, as a
  result, accumulated some extra bits and bobs that are only
  loosely related.  I separated the bits that are easy to split
  off, but the rest comes as one big patch, I'm afraid.
  
  Note that:
   * It comes together with a patch to the 'base' library
   * Interface file formats change slightly, so you need to
     recompile all libraries
  
  The patch is mainly a giant tidy-up, driven in part by the
  particular stresses of the Data Parallel Haskell project. I don't
  expect a big performance win for random programs.  Still, here are the
  nofib results, relative to the state of affairs without the patch:
  
          Program           Size    Allocs   Runtime   Elapsed
  --------------------------------------------------------------------------------
              Min         -12.7%    -14.5%    -17.5%    -17.8%
              Max          +4.7%    +10.9%     +9.1%     +8.4%
   Geometric Mean          +0.9%     -0.1%     -5.6%     -7.3%
  
  The +10.9% allocation outlier is rewrite, which happens to have a
  very delicate optimisation opportunity involving an interaction
  of CSE and inlining (see nofib/Simon-nofib-notes). The fact that
  the 'before' case found the optimisation is somewhat accidental.
  Runtimes seem to go down, but I never know whether to really trust
  this number.  Binary sizes wobble a bit, but nothing drastic.
  
  
  The Main Ideas are as follows.
  
  InlineRules
  ~~~~~~~~~~~
  When you say 
        {-# INLINE f #-}
        f x = <rhs>
  you intend that calls (f e) are replaced by <rhs>[e/x].  So we
  should capture (\x.<rhs>) in the Unfolding of 'f', and never meddle
  with it.  Meanwhile, we can optimise <rhs> to our heart's content,
  leaving the original unfolding intact in the Unfolding of 'f'.
  
  So the representation of an Unfolding has changed quite a bit
  (see CoreSyn).  An INLINE pragma gives rise to an InlineRule 
  unfolding.  
  
  Moreover, it's only used when 'f' is applied to the
  specified number of arguments; that is, the number of arguments on 
  the LHS of the '=' sign in the original source definition. 
  For example, (.) is now defined in the libraries like this
     {-# INLINE (.) #-}
     (.) f g = \x -> f (g x)
  so that it'll inline when applied to two arguments. If 'x' appeared
  on the left, thus
     (.) f g x = f (g x)
  it'd only inline when applied to three arguments.  This slightly-experimental
  change was requested by Roman, but it seems to make sense.
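
  As a minimal sketch of the arity point above, the two spellings can be
  written side by side under hypothetical names (compose2 and compose3 are
  not library functions).  Both denote the same function; the only intended
  difference is how many arguments a call must supply before GHC considers
  the INLINE unfolding.  Whether inlining actually happens depends on the
  GHC version and optimisation flags.

```haskell
module Main (main) where

-- Hypothetical names mirroring the two spellings of (.) discussed above.

-- Two arguments on the LHS: the unfolding is eligible as soon as the
-- call supplies f and g, e.g. in (map (compose2 f g) xs).
compose2 :: (b -> c) -> (a -> b) -> a -> c
compose2 f g = \x -> f (g x)
{-# INLINE compose2 #-}

-- Three arguments on the LHS: the unfolding is eligible only when the
-- call supplies f, g and x, so the partial application below is not
-- a candidate.
compose3 :: (b -> c) -> (a -> b) -> a -> c
compose3 f g x = f (g x)
{-# INLINE compose3 #-}

main :: IO ()
main = do
  -- Both calls are partial applications with two arguments.
  let viaTwo   = map (compose2 (+ 1) (* 2)) [1, 2, 3 :: Int]
      viaThree = map (compose3 (+ 1) (* 2)) [1, 2, 3 :: Int]
  print viaTwo    -- [3,5,7]
  print viaThree  -- [3,5,7]
```

  The results are identical by construction; to observe the difference one
  would have to inspect the optimised Core (for instance with -ddump-simpl).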
  
  Other associated changes
  
  * Moving the deck chairs in DsBinds, which processes the INLINE pragmas
  
  * In the old system an INLINE pragma made the RHS look like
     (Note InlineMe <rhs>)
    The Note switched off optimisation in <rhs>.  But it was quite
    fragile in corner cases. The new system is more robust, I believe.
    In any case, the InlineMe note has disappeared.
  
  * The workerInfo of an Id has also been combined into its Unfolding,
    so it's no longer a separate field of the IdInfo.
  
  * Many changes in CoreUnfold, esp in callSiteInline, which is the critical
    function that decides which function to inline.  Lots of comments added!
  
  * exprIsConApp_maybe has moved to CoreUnfold, since it's so strongly
    associated with "does this expression unfold to a constructor application".
    It can now do some limited beta reduction too, which Roman found 
    was important.
  
  Instance declarations
  ~~~~~~~~~~~~~~~~~~~~~
  It's always been tricky to get the dfuns generated from instance
  declarations to work out well.  This is particularly important in 
  the Data Parallel Haskell project, and I'm now on my fourth attempt,
  more or less.
  
  There is a detailed description in TcInstDcls, particularly in
  Note [How instance declarations are translated].   Roughly speaking
  we now generate a top-level helper function for every method definition
  in an instance declaration, so that the dfun takes a particularly
  stylised form:
    dfun a d1 d2 = MkD (op1 a d1 d2) (op2 a d1 d2) ...etc...
  
  In fact, it's *so* stylised that we never need to unfold a dfun.
  Instead ClassOps have a special rewrite rule that allows us to
  short-cut dictionary selection.  Suppose dfun :: Ord a -> Ord [a]
                                              d :: Ord a
  Then   
      compare (dfun a d)  -->   compare_list a d 
  in one rewrite, without first inlining the 'compare' selector
  and the body of the dfun.
  
  To support this
  a) ClassOps have a BuiltInRule (see MkId.dictSelRule)
  b) DFuns have a special form of unfolding (CoreSyn.DFunUnfolding)
     which is exploited in CoreUnfold.exprIsConApp_maybe
  
  Implementing all this required a root-and-branch rework of TcInstDcls
  and bits of TcClassDcl.
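
  The stylised dfun form can be modelled in source Haskell with an explicit
  dictionary record.  This is only an illustration: OrdD, compareList,
  ltList, dfunOrdList and ordInt are hypothetical names, the type argument
  'a' is left implicit, and GHC's real dictionaries and the ClassOp rewrite
  rule live in Core, not in user code.  The point of the sketch is that
  selecting a method from (dfun d) gives the same answer as calling the
  top-level helper directly, which is the equation the rule exploits.

```haskell
module Main (main) where

-- Hand-written stand-in for the dictionary of a two-method Ord-like class.
data OrdD a = MkOrdD
  { compareD :: a -> a -> Ordering
  , ltD      :: a -> a -> Bool
  }

-- One top-level helper per method of the list instance.
compareList :: OrdD a -> [a] -> [a] -> Ordering
compareList _ []       []       = EQ
compareList _ []       (_ : _)  = LT
compareList _ (_ : _)  []       = GT
compareList d (x : xs) (y : ys) =
  case compareD d x y of
    EQ    -> compareList d xs ys
    other -> other

ltList :: OrdD a -> [a] -> [a] -> Bool
ltList d xs ys = compareList d xs ys == LT

-- The stylised dfun: a constructor applied to the helpers, nothing more.
dfunOrdList :: OrdD a -> OrdD [a]
dfunOrdList d = MkOrdD (compareList d) (ltList d)

-- Dictionary for Int, built from the Prelude's Ord instance.
ordInt :: OrdD Int
ordInt = MkOrdD compare (<)

main :: IO ()
main = do
  -- Selecting from the dfun versus calling the helper directly.
  let viaSelector = compareD (dfunOrdList ordInt) [1, 2] [1, 3]
      viaHelper   = compareList ordInt [1, 2] [1, 3]
  print (viaSelector, viaHelper)             -- (LT,LT)
  print (ltD (dfunOrdList ordInt) [2] [1])   -- False
```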
  
  
  Default methods
  ~~~~~~~~~~~~~~~
  If you give an INLINE pragma to a default method, it should be just
  as if you'd written out that code in each instance declaration, including
  the INLINE pragma.  I think that it now *is* so.  As a result, library
  code can be simpler; less duplication.
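
  A small sketch of what this means for library code, using a hypothetical
  class (Shape, Square and Circle are invented for illustration).  The
  INLINE pragma is written once on the default method; the Square instance
  inherits the default and, per the description above, should behave as if
  the method and its pragma had been written out in the instance.  The
  Circle instance shows the duplication that is otherwise needed.

```haskell
module Main (main) where

class Shape a where
  area :: a -> Double

  -- Default method carrying an INLINE pragma.
  describe :: a -> String
  describe x = "area = " ++ show (area x)
  {-# INLINE describe #-}

newtype Square = Square Double
newtype Circle = Circle Double

-- Relies on the default 'describe' and its pragma.
instance Shape Square where
  area (Square s) = s * s

-- Spells the method and the pragma out by hand.
instance Shape Circle where
  area (Circle r) = pi * r * r
  describe c = "circle, area = " ++ show (area c)
  {-# INLINE describe #-}

main :: IO ()
main = do
  putStrLn (describe (Square 2))  -- area = 4.0
  putStrLn (describe (Circle 1))
```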
  
  
  The CONLIKE pragma
  ~~~~~~~~~~~~~~~~~~
  In the DPH project, Roman found cases where he had
  
     p n k = let x = replicate n k
             in ...(f x)...(g x)....
  
     {-# RULES "f/replicate" forall n k. f (replicate n k) = f_rep n k #-}
  
  Normally the RULE would not fire, because doing so involves 
  (in effect) duplicating the redex (replicate n k).  A new
  experimental modifier to the INLINE pragma, {-# INLINE CONLIKE
  replicate #-}, allows you to tell GHC to be prepared to duplicate
  a call of this function if it allows a RULE to fire.
  
  See Note [CONLIKE pragma] in BasicTypes
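
  The situation can be reproduced in a self-contained module along the
  following lines.  All names here (replicateL, sumL, lengthL and the two
  rules) are hypothetical stand-ins for Roman's 'replicate', 'f' and 'g',
  and the phase annotation [1] and the NOINLINE pragmas are assumptions
  made so that the rules have a chance to match before anything is
  inlined.  The program prints the same result whether or not the rules
  fire (they only fire with optimisation on), so it demonstrates the
  shape of the code and not the optimisation itself.

```haskell
module Main (main) where

-- Stand-in for 'replicate'.  CONLIKE tells GHC it may duplicate a call
-- to this function if doing so lets a RULE fire.
replicateL :: Int -> a -> [a]
replicateL n k = take n (repeat k)
{-# INLINE CONLIKE [1] replicateL #-}

-- Stand-ins for 'f' and 'g'; NOINLINE keeps them visible to the rules.
sumL :: [Int] -> Int
sumL = foldr (+) 0
{-# NOINLINE sumL #-}

lengthL :: [a] -> Int
lengthL = foldr (\_ acc -> acc + 1) 0
{-# NOINLINE lengthL #-}

-- 'max 0 n' keeps the right-hand sides equal to the left-hand sides
-- for negative n, where 'take' yields the empty list.
{-# RULES
"sumL/replicateL"    forall n k. sumL (replicateL n k)    = max 0 n * k
"lengthL/replicateL" forall n k. lengthL (replicateL n k) = max 0 n
  #-}

-- Same shape as 'p' above: x is shared by two consumers, so matching
-- either rule means duplicating the call (replicateL n k).
p :: Int -> Int -> Int
p n k =
  let x = replicateL n k
  in sumL x + lengthL x

main :: IO ()
main = print (p 4 10)  -- 44
```

  Compiling with -O and -ddump-rule-firings is one way to check whether
  the rules fired in a given GHC version.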
  
  
  Join points
  ~~~~~~~~~~~
  See Note [Case binders and join points] in Simplify
  
  
  Other refactoring
  ~~~~~~~~~~~~~~~~~
  * I moved endPass from CoreLint to CoreMonad, with associated jigglings
  
  * Better pretty-printing of Core
  
  * The top-level RULES (ones that are not rules for locally-defined things)
    are now substituted on every simplifier iteration.  I'm not sure how
    we got away without doing this before.  This entails a bit more plumbing
    in SimplCore.
  
  * The necessary stuff to serialise and deserialise the new
    info across interface files.
  
  * Something about bottoming floats in SetLevels
        Note [Bottoming floats]
  
  * substUnfolding has moved from SimplEnv to CoreSubst, where it belongs
  
  
  --------------------------------------------------------------------------------
          Program           Size    Allocs   Runtime   Elapsed
  --------------------------------------------------------------------------------
             anna          +2.4%     -0.5%      0.16      0.17
             ansi          +2.6%     -0.1%      0.00      0.00
             atom          -3.8%     -0.0%     -1.0%     -2.5%
           awards          +3.0%     +0.7%      0.00      0.00
           banner          +3.3%     -0.0%      0.00      0.00
       bernouilli          +2.7%     +0.0%     -4.6%     -6.9%
            boyer          +2.6%     +0.0%      0.06      0.07
           boyer2          +4.4%     +0.2%      0.01      0.01
             bspt          +3.2%     +9.6%      0.02      0.02
        cacheprof          +1.4%     -1.0%    -12.2%    -13.6%
         calendar          +2.7%     -1.7%      0.00      0.00
         cichelli          +3.7%     -0.0%      0.13      0.14
          circsim          +3.3%     +0.0%     -2.3%     -9.9%
         clausify          +2.7%     +0.0%      0.05      0.06
    comp_lab_zift          +2.6%     -0.3%     -7.2%     -7.9%
         compress          +3.3%     +0.0%     -8.5%     -9.6%
        compress2          +3.6%     +0.0%    -15.1%    -17.8%
      constraints          +2.7%     -0.6%    -10.0%    -10.7%
     cryptarithm1          +4.5%     +0.0%     -4.7%     -5.7%
     cryptarithm2          +4.3%    -14.5%      0.02      0.02
              cse          +4.4%     -0.0%      0.00      0.00
            eliza          +2.8%     -0.1%      0.00      0.00
            event          +2.6%     -0.0%     -4.9%     -4.4%
           exp3_8          +2.8%     +0.0%     -4.5%     -9.5%
           expert          +2.7%     +0.3%      0.00      0.00
              fem          -2.0%     +0.6%      0.04      0.04
              fft          -6.0%     +1.8%      0.05      0.06
             fft2          -4.8%     +2.7%      0.13      0.14
         fibheaps          +2.6%     -0.6%      0.05      0.05
             fish          +4.1%     +0.0%      0.03      0.04
            fluid          -2.1%     -0.2%      0.01      0.01
           fulsom          -4.8%     +9.2%     +9.1%     +8.4%
           gamteb          -7.1%     -1.3%      0.10      0.11
              gcd          +2.7%     +0.0%      0.05      0.05
      gen_regexps          +3.9%     -0.0%      0.00      0.00
           genfft          +2.7%     -0.1%      0.05      0.06
               gg          -2.7%     -0.1%      0.02      0.02
             grep          +3.2%     -0.0%      0.00      0.00
           hidden          -0.5%     +0.0%    -11.9%    -13.3%
              hpg          -3.0%     -1.8%     +0.0%     -2.4%
              ida          +2.6%     -1.2%      0.17     -9.0%
            infer          +1.7%     -0.8%      0.08      0.09
          integer          +2.5%     -0.0%     -2.6%     -2.2%
        integrate          -5.0%     +0.0%     -1.3%     -2.9%
          knights          +4.3%     -1.5%      0.01      0.01
             lcss          +2.5%     -0.1%     -7.5%     -9.4%
             life          +4.2%     +0.0%     -3.1%     -3.3%
             lift          +2.4%     -3.2%      0.00      0.00
        listcompr          +4.0%     -1.6%      0.16      0.17
         listcopy          +4.0%     -1.4%      0.17      0.18
         maillist          +4.1%     +0.1%      0.09      0.14
           mandel          +2.9%     +0.0%      0.11      0.12
          mandel2          +4.7%     +0.0%      0.01      0.01
          minimax          +3.8%     -0.0%      0.00      0.00
          mkhprog          +3.2%     -4.2%      0.00      0.00
       multiplier          +2.5%     -0.4%     +0.7%     -1.3%
         nucleic2          -9.3%     +0.0%      0.10      0.10
             para          +2.9%     +0.1%     -0.7%     -1.2%
        paraffins         -10.4%     +0.0%      0.20     -1.9%
           parser          +3.1%     -0.0%      0.05      0.05
          parstof          +1.9%     -0.0%      0.00      0.01
              pic          -2.8%     -0.8%      0.01      0.02
            power          +2.1%     +0.1%     -8.5%     -9.0%
           pretty         -12.7%     +0.1%      0.00      0.00
           primes          +2.8%     +0.0%      0.11      0.11
        primetest          +2.5%     -0.0%     -2.1%     -3.1%
           prolog          +3.2%     -7.2%      0.00      0.00
           puzzle          +4.1%     +0.0%     -3.5%     -8.0%
           queens          +2.8%     +0.0%      0.03      0.03
          reptile          +2.2%     -2.2%      0.02      0.02
          rewrite          +3.1%    +10.9%      0.03      0.03
             rfib          -5.2%     +0.2%      0.03      0.03
              rsa          +2.6%     +0.0%      0.05      0.06
              scc          +4.6%     +0.4%      0.00      0.00
            sched          +2.7%     +0.1%      0.03      0.03
              scs          -2.6%     -0.9%     -9.6%    -11.6%
           simple          -4.0%     +0.4%    -14.6%    -14.9%
            solid          -5.6%     -0.6%     -9.3%    -14.3%
          sorting          +3.8%     +0.0%      0.00      0.00
           sphere          -3.6%     +8.5%      0.15      0.16
           symalg          -1.3%     +0.2%      0.03      0.03
              tak          +2.7%     +0.0%      0.02      0.02
        transform          +2.0%     -2.9%     -8.0%     -8.8%
         treejoin          +3.1%     +0.0%    -17.5%    -17.8%
        typecheck          +2.9%     -0.3%     -4.6%     -6.6%
          veritas          +3.9%     -0.3%      0.00      0.00
             wang          -6.2%     +0.0%      0.18     -9.8%
        wave4main         -10.3%     +2.6%     -2.1%     -2.3%
     wheel-sieve1          +2.7%     -0.0%     +0.3%     -0.6%
     wheel-sieve2          +2.7%     +0.0%     -3.7%     -7.5%
             x2n1          -4.1%     +0.1%      0.03      0.04
  --------------------------------------------------------------------------------
              Min         -12.7%    -14.5%    -17.5%    -17.8%
              Max          +4.7%    +10.9%     +9.1%     +8.4%
   Geometric Mean          +0.9%     -0.1%     -5.6%     -7.3%

    M ./compiler/basicTypes/BasicTypes.lhs -61 +100
    M ./compiler/basicTypes/Id.lhs -22 +12
    M ./compiler/basicTypes/IdInfo.lhs -101 +45
    M ./compiler/basicTypes/MkId.lhs -7 +36
    M ./compiler/basicTypes/Name.lhs -1 +6
    M ./compiler/basicTypes/OccName.lhs -6 +7
    M ./compiler/coreSyn/CoreArity.lhs -7 +17
    M ./compiler/coreSyn/CoreFVs.lhs -11 +49
    M ./compiler/coreSyn/CoreLint.lhs -47 +2
    M ./compiler/coreSyn/CorePrep.lhs -3 +2
    M ./compiler/coreSyn/CoreSubst.lhs -37 +192
    M ./compiler/coreSyn/CoreSyn.lhs -93 +188
    M ./compiler/coreSyn/CoreTidy.lhs -12 +24
    M ./compiler/coreSyn/CoreUnfold.lhs -279 +442
    M ./compiler/coreSyn/CoreUtils.lhs -181 +21
    M ./compiler/coreSyn/MkExternalCore.lhs -1
    M ./compiler/coreSyn/PprCore.lhs -45 +99
    M ./compiler/cprAnalysis/CprAnalyse.lhs -2 +2
    M ./compiler/deSugar/Desugar.lhs -23 +21
    M ./compiler/deSugar/DsBinds.lhs -199 +252
    M ./compiler/deSugar/DsExpr.lhs -2 +4
    M ./compiler/deSugar/DsForeign.lhs -4 +6
    M ./compiler/deSugar/DsMeta.hs -16 +19
    M ./compiler/deSugar/DsMonad.lhs -3 +3
    M ./compiler/deSugar/Match.lhs -4 +5
    M ./compiler/hsSyn/Convert.lhs -6 +8
    M ./compiler/hsSyn/HsBinds.lhs -35 +27
    M ./compiler/hsSyn/HsUtils.lhs -2 +7
    M ./compiler/iface/BinIface.hs -17 +40
    M ./compiler/iface/IfaceSyn.lhs -18 +34
    M ./compiler/iface/MkIface.lhs -29 +25
    M ./compiler/iface/TcIface.lhs -38 +52
    M ./compiler/main/TidyPgm.lhs -115 +120
    M ./compiler/parser/Parser.y.pp -8 +8
    M ./compiler/parser/ParserCore.y -5 +6
    M ./compiler/parser/RdrHsSyn.lhs -11 +17
    M ./compiler/prelude/PrelRules.lhs -2 +3
    M ./compiler/simplCore/CSE.lhs -2 +1
    M ./compiler/simplCore/CoreMonad.lhs -10 +87
    M ./compiler/simplCore/FloatIn.lhs -11 +8
    M ./compiler/simplCore/FloatOut.lhs -23
    M ./compiler/simplCore/OccurAnal.lhs -82 +98
    M ./compiler/simplCore/SetLevels.lhs -58 +54
    M ./compiler/simplCore/SimplCore.lhs -98 +120
    M ./compiler/simplCore/SimplEnv.lhs -26 +23
    M ./compiler/simplCore/SimplUtils.lhs -49 +38
    M ./compiler/simplCore/Simplify.lhs -157 +243
    M ./compiler/specialise/Rules.lhs -15 +36
    M ./compiler/specialise/Specialise.lhs -15 +25
    M ./compiler/stranal/DmdAnal.lhs -1 +1
    M ./compiler/stranal/StrictAnal.lhs -1
    M ./compiler/stranal/WorkWrap.lhs -12 +24
    M ./compiler/stranal/WwLib.lhs -1 +1
    M ./compiler/typecheck/Inst.lhs -1 +3
    M ./compiler/typecheck/TcBinds.lhs -29 +91
    M ./compiler/typecheck/TcClassDcl.lhs -79 +90
    M ./compiler/typecheck/TcDeriv.lhs -5 +5
    M ./compiler/typecheck/TcForeign.lhs -1 +1
    M ./compiler/typecheck/TcGenDeriv.lhs -13 +13
    M ./compiler/typecheck/TcHsSyn.lhs -8 +5
    M ./compiler/typecheck/TcInstDcls.lhs -129 +317
    M ./compiler/typecheck/TcRnDriver.lhs -2 +11
    M ./compiler/typecheck/TcSimplify.lhs -10 +10
    M ./compiler/typecheck/TcType.lhs -6 +16
    M ./compiler/types/InstEnv.lhs -7 +11
    M ./compiler/vectorise/VectCore.hs -1 +3
    M ./compiler/vectorise/VectType.hs +7

View patch online:
http://darcs.haskell.org/ghc/_darcs/patches/20091029143051-1287e-f0cbff4635bdcca336cf98ea2d19d46fac865c23.gz


