Is it possible INLINE didn't inline the function because it's recursive? If it were my function, I'd probably try a manual worker/wrapper.<br><br><div class="gmail_quote">On 07:59, Wed, Dec 17, 2014 Simon Peyton Jones <<a href="mailto:simonpj@microsoft.com">simonpj@microsoft.com</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I still would like to understand why INLINE does not make it inline. That's weird.<br>

<br>

E.g., a way to reproduce it.<br>

<br>

Simon<br>

<br>

|  -----Original Message-----<br>

|  From: Richard Eisenberg [mailto:<a href="mailto:eir@cis.upenn.edu" target="_blank">eir@cis.upenn.edu</a>]<br>

|  Sent: 17 December 2014 15:56<br>

|  To: Simon Peyton Jones<br>

|  Cc: Joachim Breitner; <a href="mailto:ghc-devs@haskell.org" target="_blank">ghc-devs@haskell.org</a><br>

|  Subject: Re: performance regressions<br>

|<br>

|  My unsubstantiated guess is that INLINEABLE would have the same effect<br>

|  as INLINE here, as GHC doesn't see fit to actually inline the<br>

|  function, even with INLINE -- the big improvement seen between (1) and<br>

|  (2) is actually specialization, not inlining. The jump from (2) to (3)<br>

|  is actual inlining. Thus, it seems that GHC's heuristics for inlining<br>

|  aren't working out for the best here.<br>

|<br>

|  I've pushed my changes, though I agree with Simon that more research<br>

|  may uncover even more improvements here. I didn't focus on the number<br>

|  of calls because that number didn't regress. Will look into this soon.<br>

|<br>

|  Richard<br>

|<br>

|  On Dec 17, 2014, at 4:15 AM, Simon Peyton Jones<br>

|  <<a href="mailto:simonpj@microsoft.com" target="_blank">simonpj@microsoft.com</a>> wrote:<br>

|<br>

|  > If you use INLINEABLE, that should make the function specialisable to a particular monad, even if it's in a different module. You shouldn't need INLINE for that.<br>

|  ><br>

|  > I don't understand the difference between cases (2) and (3).<br>

|  ><br>

|  > I am still suspicious of why there are so many calls to this one function that it, alone, accounts for a significant proportion of the allocation of the entire run of GHC.  Are you sure there isn't an algorithmic improvement to be had, to simply reduce the number of calls?<br>

|  ><br>

|  > Simon<br>

|  ><br>

|  > |  -----Original Message-----<br>

|  > |  From: ghc-devs [mailto:<a href="mailto:ghc-devs-bounces@haskell.org" target="_blank">ghc-devs-bounces@haskell.org</a>] On Behalf Of Richard Eisenberg<br>

|  > |  Sent: 16 December 2014 21:46<br>

|  > |  To: Joachim Breitner<br>

|  > |  Cc: <a href="mailto:ghc-devs@haskell.org" target="_blank">ghc-devs@haskell.org</a><br>

|  > |  Subject: Re: performance regressions<br>

|  > |<br>

|  > |  I've learned several very interesting things in this analysis.<br>

|  > |<br>

|  > |  - Inlining polymorphic methods is very important. Here are some data points to back up that claim:<br>

|  > |     * Original implementation using zipWithAndUnzipM: 8,472,613,440 bytes allocated in the heap<br>

|  > |     * Adding {-# INLINE #-} to the definition thereof: 6,639,253,488 bytes allocated in the heap<br>

|  > |     * Using `inline` at call site to force inlining: 6,281,539,792 bytes allocated in the heap<br>

|  > |<br>

|  > |  The middle step above allowed GHC to specialize zipWithAndUnzipM to my particular monad, but GHC didn't see fit to actually inline the function. Using `inline` forced it, to good effect. (I did not collect data on code sizes, but it wouldn't be hard to.)<br>
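<br>
A minimal sketch of the distinction drawn above, under stated assumptions: `pairUpM` and `caller` are hypothetical names, not GHC source. INLINABLE only keeps the unfolding available so that the function can be specialised (including from other modules); `inline` from GHC.Exts asks for inlining at one particular call site, provided the unfolding is visible there.<br>

```haskell
import GHC.Exts (inline)

-- Hypothetical stand-in for a polymorphic monadic helper; not GHC's code.
pairUpM :: Monad m => (a -> b -> m c) -> a -> b -> m (c, c)
pairUpM f x y = do
  r1 <- f x y
  r2 <- f x y
  return (r1, r2)
-- INLINABLE keeps the unfolding available for specialisation;
-- it does not by itself force inlining.
{-# INLINABLE pairUpM #-}

-- Call site: `inline` requests inlining of this particular occurrence.
caller :: Int -> Int -> Maybe (Int, Int)
caller = inline pairUpM (\a b -> Just (a + b))
```

Whether GHC actually specialised or inlined the call is best confirmed in the -ddump-simpl output, which is also what the message goes on to use.<br>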

|  > |<br>

|  > |  By comparison:<br>

|  > |     * Hand-written recursion:    6,587,809,112 bytes allocated in the heap<br>

|  > |  Interestingly, this is *not* the best result!<br>

|  > |<br>

|  > |  Conclusion: We should probably add INLINE pragmas to Util and MonadUtils.<br>

|  > |<br>

|  > |<br>

|  > |  - I then looked at rejiggering the algorithm to keep the common case fast. This had a side effect of changing the zipWithAndUnzipM to mapAndUnzipM, from Control.Monad. To my surprise, this brought disaster!<br>

|  > |     * Using `inline` and mapAndUnzipM:        7,463,047,432 bytes allocated in the heap<br>

|  > |     * Hand-written recursion:                 5,848,602,848 bytes allocated in the heap<br>

|  > |<br>

|  > |  That last number is better than the numbers above because of the algorithm streamlining. But, the inadequacy of mapAndUnzipM surprised me -- it already has an INLINE pragma in Control.Monad of course.<br>

|  > |  Looking at -ddump-simpl, it seems that mapAndUnzipM was indeed getting inlined, but a call to `map` remained, perhaps causing extra allocation.<br>

|  > |<br>

|  > |  Conclusion: We should examine the implementation of mapAndUnzipM (and similar functions) in Control.Monad. Is it as fast as possible?<br>
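<br>
For illustration, a hand-written recursion of the kind compared above might look like the following. This is only a sketch: `mapAndUnzipDirect` is a hypothetical name, and this is not the definition in Control.Monad. It builds both result lists in a single pass, with no intermediate list of pairs and no separate call to `map`.<br>

```haskell
-- Hypothetical name; a direct recursion for illustration only,
-- not the Control.Monad definition of mapAndUnzipM.
mapAndUnzipDirect :: Monad m => (a -> m (b, c)) -> [a] -> m ([b], [c])
mapAndUnzipDirect f = go
  where
    go []       = return ([], [])
    go (x : xs) = do
      (b, c)   <- f x      -- run the action on the head element
      (bs, cs) <- go xs    -- recurse on the tail
      return (b : bs, c : cs)
```

It should agree with mapAndUnzipM on results; whether it allocates less in a given program would still need to be measured.<br>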

|  > |<br>

|  > |<br>

|  > |<br>

|  > |  In the end, I was unable to bring the allocation numbers down to where they were before my work. This is because the flattener now deals in roles. Most of its behavior is the same between nominal and representational roles, so it seems silly (though very possible) to specialize the code to nominal to keep that path fast. Instead, I identified one key spot and made that go fast.<br>

|  > |<br>

|  > |  Thus, there is a 7% bump to memory usage on very-type-family-heavy code, compared to before my commit on Friday. (On more ordinary code, there is no noticeable change.)<br>

|  > |<br>

|  > |  Validating my patch locally now; will push when that's done.<br>

|  > |<br>

|  > |  Thanks,<br>

|  > |  Richard<br>

|  > |<br>

|  > |  On Dec 16, 2014, at 10:41 AM, Joachim Breitner <<a href="mailto:mail@joachim-breitner.de" target="_blank">mail@joachim-breitner.de</a>> wrote:<br>

|  > |<br>

|  > |  > Hi,<br>

|  > |  ><br>

|  > |  ><br>

|  > |  > On Tuesday, 16.12.2014, at 09:59 -0500, Richard Eisenberg wrote:<br>

|  > |  >> On Dec 16, 2014, at 4:01 AM, Joachim Breitner <<a href="mailto:mail@joachim-breitner.de" target="_blank">mail@joachim-breitner.de</a>> wrote:<br>

|  > |  >><br>

|  > |  >>> another guess (without looking at the code, sorry): Are they in the same module? I.e., can GHC specialize the code to your particular Monad?<br>

|  > |  ><br>

|  > |  >> No, they're not in the same module. I could also try moving the zipWithAndUnzipM function to the same module, and even specializing it by hand to the right monad.<br>

|  > |  ><br>

|  > |  > I did mean zipWithAndUnzipM, so maybe yes: Try that.<br>

|  > |  ><br>

|  > |  > (I find it hard to believe that any polymorphic monadic code should perform well, with those many calls to an unknown (>>=) with a function parameter, but maybe I'm too pessimistic here.)<br>

|  > |  ><br>

|  > |  >> Could that be preventing the fusing?<br>

|  > |  ><br>

|  > |  > There is not going to be any fusing here, at least not list fusion; that would require your code to be written in terms of functions with fusion rules.<br>

|  > |  ><br>

|  > |  > Greetings,<br>

|  > |  > Joachim<br>

|  > |  ><br>

|  > |  > --<br>

|  > |  > Joachim "nomeata" Breitner<br>

|  > |  >  <a href="mailto:mail@joachim-breitner.de" target="_blank">mail@joachim-breitner.de</a> * <a href="http://www.joachim-breitner.de/" target="_blank">http://www.joachim-breitner.de/</a><br>

|  > |  >  Jabber: <a href="mailto:nomeata@joachim-breitner.de" target="_blank">nomeata@joachim-breitner.de</a>  * GPG-Key: 0xF0FBF51F<br>

|  > |  >  Debian Developer: <a href="mailto:nomeata@debian.org" target="_blank">nomeata@debian.org</a><br>

|  > |  > _______________________________________________<br>

|  > |  > ghc-devs mailing list<br>

|  > |  > <a href="mailto:ghc-devs@haskell.org" target="_blank">ghc-devs@haskell.org</a><br>

|  > |  > <a href="http://www.haskell.org/mailman/listinfo/ghc-devs" target="_blank">http://www.haskell.org/mailman/listinfo/ghc-devs</a><br>

|  > |<br>


|  ><br>

<br>


</blockquote></div>
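<br>
A sketch of the manual worker/wrapper mentioned at the top of this message, assuming zipWithAndUnzipM has the type used here (the actual definition in GHC's MonadUtils may differ). GHC does not inline a self-recursive binding, because it has to serve as its own loop breaker, so the recursion is moved into a local `go` and the outer wrapper becomes non-recursive.<br>

```haskell
-- Illustrative re-implementation; the type is an assumption based on the
-- function's name and use in this thread, not copied from GHC's source.
zipWithAndUnzipM :: Monad m
                 => (a -> b -> m (c, d)) -> [a] -> [b] -> m ([c], [d])
zipWithAndUnzipM f = go
  where
    -- Local recursive worker: `f` is captured from the wrapper, so the
    -- wrapper itself is non-recursive and can be inlined at call sites.
    go (x : xs) (y : ys) = do
      (c, d)   <- f x y
      (cs, ds) <- go xs ys
      return (c : cs, d : ds)
    go _ _ = return ([], [])
{-# INLINE zipWithAndUnzipM #-}
```

Once the wrapper is inlined, `go` is instantiated at the caller's monad and function argument, which is roughly the effect `inline` was used for above. Whether this beats the numbers reported in the thread would need to be measured.<br>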