possible solution! Re: llvm calling convention matters

Thu Sep 12 19:03:48 UTC 2013

emphasis on "very very clear warning"

On Thu, Sep 12, 2013 at 3:00 PM, Carter Schonwald <
carter.schonwald at gmail.com> wrote:

> after a bit more reflection: as long as we provide a clear warning that
> 7.8 may at some point no longer work with llvm 3.4, i'm down for the
> change. We just need to make it very very clear, that it may stop working.
> (and have AVX support via passing on the stack with <= 3.3)
>
> before i go and upstream that patch, could we benchmark how multivector
> perf fares with a patched llvm? i don't have the right hardware for doing
> the benchmarks you did in your paper...
>
> sorry for being a bit over the top yesterday, i'm just juggling a lot
> right now :)
>
> -Carter
>
>
> On Thu, Sep 12, 2013 at 2:47 PM, Carter Schonwald <
> carter.schonwald at gmail.com> wrote:
>
>> oh, i didn't realize you had already done the work! (bah, i'm sorry, i
>> feel terrible)
>>
>> I thought i had communicated ~ a month ago that I was worried about the
>> release engineering interaction: the LLVM release cycle could make it
>> impossible to then make subsequent changes more thoughtfully.
>> This concern of mine ballooned a bit after helping triage a huge number of
>> problems people were hitting with the Clang transition on mac that's
>> underway.
>>
>> It's actually very easy to package up an llvm with that patch, much
>> simpler than "build GHC from source". In fact, on OS X, the simplest way to
>> install LLVM by default essentially does a build from source.
>>
>> Geoff, it'd at least be worth running the benchmarks to measure the work!
>> (and as I said, i'm happy to help)
>>
>>
>> On Thu, Sep 12, 2013 at 2:30 PM, Geoffrey Mainland <mainland at apeiron.net> wrote:
>>
>>> If users have to do a custom llvm build, we might as well ask them to
>>> build ghc from source too.
>>>
>>> Unless I misunderstood ticket #8033, you were originally quite gung-ho
>>> about changing the LLVM calling conventions to support passing SIMD
>>> vectors of all widths in registers on both x86-32 and -64, getting these
>>> patches into LLVM 3.4, and making sure that GHC 7.8 would support all
>>> this. I spent several days making sure this could happen from the GHC
>>> side. Now that the plan has changed, I will back out that work, and 7.8
>>> will only support passing 128-bit SIMD vectors in registers on x86-64.
>>> Other vector sizes, and all vectors on x86-32, will be passed on the
>>> stack.
>>>
>>> Geoff
>>>
>>> On 9/12/13 1:32 PM, Carter Schonwald wrote:
>>> > to repeat:
>>> >
>>> >  I think no one would object to having a clearly marked,
>>> > experimental -fllvmExpermentalAVX flag that requires building LLVM
>>> > with a specified patch, as a way to showcase your multivector work!
>>> >
>>> > that would evade all of my objections (provided avx is still exposed
>>> > with normal -fllvm, but spilled to stack rather than registers), and
>>> > i'd actually argue in favor of such.
>>> >
>>> > Especially since it would not impose any release cycle constraints  on
>>> > a subsequent, systematic exploration for using XMM / YMM / ZMM  in the
>>> > calling convention going forward.
>>> >
>>> > @Geoff, Simons, Johan, and others: does anyone object to that approach?
>>> >
>>> > applying such a calling convention patch to llvm is really quite
>>> > straightforward, and the build process is pretty zippy after that too.
>>> >
>>> > cheers
>>> > -Carter
>>> >
>>> >
>>> > On Thu, Sep 12, 2013 at 2:34 AM, Carter Schonwald
>>> > <carter.schonwald at gmail.com> wrote:
>>> >
>>> >     that said it does occur to me that there is an alternative
>>> >     solution that may be acceptable for everyone!
>>> >
>>> >     what about providing a pseudo compatible way called
>>> >     -fllvm-experimentalAVX (or something), and simply require that for
>>> >     it to be used, the user has an llvm Patched with the YMM simd in
>>> >     register fun call support? internally that could just be an llvm
>>> >     way that trips the logic that puts the first few AVX values in
>>> >     those YMM1-6 slots if they are the first args, so only the stack
>>> >     spilling logic needs be changed?
>>> >
>>> >     (ie it wouldn't be tied to an llvm version, but rather this pseudo
>>> >     way flag)
>>> >
>>> >     does that make sense?
>>> >
>>> >     either way, i'd really like having avx even if it's always spilled
>>> >     to stack at funcalls with standard LLVMs!
>>> >
>>> >     cheers
>>> >     -carter
>>> >
>>> >
>>> >
>>> >
>>> >     On Thu, Sep 12, 2013 at 2:28 AM, Carter Schonwald
>>> >     <carter.schonwald at gmail.com> wrote:
>>> >
>>> >         Geoff,
>>> >
>>> >         a prosaic reason why there *might* be a fundamentally breaking
>>> >         change would be the following idea nathan howell suggested to
>>> >         me this afternoon: change the Sp and SpLim registers so that
>>> >         the X86/x86_64 target can use the CPU's PUSH and (maybe) POP
>>> >         instructions for the stack manipulations, rather than MOV and
>>> >         fam.  see http://ghc.haskell.org/trac/ghc/ticket/8272 (which
>>> >         is just what i've said). That's one change that's pretty simple
>>> >         but deep, and likely worth exploring.
>>> >
>>> >
>>> >         i'm saying any ABI change for GHC 7.10, would likely entail
>>> >         patching LLVM 3.4, because that's the only LLVM version likely
>>> >         to come out between now and whenever we get 7.10 out (assuming
>>> >         7.10 lands within the next 8-12 months, which is reasonable
>>> >         since we've got noticeably more (amazing) people  helping out
>>> >         lately). Thus, any change there entails either asking the llvm
>>> >         folks to support >1 GHC convention per architecture, or
>>> >         replacing the current one!  I'd rather do the latter than the
>>> >         former, when it comes to asking other people to maintain it :)
>>> >         (and llvm engineers do in fact help out maintaining that code)
>>> >
>>> >
>>> >         have you run nofib, or even benchmarks restricted to your
>>> >         multivector code, for the current calling convention
>>> >         (including spilling AVX vectors to the stack, which is the
>>> >         current plan i gather) VS passing in registers with an LLVM
>>> >         built using the patches i worked out ~ 2 months ago?  it'd be
>>> >         really easy to build that custom llvm, then run the
>>> >         benchmarks! (i'm happy to help, and ultimately, benchmarks
>>> >         will reveal if it's worthwhile or not! And if the main goal is
>>> >         for your talk, it's still valid even if it's not in the merge
>>> >         window over the next 4 days).
>>> >
>>> >         I really think it's not obvious what the "best" abi
>>> >         change would be! It really will require coming up with a list
>>> >         of variants, implementing them, and running nofib with each
>>> >         variant, which i lack the compute/human time resources to do
>>> >         this week. Modern hardware is complex enough that for
>>> >         something like an ABI change, the only healthy attitude can be
>>> >         "let's benchmark it!".
>>> >
>>> >         i'd really like any change in calling convention to also
>>> >         improve perf on codes that aren't explicitly simd! (and a
>>> >         conservative simd-only change blocks/conflicts with that
>>> >         augmentation going forward, and not just for the stack pointer
>>> >         example i mentioned earlier)
>>> >
>>> >          Not just scalar floats in simd registers , but perhaps also
>>> >         words/ints !
>>> >
>>> >         (though that latter bit  might be pretty ambitious and subtle,
>>> >         i'll need to investigate that a bit to see how feasible it may
>>> >         be).
>>> >         SIMD has great support for ints/words, and any partial abi
>>> >         change on the llvm backend now would make it hard to support
>>> >         that well later (or at least, that's what it looks like to me).
>>> >         actually effectively using simd for scalar ints and words
>>> >         should be doable, but might force us to be a bit more
>>> >         thoughtful on how GHC internally distinguishes ints used for
>>> >         address arithmetic, vs ints used as data.  (interestingly, i'm
>>> >         not sure if any current extant x86 calling convention does
>>> >         that!)
>>> >
>>> >
>>> >             That single change would make 7.10 require a completely
>>> >         different llvm and native code gen convention from our current
>>> >         one, plus touch all of the code gen on x86 architectures.
>>> >
>>> >
>>> >         basically: we're lucky that everyone builds haskell code from
>>> >         source, so ABI compat across GHC versions is a non issue. BUT,
>>> >         any ABI changes should be backed by benchmarks (at least when
>>> >         the change is performance motivated). Likewise, because we use
>>> >         LLVM as an external dep for the -fllvm backend, we really need
>>> >         to keep in mind how their release cycle interacts with our
>>> >         release cycle, because people use haskell and ghc! which as many
>>> >         like to say, is both a boon and a pain ;).
>>> >
>>> >         Having people hit ghc acting broken with an llvm that was
>>> >         "supported before" is a risky support problem to deal with.
>>> >         having an LLVM head variant support a modified ABI, and then
>>> >         later needing to break it for 7.10 (for one of the possible
>>> >         exploratory reasons above) would lead to a support headache I
>>> >         don't wish on anyone.
>>> >
>>> >         pardon the verbose answer, but that's my offhand take
>>> >
>>> >         cheers
>>> >         -Carter
>>> >
>>> >
>>> >         On Wed, Sep 11, 2013 at 10:10 PM, Geoffrey Mainland
>>> >         <mainland at apeiron.net> wrote:
>>> >
>>> >             We support compiling some code with -fllvm and some not in
>>> >             the same executable. Otherwise how could users of the Haskell
>>> >             Platform link their -fllvm-compiled code with
>>> >             native-codegen-compiled libraries like base, etc.?
>>> >
>>> >             In other words, the LLVM and native back ends use the same
>>> >             calling convention. With my SIMD work, they still use the same
>>> >             calling conventions, but the native codegen can never generate
>>> >             code that uses SIMD instructions.
>>> >
>>> >             Geoff
>>> >
>>> >             On 09/11/2013 10:03 PM, Johan Tibell wrote:
>>> >             > OK. But that doesn't create a problem for the code we
>>> >             > output with the LLVM backend, no? Or do we support compiling
>>> >             > some code with -fllvm and some not in the same executable?
>>> >             >
>>> >             >
>>> >             > On Wed, Sep 11, 2013 at 6:56 PM, Geoffrey Mainland
>>> >             > <mainland at apeiron.net> wrote:
>>> >             >
>>> >             >     We definitely have interop between the native codegen
>>> >             >     and the LLVM back end now. Otherwise anyone who wanted to
>>> >             >     use the LLVM back end would have to build GHC themselves.
>>> >             >     Interop means that users can install the Haskell Platform
>>> >             >     and still use -fllvm when it makes a performance
>>> >             >     difference.
>>> >             >
>>> >             >     Geoff
>>> >             >
>>> >             >     On 09/11/2013 07:59 PM, Johan Tibell wrote:
>>> >             >     > Do nothing different than you're doing for 7.8, we
>>> >             >     > can sort it out later. Just put a comment on the primops
>>> >             >     > saying they're LLVM-only. See e.g.
>>> >             >     >
>>> >             >     > https://github.com/ghc/ghc/blob/master/compiler/prelude/primops.txt.pp#L181
>>> >             >     >
>>> >             >     > for an example how to add docs to primops.
>>> >             >     >
>>> >             >     > I don't think we need interop between the native
>>> >             >     > and the LLVM backends. We don't have that now, do we
>>> >             >     > (i.e. they use different calling conventions)?
>>> >             >     >
>>> >             >     >
>>> >             >     >
>>> >             >     > On Wed, Sep 11, 2013 at 4:51 PM, Geoffrey Mainland
>>> >             >     > <mainland at apeiron.net> wrote:
>>> >             >     >
>>> >             >     >     On 09/11/2013 07:44 PM, Johan Tibell wrote:
>>> >             >     >     > On Wed, Sep 11, 2013 at 4:40 PM, Geoffrey Mainland
>>> >             >     >     > <mainland at apeiron.net> wrote:
>>> >             >     >     > > Do you mean we need a reasonable emulation of the
>>> >             >     >     > > SIMD primops for the native codegen?
>>> >             >     >     >
>>> >             >     >     > Yes. Reasonable in the sense that it computes the
>>> >             >     >     > right result. I can see that some code might still
>>> >             >     >     > want to #ifdef (if the fallback isn't fast enough).
>>> >             >     >
>>> >             >     >     Two implications of this requirement:
>>> >             >     >
>>> >             >     >     1) There will not be SIMD in 7.8. I just don't have
>>> >             >     >     the time. In fact, what SIMD support is there already
>>> >             >     >     will have to be removed if we cannot live with
>>> >             >     >     LLVM-only SIMD primops.
>>> >             >     >
>>> >             >     >     2) If we also require interop between the LLVM
>>> >             >     >     back-end and the native codegen, then we cannot pass
>>> >             >     >     any SIMD vectors in registers---they all must be
>>> >             >     >     passed on the stack.
>>> >             >     >
>>> >             >     >     My plan, as discussed with Simon PJ, is to not
>>> >             >     >     support SIMD primops at all with the native codegen.
>>> >             >     >     If there is a strong feeling that this *is not* the
>>> >             >     >     way to go, then I need to know ASAP.
>>> >             >     >
>>> >             >     >     Geoff
>>> >             >     >
>>> >             >     >
>>> >             >     >
>>> >             >
>>> >             >
>>> >
>>> >
>>> >
>>> >
>>>
>>>
>>
>
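
For readers skimming the thread, the fallback rule Geoff describes for 7.8
(128-bit SIMD vectors in registers on x86-64 only; every other width, and
everything on x86-32, on the stack) can be written down as a small sketch.
This is only an illustration of the rule as stated in the thread, not GHC's
or LLVM's actual code; the names Arch, Location and simd_arg_location are
hypothetical.

```python
from enum import Enum


class Arch(Enum):
    """Target architectures mentioned in the thread."""
    X86_32 = "x86-32"
    X86_64 = "x86-64"


class Location(Enum):
    """Where a SIMD argument ends up at a function call."""
    REGISTER = "register"
    STACK = "stack"


def simd_arg_location(arch: Arch, vector_bits: int) -> Location:
    """Hypothetical model of the 7.8 fallback plan stated in the thread.

    Only 128-bit vectors on x86-64 are passed in registers; all other
    widths, and all vectors on x86-32, are passed on the stack.
    """
    if arch is Arch.X86_64 and vector_bits == 128:
        return Location.REGISTER
    return Location.STACK


if __name__ == "__main__":
    for arch in Arch:
        for bits in (128, 256, 512):
            print(arch.value, bits, simd_arg_location(arch, bits).value)
```

Under this model a 256-bit AVX vector is always spilled to the stack, which
is the case the proposed experimental flag with a patched LLVM was meant to
change.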