possible solution! Re: llvm calling convention matters

Thu Sep 12 19:00:42 UTC 2013

after a bit more reflection: as long as we provide a clear warning that 7.8
may at some point no longer work with llvm 3.4, i'm down for the change. We
just need to make it very very clear, that it may stop working. (and have
AVX support via passing on the stack with <= 3.3)
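
for concreteness, here is a rough python model of the conservative plan from Geoff's mail quoted below (128-bit vectors in registers on x86-64 only, everything else on the stack). the function name and the arch strings are made up for illustration; this is a sketch of the policy, not GHC or LLVM code:

```python
# Hypothetical model of the fallback policy described in the quoted thread.
# Nothing here is taken from GHC or LLVM sources.
def simd_arg_location(arch: str, vector_bits: int) -> str:
    """Return where a SIMD vector argument would be passed."""
    # Only 128-bit vectors on x86-64 go in registers under this plan.
    if arch == "x86-64" and vector_bits == 128:
        return "register"
    # Other vector sizes, and all vectors on x86-32, go on the stack.
    return "stack"


if __name__ == "__main__":
    for arch in ("x86-32", "x86-64"):
        for bits in (128, 256, 512):
            print(arch, bits, simd_arg_location(arch, bits))
```

a patched llvm would, under the same assumptions, widen the "register" case to the wider vector sizes.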

before i go and upstream that patch, could we benchmark how multivector
perf fares with a patched llvm? i don't have the right hardware for doing
the benchmarks you did in your paper...

sorry for being a bit over the top yesterday, i'm just juggling a lot right
now :)

-Carter

On Thu, Sep 12, 2013 at 2:47 PM, Carter Schonwald <
carter.schonwald at gmail.com> wrote:

> oh, i didn't realize you had already done the work! (bah, i'm sorry, i
> feel terrible)
>
> I thought i had communicated ~ a month ago that I was worried about the
> release engineering interaction making it impossible to then make
> subsequent changes more thoughtfully, because of the LLVM release cycle.
> This concern of mine ballooned a bit after helping triage a huge number of
> problems people were hitting with the Clang transition on mac that's
> underway.
>
> It's actually very easy to package up an llvm with that patch, much simpler
> than "build GHC from source". In fact, on OS X, the simplest way to install
> LLVM by default essentially does a build from source.
>
> Geoff, it'd at least be worth running the benchmarks to measure the work!
> (and as I said, i'm happy to help)
>
>
> On Thu, Sep 12, 2013 at 2:30 PM, Geoffrey Mainland <mainland at apeiron.net> wrote:
>
>> If users have to do a custom llvm build, we might as well ask them to
>> build ghc from source too.
>>
>> Unless I misunderstood ticket #8033, you were originally quite gung-ho
>> about changing the LLVM calling conventions to support passing SIMD
>> vectors of all widths in registers on both x86-32 and -64, getting these
>> patches into LLVM 3.4, and making sure that GHC 7.8 would support all
>> this. I spent several days making sure this could happen from the GHC
>> side. Now that the plan has changed, I will back out that work, and 7.8
>> will only support passing 128-bit SIMD vectors in registers on x86-64.
>> Other vector sizes, and all vectors on x86-32, will be passed on the
>> stack.
>>
>> Geoff
>>
>> On 9/12/13 1:32 PM, Carter Schonwald wrote:
>> > to repeat:
>> >
>> >  I think no one would object to having a clearly marked,
>> > experimental -fllvmExpermentalAVX flag that requires building LLVM
>> > with a specified patch, as a way to showcase your multivector work!
>> >
>> > that would evade all of my objections (provided avx is still exposed
>> > with normal -fllvm, but spilled to stack rather than registers), and
>> > i'd actually argue in favor of such.
>> >
>> > Especially since it would not impose any release cycle constraints  on
>> > a subsequent, systematic exploration for using XMM / YMM / ZMM  in the
>> > calling convention going forward.
>> >
>> > @Geoff, Simons, Johan, and others: does anyone object to that approach?
>> >
>> > applying such a calling convention patch to llvm is really quite
>> > straightforward, and the build process is pretty zippy after that too.
>> >
>> > cheers
>> > -Carter
>> >
>> >
>> > On Thu, Sep 12, 2013 at 2:34 AM, Carter Schonwald
>> > <carter.schonwald at gmail.com> wrote:
>> >
>> >     that said it does occur to me that there is an alternative
>> >     solution that may be acceptable for everyone!
>> >
>> >     what about providing a pseudo compatible way called
>> >     -fllvm-experimentalAVX (or something), and simply require that for
>> >     it to be used, the user has an llvm Patched with the YMM simd in
>> >     register fun call support? internally that could just be an llvm
>> >     way that trips the logic that puts the first few AVX values in
>> >     those YMM1-6 slots if they are the first args, so only the stack
>> >     spilling logic needs be changed?
>> >
>> >     (ie it wouldn't be tied to an llvm version, but rather this pseudo
>> >     way flag)
>> >
>> >     does that make sense?
>> >
>> >     either way, i'd really like having avx even if it's always spilled
>> >     to stack at funcalls with standard LLVMs!
>> >
>> >     cheers
>> >     -carter
>> >
>> >
>> >
>> >
>> >     On Thu, Sep 12, 2013 at 2:28 AM, Carter Schonwald
>> >     <carter.schonwald at gmail.com> wrote:
>> >
>> >         Geoff,
>> >
>> >         a prosaic reason why there *might* be a fundamentally breaking
>> >         change would be the following idea nathan howell suggested to
>> >         me this afternoon: change the Sp and SpLim registers so that
>> >         the X86/x86_64 target can use the CPU's Push and (maybe) Pop
>> >         instructions for the stack manipulations, rather than MOV and
>> >         fam.  see http://ghc.haskell.org/trac/ghc/ticket/8272 (which
>> >         is just what i've said). That's one change that's pretty simple
>> >         but deep, and likely worth exploring.
>> >
>> >
>> >         i'm saying any ABI change for GHC 7.10, would likely entail
>> >         patching LLVM 3.4, because that's the only LLVM version likely
>> >         to come out between now and whenever we get 7.10 out (assuming
>> >         7.10 lands within the next 8-12 months, which is reasonable
>> >         since we've got noticeably more (amazing) people  helping out
>> >         lately). Thus, any change there entails either asking the llvm
>> >         folks to support >1 GHC convention per architecture, or
>> >         replace the current one!  I'd rather do the latter than the
>> >         former, when it comes to asking other people to maintain it :)
>> >         (and llvm engineers do in fact help out maintaining that code)
>> >
>> >
>> >         have you run a Nofib, or even benchmarks restricted to your
>> >         multivector code, for the current calling convention
>> >         (including spilling AVX vectors to the stack, which is the
>> >         current plan i gather) VS passing in registers with an LLVM
>> >         built using the patches i worked out ~ 2 months ago?  it'd be
>> >         really easy to build that custom llvm, then run the
>> >         benchmarks! (i'm happy to help, and ultimately, benchmarks
>> >         will reveal if it's worthwhile or not! And if the main goal is
>> >         for your talk, it's still valid even if it's not in the merge
>> >         window over the next 4 days).
>> >
>> >         I really think it's not obvious what the "best" abi
>> >         change would be! It really will require coming up with a list
>> >         of variants, implementing them, and running nofib with each
>> >         variant, which i lack the compute/human time resources to do
>> >         this week. Modern hardware is complex enough that for
>> >         something like an ABI change, the only healthy attitude can be
>> >         "let's benchmark it!".
>> >
>> >         i'd really like any change in calling convention to also
>> >         improve perf on codes that aren't explicitly simd! (and a
>> >         conservative simd only change, blocks/conflicts with that
>> >         augmentation going forward, and not just for the stack pointer
>> >         example i mention earlier)
>> >
>> >          Not just scalar floats in simd registers , but perhaps also
>> >         words/ints !
>> >
>> >         (though that latter bit  might be pretty ambitious and subtle,
>> >         i'll need to investigate that a bit to see how feasible it may
>> >         be).
>> >         SIMD has great support for  ints/words, and any partial abi
>> >         change on the llvm backend now would make it hard to support
>> >         that later well (or at least, that's what it looks like to me).
>> >          actually effectively using simd for scalar ints and words
>> >         should be doable, but might force us to be a bit more
>> >         thoughtful on how GHC internally distinguishes ints used for
>> >         address arithmetic, vs ints used as data.  (interestingly, i'm
>> >         not sure if any current extant x86 calling convention does
>> >         that!)
>> >
>> >
>> >             That single change would make 7.10 require a completely
>> >         different llvm and native code gen convention from our current
>> >         one, plus touch all of the code gen on x86 architectures.
>> >
>> >
>> >         basically: we're lucky that everyone builds haskell code from
>> >         source, so ABI compat across GHC versions is a non issue. BUT,
>> >         any ABI changes should be backed by benchmarks (at least when
>> >         the change is performance motivated). Likewise, because we use
>> >         LLVM as an external dep for the -fllvm backend, we really need
>> >         to keep in mind how their release cycle interacts with our
>> >         release cycle, because people use haskell and ghc! which as many
>> >         like to say, is both a boon and a pain ;).
>> >
>> >         Having people hit ghc acting broken with an llvm that was
>> >         "supported before" is a risky support problem to deal with.
>> >         having an LLVM head variant support a modified ABI, and then
>> >         later needing to break it for 7.10 (for one of the possible
>> >         exploratory reasons above) would lead to a support headache I
>> >         don't wish on anyone.
>> >
>> >         pardon the verbose answer, but that's my offhand take
>> >
>> >         cheers
>> >         -Carter
>> >
>> >
>> >         On Wed, Sep 11, 2013 at 10:10 PM, Geoffrey Mainland
>> >         <mainland at apeiron.net> wrote:
>> >
>> >             We support compiling some code with -fllvm and some not in
>> >             the same executable. Otherwise how could users of the
>> >             Haskell Platform link their -fllvm-compiled code with
>> >             native-codegen-compiled libraries like base, etc.?
>> >
>> >             In other words, the LLVM and native back ends use the same
>> >             calling convention. With my SIMD work, they still use the
>> >             same calling conventions, but the native codegen can never
>> >             generate code that uses SIMD instructions.
>> >
>> >             Geoff
>> >
>> >             On 09/11/2013 10:03 PM, Johan Tibell wrote:
>> >             > OK. But that doesn't create a problem for the code we
>> >             > output with the LLVM backend, no? Or do we support
>> >             > compiling some code with -fllvm and some not in the same
>> >             > executable?
>> >             >
>> >             >
>> >             > On Wed, Sep 11, 2013 at 6:56 PM, Geoffrey Mainland
>> >             > <mainland at apeiron.net> wrote:
>> >             >
>> >             >     We definitely have interop between the native codegen
>> >             >     and the LLVM back end now. Otherwise anyone who wanted
>> >             >     to use the LLVM back end would have to build GHC
>> >             >     themselves. Interop means that users can install the
>> >             >     Haskell Platform and still use -fllvm when it makes a
>> >             >     performance difference.
>> >             >
>> >             >     Geoff
>> >             >
>> >             >     On 09/11/2013 07:59 PM, Johan Tibell wrote:
>> >             >     > Do nothing different than you're doing for 7.8, we
>> >             >     > can sort it out later. Just put a comment on the
>> >             >     > primops saying they're LLVM-only. See e.g.
>> >             >     >
>> >             >     > https://github.com/ghc/ghc/blob/master/compiler/prelude/primops.txt.pp#L181
>> >             >     >
>> >             >     > for an example how to add docs to primops.
>> >             >     >
>> >             >     > I don't think we need interop between the native and
>> >             >     > the LLVM backends. We don't have that now do we (i.e.
>> >             >     > they use different calling conventions).
>> >             >     >
>> >             >     >
>> >             >     >
>> >             >     > On Wed, Sep 11, 2013 at 4:51 PM, Geoffrey Mainland
>> >             >     > <mainland at apeiron.net> wrote:
>> >             >     >
>> >             >     >     On 09/11/2013 07:44 PM, Johan Tibell wrote:
>> >             >     >     > On Wed, Sep 11, 2013 at 4:40 PM, Geoffrey Mainland
>> >             >     >     > <mainland at apeiron.net> wrote:
>> >             >     >     > > Do you mean we need a reasonable emulation of
>> >             >     >     > > the SIMD primops for the native codegen?
>> >             >     >     >
>> >             >     >     > Yes. Reasonable in the sense that it computes the
>> >             >     >     > right result. I can see that some code might still
>> >             >     >     > want to #ifdef (if the fallback isn't fast enough).
>> >             >     >
>> >             >     >     Two implications of this requirement:
>> >             >     >
>> >             >     >     1) There will not be SIMD in 7.8. I just don't have
>> >             >     >     the time. In fact, what SIMD support is there
>> >             >     >     already will have to be removed if we cannot live
>> >             >     >     with LLVM-only SIMD primops.
>> >             >     >
>> >             >     >     2) If we also require interop between the LLVM
>> >             >     >     back-end and the native codegen, then we cannot pass
>> >             >     >     any SIMD vectors in registers---they all must be
>> >             >     >     passed on the stack.
>> >             >     >
>> >             >     >     My plan, as discussed with Simon PJ, is to not
>> >             >     >     support SIMD primops at all with the native codegen.
>> >             >     >     If there is a strong feeling that this *is not* the
>> >             >     >     way to go, then I need to know ASAP.
>> >             >     >
>> >             >     >     Geoff
>> >             >     >
>> >             >     >
>> >             >     >
>> >             >
>> >             >
>> >
>> >
>> >
>> >
>>
>>
>