simd branch ready for review

Thu Jan 31 22:25:55 CET 2013

On 31 January 2013 12:30, Geoffrey Mainland <mainland at apeiron.net> wrote:
> On 01/31/2013 07:10 PM, David Terei wrote:
>> On 31 January 2013 09:52, Geoffrey Mainland <mainland at apeiron.net> wrote:
>>> On 01/31/2013 12:56 PM, Simon Marlow wrote:
>>>> On 31/01/13 11:38, Geoffrey Mainland wrote:
>>>>> * Win32 issues
>>>>>
>>>>> Modern 32-bit x86 *NIX systems align the stack to 16 bytes, but Win32
>>>>> aligns only to 4 bytes. LLVM does not assume 16-byte stack
>>>>> alignment. Instead, on platforms where 16-byte stack alignment is not
>>>>> guaranteed, it 1) always outputs a function prologue that 2) aligns
>>>>> the stack to a 16-byte boundary with an "and" instruction, and it
>>>>> also 3) disables tail calls. Because LLVM aligns the stack for a
>>>>> function that has SSE register spills, it also generates movaps
>>>>> instructions (aligned SSE moves) for the spills.
>>>>
>>>> I must be misunderstanding your use of "always" above, because that
>>>> would imply that the LLVM backend doesn't work on Win32 at all. Maybe
>>>> LLVM only aligns the stack when it needs to store SSE values?
>>>
>>> You are correct---the stack-aligning prologue is only added by LLVM when
>>> SSE values are written to the stack, so this wasn't a problem before we
>>> had SSE support.
>>>
>>>>> This makes SSE support on Win32 difficult, and in my opinion not
>>>>> worth worrying about.
>>>>>
>>>>> The alternative is to 1) patch LLVM to disable the stack-alignment
>>>>> code so that we recover the ability to use tail calls and so that ebp
>>>>> is not scribbled over by the prologue, and 2) patch the mangler to rewrite
>>>>> LLVM's movaps (move aligned) instructions to movups (move unaligned)
>>>>> instructions. I have these patches, but they are not included in the
>>>>> simd branch.
>>>>
>>>> I don't have an opinion here - maybe ask David T what he'd prefer.
>>>
>>> Requiring an LLVM hack seems pretty bad, and David yelled when I changed
>>> the mangler since he wants to get rid of it eventually. My patches are
>>> still around, so if we decide Win32 support is important, I can always
>>> add the changes.
>>
>> Not supporting Win32 sucks but yes, I want to move to just requiring
>> LLVM un-patched and no mangler. How ugly are the patches for LLVM? I'd
>> be supportive of it if the plan is to get them merged upstream.
>> Otherwise, I don't think it is worth the effort of having to carry
>> around our own patched LLVM for installation on Windows.
>
> The patch against LLVM 3.0 is here:
>
> https://github.com/mainland/ghc-simd-tests/blob/master/patches/llvm-3.0.patch
>
> If you were to look, you'd see that it's not appropriate for upstream
> integration. Please don't look :)

Done :).

>
> Since we have support for Win64 as of GHC 7.6, I vote that we forget
> about Win32 support for SSE.

Yes, I meant to ask about Win64. Strongly agreed.

>
> Simon, this reminds me of two other issues...
>
> 1) SSE vector values are only passed in registers on x86-64 anyway right
> now. MAX_REAL_FLOAT_REG and MAX_REAL_DOUBLE_REG are both #defined to 0
> on x86-32 in includes/stg/MachRegs.h. Are floats and doubles not passed
> in registers on x86-32? I'm confused as to how this works. The GHC
> calling convention in LLVM certainly says they are passed in registers.

Not on x86-32. From the LLVM userguide on the GHC calling convention:

"On X86-32 only supports up to 4 bit type parameters. No floating
point types are supported.
On X86-64 only supports up to 10 bit type parameters and 6 floating
point parameters."
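
To make the consequence concrete, here is a minimal sketch in C of how a
per-architecture register count of this kind decides whether an argument
travels in a register or on the stack. The SKETCH_ names are hypothetical
and are not taken from MachRegs.h. The value 0 for x86-32 follows the
discussion above; the value 6 for x86-64 is an assumption borrowed from the
"6 floating point parameters" in the quote, not from GHC's headers.

```c
/* Hypothetical stand-in for a per-architecture define such as
   MAX_REAL_FLOAT_REG. Values are assumptions for illustration only. */
#if defined(__x86_64__) || defined(_M_X64)
#define SKETCH_MAX_REAL_FLOAT_REG 6   /* assumed, from the LLVM doc quote */
#else
#define SKETCH_MAX_REAL_FLOAT_REG 0   /* no float registers, as on x86-32 */
#endif

/* Returns 1 if the n-th (1-based) floating point argument would be passed
   in a register under this sketch, and 0 if it would go on the stack. */
int sketch_float_arg_in_register(int n)
{
    return n >= 1 && n <= SKETCH_MAX_REAL_FLOAT_REG;
}
```

With the count set to 0, every floating point argument falls through to the
stack, which matches "No floating point types are supported" on x86-32.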

>
> 2) SSE support is processor and platform dependent. What is the proper
> way for the programmer to know what SSE primitives are available? A CPP
> define? If so, what should it be called?
>
> Right now one can look at the TARGET_* and __GLASGOW_HASKELL_LLVM__ CPP
> macros and make a decision as to whether or not SSE primitives are
> available, but that's not a great solution. Also, what happens when we
> want to add AVX support? How do we control the inclusion of AVX support
> when building GHC, and how do we let the programmer know that the AVX
> primops/primtypes are available for use?
>
> Geoff
>
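
For comparison, C compilers already answer the "what is available?" question
with predefined macros. The sketch below is only an analogy for what a GHC
CPP define could look like, not a proposal for its name: __SSE2__ and
__AVX__ are the macros GCC and Clang predefine when those instruction sets
are enabled, while every SKETCH_ and sketch_ name is hypothetical.

```c
/* __SSE2__ and __AVX__ are predefined by GCC and Clang when the matching
   instruction set is enabled (for example with -msse2 or -mavx).
   SKETCH_SIMD_WIDTH_BYTES is a hypothetical name for illustration. */
#if defined(__AVX__)
#define SKETCH_SIMD_WIDTH_BYTES 32   /* 256-bit AVX vectors */
#elif defined(__SSE2__)
#define SKETCH_SIMD_WIDTH_BYTES 16   /* 128-bit SSE vectors */
#else
#define SKETCH_SIMD_WIDTH_BYTES 0    /* no vector support assumed */
#endif

/* Widest vector, in bytes, that this translation unit was compiled for. */
int sketch_simd_width_bytes(void)
{
    return SKETCH_SIMD_WIDTH_BYTES;
}

/* Feature predicates derived from the single width macro, so that AVX
   availability always implies SSE availability. */
int sketch_have_sse(void) { return SKETCH_SIMD_WIDTH_BYTES >= 16; }
int sketch_have_avx(void) { return SKETCH_SIMD_WIDTH_BYTES >= 32; }
```

Deriving both predicates from one width value keeps user code from having
to combine target and backend macros by hand, which is the weakness of the
TARGET_* plus __GLASGOW_HASKELL_LLVM__ approach described above.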