LLVM 3.2 failure

Austin Seipp aseipp at pobox.com
Thu Mar 14 22:36:04 CET 2013


My stage2 compiler got built and also fails on any compilation, no matter
how trivial. After linking stage2, my build fails with:

$ make
===--- building phase 0
make -r --no-print-directory -f ghc.mk phase=0 phase_0_builds
make[1]: Nothing to be done for `phase_0_builds'.
===--- building phase 1
make -r --no-print-directory -f ghc.mk phase=1 phase_1_builds
make[1]: Nothing to be done for `phase_1_builds'.
===--- building final phase
make -r --no-print-directory -f ghc.mk phase=final all
"inplace/bin/ghc-stage2" -static  -H64m -O0 -fllvm    -package-name
old-time-1.1.0.1 -hide-all-packages -i -ilibraries/old-time/.
-ilibraries/old-time/dist-install/build
-ilibraries/old-time/dist-install/build/autogen
-Ilibraries/old-time/dist-install/build
-Ilibraries/old-time/dist-install/build/autogen
-Ilibraries/old-time/include    -optP-include
-optPlibraries/old-time/dist-install/build/autogen/cabal_macros.h -package
base-4.7.0.0 -package old-locale-1.0.0.5  -XHaskell98 -XCPP
-XForeignFunctionInterface -O -fllvm  -no-user-package-db -rtsopts
-odir libraries/old-time/dist-install/build -hidir
libraries/old-time/dist-install/build -stubdir
libraries/old-time/dist-install/build -hisuf hi -osuf  o -hcsuf hc -c
libraries/old-time/dist-install/build/System/Time.hs -o
libraries/old-time/dist-install/build/System/Time.o
make[1]: *** [libraries/old-time/dist-install/build/System/Time.o]
Segmentation fault: 11


Just compiling 'hello world' is enough to trigger it, however:

$ cat hi.hs
main = putStrLn "hello world"
$ ./inplace/bin/ghc-stage2 -v3 -fforce-recomp hi.hs
           ⏎
Glasgow Haskell Compiler, Version 7.7.20130313, stage 2 booted by GHC
version 7.6.2
Using binary package database:
/Users/a/ghc/ghc-pristine/inplace/lib/package.conf.d/package.cache
wired-in package ghc-prim mapped to ghc-prim-0.3.1.0-inplace
wired-in package integer-gmp mapped to integer-gmp-0.5.1.0-inplace
wired-in package base mapped to base-4.7.0.0-inplace
wired-in package rts mapped to builtin_rts
wired-in package template-haskell mapped to template-haskell-2.9.0.0-inplace
wired-in package dph-seq not found.
wired-in package dph-par not found.
Hsc static flags:
wired-in package ghc-prim mapped to ghc-prim-0.3.1.0-inplace
wired-in package integer-gmp mapped to integer-gmp-0.5.1.0-inplace
wired-in package base mapped to base-4.7.0.0-inplace
wired-in package rts mapped to builtin_rts
wired-in package template-haskell mapped to template-haskell-2.9.0.0-inplace
wired-in package dph-seq not found.
wired-in package dph-par not found.
*** Chasing dependencies:
Chasing modules from: *hi.hs
Stable obj: []
Stable BCO: []
Ready for upsweep
  [NONREC
      ModSummary {
         ms_hs_date = 2013-03-14 19:58:30 UTC
         ms_mod = main:Main,
         ms_textual_imps = [import (implicit) Prelude]
         ms_srcimps = []
      }]
*** Deleting temp files:
Deleting:
compile: input file hi.hs
Created temporary directory:
/var/folders/f6/rjtvxfp92j3ffvm3zs7hv7vh0000gn/T/ghc39205_0
*** Checking old interface for main:Main:
[1 of 1] Compiling Main             ( hi.hs, hi.o )
*** Parser:
*** Renamer/typechecker:
*** Desugar:
Result size of Desugar (after optimization)
  = {terms: 7, types: 5, coercions: 0}
*** Simplifier:
[1]    39205 segmentation fault  ./inplace/bin/ghc-stage2 -v3
-fforce-recomp hi.hs

It says the segfault occurs near the simplifier, but I'm a little skeptical
this is the actual cause. Digging further with LLDB on my Mac OS X machine,
I can see this:

$ lldb /Users/a/ghc/ghc-pristine/inplace/lib/bin/ghc-stage2 --
-B/Users/a/ghc/ghc-pristine/inplace/lib -v3 -fforce-recomp hi.hs

Current executable set to
'/Users/a/ghc/ghc-pristine/inplace/lib/bin/ghc-stage2' (x86_64).
(lldb) r
Process 39787 launched:
'/Users/a/ghc/ghc-pristine/inplace/lib/bin/ghc-stage2' (x86_64)
Glasgow Haskell Compiler, Version 7.7.20130313, stage 2 booted by GHC
version 7.6.2
Using binary package database:
/Users/a/ghc/ghc-pristine/inplace/lib/package.conf.d/package.cache
wired-in package ghc-prim mapped to ghc-prim-0.3.1.0-inplace
wired-in package integer-gmp mapped to integer-gmp-0.5.1.0-inplace
wired-in package base mapped to base-4.7.0.0-inplace
wired-in package rts mapped to builtin_rts
wired-in package template-haskell mapped to template-haskell-2.9.0.0-inplace
wired-in package dph-seq not found.
wired-in package dph-par not found.
Hsc static flags:
wired-in package ghc-prim mapped to ghc-prim-0.3.1.0-inplace
wired-in package integer-gmp mapped to integer-gmp-0.5.1.0-inplace
wired-in package base mapped to base-4.7.0.0-inplace
wired-in package rts mapped to builtin_rts
wired-in package template-haskell mapped to template-haskell-2.9.0.0-inplace
wired-in package dph-seq not found.
wired-in package dph-par not found.
*** Chasing dependencies:
Chasing modules from: *hi.hs
Stable obj: []
Stable BCO: []
Ready for upsweep
  [NONREC
      ModSummary {
         ms_hs_date = 2013-03-14 19:58:30 UTC
         ms_mod = main:Main,
         ms_textual_imps = [import (implicit) Prelude]
         ms_srcimps = []
      }]
*** Deleting temp files:
Deleting:
compile: input file hi.hs
Created temporary directory:
/var/folders/f6/rjtvxfp92j3ffvm3zs7hv7vh0000gn/T/ghc39787_0
*** Checking old interface for main:Main:
[1 of 1] Compiling Main             ( hi.hs, hi.o )
*** Parser:
*** Renamer/typechecker:
*** Desugar:
Result size of Desugar (after optimization)
  = {terms: 7, types: 5, coercions: 0}
*** Simplifier:
Process 39787 stopped
* thread #1: tid = 0x1c03, 0x0000000101f9a153 ghc-stage2`threadPaused +
163, stop reason = EXC_BAD_ACCESS (code=1, address=0xfffffffffffffff8)
    frame #0: 0x0000000101f9a153 ghc-stage2`threadPaused + 163
ghc-stage2`threadPaused + 163:
-> 0x101f9a153:  movl   -8(%rax), %ecx
   0x101f9a156:  addl   $-31, %ecx
   0x101f9a159:  cmpl   $7, %ecx
   0x101f9a15c:  ja     0x101f9a2ed               ; threadPaused + 573

So the crash is actually happening in the RTS. Looking at the disassembly
of the current frame, we see:

(lldb) disassemble -f
ghc-stage2`threadPaused:
   0x101f9a0b0:  pushq  %rbp
   0x101f9a0b1:  pushq  %r15
   0x101f9a0b3:  pushq  %r14
   0x101f9a0b5:  pushq  %r13
   0x101f9a0b7:  pushq  %r12
   0x101f9a0b9:  pushq  %rbx
   0x101f9a0ba:  subq   $56, %rsp
   0x101f9a0be:  movq   %rsi, %rbx
   0x101f9a0c1:  movq   %rdi, 48(%rsp)
   0x101f9a0c6:  movq   %rbx, %rsi
   0x101f9a0c9:  callq  0x101f8f350               ;
maybePerformBlockedException
   0x101f9a0ce:  cmpw   $3, 32(%rbx)
   0x101f9a0d3:  je     0x101f9a540               ; threadPaused + 1168
   0x101f9a0d9:  movq   24(%rbx), %rax
   0x101f9a0dd:  movl   8(%rax), %ecx
   0x101f9a0e0:  leaq   24(%rax,%rcx,8), %rcx
   0x101f9a0e5:  movq   %rcx, 40(%rsp)
   0x101f9a0ea:  movq   16(%rax), %r14
   0x101f9a0ee:  xorl   %eax, %eax
   0x101f9a0f0:  movl   %eax, 12(%rsp)
   0x101f9a0f4:  leaq   1109(%rip), %r15          ; threadPaused + 1184
   0x101f9a0fb:  leaq   102230(%rip), %r12        ; stg_WHITEHOLE_info
   0x101f9a102:  movl   %eax, 8(%rsp)
   0x101f9a106:  movl   %eax, 32(%rsp)
   0x101f9a10a:  movl   %eax, %r13d
   0x101f9a10d:  jmpq   0x101f9a305               ; threadPaused + 597
   0x101f9a112:  movslq (%r15,%rcx,4), %rcx
   0x101f9a116:  addq   %r15, %rcx
   0x101f9a119:  jmpq   *%rcx
   0x101f9a11b:  cmpq   1097902(%rip), %rax       ; (void
*)0x0000000101fb4a28: stg_marked_upd_frame_info
   0x101f9a122:  jne    0x101f9a164               ; threadPaused + 180
   0x101f9a124:  testl  %r13d, %r13d
   0x101f9a127:  je     0x101f9a310               ; threadPaused + 608
   0x101f9a12d:  movl   32(%rsp), %r13d
   0x101f9a132:  addl   %r13d, 8(%rsp)
   0x101f9a137:  addl   $2, 12(%rsp)
   0x101f9a13c:  jmpq   0x101f9a310               ; threadPaused + 608
   0x101f9a141:  nopw   %cs:(%rax,%rax)
   0x101f9a150:  movq   (%r14), %rax
-> 0x101f9a153:  movl   -8(%rax), %ecx
   0x101f9a156:  addl   $-31, %ecx
   0x101f9a159:  cmpl   $7, %ecx
   0x101f9a15c:  ja     0x101f9a2ed               ; threadPaused + 573
   ... lots more code

The segfaulting instruction attempts to load ECX from RAX, but RAX is null,
hence the access violation on 0xfffffffffffffff8

(lldb) register read
General Purpose Registers:
       rax = 0x0000000000000000
       rbx = 0x00000001061d4000
       rcx = 0x00000000fffffff0
       rdx = 0x00000001024e10a0  ghc-stage2`large_alloc_lim
       rdi = 0x00000001024d8540  ghc-stage2`MainCapability
       rsi = 0x00000001061d4000
       rbp = 0x00000001020c6760
 ghczm7zi7zi20130313_Demand_cprProdSig_closure + 16
       rsp = 0x00007fff5fbfb3f0
        r8 = 0x00000001020c8789
 ghczm7zi7zi20130313_IdInfo_MayHaveCafRefs_closure + 1
        r9 = 0x00000001020c8779
 ghczm7zi7zi20130313_IdInfo_NoLBVarInfo_closure + 1
       r10 = 0x0000000100685920  ghc-stage2`s5or_info
       r11 = 0x00000001051e98c8
       r12 = 0x0000000101fb3058  ghc-stage2`stg_WHITEHOLE_info
       r13 = 0x0000000000000000
       r14 = 0x00000001020c67d8
 ghczm7zi7zi20130313_Demand_worthSplittingThunk_closure + 8
       r15 = 0x0000000101f9a550  ghc-stage2`threadPaused + 1184
       rip = 0x0000000101f9a153  ghc-stage2`threadPaused + 163
    rflags = 0x0000000000010287
        cs = 0x000000000000002b
        fs = 0x0000000000000000
        gs = 0x0000000000000000


So there's still something weird going on. Looking at rts/ThreadPaused.c in
threadPaused, we see some code like:

    while ((P_)frame < stack_end) {
        info = get_ret_itbl(frame);

        switch (info->i.type) {

        case UPDATE_FRAME:

            // If we've already marked this frame, then stop here.
            if (frame->header.info == (StgInfoTable
*)&stg_marked_upd_frame_info) {
                if (prev_was_update_frame) {
                    words_to_squeeze += sizeofW(StgUpdateFrame);
                    weight += weight_pending;
                    weight_pending = 0;
                }
                goto end;
            }

            SET_INFO(frame, (StgInfoTable *)&stg_marked_upd_frame_info);

            bh = ((StgUpdateFrame *)frame)->updatee;
            bh_info = bh->header.info;

Which I believe roughly corresponds to this assembly:

   0x101f9a11b:  cmpq   1097902(%rip), %rax       ; (void
*)0x0000000101fb4a28: stg_marked_upd_frame_info
   0x101f9a122:  jne    0x101f9a164               ; threadPaused + 180 #
check if frame->header.info = stg_marked_upd_frame_info
   0x101f9a124:  testl  %r13d, %r13d             ; Check if
prev_was_update_frame == 0
   0x101f9a127:  je     0x101f9a310               ; threadPaused + 608; if
prev_was_update_frame == 0
   0x101f9a12d:  movl   32(%rsp), %r13d       ; increment words_to_squeeze,
etc
   0x101f9a132:  addl   %r13d, 8(%rsp)          ; same as above
   0x101f9a137:  addl   $2, 12(%rsp)              ; same as above
   0x101f9a13c:  jmpq   0x101f9a310               ; threadPaused + 608;
exit if frame->header.info == stg_marked_up_frame_info
   0x101f9a141:  nopw   %cs:(%rax,%rax)        ; Crap opcodes for alignment
   0x101f9a150:  movq   (%r14), %rax              ; Load info into RAX
-> 0x101f9a153:  movl   -8(%rax), %ecx          ; deref info->i.type

Due to optimization settings, the code is rather reorganized and coalesced
for being nice to the processor, but the segfault actually occurs on this
line:

    while ((P_)frame < stack_end) {
        info = get_ret_itbl(frame);

        switch (info->i.type) { <- SEGFAULT HERE

So this seems to be some interaction between the compiler and info table
layout, possibly? If we rebuild the stage2 compiler with a debug RTS and
disassemble with source, we see the same thing:

$ lldb ...
...
...
*** Desugar:
Result size of Desugar (after optimization)
  = {terms: 7, types: 5, coercions: 0}
*** Simplifier:
Process 42406 stopped
* thread #1: tid = 0x1c03, 0x0000000101fb79c1
ghc-stage2`threadPaused(cap=0x0000000102521980, tso=0x00000001060d0000) +
177 at ThreadPaused.c:223, stop reason = EXC_BAD_ACCESS (code=1,
address=0xfffffffffffffff8)
    frame #0: 0x0000000101fb79c1
ghc-stage2`threadPaused(cap=0x0000000102521980, tso=0x00000001060d0000) +
177 at ThreadPaused.c:223
   220     while ((P_)frame < stack_end) {
   221         info = get_ret_itbl(frame);
   222
-> 223 switch (info->i.type) {
   224
   225 case UPDATE_FRAME:
   226
(lldb) disassemble -f -m
ghc-stage2`threadPaused + 157 at ThreadPaused.c:221
   220     while ((P_)frame < stack_end) {
   221         info = get_ret_itbl(frame);
   222
   0x101fb79ad:  movq   -32(%rbp), %rax
   0x101fb79b1:  movq   %rax, %rdi
   0x101fb79b4:  callq  0x101fb7730               ; get_ret_itbl at
ClosureMacros.h:88
   0x101fb79b9:  movq   %rax, -40(%rbp)
ghc-stage2`threadPaused + 173 at ThreadPaused.c:223
   222
   223 switch (info->i.type) {
   224
   0x101fb79bd:  movq   -40(%rbp), %rax
-> 0x101fb79c1:  movl   16(%rax), %eax
   0x101fb79c4:  leal   -37(%rax), %ecx
   0x101fb79c7:  cmpl   $2, %ecx
   0x101fb79ca:  movl   %eax, -96(%rbp)
   0x101fb79cd:  jb     0x101fb7c56               ; threadPaused + 838 at
ThreadPaused.c:342
   0x101fb79d3:  movl   -96(%rbp), %eax
   0x101fb79d6:  cmpl   $35, %eax
   0x101fb79d9:  jne    0x101fb7c58               ; threadPaused + 840 at
ThreadPaused.c:352
ghc-stage2`threadPaused + 207 at ThreadPaused.c:228
   227             // If we've already marked this frame, then stop here.
   228             if (frame->header.info == (StgInfoTable
*)&stg_marked_upd_frame_info) {
   229                 if (prev_was_update_frame) {
   0x101fb79df:  movq   -32(%rbp), %rax
   0x101fb79e3:  movq   (%rax), %rax
   0x101fb79e6:  leaq   210779(%rip), %rcx        ;
stg_marked_upd_frame_info
   0x101fb79ed:  leaq   (%rcx), %rcx
   0x101fb79f0:  cmpq   %rcx, %rax
   0x101fb79f3:  jne    0x101fb7a1d               ; threadPaused + 269 at
ThreadPaused.c:237
...
...
...


Geoffrey, if you have time, can you confirm this behavior on your Linux
machine with LLVM 3.2? I think we should really fix this; it's rather
unfortunate if we have to tell users to use some specific LLVM 3.3 SVN
revision, or stay on 3.1 (and it's a pain to keep multiple LLVM installs
synchronized.) On that note, we should test this with 3.1 as well possibly.

You can rebuild a stage2 GHC with the debug RTS very easily - that'll give
you RTS source and extra sanity checks, etc. Just run it under GDB instead
and look at the trace. You can rebuild stage2 by saying (from the top-level
source directory.)

$ cd ghc
$ make re2 GhcDebugged=YES

And the new inplace/bin/ghc-stage2 compiler will have the debug runtime
enabled. I don't have a lot of more time to dig at this exact moment. I'll
look more tonight when I have time.


On Thu, Mar 14, 2013 at 2:23 PM, Geoffrey Mainland <mainland at apeiron.net>wrote:

> My stage 2 compiler was crashing the first time it was invoked.
>
> I just finished building GHC HEAD using LLVM compiled from HEAD, and that
> worked, so perhaps this was just a 3.2 bug. I have yet to run the
> testsuite though.
>
> Geoff
>
> On 03/14/2013 07:16 PM, Jan Stolarek wrote:
> > Goeff, Austin,
> >
> > do the build errors always happen with the same module or do they
> occur randomly? If the former,
> > which modules do you have problems with?
> >
> > Janek
>
>
>


-- 
Regards,
Austin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20130314/577a9a53/attachment-0001.htm>


More information about the ghc-devs mailing list