cvs commit: fptools/ghc/rts Main.c

Simon Marlow simonmar@microsoft.com
Mon, 6 Aug 2001 10:57:15 +0100



> I have questions for all you seasoned GHC developers:
>
> On 2001-07-26T02:26:34-0700, Julian Seward (Intl Vendor) wrote:
> > Sounds like a potential memory management/corruption bug.
> > Do you have an example program which causes this to happen
> > on Alpha, so we can see if it can be repro'd on other plats?
>
> (See additional quoted message for original commit log.)
>
> The original problem was simply that the conc004 test
> (ghc/tests/concurrent/should_run/conc004.hs) fails on alpha-dec-osf3
> (my homegrown version with patches on top of ghc-5.00.2), but not
> i686-pc-linux-gnu (the distributed version ghc-5.00.2).  I thought I
> had magically fixed it by adding the initialization call to tzset(),
> in my commit.  It did allow conc004 to complete successfully on
> alpha-dec-osf3.
>
> As it recently turned out, it doesn't seem to have.  Once in a
> while, when using the abovementioned version of ghc-5.00.2 on
> alpha-dec-osf3 to compile GHC itself, I still get a segmentation
> fault.  It happens in Schedule.c: Right after the
>
>     switch (cap->rCurrentTSO->what_next) {
>         ...
>     }
>
> there's the innocuous line
>
>     t = cap->rCurrentTSO;
>
> but cap is null by then, not &MainRegTable anymore!  It appears that
> gcc put the cap variable in register $s3 (same as the register used
> for Hp), but StgRun is supposed to have saved $s3 on the stack and
> restored it on return.

Ok, this looks like a classic case of the stack being clobbered
somewhere in STG land.  The slot in which $s3 was saved is being
overwritten with a NULL somehow.  I've taken a quick look at the code in
StgCRun for the Alpha, and it looks reasonable, so probably the best way
to track this down is to find a repeatable example and use a watchpoint
in GDB to find who's stomping on the stack.

> I'm not sure what's going on here.
>
> I then discovered sanity checking, so I started running ghc-inplace
> with "+RTS -D128 -RTS".  It would always seg fault:
>
>     puffin:~/u/glasgow/puffin2/ghc/lib/std$ gdb ../../compiler/ghc-5.01 core
>     GNU gdb 4.18
>     Copyright 1998 Free Software Foundation, Inc.
>     GDB is free software, covered by the GNU General Public License, and you are
>     welcome to change it and/or distribute copies of it under certain conditions.
>     Type "show copying" to see the conditions.
>     There is absolutely no warranty for GDB.  Type "show warranty" for details.
>     This GDB was configured as "alphaev56-dec-osf4.0e"...
>     Core was generated by `ghc-5.01'.
>     Program terminated with signal 11, Segmentation fault.
>     #0  0x120d6fe98 in checkClosure (p=0x183bae1c8) at Sanity.c:220
>     220             ASSERT(!closure_STATIC(p));
>     (gdb) bt
>     #0  0x120d6fe98 in checkClosure (p=0x183bae1c8) at Sanity.c:220
>     #1  0x120d70c78 in checkHeap (bd=0x183b02b80) at Sanity.c:472
>     #2  0x120d611b4 in checkSanity () at Storage.c:736
>     #3  0x120d6b850 in GarbageCollect (get_roots=0x120d5e470 <GetRoots>,
>         force_major_gc=304) at GC.c:923
>     #4  0x120d5db40 in schedule () at Schedule.c:1222
>     #5  0x120d5e3cc in waitThread (tso=0x1800bc000, ret=0x0) at Schedule.c:1956
>     #6  0x120d69850 in rts_evalIO (p=0x140098620, ret=0x0) at RtsAPI.c:421
>     #7  0x120d58ae0 in main (argc=18, argv=0x11fffc018) at Main.c:120
>
> (Note: HEAP_BASE is 0x180000000.)  I don't know what we mean by "slop"
> in the GC/Sanity code, but this seems to be slop at my first glance:
>
>     (gdb) p p
>     $1 = (StgClosure *) 0x183bae1c8
>     (gdb) p *p
>     $2 = {header = {info = 0x42000001}, payload = 0x183bae1d0}
>     (gdb) p *((StgSelector*)p)
>     $3 = {header = {info = 0x42000001}, selectee = 0x120849770}
>     (gdb) up
>     #1  0x120d70c78 in checkHeap (bd=0x183b02b80) at Sanity.c:472
>     472                 nat size = checkClosure((StgClosure *)p);
>     (gdb) p *bd
>     $4 = {start = 0x183bae000, free = 0x183baefe8, link = 0x183b02940, u = {
>         back = 0x0, bitmap = 0x0}, gen_no = 1, step = 0x140200000, blocks = 1,
>       flags = 0, _padding = {0, 0}}
>
> Is this a real sanity problem, or a problem with the sanity check
> code?  Is there another strategy that anyone would suggest for
> tracking down the original segfault-in-scheduler bug?

It's hard to tell what the problem is, without seeing the contents of
the memory around p.  I've attached my .gdbinit file which has a few
useful macros in it - in particular I like to use p4 & p8 which print
out the 4 or 8 words starting from the given address as addresses (i.e.
looking up symbols).  So you would do 'p8 p-4' to print the 8 words
around p, for example.  And 'pinfo' dumps an info table, so 'pinfo *p'
gives the info table of the closure pointed to by p.
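
For readers without GDB to hand, here is a rough C model of the address
walk that the pmem macro in the attached gdb.init performs (p4 and p8
call it with a count of 4 and 8).  It is only a sketch: the function name
pmem_addresses and the explicit stride parameter are made up for
illustration.  The macro itself casts its argument to `int *`, so on a
target where int is 4 bytes and pointers are 8 bytes (as on the Alpha)
the walk advances in 4-byte steps rather than whole words, and it visits
count + 1 addresses, highest first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the pmem macro's loop:
 *     set $i = $arg1
 *     while $i >= 0  ...  x/1a (((int *)$arg0) + $i)  ...  set $i = $i - 1
 * Writes base + i*stride for i = count, count-1, ..., 0 into out[]
 * and returns how many addresses were written (count + 1).
 * `stride` stands in for sizeof(int) on the debugged target. */
size_t pmem_addresses(uint64_t base, int count, size_t stride, uint64_t *out)
{
    size_t n = 0;

    assert(count >= 0 && out != NULL);
    for (int i = count; i >= 0; i--) {
        out[n++] = base + (uint64_t)i * stride;
    }
    return n;
}
```

Calling pmem_addresses(base, 8, 4, out) fills out[] with nine addresses
from base + 32 down to base; passing a stride of 8 instead models a walk
in 64-bit words.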

The sanity check code is working fine on x86, BTW.  It's probably an
incorrect 32-bit assumption somewhere.
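
As an illustration of the kind of 32-bit assumption meant here (not a
diagnosis of the actual bug), the sketch below applies the address
arithmetic from the pbdescr macro in the attached gdb.init, which was
written for x86, to the closure address from the session above.  The
function names are hypothetical.  The second function is purely a guess
that happens to reproduce the bd value GDB printed for this one address;
it is not taken from the RTS sources.

```c
#include <assert.h>
#include <stdint.h>

/* Formula copied from the pbdescr macro in the attached gdb.init.
 * All three masks are 32-bit constants, so bits 32 and up are lost. */
uint64_t bdescr_x86_macro(uint64_t p)
{
    return ((p & 0xfff00000u) | ((p & 0xff000u) >> 7)) & 0xffffffe0u;
}

/* ASSUMPTION: a 64-bit variant with full-width masks and a shift of 6.
 * Chosen only because it matches the numbers in the GDB session. */
uint64_t bdescr_wide_guess(uint64_t p)
{
    return ((p & ~UINT64_C(0xfffff)) | ((p & UINT64_C(0xff000)) >> 6))
           & ~UINT64_C(0x3f);
}

/* Check both formulas against the values quoted in the session:
 * p = 0x183bae1c8 and checkHeap's bd = 0x183b02b80. */
void check_against_session(void)
{
    uint64_t p = UINT64_C(0x183bae1c8);

    assert(bdescr_x86_macro(p)  == UINT64_C(0x83b015c0));   /* top bit dropped */
    assert(bdescr_wide_guess(p) == UINT64_C(0x183b02b80));  /* matches bd */
}
```

With the x86 masks the result is 0x83b015c0, which falls below the
HEAP_BASE of 0x180000000 mentioned above, so macros such as pbdescr,
pgen and getmark would need adjusting before being trusted on the Alpha.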

Cheers,
	Simon

Attachment: gdb.init

    define pregs
    print *(StgRegTable *)$ebx
    end

    define ptso
    print *((StgRegTable *)$ebx)->rCurrentTSO
    end

    define pR1
    print (((StgRegTable)MainRegTable).rR1)
    end
    define pR2
    print (((StgRegTable)MainRegTable).rR2)
    end
    define pR3
    print (((StgRegTable)MainRegTable).rR3)
    end
    define pR4
    print (((StgRegTable)MainRegTable).rR4)
    end
    define pR5
    print (((StgRegTable)MainRegTable).rR5)
    end
    define pR6
    print (((StgRegTable)MainRegTable).rR6)
    end
    define pR7
    print (((StgRegTable)MainRegTable).rR7)
    end
    define pR8
    print (((StgRegTable)MainRegTable).rR8)
    end
    define pFlt1
    print (StgFloat) (((StgRegTable)MainRegTable).rFlt1)
    end
    define pDbl1
    print (StgDouble) (((StgRegTable)MainRegTable).rDbl1)
    end

    define pSp
    print (((StgRegTable)MainRegTable).rSp)
    end
    define pSu
    print (((StgRegTable)MainRegTable).rSu)
    end
    define pSpLim
    print (((StgRegTable)MainRegTable).rSpLim)
    end

    define pHp
    print (((StgRegTable)MainRegTable).rHp)
    end
    define pHpLim
    print (((StgRegTable)MainRegTable).rHpLim)
    end

    define pstk
    pmem $ebp 16
    end

    define pstk_gc
    pmem MainTSO->sp 16
    end

    define pmem
    set $i = $arg1
    while $i >= 0
    x/1a (((int *)$arg0) +$i)
    set $i = $i - 1
    end
    end

    define p4
    pmem $arg0 4
    end

    define p8
    pmem $arg0 8
    end

    define p16
    pmem $arg0 16
    end

    define pmem_forwards
    set $i = 0
    while $i < $arg1
    x/1a (((int *)$arg0) + $i)
    set $i = $i + 1
    end
    end

    define pheap
    pmem $edi-16 16
    end

    define dsi
    display /i $pc
    si
    end

    define pinfo
    p *((StgInfoTable *)$arg0-1)
    end

    define pbdescr
    p * ((bdescr *)((($arg0 & 0xfff00000) | (($arg0 & 0xff000) >> 7)) & 0xffffffe0))
    end

    define pgen
    p generations[((bdescr *)((($arg0 & 0xfff00000) | (($arg0 & 0xff000) >> 7)) & 0xffffffe0))->gen_no]
    p * ((bdescr *)((($arg0 & 0xfff00000) | (($arg0 & 0xff000) >> 7)) & 0xffffffe0))->step
    end

    define getmark
    set $bd = (bdescr *)((($arg0 & 0xfff00000) | (($arg0 & 0xff000) >> 7)) & 0xffffffe0)
    set $offset = (StgPtr)$arg0 - $bd->start
    set $bitmap_word = $bd->u.bitmap + ($offset / 32)
    set $mask = 1 << ($offset & 31)
    p (*$bitmap_word & $mask) != 0
    end

    # ignore SIGPIPEs
    handle SIGPIPE nostop noprint ignore