FW: Observation re Hugs GC problem on sparcs w/gcc -O2

Julian Seward (Intl Vendor) v-julsew@microsoft.com
Tue, 1 May 2001 04:55:08 -0700


Looking back through old mail.  I mailed this to the Hugs maintainers
about 2 years ago, but I think it is still relevant.  It contains a
suggestion about why GC doesn't always work right on Hugs on sparcs
when type.c/static.c/whatever are compiled -O/-O2, and an easy fix.

J

| -----Original Message-----
| From: Julian Seward (Intl Vendor) [mailto:v-julsew@microsoft.com]=20
| Sent: Thursday, April 22, 1999 10:13 AM
| To: 'hugs-bugs@cs.yale.edu'
| Cc: 'D.Wakeling@exeter.ac.uk'
| Subject: Observation re Hugs GC problem on sparcs w/gcc -O2
|=20
|=20
|=20
| Mark
|=20
| I seem to gather that (normal) Hugs has GC problems on Sparc=20
| with optimisation on.  David W also reported this recently.
|=20
| I got the impression that you think this happens because the=20
| register window mechanism magically obscures some registers=20
| which could hold roots, at the point when GC occurs.
|=20
| I would like to offer a different explanation, plus an easy=20
| fix.  When I was working for the UFO project at Manchester,=20
| we did a collector which took roots from the C stack.  It too=20
| had problems on sparcs, and the following observation saved=20
| the day.  I think it might equally apply to Hugs.
|=20
| In the code below, complexFnReturningAPair is obvious,
| and ... compute_and_alloc ... refers to some code which
| may cause a garbage collection.
|=20
|=20
| Pair complexFnReturningAPair ( ... )   { .... }
|=20
|=20
|    Pair p;
|=20
|    p =3D complexFnReturningAPair ( ... );
|    for ( ... ) {
|       ... fst(p) ... snd(p) ...
|       ... compute_and_alloc ...
|    }
|=20
| So: p becomes a pair.  We then enter a loop which refers to
| fst(p) and/or snd(p), and in which GC could happen. After cppisation,=20
| the code looks like
|=20
|    p =3D complexFnReturningAPair ( ... );
|    for ( ... ) {
|       ... heapTopFst[p] ... heapTopSnd[p] ...
|       ... compute_and_alloc ...
|    }
|=20
| Expanding the array address calculations gives
|=20
|    p =3D complexFnReturningAPair ( ... );
|    for ( ... ) {
|       ... heapTopFst + 4*p ... heapTopSnd + 4*p ...
|       ... compute_and_alloc ...
|    }
|=20
| Now the critical step, done by gcc when optimising: lift
| 4*p out of the loop since it's loop-invariant, and
| (incidentally) CSE the two 4*p's together:
|=20
|    tmp =3D 4 * complexFnReturningAPair ( ... );
|    for ( ... ) {
|       ... heapTopFst + tmp ... heapTopSnd + tmp ...
|       ... compute_and_alloc ...
|    }
|=20
| Blargh!  Now, inside the loop, there is no mention of the=20
| original p returned by complexFnReturningAPair, neither in=20
| regs nor on the stack.  So the pair won't be retained by GC=20
| even though it's still live in the loop.
|=20
| The moral of the story is this: when using an=20
| array-index-addressed heap, as Hugs does, we need to treat as=20
| a root not only valid array indexes into the heap, but also=20
| intermediates created by the array address calculations.  That is:
|=20
|    p                           sat -MAX_HEAP < p <=3D 0
|    p*sizeof(int)               sat -MAX_HEAP < p <=3D 0
|    heapTopFst+ p*sizeof(int)   sat -MAX_HEAP < p <=3D 0
|    heapTopSnd+ p*sizeof(int)   sat -MAX_HEAP < p <=3D 0
|=20
| Looking in machdep.c:gcCStack(), I only see the first of=20
| those four possibilities handled.  Perhaps the following=20
| would be better:
|=20
| #define Blargh markWithoutMove(*ptr); \
|                markWithoutMove((*ptr)/sizeof(Cell)); \
|                markWithoutMove((=20
| (void*)(*ptr)-(void*)heapTopFst)/sizeof(Cell));  \
|                markWithoutMove((
| (void*)(*ptr)-(void*)heapTopSnd)/sizeof(Cell))
|=20
| #define StackGrowsDown  { while (ptr<=3DCStackBase) { Blargh;=20
| ptr++; }; }
| #define StackGrowsUp    { while (ptr>=3DCStackBase) { Blargh;=20
| ptr--; }; }
| #define GuessDirection  if (ptr>CStackBase) StackGrowsUp else=20
| StackGrowsDown
|=20
| I tried this on x86 and it made no difference at all, since=20
| there isn't a problem for x86 anyway.  I'd be interested to=20
| hear if it makes any difference on Sparcs.  (note -- the code=20
| might not be exactly right). Check if you try it ...
|=20
| Finally, I think I know why the problem doesn't strike x86s. =20
| That is because the x86 has addressing modes of the form
|=20
|    (reg1 + k*reg2)    for k as 1, 2, 4 or 8
|=20
| So a reference to heapTopFst[p] becomes=20
|=20
|    ; reg1 holds p
|    ; reg2 holds heapTopFst
|    movl (reg1 + 4*reg2), reg3
|=20
| and there is never any point precomputing 4*p, since the=20
| architecture can do it for free.
|=20
| The entire diatribe applies not only to sparcs but other=20
| riscs which have lots of registers but only simple addressing=20
| modes.  Have you had problem reports for (eg) MIPS too?
|=20
| J
|=20