Compacting GC interacting with new codegen strangely
benl at ouroborus.net
Mon Feb 21 00:57:30 CET 2011
On 20/02/2011, at 11:57 AM, Edward Z. Yang wrote:
> If I:
> - Turn off compacting GC
> - Reduce the size of master-data
> - Turn off optimizations
> - Use the old codegen
> - Put all of the code in one file
> - Remove the seqs from 'sort' (which isn't actually a sort)
> - Remove the seqs from 'main'
> - Make the sort function monomorphic on Char
> ...the segfault goes away.
> I've been trying to figure out what C-- is to blame, but I can't
> discount the possibility that the new codegen is right and just
> tickling a bug in the compacting GC. The segfault manifests itself
> as someone forgetting to tag/untag a pointer, and then accessing
> that pointer results in a corrupted dereferenced pointer that
> causes the segfault. But I'm having a hard time tracking what
> should be tagged and what should not be tagged, as well as the
> interaction of pointer tagging and GC.
> Thoughts and suggestions would be appreciated.
As you suggest, it may well be a bug in the runtime system or libraries caused by a pointer not being untagged. I have fixed a few of these myself. The hard-and-fast rule is that every pointer read from the heap must be untagged before it is dereferenced. Untagging a pointer that is already untagged, such as a thunk pointer, won't hurt anything.
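To make the rule concrete, here is a minimal sketch of what untagging amounts to. It is not the RTS code: the names sketch_untag, sketch_get_tag and SKETCH_TAG_MASK are made up for illustration, and it assumes 3 tag bits as on a 64-bit build (a 32-bit build would use 2). The RTS has its own macros for this.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 3 low bits are used for the tag (64-bit build). */
#define SKETCH_TAG_BITS 3
#define SKETCH_TAG_MASK ((uintptr_t)((1u << SKETCH_TAG_BITS) - 1))

/* Hypothetical stand-in for the RTS untag macro: clear the low tag bits. */
static void *sketch_untag(const void *p)
{
    return (void *)((uintptr_t)p & ~SKETCH_TAG_MASK);
}

/* Hypothetical helper: extract the tag stored in the low bits. */
static unsigned sketch_get_tag(const void *p)
{
    return (unsigned)((uintptr_t)p & SKETCH_TAG_MASK);
}

/* Usage: untagging twice gives the same address as untagging once. */
static void sketch_demo(void)
{
    uintptr_t raw = 0x1000;            /* made-up, 8-byte aligned address */
    void *tagged = (void *)(raw | 2);  /* pretend the tag value is 2 */

    assert(sketch_get_tag(tagged) == 2);
    assert(sketch_untag(tagged) == (void *)raw);
    assert(sketch_untag(sketch_untag(tagged)) == (void *)raw);
}
```

Since untagging is just a mask, applying it to an already clean pointer is a no-op, which is why it is safe to untag defensively.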
I'd start with http://hackage.haskell.org/trac/ghc/wiki/Debugging/RuntimeSystem
and then move on to http://hackage.haskell.org/trac/ghc/wiki/Debugging/CompiledCode
If you can find the exact place in the RTS code where it's crashing, look at the source and see whether the pointer being dereferenced has been untagged. If not, use the untag macro and see if that helps. You can also printf a suspicious pointer from the RTS code: if any of its lowest 2 bits (on 32-bit) or 3 bits (on 64-bit) are set, then you've found the problem.
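For the printf check, something along these lines may do. It is only a sketch: sketch_report_pointer is a hypothetical helper, not an RTS function, and it assumes heap pointers are word-aligned, so the mask is derived from the pointer size (low 2 bits with 4-byte words, low 3 bits with 8-byte words).

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: word-aligned heap pointers, so the low bits are free for tags.
   sizeof(void *) - 1 is 3 (2 bits) on 32-bit and 7 (3 bits) on 64-bit. */
#define SKETCH_LOW_BITS_MASK ((uintptr_t)(sizeof(void *) - 1))

/* Hypothetical debugging helper: print a pointer and its low bits to stderr.
   Returns 1 if any of the low bits are set, 0 otherwise. */
static int sketch_report_pointer(const char *label, const void *p)
{
    unsigned low = (unsigned)((uintptr_t)p & SKETCH_LOW_BITS_MASK);

    fprintf(stderr, "%s: %p (low bits = %u)%s\n",
            label, (void *)p, low,
            low ? "  <-- still tagged?" : "");
    return low != 0;
}
```

Calling it just before the dereference that crashes, for example sketch_report_pointer("before deref", p), shows whether the pointer reaching that point still carries a tag.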
It can also help to run your code on an architecture that traps on misaligned memory accesses, such as SPARC. Because x86 will happily dereference a misaligned pointer, your program can limp along with a corrupted heap for some time before finally segfaulting.