behaviour of ghci on .c modules that are part of a library

Axel Simon Axel.Simon at in.tum.de
Wed Jul 14 10:51:49 EDT 2010


Hi all,

I'm trying to debug a segfault relating to the memory management in
Gtk2Hs. Rather than make you read the ticket
http://hackage.haskell.org/trac/gtk2hs/ticket/1183, I'll describe the
problem:

- compiler: GHC 6.12.1 or 6.12.3
- darcs head of Gtk2Hs with #define DEBUG instead of #undef DEBUG in
  gtk/Graphics/UI/Gtk/General/hsthread.c
- platform: Ubuntu Linux, x86-64
- to reproduce: cd gtk2hs/gtk/demo/hello, run ghci World.hs and
  type 'main'

A window with the "Hello World" button appears. After a few seconds,
the GC runs and the finaliser of the GtkButton is run, since the
Haskell program no longer holds a reference to that object (only the
GtkWindow in C land does).

Thus, the GC calls a C function gtk2hs_g_object_unref_from_mainloop  
which is supposed to enqueue the object into a global data structure  
from which objects are later taken and g_object_unref is called on them.
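The enqueue-now, release-later scheme described here can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Gtk2Hs code: the names enqueue_for_release, drain_release_queue, noop_release and QUEUE_CAPACITY are hypothetical, a plain pthread mutex stands in for the GLib static mutex, and a fixed-size array stands in for whatever container the library really uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins: these names are illustrative and are not the
   identifiers used in Gtk2Hs's hsthread.c. */
#define QUEUE_CAPACITY 64

typedef void (*release_fn)(void *object);

static void *pending[QUEUE_CAPACITY];   /* objects awaiting release */
static size_t pending_count = 0;
static pthread_mutex_t pending_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Called from the finaliser: only records the object, never releases it.
   Returns 0 on success, -1 if the fixed-size queue is full. */
int enqueue_for_release(void *object) {
    int result = -1;
    pthread_mutex_lock(&pending_mutex);
    if (pending_count < QUEUE_CAPACITY) {
        pending[pending_count++] = object;
        result = 0;
    }
    pthread_mutex_unlock(&pending_mutex);
    return result;
}

/* Called later from the main loop: takes every queued object and hands it
   to `release` (g_object_unref would play this role in the real library).
   Returns how many objects were released. */
size_t drain_release_queue(release_fn release) {
    void *batch[QUEUE_CAPACITY];
    size_t n, i;

    assert(release != NULL);

    /* Copy the queue out under the lock, then release outside it, so the
       callback cannot deadlock by enqueueing again. This is a choice made
       for the sketch, not a statement about the real code. */
    pthread_mutex_lock(&pending_mutex);
    n = pending_count;
    for (i = 0; i < n; i++) batch[i] = pending[i];
    pending_count = 0;
    pthread_mutex_unlock(&pending_mutex);

    for (i = 0; i < n; i++) release(batch[i]);
    return n;
}

/* Example callback that does nothing; a placeholder for a real unref. */
void noop_release(void *object) {
    (void) object;
}
```

In this sketch the finaliser side would call enqueue_for_release(object), and the main loop would periodically call drain_release_queue with the real release function.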

This global data structure is protected by a mutex, which is acquired  
using g_static_mutex_lock:

void gtk2hs_g_object_unref_from_mainloop(gpointer object) {

   int mutex_locked = 0;
   if (threads_initialised) {
#ifdef DEBUG
       printf("acquiring lock to add a %s object at %lx\n",
              g_type_name(G_OBJECT_TYPE(object)), (unsigned long) object);
       printf("value of lock function is %lx\n",
              (unsigned long) g_thread_functions_for_glib_use.mutex_lock);
#endif
     g_rand_new();  /* bogus call, present only as a gdb breakpoint target */
#if defined( WIN32 )
     EnterCriticalSection(&gtk2hs_finalizer_mutex);
#else
     g_static_mutex_lock(&gtk2hs_finalizer_mutex);
#endif
     mutex_locked = 1;
   }
[..]

The program prints:

acquiring lock to add a GtkButton object at 22d8020
value of lock function is 0
zsh: segmentation fault  ghci World

Now the debugging weirdness starts. Whatever I do, I cannot get gdb to  
find the symbol gtk2hs_g_object_unref_from_mainloop.

Since the function above is contained in a C file that comes with our
Haskell library, I tried adding "cc-options: -g" and "cc-options:
-ggdb -O0", but maybe the symbols are stripped somewhere. So I added the
bogus call to g_rand_new(), which is not called anywhere else, set a
breakpoint on it, and gdb stops as follows:

acquiring lock to add a GtkButton object at 2105020
value of lock function is 0
[Switching to Thread 0x7ffff41ff710 (LWP 15735)]

Breakpoint 12, 0x00007ffff115bfa0 in g_rand_new () from /usr/lib/libglib-2.0.so

This all seems reasonable, but:

(gdb) bt
#0  0x00007ffff115bfa0 in g_rand_new () from /usr/lib/libglib-2.0.so
#1  0x00000000419b3792 in ?? ()
#2  0x00007ffff678f078 in ?? ()

i.e. the calling context is broken. I'm very, very sure that the
caller is indeed the above-mentioned function, since g_rand_new
isn't called anywhere in my Haskell program (and otherwise the calling
context would be sane).
I'm also passing the address of gtk2hs_g_object_unref_from_mainloop as
FinalizerPtr to all my ForeignPtrs, so there is no inlining going on.

Back to the culprit, the call to g_static_mutex_lock. This is a macro  
that expands to

*g_thread_functions_for_glib_use.mutex_lock

where g_thread_functions_for_glib_use is a global variable that contains a
lot of function pointers. At the breakpoint, it contains this:

(gdb) print g_thread_functions_for_glib_use
$33 = {mutex_new = 0x7ffff0cd9820 <g_mutex_new_posix_impl>,
   mutex_lock = 0x7ffff6c8b3c0 <__pthread_mutex_lock>,
   mutex_trylock = 0x7ffff0cd97b0 <g_mutex_trylock_posix_impl>,
   mutex_unlock = 0x7ffff6c8ca00 <__pthread_mutex_unlock>,
   mutex_free = 0x7ffff0cd9740 <g_mutex_free_posix_impl>,
[..]

So the call to g_mutex_lock should call the function  
__pthread_mutex_lock but it calls NULL.
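The program prints 0 for the slot while gdb shows it populated. One possible way to narrow this down is to print the address of the table next to the slot value and compare it with what gdb reports for `print &g_thread_functions_for_glib_use`. Differing addresses would hint that the code loaded by ghci sees a different copy of the variable; that is only a guess at a diagnostic, not a diagnosis. The sketch below uses a hypothetical miniature table (thread_funcs, describe_lock_slot, checked_lock and counting_lock are all made-up names) rather than the real GLib structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical miniature of a table of function pointers, standing in for
   g_thread_functions_for_glib_use; the names are illustrative only. */
typedef struct {
    void (*mutex_lock)(void *mutex);
    void (*mutex_unlock)(void *mutex);
} thread_funcs;

/* Writes the table's own address and the value of its mutex_lock slot
   into buf. Returns 1 if the slot is non-NULL, 0 otherwise. */
int describe_lock_slot(const thread_funcs *table, char *buf, size_t len) {
    assert(table != NULL);
    snprintf(buf, len, "table at %p, mutex_lock slot is %lx",
             (const void *) table,
             (unsigned long) (uintptr_t) table->mutex_lock);
    return table->mutex_lock != NULL;
}

/* Calls through the slot only when it is populated.
   Returns 0 on success, -1 if the slot was NULL. */
int checked_lock(const thread_funcs *table, void *mutex) {
    assert(table != NULL);
    if (table->mutex_lock == NULL) return -1;
    table->mutex_lock(mutex);
    return 0;
}

/* Example lock function for trying the sketch: it just counts calls,
   treating the "mutex" argument as a pointer to an int counter. */
void counting_lock(void *mutex) {
    ++*(int *) mutex;
}
```

A guard like checked_lock would turn the segfault into a reportable error, but it would not explain why the slot is NULL in the first place.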

I hoped that writing this email would give me a bit more insight into  
the problem, but for now I suspect that something overwrites either  
the stack or the code of the function.

On the same platform, the compiled version prints:

acquiring lock to add a GtkButton object at 1b05820
value of lock function is 7f7adcabd3c0
within mutex: adding finalizer to a GtkButton object!

On Mac OS or i386, using ghci or ghc version 6.10.4, it works as well.
Now for the fun bit: on i386 using ghci version 6.12.1 it works too.

So it looks like a bug specific to ghci on x86-64 with ghc 6.12.1.
According to Christian Maeder, who submitted the ticket, the problem
persists in 6.12.3.

Any hints and help appreciated,
Cheers,
Axel