internal error: schedule: invalid what_next field

Simon Marlow simonmar at microsoft.com
Tue Jul 6 04:42:20 EDT 2004


On 06 July 2004 04:05, Ian Lynagh wrote:

>> If you think there's a deadlock that might not be your fault (quite
>> possible), then report it as a bug.  We might have to enlist your
>> help in order to debug it, though.
> 
> Well, the good news is I don't think I have any deadlocks now.
> Unfortunately, after "a while" (isn't concurrent programming great?)
> something eats a load of memory and I end up swapping. I don't know if
> it would recover if left for a while.
> 
> However, I'm not really sure how to debug this. I haven't even worked
> out if it's my code, the RTS or one of the C libraries using the
> memory. 
> 
> I'm not sure if it would be helpful to do so, but I notice I can't
> compile with both -threaded and -debug
> (/usr/bin/ld: cannot find -lHSrts_thr_debug).

You can build the thr_debug version of the RTS if you have a local
build.  Just add thr_debug to GhcRTSWays in build.mk.

> I tried stracing it, but then it dies with
> 
> 514
> select: Unknown error 514
> minstrel: internal error: select failed
>     Please report this as a bug to glasgow-haskell-bugs at haskell.org,
>     or http://www.sourceforge.net/projects/ghc/
> 
> The strace ends (I can show more if it would be useful):
> 
> futex(0x8138ce0, FUTEX_WAKE, 1)  = 1
> futex(0x8138cf0, FUTEX_WAIT, 22, NULL)  = -1 EAGAIN (Resource temporarily unavailable)
> futex(0x8143a08, FUTEX_WAKE, 1)         = 0
> gettimeofday({1089073815, 290362}, NULL) = 0
> select(4, [0 3], [], NULL, {1, 0}) = ? ERESTARTNOHAND (To be restarted)
> write(2, "514\n", 4)                    = 4
> write(2, "select: Unknown error 514\n", 26) = 26
> write(2, "minstrel: internal error: ", 26) = 26
> write(2, "select failed", 13)           = 13
> write(2, "\n", 1)                       = 1
> write(2, "    Please report this as a bug "..., 117) = 117
> shmdt(0x4178a000)                       = 0
> exit_group(254)                         = ?
>
> $ grep ERESTARTNOHAND /usr/include/linux/errno.h
> #define ERESTARTNOHAND  514     /* restart if no handler.. */
> 
> This is on Linux 2.6.5.

Bizarre.  I have no idea what's going on here.

> During execution, there are 3 directories in /proc/pid/task (e.g.
> /proc/1605/task/1605
> /proc/1612/task/1612
> /proc/1613/task/1613
> ). Only the first is a PID as seen by ps. I suspect these correspond
> to threads but I'm not sure. If I strace these once it has started
> swapping then the first one is the only one that seems to be
> allocating memory, with calls like these:
> 
> mmap2(0x4c800000, 1048576, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4c800000
> 
> mmap2(0x4c900000, 1048576, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4c900000
> 
> mmap2(0x4ca00000, 1048576, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4ca00000
> 
> mixed in with other bits. Of course, this doesn't mean it was the
> culprit, but I have 106 mmap2s like the above each allocating the next
> MB, so it looks quite suspicious to me.

Those look like the RTS allocating memory.

> Oh, I've just tried running with "+RTS -M25M" and after a while it
> failed with:
> 
> Heap exhausted;
> Current maximum heap size is 24997888 bytes (23 Mb);
> use `+RTS -M<size>' to increase it.
> 
> so it looks like the problem is on the Haskell side somewhere. Also
> just tried heap profiling, but "hp2ps -c minstrel-prof.hp" tells me:
> 
> hp2ps: minstrel-prof.hp, line 1119, samples out of sequence
> 
> presumably due to the threadedness.

Hmmm.  You'd need a thr_p version of the RTS to do that (again, you
might be able to build one of those yourself).

I suggest getting a local GHC build, and compiling up some more versions
of the RTS:  thr_debug, thr_p, and thr_debug_p.
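
As a rough sketch, assuming local settings live in mk/build.mk in the
source tree and that your version of the build system accepts these way
names, the extra RTS ways could be requested like this:

```make
# mk/build.mk (local build settings)
# Assumption: appending with += keeps whatever RTS ways the default
# configuration already builds; check mk/config.mk in your own tree.
GhcRTSWays += thr_debug thr_p thr_debug_p
```

Once the RTS has been rebuilt with these ways, linking with both
-threaded and -debug should be able to find -lHSrts_thr_debug, and the
thr_p variant is the one a threaded, profiled program would need.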

Cheers,
	Simon

