internal error: schedule: invalid what_next field
Simon Marlow
simonmar at microsoft.com
Tue Jul 6 04:42:20 EDT 2004
On 06 July 2004 04:05, Ian Lynagh wrote:
>> If you think there's a deadlock that might not be your fault (quite
>> possible), then report it as a bug. We might have to enlist your
>> help in order to debug it, though.
>
> Well, the good news is I don't think I have any deadlocks now.
> Unfortunately, after "a while" (isn't concurrent programming great?)
> something eats a load of memory and I end up swapping. I don't know if
> it would recover if left for a while.
>
> However, I'm not really sure how to debug this. I haven't even worked
> out if it's my code, the RTS or one of the C libraries using the
> memory.
>
> I'm not sure if it would be helpful to do so, but I notice I can't
> compile with both -threaded and -debug
> (/usr/bin/ld: cannot find -lHSrts_thr_debug).
You can build the thr_debug version of the RTS if you have a local
build. Just add thr_debug to GhcRTSWays in build.mk.
> I tried stracing it, but then it dies with
>
> 514
> select: Unknown error 514
> minstrel: internal error: select failed
> Please report this as a bug to glasgow-haskell-bugs at haskell.org,
> or http://www.sourceforge.net/projects/ghc/
>
> The strace ends (I can show more if it would be useful):
>
> futex(0x8138ce0, FUTEX_WAKE, 1) = 1
> futex(0x8138cf0, FUTEX_WAIT, 22, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> futex(0x8143a08, FUTEX_WAKE, 1) = 0
> gettimeofday({1089073815, 290362}, NULL) = 0
> select(4, [0 3], [], NULL, {1, 0}) = ? ERESTARTNOHAND (To be restarted)
> write(2, "514\n", 4) = 4
> write(2, "select: Unknown error 514\n", 26) = 26
> write(2, "minstrel: internal error: ", 26) = 26
> write(2, "select failed", 13) = 13
> write(2, "\n", 1) = 1
> write(2, " Please report this as a bug "..., 117) = 117
> shmdt(0x4178a000) = 0
> exit_group(254) = ?
>
> $ grep ERESTARTNOHAND /usr/include/linux/errno.h
> #define ERESTARTNOHAND 514 /* restart if no handler.. */
>
> This is on Linux 2.6.5.
Bizarre. I have no idea what's going on here.
> During execution, there are 3 directories in /proc/pid/task (e.g.
> /proc/1605/task/1605
> /proc/1612/task/1612
> /proc/1613/task/1613
> ). Only the first is a PID as seen by ps. I suspect these correspond to
> threads but I'm not sure. If I strace these once it has started swapping
> then the first one is the only one that seems to be allocating memory,
> with calls like these:
>
> mmap2(0x4c800000, 1048576, PROT_READ|PROT_WRITE|PROT_EXEC,
> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4c800000
>
> mmap2(0x4c900000, 1048576, PROT_READ|PROT_WRITE|PROT_EXEC,
> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4c900000
>
> mmap2(0x4ca00000, 1048576, PROT_READ|PROT_WRITE|PROT_EXEC,
> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4ca00000
>
> mixed in with other bits. Of course, this doesn't mean it was the
> culprit, but I have 106 mmap2s like the above each allocating the next
> MB, so it looks quite suspicious to me.
Those look like the RTS allocating memory.
> Oh, I've just tried running with "+RTS -M25M" and after a while it
> failed with:
>
> Heap exhausted;
> Current maximum heap size is 24997888 bytes (23 Mb);
> use `+RTS -M<size>' to increase it.
>
> so it looks like the problem is on the Haskell side somewhere. I also
> just tried heap profiling, but "hp2ps -c minstrel-prof.hp" tells me:
>
> hp2ps: minstrel-prof.hp, line 1119, samples out of sequence
>
> presumably due to the threadedness.
Hmmm. You'd need a thr_p version of the RTS to do that (again, you
might be able to build one of those yourself).
I suggest getting a local GHC build, and compiling up some more versions
of the RTS: thr_debug, thr_p, and thr_debug_p.
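As a rough sketch of the build.mk change (assuming mk/build.mk in your
source tree is where local settings live, that appending with += is
appropriate there, and that all three ways are available in your tree):

```make
# mk/build.mk -- local build settings.
# Sketch only: the file location and the use of += are assumptions;
# check how GhcRTSWays is set in your tree before relying on this.
#
# Extra RTS ways to build in addition to the defaults:
#   thr_debug    threaded + debug
#   thr_p        threaded + profiling
#   thr_debug_p  threaded + debug + profiling
GhcRTSWays += thr_debug thr_p thr_debug_p
```

Once the RTS has been rebuilt with these ways, linking with both -threaded
and -debug should be able to find -lHSrts_thr_debug.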
Cheers,
Simon