Segfaulting programs with GHC 6.4.1

John Goerzen jgoerzen at complete.org
Mon Oct 24 12:23:05 EDT 2005


On Mon, Oct 24, 2005 at 10:53:48AM +0100, Simon Marlow wrote:
> Hi John,
> 
> Thanks for trying to narrow this down.  At this stage it looks like some
> kind of heap corruption.  Can you reproduce it on more than one machine?

Yes, though it is not nearly as easy to reproduce elsewhere.  I cannot really explain that.
I suspect it could have something to do with the order of data coming
from the DB (it's unordered) or system load or something else along
those lines.

Here's another odd thing: the binaries built on the two systems are not
quite identical, even though, as far as I can tell, everything about the
build environment is identical (Debian sid).  One is a few K larger than
the other, and I can't figure out why.

Both are fairly new, nice workstations from HP.  I've had no trouble
like this with any other program on either, and this isn't the first
task of this kind I've run on either of them.

Also, it seems that the binary produced on one is more prone to crash
than that produced on the other.  But it could be my imagination.

> (we have to rule out hardware failure, it's happened before and can cost
> a lot of debugging time).
> 
> I'll need to reproduce it here.  Can you give me a set of instructions
> to get me up to the right point?

Here goes.  Reminder, my test environment is Linux x86, ghc 6.4.1:

1. Install PostgreSQL 8.0.  You can get this with most Linux distros,
   or from www.postgresql.org.

2. As your PostgreSQL user (you will usually need to su to postgres),
   run:

   createuser smarlow
   createdb smarlow
   createlang plpgsql smarlow

   (In this and following steps, replace "smarlow" with your Linux
   username, if it's not "smarlow")

3. Download http://www.complete.org/~jgoerzen/dump.bz2 (7.7MB)

4. Back as your normal smarlow user, run:

   bzcat dump.bz2 | sed 's/ jgoerzen/ smarlow/' > dump.sql
          (spaces and quotes are important there; unpacks to 190MB)
   psql -f dump.sql -U smarlow smarlow

   There will be four errors at the beginning that you can ignore.
   ("must be owner of schema public", 2x "permission denied for language
   c", "must be superuser to create procedural language")

   This will probably take a few minutes to run.  I think it will take
   up about 500MB of disk space once loaded.
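
   To illustrate why the spaces in that sed expression matter, here is a
   small sketch.  The sample lines are made up (the table name is
   hypothetical), not taken from the dump; the point is only that the
   leading space leaves the path from step 7 alone while still rewriting
   the owner name.

```shell
# Hypothetical sample input; the real dump's contents are not shown here.
sample='ALTER TABLE public.files OWNER TO jgoerzen;
-- see /home/jgoerzen/tree/gopher-arch'

# Same substitution as in step 4.  The pattern starts with a space, so
# "/home/jgoerzen" (preceded by "/") is not touched.  Without a trailing
# "g" flag, sed replaces only the first match on each line.
rename_owner() {
    sed 's/ jgoerzen/ smarlow/'
}

printf '%s\n' "$sample" | rename_owner
```

   The first line comes out with "smarlow" as the owner, and the second
   line is unchanged.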

5. Install prerequisites.  You will need HSQL 1.6 and the HSQL
   PostgreSQL module, plus MissingH 0.12.1 from
   http://http.us.debian.org/debian/pool/main/m/missingh/missingh_0.12.1.tar.gz
   Both HSQL and MissingH are cabalized.

6. Now, get the code and build it:

   darcs get http://darcs.complete.org/gopherbot
   cd gopherbot
   ghc --make -o setup Setup.lhs
   ./setup configure
   ./setup build

7. Create the directory /home/jgoerzen/tree/gopher-arch on your system,
   making sure that your smarlow user has read access to it.

   (The data stored in the DB, as well as a config, references that path
   for now.  Sorry.)

8. Adjust these settings in your postgresql.conf, making sure to remove
   the existing values, if any:

   shared_buffers = 3000
   sort_mem = 4000
   maintenance_work_mem = 96000
   work_mem = 64000
   fsync = off 
   checkpoint_segments = 12
   effective_cache_size = 8000

   And then restart the PostgreSQL server.
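
   Since old values need to be removed, a quick check for leftover
   duplicates may help.  This is only a sketch: the function name
   dup_settings is made up, and it only looks for uncommented
   "key = value" lines for the settings listed above.

```shell
# Print each step-8 setting that has more than one uncommented line in
# the given postgresql.conf.
dup_settings() {
    conf=$1
    for key in shared_buffers sort_mem maintenance_work_mem work_mem \
               fsync checkpoint_segments effective_cache_size; do
        # grep -c prints 0 and returns non-zero when nothing matches,
        # hence the "|| true".
        n=$(grep -c "^[[:space:]]*${key}[[:space:]]*=" "$conf" || true)
        if [ "$n" -gt 1 ]; then
            echo "$key"
        fi
    done
}

# Demo on a throwaway file (made-up contents, not a real config).
demo=$(mktemp)
printf '%s\n' 'shared_buffers = 1000' '#fsync = on' 'fsync = off' \
              'shared_buffers = 3000' > "$demo"
dup_settings "$demo"
rm -f "$demo"
```

   The demo prints "shared_buffers", the only setting given twice.  To
   use it for real, pass the path to your own postgresql.conf, which
   varies between installations.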

9. Now run dist/build/gopherbot.

   You should see it start to download documents, and crash after a few
   minutes.

   If you have trouble connecting, 
   adjust the first empty string on line 42 of DB.hs to match 
   unix_socket_directory in your postgresql.conf.

   The settings made in step 8 make PostgreSQL much faster.
   Without them, it is hard to make the program crash.

   The program will use about 500MB RAM while running.

   It will take about 10 minutes to get up to speed.  (It takes a bit to
   load its worklist from PostgreSQL, and to eliminate some dead hosts.)

   After that, it'll start up quicker, and run fast.

I'll also keep trying to gather data here.

