ghci and ghc -threaded [slowdown]

Mon Dec 15 08:23:51 EST 2008

Malcolm Wallace wrote:
> Simon Marlow <marlowsd at gmail.com> wrote:
> 
>> Malcolm Wallace wrote:
>>> For the only application I tried, using the threaded RTS imposes a
>>> 100% performance penalty - i.e. computation time doubles, compared
>>> to the non-threaded RTS.  This was with ghc-6.8.2, and maybe the
>>> overhead has improved since then?
>> This is a guess, but I wonder if this program is concurrent, and does
>> a lot of communication between the main thread and other threads?
> 
> Exactly so - it hits the worst case behaviour.  This was a naive attempt
> to parallelise an algorithm by shifting some work onto a spare
> processor.  Unfortunately, there is a lot of communication to the main
> thread, because the work that was shifted elsewhere computes a large
> data structure in chunks, and passes those chunks back.  The main thread
> then runs OpenGL calls using this data -- and I believe OpenGL calls must
> run in a bound thread.
> 
> This all suggests that one consequence of ghc's RTS implementation
> choices is that it will never be cheap to compute visualization data in
> parallel with rendering it in OpenGL.  That would be a shame.  This was
> exactly the parallelism I was hoping for.

I'm not sure how we could do any better here.  To get parallelism you need 
to run the OpenGL thread and the worker thread on separate OS threads, 
which we do.  So what aspect of the RTS design is preventing you from 
getting the parallelism you want?

It seems that the problem you have is that moving to the multithreaded 
runtime imposes an overhead on the communication between your two threads, 
when run on a *single CPU*.  But performance on a single CPU is not what 
you're interested in - you said you wanted parallelism, and for that you 
need multiple CPUs, and hence multiple OS threads.

I suspect the underlying problem in your program is that the communication 
is synchronous.  To get good parallelism you'll need to use asynchronous 
communication, otherwise even on multiple CPUs you'll see little 
parallelism.  If you already use asynchronous communication and still don't 
get good parallelism, then we should look into what's causing that.
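
To illustrate what asynchronous communication could look like here, the 
following is a minimal sketch, not the actual program under discussion.  
It assumes the worker can hand over chunks through an unbounded Chan, so 
that writeChan never waits for the reader.  The names Chunk, producer, 
consumer and runPipeline are hypothetical, and summing each chunk merely 
stands in for the OpenGL calls made by the main thread.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Monad (forM_)

-- Hypothetical stand-in for one chunk of visualisation data.
type Chunk = [Int]

-- Worker: builds n chunks and sends each one as soon as it is ready.
-- writeChan on an unbounded Chan does not wait for the reader, so the
-- worker can run ahead of the consumer.
producer :: Chan (Maybe Chunk) -> Int -> IO ()
producer ch n = do
  forM_ [1 .. n] $ \i -> do
    let chunk = [i * 10 .. i * 10 + 9]
    -- Force the chunk's elements in the worker thread; otherwise the
    -- consumer would receive an unevaluated thunk and do the work itself.
    sum chunk `seq` writeChan ch (Just chunk)
  writeChan ch Nothing  -- end-of-stream marker

-- Consumer: stands in for the rendering loop in the main thread.
-- It blocks only when no chunk is available yet.
consumer :: Chan (Maybe Chunk) -> Int -> IO Int
consumer ch acc = do
  msg <- readChan ch
  case msg of
    Nothing    -> return acc
    Just chunk -> consumer ch $! acc + sum chunk

-- Fork the worker, then consume in the calling thread.
runPipeline :: Int -> IO Int
runPipeline n = do
  ch <- newChan
  _ <- forkIO (producer ch n)
  consumer ch 0

main :: IO ()
main = runPipeline 4 >>= print
```

To actually use two CPUs, a program like this would be built with 
-threaded and run with +RTS -N2.  An unbounded channel can grow without 
limit if the worker is much faster than the consumer, so a real program 
might prefer a bounded buffer; that choice is outside this sketch.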

Cheers,
	Simon