[Haskell-cafe] Parallel weirdness

Andrew Coppin andrewcoppin at btinternet.com
Sat Apr 19 16:16:20 EDT 2008


Bulat Ziganshin wrote:
> Hello Andrew,
>
> Saturday, April 19, 2008, 7:50:30 PM, you wrote:
>
>   
>> The data to be sorted is generated using a trivial LCG PRNG.
>>     
>
> if you don't generate new data for each sorting run, this means that
> data generation is much slower than sorting. don't forget about ghc
> laziness :)
>   

...which is why the unsorted data is written to disk *before* I attempt 
to sort it. This is to ensure it's fully evaluated, and give me some 
idea how long it takes to write to disk. The [first] sorting stage is 
about 10x slower than the initial writing stage - and yet, subsequent 
sorting is 10x faster than the initial unsorted write.

>> Not with -N1.
>>     
>
> are you sure? :)  afaik, -threaded RTS uses dedicated i/o thread
> despite of -N setting (which controls only amount of threads running
> *user* code)
>   

Interesting. Without consulting a GHC RTS expert, I guess there's no way 
to know.

Certainly I doubt my program is I/O-bounded. It only writes about 11 MB 
of data to a text file. I would think converting Word32 -> String is the 
slow part.

>> Again, with -N1, it is *still* only using 1 CPU core.
>>     
>
> parallel version is different from sequential one and it process data
> in another order.

Yeah, perhaps it improves RAM usage or something...

>> Fails to explain why the parallel version is faster than the sequential
>> one (even with no parallelism), or why the sequential algorithm should
>> slow down with more threads. (Surely the extra threads just sit idle?)
>>     
>
> there are management overheads. with multiple worker threads you have
> many OS threads which fights for the right to execute single Haskell
> thread :))
>   

As I understand it, sparked work is placed into a FIFO queue, and a set 
of worker threads poll this queue, remove the first item and begin 
executing. There is no "fighting"; once a work item has been picked by a 
thread, it runs to completion.

Now, if there are more OS threads than physical CPU cores, they will 
fight it out at the OS level who runs first... ;-)



More information about the Haskell-Cafe mailing list