Ovidiu Deac ovidiudeac at gmail.com
Wed Aug 3 00:44:03 CEST 2011

```I'm trying to write a parallel quicksort algorithm for lists.

This is my original implementation:

quickSort [] = []
quickSort (x:xs) = (quickSort small) ⊕ [x] ⊕ (quickSort big)
where
small = [p | p ←  xs, p ≤ x]
big = [p | p ←  xs, p > x]

and the output is:

\$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS
[1 of 1] Compiling Main             ( quicksort.hs, quicksort.o )
Sorting 1000000 elements...
CPU Time: 13290000000000
Time elapsed: 8.929503s

quicksort.hs && ./quicksort +RTS -N2 -RTS
Sorting 1000000 elements...
CPU Time: 11240000000000
Time elapsed: 7.785293s

quicksort.hs && ./quicksort +RTS -N1 -RTS
Sorting 1000000 elements...
CPU Time: 6790000000000
Time elapsed: 6.817648s

quicksort.hs && ./quicksort +RTS -N1 -RTS
Sorting 1000000 elements...
CPU Time: 6980000000000
Time elapsed: 7.006658s

quicksort.hs && ./quicksort +RTS -N1 -RTS
Sorting 1000000 elements...
CPU Time: 5900000000000
Time elapsed: 5.932236s

...so the conclusion is that using option N1 is faster the N2. This makes sense.

Then I tried to parallelize it:

First try:
-----------------
quickSort [] = []
quickSort (x:xs) = small `pseq` ((quickSort small) ⊕ [x] ⊕ (quickSort big))
where
small = [p | p ←  xs, p ≤ x]
big = [p | p ←  xs, p > x]

\$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS
[1 of 1] Compiling Main             ( quicksort.hs, quicksort.o )
Sorting 1000000 elements...
CPU Time: 12020000000000
Time elapsed: 8.29653s

This is slower then the non-parallel version

Second try:
---------------
quickSort [] = []
quickSort (x:xs) = small `par` ((quickSort small) ⊕ [x] ⊕ (quickSort big))
where
small = [p | p ←  xs, p ≤ x]
big = [p | p ←  xs, p > x]

\$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS
Sorting 1000000 elements...
CPU Time: 14750000000000
Time elapsed: 10.772271s

Even slower

Third try:
-------------
quickSort [] = []
quickSort (x:xs) = small `par` (big `par` ((quickSort small) ⊕ [x] ⊕
(quickSort big)))
where
small = [p | p ←  xs, p ≤ x]
big = [p | p ←  xs, p > x]

\$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS
[1 of 1] Compiling Main             ( quicksort.hs, quicksort.o )
Sorting 1000000 elements...
CPU Time: 134490000000000
Time elapsed: 122.917093s

Fourth try:
------------------------
quickSort [] = []
quickSort (x:xs) = small `par` (big `pseq` ((quickSort small) ⊕ [x] ⊕
(quickSort big)))
where
small = [p | p ←  xs, p ≤ x]
big = [p | p ←  xs, p > x]

\$ ghc -threaded -rtsopts -o quicksort quicksort.hs && ./quicksort +RTS -N2 -RTS
[1 of 1] Compiling Main             ( quicksort.hs, quicksort.o )
Sorting 1000000 elements...
CPU Time: 12770000000000
Time elapsed: 8.844304s
-----------------------------

It seems that I'm unable to make it parallel. What am I doing wrong?

Thanks,
ovidiu

See the full code below:
--------------------------------------------------
module Main where

import Prelude
import Data.List
import Data.Time.Clock
import System.CPUTime
import System.Random
import Control.Parallel
import Control.Exception (evaluate)
import Control.DeepSeq (rnf)
import Text.Printf

quickSort [] = []
quickSort (x:xs) = small `par` (big `par` ((quickSort small) ⊕ [x] ⊕
(quickSort big)))
where
small = [p | p ←  xs, p ≤ x]
big = [p | p ←  xs, p > x]

randomlist :: Int →  StdGen →  [Int]
randomlist n = take n∘unfoldr (Just∘random)

len = 10 ↑ 6

time = do
t ←  getCurrentTime
c ←  getCPUTime
return (t,c)

measure f p = do
(t1, c1) ←  time
evaluate \$ rnf \$ f p
(t2, c2) ←  time
return (diffUTCTime t2 t1, c2 - c1)

main = do
seed  ←  newStdGen
let rs = randomlist len seed

printf "Sorting %d elements...\n" len

(t, cpu) ←  measure quickSort rs
printf "CPU Time: %dλnTime elapsed: %sλn" cpu (show t)

```