Odd Performance

Tim Otten tim@cap.american.edu
Tue, 22 Oct 2002 16:30:09 -0400 (EDT)


As a student in an undergraduate 'Intro to Discrete Structures' course, I
recently did a project which required generating the first n primes. We
discussed the sieve of Eratosthenes in class. Although the professor is
not familiar with Haskell, he allowed me to use it, and I
mistakenly wrote the following code[1]:

-- A list of all prime numbers
primes = [2] ++ [ x | x <- [3..], isPrime x ]

-- x is prime if the set of primes [2 <= p <= sqrt(x)] is empty
isPrime x = 
	( [] == [ y | y <- (take (intsqrt x) primes), x `mod` y == 0 ] )

This works, and I used it as I developed my project. However, it tests
more primes than necessary. A better version would seem to be:

-- x is prime if the set of prime factors [2 <= p <= sqrt(x)] is empty
isPrime x =
  ( [] == [ y | y <- (takeWhile (\z -> z <= (intsqrt x)) primes),
            x `mod` y == 0 ])

Surprisingly, the second (tight) version grows significantly more costly
than the first (loose) version as we take more primes. Taking 20 primes,
	> take 20 primes
both algorithms require under a second. Taking 200 primes, the tight
algorithm requires twice as much time as the loose one. Taking 2000
primes, the tight algorithm requires about 4.5 times as much time as the
loose one.[2]

Can anyone suggest why the tighter algorithm exhibits significantly worse
performance? Is takeWhile significicantly more expensive than take? Is the
\z lambda expression expensive? The intsqrt isn't recalculated each time
takeWhile evalutes a prime, is it?

Slightly confused,
Tim Otten

[1] Note that intsqrt calculates the floor of the square root of x.
    intsqrt x = intsqrt' 1 x
    intsqrt' n x
        | (n*n > x) = n-1
        | otherwise = intsqrt' (n+1) x

[2] A few comments on the test apparatus:
* Debian GNU/Linux 3.0
* Hugs98 circa Sept 2001
* Tested on a G3 400mhz and a PII 233mhz. The G3 was consistently
  faster, but the relationship (Time(Alg2)/Time(Alg1)) remains
  comparable across chips
* Times were established using the bash shell's 'time' command
* Each test was performed in a new instance of 'runhugs'
* Tests were repeated and staggered (Test alg1 @ n primes; test alg2 @ n
  primes; test alg1 @ n primes; test alg2 @ n primes)

Codes, scripts, etc. available upon request.