[Haskell-cafe] Cost: (:) vs head

Bas van Dijk v.dijk.bas at gmail.com
Sat Sep 11 08:46:48 EDT 2010


On Sat, Sep 11, 2010 at 5:13 AM, michael rice <nowgate at yahoo.com> wrote:
>
> Which of these would be more costly for a long list?
>
> f :: [Int] -> [Int]
> f [x] = [x]
> f (x:xs) = x + (head xs) : f xs
>
> f :: [Int] -> [Int]
> f [x] = [x]
> f (x:y:xs) = x + y : f (y:xs)

Use Criterion[1] to find out:


module Main where

import Criterion.Main

f1, f2, f3, f4 :: [Int] -> [Int]

f1 [x] = [x]
f1 (x:xs) = x + head xs : f1 xs

f2 [x] = [x]
f2 (x:y:xs) = x + y : f2 (y:xs)

f3 [x] = [x]
f3 (x:xs@(y:_)) = x + y : f3 xs

f4 [x] = [x]
f4 (x:y:xs) = x + y : go y xs
    where
      go x []     = [x]
      go x (y:xs) = x + y : go y xs

benchMark s f = bench s $ whnf (\n -> sum $ f [1..n]) 1000000

main = defaultMain [ benchMark "f1" f1
                   , benchMark "f2" f2
                   , benchMark "f3" f3
                   , benchMark "f4" f4
                   ]


now compile and run it:

$ ghc --make Benchmark.hs -O2 -o benchmark
...
$ ./benchmark
warming up
estimating clock resolution...
mean is 24.29944 us (40001 iterations)
found 1405 outliers among 39999 samples (3.5%)
  721 (1.8%) high mild
  684 (1.7%) high severe
estimating cost of a clock call...
mean is 1.844233 us (45 iterations)
found 2 outliers among 45 samples (4.4%)
  2 (4.4%) high severe

benchmarking f1
collecting 100 samples, 1 iterations each, in estimated 7.917595 s
bootstrapping with 100000 resamples
mean: 80.04258 ms, lb 79.85129 ms, ub 80.24094 ms, ci 0.950
std dev: 1.000711 ms, lb 878.8460 us, ub 1.179558 ms, ci 0.950
found 1 outliers among 100 samples (1.0%)
variance introduced by outliers: 0.990%
variance is unaffected by outliers

benchmarking f2
collecting 100 samples, 1 iterations each, in estimated 8.171391 s
bootstrapping with 100000 resamples
mean: 83.13315 ms, lb 82.93615 ms, ub 83.33348 ms, ci 0.950
std dev: 1.017999 ms, lb 904.5153 us, ub 1.174008 ms, ci 0.950
variance introduced by outliers: 0.990%
variance is unaffected by outliers

benchmarking f3
collecting 100 samples, 1 iterations each, in estimated 8.297014 s
bootstrapping with 100000 resamples
mean: 82.66586 ms, lb 82.34780 ms, ub 83.39774 ms, ci 0.950
std dev: 2.339937 ms, lb 976.2940 us, ub 4.133495 ms, ci 0.950
found 9 outliers among 100 samples (9.0%)
  7 (7.0%) high mild
  2 (2.0%) high severe
variance introduced by outliers: 0.998%
variance is unaffected by outliers

benchmarking f4
collecting 100 samples, 1 iterations each, in estimated 8.080888 s
bootstrapping with 100000 resamples
mean: 80.80089 ms, lb 80.61719 ms, ub 80.99542 ms, ci 0.950
std dev: 968.1706 us, lb 872.7758 us, ub 1.097217 ms, ci 0.950
variance introduced by outliers: 0.990%
variance is unaffected by outliers


So to summarize from fastest to slowest:

f1: mean: 80.04258 ms
f4: mean: 80.80089 ms
f3: mean: 82.66586 ms
f2: mean: 83.13315 ms


To find out why f1 is the fastest you can look at the core using ghc-core[2]:

$ ghc-core -- -O2 Benchmark.hs

f1 :: [Int] -> [Int]
GblId

f1 =
  \ (ds_d1h0 :: [Int]) ->
    case ds_d1h0 of _ {
      [] -> f11;
      : x_a12V ds1_d1h1 ->
        case ds1_d1h1 of _ {
          [] ->
            : @ Int x_a12V ([] @ Int);
          : ipv_s1hv ipv1_s1hw ->
            :
              @ Int
              (case x_a12V of _ { I# x1_a1k6 ->
               case ipv_s1hv of _ { I# y_a1ka ->
               I# (+# x1_a1k6 y_a1ka)
               }
               })
              (f1_$sf1 ipv1_s1hw ipv_s1hv)
        }
    }

f1_$sf1 :: [Int] -> Int -> [Int]
GblId

f1_$sf1 =
  \ (sc_s1FL :: [Int]) (sc1_s1FM :: Int) ->
    case sc_s1FL of _ {
      [] ->
        :
          @ Int sc1_s1FM ([] @ Int);
      : ipv_s1hv ipv1_s1hw ->
        :
          @ Int
          (case sc1_s1FM of _ { I# x_a1k6 ->
           case ipv_s1hv of _ { I# y_a1ka ->
           I# (+# x_a1k6 y_a1ka)
           }
           })
          (f1_$sf1 ipv1_s1hw ipv_s1hv)
    }


f4 :: [Int] -> [Int]
GblId

f4 =
  \ (ds_d1gl :: [Int]) ->
    case ds_d1gl of _ {
      [] -> f41;
      : x_a13f ds1_d1gm ->
        case ds1_d1gm of _ {
          [] ->
            : @ Int x_a13f ([] @ Int);
          : y_a13h xs_a13i ->
            :
              @ Int
              (plusInt x_a13f y_a13h)
              (f4_go y_a13h xs_a13i)
        }
    }

f4_go =
  \ (x_a13k :: Int) (ds_d1gu :: [Int]) ->
    case ds_d1gu of _ {
      [] ->
        : @ Int x_a13k ([] @ Int);
      : y_a13m xs_a13n ->
        :
          @ Int
          (plusInt x_a13k y_a13m)
          (f4_go y_a13m xs_a13n)
    }


f3 :: [Int] -> [Int]
GblId

f3 =
  \ (ds_d1gD :: [Int]) ->
    case ds_d1gD of _ {
      [] -> f31;
      : x_a13b ds1_d1gE ->
        case ds1_d1gE of _ {
          [] ->
            : @ Int x_a13b ([] @ Int);
          : y_a13e ds2_d1gF ->
            :
              @ Int
              (plusInt x_a13b y_a13e)
              (f3_$sf3 ds2_d1gF y_a13e)
        }
    }

f3_$sf3 :: [Int] -> Int -> [Int]
GblId

f3_$sf3 =
  \ (sc_s1G3 :: [Int]) (sc1_s1G4 :: Int) ->
    case sc_s1G3 of _ {
      [] ->
        :
          @ Int sc1_s1G4 ([] @ Int);
      : y_a13e ds_d1gF ->
        :
          @ Int
          (plusInt sc1_s1G4 y_a13e)
          (f3_$sf3 ds_d1gF y_a13e)
    }

f2 :: [Int] -> [Int]
GblId

f2 =
  \ (ds_d1gP :: [Int]) ->
    case ds_d1gP of _ {
      [] -> f21;
      : x_a137 ds1_d1gQ ->
        case ds1_d1gQ of _ {
          [] ->
            : @ Int x_a137 ([] @ Int);
          : y_a139 xs_a13a ->
            :
              @ Int
              (plusInt x_a137 y_a139)
              (f2_$sf2 xs_a13a y_a139)
        }
    }

f2_$sf2 :: [Int] -> Int -> [Int]
GblId

f2_$sf2 =
  \ (sc_s1FU :: [Int]) (sc1_s1FV :: Int) ->
    case sc_s1FU of _ {
      [] ->
        :
          @ Int sc1_s1FV ([] @ Int);
      : y_a139 xs_a13a ->
        :
          @ Int
          (plusInt sc1_s1FV y_a139)
          (f2_$sf2 xs_a13a y_a139)
    }

The reason that f1 is faster than the rest is that GHC is somehow able
to unpack the Ints and use the more efficient +# instead of the slower
plusInt.

I don't immediately see the reason for the time difference between f4,
f3 and f2. The inner loops all seem equivalent.

Regards,

Bas

[1] http://hackage.haskell.org/package/criterion
[2] http://hackage.haskell.org/package/ghc-core


More information about the Haskell-Cafe mailing list