[Haskell-cafe] performance question

Stijn De Saeger stijndesaeger at gmail.com
Mon Jan 17 04:47:10 EST 2005


Hello all,

A question on that most elusive of subjects.... performance in haskell (GHC 6.2)
Using the GHC profiler, I obtained the following analysis results (i
hope the code doesn't come out too ugly by mail):

	total time  =        0.92 secs   (46 ticks @ 20 ms)
	total alloc =  83,373,032 bytes  (excludes profiling overheads)

COST CENTRE                    MODULE               %time %alloc

isIn                               Main                  50.0   22.8
getCon                          Main                  13.0   16.7
o'                                  Main                   8.7    6.6
satisfies                        Main                   6.5    0.0
powerList                      Main                   6.5   46.9
CAF                             Main                   6.5    0.1
showCon                      Main                   4.3    0.3
MAIN                           MAIN                   4.3    0.0
a'                                 Main                   0.0    6.7

The problem child, that isIn function, has got about 78000 entries in
the profile log.
I should probably mention that this is an incredibly dumbed down
version of the program, the dimensions of the data it is supposed to
handle are such that, on a trial run I killed the process after about
15 minutes, when I found out it hadn't even completed 3% of its task.
sad stuff, really.

Anyways, 'isIn'  is a predicate that checks whether a given Double
lies within an interval, where intervals are defined as
...
define an interval bound, either inclusive (I) or exclusive (E)
> data Bound = I Double | E Double deriving (Eq, Show, Ord)  
> data Interval = Il Bound Bound | Nil Bound Bound deriving (Eq,Ord)  

where Nil acts as the complement of an interval, this is reflected in
the isIn function.
....
> isIn :: Double -> Interval -> Bool
> isIn r (Nil x y) = not (isIn r (Il x y))
> isIn r (Il (I x) (I y)) = r >= x && r <= y
> isIn r (Il (I x) (E y)) = r >= x && r < y
> isIn r (Il (E x) (I y)) = r > x && r <= y
> isIn r (Il (E x) (E y)) = r > x && r < y 

I tried rewriting it to something that intuitively 'feels' like it
should be faster, but i have no real idea about the cost of the
respective haskell expressions:

... version 2
> isIn :: Double -> Interval -> Bool
> isIn r (Nil x y) = not (isIn r (Il x y))
> isIn r (Il x y) = case x of 
>   (I x') -> if r >= x' then case y of 
>        (I y') -> r <= y'
>        (E y') -> r < y'
>	      else False	     
>   (E x') -> if r > x' then case y of 
>        (I y') -> r <= y'
>	 (E y') -> r < y'
>	      else False 

... which indeed turns out to be a tad bit faster, according to the
new profile log.

	total time  =        0.80 secs   (40 ticks @ 20 ms)
	total alloc =  64,404,104 bytes  (excludes profiling overheads)

COST CENTRE                    MODULE               %time %alloc

isIn                           Main                  30.0    0.0
getCon                      Main                  25.0   21.6
powerList                   Main                  15.0   60.7
showCon                    Main                   7.5    0.3
o'                               Main                   7.5    8.6
CAF                           Main                   7.5    0.1
MAIN                         MAIN                   5.0    0.0
a'                               Main                   2.5    8.6

But it can hardly be called impressive. 
Can anyone see another obvious optimization, or have I just hit the
ceiling because of the sheer number of function calls to isIn? I am
still pretty new to haskell, and I find it hard to wrap my head around
the way the compiler deals with my code.
If someone has a few leads on general performance heuristics in
haskell/GHC, I would be happy to read them too...

thanks for your time.

stijn


More information about the Haskell-Cafe mailing list