[Haskell-beginners] The cost of generality, or how expensive is realToFrac?

Wed Sep 15 18:36:11 EDT 2010

On Wednesday 15 September 2010 20:50:13, Greg wrote:
> I hadn't come across rewrite rules yet.  They definitely look like
> something worth learning,

Absolutely. GHC's optimiser is good, but there are a lot of cases where you 
need to push it via rewrite rules if you write polymorphic code or if you 
want to eliminate intermediate data structures (e.g. list fusion).

> though I'm not sure I'm prepared to start making custom versions
> of OpenGL.Raw...

Yes, if you can work around the issues without that, it's better to leave 
it in peace :)
Though you might ask the maintainer for rewrite rules.

>
> It looks like I managed to put that battle off for another day, however.
>  I did look at how realToFrac is implemented and (as you mention) it
> does the fromRational . toRational transform pair suggested in a number
> of sources, including Real World Haskell.  Looking at what toRational is
> doing, creating a ratio of integers out of a float it seems like a crazy
> amount of effort to go through just to convert floating point numbers.

I just did some benchmarking.

I benchmarked

foldl' (+) 0 [convert (1 / intToDoub k) | k <- [1 .. 100000]]

where

intToDoub :: Int -> Double
intToDoub = fromIntegral

for several functions

convert :: Double -> Float (actually, the type has been
(RealFloat a, RealFloat b) => a -> b, but it was used at a = Double,
b = Float).
Everything was compiled with -O2, so the rewrite rules fired, in particular 
intToDoub was replaced by a primop (int2Double#), so that one's ultra 
cheap.

For convert = realToFrac (hence by the rewrite rules the primop 
double2Float# was used), I got pretty good times, mean was 6.76 ms.

For convert = floatToFloat from below, the times were not too bad, with a 
mean of 26.3 ms.
A factor of roughly four for this benchmark (the factor for the conversion 
itself will be higher, but not exorbitantly) means it's usable in many 
situations, but not in performance critical situations where the conversion 
takes a significant amount of the running time. If you're converting to 
draw stuff with OpenGL (or some other graphics library), the conversion 
will take only a relatively small part of the time, so it's fine.

For convert = fromRational . toRational (so no rewrite rules), the times 
were rather appalling: mean was 3.34 seconds.
A factor of nearly 500 versus double2Float#.

toRational is bad. Looking at

instance  Real Double  where
    toRational x        =  (m%1)*(b%1)^^n
                           where (m,n) = decodeFloat x
                                 b     = floatRadix  x

(same for Float), and the implementations of (^^) and (^),

{-# SPECIALISE (^) ::
        Integer -> Integer -> Integer,
        Integer -> Int -> Integer,
        Int -> Int -> Int #-}
(^) :: (Num a, Integral b) => a -> b -> a
x0 ^ y0 | y0 < 0    = error "Negative exponent"
        | y0 == 0   = 1
        | otherwise = f x0 y0
    where -- f : x0 ^ y0 = x ^ y
          f x y | even y    = f (x * x) (y `quot` 2)
                | y == 1    = x
                | otherwise = g (x * x) ((y - 1) `quot` 2) x
          -- g : x0 ^ y0 = (x ^ y) * z
          g x y z | even y = g (x * x) (y `quot` 2) z
                  | y == 1 = x * z
                  | otherwise = g (x * x) ((y - 1) `quot` 2) (x * z)

-- | raise a number to an integral power
{-# SPECIALISE (^^) ::
        Rational -> Int -> Rational #-}
(^^)            :: (Fractional a, Integral b) => a -> b -> a
x ^^ n          =  if n >= 0 then x^n else recip (x^(negate n))

together with the multiplication and recip for Rationals, I have to say 
ouch!
There's no special implementation and rewrite rule for powers of Rationals, 
so on each multiplication in (^), the gcd of numerator and denominator is 
calculated, *although as powers of the original numerator and denominator 
they are guaranteed to be coprime*. Considering how slow a division of 
Integers is, awwwwwww noooooo.

So let's look at a better implementation of toRational:

toRat :: RealFloat a => a -> Rational
toRat x = case decodeFloat x of
            (m,e) -> case floatRadix x of
                        b -> if e < 0
                                then (m % (b^(negate e)))
                                else (m * b^e) :% 1

(inlined a better implementation of powers for Rationals).

Benchmarking convert = fromRational . toRat show a significant improvement, 
the mean dropped to 2.75 seconds.
Still appalling, but it's a nice improvement and I don't see any quick 
opportunities to improve that conversion.

So let's come to the last, fromRational. That's a compicated function, and 
unfortunately it has to be and I've no good idea to improve it.
fromRational is really evil (in terms of clock cycles).
Replacing fromRational with a dummy that just forces the evaluation of its 
argument and returns NaN, ±Infinity, or 0 for all real Rational values,

dummy . toRational had a mean of 623.5 ms and
dummy . toRat had a mean of 200.7 ms.

So toRat is a jolly good improvement over toRational, but it's still 
awfully slow. And since fromRational takes much much longer anyway, it's a 
not too impressive gain for realToFrac.

>
> Looking at the RealFloat class rather that Real and Fractional, it seems
> like this is a much more efficient way to go:
>
> floatToFloat :: (RealFloat a, RealFloat b) => a -> b
> floatToFloat = (uncurry encodeFloat) . decodeFloat

Yes, that's much more efficient, as witnessed by the benchmark results.
But.

>
> I substituted this in for realToFrac and I'm back to close to my
> original performance.  Playing with a few test cases in ghci, it looks
> numerically equivalent to realToFrac.
>
> This begs the question though--

No. Sorry, but I can't bear that misuse: 
http://en.wikipedia.org/wiki/Begging_the_question

It raises/demands/invites the question, but it doesn't beg it.

> am I doing something dangerous here?

Yes and no.

>  Why isn't this the standard approach?

Because it will wreak unspeakable havoc when someone creates a RealFloat 
instance with a floatRadix > 2. A floatRadix of 10 or some power of 2 (16, 
256?) could be even reasonable.

But for conversions between RealFloat types with the same floatRadix, it's 
sort of okay, only it clobbers NaNs and (for some conversions) Infinities.
However, realToFrac does that too.

>
> If I understand what's happening, decodeFloat and encodeFloat are
> breaking the floating point numbers up into their constituent parts--
> presumably by bit masking the raw binary.

Probably.

> That would explain the
> performance improvement.  I suppose there is some implementation
> dependence here, but as long as the encode and decode are implemented as
> a matched set then I think I'm good.

Not entirely matched, for Float -> Float and Double -> Double, 
NaN -> -Infinity, maybe denormalized values break too.

±Infinity is mapped to a finite value at Float -> Double

But since toRational uses decodeFloat and fromRational uses encodeFloat, 
floatToFloat is in that respect no worse than realToFrac without rewrite 
rules.

>
> Cheers--
>  Greg