Difference between revisions of "Playing by the rules"

From HaskellWiki
Jump to navigation Jump to search
(direct link to GHC/Using_rules)
(Category:Program transformation)
Line 300: Line 300:
 
[[Category:Mathematics]]
 
[[Category:Mathematics]]
 
[[Category:Performance]]
 
[[Category:Performance]]
  +
[[Category:Program transformation]]

Revision as of 08:46, 18 January 2008

GHC's term rewriting engine provides a playground for all sorts of user-defined optimisations and tweaks to Haskell code. This page collects examples of the use of rewrite rules. There is more information available on the GHC rewrite rules page.

reverse . reverse = id

The following question was asked in #haskell:

 15:40  Excedrin> why doesn't 'head $ reverse $ reverse [1..]' work in Haskell?

The answer is: it does work, if you tell the compiler that:

reverse . reverse = id

Using the following rewrite rule:

{-# RULES
"reverse.reverse/id" reverse . reverse = id
  #-}

main = print . head . reverse . reverse $ [1..]

We can "optimise" this program, changing a non-terminating program:

$ ./a.out
^C

into a much faster one:

$ ghc -fglasgow-exts -ddump-simpl-stats A.hs
5 PostInlineUnconditionally
4 UnfoldingDone
1 RuleFired
    1 reverse.reverse/id
    7 BetaReduction
    3 SimplifierDone

$ ./a.out
1

As can be seen here, rewrite rules let you teach the compiler about interesting algebraic properties you code satisfies, that it can't find on its own.

Note: In general it's a bad idea for rules to change the semantics (meaning) of your code in such a way. Consider someone using reverse . reverse to demand strictness of an IO action, this code would no longer work with the above rule.

Fast ByteString construction

Another technique is to use rewrite rules to remove intermediate lists when building bytestrings from literal string constants. The problem is that string constants must be converted from a [Char] to a ByteString, to use literal strings as ByteStrings. This is highly inefficient, when GHC already allocates string constants as unboxed byte arrays. We can bypass the [Char] conversion using rewrite rules.

Supposing we want to use a ByteString literal:

import qualified Data.ByteString.Char8 as C

mystring = C.pack "rewrite rules are fun!"
main = C.putStrLn mystring

and it is compiled by GHC to:

mystring_rDR = Data.ByteString.Char8.pack (GHC.Base.unpackCString# "rewrite rules are fun!"#)

Main.main = Data.ByteString.putStrLn mystring_rDR

:Main.main = GHC.TopHandler.runMainIO @ () Main.main

Now, the string literal has been packed by GHC for us into a Addr#, pointing to a proper C string. Now, we'd like to remove that intermediate unpackCString#, and just pack the string literal straight into a ByteString, avoiding an intermediate list.

So we add a rewrite rule:

{-# RULES
"pack/packAddress" forall s . pack (unpackCString# s) = B.packAddress s
#-}

Now, when compiled with -O -fglasgow-exts, we get:

1 RuleFired
    1 FPS pack/packAddress

and the code is transformed as follows:

mystring = Data.ByteString.Base.packAddress "rewrite rules are fun!"

to

mystring = Data.ByteString.Char8.pack (GHC.Base.unpackCString# "rewrite rules are fun

and then the rewrite rule kicks in, and we construct a new ByteString directly from the Addr#, via a call to strlen to get the length:

mystring =
  let addr#_a146 :: GHC.Prim.Addr#
      addr#_a146 = "rewrite rules are fun!"#
  in case Data.ByteString.Base.$wccall addr#_a146 s2#_a1Q8 of
     (# ds3_a1Re, ds4_a1Rf #) ->
        Data.ByteString.Base.PS addr#_a146
                                (GHC.ForeignPtr.PlainForeignPtr var#_a1Q9)
                                0
                                (GHC.Prim.word2Int# ds4_a1Rf)

exactly what we'd like. So at runtime, this string will require only a call to strlen to build.

Locating errors

Rewrite rules can almost be used to solve the infamous 'fromJust: Nothing' error message. Couldn't we just use rewrite rules to rewrite *transparently all uses of fromJust to safeFromJust, tagging the call site with a location? To work this requires a few things to go right:

  • a rewrite rule
  • assertions
  • and rewrite rules firing before assertions are expanded

Let's try this. Consider the program:

 1      import qualified Data.Map as M
 2      import Data.Maybe
 3
 4      main = print f
 5
 6      f = let m = M.fromList
 7                      [(1,"1")
 8                      ,(2,"2")
 9                      ,(3,"3")]
10              s = M.lookup 4 m
11          in fromJust s

When we run it we get the not so useful error:

   $ ./A
   A: Maybe.fromJust: Nothing

Ok, so we have a few tricks for locating this, using LocH, we can catch an assertion failure, but we have to insert the assertion by hand:

 1      import Debug.Trace.Location
 2      import qualified Data.Map as M
 3      import Data.Maybe
 4
 5      main = do print f
 6
 7      f = let m = M.fromList
 8                      [(1,"1")
 9                      ,(2,"2")
10                      ,(3,"3")]
11              s = M.lookup 4 m
12          in safeFromJust assert s
13
14      safeFromJust a = check a . fromJust

Which correctly identifies the call site:

   $ ./A
   A: A.hs:12:20-25: Maybe.fromJust: Nothing

Now, this approach is a little fragile. 'assert' is only respected by GHC if -O is not on, so if we happened to try this trick with -O, we'd get:

   $ ./A
   A: Debug.Trace.Location.failure

So lesson one: you have to do the bug hunting with -Onot.

Currently there's -fignore-asserts for turning off assertions, but no flag for turning them on with -O, Simon, could this be fixed? Could we get a -frespect-asserts that works even with -O ?

Ok, assuming this assert trick is used, can we get the compiler to insert the asserts for us? If so, this would be a great advantage, you'd just be able to switch on a flag, or import a debugging module, and your fromJusts would be transparently rewritten. With rewrite rules we do just this!

So, to our initial unsafe use of fromJust, we add a rewrite rule:

--
-- rewrite fromJust to a located version, and hope that GHC expands
-- 'assert' after the rule fires..
--
{-# RULES
"located fromJust" fromJust = check assert . myFromJust
  #-}

This just tells the compiler to replace every occurence of fromJust with a assertion-throwing fromJust, should it fail. We have to use myFromJust here, to avoid rule recursion.

--
-- Inlined to avoid recursion in the rule:
--
myFromJust :: Maybe a -> a
myFromJust Nothing  = error "Maybe.fromJust: Nothing" -- yuck
myFromJust (Just x) = x

Ok, so can we get ghc to rewrite fromJust to the safe fromJust magicaly?

   $ ghc --make -Onot A.hs -fglasgow-exts -ddump-simpl-stats
   [1 of 1] Compiling Main             ( A.hs, A.o )
   1 RuleFired
       1 located fromJust
   Linking A ...

Yes, the rule fired! GHC *did* rewrite our fromJust to a more useful fromJust. Running it:

$ ./A
A: A.hs:19:36-41: Maybe.fromJust: Nothing

Looks good! But that is deceiving: the assert was expanded before the rule fired, and refers to the rewrite rule source line (line 19), not the fromJust call site (line 12). Now if we could just have the 'assert' token inserted into the AST before it was expanded, we'd be home and dry. Could this be done with TH? Or could we arrange for asserts in rewrite rules not to be expanded till later?

Note that this is still a useful technique, we can rewrite head/fromJust/... to some other possibly more useful message. And if we can constrain the rule to fire in only particular modules, we may be able to narrow down the bug, just by turning on a rule. For example, adding:

{-# RULES
"located fromJust" fromJust = safeFromJust
  #-}

safeFromJust s = case s of
    Nothing -> "safeFromJust: failed with Nothing. Ouch"
    Just x  -> x

will produce:

$ ./A
"safeFromJust: failed with Nothing. Ouch"

So rewrite rules can be used to transparently alter uses of partial functions like head and fromJust.

Fusion and deforestation

By far the most elaborate use of compiler rewrite rules is to perform automatic deforestation of intermediate data structures. This is documented in a number of research papers. Note that also for this application field of rewrite rules the above mentioned problem of potentially changing the semantics of programs occurs. Read further under correctness of short cut fusion.

Symbolic differentiation

I like to demonstrate to people, that GHC has already built-in a kind of Computer Algebra system. The optimizer can be abused to differentiate expressions symbolically. Unfortunately it cannot be forced to do so. That's why, some, better say: too many, expressions are not derived.