Personal tools

GHC/As a library (up to 6.8)

From HaskellWiki

< GHC(Difference between revisions)
Jump to: navigation, search
m (fixed typo)
m (GHC/As a library moved to GHC/As a library (up to 6.8))
(16 intermediate revisions by 6 users not shown)
Line 62: Line 62:
 
-> IO Session -- your session; you will need it
 
-> IO Session -- your session; you will need it
 
</haskell>
 
</haskell>
The path to your GHC installation directory (e.g., /usr/local/lib/ghc-6.6) is in practice mandatory, even though in theory marked as optional.
+
The path to your GHC installation directory (e.g., /usr/local/lib/ghc-6.6) is in practice mandatory, even though in theory marked as optional. Also, in 6.8, the type signature of newSession changed to remove the GhcMode (but retaining the still not actually optional Maybe FilePath:
  +
  +
<haskell>newSession :: Maybe FilePath -> IO Session</haskell>
   
 
The session is configurable by dynamic flags (GHC dynamic flags plus session state; think <tt>-O2</tt>, <tt>-fvia-C</tt>, <tt>-fglasgow-exts</tt>, <tt>-package</tt>). This can be done with:
 
The session is configurable by dynamic flags (GHC dynamic flags plus session state; think <tt>-O2</tt>, <tt>-fvia-C</tt>, <tt>-fglasgow-exts</tt>, <tt>-package</tt>). This can be done with:
Line 74: Line 74:
 
-> IO (DynFlags, [String]) -- new flags, unknown args
 
-> IO (DynFlags, [String]) -- new flags, unknown args
 
</haskell>
 
</haskell>
The <hask>DynFlags</hask> record has a gazillion fields; ask ghci to show all of them. You can change them by hand, or use the parser (which implements the GHC command line format and does the Right Thing). But there is one you must note:
+
The <hask>DynFlags</hask> record has a gazillion fields; ask ghci to show all of them. You can change them by hand, or use the parser (which implements the GHC command line format and does the Right Thing). But there are a couple you must note:
 
<haskell>
 
<haskell>
 
data DynFlags = DynFlags { ...,
 
data DynFlags = DynFlags { ...,
Line 105: Line 105:
 
setSessionDynFlags session f0{hscTarget = HscInterpreted}
 
setSessionDynFlags session f0{hscTarget = HscInterpreted}
 
</haskell>
 
</haskell>
+
* no linking
  +
The GHC API behaves just like --make: it always links the program if you have a Main module. There are some cases where you will want to turn the linker off; this has the same affect as --make -c. In order to do this, set the ghcLink flag to NoLink:
  +
<haskell>
  +
session <- newSession BatchCompile (Just "/usr/local/lib/ghc-6.6")
  +
f0 <- getSessionDynFlags session
  +
(f1,b) <- parseDynamicFlags f0 ["-fglasgow-exts", "-O", "-package", "ghc", "-package Cabal",
  +
"foo", "-v", "bar"]
  +
-- b = ["foo", "bar"]; the other args are recognized
  +
-- in GHC 6.6 "-O" implies "-fvia-C", that kind of thing is automatic here too
  +
setSessionDynFlags session $ f1 {ghcLink=NoLink}
  +
</haskell>
 
=== Load or compile modules ===
 
=== Load or compile modules ===
   
Line 144: Line 144:
 
Modules are compiled as per the <hask>hscTarget</hask> flag (<tt>-fasm</tt>, <tt>-fvia-C</tt>, or interpreter) in <hask>DynFlags</hask>, ''independent of GHC mode''.
 
Modules are compiled as per the <hask>hscTarget</hask> flag (<tt>-fasm</tt>, <tt>-fvia-C</tt>, or interpreter) in <hask>DynFlags</hask>, ''independent of GHC mode''.
   
=== Compiling to Core ===
+
=== Compiling to and from Core ===
   
To compile a file to Core (the intermediate language used in GHC's simplifier), call <tt>compileToCore</tt>:
+
Note: the following will only work as advertised in the HEAD (GHC 6.9).
  +
  +
To compile a file to Core (the intermediate language used in GHC's simplifier), call <tt>compileToCoreModule</tt>:
 
<haskell>
 
<haskell>
compileToCore :: Session -> FilePath -> IO (Maybe [CoreBind])
+
compileToCoreModule :: Session -> FilePath -> IO (Maybe CoreModule)
 
</haskell>
 
</haskell>
<tt>compileToCore</tt> takes a session and a filename and, if successful, returns a list of Core bindings corresponding to the given module. It is not necessary to set a target first. For documentation of the Core data types, see [http://darcs.haskell.org/ghc/compiler/coreSyn/CoreSyn.lhs the CoreSyn module] in GHC.
+
<tt>compileToCoreModule</tt> takes a session and a filename and, if successful, returns a <tt>CoreModule</tt> corresponding to the Haskell module denoted by the filename. It is not necessary to set a target first.
  +
  +
The <tt>CoreModule</tt> data type is specified in [http://darcs.haskell.org/ghc/compiler/main/HscTypes.lhs the HscTypes module] in GHC. The most important field is probably <tt>cm_binds</tt>, a list of Core bindings for all top-level declarations from the original Haskell source file.
  +
<haskell>
  +
data CoreModule
  +
= CoreModule {
  +
-- Module name
  +
cm_module :: !Module,
  +
-- Type environment for types declared in this module
  +
cm_types :: !TypeEnv,
  +
-- Declarations
  +
cm_binds :: [CoreBind]
  +
-- Imports
  +
cm_imports :: ![Module]
  +
}
  +
</haskell>
  +
The type <tt>TypeEnv</tt> is defined in [http://darcs.haskell.org/ghc/compiler/main/HscTypes.lhs HscTypes.lhs].
  +
  +
For documentation of the Core data types, see [http://darcs.haskell.org/ghc/compiler/coreSyn/CoreSyn.lhs the CoreSyn module] in GHC.
  +
  +
For some applications, you may want to compile the Haskell source for various standard libraries into Core. Given the current API, this is probably only possible in the presence of a GHC tree that was built from source, including libraries. (Otherwise, there may be problems finding the right import files.) If you have experience suggesting otherwise, you may wish to report back.
  +
  +
You can also compile Core code to object code:
  +
<haskell>
  +
compileCoreToObj :: Bool -> Session -> CoreModule -> IO Bool
  +
</haskell>
  +
<tt>compileCoreToObj</tt> takes a boolean flag that says whether or not to run the simplifier, a session, and a <tt>CoreModule</tt>, and returns True iff compilation to machine code succeeded. As a side effect, it creates the executable file, as well as any necessary <tt>.hi</tt> or <tt>.o</tt> files, in the current working directory. This has only been tested with a single self-contained Core module (i.e., one that only depends on standard library functions rather than on functions in other user-defined modules).
  +
  +
These functions are a work-in-progress, so if you notice anything that is not what you expect, please complain on the cvs-ghc mailing list or [http://hackage.haskell.org/trac/ghc/newticket?type=bug file a bug report].
   
 
=== Interactive evaluation ===
 
=== Interactive evaluation ===
Line 273: Line 273:
 
== Interactive mode example ==
 
== Interactive mode example ==
   
The file [[Media:Interactive.hs]] (also requires [[Media:MyPrelude.hs]]) serves as an example for using GHC as a library in interactive mode. It also shows how to replace some of the standard prelude functions with modified versions. See the comments in the code for further information.
+
The file [[Media:Interactive-6.6.hs]] or [[Media:Interactive-6.8.hs]] (requires [[Media:MyPrelude.hs]]) serve as an example for using GHC as a library in interactive mode. It also shows how to replace some of the standard prelude functions with modified versions. See the comments in the code for further information.
   
 
== Using the GHC library from inside GHCi ==
 
== Using the GHC library from inside GHCi ==
Line 289: Line 289:
 
3
 
3
 
</pre>
 
</pre>
  +
  +
== Profiling ==
  +
  +
To build the profiling version of GHC-as-a-library, add:
  +
<pre>
  +
GhcCompilerWays=p
  +
</pre>
  +
to your <tt>build.mk</tt> file, and rebuild GHC.
  +
  +
<b>Note</b>: This seems to be broken as of [http://hackage.haskell.org/trac/ghc/ticket/2377 GHC 6.8.3].
  +
  +
== Binary size ==
  +
  +
Using the GHC API in your
  +
applications results in large executables (e.g. > 15Mb). You can mitigate this by an enormous amount using the tools <tt>strip</tt> and <tt>gzexe</tt>; this may reduce the executable to 15-30% of its previous size. ([http://austin.youareinferior.net/?q=node/29 Example]:
  +
<blockquote>"I will take this time to point out that using the GHC API in your
  +
applications results in *large* executables. The Interact example
  +
above when compiled with vanilla --make options resulted in a whopping
  +
17mb executable. I've observed however you can mitigate this by an
  +
enormous amount using the tools strip and gzexe [see also upx. -ed], taking it down to a light 2.5mb (a size reduction of about 85%)."[http://groups.google.com/group/fa.haskell/browse_thread/thread/90203222e5830d0a]</blockquote> <!-- I also found the page cached as <http://72.14.205.104/search?q=cache:DXz9SkIg0OAJ:austin.youareinferior.net/%3Fq%3Dnode/29+http://austin.youareinferior.net/%3Fq%3Dnode/29&hl=en&ct=clnk&cd=1&gl=us> -->
  +
  +
(Using 6.8.1 for me results in a 16M binary using the example interactive session from this page, even when thoroughly stripped. Using UPX on its most intensive settings brought it down to 2.9M. --[[User:Gwern|Gwern]])

Revision as of 15:32, 8 October 2008

Using GHC as a library

Contents

In GHC 6.5 and subsequently you can import GHC as a Haskell library, which lets you write a Haskell program that has access to all of GHC.

This page is a place for everyone to add

  • Notes about how to get it working
  • Comments about the API
  • Suggestions for improvement

and so on.

More documentation is available on the GHC wiki: http://cvs.haskell.org/trac/ghc/wiki/Commentary/Compiler/API

1 Getting started

You'll need a version of GHC (at least 6.5) that supports the GHC API. The GHC download page offers stable releases and development versions; you can also use CVS (instructions) or darcs (e.g., darcs get --partial http://darcs.haskell.org/ghc).

To use the GHC API you say

import GHC

Doing this imports the module GHC from the package ghc. This module exports the "GHC API", which is still in a state of flux. Currently it's not even Haddock-documented. You can see the source code (somewhat documented). There are also other modules of interest as you do more special things.

Here's an example main program that does it Media:Main.hs (good for GHC 6.6). You need to manually change the value of myGhcRoot to point to your GHC directory.

To compile Media:Main.hs, you have to turn on the flag "-package ghc", e.g.

  ghc -package ghc Main.hs

2 Common use cases and functions

Assumes GHC 6.6.

2.1 Default exception handling

If you don't handle exceptions yourself, you are recommended to wrap all code inside the wrapper:

defaultErrorHandler :: DynFlags -> IO a -> IO a
DynFlags.defaultDynFlags :: DynFlags

This catches exceptions and prints out exception details and exits your program with exit code 1.

Example:

import GHC
import DynFlags(defaultDynFlags)
 
main = defaultErrorHandler defaultDynFlags $ do
         {-
            stuff in the following subsections
         -}
You do not have to use
defaultDynFlags
, but it's the easiest starting point.

2.2 Initialization

First create a session:

newSession :: GhcMode         -- BatchCompile | Interactive | MkDepend | ...
           -> Maybe FilePath  -- GHC installation directory
           -> IO Session      -- your session; you will need it

The path to your GHC installation directory (e.g., /usr/local/lib/ghc-6.6) is in practice mandatory, even though in theory marked as optional. Also, in 6.8, the type signature of newSession changed to remove the GhcMode (but retaining the still not actually optional Maybe FilePath:

newSession :: Maybe FilePath -> IO Session

The session is configurable by dynamic flags (GHC dynamic flags plus session state; think -O2, -fvia-C, -fglasgow-exts, -package). This can be done with:

getSessionDynFlags :: Session -> IO DynFlags
setSessionDynFlags :: Session
                   -> DynFlags
                   -> IO [PackageId]  -- important iff dynamic-linking
parseDynamicFlags :: DynFlags  -- old flags
                  -> [String]  -- e.g., all or part of getArgs
                  -> IO (DynFlags, [String])  -- new flags, unknown args
The
DynFlags
record has a gazillion fields; ask ghci to show all of them. You can change them by hand, or use the parser (which implements the GHC command line format and does the Right Thing). But there are a couple you must note:
data DynFlags = DynFlags { ...,
    hscTarget :: HscTarget }    -- HscC | HscAsm | HscInterpreted | ...
This corresponds to -fvia-C, -fasm, or interpreting. When the session needs to re-compile a module, this field controls how. The default is
HscAsm
, even in the interactive mode, meaning the interactive mode may produce .hi and .o files too. If you want to follow GHCi in not doing that, you must set this field to
HscInterpreted
yourself. (On the other hand, it is fun to contemplate an interactive session that generates machine code upon your command.)
setSessionDynFlags
also sets up your session's awareness of the package database (without which you can't even use the Prelude), so even if you like the defaults, you should still call it. (Older code called
PackageConfig.initPackages
for this.)

Examples:

  • vanilla compiler, use all defaults (rare but good start)
session <- newSession BatchCompile (Just "/usr/local/lib/ghc-6.6")
getSessionDynFlags session >>= setSessionDynFlags session
  • compiler with custom flags, easy with parser
session <- newSession BatchCompile (Just "/usr/local/lib/ghc-6.6")
f0 <- getSessionDynFlags session
(f1,b) <- parseDynamicFlags f0 ["-fglasgow-exts", "-O", "-package", "ghc", "-package Cabal",
                                "foo", "-v", "bar"]
-- b = ["foo", "bar"]; the other args are recognized
-- in GHC 6.6 "-O" implies "-fvia-C", that kind of thing is automatic here too
setSessionDynFlags session f1
  • interactive session with interpreter
session <- newSession Interactive (Just "/usr/local/lib/ghc-6.6")
f0 <- getSessionDynFlags session
setSessionDynFlags session f0{hscTarget = HscInterpreted}
  • no linking

The GHC API behaves just like --make: it always links the program if you have a Main module. There are some cases where you will want to turn the linker off; this has the same affect as --make -c. In order to do this, set the ghcLink flag to NoLink:

session <- newSession BatchCompile (Just "/usr/local/lib/ghc-6.6")
f0 <- getSessionDynFlags session
(f1,b) <- parseDynamicFlags f0 ["-fglasgow-exts", "-O", "-package", "ghc", "-package Cabal",
                                "foo", "-v", "bar"]
-- b = ["foo", "bar"]; the other args are recognized
-- in GHC 6.6 "-O" implies "-fvia-C", that kind of thing is automatic here too
setSessionDynFlags session $ f1 {ghcLink=NoLink}

2.3 Load or compile modules

To compile code or load modules, first set one or more targets, then call the
load
function.
guessTarget :: String       -- "filename.hs" or "filename.lhs" or "MyModule"
            -> Maybe Phase  -- if not Nothing, specifies starting phase
            -> IO Target
addTarget :: Session -> Target -> IO ()
setTargets :: Session -> [Target] -> IO ()
getTargets :: Session -> IO [Target]
removeTarget :: Session -> TargetId -> IO ()
load :: Session -> LoadHowMuch -> IO SuccessFlag
data LoadHowMuch
  = LoadAllTargets
  | LoadUpTo ModuleName
  | LoadDependenciesOf ModuleName
Loading or compiling produces temp directories and files, which can only be correctly cleaned up or kept (depending on temp file flags in
DynFlags
) with the wrapper:
defaultCleanupHandler :: DynFlags -> IO a -> IO a
Two factors constrain how much code should be wrapped. At the minimal, function calls such as
load
and
depanal
that potentially unlit or compile or link should be wrapped. At the maximal, the flags passed to
defaultCleanupHandler
should be identical to those set to the session, so practically the wrapping should be after setting up seesion flags.

Example:

t <- guessTarget "Main.hs" Nothing
addTarget session t    -- setTargets session [t] is also good
f <- getSessionDynFlags session
sf <- defaultCleanupHandler f (load session LoadAllTargets)
case sf of Succeeded -> ...
           Failed -> ...

Dependencies (both modules and packages) are processed automatically, and an executable is produced if appropriate, precisely like --make.

Modules are compiled as per the
hscTarget
flag (-fasm, -fvia-C, or interpreter) in
DynFlags
, independent of GHC mode.

2.4 Compiling to and from Core

Note: the following will only work as advertised in the HEAD (GHC 6.9).

To compile a file to Core (the intermediate language used in GHC's simplifier), call compileToCoreModule:

compileToCoreModule :: Session -> FilePath -> IO (Maybe CoreModule)

compileToCoreModule takes a session and a filename and, if successful, returns a CoreModule corresponding to the Haskell module denoted by the filename. It is not necessary to set a target first.

The CoreModule data type is specified in the HscTypes module in GHC. The most important field is probably cm_binds, a list of Core bindings for all top-level declarations from the original Haskell source file.

data CoreModule
  = CoreModule {
      -- Module name
      cm_module   :: !Module,
      -- Type environment for types declared in this module
      cm_types    :: !TypeEnv,
      -- Declarations
      cm_binds    :: [CoreBind]
      -- Imports
      cm_imports  :: ![Module]
    }

The type TypeEnv is defined in HscTypes.lhs.

For documentation of the Core data types, see the CoreSyn module in GHC.

For some applications, you may want to compile the Haskell source for various standard libraries into Core. Given the current API, this is probably only possible in the presence of a GHC tree that was built from source, including libraries. (Otherwise, there may be problems finding the right import files.) If you have experience suggesting otherwise, you may wish to report back.

You can also compile Core code to object code:

compileCoreToObj :: Bool -> Session -> CoreModule -> IO Bool

compileCoreToObj takes a boolean flag that says whether or not to run the simplifier, a session, and a CoreModule, and returns True iff compilation to machine code succeeded. As a side effect, it creates the executable file, as well as any necessary .hi or .o files, in the current working directory. This has only been tested with a single self-contained Core module (i.e., one that only depends on standard library functions rather than on functions in other user-defined modules).

These functions are a work-in-progress, so if you notice anything that is not what you expect, please complain on the cvs-ghc mailing list or file a bug report.

2.5 Interactive evaluation

Interactive evaluation ala GHCi is done by
runStmt
. But first, this is always done under a current context, i.e., which modules are in scope. Most probably you want to have at least the Prelude and those you loaded in the previous section. How to manipulate the context:
setContext :: Session
           -> [Module]    -- their top levels will be visible
           -> [Module]    -- their exports will be visible
           -> IO ()
getContext :: Session -> IO ([Module], [Module])
findModule :: Session -> ModuleName -> Maybe PackageId -> IO Module
mkModule :: PackageId -> ModuleName -> Module
mkModuleName :: String -> ModuleName
PackageConfig.stringToPackageId :: String -> PackageId
Every module given to
setContext
must be either in a package known to the session or has been loaded as per the previous subsection. Example:
-- equivalent to GHCi's :m Prelude Control.Monad *Main
prelude <- findModule session (mkModuleName "Prelude") Nothing
monad <- findModule session (mkModuleName "Control.Monad") Nothing
usermod <- findModule session (mkModuleName "Main") Nothing  -- we have loaded this
setContext session [usermod] [prelude,monad]
You can also be specific about packages. You can also use
mkModule
instead of
findModule
, or even some module query functions in the next subsection.

Having set a useful context, we're now ready to evaluate.

runStmt :: Session -> String -> IO RunResult
data RunResult
    = RunOk [Name]    -- names bound by the expression
    | RunFailed
    | RunException GHC.IOBase.Exception  -- that's Control.Exception.Exception

Example:

runStmt session "let n = 2 + 2"  -- n is bound
runStmt session "n"              -- 4 is printed (note "it" is bound)

(Interactive evaluation works in BatchCompile mode too! There are still other subtle differences, so this is not recommended.)

2.6 Type checking

What if I want the type info from a module?

Once the modules are loaded in the session, they are already type-checked. The type information of a loaded module are stored in a data-structure called

ModuleInfo
. To access the type information, we need to apply function
getModuleInfo
to the target module.
ModuleInfo
is defined as follows,
data ModuleInfo = ModuleInfo {
	minf_type_env  :: TypeEnv,
	minf_exports   :: NameSet, -- ToDo, [AvailInfo] like ModDetails?
	minf_rdr_env   :: Maybe GlobalRdrEnv,	-- Nothing for a compiled/package mod
	minf_instances :: [Instance]
#ifdef GHCI
        ,minf_modBreaks :: ModBreaks 
#endif
	-- ToDo: this should really contain the ModIface too
  }
The field
minf_type_env
is holding the type environment, of type
TypeEnv
, which is defined as,
type TypeEnv = [TyThing]

where TyThing can be an identifier, a class, a type constructor or a data constructor.

data TyThing = AnId     Id
	     | ADataCon DataCon
	     | ATyCon   TyCon
	     | AClass   Class
Recalling the running example in the previous subsection, note that the variable
usermod
captures the user-defined module "Main".

We retrieve the module information of "Main" module and unfold the type environment out of it.

mb_userModInfo <- getModuleInfo session usermod
case mb_userModInfo of 
  Just userModInfo ->
    let userTyThings = modInfoTyThings userModInfo              -- access the type environments
        userTys = [ (i, idType i) | (AnId i) <- userTyThings ]  -- we are only interested in the declared ids and their (inferred) types. 
    in  ... -- do something with userTys
  Nothing -> return ()


2.7 Queries

-- Get module dependency graph
getModuleGraph :: Session -> IO ModuleGraph    -- ModuleGraph = [ModSummary]
-- Get bindings
getBindings :: Session -> IO [TyThing]

2.8 Messages

Compiler messages (including progress, warnings, errors) are controlled by verbosity and routed through a callback mechanism. These are fields in
DynFlags
:
data DynFlags = DynFlags { ...,
    verbosity :: Int,
    log_action :: Severity -> SrcLoc.SrcSpan -> Outputable.PprStyle -> ErrUtils.Message -> IO () }
The
verbosity
field corresponds to the command line -v flag;
parseDynamicFlags
is applicable. A low verbosity means
log_action
is seldom called.

You can set the callback to your logger, like

f <- getSessionDynFlags session
setSessionDynFlags session f{log_action = my_log_action}

This sets the session's logger, but it will not see exceptions.

If you call
defaultErrorHandler
at the outermost, clearly its message logger has to be set separately:
main = defaultErrorHandler defaultDynFlags{log_action = my_log_action} $ do ...
This logger will see messages produced by
defaultErrorHandler
upon exceptions.

3 Interactive mode example

The file Media:Interactive-6.6.hs or Media:Interactive-6.8.hs (requires Media:MyPrelude.hs) serve as an example for using GHC as a library in interactive mode. It also shows how to replace some of the standard prelude functions with modified versions. See the comments in the code for further information.

4 Using the GHC library from inside GHCi

This works, to some extent. However, beware about loading object code, because there is only a single linker symbol table in the runtime, so GHCi will be sharing the symbol table with the new GHC session.

$ ghci -package ghc
Prelude> :m + GHC PackageConfig
Prelude GHC> session <- newSession Interactive (Just "/usr/local/lib/ghc-6.6")
Prelude GHC> setSessionDynFlags session =<< getSessionDynFlags session
Prelude GHC> setContext session [] [mkModule (stringToPackageId "base") (mkModuleName "Prelude")]
Prelude GHC> runStmt session "let add1 x = x + 1"
Prelude GHC> runStmt session "add1 2"
3

5 Profiling

To build the profiling version of GHC-as-a-library, add:

GhcCompilerWays=p

to your build.mk file, and rebuild GHC.

Note: This seems to be broken as of GHC 6.8.3.

6 Binary size

Using the GHC API in your applications results in large executables (e.g. > 15Mb). You can mitigate this by an enormous amount using the tools strip and gzexe; this may reduce the executable to 15-30% of its previous size. (Example:

"I will take this time to point out that using the GHC API in your applications results in *large* executables. The Interact example above when compiled with vanilla --make options resulted in a whopping 17mb executable. I've observed however you can mitigate this by an enormous amount using the tools strip and gzexe [see also upx. -ed], taking it down to a light 2.5mb (a size reduction of about 85%)."[1]

(Using 6.8.1 for me results in a 16M binary using the example interactive session from this page, even when thoroughly stripped. Using UPX on its most intensive settings brought it down to 2.9M. --Gwern)