RFC: ghc's dynamic linker

Simon Marlow simonmar@microsoft.com
Tue, 27 Aug 2002 17:19:15 +0100


Ok, let's start with a possible API to the library I think you're asking
for:

   loadLibrary  :: FilePath -> IO ()
   lookupEntity :: String -> IO a

(there's no type checking at dynamic link time of course, so you get to
claim the returned object is whatever type you like).

Right, now what would it take to implement this?  As Duncan points out,
this is almost possible already using the GHCi dynamic linker, which is
available to any program compiled with GHC via the FFI.  The interface
is fairly straightforward, eg:

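  -- Must be called before any other linker function.
  foreign import ccall unsafe "initLinker"
     initLinker :: IO ()

  -- Returns the address of a loaded symbol, or a null pointer.
  foreign import ccall unsafe "lookupSymbol"
     c_lookupSymbol :: CString -> IO (Ptr a)

  -- Loads one object file; the result is non-zero on success.
  foreign import ccall unsafe "loadObj"
     c_loadObj :: CString -> IO Int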

but the main problem is that the dynamic linker can't link new modules
to symbols in the currently running binary.  So, in order to link a new
Haskell module, you first have to load up a fresh copy of the 'base' and
'haskell98' packages, just like GHCi does.  It *almost* works to do
this, except that you get strange effects, one of which is that you have
two copies of stdout each with their own buffer.
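
To make the shape of this concrete, here is a rough sketch of how the
two-function API could be layered over the RTS linker.  Several things
in it are assumptions rather than anything stated above: it imports
resolveObjs as well (an entry point in GHC's RTS linker that is not
mentioned in this message), it assumes a Unix-style char* path for
loadObj, it assumes that calling initLinker more than once is harmless,
and lookupEntityPtr is a made-up name.  It stops at the raw address:
turning that address into a Haskell value of the claimed type is a
further unchecked step that is not shown.

```haskell
import Foreign.C.String (CString, withCString)
import Foreign.Ptr (Ptr, nullPtr)

foreign import ccall unsafe "initLinker"
  initLinker :: IO ()

foreign import ccall unsafe "loadObj"
  c_loadObj :: CString -> IO Int

-- Assumption: not part of the interface quoted in the message.
foreign import ccall unsafe "resolveObjs"
  c_resolveObjs :: IO Int

foreign import ccall unsafe "lookupSymbol"
  c_lookupSymbol :: CString -> IO (Ptr a)

-- | Load one object file and resolve its symbols, raising an
-- IOError if either step reports failure (a zero result).
loadLibrary :: FilePath -> IO ()
loadLibrary path = do
  initLinker
  loaded <- withCString path c_loadObj
  if loaded == 0
    then ioError (userError ("loadObj failed: " ++ path))
    else do
      resolved <- c_resolveObjs
      if resolved == 0
        then ioError (userError ("resolveObjs failed: " ++ path))
        else return ()

-- | Hypothetical helper: look up a symbol by name and return its
-- address, or Nothing if the linker does not know the symbol.
lookupEntityPtr :: String -> IO (Maybe (Ptr a))
lookupEntityPtr name = do
  initLinker
  p <- withCString name c_lookupSymbol
  return (if p == nullPtr then Nothing else Just p)
```

Returning Maybe rather than a bare pointer at least makes the "symbol
not found" case visible to the caller, even though nothing here can
check that the symbol has the type you claim for it.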

Going the final step and allowing linkage to the current binary is
possible; it just means the linker has to know how to read the symbol
table out of the binary, and you have to avoid running 'strip'.  I
believe reading the symbol table is quite straightforward, the main
problem being that on Unix you don't actually know where the binary
lives, so you have to wire it in or search the PATH.
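
The "search the PATH" half of that could look something like the
following sketch.  The names candidatePaths and findOwnBinary are
invented for illustration, and the approach is only a guess at where
the binary lives: it takes the program name from getProgName and tries
each PATH directory in order, so it will not find a program that was
started via an explicit path such as ./prog from a directory that is
not on the PATH.

```haskell
import System.Directory (doesFileExist)
import System.Environment (getProgName, lookupEnv)
import System.FilePath ((</>), splitSearchPath)

-- | Every place the program could live, given the contents of the
-- PATH variable and the program's name, in search order.
candidatePaths :: String -> String -> [FilePath]
candidatePaths pathVar prog =
  [dir </> prog | dir <- splitSearchPath pathVar]

-- | Hypothetical helper: the first PATH entry that contains a file
-- with the running program's name, if there is one.
findOwnBinary :: IO (Maybe FilePath)
findOwnBinary = do
  prog    <- getProgName
  pathVar <- maybe "" id <$> lookupEnv "PATH"
  firstExisting (candidatePaths pathVar prog)
  where
    firstExisting []     = return Nothing
    firstExisting (p:ps) = do
      exists <- doesFileExist p
      if exists then return (Just p) else firstExisting ps
```

Keeping candidatePaths pure means the search order can be checked
without touching the file system; on a Unix system,
candidatePaths "/usr/bin:/bin" "prog" gives
["/usr/bin/prog","/bin/prog"].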

Another problem is that you don't normally link a *complete* copy of the
base package into your binary, you only link the bits you need.  Linking
the whole lot would mean every binary would be about 10M; but doing this
on the basis of a flag which you turn on when you want to do dynamic
linking maybe isn't so bad.

There are a couple of other options:

  - make your program into a collection of dynamically-linked
    libraries itself.  i.e. have a little stub main() which links
    with the RTS, and loads up 'base' followed by your program
    when it starts.  The startup cost would be high (we don't
    do lazy linking in Haskell), but you'd only get one copy of
    the base package and this is possible right now.

  - make GHC generate objects that play nicely with the standard
    dynamic linker on your system.  This is entirely non-trivial,
    I believe.  See previous discussions on this list.  However,
    it might get easier in the future; I'm currently working on
    removing the need to distinguish code from data in GHC's RTS,
    which will eliminate some of the problems.

Summary: extending GHC's dynamic linker to be able to slurp in the
symbol table from the currently running binary would be useful, and is a
good bite-sized GHC hacker task.  I can't guarantee that we'll get
around to it in a timely fashion, but contributions are, as always,
entirely welcome...

Cheers,
	Simon