Haddock 2 and GHC builds (Re: build fails while running haddock
in fgl)
Claus Reinke
claus.reinke at talk21.com
Mon Jul 14 17:37:48 EDT 2008
> Haddock reads two types of interface files:
> * GHC .hi files
> These are needed by the GHC API for all packages that the package
> under processing depends on. Otherwise it can't rename and typecheck
> the code. We want the code renamed, and in the future it would be nice
> to have it typechecked so that one doesn't have to write type
> signatures for functions in order for them to be part of the
> documentation.
> * Haddock's .haddock files
> These store Haddock specific information for packages. In particular,
> they store a renaming environment used to point names to the correct
> place in the documentation for a package. So when processing a package
> we want one of these files for each package dependency, so that links
> to these packages go to the right places. (Documentation links don't
> always point to the same places as links in the regular renaming
> environment, even though they often do).
Thanks for the summary, David. After re-reading the older threads,
it seems then that this is really a GHC and GHC Api problem, not
a Haddock problem as such, and not a GHC bootstrap issue, either.
Haddock just happens to be the first GHC Api client that tries to
process real-life programs, ie, programs consisting of several packages,
where the source for some packages is either not available or where it
would be impractical to re-process the source for all packages.
Haddock could, presumably, arrange for its own .haddock files to
be compatible across versions, but different GHC major versions
cannot (yet?) process each other's packages/.hi files.
So far, this has mainly been a major annoyance, forcing library
rebuilds for every GHC update (I stopped bothering with
rebuilding WxHaskell from source for GHC Head long ago -
no GUIs for me, because of this backwards incompatibility issue),
but with GHC Api clients, this is going beyond annoyance: we
would have to rebuild our GHC Api-based tools for every
GHC/library version used by sources we'd want to process!
Scenario:
- we have a tool T, built with GHC version V1, using V1's Api
- we have a Haskell project H, buildable with GHC version V2,
using V2's libraries, built with V2
Issue:
Unless the major version numbers of V1 and V2 match, or we
are willing and able either to rebuild all the libraries used by H
with T's GHC V1, or T with H's GHC V2, this isn't going to work!
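The matching rule in this scenario can be written down as a small check. This is only a sketch: it assumes that the "major version" is the first two components of the version number (so 6.8.2 and 6.8.3 match, but 6.8.3 and 6.9 do not), and the names majorOf and canProcess are made up for illustration; they are not part of any GHC Api.

```haskell
import Data.Version (Version, makeVersion, versionBranch)

-- Assumption: the major version is the first two components,
-- e.g. [6,8] for 6.8.3.
majorOf :: Version -> [Int]
majorOf = take 2 . versionBranch

-- Hypothetical check: can a tool built with toolGhc read the .hi files
-- of libraries built with projectGhc, without rebuilding either side?
canProcess :: Version -> Version -> Bool
canProcess toolGhc projectGhc = majorOf toolGhc == majorOf projectGhc
```

Under that assumption, canProcess (makeVersion [6,8,2]) (makeVersion [6,8,3]) is True, while canProcess (makeVersion [6,8,3]) (makeVersion [6,9]) is False, which is exactly the case where T or H's libraries have to be rebuilt.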
I've never found the rationale for GHC's binary incompatibility
very convincing (yes, we want cross-package optimizations, and
yes, we do like it if GHC V(n+1) does a better job at compiling
package P than GHC Vn did; but why can't GHC V(n+1) do
at least as good a job as GHC Vn with package P compiled by
GHC Vn? augment the .hi-files format, don't replace it completely;
or have a generic it-works-with-all-versions-but-wont-be-fast
section, preceded by a preferably-use-this-for-speed-version-x
section).
However, until this fundamental issue is addressed, is there any
way to make GHC Api clients less dependent on the details of
a specific GHC Api version? In the scenario given above, if T,
despite being built with GHC V1, was able to work with GHC
V2's Api, then it could use GHC V2's formats describing the
libraries/packages needed by H. But that means (a) abstraction
from the rapidly changing GHC Api, to get a stable sub-interface,
and (b) another version issue: can a GHC Api client, compiled
with GHC V1, use GHC V2's Api, without recompilation?
Haddock would be a good test-case, but the testsuite doesn't
do any cross-GHC-version tests yet, does it?-)
Perhaps it helps to visualize Haddock as consisting of two
parts:
- part I is the generic Haddock code
- part II is the GHC Api version-specific Haddock code
At the moment, part I is empty, so one has to rebuild all of
Haddock 2 for every GHC version one might want to work
with. Ideally, part II would be empty, so that one Haddock 2
would work with any GHC Api version available. The question
is: would it be possible to move enough of the current Haddock
2 code from part II to part I, so that one only has to build and
install a small Haddock support module (part II) with each
GHC version?
Then, Haddock (part I) would no longer call the various GHC
Apis directly, but would instead call the Haddock (part II)
support modules installed for each GHC version (not unlike
the ghc-paths package we'd like to see installed with each
GHC version, to abstract from the version-specific locations).
Does that make sense? Do you think it would be viable?
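To make the part I / part II split a bit more concrete, here is a minimal sketch in which part II is a record of functions and part I only ever talks to that record. All names (GhcSupport, exportedNames, documentModule, selectSupport, dummySupport) are hypothetical, and the dummy implementation stands in for code that would really call one specific GHC Api version.

```haskell
import Data.List (find)

-- Part II: everything that depends on one particular GHC Api version,
-- hidden behind a record of functions. Field names are hypothetical.
data GhcSupport = GhcSupport
  { supportedGhc  :: String                   -- e.g. "6.8"
  , exportedNames :: FilePath -> IO [String]  -- would call that GHC's Api
  }

-- Part I: generic code that uses the record and never calls GHC directly.
documentModule :: GhcSupport -> FilePath -> IO String
documentModule support path = do
  names <- exportedNames support path
  return (unlines (("-- processed via GHC " ++ supportedGhc support) : names))

-- Pick the support module installed for the GHC version the project uses.
selectSupport :: String -> [GhcSupport] -> Maybe GhcSupport
selectSupport ver = find ((== ver) . supportedGhc)

-- A stand-in part II that ignores its input; a real one would wrap the
-- GHC Api of the given version.
dummySupport :: String -> GhcSupport
dummySupport ver = GhcSupport
  { supportedGhc  = ver
  , exportedNames = \_ -> return ["f", "g"]
  }
```

The sketch only covers point (a), the stable sub-interface. It says nothing about point (b), that is, how a part I compiled with GHC V1 would actually load a part II compiled with GHC V2.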
> The presentation style is why Haddock needs to know so much about
> GHC's language. There are many differences between the pretty printing
> requirements of GHC and the HTML output we want from Haddock, so the
> HTML backend can not simply re-use the GHC pretty printer. So Haddock
> goes through tons of GHC language elements in order to render them in
> its own way. I don't know whether some kind of generalized pretty
> printer would be a good idea or not.
That is another general issue with the GHC Api: one might want
to reuse its parser and pretty printer, but in slightly modified form,
say a pretty printer that doesn't ignore source locations, a pretty
printer that produces HTML while ignoring the markup for layout
purposes, a parser that parses a slightly expanded grammar, etc.
Other than defining your own variations, preferably in a generic
form that allows at least some reuse, I see no way around this.
Generalizing the pretty printer to cover more variations might be
possible, but generalizing the grammar/parser in a similar way
would need serious refactoring, and the known techniques for
extensible grammars/parsers might not be well-adapted for the
heavy duty lifting expected from GHC's frontend.
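For the pretty printer side, the kind of generalization meant here could look roughly like the following: one traversal, with the variation points collected in a record. This is a toy sketch over a made-up two-constructor type language, not GHC's pretty printer, and all names in it (Backend, renderName, renderArrow, ppType, ppSig) are hypothetical.

```haskell
-- A tiny stand-in syntax tree; GHC's real Ast is far larger.
data Type = TyCon String | Fun Type Type
data Sig  = Sig String Type            -- name :: type

-- The variation points, as a record of functions.
data Backend = Backend
  { renderName  :: String -> String    -- how to show an identifier
  , renderArrow :: String              -- how to show the function arrow
  }

plainText :: Backend
plainText = Backend { renderName = id, renderArrow = " -> " }

html :: Backend
html = Backend
  { renderName  = \n -> "<a href=\"#" ++ n ++ "\">" ++ n ++ "</a>"
  , renderArrow = " -&gt; "
  }

-- One traversal, shared by all backends. Function types in argument
-- position are parenthesized.
ppType :: Backend -> Type -> String
ppType b (TyCon n) = renderName b n
ppType b (Fun a r) = ppArg a ++ renderArrow b ++ ppType b r
  where
    ppArg t@(Fun _ _) = "(" ++ ppType b t ++ ")"
    ppArg t           = ppType b t

ppSig :: Backend -> Sig -> String
ppSig b (Sig n t) = renderName b n ++ " :: " ++ ppType b t
```

With the plainText backend, the signature Sig "f" (Fun (Fun (TyCon "Int") (TyCon "Bool")) (TyCon "Int")) renders as f :: (Int -> Bool) -> Int; the html backend reuses the same traversal but wraps every name in a link.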
> Then there is a lot of other code in Haddock that needs to know
> details of GHC's language, but it could probably be reduced by
> using generics.
That is one of the great advantages of generic code: not just is
there less boilerplate to write, but the boilerplate will adapt to
changes in the structures you're working over. For instance,
Programatica did use an extensible two-level grammar, but
Strafunski's StrategyLib isolated HaRe from the details of
recursive loop-tying in the Ast.
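A small illustration of that adaptability, as a sketch only: it uses Data.Data from base rather than Strafunski's StrategyLib, and the Expr type and the names collect and varsOf are invented for the example. The point is that the traversal never mentions App or Lam, so adding further constructors would not require changing it.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data (Data, Typeable, cast, gmapQ)

-- A toy Ast standing in for GHC's.
data Expr
  = Var String
  | App Expr Expr
  | Lam String Expr
  deriving (Data, Typeable)

-- Collect every value of type b found anywhere inside a structure,
-- including the root itself, in top-down, left-to-right order.
-- Written out by hand here to stay within base.
collect :: (Data a, Typeable b) => a -> [b]
collect x = maybe [] (: []) (cast x) ++ concat (gmapQ collect x)

-- All variable occurrences, without naming App or Lam.
varsOf :: Expr -> [String]
varsOf e = [v | Var v <- collect e]
```

For Lam "x" (App (Var "f") (Var "x")), varsOf yields ["f", "x"]: the binder "x" is a plain String, not a Var node, so it is not reported.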
Btw, last time I checked, both the hackage and the darcs
version of Haddock 2 had a strict GHC<6.9 dependency.
You did send some patches to fix this a while ago, but it
would help if you could merge your patches and have
one Haddock 2 version that builds with any GHC (in a
reasonable range - say, the versions that can build GHC
and provide a useable GHC Api).
Claus