[Haskell] Re: compiler-independent core libraries infrastructure

Bulat Ziganshin bulat.ziganshin at gmail.com
Fri Sep 15 07:35:53 EDT 2006


(more details about the problem i am trying to solve and how i plan to do it)

during development of Haskell compilers, it was discovered that their
libraries have a lot in common. as a result, a common library for Haskell
compilers was born that hides the differences between them and provides a
common API. unfortunately, compatibility across compiler versions was not
considered an important problem to address, so each version of the 'base'
library is written in a way that is compatible only with the latest
versions of ghc, hugs and nhc. this makes library features
compiler-version-dependent, despite the fact that most of these features,
such as the array interface, fast ForeignPtr, or the improved HashTable,
don't really rely on the compiler version!

so we have a situation where APIs are bound to the compiler versions used
- if one needs to use an old API, he is forced to keep using old
compilers; if he needs a new API, he is forced to switch to a new compiler
(although the old one may be more stable). it's impossible to combine old
and new APIs in one program. switching to a new compiler version becomes
serious work. as a side-effect, improvements in APIs can't be
shipped until the release of a new major GHC version. for
example, HashTable was improved about a year ago, but the last major ghc
version was shipped 1.5 years ago, so this improvement was not
available until the last few days. and as a side-side-effect, major GHC
versions can't be released too often, because switching to a new version
requires changes in existing programs. for example, this means that the
ByteString support included in the latest GHC can't change its API in the
next 1 or 2 years.

Cabal makes it easy to use any library you need, but it cannot be used
to upgrade the base library, because the library itself doesn't support
compatibility with the previous major GHC version. the 'base' library
contains a lot of features, and it's a pain that these features are the
only thing that can't be upgraded using Cabal

so, my conclusion is that the existing 'base' library was not developed
with compatibility across compiler versions in mind, and this makes it
unsuitable for large projects whose life-cycle is longer than 1 year.
and because this library is the real base for Haskell implementations,
this makes the implementations as a whole unsuitable for large projects.
you can see it yourself in the GHC sources, which are full of
version-specific #ifs. instead of incorporating the differences between
various GHC versions into each application, we should hide them inside
a special library!

fortunately, Cabal and package support now make it possible to change the
situation. the GHC team has already planned this move, although these
plans were more about splitting the base package into several independent
ones. that is also important (a faster ForeignPtr or HashTable could be
used without switching to a new array interface), but it doesn't
solve the above-mentioned problem. so, i propose a solution that
serves the following goals:

- continue to use old APIs with new compiler versions (which allows
upgrading to new compilers without rewriting large programs)

- use new APIs with good old compilers (which may be more
stable or have some unique features)

- simplify adding support for new compiler brands (yhc, jhc,
ehc) to the base library and therefore to other libraries (which are
mainly written against 'base')



The plan is:

before splitting the base library into several task-oriented parts
(arrays, byte strings, FFI, concurrency...), split it into two
fundamental parts - a compiler-dependent one and a compiler-independent one

The compiler-independent part (which i will call the 'algorithms' library)
should contain all definitions that can be written in pure Haskell -
types, functions, classes, algorithms. It should be based on calling
the functions and using the fundamental types that the next part provides.
Its definitions should not contain compiler dependencies, except for
optimization purposes (i.e. such compiler-specific definitions should
be strictly optional)

Second, the compiler-dependent part (i will call it the 'core' library)
should provide a unified API to the compiler's internal library. In short,
we can just take the list of GHC primops, give them standard names and
publish it as the (ghc-specific) version of the core library:

type Arr = Array#
newArr = newArray#
indexArr = indexArray#
...

but of course that is not enough. APIs of different compilers differ,
and this library should hide these differences, providing a common API
for all the compiler brands and versions we plan to support. So,
for example, we will not expose GMP operations but implement
operations over Integer values:

integerMul (S# a) (S# b) = intMul# a b
integerMul (J# a) (J# b) = integerMul# a b
...

so in the end we should finish with some "virtual Haskell compiler common
low-level API" and a series of implementations of this API for
various compilers and compiler versions. for features that aren't
supported by the compiler itself, some emulation should be provided

Having the 'core' library, we will no longer fight with compiler
incompatibilities. Any library can be written against the interface it
provides and therefore run on any compiler and any version that is
supported by this library. When a new compiler is released, we need to
add support for it only to the core library. When we discover new
features whose support is compiler-dependent, we add support for these
features only to the core library and require that our lib/app use the
new version of 'core'




That was the first part of the plan. The second part is about further
dividing the 'core' library into separate packages. We can leave it
monolithic. Or we can split it into compiler-specific parts, as i proposed
in the previous letter. Or we can split it into task-specific parts, like
the algorithms library. We can even combine splitting into
compiler-specific and task-specific parts

i think that splitting it into task-specific parts is appropriate for
APIs that cannot be implemented on some compilers, even via emulation.
extracting compiler-specific parts is in line with the current tradition
and simplifies some things. In this case, the compiler-specific library
(let's call it ghc-core/hugs-core/nhc-core, although currently it's the
GHC.* part of base/hugsbase/???) should just provide a mapping of
common APIs to compiler-specific ones:

module GHC.Integer where

data Integer = ...
integerAdd = ...
integerSub = ...


the 'core' library itself selects between implementations and provides
emulation routines (written in pure Haskell) if some supported
compilers don't implement this API:

module Core.Integer where

#if GHC
import GHC.Integer
#elif HUGS
import Hugs.Integer
#elif NHC>=1.08
import NHC.Integer
#else
import Core.Int
data Integer = I [Int]
integerAdd = ...
integerSub = ...
#endif
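
To make the emulation branch a bit more concrete, here is a minimal sketch
of how a pure-Haskell fallback in the spirit of `data Integer = I [Int]`
could look. Everything in it is an assumption made for illustration: the
names `BigNat`, `fromInt` and `toBuiltin`, the base-10000 little-endian limb
layout, and the restriction to non-negative values are not part of the
proposal.

```haskell
-- Hypothetical emulation sketch: non-negative big integers stored as
-- little-endian lists of base-10000 limbs. Not the real 'core' API.
newtype BigNat = I [Int] deriving (Eq, Show)

base :: Int
base = 10000

-- Convert a non-negative Int into limbs (least significant limb first).
fromInt :: Int -> BigNat
fromInt n
  | n < 0     = error "fromInt: negative input is not handled in this sketch"
  | otherwise = I (go n)
  where
    go 0 = []
    go k = k `mod` base : go (k `div` base)

-- Limb-wise addition with carry propagation.
integerAdd :: BigNat -> BigNat -> BigNat
integerAdd (I xs) (I ys) = I (go 0 xs ys)
  where
    go 0 [] []         = []
    go c [] []         = [c]
    go c (a:as) []     = step c a 0 as []
    go c [] (b:bs)     = step c 0 b [] bs
    go c (a:as) (b:bs) = step c a b as bs
    step c a b as bs =
      let s = a + b + c
      in s `mod` base : go (s `div` base) as bs

-- Convert back to the built-in Integer; used here only for checking.
toBuiltin :: BigNat -> Integer
toBuiltin (I ls) =
  foldr (\l acc -> fromIntegral l + acc * fromIntegral base) 0 ls

main :: IO ()
main = print (toBuiltin (integerAdd (fromInt 99999999) (fromInt 1)))
```

A real fallback would also need a sign, subtraction, multiplication and so
on; the point is only that such code needs nothing beyond Int arithmetic.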

so, the 'core' library hides all the differences between various compiler
brands and versions, and all other libs work against a uniform API. this
API shouldn't include any class implementations or any other code
that can be written in pure Haskell - just low-level functions and
basic types! map, Either and all other pure Haskell things aren't for
'core'!



and now the concrete plan of actions:

afaiu, what i call the 'core' library is now split between the base,
hugsbase and NHC Prelude packages. so we should work with these three
packages as something monolithic. these 3 packages are our starting
point, which should be divided into three layers - *hc-core, core and
algorithms

first, move all the pure Haskell definitions (classes, types,
operations) out of GHC.*. in particular, all list operations, all
classes and their instances. leave in GHC.* only the low-level operations
on which all other code should rely. this means a lot of work, but in
return we will get the following: 1) the new, compiler-independent code
for classes may be reused by other compilers, 2) the low-level functions
we define can be used to build alternative class hierarchies such
as the proposed Num'

for example, the following:

instance Num Word where
    (W# x#) + (W# y#)      = W# (x# `plusWord#` y#)

should be split into:

module GHC.Word:
wordAdd (W# x#) (W# y#)      = W# (x# `plusWord#` y#)

module Data.Word:
instance Num Word where
    (+) = wordAdd
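
The same split can be tried out in portable Haskell. The following is only
a sketch under stated assumptions: the newtype `W` and the functions
`wordSub`, `wordMul` and `wordFromInteger` are hypothetical names, and
ordinary `Word` arithmetic stands in for primops such as `plusWord#`.

```haskell
import Data.Word (Word)

-- "Low-level" layer. In the proposal this would live in GHC.Word and wrap
-- primops; here plain Word arithmetic plays that role (an assumption).
newtype W = W Word deriving (Eq, Show)

wordAdd, wordSub, wordMul :: W -> W -> W
wordAdd (W x) (W y) = W (x + y)
wordSub (W x) (W y) = W (x - y)
wordMul (W x) (W y) = W (x * y)

wordFromInteger :: Integer -> W
wordFromInteger = W . fromInteger

-- "Portable" layer. The instance mentions only the named functions, so it
-- does not depend on how the low-level layer is implemented.
instance Num W where
  (+)          = wordAdd
  (-)          = wordSub
  (*)          = wordMul
  fromInteger  = wordFromInteger
  abs          = id
  signum (W 0) = W 0
  signum _     = W 1

main :: IO ()
main = print (W 2 + W 3 * 4)
```

An alternative class hierarchy could reuse `wordAdd` and friends in exactly
the same way, without touching the low-level layer.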


the GHC IO libraries are such a complex beast that i think it's better to
just leave them as is (possibly putting them into a separate package) and
hope that the Streams package will replace them in everyday usage

Generally speaking, things that don't use '#' should be moved out of
GHC.*, and things that use '#' should be split into the simplest part that
uses '#' and a remainder that calls the first part and doesn't use '#'

Another complex beast is Int8..Word64 support. On the one hand, these types
are not really compiler-dependent, at least with the implementation
technique used in GHC. on the other hand, it's again a lot of work to
convert the current implementation into a splittable one and, moreover, we
will need to use different type constructors, which will break
compatibility with the previous GHC version:

data I8 = I8 {-# UNPACK #-} !Int

instance Num I8 where
  (I8 a) + (I8 b) = I8 (narrowToInt8 (a+b))
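
A fuller version of this idea can be sketched in portable Haskell. Note the
assumptions: `narrowToInt8` is written here as a plain arithmetic helper
(the snippet above does not say how it would be defined), and the remaining
`Num` methods are filled in only so that the example compiles.

```haskell
-- Sketch only: an 8-bit signed integer stored in an ordinary Int.
data I8 = I8 {-# UNPACK #-} !Int deriving (Eq, Show)

-- Hypothetical helper: wrap an Int into the range -128..127, which
-- mimics two's-complement narrowing without any primops.
narrowToInt8 :: Int -> Int
narrowToInt8 n = ((n + 128) `mod` 256) - 128

instance Num I8 where
  I8 a + I8 b   = I8 (narrowToInt8 (a + b))
  I8 a - I8 b   = I8 (narrowToInt8 (a - b))
  I8 a * I8 b   = I8 (narrowToInt8 (a * b))
  fromInteger   = I8 . narrowToInt8 . fromInteger
  abs (I8 a)    = I8 (narrowToInt8 (abs a))
  signum (I8 a) = I8 (signum a)

main :: IO ()
main = print (I8 127 + I8 1)
```

Here 127 + 1 wraps around to -128, as it would for a native 8-bit type.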

For arrays, i have already provided a split implementation in the ArrayRef
library (and this whole idea is modelled on my experience of
developing this library)

For the GHC.List module, i don't see many problems except for boxing
all the Int# parameters again (and using the 'seq' trick to let the
compiler unbox them itself). Then the whole module can be moved into the
Data.* hierarchy
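
As an illustration of the 'seq' trick, here is a minimal sketch of a length
function written with a boxed Int accumulator. The name `lengthPortable` is
hypothetical, and whether a given compiler really unboxes the accumulator
depends on its optimizer; the code only guarantees that the accumulator is
evaluated at each step.

```haskell
-- Portable list length: no Int# in sight. Forcing the accumulator with
-- seq keeps it evaluated, which gives an optimizing compiler the chance
-- to unbox it on its own.
lengthPortable :: [a] -> Int
lengthPortable = go 0
  where
    go :: Int -> [b] -> Int
    go acc []     = acc
    go acc (_:xs) =
      let acc' = acc + 1
      in acc' `seq` go acc' xs

main :: IO ()
main = print (lengthPortable "core library")
```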

GHC.Exception can be seen as a ready example of this methodology - it
just wraps '*#' functions into #-less ones and does nothing more


At the same time, the hugs-base and nhc Prelude libraries should be
modified to provide compatibility with the new base library. all the
conditional imports like this:

module Control.Exception where
#ifdef __GLASGOW_HASKELL__
import GHC.Exception    as ExceptionBase hiding (catch)
#else /* __HUGS__ */
import Hugs.Exception   as ExceptionBase
#endif

should be moved into new 'core' modules:

module Control.Exception where
import Core.Exception
...

also, we should move into Core.* the code that is now used to emulate
features not available on all compilers (like the STM emulation code)


After all these modifications, we will have radically simplified GHC.*
modules (and hugsbase/nhc Prelude), Core.* modules, and the remaining part
of the base lib. so now we will be ready to split things up

after that, the ghc-core library (with the GHC.* modules) can be multiplied
into several variants - for ghc 6.6, ghc 6.4 and so on - with support for
new features omitted in the variants for old GHC versions. then the core
library is updated to be able to deal with the ghc-core libs of old
compilers:

module Core.STM where
#if GHC>=6.4
import GHC.STM
#else
--emulate STM
#endif
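
With today's GHC, a version test like the `#if GHC>=6.4` above is usually
spelled using the `__GLASGOW_HASKELL__` macro, which holds the version as
an integer (604 for 6.4). The sketch below shows only the selection
mechanism; the `hasNativeSTM` flag is a hypothetical stand-in for the real
import-or-emulate choice.

```haskell
{-# LANGUAGE CPP #-}

-- Hypothetical flag standing in for "import GHC.STM" versus "emulate STM".
hasNativeSTM :: Bool
#if defined(__GLASGOW_HASKELL__) && __GLASGOW_HASKELL__ >= 604
hasNativeSTM = True   -- this compiler is assumed to provide STM itself
#else
hasNativeSTM = False  -- a 'core' library would switch to emulation here
#endif

main :: IO ()
main = putStrLn (if hasNativeSTM then "native STM" else "emulated STM")
```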

That's all! :)



Now, when a new compiler version arrives, we just need to make a new
*hc-core library that exposes all the features available in this version
under the standard names, and update the 'core' library to deal properly
with the new version (include the appropriate modules and switch to
emulation for the features not yet supported)

when we need to add support for new features to the 'core' library, we
should update the existing *hc-core libraries to support them where
possible, and add to 'core' the code that includes the appropriate modules
and switches to emulation for the compilers that can't support the feature

when we need to add support for a new compiler brand, we should
implement *hc-core for this compiler and add new #ifs to the 'core'
library to include the appropriate modules/emulate behavior:

#if GHC
import GHC.Int
#elif JHC
import JHC.Int
#endif

#if GHC
import GHC.Arr
#elif HUGS
import Hugs.Arr
#else
-- Used for JHC because it doesn't provide array support
newtype Arr a = Arr [(Int,a)]
...
#endif
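
The list-based fallback above could be fleshed out roughly as follows. This
is a sketch under assumptions: the pure signatures chosen for `newArr` and
`indexArr`, and the extra `writeArr`, are invented for illustration and are
simpler than the state-passing primops they would stand in for.

```haskell
-- Emulated array: an association list from index to element.
newtype Arr a = Arr [(Int, a)]

-- Build an array of n elements (indices 0..n-1), all set to x.
newArr :: Int -> a -> Arr a
newArr n x = Arr [(i, x) | i <- [0 .. n - 1]]

-- Look up an element; linear time, which is the price of emulation.
indexArr :: Arr a -> Int -> a
indexArr (Arr ps) i =
  case lookup i ps of
    Just x  -> x
    Nothing -> error ("indexArr: index out of range: " ++ show i)

-- Return a copy of the array with the element at index i replaced.
writeArr :: Arr a -> Int -> a -> Arr a
writeArr (Arr ps) i x = Arr [(j, if j == i then x else y) | (j, y) <- ps]

main :: IO ()
main = do
  let a = writeArr (newArr 3 'a') 1 'b'
  print (map (indexArr a) [0, 1, 2])
```

Code written against these names would not notice whether it runs on the
emulation or on a compiler's native arrays, only how fast it runs.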

What we need here is a clearly defined version naming scheme, in order
to ensure that the requested APIs will really be available in the
underlying libs. but that's one more email... :)

... one thing that i want to say is that we definitely don't want to
change or remove existing APIs from the 'core' library. first, these APIs
are very simple, so we can't gain much by omitting old ones. it's better to
add a second, third, fourth API with almost the same meaning but
continue to support the old ones. omitting APIs from 'core' will lead to a
mess! the 'core' library is the foundation of our future library building,
and it's not wise to remove part of the foundation when we live on the
99th floor :)


-- 
Best regards,
 Bulat                            mailto:Bulat.Ziganshin at gmail.com


