Difference between revisions of "The Monad.Reader/Issue2/EternalCompatibilityInTheory"

From HaskellWiki
Jump to navigation Jump to search
 
(Reformatted)
Line 1: Line 1:
= The ECT Module Versioning Protocol =
+
'''The ECT Module Versioning Protocol'''
  +
 
''by [[Sven Moritz Hallberg]] for The Monad.Reader [[Issue Two]]''
 
''by [[Sven Moritz Hallberg]] for The Monad.Reader [[Issue Two]]''
 
<br>
 
<br>
Line 5: Line 6:
   
 
'''Abstract.'''
 
'''Abstract.'''
This article proposes a protocol
+
This article proposes a protocol (to be followed by the programmer) for isolating versions of Haskell modules, eliminating a large source of bit-rot
(to be followed by the programmer)
 
for isolating versions of Haskell modules,
 
eliminating a large source of bit-rot
 
 
in depending programs.
 
in depending programs.
 
The protocol consists of few simple rules.
 
The protocol consists of few simple rules.
No language extensions are required and
+
No language extensions are required and the scheme is immediately usable.
the scheme is immediately usable.
 
 
If followed, the result is ''Eternal Compatibility in Theory''.
 
If followed, the result is ''Eternal Compatibility in Theory''.
   
Because of its simplicity,
+
Because of its simplicity, ECT is not restricted to Haskell.
ECT is not restricted to Haskell.
 
   
   
Line 22: Line 18:
   
 
Every programmer knows the value of modularity.
 
Every programmer knows the value of modularity.
In particular, our languages have got ''module systems''
+
In particular, our languages have got ''module systems'' to support it, by decoupling code.
  +
However, much of the decoupling is lost again when the flux of interfaces over time is considered.
to support it, by decoupling code.
 
However, much of the decoupling is lost again when the flux
 
of interfaces over time is considered.
 
   
As a program module evolves,
+
As a program module evolves, functions and other elements are added to, removed from, and changed in its interface.
  +
It is clear that programs importing the module (it's ''dependants'') will not be compatible with all versions.
functions and other elements are
 
  +
At least, each program is compatible with one version, the one the author originally used, and usually a few ones before and after that.
added to, removed from, and changed in
 
  +
But if a program is not continuously updated, with time, chances rise dramatically that one of it's dependencies as installed on a given host system will be incompatible.
its interface.
 
  +
Alas, the program cannot be used. This effect comprises a major source of ''bit rot''.
It is clear that programs importing the module
 
(it's ''dependants'')
 
will not be compatible with all versions.
 
At least, each program is compatible with one version,
 
the one the author originally used,
 
and usually a few ones before and after that.
 
But if a program is not continuously updated,
 
with time, chances rise dramatically
 
that one of it's dependencies as installed on a given
 
host system will be incompatible.
 
Alas, the program cannot be used.
 
This effect comprises a major source of ''bit rot''.
 
   
To avoid such a situation, the programmer importing
+
To avoid such a situation, the programmer importing a module has several options, all of which are unsatisfactory.
a module has several options, all of which are unsatisfactory.
 
 
Let a module <code><nowiki>P</nowiki></code> import <code><nowiki>A</nowiki></code>, a ''dependency''.
 
Let a module <code><nowiki>P</nowiki></code> import <code><nowiki>A</nowiki></code>, a ''dependency''.
   
1. Avoid the dependency altogether.
+
# Avoid the dependency altogether.
  +
## Duplicate the functionality. If the functionality provided by <code><nowiki>A</nowiki></code> is simple, it might be possible to just replicate it with own code. This of course somewhat counters the purpose of creating module <code><nowiki>A</nowiki></code> in the first place. On a large scale, it results in many functions being duplicated by different individuals.
 
  +
## Use standard functions instead. Instead of <code><nowiki>A</nowiki></code>, use a similar module from the standard library. Of course, the standard library does not provide everything. Also, even the standard library can become incompatible over time, though it usually takes longer.
i. Duplicate the functionality.
 
  +
# Internalize the dependency. A "local" copy of <code><nowiki>A</nowiki></code> is packaged and distributed with <code><nowiki>P</nowiki></code>. This scenario relies on the host Haskell system to prefer the bundled definition of <code><nowiki>A</nowiki></code> to any others it might have available. Because that is the usual behaviour, this strategy generally puts the programmer on the safe side but has the disadvantage of locking out bug-fixes and other (compatible) improvements to the original module <code><nowiki>A</nowiki></code>. It also increases the overall size of the package for <code><nowiki>P</nowiki></code> so is not desirable for many (or) large dependencies.
 
  +
# Continuously update <code><nowiki>P</nowiki></code> to stay compatible. While continuous updates are certainly desirable, everyone's capacity is limited and in reality, no program can stay up-to-date forever. Also, this strategy always assumes the latest version of <code><nowiki>A</nowiki></code> to be available at the host. Updating a library to the latest version runs the risk of breaking other dependants requiring the older version.
If the functionality provided by <code><nowiki>A</nowiki></code> is simple,
 
it might be possible to just replicate it
 
with own code.
 
This of course somewhat counters the purpose of
 
creating module <code><nowiki>A</nowiki></code> in the first place.
 
On a large scale, it results in many functions
 
being duplicated by different individuals.
 
 
i. Use standard functions instead.
 
 
Instead of <code><nowiki>A</nowiki></code>, use a similar module from the standard library.
 
Of course, the standard library does not provide everything.
 
Also, even the standard library can become incompatible
 
over time, though it usually takes longer.
 
 
1. Internalize the dependency.
 
 
A "local" copy of <code><nowiki>A</nowiki></code> is packaged and distributed with <code><nowiki>P</nowiki></code>.
 
This scenario relies on the host Haskell system to prefer
 
the bundled definition of <code><nowiki>A</nowiki></code> to any others it might
 
have available.
 
Because that is the usual behaviour,
 
this strategy generally puts the programmer on the safe side but
 
has the disadvantage of
 
locking out bug-fixes and other (compatible) improvements
 
to the original module <code><nowiki>A</nowiki></code>.
 
It also increases the overall size of the package for <code><nowiki>P</nowiki></code> so
 
is not desirable for many (or) large dependencies.
 
 
1. Continuously update <code><nowiki>P</nowiki></code> to stay compatible.
 
 
While continuous updates are certainly desirable,
 
everyone's capacity is limited and in reality,
 
no program can stay up-to-date forever.
 
Also, this strategy always assumes the latest version of <code><nowiki>A</nowiki></code>
 
to be available at the host.
 
Updating a library to the latest version runs the risk
 
of breaking other dependants requiring the older version.
 
   
Notice that option 1.ii is quite attractive.
+
Notice that option 1.2 is quite attractive.
The standard library on a host is often reasonably up-to-date
+
The standard library on a host is often reasonably up-to-date (though not always!) and, most importantly, widely available.
  +
As a result of these points, a large standard library is usually desired and welcomed.
(though not always!)
 
and, most importantly, widely available.
 
As a result of these points,
 
a large standard library is usually desired and welcomed.
 
 
But that also brings large issues with it.
 
But that also brings large issues with it.
A standard library should fit everyone,
+
A standard library should fit everyone, which is impossible to achieve and very hard to approximate well.
  +
Incompatible changes (improvements!) are very hard, if not impossible, to introduce in a standard library, since they will likely break many dependants, ''especially'' if the library is large.
which is impossible to achieve and
 
very hard to approximate well.
 
Incompatible changes (improvements!)
 
are very hard, if not impossible, to introduce
 
in a standard library,
 
since they will likely break many dependants,
 
''especially'' if the library is large.
 
   
So, in summary, we actually see two orthogonal problems
+
So, in summary, we actually see two orthogonal problems when importing an external module:
when importing an external module:
 
   
1. ''Availability:'' Will <code><nowiki>A</nowiki></code> be available at the host site?
+
# ''Availability:'' Will <code><nowiki>A</nowiki></code> be available at the host site?
  +
# ''Compatibility:'' Will the version of <code><nowiki>A</nowiki></code> available at the host be compatible with <code><nowiki>P</nowiki></code>?
   
  +
Solving issue 1, the easy and wide-spread distribution of Haskell libraries,
1. ''Compatibility:'' Will the version of <code><nowiki>A</nowiki></code> available
 
at the host be compatible with <code><nowiki>P</nowiki></code>?
 
 
Solving issue 1,
 
the easy and wide-spread distribution of Haskell libraries,
 
 
is the combined goal of the [http://haskell.org/cabal/ Cabal] and
 
is the combined goal of the [http://haskell.org/cabal/ Cabal] and
 
[http://haskell.org/cabal/hackage/ Hackage] projects.
 
[http://haskell.org/cabal/hackage/ Hackage] projects.
Note, however, that Cabal does not solve issue 2,
+
Note, however, that Cabal does not solve issue 2, as will be discussed below.
as will be discussed below.
 
   
As mentioned in the abstract,
+
As mentioned in the abstract, the contribution of this article is a protocol,
  +
dubbed ''ECT'', by which it is possible for the programmer of the ''dependency'', <code><nowiki>A</nowiki></code>, to solve, for his users, and with respect to <code><nowiki>A</nowiki></code>,
the contribution of this article is a protocol,
 
dubbed ''ECT'',
 
by which it is possible for the programmer of the ''dependency'',
 
<code><nowiki>A</nowiki></code>,
 
to solve, for his users, and with respect to <code><nowiki>A</nowiki></code>,
 
 
issue 2.
 
issue 2.
 
I.e. the complete elimination of concerns that <code><nowiki>P</nowiki></code>
 
I.e. the complete elimination of concerns that <code><nowiki>P</nowiki></code>
Line 137: Line 64:
 
Recall the issue at hand.
 
Recall the issue at hand.
 
Interface changes break dependants.
 
Interface changes break dependants.
  +
As elaborated above, the importer can do nothing against that, and in fact, neither can the dependency author.
As elaborated above,
 
the importer can do nothing against that,
 
and in fact, neither can the dependency author.
 
 
''Interface changes break dependants.''
 
''Interface changes break dependants.''
   
Thence the solution is painfully obvious:
+
Thence the solution is painfully obvious: ''Stop interface changes.''
''Stop interface changes.''
 
   
In ECT, the module <code><nowiki>A</nowiki></code> is split into separate submodules
+
In ECT, the module <code><nowiki>A</nowiki></code> is split into separate submodules <code><nowiki>A_1</nowiki></code>, ..., <code><nowiki>A_</nowiki></code>''n'' at version boundaries.
  +
The numbers are in chronological order,where ''n'' denotes the latest version and contains the current implementation.
<code><nowiki>A_1</nowiki></code>, ..., <code><nowiki>A_</nowiki></code>''n''
 
  +
Whenever the interface changes in any way, the number is incremented, creating a new module.
at version boundaries.
 
The numbers are in chronological order,
 
where ''n'' denotes the latest version and
 
contains the current implementation.
 
Whenever the interface changes in any way,
 
the number is incremented, creating a new module.
 
 
The original module <code><nowiki>A</nowiki></code> is retained as a short-hand
 
The original module <code><nowiki>A</nowiki></code> is retained as a short-hand
 
(think re-export) for "the latest version".
 
(think re-export) for "the latest version".
   
 
If the importer <code><nowiki>P</nowiki></code> uses one of the specific,
 
If the importer <code><nowiki>P</nowiki></code> uses one of the specific,
  +
numbered modules, compatibility is guaranteed, since (by assumption) these modules have ''static'' interfaces.
numbered modules,
 
  +
But because <code><nowiki>A</nowiki></code> is still present, using the number is not required.
compatibility is guaranteed,
 
  +
It would, after all, be quite tiresome to look up a version number for each module one wants to import.
since (by assumption) these modules have ''static'' interfaces.
 
But because <code><nowiki>A</nowiki></code> is still present,
 
using the number is not required.
 
It would, after all, be quite tiresome to look up
 
a version number for each module one wants to import.
 
   
In fact, <code><nowiki>A</nowiki></code> can be used to
+
In fact, <code><nowiki>A</nowiki></code> can be used to notify the programmer of the number to use via a deprecation pragma.
notify the programmer of the number to use
 
via a deprecation pragma.
 
 
<code><nowiki>A</nowiki></code> would look like the following.
 
<code><nowiki>A</nowiki></code> would look like the following.
   
  +
module A
<pre><nowiki>
 
  +
{-# DEPRECATED "This is actually A_5." #-}
module A
 
  +
(module A_5)
{-# DEPRECATED "This is actually A_5." #-}
 
(module A_5)
+
where
  +
import A_5
where
 
import A_5
 
</nowiki></pre>
 
   
So the user, after first blindly importing module <code><nowiki>A</nowiki></code>,
+
So the user, after first blindly importing module <code><nowiki>A</nowiki></code>, can conveniently add the number after compilation, then knowing that <code><nowiki>A_5</nowiki></code> is the actual version the program was written against.
can conveniently add the number after compilation,
 
then knowing that <code><nowiki>A_5</nowiki></code>
 
is the actual version the program was written against.
 
   
 
The last remark above illustrates an important point.
 
The last remark above illustrates an important point.
In any scheme to ensure compatibility of modules with their imports,
+
In any scheme to ensure compatibility of modules with their imports, there ''must'' be, for each module for each import, a specification of which interface version the module is referring to.
  +
And precisely that is what importing one of the numbered modules means.
there ''must'' be, for each module for each import,
 
a specification of which interface version the module is referring to.
 
And precisely that is what importing one of the numbered
 
modules means.
 
   
 
''"I'm importing module <code><nowiki>A</nowiki></code>, referring to version 5 of its interface."''
 
''"I'm importing module <code><nowiki>A</nowiki></code>, referring to version 5 of its interface."''
Line 195: Line 100:
 
== The Protocol ==
 
== The Protocol ==
   
When first thinking through what my idea of module version
+
When first thinking through what my idea of module version management would be,
  +
I found I had only very vague ideas of what I actually meant by "module", "version", etc..
management would be,
 
  +
So let me start the formal description of the ECT versioning protocol by introducing some '''Definitions.'''
I found I had only very vague ideas of what I actually meant by
 
"module", "version", etc..
 
So let me start the formal description of the ECT versioning protocol
 
by introducing some '''Definitions.'''
 
 
'''Module definition.'''
 
A piece of code defining a Haskell module, as described in the Haskell Report, is called a ''module definition''. Each module definition has an associated ''module identifier'', e.g. <code><nowiki>System.Console.Cmdline.Pesco</nowiki></code> (which may be implicit in the case of Main).
 
 
'''Revision.'''
 
Let ''x'' be a module definition and ''X'' be its associated module identifier. We call ''x'' a ''revision'' of ''X''.
 
 
'''Module.'''
 
In plain English, the module named ''X'' is defined as the set of all revisions of ''X''.
 
More formally, let ''x'' be a revision of ''X'', ''y'' be a revision of ''Y'', and call ''x'' and ''y'' ''related'' iff ''X=Y''. We then call the equivalence classes of this relation ''modules'' and their associated identifiers - unique within the class - the modules' ''names''.
 
 
'''Interface.'''
 
The set of entities (functions, classes, instances, etc.) exported from a module is called its ''interface''.
 
 
'''Compatibility.'''
 
Two revisions are called ''(mutually) compatible'', if their interfaces are equal and all exported entities have the same semantics.
 
 
'''Version.'''
 
The equivalence classes of the compatibility relation, restricted to a single module ''X'', are called ''versions'' of ''X''.
 
   
  +
; '''Module definition.''' : A piece of code defining a Haskell module, as described in the Haskell Report, is called a ''module definition''. Each module definition has an associated ''module identifier'', e.g. <code><nowiki>System.Console.Cmdline.Pesco</nowiki></code> (which may be implicit in the case of Main).
  +
; '''Revision.''' : Let ''x'' be a module definition and ''X'' be its associated module identifier. We call ''x'' a ''revision'' of ''X''.
  +
; '''Module.''' : In plain English, the module named ''X'' is defined as the set of all revisions of ''X''. More formally, let ''x'' be a revision of ''X'', ''y'' be a revision of ''Y'', and call ''x'' and ''y'' ''related'' iff ''X=Y''. We then call the equivalence classes of this relation ''modules'' and their associated identifiers - unique within the class - the modules' ''names''.
  +
; '''Interface.''' : The set of entities (functions, classes, instances, etc.) exported from a module is called its ''interface''.
  +
; '''Compatibility.''' : Two revisions are called ''(mutually) compatible'', if their interfaces are equal and all exported entities have the same semantics.
  +
; '''Version.''' : The equivalence classes of the compatibility relation, restricted to a single module ''X'', are called ''versions'' of ''X''.
   
 
Now, say you are implementing a module <code><nowiki>A</nowiki></code>.
 
Now, say you are implementing a module <code><nowiki>A</nowiki></code>.
To fully conform to the protocol with respect to <code><nowiki>A</nowiki></code>,
+
To fully conform to the protocol with respect to <code><nowiki>A</nowiki></code>, simply ensure the following invariants.
simply ensure the following invariants.
 
   
 
-----
 
-----
  +
<blockquote>
'''
 
 
Let all versions of <code><nowiki>A</nowiki></code> be numbered
 
Let all versions of <code><nowiki>A</nowiki></code> be numbered
 
from 1 to ''n''.
 
from 1 to ''n''.
   
1. For each ''i''=1,...,n there exists a module <code><nowiki>A_</nowiki></code>''i''
+
#For each ''i''=1,...,''n'' there exists a module <code><nowiki>A_</nowiki></code>''i'' whose only version is equal to version ''i'' of <code><nowiki>A</nowiki></code>.
whose only version is equal to version ''i'' of <code><nowiki>A</nowiki></code>.
+
# Version ''n'' contains the latest revision of <code><nowiki>A</nowiki></code>.
  +
# Module <code><nowiki>A</nowiki></code> is deprecated by means of a DEPRECATED pragma, with a message that <code><nowiki>A_</nowiki></code>''n'' should be imported instead.
 
  +
</blockquote>
1. Version ''n'' contains the latest revision of <code><nowiki>A</nowiki></code>.
 
 
1. Module <code><nowiki>A</nowiki></code> is deprecated by means of a DEPRECATED pragma,
 
with a message that <code><nowiki>A_</nowiki></code>''n'' should be imported instead.
 
'''
 
 
-----
 
-----
   
 
So for example, say you are writing <code><nowiki>A</nowiki></code> from scratch.
 
So for example, say you are writing <code><nowiki>A</nowiki></code> from scratch.
Starting with version number 1,
+
Starting with version number 1, implement the first revision as <code><nowiki>A_1</nowiki></code>.
implement the first revision as <code><nowiki>A_1</nowiki></code>.
 
   
  +
module A_1 where
<pre><nowiki>
 
  +
foo = "foo"
module A_1 where
 
foo = "foo"
 
</nowiki></pre>
 
   
Define module <code><nowiki>A</nowiki></code> to re-export <code><nowiki>A_1</nowiki></code>
+
Define module <code><nowiki>A</nowiki></code> to re-export <code><nowiki>A_1</nowiki></code> and include a DEPRECATED pragma that at the same time
and include a DEPRECATED pragma that at the same time
 
   
1. reminds the user to explicify her import and
+
# reminds the user to explicify her import and
2. informs her of the current version number.
+
# informs her of the current version number.
   
  +
module A
<pre><nowiki>
 
  +
{-# DEPRECATED "Import A_1." #-}
module A
 
{-# DEPRECATED "Import A_1." #-}
+
(module A_1)
(module A_1)
+
where
  +
import A_1
where
 
import A_1
 
</nowiki></pre>
 
   
When you modify the definition of <code><nowiki>A_1</nowiki></code>, consider any
+
When you modify the definition of <code><nowiki>A_1</nowiki></code>, consider any changes to its interface, including type signatures (of course)
changes to its interface,
 
including type signatures (of course)
 
 
and semantics.
 
and semantics.
If ''anything'' has changed, i.e. added, removed, or modified,
+
If ''anything'' has changed, i.e. added, removed, or modified, a new version has formed, so increment the version number and move this revision to a new module, i.e. <code><nowiki>A_2</nowiki></code>.
  +
Change every occurrence of <code><nowiki>A_1</nowiki></code> in the definition of <code><nowiki>A</nowiki></code> to <code><nowiki>A_2</nowiki></code>.
a new version has formed, so
 
  +
Either re-implement <code><nowiki>A_1</nowiki></code> in terms of <code><nowiki>A_2</nowiki></code> or keep the old revision in place.
increment the version number and
 
move this revision to a new module, i.e. <code><nowiki>A_2</nowiki></code>.
 
Change every occurrence of <code><nowiki>A_1</nowiki></code> in the definition of <code><nowiki>A</nowiki></code>
 
to <code><nowiki>A_2</nowiki></code>.
 
Either re-implement <code><nowiki>A_1</nowiki></code> in terms of <code><nowiki>A_2</nowiki></code>
 
or keep the old revision in place.
 
   
If there were no changes whatsoever,
+
If there were no changes whatsoever, keep the new definition a revision of <code><nowiki>A_1</nowiki></code>.
keep the new definition a revision of <code><nowiki>A_1</nowiki></code>.
 
   
 
Repeat the above for every new revision.
 
Repeat the above for every new revision.
   
 
Ideally, the versions of <code><nowiki>A</nowiki></code> will form a chain
 
Ideally, the versions of <code><nowiki>A</nowiki></code> will form a chain
of adapters and re-exports,
+
of adapters and re-exports, where every version is implemented in terms of
  +
a later one, <code><nowiki>A_</nowiki></code>''n'' containing the actual codebase.
where every version is implemented in terms of
 
  +
But of course it is also possible to completely leave some version's old implementation in place, if the next version differs very significantly,
a later one,
 
<code><nowiki>A_</nowiki></code>''n'' containing the actual codebase.
 
But of course it is also possible to completely
 
leave some version's old implementation in place,
 
if the next version differs very significantly,
 
 
for instance.
 
for instance.
   
Now consider the case of switching an existing module,
+
Now consider the case of switching an existing module, which has already gone through several versions, to ECT.
  +
To attain "full compliance", as per the above definition, every old version of the module would have to be identified and split into its own module.
which has already gone through several versions,
 
to ECT.
 
To attain "full compliance",
 
as per the above definition,
 
every old version of the module would have to
 
be identified and split into its own module.
 
   
  +
Certainly, it is desirable to allow weaker invariants than given above, to allow, for instance all the pre-ECT versions of a module <code><nowiki>A</nowiki></code> to be grouped in <code><nowiki>A_0</nowiki></code> or similar. Or, a module owner might want to keep <code><nowiki>A</nowiki></code> locked on the last pre-ECT version to remain backwards-compatible.
Certainly, it is desirable to allow weaker invariants
 
  +
Then <code><nowiki>A_0</nowiki></code> could take the role of the "latest version" short-cut module.
than given above, to allow, for instance
 
all the pre-ECT versions of a module <code><nowiki>A</nowiki></code> to be grouped
 
in <code><nowiki>A_0</nowiki></code> or similar.
 
Or, a module owner might want to keep <code><nowiki>A</nowiki></code> locked on
 
the last pre-ECT version to remain backwards-compatible.
 
Then <code><nowiki>A_0</nowiki></code> could take the role of the "latest version"
 
short-cut module.
 
   
In the end, the only property essential to the compatibility
+
In the end, the only property essential to the compatibility guarantees to be facilitated by ECT is the following.
guarantees to be facilitated by ECT is the following.
 
   
 
-----
 
-----
  +
'''
 
 
Let all versions of <code><nowiki>A</nowiki></code> be numbered from 1 to ''n''.
 
Let all versions of <code><nowiki>A</nowiki></code> be numbered from 1 to ''n''.
   
For each ''i''=1,...,n, if there exists a module <code><nowiki>A_</nowiki></code>''i'',
+
For each ''i''=1,...,n, if there exists a module <code><nowiki>A_</nowiki></code>''i'', it has only one version which is equal to version ''i'' of <code><nowiki>A</nowiki></code>.
  +
it has only one version which is equal to version ''i'' of <code><nowiki>A</nowiki></code>.
 
'''
 
 
-----
 
-----
   
Line 327: Line 180:
 
=== Bugs, Behaviour, Semantics ===
 
=== Bugs, Behaviour, Semantics ===
   
The above explanations have ignored the very likely
+
The above explanations have ignored the very likely possibility of program bugs.
  +
The common definition of a ''bug'' is a programmer error resulting in undesired behaviour of the program.
possibility of program bugs.
 
The common definition of a ''bug'' is a programmer error
 
resulting in undesired behaviour of the program.
 
   
  +
The question - in light of the earlier definition of compatibility - is, whether this unintended behaviour can or should be considered as part of the program's semantics.
The question
 
  +
If a bug in some revision of a module is part of its semantics, the bug-fixed revision would be considered part of a new version and prompt the creation of a separate module.
- in light of the earlier definition of compatibility -
 
is, whether this unintended behaviour
 
can or should be considered as part of the program's semantics.
 
If a bug in some revision of a module is part of its semantics,
 
the bug-fixed revision would be considered part of a new version
 
and prompt the creation of a separate module.
 
 
Clearly, this is usually undesirable.
 
Clearly, this is usually undesirable.
   
To answer this question in detail would largely be a philosophical
+
To answer this question in detail would largely be a philosophical undertaking on ''intent'', ''meaning'', ''collective vs. individual perception'' etc..
undertaking on ''intent'', ''meaning'',
 
''collective vs. individual perception'' etc..
 
 
So, for this discussion, let the following guidelines suffice.
 
So, for this discussion, let the following guidelines suffice.
   
An actual problem only arises out of a bug if some user
+
An actual problem only arises out of a bug if some user mistakes the unintended behaviour for intended semantics and uses it, effectively creating a ''dependency on the bug''.
  +
Then, if the bug is fixed in a later revision, that user's program will break.
mistakes the unintended behaviour for intended semantics
 
and uses it, effectively creating a ''dependency on the bug''.
 
Then, if the bug is fixed in a later revision,
 
that user's program will break.
 
   
Luckily, it will often be clear that a bug is not part of the
+
Luckily, it will often be clear that a bug is not part of the intended semantics of an interface.
  +
If it is indeed safe to assume that no users will accidentally mistake the bug for a feature, it is also safe to leave the version numbering as it is.
intended semantics of an interface.
 
If it is indeed safe to assume that no users will accidentally
 
mistake the bug for a feature,
 
it is also safe to leave the version numbering as it is.
 
   
If, however, it is unclear whether any importers could
+
If, however, it is unclear whether any importers could come to rely on the erroneous behaviour, there is also a simple solution: split the buggy module.
  +
So far, we have assumed module version numbers to be integers.
come to rely on the erroneous behaviour, there is
 
  +
When splitting a module due to a bug-fix, also "split" it's version number into two rationals "close" to it.
also a simple solution: split the buggy module.
 
So far, we have assumed module version numbers
 
to be integers.
 
When splitting a module due to a bug-fix,
 
also "split" it's version number into two rationals
 
"close" to it.
 
 
Let me explain by means of an example.
 
Let me explain by means of an example.
If a dubious bug is found in version 4,
+
If a dubious bug is found in version 4, create two new modules with versions
create two new modules with versions
 
 
3.9 and 4.1 (add/subtract 0.1).
 
3.9 and 4.1 (add/subtract 0.1).
Since the decimal point cannot be used in the module name,
+
Since the decimal point cannot be used in the module name, substitute an underscore yielding <code><nowiki>A_3_9</nowiki></code> and <code><nowiki>A_4_1</nowiki></code>.
substitute an underscore yielding <code><nowiki>A_3_9</nowiki></code> and <code><nowiki>A_4_1</nowiki></code>.
 
   
Put the original module definition, with the bug,
+
Put the original module definition, with the bug, into the lower module, <code><nowiki>A_3_9</nowiki></code>.
  +
This makes it clear that the buggy version is semantically "before" what version 4 should have been.
into the lower module, <code><nowiki>A_3_9</nowiki></code>.
 
  +
Likewise, the bug-fixed version goes into the higher module, <code><nowiki>A_4_1</nowiki></code>.
This makes it clear that the buggy version is semantically
 
"before" what version 4 should have been.
 
Likewise, the bug-fixed version goes into the higher module,
 
<code><nowiki>A_4_1</nowiki></code>.
 
   
Now deprecate the old module <code><nowiki>A_4</nowiki></code>
+
Now deprecate the old module <code><nowiki>A_4</nowiki></code> with a warning about the bug and let it re-export <code><nowiki>A_3_9</nowiki></code> to preserve compatibility.
with a warning about the bug and
 
let it re-export <code><nowiki>A_3_9</nowiki></code> to preserve compatibility.
 
   
It's good the rationals are dense, for thus
+
It's good the rationals are dense, for thus we can split each module an arbitrary number of times.
  +
For instance, if another bug was discovered later in <code><nowiki>A_3_9</nowiki></code>, split its rational version number 3.9 again at the last digit:
we can split each module an arbitrary number of times.
 
For instance, if another bug was discovered
 
later in <code><nowiki>A_3_9</nowiki></code>, split its rational version
 
number 3.9 again at the last digit:
 
 
3.9 is moved to 3.89, 3.9 itself becomes a deprecated re-export of 3.89, and
 
3.9 is moved to 3.89, 3.9 itself becomes a deprecated re-export of 3.89, and
 
3.91 contains the bug-fix.
 
3.91 contains the bug-fix.
   
Please note that this fractional numbering has no
+
Please note that this fractional numbering has no relation to any second- or third-level version components in say, the package containing these modules.
relation to any second- or third-level version
 
components in say,
 
the package containing these modules.
 
 
For instance, if <code><nowiki>A</nowiki></code> is released in a corresponding
 
For instance, if <code><nowiki>A</nowiki></code> is released in a corresponding
  +
package <code><nowiki>a</nowiki></code>, <code><nowiki>a-4.2</nowiki></code> could refer to the second revision of the fourth release, consisting, using the example from above, of <code><nowiki>A</nowiki></code> and <code><nowiki>A_1</nowiki></code> through <code><nowiki>A_4</nowiki></code>, as well as <code><nowiki>A_3_9</nowiki></code> and <code><nowiki>A_4_1</nowiki></code> (first revision), and <code><nowiki>A_3_89</nowiki></code> and <code><nowiki>A_3_91</nowiki></code> (second revision).
package <code><nowiki>a</nowiki></code>,
 
<code><nowiki>a-4.2</nowiki></code> could refer to the second revision of
 
the fourth release, consisting,
 
using the example from above,
 
of <code><nowiki>A</nowiki></code> and <code><nowiki>A_1</nowiki></code> through <code><nowiki>A_4</nowiki></code>, as well as
 
<code><nowiki>A_3_9</nowiki></code> and
 
<code><nowiki>A_4_1</nowiki></code> (first revision), and <code><nowiki>A_3_89</nowiki></code> and
 
<code><nowiki>A_3_91</nowiki></code> (second revision).
 
   
Finally, let me give an example for an appropriate bug
+
Finally, let me give an example for an appropriate bug deprecation message.
deprecation message.
 
   
  +
{-# DEPRECATED
<pre><nowiki>
 
  +
"A_4 is buggy: The foo function returns "foo" \
{-# DEPRECATED
 
"A_4 is buggy: The foo function returns "foo" \
+
\which should have been "Foo".\n\
\which should have been "Foo".\n\
+
\Import A_4_1 for the correct behaviour or \
  +
\A_3_9 if you rely on the bug."
\Import A_4_1 for the correct behaviour or \
 
  +
#-}
\A_3_9 if you rely on the bug."
 
#-}
 
</nowiki></pre>
 
   
 
Notice that the user is informed of
 
Notice that the user is informed of
  +
1. how to determine whether she is relying on the bug and
 
  +
# how to determine whether she is relying on the bug and
2. which module to import in each case.
 
  +
# which module to import in each case.
   
   
Line 426: Line 237:
   
 
Consider the following likely situation.
 
Consider the following likely situation.
You are the author of module <code><nowiki>A</nowiki></code> which
+
You are the author of module <code><nowiki>A</nowiki></code> which is currently at version 1 and you have a bunch of unit tests for <code><nowiki>A_1</nowiki></code>.
  +
You create a new version, <code><nowiki>A_2</nowiki></code>, which is backwards-compatible, i.e. the interface of <code><nowiki>A_1</nowiki></code> is a subset of <code><nowiki>A_2</nowiki></code>'s and the semantics of this subset are the same for both modules.
is currently at version 1 and
 
you have a bunch of unit tests for <code><nowiki>A_1</nowiki></code>.
 
You create a new version, <code><nowiki>A_2</nowiki></code>,
 
which is backwards-compatible,
 
i.e. the interface of <code><nowiki>A_1</nowiki></code> is a subset of <code><nowiki>A_2</nowiki></code>'s
 
and the semantics of this subset are the same for both modules.
 
 
In this case, any tests for <code><nowiki>A_1</nowiki></code> are also good
 
In this case, any tests for <code><nowiki>A_1</nowiki></code> are also good
for <code><nowiki>A_2</nowiki></code>, and, if you replace <code><nowiki>A_1</nowiki></code> with a
+
for <code><nowiki>A_2</nowiki></code>, and, if you replace <code><nowiki>A_1</nowiki></code> with a partial re-export of <code><nowiki>A_2</nowiki></code>, it would be sufficient to run only the <code><nowiki>A_2</nowiki></code> tests in order to test both versions.
partial re-export of <code><nowiki>A_2</nowiki></code>,
+
Also, if any tests for the <code><nowiki>A_1</nowiki></code> subset of <code><nowiki>A_2</nowiki></code> were added to <code><nowiki>A_2</nowiki></code>,
it would be sufficient to run only the <code><nowiki>A_2</nowiki></code> tests
 
in order to test both versions.
 
Also, if any tests
 
for the <code><nowiki>A_1</nowiki></code> subset of <code><nowiki>A_2</nowiki></code> were added to <code><nowiki>A_2</nowiki></code>,
 
 
they would automatically apply to <code><nowiki>A_1</nowiki></code> as well.
 
they would automatically apply to <code><nowiki>A_1</nowiki></code> as well.
   
In general, tests for functions that are re-exports
+
In general, tests for functions that are re-exports can probably move up to the actual implementation safely.
  +
In the hypothetical case that the re-export is changed later, it just has to be remembered to pull down copies of the tests again.
can probably move up to the actual implementation safely.
 
In the hypothetical case that the re-export is changed
 
later, it just has to be remembered to pull down copies of the
 
tests again.�
 
   
 
For instance, if module <code><nowiki>A_1</nowiki></code> is defined as
 
For instance, if module <code><nowiki>A_1</nowiki></code> is defined as
   
  +
module A_1
<pre><nowiki>
 
  +
(foo, bar)
module A_1
 
(foo, bar)
+
where
  +
import A_2 (foo)
where
 
  +
bar = "barr"
import A_2 (foo)
 
bar = "barr"
 
</nowiki></pre>
 
   
 
and <code><nowiki>A_2</nowiki></code> contains
 
and <code><nowiki>A_2</nowiki></code> contains
   
  +
module A_2
<pre><nowiki>
 
  +
(foo, bar, baz)
module A_2
 
(foo, bar, baz)
+
where
  +
foo = "foo"
where
 
foo = "foo"
+
bar = "bar"
bar = "bar"
+
baz = "baz"
baz = "baz"
 
</nowiki></pre>
 
   
the tests for <code><nowiki>A_1</nowiki></code> would consist only of tests for <code><nowiki>bar</nowiki></code>
+
the tests for <code><nowiki>A_1</nowiki></code> would consist only of tests for <code><nowiki>bar</nowiki></code> while the tests for <code><nowiki>A_2</nowiki></code> would contain tests for all three functions, <code><nowiki>foo</nowiki></code>, <code><nowiki>bar</nowiki></code>, <code><nowiki>baz</nowiki></code>.
while the tests for <code><nowiki>A_2</nowiki></code> would contain tests for all
 
three functions, <code><nowiki>foo</nowiki></code>, <code><nowiki>bar</nowiki></code>, <code><nowiki>baz</nowiki></code>.
 
   
   
Line 477: Line 270:
   
 
Similar remarks as for tests apply to documentation.
 
Similar remarks as for tests apply to documentation.
If you use an autogenerator like Haddock,
+
If you use an autogenerator like Haddock, everything will be fine automatically:
everything will be fine automatically:
 
 
Re-exports inherit their documentation from the original.
 
Re-exports inherit their documentation from the original.
If you maintain documentation separately,
+
If you maintain documentation separately, the documentation must be split by hand.
  +
But keep in mind that nothing forces you to expend more work than you would without following ECT.
the documentation must be split by hand.
 
But keep in mind that nothing forces you to
 
expend more work than you would without
 
following ECT.
 
   
I.e. please consider the ECT protocol even if you
+
I.e. please consider the ECT protocol even if you do not want to spend time on the old documentation (or tests, for that matter)!
do not want to spend time on the old documentation
 
(or tests, for that matter)!
 
   
This concludes the discussion of the
+
This concludes the discussion of the developer's side of the protocol.
developer's side of the protocol.
 
   
   
 
=== On the User's Side ===
 
=== On the User's Side ===
   
Luckily, no extra rules at all apply to
+
Luckily, no extra rules at all apply to the user of an ECT module.
the user of an ECT module.
 
 
The user does not need to know about ECT at all.
 
The user does not need to know about ECT at all.
 
She can just import the unnumbered module as ever.
 
She can just import the unnumbered module as ever.
Then, if she notices the deprecation message,
+
Then, if she notices the deprecation message, and hopefully follows it, it is just all the better.
and hopefully follows it,
 
it is just all the better.
 
   
Of course, everyone will quickly get the hang of it,
+
Of course, everyone will quickly get the hang of it, so in general, the procedure is as follows:
so in general, the procedure is as follows:
 
   
1. Import the unnumbered module.
+
# Import the unnumbered module.
2. Compile.
+
# Compile.
3. Observe the deprecation warnings and change all
+
# Observe the deprecation warnings and change all imports to include the given version number.
  +
# ''Relax.'' You are safe. ;)
imports to include the given version number.
 
4. ''Relax.'' You are safe. ;)
 
   
   
Line 517: Line 298:
   
 
The Cabal also includes provisions for versioning and dependency tracking.
 
The Cabal also includes provisions for versioning and dependency tracking.
Keep in mind, however, that it is concerned with ''packages''
+
Keep in mind, however, that it is concerned with ''packages'' instead of single modules.
  +
The mechanism for dependency tracking consists of listing the dependencies (packages) in the package description, along with "version annotations", for instance <code><nowiki>foo > 1.2</nowiki></code> would mean "depends on package foo, any version later than 1.2 is compatible".
instead of single modules.
 
The mechanism for dependency tracking consists of listing
 
the dependencies (packages) in the package description,
 
along with "version annotations", for instance <code><nowiki>foo > 1.2</nowiki></code>
 
would mean "depends on package foo, any version later than 1.2 is compatible".
 
   
  +
I shall demonstrate that the ECT protocol does not compete with Cabal's functionality, but, together with what the Cabal and Hackage offer, forms a necessary part of a very valuable whole, i.e. the solution to both problems presented in the introduction, availability and compatibility.
I shall demonstrate that the ECT protocol does not
 
compete with Cabal's functionality,
 
but, together with what the Cabal and Hackage offer,
 
forms a necessary part of a very valuable whole,
 
i.e. the solution to both problems presented in the introduction,
 
availability and compatibility.
 
   
Let me first argue that Cabal (at least in its current form)
+
Let me first argue that Cabal (at least in its current form) is not suited to solve the problem of dependency tracking, i.e. compatibility.
  +
I will contrast that with the alternate scenario, ECT ensuring compatibility, and conclude that Cabal's dependency system in fact fills an important
is not suited to solve the problem of dependency tracking,
 
i.e. compatibility.
 
I will contrast that with the alternate scenario,
 
ECT ensuring compatibility, and conclude that Cabal's
 
dependency system in fact fills an important
 
 
need in that case.
 
need in that case.
   
 
=== Could Cabal Be Used Instead? ===
 
=== Could Cabal Be Used Instead? ===
## idea: dependency-tracking at the package level
+
idea: dependency-tracking at the package level<br/>
## problem: unknown when compatibility ends, e.g. 1.2 <= foo < 2.0
+
problem: unknown when compatibility ends, e.g. 1.2 <= foo < 2.0<br/>
## compatibility information is available only at the dependency!
+
compatibility information is available only at the dependency!<br/>
## importer knows exactly one version that is compatible.
+
importer knows exactly one version that is compatible.
   
This scenario implies dependency-tracking at the level
+
This scenario implies dependency-tracking at the level of packages.
  +
As already described, a Cabal package can list a number of other packages
of packages.
 
As already described,
 
a Cabal package can list a number of other packages
 
 
as dependencies.
 
as dependencies.
  +
However, there is a problem: A package author may know the ''first'' compatible revision of a dependency, but he has no way to know the last.
However, there is a problem:
 
  +
For example, if I depend on a package <code><nowiki>foo</nowiki></code> and know that <code><nowiki>foo-1.2</nowiki></code> is compatible, I can specify <code><nowiki>foo >= 1.2</nowiki></code> as a dependency.
A package author may know the ''first'' compatible revision of a
 
  +
If however, say, <code><nowiki>foo-2.0</nowiki></code> turns out not to be backwards-compatible, I should have written something like <code><nowiki>1.2 <= foo < 2.0</nowiki></code>, which I could not have known at the time.
dependency, but he has no way to know the last. For example,
 
if I depend on a package <code><nowiki>foo</nowiki></code> and know that <code><nowiki>foo-1.2</nowiki></code> is compatible,
 
I can specify <code><nowiki>foo >= 1.2</nowiki></code> as a dependency.
 
If however, say, <code><nowiki>foo-2.0</nowiki></code> turns out not to be backwards-compatible,
 
I should have written something like <code><nowiki>1.2 <= foo < 2.0</nowiki></code>,
 
which I could not have known at the time.
 
   
In general, the compatibility information is available on the
+
In general, the compatibility information is available on the side of the dependency, but not at the site of import!
  +
The importer knows a priori only one version that is compatible, and that is the one he used.
side of the dependency, but not at the site of import!
 
The importer knows a priori only one version that is compatible,
 
and that is the one he used.
 
   
A possible solution would be to establish a convention
+
A possible solution would be to establish a convention that Cabal packages are assumed backwards-compatible until the next major version, or similar.
that Cabal packages are assumed backwards-compatible
 
until the next major version, or similar.
 
 
This scenario gives rise to a number of other problems, though.
 
This scenario gives rise to a number of other problems, though.
   
1. It forces an interpretation onto package versions which
+
# It forces an interpretation onto package versions which might not be accurate. Many software packages are numbered according to largely arbitrary schemes.
  +
# Assuming the convention in question, consider the following consequence. It would be impossible for a single package to use two different major versions of the same dependency. A standard example would be a program to convert data from the old version into something suitable for the new.
might not be accurate.
 
  +
# Still assuming the convention in question, There are the following two alternative consequences:
Many software packages are numbered according to
 
  +
## New major versions are released infrequently, slowing down interface evolution. For example, the programmer would hesitate to change an oddly-named function. Contrast this quickly with the ECT alternative: Freely change the function name (or anything else!) in the next revision, give it a new number, replace the old version by a trivial adapter, and you are done without having broken any of your dependants.
largely arbitrary schemes.
 
  +
## New major versions are released frequently, to facilitate rapid interface evolution. This possible world would quickly become populated with many (to be assumed) incompatible versions of a package, many of which would probably find their way onto a host system. In the end, a host would for every package contain many different versions, as required for different dependants.
   
  +
In summary, it should be clear that relying on Cabal to provide any kind of compatibility guarantees would be unpleasant at least.
2. Assuming the convention in question,
 
consider the following consequence.
 
It would be impossible for a single package
 
to use two different major versions
 
of the same dependency.
 
A standard example would be a program
 
to convert data from the old version into
 
something suitable for the new.
 
 
3. Still assuming the convention in question,
 
There are the following two alternative consequences:
 
 
i. New major versions are released infrequently,
 
slowing down interface evolution.
 
For example, the programmer would hesitate
 
to change an oddly-named function.
 
Contrast this quickly with the ECT alternative:
 
Freely change the function name
 
(or anything else!) in the next revision,
 
give it a new number,
 
replace the old version by a trivial adapter,
 
and you are done without having broken
 
any of your dependants.
 
 
i. New major versions are released frequently,
 
to facilitate rapid interface evolution.
 
This possible world would quickly become
 
populated with many (to be assumed) incompatible
 
versions of a package,
 
many of which would probably find their way
 
onto a host system.
 
In the end, a host would for every package
 
contain many different versions,
 
as required for different dependants.
 
 
In summary, it should be clear that relying on Cabal to provide
 
any kind of compatibility guarantees would be unpleasant at least.
 
 
Luckily, there is a perfect alternative.
 
Luckily, there is a perfect alternative.
   
 
=== Cabal Provides, ECT Guarantees ===
 
=== Cabal Provides, ECT Guarantees ===
## but: cabal versions establish chronological ordering.
 
## => dependency-tracking with ECT, cabal to ensure availability
 
## compatibility info encoded by dependency through re-exports and adapters
 
## import specifies the one version known to be compatible
 
## and the point in ''time'' (i.e. package version!) it is available!
 
   
  +
but: cabal versions establish chronological ordering.<br/>
There is one thing that ''is'' safe to assume about package
 
  +
=> dependency-tracking with ECT, cabal to ensure availability<br/>
version numbers:
 
  +
compatibility info encoded by dependency through re-exports and adapters<br/>
  +
import specifies the one version known to be compatible and the point in ''time'' (i.e. package version!) it is available!
  +
  +
There is one thing that ''is'' safe to assume about package version numbers:
 
They reflect the chronological order of the releases.
 
They reflect the chronological order of the releases.
Now note that if a package follows the ECT protocol,
+
Now note that if a package follows the ECT protocol, every one of its releases is backwards-compatible and a dependency of the form <code><nowiki>foo >= 1.2</nowiki></code> becomes meaningful!
every one of its releases is backwards-compatible
 
and a dependency of the form <code><nowiki>foo >= 1.2</nowiki></code>
 
becomes meaningful!
 
   
The compatibility information is encoded where it originates,
+
The compatibility information is encoded where it originates, in the dependency itself, by means of the "back issues" of a module, be they implemented as simple re-exports, adapters, or the original old code.
  +
The import specifies only what is known there, that is the version the program was originally written against.
in the dependency itself,
 
  +
And if Cabal is used, the dependant package can specify from which package release (i.e. point in time!) on the required version(s) are available, which can then be fetched automatically -- look for next issue's article on cabal-get!
by means of the "back issues" of a module,
 
be they implemented as simple re-exports, adapters,
 
or the original old code.
 
The import specifies only what is known there,
 
that is the version the program was
 
originally written against.
 
And if Cabal is used, the dependant package can
 
specify from which package release (i.e. point in time!) on
 
the required version(s) are available, which can then
 
be fetched automatically --
 
look for next issue's article on cabal-get!
 
   
A package can also seamlessly use multiple versions of the same module,
+
A package can also seamlessly use multiple versions of the same module, just by depending on a sufficiently late release.
just by depending on a sufficiently late release.
 
   
Finally, package authors have complete freedom over how
+
Finally, package authors have complete freedom over how they chose their package version numbers, provided they keep them in ascending chronological order. ;)
they chose their package version numbers,
 
provided they keep them in ascending chronological order. ;)
 
   
   
 
== Conclusion and Call for Adoption ==
 
== Conclusion and Call for Adoption ==
   
I have introduced a simple and unintrusive protocol that,
+
I have introduced a simple and unintrusive protocol that, when followed by a module implementor, can, as the name implies, theoretically guarantee eternal compatibility to his users.
when followed by a module implementor,
 
can, as the name implies, theoretically guarantee
 
eternal compatibility to his users.
 
 
This eliminates a huge source of bit-rot.
 
This eliminates a huge source of bit-rot.
Also, the user himself needs to know next to nothing about the
+
Also, the user himself needs to know next to nothing about the protocol.
protocol.
 
   
ECT works with plain Haskell 98,
+
ECT works with plain Haskell 98, relying only on basic features of the module system.
  +
It coexists synergetically with Cabal.
relying only on basic features of the module system.
 
  +
In fact, when both are used together, they enable the formation of a richly interconnected network of Haskell modules, maximizing code reuse among them
It coexists synergetically with Cabal. In fact,
 
when both are used together, they
 
enable the formation of a richly interconnected
 
network of Haskell modules,
 
maximizing code reuse among them
 
 
and enabling their rapid evolution.
 
and enabling their rapid evolution.
   
In the long run, I envision a world where I can
+
In the long run, I envision a world where I can stumble across some module on the Net, immediately import it in my code, and, on the stroke of a single command, have my system locate, fetch, compile, and install that module,
  +
dropping me at an interactive prompt with the new import available!
stumble across some module on the Net,
 
  +
All of the above ''without'' having to worry about whether or how to bundle that
immediately import it in my code,
 
  +
module with my code, prompted by questions of its availability and compatibility on my users' hosts.
and, on the stroke of a single command,
 
have my system locate, fetch, compile,
 
and install that module,
 
dropping me at an interactive prompt
 
with the new import available!
 
All of the above ''without'' having to worry
 
about whether or how to bundle that
 
module with my code,
 
prompted by questions of its availability
 
and compatibility on my users' hosts.
 
   
Standing next to Cabal, Hackage, and cabal-get,
+
Standing next to Cabal, Hackage, and cabal-get, the ECT protocol is my contribution to the above, so here is my plea:
the ECT protocol is my contribution to
 
the above, so here is my plea:
 
   
'''If you are a module implementor,
+
'''If you are a module implementor, please consider using the ECT versioning scheme, for ''Eternal Compatibility in Theory''!'''
please consider using the ECT versioning scheme,
 
for ''Eternal Compatibility in Theory''!'''
 
   
I think it would be great if we could solve the problem of
+
I think it would be great if we could solve the problem of "creeping incompatibility", whether by my idea or otherwise.
"creeping incompatibility", whether by my idea or otherwise.
 
 
Please let me know of your experience should you try it out!
 
Please let me know of your experience should you try it out!
   
Line 699: Line 382:
   
 
I've modified my own command line parsing module to use the ECT scheme.
 
I've modified my own command line parsing module to use the ECT scheme.
As of this writing (2005-05-02) the latest proper release
+
As of this writing (2005-05-02) the latest proper release (package version 1.1)
  +
contains only interface version 1 but 2 is practically finished, just waiting for the last documentation updates.
(package version 1.1)
 
  +
Look in the public darcs repository for the latest version.
contains only interface version 1
 
but 2 is practically finished,
 
just waiting for the last documentation updates.
 
Look in the public darcs repository
 
for the latest version.
 
   
You can find both the release package,
+
You can find both the release package, seperate documentation, and the darcs repository at
seperate documentation,
 
and the darcs repository at
 
   
 
http://www.scannedinavian.org/~pesco/
 
http://www.scannedinavian.org/~pesco/
Line 718: Line 395:
 
== Acknowledgements ==
 
== Acknowledgements ==
   
Though I had spent some thought about compatibility for myself,
+
Though I had spent some thought about compatibility for myself, I owe thanks to Alexander Jacobson who, with a message to the Haskell mailing list, originally caused me to eventually turn those thoughts into something concrete.
I owe thanks to Alexander Jacobson who, with a message to the
 
Haskell mailing list, originally caused me to eventually turn
 
those thoughts into something concrete.
 
   
Subsequent discussion with Lemmih, musasabi,
+
Subsequent discussion with Lemmih, musasabi, and probably others I've forgotten
  +
on #haskell and the mailing list provided important enlightenment, especially that no extensions to the language would be required.
and probably others I've forgotten
 
on #haskell and the mailing list
 
provided important enlightenment,
 
especially that no extensions
 
to the language would be required.
 
   
I also owe inspiration to the designers of GNU libtool,
+
I also owe inspiration to the designers of GNU libtool, who use some similar ideas for the dependency managment of shared libraries.
who use some similar ideas for the dependency
 
managment of shared libraries.
 
   
Finally, I was only lately reminded by Eike Scholz
+
Finally, I was only lately reminded by Eike Scholz (cptchaos on #h) that Bjarne Stroustrup actually described a very similar scheme to mine in "The C++ Programming Language", which I had completely forgotten about.
(cptchaos on #h)
 
that Bjarne Stroustrup actually described a very similar scheme
 
to mine in "The C++ Programming Language",
 
which I had completely forgotten about.
 
 
Thanks Bjarne, I'm sure that chapter left some subconcious seeds.
 
Thanks Bjarne, I'm sure that chapter left some subconcious seeds.
  +
P

Revision as of 12:39, 25 August 2008

The ECT Module Versioning Protocol

by Sven Moritz Hallberg for The Monad.Reader Issue Two
Date(2005-04-30T18:39:55Z)

Abstract. This article proposes a protocol (to be followed by the programmer) for isolating versions of Haskell modules, eliminating a large source of bit-rot in depending programs. The protocol consists of few simple rules. No language extensions are required and the scheme is immediately usable. If followed, the result is Eternal Compatibility in Theory.

Because of its simplicity, ECT is not restricted to Haskell.


Introduction

Every programmer knows the value of modularity. In particular, our languages have got module systems to support it, by decoupling code. However, much of the decoupling is lost again when the flux of interfaces over time is considered.

As a program module evolves, functions and other elements are added to, removed from, and changed in its interface. It is clear that programs importing the module (it's dependants) will not be compatible with all versions. At least, each program is compatible with one version, the one the author originally used, and usually a few ones before and after that. But if a program is not continuously updated, with time, chances rise dramatically that one of it's dependencies as installed on a given host system will be incompatible. Alas, the program cannot be used. This effect comprises a major source of bit rot.

To avoid such a situation, the programmer importing a module has several options, all of which are unsatisfactory. Let a module P import A, a dependency.

  1. Avoid the dependency altogether.
    1. Duplicate the functionality. If the functionality provided by A is simple, it might be possible to just replicate it with own code. This of course somewhat counters the purpose of creating module A in the first place. On a large scale, it results in many functions being duplicated by different individuals.
    2. Use standard functions instead. Instead of A, use a similar module from the standard library. Of course, the standard library does not provide everything. Also, even the standard library can become incompatible over time, though it usually takes longer.
  2. Internalize the dependency. A "local" copy of A is packaged and distributed with P. This scenario relies on the host Haskell system to prefer the bundled definition of A to any others it might have available. Because that is the usual behaviour, this strategy generally puts the programmer on the safe side but has the disadvantage of locking out bug-fixes and other (compatible) improvements to the original module A. It also increases the overall size of the package for P so is not desirable for many (or) large dependencies.
  3. Continuously update P to stay compatible. While continuous updates are certainly desirable, everyone's capacity is limited and in reality, no program can stay up-to-date forever. Also, this strategy always assumes the latest version of A to be available at the host. Updating a library to the latest version runs the risk of breaking other dependants requiring the older version.

Notice that option 1.2 is quite attractive. The standard library on a host is often reasonably up-to-date (though not always!) and, most importantly, widely available. As a result of these points, a large standard library is usually desired and welcomed. But that also brings large issues with it. A standard library should fit everyone, which is impossible to achieve and very hard to approximate well. Incompatible changes (improvements!) are very hard, if not impossible, to introduce in a standard library, since they will likely break many dependants, especially if the library is large.

So, in summary, we actually see two orthogonal problems when importing an external module:

  1. Availability: Will A be available at the host site?
  2. Compatibility: Will the version of A available at the host be compatible with P?

Solving issue 1, the easy and wide-spread distribution of Haskell libraries, is the combined goal of the Cabal and Hackage projects. Note, however, that Cabal does not solve issue 2, as will be discussed below.

As mentioned in the abstract, the contribution of this article is a protocol, dubbed ECT, by which it is possible for the programmer of the dependency, A, to solve, for his users, and with respect to A, issue 2. I.e. the complete elimination of concerns that P might break because of incompatibilities with A.


Solving the Problem by Removing the Problem

Recall the issue at hand. Interface changes break dependants. As elaborated above, the importer can do nothing against that, and in fact, neither can the dependency author. Interface changes break dependants.

Thence the solution is painfully obvious: Stop interface changes.

In ECT, the module A is split into separate submodules A_1, ..., A_n at version boundaries. The numbers are in chronological order,where n denotes the latest version and contains the current implementation. Whenever the interface changes in any way, the number is incremented, creating a new module. The original module A is retained as a short-hand (think re-export) for "the latest version".

If the importer P uses one of the specific, numbered modules, compatibility is guaranteed, since (by assumption) these modules have static interfaces. But because A is still present, using the number is not required. It would, after all, be quite tiresome to look up a version number for each module one wants to import.

In fact, A can be used to notify the programmer of the number to use via a deprecation pragma. A would look like the following.

module A
    {-# DEPRECATED "This is actually A_5." #-}
    (module A_5)
    where
import A_5

So the user, after first blindly importing module A, can conveniently add the number after compilation, then knowing that A_5 is the actual version the program was written against.

The last remark above illustrates an important point. In any scheme to ensure compatibility of modules with their imports, there must be, for each module for each import, a specification of which interface version the module is referring to. And precisely that is what importing one of the numbered modules means.

"I'm importing module A, referring to version 5 of its interface."


The Protocol

When first thinking through what my idea of module version management would be, I found I had only very vague ideas of what I actually meant by "module", "version", etc.. So let me start the formal description of the ECT versioning protocol by introducing some Definitions.

Module definition.
A piece of code defining a Haskell module, as described in the Haskell Report, is called a module definition. Each module definition has an associated module identifier, e.g. System.Console.Cmdline.Pesco (which may be implicit in the case of Main).
Revision.
Let x be a module definition and X be its associated module identifier. We call x a revision of X.
Module.
In plain English, the module named X is defined as the set of all revisions of X. More formally, let x be a revision of X, y be a revision of Y, and call x and y related iff X=Y. We then call the equivalence classes of this relation modules and their associated identifiers - unique within the class - the modules' names.
Interface.
The set of entities (functions, classes, instances, etc.) exported from a module is called its interface.
Compatibility.
Two revisions are called (mutually) compatible, if their interfaces are equal and all exported entities have the same semantics.
Version.
The equivalence classes of the compatibility relation, restricted to a single module X, are called versions of X.

Now, say you are implementing a module A. To fully conform to the protocol with respect to A, simply ensure the following invariants.


Let all versions of A be numbered from 1 to n.

  1. For each i=1,...,n there exists a module A_i whose only version is equal to version i of A.
  2. Version n contains the latest revision of A.
  3. Module A is deprecated by means of a DEPRECATED pragma, with a message that A_n should be imported instead.

So for example, say you are writing A from scratch. Starting with version number 1, implement the first revision as A_1.

module A_1 where
foo = "foo"

Define module A to re-export A_1 and include a DEPRECATED pragma that at the same time

  1. reminds the user to explicify her import and
  2. informs her of the current version number.
module A
    {-# DEPRECATED "Import A_1." #-}
    (module A_1)
    where
import A_1

When you modify the definition of A_1, consider any changes to its interface, including type signatures (of course) and semantics. If anything has changed, i.e. added, removed, or modified, a new version has formed, so increment the version number and move this revision to a new module, i.e. A_2. Change every occurrence of A_1 in the definition of A to A_2. Either re-implement A_1 in terms of A_2 or keep the old revision in place.

If there were no changes whatsoever, keep the new definition a revision of A_1.

Repeat the above for every new revision.

Ideally, the versions of A will form a chain of adapters and re-exports, where every version is implemented in terms of a later one, A_n containing the actual codebase. But of course it is also possible to completely leave some version's old implementation in place, if the next version differs very significantly, for instance.

Now consider the case of switching an existing module, which has already gone through several versions, to ECT. To attain "full compliance", as per the above definition, every old version of the module would have to be identified and split into its own module.

Certainly, it is desirable to allow weaker invariants than given above, to allow, for instance all the pre-ECT versions of a module A to be grouped in A_0 or similar. Or, a module owner might want to keep A locked on the last pre-ECT version to remain backwards-compatible. Then A_0 could take the role of the "latest version" short-cut module.

In the end, the only property essential to the compatibility guarantees to be facilitated by ECT is the following.


Let all versions of A be numbered from 1 to n.

For each i=1,...,n, if there exists a module A_i, it has only one version which is equal to version i of A.


Any module satisfying the above shall be called weakly conforming to the ECT versioning scheme.


Bugs, Behaviour, Semantics

The above explanations have ignored the very likely possibility of program bugs. The common definition of a bug is a programmer error resulting in undesired behaviour of the program.

The question - in light of the earlier definition of compatibility - is, whether this unintended behaviour can or should be considered as part of the program's semantics. If a bug in some revision of a module is part of its semantics, the bug-fixed revision would be considered part of a new version and prompt the creation of a separate module. Clearly, this is usually undesirable.

To answer this question in detail would largely be a philosophical undertaking on intent, meaning, collective vs. individual perception etc.. So, for this discussion, let the following guidelines suffice.

An actual problem only arises out of a bug if some user mistakes the unintended behaviour for intended semantics and uses it, effectively creating a dependency on the bug. Then, if the bug is fixed in a later revision, that user's program will break.

Luckily, it will often be clear that a bug is not part of the intended semantics of an interface. If it is indeed safe to assume that no users will accidentally mistake the bug for a feature, it is also safe to leave the version numbering as it is.

If, however, it is unclear whether any importers could come to rely on the erroneous behaviour, there is also a simple solution: split the buggy module. So far, we have assumed module version numbers to be integers. When splitting a module due to a bug-fix, also "split" it's version number into two rationals "close" to it. Let me explain by means of an example. If a dubious bug is found in version 4, create two new modules with versions 3.9 and 4.1 (add/subtract 0.1). Since the decimal point cannot be used in the module name, substitute an underscore yielding A_3_9 and A_4_1.

Put the original module definition, with the bug, into the lower module, A_3_9. This makes it clear that the buggy version is semantically "before" what version 4 should have been. Likewise, the bug-fixed version goes into the higher module, A_4_1.

Now deprecate the old module A_4 with a warning about the bug and let it re-export A_3_9 to preserve compatibility.

It's good the rationals are dense, for thus we can split each module an arbitrary number of times. For instance, if another bug was discovered later in A_3_9, split its rational version number 3.9 again at the last digit: 3.9 is moved to 3.89, 3.9 itself becomes a deprecated re-export of 3.89, and 3.91 contains the bug-fix.

Please note that this fractional numbering has no relation to any second- or third-level version components in say, the package containing these modules. For instance, if A is released in a corresponding package a, a-4.2 could refer to the second revision of the fourth release, consisting, using the example from above, of A and A_1 through A_4, as well as A_3_9 and A_4_1 (first revision), and A_3_89 and A_3_91 (second revision).

Finally, let me give an example for an appropriate bug deprecation message.

{-# DEPRECATED
    "A_4 is buggy: The foo function returns "foo" \
        \which should have been "Foo".\n\
    \Import A_4_1 for the correct behaviour or \
        \A_3_9 if you rely on the bug."
#-}

Notice that the user is informed of

  1. how to determine whether she is relying on the bug and
  2. which module to import in each case.


Tests

Consider the following likely situation. You are the author of module A which is currently at version 1 and you have a bunch of unit tests for A_1. You create a new version, A_2, which is backwards-compatible, i.e. the interface of A_1 is a subset of A_2's and the semantics of this subset are the same for both modules. In this case, any tests for A_1 are also good for A_2, and, if you replace A_1 with a partial re-export of A_2, it would be sufficient to run only the A_2 tests in order to test both versions. Also, if any tests for the A_1 subset of A_2 were added to A_2, they would automatically apply to A_1 as well.

In general, tests for functions that are re-exports can probably move up to the actual implementation safely. In the hypothetical case that the re-export is changed later, it just has to be remembered to pull down copies of the tests again.

For instance, if module A_1 is defined as

module A_1
    (foo, bar)
    where
import A_2 (foo)
bar = "barr"

and A_2 contains

module A_2
    (foo, bar, baz)
    where
foo = "foo"
bar = "bar"
baz = "baz"

the tests for A_1 would consist only of tests for bar while the tests for A_2 would contain tests for all three functions, foo, bar, baz.


Documentation

Similar remarks as for tests apply to documentation. If you use an autogenerator like Haddock, everything will be fine automatically: Re-exports inherit their documentation from the original. If you maintain documentation separately, the documentation must be split by hand. But keep in mind that nothing forces you to expend more work than you would without following ECT.

I.e. please consider the ECT protocol even if you do not want to spend time on the old documentation (or tests, for that matter)!

This concludes the discussion of the developer's side of the protocol.


On the User's Side

Luckily, no extra rules at all apply to the user of an ECT module. The user does not need to know about ECT at all. She can just import the unnumbered module as ever. Then, if she notices the deprecation message, and hopefully follows it, it is just all the better.

Of course, everyone will quickly get the hang of it, so in general, the procedure is as follows:

  1. Import the unnumbered module.
  2. Compile.
  3. Observe the deprecation warnings and change all imports to include the given version number.
  4. Relax. You are safe. ;)


Relation to the Cabal

The Cabal also includes provisions for versioning and dependency tracking. Keep in mind, however, that it is concerned with packages instead of single modules. The mechanism for dependency tracking consists of listing the dependencies (packages) in the package description, along with "version annotations", for instance foo > 1.2 would mean "depends on package foo, any version later than 1.2 is compatible".

I shall demonstrate that the ECT protocol does not compete with Cabal's functionality, but, together with what the Cabal and Hackage offer, forms a necessary part of a very valuable whole, i.e. the solution to both problems presented in the introduction, availability and compatibility.

Let me first argue that Cabal (at least in its current form) is not suited to solve the problem of dependency tracking, i.e. compatibility. I will contrast that with the alternate scenario, ECT ensuring compatibility, and conclude that Cabal's dependency system in fact fills an important need in that case.

Could Cabal Be Used Instead?

idea: dependency-tracking at the package level
problem: unknown when compatibility ends, e.g. 1.2 <= foo < 2.0
compatibility information is available only at the dependency!
importer knows exactly one version that is compatible.

This scenario implies dependency-tracking at the level of packages. As already described, a Cabal package can list a number of other packages as dependencies. However, there is a problem: A package author may know the first compatible revision of a dependency, but he has no way to know the last. For example, if I depend on a package foo and know that foo-1.2 is compatible, I can specify foo >= 1.2 as a dependency. If however, say, foo-2.0 turns out not to be backwards-compatible, I should have written something like 1.2 <= foo < 2.0, which I could not have known at the time.

In general, the compatibility information is available on the side of the dependency, but not at the site of import! The importer knows a priori only one version that is compatible, and that is the one he used.

A possible solution would be to establish a convention that Cabal packages are assumed backwards-compatible until the next major version, or similar. This scenario gives rise to a number of other problems, though.

  1. It forces an interpretation onto package versions which might not be accurate. Many software packages are numbered according to largely arbitrary schemes.
  2. Assuming the convention in question, consider the following consequence. It would be impossible for a single package to use two different major versions of the same dependency. A standard example would be a program to convert data from the old version into something suitable for the new.
  3. Still assuming the convention in question, There are the following two alternative consequences:
    1. New major versions are released infrequently, slowing down interface evolution. For example, the programmer would hesitate to change an oddly-named function. Contrast this quickly with the ECT alternative: Freely change the function name (or anything else!) in the next revision, give it a new number, replace the old version by a trivial adapter, and you are done without having broken any of your dependants.
    2. New major versions are released frequently, to facilitate rapid interface evolution. This possible world would quickly become populated with many (to be assumed) incompatible versions of a package, many of which would probably find their way onto a host system. In the end, a host would for every package contain many different versions, as required for different dependants.

In summary, it should be clear that relying on Cabal to provide any kind of compatibility guarantees would be unpleasant at least. Luckily, there is a perfect alternative.

Cabal Provides, ECT Guarantees

but: cabal versions establish chronological ordering.
=> dependency-tracking with ECT, cabal to ensure availability
compatibility info encoded by dependency through re-exports and adapters
import specifies the one version known to be compatible and the point in time (i.e. package version!) it is available!

There is one thing that is safe to assume about package version numbers: They reflect the chronological order of the releases. Now note that if a package follows the ECT protocol, every one of its releases is backwards-compatible and a dependency of the form foo >= 1.2 becomes meaningful!

The compatibility information is encoded where it originates, in the dependency itself, by means of the "back issues" of a module, be they implemented as simple re-exports, adapters, or the original old code. The import specifies only what is known there, that is the version the program was originally written against. And if Cabal is used, the dependant package can specify from which package release (i.e. point in time!) on the required version(s) are available, which can then be fetched automatically -- look for next issue's article on cabal-get!

A package can also seamlessly use multiple versions of the same module, just by depending on a sufficiently late release.

Finally, package authors have complete freedom over how they chose their package version numbers, provided they keep them in ascending chronological order. ;)


Conclusion and Call for Adoption

I have introduced a simple and unintrusive protocol that, when followed by a module implementor, can, as the name implies, theoretically guarantee eternal compatibility to his users. This eliminates a huge source of bit-rot. Also, the user himself needs to know next to nothing about the protocol.

ECT works with plain Haskell 98, relying only on basic features of the module system. It coexists synergetically with Cabal. In fact, when both are used together, they enable the formation of a richly interconnected network of Haskell modules, maximizing code reuse among them and enabling their rapid evolution.

In the long run, I envision a world where I can stumble across some module on the Net, immediately import it in my code, and, on the stroke of a single command, have my system locate, fetch, compile, and install that module, dropping me at an interactive prompt with the new import available! All of the above without having to worry about whether or how to bundle that module with my code, prompted by questions of its availability and compatibility on my users' hosts.

Standing next to Cabal, Hackage, and cabal-get, the ECT protocol is my contribution to the above, so here is my plea:

If you are a module implementor, please consider using the ECT versioning scheme, for Eternal Compatibility in Theory!

I think it would be great if we could solve the problem of "creeping incompatibility", whether by my idea or otherwise. Please let me know of your experience should you try it out!


A Real-World Example

I've modified my own command line parsing module to use the ECT scheme. As of this writing (2005-05-02) the latest proper release (package version 1.1) contains only interface version 1 but 2 is practically finished, just waiting for the last documentation updates. Look in the public darcs repository for the latest version.

You can find both the release package, seperate documentation, and the darcs repository at

http://www.scannedinavian.org/~pesco/

under the title "System.Console.Cmdline.Pesco".


Acknowledgements

Though I had spent some thought about compatibility for myself, I owe thanks to Alexander Jacobson who, with a message to the Haskell mailing list, originally caused me to eventually turn those thoughts into something concrete.

Subsequent discussion with Lemmih, musasabi, and probably others I've forgotten on #haskell and the mailing list provided important enlightenment, especially that no extensions to the language would be required.

I also owe inspiration to the designers of GNU libtool, who use some similar ideas for the dependency managment of shared libraries.

Finally, I was only lately reminded by Eike Scholz (cptchaos on #h) that Bjarne Stroustrup actually described a very similar scheme to mine in "The C++ Programming Language", which I had completely forgotten about. Thanks Bjarne, I'm sure that chapter left some subconcious seeds. P