[Haskell-cafe] Re: [Haskell] [Fwd: undecidable & overlapping instances: a bug?]

Sun Oct 21 21:30:45 EDT 2007

Hi All,

Here are my responses to the recent messages, starting with some
summary comments:

- I agree with Martin that the condition I posted a few days ago is
   equivalent to the "*refined* weak coverage condition" in your
   paper.  The "refined" tag here is important---I missed it the
   first time and thought he was referring to the WCC at the start of
   Section 6.1, and I would strongly recommend against using that
   (unrefined) version.  I confess that I haven't read the paper
   carefully enough to understand why you included that first WCC at
   all; I don't see any reason why you would want to use that
   particular condition in a practical implementation, and I don't
   see any advantages to it from a theoretical perspective either.
   (Maybe you just used it as a stepping stone to help motivate the
   refined WCC?)

- I believe that the refined WCC (or the CC or even the original WCC
   if you really want) are sufficient to ensure soundness.  I don't
   agree that the (B) restriction to "full" FDs is necessary.

- I think it is dangerous to tie up soundness with questions about
   termination.  The former is a semantic property while the latter
   has to do with computability.  Or, from a different perspective,
   soundness is essential, while termination/decidability is "only"
   desirable.  I know that others have thought more carefully than
   I have about conditions on the form of class and instance
   declarations that will guarantee decidability and I don't have
   much to say on that topic in what follows.  However, I will
   suggest that one should start by ensuring soundness and then,
   *separately*, consider termination.  (Of course, it's perfectly
   ok---just not essential---if conditions used to ensure soundness
   can also be used to guarantee termination.)

Stop reading now unless you're really interested in the details!
I'm going to start by explaining some key (but hopefully
unsurprising) ideas before I respond in detail to the example
that Martin posted.

Interpreting Class, Instance, and Dependency Decls:
---------------------------------------------------
I'll start by observing that Haskell's class, instance, and
functional dependency declarations all map directly to formulas in
standard predicate logic.  The correspondence is straightforward, so
I'll just illustrate it with some examples:

   Declaration               Formula
   -----------               -------
   class Eq a => Ord a       forall a. Ord a => Eq a
   instance Eq a => Eq [a]   forall a. Eq a => Eq [a]
   class C a b | a -> b      forall a, b. C a b /\ C a c => b = c

This correspondence between declarations and formulas seems to be
very similar to what you did in the paper using CHRs.  I haven't
read the paper carefully enough to know what extra mileage and/or
problems you get by using CHRs.  To me, predicate logic seems more
natural, but I don't think that matters here---I just wanted to
acknowledge that essentially the same idea is in your paper.

In the following, assuming some given program P, I'll write I(P)
for the conjunction of all the instance declaration formulas in
P; C(P) for the conjunction of all the class declaration formulas;
and F(P) fo the conjunction of all the dependency declaration
formulas.

Semantics and Soundness:
------------------------
We'll adopt a simple and standard semantics in which each
n-parameter type class is described by an n-place relation, that is,
a set of n-tuples of types.  Of course, in the most common, one
parameter case, this is equivalent to viewing the class as a set of
types.  Each tuple in this relation is referred to as an "instance"
of the class, and the predicate syntax C t1 ... tn is just a
notation for asserting that the tuple (t1, ..., tn) is an instance
of C.

We can assign a meaning to each of the classes in a given program P
by using the smallest model of the instance declaration formulas,
I(P).  (Existence of such a model is guaranteed; it is the union of
an increasing chain of approximations.)

What about the superclass properties in C(P)?  We expect to use
information from C(P) to simplify contexts during type inference.
For example, if we're using the standard Haskell prelude, then we
know that we can simplify (Eq a, Ord a) to (Ord a).  But how do we
know this is sound?  Haskell answers this by imposing restrictions
(third bullet in Section 4.3.2 of the report) on the use of
instance declarations to ensure that our (least) model of I(P) is
also a model of C(P).  This guarantees, for example, that every
type in the semantics of "Ord" will, as required, also be included
in the semantics of "Eq".

That should all seem pretty standard, but I've included it here
to make the point that we need to something very similar when we
introduce functional dependencies.  Specifically, we want to be
able to use properties from F(P) to simplify/improve contexts
during type inference, so we have to ensure that this is sound.
If we follow the strategy that was used to ensure soundness of
C(P) in Haskell, then we have to come up with a restriction on
instance declarations to ensure that our model of I(P) is also
a model of F(P).  I claim that any of the three conditions we've
talked about previously---using your terminology, the CC, WCC or
refined WCC---are sufficient to ensure this soundness property.
I haven't completed a formal proof, but I think it should be a
pretty straightforward exercise in predicate logic.

If a program fails to meet the criteria that ensure soundness of a
superclass declaration, the Haskell compiler should reject that
code.  Exactly the same thing should happen if the input program
doesn't satisfy the functional dependencies.  Further restrictions
are needed to ensure that there is a well-defined semantics (i.e.,
dictionary/evidence) for each type class instance; this is where
overlapping instances complicate the picture, for example.
However, for what we're doing here, it's sufficient to focus on
the semantics of classes (i.e., on which types are instances of
which classes) and to defer the semantics of overloaded operators
to a separate discussion.

Responding to Martin's Example:
-------------------------------

 > Here's an example (taken from the paper).
 >
 >   class F a b c | a->b
 >   instance F Int Bool Char
 >   instance F a b Bool => F [a] [b] Bool

This is an unfortunate example because, in fact, the second instance
declaration doesn't contribute anything to the class F!  Using the
semantics I outlined above, the class F is just {(Int, Bool, Char)},
with only the one instance that comes from the first instance
declaration.  This is unfortunate because, in the presence of
functional dependencies---or, in general, any form of
"improvement"---you have to take account of satisfiability.  In the
example here, any predicate of the form F a b Bool is
unsatisfiable, and so any time we attempt to transform an inferred
context containing a predicate of this form into some "simplified"
version, then there is a sense in which all we are doing is proving
that False is equivalent to False (or that anything follows from a
contradiction).

In what follows, let's refer to the program above as Q1.  However,
to explore a case where the predicates of interest are satisfiable,
let's also consider an extended version, Q2, that adds one extra
instance declaration:

     instance F Char Int Bool

Now the semantics of Q2 looks something like:

  { (Int, Bool, Char),
    (Char, Int, Bool),
    ([Char], [Int], Bool),
    ([[Char]], [[Int]], Bool),
    ([[[Char]]], [[[Int]]], Bool),
    ([[[[Char]]]], [[[[Int]]]], Bool), ... }

 > The FD is not full because the class parameter c is not involved in
 > the FD. We will show now that the CHR solver is not confluent.

Although it's just one example, it is worth pointing out that both
Q1 and Q2 lead to a semantics for F that is consistent/sound with
respect to the declared functional dependency.  So far as our
semantics is concerned, the fact that the dependency is not full
is not significant.

I don't dispute your claim that fullness is necessary to establish
confluence with CHRs.  But that is a consequence of the formal
proof/rewrite system that you're using, and not of the underlying
semantics.

 > Here is the translation to CHRs (see the paper for details)
 >
 >   rule F a b1 c, F a b2 d  ==> b1=b2      -- (FD)
 >   rule F Int Bool Char    <==> True       -- (Inst1)
 >   rule F Int a b           ==> a=Bool     -- (Imp1)
 >   rule F [a] [b] Bool     <==> F a b Bool -- (Inst2)
 >   rule F [a] c d           ==> c=[b]      -- (Imp2)

I'm a little surprised that you need Imp1 and Imp2 here.  I didn't
include analogs of these when I gave a predicate logic rendering
for dependency declarations above because they can be obtained as
derived properties.  Imp1, for example:

     Suppose F Int a b.
     From Inst1, we know that F Int Bool Char.
     Hence by FD, we know that a = Bool.

My best guess is that you included the Imp1 and Imp2 rules because,
although the "<==>" symbol suggests the possibility of bidirectional
rewriting, you actually want to view the above as oriented rewrite
rules?  From my perspective, however, the FD rule is the definition
of the functional dependency on F, and everything that you want to
know about this dependency can be derived from it ...

 > The above CHRs are not confluent. For example,
 > there are two possible CHR derivations for the initial
 > constraint store F [a] [b] Bool, F [a] b2 d
 >
 >     F [a] [b] Bool, F [a] b2 d
 > -->_FD (means apply the FD rule)
 >     F [a] [b] Bool, F [a] [b] d , b2=[b]
 > --> Inst2
 >     F a b Bool, F [a] [b] d , b_2=[b]         (*)
 >
 > Here's the second CHR derivation
 >
 >     F [a] [b] Bool, F [a] b2 d
 > -->_Inst2
 >     F a b Bool, F [a] b2 d
 > -->_Imp2
 >     F a b Bool, F [a] [c] d, b2=[c]           (**)
 >
 > (*) and (**) are final stores (ie no further CHR are applicable).
 > Unfortunately, they are not logically equivalent (one store says
 > b2=[b] whereas the other store says b2=[c]).

I think it's a mistake to expect these formulas to be logically
equivalent, at least in the strong syntactic sense that you seem
to be assuming here.  As I've mentioned previously, if we're going
to use improvement, then we have to take account of satisfiability.

In the context of Q1, this is an example in which each of the two
derivations above are essentially equivalent to False --> False.
But, in the general case, if we comparing (*) with (**), then the
main difference seems to be that the latter involves the use of
a "new" variable, c, in place of the variable b that appears in
the latter.  However, (*) includes F [a] [b] d and (**) includes
F [a] [c] d, and so we can deduce from the functional dependency
(i.e., your FD rule) that [b]=[c], and hence b=c.  If we're
allowed to inline that equality (i.e., use it to replace all
occurences of c with b), then (**) becomes identical to (*).

 > To conclude, fullness is a necessary condition to establish confluence
 > of the CHR solver.

I think that the standard notion of confluence is the wrong
property to be looking at here.  Or, to put it another way, I don't
think it is reasonable to expect confluence in the presence of
improvement.  That said, it might still be appropriate and useful
to formulate a notion of "confluence modulo satisfiability"
in which the equalities involved in the definition of confluence
are weakened to "equalities modulo satisfiability" ...

 > Confluence is vital to guarantee completeness of type inference.

I disagree.  I established the completeness of type inference in
the presence of improvement (without requiring any kind of
confluence-like property) as part of my work in 1994 on
"Simplifying and Improving Qualified Types."  (See Section 4 of
the technical report version, for example, which includes
details---like the proofs---that I had to omit from the later
FPCA '95 version for lack of space.)  Functional dependencies
are a special case of improvement, so I'd dispute the claim in
your paper that you are the first to formalize and prove
soundness, completeness and decidability.  (However, I'm
sympathetic to the fact that you missed this, because my results
on improvement were published six years before my formulation of
functional dependencies; if you were only looking at later
publications, you wouldn't have found anything.)

It's worth pointing out that, to achieve my completeness result,
I had to do exactly what I am suggesting that you should do now:
modify your formalism to account for satisfiability.  In my work,
for example, that meant introducing a "satisfiability ordering"
on type schemes and replacing the conventional notion of principal
types with a concept of "principal satisfiable types".

Let me take this opportunity to reflect on my understanding of
your main contributions and the relationship to my work.  In my
general, formal system, simplification and improvement rules can
be used at any point during type inference, but there are no
language features---such as explicit type signatures or instance
declarations---that *require* the use of such rules.  As a result,
decidability, in the general case, is easy to establish, but it
is arguably not very useful.

In my paper, I suggested that, in practice, for any specific
application of improvement, a language designer would likely want
to adopt a more algorithmic approach that invokes suitably defined
"improvement" and "simplification" functions at specific points
during type inference (coupled, for example, with
generalization).  With this approach, soundness, completeness, and
decidability of type inference become dependent on the soundness
and termination properties of the improvement and simplification
functions.  This, I think, is the most significant technical issue
that you are addressing in your paper.  In concrete terms, I think
this comes down to a question of whether the entailment relation
C1 ||- C2 between contexts C1 and C2 is decidable.  I hadn't heard
of the "Paterson Conditions" before I saw the reference on the
Trac page (sorry Ross!), but it sounds like they are sufficient to
guarantee good theoretical properties but also clear and flexible
enough to be useful in practice.

 > I don't think that fullness is an onerous condition.

Well that's probably true, but requiring fullness means that
there's one extra concept that programmers have to grapple with,
and one extra hurdle that they have to overcome in those
situations where a non-full FD seems to provide a more natural
solution.

There are at least a couple of ways to avoid the non-full FD in
your example program.  For example, we could replace every use
of (F a b c) with a pair of constraints (F1 a b, F2 a c) where
the classes F1 and F2 are as follows:

   class F1 a b | a -> b
   class F2 a c

This is an example of what's called normalization in relational
databases.  Because the type class relations that we use in
practice are fairly simple (just a few columns) and because
normalized relations have some structural benefits, I'd suspect
that the use of normalized type class relations, like the F1+F2
pair above, is usually a hallmark of better design than the
non-normalized version (like the original version with just F).
Nevertheless, unless there are strong technical reasons to require
one version over the other, I'd prefer to leave choices like this
open so that individual programmers can make the most appropriate
call for a given application.

....

I'm sorry I haven't given you this kind of feedback previously.
Simon had invited me to take a look at the paper on at least one
previous occassion, but it had not found its way to the front of my
reading queue.  Even now, I've only skimmed the paper (e.g., I
started with the Trac web page, figured out that I needed to know
what a "full" FD might be, and then used a text search of the pdf
to locate the definition.)  But I can see that there is a lot more
in the paper beyond the aspects we're discussing here, particularly
with respect to criteria for establishing termination, so I
apologize again if I've overlooked other relevant sections.

I hope my comments are still useful!

All the best,
Mark