Personal tools

Physical equality

From HaskellWiki

(Difference between revisions)
Jump to: navigation, search
 
m (Problems in optimizing equality: typo)
 
(15 intermediate revisions by 5 users not shown)
Line 1: Line 1:
__TOC__
+
= Object Equality for Lazy Languages =
   
 
== Why we need this ==
 
== Why we need this ==
   
<pre>
+
By Ahn, Ki Yung
\paragraph{ }
+
 
Lazy languages like Haskell have a good excuse for
 
Lazy languages like Haskell have a good excuse for
 
not having object equality (or physical equality):
 
not having object equality (or physical equality):
Line 14: Line 14:
 
test as forcing evaluation.
 
test as forcing evaluation.
   
\paragraph{ }
+
Using equality test for forcing evaluation is against the philosophy of Haskell.
Using equality test for forcing evaluation is agints design purpose of Haskell.
+
Haskell has only one equality operator unlike Unlike Scheme or ML,
Hakell has only one equality operator unlike Unlike Scheme or ML,
 
 
which is to be simple and clean. But we have assigned strange operational
 
which is to be simple and clean. But we have assigned strange operational
 
semantics on equality. This is also problematic for partial evaluation;
 
semantics on equality. This is also problematic for partial evaluation;
we cannot replace \verb|x==x| to \verb|True|.
+
we cannot replace ''x==x'' with ''True''.
   
\paragraph{ }
+
The functional programming style of persistent data structures tend to
Functional programming style of persistant data structure tend to
 
 
generate shared objects frequently. Functional languages lacking
 
generate shared objects frequently. Functional languages lacking
object equality cannot have full-benefit of sharing
+
object equality cannot have the full-benefit of sharing
since they cannont optimize equality with object equality.
+
since they cannot optimize equality with object equality.
Functional languages that cannot utilize object equality is
+
Functional languages that cannot use object equality are
 
giving up the optimization opportunity on the best-fitting domain.
 
giving up the optimization opportunity on the best-fitting domain.
 
It is a common idiom to check object equality before structural equality.
 
It is a common idiom to check object equality before structural equality.
 
Object equality implies structural equality,
 
Object equality implies structural equality,
 
unless we are dealing with non-deterministic objects.
 
unless we are dealing with non-deterministic objects.
\footnote{There are rare exceptions such as OCaml \texttt{nan}, which is in my opinion a design flaw.}
+
(There are rare exceptions such as OCaml ''nan'', which is in my opinion a design flaw)
   
\paragraph{ }
+
Therefore, our goal is to develop an evaluation strategy that intelligently
Therefore, our goal is to develope a evaluation strategy that intelligently
 
 
forces evaluation when it is sure that forcing evaluation will not take too
 
forces evaluation when it is sure that forcing evaluation will not take too
 
long and save space as well. For this we would need both static analysis and
 
long and save space as well. For this we would need both static analysis and
Line 38: Line 38:
 
we can have better optimization opportunities
 
we can have better optimization opportunities
 
such as using object equality in equality tests.
 
such as using object equality in equality tests.
</pre>
 
   
 
== Problems in optimizing equality ==
 
== Problems in optimizing equality ==
   
  +
It is a common idiom to benefit from physical equality in C++.
  +
  +
A common idiom of C++ copy ctor, assignment and equality:
  +
  +
<pre>
  +
class C
  +
{
  +
...
  +
C(const C& c)
  +
{
  +
if (*this == &c) return; // the same object!! nothing to do
  +
// do the job if not the same object
  +
}
  +
  +
C& operator = (const C& c)
  +
{
  +
if (*this == &c) return; // the same object!! nothing to do
  +
// do the job if not the same object
  +
}
  +
  +
bool operator == (const C& c)
  +
{
  +
if (*this == &c) return true; // the same object!! nothing to do
  +
// do the job if not the same object
  +
}
  +
...
  +
};
  +
</pre>
  +
  +
However, haskell being a lazy langauge it has concerns that strict languages would not have.
 
Let <code>===</code> be the physical equality.
 
Let <code>===</code> be the physical equality.
 
Even though <code>===</code> implies <code>==</code> except for bottom,
 
Even though <code>===</code> implies <code>==</code> except for bottom,
Line 50: Line 79:
   
 
=== Analyzing haskell insertion sort algorithm ===
 
=== Analyzing haskell insertion sort algorithm ===
These examples came up during the CS584 Algorithms (2006 Spring) class
+
at Portland State University, lectured by Mark P. Jones.
+
These examples came up doing one of the homework
  +
in CS584 Algorithms (2006 Spring) class at Portland State University,
  +
lectured by Mark P. Jones.
  +
I choose haskell to implement sorting but memory leaked I added the counter.
 
Thanks to Mark P. Jones for mentioning that
 
Thanks to Mark P. Jones for mentioning that
 
equalities are used for forcing evaluation.
 
equalities are used for forcing evaluation.
   
\paragraph{Insertion sort}
+
==== Insertion sort ====
  +
 
<haskell>
 
<haskell>
 
inssrt [] = []
 
inssrt [] = []
Line 64: Line 93:
 
</haskell>
 
</haskell>
 
We test for the worst case when the length of the input is 1000 and it works fine.
 
We test for the worst case when the length of the input is 1000 and it works fine.
  +
 
<pre>
 
<pre>
 
Main> inssrt [1000,999..1]
 
Main> inssrt [1000,999..1]
Line 70: Line 100:
 
</pre>
 
</pre>
   
\paragraph{Insertion sort with counter that leaks memory}
+
==== Insertion sort with counter that leaks memory ====
  +
 
<haskell>
 
<haskell>
 
inssort p@(_,[]) = p
 
inssort p@(_,[]) = p
Line 79: Line 109:
 
| otherwise = (n', y:ys') where (n',ys') = ins z (n+1,ys)
 
| otherwise = (n', y:ys') where (n',ys') = ins z (n+1,ys)
 
</haskell>
 
</haskell>
  +
 
We test for the worst case when the length of the input is 1000 and it crashes
 
We test for the worst case when the length of the input is 1000 and it crashes
because of memory leak due to constructing lengthy expression $1+1+1+\cdots $.
+
because of [[memory leak]] due to constructing lengthy expression $1+1+1+\cdots $.
 
<pre>
 
<pre>
 
Main> inssort (0,[1000,999..1])
 
Main> inssort (0,[1000,999..1])
Line 86: Line 117:
 
</pre>
 
</pre>
   
\paragraph{Insertion sort with counter patched by obscure equality}
+
==== Insertion sort with counter patched by obscure equality ====
  +
 
<haskell>
 
<haskell>
 
inssort' p@(_,[]) = p
 
inssort' p@(_,[]) = p
Line 95: Line 126:
 
| n==n && otherwise = (n', y:ys') where (n',ys') = ins z (n+1,ys)
 
| n==n && otherwise = (n', y:ys') where (n',ys') = ins z (n+1,ys)
 
</haskell>
 
</haskell>
  +
 
We test for the worst case when the length of the input is 1000 and it works
 
We test for the worst case when the length of the input is 1000 and it works
 
because of the equality check forces evaluation of the counter value.
 
because of the equality check forces evaluation of the counter value.
Line 103: Line 135:
 
</pre>
 
</pre>
   
+
== Implementing physical equality in Haskell ==
== Implementation ==
 
   
 
I tried Hugs Version: 20050308 and GHC version 6.4.2 but only GHC worked.
 
I tried Hugs Version: 20050308 and GHC version 6.4.2 but only GHC worked.
Line 129: Line 161:
 
s = S (S [1..])
 
s = S (S [1..])
 
s2 = S (S [1..])
 
s2 = S (S [1..])
  +
s3 = s
 
</haskell>
 
</haskell>
 
<pre>
 
<pre>
Line 144: Line 177:
 
*Main> s === s2
 
*Main> s === s2
 
False
 
False
  +
*Main> s === s3
  +
True
 
</pre>
 
</pre>
   
  +
This implementation works and concurrncy or garbage collection does not affect the result.
  +
But it is very slow since newStablePtr, freeStablePtr operations are expensive because it changes the state of the [[garbage collector]].
  +
Cheap operation that just peeks the address is needed for the implementation.
  +
I think the best way is to provide a primitive operator for this.
  +
  +
== Caveat ==
  +
It should be noted that an equality operator like (===) above breaks referential transparency, so adding it to Haskell would ruin many of the nice properties that Haskell has and would make many program transformations invalid.
   
 
== Related Papers and Websites ==
 
== Related Papers and Websites ==
   
* Observable Sharing http://citeseer.ist.psu.edu/claessen99observable.html
+
* Observable Sharing http://citeseer.ist.psu.edu/claessen99observable.html
* Hash Consing http://citeseer.ist.psu.edu/goubault94implementing.html
+
* Hash Consing http://citeseer.ist.psu.edu/goubault94implementing.html
   
   

Latest revision as of 06:34, 2 April 2009

Contents

[edit] 1 Object Equality for Lazy Languages

[edit] 1.1 Why we need this

By Ahn, Ki Yung

Lazy languages like Haskell have a good excuse for not having object equality (or physical equality): the equality test is a hack for forcing evaluation. We can make use of object equality in Haskell if we can provide a resonable alternative for forcing evaluation; it would be even better if we can automatially decide when to evaluate. Therefore, we suggest we should search for a evaluation strategy that will dismiss the use of equality test as forcing evaluation.

Using equality test for forcing evaluation is against the philosophy of Haskell. Haskell has only one equality operator unlike Unlike Scheme or ML, which is to be simple and clean. But we have assigned strange operational semantics on equality. This is also problematic for partial evaluation; we cannot replace x==x with True.

The functional programming style of persistent data structures tend to generate shared objects frequently. Functional languages lacking object equality cannot have the full-benefit of sharing since they cannot optimize equality with object equality. Functional languages that cannot use object equality are giving up the optimization opportunity on the best-fitting domain. It is a common idiom to check object equality before structural equality. Object equality implies structural equality, unless we are dealing with non-deterministic objects. (There are rare exceptions such as OCaml nan, which is in my opinion a design flaw)

Therefore, our goal is to develop an evaluation strategy that intelligently forces evaluation when it is sure that forcing evaluation will not take too long and save space as well. For this we would need both static analysis and evaluation strategy at runtime. When we rely less on operational semantics of equality, we can have better optimization opportunities such as using object equality in equality tests.

[edit] 1.2 Problems in optimizing equality

It is a common idiom to benefit from physical equality in C++.

A common idiom of C++ copy ctor, assignment and equality:

class C
{
...
   C(const C& c)
   {
       if (*this == &c) return; // the same object!! nothing to do
       // do the job if not the same object
   }

   C& operator = (const C& c)
   {
      if (*this == &c) return; // the same object!! nothing to do
      // do the job if not the same object
   }

   bool operator == (const C& c)
   {
      if (*this == &c) return true; // the same object!! nothing to do
      // do the job if not the same object
   }
...
};

However, haskell being a lazy langauge it has concerns that strict languages would not have. Let === be the physical equality. Even though === implies == except for bottom, there are some situations that we cannot by default optimize equality with physical equality. It is because haskell programmer use reflexive equality test as a common idom for deep evaluation.

[edit] 1.2.1 Analyzing haskell insertion sort algorithm

These examples came up doing one of the homework in CS584 Algorithms (2006 Spring) class at Portland State University, lectured by Mark P. Jones. I choose haskell to implement sorting but memory leaked I added the counter. Thanks to Mark P. Jones for mentioning that equalities are used for forcing evaluation.

[edit] 1.2.1.1 Insertion sort

    inssrt [] = []
    inssrt (x:xs) = ins x $ inssrt xs
      where ins z [] = [z]
            ins z l@(y:ys) | z <= y    = z : l
                           | otherwise = y : ins z ys

We test for the worst case when the length of the input is 1000 and it works fine.

Main> inssrt [1000,999..1]
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,
... ... ,988,989,990,991,992,993,994,995,996,997,998,999,1000]

[edit] 1.2.1.2 Insertion sort with counter that leaks memory

    inssort p@(_,[]) = p
    inssort (n,x:xs) = ins x $ inssort (n,xs)
      where ins z (n,[]) = (n,[z])
            ins z (n,l@(y:ys))
              | z <= y    = (n+1, z:l)
              | otherwise = (n', y:ys') where (n',ys') = ins z (n+1,ys)

We test for the worst case when the length of the input is 1000 and it crashes because of memory leak due to constructing lengthy expression $1+1+1+\cdots $.

Main> inssort (0,[1000,999..1])
(Segmentation fault

[edit] 1.2.1.3 Insertion sort with counter patched by obscure equality

    inssort' p@(_,[]) = p
    inssort' (n,x:xs) = ins x $ inssort' (n,xs)
      where ins z (n,[]) = (n,[z])
            ins z (n,l@(y:ys))
                | n==n && z <= y    = (n+1, z:l)
                | n==n && otherwise = (n', y:ys') where (n',ys') = ins z (n+1,ys)

We test for the worst case when the length of the input is 1000 and it works because of the equality check forces evaluation of the counter value.

Main> inssort' (0,[1000,999..1])
(499500,[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,
... ... ,986,987,988,989,990,991,992,993,994,995,996,997,998,999,1000])

[edit] 1.3 Implementing physical equality in Haskell

I tried Hugs Version: 20050308 and GHC version 6.4.2 but only GHC worked. In Hugs, StablePtr a is not an instance of Eq.

implementation:

import Foreign
import System.IO.Unsafe
 
x === y = unsafePerformIO $
  do
    px <- newStablePtr x
    py <- newStablePtr y
    let ret = px == py
    freeStablePtr px
    freeStablePtr py
    return ret

usage:

data S a = S a
 
s = S (S [1..])
s2 = S (S [1..])
s3 = s
   ___         ___ _
  / _ \ /\  /\/ __(_)
 / /_\// /_/ / /  | |      GHC Interactive, version 6.4.2, for Haskell 98.
/ /_\\/ __  / /___| |      http://www.haskell.org/ghc/
\____/\/ /_/\____/|_|      Type :? for help.

Loading package base-1.0 ... linking ... done.
Compiling Main             ( Tst.hs, interpreted )
Ok, modules loaded: Main.
*Main> s === s
True
*Main> s === s2
False
*Main> s === s3
True

This implementation works and concurrncy or garbage collection does not affect the result. But it is very slow since newStablePtr, freeStablePtr operations are expensive because it changes the state of the garbage collector. Cheap operation that just peeks the address is needed for the implementation. I think the best way is to provide a primitive operator for this.

[edit] 1.4 Caveat

It should be noted that an equality operator like (===) above breaks referential transparency, so adding it to Haskell would ruin many of the nice properties that Haskell has and would make many program transformations invalid.

[edit] 1.5 Related Papers and Websites