[Haskell-cafe] pointer equality

Levent Erkok erkokl at gmail.com
Wed Jul 20 08:34:19 CEST 2011


> Is there any way of getting the following code to immediately return
> True without performing the element-by-element comparison? Essentially
> this boils down to checking whether pointers are equal before
> comparing the contents.
> 
>> main = print $ f == f
>>      where f = [1..10^9]

Nikhil,

As others pointed out, what you're asking for is not possible in Haskell, and for good reasons. However, this is an important problem, and it comes up quite often in practice when implementing DSLs: Detecting sharing. So, it's no surprise that people developed different ways of dealing with it in various forms.

In my mind, Andy Gill came up with the nicest solution in his "Type-Safe Observable Sharing in Haskell" paper, published in the 2009 Haskell Symposium. Andy's paper traces the technique back to a 1999 paper by Simon Peyton Jones, Simon Marlow, and Conal Elliott. Probably few others came up with the same idea as well over the years. Andy's paper is a joy to read and I'd highly recommend it, you can get a copy at his web site: http://www.ittc.ku.edu/csdl/fpg/sites/default/files/Gill-09-TypeSafeReification.pdf. 

Andy's fundamental observation is that while you cannot check for "pointer-equality" in the pure world for obvious reasons, it's perfectly fine to do so when you're in the IO monad. He further observes that for almost all most practical use cases, this is really not an issue: You're probably in some sort of a monad wrapped over IO anyhow: This has certainly been my experience, for instance. While the paper has all the details, the "trick" is to use GHC's StableName abstraction. If you define:

import System.Mem.StableName

areEqual :: Eq a => a -> a -> IO Bool
areEqual x y = do
   sx <- hashStableName `fmap` (x `seq` makeStableName x)
   sy <- hashStableName `fmap` (y `seq` makeStableName y)
   return $ (sx == sy) || x == y

then areEqual will run quite fast if it indeed receives the same object twice, no matter how large it is. In fact, it might even be cyclic! (See Andy's paper for details.) However, if the stable-name equality fails, then you are *not* guaranteed that the objects are different, hence you further need to run the usual "==" on them, which can be quite costly. (Of course "==" can go in loops if the objects happen to have cycles in them.)

You can also change the last line of areEqual to read:

    return $ sx == sy

In this case, if it returns True then you're guaranteed that the objects are equal. If the result is False, then you just don't know. However it's guaranteed that the function will run fast in either case. Client code can decide on how to proceed based on that information.

I hope this helps. Reading Andy's paper, and the papers he's cited can further elucidate the technique. In fact, Andy uses this idea to turn cyclic structures to graphs with explicit back-edges that can be processed much more easily in the pure world, something that comes up quite often in practice as well.

-Levent.


More information about the Haskell-Cafe mailing list