[Haskell-cafe] Where is Data.Atom ?

Sun Jul 2 16:42:01 EDT 2006

David House wrote:
> On 02/07/06, Brian Hulley <brianh at metamilk.com> wrote:
>> I can see that an "unsafe" global ref to a Trie of Char with Unique
>> as the "value" of a node would allow me to implement fromString,
>> toString, and instance Eq Atom, but I've got no idea how to
>> implement instance Ord Atom so that the order is independent of the
>> order in which Atoms are created and exactly the same as the
>> lexicographic ordering of the String without being O(n) where n is
>> the min of the lengths of the Atoms being compared.
>
> Isn't compare on strings O(n) anyway? I suppose it would actually be
> O(d), where d is the index of the first difference, but that's O(n)
> worst-case. Even equality (a special case of compare) on strings
> (well, [Char], to be more precise) involves traversing the list and so
> is O(n) worst-case, no?

If an Atom was just represented by an Int, then construction would be O(n) 
where n is the length of the String (using a Trie with Int as the data 
associated with each node), and == and /= would be O(1).

If the Int's could somehow be chosen clairvoyantly then < would also be 
O(1). However since we can't see into the future (to know which new Atoms 
will be created) afaics the problem of correctly choosing these Int's is 
completely impossible!

Of course an Integer could be used (since it can be of any length) but it's 
likely that any systematic way of assigning them would simply encode the 
string in the Integer leading back to O(n) so in terms of speed for < there 
would be no advantage over ByteString's.

So perhaps my original spec is impossible to implement, though it is an open 
question whether some very clever encoding (with corresponding 
implementation of <) could be found which would lead to a better average 
performance (whatever that means).

An alternative design for an atom module could be:

      create :: MonadIO m => String -> m Atom
      toString :: Atom -> String

      instance Eq Atom          -- O(1)
      instance Ord Atom        -- O(1) but depends on creation order

but here the < would not be lexicographic, so although it would be useful 
for implementing symbol tables, environments etc it's not ideal for GUI use 
(eg when displaying a tree of modules where everything should be listed 
alphabetically).

As an aside, if the monad was removed then the result of atom "a" < atom "b" 
(atom :: String -> Atom) could not be determined by analysis of the program. 
It would depend on the evaluation order chosen by the compiler, but in a 
sense this doesn't matter because whatever the result is, it would be the 
same at any future time during the same run of the program so the use of 
Atoms as keys would still be safe. But is this still "functional"?

Regards, Brian.
-- 
Logic empowers us and Love gives us purpose.
Yet still phantoms restless for eras long past,
congealed in the present in unthought forms,
strive mightily unseen to destroy us.

http://www.metamilk.com