[Haskell-cafe] Top-level state debate on the wiki

Thu Dec 2 05:27:32 EST 2004

On Thu, Dec 02, 2004 at 09:08:21AM +0000, Keean Schupke wrote:
> Ben Rudiak-Gould wrote:
> 
> Just a small comment on the Wiki page... it says
> 
> "Several real-life examples of pure haskell code which needs fast global 
> variables to either be implemented efficiently or statically guarantee 
> their invariants are given in 
> http://www.haskell.org//pipermail/haskell/2004-November/014929.html"
> 
> The first example is that of randomIO - not implementable in Haskell, 
> however the function,
> randoms :: RandomGen g => g -> [a], is (and is probably more idomatic 
> haskell anyway).

Yes. There are lots of ways to do things without global variables, that
was never in doubt. However randomIO is a part of the haskell standard.
Why is it not (efficiently) implementable in haskell? There is no
particular reason it should not be. it should optimize to exactly about
5 instructions to run the linear congruence algorithm on a static
location in memory. 

> The second example "Unique",  can be implemented:
> 
>    getUniqueSupply = do
>       a <- newIORef 0
>       return (nextUnique a)
>       where
> 
>       nextUnqiue n = do
>          x <- readIORef n
>          writeIORef n (x+1)
>          return x
> 
> Which should be just as fast as the global version, and all you do is 
> pass the 'unique' supply
> around... you can even generate a lazy list of unqiues which can be used 
> outside the IO monad.
> Again the "disadvantage" is that you can have multiple unique supplies 
> and you could use the "wrong" one... (which is an advantage in my 
> opinion, as it increases flexibility and reuse of the code).

Yes, this would be as fast as the global version*, but it implements
something else. The entire point of Data.Unique is that one can consider
the unique supply as part of the world, just like you consider the
filesystem, the screen, the network, various OS routines, etc as part of
the world.  This should be implementable efficiently, after all, you can
store the counter in a file in /tmp, or just create a stub C file to do
it, so it is obviously not a bad thing to allow, it is already allowed,
it just needs to be able to done efficiently or people will resort to
unsafe hacks like unsafePerformIO which is a serious impediment to
aggressive compiler optimizations and a plauge on the mathematical
semantics of the intermediate language.

> The same applies to the AtomHash, it can be implemented just as 
> effieciently without globals...
> The only difference appears to be the supposed ability of globals 
> stopping the programmer using an alternate Hash... but of course there 
> is nothing stopping the programmer using the
> wrong global at all! (In other words it seems just as easy to access the 
> wrong top-level name as to pass the wrong parameter).

No, because then it would not typecheck. the whole point of Atom.hs is
that the only way to generate values of type 'Atom' is to go through the
single unique hash table.  Hence the static guarentee that there is
always an isomorphism between everything of type 'Atom' and everything
of type 'String' in the system. This is only made possible by the
modules ability to hide access to routines which could be used to break
the invarient (such as the raw global hash). This is obviously a very
important invarient!

Let us please not confuse the many philosophical issues against global
variables in design which I wholeheartily agree with, with what the
global variables proposal is meant to achieve. It is for use at the very
lowest level of the libraries. i.e. not to be used by the average
person. They are for Atom tables, memoization, anti-memoization, I have
desires to move some of the runtime stable/weak pointer infrastructure
out of being magic implemented by the runtime, to being implemented in
haskell itself, this requires the global hash of stablepointers to be
implementable directly. Ghc itself is getting rid of global variables AS
SEEN BY THE PROGRAMMER but many libraries still NEED them inside to do their
clever memoization tricks and fast strings which are required to make
ghc usable at all. Really, you should not be opposed to them unless you
are also opposed to the FFI. At some level, deep inside the libraries,
this functionality is needed, just like the FFI. it is even needed to
implement the type indexed execution context proposals. 

Exposing the fact there is global state will still be a bad idea, their
usage will be hidden by pure interfaces by good programers, just like
unsafePerformIO or uses of the ST monad are done now. A module which
provides observable global state, but does not let you parameterize over
it is bad form. For example randomIO has implicit global state, but you
can use the parameterized versions such as randoms. unlike Random Atom.hs DOES NOT
HAVE IMPLICIT GLOBAL STATE. A perfectly acceptable implementation would
be toAtom = fromAtom = id. This is why Atom does not need to be
parameterized over its global state. the fact it is used is completly
abstracted away because it is an implementation detail.  There is a real
_fundamental difference_ here, please try to understand it before
rebutting. I really dislike this argument because I find myself having
to vocally disagree with things I actually fully agree with in their
proper context :). 

They are needed to support the various tricks necessary to create the
standard haskell libraries and large programs that need to do real work
and are pushing the system to the limits like ghc, ginsu and darcs. 

At its base level, top level initializers are about providing a very low
level tool which provides a generic and semantically correct
way to implement necessary performance improving measures and higher
level libraries such as George Russel's which provide a much nicer and
higher level interface for general users to use. 

        John

* actually, under the mdo proposal, Data.Unique can even be implemented
  faster with some easy compiler transformations to lift static heap
  allocated cells to the bss. but this is not important for the discussion
  however I would be interested in implementing them if the mdo proposal
  is ever implemented.

-- 
John Meacham - ⑆repetae.net⑆john⑈