[Haskell-cafe] Explicit garbage collection

Miguel Mitrofanov miguelimo38 at yandex.ru
Thu Jan 7 16:39:03 EST 2010


I liked it too. Seems like I have to show some real code, and my  
apologies for a long e-mail.

Well, that's what I'm actually trying to do and what I've managed to  
achieve so far.

 > module Ask where
 > import Control.Monad
 > import Data.IORef
 > import Data.Maybe
 > import System.IO.Unsafe -- Yes, I would need that
 > import System.Mem
 > import System.Mem.Weak

Suppose we're writing a RESTful server, which keeps all state on the  
client. We want it to send questions to the client and get back the  
answers. We can also send some state data to the client and we are  
sure that the client would return them back unchanged. So, that's the  
type of our server:

 > type Server medium = Maybe (medium, String) -> IO (Maybe (medium,  
String))

The client starts interaction, sending "Nothing"; when server finally  
returns "Nothing", the session stops. I would emulate the client in  
GHCi with the following function:

 > client :: Show medium => Bool -> Server medium -> IO ()
 > client debug server = client' Nothing where
 >     client' query =
 >         do response <- server query
 >            case response of
 >              Nothing -> return ()
 >              Just (medium, question) ->
 >                  do when debug $ putStr "Debug: " >> print medium
 >                     putStrLn question
 >                     putStr "--> "
 >                     answer <- getLine
 >                     client' $ Just (medium, answer)

Nothing very interesting. The only thing to note is that the boolean  
argument allows us to see the state sent by server - for debugging  
purposes.

But I want something more high-level. The goal is to solve the  
Graham's "arc challenge". So the high-level interface is implemented  
by this type:

 > data Ask a = Finish a | Continue String (String -> Ask a)

A possible application would be:

 > test :: Ask ()
 > test =
 >     do s1 <- ask "Enter the first number"
 >        let n1 = read s1
 >        s2 <- ask "Enter the second number"
 >        let n2 = read s2
 >        ask $ show $ n1 + n2
 >        return ()

Well, to do that, I need a Monad instance for Ask:

 > instance Monad Ask where
 >     return = Finish
 >     Finish x >>= h = h x
 >     Continue question k >>= h = Continue question $ \answer -> k  
answer >>= h

and an "ask" function:

 > ask :: String -> Ask String
 > ask question = Continue question $ \answer -> return answer

Now, the problem is with the route from high level to the low level.  
We can do it relatively simply:

 > simpleServer :: Ask () -> Server [String]
 > simpleServer anything = return . simpleServer' anything . maybe []  
(\(medium, answer) -> medium ++ [answer]) where
 >     simpleServer' (Finish ()) _ = Nothing
 >     simpleServer' (Continue question _) [] = Just ([], question)
 >     simpleServer' (Continue _ k) (s : ss) =
 >         do (medium, question) <- simpleServer' (k s) ss
 >            return (s : medium, question)

And indeed, our test example works:

*Ask> client False $ simpleServer test
Enter the first number
--> 3
Enter the second number
--> 4
7
-->

But, as you've probably guessed already, this "simpleServer" sends way  
too much information over network:

 > nonsense :: Ask ()
 > nonsense =
 >     do a <- ask "?"
 >        b <- ask a
 >        c <- ask b
 >        d <- ask c
 >        e <- ask b -- Note this "b" instead of "d", it's deliberate.
 >        return ()

*Ask> client True $ simpleServer nonsense
Debug: []
?
--> a
Debug: ["a"]
a
--> b
Debug: ["a","b"]
b
--> c
Debug: ["a","b","c"]
c
--> d
Debug: ["a","b","c","d"]
b
--> e

All user answers from the very beginning get transmitted.

I want to transmit only those answers that are actually used. I'm  
gonna write another server, a (relatively) smart one.

A simple boxing type:

 > data Box a = Box {unBox :: a}

Now, the server. I use "Nothing" for those user answers that are no  
longer needed.

 > smartServer :: Ask () -> Server [Maybe String]
 > smartServer anything = smartServerIO anything . maybe [] (\(medium,  
answer) -> medium ++ [Just answer]) where
 >     smartServerIO anything query =

Sometimes a value is not used anymore just because it WAS used  
already. So I'll use a mutable list to keep track of those values that  
are actually dereferenced:

 >         do usedList <- newIORef []

Then, I'll make weak pointers for all user answers:

 >            let boxedQuery = map Box query
 >            weakPtrs <- mapM (\box -> mkWeak box () Nothing)  
boxedQuery

and actually run my "Ask" value.

 >            case findWeakPtrs 0 anything boxedQuery usedList of
 >              Finish () -> return Nothing
 >              Continue question rest ->

Now, the interesting part. I'd invoke a garbage collector to get rid  
of those boxes that aren't necessary anymore.

 >                  do performGC

Now, weak pointers tell me what boxes are not dereferenced but can be  
dereferenced in the future, and my mutable list has the numbers of  
boxes that were dereferenced.

 >                     danglingNumbers <- mapM deRefWeak weakPtrs
 >                     usedNumbers <- readIORef usedList

I also need to keep this "future" alive, so that usable boxes don't  
get garbage collected.

 >                     case rest undefined of _ -> return ()

And now all I need to do is to filter out those boxes that are not in  
any of the two lists:

 >                     return $ Just (zipWith3 (isUsed usedNumbers)  
[0..] danglingNumbers query, question)

Helper function has an interesting detail:

 >     findWeakPtrs _ (Finish ()) _ _ = Finish ()
 >     findWeakPtrs _ response [] _ = response

I use unsafePerformIO to track those boxes that get actually  
dereferenced:

 >     findWeakPtrs n (Continue _ k) (s : ss) usedList =
 >         findWeakPtrs (n+1) (k (unsafePerformIO $ modifyIORef  
usedList (n :) >> return (fromJust $ unBox s))) ss usedList

A filtering function is not very interesting:

 >     isUsed usedNumbers index Nothing _ | not (index `elem`  
usedNumbers) = Nothing
 >     isUsed _ _ _ answer = answer

And it works:

*Ask> client True $ smartServer nonsense
Debug: []
?
--> a
Debug: [Just "a"]
a
--> b
Debug: [Nothing,Just "b"]
b
--> c
Debug: [Nothing,Just "b",Just "c"]
c
--> d
Debug: [Nothing,Just "b",Nothing,Nothing]
b
--> e

So, the problem is, I don't think calling performGC every time is a  
good idea. I'd like to tell explicitly what values are to be garbage  
collected. It doesn't seem like it's possible with current version of  
GHC, though.

On 7 Jan 2010, at 23:07, Edward Kmett wrote:

> Thats a shame, I was rather fond of this monad. To get that last  
> 'True', you'll really have to rely on garbage collection to  
> approximate reachability for you, leaving the weak reference  
> solution the best you can hope for.
>
> -Edward Kmett
>
> On Thu, Jan 7, 2010 at 12:39 PM, Miguel Mitrofanov <miguelimo38 at yandex.ru 
> > wrote:
> Damn. Seems like I really need (True, False, True) as a result of  
> "test".
>
>
> On 7 Jan 2010, at 08:52, Miguel Mitrofanov wrote:
>
> Seems very nice. Thanks.
>
> On 7 Jan 2010, at 08:01, Edward Kmett wrote:
>
> Here is a slightly nicer version using the Codensity monad of STM.
>
> Thanks go to Andrea Vezzosi for figuring out an annoying hanging bug  
> I was having.
>
> -Edward Kmett
>
> {-# LANGUAGE Rank2Types, GeneralizedNewtypeDeriving, DeriveFunctor #-}
> module STMOracle
>   ( Oracle, Ref
>   , newRef, readRef, writeRef, modifyRef, needRef
>   ) where
>
> import Control.Applicative
> import Control.Monad
> import Control.Concurrent.STM
>
> instance Applicative STM where
>   pure = return
>   (<*>) = ap
>
> newtype Ref s a = Ref (TVar (Maybe a))
> newtype Oracle s a = Oracle { unOracle :: forall r. (a -> STM r) ->  
> STM r } deriving (Functor)
>
> instance Monad (Oracle s) where
>  return x = Oracle (\k -> k x)
>  Oracle m >>= f = Oracle (\k -> m (\a -> unOracle (f a) k))
>
> mkOracle m = Oracle (m >>=)
>
> runOracle :: (forall s. Oracle s a) -> IO a
> runOracle t = atomically (unOracle t return)
>
> newRef :: a -> Oracle s (Ref s a)
> newRef a = mkOracle $ Ref <$> newTVar (Just a)
>
> readRef :: Ref s a -> Oracle s a
> readRef (Ref r) = mkOracle $ do
>   m <- readTVar r
>   maybe retry return m
>
> writeRef :: a -> Ref s a -> Oracle s a
> writeRef a (Ref r) = mkOracle $ do
>   writeTVar r (Just a)
>   return a
>
> modifyRef :: (a -> a) -> Ref s a -> Oracle s a
> modifyRef f r = do
>   a <- readRef r
>   writeRef (f a) r
>
> needRef :: Ref s a -> Oracle s Bool
> needRef (Ref slot) = Oracle $ \k ->
>            (writeTVar slot Nothing >> k False)
>   `orElse` k True
>
> -- test  
> case 
> :                                                                                                                               refMaybe 
>  b dflt ref = if b then readRef ref else return dflt
> refIgnore ref = return "blablabla"
> refFst ref = fst `fmap` readRef ref
> test = do
>    a <- newRef "x"
>    b <- newRef 1
>    c <- newRef ('z', Just 0)
>    -- no performLocalGC required
>    x <- needRef a
>    y <- needRef b
>    z <- needRef c
>    u <- refMaybe y "t" a -- note that it wouldn't actually read "a",
>                          -- but it won't be known until runtime.
>    w <- refIgnore b
>    v <- refFst c
>    return (x, y, z)
>
>
>
> On Wed, Jan 6, 2010 at 10:28 PM, Edward Kmett <ekmett at gmail.com>  
> wrote:
> I don't believe you can get quite the semantics you want. However,  
> you can get reasonably close, by building a manual store and  
> backtracking.
>
> {-# LANGUAGE Rank2Types #-}
> -- lets define an Oracle that tracks whether or not you might need  
> the reference, by backtracking.
> module Oracle
>   ( Oracle, Ref
>   , newRef, readRef, writeRef, modifyRef, needRef
>   ) where
>
> import Control.Applicative
> import Control.Arrow (first)
> import Control.Monad
> import Data.IntMap (IntMap)
> import qualified Data.IntMap as M
> import Unsafe.Coerce (unsafeCoerce)
> import GHC.Prim (Any)
>
> -- we need to track our own worlds, otherwise we'd have to build  
> over ST, change optimistically, and track how to backtrack the state  
> of the Store. Much uglier.
> -- values are stored as 'Any's for safety, see GHC.Prim for a  
> discussion on the hazards of risking the storage of function types  
> using unsafeCoerce as anything else.
> data World s = World { store :: !(IntMap Any), hwm :: !Int }
>
> -- references into our store
> newtype Ref s a = Ref Int deriving (Eq)
>
> -- our monad that can 'see the future' ~ StateT (World s) []newtype  
> Oracle s a = Oracle { unOracle :: World s -> [(a, World s)] }
>
> -- we rely on the fact that the list is always non-empty for any  
> oracle you can run. we are only allowed to backtrack if we thought  
> we wouldn't need the reference, and wound up needing it, so head  
> will always succeed.
> runOracle :: (forall s. Oracle s a) -> a
> runOracle f = fst $ head $ unOracle f $ World M.empty 1
>
>
> instance Monad (Oracle s) where
>   return a = Oracle $ \w -> [(a,w)]
>   Oracle m >>= k = Oracle $ \s -> do
>       (a,s') <- m s
>       unOracle (k a) s'
>
> -- note: you cannot safely define fail here without risking a crash  
> in runOracle
> -- Similarly, we're not a MonadPlus instance because we always want  
> to succeed eventually.
>
> instance Functor (Oracle s) where
>   fmap f (Oracle g) = Oracle $ \w -> first f <$> g w
>
> instance Applicative (Oracle s) where
>   pure = return
>   (<*>) = ap
>
> -- new ref allocates a fresh slot and inserts the value into the  
> store. the type level brand 's' keeps us safe, and we don't export  
> the Ref constructor.
> newRef :: a -> Oracle s (Ref s a)
> newRef a = Oracle $ \(World w t) ->
>   [(Ref t, World (M.insert t (unsafeCoerce a) w) (t + 1))]
>
> -- readRef is the only thing that ever backtracks, if we try to read  
> a reference we claimed we wouldn't need, then we backtrack to when  
> we decided we didn't need the reference, and continue with its value.
> readRef :: Ref s a -> Oracle s a
> readRef (Ref slot) = Oracle $ \world ->
>   maybe [] (\a -> [(unsafeCoerce a, world)]) $ M.lookup slot (store  
> world)
>
> -- note, writeRef dfoesn't 'need' the ref's current value, so  
> needRef will report False if you writeRef before you read it after  
> this.
> writeRef :: a -> Ref s a -> Oracle s a
> writeRef a (Ref slot) = Oracle $ \world ->
>       [(a, world { store = M.insert slot (unsafeCoerce a) $ store  
> world })]
>
> {-
> -- alternate writeRef where writing 'needs' the ref.
> writeRef :: a -> Ref s a -> Oracle s a
> writeRef a (Ref slot) = Oracle $ \World store v -> do
>   (Just _, store') <- return $ updateLookupWithKey replace slot store
>   [(a, World store' v)]
>  where
>   replace _ _ = Just (unsafeCoerce a)
> -}
>
> -- modifying a reference of course needs its current value.
> modifyRef :: (a -> a) -> Ref s a -> Oracle s a
> modifyRef f r = do
>   a <- readRef r
>   writeRef (f a) r
>
> -- needRef tries to continue executing the world without the element  
> in the store in question. if that fails, then we'll backtrack to  
> here, and try again with the original world, and report that the  
> element was in fact needed.
> needRef :: Ref s a -> Oracle s Bool
> needRef (Ref slot) = Oracle $ \world ->
>   [ (False, world { store = M.delete slot $ store world })
>   , (True, world)
>   ]
>
> -- test case:
> refMaybe b dflt ref = if b then readRef ref else return dflt
> refIgnore ref = return "blablabla"
> refFst ref = fst <$> readRef ref
> test = do
>    a <- newRef "x"
>    b <- newRef 1
>    c <- newRef ('z', Just 0)
>    -- no performLocalGC required
>    x <- needRef a
>    y <- needRef b
>    z <- needRef c
>    u <- refMaybe y "t" a -- note that it wouldn't actually read "a",
>                          -- but it won't be known until runtime.
>    w <- refIgnore b
>    v <- refFst c
>    return (x, y, z)
>
> -- This will disagree with your desired answer, returning:
>
> *Oracle> runOracle test
> Loading package syb ... linking ... done.
> Loading package array-0.2.0.0 ... linking ... done.
> Loading package containers-0.2.0.1 ... linking ... done.
> (False,False,True)
>
> rather than (True, False, True), because the oracle is able to see  
> into the future (via backtracking) to see that refMaybe doesn't use  
> the reference after all.
>
> This probably won't suit your needs, but it was a fun little exercise.
>
> -Edward Kmett
>
> On Wed, Jan 6, 2010 at 4:05 PM, Miguel Mitrofanov <miguelimo38 at yandex.ru 
> > wrote:
>
> On 6 Jan 2010, at 23:21, Edward Kmett wrote:
>
> You probably just want to hold onto weak references for your  
> 'isStillNeeded' checks.
>
> That's what I do now. But I want to minimize the network traffic, so  
> I want referenced values to be garbage collected as soon as possible  
> - and I couldn't find anything except System.Mem.performIO to do the  
> job - which is a bit too global for me.
>
> Otherwise the isStillNeeded check itself will keep you from garbage  
> collecting!
>
> Not necessary. What I'm imagining is that there is essentially only  
> one way to access the value stored in the reference - with readRef.  
> So, if there isn't any chance that readRef would be called, the  
> value can be garbage collected; "isStillNeeded" function only needs  
> the reference, not the value.
>
> Well, yeah, that's kinda like weak references.
>
>
> http://cvs.haskell.org/Hugs/pages/libraries/base/System-Mem-Weak.html
>
> -Edward Kmett
>
> On Wed, Jan 6, 2010 at 9:39 AM, Miguel Mitrofanov <miguelimo38 at yandex.ru 
> > wrote:
> I'll take a look at them.
>
> I want something like this:
>
> refMaybe b dflt ref = if b then readRef ref else return dflt
> refIgnore ref = return "blablabla"
> refFst ref =
> do
>   (v, w) <- readRef ref
>   return v
> test =
> do
>   a <- newRef "x"
>   b <- newRef 1
>   c <- newRef ('z', Just 0)
>   performLocalGC -- if necessary
>   x <- isStillNeeded a
>   y <- isStillNeeded b
>   z <- isStillNeeded c
>   u <- refMaybe y "t" a -- note that it wouldn't actually read "a",
>                         -- but it won't be known until runtime.
>   w <- refIgnore b
>   v <- refFst c
>   return (x, y, z)
>
> so that "run test" returns (True, False, True).
>
>
> Dan Doel wrote:
> On Wednesday 06 January 2010 8:52:10 am Miguel Mitrofanov wrote:
> Is there any kind of "ST" monad that allows to know if some STRef is  
> no
> longer needed?
>
> The problem is, I want to send some data to an external storage over a
> network and get it back later, but I don't want to send unnecessary  
> data.
>
> I've managed to do something like that with weak pointers,
> System.Mem.performGC and unsafePerformIO, but it seems to me that  
> invoking
> GC every time is an overkill.
>
> Oh, and I'm ready to trade the purity of runST for that, if necessary.
>
> You may be able to use something like Oleg's Lightweight Monadic  
> Regions to get this effect. I suppose it depends somewhat on what  
> qualifies a reference as "no longer needed".
>
> http://www.cs.rutgers.edu/~ccshan/capability/region-io.pdf
>
> I'm not aware of anything out-of-the-box that does what you want,  
> though.
>
> -- Dan
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe



More information about the Haskell-Cafe mailing list