Transformations of cyclic graphs [Was: Efficiency of list indices in definitions]

oleg@pobox.com oleg@pobox.com
Thu, 8 Aug 2002 16:43:48 -0700 (PDT)


[Moved to Haskell-Cafe]

Hello!

Cycles sure make it difficult to transform graphs in a pure non-strict
language. Cycles in a source graph require us to devise a way to mark
traversed nodes -- however we cannot mutate nodes and cannot even
compare nodes with a generic ('derived') equality operator. Cycles in
a destination graph require us to keep track of the already built
nodes so we can complete a cycle. An obvious solution is to use a
state monad and IORefs. There is also a monad-less solution, which is
less obvious: we can't add a node to the dictionary of constructed nodes
until we have built the node, then means we can't use the updated
dictionary when building descendants -- which need the dictionary when
linking back. The problem can be overcome however with a credit card
transform (a.k.a. "buy now, pay later" transform). To avoid hitting the
bottom, we just have to "pay" by the "due date".

The will use as an example the NFA->DFA transformation considered
earlier. Perhaps we should first attempt a few obvious optimizations
of a NFA/DFA recognizer.  It seems that performance matters.

finAuAcceptStringQ startStates string =
    or [faStateAcceptStringQ s string | s <- startStates]
    where faStateAcceptStringQ (FaState _ acceptQ _) [] = acceptQ
          faStateAcceptStringQ (FaState _ _ trans) (a:as)
	           | null followerStates = False
		   | otherwise =
		      or [ faStateAcceptStringQ s as | s <- followerStates ]
	    where followerStates = 
                    if a > (length trans) - 1 then [] else trans!!a

Computing (length trans) when determining the followerStates is
inefficient: even if 'a' is zero, we still have to traverse the whole
list 'trans'. If our alphabet is binary, perhaps its better to
specialize the FaState correspondingly and write:

-- Automata over a binary alphabet

data (Ord l,Show l) => FaState l = 
	FaState {label :: l, acceptQ :: Bool, 
	         trans0:: [FaState l], 
	         trans1:: [FaState l]}

-- Then a finite automaton is merely a list of starting FaStates:
type FinAu l = [FaState l]

--So, for example, an automaton equivalent to the regular expression
--0*(0(0+1)*)* could be defined as:

dom18 = [one]
    where one = FaState 1 True [one,two] []
	  two = FaState 2 True [two,one] [one,two]

The acceptance function can be written as

finAuAcceptStringQ start_states str =
        foldr (\l seed -> acceptP l str || seed) False start_states
  where acceptP (FaState _ acceptQ _ _) [] = acceptQ
	acceptP (FaState _ _ t0 t1) (s:rest) = 
	        finAuAcceptStringQ (if s then t1 else t0) rest

We have simplified the original expression, and manually deforested it.

test1= finAuAcceptStringQ dom18 $ map (>0) [0,1,0,1]
test2= finAuAcceptStringQ dom18 $ map (>0) [1,1,0,1]
test3= finAuAcceptStringQ dom18 [True]
test4= finAuAcceptStringQ dom18 [False]

It would be great to be able to compare and print the nodes:

instance (Ord l,Show l) => Eq (FaState l) where
    (FaState l1 _ _ _) == (FaState l2 _ _ _) = l1 == l2
    

but printing a node is already a slight problem. We need to keep
track of the already printed nodes to avoid looping.

-- a data class for an occurrence check
class OCC occ where
    empty:: occ a
    seenp:: (Eq a) => a -> occ a -> Bool
    put::  a -> occ a -> occ a
    
-- Currently, it's just a list.
-- In the future, we can pull in something fancier from the Edison
instance OCC [] where
    empty = []
    seenp = elem
    put = (:)
    

-- Depth-first, pre-order traversal of the graph, keeping track of
-- already printed nodes
instance (Ord l,Show l) => Show (FaState l) where
    show state = "{@" ++ showstates [state] (empty::[FaState l]) "@}"
         where
           -- showstates worklist seen_states suffix
           showstates [] states_seen suffix = suffix
	   showstates (st:rest) states_seen suffix
	       | st `seenp` states_seen = showstates rest states_seen suffix
	   showstates (st@(FaState l accept t0 t1):rest) states_seen suffix =
	       showstate st 
	             $ showstates (t0++t1++rest) (st `put` states_seen) suffix
	   
           showstate (FaState l accept t0 t1) suffix 
	             = "{State " ++ (show l) ++ 
                     " " ++ (show accept) ++ " " ++ (show $ map label t0) ++
	             " " ++ (show $ map label t1) ++ "}" ++ suffix


Now, "print dom18" prints as
[{@{State 1 True [1,2] []}{State 2 True [2,1] [1,2]}@}]


For the NFA->DFA conversion, we need to keep track of already built
states.

-- A dictionary of states
class StateDict sd where
    emptyd :: sd (l,FaState l)
    locate :: (Eq l) => l -> sd (l,FaState l) -> Maybe (FaState l)
    putd   :: (l,FaState l) -> sd (l,FaState l) -> sd (l,FaState l)

-- if performance matters, we can use a fancier dictionary from the Edison   
instance StateDict [] where
    emptyd = []
    locate = lookup
    putd   = (:)

-- [nfa_state] -> dictionary_of_seen_states -> ([dfa_state],updated_dictionary)
-- [dfa_state] is a singleton list
determinize_cc states converted_states =
	-- first, check the cache
      case  dfa_label `locate` converted_states of
	Nothing -> build_state
	Just dfa_state -> ([dfa_state],converted_states)
  where
      -- [NFA_labels] -> DFA_labels
      det_labels = sort . nub . (map label)
      dfa_label  = det_labels states

      -- find out NFA-followers for [nfa_state] upon ingestion of 0 and 1
      (t0_followers,t1_followers) = 
		foldr (\st (f0,f1) -> (trans0 st ++ f0, trans1 st ++ f1))
		      ([],[]) states
      acceptQ'    = or (map acceptQ states)
      
      -- really build the dfa state and return ([dfa_state],updated_cache)
      build_state = let
	   -- node, dfa_state is computed _below_
	   converted_states1 = (dfa_label,dfa_state) `putd` converted_states
           (t0', converted_states2) = 
	           (determinize_cc t0_followers converted_states1)
	   (t1', converted_states3) = 
	           (determinize_cc t1_followers converted_states2)
           dfa_state =
	        (FaState dfa_label acceptQ' t0' t1')
	   in ([dfa_state],converted_states3)
	
finAuDeterminize states = fst $ determinize_cc states []

At the heart of the credit card transform is the phrase
   converted_states1 = (dfa_label,dfa_state) `putd` converted_states

we added to the dictionary of the computed states the state that we
haven't built yet. Because (,) is non-strict in its arguments and
`locate` is non-strict in its result, we can get away with a mere
promise to "pay". 

We can print the DFA for dom18 to see what we've got:

finAuDeterminize dom18
-- shows
[{@{State [1]   True  [[1,2]] [[]]   }
   {State [1,2] True  [[1,2]] [[1,2]]}
   {State []    False [[]]    [[]]   }@}] 

which is a DFA (which happens to be minimal) recognizing (0+1)* - 1(0+1)*

An example from the original message:

dom19 = [one,two]
    where one = FaState 1 True [two] []
	  two = FaState 2 True [one] [one]

finAuDeterminize dom19
-- shows
[{@{State [1,2] True  [[1,2]] [[1]] }
   {State [1]   True  [[2]]   [[]]  }
   {State [2]   True  [[1]]   [[1]] }
   {State []    False [[]]    [[]]  }@}] 

which recognizes (0+1)* - (0+1)*11(0+1)*