Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Milan Straka fox at ucw.cz
Sun Aug 4 09:49:43 CEST 2013


Hi Ryan and Simon,

> From: Libraries [mailto:libraries-bounces at haskell.org] On Behalf Of Ryan Newton
> Sent: 03 August 2013 08:45
> To: Roman Cheplyaka
> Cc: Haskell Libraries
> Subject: Re: Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_
>
> The Data.Map.Base.foldRWithKey function discussed in this thread is
> another example.  That's a place where even after it inlines the
> provided function into the fold, we end up with the below STG with an
> allocation of an IO () function inside the inner loop:

I have been thinking about it and I think GHC optimizer is not to blame
in this case. The two methods we are interested in are:

  traverseWithKey_ :: Applicative t => (k -> a -> t b) -> Map k a -> t ()
  traverseWithKey_ f = go
    where go Tip = pure ()
          go (Bin _ k v l r) = f k v *> go l *> go r

  foldrWithKey :: (k -> a -> b -> b) -> b -> Map k a -> b
  foldrWithKey f z = go z
    where go z' Tip = z'
          go z' (Bin _ kx x l r) = go (f kx x (go z' r)) l

and we call them like this:
  let actionTraverse _k 1000000 = putStrLn "Hit 1000000"
      actionTraverse _k _v = return ()
  in traverseWithKey_ actionTraverse map

and this:
  let actionFoldr _k 1000000 z = putStrLn "Hit 1000000" *> z
      actionFoldr _k _v z = z
  in foldrWithKey actionFoldr (pure ()) map


So although both traverseWithKey_ and foldrWithKey build an action, only
traverseWithKey_ can execute the action 'on-the-fly'. When foldrWithKey
is calling itself recursively, it has to allocate thunk with the IO action
which is to be performed after the subtree is processed. If a strict
fold is used, the memory allocated is lower (it allocated only the
resulting IO action itself), but still linear in the size of the map.

If the GHC optimizer were to avoid allocating, it would have to rewrite
the 'foldrWithKey actionFoldr' by pushing the *> up to the definition of
the recursive case of foldrWithKey, which seems extremely complicated to
perform automatically (it would have to add several 'pure ()' to the code).
But maybe I am not seeing some other way around it.



Nevertheless, I am not sure how to proceed. The problem is that
Data.Foldable offers many non-overloadable methods that are provided
through the Foldable instance, notably
  traverse_
  foldrM
  foldlM
which could be implemented so that they iterate over all elements
without allocating, but the Data.Foldable implementation allocates and
cannot be overloaded. Therefore, if we add traverseWithKey_, we still
will not have means of iterating an action over Set and IntSet elements,
and although Map.traverseWithKey_ will not allocate, using traverse_ on
Map will.

Also I am not happy that traverseWithKey_ does not specify order in
which it processed the elements. That is not such a big issue for
traverseWithKey, because the map is reassembled afterwards, but
traverseWithKey_ is only performing the action (without constructing the
map) and it does so in unspecified order.

What I am thinking at the moment about is foldlM and foldrM -- I checked
that they can be implemented not to allocate and they specify evaluation
order. But they require a Monad, not only Applicative, and would clash
with Data.Foldable.fold[lr]M. Maybe we will end up adding both
traverseWithKey_ and fold[lr]M? But than again, API growth, API growth :)

Cheers,
Milan




More information about the Libraries mailing list