Proposal: Non-allocating way to iterate over a Data.Map: traverseWithKey_

Ryan Newton rrnewton at gmail.com
Tue Jul 2 22:00:35 CEST 2013


Ok, here's a little test to confirm what happens when you try to use
foldrWithKey for this.

*--------------------------------------------------------*
*import Control.DeepSeq*
*import GHC.Stats*
*import qualified Data.Map.Strict as M*
*import Data.Time.Clock*
*import Control.Exception*
*import System.Mem*
*
*
*main :: IO ()*
*main = do*
*  t0 <- getCurrentTime*
*  let m0 = M.fromList (map (\i -> (i,i)) [1..1000000::Int])*
*  evaluate$ rnf m0 *
*  t1 <- getCurrentTime*
*  performGC*
*  s1 <- getGCStats  *
*  putStrLn$"Constructed map in "++show (diffUTCTime t1 t0)++"\n "++ show
s1++"\n"*
*  let fn 500000 v = putStrLn "Got it!"*
*      fn _      _ = return ()*
*  *
*  M.foldrWithKey (\k a -> (fn k a >>)) (return ()) m0*
*  t2 <- getCurrentTime*
*  performGC*
*  s2 <- getGCStats *
*  putStrLn$"Consumed map in "++show (diffUTCTime t2 t0)++"\n "++ show
s2++"\n"*
*  putStrLn$"Bytes allocated during consume:  "++show (bytesAllocated s2 -
bytesAllocated s1)*
*  return ()*
*--------------------------------------------------------  *

It's also at this Gist:
https://gist.github.com/rrnewton/5912513#file-maptest-hs

And here is the loop ("go10") generated "fn":
https://gist.github.com/rrnewton/5912513#file-maptest-ddump-simple-L214

Ok, empirically, in -O2, the consumption phase allocates 32MB additional
data (for a 1 million element map), and in -O0 it allocates 200MB.  Here's
the recursive case of the loop:

          ((\ (eta_B1 :: GHC.Prim.State# GHC.Prim.RealWorld) ->
              case x1_s2rr of _ {

                __DEFAULT ->
                  ((go10_r2uL z'_a10K r_a10S)
                   `cast` (<GHC.Types.NTCo:IO <()>>
                           :: GHC.Types.IO ()
                                ~#
                              (GHC.Prim.State# GHC.Prim.RealWorld
                               -> (# GHC.Prim.State#
GHC.Prim.RealWorld, () #))))
                    eta_B1;


Ok, so I didn't yet look at the STG where there's a clear allocation story.
 So, actually, I'm not sure if this recursive case forces GHC to build some
O(1M) first class representation of the IO action here, since eta is
abstract?  At least, I'm assuming that's what the 32Mb-200Mb allocated is
(though 32MB is actually rather skimpy for such a thing... it would have to
spend only 4 words per closure...)




On Tue, Jul 2, 2013 at 3:32 PM, Ryan Newton <rrnewton at gmail.com> wrote:

> Hi all,
>
> Thanks for the responses.  I want to go through and make sure I understand
> these.
>
> --------------------------------------------------------
> First, Henning, won't both of these allocate in proportion to the size of
> the map?
>
>     Map.foldrWithKey (\k a -> f k a >>) (return ())
>     Foldable.sequence_ . Map.mapWithKey f
>
> In particular, will the compiler be able to avoid allocating when building
> up that large monadic computation in the foldrWithKey?
>
> --------------------------------------------------------
> Edward said to use foldMapWithKey, which hasn't been released yet, but it
> sounds like it will be.
>
>
> https://github.com/ekmett/containers/commit/40187f32a43689ff02ca2b97465aa4fcd9f9d150
>
> Even then, I might argue it is somewhat non-obvious to the programmers how
> to use this to do a non-allocating "for-each".  For example, I am not
> totally clear on what will happen with that tree of mappend calls -- will
> it allocate thunks?  Further, IO is not a monoid, so am I to create an
> instance of "Monoid (IO ())" in order to use foldMapWithKey to iterate over
> the Map?
>
> --------------------------------------------------------
> On Tue, Jul 2, 2013 at 10:29 AM, Shachaf Ben-Kiki <shachaf at gmail.com>
> wrote:
>  *> Is there a reason you couldn't implement this just as well
> using traverseWithKey, à la
> >
> http://hackage.haskell.org/packages/archive/lens/3.9.0.2/doc/html/Control-Lens-Fold.html#v:traverseOf_
> *
>
> That function looks more overloaded than the traverse in Data.Map that I'm
> referring to, e.g. here:
>
> http://www.haskell.org/ghc/docs/latest/html/libraries/containers/Data-Map-Strict.html#g:13
>
> I'm afraid I don't understand the proposal then -- is it to use lens
> somehow?  For the traversal I need to do over a Data.Map.Map, I need to fix
> 't' to be IO or Par or whatever I'm working with, so that the (k -> a -> t
> b) function I'm passing in can do the effects I need.
>
> To be specific I'm proposing providing these variants:
>
>    traverseWithKey :: **Applicative t => (k -> a -> t b) -> Map k a -> t
> (Map k b)
>    traverseWithKey_ :: **Applicative t => (k -> a -> t ()) -> Map k a ->
> t ()
>
> And without exposing the latter natively, I still don't understand how to
> trick the former into not allocating, if that's the proposal.
>
>    -Ryan
>
>
> On Tue, Jul 2, 2013 at 2:54 PM, Edward Kmett <ekmett at gmail.com> wrote:
>
>> On Tue, Jul 2, 2013 at 2:45 PM, Henning Thielemann <
>> lemming at henning-thielemann.de> wrote:
>>
>>>   Foldable.sequence_ . Map.mapWithKey f
>>
>>
>>
>> which looks more elegant.
>>>
>>
>> This has the unfortunate consequence that it builds an entire new map
>> strictly before sequencing it, due to the fact that Data.Map is
>> spine-strict. =(
>>
>> -Edward
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/libraries/attachments/20130702/aa605d53/attachment-0001.htm>


More information about the Libraries mailing list