<p dir="ltr"><br>

On May 9, 2014 5:13 PM, "Edward Z. Yang" <<a href="mailto:ezyang@mit.edu">ezyang@mit.edu</a>> wrote:<br>

><br>

> Hello Brandon,<br>

><br>

> Excerpts from Brandon Simmons's message of 2014-05-08 16:18:48 -0700:<br>

> > I have an unusual application with some unusual performance problems<br>

> > and I'm trying to understand how I might use unsafeFreezeArray to help<br>

> > me, as well as understand in detail what's going on with boxed mutable<br>

> > arrays and GC. I'm using the interface from 'primitive' below.<br>

> ><br>

> > First some basic questions, then a bit more background<br>

> ><br>

> > 1) What happens when I do `newArray s x >>= \a-> unsafeFreezeArray a<br>

> > >> return a` and then use `a`? What problems could that cause?<br>

><br>

> Your code as written wouldn't compile, but assuming you're talking about<br>

> the primops newArray# and unsafeFreezeArray#, what this operation does<br>

> is allocate a new array of pointers (initially recorded as mutable), and<br>

> then freezes it in-place (by changing the info-table associated with<br>

> it), but while maintaining a pointer to the original mutable array.  Nothing bad<br>

> will happen immediately, but if you use this mutable reference to mutate<br>

> the pointer array, you can cause a crash (in particular, if the array<br>

> makes it to the old generation, it will not be on the mutable list and<br>

> so if you mutate it, you may be missing roots.)<br>

><br>

> > 2) And what if a do a `cloneMutableArray` on `a` and likewise use the<br>

> > resulting array?<br>

><br>

> If you do the clone before freezing, that's fine for all use-cases;<br>

> if you do the clone after, you will end up with the same result as (1).<br>

><br>

> > Background: I've been looking into an issue [1] in a library in which<br>

> > as more mutable arrays are allocated, GC dominates (I think I verified<br>

> > this?) and all code gets slower in proportion to the number of mutable<br>

> > arrays that are hanging around.<br>

> ><br>

> > I've been trying to understand how this is working internally. I don't<br>

> > quite understand how the "remembered set" works with respect to<br>

> > MutableArray. As best I understand: the remembered set in generation G<br>

> > points to certain objects in older generations, which objects hold<br>

> > references to objects in G. Then for MutableArrays specifically,<br>

> > card-marking is used to mark regions of the array with garbage in some<br>

> > way.<br>

> ><br>

> > So my hypothesis is the slowdown is associated with the size of the<br>

> > remembered set, and whatever the GC has to do on it. And in my tests,<br>

> > freezing the array seems to make that overhead (at least the overhead<br>

> > proportional to number of arrays) disappear.<br>

><br>

> You're basically correct.  In the current GC design, mutable arrays of<br>

> pointers are always placed on the mutable list.  The mutable list of<br>

> generations which are not being collected are always traversed; thus,<br>

> the number of pointer arrays corresponds to a linear overhead for minor GCs.<br>

><br>

> Here is a feature request tracking many of the infelicities that our<br>

> current GC design has:  <a href="https://ghc.haskell.org/trac/ghc/ticket/7662">https://ghc.haskell.org/trac/ghc/ticket/7662</a><br>

> The upshot is that the Haskell GC is very nicely tuned for mostly<br>

> immutable workloads, but there are some bad asymptotics when your<br>

> heap has lots of mutable objects.  This is generally a hard problem:<br>

> tuned GC implementations for mutable languages are a lot of work!<br>

> (Just ask the JVM implementors.)<br>

></p>

<p dir="ltr">Very helpful, thanks! And take some internet points.</p>

<p dir="ltr">> > Now I'm really lost in the woods though. My hope is that I might be<br>

> > able to safely use unsafeFreezeArray to help me here [3]. Here are the<br>

> > particulars of how I use MutableArray in my algorithm, which are<br>

> > somewhat unusual:<br>

> >   - keep around a small template `MutableArray Nothing`<br>

> >   - use cloneMutableArray for fast allocation of new arrays<br>

> >   - for each array only a *single* write (CAS actually) happens at each position<br>

> ><br>

> > In fact as far as I can reason, there ought to be no garbage to<br>

> > collect at all until the entire array becomes garbage (the initial<br>

> > value is surely shared, especially since I'm keeping this template<br>

> > array around to clone from, right?). In fact I was even playing with<br>

> > the idea of rolling a new CAS that skips the card-marking stuff.<br>

><br>

> I don't understand your full workload, but if you have a workload that<br>

> involves creating an array, mutating it over a short period of time,<br>

> and then never mutating it afterwards, you should simply freeze it after<br>

> you are done writing it.  Once frozen, the array will no longer be kept<br>

> on the mutable list and you won't pay for it when doing GC.  However,<br>

> the fact that you are doing a CAS makes it seem to me that your workflow<br>

> may be more complicated than that...</p>

<p dir="ltr">Yes I think for my use case the overhead required to determine when the array can be frozen would not be worth it. I think I have some other knobs I can twist here.</p>

<p dir="ltr">I'll keep an eye on that ticket and maybe chime in if I have any ideas.</p>

<p dir="ltr">Thanks,<br>

Brandon</p>

<p dir="ltr">><br>

> Cheers,<br>

> Edward<br>

</p>