[Haskell-cafe] An interesting paper on VM-friendly GC

Sat Oct 16 07:37:24 EDT 2010

On 16 October 2010 10:35, Andrew Coppin <andrewcoppin at btinternet.com> wrote:
>  On 15/10/2010 11:50 PM, Gregory Crosswhite wrote:
>>
>>  On 10/15/2010 03:15 PM, Andrew Coppin wrote:
>>>
>>> On the other hand, their implementation uses a modified Linux kernel, and
>>> no sane person is going to recompile their OS kernel with a custom patch
>>> just to run Haskell applications, so we can't do quite as well as they did.
>>> But still, an interesting read...
>>>
>> Ah, but you are missing an important fact about the article:  it is not
>> about improving garbage collection for Haskell, it is about improving
>> collection for *Java*, which is a language in heavy use on servers.  If this
>> performance gain really is such a big win, then I bet that it would highly
>> motivate people to make this extension as part of the standard Linux kernel,
>> at which point we could use it in the Haskell garbage collector.
>
> Mmm, that's interesting. The paper talks about "Jikes", but I have no idea
> what that is. So it's a Java implementation then?

Jikes (Jikes RVM) is a Java virtual machine used for research; it
actually has a decent just-in-time compiler.  Its memory management
toolkit (MMTk) also makes it quite easy to experiment with new GC designs.

> Also, it's news to me that Java finds heavy use anywhere yet. (Then again,
> if they run Java server-side, how would you tell?)

Oh, it's *very* heavily used.  Many commercial products run on Java
both server and client.

> It seems to me that most operating systems are designed with the assumption
> that all the code being executed will be C or C++ with manual memory
> management. Ergo, however much memory the process has requested, it actually
> *needs* all of it. With GC, this assumption is violated. If you ask the GC
> nicely, it may well be able to release some memory back to you. It's just
> that the OS isn't designed to do this, so the GC has no idea whether it's
> starving the system of memory, or whether there's plenty spare.
>
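To make the point about missing feedback concrete, here is a toy model of how a collector might size its heap after a major GC. It is a minimal sketch under stated assumptions: the names `Bytes`, `blindTarget` and `awareTarget`, and the idea of the OS reporting how much memory a process may use, are hypothetical and not any real RTS or kernel API. The growth factor is only loosely in the spirit of GHC's `+RTS -F` option.

```haskell
-- Toy model of heap sizing after a major GC (hypothetical, not GHC's RTS).

-- | Sizes in bytes.
type Bytes = Integer

-- | Without feedback from the OS, the collector can only size the heap
-- relative to its own live data: grow to `factor` times the live bytes.
blindTarget :: Double -> Bytes -> Bytes
blindTarget factor live = ceiling (factor * fromIntegral live)

-- | If the OS could report how many bytes this process may use without
-- causing paging (a hypothetical interface), the collector could cap its
-- growth there. It never goes below the live data, which it really needs.
awareTarget :: Double -> Bytes -> Bytes -> Bytes
awareTarget factor live available =
  max live (min (blindTarget factor live) available)
```

With a factor of 2 and 100 bytes live, `blindTarget` always asks for 200 bytes, whereas `awareTarget 2 100 150` settles for 150 because that is all the (hypothetical) OS says is available.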
> I know the GC engine in the GHC RTS just *never* releases memory back to the
> OS. (I imagine that's a common choice.) It means that if the amount of truly
> live data fluctuates up and down, you don't spend forever allocating and
> freeing memory from the OS. I think we could probably do better here.
> (There's an [ancient] feature request ticket for it somewhere on the
> Trac...) At a minimum, I'm not even sure how much notice the current GC
> takes of memory page boundaries and cache effects...

Actually that's been fixed in GHC 7.
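A runtime does not have to choose between never returning memory and returning it eagerly. The sketch below assumes a made-up policy, not GHC's: after each major GC, keep `growth * live` bytes as headroom and release the excess only when it is at least some threshold, so small fluctuations in live data do not turn into repeated allocate/free calls to the OS. `releaseAmount` and `stepHeap` are hypothetical names.

```haskell
-- Toy hysteresis policy for returning memory to the OS (hypothetical).

-- | Sizes in bytes.
type Bytes = Integer

-- | Bytes to hand back to the OS after a major GC. Keep `growth * live`
-- bytes as headroom; release the excess only if it is at least
-- `minRelease`, otherwise keep everything.
releaseAmount :: Double -> Bytes -> Bytes -> Bytes -> Bytes
releaseAmount growth minRelease live inUse
  | excess >= minRelease = excess
  | otherwise            = 0
  where
    keep   = ceiling (growth * fromIntegral live)
    excess = max 0 (inUse - keep)

-- | Memory held after one major GC: first grow if `growth * live` does
-- not fit in what is already held, then give back what the policy allows.
stepHeap :: Double -> Bytes -> Bytes -> Bytes -> Bytes
stepHeap growth minRelease inUse live =
  let needed = ceiling (growth * fromIntegral live)
      grown  = max inUse needed
  in grown - releaseAmount growth minRelease live grown
```

For example, `scanl (stepHeap 2 64) 0 [100, 300, 290, 50, 60]` gives `[0,200,600,600,100,120]`: the small dip from 300 to 290 live bytes keeps all 600 bytes, while the drop to 50 hands 500 bytes back.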

> GC languages are not exactly rare, so maybe we'll see some OSes start adding
> new system calls to allow the OS to ask the application whether there's any
> memory it can cheaply hand back. We'll see...

I wouldn't be surprised if some OS kernels already have some
undocumented features to aid VM-friendly GC.  I think it's probably
going to have to be the other way around, though.  Rather than the OS
asking for its memory back, the application should ask for the page
access bits and then decide for itself (as is done in the paper).  I
don't know how that would interact with the VM paging strategy.
Microkernels such as L4 already support these things (e.g., L4's
UNMAP system call).  Xen and similar hypervisors probably have
something similar.
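The "application reads the access bits and decides" approach might look roughly like this inside a collector. This is a sketch that assumes a hypothetical kernel interface reporting a referenced bit per heap page. `Page`, `pagesToRelease` and the cold-first ordering are illustrative choices, not the paper's algorithm.

```haskell
import Data.List (sortOn)

-- | A heap page as the collector sees it. `referenced` stands for the
-- access bit the application would read from the kernel (hypothetical
-- interface); `liveBytes` comes from the last GC's marking.
data Page = Page
  { pageId     :: Int
  , referenced :: Bool
  , liveBytes  :: Int
  } deriving (Show, Eq)

-- | Pick up to n pages to hand back to the OS. Only pages with no live
-- data are candidates (releasing others would require moving objects).
-- Among those, unreferenced (cold) pages come first; sortOn is stable,
-- so pages with the same bit keep their original order.
pagesToRelease :: Int -> [Page] -> [Page]
pagesToRelease n =
  take n . sortOn referenced . filter ((== 0) . liveBytes)
```

The cold-first ordering reflects a guess that an empty page the program has not touched recently is the cheapest to give up, while a recently touched empty page is likely to be reused by the allocator soon.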

-- 
Push the envelope. Watch it bend.