<div dir="ltr">The main takeaway I had from my work with prefetching was that if you can shove things into a fixed-sized queue and prefetch on the way into the queue instead of doing it just to sort of kickstart the next element during a tree traversal that is going to be demanded too fast to take full advantage of the latency, then you can smooth out a lot of the cross system variance.<div><br></div><div>It is just incredibly invasive. =(<br></div><div><div><br></div><div>Re: doing prefetching in the mark phase, I just skimmed and found <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.93.9090&rep=rep1&type=pdf">http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.93.9090&rep=rep1&type=pdf</a> takes which appears to take a similar approach.</div><div><br></div><div>-Edward</div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Nov 28, 2014 at 3:42 AM, Simon Marlow <span dir="ltr"><<a href="mailto:marlowsd@gmail.com" target="_blank">marlowsd@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Thanks for this.  In the copying GC I was using prefetching during the scan phase, where you do have a pretty good tunable knob for how far ahead you want to prefetch.  The only variable is the size of the objects being copied, but most tend to be in the 2-4 words range.  I did manage to get 10-15% speedups with optimal tuning, but it was a slowdown on a different machine or with wrong tuning, which is why GHC doesn't have any of this right now.<br>

<br>

Glad to hear this can actually be used to get real speedups in Haskell, I will be less sceptical from now on :)<br>

<br>

Cheers,<br>

Simon<span class=""><br>

<br>

On 27/11/2014 10:20, Edward Kmett wrote:<br>

</span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">

My general experience with prefetching is that it is almost never a win<br>

when done just on trees, as in the usual mark-sweep or copy-collection<br>

garbage collector walk. Why? Because the time from the time you prefetch<br>

to the time you use the data is too variable. Stack disciplines and<br>

prefetch don't mix nicely.<br>

<br>

If you want to see a win out of it you have to free up some of the<br>

ordering of your walk, and tweak your whole application to support it.<br>

e.g. if you want to use prefetching in garbage collection, the way to do<br>

it is to switch from a strict stack discipline to using a small<br>

fixed-sized queue on the output of the stack, then feed prefetch on the<br>

way into the queue rather than as you walk the stack. That paid out for<br>

me as a 10-15% speedup last time I used it after factoring in the<br>

overhead of the extra queue. Not too bad for a weekend project. =)<br>

<br>

Without that sort of known lead-in time, it works out that prefetching<br>

is usually a net loss or vanishes into the noise.<br>

<br>

As for the array ops, davean has a couple of cases w/ those for which<br>

the prefetching operations are a 20-25% speedup, which is what motivated<br>

Carter to start playing around with these again. I don't know off hand<br>

how easily those can be turned into public test cases though.<br>

<br>

-Edward<br>

<br>

On Thu, Nov 27, 2014 at 4:36 AM, Simon Marlow <<a href="mailto:marlowsd@gmail.com" target="_blank">marlowsd@gmail.com</a><br></span><span class="">

<mailto:<a href="mailto:marlowsd@gmail.com" target="_blank">marlowsd@gmail.com</a>>> wrote:<br>

<br>

    I haven't been watching this, but I have one question: does<br>

    prefetching actually *work*?  Do you have benchmarks (or better<br>

    still, actual library/application code) that show some improvement?<br>

    I admit to being slightly sceptical - when I've tried using<br>

    prefetching in the GC it has always been a struggle to get something<br>

    that shows an improvement, and even when I get things tuned on one<br>

    machine it typically makes things slower on a different processor.<br>

    And that's in the GC, doing it at the Haskell level should be even<br>

    harder.<br>

<br>

    Cheers,<br>

    Simon<br>

<br>

<br>

    On 22/11/2014 05:43, Carter Schonwald wrote:<br>

<br>

        Hey Everyone,<br>

        in<br></span>

        <a href="https://ghc.haskell.org/trac/__ghc/ticket/9353" target="_blank">https://ghc.haskell.org/trac/_<u></u>_ghc/ticket/9353</a><br>

        <<a href="https://ghc.haskell.org/trac/ghc/ticket/9353" target="_blank">https://ghc.haskell.org/trac/<u></u>ghc/ticket/9353</a>><br>

        and<br>

        <a href="https://phabricator.haskell." target="_blank">https://phabricator.haskell.</a>__<u></u>org/D350<span class=""><br>

        <<a href="https://phabricator.haskell.org/D350" target="_blank">https://phabricator.haskell.<u></u>org/D350</a>><br>

<br>

        is some preliminary work to fix up how the pure versions of the<br>

        prefetch<br>

        primops work is laid out and prototyped.<br>

<br>

        However, while it nominally fixes up some of the problems with<br>

        how the<br>

        current pure prefetch apis are fundamentally borken,  the simple<br>

        design<br>

        in D350 isn't quite ideal, and i sketch out some other ideas in the<br>

        associated ticket #9353<br>

<br>

        I'd like to make sure  pure prefetch in 7.10 is slightly less broken<br>

        than in 7.8, but either way, its pretty clear that working out<br>

        the right<br>

        fixed up design wont happen till 7.12. Ie, whatever makes 7.10,<br>

        there<br>

        WILL have to be breaking changes to fix those primops for 7.12<br>

<br>

        thanks and any feedback / thoughts appreciated<br>

        -Carter<br>

<br>

<br></span>

        ______________________________<u></u>___________________<br>

        ghc-devs mailing list<br>

        <a href="mailto:ghc-devs@haskell.org" target="_blank">ghc-devs@haskell.org</a> <mailto:<a href="mailto:ghc-devs@haskell.org" target="_blank">ghc-devs@haskell.org</a>><br>

        <a href="http://www.haskell.org/__mailman/listinfo/ghc-devs" target="_blank">http://www.haskell.org/__<u></u>mailman/listinfo/ghc-devs</a><br>

        <<a href="http://www.haskell.org/mailman/listinfo/ghc-devs" target="_blank">http://www.haskell.org/<u></u>mailman/listinfo/ghc-devs</a>><br>

<br>

    ______________________________<u></u>___________________<br>

    ghc-devs mailing list<br>

    <a href="mailto:ghc-devs@haskell.org" target="_blank">ghc-devs@haskell.org</a> <mailto:<a href="mailto:ghc-devs@haskell.org" target="_blank">ghc-devs@haskell.org</a>><br>

    <a href="http://www.haskell.org/__mailman/listinfo/ghc-devs" target="_blank">http://www.haskell.org/__<u></u>mailman/listinfo/ghc-devs</a><br>

    <<a href="http://www.haskell.org/mailman/listinfo/ghc-devs" target="_blank">http://www.haskell.org/<u></u>mailman/listinfo/ghc-devs</a>><br>

<br>

<br>

</blockquote>

</blockquote></div><br></div>