[Haskell-cafe] Understanding GC time

Yves Parès yves.pares at gmail.com
Sat Mar 10 21:30:28 CET 2012


(Sorry for the double post)
A good practice would be to always fuse your list traversals [*] (using a
single fold to compute max and min at the same time), and to make sure you
never keep a plain reference to the list; instead, embed the list in a
function that rebuilds it, so that it is re-evaluated each time you need to
access it.
But I can't think of a way to check that property (the list never being
traversed twice) statically in Haskell.
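
To make the single-fold idea concrete, here is a minimal sketch (the
`minMax` name and the strictness choices are mine, not from the thread): one
strict left fold computes both bounds in a single pass, so the spine of the
list can be consumed as it is produced and never needs to be retained in
full.

```haskell
{-# LANGUAGE BangPatterns #-}
-- One-pass min & max: a single strict fold keeps only the running pair
-- alive, so each list cell can be garbage-collected as soon as it is
-- consumed, instead of the whole list being held for a second traversal.
import Data.List (foldl')

minMax :: Ord a => [a] -> (a, a)
minMax []     = error "minMax: empty list"
minMax (x:xs) = foldl' step (x, x) xs
  where
    step (!lo, !hi) y = (min lo y, max hi y)
```

With -O2, a fold written this way should run in constant residency on the
19,999,999-element list from the program below.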

[*] But in that specific case, wouldn't GHC with -O2 activated do something
about that anyway?
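
And a sketch of the "list behind a function" trick mentioned above (the
unit-argument encoding and the smaller bound are mine, for illustration):
forcing callers to rebuild the list on each use trades recomputation for
memory. Note that GHC's full-laziness transformation can still float the
list out and reintroduce sharing, so this is a heuristic, not a guarantee.

```haskell
-- Rebuild-on-demand: hiding the list behind a dummy argument means each
-- call constructs a fresh list, so no single heap copy of the list is
-- retained between the two traversals. (GHC's full-laziness pass may
-- defeat this by floating the list out to the top level.)
list :: () -> [Int]
list () = [1 .. 1000000]  -- smaller bound than the thread's, for illustration

result :: Int
result = minimum (list ()) + maximum (list ())
```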

2012/3/10 Yves Parès <yves.pares at gmail.com>

> I'm afraid that without uniqueness or linear typing it would be hard to avoid.
>
>
> 2012/3/10 Thiago Negri <evohunz at gmail.com>
>
>> I see. Thanks for the answers.
>>
>> Any data structure or source annotation that would prevent that?
>>
>> For example, if I run the same program on a [1..9999999999999999]
>> list, I get an out-of-memory error for the single-threaded version. Is
>> there any way to prevent that without declaring two different versions
>> of "list"?
>>
>>
>> 2012/3/10 Anthony Cowley <acowley at gmail.com>:
>> > From that profiling data, I think you're just seeing a decrease in
>> sharing. With one thread, you create the list structure in memory: the
>> first fold could consume it in-place, but the second fold is still waiting
>> for its turn.  The list is built on the heap so the two folds can both
>> refer to the same list.
>> >
>> > With two threads, GHC is being clever and inlining the definition you
>> give for list, which is then optimized into two parallel loops. No list on
>> the heap means there's not much for the GC to do.
>> >
>> > Sharing of index lists like this is a common source of problems. In
>> particular, nested loops can make it even trickier to prevent sharing as
>> there may not be an opportunity for parallel evaluation.
>> >
>> > Anthony
>> >
>> > On Mar 10, 2012, at 10:21 AM, Thiago Negri <evohunz at gmail.com> wrote:
>> >
>> >> Hi all.
>> >>
>> >> I wrote a very simple program to try out parallel Haskell and see what
>> >> it would look like to make use of more than one core in this language.
>> >>
>> >> When I tried the program with RTS option -N1, total time shows it took
>> >> 2.48 seconds to complete and around 65% of that time was taken by GC.
>> >>
>> >> Then I tried the same program with RTS options -N2 and total time
>> >> decreased to 1.15 seconds as I expected a gain here. But what I didn't
>> >> expect is the GC time to drop to 0%.
>> >>
>> >> I guess I'm having trouble understanding the output of the RTS option -s.
>> >> Can you enlighten me?
>> >>
>> >>
>> >> The source for the testing program:
>> >>
>> >>> module Main where
>> >>>
>> >>> import Data.List (foldl1')
>> >>> import Control.Parallel (par, pseq)
>> >>> import Control.Arrow ((&&&))
>> >>>
>> >>> f `parApp` (a, b) = a `par` (b `pseq` (f a b))
>> >>> seqApp = uncurry
>> >>>
>> >>> main = print result
>> >>>  where result = (+) `parApp` minMax list
>> >>>        minMax = minlist &&& maxlist
>> >>>        minlist = foldl1' min
>> >>>        maxlist = foldl1' max
>> >>>        list = [1..19999999]
>> >>
>> >>
>> >> The results on a Windows 7 64bits with an Intel Core 2 Duo, compiled
>> >> with GHC from Haskell Platform:
>> >>
>> >> c:\tmp\hs>par +RTS -s -N1
>> >> par +RTS -s -N1
>> >> 20000000
>> >>     803,186,152 bytes allocated in the heap
>> >>     859,916,960 bytes copied during GC
>> >>     233,465,740 bytes maximum residency (10 sample(s))
>> >>      30,065,860 bytes maximum slop
>> >>             483 MB total memory in use (0 MB lost due to fragmentation)
>> >>
>> >>  Generation 0:  1523 collections,     0 parallel,  0.80s,  0.75s elapsed
>> >>  Generation 1:    10 collections,     0 parallel,  0.83s,  0.99s elapsed
>> >>
>> >>  Parallel GC work balance: nan (0 / 0, ideal 1)
>> >>
>> >>                        MUT time (elapsed)       GC time  (elapsed)
>> >>  Task  0 (worker) :    0.00s    (  0.90s)       0.00s    (  0.06s)
>> >>  Task  1 (worker) :    0.00s    (  0.90s)       0.00s    (  0.00s)
>> >>  Task  2 (bound)  :    0.86s    (  0.90s)       1.62s    (  1.69s)
>> >>
>> >>  SPARKS: 1 (0 converted, 0 pruned)
>> >>
>> >>  INIT  time    0.00s  (  0.00s elapsed)
>> >>  MUT   time    0.86s  (  0.90s elapsed)
>> >>  GC    time    1.62s  (  1.74s elapsed)
>> >>  EXIT  time    0.00s  (  0.00s elapsed)
>> >>  Total time    2.48s  (  2.65s elapsed)
>> >>
>> >>  %GC time      65.4%  (65.9% elapsed)
>> >>
>> >>  Alloc rate    936,110,032 bytes per MUT second
>> >>
>> >>  Productivity  34.6% of total user, 32.4% of total elapsed
>> >>
>> >> gc_alloc_block_sync: 0
>> >> whitehole_spin: 0
>> >> gen[0].sync_large_objects: 0
>> >> gen[1].sync_large_objects: 0
>> >>
>> >>
>> >> c:\tmp\hs>par +RTS -s -N2
>> >> par +RTS -s -N2
>> >> 20000000
>> >>   1,606,279,644 bytes allocated in the heap
>> >>          74,924 bytes copied during GC
>> >>          28,340 bytes maximum residency (1 sample(s))
>> >>          29,004 bytes maximum slop
>> >>               2 MB total memory in use (0 MB lost due to fragmentation)
>> >>
>> >>  Generation 0:  1566 collections,  1565 parallel,  0.00s,  0.01s elapsed
>> >>  Generation 1:     1 collections,     1 parallel,  0.00s,  0.00s elapsed
>> >>
>> >>  Parallel GC work balance: 1.78 (15495 / 8703, ideal 2)
>> >>
>> >>                        MUT time (elapsed)       GC time  (elapsed)
>> >>  Task  0 (worker) :    0.00s    (  0.59s)       0.00s    (  0.00s)
>> >>  Task  1 (worker) :    0.58s    (  0.59s)       0.00s    (  0.01s)
>> >>  Task  2 (bound)  :    0.58s    (  0.59s)       0.00s    (  0.00s)
>> >>  Task  3 (worker) :    0.00s    (  0.59s)       0.00s    (  0.00s)
>> >>
>> >>  SPARKS: 1 (1 converted, 0 pruned)
>> >>
>> >>  INIT  time    0.00s  (  0.00s elapsed)
>> >>  MUT   time    1.15s  (  0.59s elapsed)
>> >>  GC    time    0.00s  (  0.01s elapsed)
>> >>  EXIT  time    0.00s  (  0.00s elapsed)
>> >>  Total time    1.15s  (  0.61s elapsed)
>> >>
>> >>  %GC time       0.0%  (2.4% elapsed)
>> >>
>> >>  Alloc rate    1,391,432,695 bytes per MUT second
>> >>
>> >>  Productivity 100.0% of total user, 190.3% of total elapsed
>> >>
>> >> gc_alloc_block_sync: 90
>> >> whitehole_spin: 0
>> >> gen[0].sync_large_objects: 0
>> >> gen[1].sync_large_objects: 0
>> >>
>> >> _______________________________________________
>> >> Haskell-Cafe mailing list
>> >> Haskell-Cafe at haskell.org
>> >> http://www.haskell.org/mailman/listinfo/haskell-cafe
>>
>>
>
>

