[Haskell-cafe] Understanding GHC allocations

Roman Cheplyaka roma at ro-che.info
Thu Jun 17 12:35:27 EDT 2010


* Daniel Fischer <daniel.is.fischer at web.de> [2010-06-17 16:27:01+0200]
> On Thursday 17 June 2010 11:43:09, Roman Cheplyaka wrote:
> > * Roman Cheplyaka <roma at ro-che.info> [2010-06-17 12:40:59+0300]
> >
> > > I'm trying to optimize the following program:
> > > http://github.com/feuerbach/particles/blob/303c8a17c9b732e22457b5409bdce4b7520be94a/run.hs
> > >
> > > Of course general suggestions are welcome (BTW I'm going to give
> > > vector a try), but currently I'm concerned with two questions:
> > >
> > > 1. Heavy allocations in 'distance' function. Here is (part of) the
> > > profile:
> > >
> > > COST CENTRE   MODULE   %time  %alloc  ticks       bytes
> > >
> > > d2            Main       9.0    22.0    290   600000000
> > > d             Main       8.6    65.9    278  1800000000
> > > d1            Main       7.5    11.0    242   299700000
> > >
> 
> I suspect the distance function is not what you intended,
> distance 0.2 24.8 = 24.6, while the wrapping suggests that it should be 
> 0.4, so in d2, it should be d1 instead of d.

Good catch! :)
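
To make the intended behaviour concrete, here is a minimal sketch of a
wrapped one-dimensional distance. It is not the code from run.hs: the
names 'boxSide' and 'wrappedDistance' are hypothetical, and the box side
of 25 is an assumption taken from the "<= 25" remark and the example
above.

```haskell
-- Assumed side length of the periodic box (hypothetical name and value).
boxSide :: Double
boxSide = 25

-- Distance between two coordinates on a line that wraps around at boxSide.
-- The direct distance d and the wrap-around distance (boxSide - d) are
-- compared, and the shorter one is returned.
wrappedDistance :: Double -> Double -> Double
wrappedDistance x y = min d (boxSide - d)
  where
    d = abs (x - y)
```

With these assumptions, wrappedDistance 0.2 24.8 comes out at roughly 0.4
(up to floating-point rounding) rather than 24.6.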

> Either way, both d and d1 are <= 25, so the 'abs' in d2 is superfluous, 

Correct

> removing that alone reduces the allocations drastically and the running 
> time by ~40%

That's exactly what I'm asking about. 'abs' in C does not require any
allocations, does it? So why does it require any allocations in Haskell,
assuming we've got no laziness, typeclass indirection (I assume 'abs'
was specialized and inlined) or other high-level features in the
resulting low-level code?
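
For illustration only, and assuming a recent GHC with the MagicHash
extension: a Double is a boxed value, the constructor D# wrapped around a
raw Double#. The sketch below separates the arithmetic, which needs no
heap, from the re-boxing step, which does whenever the D# survives
optimisation. The names 'absUnboxed' and 'absBoxed' are hypothetical, and
whether this is the source of the allocation in the profile above is not
established.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Double (D#), Double#, negateDouble#, (<##))

-- Works purely on the raw machine double: no heap object is involved.
-- Simplified: negative zero is returned unchanged.
absUnboxed :: Double# -> Double#
absUnboxed x = case x <## 0.0## of
  0# -> x                 -- comparison false: x is not below zero
  _  -> negateDouble# x   -- comparison true: flip the sign

-- Unwraps the argument, does the work, and wraps the result again.
-- The D# on the right-hand side is a constructor application, which is
-- where a heap allocation can come from if GHC cannot eliminate it.
absBoxed :: Double -> Double
absBoxed (D# x) = D# (absUnboxed x)
```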

> Further, if you export only main from the module, you allow GHC to be more 
> aggressive with optimising. On my box, that leads to more allocation again 
> because there aren't enough registers, but things become a little faster.

Good idea indeed.
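
As a sketch of what that looks like, assuming the program lives in a
single Main module: an explicit export list keeps everything except
'main' private to the module. 'boxSide' is a hypothetical stand-in for a
constant such as 'l'.

```haskell
-- Only 'main' is exported; all other top-level bindings are private,
-- so GHC knows every use site of them is inside this module.
module Main (main) where

-- Hypothetical constant, not exported.
boxSide :: Double
boxSide = 25

main :: IO ()
main = print (boxSide / 2)
```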

> > > Perhaps this is related to creating some closures? How to get
> > > rid of those allocations?
> > >
> 
> Do you need to? Sometimes an allocating loop is faster than a non-
> allocating one (of course, if you have enough registers for the allocating 
> loop to run entirely in registers, it'll be much faster still).
> 
> IMO, the important criteria are time and resident memory, not allocation.

Maybe, but what bothers me is that I can't work out where those
allocations come from. What problem do they solve?

> > > 2. Again from reading the core I learned that although 'l' and other
> > > constants are inlined, their type is boxed Double. This makes sense
> > > since CAFs are evaluated on demand, but obviously in this particular
> > > case it does not make sense, so can I somehow make them unboxed?
> 
> Putting bangs in the loops where they are used will likely make GHC use 
> the unboxed values; so will not exporting them.

I'll play with this, thanks.
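
A minimal sketch of the bang-pattern idea, assuming the BangPatterns
extension. The names 'boxSide' and 'sumWrapped' are hypothetical and the
loop is not taken from run.hs. Whether GHC actually passes the values
unboxed still depends on the optimisation level, so the Core needs to be
checked.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Hypothetical constant standing in for 'l'.
boxSide :: Double
boxSide = 25

-- Sums, for each coordinate, the shorter of x and (boxSide - x).
-- The bangs force the accumulator and the constant on every iteration,
-- which tells GHC both are strict and makes them candidates for unboxing.
sumWrapped :: [Double] -> Double
sumWrapped = go 0 boxSide
  where
    go !acc !_ []       = acc
    go !acc !l (x : xs) = go (acc + min x (l - x)) l xs
```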

-- 
Roman I. Cheplyaka :: http://ro-che.info/
"Don't let school get in the way of your education." - Mark Twain

