[Haskell-cafe] Help optimising a Haskell program

Wed Mar 23 12:44:04 CET 2011

On 22 March 2011 15:49, Jason Dagit <dagitj at gmail.com> wrote:
>> This seems to consistently give about a 0.4s improvement, which isn't
>> nothing but isn't a particularly interesting chunck of 8s (actually
>> it's 8.4s -> 8s). Setting it to 256M doesn't make any difference.
>
> You should use criterion to make sure that your assessments of the
> performance difference are grounded in statistically robust reasoning :)
>  Progression may also help.

Thanks. These look useful.

> GHC 7 has a new rts option, +RTS -H -RTS that gives you a good high default
> for these memory numbers.  You might give it ago, but I doubt it's going to
> help much here.

It didn't seem to, no.

> One thing I noticed was that you calculate the length of elts more than once
> inside aggregate.  It's probably not going to account for a big boost, but
> I'd recommend doing it only once.  Something like:
>   let lenelts = length elts
> Then use lenelts everywhere.

This doesn't seem to make much difference - elts is only a thousand or
so items.

> Also, it seems like you use Data.Set to sort and uniquify the input then you
> throw away the Set, that's potentially costly.

This is actually the fastest way I've found to do it! It replaced an
earlier implementation that looked like it should have been more
optimised but wasn't. If you have any better suggestions, I'm
definitely open to hearing them.

> There are probably better
> types for aggregate than [[a]] -> [a].

What did you have in mind? That's largely the format the data comes in
as, so other than changing from lists to some other ordered container
I'm not sure what to change.

> You use Arrays, but it's possible
> that a different vector library such as Data.Vector gives better performance
> here.

For the case I'm using arrays I'm specifically using unboxed double
arrays with integer indices, mainly because I do a very large number
of lookups.

> Or, perhaps you want that instead of lists everywhere.

Possibly. I looked into it briefly and it's not that easy to do. In
particular there are important bits in the code which depend on being
able to do pull the list apart head/tail-wise. It might be possible
for some of this to use vectors internally, or I might be able to
replace those with something else. I'm not entirely sure.

> For example,
> you build an index and a reverse index.  It seems like with an array or
> vector you could eliminate at least one index and have O(1) lookup time
> instead of Data.Map's O(log n) lookup time.

Yeah, in particular the reverse index really should be a vector. But
that particular code is only actually used once right at the end and
takes a tiny amount of time, so it's not actually that useful to do so
I think.

> I hope that helps,

It does. Thanks!

David