[Haskell-beginners] Re: [Haskell-begin] Maximum of a List?

Daniel Fischer daniel.is.fischer at web.de
Mon Jul 28 18:45:28 EDT 2008


On Monday, 28 July 2008 23:35, Niels Aan de Brugh wrote:
> Right. For this puzzle (n up to 1000000) it's quite safe to say an Int64
> will never overflow. :-)

Fairly small risk. You can't be sure beforehand, though.

>
> I was wondering how the Integer type is implemented.

Two constructors: native machine integers and their arithmetic are used when 
possible, because that's much faster, and GMP is used for larger numbers. 
There's work going on to replace GMP, but I don't know its current state.

-- | Arbitrary-precision integers.
data Integer    
   = S# Int#                            -- small integers
#ifndef ILX
   | J# Int# ByteArray#                 -- large integers
#else
   | J# Void BigInteger                 -- .NET big ints

foreign type dotnet "BigInteger" BigInteger
#endif

> If I were to implement such a type my first approach would be to keep
> a list of digits (and perform calculation like I do by hand).

I'd do something along those lines, too. Of course I wouldn't use decimal 
digits; I'd rather use base 2^16, which makes overflow easy to control. But if 
I had to do it in earnest, I'd research some algorithms first.
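
Just to illustrate the idea (this is not how GMP or GHC do it, and the names 
Big and addBig are made up), here is a minimal sketch of that representation: 
limbs in base 2^16, least significant first, with schoolbook addition.

import Data.Word (Word16, Word32)

-- A toy bignum: limbs in base 2^16, least-significant limb first.
newtype Big = Big [Word16] deriving Show

-- Schoolbook addition with carry. A Word32 holds any intermediate sum
-- (at most (2^16 - 1) + (2^16 - 1) + 1), so overflow is easy to control.
addBig :: Big -> Big -> Big
addBig (Big xs) (Big ys) = Big (go 0 xs ys)
  where
    go 0 [] [] = []
    go c [] [] = [fromIntegral c]          -- final carry becomes a new limb
    go c as bs =
      let (a, as') = split as
          (b, bs') = split bs
          s        = fromIntegral a + fromIntegral b + c :: Word32
      in fromIntegral s : go (s `div` 65536) as' bs'
    split []     = (0, [])                 -- pad the shorter number with zeros
    split (z:zs) = (z, zs)

A real library would store the limbs in an array rather than a list, and 
multiplication is where the algorithm research (Karatsuba and friends) comes in.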

> Having a type like Integer on board makes many of the puzzles on Project
> Euler a shoo-in. One of the questions is indeed to calculate the last n
> digits of 1000!.

Don't you mean the sum of the digits of 100! ?
The idea behind those questions, dating from the days when BigInt libraries 
weren't so widespread, was to have people write their own. Nowadays everything 
fits into 64 bits (though overflow prevention may still be necessary).
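
With Integer on board, that Project Euler question (the digit sum of 100!) 
really does reduce to a couple of lines, for example:

import Data.Char (digitToInt)

-- Sum of the decimal digits of 100!; Integer does all the heavy lifting.
digitSum :: Integer -> Int
digitSum = sum . map digitToInt . show

main :: IO ()
main = print (digitSum (product [1 .. 100]))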

>
> > Using a Map, though easier to code, doesn't buy you much here, indeed
> > less than a few easy considerations where the maximal length cannot
> > occur. The problem is that you have to update the Map often, which is
> > rather costly, also the Map takes a lot of memory. Using an array is more
> > convoluted, but much faster (if done right).
>
> The problem with an array is finding a good upper limit.

1000000 is a good upper limit. What makes the code convoluted is how you 
handle the values above :)
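
A sketch of what I mean, assuming the puzzle is the Collatz chain-length 
problem (Project Euler 14) - the names bound and chainLen are mine. Chain 
lengths for starting values up to the bound are memoised in a lazy immutable 
array, and iterates that climb above the bound are followed by plain 
recursion, which is exactly the convoluted part:

import Data.Array
import Data.Int (Int64)

bound :: Int
bound = 1000000

-- Chain lengths for all starting values up to the bound, memoised in a
-- lazy immutable array (the array refers to itself, which laziness allows).
chainLen :: Array Int Int
chainLen = listArray (1, bound) (map go [1 .. bound])
  where
    go 1 = 1
    go n = 1 + lenOf (next (fromIntegral n))

    -- Iterates can exceed the bound (and 32-bit Int), hence Int64.
    lenOf :: Int64 -> Int
    lenOf m
      | m <= fromIntegral bound = chainLen ! fromIntegral m  -- memoised lookup
      | otherwise               = 1 + lenOf (next m)         -- above the bound

    next :: Int64 -> Int64
    next m
      | even m    = m `div` 2
      | otherwise = 3 * m + 1

main :: IO ()
main = print (maximum [(chainLen ! n, n) | n <- [1 .. bound]])

An unboxed mutable array in the ST monad would be faster still, but even the 
lazy boxed array beats updating a Map a million times.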

> One could of
> course copy the whole array if a new upper bound is found, the costs
> would amortize over the whole run. Having a really big and sparse array
> is no good either (no/bad locality of reference).
>
> Oh well, the puzzle itself definitely isn't worth the effort, but maybe
> some custom made heap structure would perform best.
>
> > At least, it would take far longer to get the solution.
>
> Yes, but sometimes the difference is several days vs. a couple of minutes.

Or even years vs. seconds.

>
> > Nowadays, the native code generator is not necessarily slower than
> > -fvia-C, often faster. So the best should be to test both, -O2 and -O2
> > -fvia-C -optc-On, for some medium sized inputs and see which actually is
> > faster.
>
> Interesting. In which release has the back-end been reworked?

I think it was the 6.8 branch. I can't remember exactly, but I had programmes 
that performed better when compiled via C and others that did better with the 
native code generator, though I never found big differences. When going via C, 
don't forget -optc-On (I habitually use n = 3), otherwise gcc won't do much 
with the generated C.
Generally everything got faster with the 6.8 branch; let's see what 6.10 does.
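
Concretely, the comparison would look like this (Main.hs is just a placeholder 
for your source file):

ghc -O2 --make Main.hs -o prog-ncg
ghc -O2 -fvia-C -optc-O3 --make Main.hs -o prog-viac

then time both binaries on a medium-sized input and keep the winner.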

> Does it work equally well for architectures other than Intel?

No idea. You might ask on glasgow-haskell-users; somebody there should know.
>
> Niels

Cheers,
Daniel


