bignums, gmp, bytestring, .. ?

Peter Tanski p.tanski at gmail.com
Sat Nov 18 16:46:24 EST 2006


On Nov 17, 2006, at 7:24 PM, Claus Reinke wrote:
> it seems that haskell versions of bignums is pretty much gone from  
> more recent discussions of gmp replacements. now, I assume that  
> there are lots of optimizations that keep gmp popular that one  
> wouldn't want to have to reproduce, so that a haskell variant might  
> not be competitive even if one had an efficient representation, but
>
> - do all those who want to distribute binaries, but not dynamically
>    linked, need bignums?

You are right: most don't.  Even when working with larger numbers I
have rarely used bignum libraries myself, because there is usually a
clever--and often faster--way to handle large numbers, especially
when you don't need all that extra precision.  These methods were
better known and more widely used before multi-precision libraries
became widespread, and they have become even more useful since 64-bit
machines and C99's 64-bit integer types arrived.  Arbitrary-precision
Integers are mostly a convenience.  They are really necessary only
for very high precision mathematical calculation or for cryptography;
and even high precision mathematics usually benefits more from
arbitrary-precision decimal (fixed or floating point) for certain
calculations.

The basic problem with Haskell and Integer is that, according to the
Standard, Integer is a primitive: it is consequently implemented as
part of the runtime system (RTS), not in the Prelude or any library
(though the interface to Integer lives in the base library).  For
GHC, compiling with -fno-implicit-prelude and explicitly importing
only the functions and types you need still won't get rid of Integer.
One possible solution would be to implement the Integer 'primitive'
as a separate library and import it into the Prelude or base
libraries, then perform an optimisation step in which base functions
are linked in only when needed.  Except for the optimisation step,
this actually makes the job easier, since Integer functions would be
called through the FFI and Integer values held in ForeignPtrs.  (I
have already done the FFI work for other libraries and a primitive
version of the replacement.)
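To make the FFI approach concrete, here is a minimal sketch of what
such a binding might look like.  Everything in it is hypothetical:
the C entry points bigint_add and bigint_free are stand-in names, not
part of any existing library.  But it shows the shape: the C-side
value lives behind a ForeignPtr whose finalizer releases it once the
garbage collector drops the last reference.

  {-# LANGUAGE ForeignFunctionInterface, EmptyDataDecls #-}
  module BigInt (BigInt, addBigInt) where

  import Foreign.ForeignPtr (ForeignPtr, newForeignPtr, withForeignPtr)
  import Foreign.Ptr (FunPtr, Ptr)

  data CBigInt                                  -- opaque C-side value
  newtype BigInt = BigInt (ForeignPtr CBigInt)

  -- Hypothetical C entry points; the names are illustrative only.
  foreign import ccall unsafe "bigint_add"
    c_bigint_add :: Ptr CBigInt -> Ptr CBigInt -> IO (Ptr CBigInt)
  foreign import ccall unsafe "&bigint_free"
    c_bigint_free :: FunPtr (Ptr CBigInt -> IO ())

  -- Add two values on the C side; the finalizer frees the result
  -- when the ForeignPtr is garbage collected.
  addBigInt :: BigInt -> BigInt -> IO BigInt
  addBigInt (BigInt fa) (BigInt fb) =
    withForeignPtr fa $ \pa ->
    withForeignPtr fb $ \pb -> do
      pr <- c_bigint_add pa pb
      fp <- newForeignPtr c_bigint_free pr
      return (BigInt fp)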

> - it would be nice to know just how far off a good haskell version
>    would be performance-wise..

There is actually a relatively recent (2005, revised) Haskell version
of an old Miranda library for "infinite" precision floating point
numbers by Martin Guy, called BigFloat, at
http://bignum.sourceforge.net/.  Of course it is floating point, and
Integers would be faster, but the speed gap between the two would
probably be proportional to the corresponding gap in C, and so just
as disappointing.  The BigFloat library (in its Haskell version) came
in last place at the Many Digits "Friendly" Competition for 2005 (see
http://www.cs.ru.nl/~milad/manydigits/final_timings_rankings.html),
though you would probably be more interested in the actual timing
results to get a better idea.  (The fastest competitors were MPFR,
which uses GMP, and The Wolfram Team, makers of Mathematica; BigFloat
actually beat the iRRAM and Maple entries on several problems.)

The real problem with an Integer library written in *pure* Haskell is
simple: Haskell is too high-level, and no current Haskell compiler,
not even JHC, has remotely decent support for low-level optimisations
such as unrolling a loop over two arrays of uint32_t and carrying the
result of adding the first pair of elements straight into the
addition of the next pair, in two machine instructions.  I shouldn't
even have to mention parallelisation of operations.  In short, the
assembler produced by any Haskell compiler is *very* ugly, and code
over arrays is uglier still.  (For a simple comparison to the Integer
problem, try implementing a fast bucket sort in Haskell.)
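To see what I mean, here is the loop in question rendered naively in
Haskell (lists for clarity; a real implementation would use unboxed
arrays).  A decent C compiler collapses the body of the equivalent C
loop into an add/add-with-carry instruction pair; no Haskell compiler
I know of comes close:

  import Data.Bits (shiftR)
  import Data.Word (Word32, Word64)

  -- Limb-by-limb addition, least significant limb first, with the
  -- carry threaded explicitly through the recursion.  The loop body
  -- is exactly the two-instruction add-with-carry sequence a C
  -- compiler would emit.
  addLimbs :: [Word32] -> [Word32] -> [Word32]
  addLimbs = go 0
    where
      go c (x:xs) (y:ys) =
        let s = fromIntegral x + fromIntegral y + c :: Word64
        in fromIntegral s : go (s `shiftR` 32) xs ys
      go c xs [] = propagate c xs
      go c [] ys = propagate c ys
      propagate 0 xs = xs
      propagate c [] = [fromIntegral c]
      propagate c (x:xs) =
        let s = fromIntegral x + c :: Word64
        in fromIntegral s : propagate (s `shiftR` 32) xs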

GMP uses hand-written assembler routines for many supported
architectures, partly because GMP was originally created for earlier
versions of GCC, which could not optimise as well as current
versions.  Even so, GMP cannot compare to an optimised library using
SIMD (AltiVec, SSE): in my tests, SIMD-optimised algorithms are
between 2x and 10x faster.  SIMD, and small assembler routines for
the architectures without it, are where the bulk of my work has
gone.  I doubt I have the ability to extend the current state of the
art in higher-level polynomial optimisations, so I am always trying
out any algorithm I can find.  (For very high precision
multiplication, more than 30,000 bits or so, not much beats a
SIMD-enabled Fast Fourier Transform; a specially coded Toom-3
algorithm would be faster, but for very large operands it becomes
prohibitively complex.  Division is another labour-intensive area.)
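To show the structure involved: a multiplication routine in such a
library ends up dispatching on operand size, roughly as below.  The
crossover thresholds here are placeholders I made up for
illustration--real libraries tune them per architecture--except for
the 30,000-bit FFT figure just mentioned.

  -- Illustrative dispatch only; the algorithm implementations and
  -- the exact crossover points are the hard part.
  data MulAlgorithm = Schoolbook | Karatsuba | Toom3 | FFTMul

  chooseMul :: Int -> MulAlgorithm          -- operand size in bits
  chooseMul bits
    | bits < 2000  = Schoolbook   -- low overhead wins on small operands
    | bits < 10000 = Karatsuba
    | bits < 30000 = Toom3
    | otherwise    = FFTMul       -- past ~30,000 bits FFT dominates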

> - what would be a killer for numerical programming, might still be
>    quite acceptable for a substantial part of haskell uses?

Quite possibly, yes.  I haven't done my own study of what most users
actually require, but according to notes in the source code of
Python's multi-precision integer implementation (PyLongObject), its
developers found that most users never need numbers longer than 2 or
3 uint32_t.  (They had originally reserved space for 5 uint32_t in a
newly created PyLongObject.)  Of course, even ordinary users would
notice the difference if a much slower library were used on larger
numbers (79 decimal digits, 256 bits, or more) in an interactive
session.
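That observation is why a small-value fast path matters so much.
GHC's own Integer already has roughly this shape (an S# constructor
for word-sized values, J# for the rest), and any replacement would
want the same.  A sketch, with invented names:

  import Data.Int (Int64)
  import Data.Word (Word32)

  -- Two-constructor representation: stay in machine arithmetic for
  -- the common small case, fall back to limbs only on demand.
  data BigNum
    = Small !Int64      -- fits in a machine word: the usual case
    | Big   ![Word32]   -- little-endian limbs for large values

  addBig :: BigNum -> BigNum -> BigNum
  addBig (Small a) (Small b)
    | inRange a && inRange b = Small (a + b)   -- cannot overflow
    where inRange x = x > -bound && x < bound
          bound     = maxBound `div` 2
  addBig a b = addViaLimbs a b                 -- slow path

  -- Full limb-based addition, omitted from this sketch.
  addViaLimbs :: BigNum -> BigNum -> BigNum
  addViaLimbs = error "limb addition not shown"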

> of course, the real gmp replacement project might be going so well
> that a haskell version would be obsolete rather sooner than later, and
> i certainly don't want to interfere with that effort.

Not at all.  All ideas gratefully received :)  It's going slowly,
partly because I am constantly having to learn more, and to test and
refactor so that the result isn't a naive solution.  I could simply
reimplement GMP, of course, but many of GMP's algorithms are not
readily amenable to SIMD operations--and I am trying to *beat* GMP in
speed.

> all that being said, it occurred to me that the representations and
> fusions described in the nice "rewriting haskell strings" paper  
> would be a good foundation for a haskell bignum project, wouldn't  
> they?

They would.  Thanks for the link--I had been working from the source
code.  A combined Haskell-C solution may sometimes be faster: I
noticed this a while ago when experimenting with a random number
generator library written in C, and concluded that the speed was
largely due to Haskell's ability to cache the results of many
operations, sometimes better than I could have arranged by hand.
Term rewriting for a hybrid Haskell-C library could be very useful
for complex polynomials.  It might use equational transformations or
Template Haskell; GHC's own Ian Lynagh did some work on this (see
http://web.comlab.ox.ac.uk/oucl/work/ian.lynagh/Fraskell/ on
Fraskell, with some code examples, and especially
http://web.comlab.ox.ac.uk/oucl/work/ian.lynagh/papers/Template_Haskell-A_Report_From_The_Field.ps,
a PostScript file).  The one problem I have with equational
transformations over arrays is that it seems very difficult--I would
say impossible without modifying the compiler--to perform a
transformation between two combined arrays when the array operands
may be of different sizes.
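For fixed algebraic identities, at least, the machinery already
exists: GHC's RULES pragmas, the same mechanism the ByteString paper
uses for fusion, can encode them directly.  A sketch, with made-up
operation names; what RULES cannot express is precisely the
size-dependent case above, since rule matching is purely syntactic:

  module BigRules where

  import Data.Word (Word32)

  -- Hypothetical bignum type and operations; illustrative names only.
  newtype Big = Big [Word32]

  mulBig :: Big -> Big -> Big
  mulBig = error "general multiplication (not shown)"
  {-# NOINLINE mulBig #-}   -- keep calls intact so the rules can match

  shiftLBig :: Big -> Int -> Big
  shiftLBig = error "cheap left shift (not shown)"

  one, two :: Big
  one = Big [1]
  two = Big [2]
  {-# NOINLINE one #-}
  {-# NOINLINE two #-}

  -- Equational transformations: rewrite expensive general calls
  -- into cheaper specialised ones at compile time.
  {-# RULES
  "mulBig/one"    forall x. mulBig x one = x
  "mulBig/double" forall x. mulBig x two = shiftLBig x 1
    #-}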

That approach is similar to the one FFTW (see http://www.fftw.org)
uses, except that FFTW uses OCaml to perform the analysis on
particular equations and patches the higher-level combinations
together again in C--FFTW is essentially an equational compiler.  The
problem with many of FFTW's optimisations is that the analysis relies
on constant values: you are writing a program to solve one
well-defined mathematical problem, so you know something from the
beginning, such as the size of the operands (that's a biggie!).  The
SPIRAL project uses similar methods for more basic operations, such
as multiplierless constant multiplication (transforming
multiplication of an unknown value by a constant into an optimal
sequence of left-shifts and additions); see, e.g.,
http://www.spiral.net/hardware/multless.html.  An arbitrary precision
Integer library for Haskell must be more general than that: I have to
assume no constants are available.  It is essentially an optimisation
problem--but then this whole endeavour is, in a way.
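The multiplierless trick itself is easy to demonstrate: since
10 = 8 + 2, multiplication by the known constant 10 reduces to two
shifts and an add.  With arbitrary-precision operands and no
compile-time constants, no such specialisation is possible.

  import Data.Bits (shiftL)
  import Data.Word (Word32)

  -- Multiplication by the known constant 10 as shifts and an add:
  -- 10*x = 8*x + 2*x = (x << 3) + (x << 1).
  mulBy10 :: Word32 -> Word32
  mulBy10 x = (x `shiftL` 3) + (x `shiftL` 1)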

Cheers,
Pete



