[Haskell-cafe] [0/16] SBM: Simple Bytestring Microbenchmarks, Overview and Introduction

Sat Dec 22 04:16:52 EST 2007

Table of contents:
0/16	This email
1/16	The Haskell and C benchmarks
2/16	Inner loops of the hand-tweaked assembly benchmarks
3/16	The Makefile
4/16	How to use the Makefile (how to run benchmarks etc.)
5/16	Support scripts and scriptlets
6/16	6.9.20071124 Athlon Duron
7/16	6.9.20071208 Core Duo
8/16	6.9.20071119 Pentium III
9/16	6.9.20071119 Athlon64
10/16	Graphs for 6.9.x across four cpus
11/16	Graphs for hand-tweaked assembly benchmarks
12/16	Graphs for 7 ghc/bytestring combinations on a 2GHz Athlon64 300+
13/16	Graphs that show the infidelity of -sstderr
14/16	Behind the measurements (rationale)
15/16	Predictions compared to the measurements
16/16	Discussion and Conclusion

Simple Bytestring Microbenchmarks
---------------------------------

Introduction
------------
I love parsers.  I have been writing parsers for fun for over twenty years.
The nicest way to construct a parser used to be to write a recursive descent
parser by hand.  If you had to work with people who'd had the misfortune of
a university education, you would resort to lex and yacc (flex and bison),
despite their many shortcomings.

Combinator parsers is the only real improvement over hand-written recursive
descent parsers that I know of.  They do tend to require features that not all
languages provide.  I don't know how to write a good one in C, for example.
They do work very well in Haskell, though.

So, I've started writing a parser in Haskell (ghc, really) for the programming
language X++.  X++ is not a nice language but that's beside the point.  The
challenge for me is to write an efficient compiler + provide good analysis
tools for X++.  I think I stand a better chance of doing that in Haskell (ghc)
than in practically any other language.

There are a few drawbacks, though.

I love speed.  And efficiency.

String handling in Haskell
--------------------------
Native strings are simple and generally work well, but they are slow, take up
too much memory and there's the whole encoding mess that still needs to be
sorted out.

People have worked on other string representations and libraries for quite a
while.  I think packedstrings (as used in Darcs) was one of the first ones.
Bytestrings is the current incarnation.  It seems to be just the right thing,
especially when combined with improved automatic fusion in the compiler so
higher-order functions don't have to be expensive.

I use Parsec as my parser combinator library at the moment, which
uses native strings.  I would dearly love Parsec to be faster and use less
memory.  I think bytestrings will be part of any substantial improvement in
Parsec's resource consumption.

Other performance concerns
--------------------------
File I/O is also interesting for a compiler writer.  I would like to have a
program that is as fast as possible both when the source files are already
cached by the operating system and when they are not.  The former situation is
best handled with mmap() and the latter is best handled by read(), preferably
in combination with multi-threading so the compiler doesn't have to waste too
much time waiting for disk seeks.  Haskell seems to be very close to ideal for
me because it has very good threading support and very accessible raw access
to the operating system.

File I/O is not my current bottleneck, though.  I'll probably take a closer
look at file I/O when the other performance problems have been solved.

Then there's the general quality of the generated code.  Having read just about
every paper on ghc that was available back in the late nineties (when I first
looked at Haskell), I'd thought that the quality was good and that the compiler
also had extremely good high-level optimizations, in other words, that
abstraction was free.

I also read the C-- papers and thought that it was a very interesting and
promising approach.  I'd expected the C-- path to have matured and be well-
optimized by now.

Unfortunately, the backend is /the/ weak spot in ghc.  The frontend is heroic,
the typesystems are (too) abundant and rich, the language itself is nice -- but
the backend is not.  Looking at the generated code I'd say that it is slightly
better than Turbo Pascal 3.x and about on par with Turbo Pascal 4.0, a compiler
that didn't use any intermediate code at all, compiled each statement in
isolation, was single-pass, and had a compilation speed of about 27000 lines
per minute on an 8 MHz IBM PC AT.

Ecosystem and culture
---------------------
Haskell has a very good ecosystem.  Probably the second best one amongst the
modern functional languages.  Ten years ago, I'd thought that Standard ML would
win but the only MLish language with a good ecosystem and culture is OCaml,
which unfortunately isn't really Standard ML.

By ecosystem I mean things like access to raw operating system calls, access
to libraries written in other languages, readily available libraries for
graphical user interfaces, databases, XML processing, network I/O.  Parsing is
nice, too, but practically all functional programming languages have that --
and not much else.

The culture is also good.  People actually use this stuff.  They care about it.
And they don't hang around waiting for somebody to tell them what to do, they
start on their own.  And they actually seem to be interested in good
performance :)

Hackage and cabal are very promising already and may be what finally makes ghc
real-world useful for more people, because most people are not interested in
working with raw source packages and fiddling with compiler flags and weird
error messages.  They don't like chasing dependencies, either.

So, what's the problem?
-----------------------
The major problem with Haskell (ghc) is that its performance (in terms of both
speed and memory use) is unpredictable.  The second-worst problem is that the
actual performance is not good enough.

These benchmarks
----------------
I have written a bunch of microbenchmarks that either count all the bytes in
stdin or all the spaces in stdin.  And some supporting benchmarks in C.  And
I've also handtweaked the assembly of one byte-counting and one space-counting
microbenchmark to illustrate what difference it would make if the backend could
use registers in a less stupid^W^W more efficient manner.

Homepage + source code
----------------------
I have put up a homepage for the benchmarks at:

  http://vax64.dk/ghc-bs-tests

The raw measurements are in tarballs on that page.

The source code for the benchmarks (+ support code) is in a mercurial
repository at:

  http://vax64.dyndns.org/repo/hg/ghc-bs-tests

I used scripts to install the various versions of ghc and bytestring both to
avoid operator error and so you could me look over my shoulder.  The scripts
are in a mercurial repository at:

  http://vax64.dyndns.org/repo/hg/ghc-installations

You can either follow the link and download any version you like as a tarball
or you can (preferably) clone the repositories with:

  hg http://vax64.dyndns.org/repo/hg/ghc-bs-tests
  hg http://vax64.dyndns.org/repo/hg/ghc-installations

All my code in those repositories is GPLv2.

The text file 'text.txt' in the ghc-bs-tests repository is unfortunately partly
in Danish and partly in very terse English.

Acknowledgements
----------------
Daniel Fischer, for running the benchmarks on his SuSE 8.2 Athlon Duron 1200
MHz machine and for being helpful and patient while I made the scripts work on
a 2.4 kernel and with unhelpful versions of GNU Make and strace.

Erik van der Meer, for letting me run the benchmarks (and install ghc) on his
Core Duo laptop.  And for discussions on measurements over the years.

Don Stewart, for playing along and for fixing a bytestring problem.  And for
  contributing three benchmarks (one of which I had to change a bit before it
  would compile).

Duncan Coutts, for playing along.

Jules Bean (quicksilver), for contributing a benchmark.

Bertram Felgenhauer (int-e), for contributing three benchmarks (in the form
  of a single file, which I untangled to three files).

Spencer Jannsen (sjannsen), for contributing a benchmark.

William Lee Irwin III (wli), for inspiring me to add the getwchar benchmarks.

-Peter