ForeignPtr reloaded

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Sun Jun 4 19:04:54 EDT 2006


On Sun, 2006-06-04 at 18:34 +0400, Bulat Ziganshin wrote:

> afaik, your lazy bytestrings is anyway about 1mb each.

Not quite sure what you mean, our chunk size currently defaults to 64k.

> but when you will apply, say, 'lines' to it - you will get a lazy list of
> ByteStrings, each representing just one line of file. as i previously
> said, it should be a good benchmark for your library

Yes, it certainly is a good benchmark and is quite common in practise.
We have been optimising this case recently.

> you should remember that this test instantly showed difference between
> foreign ptrs in ghc 6.4 and 6.5

However this lines test is no good for seeing if this new form of
ForeignPtr makes any difference since for the 'lines' function we're
only constructing substrings not new strings so all the ForeignPtrs are
shared.

Here's a brief summary of our current results on the sumfile benchmark:

In each case we're using a cached 5.4Mb file with one number per line,
mostly 3 and 4 digit numbers, some negative. All tests using a recent
version of ghc-6.5 with -O2 on a 1.8Ghz x86-64.

1: original version from the shootout, elegant, uses [Char], is very
slow.

time: 33.97s

        main = do
           incoming <- getContents
           putStrLn $ show $ sum $ map read $ lines incoming
        
2: the 'GHC #4' entry on the shootout, less elegant, still uses [Char],
remarkably fast.

time: 0.74s

        main = print . new 0 =<< getContents
        
        new i []       = i
        new i ('-':xs) = neg 0 xs
            where neg n ('\n':xs) = new (i - n) xs
                  neg n (x   :xs) = neg (parse x + (10 * n)) xs
        new i (x:xs) = pos (parse x) xs
            where pos n ('\n':xs) = new (i + n) xs
                  pos n (x   :xs) = pos (parse x + (10 * n)) xs
        
        parse c = ord c - ord '0'
        

3: strict ByteString version, more like the original elegant pipeline
version, quite quick.

time: 0.36s

        import qualified Data.ByteString.Char8 as B
        
        main = print . foldl (\n l -> n + B.readInt l) 0 . B.lines =<< B.getContents
        
4: lazy ByteString version, same code but uses the .Lazy.Char8 module.
Not quite as quick.

time: 0.45s

        import qualified Data.ByteString.Lazy.Char8 as B
        
        main = print . foldl (\n l -> n + B.readInt l) 0 . B.lines =<< B.getContents

5: C version, very quick, reuses a fixed size buffer (which is possibly
cheating as the Haskell versions will allow any line length).

time: 0.15s

        #include <stdio.h>
        #include <stdlib.h>
        
        #define MAXLINELEN 128
        
        int
        main() {
            int sum = 0;
            char line[MAXLINELEN];
        
            while (fgets(line, MAXLINELEN, stdin)) {
           	sum += atoi(line);
            }
            printf("%d\n", sum);
            return(0);
        }

So our fastest version is similar code to the original naive version and
is only half the speed of the C version.

We're using a manually used sum . map lines because for some reason that
wasn't fusing automatically. We're investigating that. Other todo items
related to this benchmark are to make the ByteString.lines fuse with a
list consumer and to optimise the IO parts (for hGetContents we can
avoid copying through the Handle buffers and instead copy directly into
the ByteString(s)). Though interestingly the IO was only 0.02 of the
time so there's not much to be saved there.

This benchmark shows up the speed of the lines and readInt functions and
whether we are getting list consumer/producer fusion or whether we're
allocating list elements for each one.

Duncan



More information about the Cvs-ghc mailing list