[RFC] Faster implementation of Integer to string conversion.

Bertram Felgenhauer bertram.felgenhauer at googlemail.com
Thu Apr 13 03:59:39 EDT 2006


Hi,

several people have noted that printing integers is slow. The attached
patch implements a faster version of the jtos (read: positive integer
to string) function in GHC's Num library. The algorithm is a divide and
conquer algorithm, that converts numbers to base b by converting them
to base b^2 first, and then splitting the digits.

Notes:
- I've tested the speed on my computer (which has an Athlon XP processor)
  and it seems to be of comparable speed as the original for small numbers
  and way faster for big numbers (read: a few hundred or thousand digits).
- The changed conversion code is a much better list producer than the
  original one, which generates digits starting with the last.
- There's similar code in Numeric.lhs, that deals with arbitrary bases.
  The current patch does not deal with this. (see questions below)
- musabi on #haskell mentioned that there's interest in replacing GMP as
  the bignum implementation. This is independent of changing this string
  conversion, as far as I can see.

Questions:
- Is this really GHC specific?
- Unfortunately I see no nice way to share code between the Numeric and
  Num modules. Does anyone have an idea how to do this without sacrificing
  performance for the common base 10 case?

regards,

Bertram
-------------- next part --------------
diff -rN -u old-base/GHC/Num.lhs new-base/GHC/Num.lhs
--- old-base/GHC/Num.lhs	2006-04-13 09:44:25.000000000 +0200
+++ new-base/GHC/Num.lhs	2006-04-13 09:44:25.000000000 +0200
@@ -17,10 +17,16 @@
 #include "MachDeps.h"
 #if SIZEOF_HSWORD == 4
 #define LEFTMOST_BIT 2147483648
+#define DIGITS       9
+#define BASE         1000000000
 #elif SIZEOF_HSWORD == 8
 #define LEFTMOST_BIT 9223372036854775808
+#define DIGITS       18
+#define BASE         1000000000000000000
 #else
 #error Please define LEFTMOST_BIT to be 2^(SIZEOF_HSWORD*8-1)
+-- DIGITS should be the largest integer such that 10^DIGITS < LEFTMOST_BIT
+-- BASE should be 10^DIGITS. Note that ^ is not available yet.
 #endif
 
 -- #hide
@@ -457,17 +463,81 @@
         | otherwise      = jtos n r
     showList = showList__ (showsPrec 0)
 
+-- Divide an conquer implementation of string conversion
 jtos :: Integer -> String -> String
 jtos n cs
     | n < 0     = '-' : jtos' (-n) cs
     | otherwise = jtos' n cs
     where
     jtos' :: Integer -> String -> String
-    jtos' n' cs'
-        | n' < 10    = case unsafeChr (ord '0' + fromInteger n') of
-            c@(C# _) -> c:cs'
-        | otherwise = case unsafeChr (ord '0' + fromInteger r) of
-            c@(C# _) -> jtos' q (c:cs')
+    jtos' n cs
+        | n < BASE  = jhead (fromInteger n) cs
+        | otherwise = jprinth (jsplitf (BASE*BASE) n) cs
+
+    -- Split n into digits in base p. We first split n into digits
+    -- in base p*p and then split each of these digits into two.
+    -- Note that the first 'digit' modulo p*p may have a leading zero
+    -- in base p that we need to drop - this is what jsplith takes care of.
+    -- jsplitb the handles the remaining digits.
+    jsplitf :: Integer -> Integer -> [Integer]
+    jsplitf p n
+        | p > n     = [n]
+        | otherwise = jsplith p (jsplitf (p*p) n)
+
+    jsplith :: Integer -> [Integer] -> [Integer]
+    jsplith p (n:ns) =
+        if q > 0 then fromInteger q : fromInteger r : jsplitb p ns
+                 else fromInteger r : jsplitb p ns
         where
-        (q,r) = n' `quotRemInteger` 10
+        (q, r) = n `quotRemInteger` p
+
+    jsplitb :: Integer -> [Integer] -> [Integer]
+    jsplitb p []     = []
+    jsplitb p (n:ns) = q : r : jsplitb p ns
+        where
+        (q, r) = n `quotRemInteger` p
+
+    -- Convert a number that has been split into digits in base BASE^2
+    -- this includes a last splitting step and then conversion of digits
+    -- that all fit into a machine word.
+    jprinth :: [Integer] -> String -> String
+    jprinth (n:ns) cs =
+        if q > 0 then jhead q $ jblock r $ jprintb ns cs
+                 else jhead r $ jprintb ns cs
+        where
+        (q', r') = n `quotRemInteger` BASE
+        q = fromInteger q'
+        r = fromInteger r'
+
+    jprintb :: [Integer] -> String -> String
+    jprintb []     cs = cs
+    jprintb (n:ns) cs = jblock q $ jblock r $ jprintb ns cs
+        where
+        (q', r') = n `quotRemInteger` BASE
+        q = fromInteger q'
+        r = fromInteger r'
+
+    -- Convert an integer that fits into a machine word. Again, we have two
+    -- functions, one that drops leading zeros (jhead) and one that doesn't
+    -- (jblock)
+    jhead :: Int -> String -> String
+    jhead n cs
+        | n < 10    = case unsafeChr (ord '0' + n) of
+            c@(C# _) -> c : cs
+        | otherwise = case unsafeChr (ord '0' + r) of
+            c@(C# _) -> jhead q (c : cs)
+        where
+        (q, r) = n `quotRemInt` 10
+
+    jblock = jblock' {- ' -} DIGITS
+
+    jblock' :: Int -> Int -> String -> String
+    jblock' d n cs
+        | d == 1    = case unsafeChr (ord '0' + n) of
+             c@(C# _) -> c : cs
+        | otherwise = case unsafeChr (ord '0' + r) of
+             c@(C# _) -> jblock' (d - 1) q (c : cs)
+        where
+        (q, r) = n `quotRemInt` 10
+
 \end{code}



More information about the Libraries mailing list