[Haskell-beginners] Effective file I/O with bytestrings

Johannes Engels johannes.engels at hft-stuttgart.de
Fri Feb 10 10:31:27 CET 2012


Dear Haskellers,

in several books (RWH, LYAH) I have found the statement that file I/O is 
more efficient with bytestrings than with ordinary Strings. As I mostly 
do scientific programming and nearly every program of mine uses file 
I/O, I wanted to check this. So I tried to write a small "benchmark": 
read a matrix of doubles from a file and write it to a file again. 
For reading, the benefit of bytestrings was indeed huge: it was about 
ten times faster than with Strings. For writing, however, I failed 
completely. Most importantly, I did not find a function that converts 
doubles directly to bytestrings. So the best I could come up with was 
the following ugly workaround using Text.Show.ByteString:


import qualified Data.ByteString.Lazy as DL
import qualified Text.Show.ByteString as BS

import Data.Char
import Data.List
import Data.Array.Unboxed


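-- single-byte separators (newline and blank) as lazy bytestrings
lineendw8, blankw8 :: DL.ByteString
lineendw8 = DL.pack [fromIntegral (ord '\n')]
blankw8 = DL.pack [fromIntegral (ord ' ')]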

showAll :: UArray (Int, Int) Double  ->   -- matrix
            Int                       ->   -- number of rows
            Int                       ->   -- number of columns
            DL.ByteString
showAll mymatrix numrows numcols = foldr f lineendw8 [0..numrows-1]
     where f = showLine mymatrix numcols

showLine :: UArray (Int,Int) Double  ->   -- matrix
             Int                      ->   -- number of columns
             Int                      ->   -- current row
             DL.ByteString            ->   -- accumulator
             DL.ByteString
showLine mymatrix numcols row akku =
    let f col s = DL.append blankw8 $
                      DL.append (BS.runPut $ BS.showp (mymatrix!(row,col))) s
    in DL.append lineendw8 $ foldr f akku [0..numcols-1]



main :: IO ()
main = do

-- read file into UArray (Int, Int) Double ...
-- ....
-- ... and write it to file again

     let  bs = showAll mymatrix numrows numcols
     DL.writeFile "writeOut.dat" bs


This was meant more to show goodwill than to present a solution, of 
course. It does work, but, ugly as it is, it is by no means faster 
than the corresponding procedure with Strings. So my question: what is 
the canonical way to write doubles to a file? I guess this question must 
have been asked a hundred times before, so I would also very much 
appreciate a link to earlier answers ...
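
One direction that might be worth measuring is the Builder interface 
found in newer versions of the bytestring package 
(Data.ByteString.Builder), which offers doubleDec for rendering a 
Double as decimal text. The following is only a sketch under that 
assumption, not a confirmed canonical answer and not something I have 
benchmarked. The names rowBuilder, matrixBuilder, renderMatrix, 
writeMatrix and exampleMatrix are made up for illustration, and the 
output layout (blank-separated values, one row per line) differs 
slightly from the code above.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as DL
import Data.Array.Unboxed (UArray, bounds, listArray, (!))
import Data.List (intersperse)
import System.IO (IOMode (WriteMode), withBinaryFile)

-- One matrix row: decimal values separated by blanks, ended by a newline.
-- The column range is taken from the array bounds.
rowBuilder :: UArray (Int, Int) Double -> Int -> B.Builder
rowBuilder m row =
  mconcat (intersperse (B.char7 ' ') cells ++ [B.char7 '\n'])
  where
    ((_, c0), (_, c1)) = bounds m
    cells = [B.doubleDec (m ! (row, col)) | col <- [c0 .. c1]]

-- The whole matrix, one row per line.
matrixBuilder :: UArray (Int, Int) Double -> B.Builder
matrixBuilder m = mconcat [rowBuilder m row | row <- [r0 .. r1]]
  where
    ((r0, _), (r1, _)) = bounds m

-- Run the builder to a lazy bytestring (handy for inspecting the result).
renderMatrix :: UArray (Int, Int) Double -> DL.ByteString
renderMatrix = B.toLazyByteString . matrixBuilder

-- Write the matrix straight to a file handle, without building the
-- complete bytestring first.
writeMatrix :: FilePath -> UArray (Int, Int) Double -> IO ()
writeMatrix path m =
  withBinaryFile path WriteMode $ \h -> B.hPutBuilder h (matrixBuilder m)

-- Hypothetical 2x2 test matrix, stored in row-major order.
exampleMatrix :: UArray (Int, Int) Double
exampleMatrix = listArray ((0, 0), (1, 1)) [1, 2, 3, 4]
```

With these definitions, writeMatrix "writeOut.dat" exampleMatrix would 
write the example matrix to a file. Whether this is actually faster 
than the String version would still have to be measured.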

Best regards
Johannes Engels


