[Haskell-cafe] Efficient way to edit a file

Donald Bruce Stewart dons at cse.unsw.edu.au
Thu Jun 1 22:34:51 EDT 2006


dons:
> briqueabraque:
> >   Hi,
> > 
> >   I need to edit big text files (5 to 500 Mb). But I just need to 
> > change one or two small lines, and save it. What is the best way to do 
> > that in Haskell, without creating copies of the whole files?
> > 

Thinking further: since you want to avoid making a copy on disk, you need
to be able to hold the edited version in memory. A strict ByteString is
the best fit for that, since the whole file is read before the write back
to the same path begins. For example:

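    import System.Environment (getArgs)
    import qualified Data.ByteString.Char8 as B

    -- Read the whole file strictly, rewrite matching lines, write it back.
    main :: IO ()
    main = do
        [f] <- getArgs
        B.writeFile f . B.unlines . map edit . B.lines =<< B.readFile f
      where
        -- Replace any line that starts with "Instances" by the line "EDIT".
        edit :: B.ByteString -> B.ByteString
        edit s | B.pack "Instances" `B.isPrefixOf` s = B.pack "EDIT"
               | otherwise                           = s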

This edits a 100M file in about 17 seconds of wall-clock time, most of it
spent waiting on I/O rather than on the CPU:

    $ ghc -O -funbox-strict-fields A.hs -package fps 
    $ time ./a.out /home/dons/data/100M
    ./a.out /home/dons/data/100M  1.54s user 0.76s system 13% cpu 17.371 total
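
To see what the per-line step does in isolation, here is a minimal sketch
that runs the same pipeline on a small in-memory string instead of a file.
The name editAll and the sample input are made up for illustration. Note
that a matching line is replaced as a whole, not just its prefix.

```haskell
import qualified Data.ByteString.Char8 as B

-- Same rule as in the program above: a line that starts with
-- "Instances" becomes the line "EDIT"; every other line is kept.
edit :: B.ByteString -> B.ByteString
edit s | B.pack "Instances" `B.isPrefixOf` s = B.pack "EDIT"
       | otherwise                           = s

-- Hypothetical helper: the pure part of the pipeline, without file I/O.
editAll :: B.ByteString -> B.ByteString
editAll = B.unlines . map edit . B.lines

main :: IO ()
main = B.putStr (editAll (B.pack "Header\nInstances of Foo\nFooter\n"))
-- prints three lines: Header, EDIT, Footer
```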

You could probably tune this further.
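
One possible direction, which is not something the program above does: if
the replacement happens to have exactly the same byte length as the text it
replaces, and you already know its byte offset, the file can be patched in
place with a seek, so nothing else is read or rewritten. The sketch below
assumes both of those conditions; overwriteAt and demo.txt are hypothetical
names used only for illustration.

```haskell
import System.IO
import qualified Data.ByteString.Char8 as B

-- Hypothetical helper: overwrite the bytes starting at a known offset.
-- Nothing after the overwritten span moves, so the replacement must be
-- exactly as long as the text it overwrites.
overwriteAt :: FilePath -> Integer -> B.ByteString -> IO ()
overwriteAt path offset new =
    withBinaryFile path ReadWriteMode $ \h -> do
        hSeek h AbsoluteSeek offset
        B.hPut h new

main :: IO ()
main = do
    let path = "demo.txt"                      -- scratch file for the demo
    B.writeFile path (B.pack "aaa\nInstances\nbbb\n")
    -- "Instances" starts at byte 4 and is 9 bytes long; the replacement
    -- is padded with spaces to the same 9 bytes.
    overwriteAt path 4 (B.pack "EDITED   ")
    B.putStr =<< B.readFile path
```

When the replacement is longer or shorter than the original line, this
does not apply, and everything after the edit has to be rewritten anyway.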

-- Don

