[Haskell-cafe] memory issues

Daniel Fischer daniel.is.fischer at web.de
Fri Feb 27 18:21:31 EST 2009


Am Freitag, 27. Februar 2009 23:18 schrieb Rogan Creswick:
>
> \begin{code}
> -- Compiled with:
> -- $ ghc --make offsetSorter.hs
> --
> --  (ghc v. 6.8.2)
> --
> -- Run with:
> -- $ time ./offsetSorter data/byteOffsets.txt > haskOffsets.txt
> -- offsetSorter: out of memory (requested 1048576 bytes)
> --
> -- real	4m12.130s
> -- user	3m4.812s
> -- sys	0m5.660s
> --(OOM happened after consuming just over 3000mb of Virt, 2.6gb Res,
> according to top.)
> --
>
> import System (getArgs)
> import Data.Maybe
> import Monad
> import Text.Printf (printf)
> import Data.Function (on)
> import Data.List (sort)
> import Data.ByteString (ByteString)
> import qualified Data.ByteString.Char8 as C8
> import qualified Data.ByteString as B
>
>
> -- get the lines
> -- parse each line to get the offset.
> -- scan the list of offsets
>
> -- | The full file size:
> maxSize :: Integer
> maxSize = 2807444044080
>
> -- | Block is a contiguous chunk of data.
> -- The first entry is the offset, the second is the length.
> data Block = Block {
>       offset::Integer
>     , size::Integer
>     } deriving (Eq)
>
> -- | Ordering of Blocks is based entirely on the block size.
> instance Ord Block where
>     compare = compare `on` size
>
> instance Show Block where
>     show (Block o s) = (show o) ++ "  " ++ (show s)
>
> -- turn the file into a list of offsets:
> getOffsets :: ByteString -> [Integer]
> getOffsets = catMaybes . map parseOffset . C8.lines
>
> -- | Pull out the offsets frome a line of the file.
> parseOffset :: ByteString -> Maybe Integer
> parseOffset s = do
>   (i, _) <- C8.readInteger (C8.filter (/=':') s)

Why the C8.filter (/= ':')?
That just costs and doesn't help anything (in fact, if your file contains 
lines like 1234:5678, it gives wrong results).

If you know that your file contains only lines of the form offset: <page>,
you can have

getOffsets = map (fst . fromJust . C8.readInteger) . C8.lines

which seems to do a little good.

>   Just i
>
> -- | Get the offsets between entries in a list
> getSizes :: [Integer]  -> [Integer]
> getSizes (x:y:[]) = [y - x]
> getSizes (x:y:ys) = (y - x):(getSizes (y:ys))
>
> -- | creates and returns a list of Blocks, given a file's content.
> blocks :: ByteString -> [Block]
> blocks s = zipWith (Block) offsets sizes
>            where offsets = getOffsets s
>                  sizes   = getSizes (offsets ++ [maxSize])
>
> main :: IO ()
> main = do
>   args <- getArgs
>   content <- B.readFile (args!!0)


>   printf "%s" $ unlines $ map (show) (sort $! blocks content)

Bad!
Use
	mapM_ print $ sort $ blocks content

> \end{code}



More information about the Haskell-Cafe mailing list