[Haskell-cafe] Re: memory issues

Fri Feb 27 18:36:51 EST 2009

Bulat is right about making Block's fields strict.
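
For reference, a minimal sketch of what strict fields look like. The Block declaration is not quoted in this message, so the field names below (blockOffset, blockSize) and the derived instances are my assumptions, not the original definition:

```haskell
-- Hypothetical Block record; the real declaration is not shown in this
-- thread, so the field names and derived instances are assumptions.
data Block = Block
  { blockOffset :: !Integer  -- the bang means this field is evaluated
  , blockSize   :: !Integer  -- whenever the Block itself is evaluated
  } deriving (Show, Eq, Ord)

main :: IO ()
main = do
  -- The subtraction is computed when the Block is forced, rather than being
  -- kept as an unevaluated thunk inside the record.
  let b = Block 10 (20 - 10)
  print b
```

With the bangs, forcing a Block to weak head normal form also evaluates both Integer fields, so no (y - x) thunks stay hidden inside the records.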

> 
> -- | Get the offsets between entries in a list
> getSizes :: [Integer]  -> [Integer]
> getSizes (x:y:[]) = [y - x]
> getSizes (x:y:ys) = (y - x):(getSizes (y:ys))

You should change the first part to add maxSize:

 > getSizes :: [Integer]  -> [Integer]
 > getSizes (x:y:[]) = [y - x,maxSize]
 > getSizes (x:y:ys) = (y - x):(getSizes (y:ys))

This avoids the ugly use of (++) below.  Note that appending to a singly linked 
list is a bad "code smell":

> 
> -- | creates and returns a list of Blocks, given a file's content.
> blocks :: ByteString -> [Block]
> blocks s = zipWith (Block) offsets sizes
>            where offsets = getOffsets s
>                  sizes   = getSizes offsets
> 
> main :: IO ()
> main = do
>   args <- getArgs
>   content <- B.readFile (args!!0)
>   printf "%s" $ unlines $ map (show) (sort $! blocks content)
> \end{code}

I think the printf "%s" should be replaced by putStr or putStrLn.
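
A small sketch of that replacement, using a stand-in list of Integers instead of the real [Block] (the list is only for illustration):

```haskell
import Data.List (sort)

-- Stand-in for the sorted blocks; the values are made up for illustration.
rendered :: String
rendered = unlines $ map show (sort [3, 1, 2 :: Integer])

main :: IO ()
main =
  -- unlines already ends every line with '\n', so putStr is enough here;
  -- putStrLn would add one extra blank line at the end.
  putStr rendered
```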

The printf forces the unlines, which forces the map, which forces the result
of sort.

The ($!) is nearly pointless...it forces only the first cons (:) cell in the list.
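
To see how little ($!) does, here is a minimal sketch with a made-up list whose tail is undefined. If ($!) forced anything beyond the first cons cell, this would crash:

```haskell
-- Illustrative list: the first cons cell exists, but the tail is undefined.
xs :: [Integer]
xs = 1 : undefined

-- ($!) evaluates xs only to weak head normal form, i.e. the first (:) cell.
-- Neither the element 1 nor the undefined tail is touched.
result :: String
result = const "ok" $! xs

main :: IO ()
main = putStrLn result
```

So (sort $! blocks content) only guarantees that the first Block cell has been produced before sort starts, which sort would have demanded immediately anyway.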

The sort starts comparing the output of blocks by applying compare.  The compare 
forces the snd part of the pair from the zipWith, which is the sizes.  The size 
values force the values in the offsets in the fst part of the pair.

The fst part of the pair was actually a lazy thunk returned by the 
C8.readInteger function.  But these do not build up, since they are indirectly 
forced during the sorting routine.

Hmmm....no quick fix.

-- 
Chris