[Haskell-beginners] space leak processing multiple compressed files

Ian Knopke ian.knopke at gmail.com
Tue Sep 4 15:34:13 CEST 2012


Hi Lorenzo,

You're right, well spotted! I must have introduced that while copying and
pasting. The program is basically as you suggested. Here's a corrected
version:

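import qualified Data.ByteString.Lazy.Char8 as B
import qualified Codec.Compression.GZip as Z  -- assuming gzip files; Codec.Compression.Zlib for raw zlib streams
import Data.List (foldl')

-- getFileList, display and countItems are defined elsewhere (not shown)

main = do

    -- get a list of file names
    filelist <- getFileList "testsetdir"

    -- read (lazily) and decompress each file
    files <- mapM (\x -> do
                            thisfile <- B.readFile x
                            return (Z.decompress thisfile)
                    ) filelist

    display $ processEntries files

    putStrLn "finished"

-- processEntries
-- countItems (defined elsewhere) does some string processing per line
-- and counts the number of resulting elements; processEntries adds up
-- those per-file counts
processEntries :: [B.ByteString] -> Int
processEntries xs = foldl' (\x y -> x + countItems (B.lines y)) 0 xs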

I'm still running into memory issues, though. I suspect the mapM loop
above: each file doesn't seem to be released after it has been read.
Does that seem plausible, and is there a better way to write this?
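A possible restructuring, sketched under assumptions: with mapM, every B.readFile runs before processEntries starts, so all the files are opened up front and their decompressed contents are only consumed afterwards. The sketch below instead reads, decompresses and counts one file at a time, forcing each count with evaluate before moving on, so a file's contents can be garbage-collected before the next one is opened. countFile and countAll are hypothetical names, the decompression function is passed in as a parameter (e.g. Z.decompress), and length . BL.lines is only a stand-in for the real per-file counting (countItems).

```haskell
import Control.Exception (evaluate)
import Control.Monad (foldM)
import qualified Data.ByteString.Lazy.Char8 as BL

-- Hypothetical helper: count the lines of one file after applying a
-- decompression function (e.g. Codec.Compression.GZip.decompress).
-- 'evaluate' forces the count here, so the whole file is consumed
-- before the next file is opened. 'length . BL.lines' stands in for
-- the real per-file counting.
countFile :: (BL.ByteString -> BL.ByteString) -> FilePath -> IO Int
countFile decomp path = do
  raw <- BL.readFile path
  evaluate (length (BL.lines (decomp raw)))

-- Process the files one after another with a strict accumulator,
-- instead of first collecting all decompressed contents with mapM.
countAll :: (BL.ByteString -> BL.ByteString) -> [FilePath] -> IO Int
countAll decomp = foldM step 0
  where
    step acc path = do
      n <- countFile decomp path
      let acc' = acc + n
      acc' `seq` return acc'
```

In main this would replace both the mapM loop and the call to processEntries, e.g. `total <- countAll Z.decompress filelist`.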


Ian



... and countItems also uses foldl'.
On Tue, Sep 4, 2012 at 1:55 PM, Lorenzo Bolla <lbolla at gmail.com> wrote:
> On Tue, Sep 4, 2012 at 11:00 AM, Ian Knopke <ian.knopke at gmail.com> wrote:
>> main = do
>>
>>     -- get a list of file names
>>     filelist <- getFileList "testsetdir"
>>
>>     -- process each compressed file
>>     files <- mapM (\x -> do
>>                             thisfile <- B.readFile x
>>                             return (Z.decompress thisfile)
>>                     ) filelist
>>
>>
>>     display $ processEntries files
>>
>>
>>     putStrLn "finished"
>>
>> -- processEntries
>> -- processEntries is defined elsewhere, but basically does some string
>> -- processing per line, counts the number of resulting elements and
>> -- sums them per file
>> processEntries :: [B.ByteString] -> Int
>> processEntries xs = foldl' (\x y -> x + processEntries (B.lines y)) 0 xs
>
> The problem seems to be your `processEntries` function: it is
> recursively defined, and as far as I understand, it's never going to
> end because "y" (inside the lambda function) is always going to be the
> full list of files (xs).
>
> Probably, `processEntries` should be something like:
>
> processEntries = foldl' (\acc fileContent ->
>                            acc + processFileContent fileContent) 0
>
> processFileContent :: B.ByteString -> Int
> processFileContent = undefined -- count what you have to, in a file
>
> In fact, processEntries could be rewritten without using foldl':
>
> processEntries = sum . map processFileContent
>
> hth,
> L.
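For reference, both of Lorenzo's suggestions can be made runnable as below. This is a sketch under assumptions: the file contents are taken to be lazy ByteStrings (as returned by Z.decompress), and processFileContent simply counts lines with length . BL.lines as a hypothetical stand-in for Ian's countItems.

```haskell
import Data.List (foldl')
import qualified Data.ByteString.Lazy.Char8 as BL

-- Hypothetical per-file counter standing in for countItems:
-- here it just counts the lines of one decompressed file.
processFileContent :: BL.ByteString -> Int
processFileContent = length . BL.lines

-- First suggestion: a strict left fold over the files that calls the
-- per-file function instead of recursing into processEntries.
processEntries :: [BL.ByteString] -> Int
processEntries = foldl' (\acc fileContent -> acc + processFileContent fileContent) 0

-- Second suggestion: the same total without an explicit fold.
processEntries' :: [BL.ByteString] -> Int
processEntries' = sum . map processFileContent
```

Both versions give the same total; the key change from the original is that the lambda calls a separate per-file function rather than processEntries itself.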


