[Haskell-cafe] Space leak

Arnoldo Muller arnoldomuller at gmail.com
Wed Mar 10 17:01:28 EST 2010


Hello Daniel:

Thanks!
I employed mapM'_ but I am still getting the space leak.
Any other hint?



Arnoldo

On Wed, Mar 10, 2010 at 10:40 PM, Daniel Fischer
<daniel.is.fischer at web.de>wrote:

> Am Mittwoch 10 März 2010 21:45:56 schrieb Arnoldo Muller:
> > Hello,
> >
> > I am learning haskell and I found a space leak that I find difficult to
> > solve. I've been asking at #haskell but we could not solve
> > the issue.
> >
> > I want to lazily read a set of 22 files of about 200MB each, filter them
> > and then I want to output the result into a unique file.
> > If I modify the main function to work only with one input file,  the
> > program runs without issues. I will call this version A.
> > Version B  uses a mapM_ to iterate over a list of filenames and uses
> > appendFile to output the result of filtering each file.
> > In this case the memory usage grows sharply and quickly (profiles show
> > constant memory growth). In less than a minute, memory
> > occupation will make my system hang with swapping.
>
> No work is been done until the end, when all is tried to be done
> simultaneously. Make sure genomeExecute ... input1 has actually finished
> its work before genomeExecute ... input2 starts etc.
>
> One way is to use a stricter version of sequence_,
>
> sequence'_ :: Monad m => [m a] -> m ()
> sequence'_ (x:xs) = do
>    a <- x
>    a `seq` sequence'_ xs
> sequence'_ [] = return ()
>
> (nicer with BangPatterns, but not portable), and
>
> mapM'_ f = sequence'_ . map f
>
> Another option is making genomeExecute itself stricter.
>
> >
> > This is version B:
> >
> > ------------------------------- Program B
> > ------------------------------------------------------------------------
> >-------------------------------------------- import Data.List
> > import System.Environment
> > import System.Directory
> > import Control.Monad
> >
> >
> > -- different types of chromosomes
> > data Chromosome =    C1
> >
> >                 | C2
> >                 | C3
> >                 | C4
> >                 | C5
> >                 | C6
> >                 | C7
> >                 | C8
> >                 | C9
> >                 | C10
> >                 | C11
> >                 | C12
> >                 | C13
> >                 | C14
> >                 | C15
> >                 | C16
> >                 | C17
> >                 | C18
> >                 | C19
> >                 | CX
> >                 | CY
> >                 | CMT
> >
> >                   deriving (Show)
> > -- define a window
> > type Sequence = [Char]
> > -- Window data
> > data Window = Window { sequen :: Sequence,
> >                        chrom :: Chromosome,
> >                        pos   :: Int
> >                      }
> > -- print a window
> > instance Show Window where
> >     show w =  (sequen w) ++ "\t" ++ show (chrom w) ++ "\t" ++ show (pos
> > w)
> >
> > -- Reading fasta files with haskell
> >
> > -- Initialize the
> > main = do
> >        -- get the arguments (intput is
> >        [input, output, windowSize] <- getArgs
> >        -- get directory contents (only names)
> >        names <- getDirectoryContents input
> >        -- prepend directory
> >        let fullNames = filter isFastaFile $ map (\x -> input ++ "/" ++
> > x) names
> >        let wSize = (read windowSize)::Int
> >        -- process the directories
> >        mapM (genomeExecute output wSize filterWindow)  fullNames
> >
> >
> > -- read the files one by one and write them to the output file
> > genomeExecute :: String -> Int -> (Window -> Bool) -> String -> IO ()
> > genomeExecute  outputFile windowSize f inputFile = do
> >   fileData <- readFile inputFile
> >   appendFile outputFile $ fastaExtractor fileData windowSize f
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20100310/f0b585d8/attachment.html


More information about the Haskell-Cafe mailing list