Hello Daniel:<br><br>Thanks! <br>I used mapM'_ but I am still getting the space leak.<br>Any other hints? <br><br><br><br>Arnoldo<br><br><div class="gmail_quote">On Wed, Mar 10, 2010 at 10:40 PM, Daniel Fischer <span dir="ltr">&lt;<a href="mailto:daniel.is.fischer@web.de">daniel.is.fischer@web.de</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">On Wednesday, 10 March 2010 at 21:45:56, Arnoldo Muller wrote:<br>
<div class="im">> Hello,<br>
><br>
> I am learning Haskell and I found a space leak that I find difficult to<br>
> solve. I've been asking on #haskell, but we could not solve<br>
> the issue.<br>
><br>
> I want to lazily read a set of 22 files of about 200MB each, filter them,<br>
> and then write the result to a single output file.<br>
> If I modify the main function to work only with one input file, the<br>
> program runs without issues. I will call this version A.<br>
> Version B uses a mapM_ to iterate over a list of filenames and uses<br>
> appendFile to output the result of filtering each file.<br>
> In this case memory usage grows sharply and quickly (profiles show<br>
> constant memory growth). In less than a minute, memory<br>
> usage makes my system hang from swapping.<br>
<br>
</div>No work is done until the end, when everything is attempted at once.<br>
Make sure genomeExecute ... input1 has actually finished<br>
its work before genomeExecute ... input2 starts, and so on.<br>
<br>
One way is to use a stricter version of sequence_,<br>
<br>
sequence'_ :: Monad m => [m a] -> m ()<br>
sequence'_ (x:xs) = do<br>
&nbsp;&nbsp;&nbsp;&nbsp;a <- x<br>
&nbsp;&nbsp;&nbsp;&nbsp;a `seq` sequence'_ xs<br>
sequence'_ [] = return ()<br>
<br>
(nicer with BangPatterns, but not portable), and<br>
<br>
mapM'_ f = sequence'_ . map f<br>
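<br>
For comparison, a minimal sketch of the BangPatterns variant mentioned above. It assumes GHC with the BangPatterns extension enabled; the binder name _r is only illustrative. Like the seq version, it evaluates each result only to weak head normal form, so for actions that return () it adds very little strictness.<br>

```haskell
{-# LANGUAGE BangPatterns #-}

-- Run each action in order, evaluating its result to weak head normal
-- form (WHNF) before the next action starts.
sequence'_ :: Monad m => [m a] -> m ()
sequence'_ []     = return ()
sequence'_ (x:xs) = do
    !_r <- x          -- bang pattern: force the result to WHNF
    sequence'_ xs

-- Strict counterpart of mapM_, built on sequence'_.
mapM'_ :: Monad m => (a -> m b) -> [a] -> m ()
mapM'_ f = sequence'_ . map f
```

It is used exactly like mapM_, for example mapM'_ (genomeExecute output wSize filterWindow) fullNames.<br>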
<br>
Another option is making genomeExecute itself stricter.<br>
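<br>
One possible reading of "stricter", as a rough sketch only: open both files with withFile so that each input is read and written completely, and both handles are closed, before the next file is touched. The name genomeExecuteStreaming and the extract parameter are hypothetical; extract stands in for the fastaExtractor call from the quoted program, whose definition is not shown in this thread. Whether this removes the leak depends on that function not holding on to its input.<br>

```haskell
import System.IO

-- Hypothetical variant of genomeExecute: 'extract' stands in for
-- (\s -> fastaExtractor s windowSize f) from the quoted program.
genomeExecuteStreaming :: FilePath -> (String -> String) -> FilePath -> IO ()
genomeExecuteStreaming outputFile extract inputFile =
    withFile inputFile ReadMode $ \hIn ->
        withFile outputFile AppendMode $ \hOut -> do
            -- Lazy read; the contents are consumed as hPutStr writes them.
            fileData <- hGetContents hIn
            -- hPutStr returns only after the whole string has been written,
            -- so the input is fully consumed before either handle is closed.
            hPutStr hOut (extract fileData)
```

Under these assumptions the call in main would become mapM_ (genomeExecuteStreaming output (\s -> fastaExtractor s wSize filterWindow)) fullNames.<br>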
<div><div></div><div class="h5"><br>
><br>
> This is version B:<br>
><br>
> ------------------------------- Program B -------------------------------<br>
> import Data.List<br>
> import System.Environment<br>
> import System.Directory<br>
> import Control.Monad<br>
><br>
><br>
> -- different types of chromosomes<br>
> data Chromosome = C1<br>
> | C2<br>
> | C3<br>
> | C4<br>
> | C5<br>
> | C6<br>
> | C7<br>
> | C8<br>
> | C9<br>
> | C10<br>
> | C11<br>
> | C12<br>
> | C13<br>
> | C14<br>
> | C15<br>
> | C16<br>
> | C17<br>
> | C18<br>
> | C19<br>
> | CX<br>
> | CY<br>
> | CMT<br>
> deriving (Show)<br>
> -- define a window<br>
> type Sequence = [Char]<br>
> -- Window data<br>
> data Window = Window { sequen :: Sequence,<br>
> chrom :: Chromosome,<br>
> pos :: Int<br>
> }<br>
> -- print a window<br>
> instance Show Window where<br>
> show w = (sequen w) ++ "\t" ++ show (chrom w) ++ "\t" ++ show (pos w)<br>
><br>
> -- Reading fasta files with haskell<br>
><br>
> -- Initialize the<br>
> main = do<br>
> -- get the arguments (input is<br>
> [input, output, windowSize] <- getArgs<br>
> -- get directory contents (only names)<br>
> names <- getDirectoryContents input<br>
> -- prepend directory<br>
> let fullNames = filter isFastaFile $ map (\x -> input ++ "/" ++ x) names<br>
> let wSize = (read windowSize)::Int<br>
> -- process the directories<br>
> mapM (genomeExecute output wSize filterWindow) fullNames<br>
><br>
><br>
> -- read the files one by one and write them to the output file<br>
> genomeExecute :: String -> Int -> (Window -> Bool) -> String -> IO ()<br>
> genomeExecute outputFile windowSize f inputFile = do<br>
> fileData <- readFile inputFile<br>
> appendFile outputFile $ fastaExtractor fileData windowSize f<br>
<br>
<br>
</div></div></blockquote></div><br>