Hello Daniel,<br><br>Thanks!<br>I tried mapM&#39;_, but I am still getting the space leak.<br>Any other hints?<br><br>Arnoldo<br><br><div class="gmail_quote">On Wed, Mar 10, 2010 at 10:40 PM, Daniel Fischer <span dir="ltr">&lt;<a href="mailto:daniel.is.fischer@web.de">daniel.is.fischer@web.de</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">On Wednesday, 10 March 2010, 21:45:56, Arnoldo Muller wrote:<br>

<div class="im">&gt; Hello,<br>

&gt;<br>

&gt; I am learning Haskell and I found a space leak that I find difficult to<br>

&gt; solve. I&#39;ve been asking on #haskell, but we could not solve<br>

&gt; the issue.<br>

&gt;<br>

&gt; I want to lazily read a set of 22 files of about 200MB each, filter them,<br>

&gt; and then write the result into a single file.<br>

&gt; If I modify the main function to work only with one input file,  the<br>

&gt; program runs without issues. I will call this version A.<br>

&gt; Version B  uses a mapM_ to iterate over a list of filenames and uses<br>

&gt; appendFile to output the result of filtering each file.<br>

&gt; In this case the memory usage grows sharply and quickly (profiles show<br>

&gt; constant memory growth). In less than a minute, memory<br>

&gt; usage makes my system hang from swapping.<br>

<br>

</div>No work is done until the end, when everything is attempted at<br>

once. Make sure genomeExecute ... input1 has actually finished<br>

its work before genomeExecute ... input2 starts, and so on.<br>

<br>

One way is to use a stricter version of sequence_,<br>

<br>

sequence&#39;_ :: Monad m =&gt; [m a] -&gt; m ()<br>

sequence&#39;_ (x:xs) = do<br>

    a &lt;- x<br>

    a `seq` sequence&#39;_ xs<br>

sequence&#39;_ [] = return ()<br>

<br>

(nicer with BangPatterns, but not portable), and<br>

<br>

mapM&#39;_ f = sequence&#39;_ . map f<br>
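<br>
For comparison, here is a minimal sketch of the same two helpers written with BangPatterns (a GHC extension, hence the portability caveat). The bang pattern replaces the explicit seq; the behaviour is meant to be the same as the definitions above.<br>

```haskell
{-# LANGUAGE BangPatterns #-}

-- Run each action in turn and evaluate its result to weak head normal
-- form (WHNF) before starting the next action.
sequence'_ :: Monad m => [m a] -> m ()
sequence'_ []     = return ()
sequence'_ (x:xs) = do
    !_ <- x            -- bang pattern: force the result to WHNF
    sequence'_ xs

-- Strict counterpart of mapM_, built on sequence'_.
mapM'_ :: Monad m => (a -> m b) -> [a] -> m ()
mapM'_ f = sequence'_ . map f
```

In the quoted program this would be used as mapM&#39;_ (genomeExecute output wSize filterWindow) fullNames. Note that the result is only forced to WHNF, so for an action returning () this evaluates nothing beyond the unit value itself.<br>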

<br>

Another option is making genomeExecute itself stricter.<br>
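<br>
As a rough sketch of that second option, one possibility is to scope both file handles with withFile, so that each input is fully consumed and both handles are closed before the next file is opened. The name genomeExecute&#39; and the extract parameter are hypothetical: extract stands in for the fastaExtractor call with the window size and filter already applied, since fastaExtractor itself is not shown in this thread.<br>

```haskell
import System.IO (IOMode (AppendMode, ReadMode), hGetContents, hPutStr, withFile)

-- Hypothetical stricter variant of genomeExecute.
-- `extract` is an assumed stand-in for
--   \fileData -> fastaExtractor fileData windowSize f
genomeExecute' :: FilePath -> (String -> String) -> FilePath -> IO ()
genomeExecute' outputFile extract inputFile =
    withFile inputFile ReadMode $ \hIn ->
        withFile outputFile AppendMode $ \hOut -> do
            -- The contents are still read lazily ...
            fileData <- hGetContents hIn
            -- ... but hPutStr consumes the whole string before returning,
            -- so the input is fully read while hIn is still open.
            hPutStr hOut (extract fileData)
    -- Both handles are closed here, before the caller moves on.
```

This keeps the streaming behaviour of the original, so a whole 200MB file should not need to be resident at once, provided extract itself does not retain its input.<br>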

<div><div></div><div class="h5"><br>

&gt;<br>

&gt; This is version B:<br>

&gt;<br>

&gt; ------------------------------- Program B -------------------------------<br>

&gt; import Data.List<br>

&gt; import System.Environment<br>

&gt; import System.Directory<br>

&gt; import Control.Monad<br>

&gt;<br>

&gt;<br>

&gt; -- different types of chromosomes<br>

&gt; data Chromosome =    C1<br>

&gt;<br>

&gt;                 | C2<br>

&gt;                 | C3<br>

&gt;                 | C4<br>

&gt;                 | C5<br>

&gt;                 | C6<br>

&gt;                 | C7<br>

&gt;                 | C8<br>

&gt;                 | C9<br>

&gt;                 | C10<br>

&gt;                 | C11<br>

&gt;                 | C12<br>

&gt;                 | C13<br>

&gt;                 | C14<br>

&gt;                 | C15<br>

&gt;                 | C16<br>

&gt;                 | C17<br>

&gt;                 | C18<br>

&gt;                 | C19<br>

&gt;                 | CX<br>

&gt;                 | CY<br>

&gt;                 | CMT<br>

&gt;<br>

&gt;                   deriving (Show)<br>

&gt; -- define a window<br>

&gt; type Sequence = [Char]<br>

&gt; -- Window data<br>

&gt; data Window = Window { sequen :: Sequence,<br>

&gt;                        chrom :: Chromosome,<br>

&gt;                        pos   :: Int<br>

&gt;                      }<br>

&gt; -- print a window<br>

&gt; instance Show Window where<br>

&gt;     show w =  (sequen w) ++ &quot;\t&quot; ++ show (chrom w) ++ &quot;\t&quot; ++ show (pos w)<br>

&gt;<br>

&gt; -- Reading fasta files with haskell<br>

&gt;<br>

&gt; -- Initialize the<br>

&gt; main = do<br>

&gt;        -- get the arguments (input is<br>

&gt;        [input, output, windowSize] &lt;- getArgs<br>

&gt;        -- get directory contents (only names)<br>

&gt;        names &lt;- getDirectoryContents input<br>

&gt;        -- prepend directory<br>

&gt;        let fullNames = filter isFastaFile $ map (\x -&gt; input ++ &quot;/&quot; ++ x) names<br>

&gt;        let wSize = (read windowSize)::Int<br>

&gt;        -- process the directories<br>

&gt;        mapM (genomeExecute output wSize filterWindow)  fullNames<br>

&gt;<br>

&gt;<br>

&gt; -- read the files one by one and write them to the output file<br>

&gt; genomeExecute :: String -&gt; Int -&gt; (Window -&gt; Bool) -&gt; String -&gt; IO ()<br>

&gt; genomeExecute  outputFile windowSize f inputFile = do<br>

&gt;   fileData &lt;- readFile inputFile<br>

&gt;   appendFile outputFile $ fastaExtractor fileData windowSize f<br>

<br>

<br>

</div></div></blockquote></div><br>