[Haskell-cafe] Space leak

Daniel Fischer daniel.is.fischer at web.de
Wed Mar 10 18:24:28 EST 2010


Am Mittwoch 10 März 2010 23:01:28 schrieb Arnoldo Muller:
> Hello Daniel:
>
> Thanks!
> I employed mapM'_ but I am still getting the space leak.
> Any other hint?
>

Hmm, offhand, I don't see why that isn't strict enough.
With some datafiles, I could try to investigate.

One question, how does programme C with

main = do
       [input, output, windowSize] <- getArgs
       let wSize = (read windowSize)::Int
       genomeExecute output wSize filterWindow input

behave? Space leak or not?

But yes, a few other hints I have (though they're not likely to squash the 
space leak).

Generally, ByteString IO  is often orders of magnitude faster than String 
IO and uses much less memory, so using (lazy) ByteStrings is worthy of 
consideration.

>
>
> Arnoldo

-- define a window
type Sequence = [Char]
-- Window data
data Window = Window { sequen :: Sequence,
                       chrom :: Chromosome,
                       pos   :: Int
                     }

-- print a window
instance Show Window where
    show w =  (sequen w) ++ "\t" ++ show (chrom w) ++ "\t" ++ show (pos w)

-- Reading fasta files with haskell

-- Initialize the
main = do
       -- get the arguments (intput is
       [input, output, windowSize] <- getArgs
       -- get directory contents (only names)
       names <- getDirectoryContents input
       -- prepend directory
       let fullNames = filter isFastaFile $ map (\x -> input ++ "/" ++ x) 
names

*********
let fullNames = map ((input ++) . ("/" ++)) $ filter isFastaFile names

saves a little work
*********

       let wSize = (read windowSize)::Int
       -- process the directories
       mapM (genomeExecute output wSize filterWindow)  fullNames


-- read the files one by one and write them to the output file
genomeExecute :: String -> Int -> (Window -> Bool) -> String -> IO ()
genomeExecute  outputFile windowSize f inputFile = do
  fileData <- readFile inputFile
  appendFile outputFile $ fastaExtractor fileData windowSize f

*********
The arguments of fastaExtractor should be reversed, then

genomeExecute outputFile windowSize f inputFile
    = appendFile outputFile . fastaExtractor' f windowSize 
                   =<< readFile inputFile
*********

-- 
isFastaFile :: String -> Bool
isFastaFile fileName = isSuffixOf ".fa" fileName


-- fasta extractor (receives a Fasta String and returns a windowed string
ready to be sorted)
-- an example on how to compose several functions to parse a fasta file
fastaExtractor :: String -> Int -> (Window -> Bool) -> String
fastaExtractor input wSize f = printWindowList $ filter f $ readFasta 
 wSize
input

*********
fastaExtractor' f wSize = printWindowList . filter f . readFasta wSize
*********


-- MAIN FILTER that removes N elements from the strings!
filterWindow :: Window -> Bool
filterWindow w = not (elem 'N' (sequen w))

*********
filterWindow w = 'N' `notElem` sequen w
*********

-- print a window list (the printing makes it ready for output as raw data)
printWindowList :: [Window] -> String
printWindowList l = unlines $ map show l

-- read fasta, remove stuff that is not useful from it
-- removes the
readFasta :: Int -> [Char] -> [Window]
readFasta windowSize sequence =
    -- get the header
    let (header:rest) = lines sequence
        chr = parseChromosome header
        in

-- We now do the following:
--      take window                  create counter
remove newlines
   map (\(i, w) -> Window w chr i) $ zip [0..]  $ slideWindow windowSize  $
filter ( '\n' /= )  $ unlines rest

*********
filter ('\n' /=) . unlines

is odd. What about concat? Or

readFasta wSize chrseq
   = case span (/= '\n') chrseq of
       (header, _:rest) ->
           let chr = parseChromosome header
           in map (\(i,w) -> Window w chr i) . zip [0 .. ] . 
                      slideWindow wSize $ filter (/= '\n') rest
       _ -> []

if your input file format had no other newline than the one between header 
and body, that'd be nice.
*********

slideWindow :: Int -> [Char] -> [[Char]]
slideWindow _ [] = []
slideWindow windowSize l@(_:xs)  = take windowSize l : slideWindow
windowSize xs

*********
slideWindow wSize = map (take wSize) . tails
*********



More information about the Haskell-Cafe mailing list