[Haskell-cafe] operating on a hundred files at once

Bulat Ziganshin bulat.ziganshin at gmail.com
Mon Apr 9 14:24:32 EDT 2007


Hello Jefferson,

Monday, April 9, 2007, 9:34:12 PM, you wrote:

If you have enough memory available, the fastest way is to read each file
into memory with ByteString, convert it into an array of Doubles, and
repeat this for every file; only then perform your computations. If you
try to read all 100 files simultaneously, you may run into excessive
disk seeking or CPU cache thrashing.
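
For example, the per-file read might look like this (an untested sketch;
it assumes comma-separated numeric tables, and readTable is just a name
I picked):

    import qualified Data.ByteString.Char8 as B

    -- Strictly read one comma-separated file into a table of Doubles.
    -- B.readFile slurps the whole file at once, so no handle stays open.
    readTable :: FilePath -> IO [[Double]]
    readTable path = do
        bytes <- B.readFile path
        return [ [ read (B.unpack cell) | cell <- B.split ',' row ]
               | row <- B.lines bytes ]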

... even better, read one file, add its values to the accumulators, then
read the next file, and so on; that way only one table is in memory at
any moment.
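
Concretely, per-cell running sums and sums of squares are enough to
recover the mean and (population) variance at the end. Together with
readTable above, an untested sketch (the helper names and formulas are
mine, not from your post):

    import Control.Monad (foldM)
    import System.Environment (getArgs)

    -- (sum, sum of squares) for every cell, updated one file at a time.
    type Acc = [[(Double, Double)]]

    addTable :: Acc -> [[Double]] -> Acc
    addTable = zipWith (zipWith (\(s, q) x -> (s + x, q + x * x)))

    main :: IO ()
    main = do
        p : ps <- getArgs              -- expects at least one file
        t0 <- readTable p              -- readTable from the sketch above
        let acc0 = map (map (\x -> (x, x * x))) t0
        acc <- foldM (\a f -> fmap (addTable a) (readTable f)) acc0 ps
        let n     = fromIntegral (length ps + 1)
            means = map (map (\(s, _) -> s / n)) acc
            vars  = map (map (\(s, q) -> q / n - (s / n) ^ 2)) acc
        print means
        print vars

    -- note: for many files you would also want to force the accumulator
    -- each step (e.g. with deepseq) so unevaluated thunks don't pile up.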


> I have a series of NxM numeric tables I'm doing a quick
> mean/variance/t-test etcetera on.  The cell t1 [i,j] corresponds exactly
> to the cells t2..N [i,j], and so it's perfectly possible to read one
> item at a time from each of the 100 files and compute the mean/variance
> etcetera on all cells that way.  So what I propose to do is something
> along the lines of:

> openAndProcess filename = 
> f <- readFile filename
> return (map (L.split ',') . lines $ f)

> main = do 
>         fs <- getArgs
>         let items = map (map read) . map openAndProcess fs 
>         in do print . map (map $ mean) items
>               print . map (map $ variance) items

> How close am I to doing the right thing here? As I understand it, this
> will result in one hundred IO [String] instances being returned by the
> call to (map openAndProcess $ filenames).  Do I need to do something
> special to lift (read), (mean), and (variance), or even (map) into the
> IO monad so they can process the input as needed?
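
(To answer the lifting question: you don't lift mean or variance into
IO. You run the IO actions first, e.g. with mapM, and then apply the
pure functions to the results. An untested sketch, with your
openAndProcess repaired and stand-ins for L.split, mean, and variance,
since I don't know which library they come from:)

    import System.Environment (getArgs)

    -- your openAndProcess with the missing 'do' added; 'split' is a
    -- stand-in for whatever L.split refers to in your code
    openAndProcess :: FilePath -> IO [[String]]
    openAndProcess filename = do
        f <- readFile filename
        return (map (split ',') (lines f))
      where
        split c s = case break (== c) s of
            (cell, _ : rest) -> cell : split c rest
            (cell, [])       -> [cell]

    -- stand-in definitions; use your real mean/variance here
    mean, variance :: [Double] -> Double
    mean xs     = sum xs / fromIntegral (length xs)
    variance xs = mean [ (x - m) ^ 2 | x <- xs ] where m = mean xs

    main :: IO ()
    main = do
        fs     <- getArgs
        tables <- mapM openAndProcess fs        -- run the 100 IO actions
        let items = map (map (map read)) tables :: [[[Double]]]
        print (map (map mean) items)            -- per-row means, per file
        print (map (map variance) items)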

> Thanks in advance,
> -- Jeff




-- 
Best regards,
 Bulat                            mailto:Bulat.Ziganshin at gmail.com


