[Haskell-cafe] How to deal with huge text file?

Magicloud Magiclouds magicloud.magiclouds at gmail.com
Tue May 25 02:12:13 EDT 2010


Yes, this code works with a little hack. Thank you.
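
The hack itself is not shown in this message. One likely reason a tweak was needed: in Daniel's unfoldr version quoted below, after the first break the remaining lines begin with a "Log for " header. At that point break returns an empty chunk together with the same list, so unfoldr keeps producing empty chunks forever. Below is a minimal sketch of one way around that. It assumes the header prefix is "Log for ", as in Daniel's predicate; the original function tested " Log for " with a leading space, so adjust it to match the real file. separateLogs is a hypothetical name:

```haskell
import Data.List (isPrefixOf, unfoldr)

-- Hypothetical name. Splits the text into one chunk of lines per
-- "Log for " header; lines before the first header are dropped,
-- much like the "drop 1" in Daniel's version.
separateLogs :: String -> [[String]]
separateLogs file = unfoldr step logs
  where
    isHeader :: String -> Bool
    isHeader = ("Log for " `isPrefixOf`)

    -- Discard the preamble before the first header.
    (_preamble, logs) = break isHeader (lines file)

    -- Take the header line first, then break on the rest, so every
    -- chunk is non-empty and each step consumes at least one line.
    step :: [String] -> Maybe ([String], [String])
    step [] = Nothing
    step (header : rest) =
      let (body, remaining) = break isHeader rest
      in Just (header : body, remaining)
```

Each chunk is taken straight off the front of the remaining lines instead of being sliced out of the full list by index. A consumer that handles one chunk at a time should therefore not need the whole file in memory at once.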

On Tue, May 25, 2010 at 11:06 AM, Daniel Fischer
<daniel.is.fischer at web.de> wrote:
> On Tuesday 25 May 2010 04:26:07, Ivan Miljenovic wrote:
>> On 25 May 2010 12:20, Magicloud Magiclouds <magicloud.magiclouds at gmail.com> wrote:
>> > This is the function. The problem certainly looks like something is being
>> > retained unexpectedly, but I cannot find where.
>> >
>> > seperateOutput file =
>> >  let content = lines file
>> >      indexOfEachOutput_ = fst $ unzip $ filter (\(i, l) ->
>> >                            " Log for " `isPrefixOf` l) $ zip [0..] content
>> >      indexOfEachOutput = indexOfEachOutput_ ++ [length content] in
>>
>>                                                  ^^^^^^^^^^^^^^^^
>>
>>                                                  Expensive bit
>>
>> >  map (\(a, b) ->
>> >         drop a $ take b content
>> >      ) $ zip indexOfEachOutput $ tail indexOfEachOutput
>>
>> You're not "streaming" the String; you're also keeping it around to
>> calculate the length (I'm also not sure how GHC optimises that, if at
>> all; it might even re-evaluate the length each time you use
>> indexOfEachOutput).
>
> Not that it helps, but it evaluates the length only once, and it does so
> at the very end, when dealing with the last log.
>
>>
>> The zipping of indexOfEachOutput should be OK without that length at
>> the end, as it will lazily construct the zipped list (only evaluating up
>> to two values at a time). However, you'd be better off using "zipWith
>> f" rather than "map f . zip".
>
> There'd still be the problem of
>
> drop a $ take b content
>
> which takes every chunk from the complete content list, so nothing can
> be garbage collected before everything's done.
>
> separateOutpout file =
>    let contents = lines file
>        split = break ("Log for " `isPrefixOf`)
>        msplit [] = Nothing
>        msplit lns = Just (split lns)
>    in drop 1 $ unfoldr msplit contents
>
> should fix it.
>
>
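
Ivan's suggestion to use zipWith f instead of map f . zip can be illustrated on the slicing step of the original function. The following is a minimal sketch with the hypothetical names slicesMapZip and slicesZipWith. Both produce the same chunks, but zipWith applies the function directly rather than first building a list of pairs (GHC may or may not fuse that intermediate list away in the map/zip form anyway). As Daniel points out, both versions still slice every chunk out of the complete list, so this change alone does not let earlier lines be collected.

```haskell
-- Hypothetical helper names. Both cut xs into the segments between
-- consecutive boundary indices, as the quoted function does with
-- "drop a $ take b content". drop 1 is a total stand-in for tail.
slicesMapZip :: [Int] -> [a] -> [[a]]
slicesMapZip is xs = map (\(a, b) -> drop a (take b xs)) (zip is (drop 1 is))

-- Same result, but zipWith applies the function directly instead of
-- first building a list of (a, b) pairs.
slicesZipWith :: [Int] -> [a] -> [[a]]
slicesZipWith is xs = zipWith (\a b -> drop a (take b xs)) is (drop 1 is)
```

For example, slicing "abcdefg" at the indices [0, 2, 5] gives ["ab", "cde"] with either function.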



-- 
Dense bamboo cannot stop the flowing water passing through;
High mountains cannot block the wild clouds in flight.

