[Haskell-cafe] Re: Lazy IO and closing of file handles

Pete Kazmier pete-expires-20070513 at kazmier.com
Mon Mar 19 12:13:27 EDT 2007


"Matthew Brecknell" <haskell at brecknell.org> writes:

> Pete Kazmier:
>> I attempted to read Oleg's fold-stream implementation [1] as this
>> sounds quite appealing to me, but I was completely overwhelmed,
>> especially with all of the various type signatures used.  It would be
>> great if one of the regular Haskell bloggers (Tom Moertel are you
>> reading this?) might write a blog entry or two interpreting his
>> implementation for those of us starting out in Haskell perhaps by
>> starting out with a non-polymorphic version so as to emphasize the
>> approach.
>> 
>> [1] http://okmij.org/ftp/Haskell/fold-stream.lhs
>
> The basic idea of the paper is the use of a left-fold operator as the
> primary interface for enumarating collections. The recursive version
> (less general than the non-recursive version) of a left-fold operator
> for enumerating the lines of a text file might look something like this:
>
>> import Control.Monad.Fix
>> import Control.Exception
>> import Data.List
>> import qualified Data.Set as S
>> import qualified Data.Map as M
>> import System.IO
>> 
>> enumLines :: (a -> String -> Either a a) -> a -> FilePath -> IO a
>> enumLines iter accum filename = do
>>   h <- openFile filename ReadMode
>>   flip fix accum $
>>     \iterate accum -> do
>>       try_line <- try (hGetLine h)
>>       case try_line of
>>         Left e -> hClose h >> return accum
>>         Right line -> do
>>           case iter accum line of
>>             Left accum -> hClose h >> return accum
>>             Right accum -> iterate accum

I understand the intent of this code, but I am having a hard time
understanding the implementation, specifically the combination of
'fix', 'flip', and 'interate'.  I looked up 'fix' and I'm unsure how
one can call 'flip' on a function that takes one argument.

> To use this, you provide an "iteratee", a function which takes an
> accumulator and a line from the file, and returns a new accumulator
> embedded in an Either. Using the Left branch causes immediate
> termination of the enumeration. For example, to search for the first
> occurrence of each of a set of email headers:
>
>> getHeaders :: S.Set String -> FilePath -> IO (S.Set String, M.Map String String)
>> getHeaders hdrs file = enumLines findHdrs (hdrs,M.empty) file where
>>   findHdrs accum@(wanted,found) line =
>>     if null line
>>       then Left accum
>>       else
>>         case headerLine line of
>>           Nothing -> Right accum
>>           Just hdr ->
>>             case findDelete hdr wanted of
>>               Nothing -> Right accum
>>               Just wanted ->
>>                 let accum = (wanted, M.insert hdr line found) in
>>                   if S.null wanted
>>                     then Left accum
>>                     else Right accum
>> 
>> headerLine :: String -> Maybe String
>> headerLine (':':xs) = Just []
>> headerLine (x:xs) = fmap (x:) (headerLine xs)
>> headerLine [] = Nothing
>> 
>> findDelete :: Ord a => a -> S.Set a -> Maybe (S.Set a)
>> findDelete e s = if S.member e s
>>   then Just (S.delete e s)
>>   else Nothing
>
> It's a bit of a case-analysis nightmare, but when comparing this to
> previous approaches, note that file traversal and processing are cleanly
> separated, file handle closure is guaranteed to be timely, file
> traversal stops as soon as all the required headers have been found,
> memory usage is minimised.

Very nice.  I like the clean separation, but as you say, its one ugly
bit of code compared to my original code, although much more elegant
no doubt.

> I hope that helps.

Very much so.  Thank you for you help.



More information about the Haskell-Cafe mailing list