[Haskell-cafe] Lazy IO and closing of file handles

Claus Reinke claus.reinke at talk21.com
Tue Mar 20 07:44:56 EDT 2007


>[left-fold operator for enumerating the lines of a text file]
..
>> enumLines :: (a -> String -> Either a a) -> a -> FilePath -> IO a
>> enumLines iter accum filename = do
>>   h <- openFile filename ReadMode
>>   flip fix accum $
>>     \iterate accum -> do
>>       try_line <- try (hGetLine h)
>>       case try_line of
>>         Left e -> hClose h >> return accum
>>         Right line -> do
>>           case iter accum line of
>>             Left accum -> hClose h >> return accum
>>             Right accum -> iterate accum
..
>> getHeaders :: S.Set String -> FilePath -> IO (S.Set String, M.Map String String)
>> getHeaders hdrs file = enumLines findHdrs (hdrs,M.empty) file where

we can keep the Left/Right implicit, using either:

    getHeaders1 hdrs file = enumLines findHdrs (hdrs,M.empty) file
      where
      enumLines iter accum filename = do
        h <- openFile filename ReadMode
        flip fix accum $ \iterate accum -> 
            join $ (fmap (either
                           (const $ hClose h >> return accum)
                           (either ((hClose h >>) . return) iterate . iter accum)))
                         (try (hGetLine h))

or extract the reusable loop-with-exit-by-either functionality:

    loopME m stop continue acc = m >>= either (stop acc) (continue (loopME m stop continue) acc)

    getHeaders2 hdrs file = enumLines findHdrs (hdrs,M.empty) file
      where
      enumLines iter accum f = do
        h <- openFile f ReadMode
        loopME (try (hGetLine h))
               (\acc left->hClose h >> return acc)
               (\loop acc right->either ((hClose h >>) . return) loop (iter acc right))
               accum

or sneak some lazy i/o back in, using a fold-with-exit-by-either, similar to loopME:

    withFile       path m = bracket (openFile path ReadMode) hClose m
    withContentsOf path f = withFile path ((((return $!) . f ) =<<) . hGetContents)
    withLinesOf    path f = withContentsOf path (f . lines)

    foldE f a []     = a
    foldE f a (x:xs) = either id (\a'->foldE f a' xs) (f a x)

    getHeaders3 hdrs file = withLinesOf file (foldE findHdrs (hdrs,M.empty))

> To use this, you provide an "iteratee", a function which takes an
> accumulator and a line from the file, and returns a new accumulator
> embedded in an Either. Using the Left branch causes immediate
> termination of the enumeration. For example, to search for the first
> occurrence of each of a set of email headers:

>>   findHdrs accum@(wanted,found) line =
>>     if null line
>>       then Left accum
>>       else
>>         case headerLine line of
>>           Nothing -> Right accum
>>           Just hdr ->
>>             case findDelete hdr wanted of
>>               Nothing -> Right accum
>>               Just wanted ->
>>                 let accum = (wanted, M.insert hdr line found) in
>>                   if S.null wanted
>>                     then Left accum
>>                     else Right accum
>> 
>> headerLine :: String -> Maybe String
>> headerLine (':':xs) = Just []
>> headerLine (x:xs) = fmap (x:) (headerLine xs)
>> headerLine [] = Nothing
>> 
>> findDelete :: Ord a => a -> S.Set a -> Maybe (S.Set a)
>> findDelete e s = if S.member e s
>>   then Just (S.delete e s)
>>   else Nothing
> 
> It's a bit of a case-analysis nightmare

indeed, this part can be cleaned up considerably, using Monad Maybe:

    findHdrs accum@(wanted,found) line =
      if null line || S.null wanted
        then Left accum
        else maybe (Right accum) id $ do 
                (field,value) <- headerLine line 
                wanted' <- findDelete field wanted
                let found' = M.insert field value found
                return $! (Right $! ((,) $! wanted') $! found')

    headerLine :: String -> Maybe (String,String)
    headerLine xs = do (field,':':value) <- return (span (/=':') xs) 
                       let value' = dropWhile isSpace value
                       return $! ((,) $! field) $! strictly value'

    findDelete :: Ord a => a -> S.Set a -> Maybe (S.Set a)
    findDelete e s = guard (S.member e s) >> return (S.delete e s)

    strictly l = length l `seq` l

running the three variants over a moderately sized directory (>3k emails, one 
including a hugs-tarball;-), 1/2 are roughly equivalent, but Hugs claims that 3
allocates less and needs fewer garbage collections than 1/2, while GHC claims 
that it is the other way round..

claus



More information about the Haskell-Cafe mailing list