[Haskell-cafe] Re: Strange space leak

apfelmus apfelmus at quantentunnel.de
Mon Jul 14 08:08:43 EDT 2008


Grzegorz Chrupala wrote:
> Hi all,
> I just noticed that a tiny change to the program which I posted recently in
> the "More idiomatic use of strictness" thread causes a space leak.
>
> The code is:
> {-# LANGUAGE BangPatterns, PatternGuards #-}
> import Data.List (foldl')
> import Data.Char
> split delim s
> 	| [] <- rest = [token]
> 	| otherwise = token : split delim (tail rest)
>    where (token,rest) = span (/=delim) s
> 
> main = do
>   putStrLn =<< fmap (show . stats ["the","a","and"] . split "<DOC>" . words)
> getContents
> 
> stats ws docs =  foldl' f ((map (const 0) ws),0) docs
>     where f (dfs,n) d = let dfs' = zipWith (\w df -> (df + fromEnum (w
> `elem` d))) ws dfs
>                         in  sum dfs' `seq` (dfs',n+1)
> 
> If I change this line:
>   putStrLn =<< fmap (show . stats ["the","a","and"] . split "<DOC>" . words)
> getContents
> to this:
>   putStrLn =<< fmap (show . stats ["the","a","and"] . split "<DOC>" . words
> .. map toLower) getContents
> 
> suddenly the programs starts using tons of memory instead of running in
> small constant space.
> What's going on?

Answer:

   split "<DOC>" . words . map toLower = (:[]) . words . map toLower

Since you converted everything to lowercase, the string "<DOC>" will 
never appear in the text, resulting in a single huge document. 
Furthermore, due to  `elem` d , your  stats  function takes space 
proportional to the length of each document it processes.


Beauty & makeup tips:

     putStrLn =<< fmap f getContents
   = putStrLn . f =<< getContents
  ~= interact f

Here's a version with glittering nail polish that should run in constant 
space:

   split y xs = zs : case xs' of
           []    -> []
           _:xs' -> split y xs'
       where (zs,xs') = break (==y) xs

   main = interact $
       show . stats ["the","a","and"] . split "<DOC>" . words

   zipWith' f xs ys = zipWith f xs ys `using` rnf

   stats ws = foldl' (zipWith' (+)) zero
       . map (foldl' (zipWith' max) zero . map bits)
       where
       zero   = map (const 0) ws
       bits v = map (fromEnum . (== v)) ws


Regards,
apfelmus



More information about the Haskell-Cafe mailing list