[Haskell-cafe] Nice way to calculate character frequency in a string

Charles SDudu iwin_1 at hotmail.com
Tue Oct 25 12:49:17 EDT 2005


It seems like the fastest way to build a list of frequency from a string is 
this :
import Data.Array (assocs, accumArray, Array)
frequency :: String -> [(Char,Int)]
frequency = filter (\f -> snd f>0) . assocs . accumArray (+) 0 ('\0', 
'\255') . map (\x->(x,1))
-- (~1.3 sec on a string of 700ko)

But if we want to make this polymorphic we have to do this :
import Data.Ix (Ix)
frequency :: (Ix a, Bounded a) => [a] -> [(a,Int)]
frequency = filter (\f -> snd f>0) . assocs . accumArray (+) 0 (minBound, 
maxBound) . map (\x->(x,1))
-- (>5 sec on a string of 700ko)
but it's really much slower, it seems like min/maxBound are check each time 
or I dont know, and it's restrictive on the type of the elements of the 
list.

The best polymorphic version seems to be :
import Data.Map as Map (lookup, assocs, empty, insertWith)
frequency :: (Ord a) => [a] -> [(a,Int)]
frequency xs = Map.assocs $ foldl f Map.empty xs
	where f m x =	let
					m' = Map.insertWith (+) x 1 m
					Just v = Map.lookup x m'
					in seq v m'
-- (~2.1 sec on a string of 700ko)
Which is not as fast as the specialized version on string, but really faster 
and less restrictive than the second.

Thanks everyone for all your answers !
Charles




More information about the Haskell-Cafe mailing list