[Haskell-cafe] Nice way to calculate character frequency in a string

Charles SDudu iwin_1 at hotmail.com
Tue Oct 25 05:40:22 EDT 2005


Hello, I need to calculate the frequency of each character in a String. And 
if I can do this really well in C, I dont find a nice (and fast) answer in 
haskell. I tried several functions, listed below, and even the fastest do a 
lot of unnecessary things :

calc :: String -> [ (Char, Int) ]

-- 3.0s normally (without profiling)
-- time 10-12% alloc 59% (info from profiling)
-- so it's the fastest when I profile but not when I compile normally
-- mutable array may be better but it's to complicated for me

calc =	filter (\p -> snd p > 0) . assocs .
		foldl (\k c -> unsafeReplace k [(fromEnum c, (unsafeAt k (fromEnum c))+1)] 
) k
		where k = array (toEnum 0, toEnum 255) [(toEnum i, 0) | i <- [0 .. 255]] 
:: UArray Char Int


-- 2.1s normally
-- time 15-19% alloc 40% (info from profiling)
-- so for true, it's the best but the sort and group probably do unnecessary 
things
calc s = map (\l -> (head l, length l)) $ group $ sort s

-- 3.4s normally
-- time 58% alloc 0% (info from profiling)
-- this one dont do unnecessary things but has to read the file again for 
each character
-- calc s = map (\c -> (c, foldl (\a b -> if b==c then a+1 else a) 0 s)) $ 
nub s

-- 22s normally
-- time 85% alloc 92% (info from profiling)
-- this one read the file only one time but is really slow
calc = foldl (addfreq) []
	where addfreq f c =	let
						xs1 = takeWhile (\f -> fst f /= c) f
						xs2 = dropWhile (\f -> fst f /= c) f
						xs = if null xs2 then [(c,1)] else ((fst . head) xs2, (snd . head) xs2 
+ 1) : tail xs2
			 			in  xs1 ++ xs

-- I have a lot of even slower version but I wont include them
-- each compilation was done with GHC 6.4.1 with the -O flag and with -O 
-prof -auto-all for profiling

Thanks for your answer,
Charles

PS : Yes, english is not my mother language... :-(




More information about the Haskell-Cafe mailing list