[Haskell-cafe] Nice way to calculate character frequency in a string

Sebastian Sylvan sebastian.sylvan at gmail.com
Tue Oct 25 13:42:54 EDT 2005


On 10/25/05, Charles SDudu <iwin_1 at hotmail.com> wrote:
> It seems like the fastest way to build a list of frequency from a string is
> this :
> import Data.Array (assocs, accumArray, Array)
> frequency :: String -> [(Char,Int)]
> frequency = filter (\f -> snd f>0) . assocs . accumArray (+) 0 ('\0',
> '\255') . map (\x->(x,1))
> -- (~1.3 sec on a string of 700ko)
>
> But if we want to make this polymorphic we have to do this :
> import Data.Ix (Ix)
> frequency :: (Ix a, Bounded a) => [a] -> [(a,Int)]
> frequency = filter (\f -> snd f>0) . assocs . accumArray (+) 0 (minBound,
> maxBound) . map (\x->(x,1))
> -- (>5 sec on a string of 700ko)
> but it's really much slower, it seems like min/maxBound are check each time
> or I dont know, and it's restrictive on the type of the elements of the
> list.

Most likely maxBound is waaaaaay larger than '\255'. On my system it's
'\1114111'. So it's not really a fair comparison. Perhaps the
frequency function should just take the bounds as a parameter?
Also, you may want to look into making the array you're creating
unboxed (should work by just importing Data.Array.Unboxed instead, or
giving the type of the result of accumArray "UArray a Int"
explicitly). On my system this gave about 40% better performance on
some random text file I found.

Also, you may use STArrays (I think they come in unboxed as well) for
stateful code, which may be even faster (unless accumArray does some
neat trick to make it O(m) where m is the number of index/value
pairs).

/S

--
Sebastian Sylvan
+46(0)736-818655
UIN: 44640862


More information about the Haskell-Cafe mailing list