[Haskell-cafe] Re: Nice way to calculate character frequency in a string

Thomas Sutton thsutton at gmail.com
Tue Oct 25 09:00:40 EDT 2005


On 10/25/05, Jon Fairbairn <Jon.Fairbairn at cl.cam.ac.uk> wrote:
> Well, you can do this:
>
>    import Array
>
>    calc::String->[(Char, Int)]
>
>    calc = filter (\p-> snd p > 0)
> 	  . assocs
> 	  . accumArray (+) 0 (minBound, maxBound)
> 	  . (map (\x->(x,1)))
>
> You can import whatever sort of array you want, so long as
> it has assocs and accumArray.  You don't want to change
> minBound and maxBound to toEnum x, since they are the bounds
> of Char, which might or might not be 0 and 255.

Is there anything wrong with:

	import Data.List
	
	count :: (Ord a) => [a] -> [(a, Int)]
	count  = map (\x -> (head x, length x)) . group . sort

Unless I'm missing something, generating and filtering a list
like \NUL..\111411 (which is what calc does on my machine) is
going to be more expensive than playing around with lists for
all but the longest strings. As far as I can tell, count is
O(n log n) in the length of the string whereas calc has such
a massive constant (producing and filtering the list of
\NUL..\111411, most of which will be 0's), that it will
overwhelm the cost of the counting for most strings.

Cheers,
Thomas Sutton


More information about the Haskell-Cafe mailing list