[Haskell-cafe] Digrams

Dominic Steinitz dominic.steinitz at blueyonder.co.uk
Sat Feb 11 07:25:11 EST 2006


I've quickly put this together to measure frequencies of pairs of letters 
(e.g. the 1st and 2nd) in words. It works fine on a small test data set, but 
I suspect it will perform poorly because it spends a lot of time updating a 
26*26 array. Before I throw a dictionary at it, does anyone have any 
suggestions?

Thanks, Dominic.
-------------- next part --------------
import System.IO
import Data.Char
import Data.Array
import Data.List

main =
   do h <- openFile "girls2005.txt" ReadMode
      c <- hGetContents h
      let freqs1 = g 1 2 (lines c) digramArr
          xs = map putStrLn . 
               map show . 
               reverse . 
               sort . 
               map Cell . 
               assocs $ freqs1
      sequence_ xs
      putStrLn "Finished"

newtype Cell = Cell ((Char,Char),Int)
   deriving Eq

-- Cells are ordered by frequency only, so sorting ranks pairs by count.
instance Ord Cell where
   Cell (_,i) <= Cell (_,j) = i <= j

instance Show Cell where
   show (Cell ((i,j),f)) = i : ',' : j : ',' : show f

letters = ['A'..'Z']
 
digramElems = [((i,j),0) | i <- letters, j <- letters]

digramArr = array (('A','A'),('Z','Z')) digramElems
 
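-- Increment the count for the pair formed by the n-th and m-th letters
-- (1-based) of the word s. Note that (//) builds a new array on every call.
f :: Int -> Int -> String -> Array (Char,Char) Int -> Array (Char,Char) Int
f n m s a =
   a // [((i,j),x+1)]
   where i = toUpper (s!!(n-1))
         j = toUpper (s!!(m-1))
         x = a!(i,j)

-- Apply f to every word in turn, threading the array through.
g :: Int -> Int -> [String] -> Array (Char,Char) Int -> Array (Char,Char) Int
g n m [] a = a
g n m (s:ss) a = g n m ss (f n m s a)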
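
One possible way around the repeated array updates, offered only as a sketch and not as part of the original program, is to build the whole table in a single pass with accumArray from Data.Array instead of calling (//) once per word. The names letterAt, digramCounts and byFrequency below are hypothetical. The sketch also assumes that words which are too short, or whose chosen characters are not A-Z after upper-casing, should simply be skipped, whereas the code above would fail on them.

```haskell
import Data.Array (Array, accumArray, assocs)
import Data.Char (isAsciiUpper, toUpper)
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

-- Upper-cased k-th character (1-based) of a word, if it exists and is A-Z.
-- Skipping everything else is an assumption, not the original behaviour.
letterAt :: Int -> String -> Maybe Char
letterAt k s =
   case map toUpper (take 1 (drop (k - 1) s)) of
      [u] | isAsciiUpper u -> Just u
      _                    -> Nothing

-- Count the (n-th, m-th) letter pairs of all words in one pass.
-- accumArray starts every cell at 0 and adds 1 for each pair it is given.
digramCounts :: Int -> Int -> [String] -> Array (Char, Char) Int
digramCounts n m ws =
   accumArray (+) 0 (('A', 'A'), ('Z', 'Z'))
      [ ((i, j), 1)
      | w <- ws
      , Just i <- [letterAt n w]
      , Just j <- [letterAt m w]
      ]

-- All cells, most frequent first; ties keep their index order.
byFrequency :: Array (Char, Char) Int -> [((Char, Char), Int)]
byFrequency = sortBy (comparing (Down . snd)) . assocs
```

With these definitions, byFrequency (digramCounts 1 2 (lines c)) would stand in for the reverse . sort . map Cell . assocs pipeline in main. Whether it is noticeably faster on a full dictionary would need measuring.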


