[Haskell-beginners] Simple data summarization

Tue Mar 10 12:46:43 EDT 2009

Andy,

I came up with this solution that works like you described:

import Data.List.Split

mysplit = wordsBy (==',')

toPairs :: [String] -> [(String, [String])]
toPairs (header:rows) = foldr f (initPairs header) $ splitRows rows
    where f row acc = zipWith (\f (h,r) -> (h,f:r)) row acc
          initPairs header = map (\h -> (h, [])) $ mysplit header
          splitRows rows = map (mysplit) rows

summarizeByWith :: String -> (Int -> Int -> Int) -> [(String,
[String])] -> (String, Int)
summarizeByWith var agg pairs = case (lookup var pairs) of
    Just vals -> (var, foldl agg 0 $ map (read) vals)
    otherwise -> ("", 0)

main = interact (show . summarizeByWith "Books" (+) . toPairs . lines)

However in my opinion a solution like that proposed by Roland is
preferable since it can process the input line by line instead of
storing it all in memory. It seems also simpler and propably more
efficient.

However it was interesting hacking at your algorithm because it made
me realize how you can use lists of pairs (association lists) in
haskell where you might have used hash tables in another language.

Cheers,

Patrick

On Tue, Mar 10, 2009 at 4:33 AM, Andy Elvey <andy.elvey at paradise.net.nz> wrote:
> Hi all -
> In the process of learning Haskell I'm wanting to do some simple data
> summarization.
> ( Btw, I'm looking at putting any submitted code for this in the "cookbook"
> section of
> the Haskell wiki.  Imo it would be very useful there as a "next step" up
> from just reading
> in a file and printing it out.  )
> This would involve reading in a delimited file like this - ( just a
> contrived example of how many books
> some people own ) -
>
> Name,Gender,Age,Ethnicity,Books
> Mary,F,14,NZ European, 11
> Brian,M,13,NZ European, 6
> Josh,M,12,NZ European, 14
> Regan,M,14,NZ Maori, 9
> Helen,F,15,NZ Maori, 17
> Anna,F,14,NZ European, 16
> Jess,F,14,NZ Maori, 21
>
> .... and doing some operations on it. As you can see, the file has column
> headings - I prefer to be able to manipulate data with
> headings (as it is what I do a lot of at work, using another programming
> language).
>
> I've tried to break the problem down into small parts as follows. a) Read
> the file into a list of pairs.
> The first element of the pair would be the column heading.
> The second will be a list containing the data.
> For example, ("Name",  [Mary,  Brian,  Josh,  Regan, ..... ]  )
> b) Select a numeric variable to summarise ( "Books" in this example) c) Do a
> fold to summarize the variable. I think a left-fold would be the one to use
> here, but I may
> be wrong....
>
> After looking through previous postings on this list, I found some code
> which is somewhat similar to what I'm after (although the data it was
> crunching is very different).  This is what I've come up with so far -
>
> summarize [] = []
> summarize ls = let
>       byvariable = head ls
>       numeric_variable = last ls
>       sum = foldl (+) 0 $ numeric_variable
>
>   in (byvariable, sum) : sum ls
>
> main = interact (unlines . map show . summarize . lines)
> I think this might be a useful start, but I still need to read the data into
> a list of pairs as mentioned, and I'm unsure as to how to
> do that.
> Many thanks in advance for any help received.  As mentioned, I'm sure that
> examples like this could be very useful to other beginners, so I'm keen to
> make sure that any help given is made maximum use of (by putting any code on
> the Haskell wiki). - Andy
>
> _______________________________________________
> Beginners mailing list
> Beginners at haskell.org
> http://www.haskell.org/mailman/listinfo/beginners
>

-- 
=====================
Patrick LeBoutillier
Rosemère, Québec, Canada