Tue Mar 10 04:33:00 EDT 2009

Hi all -

While learning Haskell, I'd like to do some simple data summarization.
(Btw, I'm planning to put any code submitted for this in the "cookbook"
section of the Haskell wiki.  Imo it would be very useful there as a
"next step" up from reading in a file and printing it out.)

This would involve reading in a delimited file like the one below (just a
contrived example of how many books some people own):

Name,Gender,Age,Ethnicity,Books
Mary,F,14,NZ European, 11
Brian,M,13,NZ European, 6
Josh,M,12,NZ European, 14
Regan,M,14,NZ Maori, 9
Helen,F,15,NZ Maori, 17
Anna,F,14,NZ European, 16
Jess,F,14,NZ Maori, 21

... and doing some operations on it.
As you can see, the file has column headings. I prefer to manipulate data
by its headings, as that is what I do a lot of at work (using another
programming language).

I've tried to break the problem down into small parts as follows.
a) Read the file into a list of pairs.
The first element of the pair would be the column heading.
The second will be a list containing the data.
For example, ("Name", ["Mary", "Brian", "Josh", "Regan", ...])

b) Select a numeric variable to summarize ("Books" in this example).
c) Do a fold to summarize the variable. I think a left fold would be the
one to use here, but I may be wrong.
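
To make the target shape concrete, here is a minimal sketch of steps b) and c)
over a hand-written list of pairs. The names Column, sample and sumColumn are
made up for illustration, and the cells are assumed to hold clean integers
(read will fail at runtime otherwise). It uses foldl', the strict variant of
the left fold, which is the usual choice for summing.

```haskell
import Data.List (foldl')

-- One column: its heading paired with the raw text of each cell.
-- (Hypothetical type alias, used only for this sketch.)
type Column = (String, [String])

-- Hand-written stand-in for what step a) would produce from the file.
sample :: [Column]
sample =
  [ ("Name",  ["Mary", "Brian", "Josh"])
  , ("Books", ["11", "6", "14"])
  ]

-- Steps b) and c): pick a column by heading, then sum it with a strict
-- left fold. Returns Nothing when no column has that heading.
-- Assumes every cell parses as an Int; `read` errors out otherwise.
sumColumn :: String -> [Column] -> Maybe Int
sumColumn heading cols = fmap (foldl' (+) 0 . map read) (lookup heading cols)

main :: IO ()
main = print (sumColumn "Books" sample)  -- prints Just 31
```

Returning a Maybe keeps a misspelt heading from crashing the program; the
caller decides what to do when the column is missing.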

After looking through previous postings on this list, I found some code
which is somewhat similar to what I'm after (although the data it was
crunching is very different).  This is what I've come up with so far -

summarize [] = []
summarize ls = let
    numeric_variable = last ls
    sum = foldl (+) 0 $ numeric_variable

  in (byvariable, sum) : sum ls

main = interact (unlines . map show . summarize . lines)

I think this might be a useful start, but I still need to read the data
into a list of pairs as mentioned, and I'm unsure as to how to
do that.
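
One possible way to get from the file to the list of pairs is sketched below.
It is only a sketch under some assumptions: fields are separated by plain
commas with no quoting or escaping, every row has as many fields as the
header, and the helper names splitOn, trim and toColumns are invented here
rather than taken from any library. The idea is to split each line into
fields, treat the first line as the headings, and transpose the remaining
rows into columns.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd, transpose)

-- Split a line on a delimiter character (no quoting or escaping handled).
splitOn :: Char -> String -> [String]
splitOn d s = case break (== d) s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOn d rest

-- Strip leading and trailing whitespace, so " 11" becomes "11".
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- Step a): turn the file contents into (heading, column values) pairs.
-- Blank lines are skipped. Ragged rows are not detected, so the input is
-- assumed to be rectangular.
toColumns :: String -> [(String, [String])]
toColumns contents =
  case map (map trim . splitOn ',') (filter (not . null) (lines contents)) of
    []            -> []
    -- Padding with empty columns keeps the headings when there are no rows.
    header : rows -> zip header (transpose rows ++ repeat [])

-- Read the delimited file from standard input and print one pair per line.
main :: IO ()
main = do
  contents <- getContents
  mapM_ print (toColumns contents)
```

Run against the sample data above (for example by piping the file to the
program on standard input), each output line is one (heading, values) pair,
which can then be looked up by heading and folded.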

Many thanks in advance for any help received.  As mentioned, I'm sure
that examples like this could be very useful to other beginners, so I'm
keen to make sure that any help given gets as much use as possible (by
putting any code on the Haskell wiki).
- Andy