[Haskell-cafe] Parsing HTML tables with HXT

Albert Y. C. Lai trebla at vex.net
Mon Apr 11 02:55:53 CEST 2011


On 11-04-08 06:29 AM, Dmitry Simonchik wrote:
> Can someone please help me with getting the value of the table cell with
> HXT in the following html:
>
> <table class="tblc">
> <tr>
> <td class="tdc">x</td>
> <td>y</td>
> </tr>
> <tr>
> <td class="tdc">a</td>
> <td>b</td>
> </tr>
> </table>
>
> I need the value of the second cell in a row that has first cell with
> some predefined value (in the example above it can be x or a) I need the
> arrow of the type (IOSArrow XmlTree String) How to write it?

import Text.XML.HXT.Core

main = do
   rs <- runX (readDocument [] "example.xml" >>> example "x")
   mapM_ putStrLn rs

-- example "blah" reports those 2nd columns such that
-- their 1st columns equal "blah"
example :: String -> IOSArrow XmlTree String
example s = deep (is "table" />
                   is "tr" >>>
                   listA (getChildren >>> is "td" /> getText) >>>
                   arrL get2nd
                  )
   where get2nd (one:two:_) | one==s = [two]
         get2nd _ = []

is x = isElem >>> hasName x

The important part is using listA at the right point to extract the list 
of cells (belonging to the same row) so that with a list in your hand 
you can test the 1st item and find the 2nd item.



More information about the Haskell-Cafe mailing list