[Haskell-cafe] Accumulating related XML nodes using HXT

Daniel McAllansmith dm.maillists at gmail.com
Mon Oct 30 16:10:56 EST 2006


Hello.

I have some HTML from which I want to extract records.
Each record is spread across several <tr> nodes, and the <tr> nodes of all
records are children of the same parent node.

The things I've tried so far end up giving me the Cartesian product of record 
fields, so for the HTML fragment included below I'd end up with:

[ Prod "Television" 17 "/prod17" "A very nice telly."
, Prod "Television" 17 "/prod17" "Mind your fillings."
, Prod "Cyclotron" 24 "/prod24" "A very nice telly."
, Prod "Cyclotron" 24 "/prod24" "Mind your fillings."
]

instead of:

[ Prod "Television" 17 "/prod17" "A very nice telly."
, Prod "Cyclotron" 24 "/prod24" "Mind your fillings."
]
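
To make the difference concrete without involving HXT, here is a minimal
sketch on plain lists. The Row type and the names rows, crossProduct,
groupRows and grouped are hypothetical stand-ins for the parsed <tr> nodes
and are not part of HXT. It assumes that the <tr> holding only an <hr> marks
the end of a record. Selecting each field over all rows gives the first
result above; splitting the rows at the separators first and selecting
within each group gives the second.

```haskell
import Data.List (unfoldr)

-- Hypothetical stand-ins for the parsed <tr> rows; not part of HXT.
data Row
  = ProductRow String Int String   -- name, code, href
  | DescriptionRow String          -- description text
  | SeparatorRow                   -- the <tr> that holds only an <hr>
  deriving (Eq, Show)

data Prod = Prod String Int String String
  deriving (Eq, Show)

-- The rows of the HTML fragment, in document order.
rows :: [Row]
rows =
  [ ProductRow "Television" 17 "/prod17"
  , DescriptionRow "A very nice telly."
  , SeparatorRow
  , ProductRow "Cyclotron" 24 "/prod24"
  , DescriptionRow "Mind your fillings."
  , SeparatorRow
  ]

-- Selecting each field independently pairs every product row with every
-- description row in the list: the Cartesian product.
crossProduct :: [Row] -> [Prod]
crossProduct rs =
  [ Prod n c h d | ProductRow n c h <- rs, DescriptionRow d <- rs ]

-- Split the rows into groups at each separator, dropping empty groups.
groupRows :: [Row] -> [[Row]]
groupRows = filter (not . null) . unfoldr step
  where
    step [] = Nothing
    step rs = let (g, rest) = break (== SeparatorRow) rs
              in Just (g, drop 1 rest)

-- Select fields only within one group, so each record stays separate.
grouped :: [Row] -> [Prod]
grouped = concatMap crossProduct . groupRows
```

Here crossProduct rows has four elements and grouped rows has two. What I
have not worked out is how to express the groupRows step over sibling nodes
with HXT arrows.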


How should I go about accumulating related <tr> nodes into individual records?


Thanks
Daniel


HTML fragment follows:

...
<tr>
  <tr>
    <td><strong>Product:</strong></td>
    <td><strong><a href="/prod17">Television</a></strong> (code: 17)</td>
  </tr>
  <tr>
    <td><strong>Description:</strong></td>
    <td>A very nice telly.</td>
  </tr>

  <tr>
    <td><hr color="#00000"></td>
  </tr>

  <tr>
    <td><strong>Product:</strong></td>
    <td><strong><a href="/prod24">Cyclotron</a></strong> (code: 24)</td>
  </tr>
  <tr>
    <td><strong>Description:</strong></td>
    <td>Mind your fillings.</td>
  </tr>

  <tr>
    <td><hr color="#00000"></td>
  </tr>
</tr>
...

