[Haskell-cafe] HTML library with DOM?

Michael Snoyman michael at snoyman.com
Thu Oct 7 08:37:53 EDT 2010


2010/10/7 Gregory Collins <greg at gregorycollins.net>:
> "Edward Z. Yang" <ezyang at MIT.EDU> writes:
>
>> Excerpts from Gregory Collins's message of Wed Oct 06 19:44:44 -0400 2010:
>>> I've got the month of October off, and one of the things I've been
>>> planning on working on is a compliant HTML5 parser for Haskell --
>>> something which is sorely needed! I will ping the list back if/when I
>>> get it finished.
>>
>> I've heard that some of the existing HTML parsers in Haskell were
>> already HTML5 compliant (this topic came up when I was complaining
>> that there were some algorithms that you absolutely had to have
>> state for, because that was how they were specified.)  I never
>> verified this assertion though.
>
> If there's already a library which *correctly* parses html5 documents
> into DOM trees, could someone please let me know so I can use it instead
> of wasting a bunch of time writing one?

As far as I know, Neil Mitchell's tagsoup[1] parses according to the
HTML5 parsing rules, but it only generates a flat list of Tags[2], so
you'd have to build the DOM tree up from there yourself. I have
personally had a great experience with tagsoup. It's even the core of
the HTML-scraping technology powering searchonce[3].
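
To give a rough idea of what "building the tree up from there" could
look like, here is a minimal sketch. It is only an illustration under
stated assumptions: the Tag type below is a simplified stand-in that
mirrors tagsoup's TagOpen/TagClose/TagText constructors (the real type
is parameterised over the string type and has more cases), and Node and
buildForest are hypothetical names, not part of tagsoup. It is also a
naive nesting pass, not the HTML5 tree-construction algorithm, so void
elements such as <br> and other implied end tags are not handled.

```haskell
module Main where

-- Simplified stand-in for tagsoup's Tag type (assumption: only the
-- three constructors needed for this sketch, specialised to String).
data Tag
  = TagOpen String [(String, String)]
  | TagClose String
  | TagText String
  deriving (Eq, Show)

-- Hypothetical DOM-like tree; this type is not provided by tagsoup.
data Node
  = Element String [(String, String)] [Node]
  | Text String
  deriving (Eq, Show)

-- Turn a flat tag list into a forest of nodes.
-- Stray close tags at the top level are dropped.
buildForest :: [Tag] -> [Node]
buildForest tags = case go tags of
  (nodes, [])       -> nodes
  (nodes, _ : rest) -> nodes ++ buildForest rest

-- Parse a run of sibling nodes, stopping at the first close tag.
-- Returns the siblings and the unconsumed input (which, if non-empty,
-- starts with that close tag).
go :: [Tag] -> ([Node], [Tag])
go [] = ([], [])
go (TagText s : ts) =
  let (siblings, rest) = go ts
  in (Text s : siblings, rest)
go (TagOpen name attrs : ts) =
  let (children, afterChildren) = go ts
      -- Consume the close tag only if it matches this element;
      -- otherwise treat the element as implicitly closed.
      afterClose = case afterChildren of
        (TagClose n : r) | n == name -> r
        other                        -> other
      (siblings, rest) = go afterClose
  in (Element name attrs children : siblings, rest)
go ts@(TagClose _ : _) = ([], ts)

main :: IO ()
main = print (buildForest
  [ TagOpen "p" [("class", "intro")]
  , TagText "Hello, "
  , TagOpen "b" []
  , TagText "world"
  , TagClose "b"
  , TagClose "p"
  ])
```

With real tagsoup output you would feed the result of parseTags into
something of this shape. A spec-compliant tree builder would need
considerably more state than this.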

Michael

[1] http://hackage.haskell.org/package/tagsoup
[2] http://hackage.haskell.org/packages/archive/tagsoup/0.11.1/doc/html/Text-HTML-TagSoup.html#t:Tag
[3] http://www.search-once.com/

