Thank you very much for the suggestion. I just tried the character class you mentioned and it works.<div><br></div><div>The Stack Overflow post you linked was a nice read, and I agree that regular expressions are usually not the way to go for HTML munging. Luckily, the HTML that pandoc generates is very regular, and the <table> tag I want to match (for line-numbered code listings) does not contain any nested tables, so I thought it would be safe to approach it like this.</div>
<div><br></div><div>The resulting code is now:</div><div><br></div><div><div>import Text.Regex.PCRE ((=~~))</div><div><br></div><div>-- Wraps numbered code listings within the page body with a div</div><div>-- in order to be able to apply some more specific styling.</div><div>
wrapNumberedCodelistings (Page meta body) =</div><div>  Page meta newBody</div><div>  where</div><div>    newBody = regexReplace "<table\\s+class=\"sourceCode[^>]+>[\\s\\S]*?</table>" wrap body</div>
<div>    wrap x = "<div class=\"sourceCodeWrap\">" ++ x ++ "</div>"</div><div><br></div><div>-- Replaces the whole match for the given regex using the given function</div><div>
regexReplace :: String -> (String -> String) -> String -> String</div><div>regexReplace regex replace text = go text</div><div>  where</div><div>    go rest = case rest =~~ regex of</div><div>      Just (before, match, after) -></div>
<div>        before ++ replace match ++ go after</div><div>      _ -> rest</div></div><div><br></div><div>I don't know, though, whether it could be cleaned up further, or whether this is good style at all (I am still fairly new to Haskell).</div>
<div><br></div><div>Furthermore, I would still be very interested in the right approach to manipulating the HTML structure as a whole, and I too hope that another Haskeller can name a more suitable solution for manipulating HTML.</div>
<div>Or even how to pass the 's' modifier to Text.Regex.PCRE.</div><div><br></div><div>Best regards,</div><div><br></div><div>rico<br><br><div class="gmail_quote">On Wed, Jun 6, 2012 at 7:11 AM, Arlen Cuss <span dir="ltr"><<a href="mailto:a@unnali.com" target="_blank">a@unnali.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I'd be more inclined to look at a solution involving manipulating the HTML structure, rather than trying a regexp-based approach, which will probably end up disappointing. (See this: <a href="http://stackoverflow.com/a/1732454/499609" target="_blank">http://stackoverflow.com/a/1732454/499609</a>)<br>
<br>
I hope another Haskeller can speak to a library that would be good for this kind of purpose.<br>
<br>
To suit what you're doing now, though: if you change .*? to [\s\S]*?, it should work on multiline strings. If you can work out how to pass the 's' modifier to Text.Regex.PCRE, that should also do it.<br>
<br>
—Arlen<br>
<div><div class="h5"><br>
<br>
On Wednesday, 6 June 2012 at 3:05 PM, Rico Moorman wrote:<br>
<br>
> Hello,<br>
><br>
> I have a given piece of multiline HTML (which is generated using pandoc btw.) and I am trying to wrap certain elements (tags with a given class) with a <div>.<br>
><br>
> I already took a look at the Text.Regex.PCRE module which seemed a reasonable choice because I am already familiar with similar regex implementations in other languages.<br>
><br>
> I came up with the following function which takes a regex and replaces all matches within the given string using the provided function (which I would use to wrap the element)<br>
><br>
> import Text.Regex.PCRE ((=~~))<br>
><br>
> -- Replaces the whole match for the given regex using the given function<br>
> regexReplace :: String -> (String -> String) -> String -> String<br>
> regexReplace regex replace text = go text<br>
>   where<br>
>     go text = case text =~~ regex of<br>
>       Just (before, match, after) -><br>
>         before ++ replace match ++ go after<br>
>       _ -> text<br>
><br>
> The problem with this function is that it does not work on multiline strings. I would like to call it like this:<br>
><br>
> newBody = regexReplace "<table class=\"sourceCode\".*?table>" wrap body<br>
> wrap x = "<div class=\"sourceCodeWrap\">" ++ x ++ "</div>"<br>
><br>
> Is there any way to easily pass some kind of multiline modifier to the regex in question?<br>
><br>
> Or is this approach completely off and would something else be more appropriate/haskelly for the problem at hand?<br>
><br>
> Thank you very much in advance.<br>
</div></div>> _______________________________________________<br>
> Beginners mailing list<br>
> <a href="mailto:Beginners@haskell.org">Beginners@haskell.org</a><br>
> <a href="http://www.haskell.org/mailman/listinfo/beginners" target="_blank">http://www.haskell.org/mailman/listinfo/beginners</a><br>
</blockquote></div><br></div>
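<div><br></div><div>On the open question above of passing the 's' modifier to Text.Regex.PCRE: a minimal sketch, assuming the regex-pcre package, is to compile the pattern with its compDotAll option via makeRegexOpts instead of going through (=~~). The names dotAllRegex and regexReplaceDotAll are made up for this illustration, and the block should be read as a sketch rather than a verified fix.</div>

```haskell
import Text.Regex.PCRE (Regex, compDotAll, defaultExecOpt, makeRegexOpts, matchM)

-- Compile a pattern with PCRE's DOTALL option, which corresponds to the
-- 's' modifier: '.' then also matches newline characters.
-- Assumption: passing compDotAll alone replaces the library's default
-- compile options, so any other flags would have to be combined in here.
dotAllRegex :: String -> Regex
dotAllRegex = makeRegexOpts compDotAll defaultExecOpt

-- Hypothetical variant of regexReplace that uses the precompiled regex.
-- Like the original, it does not terminate if the pattern can match the
-- empty string.
regexReplaceDotAll :: String -> (String -> String) -> String -> String
regexReplaceDotAll pat replace = go
  where
    regex = dotAllRegex pat
    go text = case matchM regex text :: Maybe (String, String, String) of
      Just (before, match, after) -> before ++ replace match ++ go after
      Nothing -> text
```

<div>With this, a pattern that uses .*? should match across line breaks without the [\s\S] workaround. PCRE itself also accepts an inline (?s) at the start of a pattern, which might be the smaller change if the existing (=~~) call is kept.</div>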