<div dir="ltr">(I&#39;ll be brief because my head is hurting, but please don&#39;t interpret that as an intent to offend)<br><div class="gmail_extra"><br>A few points:<br><br></div><div class="gmail_extra">1) Capture groups are all you need to do some meaningful interpretation of data; these were around long before perl.<br>

</div><div class="gmail_extra"><br>2) Yacc is typically used in conjunction with lex, partly for (a) efficiency and partly for (b) ease of use (compared to writing out [a-z] as production rules).<br><br></div><div class="gmail_extra">

3) I&#39;ve actually used lex without yacc (well, flex without bison) when dealing with a language that&#39;s regular and easy enough to express that way (unlike, say, an enormous finite subset of a context-free language, which is technically regular but impractical to write as a regular expression).<br>
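For comparison with point 1, here is a rough Parsec analogue of a regex with capture groups such as ([0-9]{4})-([0-9]{2})-([0-9]{2}). It is a minimal sketch that assumes the `parsec` package. The `Date` type and the `date` parser are hypothetical names chosen for illustration. Each `<-` binding plays the role of a capture group, and the captured text is converted into a typed value on the spot.

```haskell
import Text.Parsec (char, count, digit, eof, parse)
import Text.Parsec.String (Parser)

-- Hypothetical result type holding the three "captured" fields.
data Date = Date Int Int Int
  deriving (Show, Eq)

-- Roughly ([0-9]{4})-([0-9]{2})-([0-9]{2}). Each `<-` binding acts like a
-- capture group, and `read` converts the captured digits immediately.
date :: Parser Date
date = do
  y <- count 4 digit
  _ <- char '-'
  m <- count 2 digit
  _ <- char '-'
  d <- count 2 digit
  eof  -- require the whole input to match, like an anchored regex
  return (Date (read y) (read m) (read d))
```

For example, `parse date "" "2013-02-14"` evaluates to `Right (Date 2013 2 14)`.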

<br></div><div class="gmail_extra"><br></div><div class="gmail_extra">2b is mostly irrelevant in Haskell, as Parsec already provides functions that can easily match the same things a regexp would.<br></div><div class="gmail_extra">
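The following sketch illustrates 2b and assumes the `parsec` package. It shows how common regex operators map onto ordinary Parsec functions. `identifier`, `signedInt` and `atom` are hypothetical examples: `identifier` corresponds roughly to [a-z][a-z0-9_]*, and `signedInt` corresponds to -?[0-9]+.

```haskell
import Text.Parsec (char, digit, eof, many, many1, oneOf, optionMaybe, parse, (<|>))
import Text.Parsec.String (Parser)

-- [a-z][a-z0-9_]*  : a character class becomes oneOf, `*` becomes many.
identifier :: Parser String
identifier = do
  c  <- oneOf ['a' .. 'z']
  cs <- many (oneOf (['a' .. 'z'] ++ ['0' .. '9'] ++ "_"))
  return (c : cs)

-- -?[0-9]+  : `?` becomes optionMaybe, `+` becomes many1.
signedInt :: Parser Int
signedInt = do
  sign   <- optionMaybe (char '-')
  digits <- many1 digit
  let n = read digits
  return (maybe n (const (negate n)) sign)

-- [a-z][a-z0-9_]*|-?[0-9]+  : alternation `|` becomes <|>.
-- (Hypothetical name; the branches start differently, so no `try` is needed.)
atom :: Parser (Either String Int)
atom = (Left <$> identifier) <|> (Right <$> signedInt)
```

For example, `parse (identifier <* eof) "" "foo_42"` yields `Right "foo_42"`.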

<br></div><div class="gmail_extra">2a, if it stands up to testing, is the best argument for ripping things apart in Haskell using a DFA. Parsec and cousins are efficient, but it&#39;s hard to beat a single table lookup per character. The open questions are (i) whether the difference is enough to matter in many cases, and (ii) whether there is a way to get this out of Parsec without touching regexps. (It&#39;s not impossible that Parsec already recognizes when a language is regular, although I&#39;d be weakly surprised.)<br>
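The next block is a minimal sketch of the "single table lookup per character" idea. It uses a hand-written transition table stored in a `Data.Array` from the `array` package, which ships with GHC. The language it accepts, unsigned decimals of the form [0-9]+(\.[0-9]+)?, was chosen only for illustration. No regex compilation happens here, and none of this is Parsec's API. Whether this approach actually beats Parsec on real inputs has not been measured.

```haskell
import Data.Array (Array, listArray, (!))
import Data.Char (isDigit)
import Data.List (foldl')

-- Character classes: 0 = ASCII digit, 1 = '.', 2 = anything else.
charClass :: Char -> Int
charClass c
  | isDigit c = 0
  | c == '.'  = 1
  | otherwise = 2

-- States: 0 start, 1 integer part, 2 just saw '.', 3 fraction part, 4 dead.
-- Row-major transition table: next state = table ! (state, class).
table :: Array (Int, Int) Int
table = listArray ((0, 0), (4, 2))
  [ 1, 4, 4   -- from state 0
  , 1, 2, 4   -- from state 1
  , 3, 4, 4   -- from state 2
  , 3, 4, 4   -- from state 3
  , 4, 4, 4   -- from state 4 (the dead state loops forever)
  ]

accepting :: Int -> Bool
accepting s = s == 1 || s == 3

-- Exactly one array lookup per input character; the answer is only yes/no.
runDFA :: String -> Bool
runDFA = accepting . foldl' step 0
  where step s c = table ! (s, charClass c)
```

For serious throughput, you would probably switch to an unboxed `UArray` and a `ByteString` input. The `String` version is simply easier to read.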

<br></div><div class="gmail_extra"><br><br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Feb 14, 2013 at 3:07 PM, wren ng thornton <span dir="ltr">&lt;<a href="mailto:wren@freegeek.org" target="_blank">wren@freegeek.org</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="im">On 2/13/13 11:18 PM, wren ng thornton wrote:<br>

<blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

On 2/13/13 11:32 AM, Nicolas Bock wrote:<br>

<blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

Since I have very little experience with Haskell and am not used to Haskell-think yet, I don&#39;t quite understand your statement that regexes are seen as foreign to Haskell-think. Could you elaborate? What would a more &quot;native&quot; solution look like? From what I have learned so far, it seems to me that Haskell is a lot about clear, concise, and well-structured code. I find regexes extremely compact and powerful, allowing for very concise code, which should fit the bill perfectly, or shouldn&#39;t it?<br>

</blockquote>

<br>

Regexes are powerful and concise for recognizing regular languages. They are not, however, very good for *parsing* regular languages; nor can they handle non-regular languages (unless you&#39;re relying on the badness of PCRE&#39;s non-regular extensions, such as backreferences). In other languages, people press regexes into service for parsing because the alternative is using an external DSL like lex/yacc, JavaCC, etc. In Haskell, by contrast, we have powerful and concise tools for parsing context-free languages and beyond (e.g., Parsec, attoparsec).<br>

</blockquote>
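To illustrate "context-free languages and beyond", here is a sketch that assumes the `parsec` package; `nesting` and `balanced` are hypothetical names. It parses balanced parentheses, a classic language that no true regular expression can recognize. It also parses rather than merely recognizes, because it returns the maximum nesting depth.

```haskell
import Text.Parsec (between, char, eof, many, parse)
import Text.Parsec.String (Parser)

-- Grammar: S ::= ( '(' S ')' )*
-- The result is the maximum nesting depth, not just True/False.
nesting :: Parser Int
nesting = do
  depths <- many (between (char '(') (char ')') nesting)
  return (if null depths then 0 else 1 + maximum depths)

-- Hypothetical wrapper: the whole input must be balanced.
balanced :: String -> Either String Int
balanced s = either (Left . show) Right (parse (nesting <* eof) "" s)
```

For example, `balanced "(())()"` gives `Right 2`. The inner parser passed to `many` always consumes a '(' first, which keeps Parsec's `many` from looping on empty input.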

<br>

<br></div>

Just to be clear, the problem isn&#39;t that proper regexes are only good for regular languages (many files have regular syntax, after all). The problem is that regexes are only good for recognition. They&#39;re an excellent tool for deciding whether a given string is &quot;good&quot; or &quot;bad&quot;, but they&#39;re completely unsuitable for the task of parsing/interpreting a string into some structure or semantic response. If you&#39;ve ever used tools like yacc or JavaCC, one of the crucial things they offer is the ability to add these semantic responses. Parser combinator libraries in Haskell are similar: the string processing is integrated into a programming language, so we can write things like:<br>


<br>

<pre>    myParser = do
        x &lt;- blah
        guard (p x)
        y &lt;- blargh
        return (f x y)</pre>

<br>

where p and f can be arbitrary Haskell functions. Perl extends regular expressions to try to do things like this, but the result is extremely baroque, hard to get right, and impossible to maintain. (N.B., I was raised on Perl and still love it.) And at some point we have to question the idea of regexes as an embedded DSL when we then turn around and embed Perl as a DSL inside the regex language.<br>
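Below is a runnable instance of the `myParser` shape, assuming the `parsec` package. The definitions of `blah`, `blargh`, `p` and `f` are hypothetical stand-ins for the placeholders:

- `blah` reads a natural number.
- `blargh` reads a second number after a comma.
- `p` rejects zero.
- `f` divides.

`guard` comes from `Control.Monad` and works here because Parsec's parser type is an `Alternative`/`MonadPlus`. Because the predicate runs before `f`, it rules out division by zero.

```haskell
import Control.Monad (guard)
import Text.Parsec (char, digit, eof, many1, parse)
import Text.Parsec.String (Parser)

-- Hypothetical stand-ins for the placeholders in myParser.
blah, blargh :: Parser Int
blah   = read <$> many1 digit   -- a natural number
blargh = char ',' *> blah       -- ",<number>"

p :: Int -> Bool
p x = x /= 0                    -- an arbitrary Haskell predicate

f :: Int -> Int -> Int
f x y = y `div` x               -- an arbitrary Haskell function

-- Same shape as above: parse, check, parse more, build a result.
myParser :: Parser Int
myParser = do
  x <- blah
  guard (p x)
  y <- blargh
  return (f x y)
```

For example, `parse (myParser <* eof) "" "4,12"` yields `Right 3`, while input starting with `0` is rejected by the guard.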


<br>

One of the big things that makes regexes so nice is that they identify crucial combinators like choice and repetition. However, once those combinators have been identified, we can just offer them directly as functions in the host language. No need for a special DSL or special syntax. The big trick is doing this efficiently. Parser combinators were an academic curiosity for a long time until Parsec came around and made them efficient. And we&#39;ve come a long way since then, with things like attoparsec, PEG parsing, and non-monadic applicative parsers (which can perform more optimizations because their structure does not depend on previously parsed values, so the grammar can be inspected before any input is consumed).<br>
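To make "offer them directly as functions in the host language" concrete, here is a deliberately naive sketch that uses only `base`. It defines a hypothetical list-of-successes parser type `P`, which is not Parsec's. Once the `Functor`/`Applicative`/`Alternative` instances are written, choice is `<|>` and repetition is `some`/`many`, all inherited from the standard `Alternative` class. The sketch makes no attempt at Parsec's efficiency tricks.

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (isDigit)

-- A parser returns every way it can consume a prefix of the input.
newtype P a = P { runP :: String -> [(a, String)] }

instance Functor P where
  fmap g (P q) = P $ \s -> [ (g a, r) | (a, r) <- q s ]

instance Applicative P where
  pure a = P $ \s -> [(a, s)]
  P pf <*> P pa = P $ \s -> [ (g a, r2) | (g, r1) <- pf s, (a, r2) <- pa r1 ]

-- Choice (regex `|`): keep the results of both alternatives.
instance Alternative P where
  empty = P (const [])
  P q1 <|> P q2 = P $ \s -> q1 s ++ q2 s

-- One character satisfying a predicate (a regex character class).
satisfy :: (Char -> Bool) -> P Char
satisfy ok = P $ \s -> case s of
  (c:cs) | ok c -> [(c, cs)]
  _             -> []

-- Keep only the parses that consumed the entire input.
parseAll :: P a -> String -> [a]
parseAll q s = [ a | (a, "") <- runP q s ]

-- [0-9]+(,[0-9]+)* as a list of Ints; `some` is regex `+`, `many` is `*`.
numbers :: P [Int]
numbers = (:) <$> number <*> many (satisfy (== ',') *> number)
  where number = read <$> some (satisfy isDigit)
```

For example, `parseAll numbers "1,22,333"` gives `[[1,22,333]]`. The default `some`/`many` from `Alternative` terminate here only because `satisfy` always consumes a character when it succeeds.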


<br>

The theory of regular expressions is indeed beautiful and elegant. However, it&#39;s a theory of recognition, not a theory of parsing, and that&#39;s a crucial distinction. Haskell is about clear, concise, and well-structured code; but to be clear, concise, and well-structured, we have to choose the right tool for the job.<div class="">
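Brzozowski's derivative-based matcher is one compact way to see "a theory of recognition" in code. It is a textbook technique, not taken from any particular library, and the `Re` type below is a hypothetical encoding. Note that the matcher's only output is a `Bool`. Nothing in it says which substring matched which part of the expression, or what value the match should produce.

```haskell
-- Regular expressions as plain data (hypothetical encoding).
data Re
  = Empty          -- matches nothing
  | Eps            -- matches only the empty string
  | Chr Char       -- a single character
  | Alt Re Re      -- choice (r|s)
  | Seq Re Re      -- concatenation (rs)
  | Star Re        -- repetition (r*)

-- Does the expression accept the empty string?
nullable :: Re -> Bool
nullable Empty     = False
nullable Eps       = True
nullable (Chr _)   = False
nullable (Alt r s) = nullable r || nullable s
nullable (Seq r s) = nullable r && nullable s
nullable (Star _)  = True

-- Brzozowski derivative: what remains to be matched after consuming c.
deriv :: Char -> Re -> Re
deriv _ Empty     = Empty
deriv _ Eps       = Empty
deriv c (Chr d)   = if c == d then Eps else Empty
deriv c (Alt r s) = Alt (deriv c r) (deriv c s)
deriv c (Seq r s)
  | nullable r    = Alt (Seq (deriv c r) s) (deriv c s)
  | otherwise     = Seq (deriv c r) s
deriv c (Star r)  = Seq (deriv c r) (Star r)

-- Recognition only: take one derivative per character, then ask nullable.
matches :: Re -> String -> Bool
matches r = nullable . foldl (flip deriv) r
```

For example, `matches (Star (Seq (Chr 'a') (Chr 'b'))) "abab"` is `True`. This sketch skips the usual simplification of intermediate expressions, so on long inputs the terms can grow.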

<div class="h5"><br>

<br>

-- <br>

Live well,<br>

~wren<br>

<br>

_______________________________________________<br>

Haskell-Cafe mailing list<br>

<a href="mailto:Haskell-Cafe@haskell.org" target="_blank">Haskell-Cafe@haskell.org</a><br>

<a href="http://www.haskell.org/mailman/listinfo/haskell-cafe" target="_blank">http://www.haskell.org/mailman/listinfo/haskell-cafe</a><br>

</div></div></blockquote></div><br></div></div>