Hi Tom,<br><br>I've been working on another code-generating graphics compiler, generating GPU code. As always, I run into the problem of efficient common subexpression elimination. In Pan, Vertigo & Pajama, I used lazy memoization, using stable pointers and weak references, to avoid the worst-case-exponential behavior you mention below. I'm now using a bottom-up CSE method that's slower and more complicated than I'm going for.<br>
<br>What's your latest wisdom about CSE in DSELs?<br><br>Thanks, - Conal<br><br><div class="gmail_quote">On Thu, Feb 7, 2008 at 11:33 PM, Tom Hawkins <span dir="ltr"><<a href="mailto:tomahawkins@gmail.com">tomahawkins@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">I've been programming with Haskell for a few years and love it. One<br>
of my favorite applications of Haskell is using for domain specific<br>
languages. However, after designing a handful of DSLs, I continue to<br>
hit what appears to be a fundamental hurdle -- or at least I have yet<br>
to find an adequate solution.<br>
<br>
My DSLs invariably define a datatype to capture expressions; something<br>
like this:<br>
<br>
data Expression<br>
= Add Expression Expression<br>
| Sub Expression Expression<br>
| Variable String<br>
| Constant Int<br>
deriving Eq<br>
<br>
Using the datatype Expression, it is easy to mass a collections of<br>
functions to help assemble complex expressions, which leads to very<br>
concise programs in the DSL.<br>
<br>
The problem comes when I want to generate efficient code from an<br>
Expression (ie. to C or some other target language). The method I use<br>
invovles converting the tree of subexpressions into an acyclic graphic<br>
to eliminate common subexpressions. The nodes are then topologically<br>
ordered and assigned an instruction, or statement for each node. For<br>
example:<br>
<br>
let a = Add (Constant 10) (Variable "i1")<br>
b = Sub (Variable "i2") (Constant 2)<br>
c = Add a b<br>
<br>
would compile to a C program that may look like this:<br>
<br>
a = 10 + i1;<br>
b = i2 - 2;<br>
c = a + b;<br>
<br>
The process of converting an expression tree to a graph uses either Eq<br>
or Ord (either derived or a custom instance) to search and build a set<br>
of unique nodes to be ordered for execution. In this case "a", then<br>
"b", then "c". The problem is expressions often have shared,<br>
equivalent subnodes, which dramatically grows the size of the tree.<br>
For example:<br>
<br>
let d = Add c c<br>
e = Add d d -- "e" now as 16 leaf nodes.<br>
<br>
As these trees grow in size, the equality comparison in graph<br>
construction quickly becomes the bottleneck for DSL compilation.<br>
What's worse, the phase transition from tractable to intractable is<br>
very sharp. In one of my DSL programs, I made a seemingly small<br>
change, and compilation time went from milliseconds to<br>
not-in-a-million-years.<br>
<br>
Prior to Haskell, I wrote a few DSLs in OCaml. I didn't have this<br>
problem in OCaml because each "let" expression was mutable, and I<br>
could use the physical equality operator to perform fast comparisons.<br>
Unfortunately, I have grown to love Haskell's type system and its lack<br>
of side effects, and could never go back.<br>
<br>
Is there anything that can be done to dramatically speed up<br>
comparisons, or is there a better approach I can take to extract<br>
common subexpressions? I should point out I have an opportunity to<br>
get Haskell on a real industrial application. But if I can't solve<br>
this problem, I may have to resort to far less eloquent languages.<br>
:-(<br>
<br>
Thanks for any and all help.<br>
<br>
-Tom<br>
_______________________________________________<br>
Haskell-Cafe mailing list<br>
<a href="mailto:Haskell-Cafe@haskell.org">Haskell-Cafe@haskell.org</a><br>
<a href="http://www.haskell.org/mailman/listinfo/haskell-cafe" target="_blank">http://www.haskell.org/mailman/listinfo/haskell-cafe</a><br>
</blockquote></div><br>