[Haskell-cafe] streaming translation using monads

Tue Nov 18 21:23:29 EST 2008

I am working on a query language translator, and although I feel that  
a monadic formulation would work well for this application, I've  
stumbled on a number of questions and difficulties that I thought the  
knowledgeable people here might be able to help me with.

As a translator, there's a source language and a target language, and  
both of these are specified as a grammar of nested expressions. My  
first thought was to formulate the target language as an embedded  
language where each clause expression is represented as a monad  
operation, so that the bind operator can join the pieces together, e.g.:

  (clause1, clause2 ...)

could be specified as an embedded language as:

  clause1  >>= \ v1 ->
  clause2  >>= \ v2 -> ...

However, each of the clauses is actually an output routine to send the  
expression that it denotes to a remote server, and a parser for  
receiving the results. Since a clause is really a pair of operations,  
it doesn't seem possible to formulate a monad that will compose all  
the output routines together and compose all the input routines  
together in one shot. (Note that the variables in the above code (v1,  
v2) represent inputs to be received from the remote server -- all  
outputs are packaged into the clause expressions themselves and are  
effectively literals.)

A naive formulation of a monad to implement the above as "output ->  
input v" might appear to work, but has the ill-effect of interleaving  
the output and input statements for each clause rather than composing  
something that can send the entire request, and then receive the  
entire result.

This forces me to use "output * input v" as the type of each clause  
expression, but now it is not possible to write a monad with a bind  
operation that will compose pieces in terms of input variables.  
Instead I have had to resort to using a set of combinators that thread  
a continuation function through each clause and accumulate inputs as  
they are received:

  clause1 ==> (\ k v1 -> k (trans1 v1)) ++
  clause2 ==> (\ k v2 -> k (trans2 v2)) ++ ...

This threading is necessary in that I want to stream the translation  
back to the client requesting the translation rather than building up  
the (possibly large) results in memory.

This formulation has proven to be quite cumbersome in practice, as the  
resulting continuation types reflect the depth-first traversal of the  
nested query expressions, and type errors can be quite unintuitive.  
(It is quite interesting though that each continuation/transformation  
function can receive not only receive the input from the immediately  
preceding clause, but from any of the preceding clauses, and also  
return more or fewer results. However getting anything wrong can be  
very problematic in that it can lead to either downstream *or*  
upstream errors depending on how the clauses are composed into an  
overall query expression.)

An alternative to all this would be to use an algebraic datatype to  
specify the target language (with separate routines for the output and  
input operations), but that would appear to require another sum type  
to express the values to be received. I'd like to avoid that if  
possible since the projection of those values back into my program  
could lead to dynamic type errors, and also causes seemingly needless  
memory allocations.

There must be another technique for this sort of streaming translation  
out there... I welcome any suggestions you might have!

Warren Harris