[Haskell] Re: ANNOUNCE: Harpy -- run-time code generation library

Mon May 14 13:15:28 EDT 2007

Dirk Kleeblatt wrote:
> apfelmus wrote:
>> Note that even currently, your operations cannot be strict in the
>> address a label refers to because this may be determined later than the
>> first use of the label. In other words, your example code
>>
>>   fac = do
> [...]
>> (1)   jmp  loopTest
> [...]
>> (2)   loopTest @@ cmp ecx (0 :: Word32)
> [...]
>> already shows that the function jmp that generates a jmp-instruction may
>> not be strict in the position it jumps to as the address behind loopTest
>> is only known later at line (2).
> 
> When generating code for (1), the label loopTest is used as an index 
> into a map, to see whether there's a code position associated with it. 
> If it is, the code position is used to compute the jump offset for the 
> jmp instruction, if not (as in this example), a dummy value is placed in 
> the code buffer, and Harpy remembers this position to be patched later 
> on. At (2), the label is defined, and this leads to patching all 
> instructions that have been emitted before this definition.
> 
> So, yes, the code position is only used after the definition of the 
> label. But the "look up in a map"-part makes the jmp operation strict in 
> the label parameter.
> 
> We could omit the map, and just remember where to patch the code, but 
> then we'd need to call explicitly some function after code generation 
> that does the patching. We had implemented this, but found the current 
> solution easier to use, since backpatching is completely automatic and 
> hidden from the user.
> 
> However, this is just a description of the current implementation, not 
> an argument that there's no better implementation. Probably there is, 
> maybe using the binary package.
> 
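
To check that I read the description above correctly, here is a toy sketch of
the map-plus-patching scheme. It is not Harpy's code: Instr, CodeGenState,
emit and assembleBackpatch are names I made up, the "code buffer" is just a
Map from positions to strings, and jumps carry absolute positions instead of
real offsets.

```haskell
import qualified Data.Map as Map
import Data.List (foldl')

-- Hypothetical toy instruction set, not Harpy's.
data Instr = Jmp String | Label String | Nop

data CodeGenState = CodeGenState
  { buffer  :: Map.Map Int String    -- position -> emitted instruction
  , known   :: Map.Map String Int    -- labels defined so far
  , pending :: Map.Map String [Int]  -- label -> positions still to patch
  , nextPos :: Int                   -- next free position in the buffer
  }

jmpTo :: Int -> String
jmpTo target = "jmp " ++ show target

emit :: CodeGenState -> Instr -> CodeGenState
emit s Nop =
  s { buffer = Map.insert (nextPos s) "nop" (buffer s), nextPos = nextPos s + 1 }
emit s (Jmp l) =
  case Map.lookup l (known s) of   -- this lookup is what forces the label
    Just target ->
      s { buffer = Map.insert pos (jmpTo target) (buffer s), nextPos = pos + 1 }
    Nothing ->
      -- Label not defined yet: emit a dummy and remember where to patch.
      s { buffer  = Map.insert pos "jmp ?" (buffer s)
        , pending = Map.insertWith (++) l [pos] (pending s)
        , nextPos = pos + 1 }
  where pos = nextPos s
emit s (Label l) =
  -- Defining a label patches every jump emitted before the definition.
  s { buffer  = foldl' patch (buffer s) (Map.findWithDefault [] l (pending s))
    , known   = Map.insert l here (known s)
    , pending = Map.delete l (pending s) }
  where here      = nextPos s
        patch b p = Map.insert p (jmpTo here) b

assembleBackpatch :: [Instr] -> [String]
assembleBackpatch =
  Map.elems . buffer . foldl' emit (CodeGenState Map.empty Map.empty Map.empty 0)
```

With this, assembleBackpatch [Jmp "end", Nop, Label "end", Nop] gives
["jmp 2","nop","nop"]: the forward jump is first emitted as a dummy and then
patched when the label is defined.
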
>> Also, the explicit declaration of labels has an inherent safety problem.
> [...]
>> Declaring a label (via, for instance,)
>>
>>      loopStart <- mul ecx
>>
>> at its instruction doesn't have this problem.
> 
> This looks quite elegant, I'll think about it...

If this is what I think it is (tying the knot), then essentially the thunk 
becomes the field reference, backpatching is thunk update, and the Haskell 
environment is the Map.  It should be possible to limit the laziness just to the 
"fields", but I'm not sure if it's possible to "limit the strictness".
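
A minimal sketch of what I mean, assuming a toy instruction set rather than
Harpy's API (Instr and assemble are hypothetical names, and jumps again carry
absolute positions). The label table is produced by the same pass that
consumes it; only the jump operands are lazy, so each one is a thunk that gets
"patched" when it is first forced:

```haskell
import qualified Data.Map as Map

-- Hypothetical toy instruction set, not Harpy's.
data Instr = Jmp String | Label String | Nop

assemble :: [Instr] -> [String]
assemble prog = code
  where
    -- Lazy pattern binding: 'labels' is the final label table, and the
    -- emitted code below refers to it before it has been computed.
    (code, labels) = go 0 prog

    go :: Int -> [Instr] -> ([String], Map.Map String Int)
    go _   []               = ([], Map.empty)
    go pos (Label l : rest) =
      let (c, m) = go pos rest
      in  (c, Map.insert l pos m)
    go pos (Nop : rest)     =
      let (c, m) = go (pos + 1) rest
      in  ("nop" : c, m)
    go pos (Jmp l : rest)   =
      let (c, m) = go (pos + 1) rest
          -- The operand is a thunk over the final table; forcing it later
          -- plays the role of backpatching. Map.! fails on undefined labels.
      in  (("jmp " ++ show (labels Map.! l)) : c, m)
```

Here assemble [Jmp "end", Nop, Label "end", Nop] also evaluates to
["jmp 2","nop","nop"], with no explicit patch list. The price is that nothing
in go may inspect a jump operand while the table is being built, which is the
"limit the laziness to the fields" part.
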

> 
>> Furthermore, having to call 'ensureBufferSize 160' is very strange for
>> this is data that can be calculated automatically.
> 
> As I wrote at haskell-cafe, we require this only for performance 
> reasons, to keep buffer overflow checks as infrequent as possible. But 
> there might be better ways to do this.
> 
>> I also think that having liftIO in the CodeGen-monad is plain wrong. I
>> mean, CodeGen is a monad that generates code without any execution
>> taking place. The execution part is already handled by runCodeGen.
>> Having liftIO means that arbitrary Haskell programs can be intertwined
>> with assembly generation and I doubt that you want that.
> 
> Feel free to doubt, but this is exactly what we want. :-)
> 
> Also, note that runCodeGen runs the code _generation_, executing the 
> generated code is done _within_ the CodeGen monad via the functions 
> generated by callDecl (or the predefined functions in the Harpy.Call 
> module). This is even more intertwined, but intentional.
> 
> Of course, a different design is again possible: runCodeGen could 
> return a binary code object that can be called from the IO monad. But 
> then the user has to take care of releasing code buffers, and must not 
> keep unevaluated closures that hold code pointers to run-time 
> generated code that has already been released.

Having this as an option would be very nice, I suspect.  I'd like it.