Non-updateable thunks

Wed Aug 1 12:38:03 CEST 2012

Hello,

I’m still working on issues of performance vs. sharing; I assume some
of the people here on the list have seen my "dup" paper¹ as referees.

I’m now wondering about an approach where the compiler (either
automatically or by user annotation; I’ll leave that question for later)
would mark some thunks as reentrant, i.e. simply skip the blackholing
and the pushing of the update frame, so that the thunk is re-evaluated
on every entry instead of being overwritten with its value. A quick test
showed that this should work quite well. Take the usual example:
        
        import System.Environment
        main = do
            a <- getArgs
            let n = length a
            print n
            let l = [n..30000000]
            print $ last l + last l

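This obviously leaks memory, because l is shared between the two calls
to last and is therefore retained in full during the first traversal: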

        $ ./Test +RTS -t
        0
        60000000
        <<ghc: 2400054760 bytes, 4596 GCs, 169560494/935354240 avg/max
        bytes residency (11 samples), 2121M in use, 0.00 INIT (0.00
        elapsed), 0.63 MUT (0.63 elapsed), 4.28 GC (4.29 elapsed) :ghc>>


I then modified the assembly (a crude but effective way of testing
this ;-)) to not push an update frame:

$ diff -u Test.s Test-modified.s

--- Test.s	2012-08-01 11:30:00.000000000 +0200
+++ Test-modified.s	2012-08-01 11:29:40.000000000 +0200
@@ -56,20 +56,20 @@
 	leaq -40(%rbp),%rax
 	cmpq %r15,%rax
 	jb .LcpZ
-	addq $16,%r12
-	cmpq 144(%r13),%r12
-	ja .Lcq1
-	movq $stg_upd_frame_info,-16(%rbp)
-	movq %rbx,-8(%rbp)
+	//addq $16,%r12
+	//cmpq 144(%r13),%r12
+	//ja .Lcq1
+	//movq $stg_upd_frame_info,-16(%rbp)
+	//movq %rbx,-8(%rbp)
 	movq $ghczmprim_GHCziTypes_Izh_con_info,-8(%r12)
 	movq $30000000,0(%r12)
 	leaq -7(%r12),%rax
-	movq %rax,-24(%rbp)
+	movq %rax,-8(%rbp)
 	movq 16(%rbx),%rax
-	movq %rax,-32(%rbp)
-	movq $stg_ap_pp_info,-40(%rbp)
+	movq %rax,-16(%rbp)
+	movq $stg_ap_pp_info,-24(%rbp)
 	movl $base_GHCziEnum_zdfEnumInt_closure,%r14d
-	addq $-40,%rbp
+	addq $-24,%rbp
 	jmp base_GHCziEnum_enumFromTo_info
 .Lcq1:
 	movq $16,192(%r13)
     
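Now it runs fast and slim, with maximum residency down from about 935 MB
to about 28 KB, at the price of allocating the list twice (and it did
not crash on the first try, which I find surprising after hand-modifying
the assembly code):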

        $ ./Test +RTS -t
        0
        60000000
        <<ghc: 4800054840 bytes, 9192 GCs, 28632/28632 avg/max bytes
        residency (1 samples), 1M in use, 0.00 INIT (0.00 elapsed), 0.73
        MUT (0.73 elapsed), 0.04 GC (0.04 elapsed) :ghc>>
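
For illustration, a similar effect can be approximated at the source
level, without touching the assembly, by turning the thunk into a
function from unit so that every use re-evaluates the body. This is only
a sketch: reentrantRange and sumOfLasts are made-up names, and GHC’s
optimiser (CSE, full laziness) may reintroduce the sharing, so flags
such as -fno-cse and -fno-full-laziness might be needed to see the
effect.

```haskell
import System.Environment (getArgs)

-- Hypothetical helper (not part of the experiment above): a function
-- from unit stands in for a non-updateable thunk, because each call
-- builds the list afresh instead of reusing a stored result.
reentrantRange :: Int -> Int -> () -> [Int]
reentrantRange lo hi () = [lo .. hi]
{-# NOINLINE reentrantRange #-}

-- Traverses the list twice; each traversal gets its own copy, so the
-- first traversal does not keep the whole list alive for the second.
sumOfLasts :: (() -> [Int]) -> Int
sumOfLasts l = last (l ()) + last (l ())

main :: IO ()
main = do
    a <- getArgs
    let n = length a
    print n
    print $ sumOfLasts (reentrantRange n 30000000)
```

Unlike a thunk marked as reentrant by the compiler, this changes the
type of l and has to be done by hand at every use site.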


My question is: Has anybody worked in that direction? And does the
current RTS implementation have any fundamental problems with such
closures?

Greetings,
Joachim


¹ http://arxiv.org/abs/1207.2017
currently not about to appear anywhere else, but I have not given up
hope yet :-)


-- 
Dipl.-Math. Dipl.-Inform. Joachim Breitner
Wissenschaftlicher Mitarbeiter
http://pp.info.uni-karlsruhe.de/~breitner