The tale of keepAlive#
bgamari - 2021-06-08
In GHC 9.0 we have reworked the primops used for guaranteeing heap object
lifetime in the presence of off-heap references.
The result is a new primop, keepAlive#, which largely replaces the venerable
touch# operation. This post will give some background on touch#, describe
the rationale for this change, discuss some of the alternatives that were
evaluated on the way to this new design, and provide some guidance on how this
change affects users.
Users who merely want to know how to adapt their existing touch# uses for the
new keepAlive# should feel free to skip to the last section of
the post.
The challenge of foreign references
One of the joys of working in a garbage collected language is that one is freed from the burden of thinking about object lifetime. Instead, it is the responsibility of the runtime to retain each object as long as it is reachable from the roots of the heap graph.
Moreover, one of the great joys of Haskell is its robust foreign function interface, giving users the ability to easily weave together Haskell and foreign code in a safe and efficient manner.
However, taken together these two features pose significant challenges to language implementors; specifically, consider a program such as:
-- Recall that:
-- newPinnedByteArray :: Int -> IO (MutableByteArray RealWorld)
-- setByteArray :: Prim a => MutableByteArray RealWorld -> Int -> Int -> a -> IO ()
-- mutableByteArrayContents :: MutableByteArray RealWorld -> Ptr a
import Data.Primitive.ByteArray
main = printDots 10
printDots :: Int -> IO ()
printDots len = do
arr <- newPinnedByteArray len
setByteArray arr 0 len (fromIntegral $ ord '.' :: Word8)
let ptr :: Ptr CChar
ptr = byteArrayContents arr
puts ptr
-- Recall that:
-- void puts(const char *);
foreign import ccall "puts"
puts :: Ptr CChar -> IO ()Here we have allocated an array on the Haskell heap, initialized it, and passed
it to the C puts function, which will print it to stdout. On its face, this
program looks completely innocuous: it contains no mentions of unsafe
functions, the array’s bounds are clearly respected, and puts is specified to
treat its argument as immutable. Indeed, running the program even prints the
expected result (at least most of the time).
However, despite its appearance, this program is fatally flawed. Specifically,
note that puts is (implicitly) declared to be a safe foreign call; this
means that GHC is free to do as it pleases (e.g. continue running another
Haskell thread or, perhaps, initiate a GC) while the C call executes in another
operating system thread. This is problematic since upon entering puts there
are no references to arr remaining on the Haskell heap, leading the collector
to conclude that the array can be safely reclaimed. However, arr is not
dead: it is still in use via the off-heap reference passed to puts.
However, this reference is not visible to the garbage collector.
Premature collection of live heap objects may result in arbitrary heap
corruption, which may cause a program crash at best and incorrect runtime
result at worst. Also note that while FFI is the most common reason for off-heap
references, it is certainly not the only reason. In particular, any use of
byteArrayContents is potentially problematic in this way.
To make such off-heap references visible to the runtime, GHC has long provided the
touch operation (embodied by the GHC.Exts.touch# primop):
touch :: forall a. a -> IO ()Semantically (and, mostly, operationally) the action touch x is a no-op; it
merely instructs GHC to ensure that the given value is alive at the point when the
action is executed. Using this operation we can fix printDots as follows:
printDots len =
arr <- newPinnedByteArray len
setByteArray arr 0 len (fromIntegral $ ord '.' :: Word8)
puts (byteArrayContents arr)
touch arrHere we have added a touch after the puts, ensuring that arr is retained
at least until the call has returned.
To a functional programmer, touch may smell a bit odd, merely serving to
introduce a side-effect. By contrast, most idiomatic Haskell libraries rather
express resource lifetimes using the “with” pattern. That is, rather than
write:
hdl <- openFile "hello.txt"
hPutStrLn hdl "hello world!"
hClose hdlWe prefer to write:
withFile:
withFile "hello.txt" $ \hdl ->
hPutStrLn hdl "hello world!"as the latter makes the lifetime of the resource apparent from the program’s
syntactic structure alone. In the same way, users have historically been advised
to stay away from using touch and its variants directly. For instance, while
Foreign.ForeignPtr provides a touchForeignPtr operation the documentation
strong recommends that the user rather use withForeignPtr.
While it is rarely seen in user code, the touch# primop has served
the Haskell community well for many years: a handful of occurrences in base,
bytestring, and other foundational libraries ensured that garbage colection
and foreign calls could peacefully coexist. However, as we will see, the design
of the primop is ripe for unsoundness.
Aside: A bit of simplifier background
Over the years GHC’s simplifier (the GHC subsystem responsible for
optimising programs in its Core intermediate representation)
has consistently grown in its abilities. By GHC 8.2 we started to come to
the realization that all was not well with touch#. To see why, let us examine
a transform performed by GHC’s simplifier which I will call dead-continuation
elimination (DCE).
Consider that you have a program such as:
doThings =
case error "hello world" :: Int of
0 -> {- some large expression -}
1 -> {- another large expression -}
...Here we have a program with a divergent expression (namely the error
application) scrutinized by a case with a number of large alternatives.
Under Haskell’s semantics, it is impossible for program execution to reach
any of these alternatives’ right-hand-sides.
By the dead-continuation elimination transform, GHC’s simplifier will cull this unreachable code, turning the program into:
doThings =
case error "hello world" :: Int of { }This optimisation can significantly reduce program sizes. Note that exceptions are only one source of divergence: any non-terminating scrutinee can trigger this transformation.
The shortcomings of touch#
With this transformation in mind, consider a trivial modification to the
original printDots program which loops the puts call infinitely:
printManyDots :: Int -> IO ()
printManyDots len = do
arr <- newPinnedByteArray len
setByteArray arr 0 len (fromIntegral $ ord '.' :: Word8)
forever $ puts (byteArrayContents arr)
touch arrAt its face, this modification looks benign enough; indeed, many server
applications take a very similar form (replacing puts with some request
handling logic). However in #14346 we noticed that the simplifier could in
some cases determine that the forever $ ... expression would diverge and
consequently that the touch arr continuation was unreachable. By the DCE
transform, this continuation could be dropped. Without the touch
application, the garbage collector may prematurely collect arr, resulting in
unsoundness in an otherwise reasonable-looking program.
In light of this, it becomes clear that the design of touch# is not
compatible with DCE (which is otherwise a perfectly sound transform under
Haskell’s semantics): to faithfully serve its purpose, touch# must be
retained even if it could never be executed.
As there are relatively few uses of touch#, we tried for a while to paper
over this issue with the judicious use of NOINLINE to hide the divergence
from GHC’s sights. However, between 8.2 and 8.10 we encountered
at least six other user reports of unsoundness due to various manifestations of
this issue in various base interfaces
(#15260, #14403, #15544, #13707, #14329, #18061). For this reason, last year we
resolved to address the issue in the 9.0 release, marking the beginning of a
nearly year-long journey towards a possible replacement which lead us through
nearly a dozen design and implementation variants.
Finding an alternative design
At first glance, the design of a safer touch# seems obvious by analogy to
withFile:
with :: a -> (a -> m b) -> m bUnder such a design with x f would evaluate to f x, ensuring that x is
kept alive (at least) until evaluation finishes.
While this interface is simple and pleasingly similar to patterns which
Haskellers expect, it does not make for a good primop design. In particular,
touch# incurs nearly no runtime overhead (it doesn’t even emit code of its own) and
generally occurs in performance-critical, low-level code. Consequently,
it is important to minimize the performance overhead imposed by our new
primitive. However, the type given to with above is extremely restrictive to
the optimisations available to the simplifier.
For instance, consider a function like,
readWord8_touch :: ForeignPtr Word8 -> IO Word8
readWord8_touch fptr = do
n <- peek (unsafeForeignPtrToPtr fptr)
touch fptr
return nAfter a bit of simplification this will become:
readWord8_touch :: ForeignPtr Word8 -> IO Word8
readWord8_touch fptr = IO (\s0 ->
case readWord8# (unsafeForeignPtrToPtr fptr) s0 of (# s1, n #) ->
case touch# fptr s1 of s2 ->
(# s2, W8# n #)This function will benefit from GHC’s worker/wrapper transform, resulting in a pair of Core bindings1:
-- The worker:
worker_readWord8_touch :: ForeignPtr Word8 -> State# RealWorld -> (# State# RealWorld, Word8# #)
worker_readWord8_touch fptr s0 =
case peek (unsafeForeignPtrToPtr fptr) s0 of (# s0, n #) ->
case touch# fptr s0 of s1 ->
(# s1, n #)
-- The wrapper:
readWord8_touch :: ForeignPtr Word8 -> IO Word8
readWord8_touch fptr = IO $ \s ->
case worker_readWord fptr s of (# s1, n #) ->
(# s1, W8# n #)
{-# INLINE readWord8_touch #-}The readWord8_touch wrapper can then be inlined into call sites, allowing
elimination of the W8# allocation (via case-of-known-constructor) at
call-sites which strictly consume the result.
However, if we rewrite readWord8 using with we will rather get Core of the
form:
readWord8_with :: ForeignPtr Word8 -> IO Word8
readWord8_with fptr = with fptr $ \fptr' -> IO $ \s0 ->
case peek (unsafeForeignPtrToPtr fptr') s0 of (# s1, n #) ->
(# s1, W8# n #)Unlike readWord8_touch, this right-hand side does not have the
constructed-product property and therefore can not benefit from unboxing by the
worker/wrapper transform. Consequently, readWord8_with will allocate
a W8# result on every call. Empirically we found that these added
allocations significantly regress runtime performance of many, if not most,
Haskell programs. GHC in particular regressed in compile-time by nearly 10% in
some cases.
Recovering the CPR property?
At first appearance, there is no reason why GHC could not learn to transform
programs defined in terms of with to retain the CPR property. In particular,
this would require three changes:
since worker/wrapper may produce results of arbitrary
RuntimeRepwe must generalize the type ofwith:with# :: forall a (r :: RuntimeRep) (b :: TYPE r). a -> (a -> b) -> buse
State#tokens to enforce execution ordering:with# :: forall a (r :: RuntimeRep) (b :: TYPE r). a -> State# s -> (State# s -> b) -> bteaching the simplifier to “push” strict contexts (e.g.
case) intowith#applications. For instance, transformcase (with# x s (\s' -> k)) of alt -> rhsinto
with# x s (\s' -> case k of alt -> rhs)This transformation, which we call “strict context pushing” (SCP), is similar to a transform already performed on
runRW#applications (#15127). In this case ofwith#, it is sound as it strictly grows the scope of the continuation and therefore lengthens the lifetime of the kept-alive value.
This SCP transformation allows us to eliminate the allocation incurred in the
case of readWord8 by way of inlining and simplification. That is, consider a
strict call-site of readWord8:
IO $ \s0 ->
case readWord8 fptr s0 of (# s1, n #) ->
case n of W8# n# ->
doSomething n#First, we inline readWord8, resulting in:
IO $ \s0 ->
case (
with# fptr s0 (\s1 ->
case peek (unsafeForeignPtrToPtr fptr') s1 of (# s2, n# #) ->
(# s2, W8# n# #)
)
) of (# s3, n #) ->
case n of W8# n# ->
doSomething n#Now, by the SCP transform above, the outer case can be pushed inside of the
with#:
IO $ \s0 ->
with# fptr s0 (\s1 ->
case (
case peek (unsafeForeignPtrToPtr fptr') s1 of (# s2, n# #) ->
(# s2, W8# n# #)
) of (# s3, n #) ->
case n of W8# n# ->
doSomething n#
)Finally, case-of-known constructor can be applied twice, eliminating the W8#
allocation:
IO $ \s0 ->
with# fptr s0 (\s1 ->
case peek (unsafeForeignPtrToPtr fptr') s1 of (# s2, n# #) ->
doSomething n#
)Hooray!
The trouble with SCP
At first glance, this SCP transform looks like a promising way to eliminate a
major source of overhead imposed by with#. Unfortunately, testing quickly
revealed that matters are not so simple. For instance, consider a loop which
uses readWord8 to sum the elements of an array:
sumArray :: ForeignPtr Word8 -- ^ an array of Word8s
-> Int -- ^ array length
-> IO Word8
sumArray fptr n = IO (go 0# n)
where
go :: Word8# -> Int -> State# RealWorld -> (# State# RealWorld, Word8 #)
go accum 0 s0 = (# s0, W8# accum #)
go accum i s0 =
case readWord8 (fptr `plusForeignPtr` i) s0 of (# s1, n #) ->
case n of W8# n# ->
go (accum +# n#) (i-1) s1As written, this program will run in constant space (modulo short-lived garbage
W8# constructors). However, consider what happens if we inline readWord8#,
apply the SCP transform, and simplify in the second equation of go:
go accum i s0 =
with# (fptr `plusForeignPtr` i) s0 (\s1 ->
case peek (unsafeForeignPtrToPtr (fptr `plusForeignPtr` i)) s1 of (# s2, n# #) ->
go (accum +# n#) (i-1) s1While we have successfully eliminated the garbage W8# allocations,
the recursive go call (previously a tail-call) is now beneath a with#
application. Since most implementations of with# will have the operational
effect of pushing a frame to the evaluation stack, this transforms the
otherwise constant-space sumArray into an O(n) program. This is clearly not
acceptable.
Unfortunately, this problem is quite difficult to avoid. While one can think of placing various heuristic restrictions on the contexts where SCP is applied, there is no obviously-correct, local criterion which robostly preserves CPR while avoiding stack blow-up.
While we found no way to make SCP work, the modifications we made to with#’s
type at the beginning of this section are desireable regardless. That is:
with#must be polymorphic in theRuntimeRepof its result lest the user is unable to use it to compute unboxed results.- the use of
State#tokens is a good hint to the user that they must take care to control execution ordering to safely use the operation
Consequently, we will adopt the following type for with#:
with# :: forall a (r :: RuntimeRep) (b :: TYPE r).
a -> State# s -> (State# s -> b) -> bGenerating code for with#
Above we considered some of the design decisions surrounding the user-facing
interface of with#. However, the interface is only half of the story: the
compiler must also know how to generate code for with#.
Thankfully, doing so is (currently) reasonably straightforward as GHC’s DCE
optimisation strictly occurs on Core. For this reason, we can safely desugar
with# into touch# by inlining after the Core-to-Core optimisation pipeline
has run (namely in the CorePrep stage of compilation). Specifically, an
application:
-- Recall that
-- with# :: forall a (r :: RuntimeRep) (b :: TYPE r).
-- a -> State# s -> (State# s -> b) -> b
with# x s0 (\s0 -> k)can be rewritten to
case k s0 of r ->
case touch# x s0 of s1 ->
rAdmittedly, this rewrite is a bit suspicious as we there is nothing
ensuring the ordering between the case k s and the touch# (since they are
both applied to the State# token s). However, this is a pragmatic choice:
GHC’s STG pipeline currently performs no optimisation which could reorder
these expressions and forcing with# to return a (# State# s, b #) would
preclude its use in some pure contexts.
On the bright side, the fact that we can reuse touch# means that we can
provide with# with minimal overhead and no further changes to the backend.
keepAlive# in GHC 9.0
While we originally set out to supercede (and ultimately remove) touch#, the
issues described above (and others described in the Wiki) made it clear
that this plan was not feasible: what touch# lacks in robustness and
aesthetics, it makes up for in efficiency and ease-of-optimisation. Moreover,
there are occassionally cases which do not fit with the lexically-scoped
semantics of with#.
Consequently, in GHC 9.0 we opted for a pragmatic solution: continue to allow
use of touch# where necessary but prefer to use the safer with-style where
possible. GHC 9.0.1 exposes this operation as keepAlive# (the kinds in the
type below are slightly different than those present in 9.0, referring to
BoxedRep introduced in 9.2 for added precision):
keepAlive# :: forall (lev :: Levity) (a :: TYPE (BoxedRep lev))
(rep :: RuntimeRep) (b :: TYPE rep) s.
a
-> State# s
-> (State# s -> b)
-> bWhat implications does this have on user code? For most users, none: nearly all
uses of touch# are buried in low-level libraries which have already been
adapted by the GHC maintainers.
If you do for some reason use touch#, we advise that you first consider
whether it is possible to replace the usage with a higher-level, less fragile
abstraction: in our experience, there are few cases where touch# is necessary
which would not be equally-well served by, e.g., ForeignPtr or a higher-level
scoped allocation primitive.
If you have determined that the low-level primitive is indeed unavoidable then
the next question is whether touch# should be replaced with keepAlive#. In the
general case this unfortunately comes down to a safety/performance tradeoff.
Specifically, let us consider the example of the withForeignPtr operation
exposed by base. When written in terms of touch this operation is defined
as:
withForeignPtr :: ForeignPtr a -> (Ptr a -> IO b) -> IO b
withForeignPtr fptr k = do
k (unsafeForeignPtrToPtr fptr)
touch fptrWhile this can produce very efficient code, it is also deeply unsafe as the
user may provide a divergent continuation (e.g. withForeignPtr fptr (forever ...)). For the reasons described above, this can result in the dropping of the
touch#, resulting in unsoundness.
To avoid this, we should rather express withForeignPtr in terms of
keepAlive#:
withForeignPtr :: ForeignPtr a -> (Ptr a -> IO b) -> IO b
withForeignPtr fptr k = IO $ \s0 ->
keepAlive# fptr s0 (unIO $ k (unsafeForeignPtrToPtr fptr))This avoids unsoundness at the expense of needing to allocate the b result of
the continuation. This is the implementation provided with base-4.15.
Introducing unsafeWithForeignPtr
While the introduction of keepAlive# affects relatively little user-code, the
added cost of withForeignPtr has the potential to be more impactful. In
particular, it is not uncommon to find code such as:
withForeignPtr fptr (peek @Word8)in tight inner-loops. Adding an allocation in such a context would be
catestrophic for performance and consequently GHC.ForeignPtr exports the old,
touch#-based implementation of withForeignPtr as unsafeWithForeignPtr.
This allows performance sensitive code to avoid boxing when it is known that the
continuation is not certain to diverge. Since this operation is unsafe in the
presence of known-divergent continuations, users are strong encouraged to
use this operation sparingly and only in cases where the continuation is
obviously not divergent.
Guidance for earlier GHC releases
As noted, use of touch# in previous GHC versions is a bit perilous unless due
care is taken. Let’s consider an example:
foo :: ByteArray -> IO ()
foo x = do
doSomething x -- (a)
...
doForeignCall (byteArrayContents x) -- (b)
...
touch x -- (c)For this program to be safe, the user must guarantee that no subexpression in
the body of foo between (a) (the point that x potentially becomes
unreachable) and (c) (the touch, which we must ensure DCE does not drop)
is provably divergent. In the case that all of the ...s are statically known
it is possible to determine this by inspection.
However, things are harder if the function takes a user-provided continuation a
la withForeignPtr:
withForeignPtr :: ForeignPtr a -> (Ptr a -> IO b) -> IO b
withForeignPtr fptr k = do
k (unsafeForeignPtrToPtr fptr)
touch fptrIn this case the only reliable way to avoid unsoundness is to place a NOINLINE
on withForeignPtr. This deprives the simplifier of knowledge of k, ensuring
that it cannot conclude that touch is unreachable.
Conclusion
Here we have described a bit of the story that lead to keepAlive#. While the
full story is significantly longer (and ultimately lead to the three-month deferral
of the 9.0.1 release), the result is a primop which avoids a class of
pernicious soundness issues which had lurked in previous releases.
If you have any questions regarding touch# and keepAlive#, do be in touch
on the ghc-devs mailing list.
To avoid cluttering the code too much, we are playing a bit fast-and-loose with the
IOnewtype here. For instance, we treatpeekas a function of typeState# RealWorld -> (# State# RealWorld, a #)(i.e. the representation ofIO a)) instead ofIO a. We will continue this pragmatic abuse of notation below.↩︎