DWARF support in GHC (part 5)

Ben Gamari - 2020-04-07

This is the fifth and final post of a series examining GHC’s support for DWARF debug information and the tooling that this support enables:

  • Part 1 introduces DWARF debugging information and explains how its generation can be enabled in GHC.
  • Part 2 looks at a DWARF-enabled program in gdb and examines some of the limitations of this style of debug information.
  • Part 3 looks at the backtrace support of GHC’s runtime system and how it can be used from Haskell.
  • Part 4 examines how the Linux perf utility can be used on GHC-compiled programs.
  • Part 5 concludes the series by describing future work, related projects, and ways in which you can help.

Future work

In the previous four posts we saw some of the functionality enabled by DWARF debug information. As of GHC 8.10.2, everything described in this series should be possible with the standard DWARF-enabled GHC binary distributions.

However, there is still a great deal of untapped potential and much remains to be done. Here is a sampling of tasks in no particular order:

  • Merge the fruits of my latest push on DWARF support upstream (!2380, !2373, !2387)
  • Make GHC-generated symbols (e.g. 59fw_info) more reflective of their origin in the source program
  • Preserve call-stacks in exceptions (as discussed in part 3)
  • Reduce the size of debug information through more concise representation (see #17609)
  • Some RTS symbols (e.g. stg_PAP_apply) don’t have accurate unwind information, leading to truncated backtraces in some cases (#17627)
  • Implement a native (i.e. non-DWARF-based) stack unwinder in the GHC runtime system, allowing improved unwind performance in Haskell code
  • Windows PDB support (#12397)
  • Try moving GHC’s stack pointer to the native stack pointer register, enabling call-graph profiling via DWARF unwinding (as discussed in part 4, #8272)
  • Build statistical profiling support into the GHC runtime system (#10915)
  • Add support for expressing local variables in C--, enabling allocation profiling
  • Add support for tracking register value semantics in STG-to-C-- and DWARF type information, enabling local variable introspection
  • Implement thread support in GHC.ExecutionStack
  • Make better use of GHC-specific source-note information (mentioned briefly in part 1)
  • Symbol demangling support in the GHC RTS, perf, and gdb
  • Analysis tools

As always, we are looking for people to help with this effort. If any of the above tasks sound enticing to you, do let us know. Deep compiler experience is not necessary for many of these tasks, especially those in the area of analysis tools.

Below I will describe in greater detail a few of the tasks which I think hold the greatest potential.

Profile analysis tools

In his thesis, Peter Wortmann shows that the one-to-one correspondence between instructions and line numbers required by DWARF (see part 1) can result in rather unhelpful profiles. He shows that one can do significantly better by splitting the attribution of an instruction across the full set of source locations that gave rise to it. This is not something that existing tools can do. One could implement this approach on top of the sample data produced by perf record (e.g. exported via the perf script tool or the linux-perf Haskell library), using the extended DWARF annotations produced by GHC.
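The core of this attribution scheme can be sketched in a few lines. The `Addr`, `SrcLoc` types and the `attribute` function below are hypothetical stand-ins for real perf sample data and GHC's source-note mappings; the point is only to show each sample's weight being divided among all locations that produced the sampled instruction:

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical stand-ins for an instruction address and a source location.
type Addr   = Int
type SrcLoc = String

-- Split each sample's weight evenly across all source locations that
-- gave rise to the sampled instruction, rather than pinning it on a
-- single line as conventional DWARF line tables force tools to do.
attribute :: (Addr -> [SrcLoc])   -- source locations per instruction
          -> [(Addr, Double)]     -- (address, sample weight) pairs
          -> M.Map SrcLoc Double  -- total weight per source location
attribute locsOf samples = M.fromListWith (+)
  [ (loc, w / fromIntegral (length locs))
  | (addr, w) <- samples
  , let locs = locsOf addr
  , not (null locs)
  , loc <- locs
  ]
```

A real tool would populate `locsOf` from GHC's extended DWARF annotations and feed in samples parsed from perf script output.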

Peter’s Haskell Implementors’ Workshop demonstration showed one possible interface for such an analysis tool, marrying Haskell source and Core with sample data in the ThreadScope interface. It would be great to continue exploring this path.

Using native stack pointer register

As noted in part 4, GHC’s current execution model on x86 precludes use of perf record’s call-graph profiling functionality. The most promising avenue to fix this would be to rework GHC to use the native stack pointer register to track the Haskell stack (#8272). This would potentially carry a few benefits:

  • it would enable use of native profiling tools

  • the native code generator could use the PUSH and POP instructions, which may be more concise or better optimised in the microarchitecture than our current stack manipulation strategy

However, there are also a few tricky points:

  • LLVM makes very strong assumptions about the nature of the stack; consequently, moving the LLVM backend to this scheme may be non-trivial.

  • the System V ABI reserves a small region beyond (below) the stack pointer (called the “red zone”) which code may freely use for temporary storage. GHC would need to honour this convention before calling into foreign code.

There is some interesting discussion surrounding this idea in #8272 and GHC Proposal MR #17.

Building sampling profiling into the GHC runtime

Without fixing the stack register issue described above, perf’s call-graph profiling functionality is unusable. However, nothing stops GHC from providing its own sampling infrastructure in the runtime (#10915). In 2016 I started a branch doing exactly this, using perf_events’ signal-based sampling interface and dumping samples to GHC’s eventlog.

As far as I can recall, the wip/libdw-prof branch can readily collect samples; the work that remains primarily revolves around developing analysis tools.

One approach would be to build a tool to convert the GHC-eventlog-based output from the wip/libdw-prof branch into a perf.data file for use with perf report. However, one could no doubt do much better with a more specialised tool, as described in the “Profile analysis tools” section above.

While simple, this signal-based approach does imply slightly more overhead (in the form of context switches) than necessary. A more efficient approach might involve the Linux eBPF mechanism, which can be triggered from a perf_events event.

In-scope bindings

Most imperative compilers produce debug information that allows debuggers to display and modify in-scope variables and their values. In principle, GHC could also provide such support. However, doing so in a way that remains useful in the face of GHC’s program simplification would be quite non-trivial. For instance, consider the program:
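A minimal example of the kind of program meant here (hypothetical, but of the shape the discussion below assumes: a function strict in a boxed pair):

```haskell
-- Hypothetical example: f is strict in both components of its
-- argument, so GHC's demand analysis will want to unbox them.
f :: (Int, Int) -> Int
f (x, y) = x + y
```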

GHC’s worker-wrapper transformation would likely transform this to,
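Sketched in source syntax (GHC actually performs this on Core, and names the worker $wf; since $wf is not a legal source identifier, wf stands in for it here):

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#, (+#))

-- Wrapper: unpacks the pair and calls the worker; callers still see f.
f :: (Int, Int) -> Int
f (I# x#, I# y#) = wf x# y#

-- Worker: operates on the unboxed components. GHC would name this $wf.
wf :: Int# -> Int# -> Int
wf x# y# = I# (x# +# y#)
```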

This sort of transformation is ubiquitous and critical to the quality of the code GHC produces. Naturally, we would want the debug information of $wf to represent the fact that x# is the unboxed first element of the argument of f. I suspect that the best way to accomplish this would be to propagate value-provenance information through the IdInfo metadata of binders (in this case, x#).

This would involve:

  1. Adding syntax in C-- to encode local variable information
  2. Producing such syntax in the STG-to-C-- code generator
  3. Adding information in Core to propagate value provenance, as discussed above
  4. Populating this information in the worker-wrapper transformation

While being able to poke around at Haskell values in gdb is perhaps a tempting proposition, all in all I suspect that the costs (in both implementation time and complexity) would likely outweigh the benefits. This is especially true given that GHC already has the GHCi debugger for cases where such interactive debugging is necessary.

Aside: Event tracing

Some users have related to me that they have sometimes wished that GHC programs were as “traceable” as those written in other programming languages. In particular, tools like perf, bcc, bpftrace, and dtrace provide robust, minimal-overhead, language-agnostic tracing infrastructure which can be invaluable in production settings. It would be great if Haskell programs could benefit from these same tools.

The easiest on-ramp to tracing support is via the User-space Statically-Defined Tracepoint (USDT) mechanism supported by all of the aforementioned tools. Under this scheme, the traced program embeds a bit of metadata describing the available tracepoints, the information they provide, and how they are enabled.

It turns out that GHC’s runtime system already defines a number of USDT tracepoints (although these are only present when GHC is configured with the --enable-dtrace flag). However, it is possible that this support has bit-rotted (#15543).

However, it may also be useful to be able to define USDT tracepoints in Haskell programs. A simple implementation would be a Template Haskell splice which generates the necessary C stubs and splices a foreign import and call into the program.

Aside: LLVM and X-Ray

It should also be noted that LLVM offers another, quite different approach to the tracing/profiling problem with its XRay instrumentation infrastructure. This approach introduces low-cost tracing instrumentation into generated code, allowing precise and highly detailed accounting of runtime costs.

Matthew Pickering tried (#15929) adding XRay support to GHC’s LLVM backend. Unfortunately, this effort ended up being rather stunted, in part due to limitations of LLVM itself (specifically difficulties with tail-calls) and in part due to limitations of GHC’s LLVM backend (namely, we rely on the LLVM IR alias mechanism to convince LLVM that our type annotations are correct; this confuses the XRay logic).

Acknowledgments

This work has been a multi-year (off-and-on) effort for me, but it would not have been possible without the contributions of a number of others.

In particular, this work would never have even started without the efforts of Peter Wortmann. Not only does the causality formalism he described in his dissertation provide the theoretical foundation for all of this functionality, but his initial implementation kick-started the effort, and the promising results he demonstrated at the Haskell Implementors’ Workshop gave me the motivation to keep picking away at the seemingly endless stream of details that arose as I refined the feature over the years.

In general, Well-Typed’s work on GHC (and, therefore, my own work) would not have been possible without the support of Microsoft Research, IOHK, and the others who have for many years funded the position that allows me to work on GHC. In addition, some of my early work in 2015 to clean up the original DWARF implementation was supported directly by funding from Microsoft Research.