The state of GHC on ARM

bgamari - 2020-05-15

The ARM architecture represents an overwhelming majority of CPUs on this planet. Furthermore, there are now GHC users (myself included) who stake their livelihood on being able to deploy their Haskell applications on ARM-based platforms. As such, the task of making GHC run well on ARM has never been more important.

This task has a long history (it was the project that brought me to GHC development many years ago), and the reliability of support has varied greatly from release to release. As I’ve had a few people ask about the state of GHC-on-ARM over the past few months, I thought now might be a good time to write some words on the state of things.

In short, things are in significantly better shape today than they were back in 2012. There are a few reasons for this:

  • LLVM and binutils have stabilized significantly on ARM (this wasn’t always the case)

  • ARM support in GHC’s linker is now mostly complete

  • (since GHC 8.10) GHC’s runtime system is much more careful about ensuring memory ordering (see !1128)

Consequently, it is now fairly easy to bring up GHC on ARMv7 and ARMv8 machines.

Getting started on a Raspberry Pi

By far, the most readily available ARM hardware running a standard Linux distribution is the Raspberry Pi. Moreover, recent hardware iterations (e.g. the Raspberry Pi 4) are capable enough to run GHC itself without too much pain from long compile times.

Most Raspberry Pi users will be using the Raspbian Debian variant. Unfortunately, due to some creative packaging decisions on the part of the Raspbian maintainers, installing GHC under Raspbian requires a bit of manual intervention.

First, we need to install LLVM 9 (which GHC uses for code generation on ARM):

$ sudo apt-get install llvm-9

Next we can fetch and install the GHC 8.10.1 ARMv7 binary distribution for Debian 9:

$ wget http://downloads.haskell.org/~ghc/8.10.1/ghc-8.10.1-armv7-deb9-linux.tar.xz
$ tar -xf ghc-8.10.1-armv7-deb9-linux.tar.xz
$ cd ghc-8.10.1
$ ./configure CONF_CC_OPTS_STAGE2="-marm -march=armv7-a" CFLAGS="-marm -march=armv7-a"
$ sudo make install

Depending upon the speed of your storage medium this may take a while. Note that we have had to override the C flags inferred by autoconf, since otherwise gcc will produce invalid assembly (see #17856).
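If you want to double-check what a given set of flags asks gcc to generate, the compiler’s predefined target macros are one way to do so. The following is a small sketch (the function names are hypothetical), assuming a GCC or Clang that defines the ARM C Language Extensions macro __ARM_ARCH along with __arm__ and __thumb__. Compiling it once with your compiler’s default flags and once with -marm -march=armv7-a shows whether the two differ in architecture level or instruction set.

```c
#include <stdio.h>

/* Returns the ARM architecture version the compiler is targeting, as
   reported by the ACLE macro __ARM_ARCH, or 0 when not compiling for
   32-bit ARM (e.g. on an AArch64 or x86 build host). */
int target_arm_arch(void) {
#if defined(__arm__) && defined(__ARM_ARCH)
    return __ARM_ARCH;
#else
    return 0;
#endif
}

/* Returns 1 if the compiler is emitting Thumb instructions, 0 for the
   classic ARM (A32) instruction set that -marm requests. */
int target_is_thumb(void) {
#if defined(__thumb__)
    return 1;
#else
    return 0;
#endif
}

/* Prints a one-line summary of the compilation target. */
void report_target(void) {
    int arch = target_arm_arch();
    if (arch == 0)
        printf("not targeting 32-bit ARM\n");
    else
        printf("ARMv%d, %s instruction set\n", arch,
               target_is_thumb() ? "Thumb" : "ARM");
}
```

Calling report_target() from a main compiled with gcc -marm -march=armv7-a on the Pi should report ARMv7 and the ARM instruction set; on a 64-bit or x86 machine it simply reports that it is not targeting 32-bit ARM.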

Finally, we can test our handiwork:

$ ghc --version
$ cat > Hello.hs <<EOF
> main = putStrLn "hello world!"
> EOF
$ ghc Hello.hs
$ ./Hello
hello world!

We can even use GHCi:

$ ghci
GHCi, version 8.10.1: https://www.haskell.org/ghc/  :? for help
Prelude> putStrLn $ cycle "turtles all the way down.\n"
turtles all the way down.
turtles all the way down.
turtles all the way down.
...

Currently the upstream Cabal project doesn’t provide a cabal-install binary distribution for ARM (although this will hopefully change soon). In the meantime, I have provided one here.

Future work

As always, there is plenty left to be done.

Further memory ordering robustness

While !1128 was a significant step forward in stability on ARM (and other weakly-ordered architectures), there is still room for improvement. GHC’s runtime system has long been abusing C’s volatile keyword to prevent the compiler from doing unsound things given our ubiquitous reliance on undefined behavior (in our defense, prior to the introduction of the C memory model in C11 it was not possible to write programs like GHC’s RTS in standard C).
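To illustrate why volatile falls short, consider the classic message-passing pattern. The following is a minimal sketch, not RTS code; the names payload, ready and run_message_passing are hypothetical. Declaring the flag volatile only stops the compiler from caching or eliding accesses to the flag itself: it does not order the surrounding plain store to payload, it does nothing about reordering by the hardware, and under C11 the concurrent plain accesses are a data race. On a weakly-ordered machine such as ARM, a consumer could therefore observe the flag before the payload. C11 release/acquire atomics state the required ordering explicitly, and the compiler emits whatever barriers the target needs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical shared state: payload is ordinary data that is
   published to another thread by setting ready. */
static int payload;
static atomic_int ready;

/* Producer: write the data, then publish it. The release store makes
   the write to payload visible to any thread whose acquire load of
   ready observes the value 1. */
static void *producer(void *arg) {
    (void)arg;
    payload = 42;
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Consumer: wait for publication, then read the data. The acquire load
   pairs with the release store above. */
static void *consumer(void *arg) {
    int *out = arg;
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ; /* spin until the producer publishes */
    *out = payload;
    assert(*out == 42); /* guaranteed, even on weakly-ordered CPUs */
    return NULL;
}

/* Runs one producer and one consumer; returns what the consumer read. */
int run_message_passing(void) {
    pthread_t p, c;
    int seen = 0;
    payload = 0;
    atomic_store(&ready, 0);
    pthread_create(&c, NULL, consumer, &seen);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

Replacing the atomic_int with a volatile int and the atomic calls with plain accesses would compile just as happily, but the assertion in the consumer would then no longer be guaranteed by the language.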

However, finding all of the data races in the RTS is no small task. I currently have an ongoing series of merge requests which adds support for checking GHC’s runtime with the ThreadSanitizer data-race detector. This is itself a non-trivial task, as it requires that GHC adopt C atomics in place of our previous use of explicit barriers wherever possible. That being said, I hope to have this work done for 8.14. This will squash a few more bugs and should make GHC on ARM quite solid.
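ThreadSanitizer reasons in terms of the C11 memory model: it understands atomic loads and stores, but it cannot see the ordering intended by a hand-written barrier (for instance one implemented with inline assembly) placed around ordinary accesses, so such code shows up as racy. The following is a minimal sketch of the kind of change involved, using a hypothetical event counter shared between worker threads (n_events, worker and count_events are illustrative names, not actual RTS code). A plain n_events++ from several threads is a data race that a -fsanitize=thread build reports and that can lose increments; a relaxed atomic increment is race-free.

```c
#include <pthread.h>
#include <stdatomic.h>

#define N_THREADS 4
#define N_ITERS 100000

/* Hypothetical statistics counter shared by several worker threads. */
static atomic_long n_events;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < N_ITERS; i++) {
        /* With a plain `long n_events` this would be `n_events++`: a data
           race that ThreadSanitizer reports and that can lose updates.
           Relaxed ordering suffices because no other memory is published
           through this counter; only the final total is read. */
        atomic_fetch_add_explicit(&n_events, 1, memory_order_relaxed);
    }
    return NULL;
}

/* Spawns the workers, waits for them, and returns the total count. */
long count_events(void) {
    pthread_t threads[N_THREADS];
    atomic_store(&n_events, 0);
    for (int i = 0; i < N_THREADS; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(threads[i], NULL);
    return atomic_load(&n_events);
}
```

Building with gcc -fsanitize=thread -pthread and swapping the atomic increment back to a plain one is a quick way to see ThreadSanitizer flag the race.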

ThreadSanitizer does not, however, help us check for data races between the mutator (the compiled Haskell code) and the runtime system. In principle GHC could emit ThreadSanitizer instrumentation in the code it generates; in practice, we have found that ThreadSanitizer gets quite upset at our large address-space reservations. Nevertheless, we have found that even just checking the RTS in isolation is very helpful, having caught several bugs that would likely have otherwise gone unnoticed.

A native code generator?

In my opinion, it is a shame that we do not have a dedicated native code generator for the most populous architecture on the planet. ARM hardware is typically fairly slow relative to x86; even under the best of conditions compile times will be rather long. The fact that we also rely on LLVM, which itself isn’t the fastest compiler under the sun, exacerbates this problem.

However, eliminating LLVM from the equation shouldn’t be hard. ARM isn’t a complex architecture; in particular, ARMv8 is a relatively clean RISC ISA with little in the way of historical baggage. It would be relatively easy for someone with a basic grasp of assembler to write a native code generator backend for this platform in a week or two. I think this would be a great project for someone; perhaps that person could be you!

Acknowledgments

Continuous integration for ARM would not be possible without hosted hardware contributions provided by Packet through the Works on ARM program.