[Haskell-cafe] What is the surefire way to handle all exceptions and make sure the program doesn't fail?

Tue Jul 17 22:10:33 CEST 2012

On 07/17/2012 08:34 AM, Yifan Yu wrote:
> First of all, apologies if the question is too broad. The background goes
> like this: I've implemented a server program in Haskell for my company
> intended to replace the previous one written in C which crashes a lot (and
> btw the technology of the company is exclusively C-based).  When I chose
> Haskell I promised my manager (arrogantly - I actually made a bet with
> him), "it won't crash". Now it has been finished (with just a few hundred
> LOC), and my test shows that it is indeed very stable. But by looking at
> the code again I'm a little worried, since I'm rather new to exception
> handling and there're many networking-related functions in the program. I
> was tempted to catch (SomeException e) at the very top-level of the program
> and try to recursively call main to restart the server in case of any
> exception being thrown, but I highly doubt that is the correct and
> idiomatic way. There are also a number of long-running threads launched
> from the main thread, and exceptions thrown from these threads can't be
> caught by the top-level `catch' in the main thread.
> My main function looks like this:
> 
[--snip--]

> I find that I can't tell whether a function will throw any exception at
> all, or what exceptions will be thrown, by looking at their documentation.
> I can only tell if I browse the source code. So the question is, how can I
> determine all the exceptions that can be thrown by a given function?

Look at its source.

> And what is the best way to handle situations like this, where both the
> long-running threads and the main thread need to be restarted whenever
> exceptions happen?
> 

The most robust way is probably to use a completely independent
supervisor program, e.g. "upstart", "systemd", "runit", etc. These
usually have facilities for restarting the supervised program, and a
rate limit on exactly how often to try that (over a given period of time).

These *won't* work for a program that's deadlocked because an important
thread has died. For that you'll need either a watchdog (external) or an
in-program mechanism for "supervised threads" which can catch any and
all exceptions and restart threads as necessary. This tends to be very
domain-specific, but you might take some inspiration from the way
supervisor hierarchies work in the actor model.
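
As a rough illustration of the "supervised threads" idea, something along
these lines might serve as a starting point. It is only a sketch using
forkIO, try and threadDelay from base; the names `supervise`, `name` and
`delayMicros` are made up for this example, and the fixed restart delay
with no upper bound on restarts is an assumption you would probably want
to revisit.

```haskell
import Control.Concurrent (ThreadId, forkIO, threadDelay)
import Control.Exception
  ( AsyncException (ThreadKilled)
  , SomeException
  , fromException
  , try
  )
import System.IO (hPutStrLn, stderr)

-- | Hypothetical helper: run 'worker' in a new thread. If it dies with an
-- exception, log it, wait 'delayMicros' microseconds and run it again.
-- A worker that returns normally, or is stopped with killThread, is not
-- restarted.
supervise :: String -> Int -> IO () -> IO ThreadId
supervise name delayMicros worker = forkIO loop
  where
    loop :: IO ()
    loop = do
      -- Catch everything, including asynchronous exceptions.
      result <- try worker :: IO (Either SomeException ())
      case result of
        Right () ->
          hPutStrLn stderr (name ++ ": finished normally")
        Left e
          -- Treat an explicit killThread as a request to stop for good.
          | Just ThreadKilled <- fromException e ->
              hPutStrLn stderr (name ++ ": killed, not restarting")
          | otherwise -> do
              hPutStrLn stderr (name ++ ": died (" ++ show e ++ "), restarting")
              threadDelay delayMicros
              loop
```

You would start each long-running thread with something like
`supervise "listener" 1000000 acceptLoop`, where `acceptLoop` stands in
for one of your own actions. Note that this only helps when the worker
actually throws; a thread that blocks forever without throwing is not
detected, so the watchdog caveat above still applies.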

Regards,