Personal tools

Safely running untrusted Haskell code

From HaskellWiki

(Difference between revisions)
Jump to: navigation, search
(Thread in Haskell Cafe)
(EvilIx)
 
(21 intermediate revisions by 7 users not shown)
Line 1: Line 1:
  +
[[Category:How to]]
 
Obviously, don't run code in the IO monad, just show pure results (or possibly make your own monad that is a restricted subset of IO). But it's a lot more complicated than that...
 
Obviously, don't run code in the IO monad, just show pure results (or possibly make your own monad that is a restricted subset of IO). But it's a lot more complicated than that...
   
See [http://haskell.org/pipermail/haskell-cafe/2007-May/025941.html a long discussion] in Haskell Cafe.
+
== Verifying safety : lambdabot's approach ==
  +
  +
Since 2004, lambdabot has executed arbitrary strings of Haskell provided
  +
by user's of various [[IRC_channel|IRC channels]], in particular, the
  +
Haskell channel. In order to do this, a particular security policy is
  +
required. The policy, and its implementation, is described here.
  +
  +
=== The policy ===
  +
  +
Only allow execution of pure Haskell expressions.
  +
  +
=== The implementation ===
  +
  +
:''Note: This section refers to the old Lambdabot evaluator; as of 2009, lambdabot calls out to [http://hackage.haskell.org/package/mueval mueval], which while it uses many of the same techniques, is structured differently.''
  +
  +
The evaluator is essentially a function, <hask>eval :: String -> IO String</hask>, which takes a random Haskell string, verifies it,
  +
compiles it, and evaluates the result, returning a String representing
  +
the result, back over the network.
  +
  +
This function is implemented as two separate processes:
  +
  +
* [http://www.cse.unsw.edu.au/~dons/code/lambdabot/Plugin/Eval.hs Driver/simple verifier]
  +
* [http://www.cse.unsw.edu.au/~dons/code/lambdabot/scripts/RunPlugs.hs Evaluator binary]
  +
  +
The driver reads a String from the network, and then subjects it to a
  +
simple test:
  +
  +
* The expression is parsed as a Haskell 98 expression, hopefully preventing code injection (is this true? and can any string that can parse as a valid Haskell expression become something more sinister when put in a particular context?)
  +
  +
If the string parses as a Haskell 98 expression, the 'runplugs' process
  +
is then forked to evaluate the string, and the following checks are put
  +
in place:
  +
  +
* Only a trusted module set is imported, avoiding unsafePerformIO and unsafeIOtoST and such like.
  +
* Module imports are disallowed
  +
* Time and space limitations on the runplugs process are set by the OS 'rlimit' facility
  +
* The expression type checked, enforcing lack of memory errors
  +
* Because the user code is not at the beginning of the file, malicious {-# LANGUAGE #-} and {-# OPTIONS #-} flags are ignored
  +
* Only -fextended-default-rules are allowed, as language extensions over H98.
  +
* The resulting .o file is dynamically linked only into the throw-away runplugs instance
  +
* Even if all went well, the first 2048 characters of the shown string are returned to the caller (no infinite output DoS)
  +
  +
A few other niceties are provided:
  +
  +
* The expression is bound to a random identifier (harmless to guess), in order to allow nice line error messages with line pragmas.
  +
* The expression is wrapped in 'show'.
  +
* A catch-all instance of Show in terms of Typable is provided, to display non-displayable objects in a more useful way (e.g. putStrLn --> <[Char] -> IO ()>)
  +
* It is compiled to native code with -fasm for speed (compilation time is neglible compared to IRC lag)
  +
* The value is evaluated inside an exception handler; if an exception is thrown, the first 1024 characters of the exception string are returned.
  +
  +
== Exploits ==
  +
  +
A variety of interesting exploits have been found, or thought of, over
  +
the years. Those we remember are listed below:
  +
  +
* using newtype recursion to have the inliner not terminate
  +
* using pathological type inference cases to have the type checker not terminate
  +
* code injection of code fragments that aren't Haskell expressions
  +
* Template Haskell used to run IO actions during type checking
  +
* stToIO to convert a safe ST action, into an IO action that is run
  +
* large strings returned in exceptions
  +
* unsafePerformIO, of course
  +
* unsafeCoerce#
  +
* throwing a piece of code as an exception, which is evaluated when the exception is shown
  +
* non-terminating code, in a tight loop that doesn't allocate, can't use GHC's threadDelay/scheduler (let f () = f () in f ()) to timeout (must use OS [[resource limits]]).
  +
* large array allocations can fill memory
  +
* very large array allocations can integer overflow the storage manager, allowing arbitrary memory access (this appears to be fixed in GHC 6.8.x)
  +
* creating class instances that violate assumed laws (cf [http://www.haskell.org/pipermail/haskell-cafe/2006-December/019994.html EvilIx])
  +
* various literal strings that print IRC protocol commands could be printed using exceptions.
  +
* if a user guesses the top level identifier the expression is bound to, it can be used to print a silly string
  +
* zombies could be created by multiple runplugs calls, leading to blocking on endless output. the resulting zombies accumulate, eventually leading to DOS. (if waitForProcess was broken)
  +
  +
== Template Haskell ==
  +
  +
We believe that Template Haskell can be made safe for users by hiding runIO and reify.
  +
  +
== See also ==
  +
  +
* See [http://haskell.org/pipermail/haskell-cafe/2007-May/025941.html a long discussion] in Haskell Cafe.
  +
* The [http://hackage.haskell.org/trac/ghc/ticket/1380 GHC ticket] for -fsafe

Latest revision as of 10:41, 20 November 2012

Obviously, don't run code in the IO monad, just show pure results (or possibly make your own monad that is a restricted subset of IO). But it's a lot more complicated than that...

Contents

[edit] 1 Verifying safety : lambdabot's approach

Since 2004, lambdabot has executed arbitrary strings of Haskell provided by user's of various IRC channels, in particular, the Haskell channel. In order to do this, a particular security policy is required. The policy, and its implementation, is described here.

[edit] 1.1 The policy

Only allow execution of pure Haskell expressions.

[edit] 1.2 The implementation

Note: This section refers to the old Lambdabot evaluator; as of 2009, lambdabot calls out to mueval, which while it uses many of the same techniques, is structured differently.
The evaluator is essentially a function,
eval :: String -> IO String
, which takes a random Haskell string, verifies it,

compiles it, and evaluates the result, returning a String representing the result, back over the network.

This function is implemented as two separate processes:

The driver reads a String from the network, and then subjects it to a simple test:

  • The expression is parsed as a Haskell 98 expression, hopefully preventing code injection (is this true? and can any string that can parse as a valid Haskell expression become something more sinister when put in a particular context?)

If the string parses as a Haskell 98 expression, the 'runplugs' process is then forked to evaluate the string, and the following checks are put in place:

  • Only a trusted module set is imported, avoiding unsafePerformIO and unsafeIOtoST and such like.
  • Module imports are disallowed
  • Time and space limitations on the runplugs process are set by the OS 'rlimit' facility
  • The expression type checked, enforcing lack of memory errors
  • Because the user code is not at the beginning of the file, malicious {-# LANGUAGE #-} and {-# OPTIONS #-} flags are ignored
  • Only -fextended-default-rules are allowed, as language extensions over H98.
  • The resulting .o file is dynamically linked only into the throw-away runplugs instance
  • Even if all went well, the first 2048 characters of the shown string are returned to the caller (no infinite output DoS)

A few other niceties are provided:

  • The expression is bound to a random identifier (harmless to guess), in order to allow nice line error messages with line pragmas.
  • The expression is wrapped in 'show'.
  • A catch-all instance of Show in terms of Typable is provided, to display non-displayable objects in a more useful way (e.g. putStrLn --> <[Char] -> IO ()>)
  • It is compiled to native code with -fasm for speed (compilation time is neglible compared to IRC lag)
  • The value is evaluated inside an exception handler; if an exception is thrown, the first 1024 characters of the exception string are returned.

[edit] 2 Exploits

A variety of interesting exploits have been found, or thought of, over the years. Those we remember are listed below:

  • using newtype recursion to have the inliner not terminate
  • using pathological type inference cases to have the type checker not terminate
  • code injection of code fragments that aren't Haskell expressions
  • Template Haskell used to run IO actions during type checking
  • stToIO to convert a safe ST action, into an IO action that is run
  • large strings returned in exceptions
  • unsafePerformIO, of course
  • unsafeCoerce#
  • throwing a piece of code as an exception, which is evaluated when the exception is shown
  • non-terminating code, in a tight loop that doesn't allocate, can't use GHC's threadDelay/scheduler (let f () = f () in f ()) to timeout (must use OS resource limits).
  • large array allocations can fill memory
  • very large array allocations can integer overflow the storage manager, allowing arbitrary memory access (this appears to be fixed in GHC 6.8.x)
  • creating class instances that violate assumed laws (cf EvilIx)
  • various literal strings that print IRC protocol commands could be printed using exceptions.
  • if a user guesses the top level identifier the expression is bound to, it can be used to print a silly string
  • zombies could be created by multiple runplugs calls, leading to blocking on endless output. the resulting zombies accumulate, eventually leading to DOS. (if waitForProcess was broken)

[edit] 3 Template Haskell

We believe that Template Haskell can be made safe for users by hiding runIO and reify.

[edit] 4 See also