Performance/Accumulating parameter
From HaskellWiki
m |
|||
(10 intermediate revisions by 5 users not shown) | |||
Line 1: | Line 1: | ||
{{Performance infobox}} |
{{Performance infobox}} |
||
− | == Accumulating Paramaters: Getting rid of the 'almost' in "almost tail recursive" == |
+ | [[Category:Performance|Accumulating parameter]] |
+ | == Accumulating Parameters: Getting rid of the 'almost' in "almost tail recursive" == |
||
− | Often times you will write a recursive function which is almost tail recursive. Consider this naïve implementation of a reverse function. |
+ | Often times you will write a recursive function which is almost [[tail recursion|tail recursive]]. Consider this naïve implementation of a function to compute the length of a list. |
− | rev :: [a] -> [a] |
+ | <haskell> |
− | rev [] = [] |
+ | len :: [a] -> Int |
− | rev (x:xs) = rev xs ++ [x] |
+ | len [] = 0 |
+ | len (x:xs) = len xs + 1 |
||
+ | </haskell> |
||
Compile and run a simple program like the following: |
Compile and run a simple program like the following: |
||
− | main = putStrLn $ show $ rev [1..10000] |
+ | <haskell> |
+ | main = print $ len [1..200000] |
||
+ | </haskell> |
||
− | Profiling the above, I found that it took 13.86 seconds to run, allocating 1,202,138,892 bytes of memory! That rev function took over 99% of both the program's time and space. |
+ | Profiling the above, I found that it took 0.20 seconds to run. Increasing that number to 1000000 results in a stack overflow. |
− | Looking at the function, we see it is recursive. The base step handles the case of reversing an empty list. The inductive step reverses the tail of the list, and adds the first element of the list onto the end of the list. By adding an accumulating parameter, we can make this tail recursive. |
+ | The above function seems to delegate most of the actual work to the recursive case. The only thing keeping it from being tail recursive is the requirement to increment the length of the remainder of the list. |
− | rev' :: [a] -> [a] -> [a] |
+ | We can move this increment step into an accumulating parameter. This is an extra parameter that allows us to carry information along in the computation. |
− | rev' [] acc = acc |
||
− | rev' (x:xs) acc = rev' xs (x:acc) |
||
− | Here's how the new version works. acc is an accumulating parameter. The base case simply returns the accumulating parameter. The inductive step takes the head of the list and puts it on the front of the accumulator. Then it tail recurses with the tail of the list and the new accumulator value. In order to reverse a list, the accumulator should have an initial value of []. It's a good idea to provide a wrapper for functions written this way, so that the users don't have to deal with the accumulator. |
+ | <haskell> |
+ | len' :: [a] -> Int -> Int |
||
+ | len' [] acc = acc |
||
+ | len' (x:xs) acc = len' xs (1 + acc) |
||
+ | </haskell> |
||
− | rev xs = rev' xs [] |
+ | Now the function is tail recursive. The accumulating parameter is returned in some form by the base case, and in the inductive case, we perform the step that prevented us from being truly tail recursive. We should provide a wrapper function that hides the detail of that accumulating parameter from users of the function. |
− | Profiling this new tail recursive reverse function yields impressive results. The run time of the program is now 0.02 seconds! The reversal operation takes an immeasurably small time, most of the time was occupied with constructing that list of numbers. |
+ | <haskell> |
+ | len xs = len' xs 0 |
||
+ | </haskell> |
||
− | The real performance gain came from the tail recursive implementation. Accumulating parameters is merely a means to turn an almost tail recursive implementation into a tail recursive implementation. The pattern to apply this technique to are ones which involve a tail recursion and a cons step. This latter step is performed on the accumulator and then passed into the tail recursive step. |
+ | Running the above program with the new definition gives me a runtime of 0.02 seconds. |
+ | |||
+ | The performance gain came from the tail recursive implementation. The problem with the stack overflow was the result of keeping a long list of computations to do after "this next step." The tail recursive version eliminated the need to store all these computational intermediaries. |
||
+ | |||
+ | Accumulating parameters is merely a means to turn an almost tail recursive implementation into a tail recursive implementation. The pattern to apply this technique to are ones which involve a tail recursion and a cons step. This latter step is performed on the accumulator and then passed into the tail recursive step. |
||
+ | |||
+ | We need to fix a small problem with regards to the stack overflow. The function would accumulate (1 + acc) thunks as we pass down the list. In general, it makes sense to make your accumulator parameters strict, since its value will be needed at the very end. A better implementation would be: |
||
+ | |||
+ | <haskell> |
||
+ | len' [] acc = acc |
||
+ | len' (x:xs) acc = len' xs $! (1 + acc) |
||
+ | </haskell> |
||
+ | |||
+ | This explicitly evaluates 1 + acc before performing the tail recursion. This prevents the stack overflow. |
||
+ | |||
+ | |||
+ | == Doing the right thing automatically using standard higher order functions == |
||
+ | |||
+ | If you want to be sure to use a tail recursive pattern |
||
+ | then use appropriate higher order functions from the standard libraries. |
||
+ | The first tail recursive version is equivalent to |
||
+ | |||
+ | <haskell> |
||
+ | len = foldl (\acc x -> 1+acc) 0 |
||
+ | </haskell> |
||
+ | |||
+ | and the second one can be written as |
||
+ | |||
+ | <haskell> |
||
+ | len = foldl' (\acc x -> 1+acc) 0 |
||
+ | </haskell> |
Latest revision as of 07:51, 13 December 2009
Haskell Performance Resource
Constructs: Techniques: |
[edit] 1 Accumulating Parameters: Getting rid of the 'almost' in "almost tail recursive"
Often times you will write a recursive function which is almost tail recursive. Consider this naïve implementation of a function to compute the length of a list.
len :: [a] -> Int len [] = 0 len (x:xs) = len xs + 1
Compile and run a simple program like the following:
main = print $ len [1..200000]
Profiling the above, I found that it took 0.20 seconds to run. Increasing that number to 1000000 results in a stack overflow.
The above function seems to delegate most of the actual work to the recursive case. The only thing keeping it from being tail recursive is the requirement to increment the length of the remainder of the list.
We can move this increment step into an accumulating parameter. This is an extra parameter that allows us to carry information along in the computation.
len' :: [a] -> Int -> Int len' [] acc = acc len' (x:xs) acc = len' xs (1 + acc)
Now the function is tail recursive. The accumulating parameter is returned in some form by the base case, and in the inductive case, we perform the step that prevented us from being truly tail recursive. We should provide a wrapper function that hides the detail of that accumulating parameter from users of the function.
len xs = len' xs 0
Running the above program with the new definition gives me a runtime of 0.02 seconds.
The performance gain came from the tail recursive implementation. The problem with the stack overflow was the result of keeping a long list of computations to do after "this next step." The tail recursive version eliminated the need to store all these computational intermediaries.
Accumulating parameters is merely a means to turn an almost tail recursive implementation into a tail recursive implementation. The pattern to apply this technique to are ones which involve a tail recursion and a cons step. This latter step is performed on the accumulator and then passed into the tail recursive step.
We need to fix a small problem with regards to the stack overflow. The function would accumulate (1 + acc) thunks as we pass down the list. In general, it makes sense to make your accumulator parameters strict, since its value will be needed at the very end. A better implementation would be:
len' [] acc = acc len' (x:xs) acc = len' xs $! (1 + acc)
This explicitly evaluates 1 + acc before performing the tail recursion. This prevents the stack overflow.
[edit] 2 Doing the right thing automatically using standard higher order functions
If you want to be sure to use a tail recursive pattern then use appropriate higher order functions from the standard libraries. The first tail recursive version is equivalent to
len = foldl (\acc x -> 1+acc) 0
and the second one can be written as
len = foldl' (\acc x -> 1+acc) 0