<br><font size=2 face="sans-serif">Hello,</font>
<br>
<br><tt><font size=2>> I offer up the following example:<br>
> </font></tt>
<br><tt><font size=2>This is an instructive example.</font></tt>
<br><tt><font size=2><br>
> mean xs = sum xs / length xs<br>
> </font></tt>
<br><tt><font size=2>In order to type-check, I actually need to write something
like:</font></tt>
<br>
<br><tt><font size=2> mean xs = sum xs / fromIntegral (length
xs)</font></tt>
<br>
<br><tt><font size=2>There are other ways of get the numeric types to match
correctly, but this is fairly general. </font></tt>
<br>
<br><tt><font size=2>Then, I immediately blow my stack if I try something
like: </font></tt>
<br>
<br><tt><font size=2> mean [1..1000000000].</font></tt>
<br>
<br><tt><font size=2>The culprit is actually sum which is defined in the
base libraries as either a foldl or a direct recursion depending on a compiler
flag. In either case, the code is not strict enough; just trying to compute:</font></tt>
<br>
<br><tt><font size=2> sum [1..10000000] </font></tt>
<br>
<br><tt><font size=2>blows the stack. This can be easily fixed by defining
a suitable strict sum:</font></tt>
<br>
<br><tt><font size=2> sum' = foldl' (+) 0</font></tt>
<br>
<br><tt><font size=2>and now sum' has constant space. We could try to redefine
mean using sum':</font></tt>
<br>
<br><tt><font size=2> mean1 xs = sum' xs / fromIntegral (length
xs)</font></tt>
<br>
<br><tt><font size=2>but this still gobbles up memory. The reason is that
xs is used twice and cannot be discarded as it is generated. So we must
move to a direct fold, as you did, to get a space efficient mean.</font></tt>
<br><tt><font size=2><br>
> If we now rearrange this to<br>
> <br>
> mean = (\(s,n) -> s / n) . foldr (\x (s,n) -> let s'
= s+x; n' = n+1 <br>
> in s' `seq` n' `seq` (s', n')) (0,0)<br>
> <br>
> and run the same example, and watch it run in constant space.<br>
> <br>
This code actually blows the stack on my machine just like the first naive
mean. Foldl is perhaps more intuitive to use here, since we are summing
the numbers as we encounter them while walking down the list, and there
is a strict version, foldl', provided in the base libraries.</font></tt>
<br>
<br><tt><font size=2> mean2 = uncurry (/) . foldl' (\(s,n)
x -> (s+x, n+1)) (0,0)</font></tt>
<br>
<br><tt><font size=2>However, this still gobbles up memory... the reason
is that pairs are lazy. So we need a way to force the (s+x) and (n+1).
An easy, and unobtrusive way to do this is to use strict pattern matching:</font></tt>
<br>
<br><tt><font size=2> mean2 = uncurry (/) . foldl' (\(!s,
!n) x -> (s+x, n+1)) (0,0)</font></tt>
<br>
<br><tt><font size=2>Now we can run:</font></tt>
<br>
<br><tt><font size=2> mean2 [1..1000000000]</font></tt>
<br>
<br><tt><font size=2>in constant space.</font></tt>
<br>
<br><tt><font size=2>While using an explicit foldl is less readable than
the direct version (especially to a beginner), it is a standard functional
idiom. Furthermore, a good understanding of lazy evaluation should be enough
to guide you to using the strict foldl' and then then strict patterns.
Is this a reasonable analysis?</font></tt>
<br>
<br><tt><font size=2>Also, we've made no attempt to address speed. However,
I would argue that the code's performance time is predictable-- it grows
linearly with the size of the list. Improving the performance time is another
matter that requires knowing about the internal representation of the various
types being used.</font></tt>
<br>
<br><tt><font size=2>-Jeff</font></tt>
<br>
<br>
<br>
<span style="font-family:'Arial',sans-serif; font-size:8pt; color:#000000">---<br>
<br>
This e-mail may contain confidential and/or privileged information. If you <br>
are not the intended recipient (or have received this e-mail in error) <br>
please notify the sender immediately and destroy this e-mail. Any <br>
unauthorized copying, disclosure or distribution of the material in this <br>
e-mail is strictly forbidden.</span>