Haskell programming tips/Discussion
From HaskellWiki
1 About
This page is meant for discussions about ThingsToAvoid, as consensus seems to be difficult to reach, and it'd be nice if newbies wouldn't bump into all kinds of Holy Wars during their first day of using Haskell ;)
You may want to add your name to your comments to make it easier to refer to your words and ridicule you in public.
2 Other Suggestions
This article is about elegance, so could we please inject some elegance into the article itself? Why do many of the functions have no type declaration? It took me quite some time to figure out the type declaration on foreach2: Monad m => [a] -> [b] -> (a -> b -> m c) -> m ()
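For reference, a function with exactly that type can be sketched as below. This is an assumption about what foreach2 is meant to do (run an action on corresponding elements of two lists and discard the results), not a copy of the article's definition; it is built on zipWithM_ from Control.Monad.

```haskell
import Control.Monad (zipWithM_)

-- Hypothetical reconstruction: apply the action to corresponding elements
-- of both lists, left to right, and discard the results.
foreach2 :: Monad m => [a] -> [b] -> (a -> b -> m c) -> m ()
foreach2 xs ys f = zipWithM_ f xs ys

main :: IO ()
main = foreach2 [1 :: Int, 2, 3] "abc" $ \n c ->
  putStrLn (show n ++ [c])
```

With the signature written down, it is immediately visible that the results of the action are thrown away and that the two lists may have different element types.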
These functions should actually be tested. The way linify is currently defined, it produces 4 GHC warnings. I do not even know how to get rid of 2 of them.
Readability could be considerably improved. At the moment, many sections start out fine, but then they suffer from a long list of additions which have not been properly integrated, so it reads something like this: Do this. Oh, but there is this too, and there is this caveat, but you could also do this, and performance is sometimes better if you do this, ...
May I also suggest that periods be left off the end of a sentence whose last word is a code section? Currently, the article is formatted such that I have this: Words words words, see my code here:
\n
[ Code section
Code section
Code section]
\n
.
\n
That is ridiculous. Just remove the period and the entailing massive whitespace.
3 Flame Away
3.1 Avoid recursion
Many times explicit recursion is the fastest way to implement a loop. e.g.
    loop 0 _ acc = acc
    loop i v acc = ...
Using HOFs is more elegant, but it makes space usage harder to reason about. Also, explicit recursion does not make the code hard to read; it is just explicit about what it is doing.
-- EinarKarttunen
I disagree with this. Sometimes explicit recursion is simpler to design, but I don't see how it makes space usage any easier to reason about and can see how it makes it harder. By using combinators you only have to know the properties of the combinator to know how it behaves, whereas I have to reanalyze each explicitly implemented function. StackOverflow gives a good example of this for stack usage and folds. As far as being "faster" I have no idea what the basis for that is; most likely GHC would inline into the recursive version anyways, and using higher-order list combinators makes deforesting easier. At any rate, if using combinators makes it easier to correctly implement the function, then that should be the overriding concern.
-- DerekElkins
I read lots of code with recursion -- and it was hard to read, because it is hard to retrieve the data flow from it. -- HenningThielemann
IMO explicit recursion usually does make code harder to read, as it's trying to do two things at once: recursing and performing the actual work it's supposed to do. Phrases like OnceAndOnlyOnce and SeparationOfConcerns come to mind. However, the concern about efficiency is (partly) justified. HOFs defined for certain recursion patterns often need additional care to achieve the same performance as functions using explicit recursion. As an example, in the following code, two sum functions are defined using two equivalent left folds, but only one of the folds is exported. Due to various peculiarities of GHC's strictness analyzer, simplifier etc., the call from main to mysum_2 works, yet the call to mysum_1 fails with a stack overflow.
    module Foo (myfoldl_1, mysum_1, mysum_2) where

    -- exported
    myfoldl_1 f z xs = fold z xs
      where
        fold z [] = z
        fold z (x:xs) = fold (f z x) xs

    -- not exported
    myfoldl_2 f z xs = fold z xs
      where
        fold z [] = z
        fold z (x:xs) = fold (f z x) xs

    mysum_1 = myfoldl_1 (+) 0
    mysum_2 = myfoldl_2 (+) 0

    module Main where

    import Foo

    xs = [1..1000*1000]

    main = do
      print (mysum_2 xs)
      print (mysum_1 xs)
(Results might differ for your particular GHC version, of course...) -- RemiTurk
GHC made "broken" code work. As covered in StackOverflow, foldl is simply not tail-recursive in a non-strict language. Writing out mysum would still be broken. The problem here isn't the use of a HOF, but simply the use of a non-tail-recursive function. The only "care" needed here is not relying on compiler optimizations (the code doesn't work in my version of GHC) or the care needed when relying on compiler optimizations. Heck, the potential failure of inlining (and subsequent optimizations following from it) could be handled by restating recursion combinator definitions in each module that uses them; this would still be better than explicit recursion, which essentially restates the definition for each expression that uses it.
-- DerekElkins
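The point about foldl can be made concrete with a small sketch. The names sumLazy and sumStrict are illustrative and do not come from the discussion. foldl builds the nested expression (((0+1)+2)+...) before any addition is forced, while foldl' from Data.List forces the accumulator at every step. Whether the lazy version actually exhausts the stack depends on the GHC version, the optimization level and the runtime settings.

```haskell
import Data.List (foldl')

-- Lazy left fold: the accumulator is a growing chain of unevaluated (+) thunks
-- unless the compiler's strictness analysis happens to remove it.
sumLazy :: [Integer] -> Integer
sumLazy = foldl (+) 0

-- Strict left fold: the accumulator is evaluated before each recursive step,
-- so the loop runs in constant space regardless of optimization.
sumStrict :: [Integer] -> Integer
sumStrict = foldl' (+) 0

main :: IO ()
main = do
  print (sumStrict [1 .. 1000000])
  print (sumLazy [1 .. 1000])  -- kept small on purpose
```

Both are uses of a combinator rather than explicit recursion; the difference lies only in which fold is chosen.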
Here is a demonstration of the problem - with the classic sum as the problem. Of course microbenchmarking makes little sense, but it tells us a little bit about which combinator should be used.
    import Data.List
    import System.Environment (getArgs)  -- originally "import System" (Haskell 98 module)

    sum' :: Int -> Int -> Int
    sum' 0 n = sum [1..n]
    sum' 1 n = foldl (\a e -> a+e) 0 [1..n]
    sum' 2 n = foldl (\a e -> let v = a+e in v `seq` v) 0 [1..n]
    sum' 3 n = foldr (\a e -> a+e) 0 [1..n]
    sum' 4 n = foldr (\a e -> let v = a+e in v `seq` v) 0 [1..n]
    sum' 5 n = foldl' (\a e -> a+e) 0 [1..n]
    sum' 6 n = foldl' (\a e -> let v = a+e in v `seq` v) 0 [1..n]
    sum' 7 n = loop n 0
      where
        loop 0 acc = acc
        loop n acc = loop (n-1) (n+acc)
    sum' 8 n = loop n 0
      where
        loop 0 acc = acc
        loop n acc = loop (n-1) $! n+acc

    -- usage: prog <variant 0..8> <n>
    main = do
      [v,n] <- getArgs
      print $ sum' (read v) (read n)
When executing with n = 1000000 it produces the following results:
* seq does not affect performance - as expected.
* foldr overflows stack - as expected.
* explicit loop takes 0.006s
* foldl takes 0.040s
* foldl' takes 0.080s
In this case the "correct" choice would be foldl' - ten times slower than explicit recursion. This is not to say that using a fold would not be better for most code. Just that it can have subtle evil effects in inner loops.
-- EinarKarttunen
This is ridiculous. The "explicit recursion" version is not the explicit recursion version of the foldl' version. Here is another set of programs and the results I get:
    import Data.List
    import System.Environment (getArgs)  -- originally "import System" (Haskell 98 module)

    -- strict fold over the naturals n, n-1, ..., 1
    paraNat :: (Int -> a -> a) -> a -> Int -> a
    paraNat s = fold
      where
        fold z 0 = z
        fold z n = (fold $! s n z) (n-1)

    -- foldl' restated locally, so no module boundary is involved
    localFoldl' c = fold
      where
        fold n [] = n
        fold n (x:xs) = (fold $! c n x) xs

    sumFoldl' :: Int -> Int
    sumFoldl' n = foldl' (+) 0 [1..n]

    sumLocalFoldl' :: Int -> Int
    sumLocalFoldl' n = localFoldl' (+) 0 [1..n]

    sumParaNat :: Int -> Int
    sumParaNat n = paraNat (+) 0 n

    sumRecursionNat :: Int -> Int
    sumRecursionNat n = loop n 0
      where
        loop 0 acc = acc
        loop n acc = loop (n-1) $! n+acc

    sumRecursionList :: Int -> Int
    sumRecursionList n = loop [1..n] 0
      where
        loop [] acc = acc
        loop (n:ns) acc = loop ns $! n+acc

    main = do
      [v,n] <- getArgs
      case v of
        "1" -> print (sumFoldl' (read n))
        "2" -> print (sumLocalFoldl' (read n))
        "3" -> print (sumParaNat (read n))
        "4" -> print (sumRecursionNat (read n))
        "5" -> print (sumRecursionList (read n))
(best but typical real times according to time of a few trials each)
* sumFoldl' takes 2.872s
* sumLocalFoldl' takes 1.683s
* sumParaNat takes 0.212s
* sumRecursionNat takes 0.213s
* sumRecursionList takes 1.669s
sumLocalFoldl' and sumRecursionList were practically identical in performance and sumParaNat and sumRecursionNat were practically identical in performance. All that's demonstrated is the cost of detouring through lists (and the cost of module boundaries I guess).
-- DerekElkins
3.2 n+k patterns
n+k patterns are similar to the definition of infix functions, thus they make it harder to understand patterns. http://www.dcs.gla.ac.uk/mail-www/haskell/msg01131.html (Why I hate n+k)
So far I have seen only one rule for Good Coding Practice in Haskell: Do Not Use n+k Patterns. I hope someone can give some directions, how to avoid known pitfalls (especially Space Leaks). -- On the haskell mailing list
The most natural definition of many functions on the natural numbers is by induction, a fact that can very nicely be expressed with the (n+1)-pattern notation. Also, (n+k)-patterns are unlikely to produce space leaks, since if anything, they make the function stricter. The possible ambiguities don't seem to appear in real code. --ThomasJäger
If natural numbers were defined by PeanoNumbers then pattern matching on successors would be straightforward. This would be fairly slow and space consuming, that's why natural numbers are not implemented this way. They are implemented using binary numbers and no attempt is even made to simulate the behaviour of ...
Laziness/strictness isn't really an argument in this situation, since when using a strict natural type, e.g.
data Nat = Zero | Succ !Nat
pattern matching on Nat behaves exactly like n+1 patterns. -- ThomasJaeger
n+k patterns also apply to negative numbers - don't they? Yes, I see the analogy but in the current implementation it's nothing but sugar. -- HenningThielemann
No, they don't. `let f (n+2) = n in f 1` is a runtime error. -- DerekElkins
But translating it into pattern matching is impossible, thus it must be a static error. -- HenningThielemann
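For readers who have not met the construct: a pattern n+k matches an argument x only when x >= k, and then binds n to x - k. The sketch below spells that behaviour out with an ordinary guard; the function names are illustrative. (Haskell 2010 removed n+k patterns from the language, and GHC accepts them only with the NPlusKPatterns extension.)

```haskell
-- What  f (n+2) = n  means, written with a guard:
-- the equation applies only when the argument is at least 2.
f :: Int -> Int
f x | x >= 2 = x - 2
-- There is no equation for x < 2, so  f 1  fails at run time
-- with a pattern-match error, as noted in the discussion.

-- A total variant (hypothetical name) that makes the failure case explicit.
fSafe :: Int -> Maybe Int
fSafe x
  | x >= 2    = Just (x - 2)
  | otherwise = Nothing

main :: IO ()
main = do
  print (f 5)
  print (fSafe 1)
```

Seen this way, the pattern is a comparison plus a subtraction rather than a match on a constructor, which is the core of the disagreement above.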
3.3 Use syntactic sugar wisely
I have to say, I strongly disagree with most of what is said in this section. First of all the claim
Syntactic extensions make source code processors complicated and error prone. But they don't help to make programs safer (like type checks and contracts) or easier to maintain (like modularization and scoping).
is obviously wrong. There certainly are applications of syntactic sugar that make programs easier to read, therefore easier to understand, easier to maintain, and safer, as you are more likely to spot bugs.
- My statement was: Don't use syntactic sugar by default because you believe it makes your program more readable automatically (I've read lots of code of programmers who seem to believe that), but use syntactic sugar if (and only if) it makes the program more readable. Syntactic sugar is only a matter of readability not of safety in the sense of scoping and type checking. If I accidentally introduce inconsistencies into my code, the name resolver or the type checker will report problems, but not the de-sugarizer. -- HenningThielemann
ad. right sections are evil
I can't believe someone is seriously advocating to replace ($ x) by flip ($) x
- Nobody advocated for replacing ($ x) by flip ($) x; this was just to demonstrate the problems arising with syntactic sugar. I believe that many people don't take that into account when requesting more sugar (such as parallel list comprehension). -- HenningThielemann
Infix notation is problematic for both human readers and source code formatters.
No, infix notation isn't problematic for human readers, it enables them to read the code faster in many cases.
- ... if he knows the precedences, which is only true for (some) of the predefined operators. I guess you don't know all of the precedences of the Prelude operators. I also use infix operations like (+) and (:) but I'm very concerned with introducing lots of new operator symbols and `f` notation. -- HenningThielemann
- Introducing new operators should definitely not be done carelessly (then again, one shouldn't be careless in programming anyway), and operator precedences might be better defined as a partial order. (e.g. there is an order between (+) and (*), and between (&&) and (||), but not between (+) and (&&)). Other proposals for replacing the current left/right associative + precedence system do exist. However, doing away with infix operators entirely appears to me to practically render combinator libraries unusable, which would make Haskell a lot less attractive. -- RemiTurk
- The nice thing about precedences in Haskell is that it's often not necessary to know them exactly in order to use them. If the types of your operators are sufficiently general, or sufficiently distinct, only the right way to parse them will lead to type-checking code, so you can just give it a shot without parentheses and hopefully remember the precedence the next time you're in a similar situation. -- ThomasJaeger
- If you do it for readability, I agree; if you do it for fanciness, I disagree. In the case of App it looks like the list can become longer, so it's worth thinking about using foldl, App and a list - though then I would certainly also use some syntactic sugar to enter the list ["f", "x", "y"]. Btw. even the regular list notation is disputable since in the infix notation ("f":"x":"y":[]) it is easier to move elements around. So I end up with infix notation, ok. Even more, foldl shows clearly the left associativity, whereas `App` does not.
- In the case of `on` and compose2, I disagree. on is less informative than compose2. And yes, compose2 is a kind of generalization of (.), though it is not the most popular one. -- HenningThielemann
- You do indeed have a point there: it's indeed an extension of (.), which I incorrectly denied. However, as it's not the extension, and AFAIK not even the most used extension, I consider the name compose2 to be slightly misleading. In addition, I think "group by equality on isAlpha" (groupBy ((==) `on` isAlpha)) is just too cute to resist. -- RemiTurk
- So, do you do it for readability or for fanciness? I find an infix function application hard to read, since it looks like just another argument. That is, ((==) `on` isAlpha) reads very similar to ((==) on isAlpha). In your example you really switch the preferred usage of the identifiers: you use the infix symbol == in prefix notation and the prefix function name on in infix notation. -- HenningThielemann
- Any kind of syntax highlighting should make the difference between ((==) `on` isAlpha) and ((==) on isAlpha) obvious. Another argument for using an infix on here is that it explains the order of the elements: deciding if on (==) isAlpha or on isAlpha (==) is better and trying to remember which way the implementor chose is certainly more difficult than realizing that isAlpha `on` (==) makes no sense (there are better examples for this such as `elem`, or think about the confusion between the order of arguments in Data.FiniteMap vs. Data.Map). -- ThomasJaeger
- Btw. the function comparing has found its way to the module Data.Ord (http://www.haskell.org/ghc/dist/current/docs/libraries/base/Data-Ord.html). It is a composition of compare and on/compose2. However it does not satisfyingly help to implement extensions of *By functions, because the key for sorting/grouping/maximising must be re-computed for each comparison if you write, say, sortBy (comparing length). -- HenningThielemann
- Of course, foldl App only works if App models application in an untyped language. Using GADTs, App could be of type Expr (a -> b) -> Expr a -> Expr b; also, many combinators that work on different types can't be "sugared" using functions. -- ThomasJaeger
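The on and comparing points in this thread can be sketched with today's libraries, where on lives in Data.Function and comparing in Data.Ord. The names chunks, sortByLength and sortByCachedLength are made up for this example. The last one shows one way around the recomputation issue raised above, by pairing each element with its key once; Data.List also provides sortOn, which follows the same idea.

```haskell
import Data.Char (isAlpha)
import Data.Function (on)
import Data.List (groupBy, sortBy)
import Data.Ord (comparing)

-- "group by equality on isAlpha"
chunks :: String -> [String]
chunks = groupBy ((==) `on` isAlpha)

-- length is evaluated again for every single comparison
sortByLength :: [[a]] -> [[a]]
sortByLength = sortBy (comparing length)

-- compute each key once, sort on the cached key, then drop it
sortByCachedLength :: [[a]] -> [[a]]
sortByCachedLength xs =
  map snd (sortBy (comparing fst) [(length x, x) | x <- xs])

main :: IO ()
main = do
  print (chunks "ab12cd")
  print (sortByLength ["ccc", "a", "bb"])
  print (sortByCachedLength ["ccc", "a", "bb"])
```

Whether the infix spelling of on reads better than a prefix compose2 remains a matter of taste, as the thread shows; the behaviour is the same either way.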
Finally, there is no reason why one should expect a tool that processes Haskell code not to be aware of Haskell 98's syntax. All mentioned syntactic extensions (list comprehension, guards, sections, infix stuff) fall under this category and can be used without any bad conscience.
Sorry for having become so "religious" -- ThomasJaeger
I agree. -- CaleGibbard
- If you want a good example for unnecessary sugar, take this one:
    tuples :: Int -> [a] -> [[a]]
    tuples 0 _ = return []
    tuples (r+1) xs = do
      y:ys <- tails xs
      (y:) `fmap` tuples r ys
- Why is infix `fmap` more readable than prefix fmap? Where is the analogy to map? Why don't you use map at all? I see map as an operator which lifts scalar functions to list functions; this is perfectly expressed by prefix notation. What is the readability benefit of the (r+1) pattern, and why is do more readable than explicit init (tails xs) >>= (\(y:ys) -> map (y:) (tuples (r-1) ys)) here? (Mostly because this is not the correct translation and the correct translation is unreadable -- DerekElkins) It's even worse than [(y:) `fmap` tuples r ys | (y:ys) <- tails xs]. You rewrote my code just with sugar but the structure which must be understood remained the same. Sorry, I don't find it easier to understand. Maybe people who believe in a common notation rather than trying to understand the meaning are happy with the sugar. -- HenningThielemann
- The pattern m >>= \x -> f x is exactly the reason the do-notation was introduced, so each time I write something like this, I replace it with do-notation for the following reasons: it is definitely the more common style (nobody is using m >>= \x -> \n-style these days), so it is much more likely to be understood faster (at least by myself); the do-notation expresses nicely that monadic (in this case nondeterministic) effects are taking place; and finally it is much easier to make changes to the code if it's in do-form (e.g. add additional guards). Of course you CAN do the same changes in >>=-style, too; after all there is a straightforward translation (although complicated by the fact that you have to check if pattern matchings are exhaustive), but I'm not the kind of guy who does all kinds of verbose translation in his head just because he wants to stay away from syntactic sugar.
- I disagree with arguments like "nobody is using ...". What does it tell about the quality of a technique? I write here to give reasons against too much syntactic sugar rather than to record today's habits of some programmers. -- HenningThielemann
- You are further criticizing that I am using fmap instead of the more special map. I find it natural to use fmap in monadic code to abstract from lists. If it weren't for tails, the code would even have type (MonadPlus m, Functor m) => Int -> [a] -> m [a], increasing usability. liftM would also be acceptable (and for some strange reason even slightly faster), but it feels awfully wrong to me to use liftM, so that I'm willing to live with additional Functor constraints. This is also the reason why your list comprehension solution is clearly inferior to a monadic one.
- One more thing about pattern match failure vs. init. Though it doesn't matter much in this simple example, the version exploiting pattern match failure is closer to the conceptual algorithm, because it doesn't rely on the "low-level-property" of tails that the empty list is the last element of the returned list (I can never remember this kind of thing, even though one of the possible behaviors makes much more sense).
- The function tuples is defined by recursion on an Int and only uses the case of the predecessor, so this is a classical example for (n+1)-patterns. Note that the LHSs in your implementation are overlapped, so a reader might need more time to figure out what is going on in your implementation (I admit the effect is small, but this is a very tiny example).
- Using infix fmap is a personal habit of mine, but when you think about it, it makes a lot of sense. As we can't overload a space, it's the closest thing to application syntax we can get. I know you prefer App (App f x) y-style, which seems more difficult to understand for most people. This just is a matter of personal style. Please do not mistake your own personal style for the only sensible one.
- If no one else objects, I'd like to put my implementation back on the main page, possibly with a less controversial comment. --ThomasJaeger
- My argument is that the syntactic sugared version may be as readable as the unsugared version, but it does not improve the readability, thus it should be avoided. Sugar shouldn't be the default; it shouldn't be used just because it exists. That's the basic opinion where we disagree. Btw. I find the do notation in connection with the List monad very confusing because it looks imperative and it suggests that first something is chosen from the list then it is returned. -- HenningThielemann
- While it may not be more readable for you, it is for me, for the reasons I'm getting tired of stating. Also, your opinions on the do-notation seem very strange to me. If we have monads - a way to unify different notions of effects - why make the artificial distinction between IO/State effects and more general ones again? The do-notation expresses in which order the effects are happening - that's the same for a list and an IO monad. However, a distinction between commutative and non-commutative monads would make sense, but unfortunately, there's no way to prove the commutativity of a monad statically in Haskell.
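For comparison, the variants argued over in this thread can be placed side by side. The sketch below avoids the (r+1) pattern so that it compiles without extensions, and the names tuplesDo, tuplesBind and tuplesComp are illustrative. Note that the >>= version needs init because, unlike the do block and the list comprehension, a lambda does not silently skip elements on which its pattern fails.

```haskell
import Data.List (tails)

-- do-notation: the failing match on the final [] simply yields no results
tuplesDo :: Int -> [a] -> [[a]]
tuplesDo 0 _  = return []
tuplesDo r xs = do
  y:ys <- tails xs
  (y:) `fmap` tuplesDo (r-1) ys

-- explicit (>>=): init drops the trailing [] that tails produces,
-- so the lambda's pattern never sees an empty list
tuplesBind :: Int -> [a] -> [[a]]
tuplesBind 0 _  = [[]]
tuplesBind r xs =
  init (tails xs) >>= \(y:ys) -> map (y:) (tuplesBind (r-1) ys)

-- list comprehension with two generators
tuplesComp :: Int -> [a] -> [[a]]
tuplesComp 0 _  = [[]]
tuplesComp r xs = [y : t | (y:ys) <- tails xs, t <- tuplesComp (r-1) ys]

main :: IO ()
main = mapM_ (print . ($ "abc")) [tuplesDo 2, tuplesBind 2, tuplesComp 2]
```

All three compute the same selections; which one reads best is exactly what the participants disagree about.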
There are still issues that aren't implemented in GHC which belong to the Haskell 98 standard and which are of more importance, I think, such as mutually recursive modules and some special case of polymorphic mutual function recursion. So I don't vote for wasting time on syntactic sugar when real enhancements are deferred by it. If I were to write a Haskell code processor I would certainly spare myself the trouble of supporting guards and (n+k) patterns. I'm also fed up with the similar situation in HTML with its tons of extensions and the buggy code which is accepted by most browsers (which is also a sort of unofficial extension) - there is simply no fun in processing HTML code. Not to mention C++.
By the way I'd like to have a real function ...
- Agrees with that. I'm not using if all that often, and could easily add a few braces. And, it would free then and else for normal identifier use. -- RemiTurk
Personally, I like the explicit `then` and `else` and find that they help when reading code to separate where the break is between the sections. It's not that I necessarily disagree with the inclusion of such a function, it is an easy one to write in any case, but I think that some sugar in the form of a few extra words to mark important points in common structures is useful. Human languages have many such words, and their presence makes reading or listening much easier. - CaleGibbard
- Other people seem to have problems with this special syntax, too. And they propose even more special syntax to solve the problem. http://hackage.haskell.org/trac/haskell-prime/wiki/DoAndIfThenElse -- HenningThielemann
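A function of the kind discussed here is easy to write. The name if' and the argument order are assumptions made for this sketch; the discussion does not fix either.

```haskell
-- Hypothetical function form of if-then-else:
-- condition first, then the "then" value, then the "else" value.
if' :: Bool -> a -> a -> a
if' True  t _ = t
if' False _ e = e

main :: IO ()
main = do
  putStrLn (if' (3 > (2 :: Int)) "yes" "no")
  -- being an ordinary function, it can be passed to other functions
  print (zipWith3 if' [True, False] "ab" "xy")
```

Because Haskell is lazy, only the selected branch is evaluated, so such a function behaves like the built-in syntax in that respect.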
