Haskell programming tips
|Line 548:||Line 548:|
It should be noted that <hask>length</hask>, <hask>take</hask> and others wouldn't cause headache if they would count using [[Peano numbers]] as shown below.
It should be noted that <hask>length</hask>, <hask>take</hask>
can be replaced by <hask>genericLength</hask>, <hask>genericTake</hask> et.al.,
which allow the usage of [[Peano numbers]].
===Don't ask for the minimum when you don't need it===
===Don't ask for the minimum when you don't need it===
Revision as of 12:10, 3 July 2007
This page shows several examples of how code can be improved. We try to derive general rules from them, though they can not be applied deterministicly and are a matter of taste. We all know that, please don't add "this is disputable" to each item!
Instead, you can now add "this is disputable" on /Discussion and change this page only when some sort of consensus is reached.
2 Be concise
2.1 Don't reinvent the wheel
The standard libraries are full of useful functions, possibly too full. If you rewrite an existing function, the reader wonders what the difference to the standard function is. But if you use a standard function, the reader may learn something new and useful. If you have problems finding an appropriate list function, try this guide:
2.2 Avoid explicit recursion
Explicit recursion is not generally bad, but you should spend some time on trying to find a more declarative implementation using higher order functions.
raise :: Num a => a -> [a] -> [a] raise _  =  raise x (y:ys) = x+y : raise x ys
because it is hard for the reader to find out how much of the list is processed and on which values the elements of the output list depend. Just write
raise x ys = map (x+) ys
raise x = map (x+)
and the reader knows that the complete list is processed and that each output element depends only on the corresponding input element.
If you don't find appropriate functions in the standard library, extract a general function. This helps you and others understanding the program. Haskell is very good at factoring out parts of the code. If you find it very general, put it in a separate module and re-use it. It may appear in the standard libraries later, or you may later find that it is already there.Decomposing a problem this way has also the advantage that you can debug easier. If the last implementation of
Could this be stated more generally? It seems to me this is a special case of the general principle of separating concerns: iteration over a collection vs operating on elements of a collection should apply. If you can write the loop over a data structure (list, tree, whatever) once and debug it, then you don't need to duplicate that code over and over (at least in haskell), so your code can follow the principle of Wiki:OnceAndOnlyOnce ; Wiki:OnceAndOnlyOnce is a lot harder in languages that don't provide a certain level of functional programming support (i.e. Java requires copy and paste programming, the delegate C# syntax is clumsy but workable - using it is almost Wiki:GoldPlating).
which fulfill a certain property,i.e. the elements for which the predicate
I found the following code (but convoluted in a more specific function) in a Haskell program
count :: (a -> Bool) -> [a] -> Int count _  = 0 count p (x:xs) | p x = 1 + count p xs | otherwise = count p xs
which you won't like any longer if you become aware of
count p = length . filter p
2.3 Only introduce identifiers you need
Here is some advice that is useful for every language, including scientific prose
Introduce only identifiers you use.
The compiler will check that you if you pass an option like
-Wall for GHC.
In an expression like
[a | i <- [1..m]]
replicate m a
is certainly better here.
2.4 Remember the zeroDon't forget that zero is a natural number. Recursive definitions become more complicated if the recursion anchor is not chosen properly. As an example I have chosen the function
tuples :: Int -> [a] -> [[a]] tuples r l | r == 1 = [[el] | el <- l] | length l == r = [l] | otherwise = (map ([head l] ++) (tuples (r-1) (tail l))) ++ tuples r (tail l)
Do you have an idea what it does?
Let's strip the guards and forget about list comprehension.
tuples :: Int -> [a] -> [[a]] tuples 1 l = map (:) l tuples r l = if r == length l then [l] else let t = tail l in map (head l :) (tuples (r-1) t) ++ tuples r t
What about tuples with zero elements? We can add the pattern
tuples 0 _ = []
but then we can also omit the pattern for 1-tuples.
tuples :: Int -> [a] -> [[a]] tuples 0 _ = [] tuples r l = if r == length l then [l] else let t = tail l in map (head l :) (tuples (r-1) t) ++ tuples r t
tuples :: Int -> [a] -> [[a]] tuples 0 _ = [] tuples r l = if r > length l then  else let t = tail l in map (head l :) (tuples (r-1) t) ++ tuples r t
tuples :: Int -> [a] -> [[a]] tuples 0 _ = [] tuples _  =  tuples r (x:xs) = map (x :) (tuples (r-1) xs) ++ tuples r xs
You can even save one direction of recursionby explicit computation of the list of all suffixes provided by
You can do this with do notation
tuples :: Int -> [a] -> [[a]] tuples 0 _ = [] tuples r xs = do y:ys <- tails xs map (y:) (tuples (r-1) ys)
tuples :: Int -> [a] -> [[a]] tuples 0 _ = [] tuples r xs = concatMap (\(y:ys) -> map (y:) (tuples (r-1) ys)) (init (tails xs))
but this ends with a "Prelude.tail: empty list".
More generally, Base cases and identities
2.5 Don't overuse lambdas
Like explicit recursion, using explicit lambdas isn't a universally bad idea, but a better solution often exists. For example, Haskell is quite good at currying. Don't write
zipWith (\x y -> f x y) map (\x -> x + 42)
zipWith f map (+42)
also, instead of writing
-- sort a list of strings case insensitively sortBy (\x y -> compare (map toLower x) (map toLower y))
comparing p x y = compare (p x) (p y) sortBy (comparing (map toLower))
which is both clearer and re-usable.Actually, starting with GHC-6.6 you do not need to define
(Just a remark for this special example: We can avoid multiple evaluations of the conversions.
sortKey :: (Ord b) => (a -> b) -> [a] -> [a] sortKey f x = map snd (sortBy (comparing fst) (zip (map f x) x))
As a rule of thumb, once your expression becomes too long to easily be point-freed, it probably deserves a name anyway. Lambdas are occasionally appropriate however, e.g. for control structures in monadic code (in this example, a control-structure "foreach2" which most languages don't even support.):
foreach2 xs ys f = zipWithM_ f xs ys linify :: [String] -> IO () linify lines = foreach2 [1..] lines $ \lineNr line -> do unless (null line) $ putStrLn $ shows lineNr $ showString ": " $ show line
2.6 Bool is a regular type
Logic expressions are not restricted to guards and
Avoid verbosity like in
isEven n | mod n 2 == 0 = True | otherwise = False
since it is the same as
isEven n = mod n 2 == 0
3 Use syntactic sugar wisely
People who employ syntactic sugar extensively argue that their code becomes more readable by it. The following sections show several examples where less syntactic sugar is more readable.
It is argued that a special notation is often more intuitive than a purely functional expression. But the term "intuitive notation" is always a matter of habit. You can also develop an intuition for analytic expressions that don't match your habits at the first glance. So why not making a habit of less sugar sometimes?
3.1 List comprehension
List comprehension let you remain in imperative thinking, that is it let you think in variables rather than transformations. Open your mind, discover the flavour of the pointfree style!
[toUpper c | c <- s]
map toUpper s
[toUpper c | s <- strings, c <- s]
where it takes some time for the reader to find out which value depends on what other value and it is not so clear how many timesthe interim values
In contrast to that
map toUpper (concat strings)
can't be clearer.
When using higher order functions you can switch easier todata structures different from
map (1+) list
mapSet (1+) set
.If there would be a standard instance for the
you could use the code
fmap (1+) pool
for both choices.
If you are not used to higher order functions for list processing you feel like needing parallel list comprehension. This is unfortunately supported by GHC now,but somehow superfluous since various flavours of
3.2 do notation
do text <- readFile "foo" writeFile "bar" text
one can write
readFile "foo" >>= writeFile "bar"
do text <- readFile "foo" return text
can be simplified to
by a law that each Monad must fulfill.
You certainly also agree that
do text <- readFile "foobar" return (lines text)
is more complicated than
liftM lines (readFile "foobar")
.By the way, in the case of
Guards look like
-- Bad implementation: fac :: Integer -> Integer fac n | n == 0 = 1 | n /= 0 = n * fac (n-1)
which implements a factorial function. This example, like a lot of uses of guards, has a number of problems.The first problem is that it's nearly impossible for the compiler to check if guards like this are exhaustive, as the guard conditions may be arbitrarily complex (Ghc will warn you if you use the
-Walloption). To avoid this problem and potential bugs through non exhaustive patterns you should use an
-- Slightly improved implementation: fac :: Integer -> Integer fac n | n == 0 = 1 | otherwise = n * fac (n-1)
-- Less sugar (though the verbosity of if-then-else can also be considered as sugar :-) fac :: Integer -> Integer fac n = if n == 0 then 1 else n * fac (n-1)
But in this special case, the same can be done even more easily with pattern matching:
-- Good implementation: fac :: Integer -> Integer fac 0 = 1 fac n = n * fac (n-1)
Actually, in this case there is an even more easier to read version, which (see above) doesn't use Explicit Recursion:
-- Excellent implementation: fac :: Integer -> Integer fac n = product [1..n]
Note however, that there is a difference between this version and the previous ones: When given a negative number, the previous versions do not terminate (until StackOverflow-time), while the last implemenation returns 1.
Guards don't always make code clearer. Compare
foo xs | not (null xs) = bar (head xs)
foo (x:_) = bar x
or compare the following example using the advanced pattern guards (http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#PATTERN-GUARDS)
parseCmd ln | Left err <- parse cmd "Commands" ln = BadCmd $ unwords $ lines $ show err | Right x <- parse cmd "Commands" ln = x
with this one with no pattern guards:
parseCmd ln = case parse cmd "Commands" ln of Left err -> BadCmd $ unwords $ lines $ show err Right x -> x
parseCmd :: -- add an explicit type signature, as this is now a pattern binding parseCmd = either (BadCmd . unwords . lines . show) id . parse cmd "Commands"
By the way, a compiler has also problems with numerical patterns. E.g. the pattern
data Foo = Foo deriving (Eq, Show) instance Num Foo where fromInteger = error "forget it" f :: Foo -> Bool f 42 = True f _ = False
*Main> f 42 *** Exception: forget it
Only use guards if you need to, in general you should stick to pattern matching whenever possible.
3.4 n+k patterns
In order to allow pattern matching against numerical types, Haskell 98 provides so-called n+k patterns, as in
take :: Int -> [a] -> [a] take (n+1) (x:xs) = x: take n xs take _ _ = 
However, they are often critizised for hiding computational complexity and producing ambiguties, see /Discussion for details. They are subsumed by the more general Views proposal, which was unfortunately never implemented despite being around for quite some time now.
4 Efficiency and infinity
A rule of thumb is: If a function makes sense for an infinite data structure but the implementation at hand fails for an infinite amount of data, then the implementation is probably inefficient also for finite data.
4.1 Don't ask for the length of a list when you don't need it
length x == 0
x == 
The best to do is
atLeast :: Int -> [a] -> Bool atLeast 0 _ = True atLeast _  = False atLeast n (_:ys) = atLeast (n-1) ys
atLeast :: Int -> [a] -> Bool atLeast n x = n == length (take n x)
or non-recursive but fairly efficient
atLeast :: Int -> [a] -> Bool atLeast n = if n>0 then not . null . drop (n-1) else const True
atLeast :: Int -> [a] -> Bool atLeast 0 = const True atLeast n = not . null . drop (n-1)
The same problem arises if you want to shorten a list to the length of another one by
take (length x) y
But this can be useful to extract a finite prefix from an infinite list. So, instead
zipWith const y x
works well.It should be noted that
which allow the usage of Peano numbers.
4.2 Don't ask for the minimum when you don't need itThe function
isLowerLimit :: Ord a => a -> [a] -> Bool isLowerLimit x ys = x <= minimum ys
Compare it with
isLowerLimit x = all (x<=)
4.3 Use sharing
If you want a list of lists with increasing length and constant content, don't write
map (flip replicate x) [0..]
because this needs quadratic space and run-time. If you code
iterate (x:) 
then the lists will share their suffixes and thus need only linear space and run-time for creation.
4.4 Choose the right fold
See Stack overflow for advice on which fold is appropriate for your situation.
5 Choose types properly
5.1 Lists are not good for everything
5.1.1 Lists are not arrays
Lists are not arrays, so don't treat them as such.Frequent use of
This is very inefficient.
If you access the elements progressively like in
[x !! i - i | i <- [0..n]]
you should try to get rid of indexing like in
zipWith (-) x [0..n]
If you really need random access like in the Fourier Transformyou should switch to
5.1.2 Lists are not sets
If you manage data sets where each object can occur only once and the order is irrelevant, if you use list functions like
frequently, you should think about switching to sets. If you need multi-sets, i.e. data sets with irrelevant order but multiple occurence of an objectyou can use a
5.1.3 Lists are not finite maps
Similarly, lists are not finite maps, as mentioned on efficiency hints.
5.2 Reduce type class constraints
5.2.1 Eq type classWhen using functions like
Example:The following function takes the input list
Clear what it does? No? The code is probably more understandable
removeEach :: (Eq a) => [a] -> [[a]] removeEach xs = map (flip List.delete xs) xs
but it should be replaced by
removeEach :: [a] -> [[a]] removeEach xs = zipWith (++) (List.inits xs) (tail (List.tails xs))
5.3 Don't use Int when you don't consider integers
Before using integers for each and everything (C style) think of more specialised types.If only the values
If there are more but predefined choices and numeric operations aren't needed try an enumeration.
type Weekday = Int
data Weekday = Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday deriving (Eq, Ord, Enum)
You cannot accidentally mix up weekdays with numbers and the signature of a function with weekday parameter clearly states what kind of data is expected.
If an enumeration is not appropriateyou can define a
E.g. if you want to associate objects with a unique identifier,you may want to choose the type
newtype Identifier = Identifier Int deriving Eq
6.1 Separate IO and data processing
It's not good to use the IO Monad everywhere, much of the data processing can be done without IO interaction. You should separate data processing and IO because pure data processing can be done purely functionally, that is you don't have to specify an order of execution and you don't have to worry about what computations are actually necessary. You can easily benefit from lazy evaluation if you process data purely functionally and output it by a short IO interaction.
-- import Control.Monad (replicateM_) replicateM_ 10 (putStr "foo")
is certainly worse than
putStr (concat $ replicate 10 "foo")
do h <- openFile "foo" WriteMode replicateM_ 10 (hPutStr h "bar") hClose h
can be shortened to
writeFile "foo" (concat $ replicate 10 "bar")
in case of failure.
A function which computes a random value with respect to a custom distribution(
can be defined via IO
randomDist :: (Random a, Num a) => (a -> a) -> IO a randomDist distInv = liftM distInv (randomRIO (0,1))
but there is no need to do so. You don't need the state of the whole world just for remembering the state of a random number generator. What about
randomDist :: (RandomGen g, Random a, Num a) => (a -> a) -> State g a randomDist distInv = liftM distInv (State (randomR (0,1)))
6.2 Forget about quot and rem
They complicate handling of negative dividends.
a == b * div a b + mod a b mod a b < b mod a b >= 0
This seems to be more an issue of experience rather than one of a superior reason. You might argue, that the sign of the dividend is more important for you, than that of the divisor. However, I have never seen such an application,but many uses of
- Conversion from a continuously counted tone pitch to the pitch class, like C, D, E etc.: mod p 12
- Pad a list to a multiple ofxsnumber of elements:mxs ++ replicate (mod (- length xs) m) pad
- Conversion from a day counter to a week day: mod n 7
- Pacman runs out of the screen and re-appears at the opposite border: mod x screenWidth
6.3 Partial functions like fromJust and head
Avoid functions that fail for certain input values like
They raise errors that can only be detected at runtime. Think about how they can be avoided by different program organization or by choosing more specific types. Instead of
if i == Nothing then deflt else fromJust i
fromMaybe deflt i
See also #Reduce type class constraints.If it is not possible to avoid
fromMaybe (error "Function bla: The list does always contains the searched value") (lookup key dict)