Programming languages such as
C/C++/Java/Python are called imperative programming languages
because they consist of sequences of actions. The programmer tells
the computer how to perform a task, step-by-step.
Functional programming languages work differently. Rather than
performing actions in a sequence, they evaluate expressions.
The level of Abstraction
There are two areas that are fundamental to programming a computer
- resource management and sequencing. Resource
management (allocating registers and memory) has been the target of
vast abstraction, most new languages (imperative as well as
functional) have implemented garbage collection to remove
resource management from the problem, and lets the programmer focus
on the algorithm, not the book-keeping task of allocating memory.
Sequencing have also undergone some abstraction, although not
nearly to the same extent. Imperative languages have done so by
introducing new keywords and standard libraries. Most imperative
languages have, for example, special syntax for constructing a loop,
you no longer have to do all the tasks of managing this loop
yourself.
But imperative languages are based upon the notion of
sequencing - they can never escape it completely. The only way to
raise the level of abstraction in the sequencing area for an
imperative language, is to introduce more keywords or standard
functions, thus cluttering up the language (see Python and Java).
This close relationship between imperative languages and the task
of sequencing commands for the processor to execute means that
imperative languages can never rise above the task of
sequencing (if they do, they're not imperative anymore!), and as
such can never reach the same level of abstraction that functional
programming languages can.
In Haskell, the sequencing task is removed. You only care what
the program is to compute not how or when it is computed.
This makes Haskell a more flexible and easy to use language.
Haskell tends to be part of the solution for a problem, not a
part of the problem itself.
Using Functional Languages
You've probably already used a functional language without even
knowing it! The expression language in Microsoft Excel is
functional. You don't write any sequences, you simply tell the
program that the contents of a cell is the result of combining other
cells in some manner ("= A1 + A2" for instance).
So all this time you've been programming in a functional language,
and you probably didn't even consider it programming, this is
because functional programming is more intuitive. Things
behave like you'd expect them too. It's only when you've become used
to the imperative model that functional programming seems strange.
To non-programmers functional languages offer a smaller
semantic gap between the programmer and the language.
Functions and Side-Effects in Functional Languages Functions play an important role in functional
programming languages. Functions are considered to be values just
like Int or String. A function can return another function,
it can take a function as a parameter, and it can even be
constructed by composing functions.
This offers a stronger "glue" to combine the modules of your
program. A function that evaluates some expression can take part of
the computation as an argument for instance, thus making the
function more modular.
You could also have a function construct another function.
For instance, you could define a function "differentiate" that
will differentiate a given function numerically. So if you then have
a function "f" you could define f' = differentiate f, and use it
like you would normally do in a mathematic context.
These types of functions are called higher order functions,
to quote John Hughes "A powerful name, for a powerful feature".
Here is a short Haskell example of a function numOf that
counts the number of elements in a list that satisfy a certain
property.
numOf p xs = length (filter p xs)
We will discuss Haskell syntax later, but what this line says is
just "To get the result filter the list xs by the test p
and compute the length of the result".
Now p is a function that takes an element and returns True
or False determining whether the element passes or fails
the test. So numOf is a higher order function, some of the
functionality is passed to it as an argument. Notice that filter
is also a higher order function, it takes the "test function" as an
argument.
Let's play with this function and define some more specialized
functions from it.
numOfEven xs = numOf even xs
Here we define the function numOfEven which counts the
number of even elements in a list. Note that we do not need to
explicitly declare xs as a parameter. We could just as well write numOfEven
= numOf even. A very clear definition indeed. But we'll
explicitly type out the parameters for now.
Let's define a a function which counts the number of elements that
are greater or equal to 5 :
numOfGE5 xs = numOf (>=5) xs
Here the test function is just ">=5" which is passed to numOf
to give us the functionality wee need.
Hopefully you should now see that the modularity of functional
programming allows us to define a generic functions where some of
the functionality is passed as an argument, which we can later use
to define shorthands for any specialized functions
This small example is somewhat trivial, it wouldn't
be to hard too
re-write the function definition for all the functions above, but
for more complex functions this comes in handy.
You can, for instance, write only one function for
traversing an auto-balancing binary tree (so called Red Black
Tree.) and have it take some of the functionality as a
parameter (for instance the comparing function). This would allow
you to traverse the tree for any data type by simply providing
the relevant comparison function for your needs. Thus you can
expend some effort in making sure the general function is
correct, and then all the specialized functions will also be
correct. Not to mention you wouldn't have to copy and paste code
all over your project.
This concept is possible in imperative languages as well. In Java
you often have to provide a Comparator object for trees and other
standard data structures. The difference is that the Haskell way is
a lot more intuitive and elegant (creating a separate type just for
comparing two other types and then passing an object of this type is
hardly an elegant way of doing it), so it's more likely to be used
frequently (and not just in the standard libraries).
A central concept in functional languages is that the result of a
function is determined by its input, and only by its
input. There are no side-effects! This extends to variables
as well - variables in Haskell do not vary (just like you'd
expect from previous experiences with variables, from Mathematics
for instance).
This
can be very hard to grasp for programmers that are used to
imperative languages (where most programs consists almost entirely
of assignments to state changing variables). But once you wrap your
head around the functional paradigm you'll clearly see the
benefits.
So when you start thinking things like "This is too weird, I'm
going back to C++", force yourself to continue learning Haskell -
you'll be glad you did.
Removing side-effects from the equation allows expressions to be
evaluated in any order (although not all functional
languages use this). A function will always return the same
result if passed the same input.
This determinism removes a whole class of bugs found in imperative
programs. In fact, I would even go as far as to say that most
bugs in large systems can be traced back to side-effects - if not
directly caused by it, then caused by a flawed design made
possible only by using side-effects.
Functional programs will most often have less bugs!
Conclusion
Because functional languages are more intuitive and offer more and
easier ways to get the job done, functional programs tend to be
shorter (between 2 to 10 times shorter). The semantics are most
often a lot closer to the problem than an imperative version, which
makes it easier to verify that a function is correct. Furthermore
Haskell doesn't allow side-effects, which leads the less bugs.
Thus Haskell programs are easier to write, more stable, and
easier to maintain.
What can Haskell offer the programmer?
Haskell is a modern general purpose language
developed to incorporate the collective wisdom of the functional
programming community into one elegant, powerful and generallanguage.
Purity
Unlike some other functional programming languages Haskell is pure.
It doesn't allow any side-effects. This is clearly the
most important feature of Haskell. We've already briefly discussed
the benefits of pure, side-effect free, programming - and there's
not much more we can say about that. You'll need to experience it
yourself.
Laziness
Another feature of Haskell is that it is lazy. This means that
nothing is evaluated until it has to be evaluated. You could,
for instance, define an infinite list of primes without ending up in
infinite recursion. Only the elements of this list that are
actually used will be computed. This allows for some very
elegant solutions to many problems. A typical pattern of solving a
problem would be to define a list of all possible solutions
and then filtering away the illegal ones. The remaining list will
then only contain legal solutions. Lazy evaluation makes this
operation very clean. If you only need one solution you can simply
extract the first element of the resulting list - lazy evaluation
will make sure that nothing is needlessly computed.
Strong Typing
Furthermore Haskell is strongly typed, this means just what
it sounds like. It's impossible to inadvertedly convert a Double to
an Int, or follow a null pointer. This also leads to less bugs. It
might be a pain in the neck in the rare cases where you need to
convert an Int to a Double explicitly before performing some
operation, but in practice this doesn't happen often enough to
become a nuisance. In fact, forcing each conversion to be explicit
often helps to highlight problem code.
In other languages where these conversions are invisible, problems
often arises when the compiler treats a double like an integer or an
integer like a pointer.
Unlike other strongly typed languages types in Haskell are automatically
inferred. This means that you very rarely have to declare the
types of your functions, except as a means of code documentation.
Haskell will look at how you use the variables and figure out
from there what type the variable should be - then it will all be
type-checked to ensure there are no type-mismatches.
So you get the benefits of strong typing (bugs are caught at
compile-time, rather than run-time) without the hassle that comes
with it in other languages.
Furthermore Haskell will always infer the most general type
on a variable. So if you write, say, a sorting function without a
type declaration, Haskell will make sure the function will work for all
values that can be sorted.
Compare how you would do this in Java. To gain polymorphism you
would have to use some base class, and then declare your variables
as instances of subclasses to this base class. It all amounts to
tons of extra work and ridiculously complex declarations just to
proclaim the existence of a variable. Furthermore you would have to
perform tons of type conversions via explicit casts - defiantly not
a particularly elegant solution.
If you want to write a polymorphic function in Java you would
probably declare the parameters as an object of type Object, which
essentially allows the programmer to send anything into the
function, even objects which can't logically be passed to the
function. The end result is that most functions you write in Java
are not general, the only work on a single data type.
You're also moving the error checking from compile-time to
run-time. In large systems where some of the functionality is rarely
used, these bugs might never be caught until they cause a fatal
crash at the worst possible time.
Haskell provides an elegant, concise and safe way to write
your programs. Programs will not crash unexpectedly, and there will
be no core dumps!
Elegance
Another property of Haskell that is very important to the
programmer, even though it doesn't mean as much in terms of
stability or performance, is the elegance of Haskell. To put it
simply: stuff just works like you'd expect it to.
To highlight the elegance of Haskell we shall now take a look at a
small example. We choose QuickSort because it's a simple algorithm
that is actually useful. We will look at two versions - one written
in C++, an imperative language, and one written in Haskell.
Both versions use only the functionality available to the
programmer without importing any extra modules (otherwise we could
just call "sort" in each language's standard library and be done
with it!). Thus, we use the standard list primitives of each
language (a "list" in Haskell and an "array" in C++). Both
versions must also be polymorphic (which is done "automatically" by
Haskell, and with templates in C++). Both versions must use the same
recursive algorithm.
You should note that this is not intended as a definite comparison
between the two languages. It's intended to show the elegance of
Haskell, the C++ version is only included for comparison (and
would probably be coded differently if you used the Standard
Template Libraries).
template <typename T>void qsort (T *result, T *list, int n){ if (n == 0) return; T *smallerList, *largerList; smallerList = new T[n]; largerList = new T[n]; T pivot = list[0]; int numSmaller=0, numLarger=0; for (int i = 1; i < n; i++) if (list[i] < pivot) smallerList[numSmaller++] = list[i]; else largerList[numLarger++] = list[i]; qsort(smallerList,smallerList,numSmaller); qsort(largerList,largerList,numLarger); int pos = 0; for ( int i = 0; i < numSmaller; i++) result[pos++] = smallerList[i]; result[pos++] = pivot; for ( int i = 0; i < numLarger; i++) result[pos++] = largerList[i]; delete [] smallerList; delete [] largerList; return;};
I will not explain this code further, just note how complex and
difficult it is to understand at a glance.
We will now take a look at a Haskell version of QuickSort, which
might look a something like this.
.
qsort [] = []qsort (x:xs) = qsort less ++ [x] ++ qsort more where less = filter (<x) xs more = filter (>=x) xs
Let's dissect this code in detail, since it uses quite a lot of
Haskell syntax that you might not be familiar with.
The function is called qsort and takes a list as a
parameter. We define a function in Haskell like so: funcname a b
c = expr, where funcname is the name of the
function, a, b, and, c are the parameters and expr
is the expression to be evaluated (most often using the parameters).
Functions are called by simply putting the function name first and
then the parameter(s). Haskell doesn't use parenthesis for function
application. Functions simply bind more tightly than anything else,
so "f 5 * 2", for instance, would apply f to 5
and then multiply by 2, if we wanted the multiplication to
occur before the function application then we would use
parenthesis like so "f (5*2)".
Let's get back to QuickSort.
First we see that we have two definitions of the functions.
This is called pattern matching and we can briefly say that
it will test the argument passed to the function top-to-bottom and
use the first one that matches.
The fist definition matches against [] which in Haskell is
the empty list (a list of 1,2 and 3 is [1,2,3] so it makes
sense that an empty list is just two brackets).
So when we try to sort an empty list, the result will be an empty
list. Sounds reasonable enough, doesn't it?
The second definition pattern matches against a list with at least
one element. It does this by using (x:xs) for its argument.:
is the "cons" operator in Haskell, so 1 : [1,2,3] returns [1,1,2,3],
it simply puts an element in front of the list. So pattern matching
against (x:xs) is a match against the list with the
head x and the tail xs (which may or may not be the
empty list). In other words, (x:xs), is a list of at least
one element.
So since we will need to use the head of the list later, we can
actually extract this very elegantly by using pattern matching. You
can think of it as naming the contents of the list. This can
be done on any data construct, not just a list. It is possible to
pattern match against an arbitrary variable name and then use the head
function on that to retrieve the head of the list.
Now if we have a non empty list, the sorted list is produced by
sorting all elements that are smaller than x and putting that
in front of x, then we sort all elements larger than x
and put those at the end. We do this by using the list concatenation
operator ++. Notice that x is not a list so the ++
operator won't work on it alone, which is why we make it a singleton-list
by putting it inside brackets.
So the function reads "To sort the list, sandwich the head between
the sorted list of all elements smaller than the head, and the
sorted list of all elements larger than the head". Which could
very well be the original algorithm description. This is very common
in Haskell. A function definition usually resembles the informal
description of the function very closely. This is why I say
that Haskell has a smaller semantic gap than other languages.
But wait, we're not done yet! How is the list less and more
computed? Well, remember that we don't care about sequencing in
Haskell, so we've defined them below the function using the where
notation (which allows any definitions to use the parameters of the
function to which they belong). We use the standard prelude function filter,
I won't elaborate too much on this now, but the line less = filter (<x) xs will use filter
(<x) xs to filter the list xs. You can see that we
actually pass the function which will be used to filter the list to filter,
an example of higher order functions.
The function (<x) should be read out "the function 'less
than x'" and will return True if an element passed to it is less
than x (notice how easy it was to construct a function on
the fly, we put the expression "<x", "less than x", in
parenthesis and sent it off to the function - functions really are
just another value!).
All elements that pass the test are output from the filter function
and put inside less. In a same way (>=x) is used to
filter the list for all elements larger than or equal to x.
Now that you've had the syntax explained to you, read the function
definition again. Notice how little time it takes to get an
understanding about what the function does. The function
definitions in Haskell explain what it computes, not how.
If you've already forgotten the syntax outlined above, don't worry!
We'll go through it more thoroughly and at a slower pace in the
tutorials. The important thing to get from this code example is that
Haskell code is elegant and intuitive.
Haskell and Bugs We have several times stated that various features of
Haskell help fight the occurrence of bugs. Let's recap these.
Haskell programs have less bugs because Haskell is:
Pure. There
are no side effects.
Strongly
typed. There can be no dubious use of types. And No Core Dumps!
Concise.
Programs are shorter which make it easier to look at a function
and "take it all in" at once, convincing yourself that it's
correct.
High level.
Haskell programs most often reads out almost exactly like the
algorithm description. Which makes it easier to verify that the
function does what the algorithm states. By coding at a higher
level of abstraction, leaving the details to the compiler, there
is less room for bugs to sneak in.
Memory
managed. There's no worrying about dangling pointers, the Garbage
Collector takes care of all that. The programmer can worry about
implementing the algorithm, not book-keeping of the memory.
Modular. Haskell offers stronger and more
"glue" to compose your program from already developed modules.
Thus Haskell programs can be more modular. Often used modular
functions can thus be proven correct by induction. Combining two
functions that are proven to be correct, will also give the
correct result (assuming the combination is correct).
Furthermore most people agree that you just
think differently when solving problems in a functional
language. You subdivide the problem into smaller and smaller
functions and then you write these small (and
"almost-guaranteed-to-be-correct") functions, which are composed
in various ways to the final result. There just isn't any room for
bugs!
Haskell vs OOP The great benefit of Object Oriented Programming is
rarely that you group your data with the functions that act upon it
together into an object - it's that it allows for great data
encapsulation (separating the interface from implementation)
and polymorphism (letting a whole
set of data types behave
the same way). However:
Data encapsulation
and polymorphism are not exclusive to OOP!
Haskell has tools for abstracting data. We
can't really get into it without first going through the module
system and how abstract data types (ADT) work in Haskell,
something which is well beyond the scope of this essay. I will
therefore settle for a short description
of how ADTs and
polymorphism works in Haskell.
Data encapsulation is done in Haskell by declaring each data type
in a separate module, from this module you only export
the interface. Internally there might be a host of functions that
touch the actual data, but the interface is all that's visible from
outside of the module.
Note that the data type and the functions that act upon the data
type are not grouped into an "object", but they are (typically)
grouped into the same module, so you can choose to only export
certain functions (and not the constructors for the data
type) thus making these functions the only way to manipulate the
data type - "hiding" the implementation from the interface.
Polymorphism is done by using something called type classes.
Now, if you come from a C++ or Java background you might associate
classes with something resembling a template for how to construct an
object, but that's not what they mean in Haskell. A type class in
Haskell is really just what it sounds like. It's a set of rules for
determining whether a type is an instance of that class.
So Haskell separates the class instantiation and the construction
of the data type. You might declare a type "Porsche", to be an
instance of the "Car" type class, say. All functions that can be
applied onto any other member of the Car type class can then be
applied to a Porsche as well.
A class that's included with Haskell is the Show type class,
for which a type can be instantiated by providing a show
function, which converts the data type to a String. Consequently
almost all types in Haskell can be printed onto the screen by
applying show on them to convert them to a String, and then
using the relevant IO action (more on IO in the tutorials).
Note how similar this is to the the object notion in OOP when it
comes to the polymorphism aspect.
The Haskell system is a more intuitive system for handling
polymorphism. You won't have to worry about inheriting in the
correct hierarchical order or to make sure that the inheritance is
even sensible. A class is just a class, and types that are instances
of this class really doesn't have to share some parent-child
inheritance relationship. If your data type fulfills the
requirements of a class, then it can be instantiated in that class.
Simple, isn't it?
Remember the QuickSort example? Remember that
I said it was
polymorphic? The secret behind the polymorphism in qsort is
that it is defined to work on any type in the Ord type class
(for "Ordered"). Ord has a set of functions defined,
among them is "<" and ">=" which are sufficient for our
needs because we only need to know whether an element is smaller
than x or not. So if we were to define the Ord
functions for our Porsche type (it's sufficient to implement, say, <=
and ==, Haskell will figure out the rest from those) in an
instantiation of the Ord type
class, we could then use qsort to
sort lists of Porsches (even though sorting Porsches might not make
sense).
Note that we never say anything about which classes the elements of
the list must be defined for, Haskell will infer this automatically
from just looking at which functions we have used (in the qsort
example, only "<"and">=" are
relevant).
So to summarize: Haskell does include mechanisms for data
encapsulation that match or surpass those of OOP languages. The
only thing Haskell does not provide is a way to group functions
and data together into a single "object" (aside from creating a
data type which includes a function - remember, functions are
data!). This is, however, a very minor problem. To apply a function
to an object you would write "func obj a b c" instead of
something like "obj.func a b c".
Modularity A central concept in computing is modularity. A popular
analogy is this: say you wanted to
construct a wooden chair. If you
construct the parts of it separately, and then glue them together,
the task is solved easily. But if you were to carve the whole thing
out of a solid piece of wood, it would prove to be quite a bit
harder. John Hughes had this to say on the topic in his paper Why
Functional Programming Matters
Languages which aim
to improve productivity must support modular programming well. But
new scope rules and mechanisms for separate compilation are not
enough - modularity means more than modules. Our ability to
decompose a problem into parts depends directly on our ability to
glue solutions together. To assist modular programming, a
language must provide good glue.
Functional programming languages provide two new kinds of glue -
higher-order functions and lazy evaluation.
The Speed of Haskell Let me first state clearly that the following only
applies to the general case in which speed isn't critical. Where you
can accept a few percent longer execution time for the benefit of
reducing development time greatly. There are cases when
speed matters, and then the following section will not apply to the
same extent.
Now, some C++ programmers might claim that the C++ version of
QuickSort above is probably a bit faster than the Haskell
version. And this might even be true. As a general rule C++
should be faster than Haskell, there's no point claiming anything
else. For the vast majority of applications the speed difference
will be negligible, however.
I ask you now: How many programs can you think of that would spend
a considerable amount of time sorting lists?
Not many. Some compression algorithms do it, but that's about it.
Almost all programs in use today have a fairly even spread of
processing time among its functions. The most notable exceptions are
applications like MPEG encoders, and artificial benchmarks, which
spend a large portion of their execution time within a small
portion of the code.
If you really need speed at all costs, consider using C instead of
Haskell.
There's an old rule in computer programming called the "80/20
rule". It states that 80% of the time is spent in 20% of the code.
This is probably true for just about any small application, and
maybe for a few large ones as well.
But as applications grow larger, they usually do so in "80-part";
they generally don't do "the same stuff more often" but
rather just "more stuff". So for large programs the 80-part grows
larger and the 20-part gets smaller.
A better rule would be "30% of the time is spent in 1% of the code,
the remaining 70% of the time is spent on the remaining 99% of the
code" but that doesn't sound quite so snappy does it?
Applications tend to add features, thus spreading out the
processing time over more functions. The consequence of this
is that any given function in your system will likely be of minimal
importance when it comes to optimize for speed. There may be only a
handful of functions important enough to optimize. These important
functions could be written in C. The role of C could, and probably
will, take over the role of assembler programming - you use it for
the really time-critical bits of your system, but not for the whole
system itself.
We should continue to move to higher levels of abstraction, just
like we've done before. We should trade application speed for
increased productivity (measured as "time-to-market"), stability and
maintainability.
Programmer time is almost always more expensive than CPU time.
We aren't writing applications in assembler anymore for the same
reason we shouldn't be writing applications in C anymore.
Finally remember that algorithmic optimization
can give much better results than code optimization. For
theoretical examples when factors such as development times and
stability doesn't matter, then sure – C is faster than Haskell.
But in the real world development times do matter. If you can
develop your Haskell application in one tenth the time it would
take to develop it in C (from personal and others' experience I can
assure you that this is not at all uncommon) you will have lot's of
time to implement these algorithimic optimizations. So in the “real
world” when we don't have infinite amounts of time to program our
applications, Haskell programs can often be muchfaster
than C programs.
Epilogue
So if Haskell is so great, how come it isn't
"mainstream"? Well, one reason is that the operating system is
probably written in C or some other imperative language, so if your
application mainly interacts with the internals of the OS, you may
have an easier time using imperative languages. Another reason for
the lack of Haskell, and other functional languages, in mainstream
use is that programming languages are rarely thought of as tools
(even though they are). To most people their favorite programming
language is much more like religion - it just seems unlikely that
any other language exists that can get the job done better and
faster.
There is a paper by Paul Graham called Beating the Averages
describing his experience using Lisp, another functional
language, for an upstart company. In it he uses an analogy which
he calls "The Blub Paradox".
It goes a little something like this:
If a programmer's favorite language is Blub, which is positioned
somewhere in the middle of the "power spectrum", he can most often
only identify languages that are lower down in the spectrum. He can
look at COBOL and say "How can anyone get anything done in that
language, it doesn't have feature x", x being a feature in Blub.
However, this Blub programmer has a harder time looking the other
way in the spectrum. If he examines languages that are higher up in
the power spectrum, they will just seem "weird" because the Blub
programmer is "thinking in Blub" and can not possibly see the uses
for various features of more powerful languages. It goes without
saying that this inductively leads to the conclusion that to be
able to compare all languages you'll need to position
yourself at the top of the power spectrum. It is my belief that
functional languages, almost by definition, are closer to the top of
the power spectrum than imperative ones.
So languages can actually limit a programmers frame of thought. If
all you've ever programmed is Blub, you may not see the limitations
of Blub - you may only do that by switching to another level which
is more powerful.
One of the reasons the mainstream doesn't use Haskell is because
people feel that "their" language does "everything they need".
And of course it does, because they are thinking in Blub! Languages
aren't just technology, it's a way of thinking. And if you're not
thinking in Haskell, it is very hard to see the use of Haskell -
even if Haskell would allow you to write better applications in a
shorter amount of time!
Hopefully this article has helped you break out of the Blub
paradox. Even though you may not yet "think in Haskell", it is my
hope that you are at least aware of any limitations in your frame of
thought imposed by your current "favorite" language, and that
you now have more motivation to expand it by learning something new.
If you are committed to learn a functional language (to get a
better view of the power spectrum) then it is my belief that Haskell
is your best bet.