[Haskell-cafe] Re: Pattern matching, and bugs

Fri Dec 18 11:19:40 EST 2009

On Fri, 2009-12-18 at 13:04 +0100, András Mocsáry wrote:
> Hello,
> I was advised respectfully to post my query here.
> Please, read the whole letter before you do anything, because I tried
> to construct the problem step by step.
> Also keep in mind, that the problem I query here is more general, and
> similar cases occur elsewhere, not just in this particular example I
> present below.
> 
> Intro story ( Skip if you are in a hurry )
> I'm participating in several open-source server development projects,
> most of them are written in C, and some of them have C++ code hidden
> here and there.
> These programs, are developed by many, and by many ways, and so, more
> often than not it is very hard to determine the 'real cause of bugs'.
> 
> This almost always leads to 'bugfixes' which 'treat the crash'.
> Sometimes they are a few lines of extra checks, like IFs.
> 
> Sometimes they are complex, and even surprisingly clever hacks.
> Thus understanding the 'code' of them is challenging, but the end
> result is a pile of ... hacks fixing bugs fixing hacks fixing bug,
> which also were put there to fix yet another bugs.
> 
> When I started to learn functional programming, I was told, that
> the correctness of a functional program can be proved a lot more
> easily, in fact in a straight mathematical way.
> 
> My concern
> 
> is about predictable failure of sw written in Haskell.
> To illustrate it let's see a Haskell pattern matching example:
> 
> Let's say I have defined some states my object could be in, and I did
> in a switch in some C-like language:
>         switch ( x )
>         {
>         Case 0:
>           "Unchecked"
>         Case 1: 
>           "Checked"
>         Case 2:
>           "Unknown"
>         }
> 
> And in Haskell pattern matching:
> 
>         switch 1 =  "Unchecked"
> 
>         switch 2 =  "Checked"
>         switch 3 =  "Unknown"
> 
> Let's say, these are clearly defined states of some objects.
> Then let's say something unexpected happens: x gets something else
> than 0 1 2.
> Now we have a problem, which is most generally fixed in these ways:
> C-like:
>         switch ( x )
>         {
>         Case 0:
>           "Unchecked"
>         Case 1: 
>           "Checked"
>         Case 2:
>           "Unknown"
>         Default:
>           "Nothing"
>         }
> 
> Haskell like:
> 
> 
>         switch 1 =  "Unchecked"
>         switch 2 =  "Checked"
>         switch 3 =  "Unknown"
>         switch x =  "Nothing"
> These general ways really avoid this particular crash, but does
> something real bad to the code in my opinion.
> 
> 
> Below are some cases x can go wrong:
> 1. The bad data we got as 'x' could have come from another part of
> our very program, which is the REAL CAUSE of the crash, but we
> successfully hide it.

Yes. If a program has a bug, let it be a bug that crashes (then you know
there is a bug). However, in Haskell you have _|_ (read "bottom"), which
indicates an error or exception (returning _|_ is related to throwing
exceptions in other languages).

> Which makes it harder to fix later, and thus eventually means the
> death of the software product. Eventually someone has to rewrite it.
> Which is economically bad for the company, since rewriting implies
> increased costs.
> 

The first problem is that here an int is used as an enumeration, as if
this were assembler. In C, an enum is also strongly tied to its binary
representation. But well - C is a high-level assembler, and in some uses
that is justified.

data State = Unchecked | Checked | Unknown

switch Unchecked = "Unchecked"
switch Checked   = "Checked"
switch Unknown   = "Unknown"

Here the compiler knows all possible inputs (Unchecked/Checked/Unknown)
and hence can see that nothing else is possible. If a case is not
covered, it will produce a warning (in GHC, with
-fwarn-incomplete-patterns or -Wall enabled).

Another module cannot run (switch 4), because 4 is not a State. If there
is a problem with user input, you will find out in the code which parses
user input (i.e. the one which converts 3 into Unknown).
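
As a minimal sketch of that idea: parseState is an illustrative name that
does not appear in the original post, and the 1/2/3 numbering simply
follows the quoted Haskell example. The raw Int is inspected in exactly
one place, and everything downstream only ever sees a valid State.

```haskell
{-# OPTIONS_GHC -Wincomplete-patterns #-}

-- The three states from the example, as a closed enumeration.
data State = Unchecked | Checked | Unknown
  deriving (Eq, Show)

-- Hypothetical parser: the only place where a raw Int is inspected.
-- Anything outside 1..3 is rejected here, not deep inside the program.
parseState :: Int -> Maybe State
parseState 1 = Just Unchecked
parseState 2 = Just Checked
parseState 3 = Just Unknown
parseState _ = Nothing

-- Total function, so no catch-all clause is needed. If a constructor is
-- later added to State, GHC warns that this match is incomplete.
switch :: State -> String
switch Unchecked = "Unchecked"
switch Checked   = "Checked"
switch Unknown   = "Unknown"
```

With these definitions, fmap switch (parseState i) gives a Maybe String,
and an expression such as switch 4 is rejected at compile time.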

> 2. The bad data we got as 'x' could also have come from a real
> world object, which we have underestimated, or which changed in the meantime.
> 
> 3. This 'x' could have been corrupted over a network, or by 'mingling'
> or by other faulty software or something.
> 
> 
> Point 1:
> If we allow ourself such general bugfixes, we eventually kill the
> ability of the project to 'evolve'.
> 
> Point 2:
> 
> Programmers eventually take up such 'arguably bad' habits, thus making
> harder to find such bugs.
> 
> Thus it would be wiser to tell my people to never write Default cases,
> and such general pattern matching cases.
> 
> 
> Which leads to the very reason I wrote to you:
> 
> 
> I want to propose this for Haskell prime:
> 
> I would like to have a way for Haskell, not to crash, when my coders
> write pattern matching without the above mentioned general case.
> Like having the compiler auto-include those general cases for us,
> but when those cases got hit, then instead of crashing,
> it should report some error on stdout or stderr.
> 
> (It would be even nicer if it could have been traced.)
> 
> 
> This is very much like warning suppression, just that it's crash
> suppression, with the need of a report message of course.
> 
> 
> I would like to hear your opinion on this.
> 
> I also think, that there are many similar cases in haskell, where not
> crashing, just error reporting would be way more beneficial.
> In my case for server software, where network corrupted data,
> ( and data which has been 'tampered with' by some 'good guy' who think
> he's robin hood if he can 'hack' the server )
> is an every day reality.
> 
> 
> Thanks for your time reading my 'storm'.
> 
> 
> Greets,
> 
> Andrew
> 

AFAIU you are asking about having:

someFunction :: TypeA -> TypeB
someFunction A = ...

transformed into:
someFunction :: TypeA -> TypeB
someFunction A = ...
someFunction B = someValue

but someValue would have to be of type forall a. a. It would have to work
even for:
{-# LANGUAGE EmptyDataDecls #-}
data Void

Well - as far as I know, only _|_ fulfills that. Nothing else belongs
to, for example, Void.
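
In GHC the _|_ produced by a missing case shows up as a PatternMatchFail
exception, and it can only be observed from IO. The following is a rough
sketch of reporting it on stderr without ending the program; safeSwitch
is a made-up name, and the approach assumes GHC's Control.Exception.

```haskell
import Control.Exception (PatternMatchFail, evaluate, try)
import System.IO (hPutStrLn, stderr)

-- Deliberately partial, as in the original question.
switch :: Int -> String
switch 1 = "Unchecked"
switch 2 = "Checked"
switch 3 = "Unknown"

-- Hypothetical wrapper: force the result, and turn a failed match into
-- a message on stderr plus Nothing, so the exception does not propagate.
safeSwitch :: Int -> IO (Maybe String)
safeSwitch x = do
  r <- try (evaluate (switch x))
  case r of
    Right s -> return (Just s)
    Left e  -> do
      hPutStrLn stderr ("switch: " ++ show (e :: PatternMatchFail))
      return Nothing
```

Note that evaluate only forces the result to weak head normal form. That
is enough here because the match fails as soon as the string is demanded,
but a _|_ hidden deeper inside a structure would escape this wrapper.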

Another way of dealing with this is to use Maybe:

switch 1 = Just "Unchecked"
switch 2 = Just "Checked"
switch 3 = Just "Unknown"
switch _ = Nothing

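Then you can inspect the result with the maybe function (reporting an
error for Nothing):

import System.IO (hPutStrLn, stderr)

-- getIntFromSomewhere stands for whatever IO action yields the Int
main = do i <- getIntFromSomewhere
          let s = switch i
          maybe (hPutStrLn stderr "Incorrect value") putStrLn s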

Another method, if more complex cases are needed, is to use Either or
the ErrorT monad transformer.
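
A small sketch of the Either variant follows. SwitchError and switchBoth
are names invented for illustration, and the Monad instance for Either
is assumed to come from a reasonably current base. Compared with Maybe,
the Left side can say why the input was rejected.

```haskell
-- Hypothetical error type: records the offending value.
data SwitchError = OutOfRange Int
  deriving (Eq, Show)

switch :: Int -> Either SwitchError String
switch 1 = Right "Unchecked"
switch 2 = Right "Checked"
switch 3 = Right "Unknown"
switch n = Left (OutOfRange n)

-- Either is a monad, so checks chain and the first Left short-circuits.
switchBoth :: Int -> Int -> Either SwitchError (String, String)
switchBoth a b = do
  x <- switch a
  y <- switch b
  return (x, y)
```

The caller still decides what to do with a Left, for example by pattern
matching on it or by using the either function.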

In any case you *need* to say what to do. Should the error be printed to
stderr? Or should HTML be sent saying the input was incorrect? Inform
the NSA? Give full access?

Personally I'd prefer to have:

processPacket = readPacketFromInput >>= \i ->
                  case parsePacket i of
                    Just typeSafeData -> doSomething typeSafeData
                    Nothing           -> returnErrorToUser

Where typeSafeData has a type that guarantees it is correct (please
remember - in Haskell a type can guarantee, for example, that a tree is
a 2-3 B-tree [all leaves at equal height etc.]).
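
To make the tree remark concrete, here is one possible encoding using
GADTs; the names Z, S, Tree, toList and example are illustrative. Only
the equal-height invariant is captured by the type in this sketch. The
ordering of keys is not.

```haskell
{-# LANGUAGE GADTs, EmptyDataDecls #-}

-- Type-level naturals, used only as height indices.
data Z
data S n

-- A 2-3 tree whose height is part of its type. All children of a node
-- share the same index n, so leaves at different depths do not type-check.
data Tree n a where
  Leaf  :: Tree Z a
  Node2 :: Tree n a -> a -> Tree n a -> Tree (S n) a
  Node3 :: Tree n a -> a -> Tree n a -> a -> Tree n a -> Tree (S n) a

-- In-order listing of the elements; works at any height.
toList :: Tree n a -> [a]
toList Leaf              = []
toList (Node2 l x r)     = toList l ++ [x] ++ toList r
toList (Node3 l x m y r) = toList l ++ [x] ++ toList m ++ [y] ++ toList r

-- A well-formed tree of height two.
example :: Tree (S (S Z)) Int
example = Node2 (Node2 Leaf 1 Leaf) 2 (Node3 Leaf 3 Leaf 4 Leaf)
```

Something like Node2 Leaf 1 (Node2 Leaf 2 Leaf) is rejected by the type
checker, because the two subtrees would have different heights.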

Regards