Tuesday, April 26, 2011

Paradoxes of Test Driven Development

It is, fortunately, common nowadays to construct code by iterative test-driven development (TDD):
  1. Declare that the code under construction should have a new behavioral property that it does not yet have.
  2. Write a piece of test code that checks for this behavior.
  3. Run the test to prove that the behavior is still missing ("make it red").
  4. Continue construction so that the behavior emerges.
  5. Run the test again to prove that the behavior is now established ("make it green").
  6. Run all previous tests to prove that the construction did not remove a previously established behavior ("make all green").
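To make steps 2, 3, and 5 of this cycle concrete, here is a minimal sketch of a first test - my own illustration, assuming NUnit as the test framework and a hypothetical Calculator class under construction:

    // A first test for a new behavior ("the sum of zero and zero is zero").
    // Run it before implementing Sum: it fails ("make it red").
    // Run it after implementing Sum: it passes ("make it green").
    using NUnit.Framework;

    [TestFixture]
    public class CalculatorTests {
        [Test]
        public void Sum_OfZeroAndZero_IsZero() {
            Assert.AreEqual(0, new Calculator().Sum(0, 0));
        }
    }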
Having said this, it is still worthwhile to point out the three paradoxes of iterative TDD:
  • The infinity paradox: A limited number of concrete test cases can never define a behavior sufficiently.
  • The hill-climbing paradox: Local iterations are not guaranteed to reach a globally agreeable (or sufficient) solution.
  • The dimensionality paradox: Separation of concerns along one axis may break concerns (or aspects) that would have been useful to keep together.
Of course, all these paradoxes are easily solved by intelligent engineers (one might call this, in itself, a paradox:
  • The paradox-transgression paradox: That something consistent and useful can emerge from a paradoxical situation.)
Ok - I'm getting too philosophical. Let's look at the three paradoxes above. BTW, I have invented their names on the fly - so if you don't like them, or there are better ones (e.g. in texts I have not read), feel free to substitute your own or someone else's names instead!

1. The infinity paradox

This is a well-known feature of TDD. Let's assume you have to implement a function that adds two integral numbers and returns their sum. "Mhm ... sum of numbers ... not an easy topic," you mumble. "Could you give me a very simple example?" "Sure: The sum of zero and zero is zero". "Aha, let's implement this. We do not know any other sum, so we can easily write:"

    public int Sum(int a, int b) {
        return 0;
    }
   
"But I can also tell you that the sum of three and zero is three." "Well, then so be it:"

        if (a == 3) {
            return 3;
        } else {
            return 0;
        }
       
"However, also zero plus three is three." "Fine:"

        if (a == 3 && b == 0) {
            return 3;
        } else if (a == 0 && b == 3) {
            return 3;
        } else {
            return 0;
        }
       
"And five plus six is eleven." "Weeell:"

        if (a == 3 && b == 0) {
            return 3;
        } else if (a == 5 && b == 6) {
            return 11;
        } else if (a == 0 && b == 3) {
            return 3;
        } else {
            return 0;
        }
       
"And four plus seven is also eleven." "Okaaaaayy:"

        if (a == 3 && b == 0) {
            return 3;
        } else if (a == 5 && b == 6) {
            return 11;
        } else if (a == 0 && b == 3) {
            return 3;
        } else if (a == 4 && b == 7) {
            return 11;
        } else {
            return 0;
        }
       
You get the idea.

We see: A finite and manageable number of cases is not sufficient to fully define a rule that should hold for large or even infinite input sets. We need to bring in additional techniques. There are three that I consider to be the most important:
  • Number one is what I call "conceptualization." There is no way around "describing the complexity as it is." A sum is a sum is a sum is a sum: It cannot be simplified to something that is fundamentally weaker, like a list of cases. Only conceptualization can yield the implementation
       return a + b;

(Formalists turn this argument on its head: They claim that the formal - axiomatic, behavioral, ... - specification alone is sufficient to define a behavior. But time has disproven this - we also need examples; and preferably well-chosen examples).

  • Number two is integration (especially integration testing): When a concept is used in a broader context or environment, certain aspects of the concept might become more pronounced. So you write an integration test (or manually execute a system test) that actually uses your sum in the intended environment.
However, isn't this paradoxical again (a sub-paradox?)? After all, I have just argued that the "sum concept" cannot be reduced to a "list of cases." But integration testing, too, is only a "list of cases", albeit in a different, more encompassing domain. So why bother? Simple reason: Usage in a different and relevant domain will bring out at least some implicit assumptions not spelt out in the examples or even in the conceptual specification.

For example, when the summing function above is used in a scenario where numbers count upwards all the time, the topic of overflow suddenly becomes very interesting. If the same function is used to increment and decrement a number representing the size of a stack, overflow is most probably not a problem, but we now have to handle negative inputs, even though we expect the result never to be negative. And when the value represents a temperature in degrees Celsius or Fahrenheit, we finally also have to accept negative numbers as results; etc.
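A small sketch of my own shows the first of these environments in C#: in the default unchecked context, the Sum above silently wraps around, whereas a checked context turns the same addition into an exception - exactly the kind of implicit assumption that only the usage environment brings to light:

    using System;

    class OverflowDemo {
        static int Sum(int a, int b) { return a + b; }

        static void Main() {
            int max = int.MaxValue;
            // Default unchecked context: silent wrap-around to -2147483648.
            Console.WriteLine(Sum(max, 1));
            try {
                // The same addition in a checked context throws instead.
                Console.WriteLine(checked(max + 1));
            } catch (OverflowException) {
                Console.WriteLine("overflow detected");
            }
        }
    }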

One capability I admire very much is the ability to abstract away from a given context (and thereby, e.g., arrive at the concept of "integral number") while, on the other hand, always keeping the "real world" - the application - in focus and not "falling over the YAGNI edge."

  • Even though it only creates additional cases (and hence cannot overcome the fundamental infinity paradox), the third important technique is intelligent selection of cases.
Good testing discipline has all the key ideas and phrases: "Boundary cases", "error cases", "coverage". However, testing argues for such techniques on a basis that does not necessarily interest us in TDD: Namely, "typical programmer errors". In classical testing, boundary cases are selected primarily because it is assumed that programmers tend to make "off-by-one errors".
In TDD, there are no programmer errors - "not yet," at least. The focus on boundary cases here is, predominantly, one of clear behavioral definition. Having cases that show that 0 + x is the same as x, and that the same holds for x + 0, underlines the fact that 0 is a neutral element of the summing function - regardless of how likely a programmer is to get that wrong.
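In test form (continuing the hypothetical NUnit Calculator setup sketched at the top), the neutral-element property might read:

    [Test]
    public void Zero_IsLeftNeutralElementOfSum() {
        Assert.AreEqual(7, new Calculator().Sum(0, 7));
    }

    [Test]
    public void Zero_IsRightNeutralElementOfSum() {
        Assert.AreEqual(7, new Calculator().Sum(7, 0));
    }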

Of course, there are other techniques to get the behavior right - reviews, random testing, formal and informal proofs, etc. They will help, but they are even more work!
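Random testing, for instance, might check a conceptual property such as commutativity over arbitrarily chosen inputs - again my own sketch, added to the hypothetical fixture above, with a fixed seed so that the "random" test stays reproducible:

    [Test]
    public void Sum_IsCommutative_ForRandomInputs() {
        var rnd = new Random(42);   // fixed seed: reproducible runs (Random is in System)
        for (int i = 0; i < 1000; i++) {
            int a = rnd.Next(-10000, 10000);
            int b = rnd.Next(-10000, 10000);
            Assert.AreEqual(new Calculator().Sum(a, b),
                            new Calculator().Sum(b, a));
        }
    }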

2. The hill-climbing paradox

"Hill-climbing" is a mathematical optimization method for multi-dimensional functions. Essentially, it says: If you want to reach the top of the hill (the "optimum"), always go in the direction of the steepest incline. Sounds obvious, doesn't it? The problem with hill-climbing is that you might reach a "local optimum": A small hill far away from the mountain you actually wanted to conquer. Once you are atop the small hill, naive hill-climbing prevents you from going anywhere else: There simply is no incline any more in any direction.

Naive iterative development can lead you astray in similar ways. Here is a very artificial example: You have to implement a function F(int a, int b) whose result is defined by the following table (which is finite! - so there's no infinity paradox involved):

a \ b | 0 1 2 3
------+--------
    0 | 0 1 2 3
    1 | 3 0 1 2
    2 | 2 3 0 1
    3 | 1 2 3 0

You might start with the first line, i.e., have cases for (0,0), (0,1), (0,2), and (0,3). Including some conceptualization (abstraction; pattern recognition), you end up with something like

    if (a == 0) {
        return b;
    }
   
Including the second line, you arrive at

    if (a == 0) {
        return b;
    } else if (a == 1) {
        return b == 0 ? 3 : b - 1;
    }

etc. If, however, you start with different "dimensions", you might see that the difference b - a alone determines the result (by noting the falling "diagonal stripes" in the matrix), and your cases might be

    b - a = 0 ... this always yields zero.
    b - a > 0 ... this always yields b - a.
    b - a < 0 ... this yields b - a + 4.

And if you take a "very holistic" approach, you might actually see that the function is just (b - a + 4) % 4.
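One can even let the machine confirm that the holistic formula reproduces the whole table - a small self-check of my own:

    using System;

    class FDemo {
        static int F(int a, int b) { return (b - a + 4) % 4; }

        static void Main() {
            int[,] table = {
                { 0, 1, 2, 3 },   // row a = 0
                { 3, 0, 1, 2 },   // row a = 1
                { 2, 3, 0, 1 },   // row a = 2
                { 1, 2, 3, 0 }    // row a = 3
            };
            for (int a = 0; a < 4; a++)
                for (int b = 0; b < 4; b++)
                    if (F(a, b) != table[a, b])
                        Console.WriteLine("Mismatch at (" + a + "," + b + ")");
            // No "Mismatch" lines: all 16 cases agree with the table.
        }
    }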

The moral of that story? Depending on the method you use to plan your next step (up the hill), you might arrive at very different solutions. And the simplest method of deciding on the direction - in the matrix example, the "line by line" method - might well lead to a sub-optimal solution (a local optimum).

Of course, this little example does not prove that much. Still, doing the same with a large state table (as used in some types of parsers) might actually give you a quite intractable piece of code rather than a manageable one; or vice versa.

Test cases are good. Believing that common sense test case selection will lead you to a good algorithm is not.

Continued at Paradoxes of test driven development, cont'd.
