Saturday, April 30, 2011

Paradoxes of Test Driven Development cont'd.

3. The dimensionality paradox

After a few days, I'm no longer that happy with this name ... But let's look at a high-level example, and maybe find a better name later.

Assume that you have to write a regular expression recognizer; it only needs to output "match" or "no match" for a regular expression and one or possibly many input strings. You start with simple examples, like a or ab, then a?, ab?, and maybe continue with [ab]* and (ab)* etc. The first few examples can be handled with a simple parser and a simple interpreter, but more and more complex regexes will make these machines more and more complex - and slower and slower. At some point, the iteratively developed design will no longer hold up to the combined requirements of efficiency and maintainability ...
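To make the starting point concrete, the "simple interpreter" one might arrive at incrementally could look like this minimal sketch. It is my own illustration, not code from any real library; it supports only literals and the ? operator, handled by straightforward recursion:

```python
# Hypothetical sketch of a naive, incrementally grown matcher.
# Supports only literal characters and the '?' operator.

def match(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the pattern."""
    if not pattern:
        return not text
    # 'x?' makes the literal x optional.
    if len(pattern) >= 2 and pattern[1] == "?":
        # Either skip the optional character ...
        if match(pattern[2:], text):
            return True
        # ... or consume it and continue.
        return bool(text) and text[0] == pattern[0] and match(pattern[2:], text[1:])
    # Plain literal: consume one matching character.
    return bool(text) and text[0] == pattern[0] and match(pattern[1:], text[1:])

print(match("ab?", "a"))   # True
print(match("ab?", "ab"))  # True
print(match("ab?", "b"))   # False
```

Each new operator (character classes, *, grouping) would force another special case into this recursion - which is exactly the creeping complexity described above.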

If you take a look at modern regex recognizers, they contain a key design component: The regexp is translated into a "program" for a special "processor", which then does the input traversal. The design of this processor is influenced (a) by the problems of regexp recognition (e.g., the need for fast loop traversal; and backtracking in the case of non-greedy operators and back-references), but also (b) by the processor on which the regexp library runs - virtual machines in the case of Java and .Net, but maybe also real silicon processors.
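This compile-then-execute design can be sketched in a few lines. The following is a toy Thompson-style VM of my own devising, not any real library's implementation; the instruction set (CHAR, SPLIT, JMP, MATCH) is illustrative, and the "compiler" is hand-written for the single regex a*:

```python
# Hypothetical sketch: the regex becomes a "program" that a dedicated
# matching loop (a breadth-first, non-backtracking VM) executes.

CHAR, SPLIT, JMP, MATCH = range(4)

def compile_a_star(c):
    """Hand-compiled program for the regex c* (illustrative only)."""
    return [
        (SPLIT, 1, 3),    # 0: enter the loop body, or skip past it
        (CHAR, c, None),  # 1: consume one occurrence of c
        (JMP, 0, None),   # 2: back to the split
        (MATCH, None, None),
    ]

def follow(program, pcs, seen):
    """Chase SPLIT/JMP instructions down to CHAR/MATCH positions."""
    for pc in pcs:
        if pc in seen:
            continue
        seen.add(pc)
        op, a, b = program[pc]
        if op == JMP:
            follow(program, {a}, seen)
        elif op == SPLIT:
            follow(program, {a, b}, seen)
    return {pc for pc in seen if program[pc][0] in (CHAR, MATCH)}

def run(program, text):
    """Advance all 'threads' in lockstep over the input."""
    threads = follow(program, {0}, set())
    for ch in text:
        stepped = {pc + 1 for pc in threads
                   if program[pc][0] == CHAR and program[pc][1] == ch}
        threads = follow(program, stepped, set())
        if not threads:
            return False
    return any(program[pc][0] == MATCH for pc in threads)

prog = compile_a_star("a")
print(run(prog, ""))     # True
print(run(prog, "aaa"))  # True
print(run(prog, "ab"))   # False
```

The point of the sketch is the separation of concerns: the compiler knows the regexp language, the VM loop knows only its handful of instructions - and neither piece falls naturally out of a test for "does ab match?".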

This powerful, but also complicated intermediate concept (between the regexp language with its semantics and the recognition result "match" / "no match") is something you have to invent. It cannot be derived from the "mere" input-output behavior of the problem at hand. Rather, you need to step back quite far from the problem to see that this design is a possibility that will solve the problem better than more straightforward designs.

Now, what I said is, in a sense, unfair: Test-driven design does not prevent you from gaining more and more understanding during the design and implementation process; and one possibility is that at some point, you rip out much of the previous design and replace it with a new one that now handles all the design aspects better. This is a valid test-driven process, but it is certainly no longer that "incremental".

A similar, but certainly much more "theoretical" example of such an important "intermediate invention" is the heap sort algorithm: The data are first put into that strange, not at all sorted heap structure - and can then be pulled out, sorted, in an efficient way.
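The two-phase structure of heap sort can be shown in a few lines; here is a minimal sketch using Python's standard heapq module (an in-place textbook heap sort would sift within the array itself, but the intermediate heap is the same idea):

```python
import heapq

def heap_sort(values):
    """Sort by detouring through an intermediate heap structure."""
    heap = []
    for v in values:                  # phase 1: build the unsorted heap
        heapq.heappush(heap, v)
    return [heapq.heappop(heap)       # phase 2: pull elements out in order
            for _ in range(len(heap))]

print(heap_sort([5, 1, 4, 2, 3]))  # [1, 2, 3, 4, 5]
```

No sequence of "sort this small list" test cases forces the heap into existence - it is an invention that pays off only once you ask about efficiency.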

------

In a sense, all three paradoxes are special cases of the problem of induction: On which basis can we generalize from a few special cases? However, in another sense, I cheated: These are not paradoxes at all! Why? Because engineering is a discipline where we decide how we want something to be designed. Thus, we are not in the situation that we must discover the generalized rules (as, e.g., the natural sciences have to do); rather, we design the generalized rules from a generalized understanding. Yet, for all the good that TDD has done, I think we should not forget this:

Under TDD, too, it is admissible - and, even more, desirable - that we create designs that transcend simple case-by-case solutions to problems that we present to ourselves (or that are presented to us) as a "mere set of cases."
