Sunday, May 1, 2011

How do we select test cases in TDD - and, how do we integrate "insights" into TDD?

I don't know. I just do it. Typically, in my environment, this is a "low conflict" activity: When I pair with someone, we agree quite quickly on the first few test cases. Soon, case selection gets muddled up with other considerations like algorithm problems, API problems etc. etc., so cases tend to serve as a basis for discussing such aspects. And then, rather sooner than later, time runs out, and integrating the new behavior with existing features becomes the most important activity. The TDD is then more or less over. Later, bugs might force us to add additional test cases - but at that point, the selection of test cases is quite focused: First, the bugs themselves provide test cases; second, we also do classical (e.g. risk-based) test case selection to guard against errors.

But how do we select cases in that "pure construction phase" of test-driven development? I googled around a little bit, but did not find anything that really appealed to me ... The most straightforward answers were at http://stackoverflow.com/questions/3922258/when-applying-tdd-what-heuristics-do-you-use-to-select-which-test-to-write-next, where two people answered:
  • The first said essentially "I start by anchoring the most valuable behaviour of the code" and then "After that, I start adding the edge cases".
  • The second said: "To begin with, I'll pick a simple typical case and write a test for it" and then "While I'm writing the implementation I'll pay attention to corner cases that my code handles incorrectly. Then I'll write tests for those cases."
Two scientific (or "scientific"?) links didn't make much sense:
  • "Metrics for Test Case Design in Test Driven Development" at http://www.ijcte.org/papers/269-G823.pdf is very bad English (p.853: "As we seen the V model says That someone should first test each unit") and, in my opinion, does not fulfil promises made in the paper nor in the title. So it goes in academia - "publish or perish."
  • "Regression Test Selection Techniques for Test-Driven Development" at http://www.math.tau.ac.il/~amiramy/testRank.pdf has the following goal: "Our goal is to automatically find, for a given small and localized code change, a small subset of tests, with a bounded cost of execution, that provides a high bug detection rate, close enough to that of full retesting" and "More formally, given a program P, a test suite T and a code change c, we define fail(T,P,c) as the subset of tests in T that fail on program P after change c." - also not what we do in TDD: There is no predefined test suite T.

Here is my attempt to somehow circle that topic of test case selection in TDD. It will be a sort of rambling ... maybe someone is interested in it.

As I see it, it is important to consider the "TDD cycle" (create a red test, write code, show that the test is green; interspersed with full test suite runs and refactorings) in the context of two activities:

(1) Creative problem solving.
(2) Classical "after-the-fact testing" - writing unit tests after coding is completed.

Why do I want to separate these from the "TDD proper"? Well, exactly because it seems to me that a few TDD aficionados assume implicitly or even require explicitly that these activities have to be part of the TDD cycle. A somewhat exaggerated formulation of that standpoint would be:

Regarding (1): You must not have a complete solution that you somehow arrived at in a "creative way": The solution must evolve gradually through the addition of more and more cases that define and constrain the behavior.

Regarding (2): There is no need to write additional test cases after delivering a completed solution that has been arrived at by TDD. The test cases done in TDD are sufficient.

I hope that everyone agrees that both statements would be nonsense. No process should restrict the ways in which we can be creative, or how we ensure quality!

The picture that emerges for me can be roughly described as follows:
  • In a "waterfall world", two activities are ordered sequentially: First, there is construction; then, there is validation.
  • In a "TDD world", these two activities are dissolved into an iterative repetition of the TDD cycle. A "standard cycle" contains of both construction and validation.
But there are more possibilities in TDD:
  • Sometimes, we might include "pure validation cycles": A cycle that does not start with a new, initially red test case, but with an additional test case that checks the behavior implemented previously. Such a cycle often starts with someone saying: "Now, that example we talked about earlier should also work! Let's try it ...".
  • Symmetrically, it seems, there would also have to be "pure construction cycles." Yes, this seems strange: We never want to be in a state where there is some code that is not validated by supporting tests. But I do not want to outlaw "sparks of intuition": In a TDD cycle, a participant should also be able to say "Ah - now I know how we can solve the problem - like this ...". What do we do with such an insight? I say that we do classical white-box testing: "Here is a proposed solution - now let's check whether it works!" And if it does, we are done, and happily so! (There are arguments from inductive reasoning that such a case-transgressing insight is necessary to find a solution to generic problems; I have tried to capture this in texts on the "paradoxes of TDD".)
I believe (and have seen) that at such an "insight" point, there will be
  • some participants who are "cautious": "Are you sure this is a valid suggestion? Shouldn't we proceed in small steps?"; and
  • some who are "visionary": "Are we sure that a step-by-step process will lead to a good solution? Shouldn't we state some bold principles of the solution design?" - with both groups having good arguments on their side, and therefore being in need of some sort of conflict resolution.
In general, it seems to me that a positive conflict resolution will happen along the following lines:
  • It is the responsibility of the "visionaries" to help to break down the vision into intelligible chunks (or "steps") so that the solution can be better understood and verified.
  • It is the responsibility of the "cautionarists" to help to arrive at a solution as quickly as possible. Therefore, they see to it that the steps taken go into the direction of that proposed solution, if that solution appears sensible.
(Of course, this assumes that no-one has some hidden agenda: I have seen a few cases of pairing TDD where one partner wanted to force a solution; and did so by selecting special test cases that would steer the code to be written in a certain direction - e.g. a table-driven solution. The "hill-climbing process" of TDD unfortunately plays into the hands of such undercover tactics, as one can hide decisions for some time by selecting trivial test cases; but at some point, much code has already been written that then turns out to favor one design over another. Solving software problems should not be chess playing - one winner, one loser. Therefore, I whole-heartedly agree with the statements I cited at the beginning: "I start by anchoring the most valuable behaviour of the code" and "I'll pick a simple typical case and write a test for it").

(A similar "hill-climbing question" is behind refactoring, and is asked very often: Is it possible to get from one design [that is assumed to be inferior] to another design [a desirable and desired target design] by low-level refactoring steps? I do not know the answer, and have not seen any convincing one. On the whole, I remain skeptical e.g. for cases where you want to replace a compiler's single pass design with a triple-pass design with incrementally loaded parse trees; or where you want to replace an simple-induction-based algorithm with a divide-and-conquer version; etc. Therefore, I find it acceptable, and sometimes even desirable, to replace a design with a different one "all in one", "in a single big rewrite.")

(Back to TDD) Here is an example where one can try out one's preferences:

We are asked to write a function that
  • gets as input an array of different integers;
  • returns true if all products of two different numbers from the array are even.
So, e.g., if the array contains 1, 2, 3, we get false - because the product 1*3 is 3 which is not even. However, if the array contains 2, 3, 4, we get true: All pair products 2*3, 2*4, and 3*4 are even.
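To make the problem concrete in code, here is a minimal brute-force sketch in C#; the class and method names are my own choice (the problem statement prescribes none), and the sketch simply checks every pair, exactly as the specification says:

    // Brute-force sketch: check every pair of different elements.
    // (Class and method names are illustrative only.)
    static class PairProducts
    {
        public static bool AllPairProductsEven(int[] numbers)
        {
            for (int i = 0; i < numbers.Length; i++)
                for (int j = i + 1; j < numbers.Length; j++)
                    if ((numbers[i] * numbers[j]) % 2 != 0)
                        return false;   // found an odd pair product
            return true;                // no odd pair product found
        }
    }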

One approach could be:
  • We start with a "small, typical case" - e.g. { 1, 2 } which we expect to return true. Our algorithm, at this point, checks all the pairs (there is only one in this case, namely <1,2>) and returns the appropriate result.
  • The next case could be { 1, 3 }, to return false. We update the code (do we already need loops for 2-element arrays? - not yet really ...) and run the test.
  • We feel that we need those nested loops at some point, so we continue with a "long test case", e.g. { 1, 2, 3, 4, 5 }.
  • On the "opposite end", we could now continue with an single-element array { 3 } (what should the result be?), and ...
  • ... an empty array (what should the result be now??). 
  • We (might) feel that we are done now with development - but "just to be sure" (pure validation cycle!), we add another example with 5 entries that also includes negative numbers and zero, e.g. { -2, -1, 0, 1, 2 } - which returns false, as -1*1 is odd.
  • To balance this, we finish with another pure validation cycle, where we expect { -4, -2, 0, 2, 4 } to return true - and now we are really done (the whole sequence is written down as unit tests below).
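Written down as unit tests, that sequence could look like this; NUnit is my choice of framework here, and the "true" answers for the one-element and the empty array are just one reasonable way to resolve the open questions above:

    using NUnit.Framework;

    [TestFixture]
    public class AllPairProductsEvenTests
    {
        [Test] // cycle 1: small, typical case
        public void TwoElements_OneOdd_ReturnsTrue()
        {
            Assert.IsTrue(PairProducts.AllPairProductsEven(new[] { 1, 2 }));
        }

        [Test] // cycle 2: two odd numbers
        public void TwoElements_BothOdd_ReturnsFalse()
        {
            Assert.IsFalse(PairProducts.AllPairProductsEven(new[] { 1, 3 }));
        }

        [Test] // cycle 3: "long test case" that forces the nested loops
        public void FiveElements_SeveralOddPairs_ReturnsFalse()
        {
            Assert.IsFalse(PairProducts.AllPairProductsEven(new[] { 1, 2, 3, 4, 5 }));
        }

        [Test] // cycle 4: single element - no pairs, so (we decide) vacuously true
        public void SingleElement_ReturnsTrue()
        {
            Assert.IsTrue(PairProducts.AllPairProductsEven(new[] { 3 }));
        }

        [Test] // cycle 5: empty array - again vacuously true
        public void EmptyArray_ReturnsTrue()
        {
            Assert.IsTrue(PairProducts.AllPairProductsEven(new int[0]));
        }

        [Test] // pure validation cycle: negative numbers and zero; -1*1 is odd
        public void NegativesAndZero_TwoOdds_ReturnsFalse()
        {
            Assert.IsFalse(PairProducts.AllPairProductsEven(new[] { -2, -1, 0, 1, 2 }));
        }

        [Test] // pure validation cycle: only even numbers
        public void NegativesAndZero_AllEven_ReturnsTrue()
        {
            Assert.IsTrue(PairProducts.AllPairProductsEven(new[] { -4, -2, 0, 2, 4 }));
        }
    }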
Give these test cases to a hard-boiled tester, and he'll at least come up with a few more boundary cases: What about numbers that are larger than the square root of the maximal integer (so that the products will overflow)? What about very long arrays that yield a long runtime? What about cases with equal numbers in the array?
Did we do something wrong in TDD when we overlooked these cases? My point of view: I find it ok that such cases are not covered in the TDD cycle, i.e. while creating the solution. "Testing down" a solution requires a "different hat" - the hat of a tester; and it can be done afterwards.

Now let us assume that someone looks at the problem for a second and then says: "This is easy: You just need to check whether there is more than one odd number in the array." Voila - the complete solution in one line of code (with C#'s IEnumerable<>).
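Spelled out, such a one-liner could look roughly like this (it replaces the brute-force body from the sketch above; the observation behind it is that a pair product is odd only if both factors are odd, so all pair products are even exactly when there is at most one odd number in the array):

    using System.Linq;

    static class PairProducts
    {
        // All pair products are even iff the array contains at most one odd number.
        public static bool AllPairProductsEven(int[] numbers)
        {
            return numbers.Count(n => n % 2 != 0) <= 1;
        }
    }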

What do you do now? My point of view: Great - we suddenly have a good solution candidate. So we now immediately put on the tester's hat, design a set of test cases that cover a selected input partitioning, and validate the solution. As programmers, we hope that we have been lucky and solved the problem; but as testers, we must think about how to break the implementation - so we will do all the typical things: boundary cases for input and output (black-box testing), as well as boundary cases for internal behavior (white-box testing). When all these tests are green, we are altogether happy.
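For example, a partitioning keyed to the internal boundary of that one-liner - the count of odd numbers: zero, exactly one, two - plus the obvious length boundaries of the input could be covered like this (again, the NUnit syntax and the names are my choice):

    using NUnit.Framework;

    [TestFixture]
    public class OddCountBoundaryTests
    {
        [TestCase(new int[] { 2, 4, 6 },  true)]   // no odd number at all
        [TestCase(new int[] { 2, 3, 4 },  true)]   // exactly one odd number (internal boundary)
        [TestCase(new int[] { 2, 3, 5 },  false)]  // two odd numbers (just over the boundary)
        [TestCase(new int[0],             true)]   // shortest possible input
        [TestCase(new int[] { 7 },        true)]   // one element, and an odd one
        public void CandidateSolution_SurvivesBoundaryCases(int[] numbers, bool expected)
        {
            Assert.AreEqual(expected, PairProducts.AllPairProductsEven(numbers));
        }
    }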

There's no real need to iterate, in a TDD sense, over the test cases: We can write all of them at once and then check the algorithm. Whether this is a good idea hinges on whether we believe the algorithm is correct anyway (then any intermediate test runs are wasted) or whether we distrust that "solution from heaven" (then intermediate test runs are fine, and we return to a sort of "degenerate TDD" where no test ever runs red if the solution actually is one).
Is this ok? My point of view: Of course. A solution that survives more tests might not be as much fun as "solving by bug fixing" - but it is simply the fastest way to go.

There's not much more that I can say. I have somehow circled this triplet of "incremental TDD", "visionary solutions" (that shortcut those increments), and "pure validation tests" (that are never red), and I now know what I consider a sensible approach: namely, to work with an "overall incremental (TDD) process", but to allow "creative problem solving" and "classical (covering) test case design" at any time.

So be it.

1 comment:

  1. You are talking about test case creation. Regression test *selection* is relevant when you have accumulated lots of test cases, and the test suite becomes too slow to run after each change to the code.
    Hagai Cibulski (from the testRank.pdf)
