Monday, May 2, 2011

The PickAkin Kata - testing

As I decided to "just invent a solution out of the blue" in the previous posting, we are now in classical software engineering: I'm of course obliged to test that solution (or rather, solution candidate). So let me do a quite formal example test case design - it's a good exercise for me, anyway.

The standard functional test concept is "equivalence class partitioning": It should help us to arrive at a finite set of test cases that somehow covers "sufficiently" the infinitely many possible inputs and outputs of our code. Interestingly, equivalence class partitioning is not at all obvious for our small problem! To tackle this nice testing problem, let's first clearly separate the "spaces" to be covered:
  • Black box testing requires us to find a partitioning for the input space.
  • Black box testing also requires us to find a partitioning for the output space.
  • Finally, white box testing requires us to find a partitioning for any internal space created.
The input space consists of two sets. However, the problem groups items in each set according to their "DLQ" (the first letter of the strings, in our integration test) - so it is more appropriate to model each set as a bag of prefixes, i.e., a mapping from prefixes to non-negative integers. A set, and hence also a mapping, is typically covered by the 0-1-3 heuristics, i.e., we take the empty set, a singleton set, and a set with three elements for testing. Applied to our mappings, the patterns for one input set are therefore

    { }
    { . -> . }
    { . -> ., . -> ., . -> . }

On the left side of the mappings, we can take arbitrary prefixes. On the right side, we have to cover non-negative integers - again, we use the 0-1-3 heuristics. However, a mapping to 0 just means that the element is not present in the mapping - so we drop it. What do we get from our three "mapping templates"?
  • The empty set just remains as it was:
    { } - case "0"
  • The single-element mapping can be expanded to two cases:

    { A -> 1 }  - case "1"
    { B -> 3 }  - case "3"
  • For the three-element mapping, we could create 2^3 = 8 versions. We arbitrarily restrict ourselves to "pairwise coverage", i.e., each combination of the two cases "1" and "3" must occur once. This requires just two mappings, e.g.

    { D -> 1, E -> 1, F -> 3 }  - case "113"
    { G -> 3, H -> 3, I -> 1 }  - case "331"

(Each "pair" created from the set { 1, 3 } occurs: {1,1} for D, E; {1,3} e.g. for E, F or for G, I; and {3,3} for G, H).
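As a quick sanity check, this pairwise claim can be verified mechanically. Here is a small Python sketch (my own scaffolding, not part of the kata) that collects all unordered pairs of multiplicities from the two mappings:

```python
from itertools import combinations

# Multiplicities of the two three-element mappings ("113" and "331")
mappings = [[1, 1, 3], [3, 3, 1]]

# Every unordered pair of multiplicities that occurs in either mapping
pairs = {tuple(sorted(p)) for m in mappings for p in combinations(m, 2)}

# All three unordered pairs over {1, 3} are covered
print(pairs == {(1, 1), (1, 3), (3, 3)})  # True
```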
Trivially multiplying these five cases for the two input sets would give us 25 cases for input coverage. However, we want to restrict this for two reasons:
  • First, 25 cases is too much for the budget we have (says me).
  • Second, this trivial multiplication does not consider the output coverage.
Therefore, let's now consider the output coverage! The result of Pick.Akin is three disjoint bags. Doing the same as on the input side, we end up with five cases for each mapping (or bag), which altogether results in 5*5*5 = 125 different cases - even more than the 25 on the input side! Again, we can try to mechanically apply "pairwise coverage" - but let's please make sure that a few "common sense" boundary cases are among the tests. I get lazy here and use a tool - PICT. The input model for PICT can be written as

    ceoListAfter:   e0, e1, e3, e113, e331
    akinList:       a0, a1, a2, a112, a221
    ctrlListAfter:  t0, t1, t3, t113, t331

We see the 5 possibilities for each bag, marked with a letter to distinguish the three bags (I varied the size "3" for the akinList to "2", just for variety). As "common sense" cases, we take set sizes of [0, 0, 0], [0, 1, 0], [1, 0, 1], and [3, 2, 3], which gives us a PICT seed file of

    ceoListAfter    akinList    ctrlListAfter
    e0    a0    t0
    e0    a1    t0
    e1    a0    t1
    e3    a2    t3

The PICT run yields a whopping 28 test cases:

    e0      a0      t0
    e0      a1      t0
    e1      a0      t1
    e3      a2      t3
    e331    a221    t113
    e0      a2      t331
    e0      a112    t113
    e113    a221    t0
    e331    a112    t331
    e113    a1      t3
    e0      a221    t3
    e1      a112    t3
    e1      a2      t0
    e113    a0      t331
    e1      a221    t331
    e3      a0      t113
    e1      a1      t113
    e3      a1      t331
    e3      a112    t1
    e0      a2      t1
    e3      a112    t0
    e3      a221    t1
    e113    a2      t113
    e331    a2      t3
    e331    a1      t1
    e113    a112    t1
    e331    a0      t0
    e0      a0      t3

What can we do? There are only two ways to go:
(a) We accept this as our workload (the 28 cases would at least harmonize quite well with the 25 input cases we wanted to cover!).
(b) We try to argue that there are no interferences between the results in the three output parameters ... but this is certainly not true: They are tightly coupled (e.g. by the requirement that the prefixes in them are disjoint!).
Mhm.

-----------------

The only idea I have to reduce our test workload is to replace the 0-1-3 heuristics for sets with a 0-3 heuristic: empty set - non-empty set. So let's start from scratch:

The input space per set consists of

    { }
    { . -> ., . -> ., . -> . }

For these two "mapping templates", we get the following cases:
  • For the empty case:

    { } - only case "0"   
  • For the three-element mapping, using "pairwise coverage" again, we get:

    { D -> 1, E -> 1, F -> 3 }  - case "113"
    { G -> 3, H -> 3, I -> 1 }  - case "331"
For the input side, fully multiplying these cases would give us 9 test cases.

For the output side, we would get 3*3*3 = 27 cases for full multiplication. Pairwise coverage with PICT can be done with the following model:

    ceoListAfter:   e0, e113, e331
    akinList:       a0, a112, a221
    ctrlListAfter:  t0, t113, t331

We keep the four common sense test cases from above - however, three of them will not be created by these mappings (e0/a1/t0, e1/a0/t1, e3/a2/t3). With the following seed file,

    ceoListAfter    akinList    ctrlListAfter
    e0    a0    t0

PICT gives us the following ten test cases for the 3 output lists ceoListAfter, akinList, and ctrlListAfter:

    e0      a0      t0
    e113    a221    t0
    e331    a0      t113
    e113    a112    t331
    e0      a112    t113
    e331    a221    t113
    e331    a0      t331
    e331    a112    t0
    e0      a221    t331
    e113    a0      t113

Some of these cases are quite superfluous: E.g., we don't need both e113/a112/t331 and e331/a221/t113. Moreover, we can rely on the symmetry of the code and therefore remove cases that are symmetrical in ceoListAfter and ctrlListAfter. That would leave us with the following cases:

    e0      a0      t0
    e113    a221    t0
    e331    a0      t113
    e113    a112    t331

plus the three common sense test cases

    e0    a1    t0
    e1    a0    t1
    e3    a2    t3

and the acceptance test. The whole undertaking gets a little boring now, so I declare that I'm happy with these combinations, and we construct the test data:

    e0      a0      t0:
This can only result from two empty input sets. Thus, input combination [{ }, { }] is covered.     
    e113    a221    t0:
We need three prefixes that end up in the ceoListAfter, and 3 more that end up in akinList. Here are possible mappings:
            ceoList  { A->1, B->1, C->3, D->1, E->1, F->1 }
            ctrlList                   { D->1, E->1, F->... }
Mhm: How can we have a single element for prefix F in the akinList? After all, the akinList must contain one instance from the ceoList and one from the ctrlList! So maybe there is a restriction that in the akinList, the number of elements per prefix is always even? ... or not: Assume that both lists contain the string "F1" - is our akinList assumed to contain "F1" twice, or only once? I take a shortcut: The code will return only one (as the call to Distinct() does not distinguish "left" and "right" elements), and so I say this is right. As an exercise, we could try to implement a version that will behave exactly the other way round - this seems not that easy!
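This Distinct() behavior is easy to demonstrate outside the C# solution. Here is a minimal Python sketch (the function name `akin` and the set comprehension are my own; the set plays the role of SelectMany plus Distinct, and `similar` compares first letters):

```python
def akin(ceo_list, ctrl_list, similar):
    # All elements of all similar pairs, flattened and de-duplicated -
    # a set cannot hold "F1" twice, just as Distinct() keeps only one
    return {x for a in ceo_list for b in ctrl_list if similar(a, b)
              for x in (a, b)}

same_prefix = lambda a, b: a[0] == b[0]

# "F1" appears in both input lists, but only once in the akin list
print(akin(["F1"], ["F1"], same_prefix))  # {'F1'}
```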
After this (typical) "specification detour," we can create matching input sets from the mappings:
            ceoList  { A1, B1, C1, C2, C3, D1, E1, F1 },
            ctrlList                     { D2, E2, F1 }
This covers input combination [not empty, not empty].      
    e331    a0      t113:
After that initial construction, it now gets easy:
    ceoList  { A1, A2, A3, B1, B2, B3, C1 }
    ctrlList                              { D1, E1, F1, F2, F3 }
Again, we have covered  input combination [not empty, not empty].
        e113    a112    t331:

          ceoList  { A1, B1, C1, C2, C3, D1, E1, F1 }
          ctrlList                     { D1, E1, F2, G1, G2, G3, H1, H2, H3, I1 }
From here on, we should modify the list order. E.g., in this test case we could reverse the lists. But we notice that we are not going to cover any more input combinations that way! So we add two more cases for input combinations [empty, not empty] and [not empty, empty]:      
    e0      a0      t22:
            ceoList  { }
            ctrlList   { D1, D2, E1, E2 }
and  
    e1      a0      t0:
            ceoList  { A1 }
            ctrlList      {  }
Finally, here are the common sense test cases:  
    e0    a1    t0
            ceoList  { A1 }
            ctrlList { A1 }
    e1    a0    t1
            ceoList  { A1 }
            ctrlList      { B1 }
    e3    a2    t3
            ceoList  { A1, A2, A3, B1, B2 }
            ctrlList             { B1, B2, C1, C2, C3 }
        or
            ceoList  { A1, A2, A3, B1 }
            ctrlList             { B2, C1, C2, C3 }

And this is it. Altogether, that testing was a little boring.

The PickAkin Kata - a solution

The PickAkin kata has a simple specification which I copy straight from Ilker's text:

Write a program that takes two lists (ceoList and controlList) containing an arbitrary set of product codes in the above described format. The program produces three lists. The first list is the akinList, that is, the list of all akin products taken from the input lists. The remaining two lists are the reduced ceoList and controlList, respectively.

Algorithmically inclined people (like me) "just" look for a solution along these lines. So I'll just give you a possible solution in a second.

But what about test-driven development? About incremental test-driven development? It took me a whole Sunday to find out why I do not have a bad conscience when I find a solution to a problem in a non-incremental way - I tried to explain this in my previous blog posting (which I might overhaul at some point - it is somewhat chaotic ...). The quintessence is:
Of course it is ok to find a solution non-incrementally. You are then only obliged to test that solution sufficiently.
This posting will only contain the solution. Testing it will be the topic of the next one.

Finding a solution for the PickAkin kata is simple: Just take, from both lists, all product pairs and filter out the "akin" ones (similar, in my API) - this is the akinList. This is "almost easy":

    akinList = from a in listA
               from b in listB      // we now have all pairs (a,b)
               where similar(a, b)  // we take only the similar ones
               select new[] {a, b}; // assemble them into one structure 

We could happily decree that the problem is now solved. But of course, the data structure we get back from this Linq expression is not exactly what we need: We now have an IEnumerable<IEnumerable<T>> instead of the flattened list we need; and moreover, there will be duplicates in that structure. But these are "mere technical problems" - other collection concepts that are based on sets (where there are never duplicates) and do flattening implicitly would not need any additional code. In C#, flattening for IEnumerables is done with SelectMany(); and duplicates are removed with Distinct(). So here is the complete, C#-y solution for computing akinList:

    akinList = (from a in listA
                from b in listB
                where similar(a, b)
                select new[] { a, b }).SelectMany(li => li).Distinct();
 
Of course, we still have to reduce the two input lists - but this is simple:

    listAReduced = listA.Except(akinList);
    listBReduced = listB.Except(akinList);
 
Done ... Here is the complete code:

    public static class Pick {
        /// <summary>
        /// The pick-akin function of Ilker's Pick Akin Kata.
        /// </summary>
        /// <typeparam name="T"></typeparam>
        /// <param name="listA">Input set A - order not important</param>
        /// <param name="listB">Input set B - order not important</param>
        /// <param name="similar">The function to designate similar elements</param>
        /// <param name="listAReduced">Output set A - order not important</param>
        /// <param name="listBReduced">Output set B - order not important</param>
        /// <param name="akinList">Akin list - order not important</param>
        public static void Akin<T>(
            IEnumerable<T> listA,
            IEnumerable<T> listB,
            Func<T, T, bool> similar,
            out IEnumerable<T> listAReduced,
            out IEnumerable<T> listBReduced,
            out IEnumerable<T> akinList) {
            akinList = (from a in listA
                        from b in listB
                        where similar(a, b)
                        select new[] { a, b }).SelectMany(li => li).Distinct();
            listAReduced = listA.Except(akinList);
            listBReduced = listB.Except(akinList);
        }
    }
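For readers who want to play with the algorithm without a C# environment, here is a rough Python translation of mine (returning a tuple instead of out parameters; this is not Ilker's reference code):

```python
def pick_akin(list_a, list_b, similar):
    # akinList: flatten all similar pairs and remove duplicates,
    # mirroring SelectMany(...).Distinct() in the C# version
    akin = [x for a in list_a for b in list_b if similar(a, b)
              for x in (a, b)]
    akin = list(dict.fromkeys(akin))  # dedupe, keeping first occurrence
    # reduce both input lists by the akin elements (Except in C#)
    a_reduced = [x for x in list_a if x not in akin]
    b_reduced = [x for x in list_b if x not in akin]
    return a_reduced, b_reduced, akin

# The "e1 a0 t1" common sense case: no similar elements at all
similar = lambda a, b: a[0] == b[0]
print(pick_akin(["A1"], ["B1"], similar))  # (['A1'], ['B1'], [])
```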
 

Of course, this code now has to be tested!

Sunday, May 1, 2011

How do we select test cases in TDD - and, how do we integrate "insights" into TDD?

I don't know. I just do it. Typically, in my environment, this is a "low conflict" activity: When I pair with someone, we agree quite quickly on the first few test cases. Soon, case selection gets muddled up with other considerations like algorithm problems, API problems etc. etc., so cases tend to be a discussion base for such aspects. And then, rather sooner than later, time runs out, and integration of the new behavior with existing features becomes the most important activity. The TDD is then more or less over. Later, bugs might force us to add additional test cases - but at this point, the selection of test cases is quite focused: First, the bugs themselves provide test cases; second, we also do classical (e.g. risk-based) test case selection to guard against errors.

But how do we select cases in that "pure construction phase" of test driven development? I googled around only a little bit and did not find anything that appealed to me ... The most straightforward message was at http://stackoverflow.com/questions/3922258/when-applying-tdd-what-heuristics-do-you-use-to-select-which-test-to-write-next, where two people answered:
  • The first one said essentially "I start by anchoring the most valuable behaviour of the code" and then "After that, I start adding the edge cases".
  • The second one says: "To begin with, I'll pick a simple typical case and write a test for it" and then "While I'm writing the implementation I'll pay attention to corner cases that my code handles incorrectly. Then I'll write tests for those cases."
Two scientific (or "scientific"?) links didn't make much sense:
  • "Metrics for Test Case Design in Test Driven Development" at http://www.ijcte.org/papers/269-G823.pdf is very bad English (p.853: "As we seen the V model says That someone should first test each unit") and, in my opinion, does not fulfil the promises made in the paper or in the title. So it goes in academia - "publish or perish."
  • "Regression Test Selection Techniques for Test-Driven Development" at http://www.math.tau.ac.il/~amiramy/testRank.pdf has the following goal: "Our goal is to automatically find, for a given small and localized code change, a small subset of tests, with a bounded cost of execution, that provides a high bug detection rate, close enough to that of full retesting" and "More formally, given a program P, a test suite T and a code change c, we define fail(T,P,c) as the subset of tests in T that fail on program P after change c." - also not what we do in TDD: There is no predefined test suite T.

Here is my attempt to somehow circle that topic of case selection in TDD. It will be a sort of rambling along ... maybe someone is interested in it.

As I see it, it is important to consider the "TDD cycle" (create red test, write code, show that test is green; interspersed with full test suite runs and refactorings) in the context of two activities:

(1) Creative problem solving.
(2) Classical "after-the-fact testing" - writing unit tests after coding is completed.

Why do I want to separate these from the "TDD proper"? Well, exactly because it seems to me that a few TDD aficionados assume implicitly or even require explicitly that these activities have to be part of the TDD cycle. A somewhat exaggerated formulation of that standpoint would be:

Regarding (1): You must not have a complete solution that you somehow arrived at in a "creative way": The solution must evolve gradually through the addition of more and more cases that define and constrain the behavior.

Regarding (2): There is no need to write additional test cases after delivering a completed solution that has been arrived at by TDD. The test cases done in TDD are sufficient.

I hope that everyone agrees that both statements would be nonsense. No process should restrict how we can be creative - nor how we ensure quality!

The picture that emerges for me can be roughly described as follows:
  • In a "waterfall world", two activities are ordered sequentially: First, there is construction; then, there is validation.
  • In a "TDD world", these two activities are dissolved into an iterative repetition of the TDD cycle. A "standard cycle" consists of both construction and validation.
But there are more possibilities in TDD:
  • Sometimes, we might include "pure validation cycles": A cycle that does not start with a new, initially red test case, but with an additional test case that checks the behavior implemented previously. Such a cycle often starts with someone saying: "Now, also that example we talked about earlier should work! Let's try it ...".
  • Symmetrically, it seems, there would also have to be "pure construction cycles." Yes, this seems strange: We never want to be in a state where there is some code that is not validated by supporting tests. But I do not want to outlaw "sparks of intuition:" In a TDD cycle, a participant should also be able to say "Ah - now I know how we can solve the problem - like this ...". What do we do with such an insight? I say that we do classical white-box testing: "Here is a proposed solution - now let's check whether it works!" And if it does, we are done, and happy about it! (There are arguments from inductive reasoning that such a case-transgressing insight is necessary to find a solution to generic problems; I tried to capture this in texts on the "paradoxes of TDD").
I believe (and have seen) that at such an "insight" point, there will be
  • some participants that are "cautious": "Are you sure this is a valid suggestion? Shouldn't we proceed in small steps?"; and 
  • some are "visionary": "Are we sure that a step-by-step process will lead to a good solution? Shouldn't we state some bold principles of the solution design?" - with both groups having good arguments on their sides, and therefore being in need of some sort of conflict resolution.
In general, it seems to me that a positive conflict resolution will happen along the following lines:
  • It is the responsibility of the "visionaries" to help to break down the vision into intelligible chunks (or "steps") so that the solution can be better understood and verified.
  • It is the responsibility of the "cautionarists" to help to arrive at a solution as quickly as possible. Therefore, they see to it that the steps taken go into the direction of that proposed solution, if that solution appears sensible.
(Of course, this assumes that no-one has some hidden agenda: I have seen a few cases of pairing TDD where one partner wanted to force a solution; and did so by selecting special test cases that would steer the code to be written into a certain direction - e.g. a table-driven solution. The "hill-climbing process" of TDD unfortunately plays into the hands of such undercover tactics, as one can hide decisions for some time by selecting trivial test cases; but at some point, much code has already been written that then turns out to favor one design over another. Solving software problems should not be chess playing - one winner, one loser. Therefore, I whole-heartedly agree with the statements I cited at the beginning: "I start by anchoring the most valuable behaviour of the code" and "I'll pick a simple typical case and write a test for it").

(A similar "hill-climbing question" is behind refactoring, and is asked very often: Is it possible to get from one design [that is assumed to be inferior] to another design [a desirable and desired target design] by low-level refactoring steps? I do not know the answer, and have not seen any convincing one. On the whole, I remain skeptical e.g. for cases where you want to replace a compiler's single-pass design with a triple-pass design with incrementally loaded parse trees; or where you want to replace a simple-induction-based algorithm with a divide-and-conquer version; etc. Therefore, I find it acceptable, and sometimes even desirable, to replace a design with a different one "all in one", "in a single big rewrite.")

(Back to TDD) Here is an example where one can try out one's preferences:

We are asked to write a function that
  • gets as input an array of different integers;
  • returns true if all products of two different numbers from the array are even.
So, e.g., if the array contains 1, 2, 3, we get false - because the product 1*3 is 3 which is not even. However, if the array contains 2, 3, 4, we get true: All pair products 2*3, 2*4, and 3*4 are even.
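That specification translates directly into a brute-force check over all pairs. Here is a Python sketch (the function name is my own):

```python
from itertools import combinations

def all_pair_products_even(numbers):
    # True iff every product of two different array entries is even
    return all((a * b) % 2 == 0 for a, b in combinations(numbers, 2))

print(all_pair_products_even([1, 2, 3]))  # False (1*3 is odd)
print(all_pair_products_even([2, 3, 4]))  # True
```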

One approach could be:
  • We start with a "small, typical case" - e.g. { 1, 2 } which we expect to return true. Our algorithm, at this point, checks all the pairs (there is only one in this case, namely <1,2>) and returns the appropriate result.
  • The next case could be { 1, 3 }, to return false. We update the code (do we already need loops for 2-element arrays? - not yet really ...) and run the test.
  • We feel that we need those nested loops at some point, so we continue with a "long test case", e.g. { 1, 2, 3, 4, 5 }.
  • On the "opposite end", we could now continue with a single-element array { 3 } (what should the result be?), and ...
  • ... an empty array (what should the result be now??). 
  • We (might) feel that we are done now with development - but "just to be sure" (pure validation cycle!), we add another example with 5 entries that also include negative numbers and zero, e.g. { -2, -1, 0, 1, 2 } - which returns false, as -1*1 is odd. 
  • To balance this, we finish with another pure validation cycle, where we expect { -4, -2, 0, 2, 4 } to return true - and now we are really done.
Give these test cases to a hard-boiled tester, and he'll at least come up with a few more boundary cases: What about numbers that are larger than the square root of the maximal integer (so that the products will overflow)? What about very long arrays that yield a long runtime? What about cases with equal numbers in the array?
Did we do something wrong in TDD when we overlooked these cases? My point of view: I find it ok that such cases are not covered in the TDD cycle, i.e. while creating the solution. "Testing down" a solution requires a "different hat" - the hat of a tester; and it can be done afterwards.

Now let us assume that someone looks at the problem for a second and then says: "This is easy: You just need to check whether there is more than one odd number in the array." Voilà - the complete solution in one line of code (with C#'s IEnumerable<>).
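That one-line insight can be sketched like this in Python (the function name is my own; the criterion is "at most one odd entry", since two odd entries would yield one odd pair product):

```python
def all_pair_products_even(numbers):
    # All pair products are even iff at most one entry is odd
    return sum(n % 2 != 0 for n in numbers) <= 1

print(all_pair_products_even([1, 2, 3]))          # False
print(all_pair_products_even([2, 3, 4]))          # True
print(all_pair_products_even([-2, -1, 0, 1, 2]))  # False (-1*1 is odd)
print(all_pair_products_even([]))                 # True (vacuously)
```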

What do you do now? My point of view: Great - we suddenly have a good solution candidate. So, we now immediately put on the tester's hat, design a set of test cases that cover a selected input partitioning, and validate the solution. As programmers, we expect to be lucky in that we solved the problem; but as testers, we must think about how to break the implementation - so we will do all the typical things: Boundary cases for input and output (black box testing), as well as boundary cases for internal behavior (white box testing). When all these tests are green, we are altogether happy.

There's no real need to iterate, in a TDD sense, over the test cases: We can write all of them at once and then check the algorithm. Whether this is a good idea apparently hinges on the subtle question of whether we think that the algorithm is correct anyway - then any intermediate test runs are wasted - or we distrust that "solution from heaven" - then intermediate test runs are fine, and we return to a sort of "degenerated TDD" (where no test ever runs red if the solution actually is one).
Is this ok? My point of view: Of course. A solution that survives more tests might not be as much fun as "solving by bug fixing" - but it is simply the fastest way to go.

There's not much more that I can say. I somehow circled this triplet of "incremental TDD", "visionary solutions" (that shortcut those increments), and "pure validation tests" (that are never red), and know now what I consider a sensible approach: Namely to work with an "overall incremental (TDD) process", but at any time allow "creative problem solving" and "classical (covering) test case design".

So be it.