Monday, May 2, 2011

The PickAkin Kata - testing

As I decided to "just invent a solution out of the blue" in the previous posting, we are now in classical software engineering: I'm of course obliged to test that solution (or rather, solution candidate). So let me do a quite formal example test case design - it's a good exercise for me, anyway.

The standard functional test concept is "equivalence class partitioning": It should help us to arrive at a finite set of test cases that somehow covers "sufficiently" the infinitely many possible inputs and outputs of our code. Interestingly, equivalence class partitioning is not at all obvious for our small problem! To tackle this nice testing problem, let's first clearly separate the "spaces" to be covered:
  • Black box testing requires us to find a partitioning for the input space.
  • Black box testing also requires us to find a partitioning for the output space.
  • Finally, white box testing requires us to find a partitioning for any internal space created.
The input space consists of two sets. However, the problem groups items in each set according to their "DLQ" (the first letter of the strings, in our integration test) - so it is more appropriate to model each set as a bag of prefixes, i.e., a mapping from prefixes to non-negative integers. A set, and hence also a mapping, is typically covered by the 0-1-3 heuristics, i.e., we take the empty set, a singleton set, and a set with three elements for testing. Applied to our mappings, the patterns for one input set are therefore

    { }
    { . -> . }
    { . -> ., . -> ., . -> . }

On the left side of the mappings, we can take arbitrary prefixes. On the right side, we have to cover non-negative integers - again, we use the 0-1-3 heuristics. However, a mapping to 0 just means that the element is not present in the mapping - so we drop it. What do we get from our three "mapping templates"?
  • The empty set just remains as it was:
    { } - case "0"
  • The single-element mapping can be expanded to two cases:

    { A -> 1 }  - case "1"
    { B -> 3 }  - case "3"
  • For the three-element mapping, we could create 2^3 = 8 versions. We arbitrarily restrict ourselves to "pairwise coverage", i.e., each combination of the two cases "1" and "3" must occur once. This requires just two mappings, e.g.

    { D -> 1, E -> 1, F -> 3 }  - case "113"
    { G -> 3, H -> 3, I -> 1 }  - case "331"

(Each "pair" created from the set { 1, 3 } occurs: {1,1} for D, E; {1,3} e.g. for E, F or for G, I; and {3,3} for G, H).
Trivially multiplying these five cases for the two input sets would give us 25 cases for input coverage. However, we want to restrict this for two reasons:
  • First, 25 cases is too much for the budget we have (says me).
  • Second, this trivial multiplication does not consider the output coverage.
Therefore, let's now consider the output coverage! The result of Pick.Akin is three disjoint bags. Doing the same as on the input side, we end up with five cases for each mapping (or bag), which altogether results in 5*5*5 = 125 different cases - even more than the 25 on the input side! Again, we can try to mechanically apply "pairwise coverage" - but let's please make sure that a few "common sense" boundary cases are among the tests. I get lazy here and use a tool - PICT. The input model for PICT can be written as

    ceoListAfter:   e0, e1, e3, e113, e331
    akinList:       a0, a1, a2, a112, a221
    ctrlListAfter:  t0, t1, t3, t113, t331

We see the 5 possibilities for each bag, marked with a letter to distinguish the three bags (I varied the size "3" for the akinList to "2", just for variety). As "common sense" cases, we take set sizes of [0, 0, 0], [0, 1, 0], [1, 0, 1], and [3, 2, 3], which gives us a PICT seed file of

    ceoListAfter    akinList    ctrlListAfter
    e0    a0    t0
    e0    a1    t0
    e1    a0    t1
    e3    a2    t3

The PICT run yields a whopping 28 test cases:

    e0      a0      t0
    e0      a1      t0
    e1      a0      t1
    e3      a2      t3
    e331    a221    t113
    e0      a2      t331
    e0      a112    t113
    e113    a221    t0
    e331    a112    t331
    e113    a1      t3
    e0      a221    t3
    e1      a112    t3
    e1      a2      t0
    e113    a0      t331
    e1      a221    t331
    e3      a0      t113
    e1      a1      t113
    e3      a1      t331
    e3      a112    t1
    e0      a2      t1
    e3      a112    t0
    e3      a221    t1
    e113    a2      t113
    e331    a2      t3
    e331    a1      t1
    e113    a112    t1
    e331    a0      t0
    e0      a0      t3

What can we do? There are only two ways to go:
(a) We accept this as our work load (the 28 cases would at least harmonize quite well with the 25 input cases we wanted to cover!).
(b) We try to argue that there are no interferences between the results in the three output parameters ... but this is certainly not true: They are tightly coupled (e.g. by the requirement that the prefixes in them are disjoint!).
Mhm.

-----------------

The only idea I have to reduce our test work load is to replace the 0-1-3 heuristics for sets with a 0-3 heuristic: empty set - non-empty set. So let's start from scratch:

The input space per set consists of

    { }
    { . -> ., . -> ., . -> . }

For these two "mapping templates", we get the following cases:
  • For the empty case:

    { } - only case "0"   
  • For the three-element mapping, using "pairwise coverage" again, we get:

    { D -> 1, E -> 1, F -> 3 }  - case "113"
    { G -> 3, H -> 3, I -> 1 }  - case "331"
For the input side, fully multiplying these cases would give us 9 test cases.

For the output side, we would get 3*3*3 = 27 cases for full multiplication. Pairwise coverage with PICT can be done with the following model:

    ceoListAfter:   e0, e113, e331
    akinList:       a0, a112, a221
    ctrlListAfter:  t0, t113, t331

We keep the four common sense test cases from above - however, three of them will not be created by these mappings (e0/a1/t0, e1/a0/t1, e3/a2/t3). With the following seed file,

    ceoListAfter    akinList    ctrlListAfter
    e0    a0    t0

PICT gives us the following ten test cases for the 3 output lists ceoListAfter, akinList, and ctrlListAfter:

    e0      a0      t0
    e113    a221    t0
    e331    a0      t113
    e113    a112    t331
    e0      a112    t113
    e331    a221    t113
    e331    a0      t331
    e331    a112    t0
    e0      a221    t331
    e113    a0      t113

Some of these cases are quite superfluous: E.g., we don't need both e113/a112/t331 and e331/a221/t113. Moreover, we can rely on the symmetry of the code and therefore remove cases that are symmetrical in ceoListAfter and ctrlListAfter. That would leave us with the following cases:

    e0      a0      t0
    e113    a221    t0
    e331    a0      t113
    e113    a112    t331

plus the three common sense test cases

    e0    a1    t0
    e1    a0    t1
    e3    a2    t3

and the acceptance test. The whole undertaking gets a little boring now, so I declare that I'm happy with these combinations, and we construct the test data:

    e0      a0      t0:
This can only result from two empty input sets. Thus, input combination [{ }, { }] is covered.     
    e113    a221    t0:
We need three prefixes that end up in the ceoListAfter, and 3 more that end up in akinList. Here are possible mappings:
            ceoList  { A->1, B->1, C->3, D->1, E->1, F->1 }
            ctrlList                   { D->1, E->1, F->... }
Mhm: How can we have a single element for prefix F in the akinList? After all, the akinList must contain one instance from the ceoList and one from the ctrlList! So maybe there is a restriction that in the akinList, the number of elements per prefix is always even? ... or not: Assume that both lists contain the string "F1" - is our akinList assumed to contain "F1" twice, or only once? I take a shortcut: The code will return only one (as the call to Distinct() does not distinguish "left" and "right" elements), and so I say this is right. As an exercise, we could try to implement a version that will behave exactly the other way round - this seems not that easy!
After this (typical) "specification detour," we can create matching input sets from the mappings:
            ceoList  { A1, B1, C1, C2, C3, D1, E1, F1 },
            ctrlList                     { D2, E2, F1 }
This covers input combination [not empty, not empty].      
    e331    a0      t113:
After that initial construction, it now gets easy:
    ceoList  { A1, A2, A3, B1, B2, B3, C1 }
    ctrlList                              { D1, E1, F1, F2, F3 }
Again, we have covered  input combination [not empty, not empty].
    e113    a112    t331:

          ceoList  { A1, B1, C1, C2, C3, D1, E1, F1 }
          ctrlList                     { D1, E1, F2, G1, G2, G3, H1, H2, H3, I1 }
From here on, we should modify the list order. E.g., in this test case we could reverse the lists. But we notice that we are not going to cover any more input combinations that way! So we add two more cases for input combinations [empty, not empty] and [not empty, empty]:      
    e0      a0      t22:
            ceoList  { }
            ctrlList   { D1, D2, E1, E2 }
and  
    e1      a0      t0:
            ceoList  { A1 }
            ctrlList      {  }
Finally, here are the common sense test cases:  
    e0    a1    t0
            ceoList  { A1 }
            ctrlList { A1 }
    e1    a0    t1
            ceoList  { A1 }
            ctrlList      { B1 }
    e3    a2    t3
            ceoList  { A1, A2, A3, B1, B2 }
            ctrlList             { B1, B2, C1, C2, C3 }
        or
            ceoList  { A1, A2, A3, B1 }
            ctrlList             { B2, C1, C2, C3 }
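To make one of these constructed cases concrete as code, here is how the e331/a0/t113 case could look as an NUnit test (a sketch that assumes the Pick.Akin API from the solution posting; the test name is my own):

    [Test]
    public void Case_e331_a0_t113() {
        IEnumerable<string> ceoList = new[] { "A1", "A2", "A3", "B1", "B2", "B3", "C1" };
        IEnumerable<string> ctrlList = new[] { "D1", "E1", "F1", "F2", "F3" };
        IEnumerable<string> ceoListAfter;
        IEnumerable<string> ctrlListAfter;
        IEnumerable<string> akinList;
        Pick.Akin(ceoList, ctrlList, (a, b) => a[0] == b[0],
            out ceoListAfter, out ctrlListAfter, out akinList);
        // no prefix occurs in both lists, so nothing is akin and both lists stay as they are
        CollectionAssert.IsEmpty(akinList.ToList());
        CollectionAssert.AreEquivalent(ceoList, ceoListAfter.ToList());
        CollectionAssert.AreEquivalent(ctrlList, ctrlListAfter.ToList());
    }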

And this is it. All in all, this testing was a little boring.

The PickAkin Kata - a solution

The PickAkin kata has a simple specification which I copy straight from Ilker's text:

Write a program that takes two lists (ceoList and controlList) containing an arbitrary set of product codes in the above described format. The program produces three lists. The first list is the akinList, that is, the list of all akin products taken from the input lists. The remaining two lists are the reduced ceoList and controlList, respectively.

Algorithmically inclined people (like me) "just" look for a solution along these lines. So I'll just give you a possible solution in a second.

But what about test-driven development? About incremental test-driven development? It took me a whole Sunday to find out why I do not have a bad conscience when I find a solution to a problem in a non-incremental way - I tried to explain this in my previous blog posting (which I might overhaul at some point - it is somewhat chaotic ...). The quintessence is:
Of course it is ok to find a solution non-incrementally. You are then only obliged to test that solution sufficiently.
This posting will only contain the solution. Testing it will be the topic of the next one.

Finding a solution for the PickAkin kata is simple: Just take all product pairs from both lists and keep only the "akin" ones (similar, in my API) - this is the akinList. This is "almost easy":

    akinList = from a in listA
               from b in listB      // we now have all pairs (a,b)
               where similar(a, b)  // we take only the similar ones
               select new[] {a, b}; // assemble them into one structure 

We could happily decree that the problem is now solved. But of course, the data structure we get back from this Linq expression is not exactly what we need: We now have an IEnumerable<IEnumerable<T>> instead of the flattened list we need; and moreover, there will be duplicates in that structure. But these are "mere technical problems" - other collection concepts that are based on sets (where there are never duplicates) and do flattening implicitly would not need any additional code. In C#, flattening for IEnumerables is done with SelectMany(); and duplicates are removed with Distinct(). So here is the complete, C#-y solution for computing akinList:

    akinList = (from a in listA
                from b in listB
                where similar(a, b)
                select new[] { a, b }).SelectMany(li => li).Distinct();
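To see what SelectMany() and Distinct() contribute here, a tiny sketch with made-up data (my own example, not part of the kata):

    var listA = new[] { "A1", "A2", "B1" };
    var listB = new[] { "A3", "C1" };
    Func<string, string, bool> similar = (a, b) => a[0] == b[0];

    var pairs = from a in listA
                from b in listB
                where similar(a, b)
                select new[] { a, b };
    // pairs is { {"A1","A3"}, {"A2","A3"} } - nested, and "A3" occurs twice

    var akinList = pairs
        .SelectMany(li => li)   // flatten: "A1", "A3", "A2", "A3"
        .Distinct();            // remove duplicates: "A1", "A3", "A2"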
 
Of course, we still have to reduce the two input lists - but this is simple:

    listAReduced = listA.Except(akinList);
    listBReduced = listB.Except(akinList);
 
Done ... Here is the complete code:

    public static class Pick {
        /// <summary>
        /// The pick-akin function of Ilker's Pick Akin Kata.
        /// </summary>
        /// <typeparam name="T"></typeparam>
        /// <param name="listA">Input set A - order not important</param>
        /// <param name="listB">Input set B - order not important</param>
        /// <param name="similar">The function to designate similar elements</param>
        /// <param name="listAReduced">Output set A - order not important</param>
        /// <param name="listBReduced">Output set B - order not important</param>
        /// <param name="akinList">Akin list - order not important</param>
        public static void Akin<T>(
            IEnumerable<T> listA,
            IEnumerable<T> listB,
            Func<T, T, bool> similar,
            out IEnumerable<T> listAReduced,
            out IEnumerable<T> listBReduced,
            out IEnumerable<T> akinList) {
            akinList = (from a in listA
                        from b in listB
                        where similar(a, b)
                        select new[] { a, b }).SelectMany(li => li).Distinct();
            listAReduced = listA.Except(akinList);
            listBReduced = listB.Except(akinList);
        }
    }
 

Of course, this code now has to be tested!

Sunday, May 1, 2011

How do we select test cases in TDD - and, how do we integrate "insights" into TDD?

I don't know. I just do it. Typically, in my environment, this is a "low conflict" activity: When I pair with someone, we agree quite quickly on the first few test cases. Soon, case selection gets muddled up with other considerations like algorithm problems, API problems etc. etc., so cases tend to be a basis for discussing such aspects. And then, rather sooner than later, time runs out, and integration of the new behavior with existing features becomes the most important activity. The TDD is then more or less over. Later, bugs might force us to add additional test cases - but at this point, the selection of test cases is quite focused: First, the bugs themselves provide test cases; second, we also do classical (e.g. risk-based) test case selection to guard against errors.

But how do we select cases in that "pure construction phase" of test driven development? I googled around only a little bit, and so I did not find anything that appealed to me ... The most straightforward message was at http://stackoverflow.com/questions/3922258/when-applying-tdd-what-heuristics-do-you-use-to-select-which-test-to-write-next, where two people answered:
  • The first one said essentially "I start by anchoring the most valuable behaviour of the code" and then "After that, I start adding the edge cases".
  • The other said: "To begin with, I'll pick a simple typical case and write a test for it" and then "While I'm writing the implementation I'll pay attention to corner cases that my code handles incorrectly. Then I'll write tests for those cases."
Two scientific (or "scientific"?) links didn't make much sense:
  • "Metrics for Test Case Design in Test Driven Development" at http://www.ijcte.org/papers/269-G823.pdf is very bad English (p.853: "As we seen the V model says That someone should first test each unit") and, in my opinion, does not fulfil promises made in the paper nor in the title. So it goes in academia - "publish or perish."
  • "Regression Test Selection Techniques for Test-Driven Development" at http://www.math.tau.ac.il/~amiramy/testRank.pdf has the following goal: "Our goal is to automatically find, for a given small and localized code change, a small subset of tests, with a bounded cost of execution, that provides a high bug detection rate, close enough to that of full retesting" and "More formally, given a program P, a test suite T and a code change c, we define fail(T,P,c) as the subset of tests in T that fail on program P after change c." - also not what we do in TDD: There is no predefined test suite T.

Here is my attempt to somehow circle that topic of case selection in TDD. It will be a sort of rambling along ... maybe someone is interested in it.

As I see it, it is important to consider the "TDD cycle" (create red test, write code, show that test is green; interspersed with full test suite runs and refactorings) in the context of two activities:

(1) Creative problem solving.
(2) Classical "after-the-fact testing" - writing unit tests after coding is completed.

Why do I want to separate these from the "TDD proper"? Well, exactly because it seems to me that a few TDD aficionados assume implicitly or even require explicitly that these activities have to be part of the TDD cycle. A somewhat exaggerated formulation of that standpoint would be:

Regarding (1): You must not have a complete solution that you somehow arrived at in a "creative way": The solution must evolve gradually through the addition of more and more cases that define and constrain the behavior.

Regarding (2): There is no need to write additional test cases after delivering a completed solution that has been arrived at by TDD. The test cases done in TDD are sufficient.

I hope that everyone agrees that both statements would be nonsense. No process should restrict how we can be creative - or how we ensure quality!

The picture that emerges for me can be roughly described as follows:
  • In a "waterfall world", two activities are ordered sequentially: First, there is construction; then, there is validation.
  • In a "TDD world", these two activities are dissolved into an iterative repetition of the TDD cycle. A "standard cycle" contains of both construction and validation.
But there are more possibilities in TDD:
  • Sometimes, we might include "pure validation cycles": A cycle that does not start with a new, initially red test case, but with an additional test case that checks the behavior implemented previously. Such a cycle often starts with someone saying: "Now, also that example we talked about earlier should work! Let's try it ...".
  • Symmetrically, it seems, there would also have to be "pure construction cycles." Yes, this seems strange: We never want to be in a state where there is some code that is not validated by supporting tests. But I do not want to outlaw "sparks of intuition:" In a TDD cycle, a participant should also be able to say "Ah - now I know how we can solve the problem - like this ...". What do we do with such an insight? I say that we do classical white-box testing: "Here is a proposed solution - now let's check whether it works!" And if it does, we are done, and happy about it! (There are arguments from inductive reasoning that such a case-transgressing insight is necessary to find a solution to generic problems; I tried to capture this in texts on the "paradoxes of TDD").
I believe (and have seen) that at such an "insight" point, there will be
  • some participants that are "cautious": "Are you sure this is a valid suggestion? Shouldn't we proceed in small steps?"; and 
  • some are "visionary": "Are we sure that a step-by-step process will lead to a good solution? Shouldn't we state some bold principles of the solution design?" - with both groups having good arguments on their sides, and therefore being in need of some sort of conflict resolution.
In general, it seems to me that a positive conflict resolution will happen along the following lines:
  • It is the responsibility of the "visionaries" to help to break down the vision into intelligible chunks (or "steps") so that the solution can be better understood and verified.
  • It is the responsibility of the "cautionarists" to help to arrive at a solution as quickly as possible. Therefore, they see to it that the steps taken go into the direction of that proposed solution, if that solution appears sensible.
(Of course, this assumes that no-one has some hidden agenda: I have seen a few cases of pairing TDD where one partner wanted to force a solution; and did so by selecting special test cases that would steer the code to be written in a certain direction - e.g. a table-driven solution. The "hill-climbing process" of TDD unfortunately plays into the hands of such undercover tactics, as one can hide decisions for some time by selecting trivial test cases; but at some point, much code has already been written that then turns out to favor one design over another. Solving software problems should not be chess playing - one winner, one loser. Therefore, I whole-heartedly agree with the statements I cited at the beginning: "I start by anchoring the most valuable behaviour of the code" and "I'll pick a simple typical case and write a test for it").

(A similar "hill-climbing question" is behind refactoring, and is asked very often: Is it possible to get from one design [that is assumed to be inferior] to another design [a desirable and desired target design] by low-level refactoring steps? I do not know the answer, and have not seen any convincing one. On the whole, I remain skeptical e.g. for cases where you want to replace a compiler's single pass design with a triple-pass design with incrementally loaded parse trees; or where you want to replace an simple-induction-based algorithm with a divide-and-conquer version; etc. Therefore, I find it acceptable, and sometimes even desirable, to replace a design with a different one "all in one", "in a single big rewrite.")

(Back to TDD) Here is an example where one can try out one's preferences:

We are asked to write a function that
  • gets as input an array of different integers;
  • returns true if all products of two different numbers from the array are even.
So, e.g., if the array contains 1, 2, 3, we get false - because the product 1*3 is 3 which is not even. However, if the array contains 2, 3, 4, we get true: All pair products 2*3, 2*4, and 3*4 are even.

One approach could be:
  • We start with a "small, typical case" - e.g. { 1, 2 } which we expect to return true. Our algorithm, at this point, checks all the pairs (there is only one in this case, namely <1,2>) and returns the appropriate result.
  • The next case could be { 1, 3 }, to return false. We update the code (do we already need loops for 2-element arrays? - not yet really ...) and run the test.
  • We feel that we need those nested loops at some point, so we continue with a "long test case", e.g. { 1, 2, 3, 4, 5 }.
  • On the "opposite end", we could now continue with an single-element array { 3 } (what should the result be?), and ...
  • ... an empty array (what should the result be now??). 
  • We (might) feel that we are done now with development - but "just to be sure" (pure validation cycle!), we add another example with 5 entries that also include negative numbers and zero, e.g. { -2, -1, 0, 1, 2 } - which returns false, as -1*1 is odd. 
  • To balance this, we finish with another pure validation cycle, where we expect { -4, -2, 0, 2, 4 } to return true - and now we are really done.
Give these test cases to a hard-boiled tester, and he'll at least come up with a few more boundary cases: What about numbers that are larger than the square root of the maximal integer (so that the products will overflow)? What about very long arrays that yield a long runtime? What about cases with equal numbers in the array?
Did we do something wrong in TDD when we overlooked these cases? My point of view: I find it ok that such cases are not covered in the TDD cycle, i.e. while creating the solution. "Testing down" a solution requires a "different hat" - the hat of a tester; and it can be done afterwards.

Now let us assume that someone looks at the problem for a second and then says: "This is easy: You just need to check whether there is more than one odd number in the array." Voila - the complete solution in one line of code (with C#'s IEnumerable<>).
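As a hedged sketch of that one-liner (the method name and signature are my own):

    // All pair products are even iff at most one element is odd,
    // because a product is odd only if both factors are odd.
    public static bool AllPairProductsEven(IEnumerable<int> numbers) {
        return numbers.Count(n => n % 2 != 0) <= 1;
    }

For the examples above, { 1, 2, 3 } has two odd numbers and yields false, { 2, 3, 4 } has one and yields true; the empty and the single-element array come out as true, which is one possible answer to the questions raised above.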

What do you do now? My point of view: Great - we do suddenly have a good solution candidate. So, we now immediately put on the tester's hat, design a set of test cases that cover a selected input partitioning, and validate the solution. As programmers, we expect to be lucky in that we solved the problem; but as testers, we must think about how to break the implementation - so we will do all the typical things: Boundary cases for  input and output (black box testing), as well as boundary cases for internal behavior (white box testing). When all these tests are green, we are altogether happy.

There's no real need to iterate, in a TDD sense, over the test cases: We can write all of them at once and then check the algorithm. Whether this is a good idea apparently hinges on the subtle question of whether we think that the algorithm is correct anyway - then, any intermediate test runs are wasted - or we distrust that "solution from heaven" - then, intermediate test runs are fine, and we return to a sort of "degenerated TDD" (where no test ever runs red if the solution actually is one).
Is this ok? My point of view: Of course. A solution that survives more tests might not be as much fun as "solving by bug fixing" - but it is simply the fastest way to go.

There's not much more that I can say. I somehow circled this triplet of "incremental TDD", "visionary solutions" (that shortcut those increments), and "pure validation tests" (that are never red), and know now what I consider a sensible approach: Namely to work with an "overall incremental (TDD) process", but at any time allow "creative problem solving" and "classical (covering) test case design".

So be it.

Saturday, April 30, 2011

Paradoxes of Test Driven Development cont'd.

3. The dimensionality paradox

After a few days, I'm no longer that happy with this name ... But let's look at a high-level example, and maybe find a better name later.

Assume that you have to write a regular expression recognizer; it only needs to output "match" or "no match" for a regular expression and one or possibly many input strings. You start with simple examples, like a or ab, then a?, ab?, continue maybe with [ab]* and (ab)* etc. The first few examples can be handled with a simple parser and a simple interpreter, but more and more complex regexes will make these machines more and more complex - and less and less performant. At some point, the iteratively developed design will no longer hold up to the combined requirements of efficiency and maintainability ...

If you take a look at modern regex recognizers, they contain a key design component: The regexp is translated into a "program" for a special "processor", which then does the input traversal. The design of this processor is influenced (a) by the problems of regexp recognition (e.g., the necessities of fast loop traversal; and backtracking in the case of non-greedy operators and back-references), but also (b) by the runtime processor of the regexp library - virtual machines in the case of Java and .Net, but maybe also real silicon processors.

This powerful, but also complicated in-between concept (between the regexp language with its semantics; and the recognition result "match" / "no match") is something you have to invent. It cannot be derived from the "mere" input-output behavior of the problem at hand. Rather, you need to step back quite far from the problem to see that this design is a possibility that will solve the problem better than more straightforward designs.

Now, what I said is, in a sense, unfair: Test driven design does not prevent you from gaining more and more understanding during the design and implementation process; and one possibility is that at one point, you rip out much of the previous design and replace it with a new one that now handles all the design aspects better. This is a valid test-driven way of proceeding, but it is certainly no longer that "incremental".

A similar, but certainly much more "theoretical" example for such an important "intermediate invention" is the heap sort algorithm: where the data are first put into that strange, not at all sorted heap structure - and can then be pulled out sorted in an efficient way.
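For illustration only, here is a compact heap sort sketch (my own code): phase one builds that "strange, not at all sorted" heap in place, phase two pulls the elements out in sorted order.

    public static void HeapSort(int[] a) {
        // Phase 1: build a max-heap in place.
        for (int i = a.Length / 2 - 1; i >= 0; i--) {
            SiftDown(a, i, a.Length);
        }
        // Phase 2: repeatedly move the current maximum to the end and re-heapify the rest.
        for (int end = a.Length - 1; end > 0; end--) {
            int tmp = a[0]; a[0] = a[end]; a[end] = tmp;
            SiftDown(a, 0, end);
        }
    }

    private static void SiftDown(int[] a, int root, int count) {
        while (true) {
            int child = 2 * root + 1;                                  // left child
            if (child >= count) return;                                // no children - done
            if (child + 1 < count && a[child + 1] > a[child]) child++; // take the larger child
            if (a[root] >= a[child]) return;                           // heap property holds
            int tmp = a[root]; a[root] = a[child]; a[child] = tmp;     // swap down
            root = child;
        }
    }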

------

In a sense, all three paradoxes are special cases of the problem of induction: On what basis can we generalize from a few special cases? However, in another sense, I cheated: These are not paradoxes at all! Why? - because engineering is a discipline where we decide how we want something to be designed. Thus, we are not in the situation that we must detect the generalized rules (as e.g. natural sciences have to do), but we design the generalized rules from a generalized understanding. Yet, for all the good that TDD has done, I think we should not forget this:

Also under TDD, it is admissible - and, even more, desirable - that we create designs that transcend simple case-by-case solutions to problems that we present to ourselves (or that are presented to us) as a "mere set of cases."

Tuesday, April 26, 2011

Paradoxes of Test Driven Development

It is, fortunately, common nowadays to construct code by iterative test-driven design (TDD):
  1. Declare that the code under construction is assumed to have a new behavioral property it did not have up to now.
  2. Write a piece of test code that checks for this behavior.
  3. Run the test to prove that the behavior is still missing ("make it red").
  4. Continue construction so that the behavior emerges.
  5. Run the test again to prove that the behavior is now established ("make it green").
  6. Run all previous tests to prove that the construction did not remove a previously established behavior ("make all green").
Having said this, it is still worthwhile to point out the three paradoxes of iterative TDD:
  • The infinity paradox: A limited number of concrete test cases can never define a behavior sufficiently.
  • The hill-climbing paradox: Local iterations are not guaranteed to reach a globally agreeable (or sufficient) solution.
  • The dimensionality paradox: Separation of concerns along one axis may break concerns (or aspects) that would have been useful to keep together.
Of course, all these paradoxes are easily solved by intelligent engineers (one might call this, in itself, a paradox:
  • The paradox-transgression paradox: That something consistent and useful can emerge from a paradoxical situation.)
Ok - I'm getting too philosophical. Let's look at the three paradoxes above. BTW, I have invented their names on the fly - so if you don't like them, or there are better ones (e.g. in texts I have not read), feel free to substitute your or someone else's names instead!

1. The infinity paradox

This is a well-known feature of TDD. Let's assume you have to implement a function that adds two integral numbers and returns their sum. "Mhm ... sum of numbers ... not an easy topic," you mumble. "Could you give me a very simple example?" "Sure: The sum of zero and zero is zero". "Aha, let's implement this. We do not know any other sum, so we can easily write:"

    public int Sum(int a, int b) {
        return 0;
    }
   
"But I can also tell you that the sum of three and zero is three." "Well, then so be it:"

        if (a == 3) {
            return 3;
        } else {
            return 0;
        }
       
"However, also zero plus three is three." "Fine:"

        if (a == 3 && b == 0) {
            return 3;
        } else if (a == 0 && b == 3) {
            return 3;
        } else {
            return 0;
        }
       
"And five plus six is eleven." "Weeell:"

        if (a == 3 && b == 0) {
            return 3;
        } else if (a == 5 && b == 6) {
            return 11;
        } else if (a == 0 && b == 3) {
            return 3;
        } else {
            return 0;
        }
       
"And four plus seven is also eleven." "Okaaaaayy:"

        if (a == 3 && b == 0) {
            return 3;
        } else if (a == 5 && b == 6) {
            return 11;
        } else if (a == 0 && b == 3) {
            return 3;
        } else if (a == 4 && b == 7) {
            return 11;
        } else {
            return 0;
        }
       
You get the idea.

We see: A finite and manageable number of cases is not sufficient to fully define a rule that should hold for large or even infinite input sets. We need to bring in additional techniques. There are three that I consider to be most important:
  • Number one is what I call "conceptualization." There is no way around "describing the complexity as it is." A sum is a sum is a sum is a sum: It cannot be simplified to something that is fundamentally weaker, like a list of cases. Only conceptualization can yield the implementation
       return a + b;

(Formalists turn this argument on its head: They claim that the formal - axiomatic, behavioral, ... - specification alone is sufficient to define a behavior. But time has disproven this - we also need examples; and preferably well-chosen examples).

  • Number two is integration (especially integration testing): When a concept is used in a broader context or environment, certain aspects of the concept might become more pronounced. So you write an integration test (or manually execute a system test) that actually uses your sum in the intended environment.
However, isn't this paradoxical again (a sub-paradox?)? After all, I have just argued that the "sum concept" cannot be reduced to a "list of cases." But also integration testing is only a "list of cases", albeit in a different, more encompassing domain. So why bother? Simple reason: This usage in a different and relevant domain will extract at least some implicit assumptions not spelt out in the examples or even the conceptual specification.

For example, when using the summing function above in a scenario where numbers count upwards all the time, the topic of overflow suddenly becomes very interesting. If the same function is used to increment and decrement a number representing the size of a stack, overflow is most probably no problem, but we now have to handle negative numbers, even though we'll expect the result never to be negative. When the value represents a temperature in degrees Celsius or Fahrenheit, we finally also have to accept negative numbers as result; etc.

One capability I admire very much is to be able to abstract away from a given context (and thereby e.g. arrive at the concept of "integral number"), but, on the other hand, to always keep the "real world", the application, in focus and not "fall over the YAGNI edge."

  • Even though it only creates additional cases (and hence cannot overcome the fundamental infinity paradox), number three of the important techniques is intelligent selection of cases.
Good testing discipline has all the key ideas and phrases: "Boundary cases", "error cases", "coverage". However, testing argues for such techniques on a basis that does not necessarily interest us in TDD: Namely on "typical programmer errors". In classical testing, boundary cases are selected primarily because it is assumed that programmers tend to make "off-by-one errors".
In TDD, there are no programmer errors - "not yet," at least. The focus on boundary cases here is, predominantly, one of clear behavioral definition. Having cases that show that 0 + x is the same as x, and that this is also true for x + 0, underlines the fact that 0 is a neutral element of the summing function - regardless of the probability that a programmer gets that wrong.
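For instance, the two neutral-element cases could be written as follows (a sketch, assuming the Sum function from above; the test names are mine):

    [Test]
    public void ZeroIsNeutralOnTheLeft() {
        Assert.AreEqual(7, Sum(0, 7));
    }

    [Test]
    public void ZeroIsNeutralOnTheRight() {
        Assert.AreEqual(7, Sum(7, 0));
    }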

Of course, there are other techniques to get the behavior right - reviews, random testing, formal and informal proofs etc. They will help, but are even more work!

2. The hill-climbing paradox

"Hill-climbing" is a mathematical optimization method for multi-dimensional functions. Essentially, it says: If you want to reach the top of the hill (the "optimum"), always go in the direction of the steepest incline. Sounds obvious, doesn't it? The problem with hill-climbing is that you might reach a "local optimum": A small hill far away from the mountain you actually wanted to conquer. Once you are atop the small hill, naive hill-climbing prevents you from going anywhere else: There simply is no incline any more in any direction.

Naive iterative development can lead you astray in similar ways. Here is a very artificial example: You have to implement a function F(int a, int b) whose result is defined by the following table (which is finite! - so there's no infinity paradox involved):

      | 0 1 2 3
    --+--------
    0 | 0 1 2 3
    1 | 3 0 1 2
    2 | 2 3 0 1
    3 | 1 2 3 0

You might start with the first line, i.e., have cases for (0,0), (0,1), (0,2), and (0,3). Including some conceptualization (abstraction; pattern recognition), you end up with something like

    if (a == 0) {
        return b;
    }
   
Including the second line, you arrive at

    if (a == 0) {
        return b;
    } else if (a == 1) {
        return b == 0 ? 3 : b - 1;
    }

etc. If, however, you start with different "dimensions", you might see that the difference b - a alone determines the result (by noting the falling "diagonal stripes" in the matrix), and your cases might be

    b - a = 0 ... this always yields zero.
    b - a > 0 ... this always yields b - a.
    b - a < 0 ... this yields b - a + 4.

And if you take a "very holistic" approach, you might actually see that the function is just (b - a + 4) % 4.

The moral of that story? Depending on the method you use to plan your next step (up the hill), you might end up at very different solutions. And the simplest method of deciding on the direction - in the matrix example, the "line by line" method - might well lead to a sub-optimal solution (a local optimum).

This little example does of course not prove that much. Still, doing the same with a large state table (as used in some types of parsers) might actually give you a quite intractable piece of code rather than a manageable one; or vice versa.

Test cases are good. Believing that common sense test case selection will lead you to a good algorithm is not.

Continued at Paradoxes of test driven development, cont'd.

The PickAkin Kata - intermezzo

Ilker sits in the same room as I do. While my Windows had to boot (it didn't let go of a directory I needed to rename), I - yes, it's my fault - started a discussion with him about my posting. Here was his major critique: "Why do you start with an integration test? Even if you do not intend to run it right now, why do you use this complex test as a means to pin down the programming model?" (In this case, the whole programming model consists of an API - there are no installs, no generators, no build processes and the like). My answer was twofold:

First, I made an implicit assumption in that series which I should make explicit: I am not writing a cookbook about what I would (or you could) or should do. Rather, I want to draw a few topics out into the open; and for this, I bundle bits and pieces maybe differently than when actually doing it "live".

In a real programming exercise, you might decide on a few pieces of the programming model when writing the first test; a few more in the third test; another part in the sixth or twenty-sixth test, where you also might revise a decision from earlier, etc. Even in a group, this will start to get hard: Different people in the group might believe that decisions should be made at other times - so you'd have to make that decision process more explicit, as I tried to do. But for my presentation purposes, distributing those decisions and, more interestingly, my meta-discussion about why I want to void my own assumptions, would become chaotic.

Second, that "programming model issue" is important to me. After all, this is the "user interface" that is given to other programmers. So it is worth an explicit consideration.

But why then write that acceptance test? Wouldn't a simpler test be better? My - maybe at first astonishing - answer is: This is the simplest test to write! Every other test one could conceive - e.g. using only empty collections - would have to include an additional abstraction step away from the information given to me by Ilker. So everyone who'd want to understand that test would have to accept my arguments that "this smaller test is essentially what we need". But for the data we are given, there's no need to talk about this abstraction of "essentially being the same." So, I argue that, after having read the problem description, this is actually the simplest test regarding the programming model.
It is of course not the simplest test regarding the behavior of the code to be constructed - but that's a totally different issue.

We also had an additional debate that centered on the concept of decision-making in general. But I will not go into this - maybe Ilker will write something, maybe I will later.

My PickAkin kata ramblings continue with a solution without TDD ...

The PickAkin Kata - part two

First, I reverse myself: This is a great kata! Why? Exactly because you can use it to tie so many aspects of programming to it. Here are three which I did not intend to write about - as you might guess when you read the following: My test code is wrong, or bad, or "sub-optimal" - in three ways.

The first, obvious problem is that I exchanged the expected and actual parameters. This is a problem I have had since the time I started using JUnit in the last millennium: Customarily, in languages of C heritage we check for a value as follows:

   if (a != 4) throw ...;

In xUnit, we write

   Assert.AreEqual(4, x);

(unless you like the new .That() style even for such simple assertions). Somewhere deep in my brain, the former seems so deeply ingrained that I need to remind myself explicitly all the time to "do it the other way in NUnit". But with everything we need to remind ourselves of explicitly, we are bound to fail; especially under stress - like publishing the code to a blog ;-). I think we can learn an important fact from this - some might call it trivial, I like to call it fundamental:

Every engineering error we human beings can make will be made sooner or later.

I leave it to you to draw conclusions about all sorts of technologies that have more physical forces at work than that simple PickAkin kata.
   
The second problem with my test code is that it checks for the most important fact last. As long as the test runs green, this is ok. If it runs red when we expect it to run red - before we implement sufficient code to make it green - this is also ok. But it is cumbersome, at least, if we expect it to run green, and it runs red - i.e., when we are debugging. Then, a piece of "secondary information" like the ceoList being wrong helps less than the "primary information" that the akin list is wrong.

Last, when we are debugging, we need readable error messages. If we check for the correct akin list by

  CollectionAssert.AreEquivalent(new[] { "A1", "B2", "A2", 
      "A3", "B3", "A4", "D1", "D2", "B1" }, akinList);

but get a different result from some computed IEnumerable<>, the error message looks about as follows:

  Expected: equivalent to < "A1", "B2", "A2", "A3", "B3", "A4", "D1", "D2", "B1" >
  But was: <System.Linq.Enumerable+WhereArrayIterator`1[System.String]>


You can produce this result e.g. by trivially implementing Pick.Akin(...) as

    akinList = listAReduced = listBReduced = listA.Where(x => x.Equals(x));
   
Thus, it helps to add .ToList() or .ToArray() to IEnumerable<>s. The trivial code, when checked with

  CollectionAssert.AreEquivalent(new[] { "A1", "B2", "A2",
      "A3", "B3", "A4", "D1", "D2", "B1" }, akinList.ToList());

fails with the much nicer error message

    Expected: equivalent to < "A1", "B2", "A2", "A3", "B3", "A4", "D1", "D2", "B1" >
    But was:  < "A1", "B1", "A2", "A3", "C1", "D1", "E1", "E2" >

So here is the revised integration test - this time already with an [Ignore] tag because it is not yet time to run it. BTW, in my humble opinion, each and every [Ignore] (and [Explicit]) tag is worth a textual explanation of why the respective test is only to be run explicitly, or to be ignored altogether, at this time:

  [Test]
  [Ignore("Acceptance test cannot run for not yet production-ready implementation")]
  public void AcceptanceTestUsingIlkersData() {
    IEnumerable<string> ceoList = new[] {
        "A1", "B1", "A2", "A3", "C1", "D1", "E1", "E2" };
    IEnumerable<string> ctrlList = new[] {
        "F1", "D2", "B2", "B3", "A4" };
    IEnumerable<string> ceoListAfter;
    IEnumerable<string> ctrlListAfter;
    IEnumerable<string> akinList;
    Pick.Akin(ceoList, ctrlList, (a, b) => a[0] == b[0],
        out ceoListAfter, out ctrlListAfter, out akinList);

    CollectionAssert.AreEquivalent(new[] {
        "A1", "B2", "A2", "A3", "B3", "A4", "D1", "D2", "B1" },
        akinList.ToList());
    CollectionAssert.AreEquivalent(new[] {
        "C1", "E1", "E2" },
        ceoListAfter.ToList());
    CollectionAssert.AreEquivalent(new[] { "F1" },
        ctrlListAfter.ToList());
  }


So long!

Next part: The PickAkin kata - intermezzo

 

Sunday, April 24, 2011

The PickAkin Kata - part one

My colleague Ilker made a kata posting from a small algorithm question I sent around in our department. He wrote a nice story around it, made it suitably short, and hopefully more than one got bitten by that bug ... ah, well, "found interest in that little algorithmic problem."
I thought a little bit about whether I should somehow reply to Ilker's posting - and then I started to get the feeling that that problem (my problem?) was not a good kata, after all. And this feeling alone makes it worthwhile to write about it, I thought.

How would you go about solving this puzzle? Well, I think a standard idea is to write an acceptance test first. Why? Certainly, we would not want to make this test run successfully - it is much too complex for an initial unit test in a TDD culture. No, the reason is that in that step we set the stage for interesting properties of the solution - in this case, the API. Here is my acceptance test:

  [Test]
  public void AcceptanceTestUsingIlkersData() {
    IEnumerable<string> ceoList = new[] {
        "A1", "B1", "A2", "A3", "C1", "D1", "E1", "E2" };
    IEnumerable<string> ctrlList = new[] {
        "F1", "D2", "B2", "B3", "A4" };
    IEnumerable<string> ceoListAfter;
    IEnumerable<string> ctrlListAfter;
    IEnumerable<string> akinList;
    Pick.Akin(ceoList, ctrlList, (a, b) => a[0] == b[0],
        out ceoListAfter, out ctrlListAfter, out akinList);
    CollectionAssert.AreEquivalent(ceoListAfter, new[] {
        "C1", "E1", "E2" });
    CollectionAssert.AreEquivalent(ctrlListAfter, new[] {
        "F1" });
    CollectionAssert.AreEquivalent(akinList, new[] {
        "A1", "B2", "A2", "A3", "B3", "A4", "D1", "D2", "B1" });
  }


... aaah, don't hit me - or at least not that hard! Already at this point, I have made quite a lot of decisions that could have been decided differently:
  1. Implementing it in C# (instead of F#; or SQL)
  2. The API function is called Akin, on a class Pick.
  3. The API function is a static function.
  4. The API function is a generic function - although you do not see this.
  5. The inputs are two sets as well as a "being similar" function.
  6. The result is returned in three out parameters.
  7. The parameters are actually of type IEnumerable, with an additional comment that the order is not important.
These are only the ones that I know I decided explicitly. You probably see a few more that, for me, were "a given", "granted", "not even worth mentioning." I won't even defend my choices - rather, I will write down reasons why these choices are bad. After all, that's one very important job of an engineer:
  • Know the problems of what you are going to introduce.
So here are reasons against my choices:
  1. A set-based language (like SQL ...) would be a better fit - after all, we had to annotate all parameters with "order is not important", which is a direct contradiction to the basic definition of IEnumerable.
  2. The name PickAkin is a classic function name: "do what" - similar to FindMatching, InitLazily etc. Splitting the externally defined name into class and method is strange.
  3. Static functions have problems with testability, with injection, with state (implicit singletons).
  4. Generic functions add additional comprehension complexity; and need additional tests if the type parameter can also assume struct types.
  5. Passing functions as delegates creates problems like "access to modified closure"; a classical abstract function in an abstract class with an override in the concrete application does not have this problem.
  6. Out parameters need to be defined beforehand in the client code. At the minimum, this requires additional names for the result sets. Instead, the result could be returned in a classical way as the return type - probably in a special result type (see the sketch right after this list). Or, it could be returned into ICollection parameters by using Add on them.
  7. Using immutable set types would directly capture the intent of the parameters.
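To make point 6 a bit more tangible, here is a hedged sketch of such a result-type variant (all names are my own invention; this shows only an API shape, not a worked-out solution):

    public class AkinResult<T> {
        public IEnumerable<T> ListAReduced { get; set; }
        public IEnumerable<T> ListBReduced { get; set; }
        public IEnumerable<T> AkinList { get; set; }
    }

    public static class Pick {
        public static AkinResult<T> Akin<T>(
                IEnumerable<T> listA,
                IEnumerable<T> listB,
                Func<T, T, bool> similar) {
            // to be filled in - the point here is only the shape of the API
            throw new NotImplementedException();
        }
    }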
Now it's your turn: You defend my choices (write down the reasons, as I did above). Why would you do that? Because you probably had different ideas - with their own problems. Defending a different design will highlight these problems - as I did above.

Slow going, isn't it, this kata?

If you do this in a dojo session with 2, 3, 4 people, you might be completely deadlocked at this point - because all these assumptions and decisions have to be brought out. Of course, if you are versed dojo-ers, you probably know how to resolve such discussions. Still, I think that trying out e.g. three different designs alone in this simple case will teach you all (the 2, 3, 4 of you) quite a few different viewpoints. Isn't that the idea of a kata, after all? Doing it over and over to find many "right ways"?

Next part: The PickAkin Kata - part two

Sunday, April 3, 2011

Logical Operators in NHibernate 3.x - Some Contribution of Mine

Just so that this blog doesn't dry up, I (allow myself to cross-)post part of a text I wrote about NHibernate 3.x's handling of logical operators. If you want to tune in to the discussion, you might want to read the group http://groups.google.com/group/nhibernate-development.


1. Considerations about the where semantics for an SQL Linq provider
 
Before we change the behavior of logical operators in NHib.Linq, we should have some agreement about the intended behavior, i.e., a specification. Otherwise, one might argue that there is no problem with them - especially with the || operator! Unfortunately (and interestingly), it appears to me that the answer to this is not as simple as some might think. The following are some general thoughts - bear with me, I'll talk about the logical operators, and especially the || operator, in a moment (namely in sections 2. to 6.).

The apparently correct way to specify an SQL Linq provider's (or any Linq provider's) behavior is:

    (BEHAV-1) If you have a set of objects S and evaluate a query Q using an SQL Linq provider, the result should be the same as when using the standard in-memory Linq provider Linq2Objects.

Obviously, this specification fails for methods that cannot be transformed to the SQL Linq provider's underlying technology (think virtual object methods, for example). So a more practical definition is:

    (BEHAV-2) ...BEHAV-1..., if the SQL Linq provider accepts Q, i.e., does not throw some exception saying "I do not support this and that".

But it gets more tricky: What are the possible results of Linq2Objects when evaluating Q? Essentially
(a) it can return an IQueryable; or
(b) it can throw an exception.

(BEHAV-2) would require that the SQL Linq provider throws an exception when Linq2Objects throws an exception. Here is an example (I suggest that the reader sketches this and the following class/association structures on a sheet of paper):

    Class CA has a nullable reference called B to class CB; and an int property P.
    Class CB likewise has a nullable reference called C to class CC.
    Class CC has an int property Q.
    In the database, we have a CA object na that references a CB where C is null (i.e. the CB does not reference a CC object).
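Sketched as (highly simplified) C# entity classes - the names are taken from the text above, all mapping details are omitted:

    public class CA {
        public virtual CB B { get; set; }   // nullable reference to a CB
        public virtual int P { get; set; }
    }

    public class CB {
        public virtual CC C { get; set; }   // nullable reference to a CC
    }

    public class CC {
        public virtual int Q { get; set; }
    }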

    We ask
  
        from a in CA
        where a.B.C.Q > 25
        select a.P;

    Linq2Objects will throw a NullReferenceException when it evaluates a.B.C.Q for a = na.
    How could an SQL Linq provider mimic this exceptional behavior? I think we agree this is not simple:
  • Either we evaluate a.B.C.Q in-memory to force the NRE - which would of course uselessly load CB objects.
  • Or we create some code in the SQL statement that somehow throws an SqlException ... which would at least mystify the database's planner, I'd say.
AFAIK, every existing SQL Linq provider does something different: We allow an IQueryable result where Linq2Objects throws an exception. This contradicts (BEHAV-2)! A definition amended for this could be:

    (BEHAV-3) ...BEHAV-2...; but if Linq2Objects throws a NullReferenceException, the SQL Linq provider may return an arbitrary IQueryable result.

The problem here is the "arbitrary" result: If you give this SQL Linq provider to programmers, they will come to rely on the result even in these cases - e.g., they will expect that this IQueryable result is always empty. If we could make it work so that the result is reasonably random, this could be avoided (as is the case with SQL SELECTs without ORDER BYs in database programming); but this is IMHO as hard as throwing an exception in the first place. So we give programmers consistent results even in these cases - and they will trust them. So we end up with

    (BEHAV-4) ...BEHAV-2...; but if Linq2Objects throws a NullReferenceException, the SQL Linq provider may return a defined and reasonable IQueryable result.

And this is, in my opinion, the end of the story: It means that SQL Linq provider writers have to extend the semantics of Linq. The guidelines for this extension will of course be twofold:
  • The semantics must be reasonable (understandable; akin to known semantics like e.g. SQL).
  • It must be possible to implement the semantics efficiently.
I think we agree that the second point drives the definition: We implement the semantics as simply as possible; and then we explain it.

This is exactly what we now have to do for the logical operators and NHib.Linq.

------------------------------------------------------------------------------------------------------------------------

2. Logical operators in SQL Linq providers
 
"Logical operators" is not a term that is defined in Linq. Yet, we need to define which operators in expressions are interpreted as logical operators in HQL. A practical definition is the following:
  • || with return type bool - which is mapped to OR
  • && with return type bool - which is mapped to AND
  • ! with return type bool - which is mapped to NOT
  • ?: with return type bool - where a ? b : c is mapped to (a AND b) OR (NOT (a) AND c).
One can create Linq expressions where this mapping is wrong, but I will not talk about other ways of defining logical operators.

I will now define semantics for these operators that hopefully are "reasonable" in the sense of (BEHAV-4) above. In the course of this, I'll give a few examples that highlight problems of possible translations from Linq to SQL; and use these examples to decide on "better" semantics.

------------------------------------------------------------------------------------------------------------------------

3. Possible semantics for the || operator in the NHibernate-Linq provider
 
One possible semantics is obviously:

    (||-0) || is not supported.
  
We ignore this ridiculous idea.

Another possible semantics is:

    (||-1) If there is an entity member expression at depth > 1 in the lhs of the ||, the same member expression must occur in the rhs at depth > 1; and vice versa.

What is an "entity member expression"?
    A member expression with a (result) type that is a mapped entity type. Other member expressions (with simple types or component types) are of no relevance to this discussion, except where explicitly noted.

What's the depth of a member expression?
    A member expression E.M in C# has a lhs expression E and a member M defined in E's type.
    When we parenthesize member expressions, the depth is the number of parentheses to the left of the expression.
    For example,
        - a.P is at depth 1 in (a.P);
        - a.B is at depth 2 in ((a.B).P) and ((a.B).C);
        - a.B is at depth 3 in (((a.B).C).Q); a.B.C is at depth 2 in the same expression.
Entity member expressions of depth > 1 are the ones that require a "Join" of some sort - that's the whole idea of the definition. Essentially, it says: If we have to join an association for the left side of the ||, we also must join it for the right side. This allows us to use only inner joins.
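
For the mechanically minded, here is a minimal sketch (my own illustration with hypothetical classes A, B, and C; not NHib.Linq code) that walks the member chain of an expression tree and prints the depths as defined above:

    using System;
    using System.Linq.Expressions;

    // Hypothetical model (only for this sketch): think of B and C as mapped
    // entity types, so that a.B and a.B.C are entity member expressions,
    // while a.B.C.Q (an int) is not.
    class C { public int Q; public int R; }
    class B { public C C; }
    class A { public B B; public int P; }

    static class DepthDemo
    {
        static void Main()
        {
            Expression<Func<A, bool>> cond = a => a.B.C.Q > 25;

            // The left operand of ">" is the member expression a.B.C.Q.
            Expression e = ((BinaryExpression)cond.Body).Left;

            int depth = 0;
            while (e is MemberExpression)
            {
                var m = (MemberExpression)e;
                depth++;
                Console.WriteLine(m + " is at depth " + depth);
                e = m.Expression;   // descend into the lhs: a.B.C.Q -> a.B.C -> a.B -> a
            }
            // Prints: a.B.C.Q at depth 1, a.B.C at depth 2, a.B at depth 3.
        }
    }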

With (||-1), we would support

        from a in CA
        where a.B.C.Q > 25 || a.B.C.R < 15
        select a.P;

because
  • a.B (which is at depth 3 both in a.B.C.Q and a.B.C.R) occurs on both sides
  • a.B.C (which is at depth 2 on both sides) occurs on both sides
Likewise, we would support

        from a in CA
        where a.B.C != null && a.B.C.Q > 25 || a.B.C.R < 15
        select a.P;

... check out the depths yourself. However, we would not support

        from a in CA
        where a.B.C == null || a.B.C.Q < 15
        select a.P;

Reason: a.B.C is at depth 2 in a.B.C.Q on the right side, but on the left side it occurs only at depth 1 - so it does not occur at depth > 1 on the left side, and (||-1) rejects the query.
This is certainly not acceptable, as this is a customary idiom for checking values behind nullable references.

Here is another attempt - this time much less formal :-):

    (||-2) All entity member expressions below a || at depth > 1 are inner joined. The query is evaluated on the join result.

This is, as far as I know, the current NHib.Linq semantics. Alas, it is wrong - in some cases it does not conform to the non-exceptional Linq2Objects behavior (see the example for NH-2583 in the NHibernate Jira).

Next attempt:

    (||-3) All entity member expressions below a || at depth > 1 are left outer joined. The query is evaluated on the join result.

Here is an example which we should consider:

Class CD points to classes CE and CF via nullable properties E and F.
CE and CF each have a Nullable property P.
We set up a single CD object d0 that points to neither a CE nor a CF (so the tables for CE and CF remain empty); and ask

        from d in CD
        where d.E.P == null || d.F.P == null
        select d;

Obviously, this will throw an NRE if evaluated with Linq2Objects. Under (||-3), it returns d0. Is this "reasonable" as required by (BEHAV-4)? We would get the following effect: When we evaluate

        from d in CD
        where d.E.P == null
        select d;

we would not get d0: There is no ||, hence no outer join, but an inner join. But inner joining CD with CE (which is empty!) yields an empty result set. Likewise,

        from d in CD
        where d.F.P == null
        select d;

is empty. Still, the disjunction of the two queries with || yields d0! I say that this is not reasonable: The result size of an || query should be less than or equal to the sum of the sizes of the two separate results. Let's call this common-sense rule "or-sum compatibility".
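
To make the or-sum problem tangible, here is a small Linq2Objects simulation of this setup (my own sketch with assumed class shapes, not NHibernate code): the left outer joins of (||-3) are imitated by letting a missing E or F contribute a null value, while the ||-free queries demand, like an inner join, that the join partner exists.

    using System.Linq;

    class CE { public int? P; }
    class CF { public int? P; }
    class CD { public CE E; public CF F; }

    static class OrSumDemo
    {
        static void Main()
        {
            var cds = new[] { new CD() };   // only d0; the CE and CF "tables" stay empty

            // (||-3): left outer joins - a missing E or F contributes a null row,
            // so d.E.P is treated as null when d.E is null.
            var both = cds.Where(d => (d.E == null ? (int?)null : d.E.P) == null
                                   || (d.F == null ? (int?)null : d.F.P) == null);

            // The two ||-free queries use inner joins: d must actually reach a CE (resp. a CF).
            var onlyE = cds.Where(d => d.E != null && d.E.P == null);
            var onlyF = cds.Where(d => d.F != null && d.F.P == null);

            System.Console.WriteLine(both.Count());                   // 1 (d0)
            System.Console.WriteLine(onlyE.Count() + onlyF.Count());  // 0 - not "or-sum compatible"
        }
    }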

One remedy could be to always outer join classes mentioned via an entity member expression at depth > 1 anywhere in the query - even if there is no || operator:

    (||-4) All entity member expressions at depth > 1 are always left outer joined. The query is evaluated on the join result.

I do not think that this is fundamentally wrong ...

... however, I propose another alternative:

    (||-5) Like (||-3); additionally, for each entity member expression M occurring in a boolean expression C at depth > 1, "M != null" is prepended to C with an &&.
        The "M != null" condition is called an "existence guard".

An example to explain:

        ...
        where ...some condition with a.B.C.Q...
        ...

becomes, under (||-5),

        where a.B != null
           && a.B.C != null
           && ...some condition with a.B.C.Q...

The reason for my proposal is that conceptually, a condition on an object (like a.B.C) should only be evaluated if the object exists - hence the term "existence guard". In the example from (||-3) above, i.e.,

        from d in CD
        where d.E.P == null || d.F.P == null
        select d;

the where condition would be interpreted as

        where (d.E != null && d.E.P == null)
           || (d.F != null && d.F.P == null)
Hence, under this semantics, the complete query would not return d0 (as d0 has d.E == null and d.F == null), in agreement with the two queries asking for d.E.P == null and d.F.P == null. Thus, this behavior is also or-sum compatible ("reasonable").
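
Using plain Linq2Objects and the in-memory CD/CE/CF classes from the simulation sketch above (my own check), one can see both effects at once: the guards make the evaluation NRE-free, and the condition is false for d0.

        var d0 = new CD();                            // d0.E == null, d0.F == null

        bool matches = (d0.E != null && d0.E.P == null)
                    || (d0.F != null && d0.F.P == null);   // false - d0 is not returned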

So, it's still 1:1 between (||-4) and (||-5). But here is another example:

Class CG points to class CH via a nullable property H.
CH has a Nullable property P.
First, we set up say 3 CHs; and ask

    from h in CH
    where ...condition on h...
    select h;
  
Let us say that this returns 2 of the 3 CHs.
Now, we set up 5 CGs; and let them point to all 3 CHs. Additionally, we set up another CG (let's call it g0) which does not point to a CH (i.e., g0.H == null). Now we ask

    (from g in CG
     where ...condition from above where h is replaced with g.H...
     select g.H)
    .Distinct();
  
That is, we find all CGs whose H fulfills the same condition as above; and then retrieve those CHs. Obviously, this set should just be the same as the original one (as we made sure that the CGs reach all CHs). g0 should be ignored - after all, there is no CH behind it for which the condition can be evaluated and, hence, be true.

Now let us use the concrete condition

    h.P == null
  
in this setup. With semantics (||-5), we get exactly the expected result. With semantics (||-4), however, g0 would be extended with null values by the outer join; "g0.H.P == null" would therefore be true; and hence g0.H - which is null - would be returned. Thus, the second query would yield one more value (namely null) than the first query.
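
Again, a small Linq2Objects simulation of this setup may help (my own sketch with assumed class shapes, not NHibernate code; the outer join of (||-4) is imitated by letting a missing H contribute a null value):

    using System.Linq;

    class CH { public int? P; }
    class CG { public CH H; }

    static class ProjectionDemo
    {
        static void Main()
        {
            // 3 CHs; the condition "h.P == null" holds for 2 of them.
            var chs = new[] { new CH { P = null }, new CH { P = null }, new CH { P = 1 } };
            var direct = chs.Where(h => h.P == null).Distinct();

            // 5 CGs that reach all 3 CHs, plus g0 with H == null.
            var cgs = new[]
            {
                new CG { H = chs[0] }, new CG { H = chs[1] }, new CG { H = chs[2] },
                new CG { H = chs[0] }, new CG { H = chs[1] },
                new CG()                                       // g0
            };

            // (||-4)-style: outer join without guards - for g0, g.H.P is treated as null,
            // so "g.H.P == null" counts as true and the null g.H is projected.
            var withoutGuard = cgs.Where(g => (g.H == null ? (int?)null : g.H.P) == null)
                                  .Select(g => g.H).Distinct();

            // (||-5)-style: existence guard "g.H != null" prepended.
            var withGuard = cgs.Where(g => g.H != null && g.H.P == null)
                               .Select(g => g.H).Distinct();

            System.Console.WriteLine(direct.Count());        // 2
            System.Console.WriteLine(withGuard.Count());     // 2 - the same CHs as the direct query
            System.Console.WriteLine(withoutGuard.Count());  // 3 - the extra value is null
        }
    }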

I call this problem the "projection anomaly". It is exhibited by designs that use outer joins but do not employ existence guards.

Added later: From a few tests that Patrick Earl (of NHibernate 3.x) and I have done, it appears that Microsoft's Entity Framework exhibits this behavior, i.e., it has (||-4) semantics.
Yes, this is a boundary case setup. But it shows that "nonchalantly including nulls in condition evaluations" can lead to, in my opinion, "non-reasonable" results. It must be noted that not only simple conditions like h.P == null, but also complex conditions with ! and ?? operators (NOT and COALESCE in HQL and on the database) can "unexpectedly" return true for a vector of null values.

Because of the foregoing discussion, I want to argue that semantics (||-5) is the best one currently available.

(||-5) has the consequence that for all queries without || (and also without ?: and ! - see below), the resulting HQL and SQL can be formulated with inner joins only.

(||-5) also has the consequence that "null reference violations" are a local phenomenon of the sub-expressions of an ||. In other words, if you have a condition like

        d.E.P == null || d.F.P == null

and d.E is null, then the left side evaluates to false, because it is defined to mean d.E != null && d.E.P == null. Hence, the "null reference violation" in d.E remains "local" to the left side. If now d.F exists (is != null) and d.F.P is null, the whole condition evaluates to true. This behavior cannot be "simulated" with Linq2Objects - there, the evaluation would throw a "global NRE" and hence abandon the evaluation. In this sense, (||-5) is a true semantical extension of the Linq2Objects semantics. As all of the preceding discussion should show, I hope that it is a "reasonable" extension in the sense of (BEHAV-4).

------------------------------------------------------------------------------------------------------------------------

4. Possible semantics for all the logical operators in the NHibernate-Linq provider
 
The discussion of the || operator has provided us with a semantics that interprets the NHib-Linq condition

    ...some condition with paths...
  
as roughly

    path1 != null && path2 != null && ... && ...some condition with paths...

where path1, path2, ... are outer joined. What is nice about that (||-5) semantics is that it is actually defined independently of the logical operators! So we can simply use it as a semantics for all logical conditions in NHib Linq. Here are a few examples for the other operators:

&& operator:
The NHib Linq query

        from d in CD
        where d.E.P == null && d.F.P == null
        select d;

is interpreted as equivalent to the Linq2Objects query

        from d in CD
        where (d.E != null && d.E.P == null) 
           && (d.F != null && d.F.P == null)
        select d;

It is easy to see that outer joins are actually not necessary for the classes reached via E and F. Still, it is not wrong to use them, as the guards d.E != null and d.F != null filter out the joined rows where no d.E or d.F is reachable. Using inner joins here is a pure optimization, about which I'll talk below in a design section.

! operator:
The ! operator is actually interesting, because here we get deviations from the standard Linq2Objects semantics. Here is a first example of an NHib Linq query:

        from d in CD
        where !(d.E.P == 1)
        select d;

This is interpreted as equivalent to Linq2Objects' query

        from d in CD
        where !(d.E != null && d.E.P == 1)
        select d;

which is, by a simple De Morgan rewrite, the same as the Linq2Objects query

        from d in CD
        where d.E == null || d.E.P != 1
        select d;

But if we replace the !(... == ...) in the NHib Linq query with a plain !=, i.e.,

        from d in CD
        where d.E.P != 1
        select d;

we get as equivalent Linq2Objects

        from d in CD
        where d.E != null && d.E.P != 1
        select d;

Thus, these two NHib Linq queries have two different meanings:
  • !(d.E.P == 1) means "it is not guaranteed that d.E.P equals 1: either it has some other value, or d.E is not even there"
  • d.E.P != 1 means "the existing d.E.P has some value other than 1 - so d.E must be there"
One could call this unreasonable - then one would have to find another semantics. I think we can (or even have to) live with this. Remember that it is always possible to use only NHib Linq queries that are also valid Linq2Objects queries (i.e., not throwing an exception): Then, such extended ("funny") meanings do not occur.
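
With the in-memory CD/CE classes from the sketches in section 3, the two interpreted conditions can be evaluated directly for a d with d.E == null (my own check):

        var d = new CD();                                        // d.E == null

        bool notGuaranteedOne = !(d.E != null && d.E.P == 1);    // true:  "not guaranteed to be 1"
        bool unequalToOne     =  (d.E != null && d.E.P != 1);    // false: "d.E must be there"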

In standard (propositional) logic, one can use De Morgan's rules to rewrite an || with ! and &&. De Morgan is also valid for the shortcut operators, i.e., in Linq2Objects. It seems to me that De Morgan also holds for the (||-5) semantics, but I'm too lazy to prove this here and now (if you try it: take care that you do not replace !(... == ...) with (... != ...) - this is not equivalent, as we saw above; but this has nothing to do with De Morgan).

?: operator:
When we interpret (a ? b : c) as (a && b) || (!(a) && c), then we can apply the semantics rules also to boolean conditions with ?: operators.
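
As a hedged example of this rewrite (my own, using the CD/CE/CF classes from above), the NHib Linq query

        from d in CD
        where d.E.P == null ? d.F.P == null : d.F.P == 1
        select d;

is read as

        from d in CD
        where (d.E.P == null && d.F.P == null)
           || (!(d.E.P == null) && d.F.P == 1)
        select d;

after which the existence guards of (||-5) apply as described in section 3.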

The discussion about the "right" (extended) semantics for NHibernate 3.x has just started in http://groups.google.com/group/nhibernate-development.