Marco Valtas

Software Philosophy - Testing is about induction

Testing software has its limits, but it is still the best approach we have today. Others have argued for mathematically proving software correct, but this is costly and very hard to achieve. Yet software quality matters. To understand why tests can't guarantee that your software is "bug free" but are still important, let's look at them from a philosophical point of view.

Inductive reasoning is a type of argument. It differs from deductive reasoning in that, in an inductive argument, it is possible for all the premises to be true and the conclusion false. This cannot happen in a valid deductive argument.

Inductive arguments can be strong or weak. Here are some simple examples:

A strong induction:

Every day the Sun rises.
Therefore the Sun will rise tomorrow.

A weak induction:

Every Friday morning I see John walking his dog.
Therefore tomorrow I'll see John walking his dog.

You can see from these arguments that belief plays a role here. Both arguments have true premises; nevertheless, something could happen and make the conclusion false. Nothing prevents John from getting sick and not walking his dog tomorrow, or the Sun from simply exploding and never rising again. How likely these outcomes are is what makes the arguments strong or weak. How does this relate to testing? Let's look at a simple example. First, a class called Person, which knows how to abbreviate its own name:

package com.marcovaltas.phi.testing;

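public class Person {

    private final String name;

    public Person(String name) {
        // A Person must always have a non-empty name.
        if (name == null || name.isEmpty())
            throw new IllegalArgumentException("Person can't have a null or empty name.");
        this.name = name;
    }

    // Abbreviates the name as "LastName, Initials", e.g. "Marco Aurelio Valtas Cunha" -> "Cunha, MAV".
    public String abbreviateName() {
        String[] parts = this.name.split(" ");
        StringBuilder abbr = new StringBuilder();
        // The last word is kept in full, followed by a comma.
        abbr.append(parts[parts.length - 1] + ", ");
        // Every other word contributes its upper-cased first letter.
        for (int i = 0; i < parts.length - 1; i++)
            abbr.append(parts[i].substring(0, 1).toUpperCase());
        return abbr.toString();
    }

}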

In this case, a Person must have a name (it cannot be created without one) and can abbreviate its own name. Now the tests:

package com.marcovaltas.phi.testing;

import static org.junit.Assert.*;
import org.junit.Test;

public class PersonTest {
    @Test(expected=IllegalArgumentException.class)
    public void personCannotBeCreatedWithNullName() {
        new Person(null);
    }
    @Test(expected=IllegalArgumentException.class)
    public void personCannotBeCreatedWithEmptyName() {
        new Person("");
    }
    @Test
    public void personKnowsHowToAbbreviateName() {
        assertEquals("Cunha, MAV", new Person("Marco Aurelio Valtas Cunha").abbreviateName());
        assertEquals("Moreira, RC", new Person("Raquel Capistrano Moreira").abbreviateName());
    }
}

What role are the tests playing here? They give us reasons to believe that Person behaves the way we need it to. We know it is not possible to test every name in the world, but we try to cover enough cases to make it more likely that the abbreviation algorithm is right.
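One way to grow the sample without writing a new test method per name is a table-driven check, where each row is one more premise. The sketch below is plain Java so it runs without JUnit; `SampledAbbreviation` is a hypothetical class, its `abbreviate` method is a copy of the logic in `Person.abbreviateName` above, and the two extra names are illustrative samples, not data from the original tests.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SampledAbbreviation {

    // Copy of the logic in Person.abbreviateName, so this sketch is self-contained.
    static String abbreviate(String name) {
        String[] parts = name.split(" ");
        StringBuilder abbr = new StringBuilder();
        abbr.append(parts[parts.length - 1]).append(", ");
        for (int i = 0; i < parts.length - 1; i++) {
            abbr.append(parts[i].substring(0, 1).toUpperCase());
        }
        return abbr.toString();
    }

    // Each entry is one premise of the inductive argument: input -> expected output.
    static Map<String, String> sample() {
        Map<String, String> cases = new LinkedHashMap<>();
        cases.put("Marco Aurelio Valtas Cunha", "Cunha, MAV");
        cases.put("Raquel Capistrano Moreira", "Moreira, RC");
        cases.put("Ada Lovelace", "Lovelace, A");          // hypothetical extra sample
        cases.put("Alan Mathison Turing", "Turing, AM");   // hypothetical extra sample
        return cases;
    }

    // Returns how many sampled names produced an unexpected abbreviation.
    static int countFailures() {
        int failures = 0;
        for (Map.Entry<String, String> c : sample().entrySet()) {
            String actual = abbreviate(c.getKey());
            if (!actual.equals(c.getValue())) {
                System.out.println("FAIL: " + c.getKey() + " -> " + actual);
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println(countFailures() + " failure(s) in " + sample().size() + " samples");
    }
}
```

In JUnit 4 the same idea is usually expressed with the `Parameterized` runner. Either way, every new row only strengthens the premises; it never turns the argument into a proof.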

As Hume wrote, induction rests on the "Principle of Uniformity of Nature": loosely, the assumption that things tend to keep behaving as they did in the past (like the laws of physics). Still, induction has its problems.

In the code above, despite the tests, we are assuming that the Java language will behave as it did before, and that the standard libraries will too. But nothing prevents the String class, or even JUnit, from having bugs. Here, the "Principle of Uniformity" is the whole environment we use but can't control: we assume it will be correct, which is sometimes not the case.
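The standard library is part of that uncontrolled environment in quite concrete ways. For instance, `abbreviateName` calls `toUpperCase()` with no argument, which uses the JVM's default locale; under a Turkish default locale, a hypothetical name such as "ivan petrov" would be abbreviated to "petrov, İ" rather than "petrov, I". The original tests never notice, because their initials are already upper case. The small class below (`EnvironmentAssumptions` and its methods are hypothetical names) is a sketch of making two such assumptions explicit, so they are checked on every run instead of silently trusted.

```java
import java.util.Locale;

public class EnvironmentAssumptions {

    // Person.abbreviateName assumes split(" ") yields one element per word.
    static int wordCount(String name) {
        return name.split(" ").length;
    }

    // Person.abbreviateName calls toUpperCase() without a Locale, so it uses the
    // JVM default. Passing the locale explicitly exposes that hidden dependency.
    static String initial(String word, Locale locale) {
        return word.substring(0, 1).toUpperCase(locale);
    }

    public static void main(String[] args) {
        System.out.println(wordCount("Marco Aurelio Valtas Cunha"));  // 4
        System.out.println(initial("ivan", Locale.ROOT));              // I
        // In Turkish, lower-case 'i' upper-cases to dotted capital I (U+0130).
        System.out.println(initial("ivan", Locale.forLanguageTag("tr")).equals("\u0130")); // true
    }
}
```

Passing `Locale.ROOT` explicitly is a common way to remove this particular dependency, but the broader point stands: we still trust `String` itself to behave tomorrow as it did today.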

As systems grow bigger and more complex, our tests increasingly play the role of making us believe that the system is correct. Code without tests is like a conclusion without premises: it states something, but gives us little reason to believe it is correct.

To the question "How many tests should we write?" there's no right answer other than "Enough to make a strong inductive argument that the software is correct." To make an inductive argument strong, you basically need:

A reasonable sample.
An unbiased sample.
The conclusion should be relevant to the premises.

The first two are fairly clear: anyone with some knowledge of statistics knows that sampling is an important part of good research. The last one is subtler. The link from the premises to the conclusion matters too; there's an old joke, "The Sun rises every day, therefore I will fly tomorrow", whose conclusion is very unlikely despite the true premise.

In software testing, the link is established by actually using the code under test. If the tests try to validate a class but don't use that class in any way, we probably have a weak link. I believe coverage metrics measure this linkage. For a reasonable sample, we need to provide enough possible inputs, values and situations; if we're writing an abbreviation algorithm, it may be a good idea to grab some names from a public list. Lastly, the unbiased sample. Tests are often written by the same programmer who wrote the code, and knowing how the code was written may bias that programmer; pairing and QA can mitigate this problem.
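A cheap way to counter the bias of knowing the implementation is to probe inputs chosen to break its assumptions rather than to confirm them. The hypothetical `EdgeCaseProbe` below copies the abbreviation logic from Person and runs it on inputs outside the sample used in PersonTest: a single-word name, a double space, a leading space, and a whitespace-only name, which passes the constructor's `isEmpty()` check. These inputs are illustrative, not taken from any real name list.

```java
import java.util.Arrays;
import java.util.List;

public class EdgeCaseProbe {

    // Copy of the logic in Person.abbreviateName, so this sketch is self-contained.
    static String abbreviate(String name) {
        String[] parts = name.split(" ");
        StringBuilder abbr = new StringBuilder();
        abbr.append(parts[parts.length - 1]).append(", ");
        for (int i = 0; i < parts.length - 1; i++) {
            abbr.append(parts[i].substring(0, 1).toUpperCase());
        }
        return abbr.toString();
    }

    // Reports either the quoted result or the simple name of the exception thrown.
    static String probe(String name) {
        try {
            return "\"" + abbreviate(name) + "\"";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // Hypothetical inputs the original tests (multi-word, single-spaced names) never sample.
        List<String> names = Arrays.asList("Madonna", "Ana  Silva", " Ana Silva", " ");
        for (String name : names) {
            // Brackets make leading and repeated spaces visible in the output.
            System.out.println("[" + name + "] -> " + probe(name));
        }
    }
}
```

Whether `"Madonna, "` is an acceptable abbreviation is a requirements question. The double and leading spaces, however, produce empty words that make `substring(0, 1)` throw `StringIndexOutOfBoundsException`, and `" ".split(" ")` returns an empty array, so indexing `parts[parts.length - 1]` throws `ArrayIndexOutOfBoundsException`. Every original test still passes; the sample simply never reached these cases.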

So it is not a question of "How many tests?" and, unless you have total control over every variable in your environment, it is not possible to guarantee that your code is "bug free", especially in complex systems. Still, testing is the best way to gain confidence that your software is correct, and, like many other things in programming, there's no final answer.

Published on Jun 17, 2010