SQL Testing, say what?!

So, it's being a couple weeks that I'm working with database only stories and one of the things my pair and I did was test ours SQLs. One could thing that this is not possible or necessary, but I will argue that it is possible and helps a lot.

In complex systems, databases can grow in complexity with hundred of tables and your SQL will grow in lines too, trying to fulfill all requirements and not loose control of what is happening inside it could be a challenge. As a good developer you start to wonder if some tests could help you on the way. But if you want write tests for SQL how they would be like?

Imagine you have to fetch some data of a table employees where the criteria is that they should be hired before (not including) 1997 and have a salary lower than 6000, so you come up with this SQL:

select first_name,employee_id from employees
where hire_date < date '1997-01-01' and salary < 6000;

This code is pretty simple and returns 7 rows (I'm the HR example schema of Oracle), but if returned a couple of thousands you can't eye inspect the results. But first, why test this simple SQL and this will not be the same of testing Oracle itself?

No, the primary concern that we want to test is not that if Oracle can honor the where clause, is just that we coded the right where in the first place. For more complex queries we can have inner selects, complex joins and such.

The way we can do this test in particular is thinking about set operations, one way to think is "We require all employees hired before 1997, so there will not be an intersection between employees who were hired after this date.". Let's see what the SQL looks like:

select first_name,employee_id 
  from employees where hire_date >= date '1997-01-01'

intersect

select first_name,employee_id 
  from employees where hire_date < date '1997-01-01' and salary < 6000

In this case using the intersect operator we assume that there's no common element between the two sets, resulting in an empty set and that is our 'OK!'. Now other test for the salary criteria:

select first_name,employee_id 
  from employees where salary >= 6000

intersect

select first_name,employee_id 
  from employees where hire_date < date '1997-01-01' and salary <= 6000

Note that this test will find an error (a row in this case), the second SQL was written with <= 6000 instead of < 6000, a mistyping that can easily happen in a complex query. In this particular query all combinations of <= or < or > or >= in the salary criteria will give you different results.

One of the benefits of doing this testing is just ensuring through simple queries the correctness of your solution. Other scenarios may include inserts, deletes, updates, staging tables, procedures and functions. In our case we had a multi SQLs with a reference table for a lot of updates. In every assumption made we thought in a way to write a simple query to test that assumption, the production code had all assumptions but the tests had only one per test. The tests gave us confidence that we were in the right direction when writing our big SQLs. These tests weren't automated nor we did TDD to come up with the big SQL, but it is possible.

One example for TDD is that you could have a temporary table to store intermediate results, start with a simple query written against it, the test will fail so you write the SQL to put data on the table, test pass, write the second test and so on.

What I want make sure you understand is that we're not trying to test the SQL interpreter or the database engine itself, that would be unnecessary. We're trying to make sure that what we wrote (which can turn out to be a big nasty SQL) is still complying with the requirements. If we need to optimize the SQL, split it or make a multi-pass solution the tests can still guide us, and if you ever tried to understand a big and complex query you know that sometimes this can get hard to do it without any test.

Published in Feb 14, 2011