As discussed last time, the value of a unit test comes when it fails, signaling that some necessary characteristic of the system is no longer present.
It’s important that we maintain that value: A test failure must indicate something needs fixing.
Have you ever found yourself running a test suite a second time, intending to dismiss the failure as transient if it works? I’ve done that; most of us have.
An unreliable test - one that sometimes fails in a transient fashion - is worse than useless. Not only does it waste developer time investigating issues that aren’t actually problems, but it encourages people to dismiss other test failures as transient, negating much of the value of the test suite as a safety net.
The harsh truth is that there is no level of acceptable transient error.
The subtlety here is that the causes of transient errors are many and varied, and often very difficult to find. Here are some useful questions to ask.
Are my tests fully independent?
Good tests are completely independent of each other. Each test completely defines the required preconditions, carries out exactly one action, and fully checks the expected postconditions.
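As a sketch of that shape - precondition, single action, full postcondition check - here is a minimal Python example (assuming Python purely for illustration; the `ShoppingCart` class is hypothetical):

```python
import unittest


class ShoppingCart:
    # Hypothetical class under test, defined here so the example is self-contained.
    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)

    def total_items(self):
        return len(self.items)


class TestShoppingCart(unittest.TestCase):
    def test_adding_an_item_increases_the_count(self):
        # Precondition: a fresh, empty cart created inside the test itself,
        # never state left over from some other test.
        cart = ShoppingCart()

        # Exactly one action.
        cart.add("apple")

        # Postconditions checked fully.
        self.assertEqual(cart.total_items(), 1)
        self.assertEqual(cart.items, ["apple"])
```

Because the test builds everything it needs, it passes or fails the same way whether it runs first, last, or entirely on its own.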
If you have a test (or many tests) that can’t be run in isolation, but only as a part of a full test suite, you’ve already got a problem. The time to start fixing it is now.
Are my tests properly resourced?
In an ideal world, tests would never rely on the passage of wall-clock time. Unfortunately, this isn’t always the case - and we’re left with tests that work perfectly when we run them in development, but which fail (even if only rarely, and for no obvious reason) when run automatically.
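One common remedy is to inject the clock rather than read it directly, so the test controls time completely. A minimal Python sketch (the `SessionToken` class and its parameters are hypothetical):

```python
import datetime


class SessionToken:
    # Hypothetical expiring token. The clock is a constructor parameter,
    # defaulting to the real clock in production but replaceable in tests.
    def __init__(self, issued_at, lifetime_seconds, clock=datetime.datetime.now):
        self.issued_at = issued_at
        self.lifetime = datetime.timedelta(seconds=lifetime_seconds)
        self._clock = clock

    def is_expired(self):
        return self._clock() - self.issued_at > self.lifetime


def test_token_expires_after_its_lifetime():
    start = datetime.datetime(2024, 1, 1, 2, 0, 0)
    # A fake clock that returns a fixed instant: the test result no longer
    # depends on when (or how slowly) the suite happens to run.
    fake_now = start + datetime.timedelta(seconds=61)
    token = SessionToken(start, 60, clock=lambda: fake_now)
    assert token.is_expired()
```

A test written this way cannot be broken by a slow machine, a suspended VM, or an unlucky 2am run, because wall-clock time never enters into it.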
I saw this with a test suite that would only fail if it ran at 2am. After a lot of investigative work, we discovered that the host virtual machine was suspended for several seconds while a snapshot was taken for backup, and that was enough to break the tests.
Are my tests actually unit tests?
Is the test actually a fully isolated unit test or has a dependency crept in over time?
A friend told me about an interesting case where a set of hard-coded categories was used by a particular set of unit tests. When those categories were moved from code into data, the unit tests started quietly loading their categories from the database. They ran just fine (if a little slowly), except when certain other tests modified the database at just the wrong time.
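The usual fix is to make the data source an explicit dependency and hand the unit test an in-memory stub, so the database can never creep back in. A hedged Python sketch (the `CategoryStore` interface and `classify` function are hypothetical stand-ins for the real code):

```python
class CategoryStore:
    # Hypothetical interface: production code would implement this against
    # the database; unit tests supply the in-memory version below.
    def load_categories(self):
        raise NotImplementedError


class InMemoryCategoryStore(CategoryStore):
    def __init__(self, categories):
        self._categories = list(categories)

    def load_categories(self):
        return list(self._categories)


def classify(item_name, store):
    # Returns the first known category whose name appears in the item name.
    for category in store.load_categories():
        if category in item_name:
            return category
    return "uncategorised"


def test_classify_uses_only_the_supplied_categories():
    store = InMemoryCategoryStore(["fruit", "vegetable"])
    assert classify("fruit basket", store) == "fruit"
    assert classify("mystery box", store) == "uncategorised"
```

With the store passed in explicitly, the test is fast, deterministic, and immune to whatever other tests happen to be doing to the database.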
The speed of your tests is an important metric to monitor - if tests suddenly start running slowly, or if they never ran very quickly to start with, investigate to see how you can make them run faster.
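If your test runner doesn’t report per-test timings, a simple decorator can flag offenders. A minimal Python sketch (the threshold and names are illustrative, not a recommendation):

```python
import functools
import time

SLOW_THRESHOLD_SECONDS = 0.1  # hypothetical per-test budget


def report_if_slow(test_fn):
    # Wraps a test, measures how long it takes, and reports any test
    # that exceeds the budget - even if the test itself passes.
    @functools.wraps(test_fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return test_fn(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            if elapsed > SLOW_THRESHOLD_SECONDS:
                print(f"SLOW TEST: {test_fn.__name__} took {elapsed:.3f}s")
    return wrapper


@report_if_slow
def test_sum_of_first_thousand_integers():
    assert sum(range(1000)) == 499500
```

A sudden crop of "SLOW TEST" reports is exactly the early warning described above: a sign that a hidden dependency, or a test that is no longer a unit test, may have crept in.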