Why we need better validation

The state of validation in C# leaves something to be desired. Recently I was writing some code in the same old classic styles and the frustration got to me - surely there must be a better approach that doesn’t involve great lists of strings, the use of exceptions for flow control, and brittle unit tests.

To illustrate, assume we have this simple class that we want to validate:

public class Person
{
    public string FullName { get; set; }
    public string FamilyName { get; set; }
    public string KnownAs { get; set; }
    public string SortKey { get; set; }
}

If all four of these properties are mandatory, how would you do that validation?

One common approach is to write a method that throws an exception if anything is awry, like this:

public void Validate()
{
    if (string.IsNullOrEmpty(FullName))
    {
        throw new InvalidDataException(
            "Mandatory property 'FullName' not supplied.");
    }

    if (string.IsNullOrEmpty(FamilyName))
    {
        throw new InvalidDataException(
            "Mandatory property 'FamilyName' not supplied.");
    }

    if (string.IsNullOrEmpty(KnownAs))
    {
        throw new InvalidDataException(
            "Mandatory property 'KnownAs' not supplied.");
    }

    if (string.IsNullOrEmpty(SortKey))
    {
        throw new InvalidDataException(
            "Mandatory property 'SortKey' not supplied.");
    }
}

There are several problems with this approach.

Using exceptions for flow control;
Only ever get a single message;
Unit testing is difficult;
Violates the single responsibility principle.

Our systems should be validating all data as it comes into the system - and in many cases we should also be validating things as they cross component boundaries as well. Having information fail validation isn’t exceptional, it’s damn right common - so it’s an ill fit to handle it by firing an exception. This is part of the reason that use of exceptions for flow control is widely considered to be very poor practice.

When things do fail validation, we are only getting a single message back, telling us only about the first problem we encounter. Imagine if we took our car in for a Warrant of Fitness and it failed because the right-rear taillight didn’t work. So we drive down to our local auto store, buy a new lightbulb to fix the taillight, then take it back for a recheck - only to be told that we’ve now failed the WOF because the other taillight doesn’t work. We fix that, go back for another recheck, and now we fail because our tires are worn! I’d be livid if this happened to me - and this illustrates why validation should always return a comprehensive list of issues.

Writing unit tests to ensure that validation is working correctly is pretty important - these are the checks that protect the rest of our system from dodgy, potentially malicious, inputs. Yet testing of validation written in this style is very difficult because we need to craft our test data so that it passes all the checks prior to the one that we need to test. For example, to properly test validation of the SortKey property, we first need to craft an instance of Person that passes all of the prior tests for FullName, FamilyName and KnownAs. If a future developer reorders the validation checks (perhaps to complete all the fast/cheap tests before commiting to expensive service calls), many of the tests would break. Having a test suite this brittle is far from optimal.

All of the property checks for Person are conflated together into a single validation check. For a simple class like this one, that might be acceptable - but when the validation becomes more complicated, you can easily end up with a very large method that does a lot of different checks. I’ve seen code like this that clearly grew organically over time - in one case a method that was several hundred lines long that did some of the same checks multiple times, in different ways, presumably because a later developer just added rules a the end of the method without checking to see if they were already there!

We can avoid the use of exceptions for flow control by reworking the validation code:

public bool Validate(out string message)
{
    if (string.IsNullOrEmpty(FullName))
    {
        message = "Mandatory property 'FullName' not supplied.";
        return false;
    }

    if (string.IsNullOrEmpty(FamilyName))
    {
        message = "Mandatory property 'FamilyName' not supplied.";
        return false;
    }

    if (string.IsNullOrEmpty(KnownAs))
    {
        message = "Mandatory property 'KnownAs' not supplied.";
        return false;
    }

    if (string.IsNullOrEmpty(SortKey))
    {
        message = "Mandatory property 'SortKey' not supplied.";
        return false;
    }

    message = string.Empty;
    return true;
}

This is somewhat better, but still suffers from the other problems described above.

We can enable robust unit testing for each distinct property by separating validation into separate methods. We can also return multiple validation results by using IEnumerable<string> as our return type, giving us code like this:

public IEnumerable<string> Validate()
{
    var result = new List<string>();

    result.AddRange(ValidateFullName());
    result.AddRange(ValidateFamilyName());
    result.AddRange(ValidateKnownAs());
    result.AddRange(ValidateSortKey());

    return result;
}

private IEnumerable<string> ValidateFullName()
{
    if (string.IsNullOrEmpty(FullName))
    {
        yield return "Mandatory property 'FullName' not supplied.";
    }
}

private IEnumerable<string> ValidateFamilyName()
{
    if (string.IsNullOrEmpty(FamilyName))
    {
        yield return "Mandatory property 'FamilyName' not supplied.";
    }
}

private IEnumerable<string> ValidateKnownAs()
{
    if (string.IsNullOrEmpty(KnownAs))
    {
        yield return "Mandatory property 'KnownAs' not supplied.";
    }
}

private IEnumerable<string> ValidateSortKey()
{
    if (string.IsNullOrEmpty(SortKey))
    {
        yield return "Mandatory property 'SortKey' not supplied.";
    }
}

But. by addressing our initial concerns, we’ve introduced a number of other problems.

The amount of boilerplate code in this style is very high - each of the individual property validation methods is very much like all of the others; it would be very easy to make a simple mistake when writing these - all it would take would be a copy/paste error where the property being checked isn’t updated, or the message isn’t modified. In the top level Validate method, the overhead of joining all the individual property checks together completely swamps the code.

We’re also pushing an increased cognative load on our consumers - they need to understand that an empty return sequence implies a lack of problems. While this is easily discerned, it’s not completely obvious - it’s something else that our consumers need to learn.

There must be a better way.

With a little bit of work, we can push all of the boilerplate to the side, allowing us to focus on the actual validation that’s required - for example, the individual methods for checking each property might look like this:

private ValidationResult ValidateFullName()
    => Validation.ErrorWhen(
        string.IsNullOrEmpty(FullName),
        "Mandatory property 'FullName' not supplied.");

private ValidationResult ValidateFamilyName()
    => Validation.ErrorWhen(string.IsNullOrEmpty(FamilyName),
        "Mandatory property 'FamilyName' not supplied.");

private ValidationResult ValidateKnownAs()
    => Validation.ErrorWhen(
        string.IsNullOrEmpty(KnownAs),
        "Mandatory property 'KnownAs' not supplied.");

private ValidationResult ValidateSortKey()
    => Validation.ErrorWhen(
        string.IsNullOrEmpty(SortKey),
        "Mandatory property 'SortKey' not supplied.");

Even though you haven’t yet seen any of the implementation, I’m hoping that the intent of this code is fairly straightforward to discern.

For each of the individual properties, we have a single declarative check ErrorWhen() that returns an error if the property hasn’t been supplied. Each individual method is independently unit testable.

What do you think of this style of validation? How would you implement a simple library to provide this API in a clean and easy to understand fashion?

Instance validation in C#
Why we need better validation

About this series

Posts in this series

Comments

Instance validation in C#Why we need better validation

About this series

Posts in this series

Comments

Instance validation in C#
Why we need better validation