As anyone who has seen my presentation Becoming a Better Developer will know, the anti-pattern known as primitive obsession describes a practice that encourages the proliferation of bugs. The best way to counter this problem is to introduce semantic types.

Existing Examples

The .NET framework itself contains a number of semantic types that we use all the time without really considering the benefits - two are FileInfo and DateTime.

The FileInfo class captures information about a file. If the file exists, we can access information about that file, such as its location and size. But we’re not limited to using FileInfo for files that already exist - we can construct a new FileInfo to represent a file we are going to create, or even a file we’re looking to find.
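As a small illustration, the sketch below uses a hypothetical FileInfoDemo helper to describe a file whether or not it exists yet. Name and FullName are always available, while Length throws FileNotFoundException for a missing file, so the helper only reads it once Exists reports true.

```csharp
using System;
using System.IO;

// Hypothetical helper, for illustration only.
public static class FileInfoDemo
{
    // Describes a file that may or may not exist yet.
    public static string Describe(FileInfo file)
    {
        // Name and FullName work for a missing file; Length would throw
        // FileNotFoundException, so it is only read when Exists is true.
        return file.Exists
            ? $"{file.Name} exists, {file.Length} bytes"
            : $"{file.Name} does not exist yet (would be created at {file.FullName})";
    }
}
```

Note that FileInfo caches what it knows about the file, so after creating the file you need to call Refresh() before Exists and Length reflect the change.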

Similarly, the DateTime struct captures information about a particular instant in time. The designers of the .NET framework didn’t follow Excel’s approach of simply using a double to represent a timestamp - perhaps they realised the flaws in that approach.
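To make the contrast concrete, .NET can convert between the two representations: DateTime.ToOADate and DateTime.FromOADate work with OLE Automation dates, a double counting days since midnight on 30 December 1899, which lines up with Excel’s serial date numbers for dates from March 1900 onwards. The DateRepresentations wrapper is hypothetical; the two DateTime methods are real.

```csharp
using System;

// Hypothetical wrapper around the real DateTime.ToOADate / DateTime.FromOADate methods.
public static class DateRepresentations
{
    // A bare double: nothing in the type says this is a date, or which epoch it counts from.
    public static double AsOADate(DateTime instant)
    {
        return instant.ToOADate();
    }

    // Converting back recovers a DateTime with named parts (Year, Month, Day, Hour, ...).
    public static DateTime FromOADate(double serial)
    {
        return DateTime.FromOADate(serial);
    }
}
```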

Why use Semantic Types?

The key idea behind a semantic type is to isolate values of one kind, declaring explicitly that they are a distinct thing in their own right rather than just another string or number.

Why would you do this?

Consider this example from my presentation:

public void SaveAsHtml(string destination)
{
    // ...	
}

By using a parameter of type string, this method becomes somewhat ill defined. The obvious problem is that you can pass any kind of string - a username, phone number, email address or ISBN. More subtly, what kind of destination is expected? Sure, it might support a file path - but what happens if you supply a file:// or http:// URL?

Being explicit about the required type makes it clear that the method is expecting a file path:

public void SaveAsHtml(FileInfo destination)
{
    // ...	
}

It also allows for an overload to add explicit URL support:

public void SaveAsHtml(Uri destination)
{
    // ...	
}
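A minimal sketch of how the two overloads might sit side by side, assuming a hypothetical Report class. The bodies just return a description instead of writing HTML, so the only point being made is which overload the compiler selects.

```csharp
using System;
using System.IO;

// Hypothetical class: only the overload signatures come from the article.
public class Report
{
    // Selected at compile time when the caller passes a FileInfo.
    public string SaveAsHtml(FileInfo destination)
    {
        return "file:" + destination.Name;
    }

    // Selected at compile time when the caller passes a System.Uri.
    public string SaveAsHtml(Uri destination)
    {
        return "uri:" + destination.Host;
    }
}
```

A call such as report.SaveAsHtml("out.html") no longer compiles, because there is no implicit conversion from string to either FileInfo or Uri; the caller has to state which kind of destination they mean.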

Defining your own

Writing your own semantic types has never been particularly difficult, and the new language features in C# 6 make it easier still.

Consider a software system in which every organisation has a unique identifying code, known as the Organisation Code, that is used throughout. So, the “Toyota Motor Corporation” has the code “TOYOTA”.

A very simple class to represent this code might look like this:

public sealed class OrganisationCode
{
    public string Code { get; }
        
    public OrganisationCode(string code)
    {
        Code = code;
    }
}

Note that this is an immutable class, using the new C# 6 syntax for a getter-only (read-only) auto-property.
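It’s worth seeing why this simple class isn’t enough on its own. In the sketch below (the ReferenceEqualityDemo helper is hypothetical), two instances wrapping “TOYOTA” count as different entries in a HashSet<T>, because a class inherits reference equality from object unless it overrides Equals() and GetHashCode().

```csharp
using System;
using System.Collections.Generic;

// The simple class from above, repeated so this sketch compiles on its own.
public sealed class OrganisationCode
{
    public string Code { get; }

    public OrganisationCode(string code)
    {
        Code = code;
    }
}

// Hypothetical helper, for illustration only.
public static class ReferenceEqualityDemo
{
    // HashSet's default comparer falls back to object.Equals/GetHashCode,
    // which for this class means reference identity.
    public static int CountDistinct(params string[] codes)
    {
        var set = new HashSet<OrganisationCode>();
        foreach (var code in codes)
        {
            set.Add(new OrganisationCode(code));
        }
        return set.Count;
    }
}
```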

While this is a good start, it’s not sufficient. To interact correctly with other classes in the .NET framework, such as List<T> and HashSet<T>, our semantic class needs to override Equals() and GetHashCode(). Similarly, to display meaningfully when used with String.Format, it needs to override ToString().

public override string ToString()
{
    return "OrganisationCode:" + Code;
}

public override bool Equals(object instance)
{
    if (instance == null
        || instance.GetType() != typeof(OrganisationCode))
    {
        return false;
    }

    var other = (OrganisationCode)instance;
    return string.Equals(
        Code, 
        other.Code, 
        StringComparison.InvariantCultureIgnoreCase);
}

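public override int GetHashCode()
{
    // Hash with the same comparison Equals() uses, so equal codes always hash alike
    return StringComparer.InvariantCultureIgnoreCase.GetHashCode(Code);
}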
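Assembled into one class, the members above can be checked against the collections mentioned earlier. This is a sketch, not a definitive version: in particular, GetHashCode here uses StringComparer.InvariantCultureIgnoreCase, the same comparison Equals uses, so codes that compare equal are guaranteed to produce equal hash codes.

```csharp
using System;
using System.Collections.Generic;

public sealed class OrganisationCode
{
    public string Code { get; }

    public OrganisationCode(string code)
    {
        Code = code;
    }

    // Used by String.Format and string interpolation.
    public override string ToString()
    {
        return "OrganisationCode:" + Code;
    }

    // Value equality, ignoring case, as in the article.
    public override bool Equals(object instance)
    {
        if (instance == null
            || instance.GetType() != typeof(OrganisationCode))
        {
            return false;
        }

        var other = (OrganisationCode)instance;
        return string.Equals(
            Code,
            other.Code,
            StringComparison.InvariantCultureIgnoreCase);
    }

    // Must agree with Equals: same comparison, so equal codes hash alike.
    public override int GetHashCode()
    {
        return StringComparer.InvariantCultureIgnoreCase.GetHashCode(Code);
    }
}
```

With these overrides, HashSet<T> and List<T>.Contains both treat “TOYOTA” and “toyota” as the same organisation code.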

What else do we need?

Protecting our system from invalid data is important. Defense in depth is a worthwhile approach, one that can make our system resilient in the face of attack, even when tainted data makes it past our surface layers.

Let’s add a method we can use to check the validity of a code, and then use that in our constructor to ensure that we never wrap an invalid code.

public static bool IsValidCode(string code)
{
    // ...
}

public OrganisationCode(string code)
{
    if (!IsValidCode(code))
    {
        throw new ArgumentException(
            "Supplied organisation code does not meet business rules", 
            nameof(code));
    }

    Code = code;
}
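The business rules behind IsValidCode are elided above, so the following sketch assumes a purely hypothetical rule: a code is 2 to 10 ASCII letters or digits, starting with a letter. It shows the validation method and the validating constructor together in one self-contained class.

```csharp
using System;
using System.Text.RegularExpressions;

public sealed class OrganisationCode
{
    // HYPOTHETICAL business rule, for illustration only:
    // 2-10 ASCII letters or digits, starting with a letter.
    // \z (rather than $) ensures a trailing newline is rejected.
    private static readonly Regex ValidPattern =
        new Regex(@"^[A-Za-z][A-Za-z0-9]{1,9}\z");

    public string Code { get; }

    public static bool IsValidCode(string code)
    {
        // Regex.IsMatch throws on null, so treat null as invalid up front.
        return code != null && ValidPattern.IsMatch(code);
    }

    public OrganisationCode(string code)
    {
        // Fail fast: no OrganisationCode instance can ever hold an invalid value.
        if (!IsValidCode(code))
        {
            throw new ArgumentException(
                "Supplied organisation code does not meet business rules",
                nameof(code));
        }

        Code = code;
    }
}
```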

Here we have a well-functioning .NET class, one that meets our requirements. We can now use the class as a parameter and rely on the compiler to ensure that we don’t pass the wrong thing:

public Organisation FindByCode(OrganisationCode code)
{
    // ...
}

public void VeryImportantMethod(string code)
{
    if (!OrganisationCode.IsValidCode(code))
    {
        // Validation failed
        return;
    }

    var organisation = FindByCode(new OrganisationCode(code));
    if (organisation == null)
    {
        // Not found
        return;
    }

    // Do something useful and important
}

However, the line that calls new looks a bit cumbersome. Fortunately, we can do better by adding a couple of explicit conversion operators.

public static explicit operator OrganisationCode(string code)
{
    return new OrganisationCode(code);
}

public static explicit operator string (OrganisationCode code)
{
    // Null-conditional (C# 6): a null OrganisationCode converts to a null string
    return code?.Code;
}

With these in place, converting a string into an OrganisationCode looks like this:

var organisation = FindByCode((OrganisationCode)code);
if (organisation == null)
{
    // Not found
    return;
}
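The following self-contained sketch exercises both operators. Two assumptions differ from the snippets above: validation is reduced to a hypothetical null-or-whitespace check, and the string conversion uses the C# 6 null-conditional operator so that casting a null OrganisationCode yields null instead of throwing.

```csharp
using System;

public sealed class OrganisationCode
{
    public string Code { get; }

    public OrganisationCode(string code)
    {
        // HYPOTHETICAL minimal validation standing in for IsValidCode.
        if (string.IsNullOrWhiteSpace(code))
        {
            throw new ArgumentException(
                "Supplied organisation code does not meet business rules",
                nameof(code));
        }

        Code = code;
    }

    // string -> OrganisationCode: the cast runs the constructor, so validation still applies.
    public static explicit operator OrganisationCode(string code)
    {
        return new OrganisationCode(code);
    }

    // OrganisationCode -> string: unwraps the value; a null reference converts to null.
    public static explicit operator string(OrganisationCode code)
    {
        return code?.Code;
    }
}
```

Because the cast simply calls the constructor, an invalid string still fails at the point of conversion rather than deeper inside the system.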

Questions and Answers

Isn’t this a lot of work?

While it’s a non-trivial amount of code to write, I’d suggest that the clarity semantic classes bring as documentation, combined with the bugs eliminated by the compiler’s enforcement of strong typing, works out to a net positive.

What about implicit conversions?

If you modify the explicit conversions and make them implicit, you get rid of the typecasts - and you also defeat the enforcement of strong types, reopening the door to easy mistakes where the wrong value is passed as an argument.
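To see what implicit conversions cost, consider this hypothetical LooseOrganisationCode variant and OrganisationLookup helper (both invented for illustration). Because the conversion is implicit, a username flows straight into FindByCode with no cast and no complaint from the compiler.

```csharp
using System;

// HYPOTHETICAL variant with an implicit conversion, to show what is lost.
public sealed class LooseOrganisationCode
{
    public string Code { get; }

    public LooseOrganisationCode(string code)
    {
        Code = code;
    }

    // Implicit: no cast needed, so ANY string silently becomes a "code".
    public static implicit operator LooseOrganisationCode(string code)
    {
        return new LooseOrganisationCode(code);
    }
}

// Hypothetical helper, for illustration only.
public static class OrganisationLookup
{
    public static string FindByCode(LooseOrganisationCode code)
    {
        return "looked up " + code.Code;
    }

    public static string LookUpUser(string userName)
    {
        // Compiles without complaint, even though a username is not an organisation code.
        return FindByCode(userName);
    }
}
```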

Wouldn’t a struct perform better?

The case can be made that using a struct for small semantic types instead of a class would be a good idea because it lessens the number of heap allocations, and therefore reduces the amount of work required of the garbage collector. In most cases, I suspect the actual performance impact would be nigh on immeasurable, with any differences swamped by other effects. The key, as always with performance issues, is to rely on objective performance measures, not subjective handwavium. If you think a struct would perform better, measure the results and find out for sure.
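For anyone who does want to measure (for example with a benchmarking library such as BenchmarkDotNet), a struct variant to compare against might look like the sketch below; the OrganisationCodeValue name is hypothetical. Implementing IEquatable<T> lets HashSet<T> and Dictionary<TKey,TValue> compare values without boxing. One caveat: default(OrganisationCodeValue) bypasses any constructor, so Code can be null and GetHashCode has to cope with that.

```csharp
using System;
using System.Collections.Generic;

// HYPOTHETICAL struct variant of OrganisationCode, for benchmarking against the class version.
public struct OrganisationCodeValue : IEquatable<OrganisationCodeValue>
{
    public string Code { get; }

    public OrganisationCodeValue(string code)
    {
        Code = code;
    }

    // Strongly typed Equals: picked up by EqualityComparer<T>.Default, avoiding boxing.
    public bool Equals(OrganisationCodeValue other)
    {
        return string.Equals(Code, other.Code, StringComparison.InvariantCultureIgnoreCase);
    }

    public override bool Equals(object obj)
    {
        return obj is OrganisationCodeValue && Equals((OrganisationCodeValue)obj);
    }

    public override int GetHashCode()
    {
        // default(OrganisationCodeValue) skips the constructor, leaving Code null.
        return Code == null ? 0 : StringComparer.InvariantCultureIgnoreCase.GetHashCode(Code);
    }

    public override string ToString()
    {
        return "OrganisationCode:" + Code;
    }
}
```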

Updated 28/9 - fixed a bug in the implementation of GetHashCode()
