As anyone who has seen my presentation Becoming a Better Developer will know, the anti-pattern known as primitive obsession describes a practice that encourages the proliferation of bugs. The best way to counter this problem is to introduce semantic types.
Existing Examples
The .NET Framework itself contains a number of semantic types that we use all the time without really considering the benefits. Two of them are FileInfo and DateTime.
The FileInfo class captures information about a file. If the file exists, we can access information about it, such as its location and size. But we're not required to use FileInfo only when the file already exists: we can construct a new FileInfo to represent a file we are going to create, or even a file we're looking to find.
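To illustrate, here is a minimal sketch that takes a FileInfo for a file that may not exist yet and creates it on demand. The ReportFiles class and EnsureReport method are hypothetical names used only for this example.

```csharp
using System.IO;

public static class ReportFiles
{
    // Hypothetical helper: returns the FileInfo, creating the file first if needed.
    public static FileInfo EnsureReport(FileInfo report, string initialContent)
    {
        // A FileInfo can describe a file that isn't on disk yet; Exists is simply false.
        if (!report.Exists)
        {
            File.WriteAllText(report.FullName, initialContent);

            // FileInfo caches its state once read, so refresh it after creating the file.
            report.Refresh();
        }
        return report;
    }
}
```

Note that properties such as Length throw if the file doesn't exist, while Exists reports the situation without throwing.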
Similarly, the DateTime struct captures information about a particular instant in time. The designers of the .NET Framework didn't take the approach of Excel, which simply uses a double to represent a timestamp; perhaps they realised the flaws in that approach.
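The contrast can be sketched using the OLE Automation date format that .NET exposes through DateTime.ToOADate(): a bare double counting days since 30 December 1899, which lines up with Excel's serial dates for modern dates. TimestampDemo is a hypothetical name for this illustration.

```csharp
using System;
using System.Globalization;

public static class TimestampDemo
{
    // Excel-style: just a number of days since 30 December 1899.
    // Nothing stops this double being confused with a price, a distance or a count.
    public static double ExcelStyleTimestamp(DateTime when) => when.ToOADate();

    // .NET style: the DateTime carries its meaning, so calendar facts are available directly.
    public static string Describe(DateTime when) =>
        when.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture) + " is a " + when.DayOfWeek;
}
```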
Why use Semantic Types?
The key idea behind a semantic type is to isolate values of one kind, explicitly declaring that they are a distinct thing unto themselves.
Why would you do this?
Consider this example from my presentation:
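A sketch of the kind of method in question follows; the ReportWriter class, the SaveData name and its body are assumptions for illustration rather than the listing from the presentation.

```csharp
using System.IO;

public class ReportWriter
{
    // Which strings are acceptable here? A file path? A URL? An email address?
    // The signature doesn't say; it accepts any string at all.
    public void SaveData(string destination)
    {
        File.WriteAllText(destination, "report contents");
    }
}
```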
By using a parameter of type string, this method becomes somewhat ill-defined. The obvious problem is that you can pass any kind of string: a username, phone number, email address or ISBN. More subtly, what kind of destination is expected? Sure, it might support a file path, but what happens if you supply a file:// or http:// URL?
Being explicit about the required type makes it clear that the method is expecting a file path:
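A minimal sketch of the same hypothetical SaveData method, now taking a FileInfo so the signature itself says a local file is expected:

```csharp
using System.IO;

public class ReportWriter
{
    // The parameter type documents (and enforces) that a file is expected.
    public void SaveData(FileInfo destination)
    {
        File.WriteAllText(destination.FullName, "report contents");
    }
}
```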
It also allows for an overload to add explicit URL support:
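One way such an overload might look, sketched with the framework's Uri type. As an assumption to keep the example self-contained, only file:// URIs are handled (by delegating to the FileInfo overload); other schemes would need a real transport, so this sketch rejects them.

```csharp
using System;
using System.IO;

public class ReportWriter
{
    public void SaveData(FileInfo destination)
    {
        File.WriteAllText(destination.FullName, "report contents");
    }

    // Overload that makes URL support explicit in the method's signature.
    public void SaveData(Uri destination)
    {
        if (destination.IsFile)
        {
            // file:// URIs map straight onto a local path.
            SaveData(new FileInfo(destination.LocalPath));
            return;
        }

        // http, ftp and friends are not implemented in this sketch.
        throw new NotSupportedException("Unsupported scheme: " + destination.Scheme);
    }
}
```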
Defining your own
Writing your own semantic types has never been especially difficult, but the new language features in C# 6 make it easier still.
Consider a software system where every organisation has a unique identifying code, known as the Organisation Code, that is used throughout. For example, the “Toyota Motor Corporation” has the code “TOYOTA”.
A very simple class to represent this code might look like this:
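A minimal sketch of such a class, using a C# 6 getter-only auto-property so the code can only be set in the constructor:

```csharp
public class OrganisationCode
{
    // Getter-only auto-property (C# 6): assignable only from the constructor.
    public string Code { get; }

    public OrganisationCode(string code)
    {
        Code = code;
    }
}
```

At this stage two instances wrapping "TOYOTA" are still compared by reference, so they are not considered equal.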
Note that this is an immutable class, using the new C# 6 syntax for a read-only (getter-only) auto-property.
While this is a good start, it’s not sufficient. To interact correctly with other classes in the .NET Framework, such as List<T> and HashSet<T>, our semantic class needs to override Equals() and GetHashCode().
Similarly, for it to work with String.Format, it needs to override ToString().
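A sketch of those three overrides follows. It assumes codes compare with case-sensitive ordinal semantics (an assumption; your own codes may have different rules) and seals the class so that equality can't be muddied by subclasses. It also implements IEquatable<OrganisationCode>, which generic collections pick up automatically.

```csharp
using System;

public sealed class OrganisationCode : IEquatable<OrganisationCode>
{
    public string Code { get; }

    public OrganisationCode(string code)
    {
        Code = code;
    }

    // Two codes are equal when they wrap the same string (ordinal, case-sensitive).
    public bool Equals(OrganisationCode other)
    {
        if (ReferenceEquals(other, null))
        {
            return false;
        }
        return string.Equals(Code, other.Code, StringComparison.Ordinal);
    }

    // Route the object overload through the strongly typed one.
    public override bool Equals(object obj) => Equals(obj as OrganisationCode);

    // Must agree with Equals: equal codes always produce equal hash codes.
    public override int GetHashCode() =>
        Code == null ? 0 : StringComparer.Ordinal.GetHashCode(Code);

    // Lets String.Format and string interpolation show the raw code.
    public override string ToString() => Code;
}
```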
What else do we need?
Protecting our system from invalid data is important. Defense in depth is a worthwhile approach, one that can make our system resilient in the face of attack, even when tainted data makes it past our surface layers.
Let’s add a method we can use to check the validity of a code, and then use that in our constructor to ensure that we never wrap an invalid code.
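A sketch of the validation part on its own (equality members omitted for brevity). The rule used here, one to ten uppercase ASCII letters or digits, is a hypothetical stand-in for whatever your real organisation codes require.

```csharp
using System;

public sealed class OrganisationCode
{
    public string Code { get; }

    public OrganisationCode(string code)
    {
        // Refuse to wrap anything invalid, so every instance is known to be good.
        if (!IsValid(code))
        {
            throw new ArgumentException($"'{code}' is not a valid organisation code.", nameof(code));
        }
        Code = code;
    }

    // Hypothetical rule: 1 to 10 characters, uppercase ASCII letters or digits only.
    public static bool IsValid(string code)
    {
        if (string.IsNullOrEmpty(code) || code.Length > 10)
        {
            return false;
        }

        foreach (var ch in code)
        {
            bool isUpper = ch >= 'A' && ch <= 'Z';
            bool isDigit = ch >= '0' && ch <= '9';
            if (!isUpper && !isDigit)
            {
                return false;
            }
        }
        return true;
    }
}
```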
Here we have a well-functioning .NET class, one that meets our requirements. We can now use the class as a parameter and rely on the compiler to ensure that we don’t pass the wrong thing:
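A sketch of a consumer, using a hypothetical CompanyDirectory with a single hard-coded entry. OrganisationCode is cut down to a minimal stand-in so the example compiles on its own.

```csharp
using System.Collections.Generic;

// Minimal stand-in for OrganisationCode (validation and equality omitted).
public sealed class OrganisationCode
{
    public string Code { get; }
    public OrganisationCode(string code) { Code = code; }
}

// Hypothetical lookup service, for illustration only.
public static class CompanyDirectory
{
    private static readonly Dictionary<string, string> Names =
        new Dictionary<string, string>
        {
            ["TOYOTA"] = "Toyota Motor Corporation"
        };

    // Only an OrganisationCode is accepted: passing a username, email address
    // or any other plain string is a compile-time error.
    public static string FindCompanyName(OrganisationCode code)
    {
        string name;
        return Names.TryGetValue(code.Code, out name) ? name : null;
    }
}
```

Calling it then looks like CompanyDirectory.FindCompanyName(new OrganisationCode("TOYOTA")).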
However, having to write new OrganisationCode(...) at every call site looks a bit cumbersome. Fortunately, we can do better by introducing support for a couple of typecasts.
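A sketch of those typecasts as explicit conversion operators. Providing one in each direction (string to code, and code back to string) is an assumption about what "a couple" means here; the non-empty check stands in for the full IsValid rule.

```csharp
using System;

public sealed class OrganisationCode
{
    public string Code { get; }

    public OrganisationCode(string code)
    {
        // Stand-in for the full IsValid check.
        if (string.IsNullOrEmpty(code))
        {
            throw new ArgumentException("An organisation code is required.", nameof(code));
        }
        Code = code;
    }

    // string -> OrganisationCode: explicit, so callers must ask for it with a cast.
    public static explicit operator OrganisationCode(string code) => new OrganisationCode(code);

    // OrganisationCode -> string: also explicit; a null code converts to a null string.
    public static explicit operator string(OrganisationCode code) => code?.Code;

    public override string ToString() => Code;
}
```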
With these in place, converting a string into an OrganisationCode looks like this:
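A sketch of the call site, reusing the hypothetical CompanyDirectory lookup with a cut-down OrganisationCode so the example stands alone:

```csharp
using System.Collections.Generic;

public sealed class OrganisationCode
{
    public string Code { get; }
    public OrganisationCode(string code) { Code = code; }

    // Explicit conversion from string.
    public static explicit operator OrganisationCode(string code) => new OrganisationCode(code);
}

// Hypothetical lookup service, for illustration only.
public static class CompanyDirectory
{
    private static readonly Dictionary<string, string> Names =
        new Dictionary<string, string> { ["TOYOTA"] = "Toyota Motor Corporation" };

    public static string FindCompanyName(OrganisationCode code)
    {
        string name;
        return Names.TryGetValue(code.Code, out name) ? name : null;
    }
}

public static class Example
{
    public static string LookUpToyota()
    {
        // The cast keeps the conversion visible and deliberate at the call site.
        return CompanyDirectory.FindCompanyName((OrganisationCode)"TOYOTA");
    }
}
```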
Questions and Answers
Isn’t this a lot of work?
While it’s a non-trivial amount of code to write, I’d suggest that the clarity semantic types bring as documentation, coupled with the bugs eliminated through the enforcement of strong typing, works out to a net positive.
What about implicit conversions?
If you make the explicit conversions implicit, you get rid of the typecasts, but you also defeat the enforcement of strong typing, reopening the door to easy mistakes where the wrong value is passed as an argument.
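To show the risk, here is a sketch with an implicit operator instead. The Mistake class and the phone number are hypothetical; the point is that the bad call compiles without complaint, whereas with an explicit operator it would be rejected by the compiler.

```csharp
public sealed class OrganisationCode
{
    public string Code { get; }
    public OrganisationCode(string code) { Code = code; }

    // Implicit: no cast required, so any string converts silently.
    public static implicit operator OrganisationCode(string code) => new OrganisationCode(code);
}

public static class CompanyDirectory
{
    public static bool IsKnown(OrganisationCode code) => code.Code == "TOYOTA";
}

public static class Mistake
{
    public static bool CheckPhoneNumber()
    {
        string phoneNumber = "+64 9 555 0123";

        // Compiles: the phone number is quietly converted into an OrganisationCode.
        return CompanyDirectory.IsKnown(phoneNumber);
    }
}
```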
Wouldn’t a struct perform better?
The case can be made that using a struct for small semantic types instead of a class would be a good idea because it lessens the number of heap allocations, and therefore reduces the amount of work required of the garbage collector. In most cases, I suspect the actual performance impact would be nigh on immeasurable, with any differences swamped by other effects. The key, as always with performance issues, is to rely on objective performance measures, not subjective handwavium. If you think a struct would perform better, measure the results and find out for sure.
Updated 28/9 - fixed a bug in the implementation of GetHashCode()