Following on from our discussion on extension methods, another technique you can use when eliminating the dumping ground of your utility class is the extraction of buried semantic types. This is possible when you find a set of closely related methods with linked semantics.

Imagine you find that your 15k line utility class contains all of the following methods:

// Check that a series id is valid, throwing an exception if it is not
public static void ValidateSeriesId(string seriesId) { ... }

// Compare two series ids for sorting
public static int CompareSeriesIds(string left, string right) { ... }

// Test to see if two series ids are equal
public static bool EqualSeriesIds(string left, string right) { ... }

// Convert a series id into canonical form
public static string CanonicalId(string seriesId) { ... }

You can probably see the common theme already when the methods are grouped together like this. In the real world, where they are interspersed with all the other utility methods and separated by hundreds or thousands of lines of unrelated code, spotting the theme is far more challenging.

Instead of this group of methods that all treat a string as a seriesId, consider creating a genuine semantic type like this:

public struct SeriesId : IEquatable<SeriesId>, IComparable<SeriesId>
{
    // The actual series id we wrap
    private readonly string _id;

    // Create a new series id, throwing an exception if the id is invalid
    public SeriesId(string id) { ... }

    // Test to see if this series id equals another
    public bool Equals(SeriesId other) { ... }

    // Compare with another series id for sorting
    public int CompareTo(SeriesId other) { ... }

    // Test to see if this series id equals an arbitrary object
    public override bool Equals(object obj) { ... }

    // Get a hashcode based on our id
    public override int GetHashCode() { ... }

    // Return a string value for output
    public override string ToString() { ... }

    // Convert a string id to our canonical form, used during construction
    // (static, because a struct constructor can't call instance methods
    // until every field has been assigned)
    private static string ToCanonicalForm(string id) { ... }
}

This semantic type wraps all the functionality of the previous methods, but in a form that interoperates richly with the rest of the .NET ecosystem.
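To make the outline concrete, here is a minimal sketch of how the elided bodies might be filled in. The validation rule (reject blank ids) and the canonical form (trimmed, upper-cased) are assumptions made purely for illustration; your real series ids will have their own rules.

```csharp
using System;

public struct SeriesId : IEquatable<SeriesId>, IComparable<SeriesId>
{
    // The canonical id we wrap; null only for default(SeriesId)
    private readonly string _id;

    // Create a new series id, throwing an exception if the id is invalid
    // (assumed rule for this sketch: the id must not be blank)
    public SeriesId(string id)
    {
        if (string.IsNullOrWhiteSpace(id))
        {
            throw new ArgumentException("A series id must not be blank.", nameof(id));
        }

        _id = ToCanonicalForm(id);
    }

    // Ids are equal when their canonical forms match exactly
    public bool Equals(SeriesId other)
        => string.Equals(_id, other._id, StringComparison.Ordinal);

    // Sort by the canonical form, using ordinal comparison
    public int CompareTo(SeriesId other)
        => string.CompareOrdinal(_id, other._id);

    public override bool Equals(object obj)
        => obj is SeriesId other && Equals(other);

    public override int GetHashCode()
        => _id == null ? 0 : StringComparer.Ordinal.GetHashCode(_id);

    public override string ToString() => _id ?? string.Empty;

    // Assumed canonical form for this sketch: trimmed and upper-cased
    private static string ToCanonicalForm(string id)
        => id.Trim().ToUpperInvariant();
}
```

The null checks are there because default(SeriesId) never runs the constructor, so the wrapped string can still be null in that one case.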

Introducing such a semantic type has a number of strong benefits.

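  • Validation is enforced by the constructor, so every series id you construct is known to be valid. You don’t need to rely on the string being independently validated by a call to ValidateSeriesId at every possible entry point of the system. (One caveat: because SeriesId is a struct, default(SeriesId) bypasses the constructor, so the type’s methods need to tolerate an empty id.)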

  • Every series id is forced into canonical form automatically by the constructor. You don’t have to worry that a new API entry point might bypass conversion into canonical form and allow malformed data into the system.

  • Testing for equality is baked in, ensuring that the entire system does things in the same way. You won’t have one code path correctly using EqualSeriesIds and another using String.Equals() instead.

  • Similarly, ordering is baked in, ensuring that all sorted lists of series ids are sorted the same way.

  • It’s no longer possible to confuse a series id with another kind of identifier - while you can pass any string value to any string argument, you can’t pass a string value to a SeriesId argument, nor can you pass a SeriesId to a string argument. (For this to work, you need to resist the temptation to allow implicit conversion from a string to a SeriesId or back again.)
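A short sketch of what these benefits look like in calling code. It carries a condensed copy of SeriesId so that it compiles on its own; as before, the blank check and the trimmed, upper-cased canonical form are assumptions for illustration, and Demo is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public struct SeriesId : IEquatable<SeriesId>, IComparable<SeriesId>
{
    private readonly string _id;

    public SeriesId(string id)
    {
        if (string.IsNullOrWhiteSpace(id))
            throw new ArgumentException("A series id must not be blank.", nameof(id));
        _id = id.Trim().ToUpperInvariant(); // assumed canonical form
    }

    public bool Equals(SeriesId other) => string.Equals(_id, other._id, StringComparison.Ordinal);
    public int CompareTo(SeriesId other) => string.CompareOrdinal(_id, other._id);
    public override bool Equals(object obj) => obj is SeriesId other && Equals(other);
    public override int GetHashCode() => _id == null ? 0 : StringComparer.Ordinal.GetHashCode(_id);
    public override string ToString() => _id ?? string.Empty;
}

public static class Demo
{
    public static void Main()
    {
        var ids = new List<SeriesId>
        {
            new SeriesId("b-2"),
            new SeriesId(" A-1 "),
            new SeriesId("a-1"),
        };

        ids.Sort();                              // uses SeriesId.CompareTo
        var unique = new HashSet<SeriesId>(ids); // uses Equals and GetHashCode

        Console.WriteLine(string.Join(", ", ids)); // prints "A-1, A-1, B-2"
        Console.WriteLine(unique.Count);           // prints "2"

        // string raw = ids[0];      // does not compile: no implicit conversion
        // SeriesId typed = "a-1";   // does not compile either
    }
}
```

Because the collections pick up IComparable<SeriesId> and IEquatable<SeriesId> automatically, no caller has to remember which comparison helper to use.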

Adopting a semantic type like this doesn’t have to be a big bang change that happens all at once. Instead, pick a boundary of your system and start using the semantic type there, passing it as far into the core of your system as makes sense for the current release. In the next release, extend its use further.

Starting at the edge of the system like this ensures the validation and verification you get from the semantic type is applied as early in the flow of operation as possible. This can protect other parts of your system from dodgy data.
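As a rough illustration of converting at the boundary, the sketch below wraps the raw string once at a hypothetical entry point and hands only a SeriesId to the code behind it. SeriesApi, SeriesCatalog, the response strings, and the condensed SeriesId (blank check, trimmed and upper-cased canonical form) are all invented for this example.

```csharp
using System;

public struct SeriesId
{
    private readonly string _id;

    public SeriesId(string id)
    {
        if (string.IsNullOrWhiteSpace(id))
            throw new ArgumentException("A series id must not be blank.", nameof(id));
        _id = id.Trim().ToUpperInvariant(); // assumed canonical form
    }

    public override string ToString() => _id ?? string.Empty;
}

// Hypothetical core of the system: it only ever sees validated ids
public static class SeriesCatalog
{
    public static string Describe(SeriesId id) => "Series " + id;
}

// Hypothetical entry point at the edge of the system
public static class SeriesApi
{
    public static string GetSeries(string rawSeriesId)
    {
        SeriesId id;
        try
        {
            // The only place the raw string is validated and canonicalised
            id = new SeriesId(rawSeriesId);
        }
        catch (ArgumentException ex)
        {
            return "400 Bad Request: " + ex.Message;
        }

        return SeriesCatalog.Describe(id);
    }
}
```

Code that still expects a string can call ToString() on the id for now, and be migrated to accept SeriesId in a later release.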

When I’ve done this in the past, it has always flushed out lurking problems in the system - places where invalid data was being admitted into the system, or where there were inconsistencies in the way information was being sorted or compared.

