At the opening of this series I wrote about how a correct implementation of equality is essential for the correct behaviour of many fundamental .NET types - including List, HashSet and Dictionary. Here’s an example to show how they can break.

Consider this test code:

[Fact]
public void SetContains_WhenGivenItemFromSet_ReturnsTrue()
{
    var voters = new HashSet<Voter>(GenerateVoters(1000));

    foreach (var v in voters)
    {
        var found = voters.Contains(v);
        Assert.True(found);
    }
}

This test generates a large list of Voter objects, puts them all into a HashSet and then checks that all of the items in the set are really in the set.

Unsurprisingly, this test passes.

Let’s change the test slightly, by modifying one of the objects in the set after the set has been populated:

[Fact]
public void SetContains_WhenGivenItemFromSet_ReturnsTrue()
{
    var voters = new HashSet<Voter>(GenerateVoters(1000));

    // Modify the vote of one voter that's already in the set
    var index = _random.Next(voters.Count);
    var voter = voters.ElementAt(index);
    voter.Vote = "Chartreuse";

    foreach (var v in voters)
    {
        var found = voters.Contains(v);
        Assert.True(found);
    }
}

Now, the test fails. There’s something in the set that isn’t in the set. Oh dear.

What went wrong?

It’s important to note that this is not a bug in HashSet<T>. Let me repeat that. It’s not a bug in HashSet<T>.

Instead, the problem is due to Voter not correctly implementing the contract for equality and hash codes. Let’s look at the code.

public class Voter : IEquatable<Voter>
{
    public Guid PersonId { get; }
    public string Vote { get; set; }

    public Voter(Guid personId, string vote)
    {
        PersonId = personId;
        Vote = vote;
    }

    public bool Equals(Voter other)
        => other != null
            && PersonId.Equals(other.PersonId)
            && string.Equals(Vote, other.Vote, StringComparison.OrdinalIgnoreCase);

    public override bool Equals(object obj)
        => obj is Voter v && Equals(v);

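    // Hash Vote case-insensitively so that the result agrees with Equals(Voter)
    public override int GetHashCode()
        => PersonId.GetHashCode() * 7
            ^ StringComparer.OrdinalIgnoreCase.GetHashCode(Vote) * 13;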
}

Look at the implementations of .Equals(Voter) and .GetHashCode(). Both methods use PersonId and Vote. A HashSet files each item under the hash code the item had when it was added. When the second test modifies the vote of one of the voters, the result of .GetHashCode() changes, so .Contains() goes looking under the new hash code and the HashSet can’t find the value.
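Here’s a minimal sketch of that failure with just one item in the set. DemoVoter and StaleHashDemo are hypothetical names used only for this illustration; DemoVoter is a cut-down class with the same mutable Vote property, not the Voter class from the tests.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cut-down class: equality and hash code both depend on the mutable Vote.
public sealed class DemoVoter : IEquatable<DemoVoter>
{
    public Guid PersonId { get; }
    public string Vote { get; set; }

    public DemoVoter(Guid personId, string vote)
    {
        PersonId = personId;
        Vote = vote;
    }

    public bool Equals(DemoVoter other)
        => other != null
            && PersonId.Equals(other.PersonId)
            && string.Equals(Vote, other.Vote, StringComparison.OrdinalIgnoreCase);

    public override bool Equals(object obj)
        => obj is DemoVoter v && Equals(v);

    // Vote is hashed with the same case-insensitive comparer that Equals uses.
    public override int GetHashCode()
        => PersonId.GetHashCode() * 7
            ^ StringComparer.OrdinalIgnoreCase.GetHashCode(Vote) * 13;
}

public static class StaleHashDemo
{
    // Reports whether the voter was found before the change,
    // whether its hash code changed, and whether it was found afterwards.
    public static (bool FoundBefore, bool HashChanged, bool FoundAfter) Run()
    {
        var voter = new DemoVoter(Guid.NewGuid(), "Teal");
        var voters = new HashSet<DemoVoter> { voter };

        var hashBefore = voter.GetHashCode();
        var foundBefore = voters.Contains(voter);

        // Mutate the object while it is still inside the set.
        voter.Vote = "Chartreuse";

        var hashAfter = voter.GetHashCode();
        var foundAfter = voters.Contains(voter);

        return (foundBefore, hashBefore != hashAfter, foundAfter);
    }
}
```

Unless the two strings happen to produce the same hash code (very unlikely), I’d expect Run() to report the voter as found before the change and missing afterwards, even though the set still holds a reference to the very same object.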

In this case, we find that our Voter class doesn’t really know what it is. If it were a proper entity type, we’d use only the PersonId when evaluating .Equals(Voter) and .GetHashCode(). If it were a proper value type, Vote would be read-only and changing the value wouldn’t be possible.
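As a rough sketch of those two options, assuming we split the class in two: VoterEntity, VoterValue and WithVote are hypothetical names invented for this illustration, and HashCode.Combine assumes .NET Core 2.1 or later. Treat this as one possible shape, not the definitive implementation.

```csharp
using System;
using System.Collections.Generic;

// Option 1: an entity. Identity is PersonId alone, so changing Vote
// never changes the result of Equals or GetHashCode.
public sealed class VoterEntity : IEquatable<VoterEntity>
{
    public Guid PersonId { get; }
    public string Vote { get; set; }

    public VoterEntity(Guid personId, string vote)
    {
        PersonId = personId;
        Vote = vote;
    }

    public bool Equals(VoterEntity other)
        => other != null && PersonId.Equals(other.PersonId);

    public override bool Equals(object obj)
        => obj is VoterEntity v && Equals(v);

    public override int GetHashCode()
        => PersonId.GetHashCode();
}

// Option 2: a value. Both members take part in equality,
// but neither can change after construction.
public sealed class VoterValue : IEquatable<VoterValue>
{
    public Guid PersonId { get; }
    public string Vote { get; }

    public VoterValue(Guid personId, string vote)
    {
        PersonId = personId;
        Vote = vote ?? throw new ArgumentNullException(nameof(vote));
    }

    public bool Equals(VoterValue other)
        => other != null
            && PersonId.Equals(other.PersonId)
            && string.Equals(Vote, other.Vote, StringComparison.OrdinalIgnoreCase);

    public override bool Equals(object obj)
        => obj is VoterValue v && Equals(v);

    // Hash Vote with the same case-insensitive comparer that Equals uses.
    public override int GetHashCode()
        => HashCode.Combine(PersonId, StringComparer.OrdinalIgnoreCase.GetHashCode(Vote));

    // "Changing" the vote returns a new value and leaves this one untouched.
    public VoterValue WithVote(string vote)
        => new VoterValue(PersonId, vote);
}
```

With either shape, an object that’s already in a HashSet can’t have its hash code change underneath it: the entity ignores Vote when hashing, and the value can’t be modified at all.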

Such a simple demo makes the problem easy to see. In real-world codebases that solve real business problems, the problem can exist without being anywhere near this easy to observe.

About this series

Why does the implementation of Equality matter in .NET and how do you do it right?

Posts in this series

Why is Equality important in .NET?
Types of Equality
Equality has Symmetry
Equality and GetHashCode
Implementing Entity Equality
Implementing Value Equality
Types behaving badly
