Given some of my recent blog posts, such as “Boolean Parameters are Evil” and “Multiple Boolean Parameters are Really Evil, it might seem that I have some kind of personal vendetta against booleans.

That’s not actually the case - boolean values are critical to good software development. It’s just that they are often overused in ways that make code harder to understand and maintain.

Here’s an example - the simple List.Contains() method.

“What?” you say - that’s built into the core framework and has been since the earliest days of .NET. How could that be a problem?

The problem is less with the Contains() method itself and more with the way it is commonly used. Consider this fragment of code that I’m sure you’ve written yourself:

IList<Item> myList = ...;
if (!myList.Contains(item)) 
{
    myList.Add(item);
}

Such simple code, but it harbors some real issues.

Semantics

The code wants to have a bunch of items without any duplicates. That’s not a list, that’s a set.

By definition a set may not contain duplicates, so our sample code would simplify dramatically:

ISet<Item> mySet = ...;
mySet.Add(item);

We’ve had ISet<T> and HashSet<T> in the .NET Framework since the release of .NET 4.0 and Visual Studio 2010, so there’s not really any excuse for not knowing about them.

The biggest impediment to adoption has to be the need to interoperate with existing code and libraries - for example, if you’re still using an older ORM that doesn’t support sets, you may be forced into some use of lists.

Performance

Testing a list for membership is a linear [O(n)] operation; put it into a loop and you’ve got an O(n^2) behavior, or worse.

Switching to a set changes things - you’ll need to test for membership less often, and for most implementations, a membership test is constant [O(1)] time.

Another way to improve performance, assuming you cannot switch to a set for other reasons, is to accumulate a list with all the duplicates and then using a call to Distinct() to reduce the list down.

IList<Item> tempList = ...;
tempList.Add(item);
// ...
var myList = tempList.Distinct();

Did you know that the default LINQ implementation of Distinct() uses a set internally to do the filtering?

Concurrency

If the list is accessible across multiple threads (and almost all modern code needs to be thread safe these days), there’s a possibility of some other thread adding the item to the list in between your membership test and the Add() call.

I have seen list implementations with AddUnique() or AddIfMissing() methods to give the list some set-like semantics; I’m not really sure if I’m a fan of this approach or not.

Of course, switching to a concurrent ISet<T> implementation avoids this race condition by reducing the code to a simple call.

How is this the fault of Contains()?

The Contains() method is a key enabler of a common, though flawed, programming idiom. Sure, the method has valid uses - I’m not arguing that it should be removed.

However any time you catch yourself using Contains() you should take a step backward to assess whether this is a code smell telling you that you’re using the wrong type.

More broadly, I’ve found that methods returning a bool are often a good place to look for code smells. Next time you’re tempted to write a method returning bool, think about whether a better design might make things easier for your consumers.

Comments

blog comments powered by Disqus