Thursday, August 21, 2014

My gripe with "The Rule of Three"

In software development, good developers will bend over backwards to make their code as maintainable as possible. We recognize that a given piece of code will be read many more times than it is written. It may take some time to craft code that's easy to read and change, but you'll save many times as much time in the long run.

One of the most recognized tricks to writing maintainable code is avoiding duplication of code. If you've got two chunks of code that look almost exactly the same, that's a "code smell"--an indication that something is wrong. This doesn't always mean you need to refactor the code to eliminate the duplication, but you should at least spend a minute thinking about whether there's an appropriate abstraction you can use that would make the code easier to understand. In almost every case where I've seen two or more lines of almost-identical code, I've found that removing the duplication yields code that is easier to understand.

Mark Seeman recently blogged about some cases where removing duplicate code can be harmful. He correctly pointed out that sometimes apparent duplication is actually coincidental--there's not really an appropriate abstraction for it, and you'll just make your code more confusing and less cohesive by forcing an abstraction on it. He also argued that a developer may choose the wrong abstraction because he lacks enough use cases to know which technique is going to best accommodate future needs. So once a future need arises, the developer has to refactor again, increasing the maintenance costs.

He says:
This is, in my experience, the most important reason to follow the Rule of Three: wait, until you have more facts available to you. You don't have to take the rule literally either. You can wait until you have four, five, or six examples of the duplication, if the rate of change is low.
The Rule of Three basically says that it's okay to have duplication in two pieces of code, but once you have three pieces of duplicate code then you need to refactor. I feel that advocates of the Rule of Three are missing some important realities about modern software development, though.

Reality #1: Preventive refactoring is cheap

If I see some lines of code that I'd like to mostly duplicate, I can use the refactoring tools that are either built into my IDE or come with a plugin like Resharper. With a few keyboard shortcuts, I:

  1. Extract the values that are going to need to change into variables (select an expression, then "extract variable").
  2. Move the variable declarations above the lines of code that I want to duplicate. 
  3. "Extract method" on the lines of code I need to duplicate.
  4. Go back to the method call and "inline" all the variables that I created in step 1.
It doesn't typically take more than a minute to move the extracted method into a different class, if necessary, either. And all of these changes are practically guaranteed not to change the behavior of the code (as long as step 2 doesn't change order of operations in an important way). So the cost of avoiding duplication in the first place is actually pretty minimal.

Likewise, if another developer comes along and realizes that I chose the wrong form of abstraction for my method, and needs to refactor it to accommodate a different use case, he can use similar tricks in his refactoring. The worst case scenario is that he's forced to inline the method call with a quick keyboard shortcut and start over with exactly the same effort he might have made if you'd followed the Rule of Three and copy/pasted the code in the first place.

I say might because without the extracted method there's no guarantee that the next developer would have noticed the need to refactor in the first place, which leads me to my next point.

Reality #2: Duplicate code is usually only obvious to the person duplicating it

Let's say Alice writes some code, and then Bob realizes he needs to do something very similar. Following the Rule of Three, he copy/pastes a chunk of Alice's code, and then makes a few tweaks to fit his scenario.

Now Charlie needs to do something similar to what Alice or Bob did. What are the chances that he'll be aware of both places that contain duplicate code? Honestly, pretty slim. So he looks around to find an example of where someone else did what he's looking to do, and stumbles across Bob's example. Thinking that he's following the Rule of Three, he then copies Bob's code and tweaks it to meet his own needs. 

Do you see what happened? Because Bob's and Alice's code didn't call into a common method somewhere, Charlie had no idea that he'd reached three instances of the duplicate code. Even if Charlie ended up having two new places to use this code, and chose to refactor at that point in time, he still would have overlooked Alice's piece of duplicate code. And if all of these developers follow Mark Seeman's advice, they may each be aware of five or six instances of the duplicate code and not feel the need to refactor

This example uses three developers, but it's perfectly possible for the same thing to happen to a single developer, given time in between touching the duplicate code.

So unless the duplicate code is highly localized (within a few lines of each other, e.g.) it is almost impossible to truly follow the Rule of Three.

And it doesn't really matter if the rate of change is low--all it takes is one person needing to make one change at some point in time. In the best-case scenario, they notice all the instances of duplication, they take the time to refactor all of them (which will probably take longer than the original refactoring would have--even if Charlie had to refactor it again--because there are now somewhere between three and eighteen chunks of mostly-duplicate code that need to be consolidated). But that best-case scenario is unlikely--there's a good chance that they'll fail to notice all the places where the code needs consolidation, and they'll end up only changing some of these places. This will lead to inconsistent behavior, which could lead to bad data and other serious problems.


While DRY should not be a dogmatic mantra that we follow blindly, it should be a guiding principle. The best time to eliminate duplicate code is generally the moment it appears. Other people may need to refactor the code again later, but the costs of incremental refactoring are usually dwarfed by the costs of failing to do so. 

The so-called "Rule of Three" is bound to lead to dangerous levels of duplication in all but the simplest of systems. The decision to refactor should not be based on how many instances of duplication there are, but how much code is duplicated, how far away those instances of duplication are from one another, and how much cohesion that duplication really has, with relation to the overall code base. 

Thursday, May 15, 2014

Service Locators and the Interface Segregation Principle

Okay, it has been a long time since I've posted here, and boy have I learned a lot in the years since my last post! A co-worker has been encouraging me to start writing blog posts about programming, and in a conversation on Google Plus, Mark Seemann suggested that I should put some of my thoughts in a blog post, so here goes:

Mark Seemann recently blogged (again) about the Service Locator anti-pattern. He tried to describe why it's an antipattern in terms of the Interface Segregation Principle:

From a client's perspective, there no limit to how many variations of the Create method it can invoke:
var foo = serviceLocator.Create<IFoo>();
var bar = serviceLocator.Create<IBar>();
var baz = serviceLocator.Create<IBaz>();
// etc.Again, from the client's perspective, that's equivalent to multiple method definitions like:
IFoo CreateFoo();
IBar CreateBar();
IBaz CreateBaz();
// etc.
However, the client can keep coming up with new types to request, so effectively, the number of Create methods is infinite!

This argument goes part-way there, but needs a little more fleshing out. After all, you could say that any method with parameters can be expanded out in this way:

var x = service.Foo(1);
var y = service.Foo(2);
var z = service.Foo(3);
// etc.

corresponds to

var x = service.Foo1();
var y = service.Foo2();
var z = service.Foo3();
// etc.

The key difference is that programmers almost never write code like what's in the example above: what's up with those magic numbers? We generally say `var a = Foo(x)`, where `x` is somehow derived from the method's input. That means you can write a meaningful unit test that ensures a given `x` gets passed into `Foo()` when you provide a given `y` as a parameter. This aspect of your method is unlikely to change as you modify implementation details--or if it does, it will probably be the sort of change which will require the test to be updated before it compiles.

On the other hand, if you're using a service locator, you're effectively passing hard-coded "magic classes" into the method all over the place. The values you pass in are rarely based on the input provided to the method, and can easily change as you refactor, without forcing anyone to recognize that your method now makes use of a different "interface" method, which needs to be mocked up before the test will run.

The fact that the intended usage of the interface involves passing in hard-coded values that are not defined by an `enum` indicates that you are, in essence, creating an infinite number of different "ways" to call methods on your interface. If you subscribe to the notion of "role-based" interfaces (which I do), then the usage of the interface should be what informs how the interface is written, and if there are a large number of ways for calling code to invoke a method on your interface, it creates the same problems as having a large number of methods on the interface.

What about Generics?

"Okay," you may be saying, "but wouldn't that make this an argument against all generics? After all, you can pass any type you want into your `Maybe.Not<>()` method, and generic parameters are hard-coded by definition: isn't that an anti-pattern?

The difference is that the behavior of a `List<>` constructor, or `Maybe.Not<>()` is exactly the same no matter what type gets passed in. These are data structures, not service methods. 

On the other hand, if you're talking about something like Automapper's `Mapper.Map<,>()` method, its behavior is going to vary widely based on the generic type arguments, and I'd suggest that this is, indeed, an anti-pattern. Your call to AutoMapper will be working fine one day, and then fail the next just because you renamed a property somewhere, and no compiler ever gave you so much as a hint that you broke your mapping.

I've got more opinions to offer on AutoMapper, but now's not the time. Hopefully this was enlightening.