Thursday, August 21, 2014

My gripe with "The Rule of Three"

In software development, good developers will bend over backwards to make their code as maintainable as possible. We recognize that a given piece of code will be read many more times than it is written. It may take extra time to craft code that's easy to read and change, but that investment pays for itself many times over in the long run.

One of the most recognized tricks to writing maintainable code is avoiding duplication of code. If you've got two chunks of code that look almost exactly the same, that's a "code smell"--an indication that something is wrong. This doesn't always mean you need to refactor the code to eliminate the duplication, but you should at least spend a minute thinking about whether there's an appropriate abstraction you can use that would make the code easier to understand. In almost every case where I've seen two or more lines of almost-identical code, I've found that removing the duplication yields code that is easier to understand.
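To make that concrete, here's a minimal sketch of the kind of duplication I mean; the class, method names, and discount rates are all invented for illustration:

```csharp
using System;

static class Pricing
{
    // Before: two nearly identical chunks, differing only in the rate applied.
    public static decimal PriceForMember(decimal subtotal)
    {
        var discount = subtotal * 0.10m;   // members get 10% off
        return Math.Round(subtotal - discount, 2);
    }

    public static decimal PriceForEmployee(decimal subtotal)
    {
        var discount = subtotal * 0.25m;   // employees get 25% off
        return Math.Round(subtotal - discount, 2);
    }

    // After: one method whose parameter names the thing that actually
    // varies, so a reader sees immediately that only the rate differs.
    public static decimal PriceWithDiscount(decimal subtotal, decimal rate)
    {
        var discount = subtotal * rate;
        return Math.Round(subtotal - discount, 2);
    }
}
```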

Mark Seemann recently blogged about some cases where removing duplicate code can be harmful. He correctly pointed out that sometimes apparent duplication is actually coincidental--there's not really an appropriate abstraction for it, and you'll just make your code more confusing and less cohesive by forcing an abstraction on it. He also argued that a developer may choose the wrong abstraction because he lacks enough use cases to know which technique is going to best accommodate future needs. So once a future need arises, the developer has to refactor again, increasing the maintenance costs.

He says:

"This is, in my experience, the most important reason to follow the Rule of Three: wait, until you have more facts available to you. You don't have to take the rule literally either. You can wait until you have four, five, or six examples of the duplication, if the rate of change is low."

The Rule of Three basically says that it's okay to have duplication in two pieces of code, but once you have three pieces of duplicate code then you need to refactor. I feel that advocates of the Rule of Three are missing some important realities about modern software development, though.

Reality #1: Preventive refactoring is cheap

If I see some lines of code that I'd like to mostly duplicate, I can use the refactoring tools that are either built into my IDE or come with a plugin like ReSharper. With a few keyboard shortcuts, I:

  1. Extract the values that are going to need to change into variables (select an expression, then "extract variable").
  2. Move the variable declarations above the lines of code that I want to duplicate. 
  3. "Extract method" on the lines of code I need to duplicate.
  4. Go back to the method call and "inline" all the variables that I created in step 1.
If the extracted method belongs in a different class, moving it there rarely takes more than a minute either. And all of these changes are practically guaranteed not to change the behavior of the code (as long as step 2 doesn't change the order of operations in an important way). So the cost of avoiding duplication in the first place is actually pretty minimal; the sketch below shows what the before and after look like.
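Here's roughly what those steps produce; a minimal sketch, assuming a hypothetical Order class and an invented method name (nothing here comes from a real codebase):

```csharp
using System;

class Order
{
    public int Id;
    public decimal Subtotal;
    public decimal Total;
    public DateTime ProcessedOn;
}

static class Checkout
{
    // Before: the lines I want to mostly duplicate, with one value
    // (the tax rate) that will differ at the new call site.
    static void ProcessBefore(Order order)
    {
        Console.WriteLine($"Processing order {order.Id}");
        order.Total = order.Subtotal * 1.08m;   // hard-coded 8% tax
        order.ProcessedOn = DateTime.UtcNow;
    }

    // Steps 1-3: extract the changing value into a variable, move the
    // declaration above the duplicated lines, then "extract method".
    static void Process(Order order, decimal taxRate)
    {
        Console.WriteLine($"Processing order {order.Id}");
        order.Total = order.Subtotal * taxRate;
        order.ProcessedOn = DateTime.UtcNow;
    }

    // Step 4: back at the call site, inline the temporary variable,
    // leaving a one-line call that the next caller can reuse with a
    // different rate instead of copy/pasting.
    static void ProcessAfter(Order order) => Process(order, 1.08m);
}
```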

Likewise, if another developer comes along and realizes that I chose the wrong form of abstraction for my method, and needs to refactor it to accommodate a different use case, he can use similar tricks in his refactoring. The worst-case scenario is that he's forced to inline the method call with a quick keyboard shortcut and start over with exactly the same effort he might have made if I'd followed the Rule of Three and copy/pasted the code in the first place.

I say "might" because without the extracted method there's no guarantee that the next developer would have noticed the need to refactor in the first place, which leads me to my next point.

Reality #2: Duplicate code is usually only obvious to the person duplicating it

Let's say Alice writes some code, and then Bob realizes he needs to do something very similar. Following the Rule of Three, he copy/pastes a chunk of Alice's code, and then makes a few tweaks to fit his scenario.

Now Charlie needs to do something similar to what Alice or Bob did. What are the chances that he'll be aware of both places that contain duplicate code? Honestly, pretty slim. So he looks around to find an example of where someone else did what he's looking to do, and stumbles across Bob's example. Thinking that he's following the Rule of Three, he then copies Bob's code and tweaks it to meet his own needs. 

Do you see what happened? Because Bob's and Alice's code didn't call into a common method somewhere, Charlie had no idea that he'd reached three instances of the duplicate code. Even if Charlie ended up having two new places to use this code, and chose to refactor at that point, he still would have overlooked Alice's piece of duplicate code. And if all of these developers follow Mark Seemann's advice, they may each be aware of five or six instances of the duplicate code and not feel the need to refactor.
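A contrived sketch of the problem (Alice's and Bob's methods here are invented for illustration): with copy/paste, no reference connects the two chunks, so "Find Usages" on either one tells Charlie nothing about the other, whereas a shared method makes every caller discoverable:

```csharp
using System;

static class AliceReports
{
    public static string Header(DateTime asOf) =>
        $"Sales Report -- {asOf:yyyy-MM-dd}\n" + new string('=', 30);
}

static class BobReports
{
    // Copy/pasted from AliceReports.Header and tweaked. No call links the
    // two, so no search for callers will ever connect them.
    public static string Header(DateTime asOf) =>
        $"Inventory Report -- {asOf:yyyy-MM-dd}\n" + new string('=', 30);
}

// Had Alice extracted the shared piece instead, every later use would be
// one "Find Usages" away, and the number of call sites would be visible.
static class Reports
{
    public static string Header(string title, DateTime asOf) =>
        $"{title} -- {asOf:yyyy-MM-dd}\n" + new string('=', 30);
}
```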

This example uses three developers, but the same thing can easily happen to a single developer, given enough time between visits to the duplicate code.

So unless the duplicate code is highly localized (within a few lines of each other, for example), it is almost impossible to truly follow the Rule of Three.

And it doesn't really matter if the rate of change is low--all it takes is one person needing to make one change at some point in time. In the best-case scenario, they notice all the instances of duplication and take the time to refactor all of them--which will probably take longer than the original refactoring would have, even if Charlie had to refactor it again, because there are now somewhere between three and eighteen chunks of mostly-duplicate code that need to be consolidated. But that best-case scenario is unlikely: there's a good chance that they'll fail to notice all the places where the code needs consolidation, and they'll end up changing only some of them. That leads to inconsistent behavior, which can mean bad data and other serious problems.

Conclusion

While DRY should not be a dogmatic mantra that we follow blindly, it should be a guiding principle. The best time to eliminate duplicate code is generally the moment it appears. Other people may need to refactor the code again later, but the costs of incremental refactoring are usually dwarfed by the costs of failing to do so. 

The so-called "Rule of Three" is bound to lead to dangerous levels of duplication in all but the simplest of systems. The decision to refactor should not be based on how many instances of duplication there are, but on how much code is duplicated, how far apart those instances are from one another, and how much cohesion the duplication really has in relation to the overall code base.