Learning TDD

A friend recently described what happened when some of his team started learning TDD.

They were doing the Roman Numerals kata, where you write a class to convert Arabic numerals into their Roman equivalents. For example, an input of ‘7’ gives an output of ‘VII’.
The code to support the first test was easy.

public String arabicToRoman(int input) {
    return "I";
}

So was the second.

public String arabicToRoman(int input) {
    if (input == 1) {
        return "I";
    } else {
        return "II";
    }
}

However, by the time they got to the third test there was a problem. The code turned into a switch statement. Everyone knew that as the numbers got larger a better algorithm would be needed, but nobody knew when to make the change.
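
I don’t have the team’s actual code, so the following is only my reconstruction of roughly what the third step might have looked like. The class name and the exception in the default branch are my own assumptions.

```java
// Hypothetical reconstruction of the code after the third test.
// The class name and the default branch are assumptions, not the team's code.
class SwitchRomanNumerals {
    public String arabicToRoman(int input) {
        switch (input) {
            case 1:
                return "I";
            case 2:
                return "II";
            case 3:
                return "III";
            default:
                // Every new number means adding another case above.
                throw new IllegalArgumentException("Unsupported input: " + input);
        }
    }
}
```

Each new test adds another case, and each case repeats the same idea: one ‘I’ per unit.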

What was missing?

One of the key steps in TDD is to refactor to remove duplication after you get the test working. That step is essential, but it isn’t always fully appreciated.

  • Duplication refers not just to blocks of code but to logic as well. That seems to be the case here.
  • The open-closed principle may also help when deciding when to refactor. In this case the switch statement must be modified every time a number is added.

Refactored logic might look like this. An aggressive refactorer might even have done this at the second step:

public String arabicToRoman(int arabic) {
    String roman = "";
    for (;arabic > 0; arabic--) {
        roman = roman.concat("I");
    }
    return roman;
}

Of course the algorithm will continue to evolve as we add support for more numbers. And the tests need to be refactored as we go, just as aggressively as the production code.
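
As a rough idea of where that evolution might end up, here is one common table-driven shape for the finished kata. It is a sketch rather than what my friend’s team wrote; the class name and the two lookup arrays are my own choices, and it assumes non-negative input.

```java
// One possible end state for the kata (a sketch, not the team's code).
// VALUES and SYMBOLS are parallel arrays, ordered from largest to smallest.
class TableDrivenRomanNumerals {
    private static final int[] VALUES =
        {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
    private static final String[] SYMBOLS =
        {"M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"};

    public String arabicToRoman(int arabic) {
        StringBuilder roman = new StringBuilder();
        for (int i = 0; i < VALUES.length; i++) {
            // Take the largest symbol that still fits, as often as it fits.
            while (arabic >= VALUES[i]) {
                roman.append(SYMBOLS[i]);
                arabic -= VALUES[i];
            }
        }
        return roman.toString();
    }
}
```

Supporting a new symbol now means adding an entry to the tables rather than changing the logic, which sits more comfortably with the open-closed principle.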

So what?

To some, this might seem obvious and maybe simplistic. But it’s easy to miss subtleties when trying out the mechanical aspects of a new practice such as TDD – I’m sure I did something similar. Exploring a few blind alleys is a natural part of learning.

Technical debt and legacy code

Every job I’ve ever had has involved dealing with some amount of technical debt. The amount has varied, but it’s always there, even in modern codebases.

So what is technical debt?

There are loads of definitions – I think of it as code that is difficult or dangerous to change or extend. Technical debt is often associated with legacy code, which Michael Feathers defines as code without automated tests, though I’ve seen legacy code that was straightforward to change, and code with automated tests that wasn’t.

There’s a surprising amount of code with tests that’s difficult to change or extend. Often dependencies have not been isolated well, so tests have extensive scaffolding. Another reason is overly complex or poor designs – if I find I’m changing every class in a subsystem for a simple addition, there is often a design problem.

The issue for me is not that technical debt exists (we’ve all written some – or at least I have), but how to address it. I think it’s as much a people problem as a technical one, and there are many reasons for it.

Managers discourage changes to existing code

  • Some managers don’t understand why existing code should change to accommodate new features. They ask, ‘Why couldn’t you get it right first time?’ Ironically the more successful a product, the more likely it is to change and grow, often in unexpected ways.
  • Managers who have been developers in the past know the dangers of changing code that has no tests, and shy away from it for that reason.
  • Others have been burnt by teams that got bogged down performing epic refactorings or rewrites.
  • Finally, there is the ever present pressure of the roadmap, which managers are typically far more exposed to than any individual developer. Under that pressure, it’s tempting for even the best manager to encourage shortcuts.
I’m maybe being unfair to managers here. Much of the above could be applied to product owners, team leads, or anyone else with a say in what’s being built.

We want to work on new code

Preferably in the hottest language using a cool new framework. It combines the appeal of the new with a great bullet point for the resume. And of course, the newer frameworks and languages can be more productive than older ones. Many (perhaps most) developers would rather attempt a rewrite than refactor legacy code. That can be the right thing to do; often it isn’t.
  • Old code still needs maintaining while the rewrite is happening
  • In agile teams where backlogs can change rapidly, it’s easy for large rewrites to stall or be abandoned as priorities change.
  • If you are dealing with a codebase that has a lot of duplication, it may be better to reduce the duplication first, rather than introduce yet another mechanism for doing something
  • If the code is reasonably modern and has decent test coverage, refactoring is often by far the safer route – though some developers still argue for a rewrite

We don’t recognise that there is a problem

Ignorance or apathy? I’m not sure.

We’re scared to touch it

After being burnt a couple of times by introducing subtle bugs, it’s easy to see why. Which leads to…

We don’t have the skills

There are many developers with good skills these days in TDD/BDD (or at least writing tests concurrently with the code). Fewer know the techniques for dealing with legacy code. A rewrite often feels easier.

What to do about it?

Change attitudes

I like Robert Martin’s boy scout rule: ‘Leave the campground cleaner than you found it’, which encourages a continual improvement mindset. Improve the code base in a minor way every time you make a change, even if it’s just clarifying a name or removing an unused variable.
Changing managerial attitudes can be a little trickier. Modern code is meant to be malleable, and not everyone gets that. Incremental refactoring rather than big epics can help.

Skills and Training

There are some great books, primarily ‘Working Effectively with Legacy Code’, which give practical techniques and insights.
A simple example:
Imagine we have a class that we cannot easily put under test – maybe it references many other classes or a socket or database connection, but we need to add or change some behaviour. Here are a few techniques to consider:
  • If the method in question does not reference any member variables we can make it static and thereby write tests for it without having to instantiate the whole class. Ugly but effective.
  • We can subclass and in the derived class override selected methods to effectively null out dependencies
  • We can add setter methods to override dependencies.
  • We can make private methods public to get access to them (!)
  • We can link to mock libraries
  • When adding behaviour we could create a small object with the new behaviour using TDD and then just call it from the legacy class. In the short term this can be pretty ugly – maybe the new class has only a single method; but over time we could move behaviour as appropriate from the legacy class to the new one.
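
To make the subclass-and-override technique from the list above a little more concrete, here is a minimal sketch. Everything in it is hypothetical: InvoiceService, its methods and the reminder text are invented for illustration, and the database call is stood in for by an exception.

```java
// Hypothetical legacy class; all names here are invented for illustration.
class InvoiceService {
    public String buildReminder(int customerId) {
        String name = loadCustomerName(customerId);
        return "Dear " + name + ", your invoice is overdue.";
    }

    // Imagine this talks to a real database. Making it protected (rather than
    // private) is the small change that lets a test subclass replace it.
    protected String loadCustomerName(int customerId) {
        throw new IllegalStateException("No database available");
    }
}

// Test-only subclass that nulls out the database dependency.
class TestableInvoiceService extends InvoiceService {
    @Override
    protected String loadCustomerName(int customerId) {
        return "Ada";
    }
}
```

A test can now create a TestableInvoiceService and exercise buildReminder without a database connection. The production class is barely touched, which is the appeal.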

These techniques are highly incremental and can make the code feel worse in the short term. Maybe that’s why they aren’t used as much as they could be.
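
The last technique in the list, growing a small new object and calling it from the legacy class, might look something like the sketch below. Again the names and the discount rule are made up for illustration; the point is only the shape.

```java
// New behaviour, small enough to build test-first.
// The class name and the 10% rule are assumptions for this sketch.
class LoyaltyDiscount {
    public int applyTo(int priceInPence, int yearsAsCustomer) {
        if (yearsAsCustomer >= 5) {
            return priceInPence - priceInPence / 10; // 10% off for long-standing customers
        }
        return priceInPence;
    }
}

// Hypothetical legacy class: the only change is a call to the new object.
class LegacyOrderProcessor {
    private final LoyaltyDiscount loyaltyDiscount = new LoyaltyDiscount();

    public int totalFor(int priceInPence, int yearsAsCustomer) {
        // ... imagine a long stretch of untested legacy logic here ...
        return loyaltyDiscount.applyTo(priceInPence, yearsAsCustomer);
    }
}
```

A class with one method does look a bit odd on its own, but it is fully tested and gives later changes somewhere sensible to live.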

Another fun exercise is to try out the excellent ‘Gilded Rose’ kata, which presents a horribly messed up method and asks you to refactor it.

Practice

I learn a lot by doing dry runs. Check out the code and try out a few refactorings. Then throw that code away and try it for real. Don’t be surprised if your real refactoring works out differently – the point of the dry run is to gain confidence and context.

Personally I find improving a legacy codebase can be a rewarding activity in itself. Good luck with yours!

Favourite books – better programming

I didn’t do software engineering at college, so for a long time I worried about what I didn’t know, which translated into a continuing quest for knowledge. This and a few followup posts will list a few of my favourite books and other sources, with some brief explanations of why I think they are important. Your opinions may vary. 🙂

So here goes…

Programming

These are books about the technicalities of programming; about being better programmers. We can get away with minimal designs and documentation, and many products have shipped with little of either, but there is always code, which has to be maintained and extended. In fact the more successful a product is, the more likely it is to change.
All of the following cover (to greater or lesser extents) style, structure, debugging, documentation and low level design. The principles apply across many languages and environments, though the examples typically use the most popular language of the day.

The Practice of Programming (Kernighan and Pike)

Brian Kernighan is famous as the co-author of ‘The C Programming Language’, another of my favourite books, and his clear writing style is put to good use here. The tagline of this book is: Simplicity, Clarity, Generality. Written in 1999, its examples are mostly in C, C++ and Java, though there is a C and Unix bias, not surprising from the people so heavily involved with both. And it’s short, only 288 pages.

Code Complete (McConnell)

A heavyweight tome. Very comprehensive and a little verbose. McConnell backs up his advice with plenty of citations from industry studies. This one tops many people’s favourite technical book lists. It’s best to pick up the second edition, which has more of an OO focus than the first. I’m hoping for a third edition one day incorporating more agile and lean practices.

Clean Code (Martin)

A book for the agile era, so there is an emphasis on things like unit testing and code as documentation. The cartoon about the only true measure of code quality being ‘WTFs per minute’ is a classic, and sets the tone. Bob Martin is very opinionated, so the book is a bit of a polemic, but the content (especially the early chapters) is excellent.
This book is required reading for all developers in our company.
There are many books that focus on a single practice (such as TDD) or best use of a given language, but surprisingly few books like the three above, that focus on the nuts and bolts of general programming. Hopefully there will soon be one incorporating functional programming elements…