Those who know me know I’m a big proponent of unit testing. I try to write my tests first, but I don’t believe in absolutes: if you write your tests afterwards, at least you’re testing. But not everyone is convinced of the value of automated tests at all, be they unit, integration, or end-to-end tests.
Frequently, this discussion is a pointless one, because it rests on vague arguments and personal experience. You have to take a ride in the car (or at least see it driving) to believe it moves without horses.
So I took some time to actually find out what’s out there that proves TDD works.
Keeping the numbers for later, the main arguments for TDD are:
- It pushes you towards a loosely-coupled, modular design. This will be very handy later, when you need to refactor and move parts around in your project. I’m currently working on a large industrial-automation project where I’ve done four big architectural changes without a hitch.
- It verifies your code works, and does so in an automated fashion. With an extensive test suite, it is easy to check the health of your code with a single click of a button (or not even that, when using a build server). If you’re currently testing by debugging, you won’t know whether the bug you fixed four years ago is still fixed or has been reintroduced by some other change. Better to find out early than to scramble when it happens in production.
- It provides a certain level of documentation for your code. Well-written tests can explain how the application should react to certain events and why.
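To make the last two points concrete, here is a minimal sketch of what such tests can look like. The `parse_setpoint` function, its accepted range, and the decimal-comma bug are all hypothetical, invented purely for illustration; they are not taken from any of the projects or studies mentioned here.

```python
import unittest


def parse_setpoint(text):
    """Parse a temperature setpoint such as '21.5' or '21,5' into a float.

    Hypothetical helper, invented for this example.
    """
    # Accept a decimal comma as well as a decimal point.
    normalized = text.strip().replace(",", ".")
    value = float(normalized)
    # The 5..30 range is an assumption made up for the example.
    if not 5.0 <= value <= 30.0:
        raise ValueError(f"setpoint out of range: {value}")
    return value


class ParseSetpointTests(unittest.TestCase):
    # The test names read as a short specification of the behaviour.

    def test_accepts_decimal_point(self):
        self.assertEqual(parse_setpoint("21.5"), 21.5)

    def test_accepts_decimal_comma(self):
        # Regression test: imagine '21,5' once crashed in production.
        # As long as this test stays in the suite, the fix cannot
        # silently disappear again.
        self.assertEqual(parse_setpoint("21,5"), 21.5)

    def test_ignores_surrounding_whitespace(self):
        self.assertEqual(parse_setpoint("  18 \n"), 18.0)

    def test_rejects_setpoint_outside_allowed_range(self):
        with self.assertRaises(ValueError):
            parse_setpoint("45")
```

Saved as a file, this can be run with `python -m unittest <file>`, or automatically by a build server on every commit.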
In 2008, Nagappan, Maximilien, Bhat & Williams did actual scientific research and published a paper named "Realizing quality improvement through test driven development: results and experiences of four industrial teams". It used to be available from the Microsoft Research website, but currently, I could only find it in the Google Cache. It’s an interesting read, but in short it comes down to this.
They studied three teams at Microsoft and one at IBM. Codebase sizes ranged from 6,000 lines of code to a whopping 155,000. This is part of their conclusion:
"All the teams demonstrated a significant drop in defect density: 40% for the IBM team; 60–90% for the Microsoft teams."
Using TDD does take a little longer, though:
"The increase in development time ranges from 15% to 35%. From an efficacy perspective this increase in development time is offset by the […] reduced maintenance costs due to the improvement in quality"
The paper also describes what happened when the discipline slipped:
"contact with the IBM team indicated that in one of the subsequent releases (more than five releases since the case study) some members of the team (grown 50% since the first release) have taken some shortcuts by not running the unit tests, and consequently the defect density increased temporally compared to previous releases."
Other studies reach similar conclusions. TDD increases the reliability of software while only marginally decreasing productivity (in some cases even increasing it, according to Maria Siniaalto). And even when it does decrease productivity, teams find it was worth it, because the time saved by skipping tests would be lost to maintenance anyway.
It is also important to write tests alongside the actual application code (before or after is another discussion). George & Williams find that writing tests after the application is more or less done leads to fewer tests, because developers think of fewer cases, and to an application that is harder to test.
A final interesting finding in the George & Williams study is that 79% of the participating developers felt that TDD leads to a simpler design.
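The testability point is easier to see in code. The sketch below is my own illustration, not something taken from the studies: the `Alarm` class, the shift hours, and the recipient names are all hypothetical. Because the clock is passed in rather than hard-wired, a test can pin the time. Code written without tests in mind tends to call `datetime.now()` directly, and is then awkward to test after the fact.

```python
import unittest
from datetime import datetime


def is_night_shift(now):
    """Return True between 22:00 and 06:00 (hours assumed for the example)."""
    return now.hour >= 22 or now.hour < 6


class Alarm:
    """Hypothetical class that routes an alarm depending on the shift.

    The clock is injected, so a test can substitute a fixed time
    instead of depending on when the test suite happens to run.
    """

    def __init__(self, clock=datetime.now):
        self._clock = clock

    def recipient(self):
        if is_night_shift(self._clock()):
            return "on-call engineer"
        return "control room"


class AlarmTests(unittest.TestCase):
    def test_night_alarms_go_to_on_call_engineer(self):
        alarm = Alarm(clock=lambda: datetime(2024, 1, 1, 23, 30))
        self.assertEqual(alarm.recipient(), "on-call engineer")

    def test_day_alarms_go_to_control_room(self):
        alarm = Alarm(clock=lambda: datetime(2024, 1, 1, 14, 0))
        self.assertEqual(alarm.recipient(), "control room")
```

Production code simply calls `Alarm()` and gets the real clock. The only design cost is one extra constructor parameter.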
To summarize:
- Unit testing is not only fit for small projects. The Visual Studio team had 155,000 lines of code to work with. If anything, unit testing gets more interesting and more necessary the larger a project becomes.
- TDD leads to more reliable code.
- TDD increases development time only slightly, but most studies point out that this is worthwhile because of the increased code quality.
- One study points out that TDD leads to simpler design.
- It is important to write tests at the same time as the actual code, not weeks or months after.
- This is backed by scientific studies, not by developers’ hunches.