Testing legacy code

July 15, 2010

Note: This blog post is more than 14 years old. Some information might be outdated.

Today we know about the benefits of Test Driven Development and normally start new projects using TDD. Most projects from the last couple of years have integrated this method into their daily development process, which often results in good code coverage of 90% and above. But what about all the other, older projects that you still maintain in your daily work?

Those projects have existed for several years, do their jobs and add value for their users every day. They consist of thousands or hundreds of thousands of lines of code, but most or all of it is untested. Estimating the time for modifications and maintenance is nearly impossible and risky, because side effects are always possible. So what can we do with such code bases?

[Figure: Uncovered source code]

Option one: We can try to write tests for the whole application after the fact. But a developer could spend all their working years on this task without doing anything other than writing tests. So this isn't really a realistic solution.

Option two: We can start to write tests only for newly added features. At first glance this seems to be a good compromise. You don't have to invest too much time, and all newly added features are tested. But this solution has one major drawback: what about all that legacy code, the code that makes maintenance a nightmare and estimation a game of chance? It is still untested.

This brings us to the third possible solution: We write tests for newly added features, and we also add tests for existing code whenever we have to change it. With this approach, both new and old source code will be tested over time, without spending too much time and money on legacy code tests.

Following approach three, chances are good that we get a stable application within a few months, because changes to legacy code are normally not spread throughout the whole application. Some parts of an application will never change, either a) because no one uses them or b) because they are so stable that no one has to change them. Other parts will change regularly, either a) because there are so many bugs in that part of the source code or b) because there are continuous requirement changes and improvements.

Sounds good, right? But using a normal coverage report, you may never get an appealing percentage of 70%, 80% or more. The report will stay red and the share of covered lines will stay stuck at around 25% to 30%. This can be really frustrating for everyone involved, because they work hard on their tests but get no positive, motivating feedback. This is where PHP_ChangeCoverage comes into play.

PHP_ChangeCoverage combines the coverage data collected by PHPUnit with the commit history of your version control system and generates a new coverage report that only reflects those parts of the application that have been changed within a specified time range. You can get a first version of PHP_ChangeCoverage from its GitHub repository.

~ $ git clone git://github.com/manuelpichler/php-change-coverage.git phpccov
~ $ cd phpccov
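Before going on with the setup, here is a minimal sketch of the idea of combining coverage data with the commit history. It is written in Python for brevity and is not the actual implementation of PHP_ChangeCoverage. The function name, both dictionaries and the "changed on or after the date" rule are assumptions made for illustration only.

```python
from datetime import date


def change_coverage(executed, last_changed, since):
    """Count covered and total lines among those changed on or after `since`.

    executed:     {line number: True if a test executed the line, else False}
    last_changed: {line number: date of the last commit that touched the line}
    """
    # Assumption: a line is relevant if its last change is not older than `since`.
    relevant = [line for line in executed if last_changed[line] >= since]
    covered = sum(1 for line in relevant if executed[line])
    return covered, len(relevant)


# Hypothetical file with ten executable lines; tests execute lines 1, 2 and 3.
executed = {line: line <= 3 for line in range(1, 11)}

# Lines 1 to 4 were changed recently, the rest have not been touched for years.
last_changed = {
    line: date(2010, 7, 1) if line <= 4 else date(2006, 3, 1)
    for line in range(1, 11)
}

total_covered = sum(executed.values())
print(f"classic coverage: {total_covered}/{len(executed)}")  # 3/10

covered, total = change_coverage(executed, last_changed, date(2010, 1, 1))
print(f"change coverage:  {covered}/{total}")  # 3/4
```

In this made-up example the classic report shows 3 of 10 lines covered (30%), while the change-based view shows 3 of 4 lines (75%). The tests are the same, but the second number reflects only the code the team actually touched.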

In order to run PHP_ChangeCoverage, two additional dependencies must be installed. The first is PHPUnit and the second is PHP_CodeCoverage. Both can be installed through PHPUnit's PEAR channel.

~ $ sudo pear channel-discover pear.phpunit.de
~ $ sudo pear channel-discover components.ez.no
~ $ sudo pear install --alldeps phpunit/PHPUnit
~ $ sudo pear install --alldeps phpunit/PHP_CodeCoverage-beta

[Figure: Coverage for partial tested code]

Now that all required programs are installed, we can start a small example session. For this, you can check out the small example application, which can be found under docs/example.

~ $ cd docs/example
~ $ cd example
~ $ svn co file://`pwd`/svnrepo checkout

Now let's generate the code coverage report for the project for the first time:

~ $ ../../phpccov --coverage-html coverage \
                        checkout/test/PHP/ChangeCoverage/ExampleTest.php

As you can see, the PHP_ChangeCoverage command line interface accepts the same arguments as the installed PHPUnit version. Without any special parameters, both tools behave nearly the same. If we look at the generated code coverage report (Coverage for partial tested code), we can see that the source has a line coverage of roughly 60% and a method coverage of 50%. Now let's assume that the methods getBaz() and showBaz() are changed frequently, while the other two methods are just kept around because no one knows where, or if at all, they are used. So let us rerun the tests, but this time we only want the coverage information for those parts of the source that were modified in the time frame starting from a specified date.

[Figure: Coverage based on code changes]

~ $ ../../phpccov --modified-since 2010/07/27 \
                        --coverage-html coverage \
                        checkout/test/PHP/ChangeCoverage/ExampleTest.php

This time we get a coverage report that only highlights those lines that were changed since the specified date. All other lines are flagged as dead code. If you don't like the dead code behavior and prefer to highlight all unchanged lines as covered, you can add the --unmodified-as-covered option to the phpccov command line call.

~ $ ../../phpccov --modified-since 2010/07/27 \
                        --unmodified-as-covered \
                        --coverage-html coverage \
                        checkout/test/PHP/ChangeCoverage/ExampleTest.php
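The difference between the two modes can be sketched as follows. This is an illustration in Python under stated assumptions, not the code of PHP_ChangeCoverage: the function names and the string labels are hypothetical, and the sketch assumes that dead lines are left out of the percentage while lines flagged as covered count towards it.

```python
def classify(executed, changed, unmodified_as_covered=False):
    """Return {line: "covered" | "uncovered" | "dead"} for the report."""
    status = {}
    for line, was_executed in executed.items():
        if line in changed:
            # Changed lines keep their real coverage state.
            status[line] = "covered" if was_executed else "uncovered"
        elif unmodified_as_covered:
            # Mirrors the idea behind --unmodified-as-covered.
            status[line] = "covered"
        else:
            # Default: unchanged lines are flagged as dead code.
            status[line] = "dead"
    return status


def percent_covered(status):
    """Coverage in percent; dead lines are left out of the calculation."""
    counted = [state for state in status.values() if state != "dead"]
    if not counted:
        return 100.0
    return 100.0 * counted.count("covered") / len(counted)


# Hypothetical file: ten executable lines, tests execute lines 1 to 3,
# and only lines 1 to 4 were changed in the selected time frame.
executed = {line: line <= 3 for line in range(1, 11)}
changed = {1, 2, 3, 4}

dead_mode = percent_covered(classify(executed, changed))
covered_mode = percent_covered(classify(executed, changed, unmodified_as_covered=True))
print(dead_mode)     # 75.0
print(covered_mode)  # 90.0
```

Under these assumptions the default mode judges only the four changed lines, while the second mode keeps all ten lines in the report and treats the six untouched ones as fine.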

Currently PHP_ChangeCoverage supports five different version control systems, through the underlying vcs_wrapper.

If PHP_ChangeCoverage cannot detect one of these version control systems, it will fall back to a simple file-based implementation that uses the last modification time to collect only changed files.
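A fallback based on file modification times could look roughly like the following Python sketch. It only illustrates the technique and makes no claim about how PHP_ChangeCoverage implements it; the function name, the restriction to .php files and the demo files are assumptions.

```python
import os
import tempfile
import time


def files_modified_since(root, since_timestamp):
    """Return sorted relative paths of .php files under `root`
    whose last modification time is not older than `since_timestamp`."""
    result = []
    for directory, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(directory, name)
            if name.endswith(".php") and os.path.getmtime(path) >= since_timestamp:
                result.append(os.path.relpath(path, root))
    return sorted(result)


# Demo with two hypothetical files in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    old_file = os.path.join(root, "Old.php")
    new_file = os.path.join(root, "New.php")
    for path in (old_file, new_file):
        open(path, "w").close()

    now = time.time()
    one_year_ago = now - 365 * 24 * 60 * 60
    # Pretend that Old.php was last modified a year ago.
    os.utime(old_file, (one_year_ago, one_year_ago))

    recent = files_modified_since(root, now - 24 * 60 * 60)
    print(recent)  # ['New.php']
```

Note that such a fallback works on whole files only: without a commit history there is no way to tell which lines inside a file were changed.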

Additionally PHP_ChangeCoverage uses PHPUnit to execute the test cases and PHP_CodeCoverage to generate the different report formats.

The project is licensed under the New BSD license and is available on GitHub for forking.