Wednesday, February 20, 2008

Enterprise Agile Testing Part III: Managing Tests with ToolsHarness, Individually

This is the third part of the Enterprise Agile Testing series. Continuous integration has proven to be one of the most important practices in agile software development. Every time a developer checks in code, the code base is rebuilt and the tests are run against it. The result of the integration tells everyone on the project whether the code base is good enough for release. Some prefer synchronous continuous integration through a push-button process over an asynchronous process through a tool like CruiseControl, but everyone agrees that the practice is very useful.

The Difficulty of Holding the Line

With or without a tool, the most difficult part of establishing such a process is probably holding the line of "zero broken tests". In my past consulting and coaching experience, it sometimes took great effort and time to get a team into the habit of running all the unit tests before checking in code, treating writing and fixing tests as the highest priority, and continually tuning the tests so that the whole process does not exceed ten minutes. Even then, not all the teams kept up the practice after we left the project on a good note.

I recently got a chance to catch up with Greg, one of the ex-ThoughtWorkers whom I used to work with and respect. He showed interest in what I am writing and offered his opinion, which I quote:

- Our test suite is too large and too slow to run with every build. We are lucky to get results once a day.
- Not everyone cares about the unit tests to the same degree. Some people are too busy to track down failures right away. Not everyone sees the value in the unit tests, mostly because our coverage figures aren't high enough.
- Not everyone has the skills to write decent tests or design their code in a modular, testable fashion.

When I read that email, I became more motivated to work on this third post, because it is exactly what I want to write about. In my previous job, I found that the only way to make agile development work was to follow the XP practices, especially when it came to continuous integration. In an enterprise environment (as defined in the Introduction), there seemed to be no middle ground between "green all the time" and "red all the time". The moment a team fell off the status of zero broken tests and could not recover quickly, it would find itself in a deep hole right away.

The first thing I have learned at Guidewire is that the above problems can be solved better with the help of a comprehensive continuous integration tool, ToolsHarness in this case. I am not saying it is a silver bullet, because a lot of development practice and discipline is still required. But I can certainly say that I am now seeing the light.

Test Farm for Parallel Testing

No matter how hard you try, eventually you will not have enough time to run all the tests that you would like before checking in code. When that happens, broken tests become the norm rather than the exception, and the XP way of handling broken builds no longer applies. For a complicated system, the time to run the full set of tests will be huge. At the same time, agile software development demands fast feedback on code changes. The longer the turnaround time, the more friction there is in iterative development: how can a team build on top of something not yet proven to work and still expect high throughput?

Guidewire has a big test farm composed of dozens of machines, mostly running Linux and configured with the H2 database. These machines are set up so that when a build becomes available, they check out the test suites and run them.

When a developer checks in a change list, ToolsHarness first pulls the source and does a full build to make sure that the projects still compile. Once the compile finishes successfully, ToolsHarness posts the build for testing. The tests used to be divided into different suites based on which test class they extend, for example Database test, Metadata test, or Server test. With the introduction of TestBase, they have all been converted to NewTestSuite and NewSmokeTestSuite. (The difference is that the tests in NewSmokeTestSuite are acceptance tests, which are full end-to-end tests and require additional sample data.)

Based on the previous run of the tests, the suites are divided evenly into several parts, so that each testing machine can check out one part and run it. Through this parallel testing, it takes no more than 20 minutes to run a suite that would normally take hours to finish. The system is highly scalable, because all you need to do is add more machines. With such a fast feedback loop, developers can work on a large code base and still make medium-sized changes. The worst that can happen is having to revert a change that you made less than half an hour ago.
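
To make the idea of dividing the suites evenly more concrete, here is a minimal sketch of one way it could be done. It is not how ToolsHarness actually does it (I have not described its algorithm here); it assumes that the duration of each test from the previous run is available, and it uses a simple "longest test first, onto the least loaded machine" heuristic. The function name partition_tests and the sample test names are made up for illustration.

```python
import heapq


def partition_tests(durations, num_machines):
    """Split tests into num_machines buckets with roughly equal total run time.

    durations: dict mapping test name -> seconds taken in the previous run.
    Returns a list of (total_seconds, [test names]) tuples, one per machine.
    """
    if num_machines < 1:
        raise ValueError("num_machines must be at least 1")
    # Min-heap of (total_seconds, machine_index): the least loaded machine is on top.
    heap = [(0.0, i) for i in range(num_machines)]
    heapq.heapify(heap)
    buckets = [[] for _ in range(num_machines)]
    # Place the longest tests first so the short ones can even out the load later.
    ordered = sorted(durations.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, seconds in ordered:
        total, idx = heapq.heappop(heap)
        buckets[idx].append(name)
        heapq.heappush(heap, (total + seconds, idx))
    totals = {idx: total for total, idx in heap}
    return [(totals[i], buckets[i]) for i in range(num_machines)]


# Hypothetical timings (in seconds) from a previous run.
previous_run = {"ServerTest": 10, "DatabaseTest": 8, "MetadataTest": 6,
                "RatingTest": 4, "FormsTest": 2}
for total, tests in partition_tests(previous_run, 2):
    print(total, tests)
```

With a heuristic like this, adding a machine simply means one more bucket, which is why the approach scales by adding hardware. The split is only as good as the previous timings, so a test that suddenly becomes slow will unbalance one run before the next split corrects it.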

It is also easy to run the tests against different databases and J2EE servers. Some testing machines are configured with an Oracle database or a SQL database, some run on the Windows platform, and some use Tomcat or WebLogic. As I write this post, the tools pod (the team in charge of developing this server) is working on ‘customer build testing’, so that the test environment will be exactly like the production environment when running the acceptance tests.

Tracking Tests Individually

With a large team, you will have developers with different skill sets. While it is easy for a senior developer to be conscious about making small changes at a time and to identify the problem from a broken test, it normally takes a junior developer much longer to fix it. Unless all your senior developers happen to be good coaches, you are going to be stuck with broken tests popping up here and there for a while.

When a testing machine is done with its tests, it posts the results back to ToolsHarness. ToolsHarness parses the XML file and stores the result of each test in the database. The benefit is that developers can track the tests individually. When a test is broken, ToolsHarness makes an educated guess, based on the change list and the changed packages, and assigns the test to a developer. If the guess turns out to be wrong, the developer can easily reassign the test to the right person.

When a developer logs into ToolsHarness, the first page, Desktop, contains a list of the tests that have been assigned to him or her. This way, you won't be distracted as long as there are no tests assigned to you. The summary of all the tests is also on this page, so that you know not to check in anything when hundreds of tests are broken, and so that senior developers get a cue to help others fix tests.
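
The assignment guess and the per-developer Desktop list can be sketched roughly as follows. This is only an illustration of the idea, not the ToolsHarness implementation: the scoring rule (longest shared package prefix, with the most recent change list winning ties) is my own assumption, and the function names guess_owner, shared_prefix_len and desktop, as well as the package and developer names, are hypothetical.

```python
from collections import defaultdict


def shared_prefix_len(a, b):
    """Count the leading package segments that two dotted names have in common."""
    count = 0
    for x, y in zip(a.split("."), b.split(".")):
        if x != y:
            break
        count += 1
    return count


def guess_owner(test_package, change_lists):
    """Pick the developer most likely responsible for a newly broken test.

    test_package: dotted package of the failing test.
    change_lists: list of (developer, [changed packages]) since the last
                  passing run, oldest first.
    Returns a developer name, or None when there are no change lists.
    """
    best_dev, best_score = None, -1
    for developer, packages in change_lists:
        score = max((shared_prefix_len(test_package, p) for p in packages), default=0)
        # ">=" lets the most recent change list win ties.
        if score >= best_score:
            best_dev, best_score = developer, score
    return best_dev


def desktop(assignments):
    """Group broken tests by assignee, as a per-developer start page would."""
    by_dev = defaultdict(list)
    for test, developer in assignments.items():
        by_dev[developer].append(test)
    return dict(by_dev)


changes = [("alice", ["app.billing"]), ("bob", ["app.policy.forms", "util"])]
print(guess_owner("app.policy.rating", changes))
```

Because any such guess will sometimes be wrong, the cheap manual reassignment matters as much as the heuristic itself.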

For each test, you can see the failure message in the form of a stack trace, the change lists associated with it, and the history of the test, all of which help you figure out why the test is broken. You can also look into the log directory for additional generated files such as server logs and HTML page snapshots.

If you have written a broken test that you cannot yet fix, you can annotate it with KnownBreak, and it will show up as such in ToolsHarness. If you have determined that a test is failing non-deterministically but you cannot yet figure out why, you can mark it with NoneDeterministic, and it will show up as such in ToolsHarness. The key is to keep the noise of broken tests to a minimum, if not zero, so that developers get accurate notifications and fix the failures effectively.
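
As a rough illustration of how such markers cut down the noise, the sketch below sorts failures into those already accounted for and those that really need attention. It is a toy model under my own assumptions: the markers are passed in as plain strings, and the function name summarize, the bucket names and the test names are hypothetical rather than anything ToolsHarness exposes.

```python
def summarize(results, annotations):
    """Separate failures that need attention from ones already accounted for.

    results: dict mapping test name -> True (passed) or False (failed).
    annotations: dict mapping test name -> "KnownBreak" or "NoneDeterministic".
    """
    summary = {"passed": [], "known_break": [], "non_deterministic": [], "needs_fix": []}
    for test in sorted(results):
        if results[test]:
            summary["passed"].append(test)
        elif annotations.get(test) == "KnownBreak":
            summary["known_break"].append(test)
        elif annotations.get(test) == "NoneDeterministic":
            summary["non_deterministic"].append(test)
        else:
            # Only unannotated failures should trigger a notification.
            summary["needs_fix"].append(test)
    return summary


results = {"testLogin": True, "testRating": False, "testExport": False, "testMerge": False}
annotations = {"testRating": "KnownBreak", "testExport": "NoneDeterministic"}
print(summarize(results, annotations)["needs_fix"])
```

Here only the unannotated failure ends up in the list that would trigger a notification, which is the point of the markers.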

Localizing the Damage through Branches

With aggressive refactoring, you will not be able to leave your platform code alone. Sometimes you know that the only way to be sure is to check in the code and let the continuous integration server run a full test on the changes. With this approach, you risk putting the build into an unstable state for a while before you can figure out the best solution. If the whole development team has to rely on a good build, they will either be out of commission for a while, or they will accumulate changes that cause another wave of instability after you are done. And that is if you are lucky enough to quickly finish the cycle of check in, revert, revert the revert to make more changes, check in, revert, ...

Sometimes, especially on the platform team and the application framework team, you need to make a big change to the code base. When it turns out to break lots of tests in ToolsHarness, the best thing to do is to move forward by checking in more fixes, instead of reverting the change and doing it again. The only problem is that the code base becomes unstable during the process. If you have a large team with others working on other areas at the same time, the number of broken tests can be disturbing. And as Greg pointed out at the beginning, not every team cares about the tests in the same way. Some would prefer to finish the job at hand before tracking down the broken tests. The line for non-deterministic tests is even blurrier.

At Guidewire, the way to make this work for all the teams working on the same code base is through branches. Each team works on an 'active' branch where it is free to do whatever it finds most productive. The only rule is to fix all the tests before pushing changes to the 'stable' branch, from which every team pulls changes on a daily basis. If it takes a team a couple of weeks to get into a stable state, then it has to risk merge conflicts. For a team that is diligent about fixing tests, the branch will be stable most of the time, and pushing comes much easier. It is exactly what most agile books recommend for checking in code (check in as often as possible as long as the tests are passing), except at the level of branches.

Sometimes you can still make a mistake and push a broken test (or more) into stable. In this case, there is a 'merge' branch that you can use to fix the build. Most of the time, however, the fix is very easy, and all you need to do is find out which team is next in line to push to stable and manually integrate your fix into their branch. There are merge scripts written in Ruby to help with the pull and push process. They are very robust and well tested, so the majority of pushes and pulls are merely a push-button process, i.e., you type in the command "merge.rb --pull" and you are good to go. We have a merge machine set up specifically for this job, so that the merger doesn't have to give up his or her local resources.


As many articles, books, and blogs have pointed out, people are always the center of the agile development process. Even with a powerful tool like ToolsHarness, it is still up to the team to apply discipline and agile practices. Because the team does not have to stop everything to fix broken tests, it is actually easy for people to ignore them. Given enough time, enough code changes will have been checked in to make the tests much harder to fix than they should be.

So the rule of thumb is still the same: fix any broken test as quickly as possible when it comes up. The old tricks still apply: revert the changes that broke the build, make small changes, check in often, monitor email notifications, run tests before checking in, and so on.
