- Introduction
- Test fixtures like assertion, builder
- Test Environment Set Up with TestBase
- ToolsHarness, a continuous integration server farm that treats tests individually
- Active and stable branch, localizing the damage
The Difficulty of Holding the Line
With or without a tool, the most difficult part of installing such a process is probably holding the line of "zero broken tests". In my past consulting and coaching experience, it sometimes took great effort and time to get a team into the habit of running all the unit tests before checking in code, treating test writing and test fixing as the highest priority, and continually tuning the tests so that the whole run does not exceed ten minutes. Even then, not all the teams kept up with the practice after we left the project on a good note.
I recently got a chance to catch up with Greg, an ex-ThoughtWorker whom I used to work with and respect. He showed interest in what I am writing and shared his opinion, which I quote:
- Our test suite is too large and too slow to run with every build. We are lucky to get results once a day.
- Not everyone cares about the unit tests to the same degree. Some people are too busy to track down failures right away. Not everyone sees the value in the unit tests, mostly because our coverage figures aren't high enough.
- Not everyone has the skills to write decent tests or design their code in a modular, testable fashion.
When I read that email, I became more motivated to work on this third post, because it is exactly what I want to write about. In my previous job, I found that the only way to make agile development work was to follow the XP practices, especially when it comes to continuous integration. In an enterprise environment (as defined in the Introduction), there seemed to be no middle ground between "green all the time" and "red all the time". The moment the team fell off the state of zero broken tests and couldn't recover quickly, it would be in a deep hole right away.
The first thing I have learned at Guidewire is that the above problems can be solved much better with the help of a comprehensive continuous integration tool, ToolsHarness in this case. I am not saying it is a silver bullet, because a lot of development practices and discipline are still required. But I can certainly say that I am now seeing the light.
Test Farm for Parallel Testing
No matter how hard you try, eventually you will not have enough time to run all the tests you would like before checking in code. When that becomes the case, breaking tests becomes the norm rather than the exception, and the XP way of handling broken builds no longer applies. For a complicated system, the time to run the full test suite will be huge. At the same time, agile software development demands fast feedback on code changes. The longer the turnaround time, the more friction there is in iterative development - how can the team possibly build on top of something not yet proven to work and still expect high throughput?
Guidewire has a big testing farm, composed of dozens of machines, mostly running Linux and configured with the H2 database. When a build is available, these machines check out the test suites and run them.
When a developer checks in a change list, ToolsHarness first pulls the source and does a full build to make sure that the projects still compile. Once compilation finishes successfully, ToolsHarness posts the build for testing. The tests used to be divided into different suites based on which test class they extend - for example, Database tests, Metadata tests, or Server tests. With the introduction of TestBase, they have all been converted to NewTestSuite and NewSmokeTestSuite. (The difference is that the tests in NewSmokeTestSuite are acceptance tests, which are full end-to-end tests and require additional sample data.)
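To make the structure a bit more concrete, here is a minimal sketch of what a test built on a TestBase-style superclass could look like. The real Guidewire TestBase is internal, so the stand-in base class, the example test, and everything it claims to set up are assumptions for illustration only (JUnit 3-style, matching the era):

```java
import junit.framework.TestCase;

// Hypothetical stand-in for a TestBase-style superclass; the real Guidewire
// TestBase is internal, so what it sets up here is an assumption.
abstract class TestBase extends TestCase {
  @Override
  protected void setUp() throws Exception {
    super.setUp();
    // A real base class might prepare metadata, a database connection, and
    // (for acceptance/smoke tests) additional sample data before each test.
  }
}

// An ordinary test written against the base class; a suite like NewTestSuite
// would pick up tests of this kind, while full end-to-end tests that need
// sample data would belong in NewSmokeTestSuite.
public class ArithmeticSanityTest extends TestBase {
  public void testAdditionStillWorks() {
    assertEquals(4, 2 + 2);
  }
}
```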
Based on the timings from the previous run, the suites are divided evenly into several parts, so that each testing machine can check out one part and run it. Through this parallel testing, a suite that would normally take hours to finish runs in no more than 20 minutes. The system is highly scalable, because all you need to do is add more machines. With such a fast feedback loop, developers can work on the large code base and still make medium-sized changes. The worst thing that can happen is having to revert a change you made less than half an hour ago.
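The exact scheduling logic lives inside ToolsHarness, but a simple way to picture "divide evenly based on the previous run" is greedy longest-processing-time partitioning: sort the tests by their last recorded runtime and always hand the next test to the machine with the least work so far. The sketch below is an assumed approximation of that idea, not Guidewire's actual code:

```java
import java.util.*;

// Hypothetical sketch: split tests across machines using timings from the
// previous run, so each partition ends up with roughly equal total runtime.
public class SuitePartitioner {

  public static List<List<String>> partition(Map<String, Long> previousRunMillis,
                                             int machineCount) {
    List<List<String>> partitions = new ArrayList<>();
    long[] totals = new long[machineCount];
    for (int i = 0; i < machineCount; i++) {
      partitions.add(new ArrayList<>());
    }

    // Place the longest tests first so the partitions stay balanced.
    List<Map.Entry<String, Long>> tests = new ArrayList<>(previousRunMillis.entrySet());
    tests.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));

    for (Map.Entry<String, Long> test : tests) {
      int lightest = 0;
      for (int i = 1; i < machineCount; i++) {
        if (totals[i] < totals[lightest]) lightest = i;
      }
      partitions.get(lightest).add(test.getKey());
      totals[lightest] += test.getValue();
    }
    return partitions;
  }
}
```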
It is easy to run the tests against different databases and J2EE servers too. Some testing machines are configured with an Oracle or SQL Server database, some run on Windows, and some use Tomcat or WebLogic. As I am writing this post, the tools pod (the team in charge of developing this server) is working on "customer build testing", so that when running the acceptance tests the test environment will match the production environment exactly.
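I don't know how ToolsHarness actually wires up these environments, but conceptually each machine only needs a switch telling the test run which database it represents. A hypothetical sketch, with an invented test.database property and made-up JDBC hosts:

```java
// Hypothetical: pick the JDBC URL for this test machine from a system
// property set per machine (e.g. -Dtest.database=oracle). The property name
// and hosts are invented for illustration.
public class TestDatabaseConfig {

  public static String jdbcUrlFor(String database) {
    switch (database) {
      case "h2":        return "jdbc:h2:mem:testdb";                      // default farm setup
      case "oracle":    return "jdbc:oracle:thin:@dbhost:1521:testsid";   // hypothetical host
      case "sqlserver": return "jdbc:sqlserver://dbhost;databaseName=test";
      default: throw new IllegalArgumentException("Unknown database: " + database);
    }
  }

  public static void main(String[] args) {
    String database = System.getProperty("test.database", "h2");
    System.out.println("Running tests against " + jdbcUrlFor(database));
  }
}
```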
Tracking Tests Individually
With a large team, you will have developers with different skill sets. While it is easy for senior developers to be disciplined about making small changes at a time and to identify the problem behind a broken test, it normally takes a junior developer much longer to fix one. Unless all your senior developers happen to be good coaches, you are going to be stuck with broken tests popping up here and there for a while.
For each test, you can see the failure message in the form of a stack trace, the change lists associated with it, and the history of the test, all of which help you figure out why the test is broken. You can also look into the log directory for any additional generated files, such as server logs and HTML page snapshots.
If you have written a broken test that you cannot yet fix, you can annotate it with KnownBreak, and it will show up as such in ToolsHarness. If you have determined that a test is failing non-deterministically but you cannot yet figure out why, you can mark it with NoneDeterministic, and it will show up as such in ToolsHarness. The key is to keep the noise from broken tests to a minimum, if not zero, so that developers get accurate notifications and can fix failures effectively.
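The real KnownBreak and NoneDeterministic annotations are part of Guidewire's internal tooling; as a rough idea, though, they could be as simple as marker annotations that the harness reads when reporting results. The retention, targets, and fields below are assumptions:

```java
import java.lang.annotation.*;

// Hypothetical shapes for the markers; the real annotations are internal to
// Guidewire's tooling, so these definitions are assumptions.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.TYPE})
@interface KnownBreak {
  String reason() default "";   // why the test is expected to fail for now
}

@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.TYPE})
@interface NoneDeterministic {
  String notes() default "";    // what has been observed about the flakiness
}
```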
Localizing the Damage through Branches
With aggressive refactoring, you will not be able to leave your platform code alone. Sometimes you know that the only way to be sure is to check in the code and let the continuous integration server run the full tests against the changes. With this approach, you risk putting the build into an unstable state for a while before you can figure out the best solution. If the whole development team has to rely on a good build, they will either be out of commission for a while, or they will accumulate changes that cause another wave of instability after you are done. And that is if you are lucky enough to finish quickly with the cycle of check in, revert, revert the revert to make more changes, check in, revert, ...
Sometimes, especially for the platform team and the application framework team, you need to make big changes in the code base. When such a change proves to break lots of tests in ToolsHarness, the best thing to do is to move forward by checking in more fixes, instead of reverting the change and starting over. The only problem with that is that the code base becomes unstable during the process. If you have a large team with others working on other areas at the same time, the number of broken tests can be disturbing. And as Greg pointed out at the beginning, not every team cares about the tests in the same way. Some would prefer to finish the job at hand before tracking down broken tests. The line for non-deterministic tests is even blurrier.
Antidote
As many articles, books, and blogs have pointed out, people are always at the center of the agile development process. Even with a powerful tool like ToolsHarness, it is still up to the team to apply discipline and agile practices. Because the team does not have to stop everything to fix broken tests, it is actually easy for people to ignore them. Given enough time, enough code changes will have been checked in to make it much harder than it should be to fix the tests.
So the rule of thumb is still the same: fix any broken test as quickly as possible when it comes up. The old tricks still apply: revert the changes that broke the build, make small changes, check in often, monitor email notifications, run tests before checking in, and so on.