Saturday, November 15, 2008

Staying Agile by Going off "Agile"

This is a blog post about a statement that I finally made after reading "The Decline and Fall of Agile":
I am going off "Agile"
No, I am not going to give up test-driven development. In fact, I am doing more of it by adopting more behavior-driven development, which is actually harder in certain cases. It helps me understand the code and verifies the design (I believe it does that rather than "drives" out the design nowadays, but that is another post).

No, I am not going to give up aggressive refactoring. Every time, and I do mean EVERY TIME, I slack on it, I end up paying the price one way or another and kicking myself. I have been proud of every single line of code that I have produced (I cannot say that for all the code that I have inherited and worked on), and it always serves me and my team well.

No, I am not going to give up on iterative development in the form of Iterations or Sprints. They help my teams focus, avoid distractions, and still respond to requests from outside the team with crystal-clear transparency.

So what is it?

I am going to take "agile" out of my vocabulary in all communications.

Rather than saying
"Not able to have QA accepting the stories as soon as they are finished is not agile"
I'll say "We need to get those finished stories accepted as soon as possible, so that we can close the feedback loop. When they are accepted, we know we are doing a good job. And when they are not, we can trace back to our thoughts as we were developing them and understand where it went wrong".

Rather than saying
"Not setting a goal at the begining of the Sprint and verifying them through the Sprint signature is not agile"
I'll say "We need to establish a way to provide feedback regarding our work and make continuous improvement to the way we work, so that we can provide better value to the people who pay us. One way we can do that is to look back at our progress in the past Sprint, talk about our experiences and thoughts, and come up with action items to make things better"

Apparently this will make conversations longer, because I have to present proof rather than just point to a good book (there are dozens of good books available, as a matter of fact). Sometimes I will have to wait, patiently, for an opportunity to present itself so that I can use it as an example to persuade others to slow down, do it right, and do it well.

Why

I have been thinking along this line for a while. I read "Good Agile and Bad Agile" and felt annoyed because there is truth in what he is saying. From time to time, I get annoyed by negative comments that don't even make sense to me. I wrote one post about "Things You Cannot Get Certified For", and argued hard on several newsgroups that I subscribe to.

Very soon I got tired of it. Using the word "agile" has caused more distraction than it's worth. Practices sometimes get picked on literally and attacked. Rather than looking at the value something is trying to bring, many seem to look at the cost (time, tools, processes) first. It gets attacked, it gets debated, and in the end nothing is done and the bad things just keep going. And you get people from all over the world writing about how "agile" did not work for them and laughing at anyone who is interested in trying.

To add insult to injury, you can also hear "agile" used in phrases like "Let's be agile about it, instead of insisting on ...". It is really hard to argue in this situation, because you cannot simply say "no, let's not be agile about it, because we should insist on going through this three-hour meeting to make sure that our stories are up to the standard".

I have been avoiding throwing "agile" around for a while, and I think I am happy with the result. I have also been ignoring the bad usage of "agile" out there so that I can stay healthy and focus on bringing agility to my teams. (I swear this is the last time.) A month ago, I went through all my blog posts and took agile out of the labels and categories. This post will remain the only one with "Agile" as the label.

I have been thinking about writing a post like this and finally decided to do it after reading James' post "The Decline and Fall of Agile"

Other References

I am collecting references of others with similar ideas here:

Tuesday, October 28, 2008

Two Sprint Equations

What should be the order of things to do when installing a Sprint process from scratch? In my coaching days, we requested a one-to-one ratio between the coaches and the rest of the developers plus a project manager, went all out for a couple of Sprints, and gave everyone a chance to adjust to the process before adjusting the process to the team.

For a single person, the strategy has to be different. You need to look at all the practices and make trade-offs with an eye on the big picture. The following two equations are what kept popping into my mind as I installed the process for the two teams.

Value = Scope * (Feature Quality * Code Quality)

One comment to make here is that during product development, these three factors sometimes work against each other. A good business analyst or product manager is one who knows how and when to balance them, and only one with a strong personality can bring the best value to the product.

Scope is measurable, as the second section will show. Feature quality and code quality, however, are simply not something that can be determined by objective measurement, not purely anyway. When pushed on something measurable like scope, the things that are not measurable get sacrificed, and everyone ends up paying for it sooner or later.

Scope = Velocity * Number of Sprints

Assuming that the quality of the product is under control, scope is the next thing to look out for during a project. This is pretty easy to understand: the more the team can do without sacrificing quality (both feature quality and code quality), the better.

Velocity is something that can only be affected by tuning the team's Sprint process; it can never be demanded. What is left for this equation to work is to adjust either the scope of the project or the time of the project (the number of Sprints), and most of the time both. This is probably one of the most commonly stated facts, and at the same time probably also one of the most ignored.
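As a quick, hypothetical illustration of the second equation (the numbers and names below are made up, not from any real project):

// Scope = Velocity * Number of Sprints, rearranged to answer the two questions
// a team can actually act on: how many Sprints does the remaining scope need,
// and how much scope fits in the Sprints we have left.
public class SprintMath {
    public static void main(String[] args) {
        int remainingScopeInPoints = 120; // whatever unit the team estimates in
        int observedVelocity = 18;        // points finished per Sprint -- measured, never demanded

        // Number of Sprints = Scope / Velocity, rounded up
        int sprintsNeeded = (remainingScopeInPoints + observedVelocity - 1) / observedVelocity;
        System.out.println("Sprints needed at current velocity: " + sprintsNeeded); // 7

        // Or fix the time box and see how much scope fits
        int sprintsAvailable = 5;
        System.out.println("Scope that fits: " + observedVelocity * sprintsAvailable); // 90 points
    }
}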

Boosting velocity is the same as boosting the productivity of the team, which is the job of the team lead. This is the purpose of a lot of XP practices: TDD, pair programming, co-location, shared ownership, continuous integration.

Friday, October 24, 2008

An Interesting Agile Team Paradox

I learned this at the latest BayXP meeting:

A good self-organizing team is commonly one with a strong leadership
I think there is a lot to this interesting comment. I'd like to find out more about its origin and elaborate on it at a later time. I just want to write it down because I have almost forgotten it twice already.

Wednesday, August 13, 2008

Building Java Project with Ruby

I would probably be stretching it if I said this is THE solution. But would you not look at this and think "this just feels right"?

Next to do:

* Implement 'project.test'
* Implement 'project.report_coverage(dir_to_emma_files)'

Thursday, July 10, 2008

Converting an Image in Ruby - Can it get any easier?

require 'RMagick'
Magick::ImageList.new('image.jpg').write('image.tif')


Ok, that is way too short. Let's add a crop operation:

image = Magick::ImageList.new('image.jpg')
image.crop(0, 0, 64, 64).write('image.tif')
(I think I can add the first example to Cotta)

Monday, June 16, 2008

Five Sprints into SCRUM

We kicked off Sprint 5 today for the Application Framework team.

Guidewire development follows the SCRUM methodology. However, through all these years, due to various reasons, the ideas behind Sprints have not been exactly followed. There are many reasons for this, some of which are actually good reasons. However, that does not mean it was the best decision, and some development teams are trying to bring meaningful Sprints back to the development process, including the AF team.

So what have I done differently this time?

We ended up using JIRA to track our stories. There are many reasons for this. I think the first one is the kind of work we are doing right now. We are not yet doing active development, but rather fixing bugs for a point release and running performance testing. Since all the bugs are created in JIRA already, using JIRA to track items that are not bugs makes it easy to track all the items we need to do in any given Sprint. On the weekly work-from-home day, which each Guidewire employee can choose freely, it is very convenient to go to JIRA to pick the next piece of work to do.

I am still keeping a Sprint board by writing down the JIRAs on story cards, but it is not as effective as I would like it to be. I think one reason is that the QAs are verifying the JIRAs on their own schedule. (And the reason for that is that some QAs are not part of the AF team, because AF work affects other application teams.) I know it sounds strange, but that is the situation right now. We are talking about how to get away from this mode and have a real, completely independent development team, but before that happens, we will just have to pull through.

The purpose of the Sprint board now is more for the daily Sprint meeting, where we talk about what we achieved yesterday and are planning to do today. I use it to help the team focus and work only on blocker JIRAs or the JIRAs scheduled for the Sprint. Old habits die hard, but we are making progress in that direction. When we schedule too much for the Sprint, which has been the case for all the past Sprints, I use the Sprint board to figure out what to push to the next Sprint. I have not been doing this aggressively, but now that I have an idea of our current velocity, I'll do it more.

I am also changing the Sprint planning format. I am not going through the JIRAs one by one and asking questions about them anymore, because the feedback has been that it takes a long time and becomes uninteresting. I think the first reason is that we are not sharing enough to make it a team conversation. Rather, it is me and whoever owns that part of the system talking with each other and figuring out the tasks to do. Even then, because I cannot pair on each and every JIRA, I am not able to track and check that each JIRA is estimated correctly and done within a Sprint. Without following them up and closing the feedback loop, all the work of creating tasks and tracking them becomes rather pointless.

So in the Sprint planning, I now show the scheduled JIRA list, talk about them briefly in groups by functional area, and track down the estimates after the meeting. I think I will move the estimation to before the meeting next time, so that I know how much to schedule for the Sprint.


Sunday, February 24, 2008

Guidewire Development Blog

I have developers from Guidewire customers commenting on my blog, so I would like to mention that Guidewire now has its own development blog, which I think you will find interesting:

http://guidewiredevelopment.wordpress.com

Enjoy!

Proposal for Agile 2008 Submitted

Even though I still have one more post to go for this topic on Enterprise Agile Testing, the submission deadline for Agile 2008 is approaching. So I looked at other submissions, looked at what I have written, and submitted my proposal:

http://submissions.agile2008.org/node/3736

Feedback is welcome, either through the submission page or my website http://www.shaneduan.com/contact.html

Wednesday, February 20, 2008

Enterprise Agile Testing Part III: Managing Tests with ToolsHarness, Individually

This is the third part of Enterprise Agile Testing. Continuous integration has proven to be one of the most important practices in agile software development. Every time a developer checks in code, the resulting code base is rebuilt and tests are run against it. The end result of the integration tells everyone on the project whether the codebase is good enough for release. Some prefer synchronous continuous integration through a push-button process over an asynchronous process through a tool like CruiseControl, but everyone agrees that it is something very useful.

The Difficulty of Holding the Line

With or without a tool, the most difficult part of installing such a process is probably holding the line of "zero broken tests". In my past consulting and coaching experience, it sometimes took great effort and time to get the team into the habit of running all the unit tests before checking in code, making test writing and test fixing the highest priority, and keeping the tests tuned so that the whole process does not exceed ten minutes. Even then, not all the teams kept up with the practice after we left the project on a good note.

I recently got a chance to catch up with Greg, one of the ex-ThoughtWorkers that I used to work with and respect. He showed interest in what I am writing and expressed his opinion, which I quote:

- Our test suite is too large and too slow to run with every build. We are lucky to get results once a day.
- Not everyone cares about the unit tests to the same degree. Some people are too busy to track down failures right away. Not everyone sees the value in the unit tests, mostly because our coverage figures aren't high enough.
- Not everyone has the skills to write decent tests or design their code in a modular, testable fashion.

When I read that email, I became more motivated to work on this third post, because it is exactly what I want to write about. In my previous job, I found that the only way to make agile development work was to follow the XP practices, especially when it comes to continuous integration. In an enterprise environment (as defined in the Introduction), there seemed to be no middle ground between "green all the time" and "red all the time". It seemed that the moment the team fell off the state of zero broken tests and couldn't recover quickly, they would be in a deep hole right away.

The first thing that I have learned at Guidewire is that the above problems can be solved better with the help of a comprehensive continuous integration tool, ToolsHarness in this case. I am not saying this is a silver bullet, because there is still a lot of development practice and discipline required. But I can certainly say that I am now seeing the light.

Test Farm for Parallel Testing

No matter how hard you try, eventually you will not have enough time to run all the tests that you would like before checking in code. When this becomes the case, test breakage will become the norm rather than the exception, and the XP way of handling broken builds will not apply anymore. For a complicated system, the time to run the full test suite is huge. At the same time, agile software development dictates fast feedback on code changes. The longer the turnaround time, the more friction there is in iterative development - how is it even possible for the team to build on top of something not yet proven to be working and expect a high throughput?

Guidewire has a big testing farm, composed of dozens of machines, mostly on Linux and configured with the H2 database. These machines are configured so that when a build is available, they will check out the test suites to run.


When a developer checks in a change list, ToolsHarness will first pull the source and do a full build to make sure that the projects still compile. Once the compile finishes successfully, ToolsHarness will post the build for testing. The tests used to be divided into different suites based on which test class they extend, for example, Database test, Metadata test, or Server test. With the introduction of TestBase, they have all been converted to NewTestSuite and NewSmokeTestSuite. (The difference is that the tests in NewSmokeTestSuite are acceptance tests, which are full end-to-end tests and require additional sample data.)

Based on the previous run of the tests, the suites are divided evenly into several parts, so that each testing machine can check out one part and run it. Through this parallel testing, it takes no more than 20 minutes to run a suite that would normally take hours to finish. The system is highly scalable, because all you need to do is add more machines. With such a fast feedback loop, the developers can work on the large code base and still make medium-sized changes. The worst thing that can happen is having to revert a change that you made less than half an hour ago.
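The exact algorithm ToolsHarness uses isn't described here, but a minimal sketch of this kind of split, assuming per-suite timings from the previous run are available (all class and field names below are hypothetical), could look like this:

import java.util.*;

public class SuiteSplitter {
    static class Part {
        final List<String> testClasses = new ArrayList<>();
        long totalMillis;
    }

    // Greedy split: longest suites first, each assigned to the currently lightest part.
    static List<Part> split(Map<String, Long> previousRunMillis, int machineCount) {
        List<Part> parts = new ArrayList<>();
        for (int i = 0; i < machineCount; i++) parts.add(new Part());
        PriorityQueue<Part> lightestFirst =
                new PriorityQueue<>(Comparator.comparingLong((Part p) -> p.totalMillis));
        lightestFirst.addAll(parts);

        List<Map.Entry<String, Long>> byDuration = new ArrayList<>(previousRunMillis.entrySet());
        byDuration.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        for (Map.Entry<String, Long> entry : byDuration) {
            Part part = lightestFirst.poll();       // lightest machine so far
            part.testClasses.add(entry.getKey());
            part.totalMillis += entry.getValue();
            lightestFirst.add(part);                // re-insert with updated load
        }
        return parts;
    }
}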

It is easy to run the tests against different databases and J2EE servers too. Some testing machines are configured with an Oracle database or SQL database, some are configured on the Windows platform, and some are configured with Tomcat or WebLogic. As I am writing this post, the tools pod (the team in charge of developing this server) is working on "customer build testing", so that the test environment will be exactly like the production environment when running the acceptance tests.

Tracking Tests Individually

With a large team, you will have developers with different skill sets. While it is easy for a senior developer to be conscious about making small changes at a time and to identify the problem based on the broken test, it normally takes a junior developer much longer to fix one. Unless all your senior developers happen to be good coaches, you are going to be stuck with broken tests popping up here and there for a while.

When a testing machine is done with a test run, it posts the results back to ToolsHarness. ToolsHarness parses the XML file and stores the result of each test in the database to track it. The benefit of this is that developers can track tests individually. When a test is broken, ToolsHarness makes an educated guess based on the change list and the changed packages to assign it to a developer. If the guess turns out to be wrong, the developer can easily assign it to the right person.
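The real assignment rules are internal to ToolsHarness; the following is just a sketch of what such an educated guess might look like, with hypothetical types, matching a failing test's package against the packages touched by recent change lists:

import java.util.List;
import java.util.Set;

public class FailureAssigner {
    static class ChangeList {
        final String author;
        final Set<String> touchedPackages;
        ChangeList(String author, Set<String> touchedPackages) {
            this.author = author;
            this.touchedPackages = touchedPackages;
        }
    }

    // recentChanges is ordered newest first; returns null when there is nothing to go on.
    static String guessOwner(String failingTestPackage, List<ChangeList> recentChanges) {
        for (ChangeList change : recentChanges) {
            for (String touched : change.touchedPackages) {
                // a failure in com.example.policy.* points at whoever touched com.example.policy
                if (failingTestPackage.startsWith(touched) || touched.startsWith(failingTestPackage)) {
                    return change.author;
                }
            }
        }
        // no package match: fall back to the author of the latest change list, if any
        return recentChanges.isEmpty() ? null : recentChanges.get(0).author;
    }
}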
When a developer logs into ToolsHarness, the first page, Desktop, contains the list of tests that have been assigned to him or her. In this way, you won't be distracted as long as there are no tests assigned to you. The summary of all the tests is also on this page, so that you know not to check in anything when there are hundreds of broken tests, or so that the senior developers get a cue to pitch in with others on fixing tests.

For each test, you can see the failure message in the form of a stack trace, the change lists associated with it, and the history of the test, to help you figure out why the test is broken. You can also look in the log directory to see any additional generated files, like server logs and HTML page snapshots.

If you have written a broken test that you cannot yet fix, you can annotate it with KnownBreak, and it will show up as such in ToolsHarness. If you have determined that a test is failing non-deterministically but you still cannot figure out why, you can mark it with NoneDeterministic, and it will show up as such in ToolsHarness. The key is to keep the noise of broken tests to a minimum, if not zero, so that developers get accurate notifications and fix tests effectively.

Localizing the Damage through Branches

With aggressive refactoring, you will not be able to leave your platform code alone. Sometimes you know that the only way to be sure is to check in the code and let the continuous integration server run the full tests on the changes. With this approach, you risk putting the build into an unstable state for a while before you can figure out the best solution. If the whole development team has to rely on a good build, they will either be out of commission for a while, or they will accumulate changes that will cause another wave of instability after you are done. And that is if you are lucky enough to quickly finish the cycle of check in, revert, revert the revert to make more changes, check in, revert, ...

Sometimes, especially for the platform team and the application framework team, you need to make big changes in the code base. When a change proves to break lots of tests in ToolsHarness, the best thing to do is to move forward by checking in more fixes, instead of reverting the change to do it again. The only problem with that is that the code base becomes unstable during the process. If you have a large team with others working on other areas at the same time, the number of broken tests can be disturbing. And as Greg pointed out at the beginning, not every team cares about the tests in the same way. Some would prefer finishing the job at hand before tracking down the broken tests. The line for non-deterministic tests is much more blurry.

At Guidewire, the way to make this work for all the teams working on the same code base is through branches. Each team works on an "active" branch on which they are free to do whatever they feel is most productive. The only rule they need to follow is to fix all the tests before pushing the change to the "stable" branch, which every team pulls changes from on a daily basis. If it takes a team a couple of weeks to get into a stable state, then they will have to risk the merge conflicts. For a team that is diligent about fixing tests, their branch will be stable most of the time, and pushing comes much more easily. It is exactly what most agile books recommend about checking in code -- check in as often as possible as long as the tests are passing -- except at the level of branches.
Sometimes you can still make a mistake and push a broken test (or more) into stable. In this case, there is a "merge" branch that you can use to fix the build. Most of the time, however, the fix is very easy, and all you need to do is find out which team is next in line to push to stable and manually integrate your fix into their branch. There are merge scripts written in Ruby to help with the pull and push process. They are very robust and well tested, so the majority of pushes and pulls are merely a push-button process, i.e., you type the command "merge.rb --pull" and you are good to go. We have a merge machine set up specifically for this job, so that the merger doesn't have to give up his or her local resources.

Antidote

As many articles, books, and blogs have pointed out, people are always at the center of the agile development process. Even with a powerful tool like ToolsHarness, it is still up to the team to apply discipline and agile practices. Because the team does not have to stop everything to fix a broken test, it is actually easy for people to ignore the tests. Given enough time, enough code changes will have been checked in to make it much harder than it should be to fix the tests.


So the rule of thumb is still the same: fix any broken test as quickly as possible when it comes up. The old tricks still apply, including things like reverting the changes that broke the build, making small changes, checking in often, monitoring email notifications, running tests before checking in, etc.

Friday, February 08, 2008

Enterprise Agile Testing Part II : Test Environment Set Up with TestBase

This is the second part of Enterprise Agile Testing (not exactly following my original order here):

  • Introduction
  • Test fixtures like assertion, builder
  • Test Environment Set Up with TestBase
  • ToolsHarness, a continuous integration server farm that treats tests individually
  • Active and stable branch, localizing the damage

Testing through Inversion of Control (IoC) Container

(For concept of IoC container, see Martin Fowler's article: Inversion of Control Containers and the Dependency Injection Pattern)

Ever since testing through dependency injection was formally named, it has become the most popular pattern for unit testing. You control the environment in which the class is tested by carefully constructing the classes in the dependencies before injecting them into the class under test. In this style, a typical test is composed of three parts. They are named differently depending on whom you talk to, but the naming I like is what I learned when I presented the "Given, When, Ensure" notation of jBehave at a BayXP meeting (a minimal sketch follows the list):

* Assemble: Construct the environment in which the test is going to run.
* Act: Invoke the method(s) that you want to test.
* Assert: Assert that the tested method has caused the predicted change in the environment.
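Here is a minimal sketch of the three parts in JUnit, using plain constructor injection; the classes are hypothetical stand-ins, not from any real code base:

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class InvoiceCalculatorTest {

    static class TaxPolicy { double rate() { return 0.10; } }

    static class InvoiceCalculator {
        private final TaxPolicy taxPolicy;
        InvoiceCalculator(TaxPolicy taxPolicy) { this.taxPolicy = taxPolicy; }
        double totalWithTax(double subtotal) { return subtotal * (1 + taxPolicy.rate()); }
    }

    @Test
    public void addsTaxToSubtotal() {
        // Assemble: build the world the class under test depends on
        InvoiceCalculator calculator = new InvoiceCalculator(new TaxPolicy());

        // Act: invoke the method under test
        double total = calculator.totalWithTax(100.0);

        // Assert: the invocation caused the predicted change
        assertEquals(110.0, total, 0.001);
    }
}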

It is safe to say that anyone who has done enough testing won't have any problem with "Act" and "Assert". It is the "Assemble" that has been giving us trouble. The following is an illustration extending PicoContainer's diagram.
To test the class marked by the big arrow, you need to create the world as this class sees it, then invoke methods on the class and assert the changes caused by the invocation. Among the ways of constructing the world according to the class under test, stubs and mocks are probably the only patterns that have been well documented. As indicated by the article, each solution has its own limitations. For a small to medium-sized application, these kinds of tests are generally manageable. But if you have done enough enterprise application development (as defined in the Introduction), then you have probably seen your fair share of mocks and stubs getting out of hand, as was the case for Guidewire tests until 2007.

Testing with a Loaded Container

During the past year, Guidewire has been slowly converting its tests onto a home-grown JUnit extension framework. The framework does the heavy lifting of constructing the dependencies, so that by the time the test code is called, all the dependencies have already been set up properly. If you really want to, you can even access a full web container through the embedded Jetty server. By putting your class inside a full container, you get a lot of benefits that you won't normally get with a bare-bones unit test, and without breaking a sweat.

The immediate benefit is that you no longer deal with mocking, stubbing, and guessing. When your test calls into a method, you can be sure that the class is in the same state as when it is called in the real world (it might still not be the one that you want in your test, but that is a separate issue). Without mocking and stubbing, you don't need to walk on eggshells anymore as you change the class's responsibilities and collaborations. You can call into a real messaging manager through the container, enable a message destination, commit an entity, and test that the message for the changes appears. All the code paths match the real world exactly, so you won't have any integration surprises down the road.

Because all the required validations are turned on, you are forced to create realistic data. With realistic data, your test becomes more realistic. You can put your test under debug mode at any time and get a good sense of what the data will be like in a real server. If you make a mistake and forget to set a non-nullable field, your test will blow up right away.

With a loaded container, you feel more confident in the class that you are designing. Because you can easily see how the class fits into the whole world, you can make sure it becomes a good citizen by doing just its job, no more, no less.

This framework is extremely flexible, which makes it very powerful. You can modify the testing environment by annotating your test and registering your own annotation handlers. In this way, you can add additional set-up code without even creating your own super test base, a typical case of favoring composition over inheritance. Below you will see many of the annotations that we have built over the past half year.
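The handler API itself isn't spelled out in this post, so the following is only a sketch of the idea, with every name invented for illustration; the point is that the test declares what it needs and a registered handler does the set-up, instead of another layer of super classes:

import java.lang.annotation.*;

// Hypothetical annotation: "give this test some sample accounts".
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface WithSampleAccounts {
    int count() default 3;
}

// Hypothetical handler the framework would discover and run before the test
// methods, against the already-initialized container.
class WithSampleAccountsHandler {
    void beforeTest(WithSampleAccounts annotation /*, handle to the test environment */) {
        for (int i = 0; i < annotation.count(); i++) {
            // insert one sample account into the default data set
        }
    }
}

// Usage: no custom super class, just a declaration on the test.
@WithSampleAccounts(count = 5)
class AccountSearchTest /* extends TestBase */ {
    // test methods can assume five sample accounts exist
}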

Performance Improvement Considerations

Of course, all of this is much easier said than done, and we are sort of going against the conventional wisdom of unit testing here. The first question most readers will raise is probably "Loading the whole container for a simple unit test??? How can your tests perform!?" Please trust me when I say that I had the same concerns. But after adapting to it for half a year, I think this is definitely a good solution.

First of all, performance is overrated. No, I am just kidding. The first thing that I would like to say is that if you are a TDD veteran, in that you know how to design your classes such that you can manage your own dependencies well most of the time, then kudos to you, and you can use the @RunLevel annotation to tell the framework not to do any of that set-up for you (see below).

I was actually not totally joking. I would like to argue that for an enterprise application (as described in the Introduction), it is not uncommon that some part of the system is not designed as cleanly as it could have been. As a result, you have to choose between making the test run fast through the kind of mock that no one understands, or making the test run a bit slower but reflect the real system. Since design validation is the whole purpose of tests, I vote for testing the right thing with a bit of sacrifice on speed.

In addition, the test framework has a set of performance considerations in place to make sure that the tests perform well overall.

Run Level

Guidewire applications have the notion of a run level as a way to bring the system online in stages. You can annotate each test with the desired run level to have just the things you need set up before the test (a small self-contained sketch follows the list). The following are the run levels that I have used.

* NONE: This is just like a good old JUnit test.
* Shutdown: At this level, you have all the system configuration read in and the metadata loaded. You can run any test that does not touch the database.
* No Daemon: This is the default value. At this level, you have the database connection initialized and the schema updated. You can run any test that hits the database.
* Multiple User: At this level, you have a full-blown application server with background batch processes running. This is typically used by QA for acceptance testing.
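To make the idea concrete, here is the small self-contained sketch mentioned above; the enum and annotation are guesses written out for illustration only, not the framework's literal API (the list above says "No Daemon" is the default):

import java.lang.annotation.*;

enum TestRunLevel { NONE, SHUTDOWN, NO_DAEMON, MULTIPLE_USER }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface RunLevel {
    TestRunLevel value() default TestRunLevel.NO_DAEMON; // matches the described default
}

// A metadata-only test declares that it never needs the database, so the
// framework can skip that part of the set-up and the test starts faster.
@RunLevel(TestRunLevel.SHUTDOWN)
class DataModelNamingConventionTest /* extends TestBase */ {
    public void testAllEntityNamesAreSingular() {
        // walks the loaded metadata without touching the database
    }
}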

Database Tests

By default, all tests use H2 as the embedded database, which greatly improves test performance. I have been a big fan of in-memory databases since HSQLDB; DBFixture is the proof.

During development, the database schema changes all the time. Guidewire products have an upgrader built in that compares the database schema and automatically issues SQL statements to upgrade the database to the right schema. However, the upgrade process can take time. To save time, a backup copy is created after the upgrade finishes so that the database can be restored as necessary (see @ChangesSchema). There is one implementation for each database that we officially support, so all the tests can run on all databases if we choose to.

For each table there is also a shadow table that stores the default data set up by the test environment. Before each test run, the data in each table is restored from its shadow. In this way, different tests won't step on each other's toes and end up causing other tests to fail. For performance reasons, the data is only restored once per test class, because it is easier to make sure that the test methods in the same test class don't affect each other's data.
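A rough sketch of the shadow-table restore in plain JDBC (the "shadow_" prefix and the idea of passing table names in are made up for illustration; the real framework drives this from its own metadata):

import java.sql.Connection;
import java.sql.Statement;

public class ShadowTableRestorer {
    // Reset each table to the default data captured in its shadow copy.
    // Run once per test class, as described above, not once per test method.
    static void restoreDefaults(Connection connection, String... tables) throws Exception {
        try (Statement statement = connection.createStatement()) {
            for (String table : tables) {
                statement.executeUpdate("DELETE FROM " + table);
                statement.executeUpdate("INSERT INTO " + table + " SELECT * FROM shadow_" + table);
            }
        }
    }
}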

Server Mode for Web Testing

The QA acceptance tests are written in GScript. When running in browser mode, they use Selenium to drive the browser, connect to the server, and run the tests. However, when you have enough tests, the slowness of the browser really shows. Guidewire applications are built on top of a JSF framework, where the generated HTML source is driven by the page model on the server. With the exact same script, we can run the tests in server mode, where the scripts run against the page models in the server session. Without the browser layer, HTTP connection, and HTML generation and parsing, the test run time is cut down dramatically again.

Functional Considerations

The metadata layer of Guidewire applications is extremely extensible and configurable, and the SQL executed in the database layer is generated dynamically based on the metadata configuration and the database set-up. It would not be practical to mock out the whole thing. The test framework provides a fixed out-of-the-box container for each test and locks it down so that the test, or the code under test, cannot accidentally change those dependencies. But developers can modify the test environment through annotations. The following are the typical annotations:

@IncludeModules for Configuration Testing: With this annotation, you can specify a list of directories from which the test should load additional configuration. In this way, you can configure the test environment (registering an additional plugin, registering an additional SOAP interface, extending the basic data model, adding additional web pages, etc.). This is great when you want to test different configuration cases and still leave the base configuration simple and fast.

@ChangesTime for Time-based Testing: Sometimes your test is date sensitive. With this annotation, you get a hook to change the system date on the fly before you create the data you want, so that the timestamp meets your condition.

@ChangesSchema for Upgrade Testing: With this annotation, your test can run wild and wreak havoc on the database schema. At the end of your test, the schema will be restored from the backup automatically. This is very useful for upgrader-related tests.
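Stacked together on a single test, the usage might look like the sketch below; the attribute names and shapes are guesses for illustration (the post does not show them), and the stub declarations are included only so the example stands on its own:

import java.lang.annotation.*;

// Stub declarations with guessed shapes, for illustration only.
@Retention(RetentionPolicy.RUNTIME) @interface IncludeModules { String[] value(); }
@Retention(RetentionPolicy.RUNTIME) @interface ChangesTime { String value(); }
@Retention(RetentionPolicy.RUNTIME) @interface ChangesSchema { }

@IncludeModules({"test-configs/encryption-module"}) // load extra configuration on top of the base set-up
@ChangesSchema                                      // schema will be restored from backup afterwards
class EncryptionUpgradeConfigTest /* extends TestBase */ {

    @ChangesTime("2007-12-31") // pin the clock so the date-sensitive data lines up
    public void testPolicyCreatedBeforeCutoverIsEncryptedOnUpgrade() {
        // create data, run the targeted upgrade step, assert on the result
    }
}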

Testing Annotations

These are the additional annotations telling the test framework how you want your test to run:

@ProductUnderTest: You can write a test, put it in a common module, and tell the test framework which product you want the test to run against. For example, we need to make sure that the base data model can pass validation for all applications. We can write a test that starts the validation without depending on which product it is. With this annotation, the same test can be run with the data model from each product. Think dependency injection in production code is a good way to go? Why not apply it to tests?

@TestInDatabase: From time to time, you have to implement something that is a little different for different databases, or a feature that is only applicable to one database (the Oracle AWR report, for example). With this annotation, you can tell the test framework which databases this test should be run against. By default, all tests run against the H2 database only, for performance reasons.

@DoNotRunInHarness: This is for push-button tests that cannot be run automatically. For example, we have a test that pings MapPoint web services and makes sure that we can parse the result properly. MapPoint ended up telling us not to ping their staging server continuously, so this test is disabled on the testing server.

Testing Semantics

There are also other productivity improvements. Your test case can now implement beforeClass(), afterClass(), beforeMethod(), and afterMethod(), which are run, well, as the names indicate. After answering enough questions about when setUp() and tearDown() are run, I think it is a nice change.

Because JUnit holds on to ALL the test instances, each field in a test class is effectively a memory leak as far as the test run is concerned. The test framework automatically nulls out all the fields (with some configurable exceptions) at the end of the test case, when all the test methods are done.
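The mechanics are roughly what you would expect from reflection; here is a generic sketch (not Guidewire's actual code, and without the configurable exceptions mentioned above):

import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class TestFieldCleaner {
    // Clear instance fields after the last test method so the still-alive test
    // instance no longer pins whatever the fields were referencing.
    static void nullOutFields(Object testInstance) throws IllegalAccessException {
        for (Class<?> type = testInstance.getClass(); type != Object.class; type = type.getSuperclass()) {
            for (Field field : type.getDeclaredFields()) {
                if (field.getType().isPrimitive()) continue;           // primitives hold no references
                if (Modifier.isStatic(field.getModifiers())) continue; // leave shared state alone
                if (Modifier.isFinal(field.getModifiers())) continue;  // cannot be reassigned
                field.setAccessible(true);
                field.set(testInstance, null);
            }
        }
    }
}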

Other Considerations

This kind of test writing is also supported by our other development practices, namely ToolsHarness and our branching strategy, which I will cover in detail in later posts.

With your tests covering more code, the tests could very well break for the wrong reason. With ToolsHarness, we are able to examine each test failure easily, locate and isolate the problems, and development won't grind to a halt every time there is a broken test. With the test farm provided by ToolsHarness, our tests can run concurrently, so we can have better tolerance for the speed of an individual test.

With the branching strategy, we are making sure that the platform code is in a good enough state before it is released to the application team.


Appendix: Things to watch out for

At the same time as we create a path that makes tests easier to write, we also put ourselves on a slippery slope that could lead us further and further away from effective unit testing. Sometimes it is much easier to write a test that covers a lot of code than to set up the environment so that only the code you want to test is tested. Why is that bad? Here is an example:

As I am writing this post, I am wrapping up a feature called "Field Level Encryption" by adding upgrade support from an earlier version of the application. It was extremely tempting to do the following:

...
// the column is length 6 nullable; alter it to length 3 and not nullable
String[] sqls = getDbCatalogSupport().alterColumn(table, column).withLength(3).withNullability(false).getSql()
DatabaseTestUtil.updateInTx(sqls)

// Insert data that needs to be updated
DatabaseTestUtil.updateInTx("insert into px_test_encryption (id) values (1)")

// run the upgrader to make sure it does not fail
new Upgrader(database).upgrade()

// run the schema checking to make sure everything is up-to-date
List error = new DatabaseSchemaVerifier(getDbCatalogSupport().buildSchema()).verifyAll()
assertThat().list(error).isEmpty()

Object[] row = assertThat().sql("select encrypted_field from px_test_encryption where id = 1", new Class[] {String.class}).hasOneRow()
assertThat().array(row).is("tluafeddefault") // null column should be updated with encrypted default value.

I am very sure that we can all agree that this is very concise and expressive. Change the database schema, insert data, run the upgrade, make sure that the schema is now up-to-date and that the row is updated correctly, just like it should be, right?

Not quite...

The problem with this test lies in the "upgrade()" and "verifyAll()" method calls. They are both very comprehensive and cover a lot of area. As a result, this test runs for a long time (over a minute). At the same time, someone could check in code with a bug in either the upgrade code or the schema verification that has nothing to do with encryption, and this test will break. In an enterprise environment, you only need a small portion of tests like this to generate enough noise. Eventually developers will be so tired of spending time on a broken test only to find out that three other people are also looking at it and it will be fixed by one of them. You will start delaying looking at broken tests and they will stay broken for a long time, other changes will be applied on top of the changes that broke the test, you will have a hard time fixing them, you will start to hate tests, you will write fewer, and the quality of the product will go down...

So, for the sake of everybody, let's spend more time making the test as fine-grained as possible:

...
// the column is length 6 nullable; alter it to length 3 and not nullable
String[] sqls = getDbCatalogSupport().alterColumn(table, column).withLength(3).withNullability(false).getSql()
DatabaseTestUtil.updateInTx(sqls)

// Insert data that needs to be updated
DatabaseTestUtil.updateInTx("insert into px_test_encryption (id) values (1)")

// run the upgrader to make sure it does not fail
new Upgrader(database).encryptDecrptUpgrade()

... (Some other code to verify just this schema)...

Object[] row = assertThat().sql("select encrypted_field from px_test_encryption where id = 1", new Class[] {String.class}).hasOneRow()
assertThat().array(row).is("tluafeddefault") // null column should be updated with encrypted default value.

However, this is not to say that a test like the original one does not provide some value by being comprehensive. We do have upgrade tests like this for specific kinds of upgrades, the ones that our customers are going to go through. Those tests load the database from a backup so that the schema matches the one that we released to our customers, then we run the upgrader through it and verify that the schema is up-to-date. Each test also has an opportunity to insert additional data before the upgrade and do additional verification after the upgrade. In this way, when our customers get a newer build, rest assured the upgrade will not blow up horribly.

Saturday, January 12, 2008

Enterprise Agile Testing Part I : Introduction

The idea of "Enterprise Agile Testing" has been in my head for several months now, result of what I have learned at Guidewire and based on my previous XP experience at ThoughtWorks. I am planning on a proposal to Agile 2008 on this topic. Before I can choose my proposal topic, I need to write everything down first, kind of like a project engagement.

Actually, a project engagement is not a bad analogy. I must define what my proposal is about and what it is not about --- what is out of scope, if you will. My approach is to write a series of blog posts, each covering a specific topic, and look back to see what I end up with when I am done. If I approach it as if I were writing a book, or even like that Agile 2007 paper, I might never finish it.

The Enterprise in Agile Testing

Enterprise here means large-scale software development. The large scale can come about through a large code base or a large team. Here I am ignoring the controversial topic of whether or not a large code base or a large team is a problem that should be avoided in the first place. They exist; I just want to point out two things that result from them with regard to testing in such an environment.

First, with a large code base, a tester cannot clearly hold the code's design in his or her head, let alone the intention of the test. Instead, agile testing in enterprise environments requires a comprehensive testing framework. This framework must do more than what JUnit does out of the box, so that anyone (including you) can come back to a test at any time and understand it.

Second, with a large team, it is pretty much impossible to ensure everyone is aware of how important testing is. This is not to say that you should give up on a large team's continuous improvement in treating testing seriously and writing better tests, but I have found that the line of "zero test breakage" is extremely hard to hold. As a result, some middle ground must be reached between complete awareness and total ignorance. Only in this way is it possible to see results in testing improvement efforts.

Agile

Since Rob and Jim pulled me away from the EJB madness and introduced me to the wonderful world of XP in 2001, "Enterprise" has slowly restored its place in my vocabulary. At the same time, the term "Agile" is getting closer and closer to my list of red flag words.

Agile here refers to the situation where the code is constantly under change. This can happen because the requirements keep changing, or because development is done iteratively through story-driven development. Constant change makes the ability to write concise tests more important. Additionally, the test status of the project must be treated as more than a binary state if testing is to keep pace with development. In this way, development is not paralyzed, because there will almost always be a test broken here and there, and the turnaround time for testing the checked-in code is not as short as 10 minutes.

Testing

So you have a large project code base that you keep changing, with a team of members with mixed skills. You need to ensure that the code (including the tests) you write is of high enough quality that two months from now you can still read it, understand what it does, understand why it does that, and change it. At the same time, you want to give others the time and tools to adjust to test-infected development and, hopefully, test-driven development eventually.

Content and Structure

So, the above is the introduction. The items on my mind are as follows; I'll update the links as I post them.
  • Test utilities like assertion, builder
  • TestBase with annotations for test environment configuration
  • ToolsHarness, a continuous integration server that treats tests individually
  • Active and stable branch, localizing the damage

Tuesday, January 08, 2008

Team Estimation Game - By Steve Bockman

Well, Steve didn't have time to put it up on his website, so I figured I'd write up what I have learned here.

I ran into Steve during the Agile 2007 conference, so I invited him to do a session for us at BayXP. Last October, Steve came down and hosted a session on the Team Estimation Game. This was the first time I had learned it, and I really liked it.

Given a set of defined stories, the purpose of this game is to come to a consensus about the relative estimates of the work to be done for each story. If you don't have a set of stories to try it out with, you can use the sample produced by Steve here.

So once you have all the story cards, you first estimate their complexity relative to each other:


  1. Place Story Cards in pile on table.

  2. First player places top card on playing surface.

  3. Next player places top card on playing surface relative to first card.

  4. Next player can either:
    * Play top card from pile, or
    * Move a card on the playing surface, or
    * Pass
  5. Repeat Step 4 until
    a) no more cards remain in pile, and
    b) no player wishes to move a card

Here is an example of the result:


After the first stage is finished, you can now go on to assign points to them so that you can track your team velocity:

6) As a team, choose estimation units and values.

With the agreed units:

7) Place an estimate at the top of each column.
8) Change estimates until all team members agree.

We did it during the meeting and it went very smoothly. The interesting thing is that Steve told us our result is not too different from the ones he has gotten when conducting the game elsewhere with other groups.

I am definitely going to try it on my next story estimation session.

References:

* Planning poker, which is another popular and useful estimation game.
* Steve's slides: http://www.shaneduan.com/estimation/Team_Estimation_Game.ppt
* Planning Poker Cards you can buy here.