Friday, February 08, 2008

Enterprise Agile Testing Part II : Test Environment Set Up with TestBase

This is the second part of the Enterprise Agile Testing series (not exactly following my original order here):

  • Introduction
  • Test fixtures like assertion, builder
  • Test Environment Set Up with TestBase
  • ToolsHarness, a continuous integration server farm that treats tests individually
  • Active and stable branch, localizing the damage

Testing through Inversion of Control (IoC) Container

(For the concept of an IoC container, see Martin Fowler's article: Inversion of Control Containers and the Dependency Injection Pattern.)

Ever since testing through dependency injection was formally named, it has become the most popular pattern for unit testing. You control the environment in which the class is tested by carefully constructing its dependencies before injecting them into the class under test. In this style, a typical test is composed of three parts. They are named differently depending on whom you talk to, but the naming I like best is the one I learned when I presented the "Given, When, Ensure" notation of jBehave at a BayXP meeting:

* Assemble: Construct the environment in which the test is going to run.
* Act: Invoke the method(s) that you want to test.
* Assert: Assert that the tested method has caused the predicted change in the environment.
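
As a rough sketch of the three parts (the class and method names here are made up for illustration, not from any real codebase):

// Assemble: hand-construct the dependencies and inject them into the class under test
PolicyStore store = new InMemoryPolicyStore();
PolicyService service = new PolicyService(store);

// Act: invoke the method that you want to test
service.renew("POLICY-1");

// Assert: verify that the tested method caused the predicted change in the environment
assertEquals(PolicyStatus.RENEWED, store.get("POLICY-1").getStatus());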

It is safe to say that anyone who has done enough testing won't have any problem with "Act" and "Assert". It is the "Assemble" that has been giving us trouble. The following is an illustration extending PicoContainer's diagram.
To test the class marked by the big arrow, you need to create the world as this class sees it, then invoke methods on the class and assert the changes caused by the invocation. Among the ways of constructing the world according to the class under test, stubs and mocks are probably the only patterns that have been well documented. As indicated by the article, each solution has its own limitations. For a small to medium size application, these kinds of tests are generally manageable. But if you have done enough enterprise application development (as defined in the Introduction), then you have probably seen your fair share of mocks and stubs getting out of hand, as was the case for Guidewire tests until 2007.

Testing with a Loaded Container

During the past year, Guidewire has been slowly converting its tests to a home-grown JUnit extension framework. The framework does the heavy lifting of constructing the dependencies, so that by the time the test code is called, all the dependencies have already been set up properly. If you really want to, you can even access a full web container through the embedded Jetty server. By putting your class inside a full container, you get a lot of benefits that you normally won't get with a bare-bones unit test, and without breaking a sweat.
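
A minimal sketch of what such a test might look like (TestBase, the helpers, and the domain classes here are illustrative stand-ins, not the actual Guidewire API):

public class AccountCreationTest extends TestBase {
  public void testCreateAccountCommitsToDatabase() {
    // By the time this method runs, the framework has already set up the
    // container, loaded the metadata, and initialized the database connection.
    Account account = new Account();
    account.setName("Test Account");
    commit(account);  // goes through the real persistence layer, real validations included
    assertNotNull(load(Account.class, account.getId()));
  }
}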

The immediate benefit is that you no longer deal with mocking, stubbing, and guessing. When your test calls into a method, you can be sure that the class is in the same state as when it is called in the real world (it might still not be in the state you want for your test, but that is a separate issue). Without mocking and stubbing, you don't need to walk on eggshells any more as you change the class's responsibilities and collaborations. You can call into a real messaging manager through the container, enable a message destination, commit an entity, and verify that the message for the changes appears. All the code paths match the real world exactly, so you won't have any integration surprises down the road.

Because all the required validations are turned on, you are forced to create realistic data, which in turn makes your test more realistic. You can run your test under the debugger at any time and get a good sense of what the data will look like on a real server. If you make a mistake and forget to set a non-nullable field, your test will blow up right away.

With a loaded container, you feel more confident about the class that you are designing. Because you can easily see how the class fits into the whole world, you can make sure it becomes a good citizen by doing just its job, no more, no less.

This framework is extremely flexible, which makes it very powerful. You can modify the testing environment by annotating your test and registering your own annotation handlers. This way, you can add additional setup code without even creating your own super test base, a typical case of favoring composition over inheritance. You will see many of the annotations that we have built over the past half year below.
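
The shape of the idea, in a hedged sketch (the handler interface and its methods are my invention for illustration; the real framework's API differs):

// A custom annotation that a test author can put on a test class.
@Retention(RetentionPolicy.RUNTIME)
public @interface WithCurrency {
  String value();
}

// A handler registered with the framework; it runs during test setup.
public class WithCurrencyHandler implements TestAnnotationHandler<WithCurrency> {
  public void beforeTest(WithCurrency annotation, TestEnvironment env) {
    env.setDefaultCurrency(annotation.value());
  }
}

// Usage: no custom TestBase subclass needed.
@WithCurrency("EUR")
public class InvoiceRoundingTest extends TestBase { ... }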

Performance Improvement Considerations

Of course, all of this is much easier said than done. And we are sort of going against the conventional wisdom of unit testing here. The first question most readers will raise is probably "Loading the whole container for a simple unit test??? How can your tests perform!?" Please trust me when I say that I had the same concerns. But after half a year of adapting to it, I think this is definitely a good solution.

First of all, performance is overrated. No, I am just kidding. The first thing that I would like to say is that if you are a TDD veteran, in that you know how to design your classes such that you can manage your own dependencies well most of the time, then kudos to you, and you can use the @RunLevel annotation to tell the framework not to do any of that setup for you (see below).

I was actually not totally joking. I would like to argue that in an enterprise application (as described in the Introduction), it is not uncommon for some part of the system to be designed less cleanly than it could have been. As a result, you have to choose between making the test run fast through the kind of mock where no one knows what is going on, or making the test run a bit slower but reflect the real system. Since design validation is the whole purpose of tests, I vote for testing the right thing at a small sacrifice in speed.

In addition, the test framework has a set of performance measures in place to make sure that, overall, the tests perform well.

Run Level

Guidewire applications have the notion of a run level as a way to bring the system online in stages. You can annotate each test with the desired run level to have just the things you need set up before the test. The following is the list of run levels that I have used, with a usage sketch after the list.

* NONE: This is just like a good old JUnit test.
* Shutdown: At this level, you have all the system configuration read in and the metadata loaded. You can run any test that does not touch the database.
* No Daemon: This is the default value. At this level, you have the database connection initialized and the schema updated. You can run any test that hits the database.
* Multiple User: At this level, you have a full-blown application server with background batch processes running. This is typically used by QA for acceptance testing.
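
As a sketch of the usage (the exact enum constants are my guesses based on the list above):

@RunLevel(RunLevelName.NONE)  // plain unit test: the framework does no setup at all
public class StringUtilTest extends TestBase { ... }

@RunLevel(RunLevelName.SHUTDOWN)  // configuration and metadata only, no database
public class DataModelSanityTest extends TestBase { ... }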

Database Tests

By default, all tests use H2 as an embedded database, which greatly improves test performance. I have been a big fan of in-memory databases since HSQLDB; DBFixture is the proof.

During development, the database schema changes all the time. Guidewire products have an upgrader built in that compares database schemas and automatically issues SQL statements to upgrade the database to the right schema. However, the upgrade process can take time. To save time, a backup copy is created after the upgrade finishes so that the database can be restored as necessary (see @ChangesSchema). There is one implementation for each database that we officially support, so all the tests can run on all databases if we choose to.

For each table there is also a shadow table that stores the default data set up by the test environment. Before each test run, the data in each table is restored from its shadow. This way, different tests won't step on each other's toes and end up causing other tests to fail. For performance reasons, the data is only restored once per test class, because it is easier to make sure that the test methods within the same test class don't affect each other's data.

Server Mode for Web Testing

The QA acceptance tests are written in GScript. When running in browser mode, we use Selenium to drive the browser to connect to the server and run the tests. However, when you have enough tests, the slowness of the browser really shows. Guidewire applications are built on top of the JSF framework, where the generated HTML source is driven by the page model on the server. With exactly the same scripts, we can run the tests in server mode, where the scripts run against the page models in the server session. Without the browser layer, HTTP connections, and HTML generation and parsing, the test run time is again cut down dramatically.

Functional Considerations

The metadata layer of Guidewire applications is extremely extensible and configurable, and the SQL executed in the database layer is generated dynamically based on the metadata configuration and the database setup. It would not be practical to mock out the whole thing. The test framework provides a fixed out-of-the-box container for each test and locks it down so that neither the test nor the code under test can accidentally change those dependencies. But developers can modify the test environment through annotations. The following are the typical annotations:

@IncludeModules for Configuration Testing: With this annotation, you can specify a list of directories from which the test should load additional configuration. This way, you can configure the test environment (register additional plugins, register additional SOAP interfaces, extend the basic data model, add additional web pages, etc.). This is great when you want to test different configuration cases and still keep the base configuration simple and fast.
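
A sketch of the usage (the directory name is made up):

// Load an extra configuration directory on top of the base configuration.
@IncludeModules({"modules/encryption-config"})
public class EncryptionPluginTest extends TestBase { ... }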

@ChangesTime for Time-Based Testing: Sometimes your test is date sensitive. With this annotation, you get a hook to change the system date on the fly before you create the data you want, so that the timestamp meets your condition.
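
Something along these lines (the hook method and the helpers are hypothetical):

@ChangesTime
public class PolicyExpirationTest extends TestBase {
  public void testPolicyExpiredYesterday() {
    setSystemDate(date(2008, 2, 7));   // move the clock back before creating the data
    Policy policy = createPolicyExpiringToday();
    setSystemDate(date(2008, 2, 8));   // move the clock forward again
    assertTrue(policy.isExpired());
  }
}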

@ChangesSchema for Upgrade Testing: With this annotation, your test can run wild and wreak havoc on the database schema. At the end of your test, the schema will be restored from the backup automatically. This is very useful for upgrader-related tests.
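
For example (a sketch; the SQL and the table name are made up):

@ChangesSchema
public class DropColumnUpgradeTest extends TestBase {
  public void testUpgraderRestoresDroppedColumn() {
    // Free to mutate the schema here; it is restored from the backup afterwards.
    DatabaseTestUtil.updateInTx("alter table px_test drop column obsolete_field");
    ...
  }
}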

Testing Annotations

These are the additional annotations telling the test framework how you want your test to run:

@ProductUnderTest: You can write a test, put it in a common module, and tell the test framework which product you want it to run against. For example, we need to make sure that the base data model can pass validation for all applications. We can write a test that starts the validation without depending on any particular product. With this annotation, the same test can be run against the data model of each product. Think dependency injection is a good way to go in production? Why not apply it to tests?
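
A sketch of how this might look (the product name and the validation entry point are illustrative):

// The same test class can be bound to each product's data model.
@ProductUnderTest("ClaimCenter")
public class BaseDataModelValidationTest extends TestBase {
  public void testBaseDataModelPassesValidation() {
    assertThat().list(validateDataModel()).isEmpty();
  }
}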

@TestInDatabase: From time to time, you have to implement something a little differently for different databases, or implement a feature that is only applicable to one database (the Oracle AWR report, for example). With this annotation, you can tell the test framework which database this test should run against. By default, all tests run against H2 only, for performance reasons.
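
For example (the enum constant is a guess):

@TestInDatabase(DatabaseType.ORACLE)  // only meaningful against Oracle
public class AwrReportTest extends TestBase { ... }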

@DoNotRunInHarness: This is for push-button tests that cannot be run automatically. For example, we have a test that pings MapPoint web services and makes sure that we can parse the result properly. MapPoint ended up telling us not to ping their staging server continuously, so this test is disabled on the testing server.
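
Usage would look like:

@DoNotRunInHarness  // run by hand only; hits an external staging server
public class MapPointParsingTest extends TestBase { ... }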

Testing Semantics

There are also other productivity improvements. Your test case can now implement beforeClass(), afterClass(), beforeMethod() and afterMethod(), which run, well, as their names indicate. After answering enough questions about when setUp() and tearDown() run, I think it is a nice change.
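
For instance (a sketch of the lifecycle):

public class LifecycleTest extends TestBase {
  public void beforeClass()  { /* runs once, before any test method in this class */ }
  public void beforeMethod() { /* runs before each test method */ }
  public void afterMethod()  { /* runs after each test method */ }
  public void afterClass()   { /* runs once, after all test methods are done */ }
}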

Because JUnit holds on to ALL the test instances, each field in a test class is actually a memory leak as far as the test run is concerned. The test framework automatically nulls out all the fields (with some configurable exceptions) at the end of the test case, when all the test methods are done.
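
The mechanism is easy to picture; here is a sketch of what the framework might do via reflection (my illustration, not the actual implementation):

// Invoked once the last test method of a test class has finished.
static void releaseFields(Object testInstance) throws IllegalAccessException {
  for (Field field : testInstance.getClass().getDeclaredFields()) {
    if (!field.getType().isPrimitive() && !isExempt(field)) {  // isExempt: the configurable exceptions
      field.setAccessible(true);
      field.set(testInstance, null);  // drop the reference so it can be garbage collected
    }
  }
}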

Other Considerations

This kind of test writing is also supported by our other development practices, namely ToolsHarness and our branching strategy, which I will cover in detail in later posts.

With your tests covering more code, they could very well break for the wrong reason. With ToolsHarness, we are able to examine each test failure easily, locate and isolate the problem quickly, and development won't grind to a halt every time there is a broken test. With the test farm provided by ToolsHarness, our tests can run concurrently, so we can be more tolerant of the speed of individual tests.

With the branching strategy, we make sure that the platform code is in a good enough state before it is released to the application teams.


Appendix: Things to watch out for

At the same time as we create a path to make tests easier to write, we also put ourselves on a slippery slope that could lead us further and further away from effective unit testing. Sometimes it is much easier to write a test that covers a lot of code than to set up the environment so that only the code you want tested is tested. Why is that bad? Here is an example:

As I am writing this post, I am wrapping up a feature called "Field Level Encryption" by adding upgrade support from an earlier version of the application. It was extremely tempting to do the following:

...
// the column is length 6 nullable; alter it to length 3 and not nullable
String[] sqls = getDbCatalogSupport().alterColumn(table, column).withLength(3).withNullability(false).getSql();
DatabaseTestUtil.updateInTx(sqls);

// insert data that needs to be updated
DatabaseTestUtil.updateInTx("insert into px_test_encryption (id) values (1)");

// run the upgrader to make sure it does not fail
new Upgrader(database).upgrade();

// run the schema checking to make sure everything is up-to-date
List errors = new DatabaseSchemaVerifier(getDbCatalogSupport().buildSchema()).verifyAll();
assertThat().list(errors).isEmpty();

Object[] row = assertThat().sql("select encrypted_field from px_test_encryption where id = 1", new Class[] {String.class}).hasOneRow();
assertThat().array(row).is("tluafeddefault"); // null column should be updated with the encrypted default value

I am very sure that we can all agree that this is very concise and expressive. Change the database schema, insert data, run the upgrade, make sure that the schema is now up-to-date and that the row is updated correctly, just like it should be, right?

Not quite...

The problem with this test lies in the "upgrade()" and "verifyAll()" method calls. They are both very comprehensive and cover a lot of area. As a result, this test runs for a long time (over a minute). At the same time, someone could check in code with a bug in either the upgrade code or the schema verification, a bug that has nothing to do with encryption, and this test would break. In an enterprise environment, you only need a small portion of tests like this to generate enough noise. Eventually, developers get so tired of spending time on a broken test only to find out that three other people are also looking at it and that one of them will fix it. You start delaying looking at broken tests; they stay broken for a long time; other changes get applied on top of the changes that broke them; you have a hard time fixing them; you start to hate tests; you write fewer of them; and the quality of the product goes down...

So, for everybody's sake, let's spend more time making each test as fine-grained as possible:

...
// the column is length 6 nullable; alter it to length 3 and not nullable
String[] sqls = getDbCatalogSupport().alterColumn(table, column).withLength(3).withNullability(false).getSql();
DatabaseTestUtil.updateInTx(sqls);

// insert data that needs to be updated
DatabaseTestUtil.updateInTx("insert into px_test_encryption (id) values (1)");

// run just the encryption upgrade step to make sure it does not fail
new Upgrader(database).encryptDecryptUpgrade();

... (some other code to verify just this schema) ...

Object[] row = assertThat().sql("select encrypted_field from px_test_encryption where id = 1", new Class[] {String.class}).hasOneRow();
assertThat().array(row).is("tluafeddefault"); // null column should be updated with the encrypted default value

However, this is not to say that a comprehensive test like the original one provides no value. We do have upgrade tests like this for specific kinds of upgrades, the ones that our customers are actually going to go through. Those tests load the database from a backup so that the schema matches the one that we released to our customers, then run the upgrader against it and verify that the schema is up-to-date. Each test also has an opportunity to insert additional data before the upgrade and do additional verification after the upgrade. This way, when our customers get a newer build, rest assured the upgrade will not blow up horribly.
