Category Archives: Testing

Extremely Agile Testing using a Lazy Test Harness

“it is generally a lot easier to verify a result reported by the test harness than it is to figure out the right answer yourself beforehand and write the code to check for it”

I don’t particularly enjoy writing tests, but l noticed long ago that I enjoy debugging insufficiently tested systems even less. If you have a suite of old tests that you can run whenever you think you’ve gotten something new working, it can save you a ton of trouble later by showing you that you’ve broken something — right away, when you know that something in those 40 lines you just mucked with must have caused the problem.

The traditional way of writing tests looks something like this:

  1. Have the system under test do something.
  2. Capture some result that you can use to determine whether what you tried worked.
  3. Check that result in various ways, raising a flag if it’s wrong.

Eons ago I was writing a date/time library and realized that I would need hundreds of tests; it was worth my effort to make writing them as simple as possible. I created a simple CLI to invoke the routines in the library and a test harness that used the CLI to run a bunch of tests. Each test was just a single line that made the library do something, e.g. convert a date to Julian; the harness did all the rest.

“Wait a minute,” you complain — “how did the harness know whether the test passed? A test must have done more than just make the system do something!”

But it did not.  A test really did just look something like

  toJulian 1988-03-21


  addDays 1988-12-29 5

So how did the test harness know whether the test passed or failed? Well, the first time the test was run, the harness did not know — it had to ask whether the output was correct. If I said yes, it saved the output as the correct result. The tests were “lazy” inasmuch as the correct results were not established until the tests were run the first time.

This approach proved extremely convenient — I could create a new test in a few seconds. And while the regression tests usually just passed without incident, there were in fact many times when I had inadvertently broken something and the tests saved my bacon. Without those tests I would have discovered far later, perhaps even at a customer site, that something somewhere wasn’t working, and would have had to laboriously trace it back to those 40 lines. And it might take me a while to fully remember what those 40 lines were about.

The point of doing tests this lazy way is that it is generally a lot easier to verify a result reported by the test harness than it is to figure out the right answer yourself beforehand and write the code to check for it. This is especially true if the right answer is, as is often the case for me, a dump of some tree structure, dozens of lines long. I can look at such output and pretty quickly say “yes, that’s right,” but if I had to explicitly code such a tree structure into each test, you sure wouldn’t see me writing many of them!

Furthermore, if I deliberately change something that causes the tests to fail, I don’t have to go back and fix all of the affected tests manually. The harness stops at every failure, shows me a side-by-side diff of what was expected and what we actually got, and asks me whether I want to accept the new output as the “good” output. If I say yes, the harness fixes the test for me. I can even tell the harness that if further tests fail with the same diff, they should be automatically updated without asking.

In some scenarios this approaches presumes the existence of a CLI. These days I write server apps with REST APIs, and I always create CLIs for them. “We don’t have a requirement for a CLI,” a manager told me recently, thinking we would save time by not bothering with one. “You’re getting one anyway,” I responded. I always port/write a shell-based CLI, giving us a very powerful way to control and script the server — very handy. Then I port/write a lazy regression test harness (a couple of hundred lines of shell, currently) to the CLI and begin writing lots of one-line tests.

And suddenly testing is not such a drag.

UPDATE:  Eric Torreborre would like to see support for lazy test development in his highly regarded “specs2” unit test framework for Scala code. That would be a fantastic feature.

UPDATE:  I discovered from Bill Venners (who wrote ScalaTest) that somebody has created a facility for doing this with unit tests, called ApprovalTests.  Unfortunately it seems tightly bound to JUnit, so interfaces to specs2 and ScalaTest are unlikely.

Test with Bigger Integers

It’s always a jolt when some simple expression in a language I’ve been using for a long time evaluates to something I can’t explain.  It happened today with Java.

I had asked a coworker to run FindBugs against one of our products. He forwarded me the output, which contained several issues of this type:

Suspicious comparison of Integer references

Sure enough, when I looked at the code I found expressions using == to compare one Integer to another. That’s usually bad because Integers are objects, and == for objects tests whether they are the same object, not whether they have equal values, so two Integers could appear to be != even though they hold the same value.

That’s easy enough to fix, but just to make sure I understood all the cases I wrote a test that compared, among other things, an Integer 88 to an int 88 and to another Integer 88. The good news was that Integer == int was true (autounboxing). The bad news was that Integer == Integer was also true — there was something wrong with my test.

I tried a number of similar tests, all with the same result. I began to suspect that this was some kind of optimization, and in that case it was likely only for small numbers. Ultimately I figured out that == always returns true when comparing Integers from -128 to 127. Once I saw that, I was able to find references to this behavior in the specification. The same is true for Longs and Integers and Shorts. The JRE prepares objects for small numbers in advance, and hands you one of those instead of creating a new one each time. Since you get the same object each time for those numbers, == works. But it doesn’t work for numbers outside that range.

OK, cool — I learned something new. But looking at the output of FindBugs, I realized that we hadn’t seen those bugs in testing, probably because we typically used small numbers for those values. Let’s see, we need a couple of numbers for VLAN tags — what do we pick? Typically 1 and 2. Maybe 7 and 12 if we’re feeling wild and crazy. Had we never tested with a number greater than 127, we might have shipped the code with that problem. Yay FindBugs!

So here are some thoughts:

  • If you are in QA for a Java product, use numbers outside the range -128..127 wherever they are allowed.
  • If you are writing a Java product, consider initializing ID generators to 128 rather than 0 or 1, at least during testing.
  • Run FindBugs on your code.
  • Use Scala.

The semantics for == in Java are troublesome: value comparison for primitives, but identity comparison for objects, but effectively back to value comparison for small numeric objects. Oh, and with autoboxing we’re going to magically convert between primitives and objects to help you out, sometimes (depending on the actual value) changing the semantics of ==. You’re welcome.

In Scala, == compares values. Always. That doesn’t depend on what types you are comparing, or on the actual values. If you really want to test whether two variables refer to the same object, you use eq, which isn’t even allowed on Ints, Longs, etc. In Scala it’s pretty hard to screw this up.