“this was one of those rare occasions where an idea turns out to be a lot better than intended”
In late 2009 I left a dying Sun Microsystems (right before the Oracle acquisition) to help my good friend Alok with his tiny un-funded startup called Yunteq. He was understandably having trouble finding engineers willing to work for a stake in the company rather than a salary, and since he was trying to make it as an entrepreneur and was in desperate need of help in a critical portion of the system, I went to his aid. The unsalaried period was only going to last three months — I could handle that — and he thought the company would probably be acquired in five, by a suitor they had been in talks with for a while. They were almost to the goal line, but they had a big problem; if I stepped in as a third developer to get them through it, I could own a significant share of a company about to be acquired. He had assembled a good team, and we would make decisions together, as co-founders.
It sure sounded good. In reality it was an extremely bumpy ride. :^)
Anyway, in early 2010 I started on the task of designing a public REST API for the product. It had to present the functionality of the server — which was all about managing and organizing roomfuls of servers, and the virtual machines running on them — to the outside world. We started with a typical design in which the URL broke down the universe into a hierarchy of resources like this one:
The resources in such an API effectively form a tree, the shape of which is defined by the application. But at some point Alok mentioned to me that while the above was fine for a cloud provider, a large company would probably not want the customer part but would want some way of representing divisions, regions, groups, departments, or whatever. “We’ll probably have to fork the code base for each customer,” he said.
I nearly choked on my tea at the sound of that. “Ummmm, let me think about this,” I responded.
In a couple of days I presented him with a design that at first he had trouble understanding. There was no fixed hierarchy to the API at all — the resource tree would be whatever we made it, by creating new nodes and saying what type we wanted each to be. Not only could each customer have a different tree arrangement, we didn’t have to set it up for them — they could do it themselves. They could do it piecemeal; anytime they wanted to add to or rearrange the tree, they could do so easily from the CLI. All using a single code base.
That is, this kind of API was really not about customers and applications and VMs and Disks and NICs and so on — it was about manipulating a tree of typed nodes. If you want a top-level grouping by customer, fine — create a node of type Folder called “cust” and within that a Folder for each customer. If you want to organize things some other way, no problem — just create the appropriate nodes in the tree to represent it. It was completely analogous to the way you create directories in a filesystem to organize your data the way you want, except that you could create nodes of arbitrary types from some available palette. A Folder here, a VM there, an ISO repository over there, etc. — however you want to arrange it.
By mid-2010 we had such a system working; the public API allowed clients to manipulate a dynamically typed resource tree.
But this was one of those rare occasions where an idea turns out to be a lot better than intended. Yes, the interface was now infinitely more flexible, and that was great. Each customer could organize its resources in our system in whatever way made sense for that enterprise. But that was just the beginning.
The API itself treated all nodes in the tree the same — as typed sets of attributes. How much RAM a VM was to have, or CPU, or disk space — all were just attributes of different types. The API was entirely about manipulating a tree of attribute-sets; it didn’t know or care what the resources or attributes were about. Developing in this system involved writing new resource types, but you never had to go back and fiddle with the API to be able to handle them; the API could already handle them. To it, your new resource type was just another typed set of attributes.
And then our CLI got simpler. There was no longer any code specific to VMs or customers or a disks or anything else — it just extended the API’s view of the world, allowing the user to manipulate a tree of attribute-sets from the command line. Not only did we no longer have to fuss with the API when we added functionality, we didn’t have to fuss with the CLI either. As we added new resource types, or added/changed attributes of existing resource types, no work at all had to be done in the API or CLI. This was way cool.
Then we added the notion of a “policy” — a kind of resource that changes the way the system deals with other resources. The fact that the customer was in control of the arrangement of resources gave us a scoping mechanism for policies: wherever you put the policy in the tree, the policy was in effect for the subtree under that point. And if you put another instance of the same type of policy below it, the deeper policy overrode the shallower one. This was a simple yet powerful scoping mechanism, made possible by the fact that we had handed control over the tree’s layout to the user.
Testing was also improved. Among other things, the fact that resource types and attributes were treated so generically meant that much testing could be done once and for all. For example, a single suite of tests can check that an attribute which is supposed to be an integer within a specified range works as it should; then if there are actually a dozen such attributes in the system there is no need to test out-of-range values or non-integer input on each.
Even the API documentation and help system documentation were simplified. Once you understand the basic idea, that the API just allows you to manipulate a tree of typed resources, the bulk of what you need to know is 1) what resource types are available, and 2) what are their attributes? Much of this kind of documentation can be generated automatically by the system.
In effect we had created some good generic infrastructure for creating applications. Starting from that infrastructure, you just define resource types for your domain and you are done. Of course, those resource types are non-trivial — that’s the meat of the application. But you get a lot for free.
There’s more to this, of course — I can write about the API because it is public, but I won’t say much about the interesting mechanics inside the server. I will say, though, that I had wanted to do this in Scala and Akka, but management had never heard of those (“Forget Scala,” I was told) and got someone to set up a much more conventional stack. It worked reasonably well, but had its problems. A bit frustrated, I spent a few weeks over Christmas writing a Scala/Akka/Spray prototype and demoed it to them when they got back from their vacations. They were really impressed at how much cleaner the API code was (the Spray DSL for creating APIs is awesome), and defining resource types in this new system was much easier. To their credit, they took a serious look at Scala and Akka and decided that we should use it. I now have new server infrastructure for dynamic resource trees working in Scala/Akka/Spray; it’s a huge improvement and I am a much happier camper.
And now that same management has told the entire group to use Scala. Go figure.