Category Archives: Software

O(1) sum_of_multiples() in Rust

I had been working mostly in Scala for a while, then took a diversion into Swift and Objective C.  I wanted to learn another language after that, and had all but decided on Clojure.  But Rust kept nagging at me — there was something about it.  Perhaps I’ll blog more about that later.

So I watched some videos, then read the book, and then started the Rust track at  Nice site.  One of the exercises is about generating the sum of all multiples under some limit for a given list of factors.  So for example, if the factors are 3, 5, and 7, then the sum of the multiples under 12 is 3 + 5 + 6 + 7 + 9 + 10 = 40.

A naive solution is pretty straightforward:

pub fn naive_sum_of_multiples(limit: u32, factors: &[u32]) -> u32 {
    (1..limit).filter( |&i|
        factors.iter().any(|&j| i % j == 0)

It just iterates through all of the eligible numbers, picking out any that are evenly divisible by any of the factors, and sums them.  Rust’s iterators make that look pretty similar to a lot of other modern languages, like Scala.

But this solution is slow.  I mean really slow — at least compared to what’s possible.  As limit grows, the execution time increases linearly; that is, this is O(n).  (With respect to the limit, that is — we’re ignoring the (very real) impact of the number of factors on execution time. Note that in Exercism’s tests, there are generally two factors, max three, so for simplicity let’s focus on the limit here.)

Well, O(n) doesn’t sound so bad, does it?  But here’s the thing: it could be O(1).

What???  O(1)?  That would mean that you aren’t doing any more work if you increase the limit — and thus the numbers summed — by a factor of 100.  That can’t be right!  But yes, it is.

Here is the key observation.  The instant you see it, your brain will likely jump halfway to the solution.  The sum of (for example)

3 + 6 + 9 + 12 + 15 + 18 + 21

is just

3 * ( 1 + 2 + 3 + 4 + 5 + 6 + 7)

Got it yet?  Remember that the sum of the numbers from 1 to n is simply the value of n * (n + 1) / 2 — actually summing the numbers is unnecessary!

That’s not all there is to it, though. If asked to sum the multiples of 3 and 5 less than 1000, we can use the above technique to sum up the multiples of 3 and the multiples of 5 and then add them together, but we will have counted the multiples of 15 twice; we have to subtract those out.

And that’s a little trickier to do that than it sounds. First, what you really need to subtract out are the multiples of the least common multiple (lcm) of the two numbers, not their product. So, for example, if asked to sum the multiples of 6 and 15, we need to subtract off the multiples of 30 (not 90). The lcm of two numbers is their product divided by their greatest common divisor (gcd).

Also, we need to do this for an arbitrarily long list of numbers, so consider what happens if we are asked to sum the multiples of 4, 6, and 10:

  • First sum the multiples of 4.
  • Then add in the multiples of 6, but subtract the multiples of lcm(4, 6) = 12.
  • Then add in the multiples of 10, but subtract the multiples of lcm(4, 10) = 20 and the multiples of lcm(6, 10) = 30.

But oops, now we have gone the other way, subtracting off the multiples of 20 and 30 in common (60, 120, …) twice, and our result is too low, so we’ll have to add those back in. And if there were multiple corrections at that level (i.e. if we were given a larger list of numbers), we’d have to subtract their elements in common, and so on ad infinitum. At every step we have to take care not to add or subtract the same numbers twice.

That sounds like a pain, but using recursion it’s actually fairly straightforward.  In the following Rust code, I’ve changed the API a bit from the Exercism problem.  First, the integers are u64, so that we can use much bigger limits.  And secondly, in this case we’ll sum all multiples up to and including limit.  It’s an arbitrary choice anyway, and doing it this way will save us a small step for clarity.

pub fn fast_sum_of_multiples(limit: u64, factors: &[u64]) -> u64 {
  fn lcm(a: u64, b: u64) -> u64 { a*b / gcd(a,b) }
  fn gcd(a: u64, b: u64) -> u64 { if b == 0 {a} else { gcd(b, a%b) } }
  fn sum_from_ix(i: usize, limit: u64, factors: &[u64]) -> u64 {
    if i == factors.len() {  // we've processed all factors
    } else {
      let factor = factors[i];
      let n = limit / factor;  // # of multiples of factor to sum
      let sum_of_multiples_of_factor = factor * (n*(n+1)/2);
      let new_factors: Vec<_> = factors[..i].iter()
        .map(|&prev_factor| lcm(prev_factor, factor))
        .filter(|&factor| factor <= limit)
      let sum_of_previously_seen_multiples_of_factor =
        sum_from_ix(0, limit, &new_factors[..]);
      let sum_of_multiples_of_rest_of_factors =
        sum_from_ix(i+1, limit, factors);
        - sum_of_previously_seen_multiples_of_factor
        + sum_of_multiples_of_rest_of_factors
  sum_from_ix(0, limit, factors)

This is not too far from what a Scala solution would look like, although I think the Scala solution is a bit more readable.  In part this owes to Scala’s persistent List data structure, which was a little cleaner to work with here than a Rust Vec.  Also, Scala’s implicits make it unnecessary to make two calls that are explicit in the Rust version: iter() and collect().  Oh, and in Rust functions cannot close over variables; only closures can do that, but we couldn’t use a closure since in Rust closures cannot be recursive.  That forced us to explicitly include limit in the argument list of sum_from_ix().  Ampersands here and there in closures.  Semicolons.  These are little things, but collectively noticeable.

def fastSumOfMultiples(limit: Long, factors: List[Long]): Long = {
  def lcm(a: Long, b: Long) = a*b / gcd(a,b)
  def gcd(a: Long, b: Long): Long = if (b == 0) a else gcd(b, a%b)
  def sumOfMults(factors: List[Long], prevFactors: List[Long] = Nil): Long =
    factors match {
      case Nil => 0
      case factor::rest =>
        val n = limit / factor  // # of multiples of factor to sum
        val sum_of_multiples_of_factor = factor * (n*(n+1)/2)
        val sum_of_previously_seen_multiples_of_factor =
          sumOfMults(, factor)).filter(_ <= limit))
        val sum_of_multiples_of_rest_of_factors =
          sumOfMults(rest, factor::prevFactors)
        sum_of_multiples_of_factor -
           sum_of_previously_seen_multiples_of_factor +

But really, the difference is not so stark.  And the Rust version does not need a garbage collector — amazing!

On to the next exercise….

Reboot/Restart in a REST API using PUT

it is actually quite possible to do the reboot/reset in an idempotent manner using PUT

There was at one time a controversy around whether you were restricted to CRUD (Create/Read/Update/Delete) in defining REST APIs or whether it is OK to use POST for the odd “do something” request. Roy Fielding, who came up with REST in the first place, largely put this to bed by saying, in effect, “I never said you couldn’t create additional commands.”

The problem often surfaced when someone asked

I have a a resource with a status attribute. What should a request to reboot (or restart) it look like?

If you were using SOAP, the answer is obvious: have a Reboot command. But this is REST; is that the right thing to do? Why not use a PUT to set the status to Rebooting?

The problem with that is that it’s not idempotent. An intermediate server between the client and the REST API is allowed to reissue the command, which means that the system could, at least in theory (I wonder whether a server could really reboot fast enough for this to be an issue in practice), get rebooted a second time as a result.

On the other hand, you can understand REST API designers’ reluctance to just invent a new command. Falling back on POST to create new commands for things you don’t know how to do idempotently is, in a sense, the API equivalent of

Just use a goto statement.

That is, the facility is too general — it’s a catch-all that has completely open-ended semantics — “do something.” It begs to be abused. In my opinion, it is better to create useful abstractions on top of such open-ended facilities and then restrict developers to those abstractions. Just as I don’t want us to run a server as root or use a programming language with a goto statement, I don’t want us to have a “do something” facility in the abstraction layers over the API.

But the main point I want to make is that it is actually quite possible to do the reboot/reset in an idempotent manner using PUT. The reason it isn’t usually thought of is that people are a priori focused on a status attribute. Imagine that you also have a last_reboot attribute that holds the time of the last reboot; to reboot the system, simply do a PUT to change that to the current time.

The result is perfectly idempotent; if an intermediate server resends the command, it will have the same last_reboot time, and such an update is treated as a no-op. And an attempt to change the last_reboot time to a time older than its current value is an error. So picture something along these lines:

  class Server {
    def reboot = put( "/last_reboot", getTime )

Note that last_reboot is useful information about the system, as is its time. Sure, you could instead model this as a new, non-idempotent Reboot command that has a side-effect on the last_reboot value, but — uhhh, why? You already have a perfectly good, idempotent command that will do it, whose effect on last_reboot is not implicit.

I’m not saying that there will never be a case where you ought to create a command. But if you are stuck thinking that there is no idempotent way to make a certain update, perhaps you are thinking about the wrong attributes. Don’t be too quick to use a goto.

A Monte Carlo Simulation in Scala

On LinkedIn a fellow named Francois Dewaste posts links to some Scala-related articles he runs across, and I often check them out. Recently he shared an article about a Monte Carlo simulation in Scala, written by Alvin Alexander. I found the problem interesting, so I want to share my own Scala solution, which is written using a more functional style, and also a mathematical interpretation of the results.

First I’ll repeat the basic problem:

If two people who are each right on a test 80% of the time (but whose answers are uncorrelated) have the same answer for a particular question, what is the probability that their answer is right?

You’ll get much more out of this post if you spend a minute thinking about the problem. How would you model it mathematically? How would you explore the problem using software?

Apparently several statisticians he asked said that the answer was 80%. Presumably their thinking was along the lines that since their answers are uncorrelated, the fact that two (or three or …) agree means nothing. But that’s wrong.

If you’re like me, your immediate answer will be “there isn’t enough information.” Clearly you would have to know how many possible answers there are, because that will affect the probability that they have the same wrong answer. If there are only two possible answers, then if they are wrong they agree 100% of the time; if there are an infinity of possible answers, then if they are wrong they agree 0% of the time. Looking at his code, though, it was clear that he meant for there to only be two possible answers to each question.

OK — ready for the answers?

First the Simulation

I know most folks are here to look at the Scala code, so here it is. To make it a bit easier to follow, bold font indicates where the parameters of the simulation are used.

object MonteCarlo extends App {

  // Get the simulation parameters from the command line.
  val Array(numQuestions, numChoices, numPlayers, pctCorrect, numMustAgree)
    = args map (_.toInt)

  // The choices will be 0 .. numChoices-1; call the last choice correct.
  val correctAnswer = numChoices - 1

  // Generates an answer with a pctCorrect chance of being correct.
  def genAnswer =
    if (util.Random.nextInt(100) < pctCorrect)
    else  // pick a wrong answer from 0 to correctAnswer-1

  // For each question, generate player answers and look for consensus.
  // Where consensus is achieved, yield whether or not it is correct.
  // The result is an array, with one element for each consensus reached,
  // containing true if the answer they agreed on was correct.
  val correctnessOfConsensus =
    for { i <- 1 to numQuestions
          (answer,instances) <- Array.fill(numPlayers)(genAnswer) groupBy identity
          if instances.size >= numMustAgree
    } yield answer == correctAnswer

  // Print how often the consensus answer was correct.
  val timesAgreed = correctnessOfConsensus.size
  val timesRight  = correctnessOfConsensus count identity  // num true
  println( s"Consensus meant correctness $timesRight out of $timesAgreed times." )
  if (timesAgreed > 0) {
    println( f"That's ${ 100.0 * timesRight / timesAgreed }%.1f%%." )


It’s interesting to note that because of Scala’s type inference there isn’t a single type specification in the entire program, in spite of the fact that there are eleven bound variables. The call to toInt() is a conversion, not a type specification, and it would be needed in Python or Ruby as well.

The for-comprehension is the heart of the program: it repeatedly (once for each question) generates the players’ answers with the specified probability of being correct, figures out how many times each answer was used, picks out the cases where the required number of people agreed, and yields a Boolean indicating whether or not the agreed-upon value was correct.  Note that there can be more than one consensus for a given question; for example, if you use ten players, five of which must agree, then you can achieve consensus twice for a single question. Twelve players can reach consensus four times on a question if only three have to agree.  The for-comprehension (with its underlying flatmap) makes all of that trivial.

Now let’s run it for the parameters of the original problem, but with a million questions so we get a more accurate probability estimate:

  $ scala MonteCarlo 1000000 2 2 80 2
  Consensus meant correctness 640039 out of 679984 times.
  That's 94.1%.

And they were also interested in the probability of the answer being correct if 3 out of 3 had it:

  $ scala MonteCarlo 1000000 2 3 80 3
  Consensus meant correctness 512213 out of 520220 times.
  That's 98.5%.

Those are essentially the same answers Alvin got.

Now the Math

So what’s behind these numbers? In the above runs (where everyone must agree), there are three possibilities for each question:

  • A. They all agree and they are correct.
  • B. They all agree and they are incorrect.
  • C. They don’t all agree.

We don’t care about C; what we are interested in is

      p(A) + p(B)

Since their answers are uncorrelated, the probability that two players will both be right is the product of the probabilities that each is right. Since each is right 80% of the time, p(A) = square(0.8) = 0.64, and p(B) = square(0.2) = 0.04. So the expression above is 0.64 / ( 0.64 + 0.04 ) = 0.941 (roughly).

With three people, p(A) = cube(0.8) = 0.512 and p(B) = cube(0.2) = 0.008, so the probability that the consensus is correct is 0.512 / ( 0.512 + 0.008 ) = 0.9846 (roughly).

Oh, and remember the point about the number of possible answers mattering? Let’s say that instead of 2 possibilities there are a million:

  $ scala MonteCarlo 1000000 1000000 2 80 2
  Consensus meant correctness 640167 out of 640167 times.
  That's 100.0%.

With so many possible answers, the players almost never agree if they are incorrect. So, if they do agree, it’s because they are right.

There you have it! Interesting problem, Alvin — thanks!

[other Scala posts]

How to Make Google Chrome Launcher Open New Window

I’ve used Google Chrome on Linux for quite a while now, mostly happily, but there is one thing I’ve always found extremely frustrating. I just fixed it, so I thought I’d share.

Here was the problem. I usually have a dozen or so Chrome windows open at once, each of which may have several tabs. These windows, spread across four desktops, represent work in progress for me, as I handle interruptions or set aside a task while waiting for a response, or whatever. OK, so when I click on the Chrome icon to open a new browser window, I don’t get one — Chrome instead finds my most recently touched browser window and adds a new tab to it. It doesn’t matter if it has been hours since I touched that window, or if the window is on a different desktop — that other window gets the new tab. So I have to pull the tab from the lucky window, iconize the old window again, move the new window to the desktop I was on, and switch back to that desktop myself. Aaargh!

I have a hard time understanding why the launcher would work this way. If I want a new tab in some Chrome window, I can simply press the new tab icon in that window. If I go to the panel and ask for a new browser, why would I want it to dig up some old window and add a new tab to that?

I had looked at the man page, but there was no switch for this behavior. But today I found an undocumented (why???) switch that does just what I want:


So I edited the properties for the Google Chrome launcher to add in the switch, and all is golden:


Dynamic Resource Types for a No-Fuss API

“this was one of those rare occasions where an idea turns out to be a lot better than intended”

In late 2009 I left a dying Sun Microsystems (right before the Oracle acquisition) to help my good friend Alok with his tiny un-funded startup called Yunteq. He was understandably having trouble finding engineers willing to work for a stake in the company rather than a salary, and since he was trying to make it as an entrepreneur and was in desperate need of help in a critical portion of the system, I went to his aid. The unsalaried period was only going to last three months — I could handle that — and he thought the company would probably be acquired in five, by a suitor they had been in talks with for a while. They were almost to the goal line, but they had a big problem; if I stepped in as a third developer to get them through it, I could own a significant share of a company about to be acquired. He had assembled a good team, and we would make decisions together, as co-founders.

It sure sounded good. In reality it was an extremely bumpy ride.  :^)

Anyway, in early 2010 I started on the task of designing a public REST API for the product. It had to present the functionality of the server — which was all about managing and organizing roomfuls of servers, and the virtual machines running on them — to the outside world. We started with a typical design in which the URL broke down the universe into a hierarchy of resources like this one:


The resources in such an API effectively form a tree, the shape of which is defined by the application. But at some point Alok mentioned to me that while the above was fine for a cloud provider, a large company would probably not want the customer part but would want some way of representing divisions, regions, groups, departments, or whatever. “We’ll probably have to fork the code base for each customer,” he said.

I nearly choked on my tea at the sound of that. “Ummmm, let me think about this,” I responded.

In a couple of days I presented him with a design that at first he had trouble understanding. There was no fixed hierarchy to the API at all — the resource tree would be whatever we made it, by creating new nodes and saying what type we wanted each to be. Not only could each customer have a different tree arrangement, we didn’t have to set it up for them — they could do it themselves. They could do it piecemeal; anytime they wanted to add to or rearrange the tree, they could do so easily from the CLI. All using a single code base.

That is, this kind of API was really not about customers and applications and VMs and Disks and NICs and so on — it was about manipulating a tree of typed nodes. If you want a top-level grouping by customer, fine — create a node of type Folder called “cust” and within that a Folder for each customer. If you want to organize things some other way, no problem — just create the appropriate nodes in the tree to represent it. It was completely analogous to the way you create directories in a filesystem to organize your data the way you want, except that you could create nodes of arbitrary types from some available palette. A Folder here, a VM there, an ISO repository over there, etc. — however you want to arrange it.

By mid-2010 we had such a system working; the public API allowed clients to manipulate a dynamically typed resource tree.

But this was one of those rare occasions where an idea turns out to be a lot better than intended. Yes, the interface was now infinitely more flexible, and that was great. Each customer could organize its resources in our system in whatever way made sense for that enterprise. But that was just the beginning.

The API itself treated all nodes in the tree the same — as typed sets of attributes. How much RAM a VM was to have, or CPU, or disk space — all were just attributes of different types. The API was entirely about manipulating a tree of attribute-sets; it didn’t know or care what the resources or attributes were about. Developing in this system involved writing new resource types, but you never had to go back and fiddle with the API to be able to handle them; the API could already handle them. To it, your new resource type was just another typed set of attributes.

And then our CLI got simpler. There was no longer any code specific to VMs or customers or a disks or anything else — it just extended the API’s view of the world, allowing the user to manipulate a tree of attribute-sets from the command line. Not only did we no longer have to fuss with the API when we added functionality, we didn’t have to fuss with the CLI either. As we added new resource types, or added/changed attributes of existing resource types, no work at all had to be done in the API or CLI. This was way cool.

Then we added the notion of a “policy” — a kind of resource that changes the way the system deals with other resources. The fact that the customer was in control of the arrangement of resources gave us a scoping mechanism for policies: wherever you put the policy in the tree, the policy was in effect for the subtree under that point. And if you put another instance of the same type of policy below it, the deeper policy overrode the shallower one. This was a simple yet powerful scoping mechanism, made possible by the fact that we had handed control over the tree’s layout to the user.

Testing was also improved. Among other things, the fact that resource types and attributes were treated so generically meant that much testing could be done once and for all. For example, a single suite of tests can check that an attribute which is supposed to be an integer within a specified range works as it should; then if there are actually a dozen such attributes in the system there is no need to test out-of-range values or non-integer input on each.

Even the API documentation and help system documentation were simplified. Once you understand the basic idea, that the API just allows you to manipulate a tree of typed resources, the bulk of what you need to know is 1) what resource types are available, and 2) what are their attributes? Much of this kind of documentation can be generated automatically by the system.

In effect we had created some good generic infrastructure for creating applications. Starting from that infrastructure, you just define resource types for your domain and you are done. Of course, those resource types are non-trivial — that’s the meat of the application. But you get a lot for free.

There’s more to this, of course — I can write about the API because it is public, but I won’t say much about the interesting mechanics inside the server. I will say, though, that I had wanted to do this in Scala and Akka, but management had never heard of those (“Forget Scala,” I was told) and got someone to set up a much more conventional stack. It worked reasonably well, but had its problems. A bit frustrated, I spent a few weeks over Christmas writing a Scala/Akka/Spray prototype and demoed it to them when they got back from their vacations. They were really impressed at how much cleaner the API code was (the Spray DSL for creating APIs is awesome), and defining resource types in this new system was much easier. To their credit, they took a serious look at Scala and Akka and decided that we should use it. I now have new server infrastructure for dynamic resource trees working in Scala/Akka/Spray; it’s a huge improvement and I am a much happier camper.

And now that same management has told the entire group to use Scala. Go figure.

Extremely Agile Testing using a Lazy Test Harness

“it is generally a lot easier to verify a result reported by the test harness than it is to figure out the right answer yourself beforehand and write the code to check for it”

I don’t particularly enjoy writing tests, but l noticed long ago that I enjoy debugging insufficiently tested systems even less. If you have a suite of old tests that you can run whenever you think you’ve gotten something new working, it can save you a ton of trouble later by showing you that you’ve broken something — right away, when you know that something in those 40 lines you just mucked with must have caused the problem.

The traditional way of writing tests looks something like this:

  1. Have the system under test do something.
  2. Capture some result that you can use to determine whether what you tried worked.
  3. Check that result in various ways, raising a flag if it’s wrong.

Eons ago I was writing a date/time library and realized that I would need hundreds of tests; it was worth my effort to make writing them as simple as possible. I created a simple CLI to invoke the routines in the library and a test harness that used the CLI to run a bunch of tests. Each test was just a single line that made the library do something, e.g. convert a date to Julian; the harness did all the rest.

“Wait a minute,” you complain — “how did the harness know whether the test passed? A test must have done more than just make the system do something!”

But it did not.  A test really did just look something like

  toJulian 1988-03-21


  addDays 1988-12-29 5

So how did the test harness know whether the test passed or failed? Well, the first time the test was run, the harness did not know — it had to ask whether the output was correct. If I said yes, it saved the output as the correct result. The tests were “lazy” inasmuch as the correct results were not established until the tests were run the first time.

This approach proved extremely convenient — I could create a new test in a few seconds. And while the regression tests usually just passed without incident, there were in fact many times when I had inadvertently broken something and the tests saved my bacon. Without those tests I would have discovered far later, perhaps even at a customer site, that something somewhere wasn’t working, and would have had to laboriously trace it back to those 40 lines. And it might take me a while to fully remember what those 40 lines were about.

The point of doing tests this lazy way is that it is generally a lot easier to verify a result reported by the test harness than it is to figure out the right answer yourself beforehand and write the code to check for it. This is especially true if the right answer is, as is often the case for me, a dump of some tree structure, dozens of lines long. I can look at such output and pretty quickly say “yes, that’s right,” but if I had to explicitly code such a tree structure into each test, you sure wouldn’t see me writing many of them!

Furthermore, if I deliberately change something that causes the tests to fail, I don’t have to go back and fix all of the affected tests manually. The harness stops at every failure, shows me a side-by-side diff of what was expected and what we actually got, and asks me whether I want to accept the new output as the “good” output. If I say yes, the harness fixes the test for me. I can even tell the harness that if further tests fail with the same diff, they should be automatically updated without asking.

In some scenarios this approaches presumes the existence of a CLI. These days I write server apps with REST APIs, and I always create CLIs for them. “We don’t have a requirement for a CLI,” a manager told me recently, thinking we would save time by not bothering with one. “You’re getting one anyway,” I responded. I always port/write a shell-based CLI, giving us a very powerful way to control and script the server — very handy. Then I port/write a lazy regression test harness (a couple of hundred lines of shell, currently) to the CLI and begin writing lots of one-line tests.

And suddenly testing is not such a drag.

UPDATE:  Eric Torreborre would like to see support for lazy test development in his highly regarded “specs2” unit test framework for Scala code. That would be a fantastic feature.

UPDATE:  I discovered from Bill Venners (who wrote ScalaTest) that somebody has created a facility for doing this with unit tests, called ApprovalTests.  Unfortunately it seems tightly bound to JUnit, so interfaces to specs2 and ScalaTest are unlikely.

Test with Bigger Integers

It’s always a jolt when some simple expression in a language I’ve been using for a long time evaluates to something I can’t explain.  It happened today with Java.

I had asked a coworker to run FindBugs against one of our products. He forwarded me the output, which contained several issues of this type:

Suspicious comparison of Integer references

Sure enough, when I looked at the code I found expressions using == to compare one Integer to another. That’s usually bad because Integers are objects, and == for objects tests whether they are the same object, not whether they have equal values, so two Integers could appear to be != even though they hold the same value.

That’s easy enough to fix, but just to make sure I understood all the cases I wrote a test that compared, among other things, an Integer 88 to an int 88 and to another Integer 88. The good news was that Integer == int was true (autounboxing). The bad news was that Integer == Integer was also true — there was something wrong with my test.

I tried a number of similar tests, all with the same result. I began to suspect that this was some kind of optimization, and in that case it was likely only for small numbers. Ultimately I figured out that == always returns true when comparing Integers from -128 to 127. Once I saw that, I was able to find references to this behavior in the specification. The same is true for Longs and Integers and Shorts. The JRE prepares objects for small numbers in advance, and hands you one of those instead of creating a new one each time. Since you get the same object each time for those numbers, == works. But it doesn’t work for numbers outside that range.

OK, cool — I learned something new. But looking at the output of FindBugs, I realized that we hadn’t seen those bugs in testing, probably because we typically used small numbers for those values. Let’s see, we need a couple of numbers for VLAN tags — what do we pick? Typically 1 and 2. Maybe 7 and 12 if we’re feeling wild and crazy. Had we never tested with a number greater than 127, we might have shipped the code with that problem. Yay FindBugs!

So here are some thoughts:

  • If you are in QA for a Java product, use numbers outside the range -128..127 wherever they are allowed.
  • If you are writing a Java product, consider initializing ID generators to 128 rather than 0 or 1, at least during testing.
  • Run FindBugs on your code.
  • Use Scala.

The semantics for == in Java are troublesome: value comparison for primitives, but identity comparison for objects, but effectively back to value comparison for small numeric objects. Oh, and with autoboxing we’re going to magically convert between primitives and objects to help you out, sometimes (depending on the actual value) changing the semantics of ==. You’re welcome.

In Scala, == compares values. Always. That doesn’t depend on what types you are comparing, or on the actual values. If you really want to test whether two variables refer to the same object, you use eq, which isn’t even allowed on Ints, Longs, etc. In Scala it’s pretty hard to screw this up.


Scala Performance

“You can’t observe that some poorly performing algorithm was far easier to implement in Scala than Java and conclude that Scala’s performance is poor.”

[This is one of my most popular posts, but it’s long; if you are pressed for time, just read item 4, look at the qualitative chart there, and read the conclusion.]

You may have read the infamous email in which a Fantom fan complained about his difficulties using Scala at Yammer. He complained about a lot of things, including how hard it is to get started with Scala collections and the poor performance they were seeing and the worthless Scala community. Hmm. It is certainly worth reading, with a grain of salt, as the flip side of the success stories, some of which were explicitly about improving performance. Apparently it doesn’t work for everyone. Scala has improved since Yammer’s struggles with Scala began, but still, it’s worth reading about one company’s bad experience.

By the way, I am certainly waiting to see whether Fantom and/or Kotlin gets some traction; I’m always happy to see good competition in languages. And I have some sympathy for a few of the author’s concerns about Scala. For example, while I am happily using SBT now, I was mightily confused at first, in part because of the major changes it was going through at the time. For most of his complaints, though, I have quite a different impression. In particular it’s hard for me to imagine how he got the impression he did of the Scala community; I have found folks to be quite helpful.

But the author of this post read  waaay  too much into the Yammer mail, ultimately saying this about what he infers to be the relative performance of Java and Scala:

“…if I can serve all of my customers on one machine instead of 100 machines …”

Whoa, wait a minute, dude. What’s that you’re smoking?

1. First of all, the implication that real-world code is 100x slower in Scala than Java is so radically far from what we see elsewhere that we should wonder just what the heck they were doing. OK, the Yammer folks didn’t say that the whole app ran 100x slower, but “some parts” did; we don’t know how much difference it made overall. But even so, something is up — 100x is a huge and uncharacteristic difference. When I mentioned this to a colleague, he said “If for loops are M times slower and immutable collections are N times slower, then a for loop on immutable collections will be MxN times slower.” But that isn’t how it works! The truth is that if every single aspect of a program is made M times slower, the result will be just M times slower. So unless we see gists of realistic code demonstrating a 100x difference (don’t hold your breath), we should assume that they were doing something a little goofy. For example, they mentioned an “inner serialization loop”; it’s entirely believable that when they wrote their app they didn’t design the data structures for efficient serialization (long linked lists of Option[Int] vs. arrays of unboxed Ints, for example), but that wouldn’t be Scala’s fault. Also, accurately measuring the performance of code on the JVM is tricky — they wouldn’t be the first to get it wrong. Or perhaps they were repeatedly using a view, causing the computation to be done over and over again — who knows. But if this 100x difference they saw is really more than carelessness on their part, if it’s really a legitimate performance problem in Scala itself, well, gist or it didn’t happen! — extraordinary claims demand extraordinary evidence.

2. A lot of large applications are fairly I/O-bound. In many apps there may be no code at all which, if made several times faster, would have a substantial impact on normal application performance. And if you think about it, it’s quite obvious that this is true, since interpreted languages like Ruby and Python really are orders of magnitude slower than Java and Scala at computation. Does anyone seriously think that companies using Rails and Django are OK with buying 30 times as many machines to run their web sites? “Bill, order another 300 servers — we’ve decided to use Ruby.” Of course not — the penalty is nowhere near that steep for most applications, because so much of the total time is spent waiting for I/O to complete.

3. Generally only 5% or so of the code in a large application really matters much from a performance perspective — the rest could be made several times faster and you could hardly tell. Might there be some small portion of the code that you want to finesse for the sake of performance? Sure, happens all the time — in Scala, in Java, in C, and even in assembly language. If performance matters enough, you even move functionality to hardware — ask anyone who works on high-performance networking equipment or devices that process or render images. But we don’t move *everything* to hardware just because custom hardware is faster — that would be suicide. We move what matters. You take that small portion of code that’s critical to performance and you herniate it and yourself to speed it up, but you leave the rest of the program alone, unoptimized but maintainable, because it just doesn’t matter enough to overall performance. It would be silly to do otherwise.

4. The performance of Scala code depends on how you write it. It is true that on the JVM there is a performance hit for writing functional code, and that really does mean that in that 5% or so of the code that is performance-critical you should consider writing while loops rather than for-comprehensions, using mutable variables, preferring arrays over other collection types, and so on. At that point you’re just using Scala as a better Java, taking advantage of type inference, omitting semicolons, enjoying a better type system, writing tail-recursive code instead of loops that mutate data, etc. But you are still getting essentially the same performance you get from Java itself, with those conveniences. So even in this case, what is the incentive to use Java? And for the other 95% or so of the code, your goal should be to make it robust, maintainable, extensible, etc., in which case you are far better off with Scala, using functional code, immutable data, actors, and so on.

This post gives a great example of the difference between using Scala as a better Java and writing functional Scala. Using immutable linked lists rather than mutable arrays, and filtering rather than modifying in place, make the code dramatically simpler — but also much slower. What may be more of a surprise is that when the author used Scala as a better Java on an in-place sort in an array, the Scala version outperformed the Java version (because the Scala compiler optimizes simple tail recursion). So it’s a trade-off, and it’s up to you to decide when to strive for maintainability and when to strive for performance. But if you are leaning hard toward performance more than about 5% of the time in a large app, you are probably doing your team a disservice.

The functional version — taken from an old paper introducing Scala — should be thought of as a demonstration of the expressive power of Scala, not as the “right way” to do things in Scala.

In fact, the performance difference demonstrated in that post is not really about Scala vs. Java. If you wrote Java code to sort an immutable linked list using the same technique, it would perform just as poorly. The only reason we even think to blame Scala for its poor performance in that algorithm is that the Java implementation would have been so painful to write that we wouldn’t bother. Immutable linked lists are easy in Scala and perform admirably for many algorithms, but they are a poor choice for huge sorts; we developers are expected to know when their use is appropriate. You can’t observe that some poorly performing algorithm was far easier to implement in Scala than Java and conclude that Scala’s performance is poor. It’s not Scala’s fault if you do something silly, even if Scala made it easy to do it.

This StackOverflow answer about how to count the distinct vowels in an English word gives a dramatic Scala example of recoding something that is straightforward for much higher performance.

Scala gives you a wider range of choice in the tradeoff between performance and maintainability than Java does, but you can always get Java-level performance if you need it. So the real difference between the languages is not about performance at all — it’s that you can write much more maintainable code in Scala where performance is not critical. Where it is critical, you do what you need to do, and that’s pretty much the same thing in the two languages. Don’t take this chart too literally, but I’ve tried to roughly convey Scala’s performance spectrum here:

How to think about Scala performance

How to think about Scala performance

What you should get out of this diagram is that for that 5% where performance is critical, Scala code can be about as fast as Java code while being somewhat more convenient; and for the 95% where performance isn’t so important, Scala offers agility comparable to that of dynamic languages while performing better. We just discussed an instance of the former; here are some examples of Scala’s agility: AB, C. I doubt that any could be expressed better, or perhaps even as well, in Python or Ruby.

5. Immutable collections are slower than their mutable counterparts, leading some to suggest that they should be avoided. However, immutable data structures are a huge win when it comes to making your code concurrent, since you can hand off a structure and then modify it without synchronizing or making a copy, both of which have serious performance implications. Also, the use of immutable data usually makes your code easier to reason about. So you should lean toward immutable collections, and Scala makes that easy. Then if it turns out that you could improve performance substantially by using a mutable collection somewhere, go for it, but be careful not to paint yourself into a corner. If your app is a success and demand increases dramatically, you may very well want to make your app more concurrent, and everything mutable will be a land mine waiting to blow up in your face.

6. Often the really big performance wins are in the architecture rather than in individual lines of code — the problem is what you are doing, not how fast you are doing it. Opportunities for transformative architectural change are likely to be more obvious to you — and easier to implement — if you are not knee-deep in boilerplate, imperative code, and mutable data.


So please, no more talk of “if I can serve all of my customers on one machine instead of 100 machines….” Balderdash! And mandates like “never use closures” or “never use immutable collections” are certainly not good advice.

A much better mandate would be to deal with performance in Scala the way you do — or at least should — in any language: design good abstractions, write maintainable code, profile it, and then trade off maintainability for performance only where the engineering cost of writing/testing/maintaining a more complex implementation is exceeded by the performance benefit it delivers. If you do that, you will end up with an app that performs close enough to a Java app for your needs but is much more maintainable and scalable.

UPDATE: This article talks about the goals for Scala 2.11 (currently under construction), which further target performance.

[other Scala posts]  [An Impromptu Scala/Groovy Competition]

An Impromptu Scala/Groovy Competition

I got a kick out of this post. The author, who was learning Scala, came up with a solution to this problem. Don’t bother to analyze it, but here is his solution (digits — what that post calls bigNum — is a String of 1000 digits):

def product( digits:String ) = { java.lang.String.valueOf(_).toInt ).reduceLeft( (prod,c) => prod*c ) }
digits.sliding(5) s => product(s) ).max

He was really happy with it and, understandably, wanted to share it and explain how it works.

Well, the very first comment was a somewhat tacky “mine is better because it’s shorter” Groovy solution (don’t bother with it either), albeit shorter mostly by not breaking out some of the work into a function as the post’s author had chosen to do for clarity (I’ve changed variable names to match the Scala version):

( 0 .. digits.size()-5 ).collect { digits.substring( it, it+5 ).toList().inject(1) { prod,c -> prod * c.toInteger() } }.max {it}

But then a more seasoned Scala developer absolutely blew the Groovy solution’s doors off with this:{_.asDigit}.sliding(5).map{_.product}.max

That is, convert the digit characters to their numeric values, map every sliding window of five numbers to its product, and find the maximum of those. Assuming you know what a “sliding window” is and what map does, it’s hard to imagine a much more readable solution.

Now the truth of the matter is that having the shortest solution is usually not nearly as important as having a maintainable (understandable and non-fragile) solution, but in this case the Scala solution is far better on all of those counts. Also, Scala code will often be longer than the equivalent Groovy code, because Scala is statically typed. Whether or not performance, and catching type errors before you ship, are worth having to specify some type information is up to you. But in this case those who bragged about Groovy have good reason to be embarrassed.

But there was more. The ill-fated Groovy challenge was followed by another comment saying, effectively, that the Scala code was unreadable and that clever coders should be fired and replaced with average coders who write three times as much code in Java to do the same thing, because then everyone will understand it later.


If a company I owned stock in expressed that attitude, I would start thinking about putting in a sell order. There are very good business reasons to use modern programming languages, and there are very good business reasons to hire smart software developers. I am bewildered by people who think it is better to stick with older, less capable languages so that they only need mediocre staff. Good luck!

There’s a big difference between hard-because-unfamiliar and obtuse. The seasoned Scala coder’s solution is way more understandable, to anyone who knows Scala, than an equivalent Java solution would be to another Java developer.

But even if it weren’t understandable, to you — even if you didn’t have any idea what those methods do, or couldn’t synthesize their collective effect in your mind — it is really easy to figure such things out in the REPL. Just start by evaluating some subexpression and iterate toward the whole thing, recalling the previous line with an up-arrow and adding to it:

scala> val digits = "9873459873"
digits: String = 9873459873
res0: scala.collection.immutable.IndexedSeq[Int] = Vector(9, 8, 7, 3, 4, 5, 9, 8, 7, 3)
res1: Iterator[scala.collection.immutable.IndexedSeq[Int]] = non-empty iterator
scala>{_.asDigit}.sliding(5) foreach println
Vector(9, 8, 7, 3, 4)
Vector(8, 7, 3, 4, 5)
Vector(7, 3, 4, 5, 9)
Vector(3, 4, 5, 9, 8)
Vector(4, 5, 9, 8, 7)
Vector(5, 9, 8, 7, 3)
scala>{_.asDigit}.sliding(5).map{_.product} foreach println
res4: Int = 10080

For very little effort we have walked through the code step by step as it transforms the data into the solution. Given a working knowledge of Scala, the most junior developer on your staff could do this.

The REPL is a fantastic tool in the developer’s toolbox. Back when IDE support for Scala was young and unstable, you’d see comments about how tool support for Scala was not as good as Java’s. (To be honest, it’s still not as good, but it’s quite usable.) It was a legitimate concern, but even then Scala had a nice REPL and Java did not.

And now there is a “Scala worksheet” in Eclipse and IntelliJ that allows you to do such explorations in a way which is often more convenient, and these explorations can be stored alongside your code as documentation. And because Scala is a statically typed language, these tools can look up and complete method names for you, a big time-saver. So the guy who comes along in six months and has to figure out what some functional Scala code does is going to find that much easier than reverse-engineering the equivalent imperative Java code.

But please, go right ahead and fire those smart software developers who insist on programming in modern languages. We want to hire some!

[other Scala posts]

ZSH map and filter functions using “anonymous functions”

UPDATE: Arash Rouhani picked this up and took it quite a bit further here.

The other day I was thinking that it would be handy to have a map function (in the functional programming sense) for zsh, and I couldn’t see anything in zsh itself that looked like what I wanted. So a quick google turned up this page by Yann Esposito, which gives an implementation of not only map but also filter and fold. Very cool!

The only problem, as the author points out, is that it is inconvenient to have to actually define a separate named function in order to use the facilities. So I groveled around in the zsh docs and found the (e) qualifier for parameter expansion, which causes an evaluation. That led me to write new versions of map and filter and related commands that work with anonymous “functions” — really just bits of code that get evaluated with $1 set to something — like so:

### Map each of a list of integer Xs to X+1:

$ mapa '$1+1' {1..4}

### Map each FOO.scala to FOO.class:

$ map '$1:r.class' test{1,2}.scala

### Get the subset which are ordinary files (bin is a dir):

$ filterf 'test -f' bin test{1,2}.scala

### Get the even numbers between 1 and 5:

$ filtera '$1%2 == 0' {1..5}

### Map each filename to 0 if it is an ordinary file:

$ each '[ -f $1 ]; echo $?' /bin test{1,2}.scala

### Given a directory tree containing some Foo.task files,
### an isStartable function that returns success if a task is startable now,
### and a startTask function that starts it, start all startable tasks.

$ eachf startTask $( filterf isStartable **/*.task )

Here are the functions:

###### map{,a}

### map Word Arg ...
### For each Arg, evaluate and print Word with Arg as $1.
### Returns last nonzero result, or 0.

function map() {
  typeset f="$1"; shift
  typeset x
  typeset result=0
  for x; map_ "$x" "$f" || result=$?
  return $result
function map_() {
  print -- "${(e)2}"

### mapa ArithExpr Arg ...   # is shorthand for
### map '$[ ArithExpr ]' Arg ...

function mapa() {
  typeset f="\$[ $1 ]"; shift
  map "$f" "$@"

###### each{,f}

### each Command Arg ...
### For each Arg, execute Command with Arg as $1.
### Returns last nonzero result, or 0.

function each() {
  typeset f="$1"; shift
  typeset x
  typeset result=0
  for x; each_ "$x" "$f" || result=$?
  return $result
function each_() {
  eval "$2"

### eachf Command Arg ...   # is shorthand for
### each 'Command $1' Arg ...

function eachf() {
  typeset f="$1 \"\$1\""; shift
  each "$f" "$@"

###### filter{,f,a}

### filter Command Arg ...
### For each Arg, print Arg if Command is successful with Arg as $1.

function filter() {
  typeset f="$1"; shift
  typeset x
  for x; filter_ "$x" "$f"
  return 0
function filter_() {
  eval "$2" && print -- "$1"

### filterf Command Arg ...   # is shorthand for
### filter 'Command "$1"' Arg ...

function filterf() {
  typeset f="$1 \"\$1\""; shift
  filter "$f" "$@"

### filtera ArithRelation Arg ...  # is shorthand for
### filter '(( ArithRelation ))' Arg ...

function filtera() {
  typeset f="(( $1 ))"; shift
  filter "$f" "$@"

Writing this kind of code is tricky for me; it’s easy to get the quoting wrong. For example,

$ each 'echo "$1"' \*

should just print an asterisk, but at first I was getting a list of files.

Anyway, it was fun. I might add fold and friends later, when I have more time.

Oh, one last thing. I use this function for testing zsh code:

TEST() {
  echo TEST: "${(qq)@}"

That way I can write a suite of many tests like this one

TEST filtera '$1%2 == 0' {1..5}

and get output like this, showing me what is being tested, followed by the results:

TEST: 'filtera' '$1%2 == 0' '1' '2' '3' '4' '5'

Without the (qq) you wouldn’t be able to tell where one argument ends and the next begins:

TEST: filtera $1%2 == 0 1 2 3 4 5