Scala Performance

“You can’t observe that some poorly performing algorithm was far easier to implement in Scala than Java and conclude that Scala’s performance is poor.”

[This is one of my most popular posts, but it’s long; if you are pressed for time, just read item 4, look at the qualitative chart there, and read the conclusion.]

You may have read the infamous email in which a Fantom fan complained about his difficulties using Scala at Yammer. He complained about a lot of things, including how hard it is to get started with Scala collections and the poor performance they were seeing and the worthless Scala community. Hmm. It is certainly worth reading, with a grain of salt, as the flip side of the success stories, some of which were explicitly about improving performance. Apparently it doesn’t work for everyone. Scala has improved since Yammer’s struggles with Scala began, but still, it’s worth reading about one company’s bad experience.

By the way, I am certainly waiting to see whether Fantom and/or Kotlin gets some traction; I’m always happy to see good competition in languages. And I have some sympathy for a few of the author’s concerns about Scala. For example, while I am happily using SBT now, I was mightily confused at first, in part because of the major changes it was going through at the time. For most of his complaints, though, I have quite a different impression. In particular it’s hard for me to imagine how he got the impression he did of the Scala community; I have found folks to be quite helpful.

But the author of this post read  waaay  too much into the Yammer mail, ultimately saying this about what he infers to be the relative performance of Java and Scala:

“…if I can serve all of my customers on one machine instead of 100 machines …”

Whoa, wait a minute, dude. What’s that you’re smoking?

1. First of all, the implication that real-world code is 100x slower in Scala than Java is so radically far from what we see elsewhere that we should wonder just what the heck they were doing. OK, the Yammer folks didn’t say that the whole app ran 100x slower, but “some parts” did; we don’t know how much difference it made overall. But even so, something is up — 100x is a huge and uncharacteristic difference. When I mentioned this to a colleague, he said “If for loops are M times slower and immutable collections are N times slower, then a for loop on immutable collections will be MxN times slower.” But that isn’t how it works! The truth is that if every single aspect of a program is made M times slower, the result will be just M times slower. So unless we see gists of realistic code demonstrating a 100x difference (don’t hold your breath), we should assume that they were doing something a little goofy. For example, they mentioned an “inner serialization loop”; it’s entirely believable that when they wrote their app they didn’t design the data structures for efficient serialization (long linked lists of Option[Int] vs. arrays of unboxed Ints, for example), but that wouldn’t be Scala’s fault. Also, accurately measuring the performance of code on the JVM is tricky — they wouldn’t be the first to get it wrong. Or perhaps they were repeatedly using a view, causing the computation to be done over and over again — who knows. But if this 100x difference they saw is really more than carelessness on their part, if it’s really a legitimate performance problem in Scala itself, well, gist or it didn’t happen! — extraordinary claims demand extraordinary evidence.

2. A lot of large applications are fairly I/O-bound. In many apps there may be no code at all which, if made several times faster, would have a substantial impact on normal application performance. And if you think about it, it’s quite obvious that this is true, since interpreted languages like Ruby and Python really are orders of magnitude slower than Java and Scala at computation. Does anyone seriously think that companies using Rails and Django are OK with buying 30 times as many machines to run their web sites? “Bill, order another 300 servers — we’ve decided to use Ruby.” Of course not — the penalty is nowhere near that steep for most applications, because so much of the total time is spent waiting for I/O to complete.

3. Generally only 5% or so of the code in a large application really matters much from a performance perspective — the rest could be made several times faster and you could hardly tell. Might there be some small portion of the code that you want to finesse for the sake of performance? Sure, happens all the time — in Scala, in Java, in C, and even in assembly language. If performance matters enough, you even move functionality to hardware — ask anyone who works on high-performance networking equipment or devices that process or render images. But we don’t move *everything* to hardware just because custom hardware is faster — that would be suicide. We move what matters. You take that small portion of code that’s critical to performance and you herniate it and yourself to speed it up, but you leave the rest of the program alone, unoptimized but maintainable, because it just doesn’t matter enough to overall performance. It would be silly to do otherwise.

4. The performance of Scala code depends on how you write it. It is true that on the JVM there is a performance hit for writing functional code, and that really does mean that in that 5% or so of the code that is performance-critical you should consider writing while loops rather than for-comprehensions, using mutable variables, preferring arrays over other collection types, and so on. At that point you’re just using Scala as a better Java, taking advantage of type inference, omitting semicolons, enjoying a better type system, writing tail-recursive code instead of loops that mutate data, etc. But you are still getting essentially the same performance you get from Java itself, with those conveniences. So even in this case, what is the incentive to use Java? And for the other 95% or so of the code, your goal should be to make it robust, maintainable, extensible, etc., in which case you are far better off with Scala, using functional code, immutable data, actors, and so on.

This post gives a great example of the difference between using Scala as a better Java and writing functional Scala. Using immutable linked lists rather than mutable arrays, and filtering rather than modifying in place, make the code dramatically simpler — but also much slower. What may be more of a surprise is that when the author used Scala as a better Java on an in-place sort in an array, the Scala version outperformed the Java version (because the Scala compiler optimizes simple tail recursion). So it’s a trade-off, and it’s up to you to decide when to strive for maintainability and when to strive for performance. But if you are leaning hard toward performance more than about 5% of the time in a large app, you are probably doing your team a disservice.

The functional version — taken from an old paper introducing Scala — should be thought of as a demonstration of the expressive power of Scala, not as the “right way” to do things in Scala.

In fact, the performance difference demonstrated in that post is not really about Scala vs. Java. If you wrote Java code to sort an immutable linked list using the same technique, it would perform just as poorly. The only reason we even think to blame Scala for its poor performance in that algorithm is that the Java implementation would have been so painful to write that we wouldn’t bother. Immutable linked lists are easy in Scala and perform admirably for many algorithms, but they are a poor choice for huge sorts; we developers are expected to know when their use is appropriate. You can’t observe that some poorly performing algorithm was far easier to implement in Scala than Java and conclude that Scala’s performance is poor. It’s not Scala’s fault if you do something silly, even if Scala made it easy to do it.

This StackOverflow answer about how to count the distinct vowels in an English word gives a dramatic Scala example of recoding something that is straightforward for much higher performance.

Scala gives you a wider range of choice in the tradeoff between performance and maintainability than Java does, but you can always get Java-level performance if you need it. So the real difference between the languages is not about performance at all — it’s that you can write much more maintainable code in Scala where performance is not critical. Where it is critical, you do what you need to do, and that’s pretty much the same thing in the two languages. Don’t take this chart too literally, but I’ve tried to roughly convey Scala’s performance spectrum here:

How to think about Scala performance

How to think about Scala performance

What you should get out of this diagram is that for that 5% where performance is critical, Scala code can be about as fast as Java code while being somewhat more convenient; and for the 95% where performance isn’t so important, Scala offers agility comparable to that of dynamic languages while performing better. We just discussed an instance of the former; here are some examples of Scala’s agility: AB, C. I doubt that any could be expressed better, or perhaps even as well, in Python or Ruby.

5. Immutable collections are slower than their mutable counterparts, leading some to suggest that they should be avoided. However, immutable data structures are a huge win when it comes to making your code concurrent, since you can hand off a structure and then modify it without synchronizing or making a copy, both of which have serious performance implications. Also, the use of immutable data usually makes your code easier to reason about. So you should lean toward immutable collections, and Scala makes that easy. Then if it turns out that you could improve performance substantially by using a mutable collection somewhere, go for it, but be careful not to paint yourself into a corner. If your app is a success and demand increases dramatically, you may very well want to make your app more concurrent, and everything mutable will be a land mine waiting to blow up in your face.

6. Often the really big performance wins are in the architecture rather than in individual lines of code — the problem is what you are doing, not how fast you are doing it. Opportunities for transformative architectural change are likely to be more obvious to you — and easier to implement — if you are not knee-deep in boilerplate, imperative code, and mutable data.

Conclusion

So please, no more talk of “if I can serve all of my customers on one machine instead of 100 machines….” Balderdash! And mandates like “never use closures” or “never use immutable collections” are certainly not good advice.

A much better mandate would be to deal with performance in Scala the way you do — or at least should — in any language: design good abstractions, write maintainable code, profile it, and then trade off maintainability for performance only where the engineering cost of writing/testing/maintaining a more complex implementation is exceeded by the performance benefit it delivers. If you do that, you will end up with an app that performs close enough to a Java app for your needs but is much more maintainable and scalable.

UPDATE: This article talks about the goals for Scala 2.11 (currently under construction), which further target performance.

[other Scala posts]  [An Impromptu Scala/Groovy Competition]

Advertisements

5 Comments

  1. Posted November 17, 2012 at 10:06 pm | Permalink | Reply

    Many years ago I was talking with Lars Bak, the lead developer on the HotSpot JVM at the time. Lars is a really down-to-earth guy, which is a joy in someone so smart. I listened attentively while he explained how some aspect of the system worked. When he was done, I asked “Wouldn’t it be faster to do X?”

    “Probably,” he replied.

    “Do you want me to file an RFE [request for enhancement]?”

    “No,” he said immediately. “It would never show up in profiling.”

    I wanted to slap myself.

  2. Posted November 19, 2012 at 12:34 am | Permalink | Reply

    By the way, all of this is not to say that we aren’t looking forward to improvements in the performance of Scala code. Scala 2.10 brings some performance improvements, like value classes, and I wonder whether more will fall out of macros in the releases to come. The JVM will probably get better at optimizing functional code since Java 8 will have support for closures. What all of this will do is reduce the percentage of cases where it is worthwhile to herniate your code for performance reasons. Bring it!

  3. Posted May 31, 2013 at 1:41 am | Permalink | Reply

    Some discussion about this post on scala-user: https://groups.google.com/forum/?fromgroups=#!topic/scala-user/Q9uYyJCUFCI

  4. Posted April 11, 2014 at 10:57 pm | Permalink | Reply

    Just ran across a nice post about some new performance-oriented map types in Scala 2.11:
    http://capecoder.wordpress.com/2014/03/09/scala-2-11-high-performance-map-benchmark-anyrefmap/

  5. Posted January 11, 2016 at 6:05 pm | Permalink | Reply

    Rüdiger Klaehn has written a library that provides immutable collections by wrapping Arrays. For large collections this can save a lot of memory, and in many cases is significantly faster. However, they are slow if you have to build them incrementally.

    https://rklaehn.github.io/2015/12/18/array-based-immutable-collections/

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: