Scala Performance

“You can’t observe that some poorly performing algorithm was far easier to implement in Scala than Java and conclude that Scala’s performance is poor.”

[This is one of my most popular posts, but it’s long; if you are pressed for time, just read item 4, look at the qualitative chart there, and read the conclusion.]

You may have read the infamous email in which a Fantom fan complained about his difficulties using Scala at Yammer. He complained about a lot of things, including how hard it is to get started with Scala collections, the poor performance they were seeing, and the worthless Scala community. Hmm. It is certainly worth reading, with a grain of salt, as the flip side of the success stories, some of which were explicitly about improving performance. Apparently it doesn’t work for everyone. Scala has improved since Yammer’s struggles with it began, but still, it’s worth reading about one company’s bad experience.

By the way, I am certainly waiting to see whether Fantom and/or Kotlin gets some traction; I’m always happy to see good competition in languages. And I have some sympathy for a few of the author’s concerns about Scala. For example, while I am happily using SBT now, I was mightily confused at first, in part because of the major changes it was going through at the time. For most of his complaints, though, I have quite a different impression. In particular it’s hard for me to imagine how he got the impression he did of the Scala community; I have found folks to be quite helpful.

But the author of this post read waaay too much into the Yammer mail, ultimately saying this about what he infers to be the relative performance of Java and Scala:

“…if I can serve all of my customers on one machine instead of 100 machines …”

Whoa, wait a minute, dude. What’s that you’re smoking?

1. First of all, the implication that real-world code is 100x slower in Scala than Java is so radically far from what we see elsewhere that we should wonder just what the heck they were doing. OK, the Yammer folks didn’t say that the whole app ran 100x slower, but “some parts” did; we don’t know how much difference it made overall. But even so, something is up — 100x is a huge and uncharacteristic difference. When I mentioned this to a colleague, he said “If for loops are M times slower and immutable collections are N times slower, then a for loop on immutable collections will be MxN times slower.” But that isn’t how it works! The truth is that if every single aspect of a program is made M times slower, the result will be just M times slower. So unless we see gists of realistic code demonstrating a 100x difference (don’t hold your breath), we should assume that they were doing something a little goofy. For example, they mentioned an “inner serialization loop”; it’s entirely believable that when they wrote their app they didn’t design the data structures for efficient serialization (long linked lists of Option[Int] vs. arrays of unboxed Ints, for example), but that wouldn’t be Scala’s fault. Also, accurately measuring the performance of code on the JVM is tricky — they wouldn’t be the first to get it wrong. Or perhaps they were repeatedly using a view, causing the computation to be done over and over again — who knows. But if this 100x difference they saw is really more than carelessness on their part, if it’s really a legitimate performance problem in Scala itself, well, gist or it didn’t happen! — extraordinary claims demand extraordinary evidence.
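To make the "repeatedly using a view" possibility concrete, here is a minimal sketch. It is my own illustration, not anything from the Yammer code, and the names `ViewRecompute` and `countCalls` are made up. It counts how often the mapping function runs when a lazy view is traversed twice versus a strict collection traversed twice.

```scala
object ViewRecompute {
  // Returns (function calls made through a view, function calls made through a strict list)
  // when each result is traversed twice.
  def countCalls(xs: List[Int]): (Int, Int) = {
    var lazyCalls = 0
    // A view defers the map; the function runs again on every traversal.
    val viewed = xs.view.map { x => lazyCalls += 1; x * 2 }
    val lazyTotal = viewed.sum + viewed.sum

    var strictCalls = 0
    // A strict map runs the function once per element, up front.
    val strict = xs.map { x => strictCalls += 1; x * 2 }
    val strictTotal = strict.sum + strict.sum

    assert(lazyTotal == strictTotal) // same answer either way
    (lazyCalls, strictCalls)
  }
}
```

With an expensive function and many traversals, the view version silently repeats the work each time. That is a usage mistake, not a slow language.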

2. A lot of large applications are fairly I/O-bound. In many apps there may be no code at all which, if made several times faster, would have a substantial impact on normal application performance. And if you think about it, it’s quite obvious that this is true, since interpreted languages like Ruby and Python really are orders of magnitude slower than Java and Scala at computation. Does anyone seriously think that companies using Rails and Django are OK with buying 30 times as many machines to run their web sites? “Bill, order another 300 servers — we’ve decided to use Ruby.” Of course not — the penalty is nowhere near that steep for most applications, because so much of the total time is spent waiting for I/O to complete.

3. Generally only 5% or so of the code in a large application really matters much from a performance perspective — the rest could be made several times faster and you could hardly tell. Might there be some small portion of the code that you want to finesse for the sake of performance? Sure, happens all the time — in Scala, in Java, in C, and even in assembly language. If performance matters enough, you even move functionality to hardware — ask anyone who works on high-performance networking equipment or devices that process or render images. But we don’t move *everything* to hardware just because custom hardware is faster — that would be suicide. We move what matters. You take that small portion of code that’s critical to performance and you herniate it and yourself to speed it up, but you leave the rest of the program alone, unoptimized but maintainable, because it just doesn’t matter enough to overall performance. It would be silly to do otherwise.

4. The performance of Scala code depends on how you write it. It is true that on the JVM there is a performance hit for writing functional code, and that really does mean that in that 5% or so of the code that is performance-critical you should consider writing while loops rather than for-comprehensions, using mutable variables, preferring arrays over other collection types, and so on. At that point you’re just using Scala as a better Java, taking advantage of type inference, omitting semicolons, enjoying a better type system, writing tail-recursive code instead of loops that mutate data, etc. But you are still getting essentially the same performance you get from Java itself, with those conveniences. So even in this case, what is the incentive to use Java? And for the other 95% or so of the code, your goal should be to make it robust, maintainable, extensible, etc., in which case you are far better off with Scala, using functional code, immutable data, actors, and so on.
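As a rough illustration of the two styles (a sketch of my own, with hypothetical names, not code from any of the posts discussed here), the same computation can be written functionally for the 95% and as a "better Java" loop for the hot 5%:

```scala
object SumStyles {
  // Functional style: concise and easy to read, but builds an intermediate array.
  def sumOfSquaresFunctional(xs: Array[Int]): Long =
    xs.map(x => x.toLong * x).sum

  // "Better Java" style: while loop, mutable locals, no intermediate allocation.
  def sumOfSquaresImperative(xs: Array[Int]): Long = {
    var total = 0L
    var i = 0
    while (i < xs.length) {
      val x = xs(i).toLong
      total += x * x
      i += 1
    }
    total
  }
}
```

Both return the same result; which one is faster on your workload, and by how much, is something to measure rather than assume.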

This post gives a great example of the difference between using Scala as a better Java and writing functional Scala. Using immutable linked lists rather than mutable arrays, and filtering rather than modifying in place, make the code dramatically simpler — but also much slower. What may be more of a surprise is that when the author used Scala as a better Java on an in-place sort in an array, the Scala version outperformed the Java version (because the Scala compiler optimizes simple tail recursion). So it’s a trade-off, and it’s up to you to decide when to strive for maintainability and when to strive for performance. But if you are leaning hard toward performance more than about 5% of the time in a large app, you are probably doing your team a disservice.

The functional version — taken from an old paper introducing Scala — should be thought of as a demonstration of the expressive power of Scala, not as the “right way” to do things in Scala.

In fact, the performance difference demonstrated in that post is not really about Scala vs. Java. If you wrote Java code to sort an immutable linked list using the same technique, it would perform just as poorly. The only reason we even think to blame Scala for its poor performance in that algorithm is that the Java implementation would have been so painful to write that we wouldn’t bother. Immutable linked lists are easy in Scala and perform admirably for many algorithms, but they are a poor choice for huge sorts; we developers are expected to know when their use is appropriate. You can’t observe that some poorly performing algorithm was far easier to implement in Scala than Java and conclude that Scala’s performance is poor. It’s not Scala’s fault if you do something silly, even if Scala made it easy to do it.
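For readers who haven't seen the two approaches side by side, here is a sketch in the same spirit. It is my own reconstruction of the general idea, not the code from the post or the paper, and the names are hypothetical. The first version is a functional quicksort on an immutable list; the second sorts a mutable array in place.

```scala
object SortStyles {
  // Functional quicksort on an immutable List: short and clear,
  // but it allocates new lists at every level of recursion.
  def sortFunctional(xs: List[Int]): List[Int] = xs match {
    case Nil => Nil
    case pivot :: rest =>
      val (smaller, larger) = rest.partition(_ < pivot)
      sortFunctional(smaller) ::: pivot :: sortFunctional(larger)
  }

  // In-place quicksort on a mutable Array: more code, no per-level allocation.
  def sortInPlace(a: Array[Int]): Unit = {
    def swap(i: Int, j: Int): Unit = { val t = a(i); a(i) = a(j); a(j) = t }
    def sort(lo: Int, hi: Int): Unit = {
      if (lo < hi) {
        val pivot = a(lo + (hi - lo) / 2)
        var i = lo
        var j = hi
        while (i <= j) {
          while (a(i) < pivot) i += 1
          while (a(j) > pivot) j -= 1
          if (i <= j) { swap(i, j); i += 1; j -= 1 }
        }
        sort(lo, j)
        sort(i, hi)
      }
    }
    sort(0, a.length - 1)
  }
}
```

The functional version is the one you would write first; the in-place version is what you reach for only if a profiler says this sort is in your critical 5%.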

This StackOverflow answer about how to count the distinct vowels in an English word gives a dramatic Scala example of recoding something that is straightforward for much higher performance.
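I won't reproduce that answer here, but the flavor of the recoding is roughly as follows. This is a sketch of my own with hypothetical names, and it only considers the five vowels a, e, i, o, u.

```scala
object Vowels {
  private val vowels = Set('a', 'e', 'i', 'o', 'u')

  // Straightforward version: one readable line, several intermediate objects.
  def distinctVowelsSimple(word: String): Int =
    word.toLowerCase.toSet.intersect(vowels).size

  // Hand-tuned version: a single pass, with a bit mask instead of a Set.
  def distinctVowelsFast(word: String): Int = {
    var seen = 0
    var i = 0
    while (i < word.length) {
      Character.toLowerCase(word.charAt(i)) match {
        case 'a' => seen |= 1
        case 'e' => seen |= 2
        case 'i' => seen |= 4
        case 'o' => seen |= 8
        case 'u' => seen |= 16
        case _   => // not a vowel; nothing to record
      }
      i += 1
    }
    Integer.bitCount(seen) // number of distinct vowels seen
  }
}
```

The second version is uglier and harder to change, which is exactly why you would only write it where it pays for itself.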

Scala gives you a wider range of choice in the tradeoff between performance and maintainability than Java does, but you can always get Java-level performance if you need it. So the real difference between the languages is not about performance at all — it’s that you can write much more maintainable code in Scala where performance is not critical. Where it is critical, you do what you need to do, and that’s pretty much the same thing in the two languages. Don’t take this chart too literally, but I’ve tried to roughly convey Scala’s performance spectrum here:

How to think about Scala performance

What you should get out of this diagram is that for that 5% where performance is critical, Scala code can be about as fast as Java code while being somewhat more convenient; and for the 95% where performance isn’t so important, Scala offers agility comparable to that of dynamic languages while performing better. We just discussed an instance of the former; here are some examples of Scala’s agility: A, B, C. I doubt that any could be expressed better, or perhaps even as well, in Python or Ruby.

5. Immutable collections are slower than their mutable counterparts, leading some to suggest that they should be avoided. However, immutable data structures are a huge win when it comes to making your code concurrent, since you can hand off a structure to another thread and then keep working with it, building updated versions as you go, without synchronizing or making a defensive copy, both of which have serious performance implications. Also, the use of immutable data usually makes your code easier to reason about. So you should lean toward immutable collections, and Scala makes that easy. Then if it turns out that you could improve performance substantially by using a mutable collection somewhere, go for it, but be careful not to paint yourself into a corner. If your app is a success and demand increases dramatically, you may very well want to make your app more concurrent, and everything mutable will be a land mine waiting to blow up in your face.
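Here is a small sketch of what handing off an immutable structure looks like in practice. It is my own example: the object name and the numbers are made up, and it uses the global execution context purely for brevity.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SharedImmutable {
  // Returns (total computed in the background, total of the updated collection).
  def demo(): (Int, Int) = {
    val prices = Vector(10, 20, 30)

    // Hand the Vector to another thread: no lock, no defensive copy.
    val totalInBackground = Future { prices.sum }

    // "Modifying" builds a new Vector; the one the Future holds cannot change.
    val updated = prices :+ 40

    (Await.result(totalInBackground, 5.seconds), updated.sum)
  }
}
```

With a mutable buffer in place of the `Vector`, the background sum could observe the append half-done unless you added locking or copied the data first.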

6. Often the really big performance wins are in the architecture rather than in individual lines of code — the problem is what you are doing, not how fast you are doing it. Opportunities for transformative architectural change are likely to be more obvious to you — and easier to implement — if you are not knee-deep in boilerplate, imperative code, and mutable data.

Conclusion

So please, no more talk of “if I can serve all of my customers on one machine instead of 100 machines….” Balderdash! And mandates like “never use closures” or “never use immutable collections” are certainly not good advice.

A much better mandate would be to deal with performance in Scala the way you do — or at least should — in any language: design good abstractions, write maintainable code, profile it, and then trade off maintainability for performance only where the engineering cost of writing/testing/maintaining a more complex implementation is exceeded by the performance benefit it delivers. If you do that, you will end up with an app that performs close enough to a Java app for your needs but is much more maintainable and scalable.
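If you don't yet have a profiler wired up, even a crude timer that warms up the JIT before measuring beats guessing. The helper below is a rough sketch of my own (the name `CrudeTimer` is hypothetical), not a substitute for a real profiler or a proper benchmark harness.

```scala
object CrudeTimer {
  // Runs `body` once plus `warmups` more times untimed (giving the JIT a chance
  // to compile it), then returns the last result and the average nanoseconds
  // per run over `runs` timed runs.
  def time[A](warmups: Int, runs: Int)(body: => A): (A, Double) = {
    require(runs > 0, "runs must be positive")
    var result = body
    var i = 0
    while (i < warmups) { result = body; i += 1 }

    val start = System.nanoTime()
    i = 0
    while (i < runs) { result = body; i += 1 }
    val elapsed = System.nanoTime() - start

    (result, elapsed.toDouble / runs)
  }
}
```

Something like `CrudeTimer.time(1000, 1000)(myFunction(input))` gives a ballpark figure; treat small differences as noise.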

UPDATE: This article talks about the goals for Scala 2.11 (currently under construction), which further target performance.

[other Scala posts]  [An Impromptu Scala/Groovy Competition]

Why (many) men like large breasts

This is one of my most visited posts, so I wanted to make it a little easier to tell what it is suggesting. Toward that end I created this infographic:

[now the original post]

Hmmm. I just read the following in an interesting article in Psychology Today about evolutionary psychology, one of my favorite topics:

Until very recently, it was a mystery to evolutionary psychology why men prefer women with large breasts, since the size of a woman’s breasts has no relationship to her ability to lactate. But Harvard anthropologist Frank Marlowe contends that larger, and hence heavier, breasts sag more conspicuously with age than do smaller breasts. Thus they make it easier for men to judge a woman’s age (and her reproductive value) by sight—suggesting why men find women with large breasts more attractive.

It isn’t clear to me why finding large breasts attractive isn’t a simple case of a supernormal stimulus at work.

Putting that aside, I can see the point. Ultimately what we find attractive in a potential mate are those features which, in the environment in which we evolved, tended to lead to having lots of progeny. Our entire notion of what is sexually attractive is encoded into our brains today for the very reason that those who had it over the millennia left more copies of their genes behind.  Many thousands of years ago, people didn’t live to a ripe old age, so you needed to have kids while the iron was hot, so to speak; you were probably going to be dead at 30 and making babies until it happened. Hence mechanisms in the male brain that made him better at choosing a younger (but still fertile) woman as his primary partner were generally adaptive, because they left more time for baby-making.

On the other hand, they were only adaptive to the extent that you couldn’t already tell a woman’s age — was that a problem? Faces and voices and skin tone and energy level and behavior and peers and so on seem like pretty good indicators too. After all, most women wear bras these days (among many other mechanisms to give the appearance of youth), but we can still generally tell the difference between a 15-year-old, a 20-year-old, and a 25-year-old, can’t we?  Would larger but variable breasts on young women really make it easier to judge their age?

Beyond that I have a problem with the quote’s dismissal of a relationship with lactation.

Yes, if we were to measure breast size and lactation today, I doubt that there would be a significant correlation. But that doesn’t mean that there was no such correlation in our ancient past. Imagine women hundreds of thousands of years ago, who almost certainly had much smaller breasts than women today (we are the exception among primates).  Picture that, for whatever reasons, some of these women have breasts that are less developed functionally than others, and these are often smaller. Perhaps they have poorer immune systems and have therefore struggled more with parasites, a major problem in those days. Or maybe these women were malnourished during a critical period. Or their hormone levels weren’t right during development.  Or perhaps their genes for the construction of mammary glands are just off. In any case, among the small breasts of the day, theirs are smaller, for reasons that will negatively impact their children.

In such an environment, genes that gave men a preference for larger breasts would do well because they would tend to result in healthier children. In those days, “larger” meant something else, perhaps still tiny compared to women of today. But genes encoding such a preference into the male brain would cause average female breast size to increase over the millennia, just as the peahen’s preference for males with longer tails with lots of eyes gradually increased tail size in peacocks. Women who tended to accumulate fat in the breasts (a cheap way to make breasts larger) were considered more attractive, even though the fat had nothing to do with fertility. And just as the peahen’s preference persists even now, when peacocks have enormous tails that make them less able to escape predators, the human male’s preference for larger breasts would persist even though it no longer really means that his children will be better fed or have a better immune system.

In other words, just because you can’t tell anything about how much milk a 600cc breast will produce vs. a 300cc breast — the difference generally being the amount of fat in the breast — that doesn’t mean that preferring a 60cc breast over a 30cc breast wouldn’t have been adaptive way back then. The inflation to today’s numbers could just be a long-term effect of the preference itself, the result of generation after generation of men preferentially selecting women who tended to accumulate fat in the breasts. This kind of effect (if that’s really what’s going on here) is called a Fisherian runaway, after the genius who first suggested it.

In other words, just as women with small breasts today may use padding to trick the male brain into being more attracted to them, women’s DNA may have evolved to pad their breasts on the inside for the same reason.

By the way, you might think that a mutation that eliminated peahens’ preference for long tails with lots of eyes would do quite well, since the resulting males, which would tend to have shorter tails, wouldn’t be so handicapped against predators. But in fact those males would have trouble attracting a mate, so the mutated gene would die out.

If there are in fact genes that cause a preference for larger breasts, then where does it stop? Why not breasts the size of a house? Well, for one thing there are health implications beyond a certain point.  But we also seem to have a preference for “normal” — that is, we find anything too far out of the ordinary unattractive. A guy might be attracted to somewhat larger breasts than normal, but he is not attracted to breasts that are so enormous compared to what he has seen before that they seem freakish. Full lips (a sign of female hormones) are also generally considered attractive, but not so full as to seem abnormal.

But today, it is possible and even common for women to simply buy larger breasts (roughly 5% of American women have breast implants now, and if recent trends continue that number will double in the decades to come), or to buy devices that make their breasts look larger, in an effort to make themselves more attractive — ultimately to the genes that construct the male brain. As more and more women do this, it raises the standard for “normal,” and still larger breasts then seem attractive.  Then women want larger implants, and so on.

On the computer, of course, it is simply a matter of moving pixels around, and the laws of physics are no obstacle in cyberspace. Images abound on the web of women with breasts that, for most of us, seem ridiculously large. No doubt there is some variation among males in the strength of their preference for large breasts. But it is also true that on the web men can repeatedly find images of women with breasts a little larger than what they are used to seeing, and in so doing continually ratchet up their idea of what is “normal.” Breasts that seem abnormally large or round to one man may be attractive to another man simply because the latter is used to seeing breasts like that — to him, they no longer seem abnormal.

So — couldn’t that be a reasonable hypothesis for why many men have a preference for larger-than-average breasts? In fact, we are generally attracted to somewhat exaggerated sex organs and secondary sex characteristics because they imply good levels of sex hormones. This makes us responsive to supernormal stimuli.

[I’d love to hear from a professional in the field of evolutionary psychology on these thoughts. I am *not* one.]