Scala Performance

“You can’t observe that some poorly performing algorithm was far easier to implement in Scala than Java and conclude that Scala’s performance is poor.”

[This is one of my most popular posts, but it’s long; if you are pressed for time, just read item 4, look at the qualitative chart there, and read the conclusion.]

You may have read the infamous email in which a Fantom fan complained about his difficulties using Scala at Yammer. He complained about a lot of things, including how hard it is to get started with Scala collections, the poor performance they were seeing, and the supposedly worthless Scala community. Hmm. It is certainly worth reading, with a grain of salt, as the flip side of the success stories, some of which were explicitly about improving performance. Apparently it doesn’t work for everyone. Scala has improved since Yammer’s struggles with it began, but still, it’s worth reading about one company’s bad experience.

By the way, I am certainly waiting to see whether Fantom and/or Kotlin gets some traction; I’m always happy to see good competition in languages. And I have some sympathy for a few of the author’s concerns about Scala. For example, while I am happily using SBT now, I was mightily confused at first, in part because of the major changes it was going through at the time. For most of his complaints, though, I have quite a different impression. In particular it’s hard for me to imagine how he got the impression he did of the Scala community; I have found folks to be quite helpful.

But the author of this post read waaay too much into the Yammer mail, ultimately saying this about what he infers to be the relative performance of Java and Scala:

“…if I can serve all of my customers on one machine instead of 100 machines …”

Whoa, wait a minute, dude. What’s that you’re smoking?

1. First of all, the implication that real-world code is 100x slower in Scala than Java is so radically far from what we see elsewhere that we should wonder just what the heck they were doing. OK, the Yammer folks didn’t say that the whole app ran 100x slower, but “some parts” did; we don’t know how much difference it made overall. But even so, something is up — 100x is a huge and uncharacteristic difference. When I mentioned this to a colleague, he said “If for loops are M times slower and immutable collections are N times slower, then a for loop on immutable collections will be MxN times slower.” But that isn’t how it works! The truth is that if every single aspect of a program is made M times slower, the result will be just M times slower. So unless we see gists of realistic code demonstrating a 100x difference (don’t hold your breath), we should assume that they were doing something a little goofy. For example, they mentioned an “inner serialization loop”; it’s entirely believable that when they wrote their app they didn’t design the data structures for efficient serialization (long linked lists of Option[Int] vs. arrays of unboxed Ints, for example), but that wouldn’t be Scala’s fault. Also, accurately measuring the performance of code on the JVM is tricky — they wouldn’t be the first to get it wrong. Or perhaps they were repeatedly using a view, causing the computation to be done over and over again — who knows. But if this 100x difference they saw is really more than carelessness on their part, if it’s really a legitimate performance problem in Scala itself, well, gist or it didn’t happen! — extraordinary claims demand extraordinary evidence.

2. A lot of large applications are fairly I/O-bound. In many apps there may be no code at all which, if made several times faster, would have a substantial impact on normal application performance. And if you think about it, it’s quite obvious that this is true, since interpreted languages like Ruby and Python really are orders of magnitude slower than Java and Scala at computation. Does anyone seriously think that companies using Rails and Django are OK with buying 30 times as many machines to run their web sites? “Bill, order another 300 servers — we’ve decided to use Ruby.” Of course not — the penalty is nowhere near that steep for most applications, because so much of the total time is spent waiting for I/O to complete.

3. Generally only 5% or so of the code in a large application really matters much from a performance perspective — the rest could be made several times faster and you could hardly tell. Might there be some small portion of the code that you want to finesse for the sake of performance? Sure, happens all the time — in Scala, in Java, in C, and even in assembly language. If performance matters enough, you even move functionality to hardware — ask anyone who works on high-performance networking equipment or devices that process or render images. But we don’t move *everything* to hardware just because custom hardware is faster — that would be suicide. We move what matters. You take that small portion of code that’s critical to performance and you herniate it and yourself to speed it up, but you leave the rest of the program alone, unoptimized but maintainable, because it just doesn’t matter enough to overall performance. It would be silly to do otherwise.

4. The performance of Scala code depends on how you write it. It is true that on the JVM there is a performance hit for writing functional code, and that really does mean that in that 5% or so of the code that is performance-critical you should consider writing while loops rather than for-comprehensions, using mutable variables, preferring arrays over other collection types, and so on. At that point you’re just using Scala as a better Java, taking advantage of type inference, omitting semicolons, enjoying a better type system, writing tail-recursive code instead of loops that mutate data, etc. But you are still getting essentially the same performance you get from Java itself, with those conveniences. So even in this case, what is the incentive to use Java? And for the other 95% or so of the code, your goal should be to make it robust, maintainable, extensible, etc., in which case you are far better off with Scala, using functional code, immutable data, actors, and so on.
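To make that concrete, here is a small sketch of my own (not from any benchmark): the same array sum written functionally and then as "Scala as a better Java" for a hot path.

```scala
object SumStyles {
  // Functional style: concise and clear. Fine for the ~95% that isn't hot.
  def sumFunctional( xs:Array[Int] ) = xs.foldLeft(0)( _ + _ )

  // "Scala as a better Java": a while loop with a mutable local, which
  // avoids closure invocation and boxing overhead on the JVM.
  def sumImperative( xs:Array[Int] ) = {
    var total = 0
    var i = 0
    while ( i < xs.length ) {
      total += xs(i)
      i += 1
    }
    total
  }
}
```

Both return the same result; the point is that you can move between the two styles without leaving the language.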

This post gives a great example of the difference between using Scala as a better Java and writing functional Scala. Using immutable linked lists rather than mutable arrays, and filtering rather than modifying in place, make the code dramatically simpler — but also much slower. What may be more of a surprise is that when the author used Scala as a better Java on an in-place sort in an array, the Scala version outperformed the Java version (because the Scala compiler optimizes simple tail recursion). So it’s a trade-off, and it’s up to you to decide when to strive for maintainability and when to strive for performance. But if you are leaning hard toward performance more than about 5% of the time in a large app, you are probably doing your team a disservice.

The functional version — taken from an old paper introducing Scala — should be thought of as a demonstration of the expressive power of Scala, not as the “right way” to do things in Scala.

In fact, the performance difference demonstrated in that post is not really about Scala vs. Java. If you wrote Java code to sort an immutable linked list using the same technique, it would perform just as poorly. The only reason we even think to blame Scala for its poor performance in that algorithm is that the Java implementation would have been so painful to write that we wouldn’t bother. Immutable linked lists are easy in Scala and perform admirably for many algorithms, but they are a poor choice for huge sorts; we developers are expected to know when their use is appropriate. You can’t observe that some poorly performing algorithm was far easier to implement in Scala than Java and conclude that Scala’s performance is poor. It’s not Scala’s fault if you do something silly, even if Scala made it easy to do it.
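As a hedged illustration of the trade-off (my own code, not the code from that post): a functional insertion sort on an immutable List is short and obviously correct, but O(n²) and allocation-heavy; the performance choice is to sort a mutable array in place.

```scala
object Sorts {
  // Functional: simple and immutable, but a poor choice for huge inputs.
  def isort( xs:List[Int] ): List[Int] = xs match {
    case Nil    => Nil
    case h :: t => insert( h, isort(t) )
  }
  private def insert( x:Int, xs:List[Int] ): List[Int] = xs match {
    case h :: t if h < x => h :: insert( x, t )
    case _               => x :: xs
  }

  // Imperative: mutate an array in place; this is the performance choice,
  // and it would look essentially the same in Java.
  def asort( xs:Array[Int] ): Unit = java.util.Arrays.sort(xs)
}
```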

This StackOverflow answer about how to count the distinct vowels in an English word gives a dramatic Scala example of recoding something that is straightforward for much higher performance.
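The gist of that kind of recoding (a sketch of the technique, not the actual answer from StackOverflow): instead of building intermediate collections, track which vowels have been seen as bits in an Int.

```scala
object Vowels {
  // Count distinct vowels using a bitmask: one bit per vowel, no allocation.
  def distinctVowels( word:String ): Int = {
    val vowels = "aeiou"
    var mask = 0
    var i = 0
    while ( i < word.length ) {
      val v = vowels.indexOf( word.charAt(i).toLower )
      if ( v >= 0 ) mask |= ( 1 << v )   // set the bit for this vowel
      i += 1
    }
    Integer.bitCount(mask)               // number of distinct vowels seen
  }
}
```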

Scala gives you a wider range of choice in the tradeoff between performance and maintainability than Java does, but you can always get Java-level performance if you need it. So the real difference between the languages is not about performance at all — it’s that you can write much more maintainable code in Scala where performance is not critical. Where it is critical, you do what you need to do, and that’s pretty much the same thing in the two languages. Don’t take this chart too literally, but I’ve tried to roughly convey Scala’s performance spectrum here:

How to think about Scala performance


What you should get out of this diagram is that for that 5% where performance is critical, Scala code can be about as fast as Java code while being somewhat more convenient; and for the 95% where performance isn’t so important, Scala offers agility comparable to that of dynamic languages while performing better. We just discussed an instance of the former; here are some examples of Scala’s agility: A, B, C. I doubt that any could be expressed better, or perhaps even as well, in Python or Ruby.

5. Immutable collections are slower than their mutable counterparts, leading some to suggest that they should be avoided. However, immutable data structures are a huge win when it comes to making your code concurrent, since you can hand off a structure and then modify it without synchronizing or making a copy, both of which have serious performance implications. Also, the use of immutable data usually makes your code easier to reason about. So you should lean toward immutable collections, and Scala makes that easy. Then if it turns out that you could improve performance substantially by using a mutable collection somewhere, go for it, but be careful not to paint yourself into a corner. If your app is a success and demand increases dramatically, you may very well want to make your app more concurrent, and everything mutable will be a land mine waiting to blow up in your face.
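A small illustration of the hand-off point: persistent collections share structure, so "modifying" one is cheap and never disturbs readers of the original.

```scala
val v1 = Vector(1, 2, 3)
val v2 = v1 :+ 4   // a new version, sharing most structure with v1

// v1 is untouched, so another thread can keep reading it with no
// synchronization and no defensive copy.
assert( v1 == Vector(1, 2, 3) )
assert( v2 == Vector(1, 2, 3, 4) )
```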

6. Often the really big performance wins are in the architecture rather than in individual lines of code — the problem is what you are doing, not how fast you are doing it. Opportunities for transformative architectural change are likely to be more obvious to you — and easier to implement — if you are not knee-deep in boilerplate, imperative code, and mutable data.

Conclusion

So please, no more talk of “if I can serve all of my customers on one machine instead of 100 machines….” Balderdash! And mandates like “never use closures” or “never use immutable collections” are certainly not good advice.

A much better mandate would be to deal with performance in Scala the way you do — or at least should — in any language: design good abstractions, write maintainable code, profile it, and then trade off maintainability for performance only where the engineering cost of writing/testing/maintaining a more complex implementation is exceeded by the performance benefit it delivers. If you do that, you will end up with an app that performs close enough to a Java app for your needs but is much more maintainable and scalable.

UPDATE: This article talks about the goals for Scala 2.11 (currently under construction), which further target performance.

[other Scala posts]

An Impromptu Scala/Groovy Competition

I got a kick out of this post. The author, who was learning Scala, came up with a solution to this problem. Don’t bother to analyze it, but here is his solution (digits — what that post calls bigNum — is a String of 1000 digits):

def product( digits:String ) = { digits.map( java.lang.String.valueOf(_).toInt ).reduceLeft( (prod,c) => prod*c ) }
digits.sliding(5).toSeq.map( s => product(s) ).max

He was really happy with it and, understandably, wanted to share it and explain how it works.

Well, the very first comment was a somewhat tacky “mine is better because it’s shorter” Groovy solution (don’t bother with it either), albeit shorter mostly by not breaking out some of the work into a function as the post’s author had chosen to do for clarity (I’ve changed variable names to match the Scala version):

( 0 .. digits.size()-5 ).collect { digits.substring( it, it+5 ).toList().inject(1) { prod,c -> prod * c.toInteger() } }.max {it}

But then a more seasoned Scala developer absolutely blew the Groovy solution’s doors off with this:

digits.map{_.asDigit}.sliding(5).map{_.product}.max

That is, convert the digit characters to their numeric values, map every sliding window of five numbers to its product, and find the maximum of those. Assuming you know what a “sliding window” is and what map does, it’s hard to imagine a much more readable solution.

Now the truth of the matter is that having the shortest solution is usually not nearly as important as having a maintainable (understandable and non-fragile) solution, but in this case the Scala solution is far better on all of those counts. Also, Scala code will often be longer than the equivalent Groovy code, because Scala is statically typed. Whether or not performance, and catching type errors before you ship, are worth having to specify some type information is up to you. But in this case those who bragged about Groovy have good reason to be embarrassed.

But there was more. The ill-fated Groovy challenge was followed by another comment saying, effectively, that the Scala code was unreadable and that clever coders should be fired and replaced with average coders who write three times as much code in Java to do the same thing, because then everyone will understand it later.

Really?

If a company I owned stock in expressed that attitude, I would start thinking about putting in a sell order. There are very good business reasons to use modern programming languages, and there are very good business reasons to hire smart software developers. I am bewildered by people who think it is better to stick with older, less capable languages so that they only need mediocre staff. Good luck!

There’s a big difference between hard-because-unfamiliar and obtuse. The seasoned Scala coder’s solution is way more understandable, to anyone who knows Scala, than an equivalent Java solution would be to another Java developer.

But even if it weren’t understandable to you — even if you didn’t have any idea what those methods do, or couldn’t synthesize their collective effect in your mind — it is really easy to figure such things out in the REPL. Just start by evaluating some subexpression and iterate toward the whole thing, recalling the previous line with an up-arrow and adding to it:

scala> val digits = "9873459873"
digits: String = 9873459873
scala> digits.map{_.asDigit}
res0: scala.collection.immutable.IndexedSeq[Int] = Vector(9, 8, 7, 3, 4, 5, 9, 8, 7, 3)
scala> digits.map{_.asDigit}.sliding(5)
res1: Iterator[scala.collection.immutable.IndexedSeq[Int]] = non-empty iterator
scala> digits.map{_.asDigit}.sliding(5) foreach println
Vector(9, 8, 7, 3, 4)
Vector(8, 7, 3, 4, 5)
Vector(7, 3, 4, 5, 9)
Vector(3, 4, 5, 9, 8)
Vector(4, 5, 9, 8, 7)
Vector(5, 9, 8, 7, 3)
scala> digits.map{_.asDigit}.sliding(5).map{_.product} foreach println
6048
3360
3780
4320
10080
7560
scala> digits.map{_.asDigit}.sliding(5).map{_.product}.max
res4: Int = 10080

For very little effort we have walked through the code step by step as it transforms the data into the solution. Given a working knowledge of Scala, the most junior developer on your staff could do this.

The REPL is a fantastic tool in the developer’s toolbox. Back when IDE support for Scala was young and unstable, you’d see comments about how tool support for Scala was not as good as Java’s. (To be honest, it’s still not as good, but it’s quite usable.) It was a legitimate concern, but even then Scala had a nice REPL and Java did not.

And now there is a “Scala worksheet” in Eclipse and IntelliJ that allows you to do such explorations in a way which is often more convenient, and these explorations can be stored alongside your code as documentation. And because Scala is a statically typed language, these tools can look up and complete method names for you, a big time-saver. So the guy who comes along in six months and has to figure out what some functional Scala code does is going to find that much easier than reverse-engineering the equivalent imperative Java code.

But please, go right ahead and fire those smart software developers who insist on programming in modern languages. We want to hire some!

[other Scala posts]

ZSH map and filter functions using “anonymous functions”

UPDATE: Arash Rouhani picked this up and took it quite a bit further here.

The other day I was thinking that it would be handy to have a map function (in the functional programming sense) for zsh, and I couldn’t see anything in zsh itself that looked like what I wanted. So a quick google turned up this page by Yann Esposito, which gives an implementation of not only map but also filter and fold. Very cool!

The only problem, as the author points out, is that it is inconvenient to have to actually define a separate named function in order to use the facilities. So I groveled around in the zsh docs and found the (e) qualifier for parameter expansion, which causes an evaluation. That led me to write new versions of map and filter and related commands that work with anonymous “functions” — really just bits of code that get evaluated with $1 set to something — like so:

### Map each of a list of integer Xs to X+1:

$ mapa '$1+1' {1..4}
2
3
4
5

### Map each FOO.scala to FOO.class:

$ map '$1:r.class' test{1,2}.scala
test1.class
test2.class

### Get the subset which are ordinary files (bin is a dir):

$ filterf 'test -f' bin test{1,2}.scala
test1.scala
test2.scala

### Get the even numbers between 1 and 5:

$ filtera '$1%2 == 0' {1..5}
2
4

### Map each filename to 0 if it is an ordinary file:

$ each '[ -f $1 ]; echo $?' /bin test{1,2}.scala
1
0
0

### Given a directory tree containing some Foo.task files,
### an isStartable function that returns success if a task is startable now,
### and a startTask function that starts it, start all startable tasks.

$ eachf startTask $( filterf isStartable **/*.task )

Here are the functions:

###### map{,a}

### map Word Arg ...
### For each Arg, evaluate and print Word with Arg as $1.
### Returns last nonzero result, or 0.

function map() {
  typeset f="$1"; shift
  typeset x
  typeset result=0
  for x; map_ "$x" "$f" || result=$?
  return $result
}
function map_() {
  print -- "${(e)2}"
}

### mapa ArithExpr Arg ...   # is shorthand for
### map '$[ ArithExpr ]' Arg ...

function mapa() {
  typeset f="\$[ $1 ]"; shift
  map "$f" "$@"
}

###### each{,f}

### each Command Arg ...
### For each Arg, execute Command with Arg as $1.
### Returns last nonzero result, or 0.

function each() {
  typeset f="$1"; shift
  typeset x
  typeset result=0
  for x; each_ "$x" "$f" || result=$?
  return $result
}
function each_() {
  eval "$2"
}

### eachf Command Arg ...   # is shorthand for
### each 'Command $1' Arg ...

function eachf() {
  typeset f="$1 \"\$1\""; shift
  each "$f" "$@"
}

###### filter{,f,a}

### filter Command Arg ...
### For each Arg, print Arg if Command is successful with Arg as $1.

function filter() {
  typeset f="$1"; shift
  typeset x
  for x; filter_ "$x" "$f"
  return 0
}
function filter_() {
  eval "$2" && print -- "$1"
}

### filterf Command Arg ...   # is shorthand for
### filter 'Command "$1"' Arg ...

function filterf() {
  typeset f="$1 \"\$1\""; shift
  filter "$f" "$@"
}

### filtera ArithRelation Arg ...  # is shorthand for
### filter '(( ArithRelation ))' Arg ...

function filtera() {
  typeset f="(( $1 ))"; shift
  filter "$f" "$@"
}

Writing this kind of code is tricky for me; it’s easy to get the quoting wrong. For example,

$ each 'echo "$1"' \*

should just print an asterisk, but at first I was getting a list of files.

Anyway, it was fun. I might add fold and friends later, when I have more time.

Oh, one last thing. I use this function for testing zsh code:

TEST() {
  echo TEST: "${(qq)@}"
  "$@"
}

That way I can write a suite of many tests like this one

TEST filtera '$1%2 == 0' {1..5}

and get output like this, showing me what is being tested, followed by the results:

TEST: 'filtera' '$1%2 == 0' '1' '2' '3' '4' '5'
2
4

Without the (qq) you wouldn’t be able to tell where one argument ends and the next begins:

TEST: filtera $1%2 == 0 1 2 3 4 5
2
4

A Quick Look at Scala

Here are a few fairly simple steps in Scala that show a little of its flavor and power, followed by a bit of comparative Java code.

      • Let’s start by creating a map from integers to their English names. We’ll populate the map for integers 1 and 2 as examples, and make the name of any integer not in the map “UNKNOWN”.
        val map = Map( 1->"one", 2->"two" ) withDefaultValue "UNKNOWN"
      • Now let’s produce a list of the values from that map that have even keys. That is, for each key-value pair in the map, if the key is even, we want the value.
        for ( (k,v) <- map if k%2 == 0 ) yield v
      • OK, now generalize that by writing a method which, given such a map from Int to String and any test function for the keys, returns a list of those values whose keys pass the test (i.e. the test function returns true).
        def pickValues( map: Map[Int,String], keyTest: Int => Boolean ) =
          for ( (k,v) <- map if keyTest(k) ) yield v
      • Next let’s generalize the method to work with any type of key and any type of value. That is, we’ll write a method which, given any map and a test function appropriate for the map’s keys, will return a list of those values whose keys pass the test.
        def pickValues[K,V]( map: Map[K,V], keyTest: K => Boolean ) =
          for ( (k,v) <- map if keyTest(k) ) yield v
      • Now let’s invoke that last method to get the names of even numbers from the map we created earlier.
        pickValues( map, (x:Int) => x%2 == 0 )
      • That’s already pretty nice.  With some more advanced tricks (which I won’t go into here) we could make that
        map pickValues ( _%2 == 0 )
      • But in reality you probably wouldn’t write this at all because Map has a method that produces the subset Map whose keys satisfy some predicate.  You could just ask for the values of that subset.
        ( map filterKeys ( _%2 == 0 ) ).values

Did you follow that? My guess is that even if you aren’t a Scala developer, you can more or less understand what’s going on without someone having to explain the details. And we very quickly arrived at a fairly general-purpose bit of code that is straightforward to use.
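The whole progression can be pasted into the REPL; here it is as a compact, runnable recap (the assertions are mine):

```scala
// A map with a default value for missing keys.
val map = Map( 1->"one", 2->"two" ) withDefaultValue "UNKNOWN"
assert( map(3) == "UNKNOWN" )

// The fully generalized method: values whose keys pass the test.
def pickValues[K,V]( map: Map[K,V], keyTest: K => Boolean ) =
  for ( (k,v) <- map if keyTest(k) ) yield v

assert( pickValues( map, (x:Int) => x%2 == 0 ).toList == List("two") )

// The idiomatic one-liner using the collections library directly.
assert( ( map filterKeys ( _%2 == 0 ) ).values.toList == List("two") )
```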

Now for some Java.

        • First let’s create that map.
          Map<Integer,String> map = new HashMap<Integer,String>();
          map.put(1,"one");
          map.put(2,"two");
In addition to being more verbose and less readable, that falls a bit short in that it doesn’t specify a default value.  You could use some other collections library if you wanted that, but the JRE Map interface and its implementation classes just don’t have that feature, so you would either have to handle defaulting each place you used the map or write a method to take care of that once and for all.
        • Now produce a list of the values from that map with even keys.
          List<String> list = new LinkedList<String>();
          for ( Integer i : map.keySet() ) {
              if ( i%2 == 0 ) list.add( map.get(i) );
          }

The next step would be to generalize this to a method which takes a test function on Integers. We could do it, of course, but we’re already at the limit of what most people would be willing to do in order to avoid repeating themselves. Since there are no functions in Java, what we wrote in Scala as

(x:Int) => x%2 == 0

would become an unwieldy anonymous inner class. Most people probably wouldn’t bother. If you are serious about wanting to do such things, then at the very least you should look at Google’s collection library for Java. But even that is going to be much less friendly than Scala.

Perhaps more importantly, the Java collections are not immutable. This may not seem like a big deal when you are first starting out in Scala, but it really has far-reaching implications in a concurrent environment.

Part of the reason I wanted to blog about this is that in the now-infamous Yammer letter in which an engineer who likes the Fantom programming language tries to make Scala sound like a horrible programming language with a horrible community, he enumerates the things he had to explain to new members of his team for even the simplest usage of a collection:

        • implicit parameters
        • builder typeclasses
        • “operator overloading”
        • return type inference
        • Traversable
        • TraversableOnce
        • GenTraversable
        • Iterable
        • IterableLike
        • Should they be choosing the most general type for parameters, and if so what was that?
        • What was a =:= and where could they get one from?

Really? All of that was necessary in order to understand even the simplest use of a collection? As you can probably tell, I am not a fan of his teaching style.

Rather than burying the magic necessary to create a really good collections library in the compiler, where it would be unavailable to the rest of us, the Scala designers made the type system powerful enough to allow a high-quality collections library to be written as ordinary Scala code. The good news is that that means more power for us to use in our own code. Unfortunately this also means that when you look at detailed type signatures in the collections library you see the many traits and classes needed to create it. This is called the “visible plumbing problem.”

Recently the API documentation has been changed to make the less interesting parts of the plumbing less visible. And even so, the type signatures can be a little daunting for newcomers. But nobody should be deceived into thinking that they’ll have to understand GenTraversable, IterableLike, etc. in order to use Scala collections — that’s just FUD. The only reason I can think of for deliberately dragging someone new to Scala through all of that plumbing is to try to convince them that Scala is hard. When teaching someone to drive, do you start by popping the hood and explaining how the drive train works?

Fortunately I didn’t have that guy’s help getting started.

[other Scala posts]

Scala Snippets

This post is really just notes to myself, which I add to as I find things. But I thought I’d share — maybe others will find them useful.


0. Resources:


1. Use flatMap with monadish objects like Some/None to pick out (and map) the interesting stuff:

  val data  = Map( "name" -> "Fred", "age" -> 21, "iq" -> 140 )
  val attrs = List( "name", "age", "income" )
  attrs flatMap data.get   // List(Fred, 21); no iq or income, poor guy

2. Use partial functions to put cases in derived classes which get used from the base class. For example, here is how you would take care of handling bad input in one place (badValue) for a set of related classes:

  import scala.{ PartialFunction => PF }
  abstract class Base {
    def validCases: PF[...]  // define in subclasses
    def   badValue: PF[...] = { case value => err( "bad value: " + value ) }
    def handle( value:... ) = ( validCases orElse badValue )(value)
  }
  class Derived1 extends Base {
    def validCases = {
      case ... => ...
      case ... => ...
    }
  }
  class Derived2 extends Base {
    def validCases = {
      case ... => ...
      case ... => ...
    }
  }
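Filled in with concrete (purely illustrative) types, the pattern looks like this:

```scala
import scala.{ PartialFunction => PF }

abstract class Handler {
  def validCases: PF[Int,String]                      // defined in subclasses
  def badValue:   PF[Int,String] = { case v => "bad value: " + v }
  def handle( value:Int ) = ( validCases orElse badValue )(value)
}

class SmallInts extends Handler {
  def validCases = {
    case 1 => "one"
    case 2 => "two"
  }
}
```

Each subclass supplies only its valid cases; the bad-value handling lives in one place in the base class.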

3. Use this.type in methods in a base class where you need to return the derived type (otherwise Scala will use the base type). Useful for builder hierarchies:

  abstract class Base {
    def withDefault(...):this.type = {
      ...
      this // but as the derived type!
    }
  }
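A concrete (made-up) example: without this.type, withDefault would return the base type and the chained from call below would not compile.

```scala
abstract class Builder {
  protected var default = ""
  def withDefault( d:String ): this.type = {
    default = d
    this   // statically typed as the subclass, not as Builder
  }
}

class QueryBuilder extends Builder {
  var table = ""
  def from( t:String ): this.type = { table = t; this }
}

// withDefault returns QueryBuilder here, so the chain type-checks.
val q = new QueryBuilder().withDefault("n/a").from("users")
```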

4. If you are having type problems in Eclipse, try these in order:

  1. Save all files and wait a few moments.
  2. From the menu, do Project / Clean and wait a few moments.
  3. One by one, add types to closure args and method returns.
  4. Eliminate implicit defs, even if you are absolutely convinced that they aren’t coming into play in the problematic code.
  5. If you are getting a “missing parameter type” error where you think the compiler should be able to infer the type, try assigning an intermediate result to a variable to force the compiler to choose the most specific type.

5. From Daniel Sobral, an expression that, given a string containing words (and perhaps other crud), returns a map from words to their frequencies. \P{Alpha} is the negation of \p{Alpha}, the POSIX char class. See the JDK 6 Pattern javadoc for details on regex constructs. Note the use of identity, which is just ( x => x ); it’s defined in Predef, so you don’t need to create another.

  val text = "How much wood could a wood chuck chuck, if a wood chuck could chuck wood."
  text.toLowerCase.split("""\P{Alpha}+""").groupBy(identity).mapValues(_.length)
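On that sample text the expression produces counts like these (the assertions are mine):

```scala
val text = "How much wood could a wood chuck chuck, if a wood chuck could chuck wood."
val freq = text.toLowerCase.split("""\P{Alpha}+""").groupBy(identity).mapValues(_.length)

assert( freq("wood")  == 4 )
assert( freq("chuck") == 4 )
assert( freq("could") == 2 )
```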

6. Don’t forget that you can see the code scalac magically generates for you using a compiler option. The output below has been manually formatted (somewhat) and commented:

% cat Person.scala
 case class Person( name:String, age:Int )

% scalac -Xprint:typer Person.scala    # this is 2.9.1-final

[[syntax trees at end of typer]] // Scala source: Person.scala

package  {

// Here's the generated class.
case class Person extends java.lang.Object with ScalaObject with Product with Serializable {

  // Private members and public accessors.
  private[this] val name: String = _; def name: String = Person.this.name;
  private[this] val  age: Int    = _; def  age: Int    = Person.this.age;

  // Constructor.
  def this(name: String, age: Int): Person = {
    Person.super.this();
    ()
  }

  // Versions of copy() for every combination of supplied parameters.
  def copy(name: String = name, age: Int = age): Person = new Person(name, age)
  def copy$default$2: Int    @scala.annotation.unchecked.uncheckedVariance = Person.this.age
  def copy$default$1: String @scala.annotation.unchecked.uncheckedVariance = Person.this.name

  // The usual stuff you want for every immutable class.
  override def hashCode(): Int    = ScalaRunTime.this._hashCode(Person.this)
  override def toString(): String = ScalaRunTime.this._toString(Person.this)
  override def equals(x$1: Any): Boolean =
    Person.this.eq( x$1.asInstanceOf[java.lang.Object] ).||( x$1 match {
      case (name: String, age: Int)Person( (name$1 @ _), (age$1 @ _) ) if name$1.==(name).&&( age$1.==(age) )
            => x$1.asInstanceOf[Person].canEqual(Person.this)
     case _ => false
    } )
  override def canEqual(x$1: Any): Boolean = x$1.$isInstanceOf[Person]()

  // Methods that allow all case classes to be self-describing.
  override def productPrefix: java.lang.String = "Person";
  override def productArity: Int = 2;
  override def productElement(x$1: Int): Any = x$1 match {
    case 0 => name
    case 1 => age
    case _ => throw new java.lang.IndexOutOfBoundsException(x$1.toString())
  }

}  // class Person

// Here's the companion object.
final object Person extends scala.runtime.AbstractFunction2[String,Int,Person]
  with ScalaObject with Serializable {

  def this(): object Person = {
    Person.super.this();
    ()
  };

  final override def toString(): java.lang.String = "Person";

  case def unapply(x$0: Person): Option[(String, Int)] =
    if (x$0.==(null))
        scala.this.None
      else
        scala.Some.apply[(String, Int)](scala.Tuple2.apply[String, Int](x$0.name, x$0.age));

  case def apply(name: String, age: Int): Person = new Person(name, age);

  protected def readResolve(): java.lang.Object = Person

}  // companion object

}  // package

7. The usual idiom for providing Java versions of Scala APIs:

  import scala.collection.JavaConverters._
  def jFoos = foos.asJava
  def  foos = ...

There is good info available on Java-Scala conversions, including why it is better to use the explicit JavaConverters and avoid the implicit JavaConversions.  Do this if you want exceptions checked in the Java code:

  @throws( classOf[TimeoutException] )
  def jFoos ...

8. Passing on repeated parameters (a.k.a. varargs), with modification:

  def foo( a:String, b:Int* ) = bar( a, 7 :: b.toList : _* )

And here’s how you match repeated parameters:

  case Seq( xs @ _* ) => ...
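To see the idiom end to end, here is a minimal self-contained sketch (foo() and bar() are hypothetical names, as above):

```scala
// Hypothetical bar(): just formats its repeated args so we can see them.
def bar( a:String, b:Int* ) = a + b.mkString("[", ",", "]")

// foo() prepends 7 to its repeated parameter and passes the result on;
// ": _*" tells the compiler to expand the List back into varargs.
def foo( a:String, b:Int* ) = bar( a, 7 :: b.toList : _* )

foo("xs", 1, 2)  // "xs[7,1,2]"
```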

9. Don’t do this (unless you really mean to):

  val x = new X {...}

That leads to the use of reflection (x is structurally typed). This fixes it:

  val x:X = new X {...}

More details here.


10. To filter a Seq by type in a for comprehension, you need to use this trick:

  val mixedTypes = List( 5, "five" )
  for (   (x:Int) <- mixedTypes ) println(x)  // fails
  for ( y@(x:Int) <- mixedTypes ) println(y)  // prints 5

But if you use a pattern, it works without the trick:

  val mixedTypes = List( (1,5), (1,"five") )
  for ( (a,b:Int) <- mixedTypes ) println(b)  // prints 5

Basically what’s happening is that if you use a pattern, you get a PartialFunction, which will handle values that match and ignore values that don’t. (x:Int) is not a pattern, but y@(x:Int) is.


11. If you are thinking in terms of zipWith or mapcar to combine multiple Seqs, this is probably what you want:

  val xs = List(1,2,3)
  val ys = List(7,8,9)
  (xs,ys).zipped map (_+_)  // List(8,10,12)

Note that no intermediate list is built, as there is with

  xs zip ys map { case (x,y) => x+y }

and that zipped works for more than two input Seqs:

  val zs = List(4,5,6)
  (xs,ys,zs).zipped map ( _*_/_ )  // List(1,3,4)

Apparently this only works with Tuple2 and Tuple3, though. Also, be careful — zipped is strict, so attempting to map zipped infinite Streams will win you a StackOverflowError or OutOfMemoryError.


12. Here’s a discussion about profiling and performance. They rave about the YourKit profiler and the Caliper microbenchmarker, and point to this interesting post. This is all pre-macros, so the advice may need an update soon.


13. Here’s how to use a PriorityQueue to get the top N (in this case 4) elements of a larger list by some ordering criterion, without having to sort the entire list:

  val pq = PriorityQueue( 7,3,4,1,2,0,9,5,8,6 )( Ordering.Int.reverse )
  Iterator.fill(4)( pq.dequeue ) foreach print  // 0123

Of course, a PriorityQueue is perfect for situations where you keep getting new entries and want to always process the highest-priority one first. BUT — currently there is no immutable PriorityQueue in the standard library, only a mutable one. You might think that you could use SortedSet (TreeSet) as an immutable replacement for PriorityQueue, but it is the set’s Ordering that determines whether or not a value is already in the set, not equals() — so an attempt to add something to the collection which compares as equal to an existing member of the set will be ignored. For example, if you have a SortedSet of Vectors ordered on the size of the vector, then the set will only hold one vector of length 4. Only use SortedSet if you are sure that distinct values will never be considered equal by the set’s Ordering#compare.
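Here is a quick demonstration of that SortedSet pitfall, using vectors ordered only by their length (the names are illustrative):

```scala
import scala.collection.immutable.SortedSet

// Order vectors by size alone -- two distinct vectors of the same
// length compare as equal, so the set silently keeps only one.
implicit val bySize: Ordering[Vector[Int]] =
  Ordering.by( (v:Vector[Int]) => v.size )

val s = SortedSet( Vector(1,2,3,4), Vector(5,6,7,8) )
s.size  // 1 -- the second vector was ignored!
```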


14. GenTraversableFactory contains methods (which are available implicitly for Traversables) for creating/initializing new collections:

  collection.mutable.ArrayBuffer.fill(3,2)(1)           // 3x2 ArrayBuffer(s) of 1s
  List.fill(10)( util.Random.nextInt(2) )               // 10 of 0|1
  new String( Array.fill(20)( util.Random.nextPrintableChar ) )  // random String
  Vector.tabulate(3,3)( (i,j) => if (i==j) 1 else 0 )   // 3x3 identity matrix
  util.Random.shuffle( List.range(0,11,2) )             // 0,2,...,10 shuffled
  Iterator.fill(20)( util.Random.nextGaussian ) foreach println  // don't make a List
  List.iterate(1,10)(2*_)                               // 1,2,4,8,...,512
  List.concat( 1 to 4, 9 to 12, Set(2,5) )              // 1,2,3,4,9,10,11,12,2,5

15. It’s not surprising that scala.util.Properties lets you get/set properties, but the same class will also tell you which OS you are on, your username, your temp dir, the values of environment variables, etc.

  Properties.propOrNone("foo") match { case None => ...; case Some(foo) => ... }
  Properties. envOrElse("foo","bar")

16. A nice post from Daniel Sobral (a Scala bard) explaining for comprehensions.


17. Tips for investigating long compile times.


18. Here’s the usual way of defining a Stream of Fibonacci numbers, using zip (remember that zipped won’t work because it’s strict):

  val fibs: Stream[BigInt] = 0 #:: 1 #:: ( fibs zip fibs.tail map ( n => n._1 + n._2 ) )

But you can make a more efficient version of this kind of Stream by avoiding the zip, like so:

  val fibs: Stream[BigInt] = {
    def loop( h:BigInt, n:BigInt ): Stream[BigInt] = h #:: loop( n, h+n )
    loop(0,1)
  }

19. To define cyclical dependencies:

  trait A { self:B => ... }
  trait B { self:A => ... }
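For example (a hypothetical pair of mutually dependent traits; through its self type, each trait can use the other’s members, and a concrete class must mix in both):

```scala
trait Greeter { self: Named =>    // a Greeter must also be a Named,
  def greet = "Hello, " + name    // so it may use Named's members
}
trait Named { self: Greeter =>    // and vice versa
  def name: String
}

class Employee extends Greeter with Named { val name = "Ada" }

new Employee().greet  // "Hello, Ada"
```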

20. The stackable trait pattern:

  abstract class       Ab { def f(...) : ... /* abstract */ } // or trait
  class Base   extends Ab { def f(...) = ... }
  trait Tweak1 extends Ab { abstract override def f(...) = super.f(...) }
  trait Tweak2 extends Ab { abstract override def f(...) = super.f(...) }

  val tweaked = new Base with Tweak1 with Tweak2    // or whatever
  tweaked.f(...)  // calls Tweak2.f calls Tweak1.f calls Base.f  // reverse order

If Ab is an abstract class (as opposed to a trait) it can take constructor parameters, but of course you can only extend one class.
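Here is a concrete (if contrived) instance of the pattern, with each tweak decorating the result of the one below it in the linearization:

```scala
abstract class Ab        { def f(s: String): String }
class Base    extends Ab { def f(s: String) = s }
trait Exclaim extends Ab { abstract override def f(s: String) = super.f(s) + "!" }
trait Quote   extends Ab { abstract override def f(s: String) = "<" + super.f(s) + ">" }

val tweaked = new Base with Exclaim with Quote
tweaked.f("hi")  // "<hi!>" -- Quote.f calls Exclaim.f calls Base.f
```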


21. Implicit value classes allow you to effectively add methods to a class without actually creating new objects:

  // This value class adds a times() method to Int, allowing you to do
  // 3 times { ... }
  implicit class Int_times(val numTimes:Int) extends AnyVal {
    def times(f: => Unit): Unit = {
      var i = numTimes
      while (i > 0) { f; i -= 1 }
    }
  }

22. To get the conveniences of case classes within a class hierarchy, make the case classes the leaves, duplicate any common fields in them (sorry DRY), and use abstract methods to represent the common fields in a superclass:

  // Want to be able to access node.value without having to figure out
  // whether it is a Branch or a Leaf.
  abstract class   Node { val value:Int }
  case     class Branch(      value:Int, left:Node, right:Node ) extends Node
  case     class   Leaf(      value:Int                        ) extends Node
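So client code can read node.value uniformly, and still pattern match on the leaves when it needs the structure:

```scala
abstract class   Node { val value:Int }
case     class Branch(      value:Int, left:Node, right:Node ) extends Node
case     class   Leaf(      value:Int                        ) extends Node

// Matching on the leaf case classes still works as usual...
def sum( n:Node ): Int = n match {
  case Branch(v, l, r) => v + sum(l) + sum(r)
  case Leaf(v)         => v
}

val tree: Node = Branch( 1, Leaf(2), Leaf(3) )
tree.value  // ...but value needs no match: 1
sum(tree)   // 6
```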

23. Be careful with mapValues and filterKeys — they currently return a view (without saying so — a bug), which means that the computation gets done again every time you use the result.  Use map.mapValues(…).view.force to avoid this, or map the result through the identity function:

scala> val m = Map(1->2, 3->4) mapValues { x => println("OUCH!"); x+1 }
OUCH!
OUCH!
m: scala.collection.immutable.Map[Int,Int] = Map(1 -> 3, 3 -> 5)

scala> m(1)
OUCH!
res20: Int = 3

scala> val n = m map identity
OUCH!
OUCH!
n: scala.collection.immutable.Map[Int,Int] = Map(1 -> 3, 3 -> 5)

scala> n(1)
res21: Int = 3

24. Array equality is weird (a Java legacy, presumably):

  import collection.mutable
  import mutable.ArrayBuffer
          Set(1,2)        ==           Set(1,2)        // true
         List(1,2)        ==          List(1,2)        // true
  mutable.Set(1,2)        ==   mutable.Set(1,2)        // true
  ArrayBuffer(1,2)        ==   ArrayBuffer(1,2)        // true
        Array(1,2)        ==         Array(1,2)        // FALSE!!!
        Array(1,2)   sameElements    Array(1,2)        // true
        Array(1,2).deep   ==         Array(1,2).deep   // true
        Array(1,2).toSeq  ==         Array(1,2).toSeq  // true (WrappedArray)

25. Constructor goodies:

  class A         (                   b:Int )    // b is just a constructor arg
  class A         (               val b:Int )    // b is a field
  class A         (       private val b:Int )    // b is private
  class A private (                   b:Int )    // primary constructor is private,
                                                 // but usable by other constructors
  class A         ( @BeanProperty var b:Int )    // adds JavaBean getB/setB methods

And you can subclass a Java class with multiple constructors like so.
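A sketch of how that typically looks: the Scala subclass picks one superclass constructor to invoke from its primary constructor, and forwards any others with def this(…). (java.util.Date is used here just as a familiar Java class with several constructors; the subclass name is made up.)

```scala
import java.util.Date

// The primary constructor calls Date(long); the secondary constructor
// chains to the primary, as all secondary constructors must.
class Timestamped( millis:Long ) extends Date( millis ) {
  def this() = this( System.currentTimeMillis )
}

new Timestamped(42L).getTime  // 42
```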


26. Composing Futures:

  import scala.concurrent._
  import ExecutionContext.Implicits.global
  def f(x:Int) = { Thread.sleep(5000); x }
  val futures = 1 to 9 map { x => Future( f(x) ) }  // Seq[ Future[Int] ]
  val future  = Future sequence futures             // Future[ Seq[Int] ]
  future onSuccess { case results => println( results.sum ) }
  println("this gets printed before the sum")
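If you do need the combined result synchronously (say, at the end of a small script), Await blocks for it; a minimal self-contained sketch:

```scala
import scala.concurrent._
import scala.concurrent.duration._
import ExecutionContext.Implicits.global

val futures  = 1 to 9 map { x => Future( x*x ) }
val combined = Future.sequence( futures )       // Future[ Seq[Int] ]
Await.result( combined, 5.seconds ).sum         // 285
```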

27. To write a set of programs which all have to do the same setup (examine environment variables, process arguments, open connections, etc.) and teardown, rather than calling out to a library to do the common stuff you can simply extend App, put the common stuff in there, and then extend that to create each of your programs:

  class  DbApp  extends   App { /* common stuff */ }
  object DbApp1 extends DbApp { ... }
  object DbApp2 extends DbApp { ... }

The children of DbApp will of course inherit any variables defined by it.


28. You cannot use _root_ to import from the default package (ultimately a JVM limitation):

                class C1
  package a   { class C2 }
  package a.b { class C3 {
                  val c2fails = new C2            // not found (in a.b)
                  val c2works = new _root_.a.C2   // OK
                  val c1fails = new _root_.C1     // not a member of pkg <root>
              } }

29. To throw an exception without the overhead:

  throw new Exception with scala.util.control.NoStackTrace
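The mixin overrides fillInStackTrace(), so the resulting exception carries no trace at all, which is where most of the usual construction cost goes:

```scala
import scala.util.control.NoStackTrace

val e = new Exception("fail fast") with NoStackTrace
e.getStackTrace.length  // 0 -- no trace was captured
```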

30. You can use collection.breakOut (the breakOut method on the scala.collection package object) to provide a CanBuildFrom that produces a specific type, obviating a toList or other conversion.

  type S = String
  def cross( a:S, b:S ):List[S] =
    a.flatMap( c1 => b.map( c2 => ""+c1+c2 ) )(collection.breakOut)
  cross("ab","12")  // List("a1","a2","b1","b2")

And here Rex Kerr shows how to use CanBuildFrom in your own code to make it return the type you started with.


31. While you can use “System exit 1” to exit a program just as you would in Java, “sys exit 1” is better in that it returns Nothing, allowing you to use it in expressions without losing the type info:

  def die( msg:String ) = { log(msg); sys exit 1 }
  val age = if ( born <= now ) now - born else die("born in the future!")

32. Do this to make an infinite Stream by repeating a TraversableOnce:

  implicit class RichTraversable[A](val items: TraversableOnce[A]) extends AnyVal {
    def looped =
      if (items.isEmpty)
        sys error ".looped"
      else {
        lazy val s: Stream[A] = items.toStream #::: s
        s
      }
  }
  List(1,2,3).looped take 10 foreach print  // 1231231231

But that’s still a Stream, so it will eventually consume the heap storing calculated results. You could of course make an Iterator, e.g.

  implicit class TraversableLooper[A : scala.reflect.ClassTag](val items: TraversableOnce[A]) {
    def looped: Iterator[A] =
      if (items.isEmpty)
        sys error "(empty).looped"
      else
        new Iterator[A] {
          val array = items.toArray
          val numItems = array.length
          var i = -1
          def hasNext = true
          def next = {
            i += 1
            if (i == numItems) i = 0
            array(i)
          }
        }
  }
  (List(1,2,3).looped take 900000000).sum   // 1800000000

Note that you can’t currently make the latter a value class because it defines an inner class. That restriction is probably going to be eliminated (in 2.11?).  And notice the use of ClassTag so that we can create an Array[A] without knowing what A is.


33. You can do this to choose between methods once (and then use whichever you have chosen repeatedly):

  val f = if (...) f1 _ else f2 _

But if f1 and/or f2 are overloaded you will probably need to say which you mean, like so:

  val f = if (...) f1(_:Int) else f2(_:Int)

More info here.
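A concrete sketch of the non-overloaded case (shout/hush are hypothetical):

```scala
def shout( s:String ) = s.toUpperCase
def hush( s:String )  = s.toLowerCase

val loud = true
val f = if (loud) shout _ else hush _   // chosen once, up front
f("Hello")  // "HELLO"
```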


34. Reverse sorting:

  val names = List("d","b","c","a")
  println( names sorted Ordering[String].reverse )

or, more generally:

  def revSort[ A:Ordering ]( s:Seq[A] ) = s sorted Ordering[A].reverse

Or you could (apparently) add a method to Seq:

  implicit class SeqMethods[A](val xs: Seq[A]) extends AnyVal {
    def revSorted(implicit ord: Ordering[A]) = xs sorted ord.reverse
  }

35. You can bind a lowercase variable to an existential type:

  val x: Some[_] = Some("foo")

  x match {
    case st:Some[t] => // t is a type variable
      val it:t = st.get
      var items = Set[t]() // which we can use to define new types
      items += it
      items
  } // Set[_]

Note that you still get Set[_] even if you start with Some[String], because Some[t] would match other types; this doesn’t bind t to String.


36. You can exclude some parameters of a case class from the usual magic by putting them in a separate parameter list.  E.g. here see that c is ignored by toString() and equals():

scala> case class Foo(a: Int, b: Int)(c: Int)
defined class Foo
scala> val f1 = Foo(1,2)(3)
f1: Foo = Foo(1,2)
scala> val f2 = Foo(4,5)(6)
f2: Foo = Foo(4,5)
scala> val f3 = Foo(1,2)(7)
f3: Foo = Foo(1,2)
scala> Set(f1,f2,f3)
res1: scala.collection.immutable.Set[Foo] = Set(Foo(1,2), Foo(4,5))


[other Scala posts]

Enums in Scala

[To cut to the chase, there is a link to a Scala Enum class you might find useful at the end of this post.]

Newcomers to Scala are often surprised to find that it is easier to write enums in Java than it is in Scala. Nearly everything else is easier in Scala than in Java, and often has preferable semantics as well, so the fact that enums are easier in Java comes as a bit of a shock.

But once you think about it, it makes some sense. The designers of Scala resist putting in highly focused language features, instead preferring features that have very broad applicability. The designers of Java said “Developers need enums; let’s enhance the language to make it easy to get good enums.” The designers of Scala, on the other hand, want as much as possible for problems to be solved by writing classes and traits and putting them in a library. The Scala language designers ask themselves questions like “Developers need to be able to do things like X and Y; have we made it possible to write good library solutions for such things?” Something that is preventing library designers from creating good solutions to common problems is a good argument for a new language feature. The Scala language designers tend to see their role at a different level of abstraction.

Consider Perl’s terse syntax for hash tables. The Perl designers (i.e. Larry Wall) decided to build in language features to make hash tables easy to create and use, whereas the Java designers decided to do that in a library rather than in the language itself.

In the same way, Java targeted enums directly, and as a result the syntax is fairly minimal, whereas the Scala folks have so far decided that the problem doesn’t warrant new language features. And even if the Scala folks were to do something special for enums, they couldn’t make it three times easier to create an enum than in Java; it would be about the same.

It could certainly be easier, but without too much difficulty you can make enums in Scala that are similar to Java enums, and actually do a little more. There seems to be a lot of churn on the web about how to do enums well in Scala, so I wrote up something about my own implementation in this StackOverflow answer (how did developers get anything done before StackOverflow?).

UPDATE:  Wouldn’t macros (available in 2.10) allow for a very usable enum facility?  I wouldn’t be surprised to see something in 2.11.  Ah — here’s a thread about that.  And another.

[other Scala posts]

What Has Science Ever Done For Me?

(a bit of a rant)

My son had some college friends over for dinner a while back, and at one rather interesting point in a lively conversation my overall respect for science became evident, and this fellow Henry (that’s not really his name), scoffed “Pfft. What has science ever done for me?”

I hardly knew what to say. And by the way, the complaint is probably more often levied at mathematics — what possible good is it in our lives?

Henry was wearing clothes made of synthetic materials and dyes, both the result of many years of chemical research, woven into cloth by machines that run on electricity from a sophisticated electrical grid and are controlled by computers. He had a cell phone on his belt — good lord. The chips in it are the product of quantum mechanics and semiconductor physics. The software on it owes heavily to computer science. The encryption it uses to keep Henry’s conversations private is based on number theory. Our understanding of electromagnetism allows us to transmit his conversations using antennas designed using antenna theory. The transmission encoding depends on information theory. And heck, he drove over in a car; just imagine what we could find in the car and the roads and the oil refineries that wouldn’t exist if it weren’t for science.

But even that is peanuts. If it weren’t for science and math, Henry would very likely not exist at all. Consider: the germ theory of disease, the statistical methods by which we determine the effectiveness and risk of medicines, microscopes, medical imaging systems like MRI scanners and X-rays, diagnostic tests, medical monitoring equipment, medical databases, and so on — medicine has come a long way as the result of science. What are the chances that he or some ancestor of his would have died without all of that support, I wonder. Remember what infant mortality was like a few hundred years ago, and the average lifespan of those who did survive to adulthood? And food — there are way too many people on this planet for the food supply that could be produced without the chemical fertilizer made possible by the Haber-Bosch process for fixing nitrogen. And let’s not forget clean water.

If you could somehow magically go back in time and eliminate every person who made a contribution to science or math, our lives would be very primitive indeed. Which raises the question — how in the world could anyone in a developed country, particularly a college student, be so unaware of the mammoth contribution science and math have made to our quality of life?

Certainly as our world grows more complex, our understanding of it has trouble keeping up. Perhaps the average person simply thinks that cell phones come from businesses like Apple and Motorola and leaves it at that — no science needed, just retail storefronts. “Who needs science? I can get everything I need at WalMart!” Perhaps the conceptual distance between quantum mechanics and the use of it to talk to your girlfriend is just too great, and we lose the connection. Perhaps the greater our debt to science and math, the poorer our appreciation of their role in our lives.

Henry, were it not for science and math your life would suck beyond your imagination, except for the detail that you probably wouldn’t be alive at all.

Absent-Minded Professors

Most of us know somebody, often male, who can’t seem to remember details about things he has done or where he is supposed to be or people he has met and so on, but can effortlessly remember the physics or chemistry or math or economics required to explain some problem at hand — even though those courses were completed several decades ago. Men of this ilk can be extremely annoying to a wife who naturally assumes that if he really cared about her he would remember that they met at a particular restaurant, and would know that April 23rd is their anniversary without having to look it up, or — horrors — be reminded on the 24th.

I am travelling today, and over Skype I gave my son a laptop tour of my friend Magdalena’s house. “Nicholas and I painted this room together a few years ago,” Magdalena told him as we entered her living room.

After a short pause, I heard myself say “We did?”  Good thing she’s not my wife.

It turns out that it is possible to have lousy episodic memory but great semantic memory. Really, Magdalena — I’m not making this up. Episodic memory is memory about events — having met someone, having been somewhere, needing to be somewhere, etc. Semantic memory is about concepts. If tomorrow you explain to somebody the difference between episodic memory and semantic memory, you are using your semantic memory to do so. If you add that you read it in some guy’s blog, you are using your episodic memory. And if you explain this to your wife to justify having forgotten her birthday, you are using wishful thinking.

Wanted: DNA for Smart Brains

OK, smart people, you may be in a position to help science! There is important research underway to help determine the genetic factors of intelligence; our genes have a large influence on how our brains are built, and this effort will try to determine how the genes of highly intelligent people differ from those of the general population. This research is likely to be ground-breaking, and the project is looking for test subjects. If your test scores meet any of the following criteria, you qualify:

– a pre-1995 SAT score of at least 780 Math and 700 Verbal
– a post-1995 SAT score of at least 800 Math and 760 Verbal
– a GRE score of at least  800 Quantitative and 700 Verbal
– an ACT score of at least 35

Sound like you? If you still have the documentation, apply at

https://www.cog-genomics.org/

If selected, they will send you a little kit for you to collect saliva. You’ll be able to see the results of your own DNA analysis.

Here’s a great talk on the subject:

Sounds interesting!

My Experience of Groovy

I recently inherited a bit of code written in Groovy. It does a lot of XML processing and fiddling with maps, and the author chose Groovy because it has handy facilities for such tasks. My job was to use that bit of demoware as a starting point to write a more substantial blob which would run as part of a servlet under Tomcat.

It was painful. Every time I misspelled an identifier or tried to access a variable that was out of scope or got a method name wrong or forgot to type “def” before the first assignment to a variable or just about anything else, I had to redeploy the servlet. Perhaps JavaRebel would have helped here, but I didn’t have it or know how to use it and I wasn’t sure it would work with Groovy anyway, so I didn’t invest the time to find out.

Also, the support for Groovy in the NetBeans IDE seems to be poor. Of course you can’t expect it to do nearly what it can with a strongly typed language — showing you inferred types, suggesting completions, doing refactoring. But there were other problems on top of that, like breakpoints not firing for code that clearly must have executed, and occasionally I would find myself stopped inside the Groovy runtime. And NetBeans’s “Apply Code Changes” feature, which sometimes allows you to change the code in a running program (kind of a poor man’s JavaRebel) didn’t work when I changed Groovy code.

Furthermore, Groovy didn’t seem to like anonymous classes; I was unable to anonymously extend a Java class I extend in Java all the time.

Ultimately I tried to rewrite some of the code in Java. But the code I rewrote extended a Groovy class, and apparently extending a Groovy class in Java is not allowed, or at least not workable. The Java subclass could not access inner classes defined in the Groovy superclass. The Java class could not access the superclass’s fields. The compiler complained that some mystical method required by the Groovy runtime was not written. I ended up having to make the Java class unrelated to the Groovy class and give each object a reference to the other — a bit ugly.

And finally, I got some really obtuse error messages, like this one:

Caught: org.codehaus.groovy.runtime.typehandling.GroovyCastException:
  Cannot cast object 'null' with class 'null' to class 'java.lang.Number'

The line in question was

if (save_file) saveMetricsInput(xml_metrics_text);

Huh?

  • What is “object null”? Neither of the variables used in that line was null.
  • And what is “class null”?  I had no idea.
  • And where is anything being cast to java.lang.Number?

When I get errors like this I begin “flailing” — making semi-random changes to the code to see what effect, if any, they have on the error messages produced, in the hopes of getting clues about what is really going on. Flailing is never pleasant.

Eventually I figured out that the return type for the method was (erroneously) long, and the above line was the last executable line in the method, so Groovy was trying to coerce that last line to a long value. I puzzled over other errors as well, but that one was probably the worst.

I enjoy Groovy for standalone scripting — it’s pleasant for that, especially when you need some dynamic behavior. I used Groovy for a framework for updating a database, and it worked out quite well — each update is itself a bit of Groovy code that gets invoked by the framework. But I really wish I had just rewritten this Tomcat-based thing in Scala the day I got it. I’ll stick with statically typed languages inside heavyweight containers from now on. And actually, Scala feels pretty dynamic itself, even having the moral equivalent of duck typing, missing-method handling, and a REPL.

It occurs to me that I might have had a better experience with Groovy if I had used a different IDE, perhaps Eclipse. In fact I’ve been wanting to get to know Eclipse better anyway since it appears to be the best-supported IDE now for Scala. But I think a big part of the draw for Groovy is that it is very Java-like, and that isn’t very important to me — I’ll happily invest the time to learn a different way of doing things if the result is better.

Finally, I’m working on a Griffon (which is based on Groovy) app right now, and I have to say I’m pretty impressed so far. I don’t generally like GUI work, but Griffon removes a lot of the pain. @Bindable and other features remind me of JavaFX. I am starting to see the power of Groovy’s AST transformations.

Groovy and Ruby/JRuby seem similar; both have interesting metaprogramming facilities. I have yet to get a feel for where I would choose Groovy vs. Clojure; my gut feel is that using Scala for statically typed code and Clojure for more dynamic work would be a dynamite combination. Now with ClojureScript on the client, Clojure is even more interesting!