Notes on Optimizing Clojure Code: Reflection
Clojure is a dynamic language. That's great for (some notion of) expressivity, but sometimes it can get in the way of performance.
To get the best performance out of JVM Clojure, we have to understand how Clojure's brand of dynamic typing meshes with the JVM's. This is what this post is about.
The JVM
JVM bytecode is typed in roughly the same way that Java is. That is, when a method is invoked on an object, the JVM bytecode includes the full signature of that method, namely its name, the "target static type" on which it is (statically) invoked, and the (static) types of each of its arguments. At runtime, the JVM will check whether the given static type has a method with that name and those arguments (when loading the code) and whether the object it is invoked on (at invocation time, when running the code) actually implements that target static type.
In this context, the "target static type" is one of interface, class, or abstract class. This gives us some level of dynamism, because we can then pass to that bytecode any object that implements the target type, even one that did not exist when that bytecode itself was loaded. But it sure does sound expensive to have to scan the entire hierarchy of an object for each method invocation.
And it would be, if not for the amazing HotSpot JIT. One of the things it's really good at is figuring out when this information does not need to be checked anymore. Basically, given a block of N instructions that each invoke a method (and would thus in principle all need to check if their argument matches their individual static target types), the JVM can figure out that a given concrete, runtime class can bypass all these checks and replace all of them with a single check at the start of that block. And then, for good measure, inline the specific implementation of all these methods for that one class. In practice, the "type checking" cost basically disappears.
What if we want more dynamism? Languages like Ruby or Python can do things like
"call a method called foo
with two arguments on this object, and see what
happens". They don't need to know what class that method comes from. Can the
JVM do something like that? After all, there is JRuby (and there was Jython).
Yes, it can, although if you're trying to do that from straight Java it's going to seem a lot more complex. That "invoke" bytecode I described earlier really does need to know the static target type, so it can't be used for this level of dynamism. What you end up needing to do is the following (a code sketch follows the list)¹:
- Call getClass on the object. Because that's a method on Object, and (almost) everything is an instance of Object, that always succeeds and can follow the pattern of having a well-defined static target type as explained above.
- We know from the signature of getClass that the result is an object of type Class, so we can now invoke methods on that as the static target type. We call getMethods.
- We now have an array of Method objects, each of which we can ask for its name as a string with getName, which we can compare with the method name we wanted to invoke. If there is no match, we give up and throw some type of Throwable. If there is one or more, we keep going.
- We then need to check, for each matching name, the number of arguments the method expects versus the number of arguments we are given. Note that at this level there is no such thing as a variadic method, so every Method object has a well-defined number of expected arguments, which we can access with getParameterCount. (Java variadic functions are a Java compiler feature, not a bytecode-level one. At the bytecode level, "variadic" methods just take an array as their last argument.) If there is no matching method, we give up and throw; if there is at least one matching method, we keep going.
- We now need to check the type of each argument. This part is a bit tricky, because Java makes the static type of the arguments part of a method's signature, but static and dynamic types don't always match (specifically, the dynamic type can be a subtype of the static type). Since we're coming from a dynamic language, it's fair to assume we don't have access to static types for the arguments; if we did, we could use them to select the best match. If we don't, we can either try to guess based on the dynamic types, or give up and throw an exception. Let's imagine we have somehow settled on a single method to call at this point. We're still not quite done.
- Finally, we can call the invoke method on that Method object, which will call the method. That's a variadic method, so we first have to collect all of the arguments into an array of Objects. We can assume that the invoked method will call statically-typed methods from then on, following the "fast" path described above, so it doesn't matter that all the arguments are typed as Object at this point, except that it does mean primitives always get boxed when going through this path.
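To make those steps concrete, here is a rough Clojure sketch of that lookup against the java.lang.reflect API. The name invoke-by-name is made up for this post, and the matching is deliberately naive: it takes the first public method whose name and arity match, ignoring argument types entirely.
(import '(java.lang.reflect Method))

(defn invoke-by-name
  "Illustration only: call a method on obj knowing nothing but its name
  and how many arguments we have."
  [^Object obj ^String method-name & args]
  (let [klass (.getClass obj)
        arity (count args)
        match (->> (.getMethods klass)
                   (filter (fn [^Method m]
                             (and (= method-name (.getName m))
                                  (= arity (.getParameterCount m)))))
                   first)]
    (if match
      ;; Method#invoke is variadic at the Java level: it takes an Object[]
      ;; of (boxed) arguments.
      (.invoke ^Method match obj (object-array args))
      (throw (IllegalArgumentException.
               (str "No method named " method-name " taking " arity
                    " arguments on " klass))))))

;; e.g. (invoke-by-name "hello" "charAt" (int 2)) should return \l
;; (the explicit (int 2) matters: unlike Clojure's own reflector,
;;  Method#invoke will not narrow a boxed Long to an int parameter)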
Not only is this a lot more work than checking for the static target type, it's also not an access pattern that the HotSpot JIT knows how to optimize. So that's bad for performance.
What does all of this have to do with Clojure?
Why Clojure is fast
Clojure is very fast for a dynamic language because it mostly manages to stick to the "known static target type" case, which HotSpot then optimizes.
For example, let's look at the conj
function in the Clojure standard library.
It can act on lists, vectors, queues, maps, and sets.
One could have made five separate classes with no common ancestor (besides
Object) and relied on the slow, getClass
approach in the Clojure compiler.
That would work but be horribly slow.
For performance, the most obvious path would be to create a Conjable
interface and make sure each of these five types implements that. Then conj
could just compile to a static call to that, and you'd get a runtime error if
the argument you give to conj
happens to not implement that interface.
This is basically what Clojure does: a call to conj
compiles down to a call
to clojure.lang.RT#conj
, which looks like this:
static public IPersistentCollection conj(IPersistentCollection coll, Object x){
if(coll == null)
return new PersistentList(x);
return coll.cons(x);
}
It's not called Conjable
, because there are other methods in it, but that's a
plain old Java interface. If one wanted to add a new type that works with
conj
, one would just have to implement that interface. And the net result
here is that, from a performance perspective, if your collection is implemented
in Java and implements the IPersistentCollection
interface, there is no
performance hit for calling conj
from Clojure compared to calling your cons implementation directly from Java. Even though the Clojure code that does the calling is untyped, the generated bytecode ultimately invokes the typed method cons on the static target type IPersistentCollection.
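As a sketch of what that looks like (the type below is made up for illustration, and a real collection would implement several more interfaces and a proper equiv), implementing IPersistentCollection, plus the Seqable interface it extends, is enough for conj to stay on the fast path:
(deftype PairBag [items]
  clojure.lang.Seqable
  (seq [_] (clojure.core/seq items))
  clojure.lang.IPersistentCollection
  ;; RT#conj ends up calling this cons method through the interface
  (cons [_ x] (PairBag. (clojure.core/cons x items)))
  (count [_] (int (clojure.core/count items)))
  (empty [_] (PairBag. ()))
  (equiv [this other] (identical? this other)))

;; e.g. (seq (conj (PairBag. ()) 1 2 3)) should evaluate to (3 2 1)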
But conj
is a "strictly Clojure" function; Clojure also has a lot of
functions that integrate with Java. Notably, the seq abstraction that a lot of
collection-processing functions are based on works just as well on Java
collections:
t.core=> (let [al (doto (java.util.ArrayList.) (.add 1) (.add 2) (.add 3))]
#_=> (filter odd? al))
(1 3)
t.core=>
Whereas conj
could rely on its argument implementing the
IPersistentCollection
interface, there's no way filter
here can pull that
off, as ArrayList
is a pre-existing JVM class. So how does filter
work?
Like most seq-returning functions in the Clojure standard library, filter
works by first calling seq
on its argument, and then working on the result.
The result of seq
is an ISeq
, which is, again, a well-known static target
type, so we're back to the fast path. How does the conversion to seq
work?
Let's look at first
as it's a bit simpler than filter
, but follows the same
general principles. The implementation of first
is:
(def
^{:arglists '([coll])
:doc "Returns the first item in the collection. Calls seq on its
argument. If coll is nil, returns nil."
:added "1.0"
:static true}
first (fn ^:static first [coll] (. clojure.lang.RT (first coll))))
It looks funny because, at that point in the loading of clojure.core, the compiler does not yet know about defn. But basically all it's doing is deferring to
clojure.lang.RT#first
, which is defined as:
static public Object first(Object x){
if(x instanceof ISeq)
return ((ISeq) x).first();
ISeq seq = seq(x);
if(seq == null)
return null;
return seq.first();
}
Calls to instanceof
in either if or switch statements are also among the
things that the HotSpot JIT can recognize and optimize very well. So this is
still on the fast path.
What is seq
doing, though? Here it is:
static public ISeq seq(Object coll){
if(coll instanceof ASeq)
return (ASeq) coll;
else if(coll instanceof LazySeq)
return ((LazySeq) coll).seq();
else
return seqFrom(coll);
}
// N.B. canSeq must be kept in sync with this!
static ISeq seqFrom(Object coll){
if(coll instanceof Seqable)
return ((Seqable) coll).seq();
else if(coll == null)
return null;
else if(coll instanceof Iterable)
return chunkIteratorSeq(((Iterable) coll).iterator());
else if(coll.getClass().isArray())
return ArraySeq.createFromObject(coll);
else if(coll instanceof CharSequence)
return StringSeq.create((CharSequence) coll);
else if(coll instanceof Map)
return seq(((Map) coll).entrySet());
else {
Class c = coll.getClass();
Class sc = c.getSuperclass();
throw new IllegalArgumentException("Don't know how to create ISeq from: " + c.getName());
}
}
static public boolean canSeq(Object coll){
return coll instanceof ISeq
|| coll instanceof Seqable
|| coll == null
|| coll instanceof Iterable
|| coll.getClass().isArray()
|| coll instanceof CharSequence
|| coll instanceof Map;
}
This is all on the "fast" path; Clojure is explicitly handling the Java
standard library (through supporting Iterable
) as well as many existing (and
future) Java libraries (Iterable
being a very standard interface to implement
for Java collections), and provides an explicit hook for Clojure-specific custom collections with the Seqable interface.
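Here is a similarly hypothetical sketch of that Seqable hook: a type with no other relationship to Clojure's collections becomes usable with filter, map, and the rest of the seq library just by implementing that one interface.
(deftype Countdown [^long n]
  clojure.lang.Seqable
  ;; seqFrom hits its Seqable branch and calls this
  (seq [_] (clojure.core/seq (range n 0 -1))))

;; e.g. (filter odd? (Countdown. 5)) should evaluate to (5 3 1)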
Basically, as long as you only call Clojure core functions, you're on the fast path and don't need to worry about JVM bytecode limitations when it comes to dynamism. So why am I writing about this? Why should you care?
How to get the slow path
The Clojure compiler can generate the slow code path I described above, and if you care about performance it's important to know when it does so and how to avoid it.
Clojure core functions are "preoptimized" for the fast path as explained above, but Clojure is built on interop as one of its fundamental pillars, so you have free rein to call any arbitrary Java method on any Clojure value (which can be any arbitrary Java object).
The fundamental interop operator in Clojure is .
, but it is rarely directly
used. Instead, the .method
form is used as syntactic sugar. For example, to
call the charAt
method on a String
object, one can write:
t.core=> (def s "hello")
#'t.core/s
t.core=> (defn char-at [arg] (.charAt arg 2))
#'t.core/char-at
t.core=> (char-at s)
\l
t.core=>
where \l
is Clojure literal syntax for the Character
object corresponding
to the letter "l". Because there's no way for the Clojure compiler to know that
arg
is of type String
when compiling the char-at
function, that function
is generated using the slow path described above. In other words, it's just
hoping that the argument has a method, any method, called charAt
and taking
a single argument.
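To make that concrete, the reflective char-at will happily accept any object whose class has some single-argument method named charAt, not just a String. StringBuilder below is just an example of an unrelated class that also happens to have such a method:
(char-at "hello")                   ; the String case from above
(char-at (StringBuilder. "hello"))  ; an unrelated class with a charAt method
;; both of these should return \l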
We can get a rough measure of its performance through benchmarking it (reusing
the bench
function from last week):
t.core=> (bench #(char-at s))
7.408092233176674E-6
t.core=>
Without a comparison point, it's hard to know whether that's good or bad. So
let's get a comparison point. We can tell the Clojure compiler that we do
expect a String
argument, and make it compile to the fast path described above,
by defining the function this way:
t.core=> (defn fast-char-at [^String arg] (.charAt arg 2))
#'t.core/fast-char-at
t.core=> (bench #(fast-char-at s))
1.626647040930247E-8
t.core=> (/ 7.408092233176674E-6 1.626647040930247E-8)
455.42100079314895
t.core=>
A 455x improvement is a nice speedup for the relatively low effort of adding one type hint.
Benchmarking at the REPL is not always reliable, so we can write a small program to double-check those results:
(ns t.core
(:require [criterium.core :as crit])
(:gen-class))
(defn bench
[f]
(->> (crit/benchmark (f) {}) :mean first))
(defn char-at
[s idx]
(.charAt s idx))
(defn fast-char-at
[^String s ^long idx]
(.charAt s idx))
(defn -main
[& args]
(let [r1 (char-at "hello" 2)
r2 (fast-char-at "hello" 2)
t1 (bench #(char-at "hello" 2))
t2 (bench #(fast-char-at "hello" 2))]
    (println (format "%-15s: %.2e (%s)" "char-at" t1 r1))
    (println (format "%-15s: %.2e (%s)" "fast-char-at" t2 r2))
(println (format "speedup: %6.2f" (/ t1 t2)))))
which yields:
$ java -server -jar target/uberjar/t-app-standalone.jar
char-at : 2.34e-06 (l)
fast-char-at : 3.56e-09 (l)
speedup: 656.51
$
*warn-on-reflection*
I have written before about Clojure compiler flags, so I won't repeat all of the context here. The point is, there is a flag that will let the Clojure compiler tell us when we need to add a type hint, so we don't accidentally end up with a slow path.
Unlike *unchecked-math*
, *warn-on-reflection*
really should
always be enabled at the project level. If you are using Leiningen, you
can turn it on from the project.clj
file by adding this line to your default
profile (i.e. top-level map):
:global-vars {*warn-on-reflection* true}
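For context, here is where that entry sits in a minimal project.clj; the project name and version numbers are placeholders:
(defproject my-app "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.11.1"]]
  ;; root binding applied when compiling and running this project's code
  :global-vars {*warn-on-reflection* true})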
Here's what it looks like:
t.core=> (set! *warn-on-reflection* true)
true
t.core=> (.charAt "hello" 2)
\l
t.core=> (let [s "hello"] (.charAt s 2))
\l
t.core=> (defn char-at [arg] (.charAt arg 2))
Reflection warning, /private/var/folders/wv/9lkw754x31l1m4b228b663400000gn/T/form-init12357873795856465089.clj:1:21 - call to method charAt can't be resolved (target class is unknown).
#'t.core/char-at
t.core=>
I think this illustrates nicely why the flag is useful: not all calls need to
be annotated (though you may still want to for clarity), because, in some situations, the
compiler can infer the type from context. For example, here, we can see that it
knows that literal strings are instances of String
, and that it is able to
propagate type information on let
-bound locals.
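That inference also flows through return types. In the made-up sketch below, .toUpperCase is resolved from the ^String hint and is known to return a String, so the chained .charAt call is resolved statically as well, with no extra hint and no reflection warning:
(defn shout-char-at
  [^String s ^long i]
  ;; .toUpperCase resolves from the ^String hint; its String return type
  ;; then lets the compiler resolve .charAt statically too.
  (.charAt (.toUpperCase s) i))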
The only downside I can think of to turning on that flag project-wide by default is that it can generate spurious warnings in the rare case where you actually do want to call a method based on its name alone, regardless of which type provides it.
In all of my career so far, I've wanted to do that exactly once. I was actually
working in Java at the time, so I solved it by going through the reflection
APIs (i.e. essentially the "slow path" described above). If you do end up with
a similar use-case, and you somehow can't fix it upstream by getting your
objects to implement a common interface when they have an identical method, I
would still recommend keeping *warn-on-reflection*
set at the project level,
and simply deactivating it for the one method where you actually want
reflection:
(set! *warn-on-reflection* false)
(defn wrapping-weird-apis
  [arg]
  ;; ... the one deliberately reflective method call goes here
  arg)
(set! *warn-on-reflection* true)
Conclusion
Clojure's support for reflection is really nice when exploring Java APIs in the REPL, but it should rarely be used in production code. Just turn on *warn-on-reflection* by default, and ensure you get no warnings through CI. (Or discipline, if you're into that.)
For performance-sensitive code, reflection is very bad. Not only is it slow
itself, it also prevents a lot of HotSpot optimizations on the surrounding
code. Even if you do end up with a situation where the method you want to call
does not have a single static target type, you may be better served by writing an explicit dispatch over a handful of known target types (a chain of instance? checks) instead, as Clojure itself does for seq; a sketch of that pattern follows.
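As a rough sketch of that pattern (the function and the chosen types here are made up for illustration), each branch calls a method on a known static target type, so every branch stays on the fast path:
(defn length-of
  "Length of a string-like, collection-like, or array value, without
  reflection: each branch dispatches to a known static target type."
  [x]
  (cond
    (nil? x) 0  ; mirror how count treats nil
    (instance? CharSequence x) (.length ^CharSequence x)
    (instance? java.util.Collection x) (.size ^java.util.Collection x)
    (.isArray ^Class (class x)) (java.lang.reflect.Array/getLength x)
    :else (throw (IllegalArgumentException.
                   (str "Don't know how to measure a " (class x))))))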
¹ What I'm describing here is the situation prior to the introduction of invokeDynamic. Clojure was designed before it, and the existing design does not benefit from it, so Clojure is not using it and I'm not going to talk about it further in this post. It did change the game for JRuby, which used to work roughly as I describe the "slow" path here and now has better options for basically teaching HotSpot how to optimize some typical Ruby code patterns.