Notes on Optimizing Clojure Code: Reflection
Clojure is a dynamic language. That's great for (some notion of) expressivity, but sometimes it can get in the way of performance.
To get the best performance out of JVM Clojure, we have to understand how Clojure's brand of dynamic typing meshes with the JVM's. This is what this post is about.
The JVM
JVM bytecode is typed in roughly the same way that Java is. That is, when a method is invoked on an object, the JVM bytecode includes the full signature of that method, namely its name, the "target static type" on which it is (statically) invoked, and the (static) types of each of its arguments. At runtime, the JVM will check whether the given static type has a method with that name and those arguments (when loading the code) and whether the object it is invoked on (at invocation time, when running the code) actually implements that target static type.
In this context, the "target static type" is one of interface, class, or abstract class. This gives us some level of dynamism, because we can then pass to that bytecode any object that implements the target type, even one that did not exist when that bytecode itself was loaded. But it sure does sound expensive to have to scan the entire hierarchy of an object for each method invocation.
And it would be, if not for the amazing HotSpot JIT. One of the things it's really good at is figuring out when this information does not need to be checked anymore. Basically, given a block of N instructions that each invoke a method (and would thus in principle all need to check if their argument matches their individual static target types), the JVM can figure out that a given concrete, runtime class can bypass all these checks and replace all of them with a single check at the start of that block. And then, for good measure, inline the specific implementation of all these methods for that one class. In practice, the "type checking" cost basically disappears.
What if we want more dynamism? Languages like Ruby or Python can do things like
"call a method called foo
with two arguments on this object, and see what
happens". They don't need to know what class that method comes from. Can the
JVM do something like that? After all, there is JRuby (and there was Jython).
Yes, it can, although if you're trying to do that from straight Java it's going to seem a lot more complex. That "invoke" bytecode I described earlier really does need to know the static target type, so it can't be used for this level of dynamism. What you end up needing to do is the following (a code sketch follows the list)¹:
- Call getClass on the object. Because that's a method on Object, and (almost) everything is an instance of Object, that always succeeds and can follow the pattern of having a well-defined static target type as explained above.
- We know from the signature of getClass that the result is an object of type Class, so we can now invoke methods on that as the static target type. We call getMethods.
- We now have an array of Method objects, each of which we can ask for its name as a string with getName, which we can compare with the method name we wanted to invoke. If there is no match, we give up and throw some type of Throwable. If there is one or more, we keep going.
- We then need to check, for each matching name, the number of arguments the method expects versus the number of arguments we are given. Note that at this level there is no such thing as a variadic method, so every Method object has a well-defined number of expected arguments, which we can access with getParameterCount. (Java variadic functions are a Java compiler feature, not a bytecode-level one. At the bytecode level, "variadic" methods just take an array as their last argument.) If there is no matching method, we give up and throw; if there is at least one matching method, we keep going.
- We now need to check the type of each argument. This part is a bit tricky, because Java makes the static type of the arguments part of a method's signature, but static and dynamic types don't always match (specifically, the dynamic type can be a subtype of the static type). Since we're coming from a dynamic language, it's fair to assume we don't have access to static types for the arguments; if we did, we could use them to select the best match. If we don't, we can either try to guess based on the dynamic types, or give up and throw an exception. Let's imagine we have somehow settled on a single method to call at this point. We're still not quite done.
- Finally, we can call the invoke method on that Method object, which will call the method. That's a variadic method, so we first have to collect all of the arguments into an array of Objects. We can assume that the invoked method will call statically-typed methods from then on, following the "fast" path described above, so it doesn't matter that all the arguments are typed as Object at this point, except that it does mean primitives always get boxed when going through this path.
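To make those steps concrete, here is a rough Clojure sketch of that lookup against the java.lang.reflect API. The name invoke-by-name is made up for this post, and the matching is deliberately naive: it takes the first public method whose name and arity match, ignoring argument types entirely.
(import '(java.lang.reflect Method))

(defn invoke-by-name
  "Illustration only: call a method on obj knowing nothing but its name
  and how many arguments we have."
  [^Object obj ^String method-name & args]
  (let [klass (.getClass obj)
        arity (count args)
        match (->> (.getMethods klass)
                   (filter (fn [^Method m]
                             (and (= method-name (.getName m))
                                  (= arity (.getParameterCount m)))))
                   first)]
    (if match
      ;; Method#invoke is variadic at the Java level: it takes an Object[]
      ;; of (boxed) arguments.
      (.invoke ^Method match obj (object-array args))
      (throw (IllegalArgumentException.
               (str "No method named " method-name " taking " arity
                    " arguments on " klass))))))

;; e.g. (invoke-by-name "hello" "charAt" (int 2)) should return \l
;; (the explicit (int 2) matters: unlike Clojure's own reflector,
;;  Method#invoke will not narrow a boxed Long to an int parameter)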
Not only is this a lot more work than checking for the static target type, it's also not an access pattern that the HotSpot JIT knows how to optimize. So that's bad for performance.
What does all of this have to do with Clojure?
Why Clojure is fast
Clojure is very fast for a dynamic language because it mostly manages to stick to the "known static target type" case, which HotSpot then optimizes.
For example, let's look at the conj
function in the Clojure standard library.
It can act on lists, vectors, queues, maps, and sets.
One could have made five separate classes with no common ancestor (besides
Object) and relied on the slow, getClass
approach in the Clojure compiler.
That would work but be horribly slow.
For performance, the most obvious path would be to create a Conjable
interface and make sure each of these five types implements that. Then conj
could just compile to a static call to that, and you'd get a runtime error if
the argument you give to conj
happens to not implement that interface.
This is basically what Clojure does: a call to conj
compiles down to a call
to clojure.lang.RT#conj
, which looks like this:
static public IPersistentCollection conj(IPersistentCollection coll, Object x){
if(coll == null)
return new PersistentList(x);
return coll.cons(x);
}
It's not called Conjable
, because there are other methods in it, but that's a
plain old Java interface. If one wanted to add a new type that works with
conj
, one would just have to implement that interface. And the net result
here is that, from a performance perspective, if your collection is implemented
in Java and implements the IPersistentCollection
interface, there is no
performance hit for calling conj
from Clojure compared to calling your cons implementation directly from Java. Even though the Clojure code that does the calling is untyped, the generated bytecode ultimately invokes the typed method cons on the static target type IPersistentCollection.
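As a sketch of what that looks like (the type below is made up for illustration, and a real collection would implement several more interfaces and a proper equiv), implementing IPersistentCollection, plus the Seqable interface it extends, is enough for conj to stay on the fast path:
(deftype PairBag [items]
  clojure.lang.Seqable
  (seq [_] (clojure.core/seq items))
  clojure.lang.IPersistentCollection
  ;; RT#conj ends up calling this cons method through the interface
  (cons [_ x] (PairBag. (clojure.core/cons x items)))
  (count [_] (int (clojure.core/count items)))
  (empty [_] (PairBag. ()))
  (equiv [this other] (identical? this other)))

;; e.g. (seq (conj (PairBag. ()) 1 2 3)) should evaluate to (3 2 1)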
But conj
is a "strictly Clojure" function; Clojure also has a lot of
functions that integrate with Java. Notably, the seq abstraction that a lot of
collection-processing functions are based on works just as well on Java
collections:
t.core=> (let [al (doto (java.util.ArrayList.) (.add 1) (.add 2) (.add 3))]
#_=> (filter odd? al))
(1 3)
t.core=>
Whereas conj
could rely on its argument implementing the
IPersistentCollection
interface, there's no way filter
here can pull that
off, as ArrayList
is a pre-existing JVM class. So how does filter
work?
Like most seq-returning functions in the Clojure standard library, filter
works by first calling seq
on its argument, and then working on the result.
The result of seq
is an ISeq
, which is, again, a well-known static target
type, so we're back to the fast path. How does the conversion to seq
work?
Let's look at first
as it's a bit simpler than filter
, but follows the same
general principles. The implementation of first
is:
(def
^{:arglists '([coll])
:doc "Returns the first item in the collection. Calls seq on its
argument. If coll is nil, returns nil."
:added "1.0"
:static true}
first (fn ^:static first [coll] (. clojure.lang.RT (first coll))))
It looks funny because, at that point in the loading of clojure.core, the compiler does not yet know about defn. But basically all it's doing is deferring to
clojure.lang.RT#first
, which is defined as:
static public Object first(Object x){
if(x instanceof ISeq)
return ((ISeq) x).first();
ISeq seq = seq(x);
if(seq == null)
return null;
return seq.first();
}
Calls to instanceof
in either if or switch statements are also among the
things that the HotSpot JIT can recognize and optimize very well. So this is
still on the fast path.
What is seq
doing, though? Here it is:
static public ISeq seq(Object coll){
if(coll instanceof ASeq)
return (ASeq) coll;
else if(coll instanceof LazySeq)
return ((LazySeq) coll).seq();
else
return seqFrom(coll);
}
// N.B. canSeq must be kept in sync with this!
static ISeq seqFrom(Object coll){
if(coll instanceof Seqable)
return ((Seqable) coll).seq();
else if(coll == null)
return null;
else if(coll instanceof Iterable)
return chunkIteratorSeq(((Iterable) coll).iterator());
else if(coll.getClass().isArray())
return ArraySeq.createFromObject(coll);
else if(coll instanceof CharSequence)
return StringSeq.create((CharSequence) coll);
else if(coll instanceof Map)
return seq(((Map) coll).entrySet());
else {
Class c = coll.getClass();
Class sc = c.getSuperclass();
throw new IllegalArgumentException("Don't know how to create ISeq from: " + c.getName());
}
}
static public boolean canSeq(Object coll){
return coll instanceof ISeq
|| coll instanceof Seqable
|| coll == null
|| coll instanceof Iterable
|| coll.getClass().isArray()
|| coll instanceof CharSequence
|| coll instanceof Map;
}
This is all on the "fast" path; Clojure is explicitly handling the Java
standard library (through supporting Iterable
) as well as many existing (and
future) Java libraries (Iterable
being a very standard interface to implement
for Java collections), and provides an explicit hook for Clojure-specific custom collections with the Seqable interface.
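Here is a similarly hypothetical sketch of that Seqable hook: a type with no other relationship to Clojure's collections becomes usable with filter, map, and the rest of the seq library just by implementing that one interface.
(deftype Countdown [^long n]
  clojure.lang.Seqable
  ;; seqFrom hits its Seqable branch and calls this
  (seq [_] (clojure.core/seq (range n 0 -1))))

;; e.g. (filter odd? (Countdown. 5)) should evaluate to (5 3 1)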
Basically, as long as you only call Clojure core functions, you're on the fast path and don't need to worry about JVM bytecode limitations when it comes to dynamism. So why am I writing about this? Why should you care?
How to get the slow path
The Clojure compiler can generate the slow code path I described above, and if you care about performance it's important to know when it does so and how to avoid it.
Clojure core functions are "preoptimized" for the fast path as explained above, but Clojure is built on interop as one of its fundamental pillars, so you have free rein to call any arbitrary Java method on any Clojure value (which can be any arbitrary Java object).
The fundamental interop operator in Clojure is .
, but it is rarely directly
used. Instead, the .method
form is used as syntactic sugar. For example, to
call the charAt
method on a String
object, one can write:
t.core=> (def s "hello")
#'t.core/s
t.core=> (defn char-at [arg] (.charAt arg 2))
#'t.core/char-at
t.core=> (char-at s)
\l
t.core=>
where \l
is Clojure literal syntax for the Character
object corresponding
to the letter "l". Because there's no way for the Clojure compiler to know that
arg
is of type String
when compiling the char-at
function, that function
is generated using the slow path described above. In other words, it's just
hoping that the argument has a method, any method, called charAt
and taking
a single argument.
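To make that concrete, the reflective char-at will happily accept any object whose class has some single-argument method named charAt, not just a String. StringBuilder below is just an example of an unrelated class that also happens to have such a method:
(char-at "hello")                   ; the String case from above
(char-at (StringBuilder. "hello"))  ; an unrelated class with a charAt method
;; both of these should return \l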
We can get a rough measure of its performance through benchmarking it (reusing
the bench
function from last week):
t.core=> (bench #(char-at s))
7.408092233176674E-6
t.core=>
Without a comparison point, it's hard to know whether that's good or bad. So
let's get a comparison point. We can tell the Clojure compiler that we do
expect a String
argument, and make it compile to the fast path described above,
by defining the function this way:
t.core=> (defn fast-char-at [^String arg] (.charAt arg 2))
#'t.core/fast-char-at
t.core=> (bench #(fast-char-at s))
1.626647040930247E-8
t.core=> (/ 7.408092233176674E-6 1.626647040930247E-8)
455.42100079314895
t.core=>
A 455x improvement is a nice speedup for the relatively low effort of adding one type hint.
Benchmarking at the REPL is not always reliable, so we can write a small program to double-check those results:
(ns t.core
(:require [criterium.core :as crit])
(:gen-class))
(defn bench
[f]
(->> (crit/benchmark (f) {}) :mean first))
(defn char-at
[s idx]
(.charAt s idx))
(defn fast-char-at
[^String s ^long idx]
(.charAt s idx))
(defn -main
[& args]
(let [r1 (char-at "hello" 2)
r2 (fast-char-at "hello" 2)
t1 (bench #(char-at "hello" 2))
t2 (bench #(fast-char-at "hello" 2))]
    (println (format "%-15s: %.2e (%s)" "char-at" t1 r1))
    (println (format "%-15s: %.2e (%s)" "fast-char-at" t2 r2))
(println (format "speedup: %6.2f" (/ t1 t2)))))
which yields:
$ java -server -jar target/uberjar/t-app-standalone.jar
char-at : 2.34e-06 (l)
fast-char-at : 3.56e-09 (l)
speedup: 656.51
$
*warn-on-reflection*
I have written before about Clojure compiler flags, so I won't repeat all of the context here. The point is, there is a flag that will let the Clojure compiler tell us when we need to add a type hint, so we don't accidentally end up with a slow path.
Unlike *unchecked-math*
, *warn-on-reflection*
really should
always be enabled at the project level. If you are using Leiningen, you
can turn it on from the project.clj
file by adding this line to your default
profile (i.e. top-level map):
:global-vars {*warn-on-reflection* true}
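For context, here is where that entry sits in a minimal project.clj; the project name and version numbers are placeholders:
(defproject my-app "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.11.1"]]
  ;; root binding applied when compiling and running this project's code
  :global-vars {*warn-on-reflection* true})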
Here's what it looks like:
t.core=> (set! *warn-on-reflection* true)
true
t.core=> (.charAt "hello" 2)
\l
t.core=> (let [s "hello"] (.charAt s 2))
\l
t.core=> (defn char-at [arg] (.charAt arg 2))
Reflection warning, /private/var/folders/wv/9lkw754x31l1m4b228b663400000gn/T/form-init12357873795856465089.clj:1:21 - call to method charAt can't be resolved (target class is unknown).
#'t.core/char-at
t.core=>
I think this illustrates nicely why the flag is useful: not all calls need to
be annotated (though you may still want to for clarity), because, in some situations, the
compiler can infer the type from context. For example, here, we can see that it
knows that literal strings are instances of String
, and that it is able to
propagate type information on let
-bound locals.
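That inference also flows through return types. In the made-up sketch below, .toUpperCase is resolved from the ^String hint and is known to return a String, so the chained .charAt call is resolved statically as well, with no extra hint and no reflection warning:
(defn shout-char-at
  [^String s ^long i]
  ;; .toUpperCase resolves from the ^String hint; its String return type
  ;; then lets the compiler resolve .charAt statically too.
  (.charAt (.toUpperCase s) i))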
The only downside I can think of to turning on that flag project-wide by default is that it can generate spurious warnings in the rare case where you actually do want to call a method based on its name alone, regardless of which type provides it.
In all of my career so far, I've wanted to do that exactly once. I was actually
working in Java at the time, so I solved it by going through the reflection
APIs (i.e. essentially the "slow path" described above). If you do end up with
a similar use-case, and you somehow can't fix it upstream by getting your
objects to implement a common interface when they have an identical method, I
would still recommend keeping *warn-on-reflection*
set at the project level,
and simply deactivating it for the one method where you actually want
reflection:
(set! *warn-on-reflection* false)
(defn wrapping-weird-apis
  [arg]
  ;; ... the one deliberately reflective method call goes here
  arg)
(set! *warn-on-reflection* true)
Conclusion
Clojure's support for reflection is really nice when exploring Java APIs in the REPL, but it should rarely be used in production code. Just turn on *warn-on-reflection* by default, and ensure you get no warnings through CI. (Or discipline, if you're into that.)
For performance-sensitive code, reflection is very bad. Not only is it slow
itself, it also prevents a lot of HotSpot optimizations on the surrounding
code. Even if you do end up with a situation where the method you want to call
does not have a single static target type, you may be better served by writing an explicit dispatch over a handful of known target types (a chain of instance? checks) instead, as Clojure itself does for seq; a sketch of that pattern follows.
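As a rough sketch of that pattern (the function and the chosen types here are made up for illustration), each branch calls a method on a known static target type, so every branch stays on the fast path:
(defn length-of
  "Length of a string-like, collection-like, or array value, without
  reflection: each branch dispatches to a known static target type."
  [x]
  (cond
    (nil? x) 0  ; mirror how count treats nil
    (instance? CharSequence x) (.length ^CharSequence x)
    (instance? java.util.Collection x) (.size ^java.util.Collection x)
    (.isArray ^Class (class x)) (java.lang.reflect.Array/getLength x)
    :else (throw (IllegalArgumentException.
                   (str "Don't know how to measure a " (class x))))))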
¹ What I'm describing here is the situation prior to the introduction of invokeDynamic. Clojure was designed before it, and the existing design does not benefit from it, so Clojure is not using it and I'm not going to talk about it further in this post. It did change the game for JRuby, which used to work roughly as I describe the "slow" path here and now has better options for basically teaching HotSpot how to optimize some typical Ruby code patterns.