6 March 2022

Notes on Optimizing Clojure Code: Arrays

Arrays are the most primitive collections on the JVM, and therefore the ones with the fastest operations. This doesn't mean they are right for every situation, but when they are, they can greatly enhance performance.

Working with arrays in Clojure is surprisingly easy once you get used to it, but there are a few gotchas worth pointing out.

Type hints

Array operations are only fast if they avoid reflection. Therefore, you really need to make sure you provide appropriate type hints any time you use any array operation.

In most cases, turning on the compiler flag *warn-on-reflection* will prevent you from generating slow, reflective code.

Microbenchmark

In the following sections, I will be referencing lines from the output of this program:

(ns t.core
  (:require [criterium.core :as crit])
  (:gen-class))

(set! *warn-on-reflection* true)
(set! *unchecked-math* :warn-on-boxed)

(defn bench
  [n f]
  (let [b #(loop [n 1000]
             (when-not (== 0 n)
               (f)
               (recur (dec n))))
        time-in-seconds (->> (crit/benchmark (b) {}) :mean first)]
    (println (format "%30s: %.3e" n time-in-seconds))))

(defn -main
  [& args]
  (let [r (fn [n f]
            (let [arr (make-array Long/TYPE 10 10)]
              (bench n #(f arr))))
        v (fn [n s f]
            (let [v (into [] (range s))]
              (bench (str n " (" s ")") #(f v))))]
    (bench "no-op" (fn []))
    (r "count" count)
    (r "alength (no hint, no warn)" alength)
    (r "alength" (fn [^"[[J" arr] (alength arr)))
    (r "aget (no hint, warns)" (fn [arr] (aget arr 3)))
    (r "aget" (fn [^"[[J" arr] (aget arr 3)))
    (r "deep aget (no hint, no warn)" (fn [arr] (aget arr 3 3)))
    (r "deep aget (hint ignored)" (fn [^"[[J" arr] (aget arr 3 3)))
    (r "aset (no hint, no warn)" (fn [arr] (aset arr 3 3 1)))
    (r "aset (hint ignored)" (fn [^"[[J" arr] (aset arr 3 3 1)))
    (r "deep aset" (fn [^"[[J" arr] (aset ^longs (aget arr 3) 3 1)))
    (v "vector get" 10 (fn [v] (get v 2)))
    (v "vector get" 1000 (fn [v] (get v 2)))
    (v "vector 'set'" 10 (fn [v] (assoc v 3 1)))
    (v "vector 'set'" 1000 (fn [v] (assoc v 3 1)))
    (r "nested copy aset" (fn [^"[[J" arr]
                            (let [copy (aclone arr)
                                  to-change (aclone ^"[J" (aget arr 3))]
                              (aset to-change 3 1)
                              (aset copy 3 to-change))))))

Here's our baseline for an empty loop:

                         no-op: 3.026e-07

alength

You can use count on an array, but alength will generate faster code by directly emitting the correct JVM-level bytecode (the "length" property of an array is a special property at the bytecode level).

                         count: 1.015e-04
    alength (no hint, no warn): 1.082e-02
                       alength: 6.589e-06

The count function has been designed to be fast, but also polymorphic. It does have a specialization for arrays, meaning it does end up calling the special JVM "array length" function in the end (so its performance does not depend on the length of the array), but it does need to first discover that its argument is an array.

The "no hint" variant illustrates a limit of the *warn-on-reflection* flag: it can only detect reflection issues on function calls, when it can statically determine that the function called has some expectations about its argument types. Here, the alength function is just passed as an argument, not called, so it does not warn. When the function is later called (within r), we have lost track of its type requirements.

It's also interesting to note that alength is faster than count when properly hinted, but slower otherwise.

aset, aget

The two most basic operations on an array are getting and setting its elements by index. This is what aset and aget do.

They are faster than the Clojure collection counterpart (Vector), as should be expected:

         aget (no hint, warns): 1.090e-02
                          aget: 6.539e-06
               vector get (10): 7.566e-06
             vector get (1000): 1.205e-05
             vector 'set' (10): 2.159e-05
           vector 'set' (1000): 5.888e-05
              nested copy aset: 2.414e-05

Very small vectors are actually represented as plain arrays, hence the very close performance. As the vector size grows, it becomes an increasingly deep tree (branching factor 32).

There is one big caveat about aget, though, which matters for multidimensional arrays. aget is defined as a multi-arity function, but, at least as of Clojure 1.11.0-rc1, the multi-indices arities always produce reflective code, and do not warn about it even with *warn-on-reflection* turned on.

  deep aget (no hint, no warn): 1.098e-02
      deep aget (hint ignored): 1.046e-02
       aset (no hint, no warn): 1.049e-02
           aset (hint ignored): 1.102e-02
                     deep aset: 6.574e-06

The workaround is very easy: do not use the alternative arities for aget, and just nest calls instead. The same applies for aset, and the workaround is unfortunately a bit uglier as you have to nest aget calls for a single aset. Note that the ^longs and ^"[J" type hints are equivalent.

Loops

We've covered individual operations on arrays, but how you put these operations together also matters. Once you're at a point in your performance journey where you need arrays as your data structures, you also need your operations "around" these arrays to be as fast as possible.

In Clojure, the fastest iteration construct is recur, as it compiles down to native JVM looping (think of a while loop in Java). recur must be in tail position (this is checked by the compiler) and will not grow the stack; it jumps back to the first parent anchor (in terms of nested lexical forms) and takes as many arguments as the parent defined.

Valid parents are either function argument definitions or explicit loop calls:

(loop [n 0]
  (when (< n 5)
    (recur (inc n))))

(defn recur-example
  [n]
  (if (< n 5)
    n
    (recur (- n 3))))

There is no performance advantage to jumping to a loop instead of a function, so you should only introduce a loop when you need additional looping variables compared to the current function's arguments.

Some access patterns on arrays are more common than others; Clojure provides two macros to iterate on arrays in more convenient forms than writing explicit loops: areduce and amap.

Let's look at (amap arr idx ret expr) in more details:

  • arr can be any expression, but it has to be type-hinted to a type of array if we want maximum performance. In practice it is going to be either a symbol or a list.
  • idx and ret must be symbols, which this anaphoric macro will bind. Therefore, to avoid confusion, they should probably not be bound in the current scope.
  • amap will initialize ret to a clone of arr, then execute expr once for each index in arr with idx bound to the current index (while ret remains bound to the clone of arr, possibly mutated by previous executions of expr). Once it runs out of indices, it returns ret.

Here's an example:

t.core=> (set! *warn-on-reflection* true)
true
t.core=> (def arr (make-array Long/TYPE 5))
#'t.core/arr
t.core=> (seq arr)
(0 0 0 0 0)
t.core=> (seq (amap arr
    #_=>            my-idx-symbol
    #_=>            my-ret-symbol
    #_=>            (aset my-ret-symbol my-idx-symbol (* 2 my-idx-symbol))))
Reflection warning, call to static method alength can't be resolved.
Reflection warning, call to static method aclone can't be resolved.
Reflection warning, call to static method aset can't be resolved.
Reflection warning, call to static method aset can't be resolved.
(0 2 4 6 8)
t.core=> (seq (amap ^longs arr
    #_=>            my-idx-symbol
    #_=>            my-ret-symbol
    #_=>            (aset my-ret-symbol my-idx-symbol (* 2 my-idx-symbol))))
(0 2 4 6 8)
t.core=>

Note that the return value of expr is ignored; the only way expr can change the return value of amap is through mutating ret.

The other macro, areduce, works in a similar way. Specifically, looking at (areduce arr idx ret init expr):

  • arr is, again, an expression that evaluates (at run time) to an array and that is (at read time) type-hinted to the array type.
  • idx is a symbol you provide to serve as the index of the current element.
  • init is an expression that should evaluate to your "initial accumulator" value for the reduction, just like the init argument to reduce. It cannot reference either idx or ret as they are not bound yet.
  • Once per index, the expression expr will be evaluated with idx bound to the current index and ret bound to the result of the last evaluation of expr (or init for index 0), and its result will be bound to ret for subsequent evaluations. This rebinding of ret is the main difference with amap.

Classic sum of all the elements in an array:

t.core=> (def arr (into-array Long/TYPE [1 2 3 4 5]))
#'t.core/arr
t.core=> (areduce ^longs arr
    #_=>          i res 0
    #_=>          (+ res (aget ^longs arr i)))
15
t.core=>

Note that if you do use an array-producing expression in the arr position instead of a symbol, you have no way of referring to that array again in expr. This is technically also the case for amap, but there you can at least assume that ret starts as a clone, which for some cases may be good enough.

For this reason, areduce and amap are most often used with a simple, bound symbol as their first argument, rather than a complex expression.

Making arrays

If you already have an array, you can make a copy with aclone. If you know the type of array you want to create, you can create it with one of the type-array functions (boolean-array, int-array, etc.). Those functions type hint their return value, and there is one for each primitive type on the JVM, plus object-array. They all have the same arglist and behaviour: if their only argument is a number, they create an array of that length; if their only argument is a seq, they create an array with the same number of elements (and same values) as that seq, provided the types match; if they get two arguments, the first one must be a number, which will be the size of the array, and the second one must be a seq of matching type, and will be used to initialize the array. If the seq is longer than the array size, extra elements are ignored; if it is shorter, it only initializes the first elements, and the rest are kept to their default value.

If you want to assert the type of an array without using type-hints, you can use the "types" functions (longs, booleans, ints, etc.); again, there is one for each primitive type, but this time not for Object.

For example:

t.core=> (set! *warn-on-reflection* true)
true
t.core=> (aget (int-array [1 2 3]) 0)
1
t.core=> (aget (into-array Integer/TYPE [1 2 3]) 0)
Reflection warning, call to static method aget can't be resolved.
1
t.core=> (aget (ints (into-array Integer/TYPE [1 2 3])) 0)
1
t.core=> (aget (longs (into-array Integer/TYPE [1 2 3])) 0)
Execution error (ClassCastException).
class [I cannot be cast to class [J

t.core=> (seq (float-array 10 [1 2 3]))
(1.0 2.0 3.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0)
t.core=>

For more generic array construction, you can use to-array, into-array, to-array-2d, or make-array. Why so many choices?

  • (into-array type? s) turns the seq s into an array of type type. If type is omitted, it is set to the class of the first element in the array. If other elements are not the same class or one of its subclass, this will fail at runtime.
  • (make-array type dim [& dims]) returns a multidimensional array of the given type with the given dimensions. The type is not optional and the returned arrays are all of the corresponding length. The initial values are the default value for the given type (0 for numbers, nil for objects, etc.), per the Java memory model.
  • (to-array coll) expects a java.util.Collection and calls java.util.Collection#toArray on it. It's essentially just a convenience wrapper for .toArray with type-hint included.
  • (to-array-2d coll) takes any java.util.Collection of which the elements are themselves java.util.Collection, and returns a [[Ljava.lang.Object; with the same elements.
t.core=> (into-array Integer/TYPE [1 2 3 4])
#object["[I" 0x731a9924 "[I@731a9924"]
t.core=> (seq (into-array Integer/TYPE [1 2 3 4]))
(1 2 3 4)
t.core=> (into-array Integer [1 2 3 4])
Execution error (IllegalArgumentException).
array element type mismatch

t.core=> (into-array Long [1 2 3 4])
#object["[Ljava.lang.Long;" 0x636dde44 "[Ljava.lang.Long;@636dde44"]
t.core=> (seq (into-array Long [1 2 3 4]))
(1 2 3 4)
t.core=> (map seq (make-array Boolean/TYPE 3 2))
((false false) (false false) (false false))
t.core=> (map seq (to-array-2d [[1 2] [3] [4 5 6]]))
((1 2) (3) (4 5 6))
t.core=> (type (to-array-2d [[1 2] [3] [4 5 6]]))
[[Ljava.lang.Object;
t.core=>

In practice, into-array is the one I've used the most.

Note: calling seq on an array is the easiest way to get it displayed in a Clojure REPL.

Conclusion

Arrays are the fastest collection you can use in Clojure, so if you're aiming at writing performant code, you should get familiar with them. Hopefully this blog helps with that.

As always, remember that performance is not the only factor in most code bases. Introducing mutable arrays in a Clojure code base, which is generally expected to deal mostly with immutable, Clojure-native data structures, should not be done lightly.

Tags: clojure