16 January 2022

Notes on Optimizing Clojure Code: Overview

Over the month of December, I've spent a lot of time trying to optimize my solutions to Advent of Code, with reasonable success. I thought I'd collect my learnings in a set of notes for my future self, and share that with you. Not all of this is Clojure-specific, but the details mostly are.

Why optimize code?

Frankly, because it's fun.

For the sake of argument, let's assume you need to make your code faster because it is too slow for the context it's being run in. In the case of Advent of Code, I'm putting on myself the arbitrary constraint of running each day under a second. That's pretty easy for the first few days, but gets a bit harder on the later challenges.

In a more realistic context, maybe you have a website that's not as snappy as you'd want, or you have to run a batch job every hour and it's taking 63 minutes and growing, or you have to run a script on your laptop and wait for it to finish before you can do something else.

Whatever the details, context matters in two very specific ways:

  1. It can tell us when to stop. Once your code is "fast enough", you don't need to optimize it further. Optimizing code often (though not always) comes at a cost to maintainability, so it should be done with some amount of reluctance on long-lived production code.
  2. It tells us how much code we need to care about. Specifically, all of the code involved in the realistic use-case that is currently "too slow", but not the rest of the code. This is really important if you want to be efficient in your optimization efforts, especially on large code bases.

The process of optimizing code

In most cases, when a software application is too slow, the vast majority of its time is spent in a very small fraction of the code, which we therefore call "the bottleneck". Optimizing is an iterative process that goes a bit like this:

  1. Find a section of code that is too slow for some inputs. This is usually done with the help of some logging techniques.
  2. Create a benchmark that exercises that section of code with a representative set of inputs. In simple cases this is just a new entrypoint in your application; in more complex cases it may be a separate project containing a copy of just the code you want to focus on optimizing.
  3. Try to gain insights into what, specifically, is slow, where time is spent, and how things could be made faster. This is often helped by profiling the code.
  4. Iterate on the benchmark until it is fast enough.
  5. Integrate your optimizations in the main code and try it on real use-cases again. If it's now fast enough, you're done. If you identify new cases for which it's too slow, keep going.

A concrete example

To make things a little bit more concrete, I'll use my answer to day 12 of this year's AoC as a guiding example for the various points I'm going to make across this series.

We'll start with a version of the code extracted from my initial submission. Here is the code in its entirety:

(ns t.core
  (:require [clojure.string :as string]
            [clojure.set :as set])
  (:gen-class))

(defn parse
  [lines]
  (->> lines
       (map #(string/split % #"-"))
       (mapcat (fn [[a b]] [{a #{b}} {b #{a}}]))
       (apply merge-with set/union {})))

(defn small?
  [^String s]
  (= s (.toLowerCase s)))

(defn ends?
  [[path _]]
  (= (last path) "end"))

(defn part2
  [input]
  (loop [num-paths 0
         paths [[["start"] #{"start"} false]]]
    (if (empty? paths)
      num-paths
      (let [path (for [[path visited twice?] paths
                       next-step (get input (last path))
                       :when (or (not (visited next-step))
                                 (and (not= "start" next-step)
                                      (not twice?)))]
                   [(conj path next-step)
                    (if (small? next-step)
                      (conj visited next-step)
                      visited)
                    (or twice?
                        (boolean (visited next-step)))])]
        (recur (->> path (filter ends?) count (+ num-paths))
               (->> path (remove ends?)))))))

(defn -main
  [& args]
  (let [input (-> (slurp "day12")
                  (string/split-lines)
                  parse)]
    (part2 input)))

The file day12 is my input to the problem, which, for those who want to play along, reads:

he-JK
wy-KY
pc-XC
vt-wy
LJ-vt
wy-end
wy-JK
end-LJ
start-he
JK-end
pc-wy
LJ-pc
at-pc
xf-XC
XC-he
pc-JK
vt-XC
at-he
pc-he
start-at
start-XC
at-LJ
vt-JK

General notes

The JVM is a complex tool with pretty advanced JIT capabilities. While that's great for performance overall, it does make optimization a little bit harder, as what goes fast can boil down to what the JIT is able to optimize, and that's not always easy to predict.

It's also a garbage-collected platform, and in long-lived real-world use-cases garbage collection can take up a significant proportion of run time. The JVM has many different GC algorithms to choose from, each with a plethora of fine-tuning options. How friendly your code is to the particular GC you're running with can have a pretty big impact on performance.

All of that is to say that the environment you run your code in matters; specifically, the configuration of the underlying JVM matters. I highly recommend running all of your measurements under realistic scenarios; in the particular case of Clojure, this means I recommend against trying to take measurements from a REPL and instead running your code through your deployment pipeline instead. That usually means creating an AOT compiled JAR and running it with the same JVM version and same set of flags as you use in production.

As a concrete example, let's take the code above and make the simplest possible attempt at measuring performance: change the -main function to add a time call to it:

(defn -main
  [& args]
  (let [input (-> (slurp "day12")
                  (string/split-lines)
                  parse)]
    (time (part2 input))))

time is a macro in the Clojure standard library that takes an expression, runs it, prints how long it took, and returns its result.

If I run the -main function in my REPL, I get:

t.core=> (-main)
"Elapsed time: 2131.917783 msecs"
119760
t.core=>

If, instead, I compile and run that code:

$ lein uberjar
Compiling t.core
Created target/uberjar/t-app.jar
Created target/uberjar/t-app-standalone.jar
$ java -jar target/uberjar/t-app-standalone.jar
"Elapsed time: 1584.507666 msecs"
$

That's close to a 25% speed increase.

What's next?

There's obviously a lot more to say about this topic. I'm trying to split it into digestible chunks; my current plan is to cover all three types of measurements in the next part of this series, but I may have to split that further if I find myself writing too much.

Tags: clojure