Tools You Should Know About: jq
In a nutshell
From the official page:
jq is a lightweight and flexible command-line JSON processor.
If you're familiar with tools like grep, sed, and awk, you can think of jq as being to JSON objects what they are to lines of text.
Why you should know about it
JSON is everywhere. For better or worse, it's become the de-facto standard for data sharing. Being able to manipulate JSON on the command-line enables you to use all of your scripting tools to manipulate JSON values. It also augments your scripting toolbelt with a new, very sharp toy.
If you deal with JSON, you need to know about jq. I don't care if you never use the command-line; if that's you, the command-line is worth learning just for jq.
Feature highlights
jq defines a small, specialized language for manipulating JSON values. The language is documented in the jq manual, but you should not start your jq journey by reading it: it's more of a reference document.
The main concept in the jq language is that of a filter. You can imagine your JSON values flowing through the jq command the same way lines of text flow through sed: the jq filter will be applied to each object in the current "flow" individually.
JSON objects can be compound, and thus some jq filters will produce multiple "output" objects for a single "input" object. In that case, you can chain multiple filters with the | operator, which should be a familiar concept if you are used to working at the command-line. The semantics of | is a bit like chaining flatMap operations: conceptually, you first run all of the inputs through the first filter, then collect all of the outputs (possibly more than there were inputs), and run those through the second filter.
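To illustrate those semantics with a made-up value: .[] turns one array into a stream of its elements, so chaining two of them means the second filter sees every output of the first, flattening two levels:
$ echo '[[1,2],[3,4]]' | jq '.[] | .[]'
1
2
3
4
$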
Pretty-printing
The easiest way to get started with jq is to use it for pretty-printing. Pretty-printing the output is the default behaviour, so all you need to do is pipe data through a jq invocation:
$ echo '{"a":1,"b":"hello","c":[1,2,3]}' | jq
{
  "a": 1,
  "b": "hello",
  "c": [
    1,
    2,
    3
  ]
}
$
This is implicitly equivalent to the '.' filter. When you explicitly state the filter, you can pass the input file as a second argument:
$ jq '.' <(echo '{"a":1, "b": "hello", "c": [1, 2, 3]}')
{
  "a": 1,
  "b": "hello",
  "c": [
    1,
    2,
    3
  ]
}
$
(Yes, that <() syntax is a file, sort of. At least as far as jq is concerned.)
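If you're curious, you can peek at what the shell actually passes: the process substitution expands to a path to a file descriptor. The exact path varies by system; on a Linux machine it looks something like this:
$ echo <(echo '{}')
/dev/fd/63
$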
Generating new JSON values
jq can also be used to create brand new JSON values. The advantage here is that it will automatically "JSON-escape" values given on the command-line. For example:
$ jq -n --arg argName 'this has a " in it!' '{"title": $argName}'
{
  "title": "this has a \" in it!"
}
$
This is super useful if you have to collect (or generate) data and then include it in a JSON value. A real example from my job was collecting data from git commits and sending it to Slack. Sometimes people put weird characters in git commit messages.
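As a rough sketch of that kind of pipeline (this is an illustration, not my actual script; the webhook URL variable is hypothetical):
$ msg=$(git log -1 --pretty=%B)   # last commit message, quotes and all
$ jq -n --arg text "$msg" '{"text": $text}' \
    | curl -X POST -H 'Content-Type: application/json' \
           -d @- "$SLACK_WEBHOOK_URL"   # hypothetical webhook URL variable
$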
The -n argument tells jq not to look for any input, i.e. in this case we are not applying a filter to an "external" JSON value. Note that we could still apply filters to the value we generate:
$ jq -n --arg argName 'this has a " in it!' '{"title": $argName} | .title'
"this has a \" in it!"
$
but I can't come up with a great example of why you'd want to in this case.
Modifying JSON values
My deployment script for this blog keeps track of what is currently deployed in a JSON file. It's a small, simple Bash script, and it would not really be worth trying to write it in other languages, as it's mostly chaining together calls to lein, aws and terraform. Still, it was useful to be able to keep track of state in a structured manner, without having to come up with some Bash-friendly text format.
Here is the relevant bit:
jq -c --arg version "$VERSION" '. + [{version: $version, ami: null}]' \
< tf/deployed \
> tf/deployed.tmp
mv tf/deployed.tmp tf/deployed
The state of my deployment is a list of JSON objects with version (the version of my blog, a commit sha) and ami (the Amazon Machine Image used for that deployment). The $VERSION env var is set to the new version the script is deploying. This jq filter takes the existing content of the deployed file (which is a JSON list) and adds a new element to it.
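To see the filter in action, here is a made-up run with a fake existing state and a fake new version:
$ echo '[{"version":"abc123","ami":"ami-00000000"}]' \
    | jq -c --arg version def456 '. + [{version: $version, ami: null}]'
[{"version":"abc123","ami":"ami-00000000"},{"version":"def456","ami":null}]
$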
Merging JSON values
Whereas --arg allows you to take non-JSON input and turn it into a JSON-encoded string, --slurpfile will slurp in a whole file of JSON and bind an array of the parsed values to a variable.
$ jq --slurpfile p1 <(echo '{"name": "john", "age": 35}') \
'[.[] | {owner: $p1, product: .}]' \
<(echo '[{"id": 1},{"id": 2}]')
[
  {
    "owner": [
      {
        "name": "john",
        "age": 35
      }
    ],
    "product": {
      "id": 1
    }
  },
  {
    "owner": [
      {
        "name": "john",
        "age": 35
      }
    ],
    "product": {
      "id": 2
    }
  }
]
$
There's a little bit more going on with this filter. The .[] | part means that we assume the input is an array, and we want to operate on each element of the array. In the next part, {owner: $p1, product: .}, we are creating a new JSON object with two keys, owner and product, picking the value for owner from the file we slurped, and the value for product from the current element in the input array. (Since --slurpfile always binds an array of the values it read, owner ends up wrapped in [ ] in the output.) Finally, the wrapping [ ... ] specifies that we want the results of processing all the elements of the initial input collected into an array.
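The effect of those wrapping brackets is easier to see on a smaller, made-up example: without them you get a stream of separate values, with them a single array:
$ echo '[1,2,3]' | jq '.[] | . + 10'
11
12
13
$ echo '[1,2,3]' | jq '[.[] | . + 10]'
[
  11,
  12,
  13
]
$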
Structured Bash state
Bash has arrays, but they're a bit clunky. Bash does not have dictionaries. But sometimes you do want a structured value, while it's still not quite worth moving up to a real language. jq can help here.
As an example, I have a script to help me manage Google Cloud instances. The problem it is solving is that, when you want to delete a machine, you have to know both the name of the machine and the zone it's in. I usually only know the name of the machine, so I'd have to look the zone up. It's annoying to have to go to the Google Console every time.
Another issue is that our machine names are a bit long, and I'd like to have autocomplete on them.
The solution I came up with was to have a small Zsh function that will query the API once, to get the list of all machines. After that, I can have fast, local auto-complete based on the machine name. But then I'd still have to look up the zone. So what I wanted was to store, locally, in my shell, a list of tuples (machine name, zone).
Here is how that works. First, I have a function refresh_machines that sets a (global to my current shell) variable machines:
refresh_machines() {
  machines=$(gcloud compute instances list --format=json [...] \
    | jq -c --arg prefix "$PREFIX" \
        '[.[] | select(.name | startswith($prefix))
              | {key: .name, value: (.zone | sub(".*/"; ""))}]
         | from_entries')
}
Then, I have a function kill_machine:
kill_machine() {
  machine="$1"
  gcloud compute [...] --zone=$(echo "$machines" | jq -r ".[\"$machine\"]")
}
and an associated auto-complete function that calls refresh_machines if $machines is not set when I try to auto-complete kill_machine.
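That completion function could look something like the following minimal sketch (my illustration, not the original script; it assumes Zsh's compdef/compadd machinery, and the name _kill_machine is made up):
# Hypothetical sketch of the Zsh completion for kill_machine.
_kill_machine() {
  # refresh the cache on first use in this shell
  [[ -z "$machines" ]] && refresh_machines
  # the completion candidates are the field names of the JSON object
  compadd -- $(echo "$machines" | jq -r 'keys[]')
}
compdef _kill_machine kill_machine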
The refresh_machines filter in more detail:
- .[]: run the rest of the filter for each element in the input. The gcloud command will return a JSON array.
- select(...): keep only elements from the original array that match a given condition (in this case, their .name field starts with a known prefix).
- {key: ..., value: ...}: create (for each element in the input array) a new JSON object with just key and value as fields, respectively set to .name and the last part of the path in .zone of the input object.
- from_entries: this turns an array into an object, expecting each element in the array to be an object with key and value fields; see the example just below.
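For example, here is from_entries on a small, hand-written array of key/value pairs:
$ echo '[{"key":"a","value":1},{"key":"b","value":2}]' | jq -c 'from_entries'
{"a":1,"b":2}
$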
In other words, the $machines variable contains a JSON object where field names are machine names, and the corresponding value is the zone that machine is in. This lets us query the $machines object later on with just .["machine-name"].
The -c option asks jq not to pretty-print, meaning the entire object is on just one line with no extra space (which is a bit easier to deal with for Bash commands), and the -r option prints JSON strings without quotes.
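On a made-up value, the difference each option makes looks like this:
$ echo '{"name": "web-1"}' | jq -c '.'
{"name":"web-1"}
$ echo '{"name": "web-1"}' | jq '.name'
"web-1"
$ echo '{"name": "web-1"}' | jq -r '.name'
web-1
$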
Conclusion
This was a bit of a whirlwind tour. I don't expect that you come out of this with a deep understanding of how jq works or how to write your own jq filters. Rather, I hope you're coming out of it with:
- The knowledge that jq exists. Maybe next time you have to deal with JSON values, you'll wonder if it can be applied (hint: yes).
- Some idea of what jq can do.
- Some idea of the use-cases for which it may be suited.
It can take a bit of time to learn, but it's tremendously useful and versatile. You won't learn it properly by sitting down and reading about it. Instead, you should make sure, right now, that you have it installed, and start using it immediately. You'll probably use it just for pretty printing at first, but that's good enough to keep it mentally nearby, and you'll find use-cases for it over time. That's when you'll actually learn: when you have a concrete task that you want to do with it.
There is an official tutorial; if you are convinced already, you can go read that now. Otherwise, you can refer to it later on, when you have a real use-case to anchor your learning.