Querying JSON and XML with jq and xq
4 minute read
jq
is a tool I use at least two to three times a week (maybe daily when it's my week to investigate sentry issues in production), so I thought I'd write about it and its XML offshoot,
xq
(via
yq
).
A lot of our day-to-day work involves reviewing API request and response data in structured text formats, whether it be debugging, reviewing input or output formats for vendor APIs, etc. Often, I find I have a question about a JSON blob, and that blob is too large to inspect by just giving it a "once over." Enter,
jq
. In it's most basic form,
jq
lets you query JSON documents, but it's also useful for linting and formatting minified JSON.
Basic Usage
Let's say I have the following JSON blob:
1{"firstName":"Ben","lastName":"Asher","email":"[email protected]"}
Copy this to your clipboard and run the following command (after you install
jq
with, for example,
brew install jq
1pbpaste | jq '.'
This prints the following:
1{
2 "firstName": "Ben",
3 "lastName": "Asher",
4 "email": "[email protected]"
5}
What we did is pipe the output of our clipboard into
jq
and use the
jq
command
.
to print it with better formatting (and colors!). If the JSON were invalid, it would tell you where it's invalid. I've found this immensely useful for quickly fixing issues in our public API documentation since our API documentation vendor requires valid JSON in our sample responses that we present readers.
Querying JSON
Things get even better when you need to query JSON. I often find myself investigating a Sentry error and debugging an issue that requires inspecting JSON involved in the error. Let's say I have a list of people in JSON format:
1[
3 "firstName": "Ben",
4 "lastName": "Asher",
5 "email": "[email protected]"
8 "firstName": "Ronald",
9 "lastName": "Martin",
10 "email": "[email protected]"
12]
If I want see just emails, I can do
jq '.[].email'
. That
jq
command string tells it to look at each array element (you can also do
.[0]
to select just the first one, for example), and from each one, select the
email
attribute. This prints:
1"[email protected]"
2"[email protected]"
For a recent issue, I had data like the JSON above, and, if one of the object fields were missing, the backend code threw an error. The amount of data was too large to inspect manually. With
jq
, I could query each attribute, and, if
jq
couldn't find it, it printed
null
. So, to find which attribute was the culprit, I did
jq '.[].email' | grep 'null'
and replaced
email
with each sibling attribute until I got a
null
.
Querying XML
jq
is super powerful, but recently I wanted to do something like the example above, but with XML. Our customer job feeds for vendors are in XML and often contain thousands of jobs. I found out a new vendor would not ingest jobs that didn't have a valid two digit country code. Before we launched with the vendor, I needed to figure out how many jobs would be impacted. I figured there must be an equivalent for XML, right? There is, and it's a
tool called
yq
that wraps
jq
and supports multiple other formats. It ships with a tool called
xq
that you can use for XML.
Back to my task, I needed to figure out how many job postings in the feed would be dropped by the vendor. For simplicity, I'll use the snippet below as our "feed" and walk you through what I did. You can install
yq
with
brew install python-yq
.
Let's copy the feed to our clipboard:
1<?xml version="1.0" encoding="utf-8"?>
2<source>
3 <publisher>Ashby</publisher>
4 <job>
5 <title><![CDATA[Product Engineer]]></title>
6 <country><![CDATA[US]]></country>
7 </job>
8 <job>
9 <title><![CDATA[Full-Stack Engineer]]></title>
10 <city><![CDATA[Remote]]></city>
11 <country><![CDATA[CA]]></country>
12 </job>
13 <job>
14 <title><![CDATA[Pilot]]></title>
15 <city><![CDATA[Remote]]></city>
16 <country><![CDATA[PT]]></country>
17 </job>
18 <job>
19 <title><![CDATA[Software Engineer]]></title>
20 <city><![CDATA[Remote]]></city>
21 <country><![CDATA[DE]]></country>
22 </job>
23 <job>
24 <title><![CDATA[VP of Sales]]></title>
25 <city><![CDATA[San Francisco]]></city>