Querying JSON and XML with jq and xq

Ben Asher

Engineering

4 minute read

January 9, 2022

jq is a tool I use at least two to three times a week (maybe daily when it's my week to investigate sentry issues in production), so I thought I'd write about it and its XML offshoot, xq (via yq ).

A lot of our day-to-day work involves reviewing API request and response data in structured text formats, whether it be debugging, reviewing input or output formats for vendor APIs, etc. Often, I find I have a question about a JSON blob, and that blob is too large to inspect by just giving it a "once over." Enter, jq . In it's most basic form, jq lets you query JSON documents, but it's also useful for linting and formatting minified JSON.

Basic Usage

Let's say I have the following JSON blob:

1{"firstName":"Ben","lastName":"Asher","email":"[email protected]"}

Copy this to your clipboard and run the following command (after you install jq with, for example, brew install jq

)

1pbpaste | jq '.'

This prints the following:

1{
2  "firstName": "Ben",
3  "lastName": "Asher",
4  "email": "[email protected]"
5}

What we did is pipe the output of our clipboard into jq and use the jq command . to print it with better formatting (and colors!). If the JSON were invalid, it would tell you where it's invalid. I've found this immensely useful for quickly fixing issues in our public API documentation since our API documentation vendor requires valid JSON in our sample responses that we present readers.

Querying JSON

Things get even better when you need to query JSON. I often find myself investigating a Sentry error and debugging an issue that requires inspecting JSON involved in the error. Let's say I have a list of people in JSON format:

1[
3    "firstName": "Ben",
4    "lastName": "Asher",
5    "email": "[email protected]"
8    "firstName": "Ronald",
9    "lastName": "Martin",
10    "email": "[email protected]"
12]

If I want see just emails, I can do jq '.[].email' . That jq command string tells it to look at each array element (you can also do .[0] to select just the first one, for example), and from each one, select the email attribute. This prints:

1"[email protected]"
2"[email protected]"

For a recent issue, I had data like the JSON above, and, if one of the object fields were missing, the backend code threw an error. The amount of data was too large to inspect manually. With jq , I could query each attribute, and, if jq couldn't find it, it printed null . So, to find which attribute was the culprit, I did jq '.[].email' | grep 'null' and replaced email with each sibling attribute until I got a null .

Querying XML

jq is super powerful, but recently I wanted to do something like the example above, but with XML. Our customer job feeds for vendors are in XML and often contain thousands of jobs. I found out a new vendor would not ingest jobs that didn't have a valid two digit country code. Before we launched with the vendor, I needed to figure out how many jobs would be impacted. I figured there must be an equivalent for XML, right? There is, and it's a tool called yq that wraps jq and supports multiple other formats. It ships with a tool called xq that you can use for XML.

Back to my task, I needed to figure out how many job postings in the feed would be dropped by the vendor. For simplicity, I'll use the snippet below as our "feed" and walk you through what I did. You can install yq with brew install python-yq .

Let's copy the feed to our clipboard:

1<?xml version="1.0" encoding="utf-8"?>
2<source>
3  <publisher>Ashby</publisher>
4  <job>
5    <title><![CDATA[Product Engineer]]></title>
6    <country><![CDATA[US]]></country>
7  </job>
8  <job>
9    <title><![CDATA[Full-Stack Engineer]]></title>
10    <city><![CDATA[Remote]]></city>
11    <country><![CDATA[CA]]></country>
12  </job>
13  <job>
14    <title><![CDATA[Pilot]]></title>
15    <city><![CDATA[Remote]]></city>
16    <country><![CDATA[PT]]></country>
17  </job>
18  <job>
19    <title><![CDATA[Software Engineer]]></title>
20    <city><![CDATA[Remote]]></city>
21    <country><![CDATA[DE]]></country>
22  </job>
23  <job>
24    <title><![CDATA[VP of Sales]]></title>
25    <city><![CDATA[San Francisco]]></city>