Bash that JSON (with jq) — Librato Blog

Bash that JSON (with jq)




The notion of SaaS can be confounding to operations people; I say this as an operations person myself. Faith just doesn't come naturally to us, so it's not always easy to entrust this or that bit of infrastructure to a software-as-a-service vendor. I think it's fair to say, however, that increasing numbers of us are arriving at the realization that our personal ability to splice an RJ45 cable or install MySQL is not the factor that makes our products or services better than those of our competitors. As our self-awareness broadens -- as we allow ourselves to recognize which of our contributions translate to competitive advantages -- we naturally begin to want to optimize our workload for those activities.

Self-awareness helps us identify the activities we shouldn't waste time on, those bits of infrastructure we can allow ourselves to offload to a SaaS. And as that transition happens -- as we feed pieces of our infrastructure to the SaaS monster -- our interfaces change from shell-tools to APIs. 

Shell is a fantastic lowest-common-denominator automation language for working with the venerable shell-tools we all grew up with. Yes, it's ugly and utilitarian. Yes, it lacks basic features found in every "serious" programming language. And yes, it's sometimes unwisely brought to bear for problems it is ill-equipped to solve. But, it's always there for you, this handy little force multiplier, making it quick and easy to glue together lonely little tools, transforming them into reusable solutions that save time and headaches.

As more of the things I rely on every day move to SaaS, I find that I spend less time gluing together shell-tools, and more time gluing together APIs. However, gluing together APIs requires working with JSON: parsing it, extracting it, transforming it. JSON is everywhere -- ubiquitous. Unavoidable. And shell just hasn't had a very good answer for the question of JSON.

Until, that is, jq.

Introducing jq

jq is a fast, lightweight, flexible, CLI JSON processor. jq stream-processes JSON like awk stream processes text. jq, coupled with cURL, has me writing shell to glue together Web APIs, which is pretty great. It helped me write shellbrato, a shell library for the Librato API, as well as myriad other little tools that I use day to day for things like looking up PRs assigned to me via GitHub APIs and resolving AWS Instance tags to IPs via the AWS API.

Let's try out jq together by using it to inspect a big unknown blob of JSON. The following command will grab a JSON blob that represents the open issues in the public Docker GitHub repository, and store it in a shell variable named foo:

foo=$(curl 'https://gist.githubusercontent.com/djosephsen/a1a290366b569c5b98e9/raw/c0d01a18e16ba7c75d31a9893dd7fa1b8486a963/docker_issues')

If you echo ${foo}, you'll probably see a large incomprehensible blob of text (unless Docker manages to close all of their open issues by the time you read this). You can use jq to reformat this text and make it more readable, like so:

echo "${foo}" | jq .

jq's first argument is a "filter". The dot is perhaps the simplest of all jq filters. It matches the current input. You can think of the dot as an infinitely dense particle of JSON. Any time you see a leading dot (that is, a dot with nothing in front of it), you're looking at the entire body of input, smushed into a little dot. 
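To make the dot concrete without fetching anything, here is a minimal sketch that runs the identity filter over a tiny, made-up stand-in for the issues blob. The variable name sample and its contents are assumptions for illustration only; -c is jq's compact-output flag.

```shell
# Hypothetical two-issue stand-in for the real blob (made-up data).
sample='[{"id":1,"state":"open"},{"id":2,"state":"open"}]'

# The identity filter passes its input through; jq pretty-prints by default.
pretty=$(echo "${sample}" | jq .)

# -c (compact output) prints each result on a single line instead.
compact=$(echo "${sample}" | jq -c .)

echo "${pretty}"
echo "${compact}"
```

Quoting "${sample}" keeps the shell from word-splitting or glob-expanding the JSON before jq sees it.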

There are a lot of jq filters, and at first glance, many of them will seem silly and useless, but as you'll discover on your path to jq adeptness, they combine in surprisingly powerful ways. The keyword 'type' is a jq filter that, for each object in the input, emits the type of that object. For example:

echo '[][]{}' | jq type

yields "array", "array", and "object" (one per line) from the type filter. The 'length' keyword emits the size (cardinality) of each object in the input.

So, re-using the previous example:

echo '[][]{}' | jq length

yields three 0s, since both arrays as well as the object are empty. Type and length are really useful in the context of inspecting large blobs of JSON that we've never seen before. The comma filter, which copies the input and feeds it serially to each filter surrounding it, allows us to use type and length at the same time. Try this on our huge blob:

echo "${foo}" | jq 'type,length'

For me, this yields "array", and 30. The comma fed a copy of the input to type first and then length. In jq, we can also pipe the output of one filter to the input of the next. So a more explicit way of doing the same thing we just did would be:

echo "${foo}" | jq '.|type,length'

This yields the same output, but it gives you a better notion of how the filters work together; take the input (dot), and pipe it to the comma filter, which copies the input, and sends a copy to type and then length. So we're dealing with a single array that has 30 elements in it. 
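If you'd rather try the comma and the pipe on something small enough to read in full, the sketch below uses a hypothetical three-element array (the sample variable is made up for illustration). It also shows that length means something slightly different for each type.

```shell
# Hypothetical miniature of the issues blob: an array of three objects.
sample='[{"id":1},{"id":2},{"id":3}]'

# The comma runs both filters against the same input, in order.
both=$(echo "${sample}" | jq 'type,length')
echo "${both}"

# Spelling out the leading dot and the pipe gives the same result.
explicit=$(echo "${sample}" | jq '. | type,length')
echo "${explicit}"

# length depends on the type: characters in a string, keys in an object,
# elements in an array.
sizes=$(echo '"abc" {"a":1,"b":2} [7]' | jq 'length')
echo "${sizes}"
```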

Let's take a look inside it. We can use square brackets to unwrap layers of the input. I think of them as 'unwrappy brackets' when I see them in jq:

echo "${foo}" | jq '.[]'

That gives us the raw content of the array (notice: compared to simply jq '.' the output is no longer wrapped in square brackets). That's not super helpful, since there is a lot of content, so this scrolls off the screen. Let’s try using type and length on the unwrapped input:

echo "${foo}" | jq '.[] | type,length'

Well, it looks like we have a bunch of objects of varying lengths (we already know there are 30 of them), but it's hard to tell exactly how many since they also scroll off my screen. 
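On a small input the effect of the unwrappy brackets is easier to see. This is a sketch over made-up data (the sample variable and its fields are assumptions): one array goes in, and each element comes out as its own result.

```shell
# Hypothetical two-issue array; the second issue carries one extra key.
sample='[{"id":1,"labels":[]},{"id":2,"labels":["bug"],"pull_request":{}}]'

# Without unwrapping there is a single result: the whole array.
wrapped_count=$(echo "${sample}" | jq -c '.' | wc -l | tr -d ' ')

# After .[] there is one result per element.
unwrapped_count=$(echo "${sample}" | jq -c '.[]' | wc -l | tr -d ' ')

# type,length now runs once per element rather than once for the array.
per_element=$(echo "${sample}" | jq '.[] | type,length')

echo "${wrapped_count} ${unwrapped_count}"
echo "${per_element}"
```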

Let’s bring some more venerable shell tools to bear to help us interpret this output:

echo "${foo}" | jq '.[] | type,length' | sort | uniq -c

That's better, now we can see there are 30 objects. In my particular input, 20 of the objects have 19 attributes, and 10 of them have 20 attributes. That's weird: I wonder what the difference is between the two different kinds. The 'keys' filter will return an array of the attribute names for each object on its input:

echo "${foo}" | jq '.[]|keys'

That shows us a bunch of attributes alright, but to make sense of the difference between the two object types we'll have to bring sort and uniq back in:

echo "${foo}" | jq '.[]|keys' | sort | uniq -c

Ah-hah, in this output I can see that only 10 of my objects have a 'pull_request' attribute. That makes sense, since not every GitHub issue will have a corresponding pull_request. 
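A variation worth knowing, sketched here on made-up data: with jq's -c (compact output) flag, each object's key list lands on a single line, so sort and uniq count whole key sets instead of individual key names. The sample variable and its contents are hypothetical.

```shell
# Hypothetical issues: two plain ones, and one that is also a pull request.
sample='[{"id":1,"title":"a"},{"id":2,"title":"b"},{"id":3,"title":"c","pull_request":{}}]'

# One compact line per object, then count identical key sets.
counts=$(echo "${sample}" | jq -c '.[] | keys' | sort | uniq -c)
echo "${counts}"
```

Here the key set without pull_request shows up twice and the one with it shows up once, which makes the two "shapes" of object obvious at a glance.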

jq also allows us to refer to elements and attributes by their index or key. So if we just wanted to see the first issue in the issues array, we could use:

echo "${foo}" | jq '.[0]'

Or just the first issue's keys:

echo "${foo}" | jq '.[0] | keys'

Or just the first issue's first key:

echo "${foo}" | jq '.[0] | keys | .[0]'

If we just wanted a list of issue IDs:

echo "${foo}" | jq '.[].id'
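Here is the same index-and-key access as a small sketch over made-up data (the sample variable, IDs, and titles are hypothetical). It also uses jq's -r (raw output) flag, which drops the JSON quotes around strings and is usually what you want when handing values to other shell tools.

```shell
# Hypothetical two-issue array.
sample='[{"id":101,"title":"first"},{"id":102,"title":"second"}]'

# Index into the array, then pick a key from the object.
first_id=$(echo "${sample}" | jq '.[0].id')

# keys returns a sorted array, so .[0] is the alphabetically first key.
first_key=$(echo "${sample}" | jq '.[0] | keys | .[0]')

# Unwrap every element and pull the same key from each.
ids=$(echo "${sample}" | jq '.[].id')

# -r prints strings without their surrounding quotes.
titles=$(echo "${sample}" | jq -r '.[].title')

echo "${first_id} ${first_key}"
echo "${ids}"
echo "${titles}"
```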

How would we select a specific issue by its ID number? The 'select' filter is the first filter we'll use that takes an argument. It looks like a C function, and it's intended to be given an expression that returns "true" or "false". In practice jq treats every value other than false and null as "true" (even 0 and the empty string), so you can also pass ‘select’ expressions that return numbers or strings. jq has the whole range of comparison operators that you'd expect. So we can select out issue 117446711 with:

echo "${foo}" | jq '.[] | select(.id==117446711)'

Let's talk a little bit more about how this works. The function ‘select’, for each object on its input, if its argument expression returns true for that object, returns that object unmodified. If its argument returns false, ‘select’ outputs nothing. 
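A minimal sketch of that behavior on made-up data (the sample variable and the comments field values are hypothetical). The second command is there as a reminder that jq's only falsy values are false and null, so a numeric 0 still passes through select.

```shell
# Hypothetical issues; the first one has zero comments.
sample='[{"id":1,"comments":0},{"id":2,"comments":5}]'

# Only the object whose id matches is emitted, and it is emitted unmodified.
picked=$(echo "${sample}" | jq -c '.[] | select(.id == 2)')
echo "${picked}"

# 0 is truthy in jq, so both issues survive this select.
zero_passes=$(echo "${sample}" | jq -c '[.[] | select(.comments) | .id]')
echo "${zero_passes}"
```

If you actually want "has at least one comment", say so explicitly with select(.comments > 0).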

Argument parentheses in jq are a little like Las Vegas: whatever happens in there stays in there. What I mean is, you can do all sorts of input transformation inside ‘select’'s argument, but ‘select’ will still output its original input unchanged. For example, let's say we wanted to select every issue with one or more labels:

echo "${foo}" | jq '.[] | select((.labels|length)>=1)'

Inside that ‘select’ filter, we're transforming the input object by filtering out just the labels array, and then we're passing the label array to the length filter to see how big it is. If it's greater than or equal to 1, then the expression evaluates to “true” and ‘select’ parrots back the original object (not the one that we mangled in the process of trying to count the size of its labels array).
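To see the difference between transforming inside select's argument and transforming in the main pipeline, compare the two commands in this sketch. The data is made up (the sample variable and label names are hypothetical).

```shell
# Hypothetical issues: one unlabelled, one with a single label.
sample='[{"id":1,"labels":[]},{"id":2,"labels":[{"name":"bug"}]}]'

# Inside select: the length is only used for the test, and the whole
# original object comes out the other side.
labelled=$(echo "${sample}" | jq -c '.[] | select((.labels | length) >= 1)')
echo "${labelled}"

# In the main pipeline: each object is replaced by the length itself.
lengths=$(echo "${sample}" | jq '.[] | .labels | length')
echo "${lengths}"
```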

The last things I want to show you are a few filters you can use to check for the presence of keys or values inside an object. The two I find myself using a lot are ‘has’ and ‘index’. The former of these filters returns a boolean “true” or “false”, and the other an ordinal (or null when nothing is found). They're both ideal for nesting inside a ‘select()’ like so:

echo "${foo}" | jq '.[] | select(. | has("pull_request"))'

‘has’, as you've probably guessed, checks for the presence of a named key in the input object. If the key exists, 'has' exits true, otherwise it exits false. In the command above, we've piped a copy of 'select''s input into 'has' to check for the presence of the 'pull_request' attribute. Each issue that has the attribute will cause 'has' to exit 'true', which will in turn cause 'select' to output the issue object. Otherwise, 'select' will eat the object. This is a pretty typical way to parse out only the records that have a particular key. 
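A small sketch of 'has' on its own and nested inside 'select', using made-up data (the sample variable and the example.invalid URL are hypothetical). The last command inverts the test with jq's 'not' filter to keep only the records that lack the key.

```shell
# Hypothetical issues; only the second one is a pull request.
sample='[{"id":1},{"id":2,"pull_request":{"url":"https://example.invalid/2"}}]'

# On its own, has emits a boolean per object.
flags=$(echo "${sample}" | jq -c '[.[] | has("pull_request")]')
echo "${flags}"

# Nested in select, it keeps only the objects that have the key.
prs=$(echo "${sample}" | jq -c '[.[] | select(has("pull_request")) | .id]')
echo "${prs}"

# Piping the boolean through not keeps the objects that lack it.
issues_only=$(echo "${sample}" | jq -c '[.[] | select(has("pull_request") | not) | .id]')
echo "${issues_only}"
```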

'index' checks for values. More specifically, it returns the index value of the given argument in its input. If you give index a three-value array, and ask for the second value like so:

echo '["foo","bar","bash"]' | jq 'index("bar")'

'index' will return 1 (arrays are zero-indexed in jq). If you feed it a string, and ask it for a substring, 'index' will return the index value of the character in the string where the substring begins. For example:

echo '"foo"' | jq 'index("oo")'

... also returns 1. 'index' returns null if it can't find the value you're looking for in the input. We can nest 'index' inside ‘select’ explicitly like so:

echo "${foo}" | jq '.[] | select((.state|index("open")>=0))'

Literally, if the index of "open" within the current record's state string is greater than or equal to zero, select the record. Thankfully, the ‘select’ filter interprets numerical output as “true”, and null output as “false”, so we don't have to be explicit, and we could rewrite that last command as:

echo "${foo}" | jq '.[] | select(.state|index("open"))'

That looks a lot more like our 'has', which checks for the presence of keys. With 'index' and 'has' nested inside 'select', you have about 80% of what you need to mangle JSON structures in shell for fun and profit. In fact, most of the query tools I’ve written to do things like resolve AWS Instance IP addresses from Tag Names use only what I’ve covered so far. From here I would show you object construction (which I think of as wrappy brackets), and mapping, but those two subjects really require an article of their own.
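For reference, here is a short sketch of what 'index' returns in each case discussed above, run against small literal inputs rather than the issues blob. The last command shows why the shorthand form works even when the match is at position 0: jq treats 0 as true, and only null and false as false.

```shell
# Position of a value in an array (zero-indexed).
in_array=$(echo '["foo","bar","bash"]' | jq 'index("bar")')

# Position where a substring begins in a string.
in_string=$(echo '"foo"' | jq 'index("oo")')

# No match yields null.
missing=$(echo '"foo"' | jq 'index("zz")')

# A match at position 0 still passes select, because 0 is truthy in jq.
at_zero=$(echo '"open"' | jq 'select(index("open"))')

echo "${in_array} ${in_string} ${missing} ${at_zero}"
```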

Send us a tweet to @librato if you’d like to see that article, and good luck bashing JSON!
