Extract Nested Data From Complex JSON
We're all data people here, so you already know the scenario: it happens perhaps once a day, perhaps five times, or even more. There's an API you're working with, and it's great. It contains all the information you're looking for, but there's just one problem: the complexity of its nested JSON objects is endless. Suddenly the job you love needs to be put on hold while you painstakingly retrieve the data you actually want, which sits five levels deep in nested JSON hell. Nobody feels like much of a "scientist" or an "engineer" when half their day becomes dealing with KeyErrors.
Luckily, we code in Python! (Okay, fine: the language doesn't make much of a difference here. It felt like a rallying cry at the time.)
Using Google Maps API as an Example
To visualize the problem, let's take an example somebody might actually want to use. I think the Google Maps API is a good candidate to fit the bill here.
Google Maps is actually a collection of APIs; the one we'll use here is the Google Maps Distance Matrix API. The idea is that with a single API call, a user can calculate the distance and travel time between an origin and many destinations. It's a great, full-featured API, but as you might imagine, the JSON that comes back from calculating commute times between where you stand and every location in the conceivable universe has an awfully complex structure.
Getting a Taste of JSON Hell
Real quick, here's an example of the types of parameters this request accepts:
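Here's a minimal sketch of how such a request might be assembled, assuming a single origin and a single destination. The addresses and the API key are placeholders, and the code only builds the request URL rather than sending it; origins, destinations, units, and key are Distance Matrix query parameters.

```python
from urllib.parse import urlencode

# Distance Matrix endpoint that returns JSON.
BASE_URL = "https://maps.googleapis.com/maps/api/distancematrix/json"

# Placeholder values: swap in your own addresses and API key.
params = {
    "origins": "Times Square, New York, NY",
    "destinations": "Grand Central Terminal, New York, NY",
    "units": "imperial",
    "key": "YOUR_API_KEY",  # placeholder, not a real key
}

# urlencode() percent-encodes each value so it is safe inside a query string.
request_url = f"{BASE_URL}?{urlencode(params)}"
print(request_url)
```

To actually send it, something like requests.get(BASE_URL, params=params).json() from the third-party requests library would return the parsed response as a Python dictionary.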
One origin, one destination. The JSON response for a request this straightforward is quite simple:
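Below is a sketch of that response's shape as a Python dictionary. The keys mirror the Distance Matrix JSON, but the addresses and numbers are made up for illustration. Note that "rows" and "elements" are lists, so each one needs an index on the way down.

```python
# Hypothetical response for one origin and one destination.
# Structure follows the Distance Matrix JSON; addresses and numbers are made up.
response = {
    "destination_addresses": ["Grand Central Terminal, New York, NY, USA"],
    "origin_addresses": ["Times Square, New York, NY, USA"],
    "rows": [
        {
            "elements": [
                {
                    "distance": {"text": "1.2 mi", "value": 1931},  # value in meters
                    "duration": {"text": "8 mins", "value": 480},  # value in seconds
                    "status": "OK",
                }
            ]
        }
    ],
    "status": "OK",
}

# "rows" and "elements" are lists, so each needs an index before the next key.
element = response["rows"][0]["elements"][0]
print(element["distance"]["text"])  # → 1.2 mi
print(element["duration"]["text"])  # → 8 mins
```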
For each destination, we're getting two data points: the commute distance and the estimated duration. If we hypothetically wanted to extract those values, typing response['rows'][0]['elements'][0]['distance']['text'] isn't too crazy. I mean, it's somewhat awful and brings on casual thoughts of suicide, but nothing out of the ordinary.
Now let's make things interesting by adding a few more stops on our trip:
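A sketch of the same request with a few more stops, assuming the API's convention of separating multiple destinations with a pipe (|) character. The stops are hypothetical and, again, nothing is actually sent.

```python
from urllib.parse import urlencode

BASE_URL = "https://maps.googleapis.com/maps/api/distancematrix/json"

# Hypothetical stops; multiple destinations are joined with "|".
stops = [
    "Grand Central Terminal, New York, NY",
    "Central Park, New York, NY",
    "Brooklyn Bridge, New York, NY",
]

params = {
    "origins": "Times Square, New York, NY",
    "destinations": "|".join(stops),
    "units": "imperial",
    "key": "YOUR_API_KEY",  # placeholder, not a real key
}

# The pipe is percent-encoded as %7C in the final URL.
request_url = f"{BASE_URL}?{urlencode(params)}"
print(request_url)
```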
A lot is happening here. There are objects. There are lists. There are lists of objects which are part of an object. The last thing I'd want to deal with is trying to parse this data only to accidentally get a useless key:value pair like "status": "OK".
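To make that concrete, here's a hypothetical response for one origin and two destinations (values made up), along with the manual way of digging out the distances: one loop per level of nesting. The status check is there because an element whose status isn't "OK" may not contain distance or duration at all.

```python
# Hypothetical response: one origin, two destinations (values made up).
response = {
    "destination_addresses": [
        "Grand Central Terminal, New York, NY, USA",
        "Central Park, New York, NY, USA",
    ],
    "origin_addresses": ["Times Square, New York, NY, USA"],
    "rows": [
        {
            "elements": [
                {
                    "distance": {"text": "1.2 mi", "value": 1931},
                    "duration": {"text": "8 mins", "value": 480},
                    "status": "OK",
                },
                {
                    "distance": {"text": "1.9 mi", "value": 3058},
                    "duration": {"text": "12 mins", "value": 720},
                    "status": "OK",
                },
            ]
        }
    ],
    "status": "OK",
}

# The manual route: a loop per level of nesting, plus a status guard so
# elements without distance/duration don't raise KeyError.
distances = []
for row in response["rows"]:  # one row per origin
    for element in row["elements"]:  # one element per destination
        if element.get("status") == "OK":
            distances.append(element["distance"]["text"])

print(distances)  # → ['1.2 mi', '1.9 mi']
```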
Code Snippet To The Rescue
Let's say we only want the human-readable data from this JSON, which is labeled "text" for both distance and duration. We've created a function dubbed json_extract() to help us resolve this very issue. The idea is that json_extract() is flexible and agnostic to the structure of the JSON, so it can be imported as a module into any project you might need.
We need to pass this function two values:
- A complex Python dictionary, such as the parsed JSON response from our Distance Matrix request.
- The name of the dictionary key containing values we want to extract.
Regardless of where the key "text" lives in the JSON, this function returns the value of every instance of that key. Here's our function in action:
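The sketch below is one way to write json_extract(); treat it as an illustration of the recursive approach, not necessarily the exact code from the original gist. It walks every dict and list, collects plain values stored under the requested key, and is run here against a hypothetical two-destination response with made-up values.

```python
def json_extract(obj, key):
    """Recursively collect every value stored under `key` in nested JSON.

    `obj` can be any mix of dicts and lists, such as the result of json.loads().
    Values come back in the order they are encountered.
    """
    values = []

    def walk(node):
        if isinstance(node, dict):
            for k, v in node.items():
                if isinstance(v, (dict, list)):
                    walk(v)  # container: keep descending
                elif k == key:
                    values.append(v)  # plain value under the target key
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(obj)
    return values


# Hypothetical two-destination response (address fields omitted, values made up).
response = {
    "rows": [
        {
            "elements": [
                {
                    "distance": {"text": "1.2 mi", "value": 1931},
                    "duration": {"text": "8 mins", "value": 480},
                    "status": "OK",
                },
                {
                    "distance": {"text": "1.9 mi", "value": 3058},
                    "duration": {"text": "12 mins", "value": 720},
                    "status": "OK",
                },
            ]
        }
    ],
    "status": "OK",
}

texts = json_extract(response, "text")
print(texts)  # → ['1.2 mi', '8 mins', '1.9 mi', '12 mins']
```

Because this version descends into every container, a key is only collected when its value is a plain value (a string, number, and so on), not a nested dict or list.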
Running this function returns a single flat list of every "text" value, in the order the keys appear in the response.
Oh fiddle me timbers! Because the Google API lists distance and then trip duration for each destination, the values in our list alternate between distance and time (can we pause to appreciate this awful design? There are infinitely better ways to structure this response). Never fear: some simple Python can split this list into two lists:
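One simple way to do the split, possibly different from what the original post used, is Python's extended slicing: start at index 0 or 1 and take every second item. The input list here is hypothetical output from json_extract().

```python
# Hypothetical json_extract(response, "text") output for two destinations:
# distance and duration alternate, distance first.
texts = ["1.2 mi", "8 mins", "1.9 mi", "12 mins"]

# list[start::2] takes every second item beginning at `start`.
distances = texts[0::2]  # even indexes → ['1.2 mi', '1.9 mi']
durations = texts[1::2]  # odd indexes → ['8 mins', '12 mins']

print(distances)
print(durations)
```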
This takes our one list and splits it into two lists: the even-indexed items (distances) and the odd-indexed items (durations).
Getting Creative With Lists
A common theme I run into while extracting lists of values from JSON objects like these is that the lists I extract are closely related. In the above example, every duration has an accompanying distance: a one-to-one relationship. What if we wanted to associate these values somehow?
To use a better example, I recently used our json_extract() function to fetch lists of column names and their data types from a database schema. As separate lists, the data amounted to one list of column names and a second list of data types, in the same order.
Clearly these two lists are directly related; the latter describes the former. How can this be useful? By using Python's built-in zip() function, which pairs up the items of the two lists by position.
I like to think they call it zip because it's like zipping up a zipper, where each side of the zipper is a list. Passing the zipped pairs to dict() outputs a dictionary where list 1 serves as the keys and list 2 serves as the values.
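A minimal sketch of the zip-and-dict trick. The column names and types are hypothetical stand-ins for a real schema.

```python
# Hypothetical lists, standing in for what json_extract() might pull from a
# database schema: column names, and each column's data type in the same order.
column_names = ["id", "email", "created_at", "is_active"]
column_types = ["integer", "varchar", "timestamp", "boolean"]

# zip() pairs items by position; dict() turns the (name, type) pairs
# into key/value entries.
schema = dict(zip(column_names, column_types))
print(schema)
# → {'id': 'integer', 'email': 'varchar', 'created_at': 'timestamp', 'is_active': 'boolean'}
```

One caveat: zip() silently stops at the end of the shorter list, so mismatched lengths drop data without complaint. On Python 3.10 and later, zip(column_names, column_types, strict=True) raises a ValueError instead.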
And there you have it, folks: a free code snippet to copy and secretly pretend you wrote forever. I've thrown the function up as a GitHub Gist, if such a thing pleases you.
That's all for today folks! Zip it up and zip it out. Zippity-do-da, buh bye.