Automagically Turn JSON into Pandas DataFrames

In his post about extracting data from APIs, Todd demonstrated a nice way to massage JSON into a pandas DataFrame. This method works great when our JSON response is flat, because dict.keys() only gets the keys on the first "level" of a dictionary. It gets a little trickier when our JSON starts to become nested though, as I experienced when working with Spotify's API via the Spotipy library. For example, take a look at a response from their https://api.spotify.com/v1/tracks/{id} endpoint:

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials


spotify_client_id = 'YOUR_ID'
spotify_client_secret  = 'YOUR_SECRET'

client_credentials_manager = SpotifyClientCredentials(
    client_id=spotify_client_id,
    client_secret=spotify_client_secret
)

sp = spotipy.Spotify(
    client_credentials_manager=client_credentials_manager
)

Set up Spotipy client

track_response = sp.track('0BDYBajZydY54OTgQsH940')
print(track_response)

Fetch a Spotify track

Here's the output:

{
  "album": {
    "album_type": "album",
    "artists": [{
        "external_urls": {
          "spotify": "https://open.spotify.com/artist/7wyRA7deGRxozTyBc6QXPe"
        },
        "href": "https://api.spotify.com/v1/artists/7wyRA7deGRxozTyBc6QXPe",
        "id": "7wyRA7deGRxozTyBc6QXPe",
        "name": "Stephen Malkmus & The Jicks",
        "type": "artist",
        "uri": "spotify:artist:7wyRA7deGRxozTyBc6QXPe"
      },
      {
        "external_urls": {
          "spotify": "https://open.spotify.com/artist/0WISkx0PwT6lYWdPqKUJY8"
        },
        "href": "https://api.spotify.com/v1/artists/0WISkx0PwT6lYWdPqKUJY8",
        "id": "0WISkx0PwT6lYWdPqKUJY8",
        "name": "Stephen Malkmus",
        "type": "artist",
        "uri": "spotify:artist:0WISkx0PwT6lYWdPqKUJY8"
      },
      {
        "external_urls": {
          "spotify": "https://open.spotify.com/artist/7uStwCeP54Za8gXUFCf5L7"
        },
        "href": "https://api.spotify.com/v1/artists/7uStwCeP54Za8gXUFCf5L7",
        "id": "7uStwCeP54Za8gXUFCf5L7",
        "name": "The Jicks",
        "type": "artist",
        "uri": "spotify:artist:7uStwCeP54Za8gXUFCf5L7"
      }
    ],
    "available_markets": ["AR",
      "BO",
      "BR",
      "CA",
      "...",
      "US",
      "UY",
      "VN"
    ],
    "external_urls": {
      "spotify": "https://open.spotify.com/album/6pWpb4IdPu9vp9mOdh5DjY"
    },
    "href": "https://api.spotify.com/v1/albums/6pWpb4IdPu9vp9mOdh5DjY",
    "id": "6pWpb4IdPu9vp9mOdh5DjY",
    "images": [{
        "height": 640,
        "url": "https://i.scdn.co/image/918fdb6fdffccf2bd2dd1a1a93136000f8cf9bd3",
        "width": 640
      },
      {
        "height": 300,
        "url": "https://i.scdn.co/image/fb39290ebca6fac424d4a40611a7e0d1146c5f88",
        "width": 300
      },
      {
        "height": 64,
        "url": "https://i.scdn.co/image/873da4a0a21acd96e4a0036c9ecd0580b62652d4",
        "width": 64
      }
    ],
    "name": "Real Emotional Trash",
    "release_date": "2008-03-04",
    "release_date_precision": "day",
    "type": "album",
    "uri": "spotify:album:6pWpb4IdPu9vp9mOdh5DjY"
  },
  "artists": [{
      "external_urls": {
        "spotify": "https://open.spotify.com/artist/7wyRA7deGRxozTyBc6QXPe"
      },
      "href": "https://api.spotify.com/v1/artists/7wyRA7deGRxozTyBc6QXPe",
      "id": "7wyRA7deGRxozTyBc6QXPe",
      "name": "Stephen Malkmus & The Jicks",
      "type": "artist",
      "uri": "spotify:artist:7wyRA7deGRxozTyBc6QXPe"
    },
    {
      "external_urls": {
        "spotify": "https://open.spotify.com/artist/0WISkx0PwT6lYWdPqKUJY8"
      },
      "href": "https://api.spotify.com/v1/artists/0WISkx0PwT6lYWdPqKUJY8",
      "id": "0WISkx0PwT6lYWdPqKUJY8",
      "name": "Stephen Malkmus",
      "type": "artist",
      "uri": "spotify:artist:0WISkx0PwT6lYWdPqKUJY8"
    },
    {
      "external_urls": {
        "spotify": "https://open.spotify.com/artist/7uStwCeP54Za8gXUFCf5L7"
      },
      "href": "https://api.spotify.com/v1/artists/7uStwCeP54Za8gXUFCf5L7",
      "id": "7uStwCeP54Za8gXUFCf5L7",
      "name": "The Jicks",
      "type": "artist",
      "uri": "spotify:artist:7uStwCeP54Za8gXUFCf5L7"
    }
  ],
  "available_markets": ["AR",
    "BO",
    "BR",
    "CA",
    "...",
    "US",
    "UY",
    "VN"
  ],
  "disc_number": 1,
  "duration_ms": 608826,
  "explicit": False,
  "external_ids": {
    "isrc": "USMTD0877204"
  },
  "external_urls": {
    "spotify": "https://open.spotify.com/track/0BDYBajZydY54OTgQsH940"
  },
  "href": "https://api.spotify.com/v1/tracks/0BDYBajZydY54OTgQsH940",
  "id": "0BDYBajZydY54OTgQsH940",
  "is_local": False,
  "name": "Real Emotional Trash",
  "popularity": 21,
  "preview_url": "https://p.scdn.co/mp3-preview/4fcbcd5a99fc7590d5819849e1aad3eff981dc75?cid=be22fd00039241bc96d161a63876b54c",
  "track_number": 4,
  "type": "track",
  "uri": "spotify:track:0BDYBajZydY54OTgQsH940"
}

Metadata for a single Spotify track

In addition to plenty of information about the track, Spotify also includes information about the album that contains the track. If we were to just use the dict.keys() method to turn this response into a DataFrame, we'd be missing out on all that extra album information. Well, it would be there, just not readily accessible.

track_response.keys()

Fetch JSON keys

Output:

dict_keys([
    'album',
    'artists',
    'available_markets',
    'disc_number',
    'duration_ms',
    'explicit',
    'external_ids',
    'external_urls',
    'href',
    'id',
    'is_local',
    'name',
    'popularity',
    'preview_url',
    'track_number',
    'type',
    'uri'
])

Output of track_response.keys()

So how do we get around this? Well, we could write our own function, but because pandas is amazing, it already has a built in tool that takes care of this for us.

Data Normalization

Meet json_normalize():

import pandas as pd
from pandas.io.json import json_normalize
json_normalize(track_response)

Normalize JSON data in Pandas

Output:

	album.album_type	album.artists	album.available_markets	album.external_urls.spotify	album.href	album.id	album.images	album.name	album.release_date	album.release_date_precision	...	external_urls.spotify	href	id	is_local	name	popularity	preview_url	track_number	type	uri
0	album	[{'external_urls': {'spotify': 'https://open.s...	[AR, BO, BR, CA, CL, CO, CR, EC, GT, HK, HN, I...	https://open.spotify.com/album/6pWpb4IdPu9vp9m...	https://api.spotify.com/v1/albums/6pWpb4IdPu9v...	6pWpb4IdPu9vp9mOdh5DjY	[{'height': 640, 'url': 'https://i.scdn.co/ima...	Real Emotional Trash	2008-03-04	day	...	https://open.spotify.com/track/0BDYBajZydY54OT...	https://api.spotify.com/v1/tracks/0BDYBajZydY5...	0BDYBajZydY54OTgQsH940	False	Real Emotional Trash	21	https://p.scdn.co/mp3-preview/4fcbcd5a99fc7590...	4	track	spotify:track:0BDYBajZydY54OTgQsH940

Yep – it's that easy. pandas takes our nested JSON object, flattens it out, and turns it into a DataFrame.

This makes our life easier when we're dealing with one record, but it really comes in handy when we're dealing with a response that contains multiple records.

tracks_response = sp.tracks(
    ['0BDYBajZydY54OTgQsH940',
     '7fdUqrzb8oCcIoKvFuzMrs',
     '0islTY4Fw6lhYbfqi8Qtdj',
     '3jyFLbljUTKjE13nIWXchH',
     '6dNmC2YWtWbVOFOdTuRDQs']
)
print(tracks_response)

Fetch multiple tracks

Output:

{
  "tracks": [{
      "album": {
        "album_type": "album",
        "artists": [{
            "external_urls": {
              "spotify": "https://open.spotify.com/artist/7wyRA7deGRxozTyBc6QXPe"
            },
            "href": "https://api.spotify.com/v1/artists/7wyRA7deGRxozTyBc6QXPe",
            "id": "7wyRA7deGRxozTyBc6QXPe",
            "name": "Stephen Malkmus & The Jicks",
            "type": "artist",
            "uri": "spotify:artist:7wyRA7deGRxozTyBc6QXPe"
          },
          {
            "external_urls": {
              "spotify": "https://open.spotify.com/artist/0WISkx0PwT6lYWdPqKUJY8"
            },
            "href": "https://api.spotify.com/v1/artists/0WISkx0PwT6lYWdPqKUJY8",
            "id": "0WISkx0PwT6lYWdPqKUJY8",
            "name": "Stephen Malkmus",
            "type": "artist",
            "uri": "spotify:artist:0WISkx0PwT6lYWdPqKUJY8"
          },
          {
            "external_urls": {
              "spotify": "https://open.spotify.com/artist/7uStwCeP54Za8gXUFCf5L7"
            },
            "href": "https://api.spotify.com/v1/artists/7uStwCeP54Za8gXUFCf5L7",
            "id": "7uStwCeP54Za8gXUFCf5L7",
            "name": "The Jicks",
            "type": "artist",
            "uri": "spotify:artist:7uStwCeP54Za8gXUFCf5L7"
          }
        ],
        "available_markets": ["AR",
          "BO",
          "BR",
          "CA",
          "CL",
          "CO",
          "CR",
          "EC",
          "GT",
          "HK",
          "HN",
          "ID",
          "MX",
          "MY",
          "NI",
          "PA",
          "PE",
          "PH",
          "PY",
          "SG",
          "SV",
          "TH",
          "TW",
          "US",
          "UY",
          "VN"
        ],
        "external_urls": {
          "spotify": "https://open.spotify.com/album/6pWpb4IdPu9vp9mOdh5DjY"
        },
        "href": "https://api.spotify.com/v1/albums/6pWpb4IdPu9vp9mOdh5DjY",
        "id": "6pWpb4IdPu9vp9mOdh5DjY",
        "images": [{
            "height": 640,
            "url": "https://i.scdn.co/image/918fdb6fdffccf2bd2dd1a1a93136000f8cf9bd3",
            "width": 640
          },
          {
            "height": 300,
            "url": "https://i.scdn.co/image/fb39290ebca6fac424d4a40611a7e0d1146c5f88",
            "width": 300
          },
          {
            "height": 64,
            "url": "https://i.scdn.co/image/873da4a0a21acd96e4a0036c9ecd0580b62652d4",
            "width": 64
          }
        ],
        "name": "Real Emotional Trash",
        "release_date": "2008-03-04",
        "release_date_precision": "day",
        "type": "album",
        "uri": "spotify:album:6pWpb4IdPu9vp9mOdh5DjY"
      },
      "artists": [{
          "external_urls": {
            "spotify": "https://open.spotify.com/artist/7wyRA7deGRxozTyBc6QXPe"
          },
          "href": "https://api.spotify.com/v1/artists/7wyRA7deGRxozTyBc6QXPe",
          "id": "7wyRA7deGRxozTyBc6QXPe",
          "name": "Stephen Malkmus & The Jicks",
          "type": "artist",
          "uri": "spotify:artist:7wyRA7deGRxozTyBc6QXPe"
        },
        {
          "external_urls": {
            "spotify": "https://open.spotify.com/artist/0WISkx0PwT6lYWdPqKUJY8"
          },
          "href": "https://api.spotify.com/v1/artists/0WISkx0PwT6lYWdPqKUJY8",
          "id": "0WISkx0PwT6lYWdPqKUJY8",
          "name": "Stephen Malkmus",
          "type": "artist",
          "uri": "spotify:artist:0WISkx0PwT6lYWdPqKUJY8"
        },
        {
          "external_urls": {
            "spotify": "https://open.spotify.com/artist/7uStwCeP54Za8gXUFCf5L7"
          },
          "href": "https://api.spotify.com/v1/artists/7uStwCeP54Za8gXUFCf5L7",
          "id": "7uStwCeP54Za8gXUFCf5L7",
          "name": "The Jicks",
          "type": "artist",
          "uri": "spotify:artist:7uStwCeP54Za8gXUFCf5L7"
        }
      ],
      "available_markets": ["AR",
        "BO",
        "BR",
        "CA",
        "CL",
        "CO",
        "CR",
        "EC",
        "GT",
        "HK",
        "HN",
        "ID",
        "MX",
        "MY",
        "NI",
        "PA",
        "PE",
        "PH",
        "PY",
        "SG",
        "SV",
        "TH",
        "TW",
        "US",
        "UY",
        "VN"
      ],
      "disc_number": 1,
      "duration_ms": 608826,
      "explicit": False,
      "external_ids": {
        "isrc": "USMTD0877204"
      },
      "external_urls": {
        "spotify": "https://open.spotify.com/track/0BDYBajZydY54OTgQsH940"
      },
      "href": "https://api.spotify.com/v1/tracks/0BDYBajZydY54OTgQsH940",
      "id": "0BDYBajZydY54OTgQsH940",
      "is_local": False,
      "name": "Real Emotional Trash",
      "popularity": 21,
      "preview_url": "https://p.scdn.co/mp3-preview/4fcbcd5a99fc7590d5819849e1aad3eff981dc75?cid=be22fd00039241bc96d161a63876b54c",
      "track_number": 4,
      "type": "track",
      "uri": "spotify:track:0BDYBajZydY54OTgQsH940"
    },
    {
      "album": {
        "album_type": "album",
        "artists": [{
            "external_urls": {
              "spotify": "https://open.spotify.com/artist/7wyRA7deGRxozTyBc6QXPe"
            },
            "href": "https://api.spotify.com/v1/artists/7wyRA7deGRxozTyBc6QXPe",
            "id": "7wyRA7deGRxozTyBc6QXPe",
            "name": "Stephen Malkmus & The Jicks",
            "type": "artist",
            "uri": "spotify:artist:7wyRA7deGRxozTyBc6QXPe"
          },
          {
            "external_urls": {
              "spotify": "https://open.spotify.com/artist/0WISkx0PwT6lYWdPqKUJY8"
            },
            "href": "https://api.spotify.com/v1/artists/0WISkx0PwT6lYWdPqKUJY8",
            "id": "0WISkx0PwT6lYWdPqKUJY8",
            "name": "Stephen Malkmus",
            "type": "artist",
            "uri": "spotify:artist:0WISkx0PwT6lYWdPqKUJY8"
          },
          {
            "external_urls": {
              "spotify": "https://open.spotify.com/artist/7uStwCeP54Za8gXUFCf5L7"
            },
            "href": "https://api.spotify.com/v1/artists/7uStwCeP54Za8gXUFCf5L7",
            "id": "7uStwCeP54Za8gXUFCf5L7",
            "name": "The Jicks",
            "type": "artist",
            "uri": "spotify:artist:7uStwCeP54Za8gXUFCf5L7"
          }
        ],
        "available_markets": ["AR",
          "BO",
          "BR",
          "CA",
          "CL",
          "CO",
          "CR",
          "EC",
          "GT",
          "HK",
          "HN",
          "ID",
          "MX",
          "MY",
          "NI",
          "PA",
          "PE",
          "PH",
          "PY",
          "SG",
          "SV",
          "TH",
          "TW",
          "US",
          "UY",
          "VN"
        ],
        "external_urls": {
          "spotify": "https://open.spotify.com/album/6pWpb4IdPu9vp9mOdh5DjY"
        },
        "href": "https://api.spotify.com/v1/albums/6pWpb4IdPu9vp9mOdh5DjY",
        "id": "6pWpb4IdPu9vp9mOdh5DjY",
        "images": [{
            "height": 640,
            "url": "https://i.scdn.co/image/918fdb6fdffccf2bd2dd1a1a93136000f8cf9bd3",
            "width": 640
          },
          {
            "height": 300,
            "url": "https://i.scdn.co/image/fb39290ebca6fac424d4a40611a7e0d1146c5f88",
            "width": 300
          },
          {
            "height": 64,
            "url": "https://i.scdn.co/image/873da4a0a21acd96e4a0036c9ecd0580b62652d4",
            "width": 64
          }
        ],
        "name": "Real Emotional Trash",
        "release_date": "2008-03-04",
        "release_date_precision": "day",
        "type": "album",
        "uri": "spotify:album:6pWpb4IdPu9vp9mOdh5DjY"
      },
      "artists": [{
          "external_urls": {
            "spotify": "https://open.spotify.com/artist/7wyRA7deGRxozTyBc6QXPe"
          },
          "href": "https://api.spotify.com/v1/artists/7wyRA7deGRxozTyBc6QXPe",
          "id": "7wyRA7deGRxozTyBc6QXPe",
          "name": "Stephen Malkmus & The Jicks",
          "type": "artist",
          "uri": "spotify:artist:7wyRA7deGRxozTyBc6QXPe"
        },
        {
          "external_urls": {
            "spotify": "https://open.spotify.com/artist/0WISkx0PwT6lYWdPqKUJY8"
          },
          "href": "https://api.spotify.com/v1/artists/0WISkx0PwT6lYWdPqKUJY8",
          "id": "0WISkx0PwT6lYWdPqKUJY8",
          "name": "Stephen Malkmus",
          "type": "artist",
          "uri": "spotify:artist:0WISkx0PwT6lYWdPqKUJY8"
        },
        {
          "external_urls": {
            "spotify": "https://open.spotify.com/artist/7uStwCeP54Za8gXUFCf5L7"
          },
          "href": "https://api.spotify.com/v1/artists/7uStwCeP54Za8gXUFCf5L7",
          "id": "7uStwCeP54Za8gXUFCf5L7",
          "name": "The Jicks",
          "type": "artist",
          "uri": "spotify:artist:7uStwCeP54Za8gXUFCf5L7"
        }
      ],
      "available_markets": ["AR",
        "BO",
        "BR",
        "CA",
        "CL",
        "CO",
        "CR",
        "EC",
        "GT",
        "HK",
        "HN",
        "ID",
        "MX",
        "MY",
        "NI",
        "PA",
        "PE",
        "PH",
        "PY",
        "SG",
        "SV",
        "TH",
        "TW",
        "US",
        "UY",
        "VN"
      ],
      "disc_number": 1,
      "duration_ms": 222706,
      "explicit": False,
      "external_ids": {
        "isrc": "USMTD0877203"
      },
      "external_urls": {
        "spotify": "https://open.spotify.com/track/7fdUqrzb8oCcIoKvFuzMrs"
      },
      "href": "https://api.spotify.com/v1/tracks/7fdUqrzb8oCcIoKvFuzMrs",
      "id": "7fdUqrzb8oCcIoKvFuzMrs",
      "is_local": False,
      "name": "Cold Son",
      "popularity": 25,
      "preview_url": "https://p.scdn.co/mp3-preview/4cf4e21727def47097e27d30de16ffe9f99b7774?cid=be22fd00039241bc96d161a63876b54c",
      "track_number": 3,
      "type": "track",
      "uri": "spotify:track:7fdUqrzb8oCcIoKvFuzMrs"
    },
    ...
  ]
}

Output

json_normalise(tracks_response)
print(json_normalise(tracks_response))

Normalize multiple tracks

Output:

	album.album_type	album.artists	album.available_markets	album.external_urls.spotify	album.href	album.id	album.images	album.name	album.release_date	album.release_date_precision	...	external_urls.spotify	href	id	is_local	name	popularity	preview_url	track_number	type	uri
0	album	[{'external_urls': {'spotify': 'https://open.s...	[AR, BO, BR, CA, CL, CO, CR, EC, GT, HK, HN, I...	https://open.spotify.com/album/6pWpb4IdPu9vp9m...	https://api.spotify.com/v1/albums/6pWpb4IdPu9v...	6pWpb4IdPu9vp9mOdh5DjY	[{'height': 640, 'url': 'https://i.scdn.co/ima...	Real Emotional Trash	3/4/2008	day	...	https://open.spotify.com/track/0BDYBajZydY54OT...	https://api.spotify.com/v1/tracks/0BDYBajZydY5...	0BDYBajZydY54OTgQsH940	FALSE	Real Emotional Trash	21	https://p.scdn.co/mp3-preview/4fcbcd5a99fc7590...	4	track	spotify:track:0BDYBajZydY54OTgQsH940
1	album	[{'external_urls': {'spotify': 'https://open.s...	[AR, BO, BR, CA, CL, CO, CR, EC, GT, HK, HN, I...	https://open.spotify.com/album/6pWpb4IdPu9vp9m...	https://api.spotify.com/v1/albums/6pWpb4IdPu9v...	6pWpb4IdPu9vp9mOdh5DjY	[{'height': 640, 'url': 'https://i.scdn.co/ima...	Real Emotional Trash	3/4/2008	day	...	https://open.spotify.com/track/7fdUqrzb8oCcIoK...	https://api.spotify.com/v1/tracks/7fdUqrzb8oCc...	7fdUqrzb8oCcIoKvFuzMrs	FALSE	Cold Son	25	https://p.scdn.co/mp3-preview/4cf4e21727def470...	3	track	spotify:track:7fdUqrzb8oCcIoKvFuzMrs
2	album	[{'external_urls': {'spotify': 'https://open.s...	[AR, BO, BR, CA, CL, CO, CR, EC, GT, HK, HN, I...	https://open.spotify.com/album/6pWpb4IdPu9vp9m...	https://api.spotify.com/v1/albums/6pWpb4IdPu9v...	6pWpb4IdPu9vp9mOdh5DjY	[{'height': 640, 'url': 'https://i.scdn.co/ima...	Real Emotional Trash	3/4/2008	day	...	https://open.spotify.com/track/0islTY4Fw6lhYbf...	https://api.spotify.com/v1/tracks/0islTY4Fw6lh...	0islTY4Fw6lhYbfqi8Qtdj	FALSE	Hopscotch Willie	24	https://p.scdn.co/mp3-preview/c7782dc6d7c0bb12...	2	track	spotify:track:0islTY4Fw6lhYbfqi8Qtdj
3	album	[{'external_urls': {'spotify': 'https://open.s...	[AR, BO, BR, CA, CL, CO, CR, EC, GT, HK, HN, I...	https://open.spotify.com/album/6pWpb4IdPu9vp9m...	https://api.spotify.com/v1/albums/6pWpb4IdPu9v...	6pWpb4IdPu9vp9mOdh5DjY	[{'height': 640, 'url': 'https://i.scdn.co/ima...	Real Emotional Trash	3/4/2008	day	...	https://open.spotify.com/track/3jyFLbljUTKjE13...	https://api.spotify.com/v1/tracks/3jyFLbljUTKj...	3jyFLbljUTKjE13nIWXchH	FALSE	Dragonfly Pie	26	https://p.scdn.co/mp3-preview/50f419e7d3e8a6a7...	1	track	spotify:track:3jyFLbljUTKjE13nIWXchH
4	album	[{'external_urls': {'spotify': 'https://open.s...	[AR, AU, BO, BR, CA, CL, CO, CR, DO, EC, GT, H...	https://open.spotify.com/album/5DMvSCwRqfNVlMB...	https://api.spotify.com/v1/albums/5DMvSCwRqfNV...	5DMvSCwRqfNVlMB5LjHOwG	[{'height': 640, 'url': 'https://i.scdn.co/ima...	Sparkle Hard	5/18/2018	day	...	https://open.spotify.com/track/6dNmC2YWtWbVOFO...	https://api.spotify.com/v1/tracks/6dNmC2YWtWbV...	6dNmC2YWtWbVOFOdTuRDQs	FALSE	Difficulties - Let Them Eat Vowels	35	https://p.scdn.co/mp3-preview/787be9d1bbebcd84...	11	track	spotify:track:6dNmC2YWtWbVOFOdTuRDQs

Separate Ways (Worlds Apart)

By default, json_normalize() uses periods . to indicate nested levels of the JSON object (which is actually converted to a Python dict by Spotipy). In our case, the album id is found in track['album']['id'], hence the period between album and id in the DataFrame. This makes things slightly annoying if we want to grab a Series from our new DataFrame. In pandas, we can grab a Series from a DataFrame in many ways. To grab the album.id column, for example:

tracks_df['album.id']

Output:

0    6pWpb4IdPu9vp9mOdh5DjY
1    6pWpb4IdPu9vp9mOdh5DjY
2    6pWpb4IdPu9vp9mOdh5DjY
3    6pWpb4IdPu9vp9mOdh5DjY
4    5DMvSCwRqfNVlMB5LjHOwG
Name: album.id, dtype: object

or...

tracks_df.loc[:,'album.id']

Output:

0    6pWpb4IdPu9vp9mOdh5DjY
1    6pWpb4IdPu9vp9mOdh5DjY
2    6pWpb4IdPu9vp9mOdh5DjY
3    6pWpb4IdPu9vp9mOdh5DjY
4    5DMvSCwRqfNVlMB5LjHOwG
Name: album.id, dtype: object

Pandas also allows us to use dot notation (i.e. dataframe.column_name) to grab a column as a Series, but only if our column name doesn't include a period already. Since json_normalize() uses a period as a separator by default, this ruins that method. Never fear though – overriding this behavior is as simple as overriding the default argument in the function call:

tracks_df = json_normalize(tracks_response['tracks'],sep="_")
print(tracks_df)

Override separators

Output:

	album_album_type	album_artists	album_available_markets	album_external_urls_spotify	album_href	album_id	album_images	album_name	album_release_date	album_release_date_precision	...	external_urls_spotify	href	id	is_local	name	popularity	preview_url	track_number	type	uri
0	album	[{'external_urls': {'spotify': 'https://open.s...	[AR, BO, BR, CA, CL, CO, CR, EC, GT, HK, HN, I...	https://open.spotify.com/album/6pWpb4IdPu9vp9m...	https://api.spotify.com/v1/albums/6pWpb4IdPu9v...	6pWpb4IdPu9vp9mOdh5DjY	[{'height': 640, 'url': 'https://i.scdn.co/ima...	Real Emotional Trash	3/4/2008	day	...	https://open.spotify.com/track/0BDYBajZydY54OT...	https://api.spotify.com/v1/tracks/0BDYBajZydY5...	0BDYBajZydY54OTgQsH940	FALSE	Real Emotional Trash	21	https://p.scdn.co/mp3-preview/4fcbcd5a99fc7590...	4	track	spotify:track:0BDYBajZydY54OTgQsH940
1	album	[{'external_urls': {'spotify': 'https://open.s...	[AR, BO, BR, CA, CL, CO, CR, EC, GT, HK, HN, I...	https://open.spotify.com/album/6pWpb4IdPu9vp9m...	https://api.spotify.com/v1/albums/6pWpb4IdPu9v...	6pWpb4IdPu9vp9mOdh5DjY	[{'height': 640, 'url': 'https://i.scdn.co/ima...	Real Emotional Trash	3/4/2008	day	...	https://open.spotify.com/track/7fdUqrzb8oCcIoK...	https://api.spotify.com/v1/tracks/7fdUqrzb8oCc...	7fdUqrzb8oCcIoKvFuzMrs	FALSE	Cold Son	25	https://p.scdn.co/mp3-preview/4cf4e21727def470...	3	track	spotify:track:7fdUqrzb8oCcIoKvFuzMrs
2	album	[{'external_urls': {'spotify': 'https://open.s...	[AR, BO, BR, CA, CL, CO, CR, EC, GT, HK, HN, I...	https://open.spotify.com/album/6pWpb4IdPu9vp9m...	https://api.spotify.com/v1/albums/6pWpb4IdPu9v...	6pWpb4IdPu9vp9mOdh5DjY	[{'height': 640, 'url': 'https://i.scdn.co/ima...	Real Emotional Trash	3/4/2008	day	...	https://open.spotify.com/track/0islTY4Fw6lhYbf...	https://api.spotify.com/v1/tracks/0islTY4Fw6lh...	0islTY4Fw6lhYbfqi8Qtdj	FALSE	Hopscotch Willie	24	https://p.scdn.co/mp3-preview/c7782dc6d7c0bb12...	2	track	spotify:track:0islTY4Fw6lhYbfqi8Qtdj
3	album	[{'external_urls': {'spotify': 'https://open.s...	[AR, BO, BR, CA, CL, CO, CR, EC, GT, HK, HN, I...	https://open.spotify.com/album/6pWpb4IdPu9vp9m...	https://api.spotify.com/v1/albums/6pWpb4IdPu9v...	6pWpb4IdPu9vp9mOdh5DjY	[{'height': 640, 'url': 'https://i.scdn.co/ima...	Real Emotional Trash	3/4/2008	day	...	https://open.spotify.com/track/3jyFLbljUTKjE13...	https://api.spotify.com/v1/tracks/3jyFLbljUTKj...	3jyFLbljUTKjE13nIWXchH	FALSE	Dragonfly Pie	26	https://p.scdn.co/mp3-preview/50f419e7d3e8a6a7...	1	track	spotify:track:3jyFLbljUTKjE13nIWXchH
4	album	[{'external_urls': {'spotify': 'https://open.s...	[AR, AU, BO, BR, CA, CL, CO, CR, DO, EC, GT, H...	https://open.spotify.com/album/5DMvSCwRqfNVlMB...	https://api.spotify.com/v1/albums/5DMvSCwRqfNV...	5DMvSCwRqfNVlMB5LjHOwG	[{'height': 640, 'url': 'https://i.scdn.co/ima...	Sparkle Hard	5/18/2018	day	...	https://open.spotify.com/track/6dNmC2YWtWbVOFO...	https://api.spotify.com/v1/tracks/6dNmC2YWtWbV...	6dNmC2YWtWbVOFOdTuRDQs	FALSE	Difficulties - Let Them Eat Vowels	35	https://p.scdn.co/mp3-preview/787be9d1bbebcd84...	11	track	spotify:track:6dNmC2YWtWbVOFOdTuRDQs

Now we can go back to using dot notation to access a column as a Series. This saves us some typing every time we want to grab a column, and it looks a bit nicer (to me, at least). I say worth it.

tracks_df.album_id

Output:

0    6pWpb4IdPu9vp9mOdh5DjY
1    6pWpb4IdPu9vp9mOdh5DjY
2    6pWpb4IdPu9vp9mOdh5DjY
3    6pWpb4IdPu9vp9mOdh5DjY
4    5DMvSCwRqfNVlMB5LjHOwG
Name: album_id, dtype: object

I Need That Record

By including more parameters when we use json_normlize(), we can really extract just the data that we want from our API response.

From our responses above, we can see that the artist property contains a list of artists that are associated with a track:

tracks_response['tracks'][0]['artists']

Output:

[{
    "external_urls": {
      "spotify": "https://open.spotify.com/artist/7wyRA7deGRxozTyBc6QXPe"
    },
    "href": "https://api.spotify.com/v1/artists/7wyRA7deGRxozTyBc6QXPe",
    "id": "7wyRA7deGRxozTyBc6QXPe",
    "name": "Stephen Malkmus & The Jicks",
    "type": "artist",
    "uri": "spotify:artist:7wyRA7deGRxozTyBc6QXPe"
  },
  {
    "external_urls": {
      "spotify": "https://open.spotify.com/artist/0WISkx0PwT6lYWdPqKUJY8"
    },
    "href": "https://api.spotify.com/v1/artists/0WISkx0PwT6lYWdPqKUJY8",
    "id": "0WISkx0PwT6lYWdPqKUJY8",
    "name": "Stephen Malkmus",
    "type": "artist",
    "uri": "spotify:artist:0WISkx0PwT6lYWdPqKUJY8"
  },
  {
    "external_urls": {
      "spotify": "https://open.spotify.com/artist/7uStwCeP54Za8gXUFCf5L7"
    },
    "href": "https://api.spotify.com/v1/artists/7uStwCeP54Za8gXUFCf5L7",
    "id": "7uStwCeP54Za8gXUFCf5L7",
    "name": "The Jicks",
    "type": "artist",
    "uri": "spotify:artist:7uStwCeP54Za8gXUFCf5L7"
  }
]

Let's say I want to load this data into a database later. It would be nice to have a join table that maps each of the artists that are associated with each track. Luckily, this is possible with json_normalize()'s record_path and meta parameters.

record_path tells json_normalize() what path of keys leads to each individual record in the JSON object. In our case, we want to grab every artist id, so our function call will look like:

json_normalize(
    tracks_response['tracks'],
    record_path=['artists'],
    sep="_"
)

	external_urls	href	id	name	type	uri
1	{'spotify': 'https://open.spotify.com/artist/7...	https://api.spotify.com/v1/artists/7wyRA7deGRx...	7wyRA7deGRxozTyBc6QXPe	Stephen Malkmus & The Jicks	artist	spotify:artist:7wyRA7deGRxozTyBc6QXPe
1	{'spotify': 'https://open.spotify.com/artist/0...	https://api.spotify.com/v1/artists/0WISkx0PwT6...	0WISkx0PwT6lYWdPqKUJY8	Stephen Malkmus	artist	spotify:artist:0WISkx0PwT6lYWdPqKUJY8
2	{'spotify': 'https://open.spotify.com/artist/7...	https://api.spotify.com/v1/artists/7uStwCeP54Z...	7uStwCeP54Za8gXUFCf5L7	The Jicks	artist	spotify:artist:7uStwCeP54Za8gXUFCf5L7
3	{'spotify': 'https://open.spotify.com/artist/7...	https://api.spotify.com/v1/artists/7wyRA7deGRx...	7wyRA7deGRxozTyBc6QXPe	Stephen Malkmus & The Jicks	artist	spotify:artist:7wyRA7deGRxozTyBc6QXPe
4	{'spotify': 'https://open.spotify.com/artist/0...	https://api.spotify.com/v1/artists/0WISkx0PwT6...	0WISkx0PwT6lYWdPqKUJY8	Stephen Malkmus	artist	spotify:artist:0WISkx0PwT6lYWdPqKUJY8
5	{'spotify': 'https://open.spotify.com/artist/7...	https://api.spotify.com/v1/artists/7uStwCeP54Z...	7uStwCeP54Za8gXUFCf5L7	The Jicks	artist	spotify:artist:7uStwCeP54Za8gXUFCf5L7
6	{'spotify': 'https://open.spotify.com/artist/7...	https://api.spotify.com/v1/artists/7wyRA7deGRx...	7wyRA7deGRxozTyBc6QXPe	Stephen Malkmus & The Jicks	artist	spotify:artist:7wyRA7deGRxozTyBc6QXPe
7	{'spotify': 'https://open.spotify.com/artist/0...	https://api.spotify.com/v1/artists/0WISkx0PwT6...	0WISkx0PwT6lYWdPqKUJY8	Stephen Malkmus	artist	spotify:artist:0WISkx0PwT6lYWdPqKUJY8
8	{'spotify': 'https://open.spotify.com/artist/7...	https://api.spotify.com/v1/artists/7uStwCeP54Z...	7uStwCeP54Za8gXUFCf5L7	The Jicks	artist	spotify:artist:7uStwCeP54Za8gXUFCf5L7
9	{'spotify': 'https://open.spotify.com/artist/7...	https://api.spotify.com/v1/artists/7wyRA7deGRx...	7wyRA7deGRxozTyBc6QXPe	Stephen Malkmus & The Jicks	artist	spotify:artist:7wyRA7deGRxozTyBc6QXPe
10	{'spotify': 'https://open.spotify.com/artist/0...	https://api.spotify.com/v1/artists/0WISkx0PwT6...	0WISkx0PwT6lYWdPqKUJY8	Stephen Malkmus	artist	spotify:artist:0WISkx0PwT6lYWdPqKUJY8
11	{'spotify': 'https://open.spotify.com/artist/7...	https://api.spotify.com/v1/artists/7uStwCeP54Z...	7uStwCeP54Za8gXUFCf5L7	The Jicks	artist	spotify:artist:7uStwCeP54Za8gXUFCf5L7
12	{'spotify': 'https://open.spotify.com/artist/7...	https://api.spotify.com/v1/artists/7wyRA7deGRx...	7wyRA7deGRxozTyBc6QXPe	Stephen Malkmus & The Jicks	artist	spotify:artist:7wyRA7deGRxozTyBc6QXPe

Cool – we're almost there. Now we want to use the meta parameter to specify what data we want to include from the rest of the JSON object. In our case, we want to keep the track id and map it to the artist id. If we look back at our API response, the name of the column that included the track is is called, appropriately, id, so our full function call should look like this:

json_normalize(
    tracks_response['tracks'],
    record_path=['artists'],
    meta=['id'],
    sep="_"
)

Output:

-----------------------------------------
ValueError                   Traceback (most recent call last)

    <ipython-input-14-77e00a98c3c0> in <module>()
    ----> 1 json_normalize(tracks_response['tracks'],record_path=['artists'],meta=['id'],sep="_")

    ~/anaconda3/envs/music_data/lib/python3.6/site-packages/pandas/io/json/normalize.py in json_normalize(data, record_path, meta, meta_prefix, record_prefix, errors, sep)
        268         if k in result:
        269             raise ValueError('Conflicting metadata name {name}, '
    --> 270                              'need distinguishing prefix '.format(name=k))
        271 
        272         result[k] = np.array(v).repeat(lengths)
    
ValueError: Conflicting metadata name id, need distinguishing prefix

Uh oh – an error! What's going on? Well, it turns out that both the album id and track id were given the key id. pandas doesn't like that, and it gives us a helpful error to tell us so: ValueError: Conflicting metadata name id, need distinguishing prefix.

There are two more parameters we can use to overcome this error: record_prefix and meta_prefix. These are strings we'll add to the beginning of our records and metadata to prevent these naming conflicts. Since we're dealing with Spotify artist ids for our records and Spotify track ids as the metadata, I'll use sp_artist_ and sp_track_ respectively. When that's done, I'll select only the columns that we're interested in.

artist_and_track = json_normalize(
    data=tracks_response['tracks'],
    record_path='artists',
    meta=['id'],
    record_prefix='sp_artist_',
    meta_prefix='sp_track_',
    sep="_"
)
artist_and_track = artist_and_track[['sp_track_id','sp_artist_id']]
artist_and_track

Output:

	sp_track_id	sp_artist_id
0	0BDYBajZydY54OTgQsH940	7wyRA7deGRxozTyBc6QXPe
1	0BDYBajZydY54OTgQsH940	0WISkx0PwT6lYWdPqKUJY8
2	0BDYBajZydY54OTgQsH940	7uStwCeP54Za8gXUFCf5L7
3	7fdUqrzb8oCcIoKvFuzMrs	7wyRA7deGRxozTyBc6QXPe
4	7fdUqrzb8oCcIoKvFuzMrs	0WISkx0PwT6lYWdPqKUJY8
5	7fdUqrzb8oCcIoKvFuzMrs	7uStwCeP54Za8gXUFCf5L7
6	0islTY4Fw6lhYbfqi8Qtdj	7wyRA7deGRxozTyBc6QXPe
7	0islTY4Fw6lhYbfqi8Qtdj	0WISkx0PwT6lYWdPqKUJY8
8	0islTY4Fw6lhYbfqi8Qtdj	7uStwCeP54Za8gXUFCf5L7
9	3jyFLbljUTKjE13nIWXchH	7wyRA7deGRxozTyBc6QXPe
10	3jyFLbljUTKjE13nIWXchH	0WISkx0PwT6lYWdPqKUJY8
11	3jyFLbljUTKjE13nIWXchH	7uStwCeP54Za8gXUFCf5L7
12	6dNmC2YWtWbVOFOdTuRDQs	7wyRA7deGRxozTyBc6QXPe

TL;DR

Use pd.io.json.json_normalize() to automagically flatten a nested JSON object into a DataFrame
Make your life slightly easier when it comes to selecting columns by overriding the default sep parameter
Specify what data constitutes a record with the record_path parameter
Include data from outside of the record path with the meta parameter
Fix naming conflicts if they arise with the record_prefix and meta_prefix parameters

Data Normalization

Separate Ways (Worlds Apart)

Output:

Output:

I Need That Record

TL;DR

Related Posts