Automagically Turn JSON into Pandas DataFrames

Let Pandas do the heavy lifting for you when turning JSON into a DataFrame.

Automagically Turn JSON into Pandas DataFrames

    In his post about extracting data from APIs, Todd demonstrated a nice way to massage JSON into a pandas DataFrame. This method works great when our JSON response is flat, because dict.keys() only gets the keys on the first "level" of a dictionary. It gets a little trickier when our JSON starts to become nested though, as I experienced when working with Spotify's API via the Spotipy library. For example, take a look at a response from their https://api.spotify.com/v1/tracks/{id} endpoint:

    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials
    
    spotify_client_id = 'YOUR_ID'
    spotify_client_secret  = 'YOUR_SECRET'
    client_credentials_manager = SpotifyClientCredentials(client_id=spotify_client_id, client_secret=spotify_client_secret)
    sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
    
    track_response = sp.track('0BDYBajZydY54OTgQsH940')
    track_response
    

    Output:

    {
      "album": {
        "album_type": "album",
        "artists": [{
            "external_urls": {
              "spotify": "https://open.spotify.com/artist/7wyRA7deGRxozTyBc6QXPe"
            },
            "href": "https://api.spotify.com/v1/artists/7wyRA7deGRxozTyBc6QXPe",
            "id": "7wyRA7deGRxozTyBc6QXPe",
            "name": "Stephen Malkmus & The Jicks",
            "type": "artist",
            "uri": "spotify:artist:7wyRA7deGRxozTyBc6QXPe"
          },
          {
            "external_urls": {
              "spotify": "https://open.spotify.com/artist/0WISkx0PwT6lYWdPqKUJY8"
            },
            "href": "https://api.spotify.com/v1/artists/0WISkx0PwT6lYWdPqKUJY8",
            "id": "0WISkx0PwT6lYWdPqKUJY8",
            "name": "Stephen Malkmus",
            "type": "artist",
            "uri": "spotify:artist:0WISkx0PwT6lYWdPqKUJY8"
          },
          {
            "external_urls": {
              "spotify": "https://open.spotify.com/artist/7uStwCeP54Za8gXUFCf5L7"
            },
            "href": "https://api.spotify.com/v1/artists/7uStwCeP54Za8gXUFCf5L7",
            "id": "7uStwCeP54Za8gXUFCf5L7",
            "name": "The Jicks",
            "type": "artist",
            "uri": "spotify:artist:7uStwCeP54Za8gXUFCf5L7"
          }
        ],
        "available_markets": ["AR",
          "BO",
          "BR",
          "CA",
          "...",
          "US",
          "UY",
          "VN"
        ],
        "external_urls": {
          "spotify": "https://open.spotify.com/album/6pWpb4IdPu9vp9mOdh5DjY"
        },
        "href": "https://api.spotify.com/v1/albums/6pWpb4IdPu9vp9mOdh5DjY",
        "id": "6pWpb4IdPu9vp9mOdh5DjY",
        "images": [{
            "height": 640,
            "url": "https://i.scdn.co/image/918fdb6fdffccf2bd2dd1a1a93136000f8cf9bd3",
            "width": 640
          },
          {
            "height": 300,
            "url": "https://i.scdn.co/image/fb39290ebca6fac424d4a40611a7e0d1146c5f88",
            "width": 300
          },
          {
            "height": 64,
            "url": "https://i.scdn.co/image/873da4a0a21acd96e4a0036c9ecd0580b62652d4",
            "width": 64
          }
        ],
        "name": "Real Emotional Trash",
        "release_date": "2008-03-04",
        "release_date_precision": "day",
        "type": "album",
        "uri": "spotify:album:6pWpb4IdPu9vp9mOdh5DjY"
      },
      "artists": [{
          "external_urls": {
            "spotify": "https://open.spotify.com/artist/7wyRA7deGRxozTyBc6QXPe"
          },
          "href": "https://api.spotify.com/v1/artists/7wyRA7deGRxozTyBc6QXPe",
          "id": "7wyRA7deGRxozTyBc6QXPe",
          "name": "Stephen Malkmus & The Jicks",
          "type": "artist",
          "uri": "spotify:artist:7wyRA7deGRxozTyBc6QXPe"
        },
        {
          "external_urls": {
            "spotify": "https://open.spotify.com/artist/0WISkx0PwT6lYWdPqKUJY8"
          },
          "href": "https://api.spotify.com/v1/artists/0WISkx0PwT6lYWdPqKUJY8",
          "id": "0WISkx0PwT6lYWdPqKUJY8",
          "name": "Stephen Malkmus",
          "type": "artist",
          "uri": "spotify:artist:0WISkx0PwT6lYWdPqKUJY8"
        },
        {
          "external_urls": {
            "spotify": "https://open.spotify.com/artist/7uStwCeP54Za8gXUFCf5L7"
          },
          "href": "https://api.spotify.com/v1/artists/7uStwCeP54Za8gXUFCf5L7",
          "id": "7uStwCeP54Za8gXUFCf5L7",
          "name": "The Jicks",
          "type": "artist",
          "uri": "spotify:artist:7uStwCeP54Za8gXUFCf5L7"
        }
      ],
      "available_markets": ["AR",
        "BO",
        "BR",
        "CA",
        "...",
        "US",
        "UY",
        "VN"
      ],
      "disc_number": 1,
      "duration_ms": 608826,
      "explicit": False,
      "external_ids": {
        "isrc": "USMTD0877204"
      },
      "external_urls": {
        "spotify": "https://open.spotify.com/track/0BDYBajZydY54OTgQsH940"
      },
      "href": "https://api.spotify.com/v1/tracks/0BDYBajZydY54OTgQsH940",
      "id": "0BDYBajZydY54OTgQsH940",
      "is_local": False,
      "name": "Real Emotional Trash",
      "popularity": 21,
      "preview_url": "https://p.scdn.co/mp3-preview/4fcbcd5a99fc7590d5819849e1aad3eff981dc75?cid=be22fd00039241bc96d161a63876b54c",
      "track_number": 4,
      "type": "track",
      "uri": "spotify:track:0BDYBajZydY54OTgQsH940"
    }
    

    In addition to plenty of information about the track, Spotify also includes information about the album that contains the track. If we were to just use the dict.keys() method to turn this response into a DataFrame, we'd be missing out on all that extra album information. Well, it would be there, just not readily accessible.

    track_response.keys()
    

    Output:

    dict_keys(['album', 'artists', 'available_markets', 'disc_number', 'duration_ms', 'explicit', 'external_ids', 'external_urls', 'href', 'id', 'is_local', 'name', 'popularity', 'preview_url', 'track_number', 'type', 'uri'])
    

    So how do we get around this? Well, we could write our own function, but because pandas is amazing, it already has a built in tool that takes care of this for us.

    Data Normalization

    Meet json_normalize():

    import pandas as pd
    from pandas.io.json import json_normalize
    json_normalize(track_response)
    

    Output:

    album.album_type album.artists album.available_markets album.external_urls.spotify album.href album.id album.images album.name album.release_date album.release_date_precision ... external_urls.spotify href id is_local name popularity preview_url track_number type uri
    0 album [{'external_urls': {'spotify': 'https://open.s... [AR, BO, BR, CA, CL, CO, CR, EC, GT, HK, HN, I... https://open.spotify.com/album/6pWpb4IdPu9vp9m... https://api.spotify.com/v1/albums/6pWpb4IdPu9v... 6pWpb4IdPu9vp9mOdh5DjY [{'height': 640, 'url': 'https://i.scdn.co/ima... Real Emotional Trash 2008-03-04 day ... https://open.spotify.com/track/0BDYBajZydY54OT... https://api.spotify.com/v1/tracks/0BDYBajZydY5... 0BDYBajZydY54OTgQsH940 False Real Emotional Trash 21 https://p.scdn.co/mp3-preview/4fcbcd5a99fc7590... 4 track spotify:track:0BDYBajZydY54OTgQsH940

    Yep – it's that easy. pandas takes our nested JSON object, flattens it out, and turns it into a DataFrame.

    This makes our life easier when we're dealing with one record, but it really comes in handy when we're dealing with a response that contains multiple records.

    tracks_response = sp.tracks(
        ['0BDYBajZydY54OTgQsH940',
         '7fdUqrzb8oCcIoKvFuzMrs',
         '0islTY4Fw6lhYbfqi8Qtdj',
         '3jyFLbljUTKjE13nIWXchH',
         '6dNmC2YWtWbVOFOdTuRDQs']
    )
    tracks_response
    

    Output:

    {
      "tracks": [{
          "album": {
            "album_type": "album",
            "artists": [{
                "external_urls": {
                  "spotify": "https://open.spotify.com/artist/7wyRA7deGRxozTyBc6QXPe"
                },
                "href": "https://api.spotify.com/v1/artists/7wyRA7deGRxozTyBc6QXPe",
                "id": "7wyRA7deGRxozTyBc6QXPe",
                "name": "Stephen Malkmus & The Jicks",
                "type": "artist",
                "uri": "spotify:artist:7wyRA7deGRxozTyBc6QXPe"
              },
              {
                "external_urls": {
                  "spotify": "https://open.spotify.com/artist/0WISkx0PwT6lYWdPqKUJY8"
                },
                "href": "https://api.spotify.com/v1/artists/0WISkx0PwT6lYWdPqKUJY8",
                "id": "0WISkx0PwT6lYWdPqKUJY8",
                "name": "Stephen Malkmus",
                "type": "artist",
                "uri": "spotify:artist:0WISkx0PwT6lYWdPqKUJY8"
              },
              {
                "external_urls": {
                  "spotify": "https://open.spotify.com/artist/7uStwCeP54Za8gXUFCf5L7"
                },
                "href": "https://api.spotify.com/v1/artists/7uStwCeP54Za8gXUFCf5L7",
                "id": "7uStwCeP54Za8gXUFCf5L7",
                "name": "The Jicks",
                "type": "artist",
                "uri": "spotify:artist:7uStwCeP54Za8gXUFCf5L7"
              }
            ],
            "available_markets": ["AR",
              "BO",
              "BR",
              "CA",
              "CL",
              "CO",
              "CR",
              "EC",
              "GT",
              "HK",
              "HN",
              "ID",
              "MX",
              "MY",
              "NI",
              "PA",
              "PE",
              "PH",
              "PY",
              "SG",
              "SV",
              "TH",
              "TW",
              "US",
              "UY",
              "VN"
            ],
            "external_urls": {
              "spotify": "https://open.spotify.com/album/6pWpb4IdPu9vp9mOdh5DjY"
            },
            "href": "https://api.spotify.com/v1/albums/6pWpb4IdPu9vp9mOdh5DjY",
            "id": "6pWpb4IdPu9vp9mOdh5DjY",
            "images": [{
                "height": 640,
                "url": "https://i.scdn.co/image/918fdb6fdffccf2bd2dd1a1a93136000f8cf9bd3",
                "width": 640
              },
              {
                "height": 300,
                "url": "https://i.scdn.co/image/fb39290ebca6fac424d4a40611a7e0d1146c5f88",
                "width": 300
              },
              {
                "height": 64,
                "url": "https://i.scdn.co/image/873da4a0a21acd96e4a0036c9ecd0580b62652d4",
                "width": 64
              }
            ],
            "name": "Real Emotional Trash",
            "release_date": "2008-03-04",
            "release_date_precision": "day",
            "type": "album",
            "uri": "spotify:album:6pWpb4IdPu9vp9mOdh5DjY"
          },
          "artists": [{
              "external_urls": {
                "spotify": "https://open.spotify.com/artist/7wyRA7deGRxozTyBc6QXPe"
              },
              "href": "https://api.spotify.com/v1/artists/7wyRA7deGRxozTyBc6QXPe",
              "id": "7wyRA7deGRxozTyBc6QXPe",
              "name": "Stephen Malkmus & The Jicks",
              "type": "artist",
              "uri": "spotify:artist:7wyRA7deGRxozTyBc6QXPe"
            },
            {
              "external_urls": {
                "spotify": "https://open.spotify.com/artist/0WISkx0PwT6lYWdPqKUJY8"
              },
              "href": "https://api.spotify.com/v1/artists/0WISkx0PwT6lYWdPqKUJY8",
              "id": "0WISkx0PwT6lYWdPqKUJY8",
              "name": "Stephen Malkmus",
              "type": "artist",
              "uri": "spotify:artist:0WISkx0PwT6lYWdPqKUJY8"
            },
            {
              "external_urls": {
                "spotify": "https://open.spotify.com/artist/7uStwCeP54Za8gXUFCf5L7"
              },
              "href": "https://api.spotify.com/v1/artists/7uStwCeP54Za8gXUFCf5L7",
              "id": "7uStwCeP54Za8gXUFCf5L7",
              "name": "The Jicks",
              "type": "artist",
              "uri": "spotify:artist:7uStwCeP54Za8gXUFCf5L7"
            }
          ],
          "available_markets": ["AR",
            "BO",
            "BR",
            "CA",
            "CL",
            "CO",
            "CR",
            "EC",
            "GT",
            "HK",
            "HN",
            "ID",
            "MX",
            "MY",
            "NI",
            "PA",
            "PE",
            "PH",
            "PY",
            "SG",
            "SV",
            "TH",
            "TW",
            "US",
            "UY",
            "VN"
          ],
          "disc_number": 1,
          "duration_ms": 608826,
          "explicit": False,
          "external_ids": {
            "isrc": "USMTD0877204"
          },
          "external_urls": {
            "spotify": "https://open.spotify.com/track/0BDYBajZydY54OTgQsH940"
          },
          "href": "https://api.spotify.com/v1/tracks/0BDYBajZydY54OTgQsH940",
          "id": "0BDYBajZydY54OTgQsH940",
          "is_local": False,
          "name": "Real Emotional Trash",
          "popularity": 21,
          "preview_url": "https://p.scdn.co/mp3-preview/4fcbcd5a99fc7590d5819849e1aad3eff981dc75?cid=be22fd00039241bc96d161a63876b54c",
          "track_number": 4,
          "type": "track",
          "uri": "spotify:track:0BDYBajZydY54OTgQsH940"
        },
        {
          "album": {
            "album_type": "album",
            "artists": [{
                "external_urls": {
                  "spotify": "https://open.spotify.com/artist/7wyRA7deGRxozTyBc6QXPe"
                },
                "href": "https://api.spotify.com/v1/artists/7wyRA7deGRxozTyBc6QXPe",
                "id": "7wyRA7deGRxozTyBc6QXPe",
                "name": "Stephen Malkmus & The Jicks",
                "type": "artist",
                "uri": "spotify:artist:7wyRA7deGRxozTyBc6QXPe"
              },
              {
                "external_urls": {
                  "spotify": "https://open.spotify.com/artist/0WISkx0PwT6lYWdPqKUJY8"
                },
                "href": "https://api.spotify.com/v1/artists/0WISkx0PwT6lYWdPqKUJY8",
                "id": "0WISkx0PwT6lYWdPqKUJY8",
                "name": "Stephen Malkmus",
                "type": "artist",
                "uri": "spotify:artist:0WISkx0PwT6lYWdPqKUJY8"
              },
              {
                "external_urls": {
                  "spotify": "https://open.spotify.com/artist/7uStwCeP54Za8gXUFCf5L7"
                },
                "href": "https://api.spotify.com/v1/artists/7uStwCeP54Za8gXUFCf5L7",
                "id": "7uStwCeP54Za8gXUFCf5L7",
                "name": "The Jicks",
                "type": "artist",
                "uri": "spotify:artist:7uStwCeP54Za8gXUFCf5L7"
              }
            ],
            "available_markets": ["AR",
              "BO",
              "BR",
              "CA",
              "CL",
              "CO",
              "CR",
              "EC",
              "GT",
              "HK",
              "HN",
              "ID",
              "MX",
              "MY",
              "NI",
              "PA",
              "PE",
              "PH",
              "PY",
              "SG",
              "SV",
              "TH",
              "TW",
              "US",
              "UY",
              "VN"
            ],
            "external_urls": {
              "spotify": "https://open.spotify.com/album/6pWpb4IdPu9vp9mOdh5DjY"
            },
            "href": "https://api.spotify.com/v1/albums/6pWpb4IdPu9vp9mOdh5DjY",
            "id": "6pWpb4IdPu9vp9mOdh5DjY",
            "images": [{
                "height": 640,
                "url": "https://i.scdn.co/image/918fdb6fdffccf2bd2dd1a1a93136000f8cf9bd3",
                "width": 640
              },
              {
                "height": 300,
                "url": "https://i.scdn.co/image/fb39290ebca6fac424d4a40611a7e0d1146c5f88",
                "width": 300
              },
              {
                "height": 64,
                "url": "https://i.scdn.co/image/873da4a0a21acd96e4a0036c9ecd0580b62652d4",
                "width": 64
              }
            ],
            "name": "Real Emotional Trash",
            "release_date": "2008-03-04",
            "release_date_precision": "day",
            "type": "album",
            "uri": "spotify:album:6pWpb4IdPu9vp9mOdh5DjY"
          },
          "artists": [{
              "external_urls": {
                "spotify": "https://open.spotify.com/artist/7wyRA7deGRxozTyBc6QXPe"
              },
              "href": "https://api.spotify.com/v1/artists/7wyRA7deGRxozTyBc6QXPe",
              "id": "7wyRA7deGRxozTyBc6QXPe",
              "name": "Stephen Malkmus & The Jicks",
              "type": "artist",
              "uri": "spotify:artist:7wyRA7deGRxozTyBc6QXPe"
            },
            {
              "external_urls": {
                "spotify": "https://open.spotify.com/artist/0WISkx0PwT6lYWdPqKUJY8"
              },
              "href": "https://api.spotify.com/v1/artists/0WISkx0PwT6lYWdPqKUJY8",
              "id": "0WISkx0PwT6lYWdPqKUJY8",
              "name": "Stephen Malkmus",
              "type": "artist",
              "uri": "spotify:artist:0WISkx0PwT6lYWdPqKUJY8"
            },
            {
              "external_urls": {
                "spotify": "https://open.spotify.com/artist/7uStwCeP54Za8gXUFCf5L7"
              },
              "href": "https://api.spotify.com/v1/artists/7uStwCeP54Za8gXUFCf5L7",
              "id": "7uStwCeP54Za8gXUFCf5L7",
              "name": "The Jicks",
              "type": "artist",
              "uri": "spotify:artist:7uStwCeP54Za8gXUFCf5L7"
            }
          ],
          "available_markets": ["AR",
            "BO",
            "BR",
            "CA",
            "CL",
            "CO",
            "CR",
            "EC",
            "GT",
            "HK",
            "HN",
            "ID",
            "MX",
            "MY",
            "NI",
            "PA",
            "PE",
            "PH",
            "PY",
            "SG",
            "SV",
            "TH",
            "TW",
            "US",
            "UY",
            "VN"
          ],
          "disc_number": 1,
          "duration_ms": 222706,
          "explicit": False,
          "external_ids": {
            "isrc": "USMTD0877203"
          },
          "external_urls": {
            "spotify": "https://open.spotify.com/track/7fdUqrzb8oCcIoKvFuzMrs"
          },
          "href": "https://api.spotify.com/v1/tracks/7fdUqrzb8oCcIoKvFuzMrs",
          "id": "7fdUqrzb8oCcIoKvFuzMrs",
          "is_local": False,
          "name": "Cold Son",
          "popularity": 25,
          "preview_url": "https://p.scdn.co/mp3-preview/4cf4e21727def47097e27d30de16ffe9f99b7774?cid=be22fd00039241bc96d161a63876b54c",
          "track_number": 3,
          "type": "track",
          "uri": "spotify:track:7fdUqrzb8oCcIoKvFuzMrs"
        },
        ...
      ]
    }
    
    
    json_normalise(tracks_response)
    

    Output:

    album.album_typealbum.artistsalbum.available_marketsalbum.external_urls.spotifyalbum.hrefalbum.idalbum.imagesalbum.namealbum.release_datealbum.release_date_precision...external_urls.spotifyhrefidis_localnamepopularitypreview_urltrack_numbertypeuri
    0album[{'external_urls': {'spotify': 'https://open.s...[AR, BO, BR, CA, CL, CO, CR, EC, GT, HK, HN, I...https://open.spotify.com/album/6pWpb4IdPu9vp9m...https://api.spotify.com/v1/albums/6pWpb4IdPu9v...6pWpb4IdPu9vp9mOdh5DjY[{'height': 640, 'url': 'https://i.scdn.co/ima...Real Emotional Trash3/4/2008day...https://open.spotify.com/track/0BDYBajZydY54OT...https://api.spotify.com/v1/tracks/0BDYBajZydY5...0BDYBajZydY54OTgQsH940FALSEReal Emotional Trash21https://p.scdn.co/mp3-preview/4fcbcd5a99fc7590...4trackspotify:track:0BDYBajZydY54OTgQsH940
    1album[{'external_urls': {'spotify': 'https://open.s...[AR, BO, BR, CA, CL, CO, CR, EC, GT, HK, HN, I...https://open.spotify.com/album/6pWpb4IdPu9vp9m...https://api.spotify.com/v1/albums/6pWpb4IdPu9v...6pWpb4IdPu9vp9mOdh5DjY[{'height': 640, 'url': 'https://i.scdn.co/ima...Real Emotional Trash3/4/2008day...https://open.spotify.com/track/7fdUqrzb8oCcIoK...https://api.spotify.com/v1/tracks/7fdUqrzb8oCc...7fdUqrzb8oCcIoKvFuzMrsFALSECold Son25https://p.scdn.co/mp3-preview/4cf4e21727def470...3trackspotify:track:7fdUqrzb8oCcIoKvFuzMrs
    2album[{'external_urls': {'spotify': 'https://open.s...[AR, BO, BR, CA, CL, CO, CR, EC, GT, HK, HN, I...https://open.spotify.com/album/6pWpb4IdPu9vp9m...https://api.spotify.com/v1/albums/6pWpb4IdPu9v...6pWpb4IdPu9vp9mOdh5DjY[{'height': 640, 'url': 'https://i.scdn.co/ima...Real Emotional Trash3/4/2008day...https://open.spotify.com/track/0islTY4Fw6lhYbf...https://api.spotify.com/v1/tracks/0islTY4Fw6lh...0islTY4Fw6lhYbfqi8QtdjFALSEHopscotch Willie24https://p.scdn.co/mp3-preview/c7782dc6d7c0bb12...2trackspotify:track:0islTY4Fw6lhYbfqi8Qtdj
    3album[{'external_urls': {'spotify': 'https://open.s...[AR, BO, BR, CA, CL, CO, CR, EC, GT, HK, HN, I...https://open.spotify.com/album/6pWpb4IdPu9vp9m...https://api.spotify.com/v1/albums/6pWpb4IdPu9v...6pWpb4IdPu9vp9mOdh5DjY[{'height': 640, 'url': 'https://i.scdn.co/ima...Real Emotional Trash3/4/2008day...https://open.spotify.com/track/3jyFLbljUTKjE13...https://api.spotify.com/v1/tracks/3jyFLbljUTKj...3jyFLbljUTKjE13nIWXchHFALSEDragonfly Pie26https://p.scdn.co/mp3-preview/50f419e7d3e8a6a7...1trackspotify:track:3jyFLbljUTKjE13nIWXchH
    4album[{'external_urls': {'spotify': 'https://open.s...[AR, AU, BO, BR, CA, CL, CO, CR, DO, EC, GT, H...https://open.spotify.com/album/5DMvSCwRqfNVlMB...https://api.spotify.com/v1/albums/5DMvSCwRqfNV...5DMvSCwRqfNVlMB5LjHOwG[{'height': 640, 'url': 'https://i.scdn.co/ima...Sparkle Hard5/18/2018day...https://open.spotify.com/track/6dNmC2YWtWbVOFO...https://api.spotify.com/v1/tracks/6dNmC2YWtWbV...6dNmC2YWtWbVOFOdTuRDQsFALSEDifficulties - Let Them Eat Vowels35https://p.scdn.co/mp3-preview/787be9d1bbebcd84...11trackspotify:track:6dNmC2YWtWbVOFOdTuRDQs

    Separate Ways (Worlds Apart)

    By default, json_normalize() uses periods . to indicate nested levels of the JSON object (which is actually converted to a Python dict by Spotipy). In our case, the album id is found in track['album']['id'], hence the period between album and id in the DataFrame. This makes things slightly annoying if we want to grab a Series from our new DataFrame. In pandas, we can grab a Series from a DataFrame in many ways. To grab the album.id column, for example:

    tracks_df['album.id']
    

    Output:

    0    6pWpb4IdPu9vp9mOdh5DjY
    1    6pWpb4IdPu9vp9mOdh5DjY
    2    6pWpb4IdPu9vp9mOdh5DjY
    3    6pWpb4IdPu9vp9mOdh5DjY
    4    5DMvSCwRqfNVlMB5LjHOwG
    Name: album.id, dtype: object
    

    or

    tracks_df.loc[:,'album.id']
    

    Output:

    0    6pWpb4IdPu9vp9mOdh5DjY
    1    6pWpb4IdPu9vp9mOdh5DjY
    2    6pWpb4IdPu9vp9mOdh5DjY
    3    6pWpb4IdPu9vp9mOdh5DjY
    4    5DMvSCwRqfNVlMB5LjHOwG
    Name: album.id, dtype: object
    

    pandas also allows us to use dot notation (i.e. dataframe.column_name) to grab a column as a Series, but only if our column name doesn't include a period already. Since json_normalize() uses a period as a separator by default, this ruins that method. Never fear though – overriding this behavior is as simple as overriding the default argument in the function call:

    tracks_df = json_normalize(tracks_response['tracks'],sep="_")
    tracks_df
    

    Output:

    album_album_typealbum_artistsalbum_available_marketsalbum_external_urls_spotifyalbum_hrefalbum_idalbum_imagesalbum_namealbum_release_datealbum_release_date_precision...external_urls_spotifyhrefidis_localnamepopularitypreview_urltrack_numbertypeuri
    0album[{'external_urls': {'spotify': 'https://open.s...[AR, BO, BR, CA, CL, CO, CR, EC, GT, HK, HN, I...https://open.spotify.com/album/6pWpb4IdPu9vp9m...https://api.spotify.com/v1/albums/6pWpb4IdPu9v...6pWpb4IdPu9vp9mOdh5DjY[{'height': 640, 'url': 'https://i.scdn.co/ima...Real Emotional Trash3/4/2008day...https://open.spotify.com/track/0BDYBajZydY54OT...https://api.spotify.com/v1/tracks/0BDYBajZydY5...0BDYBajZydY54OTgQsH940FALSEReal Emotional Trash21https://p.scdn.co/mp3-preview/4fcbcd5a99fc7590...4trackspotify:track:0BDYBajZydY54OTgQsH940
    1album[{'external_urls': {'spotify': 'https://open.s...[AR, BO, BR, CA, CL, CO, CR, EC, GT, HK, HN, I...https://open.spotify.com/album/6pWpb4IdPu9vp9m...https://api.spotify.com/v1/albums/6pWpb4IdPu9v...6pWpb4IdPu9vp9mOdh5DjY[{'height': 640, 'url': 'https://i.scdn.co/ima...Real Emotional Trash3/4/2008day...https://open.spotify.com/track/7fdUqrzb8oCcIoK...https://api.spotify.com/v1/tracks/7fdUqrzb8oCc...7fdUqrzb8oCcIoKvFuzMrsFALSECold Son25https://p.scdn.co/mp3-preview/4cf4e21727def470...3trackspotify:track:7fdUqrzb8oCcIoKvFuzMrs
    2album[{'external_urls': {'spotify': 'https://open.s...[AR, BO, BR, CA, CL, CO, CR, EC, GT, HK, HN, I...https://open.spotify.com/album/6pWpb4IdPu9vp9m...https://api.spotify.com/v1/albums/6pWpb4IdPu9v...6pWpb4IdPu9vp9mOdh5DjY[{'height': 640, 'url': 'https://i.scdn.co/ima...Real Emotional Trash3/4/2008day...https://open.spotify.com/track/0islTY4Fw6lhYbf...https://api.spotify.com/v1/tracks/0islTY4Fw6lh...0islTY4Fw6lhYbfqi8QtdjFALSEHopscotch Willie24https://p.scdn.co/mp3-preview/c7782dc6d7c0bb12...2trackspotify:track:0islTY4Fw6lhYbfqi8Qtdj
    3album[{'external_urls': {'spotify': 'https://open.s...[AR, BO, BR, CA, CL, CO, CR, EC, GT, HK, HN, I...https://open.spotify.com/album/6pWpb4IdPu9vp9m...https://api.spotify.com/v1/albums/6pWpb4IdPu9v...6pWpb4IdPu9vp9mOdh5DjY[{'height': 640, 'url': 'https://i.scdn.co/ima...Real Emotional Trash3/4/2008day...https://open.spotify.com/track/3jyFLbljUTKjE13...https://api.spotify.com/v1/tracks/3jyFLbljUTKj...3jyFLbljUTKjE13nIWXchHFALSEDragonfly Pie26https://p.scdn.co/mp3-preview/50f419e7d3e8a6a7...1trackspotify:track:3jyFLbljUTKjE13nIWXchH
    4album[{'external_urls': {'spotify': 'https://open.s...[AR, AU, BO, BR, CA, CL, CO, CR, DO, EC, GT, H...https://open.spotify.com/album/5DMvSCwRqfNVlMB...https://api.spotify.com/v1/albums/5DMvSCwRqfNV...5DMvSCwRqfNVlMB5LjHOwG[{'height': 640, 'url': 'https://i.scdn.co/ima...Sparkle Hard5/18/2018day...https://open.spotify.com/track/6dNmC2YWtWbVOFO...https://api.spotify.com/v1/tracks/6dNmC2YWtWbV...6dNmC2YWtWbVOFOdTuRDQsFALSEDifficulties - Let Them Eat Vowels35https://p.scdn.co/mp3-preview/787be9d1bbebcd84...11trackspotify:track:6dNmC2YWtWbVOFOdTuRDQs

    Now we can go back to using dot notation to access a column as a Series. This saves us some typing every time we want to grab a column, and it looks a bit nicer (to me, at least). I say worth it.

    tracks_df.album_id
    

    Output:

    0    6pWpb4IdPu9vp9mOdh5DjY
    1    6pWpb4IdPu9vp9mOdh5DjY
    2    6pWpb4IdPu9vp9mOdh5DjY
    3    6pWpb4IdPu9vp9mOdh5DjY
    4    5DMvSCwRqfNVlMB5LjHOwG
    Name: album_id, dtype: object
    

    I Need That Record

    By including more parameters when we use json_normlize(), we can really extract just the data that we want from our API response.

    From our responses above, we can see that the artist property contains a list of artists that are associated with a track:

    tracks_response['tracks'][0]['artists']
    

    Output:

    [{
        "external_urls": {
          "spotify": "https://open.spotify.com/artist/7wyRA7deGRxozTyBc6QXPe"
        },
        "href": "https://api.spotify.com/v1/artists/7wyRA7deGRxozTyBc6QXPe",
        "id": "7wyRA7deGRxozTyBc6QXPe",
        "name": "Stephen Malkmus & The Jicks",
        "type": "artist",
        "uri": "spotify:artist:7wyRA7deGRxozTyBc6QXPe"
      },
      {
        "external_urls": {
          "spotify": "https://open.spotify.com/artist/0WISkx0PwT6lYWdPqKUJY8"
        },
        "href": "https://api.spotify.com/v1/artists/0WISkx0PwT6lYWdPqKUJY8",
        "id": "0WISkx0PwT6lYWdPqKUJY8",
        "name": "Stephen Malkmus",
        "type": "artist",
        "uri": "spotify:artist:0WISkx0PwT6lYWdPqKUJY8"
      },
      {
        "external_urls": {
          "spotify": "https://open.spotify.com/artist/7uStwCeP54Za8gXUFCf5L7"
        },
        "href": "https://api.spotify.com/v1/artists/7uStwCeP54Za8gXUFCf5L7",
        "id": "7uStwCeP54Za8gXUFCf5L7",
        "name": "The Jicks",
        "type": "artist",
        "uri": "spotify:artist:7uStwCeP54Za8gXUFCf5L7"
      }
    ]
    

    Let's say I want to load this data into a database later. It would be nice to have a join table that maps each of the artists that are associated with each track. Luckily, this is possible with json_normalize()'s record_path and meta parameters.

    record_path tells json_normalize() what path of keys leads to each individual record in the JSON object. In our case, we want to grab every artist id, so our function call will look like:

    json_normalize(tracks_response['tracks'],record_path=['artists'],sep="_")
    

    external_urls href id name type uri
    1 {'spotify': 'https://open.spotify.com/artist/7... https://api.spotify.com/v1/artists/7wyRA7deGRx... 7wyRA7deGRxozTyBc6QXPe Stephen Malkmus & The Jicks artist spotify:artist:7wyRA7deGRxozTyBc6QXPe
    1 {'spotify': 'https://open.spotify.com/artist/0... https://api.spotify.com/v1/artists/0WISkx0PwT6... 0WISkx0PwT6lYWdPqKUJY8 Stephen Malkmus artist spotify:artist:0WISkx0PwT6lYWdPqKUJY8
    2 {'spotify': 'https://open.spotify.com/artist/7... https://api.spotify.com/v1/artists/7uStwCeP54Z... 7uStwCeP54Za8gXUFCf5L7 The Jicks artist spotify:artist:7uStwCeP54Za8gXUFCf5L7
    3 {'spotify': 'https://open.spotify.com/artist/7... https://api.spotify.com/v1/artists/7wyRA7deGRx... 7wyRA7deGRxozTyBc6QXPe Stephen Malkmus & The Jicks artist spotify:artist:7wyRA7deGRxozTyBc6QXPe
    4 {'spotify': 'https://open.spotify.com/artist/0... https://api.spotify.com/v1/artists/0WISkx0PwT6... 0WISkx0PwT6lYWdPqKUJY8 Stephen Malkmus artist spotify:artist:0WISkx0PwT6lYWdPqKUJY8
    5 {'spotify': 'https://open.spotify.com/artist/7... https://api.spotify.com/v1/artists/7uStwCeP54Z... 7uStwCeP54Za8gXUFCf5L7 The Jicks artist spotify:artist:7uStwCeP54Za8gXUFCf5L7
    6 {'spotify': 'https://open.spotify.com/artist/7... https://api.spotify.com/v1/artists/7wyRA7deGRx... 7wyRA7deGRxozTyBc6QXPe Stephen Malkmus & The Jicks artist spotify:artist:7wyRA7deGRxozTyBc6QXPe
    7 {'spotify': 'https://open.spotify.com/artist/0... https://api.spotify.com/v1/artists/0WISkx0PwT6... 0WISkx0PwT6lYWdPqKUJY8 Stephen Malkmus artist spotify:artist:0WISkx0PwT6lYWdPqKUJY8
    8 {'spotify': 'https://open.spotify.com/artist/7... https://api.spotify.com/v1/artists/7uStwCeP54Z... 7uStwCeP54Za8gXUFCf5L7 The Jicks artist spotify:artist:7uStwCeP54Za8gXUFCf5L7
    9 {'spotify': 'https://open.spotify.com/artist/7... https://api.spotify.com/v1/artists/7wyRA7deGRx... 7wyRA7deGRxozTyBc6QXPe Stephen Malkmus & The Jicks artist spotify:artist:7wyRA7deGRxozTyBc6QXPe
    10 {'spotify': 'https://open.spotify.com/artist/0... https://api.spotify.com/v1/artists/0WISkx0PwT6... 0WISkx0PwT6lYWdPqKUJY8 Stephen Malkmus artist spotify:artist:0WISkx0PwT6lYWdPqKUJY8
    11 {'spotify': 'https://open.spotify.com/artist/7... https://api.spotify.com/v1/artists/7uStwCeP54Z... 7uStwCeP54Za8gXUFCf5L7 The Jicks artist spotify:artist:7uStwCeP54Za8gXUFCf5L7
    12 {'spotify': 'https://open.spotify.com/artist/7... https://api.spotify.com/v1/artists/7wyRA7deGRx... 7wyRA7deGRxozTyBc6QXPe Stephen Malkmus & The Jicks artist spotify:artist:7wyRA7deGRxozTyBc6QXPe

    Cool – we're almost there. Now we want to use the meta parameter to specify what data we want to include from the rest of the JSON object. In our case, we want to keep the track id and map it to the artist id. If we look back at our API response, the name of the column that included the track is is called, appropriately, id, so our full function call should look like this:

    json_normalize(tracks_response['tracks'],record_path=['artists'],meta=['id'],sep="_")
    

    Output:

    -----------------------------------------
    ValueError                   Traceback (most recent call last)
    
        <ipython-input-14-77e00a98c3c0> in <module>()
        ----> 1 json_normalize(tracks_response['tracks'],record_path=['artists'],meta=['id'],sep="_")
    
        ~/anaconda3/envs/music_data/lib/python3.6/site-packages/pandas/io/json/normalize.py in json_normalize(data, record_path, meta, meta_prefix, record_prefix, errors, sep)
            268         if k in result:
            269             raise ValueError('Conflicting metadata name {name}, '
        --> 270                              'need distinguishing prefix '.format(name=k))
            271 
            272         result[k] = np.array(v).repeat(lengths)
        
    ValueError: Conflicting metadata name id, need distinguishing prefix 
    

    Uh oh – an error! What's going on? Well, it turns out that both the album id and track id were given the key id. pandas doesn't like that, and it gives us a helpful error to tell us so: ValueError: Conflicting metadata name id, need distinguishing prefix.

    There are two more parameters we can use to overcome this error: record_prefix and meta_prefix. These are strings we'll add to the beginning of our records and metadata to prevent these naming conflicts. Since we're dealing with Spotify artist ids for our records and Spotify track ids as the metadata, I'll use sp_artist_ and sp_track_ respectively. When that's done, I'll select only the columns that we're interested in.

    artist_and_track = json_normalize(
        data=tracks_response['tracks'],
        record_path='artists',
        meta=['id'],
        record_prefix='sp_artist_',
        meta_prefix='sp_track_',
        sep="_"
    )
    artist_and_track = artist_and_track[['sp_track_id','sp_artist_id']]
    artist_and_track
    

    Output:

    sp_track_id sp_artist_id
    00BDYBajZydY54OTgQsH940 7wyRA7deGRxozTyBc6QXPe
    10BDYBajZydY54OTgQsH940 0WISkx0PwT6lYWdPqKUJY8
    20BDYBajZydY54OTgQsH940 7uStwCeP54Za8gXUFCf5L7
    37fdUqrzb8oCcIoKvFuzMrs 7wyRA7deGRxozTyBc6QXPe
    47fdUqrzb8oCcIoKvFuzMrs 0WISkx0PwT6lYWdPqKUJY8
    57fdUqrzb8oCcIoKvFuzMrs 7uStwCeP54Za8gXUFCf5L7
    60islTY4Fw6lhYbfqi8Qtdj 7wyRA7deGRxozTyBc6QXPe
    70islTY4Fw6lhYbfqi8Qtdj 0WISkx0PwT6lYWdPqKUJY8
    80islTY4Fw6lhYbfqi8Qtdj 7uStwCeP54Za8gXUFCf5L7
    93jyFLbljUTKjE13nIWXchH 7wyRA7deGRxozTyBc6QXPe
    103jyFLbljUTKjE13nIWXchH 0WISkx0PwT6lYWdPqKUJY8
    113jyFLbljUTKjE13nIWXchH 7uStwCeP54Za8gXUFCf5L7
    126dNmC2YWtWbVOFOdTuRDQs 7wyRA7deGRxozTyBc6QXPe

    TL;DR

    • Use pd.io.json.json_normalize() to automagically flatten a nested JSON object into a DataFrame
    • Make your life slightly easier when it comes to selecting columns by overriding the default sep parameter
    • Specify what data constitutes a record with the record_path parameter
    • Include data from outside of the record path with the meta parameter
    • Fix naming conflicts if they arise with the record_prefix and meta_prefix parameters
    Graham Beckley's' avatar
    Philly
    Loves Python; loves pandas; leaves every project more Pythonic than he found it.

    Loves Python; loves pandas; leaves every project more Pythonic than he found it.