Tableau's REST API: Turning Tableau into an ETL Pipeline GUI

Organizing a heist on Tableau Server to reclaim workbook data.

    There's nothing I love more than exposing expensive enterprise software.

    It may not seem obvious, but most SaaS products have an underlying core goal: shackle businesses to depend on proprietary, closed-source, costly software. When you pair a surplus of money with a reluctance to work, you've arrived at Corporate America: a prime victim yearning to marry itself to any vendor with a nice pitch deck and a vague promise.

    In the case of Tableau, this becomes obvious when you attempt to do anything besides create visuals. I don't like spending hours of my time cleaning data to be rewarded with a shitty iframe embed: I want my data. As we've already seen by exposing Tableau's hidden Superadmin access, it's pretty clear Tableau doesn't want you to do this.

    I realize Tableau is a BI tool, and some might argue we're barking up the wrong tree, and all data should be clean before reaching Tableau. My sentiment is this: fuck that. If a single license costs one thousand dollars, and we have the power to manipulate data faster as we visualize it, we should at least be able to own that data: and by "own," I don't mean a CSV export. I want it in my own database of choice, not a locked down and hidden Postgres database living on a VPS filled with Tableau stuff.

    Here's how we'd do that.

    "You Expect us to Just Walk out the Casino with Millions of Dollars on us?"

    You're looking at a Boeski, a Jim Brown, a Miss Daisy, two Jethros and a Leon Spinks, not to mention the biggest Ella Fitzgerald ever.

    You're here because you're the best of the best. If you're feeling scared, feel free to back out now.

    This tutorial assumes you have a Tableau Server instance, with a workbook published to a site within said instance. We're going to take a page out of that workbook and turn the raw data into a database table. FAIR WARNING: We're about to dive deep into the obscure world of the Tableau Server REST API. It's clunky, it's ugly, and it returns XML. Strap yourself in.

    We're going to be working with 3 core endpoints. Let's walk through them, and I'll show you how to exploit said endpoints to create a ruthless data mining machine in Python.

    'Tableau Authorization' Endpoint

    Like all obnoxious (aka useful) APIs, we need to authorize each API call with a temporary token. Of course, we'll just have Python generate said token for every call we make.

    POST: http://[MyTableauServerURL]/api/3.0/auth/signin

    Hitting this endpoint successfully will result in an XML response (ugh). The response should look something like this:

    <?xml version='1.0' encoding='UTF-8'?>
    <tsResponse xmlns="http://tableau.com/api" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://tableau.com/api http://tableau.com/api/ts-api-3.0.xsd">
        <credentials token="KBIuvu6FTViuyivuTUR^yfvgTUycvjGubgc">
            <site id="09Hiugv-345-45d0-b48b-34543giuyvg" contentUrl="hackers"/>
            <user id="Uohiiyu-3455-8675-9b42-bugvdr876gv"/>
        </credentials>
    </tsResponse>
    

    There are a few things worth noting here. The first is a marvel of modern technology: this is perhaps the shittiest response to a token API call in modern history. Beyond that, we need two things from this response:

    • The token is required for every API call from here on out. It is intended to be passed as a header value with the key X-Tableau-Auth.
    • The site ID is what we'll be using to look up the location of our workbooks in our server instance. This is added to the URL of future API calls (again, impressively shitty design here).
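    Here's a minimal sketch of digging those two values out of the response with Python's standard library. The sign-in request itself is sketched in the comments; the server URL and credentials there are placeholders, not real endpoints:

```python
import xml.etree.ElementTree as ET

# Tableau namespaces every element in its responses, so tag names
# must be fully qualified when parsing.
TS = "{http://tableau.com/api}"

def parse_credentials(xml_text):
    """Pull the auth token and site ID out of a signin response."""
    root = ET.fromstring(xml_text)
    credentials = root.find(f"{TS}credentials")
    site = credentials.find(f"{TS}site")
    return credentials.attrib["token"], site.attrib["id"]

# To actually sign in (requires the `requests` package):
# body = ('<tsRequest><credentials name="USERNAME" password="PASSWORD">'
#         '<site contentUrl="CONTENT_URL"/></credentials></tsRequest>')
# resp = requests.post('http://MY_SERVER/api/3.0/auth/signin', data=body)
# token, site_id = parse_credentials(resp.text)
```

    Every subsequent call then carries the token in an X-Tableau-Auth header.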

    'List All Views by Site' Endpoint

    There are actually a number of methods we could use to retrieve views, but we're specifically settling on listing our views by 'site,' in the Tableau sense of the word. If you're unfamiliar, a Tableau site is not a site at all: it's more of a project within a greater Tableau instance. They probably should've just named them that.

    GET: http://[MyTableauServerURL]/api/3.0/sites/[MySiteID]/views

    As mentioned, we use the site ID from step 1 to construct this endpoint. In my particular instance, I've only saved a single workbook for simplicity's sake. The response for such a case is as follows:

    <?xml version='1.0' encoding='UTF-8'?>
    <tsResponse xmlns="http://tableau.com/api" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://tableau.com/api http://tableau.com/api/ts-api-3.0.xsd">
        <pagination pageNumber="1" pageSize="100" totalAvailable="1"/>
        <views>
            <view id="9a4a1de9-b7af-4a4a-8556-fd5ac82f92bd" name="Jira" contentUrl="JiraIssues/sheets/Jira" createdAt="2018-12-21T09:11:39Z" updatedAt="2018-12-21T09:11:39Z">
                <workbook id="208a0c4e-e1d9-4852-9d19-7a2fe2717191"/>
                <owner id="Uohiiyu-3455-8675-9b42-bugvdr876gv"/>
                <project id="4d1ca337-20b4-442c-aa7b-1dfd470b68bd"/>
                <tags/>
            </view>
        </views>
    </tsResponse>
    

    Check out the views node: when we make this API call, <views> will contain a list of every view saved to the specified site. Keep in mind that a view is equivalent to a "sheet" of a workbook: in almost any case, you will have many views listed here.

    My sheet happens to be called "Jira," as stated by name="Jira". The thing we really need however is the view id attribute: this will be used in our third and final API call.
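    Flattening that XML into something workable is the same namespace dance as before. A quick sketch, which pulls out the attributes we'll actually use (pass it the raw XML text of the response):

```python
import xml.etree.ElementTree as ET

TS = "{http://tableau.com/api}"

def parse_views(xml_text):
    """Flatten a 'List Views' response into a list of dicts."""
    root = ET.fromstring(xml_text)
    views = []
    for view in root.iter(f"{TS}view"):
        views.append({
            "id": view.attrib.get("id"),
            "name": view.attrib.get("name"),
            "contentUrl": view.attrib.get("contentUrl"),
        })
    return views
```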

    'Get View Data' Endpoint

    Now let's get the raw data from a view of our choice.

    GET: http://[MyTableauServerURL]/api/3.0/sites/[MySiteID]/views/[MyViewID]/data
    

    Here's where we hit pay dirt. This request will result in an output of comma-separated values; I don't need to tell you what we can do with comma-separated values. Here's what my response looks like after formatting it as a table:

    Current Assignee Current Status Day of Updated epic_color epic_name Issue Type issuetype_color issuetype_url key Priority project summary
    Todd Birchard Done June 7, 2018 #42526E Widgets Bug #db5d5d https://hackers.nyc3.digitaloceanspaces.com/bug.png HACK-96 Lowest Hackers and Slackers "Recent Posts" widget does not have link rollover
    Todd Birchard Backlog June 15, 2018 #57D9A3 Page Templates Task #73B0E1 https://hackers.nyc3.digitaloceanspaces.com/task.png HACK-32 Lowest Hackers and Slackers “Join” page
    Todd Birchard Done November 13, 2018 #42526E Widgets Task #73B0E1 https://hackers.nyc3.digitaloceanspaces.com/task.png HACK-543 Medium Hackers and Slackers Add “pro tip” box
    Todd Birchard To Do December 14, 2018 #679EEF SEO Major Functionality #93d171 https://hackers.nyc3.digitaloceanspaces.com/story.png HACK-656 Low Hackers and Slackers Add alt attributes to images vis clarifai
    Todd Birchard Backlog October 16, 2018 #FDDA3E Accounts Major Functionality #93d171 https://hackers.nyc3.digitaloceanspaces.com/story.png HACK-473 Medium Hackers and Slackers Add avatar selection to signup
    Todd Birchard Done November 13, 2018 #57D9A3 Page Templates Sub-task #92BFE5 https://hackers.nyc3.digitaloceanspaces.com/subtask.png HACK-231 Medium Hackers and Slackers Add blurb to each post page explaining what these are
    Todd Birchard Done December 10, 2018 #291BA9 Code snippets Task #73B0E1 https://hackers.nyc3.digitaloceanspaces.com/task.png HACK-452 Medium Hackers and Slackers Add color styles for json snippets

    That's right, a table. Databases are composed of tables. Perhaps you see where I'm going with this.
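    The full class below loads this payload into pandas, but the standard library's csv module is already enough to turn the response body into one dict per row. A small sketch (the column names in the usage comment come from the sample table above):

```python
import csv
import io

def rows_from_view_csv(csv_text):
    """Parse the 'Get View Data' CSV payload into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))

# rows = rows_from_view_csv(response.text)
# rows[0]["key"], rows[0]["summary"], etc.
```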

    "There's a Ninety-five Pound Chinese Man with a Hundred Sixty Million Dollars Behind this Door."

    Let's get him out.

    We've got the goods, but calling all these individual endpoints manually does nothing for us. We don't want to steal a single view, we want to systematically rob Tableau of its views on a scheduler and shanghai them off to a database of our choosing.

    It would be a crime not to automate this, so I've created a class containing all the relevant methods we'd want when it comes to interacting with Tableau's REST API:

    import io
    import xml.etree.ElementTree as ET

    import pandas as pd
    import requests

    from . import r  # Key/value config store holding connection details.


    class ExtractTableauView:
        """Class for interacting with the Tableau Server REST API."""

        __baseurl = r.get('baseurl')
        __username = r.get('username')
        __password = r.get('password')
        __database = r.get('uri')
        __contenturl = r.get('contenturl')

        @classmethod
        def get_view(cls, site, view, token):
            """Extract the contents of a single view as a DataFrame."""
            headers = {'X-Tableau-Auth': token}
            req = requests.get(
                f'{cls.__baseurl}/api/3.2/sites/{site}/views/{view}/data',
                headers=headers,
                stream=True
            )
            return pd.read_csv(io.StringIO(req.text), header=0)

        @classmethod
        def list_views(cls, site, token):
            """List all views belonging to a Tableau site."""
            headers = {'X-Tableau-Auth': token}
            req = requests.get(
                f'{cls.__baseurl}/api/3.2/sites/{site}/views',
                headers=headers
            )
            root = ET.fromstring(req.content)
            views_arr = []
            for view in root.iter('{http://tableau.com/api}view'):
                views_arr.append({
                    'name': view.attrib.get('name'),
                    'id': view.attrib.get('id'),
                    'url': f"{cls.__baseurl}/{view.attrib.get('contentUrl')}",
                    'created': view.attrib.get('createdAt'),
                    'updated': view.attrib.get('updatedAt')
                })
            return views_arr

        @classmethod
        def get_token(cls, xml):
            """Pull the auth token out of the signin response."""
            credentials = xml.find('{http://tableau.com/api}credentials')
            return credentials.attrib.get('token')

        @classmethod
        def get_site(cls, xml):
            """Retrieve the ID of our Tableau 'site' from the signin response."""
            site = xml.find('.//{http://tableau.com/api}site')
            return site.attrib.get('id')

        @classmethod
        def initialize_tableau_request(cls):
            """Sign in and return the response XML root for later parsing."""
            headers = {'Content-Type': 'application/xml'}
            body = (
                f'<tsRequest><credentials name="{cls.__username}" '
                f'password="{cls.__password}">'
                f'<site contentUrl="{cls.__contenturl}"/></credentials></tsRequest>'
            )
            req = requests.post(
                f'{cls.__baseurl}/api/3.2/auth/signin',
                headers=headers,
                data=body
            )
            return ET.fromstring(req.content)
    

    The above snippet is a Python class wrapping all the API endpoints we explored. Its methods cover the grunt work of:

    • Generating a token
    • Getting your (unfriendly) site ID
    • Listing all views belonging to the provided site
    • Retrieving data from a worksheet of choice

    Get a list of views in your Tableau site with the list_views() method. When you see the view you want, pass its ID to the get_view() method, which returns all of the view's raw data as a CSV loaded into a DataFrame.

    How to Pull a Heist (Final Chapter): Storing in Offshore Accounts

    To earn your title as a true con artist, I'm leaving the final step up to you. You've escaped with the loot, but you'll need to put all that data somewhere. This should be a trivial matter of automating a simple database query, but the specifics are up to you.
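    If you want a starting point, here's one way that last step could look, using SQLite as a stand-in for your database of choice. The table name and the list-of-dicts row format are assumptions for illustration; every column is naively typed as TEXT, since the real schema is yours to decide:

```python
import sqlite3

def load_rows(db_path, table, rows):
    """Create a table matching the rows' keys and bulk-insert the rows."""
    columns = list(rows[0].keys())
    con = sqlite3.connect(db_path)
    # Quote identifiers so CSV headers like "key" don't collide with SQL keywords.
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    con.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)
    con.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(row[c] for c in columns) for row in rows],
    )
    con.commit()
    con.close()
```

    Swap SQLite for Postgres, MySQL, whatever: the shape of the job is identical.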

    If you're ready to liberate your data, feel free to grab the source off GitHub and go nuts.

    Todd Birchard (New York City)
    Product manager turned engineer with an ongoing identity crisis. Breaks everything before learning best practices. Completely normal and emotionally stable.