Tableau's REST API: Turning Tableau into an ETL Pipeline GUI

Organizing a heist on Tableau Server to reclaim workbook data.

    There's nothing I love more than exposing expensive enterprise software.

    It may not seem obvious, but most SaaS products have an underlying core goal: shackle businesses to depend on proprietary, closed-source, costly software. When you pair a surplus of money with a reluctance to work, you've arrived at Corporate America: a prime victim yearning to marry itself to any vendor with a nice pitch deck and a vague promise.

    In the case of Tableau, this becomes obvious when you attempt to do anything besides create visuals. I don't like spending hours of my time cleaning data to be rewarded with a shitty iframe embed: I want my data. As we've already seen by exposing Tableau's hidden Superadmin access, it's pretty clear Tableau doesn't want you to do this.

    I realize Tableau is a BI tool, and some might argue we're barking up the wrong tree, and all data should be clean before reaching Tableau. My sentiment is this: fuck that. If a single license costs one thousand dollars, and we have the power to manipulate data faster as we visualize it, we should at least be able to own that data: and by "own," I don't mean a CSV export. I want it in my own database of choice, not a locked down and hidden Postgres database living on a VPS filled with Tableau stuff.

    Here's how we'd do that.

    "You Expect us to Just Walk out the Casino with Millions of Dollars on us?"

    You're looking at a Boeski, a Jim Brown, a Miss Daisy, two Jethros and a Leon Spinks, not to mention the biggest Ella Fitzgerald ever.

    You're here because you're the best of the best. If you're feeling scared, feel free to back out now.

    This tutorial assumes you have a Tableau Server instance, with a workbook published to a site within said instance. We're going to take a page out of that workbook and turn the raw data into a database table. FAIR WARNING: We're about to dive deep into the obscure world of the Tableau Server REST API. It's clunky, it's ugly, and it returns XML. Strap yourself in.

    We're going to be working with 3 core endpoints. Let's walk through them, and I'll show you how to exploit said endpoints to create a ruthless data mining machine in Python.

    'Tableau Authorization' Endpoint

    Like all obnoxious (aka useful) APIs, we need to authorize each API call with a temporary token. Of course, we'll just have Python generate said token for every call we make.

    POST: http://[MyTableauServerURL]/api/3.0/auth/signin

    Hitting this endpoint successfully will result in an XML response (ugh). The response should look something like this:

    <?xml version='1.0' encoding='UTF-8'?>
    <tsResponse xmlns="http://tableau.com/api" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://tableau.com/api http://tableau.com/api/ts-api-3.0.xsd">
        <credentials token="KBIuvu6FTViuyivuTUR^yfvgTUycvjGubgc">
            <site id="09Hiugv-345-45d0-b48b-34543giuyvg" contentUrl="hackers"/>
            <user id="Uohiiyu-3455-8675-9b42-bugvdr876gv"/>
        </credentials>
    </tsResponse>
    

    There are a few things worth noting here. The first is a marvel of modern technology: this is perhaps the shittiest response to a token API call in modern history. Beyond that, we need two things from this response:

    • The token is required for every API call from here on out. It is intended to be passed as a header value with the key X-Tableau-Auth.
    • The site ID is what we'll be using to look up the location of our workbooks in our server instance. This is added to the URL of future API calls (again, impressively shitty design here).
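    Here's a minimal sketch of digging those two values out of the response with Python's standard library. The sign-in request itself is sketched in the comments; the server URL and credentials there are placeholders, not real endpoints:

```python
import xml.etree.ElementTree as ET

# Tableau namespaces every element in its responses, so tag names
# must be fully qualified when parsing.
TS = "{http://tableau.com/api}"

def parse_credentials(xml_text):
    """Pull the auth token and site ID out of a signin response."""
    root = ET.fromstring(xml_text)
    credentials = root.find(f"{TS}credentials")
    site = credentials.find(f"{TS}site")
    return credentials.attrib["token"], site.attrib["id"]

# To actually sign in (requires the `requests` package):
# body = ('<tsRequest><credentials name="USERNAME" password="PASSWORD">'
#         '<site contentUrl="CONTENT_URL"/></credentials></tsRequest>')
# resp = requests.post('http://MY_SERVER/api/3.0/auth/signin', data=body)
# token, site_id = parse_credentials(resp.text)
```

    Every subsequent call then carries the token in an X-Tableau-Auth header.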

    'List All Views by Site' Endpoint

    There are actually a number of methods we could use to retrieve views, but we're specifically settling on listing our views by 'site,' in the Tableau sense of the word. If you're unfamiliar, a Tableau site is not a site at all: it's more of a project within a greater Tableau instance. They probably should've just named them that.

    GET: http://[MyTableauServerURL]/api/3.0/sites/[MySiteID]/views

    As mentioned, we use the site ID from step 1 to construct this endpoint. In my particular instance, I've only saved a single workbook for simplicity's sake. The response for such a case is as follows:

    <?xml version='1.0' encoding='UTF-8'?>
    <tsResponse xmlns="http://tableau.com/api" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://tableau.com/api http://tableau.com/api/ts-api-3.0.xsd">
        <pagination pageNumber="1" pageSize="100" totalAvailable="1"/>
        <views>
            <view id="9a4a1de9-b7af-4a4a-8556-fd5ac82f92bd" name="Jira" contentUrl="JiraIssues/sheets/Jira" createdAt="2018-12-21T09:11:39Z" updatedAt="2018-12-21T09:11:39Z">
                <workbook id="208a0c4e-e1d9-4852-9d19-7a2fe2717191"/>
                <owner id="Uohiiyu-3455-8675-9b42-bugvdr876gv"/>
                <project id="4d1ca337-20b4-442c-aa7b-1dfd470b68bd"/>
                <tags/>
            </view>
        </views>
    </tsResponse>
    

    Check out the views node: when we make this API call, <views> will contain a list of every view saved to the specified site. Keep in mind that a view is equivalent to a "sheet" of a workbook: in almost any case, you will have many views listed here.

    My sheet happens to be called "Jira," as stated by name="Jira". The thing we really need however is the view id attribute: this will be used in our third and final API call.
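    Flattening that XML into something workable is the same namespace dance as before. A quick sketch, which pulls out the attributes we'll actually use (pass it the raw XML text of the response):

```python
import xml.etree.ElementTree as ET

TS = "{http://tableau.com/api}"

def parse_views(xml_text):
    """Flatten a 'List Views' response into a list of dicts."""
    root = ET.fromstring(xml_text)
    views = []
    for view in root.iter(f"{TS}view"):
        views.append({
            "id": view.attrib.get("id"),
            "name": view.attrib.get("name"),
            "contentUrl": view.attrib.get("contentUrl"),
        })
    return views
```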

    'Get View Data' Endpoint

    Now let's get the raw data from a view of our choice.

    GET: http://[MyTableauServerURL]/api/3.0/sites/[MySiteID]/views/[MyViewID]/data
    

    Here's where we hit pay dirt. This request will result in an output of comma-separated values; I don't need to tell you what we can do with comma-separated values. Here's what my response looks like after formatting it as a table:

    Current Assignee Current Status Day of Updated epic_color epic_name Issue Type issuetype_color issuetype_url key Priority project summary
    Todd Birchard Done June 7, 2018 #42526E Widgets Bug #db5d5d https://hackers.nyc3.digitaloceanspaces.com/bug.png HACK-96 Lowest Hackers and Slackers "Recent Posts" widget does not have link rollover
    Todd Birchard Backlog June 15, 2018 #57D9A3 Page Templates Task #73B0E1 https://hackers.nyc3.digitaloceanspaces.com/task.png HACK-32 Lowest Hackers and Slackers “Join” page
    Todd Birchard Done November 13, 2018 #42526E Widgets Task #73B0E1 https://hackers.nyc3.digitaloceanspaces.com/task.png HACK-543 Medium Hackers and Slackers Add “pro tip” box
    Todd Birchard To Do December 14, 2018 #679EEF SEO Major Functionality #93d171 https://hackers.nyc3.digitaloceanspaces.com/story.png HACK-656 Low Hackers and Slackers Add alt attributes to images vis clarifai
    Todd Birchard Backlog October 16, 2018 #FDDA3E Accounts Major Functionality #93d171 https://hackers.nyc3.digitaloceanspaces.com/story.png HACK-473 Medium Hackers and Slackers Add avatar selection to signup
    Todd Birchard Done November 13, 2018 #57D9A3 Page Templates Sub-task #92BFE5 https://hackers.nyc3.digitaloceanspaces.com/subtask.png HACK-231 Medium Hackers and Slackers Add blurb to each post page explaining what these are
    Todd Birchard Done December 10, 2018 #291BA9 Code snippets Task #73B0E1 https://hackers.nyc3.digitaloceanspaces.com/task.png HACK-452 Medium Hackers and Slackers Add color styles for json snippets

    That's right, a table. Databases are composed of tables. Perhaps you see where I'm going with this.
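    The full class below loads this payload into pandas, but the standard library's csv module is already enough to turn the response body into one dict per row. A small sketch (the column names in the usage comment come from the sample table above):

```python
import csv
import io

def rows_from_view_csv(csv_text):
    """Parse the 'Get View Data' CSV payload into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))

# rows = rows_from_view_csv(response.text)
# rows[0]["key"], rows[0]["summary"], etc.
```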

    "There's a Ninety-five Pound Chinese Man with a Hundred Sixty Million Dollars Behind this Door."

    Let's get him out.

    We've got the goods, but calling all these individual endpoints manually does nothing for us. We don't want to steal a single view, we want to systematically rob Tableau of its views on a scheduler and shanghai them off to a database of our choosing.

    It would be a crime not to automate this, so I've created a class containing all the relevant methods we'd want when it comes to interacting with Tableau's REST API:

    import io
    import xml.etree.ElementTree as ET

    import pandas as pd
    import requests

    from . import r  # Key/value config store holding connection details.


    class ExtractTableauView:
        """Class for interacting with the Tableau Server REST API."""

        __baseurl = r.get('baseurl')
        __username = r.get('username')
        __password = r.get('password')
        __database = r.get('uri')
        __contenturl = r.get('contenturl')

        @classmethod
        def get_view(cls, site, view, token):
            """Extract the contents of a single view as a DataFrame."""
            headers = {'X-Tableau-Auth': token}
            req = requests.get(
                f'{cls.__baseurl}/api/3.2/sites/{site}/views/{view}/data',
                headers=headers,
                stream=True
            )
            return pd.read_csv(io.StringIO(req.text), header=0)

        @classmethod
        def list_views(cls, site, token):
            """List all views belonging to a Tableau site."""
            headers = {'X-Tableau-Auth': token}
            req = requests.get(
                f'{cls.__baseurl}/api/3.2/sites/{site}/views',
                headers=headers
            )
            root = ET.fromstring(req.content)
            views_arr = []
            for view in root.iter('{http://tableau.com/api}view'):
                views_arr.append({
                    'name': view.attrib.get('name'),
                    'id': view.attrib.get('id'),
                    'url': f"{cls.__baseurl}/{view.attrib.get('contentUrl')}",
                    'created': view.attrib.get('createdAt'),
                    'updated': view.attrib.get('updatedAt')
                })
            return views_arr

        @classmethod
        def get_token(cls, xml):
            """Pull the auth token out of the signin response."""
            credentials = xml.find('{http://tableau.com/api}credentials')
            return credentials.attrib.get('token')

        @classmethod
        def get_site(cls, xml):
            """Retrieve the ID of our Tableau 'site' from the signin response."""
            site = xml.find('.//{http://tableau.com/api}site')
            return site.attrib.get('id')

        @classmethod
        def initialize_tableau_request(cls):
            """Sign in and return the response XML root for later parsing."""
            headers = {'Content-Type': 'application/xml'}
            body = (
                f'<tsRequest><credentials name="{cls.__username}" '
                f'password="{cls.__password}">'
                f'<site contentUrl="{cls.__contenturl}"/></credentials></tsRequest>'
            )
            req = requests.post(
                f'{cls.__baseurl}/api/3.2/auth/signin',
                headers=headers,
                data=body
            )
            return ET.fromstring(req.content)
    

    The above snippet is a Python class wrapping all the API endpoints we explored. Its methods cover the grunt work of:

    • Generating a token
    • Getting your (unfriendly) site ID
    • Listing all views belonging to the provided site
    • Retrieving data from a worksheet of choice

    Get a list of views in your Tableau site with the list_views() method. When you see the view you want, pass its ID to the get_view() method, which returns all of the view's raw data as a CSV loaded into a DataFrame.

    How to Pull a Heist (Final Chapter): Storing in Offshore Accounts

    To earn your title as a true con artist, I'm leaving the final step up to you. You've escaped with the loot, but you'll need to put all that data somewhere. This should be a trivial matter of automating a simple database query, but the specifics are up to you.
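    If you want a starting point, here's one way that last step could look, using SQLite as a stand-in for your database of choice. The table name and the list-of-dicts row format are assumptions for illustration; every column is naively typed as TEXT, since the real schema is yours to decide:

```python
import sqlite3

def load_rows(db_path, table, rows):
    """Create a table matching the rows' keys and bulk-insert the rows."""
    columns = list(rows[0].keys())
    con = sqlite3.connect(db_path)
    # Quote identifiers so CSV headers like "key" don't collide with SQL keywords.
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    con.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)
    con.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(row[c] for c in columns) for row in rows],
    )
    con.commit()
    con.close()
```

    Swap SQLite for Postgres, MySQL, whatever: the shape of the job is identical.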

    If you're ready to liberate your data, feel free to grab the source off GitHub and go nuts.

    Todd Birchard (New York City)
    Product manager turned engineer with an ongoing identity crisis. Breaks everything before learning best practices. Completely normal and emotionally stable.