There's nothing I love more than exposing expensive enterprise software.
It may not seem obvious, but most SaaS products share an underlying core goal: shackling businesses to proprietary, closed-source, costly software. When you pair a surplus of money with a reluctance to work, you've arrived at Corporate America: a prime victim yearning to marry itself to any vendor with a nice pitch deck and a vague promise.
In the case of Tableau, this becomes obvious when you attempt to do anything besides create visuals. I don't like spending hours of my time cleaning data to be rewarded with a shitty iframe embed: I want my data. As we've already seen by exposing Tableau's hidden Superadmin access, it's pretty clear Tableau doesn't want you to do this.
I realize Tableau is a BI tool, and some might argue we're barking up the wrong tree: all data should be clean before it ever reaches Tableau. My sentiment is this: fuck that. If a single license costs one thousand dollars, and we have the power to manipulate data as we visualize it, we should at least be able to own that data: and by "own," I don't mean a CSV export. I want it in my own database of choice, not a locked-down, hidden Postgres database living on a VPS filled with Tableau stuff.
Here's how we'd do that.
"You Expect us to Just Walk out the Casino with Millions of Dollars on us?"
You're here because you're the best of the best. If you're feeling scared, feel free to back out now.
This tutorial assumes you have a Tableau Server instance, with a workbook published to a site within said instance. We're going to take a page out of that workbook and turn the raw data into a database table. FAIR WARNING: We're about to dive deep into the obscure world of the Tableau Server REST API. It's clunky, it's ugly, and it returns XML. Strap yourself in.
We're going to be working with 3 core endpoints. Let's walk through them, and I'll show you how to exploit said endpoints to create a ruthless data mining machine in Python.
'Tableau Authorization' Endpoint
Like all obnoxious (aka useful) APIs, we need to authorize each API call with a temporary token. Of course, we'll just have Python generate said token for every call we make.
Hitting this endpoint successfully will result in an XML response (ugh): a tsResponse element wrapping a credentials element, which in turn holds our token and site. There are a number of things going on here that we should take note of, the first being a marvel of modern technology: this is perhaps the shittiest response to a token API call in modern history. Other than that, we need two things from this response:
- The token is required for every API call from here on out. It is intended to be passed as a header value with the key X-Tableau-Auth.
- The site ID is what we'll be using to look up the location of our workbooks in our server instance. This is added to the URL of future API calls (again, impressively shitty design here).
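To make this concrete, here's a minimal sketch of the sign-in flow in Python. The server URL, credentials, and the "2.4" API version are all placeholder assumptions on my part; check which API version your own Tableau Server supports.

```python
import xml.etree.ElementTree as ET

# Tableau namespaces its XML responses under this URI.
TS_NS = {"t": "http://tableau.com/api"}


def build_signin_payload(username: str, password: str, site_name: str) -> str:
    """Build the XML request body the signin endpoint expects."""
    return (
        "<tsRequest>"
        f'<credentials name="{username}" password="{password}">'
        f'<site contentUrl="{site_name}"/>'
        "</credentials>"
        "</tsRequest>"
    )


def parse_signin_response(xml_text: str) -> tuple:
    """Pull the auth token and site ID out of the signin response."""
    root = ET.fromstring(xml_text)
    credentials = root.find("t:credentials", TS_NS)
    site = credentials.find("t:site", TS_NS)
    return credentials.attrib["token"], site.attrib["id"]


def sign_in(server: str, username: str, password: str, site_name: str) -> tuple:
    """POST credentials to the signin endpoint; return (token, site_id)."""
    import requests  # imported lazily so the parsers above stay stdlib-only

    response = requests.post(
        f"{server}/api/2.4/auth/signin",
        data=build_signin_payload(username, password, site_name),
    )
    return parse_signin_response(response.text)
```

Every subsequent call reuses the returned token, so regenerating it per run (as mentioned above) is just a matter of calling sign_in() first.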
'List All Views by Site' Endpoint
There are actually a number of methods we could use to retrieve views, but we're specifically settling on listing our views by 'site,' in the Tableau sense of the word. If you're unfamiliar, a Tableau site is not a site at all: it's more of a project within a greater Tableau instance. They probably should've named them that.
As mentioned, we use the site ID from step 1 to construct this endpoint. In my particular instance, I've only saved a single workbook for simplicity's sake. Check out the views node of the response: <views> will contain a list of every view saved to the specified site. Keep in mind that a view is equivalent to a "sheet" of a workbook: in almost any case, you will have many views listed here.
My sheet happens to be called "Jira," as stated by name="Jira". The thing we really need, however, is the view's id attribute: this will be used in our third and final API call.
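Here's a hedged sketch of that lookup: the URL pattern assumes the same "2.4" API version as before, and the attributes parsed (id, name) are the ones described above.

```python
import xml.etree.ElementTree as ET

# Fully-qualified tag name for <view> elements in Tableau's XML namespace.
TS_VIEW_TAG = "{http://tableau.com/api}view"


def parse_views_response(xml_text: str) -> list:
    """Return (view_id, view_name) pairs from the <views> node."""
    root = ET.fromstring(xml_text)
    return [(view.attrib["id"], view.attrib["name"]) for view in root.iter(TS_VIEW_TAG)]


def list_views(server: str, token: str, site_id: str) -> list:
    """GET every view saved to the given site."""
    import requests  # imported lazily so the parser above stays stdlib-only

    response = requests.get(
        f"{server}/api/2.4/sites/{site_id}/views",
        headers={"X-Tableau-Auth": token},
    )
    return parse_views_response(response.text)
```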
'Get View Data' Endpoint
Now let's get the raw data from a view of our choice.
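A hedged sketch of that call (the server URL and "2.4" API version are, again, my own assumptions), plus a small helper for turning the CSV payload into Python dicts:

```python
import csv
import io


def get_view_data(server: str, token: str, site_id: str, view_id: str) -> str:
    """GET the raw, comma-separated data behind a single view."""
    import requests  # imported lazily so the CSV helper below stays dependency-free

    response = requests.get(
        f"{server}/api/2.4/sites/{site_id}/views/{view_id}/data",
        headers={"X-Tableau-Auth": token},
    )
    return response.text


def csv_to_rows(csv_text: str) -> list:
    """Parse the CSV payload into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))
```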
Here's where we hit pay dirt. This request will result in an output of comma-separated values; I don't need to tell you what we can do with comma-separated values. Here's what my response looks like after formatting it as a table:
| Current Assignee | Current Status | Day of Updated | epic_color | epic_name | Issue Type | issuetype_color | issuetype_url | key | Priority | project | summary |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Todd Birchard | Done | June 7, 2018 | #42526E | Widgets | Bug | #db5d5d | https://hackers.nyc3.digitaloceanspaces.com/bug.png | HACK-96 | Lowest | Hackers and Slackers | "Recent Posts" widget does not have link rollover |
| Todd Birchard | Backlog | June 15, 2018 | #57D9A3 | Page Templates | Task | #73B0E1 | https://hackers.nyc3.digitaloceanspaces.com/task.png | HACK-32 | Lowest | Hackers and Slackers | “Join” page |
| Todd Birchard | Done | November 13, 2018 | #42526E | Widgets | Task | #73B0E1 | https://hackers.nyc3.digitaloceanspaces.com/task.png | HACK-543 | Medium | Hackers and Slackers | Add “pro tip” box |
| Todd Birchard | To Do | December 14, 2018 | #679EEF | SEO | Major Functionality | #93d171 | https://hackers.nyc3.digitaloceanspaces.com/story.png | HACK-656 | Low | Hackers and Slackers | Add alt attributes to images vis clarifai |
| Todd Birchard | Backlog | October 16, 2018 | #FDDA3E | Accounts | Major Functionality | #93d171 | https://hackers.nyc3.digitaloceanspaces.com/story.png | HACK-473 | Medium | Hackers and Slackers | Add avatar selection to signup |
| Todd Birchard | Done | November 13, 2018 | #57D9A3 | Page Templates | Sub-task | #92BFE5 | https://hackers.nyc3.digitaloceanspaces.com/subtask.png | HACK-231 | Medium | Hackers and Slackers | Add blurb to each post page explaining what these are |
| Todd Birchard | Done | December 10, 2018 | #291BA9 | Code snippets | Task | #73B0E1 | https://hackers.nyc3.digitaloceanspaces.com/task.png | HACK-452 | Medium | Hackers and Slackers | Add color styles for json snippets |
That's right, a table. Databases are composed of tables. Perhaps you see where I'm going with this.
"There's a Ninety-five Pound Chinese Man with a Hundred Sixty Million Dollars Behind this Door."
We've got the goods, but calling all these individual endpoints manually does nothing for us. We don't want to steal a single view; we want to systematically rob Tableau of its views on a schedule and shanghai them off to a database of our choosing.
It would be a crime not to automate this, so I've created a class containing all the relevant methods we'd want when it comes to interacting with Tableau's REST API:
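A class along these lines might look like the following sketch. The class name, the "2.4" API version, and the internals are my own stand-ins rather than a verbatim copy of the real source, but the shape is the same: authenticate on instantiation, then expose the two lookups as methods.

```python
import xml.etree.ElementTree as ET

# Tableau namespaces its XML responses under this URI.
TS_NS = {"t": "http://tableau.com/api"}


class TableauServer:
    """Wraps the three REST endpoints: sign in, list views, get view data."""

    api_version = "2.4"  # assumed; match your server's supported version

    def __init__(self, server, username, password, site_name):
        self.server = server.rstrip("/")
        self.token, self.site_id = self.sign_in(username, password, site_name)

    def _url(self, path):
        return f"{self.server}/api/{self.api_version}/{path}"

    def sign_in(self, username, password, site_name):
        """Trade credentials for a temporary token and the site ID."""
        import requests  # imported lazily so the parsing helpers stay stdlib-only

        payload = (
            "<tsRequest>"
            f'<credentials name="{username}" password="{password}">'
            f'<site contentUrl="{site_name}"/>'
            "</credentials>"
            "</tsRequest>"
        )
        response = requests.post(self._url("auth/signin"), data=payload)
        return self._parse_credentials(response.text)

    def list_views(self):
        """Return (view_id, view_name) pairs for every view on the site."""
        import requests

        response = requests.get(
            self._url(f"sites/{self.site_id}/views"),
            headers={"X-Tableau-Auth": self.token},
        )
        return self._parse_views(response.text)

    def get_view(self, view_id):
        """Return the raw CSV data behind a single view."""
        import requests

        response = requests.get(
            self._url(f"sites/{self.site_id}/views/{view_id}/data"),
            headers={"X-Tableau-Auth": self.token},
        )
        return response.text

    @staticmethod
    def _parse_credentials(xml_text):
        root = ET.fromstring(xml_text)
        credentials = root.find("t:credentials", TS_NS)
        site = credentials.find("t:site", TS_NS)
        return credentials.attrib["token"], site.attrib["id"]

    @staticmethod
    def _parse_views(xml_text):
        return [
            (view.attrib["id"], view.attrib["name"])
            for view in ET.fromstring(xml_text).iter("{http://tableau.com/api}view")
        ]
```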
The above snippet is a Python class that wraps all the API endpoints we explored into a mostly effortless interface. Instantiating the class immediately covers the grunt work of:
- Generating a token
- Getting your (unfriendly) site ID
- Listing all views belonging to the provided site
- Retrieving data from a worksheet of choice
Get a list of views in your Tableau site using the list_views() method. When you see the view you want, pass its view ID to the get_view() method. This will result in a response of all the raw data in the view, in the form of a CSV.
How to Pull a Heist (Final Chapter): Storing in Offshore Accounts
To earn your title as a true con artist, I'm leaving the final step up to you. You've escaped with the loot, but you'll need to put all that data somewhere. This should be a trivial matter of automating a simple database query, but the specifics are up to you.
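If you want a starting point, here's one way to do it with nothing but the standard library: parse the CSV and bulk-insert it into SQLite. This is a naive sketch (every column lands as TEXT, and the table and function names are mine); swap in the driver and connection string for your database of choice.

```python
import csv
import io
import sqlite3


def store_csv(csv_text, table_name, connection):
    """Create a table from the CSV header and bulk-insert every row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, records = rows[0], rows[1:]
    columns = ", ".join(f'"{col}" TEXT' for col in header)
    placeholders = ", ".join("?" for _ in header)
    connection.execute(f'CREATE TABLE IF NOT EXISTS "{table_name}" ({columns})')
    connection.executemany(
        f'INSERT INTO "{table_name}" VALUES ({placeholders})', records
    )
    connection.commit()
```

From there, scheduling is your problem: cron, Airflow, whatever gets you out of the building.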
If you're ready to liberate your data, feel free to grab the source off of GitHub and go nuts: