I try my best not to hate on Tableau. It was the software’s combination of power and ease-of-use that drove me to purchase a license in the first place. Ever since, I’ve been finding new and exciting ways Tableau intentionally locks users out of their data.
I recently stumbled upon Tableau's officially supported Python library for Tableau Server, aptly named Tableau Server Client. Wishful thinking would lead one to believe that a developer-facing tool to work with data might provide a path to... retrieve one's own data. On the contrary, the Tableau Python SDK only doubles down on my largest criticism of Tableau: there remains no straightforward way for users to fetch their data from their own instances.
Exploring Tableau Server Client
I quickly conclude in this section that `Tableau Server Client` is a useless library for our purposes. The point of walking through it is to show that Tableau is intentionally making user-hostile decisions, which ultimately leads us to a different approach.
Setting up a client to connect to my Tableau Server proved to be easy, beginning with installing tableauserverclient:
The Python logic to connect to your Tableau Server instance is a piece of cake. So far, so good:
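For reference, a minimal sketch of what signing in with tableauserverclient can look like. The `TableauCredentials` container and the `sign_in` helper are hypothetical names made up for this illustration, and the server URL in the usage comment is a placeholder; only the `TSC.TableauAuth`, `TSC.Server`, and `server.auth.sign_in` calls belong to the library.

```python
from dataclasses import dataclass


@dataclass
class TableauCredentials:
    """Hypothetical settings container; not part of tableauserverclient."""
    server_url: str
    username: str
    password: str
    site_name: str = ""  # an empty string selects the default site


def sign_in(creds):
    """Sign in and return an authenticated tableauserverclient Server."""
    # Imported inside the function so this sketch still loads on a machine
    # where the package has not been installed yet.
    import tableauserverclient as TSC

    auth = TSC.TableauAuth(creds.username, creds.password, site_id=creds.site_name)
    server = TSC.Server(creds.server_url, use_server_version=True)
    server.auth.sign_in(auth)
    return server


# Usage (placeholder values, swap in your own server and account):
#   server = sign_in(TableauCredentials("https://tableau.example.com", "user", "pass"))
#   ...
#   server.auth.sign_out()
```

Keeping the credentials in one small object makes it easier to move them into a config file later.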
A perfect example of the library's limits is the View object Tableau allows you to interact with on your server. Those familiar know that views are slang for sheets of workbooks stored on Tableau Server.
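A hedged sketch of listing views, assuming `server` is a tableauserverclient `Server` object that has already signed in. The `describe_views` helper is a hypothetical name for this illustration. Note that `server.views.get()` returns one page of results plus a pagination object, so a large server would need paging on top of this.

```python
def describe_views(server):
    """Return the basic fields of each view on the first page of results."""
    # views.get() returns a (list_of_views, pagination_item) tuple.
    views, _pagination = server.views.get()
    return [
        {
            "id": view.id,
            "name": view.name,
            "owner_id": view.owner_id,
            "workbook_id": view.workbook_id,
        }
        for view in views
    ]


# Usage, given a signed-in server:
#   for view in describe_views(server):
#       print(view["name"], view["id"])
```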
This simple snippet lists every view object on your server. Wow! Think of what we can do with all that tabular data we worked so hard to transform, rig- WRONG. Look at what Tableau's Python 'View Object' actually contains:
id: The identifier of the view item.
name: The name of the view.
owner_id: The id for the owner of the view.
preview_image: The thumbnail image for the view.
total_views: The usage statistics for the view. Indicates the total number of times the view has been accessed.
workbook_id: The id of the workbook associated with the view.
Holy Moses, stop the presses! We can retrieve our data in the form of a thumbnail image?! THANK YOU GENEROUS TABLEAU OVERLORDS!
Notice how there's no mention of, you know, the actual data.
We're going to play a game. In the wake of my time being wasted, I feel that warm tickling feeling which seems to say "Viciously dismantle the ambitions of an establishment!" May I remind you, we're talking about the kind of establishment that bills customer licenses based on the number of CPUs being utilized by their server infrastructure. This is effectively recognizing the horrifying and inefficient codebase behind Tableau Server, and leveraging this flaw for monetization. Yes, you're paying more money to incentivize worst practices.
Let's Make a Flask App... an Angry One
In our last post I shared a little script to help you get started stealing data off your own Tableau Server. That doesn't quite scratch my itch anymore. I'm going to build an interface. I want to make it as easy as possible for anybody to systematically rob Tableau Server of every penny it's got. That's a lot of pennies when we consider the equation: data = oil + new.
Before I bore you, here's a quick demo of the MVP we're building:
This POC demonstrates that it is very possible to automate the extraction of Tableau views from Tableau Server. The success message signals that we've taken a Tableau view and created a corresponding table in an external database. Any data we manipulate in Tableau is now truly ours: we can now leverage the transforms we've applied in workbooks, use this data in other applications, and utilize an extract scheduler to keep the data coming. We've turned a BI tool into an ETL tool. In other words, you can kindly take those thumbnail previews and shove them.
I'll be open sourcing all of this, as is my civic duty. Let us be clear to enterprises: withholding freedom to one's own data is an act of war. Pricing models which reward poor craftsmanship are an insult to our intellect. For every arrogant atrocity committed against consumers, the war will be waged twice as hard. I should probably mention these opinions are my own.
The Proletariat Strikes Back
Get a feel for where we're heading with the obligatory project-file-structure tree:
As usual, we're using a classic Flask application factory setup here.
Weapons Of Choice
Let's have a look at our core arsenal:
requests: We're achieving our goal by exploiting some loopholes exposed in the Tableau REST API.
pandas: Will handle everything from parsing comma-separated data to rendering HTML tables and writing the results out to SQL.
flask_sqlalchemy: Used in tandem with pandas to handle shipping our data off elsewhere.
flask_redis: To handle session variables.
Initiating our Application
Here's how we construct our app:
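A minimal sketch of an application factory, for readers who haven't seen the pattern. This is not the project's actual `__init__.py`: the `main` blueprint, its placeholder route, and the `config` argument are assumptions for illustration, and the extensions are only mentioned in a comment so the example runs with Flask alone.

```python
from flask import Blueprint, Flask

# Hypothetical blueprint standing in for the routes defined in routes.py.
main = Blueprint("main", __name__)


@main.route("/")
def index():
    return "Tableau view exporter"


def create_app(config=None):
    """Build and configure a fresh Flask application instance."""
    app = Flask(__name__)
    app.config.update(config or {})

    # Extensions such as flask_sqlalchemy and flask_redis are typically
    # created once at module level and bound to the app at this point
    # with their init_app(app) methods.

    app.register_blueprint(main)
    return app
```

The benefit of the factory is that nothing is created at import time, so tests and scripts can each build an app with their own configuration.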
This should all feel like business-as-usual. The core of our application is split between routes.py, which handles views, and tableau.py, which handles the anti-establishment logic. Let's begin with the latter.
Life, Liberty, and The Pursuit of Sick Data Pipelines
Our good friend tableau.py might look familiar to those who joined us last time. tableau.py has been busy hitting the gym since then and is looking sharp for primetime:
I wish I could take full credit for what a shit show this class appears to be at first glance, but I assure you we've been left with no choice. For example: have I mentioned that Tableau's REST API returns XML so malformed that it breaks XML parsers? I can't tell incompetence from malicious intent at this point.
Here's a method breakdown of our class:
initialize_tableau_request(): Handles initial auth and returns valuable information such as site ID and API Token to be used thereafter.
get_site(): Extracts the site ID from XML returned by the above.
get_token(): Similarly extracts our token.
list_views(): Compiles a list of all views within a Tableau site, giving us a chance to select ones for extraction.
get_view(): Takes a view of our choice and creates a DataFrame, which is to be shipped off to a foreign database.
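The flow described by these methods can be sketched roughly as follows. This is an illustration, not the class from the repository: it borrows the method names above, uses the standard library's `urllib` and `csv` in place of requests and pandas to stay dependency-free, and pulls values out of the sign-in response with regular expressions as one assumed way to sidestep XML parsing trouble. `API_VERSION` and the URL helpers are assumptions; the endpoints and the `X-Tableau-Auth` header come from the Tableau REST API.

```python
import csv
import io
import re
import urllib.request
from xml.sax.saxutils import quoteattr

API_VERSION = "3.4"  # assumption: set this to a version your server supports


def signin_payload(username, password, site_content_url=""):
    """Build the XML body for POST /api/<version>/auth/signin."""
    return (
        "<tsRequest>"
        f"<credentials name={quoteattr(username)} password={quoteattr(password)}>"
        f"<site contentUrl={quoteattr(site_content_url)} />"
        "</credentials>"
        "</tsRequest>"
    )


def initialize_tableau_request(base_url, username, password, site_content_url=""):
    """Sign in and return the raw XML text of the response."""
    request = urllib.request.Request(
        f"{base_url}/api/{API_VERSION}/auth/signin",
        data=signin_payload(username, password, site_content_url).encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def get_token(xml_text):
    """Extract the API token from the sign-in response, or None."""
    match = re.search(r'<credentials[^>]*\btoken="([^"]+)"', xml_text)
    return match.group(1) if match else None


def get_site(xml_text):
    """Extract the site ID from the sign-in response, or None."""
    match = re.search(r'<site[^>]*\bid="([^"]+)"', xml_text)
    return match.group(1) if match else None


def views_url(base_url, site_id):
    """URL that lists the views on a site."""
    return f"{base_url}/api/{API_VERSION}/sites/{site_id}/views"


def view_data_url(base_url, site_id, view_id):
    """URL that returns a single view's data as CSV."""
    return f"{views_url(base_url, site_id)}/{view_id}/data"


def fetch(url, token):
    """GET a REST API URL, passing the token in the X-Tableau-Auth header."""
    request = urllib.request.Request(url, headers={"X-Tableau-Auth": token})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def csv_to_rows(csv_text):
    """Parse CSV text into a list of dicts, one per row."""
    # With pandas installed, pandas.read_csv(io.StringIO(csv_text)) would
    # produce a DataFrame from the same text instead.
    return list(csv.DictReader(io.StringIO(csv_text)))
```

Chained together, the steps would be: sign in, pull the token and site ID out of the response, fetch `views_url(...)` to choose a view, then fetch `view_data_url(...)` and hand the CSV text to `csv_to_rows`.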
Our Routing Logic
Moving on, we have routes.py, which builds the views and associated logic for our app:
We only have 3 pages to our application. They include our list of views, a preview of a single view, and a success page for when said view is exported. This is all core Flask logic.
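To make the three-page shape concrete, here is a stripped-down sketch. It is not the project's routes.py: the `VIEWS` dictionary is a hypothetical in-memory stand-in for what tableau.py would return, the route names are assumptions, and inline template strings replace the real Jinja files so the example is self-contained.

```python
from flask import Flask, abort, render_template_string, request

app = Flask(__name__)

# Hypothetical stand-in for the views fetched from Tableau Server.
VIEWS = {
    "abc123": {
        "name": "Sales by Region",
        "rows": [{"region": "East", "sales": "100"}],
    },
}


@app.route("/")
def entry():
    """Page 1: list every view, linking to its preview via a query string."""
    return render_template_string(
        """{% for view_id, view in views.items() %}
        <a href="{{ url_for('view', id=view_id) }}">{{ view.name }}</a>
        {% endfor %}""",
        views=VIEWS,
    )


@app.route("/view")
def view():
    """Page 2: preview a single view chosen with ?id=..."""
    view_id = request.args.get("id")
    selected = VIEWS.get(view_id)
    if selected is None:
        abort(404)
    return render_template_string(
        """<h1>{{ view.name }}</h1>
        <form method="post" action="{{ url_for('export', id=view_id) }}">
          <button type="submit">Export</button>
        </form>""",
        view=selected,
        view_id=view_id,
    )


@app.route("/export", methods=["POST"])
def export():
    """Page 3: confirm the export of the chosen view."""
    view_id = request.args.get("id")
    if view_id not in VIEWS:
        abort(404)
    # The real app would write the view's data to a database at this point.
    return render_template_string(
        "<p>Exported {{ name }}</p>", name=VIEWS[view_id]["name"]
    )
```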
Putting it On Display
We build our pages dynamically based on the values we pass our Jinja templates. The homepage utilizes some nested loops to list the views we returned from tableau.py, and also makes use of query strings to pass values on to other templates.
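A small sketch of the nested-loop idea, rendered from Python so it can be run on its own. Grouping views under workbooks, the `workbooks` structure, and the `/view` link format are all assumptions for illustration; the real template may organize things differently.

```python
from jinja2 import Template

# Outer loop walks workbooks, inner loop walks each workbook's views.
# Each link carries the view's id and name in the query string.
HOMEPAGE = Template(
    """{% for workbook in workbooks %}
<h2>{{ workbook["name"] }}</h2>
<ul>
{% for view in workbook["views"] %}
  <li><a href="/view?id={{ view["id"] | urlencode }}&name={{ view["name"] | urlencode }}">{{ view["name"] }}</a></li>
{% endfor %}
</ul>
{% endfor %}"""
)

# Hypothetical sample data in the shape the template expects.
workbooks = [
    {
        "name": "Quarterly Report",
        "views": [
            {"id": "abc123", "name": "Sales by Region"},
            {"id": "def456", "name": "Profit"},
        ],
    },
]

html = HOMEPAGE.render(workbooks=workbooks)
print(html)
```

The receiving route can then read the values back out of the query string to decide which view to fetch.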
Moving on, our humble view.jinja2 page has two purposes: displaying the selected view and exporting it in the name of justice.
The War is Not Over
This repository is open to the public and can be found here:
There are still crusades left ahead of us: for instance, building out this interface to accept credentials via login as opposed to a config file, and scheduling view exports as opposed to running them on demand.
Where we go from here depends on what we the people decide. For all I know, I could be shouting to an empty room here (I'm almost positive anybody who pays for enterprise software prefers the blind eye of denial). If the opposite holds true, I dare say the revolution is only getting started.