I try my best not to hate on Tableau. It was the software’s combination of power and ease-of-use that drove me to purchase a license in the first place. Ever since then, I’m finding new and exciting ways Tableau intentionally locks users out of their data.

I recently stumbled upon Tableau's officially supported Python library for Tableau Server, aptly named Tableau Server Client. Wishful thinking would lead one to believe that a developer-facing tool to work with data might provide a path to... retrieve one's own data. On the contrary, the Tableau Python SDK only doubles-down on my largest criticism of Tableau: there remains no straightforward way for users to fetch their data from their own instances.

Exploring Tableau Server Client

πŸ’‘
I quickly conclude in this section that `Tableau Server Client` is a useless library for our purposes. Demonstrating this intends to demonstrate that Tableau is intentionally making user-hostile decisions, ultimately leading us to a different approach. 

Setting up a client to connect to my Tableau Server proved to be easy, beginning with installing tableauserverclient:

pip install tableauserverclient
Install Tableau Server Client

The Python logic to connect to your Tableau Server instance to retrieve is a piece of cake. So far, so good:

"""Initialize Tableau Server Client."""
import tableauserverclient as TSC


tableau_auth = TSC.TableauAuth('username', 'password')
server = TSC.Server('https://servername')
Connect to your Tableau Server instance

A perfect example of this is the View object Tableau allows you to interact with on your server. Those familiar know that views are slang for sheets of workbooks stored on Tableau server.

"""Initialize Tableau Server Client."""
import tableauserverclient as TSC


tableau_auth = TSC.TableauAuth('username', 'password')
server = TSC.Server('https://servername')

# Fetch all views
with server.auth.sign_in(tableau_auth):
    all_views, pagination_item = server.views.get()
    print([view.name for view in all_views])
Fetch all views from your Tableau Server instance

This simple snippet lists every view object on your server. Wow! Think of what we can do with all that tabular data we worked so hard to transform, rig- WRONG. Look at what Tableau's Python 'View Object' actually contains:

  • id The identifier of the view item.
  • name The name of the view.
  • owner_id The id for the owner of the view.
  • preview_image The thumbnail image for the view.
  • total_views The usage statistics for the view. Indicates the total number of times the view has been accessed.
  • workbook_id The id of the workbook associated with the view.

Holy Moses, stop the presses! We can get retrieve our data in the form of a thumbnail image?! THANK YOU GENEROUS TABLEAU OVERLORDS!

Notice how there's no mention of, you know, the actual data.

We're going to play a game. In the wake of my time has been wasted, I feel that warm tickling feeling which seems to say "Viciously dismantle the ambitions of an establishment!" May I remind you, we're talking about the kind of establishment that bills customer licenses based on the number of CPUs being utilized by their server infrastructure. This is effectively recognizing the horrifying and inefficient codebase behind Tableau server, and leveraging this flaw for monetization. Yes, you're paying more money to incentivize worst practices.

Let's Make a Flask App... an Angry One

In our last post I shared a little script to help you get started stealing data off your own Tableau Server. That doesn't quite scratch my itch anymore. I'm going to build an interface. I want to make it easy as possible for anybody to systemically rob Tableau Server of every penny its got. That's a lot of pennies when we consider the equation: data = oil + new.

Before I bore you, here's a quick demo of the MVP we're building:

Demo of app for exporting views from Tableau Server
Demo of app for exporting views from Tableau Server

This POC demonstrates that it is very possible to automate the extraction of Tableau views from Tableau Server. The success message is signaling that we've successfully taken a Tableau view and created a corresponding table in an external database. Any data we manipulate in Tableau is now truly ours: we can now leverage the transforms we've applied in workbooks, use this data in other applications, and utilize an extract scheduler to keep the data coming. We've turned a BI tool into an ETL tool. In other words, you can kindly take those thumbnail previews and shove it.

I'll be open sourcing all of this, as is my civic duty. Let us be clear to enterprises: withholding freedom to one's own data is an act of war. Pricing models which reward poor craftsmanship are an insult to our intellect. For every arrogant atrocity committed against consumers, the war will wage twice as hard. I should probably mention these opinions are my own.

The Proletariat Strikes Back

Get a feel for where we're heading with the obligatory project-file-structure tree:

tableau_view_extractor
β”œβ”€β”€ application
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ database.py
β”‚   β”œβ”€β”€ tableau.py
β”‚   β”œβ”€β”€ routes.py
β”‚   β”œβ”€β”€ static
β”‚   β”‚   β”œβ”€β”€ data
β”‚   β”‚   β”‚   └── view.csv
β”‚   β”‚   β”œβ”€β”€ dist
β”‚   β”‚   β”‚   β”œβ”€β”€ all.css
β”‚   β”‚   β”‚   β”œβ”€β”€ packed.js
β”‚   β”‚   β”œβ”€β”€ img
β”‚   β”‚   β”‚   └── tableaugithub.jpg
β”‚   β”‚   β”œβ”€β”€ js
β”‚   β”‚   β”‚   └── main.js
β”‚   β”‚   └── scss
β”‚   β”‚       └── main.scss
β”‚   └── templates
β”‚       β”œβ”€β”€ export.jinja2
β”‚       β”œβ”€β”€ index.jinja2
β”‚       β”œβ”€β”€ layout.jinja2
β”‚       └── view.jinja2
β”œβ”€β”€ config.ini
β”œβ”€β”€ config.py
β”œβ”€β”€ app.yaml
β”œβ”€β”€ start.sh
β”œβ”€β”€ wsgi.py
β”œβ”€β”€ Pipfile
β”œβ”€β”€ README.md
└── requirements.txt
Project Structure

As usual, we're using a classic Flask application factory set up here.

Weapons Of Choice

Let's have a look at our core arsenal:

  • requests: We're achieving our goal by exploiting some loopholes exposed in the Tableau REST API.
  • pandas: Will handle everything from extracting comma-separated data into a CSV, render HTML tables, and output SQL.
  • flask_sqlalchemy: Used in tandem with pandas to handle shipping our data off elsewhere.
  • flask_redis: To handle session variables.

Initiating our Application

Here's how we construct our app:

"""Initialize application."""
from flask import Flask
from flask_redis import FlaskRedis
from flask_sqlalchemy import SQLAlchemy
from log import LOGGER

# Set global entities
db = SQLAlchemy()
r = FlaskRedis()


def create_app():
    """Initialize Flask app UI."""
    app = Flask(__name__, instance_relative_config=False)
    app.config.from_object("config.Config")

    with app.app_context():
        # Initiate globals
        try:
            db.init_app(app)
            r.init_app(app)

            # Set global contexts
            r.set("uri", app.config["SQLALCHEMY_DATABASE_URI"])
            r.set("baseurl", app.config["REDIS_HOST"])
            r.set("username", app.config["REDIS_USERNAME"])
            r.set("password", app.config["REDIS_PASSWORD"])

            # Import our modules
            from . import routes

            app.register_blueprint(routes.home_blueprint)

            LOGGER.success(f"Application started: {app}")
            return app
        except Exception as e:
            LOGGER.success(f"Application failed to start: {e}")
Initialize Flask Application

This should all feel like business-as-usual. The core of our application is split between routes.py, which handles views, and tableau.py, which handles the anti-establishment logic. Let's begin with the latter.

Life, Liberty, and The Pursuit of Sick Data Pipelines

Our good friend tableau.py might look familiar to those who joined us last time. tableau.py has been busy hitting the gym since then and is looking sharp for primetime:

"""Tableau Server Client."""
import io
import xml.etree.ElementTree as ET

import pandas as pd
import requests

from . import r


class ExtractTableauView:
    """Class for working in a Tableau instance."""

    __baseurl = r.get("baseurl")
    __username = r.get("username")
    __password = r.get("password")
    __database = r.get("uri")
    __contenturl = r.get("contenturl")

    @classmethod
    def get_view(cls, site, xml, view, token):
        """Extract contents of a single view."""
        headers = {"X-Tableau-Auth": token, "Content-Type": "text/csv"}
        req = requests.get(
            cls.__baseurl
            + "/api/3.2/sites/"
            + str(site)
            + "/views/"
            + str(view)
            + "/data",
            headers=headers,
            stream=True,
        )
        csv_text = req.text
        view_df = pd.read_csv(io.StringIO(csv_text), header=0)
        return view_df

    @classmethod
    def list_sites(cls, token):
        """Get all sites belonging to a Tableau Instance."""
        headers = {"X-Tableau-Auth": token}
        req = requests.get(cls.__baseurl + "/api/3.2/sites", headers=headers)
        root = ET.fromstring(req.content)
        sites_arr = []
        for child in root.iter("*"):
            if child.tag == "{http://tableau.com/api}sites":
                for site in child:
                    site_dict = {
                        "name": site.attrib.get("name"),
                        "id": site.attrib.get("id"),
                        "contentURL": site.attrib.get("contentUrl"),
                        "state": site.attrib.get("state"),
                    }
                    sites_arr.append(site_dict)
        return sites_arr

    @classmethod
    def list_views(cls, site, xml, token):
        """List all views belonging to a Tableau Site."""
        headers = {"X-Tableau-Auth": token}
        req = requests.get(
            cls.__baseurl + "/api/3.2/sites/" + site + "/views",
            auth=(cls.__username, cls.__password),
            headers=headers,
        )
        root = ET.fromstring(req.content)
        views_arr = []
        for child in root.iter("*"):
            if child.tag == "{http://tableau.com/api}views":
                for view in child:
                    view_dict = {
                        "name": view.attrib.get("name"),
                        "id": view.attrib.get("id"),
                        "url": cls.__baseurl
                        + "/"
                        + view.attrib.get("contentUrl"),
                        "created": view.attrib.get("createdAt"),
                        "updated": view.attrib.get("updatedAt"),
                    }
                    views_arr.append(view_dict)
        return views_arr

    @classmethod
    def get_token(cls, xml):
        """Receive Auth token to perform API requests."""
        for child in xml.iter("*"):
            if child.tag == "{http://tableau.com/api}credentials":
                token = child.attrib.get("token")
                return token

    @classmethod
    def get_site(cls, xml, key):
        """Retrieve ID of Tableau 'site' instance."""
        root = xml
        for child in root.iter("*"):
            if child.tag == "{http://tableau.com/api}site":
                site = child.attrib.get(key)
                return site

    @classmethod
    def initialize_tableau_request(cls):
        """Retrieve core XML for interacting with Tableau."""
        headers = {"Content-Type": "application/xml"}
        body = (
            '<tsRequest><credentials name="'
            + cls.__username
            + '" password="'
            + cls.__password
            + '" ><site contentUrl="'
            + cls.__contenturl
            + '" /></credentials></tsRequest>'
        )
        req = requests.post(
            cls.__baseurl + "/api/3.2/auth/signin",
            auth=(cls.__username, cls.__password),
            headers=headers,
            data=body,
        )
        root = ET.fromstring(req.content)
        return root
Class to interact with Tableau Server via Rest API

I wish I could take full credit for what a shit show this class appears to be at first glance, but I assure you we've been left with no choice. For example: have I mentioned that Tableau's REST API returns XML so malformed that it breaks XML parsers? I can't tell incompetence from malicious intent at this point.

Here's a method breakdown of our class:

  • initialize_tableau_request(): Handles initial auth and returns valuable information such as site ID and API Token to be used thereafter.
  • get_site(): Extracts the site ID from XML returned by the above.
  • get_token(): Similarly extracts our token.
  • list_views(): Compiles a list of all views within a Tableau site, giving us a chance to select ones for extraction.
  • get_view(): Takes a view of our choice and creates a DataFrame, which is to be shipped off to a foreign database.

Our Routing Logic

Moving on we have routes.py building the views and associated logic for our app:

"""Set Flask Routes."""
import pandas as pd
from flask import Blueprint, Markup
from flask import current_app as app
from flask import render_template, request
from flask_assets import Bundle, Environment

from . import database, tableau


home_blueprint = Blueprint(
    "home",
    __name__,
    template_folder="templates",
    static_folder="static"
)

assets = Environment(app)
js = Bundle("js/*.js", filters="jsmin", output="dist/packed.js")
scss = Bundle("scss/*.scss", filters="libsass", output="dist/all.css")
assets.register("scss_all", scss)
assets.register("js_all", js)
scss.build(force=True, disable_cache=True)
js.build(force=True, disable_cache=True)


@home_blueprint.route("/nav.jinja2", methods=["GET"])
def nav():
    """Build nav before every template render."""
    tableau_view_extractor = tableau.ExtractTableauView()
    xml = tableau_view_extractor.initialize_tableau_request()
    token = tableau_view_extractor.get_token(xml)
    all_sites = tableau_view_extractor.list_sites(token)
    return render_template("nav.jinja2", all_sites=all_sites)


@home_blueprint.route("/", methods=["GET", "POST"])
def entry():
    """Homepage which lists all available views."""
    tableau_view_extractor = tableau.ExtractTableauView()
    xml = tableau_view_extractor.initialize_tableau_request()
    token = tableau_view_extractor.get_token(xml)
    site_id = tableau_view_extractor.get_site(xml, "id")
    site_name = tableau_view_extractor.get_site(xml, "contentUrl")
    views = tableau_view_extractor.list_views(site_id, xml, token)
    all_sites = tableau_view_extractor.list_sites(token)
    site = tableau_view_extractor.get_site(xml)
    return render_template(
        "index.jinja2",
        title="Here are your views.",
        template="home-template",
        views=views,
        token=token,
        xml=xml,
        site_name=site_name,
        site=site,
        all_sites=all_sites,
    )


@home_blueprint.route("/view", methods=["GET", "POST"])
def view():
    """Display a preview of a selected view."""
    site = request.args.get("site")
    xml = request.args.get("xml")
    view = request.args.get("view")
    token = request.args.get("token")
    tableau_view_extractor = tableau.ExtractTableauView()
    view_df = tableau_view_extractor.get_view(site, xml, view, token)
    view_df.to_csv("application/static/data/view.csv")
    return render_template(
        "view.jinja2",
        title="Your View",
        template="home-template",
        view=view,
        token=token,
        xml=xml,
        site=site,
        view_df=Markup(view_df.to_html(index=False)),
    )


@home_blueprint.route("/export", methods=["GET", "POST"])
def export():
    """Export view to external database."""
    view_df = pd.read_csv("application/static/data/view.csv")
    view_df.to_sql(
        name="temp",
        con=database.engine,
        if_exists="replace",
        chunksize=50,
        index=True,
    )
    return render_template(
        "export.jinja2",
        title="Success!",
        template="success-template",
    )
Flask routes providing a UI to extract Tableau data

We only have 3 pages to our application. They include our list of views, a preview of a single view, and a success page for when said view is exported. This is all core Flask logic.

Putting it On Display

We build our pages dynamically based on the values we pass our Jinja templates. The homepage utilizes some nested loops to list the views we returned from tableau.py, and also makes use of query strings to pass values on to other templates.

{% extends "layout.jinja2" %}

{% block content %}
<div class="extended-container {{template}}">
  <div class="container">
    <div class="row">
      <div class="col s12 flex-container">
        {% for view in views %}
        <div class="download">
          <a href="{{ url_for('home.view') }}?token={{token}}&site={{site}}&view={{view.id}}&xml={{xml}}">
            <ul>
              {% for key, value in view.items() %}
              <li class="view-list-item">
                <span class="key {{key}}">
                  {{key}}
                 </span>
                {{ value }}
              </li>
              {% endfor %}
            </ul>
          </a>
        </div>
        {% endfor %}
      </div>
    </div>
  </div>
{% endblock %}
Homepage Jinja Template

Moving on: our humble view.jinja2 page has two purposes: display the selected view, and export it in the name of justice.

{% extends "layout.jinja2" %}

{% block content %}
<div class="extended-container {{template}}">
  <div class="container">
    <div class="row">
      <div class="col s12">
        <h1>{{title}}</h1>
        <a href="{{ url_for('home.export') }}" class="export">
          <i class="far fa-file-export"></i>
        </a>
        {{view_df}}
      </div>
    </div>
  </div>
{% endblock %}
Template to construct a Tableau view we've extracted

The War is Not Over

This repository is open to the public and can be found here:

GitHub - toddbirchard/tableau-extraction: πŸ“ˆβž‘οΈπŸ’Ύ A Flask application which extends Tableau to be used as an ETL tool.
πŸ“ˆβž‘οΈπŸ’Ύ A Flask application which extends Tableau to be used as an ETL tool. - GitHub - toddbirchard/tableau-extraction: πŸ“ˆβž‘οΈπŸ’Ύ A Flask application which extends Tableau to be used as an ETL tool.

There are still crusades left ahead of us: for instance, building out this interface to accept credentials via login as opposed to a config file, and the scheduling of view exports, as opposed to on-demand.

Where we go from here depends on what we the people decide. For all I know, I could be shouting to an empty room here (I'm almost positive anybody who pays for enterprise software prefers the blind eye of denial). If the opposite holds true, I dare say the revolution is only getting started.