Data Could Save Humanity if it Weren't for Humanity

A compelling case for robot overlords.

A decade has passed since I stumbled into technical product development. Looking back, I've spent that time almost exclusively in the niche of data-driven products and engineering. While it seems obvious now, I realized in the 2000s that you could generally create two types of product: you could build a (likely uninspired) UI for existing data, or you could build products which produced new data or interpreted existing data in a newly useful way. Betting on the latter seemed like an obvious choice. The late 2000s felt like building apps for the sake of apps most of the time. Even today, Product Hunt is littered with weather apps and rehashed tools, solving problems so insignificant that they almost seem satirical.

Years passed, and our data-centric tools evolved to fit cultural shifts in businesses speculating on how these tools could be used. This built a clear, if slow-growing, narrative about how enterprises fit data analysis into their org structures. Unfortunately, I can't say that much of that shift has been positive. There are a number of major problems I believe we need to address:

  • SaaS is created with the goal of selling the product to enterprises. While humanity's understanding of data science reaches unprecedented territory, we choose to perfect the sales pitch while neglecting education on these tools.
  • In an affront to science, individual actors commonly cherry-pick information to confirm conclusions for personal benefit, without checks and balances.
  • Data which contradicts knee-jerk assumptions made by executives is sometimes taken as a personal threat or attack.
  • Most importantly, data professionals are horribly siloed. Analysts, scientists, and engineers waste far too much time drawing lines between roles: I find it absolutely absurd that we unanimously agree tool X is for BI while tool Y is for data cleanup. Considering we all know these tools run on stacks of Python, R, SQL, etc., there is no reason to succumb to the limitations of proprietary software (such as Tableau). We've turned a blind eye to the possibility of 'data as a service': a chance to overlap responsibilities by building a better tool that reduces friction, rather than increasing it in the interest of selling more software.

While we might all agree that our collective 'data addiction' is reaching a peak, most of us barely know what we mean by that. We conceptually understand that data is important, but our imagination for how to wield this power effectively leaves a lot to be desired. IBM Watson probably had profound capabilities, but its failure lies with the humans tasked with making that technology relevant and useful for humankind.

The Analytics Honeymoon

As I imagine most Product Management professionals do, I originally considered data analysis to come in the form of web and app analytics. This was a one-dimensional era; consumer-facing data served the sole purpose of optimizing sales and ad revenue, and there were far fewer Enterprise-level tools to fall in love with. While the cheaper tools were just fine, corporate America had already fallen in love with a fickle mistress known as Omniture.

Omniture was, in many ways, the superior product on the market. As I'm sure Sales reps explained in those years, Omniture allowed for a level of event-tracking customization which was otherwise rare at the time: with the proper logic, effort, and willingness, executives could theoretically identify granular issues in their product's conversion flow: issues which came attached to cold, hard facts in the form of numbers.

Thus, a game of numbers it was: the level of granular detail executives wanted came with a fee. Well, many fees, in fact: the product outpriced competitors tenfold upfront for the license itself. Having agreed to spend that much on proprietary software, it only made sense to then hire a certified affiliated consultant to implement the custom reports, and of course pay the lifelong upkeep that comes with tracking events in ever-changing software. Despite all of these costs, companies consistently moved forward on the belief that the money saved thanks to this data would far outweigh the cost.

So what happened when we actually collected all that data?

If You Could Get That Analytics Report, That Would Be Great.

Enterprises and data analysis were made for each other... but in a way that most closely resembles a cliché romcom starring Julia Roberts. This romance follows the beat of a metronome: while executives begin to grasp the impact of data-based decisions, the commitment to actually acknowledge the abundance of this information has its own lifespan. A/B testing, conversion funnels, segmentation, etc.: while the foreplay of implementing these buzzwords rolls on for a number of weeks, it gives powerful figures time to reflect on one thing in particular: they owe their success to a lack of quantifiable accountability, and numbers are right around the corner. While this might not be a conscious act, it is an entirely real phenomenon.

I've traditionally been a product manager, yet during that period of my career I found myself nominated Gatekeeper of Digital and Financial Data... for whatever reason (it's worth reiterating that I am not, nor have I ever been, an analyst). The phenomenon followed a pattern. Given their expectations, executives reach their wits' end when the budget they allocated for enterprise-level software is still under configuration and has produced no results. There's a reason 'patience is a virtue' isn't a phrase you see in many sales pitches: we want what we're being sold, and we want it now.

That's where I'd typically come in. As a product manager, analytics is a valuable weapon, so unspeakable amounts of unsolicited data thrown into my lap seemed like ammunition for change. When a company's problems become as large as they are obvious, some numerical correlations are nearly common sense.

Cue the dashboards, custom reports, event tracking, you name it. Oftentimes executives would set aside a weekly cadence to review the expensive conclusions our software could finally produce. The weekly email newsletters I produced were met with a euphoric chain of satisfied stakeholders, time and time again. Finally, it seemed, the conclusions were clear and our problems were quantifiable. And yet, nothing seemed to change.

Human Insecurities Versus World Problems

As a data enthusiast, I did what we all would've done: I placed analytics tracking on our analytics reports themselves. "Great stuff, groundbreaking work here," said one CEO who, I could see, had not bothered to click the link provided. At a certain point, I began attaching empty Excel spreadsheets and posting dead links as the content of our beloved reports. Those too were 'groundbreaking', apparently.
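For the curious, the tracking itself doesn't need to be anything fancy. Here's a minimal sketch of the idea: route the report link through a tiny redirect endpoint that logs the click before forwarding. Flask, the route, and the URLs here are stand-ins for illustration, not the exact setup I ran at the time.

```python
# A minimal sketch of tracking whether a report link is ever opened:
# a redirect endpoint that logs the click, then forwards to the report.
from datetime import datetime, timezone
from flask import Flask, redirect

app = Flask(__name__)
REPORT_URL = "https://example.com/weekly-report"  # hypothetical report location

@app.route("/r/<recipient>")
def tracked_report(recipient):
    # Log who clicked and when; in practice this would go to a real store.
    with open("report_clicks.log", "a") as log:
        log.write(f"{datetime.now(timezone.utc).isoformat()}\t{recipient}\n")
    return redirect(REPORT_URL)

if __name__ == "__main__":
    app.run(port=5000)
```

Send each stakeholder their own `/r/<name>` link, then watch the log stay empty.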

This is far from an isolated phenomenon in technology. Meeting after meeting, client after client, I took front-row seats to the blatant dismissal of numerical evidence in favor of ego-driven decisions. Test A would yield 30% higher conversion rates than Test B, but Test B would prevail thanks to the subjective, emotional opinions of talking heads. In retrospect, I can see how a grown adult with a household would find the sudden introduction of facts threatening. We all have imposter syndrome, and the twenty-something analyst attempting to improve a company will almost always lose to an adult protecting a family.

Consider a recent example uncovered by mistake. While auditing usage for a widely-known project management tool, something seemed off about our volume of usage of a product costing us unspeakable dollars. Our department had mostly been tasked with the upkeep of this 'critical' internal system at all costs. As it turns out, over 80% of all activity had been our own internal upkeep. That's millions of dollars invested in something never used over the course of several years, all for the purpose of upholding a guise of value-add.
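The audit itself was embarrassingly simple. A rough sketch of the math, assuming an activity export with a user id column; the file, column, and account names are illustrative, not the real schema:

```python
# Sketch: what share of a tool's activity is just us maintaining the tool?
import pandas as pd

events = pd.read_csv("tool_activity_export.csv")       # hypothetical export
internal_users = {"pm-team", "it-admin", "data-team"}   # our own upkeep accounts

internal = events["user_id"].isin(internal_users)
share = internal.mean()  # fraction of all events generated by internal accounts
print(f"{share:.0%} of activity came from internal upkeep accounts")
```

One `isin` and one `mean`, and the "critical" system turns out to be mostly us talking to ourselves.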

BI, Data Science, and the Choice to Make That Distinction

I've spent the last several months working deep under the hood, attempting to dismantle our undisputed BI overlords over at Tableau. Fair warning: I'm about to rip into a Tableau tangent here, but I promise there's a point.

I was introduced to Tableau as a tool to fill a niche: quick analysis and one-off extracts on tight timelines. The type of timelines where digging into Pandas and potentially entering .bash_profile hell with Anaconda simply wasn't an option. I was so pleased with its ability to serve this purpose that it sparked a spontaneous thousand-dollar purchase of a personal license: Tableau Desktop, Tableau Prep, and Tableau Server; a decision I'll likely regret for the rest of my life.
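For context, the kind of one-off extract I mean is something like this in Pandas; the file and column names are made up for illustration:

```python
# Sketch: a five-minute, one-off extract -- weekly unique signups to CSV.
import pandas as pd

df = pd.read_csv("signups.csv", parse_dates=["created_at"])
weekly = (df.set_index("created_at")
            .resample("W")["user_id"]
            .nunique())
weekly.to_csv("weekly_signups.csv")
```

Trivial when your environment works; hence the appeal of a tool that skips the environment entirely.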

From my naïve perspective, it seemed logical that Tableau could help with the data cleanup and automation I had previously been handling in scripts. I could not have been more wrong. Even with full access to my own Tableau instance, it is clear that Tableau has one motive only: to show you your data, and to ensure you don't take it elsewhere. Consider this:

Check out the worksheets and dashboards you've published to Server. Considering these are equivalent to simple database views, you'd expect the underlying API calls to be exposed in your dev tools... why not? They aren't.

Tableau runs on a Postgres database on your personal server. However, no mention of "postgres" or anything of the sort surfaces in any search to a useful degree. There is a highly protected Tableau superadmin account which has access to all tables and views on this server, but most research will point users to unlocking the "readonly" user: essentially a red-herring account, or perhaps useful if you're spying on your employees' actions.
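To be fair, you can get in once you know the magic words. A minimal sketch, assuming you've enabled the "readonly" user and the repository is reachable on its default port; the table and column names come from the internal "workgroup" schema and may differ by version, so verify against your server's data dictionary:

```python
# Sketch: querying Tableau Server's internal Postgres repository directly.
# Assumes the "readonly" user has been enabled and port 8060 is reachable.
import psycopg2

conn = psycopg2.connect(
    host="your-tableau-server",   # hypothetical hostname
    port=8060,                    # default repository port
    dbname="workgroup",           # Tableau's internal database
    user="readonly",
    password="your-readonly-password",
)

with conn.cursor() as cur:
    # About all "readonly" is good for: who is looking at which views.
    cur.execute("""
        SELECT v.name, COUNT(*) AS hits
        FROM views_stats vs
        JOIN views v ON v.id = vs.view_id
        GROUP BY v.name
        ORDER BY hits DESC
        LIMIT 10;
    """)
    for name, hits in cur.fetchall():
        print(name, hits)

conn.close()
```

Usage stats about your own published views: surveillance-grade insight into your coworkers, and precisely zero help getting your actual data out.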

And then we have the Tableau Server API. Ah, what a gift it would be to query the views we created, running on scheduled extracts, so that we might build something from this information. As it turns out, Tableau's REST API does little more than reveal metadata about files you already knew about. Just in case you were wondering what date a workbook was created, for some weird, useless reason.
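A minimal sketch of that reality, assuming REST API version 3.4 and the default site; the endpoint paths and payloads follow Tableau's published REST API, but the server URL and credentials are placeholders:

```python
# Sketch: signing in to Tableau Server's REST API and listing workbooks.
# All you get back is metadata -- names, ids, timestamps -- not your data.
import requests

SERVER = "https://your-tableau-server"  # hypothetical server URL
API = f"{SERVER}/api/3.4"

# Sign in to obtain an auth token and the site id.
resp = requests.post(
    f"{API}/auth/signin",
    json={"credentials": {"name": "admin", "password": "secret",
                          "site": {"contentUrl": ""}}},
    headers={"Accept": "application/json"},
)
creds = resp.json()["credentials"]
token, site_id = creds["token"], creds["site"]["id"]

# List workbooks on the site.
resp = requests.get(
    f"{API}/sites/{site_id}/workbooks",
    headers={"X-Tableau-Auth": token, "Accept": "application/json"},
)
for wb in resp.json()["workbooks"]["workbook"]:
    print(wb["name"], wb["createdAt"])  # metadata, not the data behind it
```

Notice what's missing: any endpoint that hands you the rows inside those scheduled extracts.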

I'm not just picking on Tableau here (although I'll continue my series about hacking them soon enough). This has exposed a massive dichotomy in the way we see and treat data as a profession, or rather a series of professions: between those who look at data, and those who manipulate it, iterate on it, and create things with it. Nobody has ever expressed this realization to me, and many of you likely still don't see what the big deal is. That, to me, is the big deal.

Data should be a passion for those looking to improve humanity, without a doubt. If we know personalities are winning the battles against numbers, and we feel numb to the fact that our proprietary tools prevent us from using data effectively, there's something to be said about the complacency of humanity as we commit to consumption over production.

Company attitudes towards data are one thing, but individuals are an entirely different story. That's a long-winded post for another time.

Product manager turned engineer with an ongoing identity crisis. Breaks everything before learning best practices. Completely normal and emotionally stable.