Doing the Data Dance

Adventures in Excel

If you've been reading along, over the last several posts you've learned the two major skills that any self-respecting Excel jockey counts as their go-tos: the ability to lookup (remember, I'm partial to index-match, but if you learned VH lookup, ride that until you crash your system) and the ability to pivot.

Now here's something really interesting: until we pierce the veil on doing some VB macros, by truly mastering just those two skills, you can handle basically any job that Excel can actually handle, anything more advanced is just a matter of doing it faster (and faster, and faster). However, I've neglected to discuss what types of things you'll actually encounter out in the wild that requires some fancy Excel footwork. Well, that's what this post is all about.

The first question: what is Excel actually for. Truth be told, if you separate the VB elements and the pivot tables from Excel, it was really just a way for accountants to make small changes to tax forms and have them reflected later in the form (something that had to be done by hand prior to the invention of Excel's progenitor...which had to have been horrible). With that in mind, one needs to understand that everything else that Excel can do is built on top of this relatively rudimentary foundation, and interestingly enough, the vast majority of software functions in the same way: for example, the reason why the last dozen or so Bethesda games all have similar glitches and gameplay is because they're all built on a nearly decade old engine, and anyone who works at a company that managed a database for a lengthy period of time knows that it's much easier to build on top of the garbage than to clean anything up (that's also why there's all that cut content that still exists in games when you used a game genie as a kid...it's too difficult to take that stuff out without harming the juicy center!).

The pros of this model is that because Excel was made to handle accounting work, it handles relatively small datasets (and "small" here is under a million rows...I've managed to push excel to about 250k rows without it exploding...but not much more than that) very quickly, and in a very user friendly and visual format. In other words, Your boss uses Excel because it's pretty easy to understand what's going on (and you can see the data). As such, you can think of Excel as the "Lingua Franca" (the language that two dignitaries of separate nations speak to conduct diplomacy...or perhaps, the Basic to your Star Wars Universe) of the data world. To put this as simply as possible, unless you're working for a company where EVERYONE has a background in computer/data science, you're going to need to figure out a way to get your data into an Excel friendly format somehow. The reason for this may have to do with self selection, people who like getting into code go into heavy data, people like me who like to fix problems tend to be intermediaries, and the people who actually make money for our companies (or at least are able to convince others that money is made)...are business people.

Now brace yourself, because this sounds completely insane: the daily workflow of a Business Analyst (or a BA) at many companies is taking data that already exists in one form on one platform (generally either tableau, a business intelligence program, or some sort of database), exporting it to an Excel friendly format, manipulating the data so it makes sense, drawing some sort of salient conclusion from it (usually using a lookup, a pivot, or both), presenting it to management, and then putting it back into a format that the heavy data people can use to start this engine all over again.

The funny part is, the actual mechanics of doing this work isn't hard, if everyone is on the same page, you could probably take care of this whole workflow in maybe half a day. Alas, blogs like this wouldn't need to be written if mechanics always worked the way they should, and another golden rule is that multiple departments are never on the same page. In other words, the data you get is going to be "dirty" and next time, I'm going to show you some common dirty data dilemmas, and how to polish it up with nothing but a few = signs, a couple of commands, a pivot table or two, and a shitload of ingenuity.

But as always, a summary. In this post, we learned:

  • What Excel is good at
  • What Excel perhaps isn't so good at
  • Why you still need to deal with Excel
  • The day to day of someone who deals in both the data space and the management space...and
  • That troubleshooting this step is what ultimately earns you your paycheck.

Finally, a quick aside to placate my fellow authors and friends on this fine blog, your data being dirty on my end doesn't mean you did anything wrong, it's that bouncing from one platform to another and back was never meant to be easy.

Author image
Down the block from a literal battleship Website
One of three therapists in the known universe who knows how to use a computer. Finds meaning in highly protected data, in a cave, utilizing nothing but a box of scraps.
Author image
Down the block from a literal battleship

One of three therapists in the known universe who knows how to use a computer. Finds meaning in highly protected data, in a cave, utilizing nothing but a box of scraps.