Howdy readers, it's been quite a while, hasn't it? I know I promised a post about hardware... and maybe I'll get to it one of these days, but let's be real, I haven't posted in almost four years.
What happened? Simple, I got really, REALLY busy. The last time we chatted, I was a senior analyst for a nascent analytics team for a large healthcare company. Now, I run what I built the team into, and an even larger company acquired us. The best part, we specialize in the very type of analytics and reporting that I was writing about back then. I now have a few years of experience training fresh analysts in that very particular set of skills... except with a bit more of a budget, and the freedom to pick my tools (sort of).
Now, that puts me in a really interesting position; I can look back at what I wrote all those years ago, and reflect on what's changed, what I've learned, and how to build an analyst from the ground up in 2023. Perhaps each of these points will become its own post, or perhaps this will be the end of the conversation, but either way, holy smokes has a lot changed... while somehow staying the same.
Now that we've caught up, let's dive into the fun stuff, what's changed:
The 64-Bit Evolution
(Alternatively: Why you need to beg your dinosaur of a company to upgrade you to 64-bit Office)
Oddly enough, despite the general widening of my knowledge and toolkit, the factor that had the biggest impact on our day-to-day workflow has been the company finally upgrading to 64-bit MS Office. First off, this is a free upgrade, there's no reason why you or your company should be using 32-bit versions of the software; ESPECIALLY your analytics team. The reasons are myriad, but they boil down to major upgrades to Access (the program) and Excel (the GOAT) because of how much RAM the programs can make use of (see, I told you hardware would come into play eventually!).
You may notice I made no mention of Access in any of my older posts, and there was a fantastic reason for that: 32-bit Access is completely useless. Any local processing of data (which should be avoided if possible, but who are we kidding?) requires as much memory as it can handle, and 32-bit applications can only access 4 GB of memory at a time. One of the two things nearly ANYONE knows about their computer is that it has more than 4 GB of RAM... which is a crime. How this plays out is that your ability to do anything remotely fancy is kneecapped by 32-bit MS Office. Often, Excel and Access will "give up" and basically tell you to upgrade to 64-bit; or just as likely, the program will give you the dreaded faded white interface, and then hang till it crashes.
While the upgrade to 64-bit doesn't solve all of these issues, it DOES open up the possibility to handle an effectively infinite amount more data at once, whether that be bigger lookups (which we'll get to...), messing with giant data models, or building rapid access local data source in Access. You don't need to know more, but if you're still using 32-bit software, upgrade before reading any further.
XLOOKUP
: The Final Frontier of the INDEX MATCH
One of my most popular posts discussed how INDEX MATCH
made the VLOOKUP
and HLOOKUP
obsolete: This is 100% still the case, however, as of 2019 (either Office 2019 or Office 365) INDEX MATCH
has been completely overshadowed by the introduction of a new challenger: XLOOKUP
.
It's as if Microsoft read my post, and saw how folks were using INDEX MATCH
with iferror
instead of VLOOKUP
, and actually processed all of that feedback and released the undisputed king of the lookup function: XLOOKUP
.
Here's how it works: =
Xlookup(
the value you want to look up, the column in which you want to look up that value in, the column that contains the values you want to pull back into your initial sheet, and what to do if there's no match). In practice, it looks like this: =
Xlookup
(A2,Sheet1!
A:A
,Sheet1!
B:B
,"")
. What that does is it checks cell A2, then looks for it in sheet1, column A, and returns the matching value in Sheet1 column B
. If there's no match, it returns a blank. In and of itself, it does the same thing as VLOOKUP
and INDEX MATCH
, but the real power is that it:
- Offers the same flexibility as
INDEX MATCH
(allowing you to do lookups in any orientation and across workbooks, requiring WAY less processing power) while being a single, easy-to-use function. - It has a built-in "if error," meaning you can program what to do if you don't have a match by simply putting in something after the 3rd comma.
- Because of the built-in
iferror
, it allows the function to be infinitely recursive. Nothing is stopping you from putting an infinite number of differentXLOOKUP
s after the third comma of eachXLOOKUP
function, like so:**=Xlookup(A2,Sheet1!A:A,Sheet1!B:B,Xlookup(A2,Sheet2!A:A,Sheet2!B:B,""))**
What that'll do is check A2
, return what's in Sheet1 B:B
, and if there's no match, check out Sheet2
for another match... repeat as often as necessary!
In a nutshell, there's no reason to teach anyone VLOOKUP
anymore, and INDEX MATCH
, while still having a few uses outside of what it normally does, is effectively replaced by XLOOKUP
too. When training someone new, after teaching them pivot tables, I teach them XLOOKUP
. Once you have those two skills, you can work an entry-level finance or accounting gig... no joke!
XLOOKUP
and can explain why, they're a catch. I can't tell you how many people I've interviewed who don't have an answer to this question at all.One of my favorite sketch performers: Makro sums it up nicely in this video (coincidentally, this is also where I learned XLOOKUP
was a thing):
Another fun fact: Office 2019/365 introduced the TEXTJOIN
macro I wrote about in my first post as a built-in function.
The Rise of Power BI
Why Learning PowerPivot Is Worth at Least $20,000
My later posts all revolved around discussing how to leverage the built-in PowerPivot add-on that comes with Excel to mess around with datasets that Excel has no business working with.
While this information is still very useful, 64-bit Access more-or-less serves the same purpose but works a lot better for what we were doing with it (i.e. loading giant flat files to visualize things in Excel). THAT SAID, I would go as far as to say, next to learning XLOOKUP
, PowerPivot was possibly the most important thing I could have taught... Why, you ask? Simple: Power BI uses the same interface to do a hell of a lot more.
Stepping back a little, what is Power BI (or PBI if you're nasty)? You may remember that I referenced Tableau quite a bit when discussing tools that companies leverage to visualize large datasets; well, PBI does the same thing. Still, it's a Microsoft product, (can be) a good deal cheaper, and works natively with the rest of the MS suite. As such, PBI is rapidly becoming the software of choice for producing interactive dashboards at the enterprise level... meaning that it's essential for any self-respecting analyst to know how to use. Luckily, Microsoft basically re-skinned the existing PowerQuery/PowerPivot interface, added a few additional connection drivers built-in (you can always download more), added a ton of additional stock visualizations, and released PBI to the world. In a nutshell, if you know how to use pivot tables, and you read my three PowerQuery posts... you know how to use PBI, and the rest is just practice!
Alation: Learning to Love Predictive A.I.
One common theme across my first few posts was a commentary on the status of how many large organizations are, for lack of a better word, organized when it comes to their data and analytics. A lot of it still applies (Silos are still a major issue, "those who sit above in shadow" still exist, and there's always going to be a team who owns the data with whom you have little to no contact... no matter how high you get in the organization), but a lot of it was a result of a general lack of information trickling down to the front-lines of the organization.
The good news is, I made it my mission as I was rising within my organization to increase the general level of understanding across the entire Analytics vertical. Many companies rely on user-submitted documentation artifacts on something like confluence. Still, some of the more forward-thinking companies go a step further and leverage a product that combines the user-generated documentation of a Confluence with some Machine Learning and social tools to create a searchable repository of data dictionaries. The product I'm familiar with (but not the only product in the space) is called Alation... and it has single-handedly raised the level of data consciousness across the entire organization, which is an absolute game changer.
The real miracle is that it doesn't just compile all this information in one place. Still, it also has a really solid SQL development environment (or IDE: Integrated Development Environment) that supplies relevant documentation depending on how you write your queries (basically, it dynamically shows the fields you can pull from based on your "from" statement) and it automatically suggests the completion of fields as you type. The way it arrives at these suggestions is by logging all of the successful queries that folks run within the IDE, tracking the joins, fields, and queries that people use, and how often they use them.
It then takes this information and posts it within the searchable database. Picture this: if you're working on a new data source you don't understand, you can start with whatever field you know you need and see what joins are the most popular (unlike high school, in this case, popularity is a sign of quality).
This democratizes "tribal knowledge" (a term I despise, but one that is so damn descriptive, I have to use it), and allows anyone with enough curiosity access to the keys to the kingdom.
Access + Curiosity + Grit = Success
One of the most pivotal things I learned during my time as an analyst had nothing to do with analytics at all, but rather, had everything to do with how one gets access to things in a corporate environment.
If you work at a company of any substantial size (basically, any company with departments) there's usually a system where one obtains access to the various tools that they use to do whatever job it is they came to the company to do. In the case of every company I've ever worked for, this system is Sailpoint's IdentityNOW; but every company has something like it.
I work in healthcare, so in my org, there's a very good reason to have these access roles buttoned up and to require some degree of effort to get access to anything... but would it shock you if I told you that no one knows which roles people need to actually do their job.
In general, your boss (and maybe your boss's boss... if they're particularly savvy) has a list of roles that, over time, they've figured out that everyone under them needs to have. However, that's basically where the knowledge ends. Let's say that you need access to a new tool, or a new dataset... there's nothing that adds that to the roles you already have, someone needs to figure out what role grants that access, and then they need to put in for that role for you (or you can do it yourself).
The good news is, there is a standard procedure that most companies follow when designing these roles. There's usually a role "root" (i.e. Power BI), followed by an access "level" (i.e. Corporate), followed by the thing you're trying to access (i.e. the name of the workspace), and then finally, the level of permission (i.e. Read Only). I highly encourage any analyst with some degree of curiosity and a general desire to learn and do more within their role to go exploring.
I can bet you a fair sum of money that if you work for a large company, not a single person on any of the analytics teams (ESPECIALLY if there's more than one) knows everything that's accessible to them... so that gives you the ability to BE that person... if you know the roles, you have the power.
That solves the access side of the equation; the other side is how to apply this access helpfully and creatively. I've been told the best advice I've given to new trainees is as follows:
"You won't be given the ability to break something until you're no longer at risk of breaking it."
This leads to its corollary:
"Once you know you can't break something, nothing is stopping you from trying everything you can to solve the problem."
This two-fold axiom has been my team's raison d'etre since our inception. Essentially, we have very broad read-only access across the organization, which allows us to come up with rapid and creative solutions to emergent requests. This leads to my final unifying theory.
What Is an Analyst, Anyways?
Another fun thing I talk about with new hires is that "analyst" is perhaps the broadest term in business. The term encompasses everything from lawyers who ferret out tax fraud to folks that visualize trends with dashboards, to operatives working overseas doing wet work.
That said, now that I'm in a position where I have to hire analysts and "define" what my team does, I look at it like this: An analyst is someone who can take data from multiple sources (some of which they may not understand) and organize it for some third party on-demand.
That "on-demand" part is the most important, what differentiates my analysts from the folks in IT who work with data (and who may own the data) is that we have a much tighter turnaround time (often a week at most), and we aren't wedded to a particular development structure that necessitates the creation of (arguably unnecessary) artifacts and meetings and the like. Essentially, if you need it once, you'll get it from us; if you need it regularly or on-demand, you may want to go to IT (but you might still go to us... we're quite good).
However, this broad mandate necessitates three things above all else: creativity, curiosity, and tenacity. If you're going to be working with data sources you've never seen before regularly, you need to be curious enough to get a good enough understanding to pull down the data you need for your product; then, you need to figure out a way to jam these new data sources together, even if they're not designed to do so; and finally, you need to be able to try out a bunch of possible solutions until you find the one that works without getting discouraged... and then keep that in your back pocket for next time.
What Have We Learned?
- Work on being curious, creative, and tenacious.
- For the love of god, upgrade to 64-bit Office.
- Once you master
XLOOKUP
, Pivot Tables, and PowerQuery, everything else will fall into place. - Learn your way around your company's data dictionary system.
- Realize that no one knows anything, and as such, that opens the door to you knowing everything (a good place to start is what roles you can request for your job).
That said, no promises, but I did want to go into a little more detail about some of the crazy hacks I've come up with over the last few years, and hopefully I'll be able to share them with you soon!
Happy trails,
-Snacks