Data Science’s Most Misunderstood Hero

Be careful which skills you put on a pedestal, since the effects of unwise choices can be devastating. In addition to mismanaged teams and unnecessary hires, you’ll see the real heroes quitting or re-educating themselves to fit your incentives du jour. A prime example of this phenomenon is in analytics.

Shopping for the trophy hire

The top trophy hire in data science is elusive, and it’s no surprise: “full-stack” data scientist means mastery of machine learning, statistics, and analytics. When teams can’t get their hands on a three-in-one polymath, they set their sights on luring the most impressive prize among the single-origin specialists. Who gets the pedestal?

Today’s fashion in data science favors flashy sophistication with a dash of sci-fi, making AI and machine learning darlings of the hiring circuit. Alternative challengers for the alpha spot come from statistics, thanks to a century-long reputation for rigor and mathematical superiority. What about analysts?

Analytics as a second-class citizen

If your primary skill is analytics (or data-mining or business intelligence), chances are that your self-confidence takes a beating when your aforementioned compatriots strut past you and the job market drops not-so-subtle hints about leveling up your skills to join them.

What the uninitiated rarely grasp is that the three professions under the data science umbrella are completely different from one another. They may use the same equations, but that’s where the similarity ends. Far from being a sloppy version of other data science breeds, good analysts are a prerequisite for effectiveness in your data endeavors. It’s dangerous to have them quit on you, but that’s exactly what they’ll do if you under-appreciate them.

Alike in dignity

Instead of asking an analyst to develop their statistics or machine learning skills, consider encouraging them to seek the heights of their own discipline first. Data science is the kind of beast where excellence in one area beats mediocrity in two.

Each of the three data science disciplines has its own excellence. Statisticians bring rigor, ML engineers bring performance, and analysts bring speed.

At peak expertise, all three are equally pedestal-worthy but they provide very different services. To understand the subtleties, let’s examine what it means to be truly excellent in each of the data science disciplines, what value they bring, and which personality traits are required to survive each job.

Excellence in statistics: rigor

As specialists in coming to conclusions beyond your data safely, statisticians are your best protection against fooling yourself in an uncertain world. To them, inferring something sloppily is a greater sin than leaving your mind a blank slate, so expect a good statistician to put the brakes on your exuberance. Constantly on tiptoe, they care deeply about whether the methods applied are right for the problem and they agonize over which inferences are valid from the information at hand.

What most people don’t realize is that statisticians are essentially epistemologists. Since there’s no magic that makes certainty out of uncertainty, their role is not to produce Truth but rather a sensible integration of palatable assumptions with available information.

The result? A perspective that helps leaders make important decisions in a risk-controlled manner.

Unsurprisingly, many statisticians react with vitriol toward “upstarts” who learn the equations without absorbing any of the philosophy. If dealing with statisticians seems exhausting, here’s a quick fix: don’t come to any conclusions beyond your data and you won’t need their services. (Easier said than done, right? Especially if you want to make an important launch decision.)

Excellence in machine learning: performance

You might be an applied machine learning / AI engineer if your response to “I bet you couldn’t build a model that passes testing at 99.99999% accuracy” is “Watch me.” With the coding chops to build prototypes and production systems that work and the stubborn resilience to fail every hour for several years if that’s what it takes, machine learning specialists know that they won’t find the perfect solution in a textbook. Instead, they’ll be engaged in a marathon of trial-and-error. Having great intuition for how long it’ll take them to try each new option is a huge plus and is more valuable than an intimate knowledge of how the algorithms work (though it’s nice to have both).

The result? A system that automates a tricky task well enough to pass your statistician’s strict testing bar and deliver the audacious performance a business leader demanded.

Performance means more than clearing a metric — it also means reliable, scalable, and easy-to-maintain models that perform well in production. Engineering excellence is a must.

Wide versus deep

What the previous two roles have in common is that they both provide high-effort solutions to specific problems. If the problems they tackle aren’t worth solving, you end up wasting their time and your money. A frequent lament among business leaders is, “Our data science group is useless,” and the problem usually lies in an absence of analytics expertise.

Statisticians and machine learning engineers are narrow-and-deep (the shape of a rabbit hole, incidentally) workers, so it’s really important to point them at problems that deserve the effort. If your experts are carefully solving the wrong problems, of course your investment in data science suffers low returns. To ensure that you can make good use of narrow-and-deep experts, you either need to be sure you already have the right problem or you need a wide-and-shallow approach to finding one.

Excellence in analytics: speed

The best analysts are lightning-fast coders who can surf vast datasets quickly, encountering and surfacing potential insights faster than those other specialists can say “whiteboard.” Their semi-sloppy coding style baffles traditional software engineers… until it leaves them in the dust. Speed is the highest virtue, closely followed by the trait of not snoozing past potentially useful gems. A mastery of visual presentation of information helps with speed bottlenecks on the brain side: beautiful and effective plots allow the mind to extract information faster, which pays off in time-to-potential-insights.

Where statisticians and ML folk are slow, analysts are a whirlwind of inspiration for decision-makers and other data science colleagues.

The result: the business gets a finger on its pulse and eyes on previously-unknown unknowns. This generates the inspiration that helps decision-makers select valuable quests to send statisticians and ML engineers on, saving them from mathematically-impressive excavations of useless rabbit holes.

Sloppy nonsense or stellar storytelling?

“But,” object the statisticians, “most of their so-called insights are nonsense.” By that they mean the results of their exploration may reflect only noise. Perhaps, but there’s more to the story.

Analysts are data storytellers. Their mandate is to summarize interesting facts and be careful to point out that any poetic inspiration that comes along for the ride is not to be taken seriously without a statistical follow-up.

Good analysts have unwavering respect for the one golden rule of their profession: do not come to conclusions beyond the data (and prevent your audience from doing it too). Unfortunately, relatively few analysts are the real deal — buyer beware: there are many data charlatans out there posing as data scientists. These peddle nonsense, leaping beyond the data in undisciplined ways to “support” decisions based on wishful thinking. If your ethical standards are lax, perhaps you’d keep these snake oil salesmen around and house them in the marketing dark arts part of your business. Personally, I’d prefer not to.

As long as analysts stick to the facts (“This is what is here.” But what does it mean? “Only: This is what is here.”) and don’t take themselves too seriously, the worst crime they could commit is wasting someone’s time when they run it by them. Out of respect for their golden rule, good analysts use softened, hedging language (for example, not “we conclude” but “we are inspired to wonder”) and discourage leader overconfidence by emphasizing a multitude of possible interpretations for every insight.

While statistical skills are required to test hypotheses, analysts are your best bet for coming up with those hypotheses in the first place. For instance, they might say something like “It’s only a correlation, but I suspect it could be driven by …” and then explain why they think that.

This takes strong intuition about what might be going on beyond the data, and the communication skills to convey the options to the decision-maker, who typically calls the shots on which hypotheses (of many) are important enough to warrant a statistician’s effort. As analysts mature, they’ll begin to get the hang of judging what’s important in addition to what’s interesting, allowing decision-makers to step away from the middleman role.

Of the three breeds, analysts are the most likely heirs to the decision throne.

Because subject matter expertise goes a long way towards helping you spot interesting patterns in your data faster, the best analysts are serious about familiarizing themselves with the domain. Failure to do so is a red flag. As their curiosity pushes them to develop a sense for the business, expect their output to shift from a jumble of false alarms to a sensibly-curated set of insights that decision-makers are more likely to care about.

To avoid wasted time, analysts should lay out the story they’re tempted to tell and poke it from several angles with follow-up investigations to see if it holds water before bringing it to decision-makers. If a decision-maker is in danger of being driven to take an important action based on an inspiring story, that is the Bat-Signal for the statisticians to swoop in and check (in new data, of course) that the action is a wise choice in light of assumptions the decision-maker is willing to live with and their appetite for risk.

The analyst-statistician hybrid

For analysts sticking to the facts, there’s no such thing as wrong, there’s only slow. Adding statistical expertise to “do things correctly” misses the point in an important way, especially because there’s a very important filter between exploratory data analytics and statistical rigor: the decision-maker. Someone with decision responsibility has to sign off that the business impact of pursuing the analyst’s insight is worth a high-effort expert’s time. Unless the analyst-statistician hybrid is also a skilled decision-maker and business leader, their skillset forms a sandwich with a chasm in the middle.

An analyst who bridges that gap, however, is worth their weight in gold. Treasure them!

Analytics for machine learning and AI

Machine learning specialists put a bunch of potential data inputs through algorithms, tweak the settings, and keep iterating until the right outputs are produced. While it may sound like there’s no role for analytics here, in practice a business often has far too many potential ingredients to shove into the blender all at once.

One way to filter down to a promising set to try is domain expertise — ask a human with opinions about how things might work. Another way is through analytics. To use the analogy of cooking, the machine learning engineer is great at tinkering in the kitchen, but right now they’re standing in front of a huge, dark warehouse full of potential ingredients. They could either start grabbing them haphazardly and dragging them back to their kitchens, or they could send a sprinter armed with a flashlight through the warehouse first. Your analyst is the sprinter; their ability to quickly help you see and summarize what-is-here is a superpower for your process.
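To make the flashlight run concrete, here is a minimal sketch of one way an analyst might do a first pass over candidate inputs: rank them by how strongly they move with the thing you care about. The ingredient names and numbers are hypothetical, invented purely for illustration, and a simple correlation is only one of many possible quick looks. In keeping with the golden rule, the ranking only describes what is in this dataset; it is a prompt for what to try first, not a conclusion.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers.

    Assumes neither list is constant (a constant list would divide by zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_movement / (spread_x * spread_y)


# Hypothetical data: the outcome of interest and three candidate inputs.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "ingredient_a": [2.0, 4.0, 6.0, 8.0, 10.0],
    "ingredient_b": [5.0, 3.0, 4.0, 1.0, 2.0],
    "ingredient_c": [1.0, -1.0, 0.0, -1.0, 1.0],
}

# Rank by strength of linear co-movement, ignoring direction.
scores = {name: pearson(values, target) for name, values in candidates.items()}
ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)

for name in ranked:
    print(f"{name}: correlation with target = {scores[name]:+.2f}")
```

A list like this is cheap to produce, which is the point: the machine learning engineer gets a shortlist to start with instead of grabbing from the warehouse at random.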

The analyst-ML expert hybrid

Analysts accelerate machine learning projects, so dual skillsets are very useful. Unfortunately, because of the differences in coding style and approach between analytics and ML engineering, it’s unusual to see peak expertise in one individual (and even rarer for that person to be slow and philosophical when needed, which is why the true full-stack data scientist is a rare beast indeed).

Dangers of chronic under-appreciation

An expert analyst is not a shoddy version of the machine learning engineer; their coding style is optimized for speed, on purpose. Nor are they a bad statistician, since they don’t deal with uncertainty at all; they deal with facts. “Here’s what’s in our data. It’s not my job to talk about what it means beyond the present data, but perhaps it will inspire the decision-maker to pursue the question with a statistician…”

What beginners don’t realize is that the work requires top analysts to have a better grasp of the mathematics of data science than either of the other applied breeds. Unless the task is complicated enough that it demands the invention of a new hypothesis test or algorithm (the work of researchers), statisticians and ML specialists can rely on checking that off-the-shelf packages and tests are right for the job, but they can often skip having to face the equations themselves.

For example, statisticians might forget the equations for a t-test’s p-value because they get it by hitting run on a software package, but they never forget how and when to use one, as well as the correct philosophical interpretation of the results. Analysts, on the other hand, aren’t looking to interpret. They’re after a view into the shape of a gory, huge, multidimensional dataset. By knowing the way the equation for the p-value slices their dataset, they can form a reverse view of what the patterns in the original dataset must have been to produce the number they saw. Without an appreciation of the math, you don’t get that view. Unlike a statistician, though, they don’t care if the t-test is right for the data. They care that the t-test gives them a useful view of what’s going on in the current dataset. The distinction is subtle, but it’s important.
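As a rough illustration of that reverse view, the sketch below takes apart the t statistic that a t-test’s p-value is computed from. It assumes Welch’s two-sample version (the text above doesn’t name a variant) and uses made-up numbers. The point is only that the statistic is a gap between group means divided by a standard error, so a large value tells you the groups in this dataset sit far apart relative to their spread and size.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t, mean_gap, standard_error) for two lists of numbers.

    Welch's two-sample form is an assumption made for this illustration.
    """
    mean_gap = statistics.mean(a) - statistics.mean(b)
    # Each group's sample variance, shrunk by that group's size.
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return mean_gap / standard_error, mean_gap, standard_error


# Hypothetical measurements, invented for illustration only.
group_a = [12.0, 14.0, 13.0, 15.0, 16.0]
group_b = [10.0, 11.0, 12.0, 10.0, 12.0]

t, mean_gap, standard_error = welch_t(group_a, group_b)
print(f"mean gap = {mean_gap:.2f}")
print(f"standard error = {standard_error:.2f}")
print(f"t = {t:.2f}")
```

Reading the pieces separately is what gives the analyst the view: the same t could come from a big gap in noisy groups or a small gap in tight ones, and the ingredients show which pattern is actually in the data.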

Statisticians deal with things outside the data, while analysts stick to things inside it.

At peak excellence, both are deeply mathematical and they often use the same equations, but their jobs are entirely different.

Similarly, analysts often use machine learning algorithms to slice their data, identify compelling groupings, and examine anomalies. Since their goal is not performance but inspiration, their approach is different and might appear sloppy to the ML engineer. Again, it’s the use of the same tool for a different job.

To summarize what’s going on with an analogy: pins are used by surgeons, tailors, and office workers. That doesn’t mean the jobs are the same or even comparable, and it would be dangerous to encourage all your tailors and office workers to study surgery to progress in their careers.

The only roles every business needs are decision-makers and analysts. If you lose your analysts, who will help you figure out which problems are worth solving?

If you overemphasize hiring and rewarding skills in machine learning and statistics, you’ll lose your analysts. Who will help you figure out which problems are worth solving then? You’ll be left with a group of miserable experts who keep being asked to work on worthless projects or analytics tasks they didn’t sign up for. Your data will lie around useless.

Care and feeding of researchers

If this doesn’t sound bad enough, many leaders try to hire PhDs and overemphasize research — as opposed to applied — versions of the statistician and ML engineer… without having a problem that is valuable, important, and known to be impossible to solve with all the existing algorithms out there.

That’s only okay if you’re investing in a research division and you’re not planning to ask your researchers what they’ve done for you lately. Research for research’s sake is a high-risk investment and very few companies can afford it, because getting nothing of value out of it is a very real possibility.

Researchers only belong outside of a research division if you have appropriate problems for them to solve — their skillset is creating new algorithms and tests from scratch where an off-the-shelf version doesn’t exist — otherwise they’ll experience a bleak Sisyphean spiral (which would be entirely your fault, not theirs). Researchers typically spend over a decade in training, which merits at least the respect of not being put to work on completely irrelevant tasks.

As a result, the right time to hire researchers to an applied project tends to be after your analysts have helped you identify a valuable project and attempts to complete it with applied data scientists have already failed. That’s when you bring on the professional inventors.

The punchline

When in doubt, hire analysts before other roles. Appreciate them and reward them. Encourage them to grow to the heights of their chosen career (and not someone else’s). Of the cast of characters mentioned in this story, the only ones every business with data needs are decision-makers and analysts. The others you’ll only be able to use when you know exactly what you need them for. Start with analytics and be proud of your newfound ability to open your eyes to the rich and beautiful information in front of you. Inspiration is a powerful thing and not to be sniffed at.

WRITTEN BY Cassie Kozyrkov

Head of Decision Intelligence, Google. ❤️ Stats, ML/AI, data, puns, art, theatre, decision science. All views are my own. twitter.com/quaesita
