In search of ML/AI success? Know your hard and your soft numbers.
There’s no guaranteed success in AI. Still, if you’ve been following this website long enough, you already know three steps you can take to improve your chances:
- Cultivate a leadership-level understanding of AI so that the folks in charge get a realistic picture of what AI can really do. Bonus points if you can grow this into a company-wide data literacy.
- Develop a solid plan to lay out the specifics of how the company will use AI. Map out an overall strategy, then develop a balanced portfolio of specific data projects to execute.
- Expect to iterate on projects – say, run projects as sprints – to make it easier to change course based on new discoveries, or to even stop a project that no longer looks like it will bear fruit.
There’s another important step here, and it’s easy to miss because it’s subtle:
- Express your business in terms of numbers.
This is something I call quantification.
Quantification is important because everything in data – from summing and sorting in BI, to predicting customer churn with AI – is based on applying analytical techniques to numbers.
Data is a numbers-in, numbers-out affair: you feed numbers as inputs into a spreadsheet, a dashboard, or the training process for an AI model. You then get numbers as outputs from those analyses.
We can express this flow as:
business concepts → numbers → analysis → numbers → business decisions
The arrow following “business concepts,” and the one leading into “business decisions,” involve translation and interpretation. And that’s where this gets interesting:
Numbers-in: Inputs to analyses
Pretty much everything in your business is, or can be expressed as, a number. Not all numbers are created equal, though. It helps to split them into “hard” and “soft”:
Hard numbers are concepts that are both numeric in nature and factual. The number of vehicles you sold last year? The day-to-day revenue tallies from a particular store? The stock price and number of shares you purchased yesterday? Those are all hard numbers.
Soft numbers are … everything else. Your employee satisfaction scores from that HR survey? The text in a document? The social impact of your latest charity fundraiser? I call these soft numbers, because they start off as something else and you have to translate them into numeric form. That makes them a little squishy, as we’ll soon see.
Why knowing “hard” and “soft” numbers is important
Data analysis techniques only see numbers as numbers; they neither know nor care whether their inputs are hard or soft. So why should we, as humans, care about this?
We should care because soft numbers are subjective. Every decision you make about how to translate a concept into numbers will influence those numbers. You also risk losing information along the way. All of this will influence the analysis.
Going back to the example of that HR employee satisfaction survey, let’s say that respondents rate issues from 1 (very negative) to 5 (very positive). What’s the difference between 2 and 3? By pure numeric analysis, that difference is 1. But does that really mean that someone who marks 3 is 20% more satisfied than the person who marks 2? And how do you know to use a scale from 1-5 instead of 1-10, or even 1-100?
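To see how much the translation step matters, here is a minimal Python sketch. The two teams and their responses are made up for illustration, and the two scoring rules (a plain average, and the share of respondents who marked 4 or 5) are just two of many reasonable choices, not a recommended method.

```python
from statistics import mean

# Hypothetical survey responses on the 1 (very negative) to 5 (very positive) scale.
team_a = [3, 3, 3, 5]
team_b = [2, 2, 4, 4]


def mean_score(responses):
    """Treat the ratings as plain numbers and average them."""
    return mean(responses)


def top_two_share(responses):
    """Share of respondents who marked 4 or 5 (an assumed cutoff for 'satisfied')."""
    return sum(1 for r in responses if r >= 4) / len(responses)


for name, responses in (("Team A", team_a), ("Team B", team_b)):
    print(name, "mean:", mean_score(responses), "share satisfied:", top_two_share(responses))
```

With these made-up responses, Team A looks better by average rating while Team B looks better by share of satisfied respondents. Same survey, same answers, different story, and the only thing that changed was a human decision about how to turn opinions into numbers.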
You face similar challenges working with text data. Many natural language processing (NLP) techniques decompose a document into a series of word counts, which seems a simple enough way to turn text into numbers. But this still involves a series of decisions around how to break up the text: do you eliminate very common words? If so, how do you define “very common?” Do you count individual words, or do you group them into pairs in order to capture the information associated with word order? And should you boil a word down to its root, or treat “boat,” “boats,” and “boating” as distinct terms for the word count? The “right” answers here depend on a number of issues, including your choice of analysis technique.
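The sketch below makes those choices concrete using only the Python standard library. Everything in it is an assumption for illustration: the stop-word list is arbitrary, and `crude_stem` is a toy suffix-stripper, not a real stemming algorithm.

```python
import re
from collections import Counter

# Hypothetical stop-word list; what counts as "very common" is itself a decision.
STOPWORDS = {"the", "a", "is", "are", "on", "and"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(word):
    """Toy stemmer: strip a trailing 'ing' or 's' from longer words."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def word_counts(text, drop_stopwords=False, stem=False, ngram=1):
    """Count words (ngram=1) or word groups (ngram=2 for pairs) under the chosen options."""
    tokens = tokenize(text)
    if drop_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if stem:
        tokens = [crude_stem(t) for t in tokens]
    grams = [" ".join(tokens[i:i + ngram]) for i in range(len(tokens) - ngram + 1)]
    return Counter(grams)


text = "The boat is boating and the boats are on the water"
print(word_counts(text))
print(word_counts(text, drop_stopwords=True, stem=True))
print(word_counts(text, ngram=2))
```

Each call turns the same sentence into a different set of numbers. None of them is wrong, but whichever one you pick is what the analysis will see.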
By comparison, hard numbers don’t suffer that information loss because there’s nothing to translate. The difference between two units sold versus ten? That’s eight units. Your vendor raising prices from five hundred dollars to two thousand? That’s a four-fold increase. We can then explore reasons why that increase happened, and what we’re going to do about it, but that fifteen-hundred-dollar difference is a cold, hard fact.
Your greatest risk when it comes to hard numbers is whether they were recorded properly. Your data scientists will still have to determine which numbers are relevant to an analysis, and how to frame them – processes known as feature selection and feature engineering, respectively – but so long as there were no errors in capturing that data, your team will always start from a firm foundation of facts.
Numbers-out: Reviewing the results
The numbers you get out of an analysis are just that: cold, raw numbers. It’s up to you to decide what those numbers mean as far as business decisions. A BI summary may reveal that 65% of your stores are under-performing. An AI model may return a simple 0.782 in response to the question “should we purchase this stock at this price?” The analyses can’t tell you how to translate those numbers back into business decisions. That part is up to you. Interpretation and human judgement are always in play.
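As a small illustration of that last step, here is a Python sketch that turns the 0.782 from the example above into an action. The `decide` function, the cutoff values, and the "buy"/"hold" labels are all hypothetical; the point is only that someone has to choose the cutoff.

```python
def decide(score, buy_threshold):
    """Translate a model score into an action using a human-chosen cutoff."""
    return "buy" if score >= buy_threshold else "hold"


score = 0.782  # the model output from the example above

# Hypothetical cutoffs reflecting different appetites for risk.
for threshold in (0.5, 0.7, 0.9):
    print(f"cutoff {threshold}: {decide(score, threshold)}")
```

The model's output never changes here. A cautious team with a 0.9 cutoff holds, while a team with a 0.7 cutoff buys, and that choice belongs to the people, not the model.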
You should scrutinize any results you get back from an analysis. But you should doubly scrutinize them when you used soft numbers as inputs. You have to ask yourself: “how much information did we lose in the translation from business ideas to numbers on the way in? And how will that reflect in our translation from numbers back to business concepts on the way out?”
(If you’re reading closely, this also means that you can’t hide behind “the data told me to” or “the model made me do it.” As the human in a leadership role, you are ultimately responsible for the actions your business takes as a result of what numbers a model returns to you.)
Opening the door
Turning your business into numbers may seem like a chore, but it’s the only way to reap the benefits of data analysis. As a bonus, quantifying your business also opens the door to borrowing concepts from existing, well-established quantitative fields.
Learning how the trading and insurance domains think about data can guide you on how to approach your own analyses. Furthermore, with enough of a trained eye, you’ll learn to spot analogs to valuation, risk, and portfolio theory in your industry and leverage those techniques accordingly.