Towards Quantification: Finding Hard and Soft Numbers In Your Business
2021-06-28 | tags: data literacy AI

In search of ML/AI success? Know your hard and your soft numbers.

There's no guaranteed success in AI. Still, if you've been following this website long enough, you already know three steps you can take to improve your chances:

There's another important step here, and it's easy to miss because it's subtle:

This is something I call quantification.

Quantification is important because everything in data -- from summing and sorting in BI, to predicting customer churn with AI -- is based on applying analytical techniques to numbers.

Data is a numbers-in, numbers-out affair: you feed numbers as inputs into a spreadsheet, a dashboard, or the training process for an AI model. You then get numbers as outputs from those analyses.

We can express this flow as:

business concepts → numbers → analysis → numbers → business decisions

The arrow following "business concepts," and the one leading into "business decisions," involve translation and interpretation. And that's where this gets interesting:

Numbers-in: Inputs to analyses

Pretty much everything in your business is, or can be expressed as, a number. Not all numbers are created equal, though. It helps to split them into "hard" and "soft":

Hard numbers are concepts that are both numeric in nature and factual. The number of vehicles you sold last year? The day-to-day revenue tallies from a particular store? The stock price and number of shares you purchased yesterday? Those are all hard numbers.

Soft numbers are ... everything else. Your employee satisfaction scores from that HR survey? The text in a document? The social impact of your latest charity fundraiser? I call these soft numbers, because they start off as something else and you have to translate them into numeric form. That makes them a little squishy, as we'll soon see.

Why knowing "hard" and "soft" numbers is important

Data analysis techniques only see numbers as numbers; they neither know nor care whether their inputs are hard or soft. So why should we, as humans, care about this?

We should care because soft numbers are subjective. Every decision you make about how to translate a concept into numbers will influence those numbers. You also risk losing information along the way. All of this will influence the analysis.

Going back to the example of that HR employee satisfaction survey, let's say that respondents rate issues from 1 (very negative) to 5 (very positive). What's the difference between 2 and 3? By pure numeric analysis, that difference is 1. But does that really mean that someone who marks 3 is 20% more satisfied than the person who marks 2? And how do you know to use a scale from 1-5 instead of 1-10, or even 1-100?
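To make that squishiness concrete, here's a minimal sketch in Python. Everything in it is made up for illustration: the two teams, their answers, and both encodings. Each encoding keeps the answers in the same order from worst to best; they only disagree on how far apart the steps are.

```python
from statistics import mean

LABELS = ["very negative", "negative", "neutral", "positive", "very positive"]

# Two hypothetical ways to turn the same labels into numbers.
# Both preserve the ordering; only the spacing between steps differs.
EVEN_STEPS = dict(zip(LABELS, [1, 2, 3, 4, 5]))
UNEVEN_STEPS = dict(zip(LABELS, [1, 2, 6, 7, 8]))


def average_score(responses, encoding):
    """Translate each label to a number, then average the numbers."""
    return mean(encoding[label] for label in responses)


# Hypothetical survey responses from two teams.
team_a = ["neutral", "neutral", "neutral", "neutral"]
team_b = ["very negative", "negative", "very positive", "very positive"]

for name, encoding in [("even steps", EVEN_STEPS), ("uneven steps", UNEVEN_STEPS)]:
    a = average_score(team_a, encoding)
    b = average_score(team_b, encoding)
    happier = "team A" if a > b else "team B"
    print(f"{name}: A={a}, B={b}, looks happier: {happier}")
```

With evenly spaced steps, team B averages 3.25 against team A's 3. With the uneven spacing, team A averages 6 against team B's 4.75. Same answers, opposite conclusion, and the only thing that changed was a translation decision.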

You face similar challenges working with text data. Many natural language processing (NLP) techniques decompose a document into a series of word counts, which seems a simple enough way to turn text into numbers. But this still involves a series of decisions around how to break up the text: do you eliminate very common words? If so, how do you define "very common?" Do you count individual words, or do you group them into pairs in order to capture the information associated with word order? And should you boil a word down to its root, or treat "boat," "boats," and "boating" as distinct terms for the word count? The "right" answers here depend on a number of issues, including your choice of analysis technique.
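Here's a rough sketch of how those choices play out, using only Python's standard library. The stop-word list, the sample sentence, and the `crude_stem` helper are all assumptions for illustration. In particular, `crude_stem` is a toy suffix-stripping rule, not a real stemming algorithm; an NLP library would do this far more carefully.

```python
import re
from collections import Counter

# Hypothetical stop-word list; real lists are longer and vary by tool.
STOP_WORDS = {"the", "a", "and", "is", "on", "to"}


def crude_stem(word):
    """Toy rule for illustration only: strip a trailing 'ing' or 's'."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def count_terms(text, drop_stop_words=False, stem=False, pair_words=False):
    """Turn text into counts; each flag is one of the translation decisions."""
    words = re.findall(r"[a-z]+", text.lower())
    if drop_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    if stem:
        words = [crude_stem(w) for w in words]
    if pair_words:
        # Count adjacent pairs to keep some word-order information.
        words = [" ".join(pair) for pair in zip(words, words[1:])]
    return Counter(words)


text = "The boat is on the lake. Boats and boating are fun."

print(count_terms(text))
print(count_terms(text, drop_stop_words=True, stem=True))
print(count_terms(text, pair_words=True))
```

Counting raw words, "boat" shows up once. Drop the stop words and apply the toy stemming rule, and "boat" shows up three times. The document didn't change; the decisions did.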

By comparison, hard numbers don't suffer that information loss because there's nothing to translate. The difference between two units sold versus ten? That's eight units. Your vendor raising prices from five-hundred dollars to two thousand? That's a four-fold increase. We can then explore reasons why that increase happened, and what we're going to do about it, but that fifteen-hundred dollar difference is a cold, hard fact.

Your greatest risk with hard numbers is that they weren't recorded properly. Your data scientists will still have to determine which numbers are relevant to an analysis, and how to frame them -- processes known as feature selection and feature engineering, respectively -- but so long as there were no errors in capturing that data, your team will always start from a firm foundation of facts.

Numbers-out: Reviewing the results

The numbers you get out of an analysis are just that: cold, raw numbers. It's up to you to decide what those numbers mean for your business decisions. A BI summary may reveal that 65% of your stores are under-performing. An AI model may return a simple 0.782 in response to the question "should we purchase this stock at this price?" The analyses can't tell you how to translate those numbers back into business decisions. That part is up to you. Interpretation and human judgement are always in order.
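As a small illustration, take that 0.782. The sketch below assumes the score gets compared against a cutoff that someone in the business picked. The `decide` function and the three cutoffs are hypothetical; the model supplies none of them.

```python
def decide(score, threshold):
    """Translate a model score into an action using a human-chosen cutoff."""
    return "buy" if score >= threshold else "hold"


score = 0.782  # the model's output from the example above

# Hypothetical cutoffs, reflecting different appetites for risk.
for threshold in (0.5, 0.75, 0.9):
    print(f"threshold {threshold}: {decide(score, threshold)}")
```

The same score leads to "buy" under the first two cutoffs and "hold" under the third. Choosing the cutoff is a business decision, not something the analysis hands you.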

You should scrutinize any results you get back from an analysis. But you should doubly scrutinize them when you used soft numbers as inputs. You have to ask yourself: "how much information did we lose in the translation from business ideas to numbers on the way in? And how will that reflect in our translation from numbers back to business concepts on the way out?"

(If you're reading closely, this also means that you can't hide behind "the data told me to" or "the model made me do it." As the human in a leadership role, you are ultimately responsible for the actions your business takes as a result of what numbers a model returns to you.)

Opening the door

Turning your business into numbers may seem like a chore, but it's the only way to reap the benefits of data analysis. As a bonus, quantifying your business also opens the door to borrowing concepts from existing, well-established quantitative fields.

Learning how the trading and insurance domains think about data can guide you on how to approach your own analyses. Furthermore, with enough of a trained eye, you'll learn to spot analogs to valuation, risk, and portfolio theory in your industry and leverage those techniques accordingly.
