Periods, Question Marks, and now Ellipses: The Punctuation Marks of Data Analysis

2021-10-04 | tags: data literacy AI

BI is periods. AI is question marks. Simulation is ellipses.

In a recent post, "Question Marks and Periods in the World of Data," I explored periods and question marks as useful punctuation in the data world:

BI looks into everything that has happened up till now. "We sold 10,000 widgets last month." "Our NYC retail stores show the highest traffic." So long as the data is clean and correct, any information coming out of BI is a fact. Hence, BI uses periods.
By comparison, AI (which I'll use as an umbrella term that includes data science and machine learning) looks into the future. We don't know what will happen, so we express AI using question marks. "How many widgets will we sell over the next 18 months?" "In which month will this customer churn?" "What will be this customer's lifetime value?"

I'll now continue that train of thought by adding simulation to the mix. Simulation answers the question of "What else?" so we'll express it using ellipses.

Moving beyond point estimates

An AI model (or similar data analysis) often answers your questions with a single number known as a point estimate. The model has determined that the true answer is probably in the neighborhood of the point estimate, say, 7.5. But how big is that neighborhood? Is the model saying, "I think the true answer is 7.5, plus or minus 0.0002"? Or is it more along the lines of, "I think the true answer is somewhere between 0 and 15"?

(You may have heard your company's data scientists using terms such as "confidence interval," "region of practical equivalence" (ROPE), "credible interval" (CI), or "highest posterior density" (HPD). While these mean different things, they're all ways to describe the "neighborhood" or range around the point estimate.)

I think we can agree that those two readings of 7.5 are quite different. If you're making a business decision based on the model's answer, you'd prefer the smaller range (the model is very sure that 7.5 is close to the true answer) to the larger one (the model shrugs and says "7.5? Maybe? I dunno").

One way to size up that range around a point estimate is to run a simulation: vary the inputs or parameter values, run the model, and see what shakes out. Repeat this a hundred thousand (or a few million) times and you'll see how widely the results spread around the initial point estimate. Maybe you'll get a very narrow range, in which case you can have more faith in that value. And if you get a wide range, then you'll know to take the model's point estimate with a grain of salt.
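For the data scientists in the room, here is a minimal sketch of that loop in Python. Everything in it is an assumption made for illustration: toy_model is a made-up stand-in for a real model, and the input values and their spreads are invented so that the central answer lands near the 7.5 used above.

```python
import random
import statistics

def toy_model(price, demand):
    """Hypothetical stand-in for a real model; returns revenue in $K."""
    return price * demand / 1000

def simulate(n_runs=10_000, seed=42):
    """Run the model many times, jittering each input around its best guess."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        # Assumed uncertainty in the inputs (invented numbers)
        price = rng.gauss(15.0, 1.0)
        demand = rng.gauss(500.0, 50.0)
        results.append(toy_model(price, demand))
    return results

def summarize(results, coverage=0.95):
    """Return the mean plus the range that holds `coverage` of the runs."""
    ordered = sorted(results)
    tail = (1 - coverage) / 2
    low = ordered[int(tail * len(ordered))]
    high = ordered[int((1 - tail) * len(ordered)) - 1]
    return statistics.mean(ordered), low, high

center, low, high = summarize(simulate())
print(f"center: {center:.2f}, 95% of runs fall between {low:.2f} and {high:.2f}")
```

The gap between the low and high values is the "neighborhood." Shrink the assumed spreads on the inputs and the range tightens; widen them and the same central value deserves less trust.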

Finding the weird interactions and extreme values

Simulation works for more than just predictive ML models. It applies to pretty much any mathematical model or process in which you can vary the inputs or parameters.

Let's say you've developed a model that predicts revenue shortfalls by combining a variety of inputs: the number of widgets sold last year, the average monthly high temperature, the amount of rainfall, and the previous quarter's interest rates. You could run a simulation that varies each of those inputs to see what numbers the model returns.

By tracking the input values, you could then catch combinations that lead to odd or extreme cases -- "when the weather's cold and interest rates drop below a certain number, we see excess losses" -- and plan accordingly. This is especially helpful in identifying complex scenarios, when the inputs interact in unexpected ways.
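As a rough sketch of what "tracking the input values" can look like, the Python below runs an invented shortfall model over randomly drawn inputs, keeps the inputs alongside each result, and then inspects the worst 1% of runs. The model formula, the input ranges, and the cold-weather-plus-low-rates penalty are all hypothetical, planted here so there is an interaction to find.

```python
import random

def shortfall_model(widgets, temp_f, rain_in, rate_pct):
    """Hypothetical shortfall model (in $K); the formula is invented."""
    shortfall = 200 - 0.015 * widgets + 2.0 * rain_in
    # Invented interaction: cold weather combined with low rates adds losses
    if temp_f < 40 and rate_pct < 1.5:
        shortfall += 150
    return shortfall

def run_scenarios(n_runs=20_000, seed=7):
    """Draw random inputs and keep each input set next to its result."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        inputs = {
            "widgets": rng.uniform(8_000, 12_000),
            "temp_f": rng.uniform(20, 95),
            "rain_in": rng.uniform(0, 8),
            "rate_pct": rng.uniform(0.5, 5.0),
        }
        runs.append((inputs, shortfall_model(**inputs)))
    return runs

runs = run_scenarios()
runs.sort(key=lambda run: run[1], reverse=True)
worst = runs[: len(runs) // 100]  # the worst 1% of outcomes

for name in ("widgets", "temp_f", "rain_in", "rate_pct"):
    values = [inputs[name] for inputs, _ in worst]
    print(f"{name}: worst cases span {min(values):.1f} to {max(values):.1f}")
```

In the printout, widgets and rainfall span most of their full ranges, while temperature and interest rate are pinned to a narrow corner. That contrast is the signal that an interaction between those two inputs drives the extreme losses.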

This also works for risk assessments

In previous posts I've explained that risk assessments begin by asking the question, "What if?" Simulations are one way to explore those scenarios.

"What if our two main suppliers suddenly close down?" Your data scientists, having mapped out your business processes, can show you the impact of those suppliers' inputs dropping to zero. Maybe the answer is "this won't affect us for several months." Maybe it's "we'd shut down within the week." Wouldn't you like to know sooner rather than later?

Mapping out these various "What if?" scenarios gives you the opportunity to act before problems occur, which reduces the negative impact on your business operations.
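The supplier question can be sketched the same way. The Python below is a toy, not a real supply-chain model: the supplier names, delivery volumes, starting inventory, and daily usage are all made-up assumptions. It counts how many days the business keeps running when certain suppliers deliver nothing.

```python
import random

# Hypothetical daily deliveries per supplier, in units
SUPPLIERS = {"supplier_a": 60, "supplier_b": 30, "supplier_c": 10}

def days_until_stockout(outage, rng, inventory=900, mean_use=100, horizon=365):
    """Days of operation before inventory runs out (capped at `horizon`)."""
    for day in range(1, horizon + 1):
        delivered = sum(qty for name, qty in SUPPLIERS.items()
                        if name not in outage)
        used = rng.gauss(mean_use, 10)  # assumed day-to-day variation in usage
        inventory += delivered - used
        if inventory < 0:
            return day
    return horizon

def what_if(outage, n_runs=1_000, seed=1):
    """Return (worst case, median) days of operation across many runs."""
    rng = random.Random(seed)
    days = sorted(days_until_stockout(outage, rng) for _ in range(n_runs))
    return days[0], days[len(days) // 2]

baseline = what_if(set())
two_down = what_if({"supplier_a", "supplier_b"})
print("no outage (worst, median days):", baseline)
print("A and B close (worst, median days):", two_down)
```

Under these invented numbers, losing the two main suppliers cuts the runway from the full year to a couple of weeks. Swapping in your real inventory and delivery figures is what turns the toy into an answer.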

Hence, ellipses

All of this brings us back to punctuation:

BI is "Here's what happened."
AI is "What will happen?"
Simulation is "What else may happen?"

You can think of simulations as a way to create temporary, alternate universes in which to test ideas. Running a simulation gives you a sense of the possible outcomes beyond a single point estimate or a single set of input values.

By simulating a wide variety of scenarios, you can identify new, unexpected situations and prepare accordingly.
