BI is periods. AI is question marks. Simulation is ellipses.
In a recent post, "Question Marks and Periods in the World of Data," I explored periods and question marks as useful punctuation in the data world:
BI looks into everything that has happened up till now. "We sold 10,000 widgets last month." "Our NYC retail stores show the highest traffic." So long as the data is clean and correct, any information coming out of BI is a fact. Hence, BI uses periods.
By comparison, AI (which I'll use as an umbrella term that includes data science and machine learning) looks into the future. We don't know what will happen, so we express AI using question marks. "How many widgets will we sell over the next 18 months?" "In which month will this customer churn?" "What will be this customer's lifetime value?"
I'll now continue that train of thought by adding simulation to the mix. Simulation answers the question of "What else?" so we'll express it using ellipses.
An AI model (or similar data analysis) answers your questions with a single number known as a point estimate. The model has determined that the true answer is probably in the neighborhood of the point estimate, say, 7.5. But how big is that neighborhood? Is the model saying, "I think the true answer is 7.5, plus or minus 0.0002?" Or is it more along the lines of, "I think the true answer is somewhere between 0 and 15?"
(You may have heard your company's data scientists using terms such as "confidence interval," "region of practical equivalence" (ROPE), "credible interval" (CI), or "highest probability density" (HPD). While these mean different things, they're all ways to describe the "neighborhood" or range around the point estimate.)
I think we can agree that those two definitions of 7.5 are quite different. If you're making a business decision based on the model's answer, you'd prefer the smaller range (the model is very sure that 7.5 is close to the true answer) to the larger one (the model shrugs and says "7.5? Maybe? I dunno").
One way to size up that range around a point estimate is to run a simulation: vary the inputs or parameter values, run the model, and see what shakes out. Repeat this a hundred thousand (or a few million) times and you'll see how the range around the initial point estimate changes. Maybe you'll get a very narrow range, in which case you can have more faith in that value. And if you get a wide range, then you'll know to take the model's point estimate with a grain of salt.
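To make that concrete, here's a minimal sketch of the idea in Python. The `toy_model` function and the input distributions are invented for illustration; they stand in for whatever model and inputs you actually have. With the assumed input values, the model's single answer happens to be 7.5, echoing the example above, and the simulation shows how wide the neighborhood around that number really is.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def toy_model(price, demand_k, unit_cost):
    """Stand-in for whatever model produced the point estimate."""
    return demand_k * (price - unit_cost)

# The single "best guess" answer, using one set of input values.
point_estimate = toy_model(10.0, 1.5, 5.0)   # -> 7.5

# Now vary the inputs around those values and rerun the model many times.
n_runs = 100_000
prices    = rng.normal(10.0, 0.5, n_runs)   # assumed spread for each input
demands_k = rng.normal(1.5, 0.3, n_runs)
costs     = rng.normal(5.0, 0.2, n_runs)
outputs   = toy_model(prices, demands_k, costs)

# The spread of the outputs is the "neighborhood" around the point estimate.
low, high = np.percentile(outputs, [2.5, 97.5])
print(f"point estimate: {point_estimate}")
print(f"95% of simulated runs fall between {low:.1f} and {high:.1f}")
```

If that printed range comes out narrow, the 7.5 deserves more trust; if it spans something like 0 to 15, treat it as a rough guess.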
Simulation works for more than just predictive ML models. It applies to pretty much any mathematical model or process in which you can vary the inputs or parameters.
Let's say you've developed a model that predicts revenue shortfalls by combining a variety of inputs: the number of widgets sold last year, the average monthly high temperature, the amount of rainfall, and the previous quarter's interest rates. You could run a simulation that varies each of those inputs to see what numbers the model returns.
By tracking the input values, you could then catch combinations that lead to odd or extreme cases -- "when the weather's cold and interest rates drop below a certain number, we see excess losses" -- and plan accordingly. This is especially helpful in identifying complex scenarios, when the inputs interact in unexpected ways.
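Here's a hedged sketch of how that revenue-shortfall simulation might look. The model, its coefficients, and the input distributions below are all made up for illustration; the pattern is what matters: vary the inputs, record each run, and filter for the extreme cases.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=7)
n_runs = 100_000

# Hypothetical inputs; the distributions are assumptions for illustration only.
widgets_last_year = rng.normal(120_000, 15_000, n_runs)
avg_high_temp_f   = rng.normal(68, 12, n_runs)
rainfall_inches   = rng.gamma(2.0, 1.5, n_runs)
interest_rate_pct = rng.normal(4.5, 1.0, n_runs)

def revenue_shortfall(widgets, temp, rain, rate):
    """A made-up stand-in for the real model; the coefficients are not real."""
    return (50_000 - 0.2 * widgets
            + 300 * np.maximum(60 - temp, 0)     # cold weather hurts
            + 1_000 * rain
            + 4_000 * np.maximum(3.0 - rate, 0)) # low rates hurt

runs = pd.DataFrame({
    "widgets": widgets_last_year,
    "temp": avg_high_temp_f,
    "rain": rainfall_inches,
    "rate": interest_rate_pct,
})
runs["shortfall"] = revenue_shortfall(runs.widgets, runs.temp, runs.rain, runs.rate)

# Pull out the worst 1% of runs and see which input combinations got us there.
worst = runs[runs.shortfall > runs.shortfall.quantile(0.99)]
print(worst[["temp", "rate"]].describe())  # e.g. cold weather plus low rates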
In previous posts I've explained that risk assessments begin by asking the question, "What if?" Simulations are one way to explore those scenarios.
"What if our two main suppliers suddenly close down?" Your data scientists, having mapped out your business processes, can show you the impact of those suppliers' inputs dropping to zero. Maybe the answer is "this won't affect us for several months." Maybe it's "we'd shut down within the week." Wouldn't you like to know sooner rather than later?
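As a toy illustration of that supplier scenario (every number here is invented), a few lines are enough to see how quickly a parts buffer runs out once two suppliers' inputs drop to zero:

```python
# A minimal "what if?" sketch: hypothetical suppliers feed a parts buffer.
daily_usage = 500            # parts consumed per day
buffer_on_hand = 12_000      # parts currently in inventory
supplier_daily = {"A": 300, "B": 200, "C": 50}

# Scenario: suppliers A and B suddenly shut down.
shut_down = {"A", "B"}
incoming = sum(qty for name, qty in supplier_daily.items() if name not in shut_down)

days = 0
while buffer_on_hand > 0:
    buffer_on_hand += incoming - daily_usage
    days += 1

print(f"Production halts after roughly {days} days.")
```

A real simulation would rerun this across many demand and lead-time scenarios rather than a single fixed daily usage, but the shape of the exercise is the same.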
Mapping out these various "What if?" scenarios gives you the opportunity to act before problems occur, which reduces the negative impact to your business operations.
All of this brings us back to punctuation:
You can think of simulations as a way to create temporary, alternate universes in which to test ideas. Running a simulation gives you a sense of the range of possible outcomes, above and beyond the single point estimate you'd get from one set of input values.
By simulating a wide variety of scenarios, you can identify new, unexpected situations and prepare accordingly.