Humans versus machines? To reduce your risk, the best answer is “yes.”
I think of an ML/AI model as a piece of factory equipment that produces decisions. AI-based automation, then, is a matter of outsourcing decisions to models. Whether the topic is document classification or autonomous vehicles, the appeal is the same: machines make decisions faster than people, operate tirelessly, and make fewer mistakes along the way.
“Faster” and “tirelessly,” yes. “Fewer mistakes” is not so cut-and-dried. Metrics matter. When choosing between a human and a machine to make a decision, relying solely on accuracy – the number of correct decisions out of the total – is a dangerous oversimplification.
Even if you could tune a model to make fewer mistakes than a human (which is itself a challenge), you still need to compare where humans and machines make their mistakes. You then want to understand how the two groups handle unfamiliar problems.
All of this leads to a different arrangement: instead of either/or, you want both humans and machines involved.
Despite the term “machine learning,” machines and humans don’t learn in the same way. A model represents generalizations across a (finite, point-in-time snapshot of a human-curated) training dataset, whereas a person learns over years of formal training, informal first-hand experience, and observing the experiences of others. These different styles of learning lead to different paths in decision-making, which ultimately lead to different types of mistakes.
How the two groups handle unfamiliar situations also plays a role. A person can adapt, building on secondhand knowledge and drawing parallels to similar situations. Most importantly, a person can throw their hands into the air and declare “I don’t know!” as they look for help.
An AI model, by comparison, has a very narrow world view as defined by its training data. It’s not self-aware, which means it can’t tell when it is out of its depth. If the present-day world vastly differs from the model’s training data (whether from a slow drift over time or a sudden shock) it will continue to churn out incorrect answers. Worst of all, because the AI model moves at machine-speed, any mistakes it makes can quickly lead to widespread impact.
Treating AI-driven automation as a “human or machine” decision exposes you to each choice’s flavor of error and error-handling. Effective automation mitigates that risk by using both, leveraging AI to augment human decision-making instead of replacing it. Done well, human and machine can shine in their own way, and you get the best of both worlds.
Here are three ways to mix human and AI decision-making, to optimize your automation efforts:
The first approach is the pre-sort: you define rules to split the workload based on preset criteria. Maybe the model consistently makes mistakes on a certain kind of case, a sign that this subset requires more nuance. Or maybe certain cases carry a higher cost of being wrong, so you want a couple of people to put their heads together to explore them. You route those cases to a human and leave the rest to the machines.
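As a minimal sketch, a pre-sort is just a routing function. The case fields, rule criteria, and cost threshold below are all illustrative assumptions, not a prescribed design:

```python
# Hypothetical pre-sort router. The case types and the cost threshold
# are placeholders; in practice they come from your error analysis.
HIGH_COST_THRESHOLD = 10_000

def presort(case: dict) -> str:
    """Route a case to 'human' or 'machine' based on preset rules."""
    # Rule 1: case types the model is known to get wrong go to a human.
    if case["type"] in {"ambiguous_document", "new_product_line"}:
        return "human"
    # Rule 2: high-stakes cases go to a human regardless of type.
    if case["cost_of_error"] > HIGH_COST_THRESHOLD:
        return "human"
    # Everything else is left to the machine.
    return "machine"

def handle_dispute(case: dict) -> str:
    """Feedback channel: a disputed machine decision is reprioritized
    for human review."""
    case["disputed"] = True
    return "human"
```

For example, `presort({"type": "routine_invoice", "cost_of_error": 50})` returns `"machine"`, while bumping the cost of error past the threshold sends the same case to a human.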
Using a pre-sort requires that you define a feedback channel for the machine-based cases. If the affected customer disputes the machine’s decision (say, a cardholder is responding to a mistaken fraud alert), they must be able to quickly speak with a human to set things right. This direct-to-human feedback channel is the customer’s way of reprioritizing the case for human review.
There are real-world analogs of the pre-sort. Hotel chains and airlines provide kiosks so travelers can check in. If those travelers have questions or needs that the kiosks don't handle, they can go straight to the desk to speak to a human. The feedback channel comes in the form of another hotel or airline representative who watches the kiosks and steps in when a customer appears to be having difficulty.
The second approach is the machine-sort: you let the machine take the first pass on all cases. If the model's confidence score falls below some predetermined threshold (for a classification), or its uncertainty interval is deemed too wide (for a regression), you pass the case to a human for review.
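A machine-sort can be sketched as a thin wrapper around the model's output. The threshold values below are assumptions for illustration; in practice they are tuned to your model and your tolerance for error:

```python
# Illustrative thresholds -- these would be tuned per model release.
CONFIDENCE_THRESHOLD = 0.85   # classification: minimum acceptable confidence
MAX_INTERVAL_WIDTH = 5.0      # regression: widest acceptable uncertainty band

def machine_sort(prediction, confidence):
    """Accept a classification only when the model is confident enough;
    otherwise escalate the case to a human reviewer."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return ("machine", prediction)
    return ("human", None)  # a human reviews the case from scratch

def machine_sort_regression(estimate, lower, upper):
    """Accept a regression estimate only when its uncertainty interval
    is narrow enough; otherwise escalate to a human."""
    if (upper - lower) <= MAX_INTERVAL_WIDTH:
        return ("machine", estimate)
    return ("human", None)
```

The design choice worth noting: the human reviewer gets the case itself, not just the model's low-confidence guess, so the review isn't anchored by the machine's answer.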
The machine-sort approach requires additional monitoring and tuning to keep an eye on the thresholds used as decision boundaries. This goes double after you release an updated model, since new training data or new parameters may cause those thresholds to shift.
For a real-world parallel, consider the use of credit scores in lending. (Credit scores are hardly perfect, since they lose a lot of nuance by compressing a person’s credit history into three digits … but that’s another story.) A lender can reject some number of loan applications outright based on the applicants' credit score. Note that a higher credit score doesn’t guarantee the loan will be approved, just that a human will give the application a deeper look.
The third approach is the spot-check: the AI model sees all cases, but a human intervenes on some number of randomly chosen machine decisions. This is a way to keep an eye on the model's performance in real time. If the human and machine disagree on enough cases, it's time to retrain the model.
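A minimal sketch of a spot-check, assuming a fixed sampling rate (the 5% figure is an arbitrary placeholder) and stand-in functions for the model's and the human's decisions:

```python
import random

SAMPLE_RATE = 0.05  # review roughly 5% of machine decisions; illustrative

def spot_check(cases, model_decide, human_decide, rng=None):
    """Let the model decide every case, but have a human independently
    re-decide a random sample. Returns the disagreement rate on that
    sample -- a rising rate is the signal to retrain."""
    rng = rng or random.Random()
    sampled = disagreements = 0
    for case in cases:
        machine_answer = model_decide(case)
        if rng.random() < SAMPLE_RATE:
            sampled += 1
            if human_decide(case) != machine_answer:
                disagreements += 1
    return disagreements / sampled if sampled else 0.0
```

The disagreement rate this returns is only an estimate, so you'd compare it against an alert threshold over a reasonably large sample before triggering a retrain.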
The spot-check works best when your model performs well and the cost of a wrong machine decision is low.
Real-world equivalents of the spot-check include “secret shopper” programs, and supervisors listening in on some customer service calls.
In automation, a “human plus AI” approach provides improved coverage and reduced exposure to risk compared to the AI-only alternative. Arrangements such as pre-sort, machine-sort, and spot-check promote humans to “manager of machines” or “priority agent” so that you can get the best of both machine and human decision-making under one roof.