The top failure modes of an ML/AI modeling project (Part 2)

Posted by Q McCallum on 2023-01-19
(Image: tiles spelling out “Mend the roof before it rains.” Photo by Brett Jordan on Unsplash)

Someone once told me that risk management is a matter of asking “What are you worried about? And what are you going to do about it?”

My previous post covered the “What are you worried about?” question by describing several ways an ML/AI modeling project can go off the rails:

  1. Not enough training data.
  2. Lack of “signal” in the training data.
  3. Lack of budget (time, money, or anything else).
  4. The world changes.
  5. Freak correlations.
  6. A grab-bag of non-technical issues, such as PR fallout.

This time around, I’ll share ideas for “What are you going to do about it?” at each phase of a modeling effort. They’ll help you steer clear of trouble and soften the blow when other problems crop up.

(I emphasize “when,” not “if.” The only way to completely avoid model trouble is to not develop a model at all. No risk, no reward.)

The early stages

Develop a plan. Take the time to sort out which metrics are important to you, what minimum values of those metrics you will accept, and whether this problem is even amenable to an ML model. How does all of that compare to the incumbent solution? Is there an alternate approach that isn’t based on ML?

This is also the time to evaluate the model’s impacts. What happens to your business when the model is incorrect? What’s your backup plan if you can’t get the model to work at all? What are the ethical ramifications of using a model in this case?

(Consider the time Twitter built an ML model to automatically crop images in the timeline view. Simply scaling the images would have involved far less effort and exposed the company to zero reputational risk.)

This conversation should not be limited to the company’s data scientists and ML engineers; this is the time for stakeholders, product owners, and other technical staff to all weigh in.

Work with an experienced data scientist or ML engineer. And bring them in early. If you have an in-house data team that you trust, you’re all set. Otherwise, it’s time to look for outside help.

Do yourself a favor and retain the services of an experienced professional. This is someone who can do more than just build the model. They can – and should – review your plans, prioritize modeling techniques to try, and point out potential pitfalls. They can also inspect your training data for problems.

Model R&D

Find a steady, reliable source of training data. A common question I get is “how much data will we need?” The truth is, you won’t know till you know. So I answer with a rough estimate, followed by “… and we need to make sure we can get more data, if need be.”

There are two main reasons you’d need to go back to the proverbial well. The first is when the early training work hints that more data would improve performance. The second is when you need to refresh the model later on. Having a reliable source for additional data will keep your model R&D efforts moving, which will keep your time-to-market (ergo, time-to-benefit) to a minimum.
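One quick way to tell whether more data is likely to help: plot a learning curve. Train on increasing slices of the data you already have and see whether the validation score is still climbing. Here’s a rough sketch using scikit-learn; the estimator, the accuracy metric, and the improvement threshold are placeholders for illustration, not a recommendation.

```python
# Sketch: use a learning curve to judge whether more training data is likely to help.
# The estimator, metric, and data (X, y) are placeholders for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

def more_data_might_help(X, y, threshold=0.005):
    """True if the validation score was still improving at the largest training size."""
    _, _, val_scores = learning_curve(
        RandomForestClassifier(random_state=0),
        X,
        y,
        train_sizes=np.linspace(0.2, 1.0, 5),  # 20% ... 100% of the data on hand
        cv=5,
        scoring="accuracy",
    )
    mean_val = val_scores.mean(axis=1)
    # If the score gained more than `threshold` between the last two points,
    # the curve hasn't flattened yet: more data may still pay off.
    return (mean_val[-1] - mean_val[-2]) > threshold
```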

Thoroughly review your training dataset for technical and non-technical issues.

Technical issues include missing fields or signs of inconsistent data collection. Non-technical issues are typically regulatory or ethical in nature, such as avoiding certain features when building consumer lending services. When your legal team asks, “Where did you get that training data?” you need to have an answer.
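For the technical side of that review, a handful of quick checks will surface the most common problems: missing fields, gaps or spikes in data collection, and duplicate records. Here’s a minimal sketch in pandas; the file name and column names are hypothetical.

```python
# Sketch: a first-pass technical review of a training dataset with pandas.
# The file name and column names are hypothetical.
import pandas as pd

df = pd.read_csv("training_data.csv", parse_dates=["collected_at"])

# 1. Missing fields: which columns have gaps, and how bad are they?
missing = df.isna().mean().sort_values(ascending=False)
print(missing[missing > 0])

# 2. Inconsistent collection: did record volume change abruptly over time?
print(df.set_index("collected_at").resample("W").size())

# 3. Duplicate rows, which often point to a collection or join bug.
print(f"duplicate rows: {df.duplicated().sum()}")
```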

Structure your model R&D. Remember, during the planning stages, when you sorted out what metrics were important and what values were acceptable? All of that will help you decide the maximum amount of time and money to put into building this model.

Why so? Well, building a model involves trying a variety of techniques and parameters in search of the best performance. If you don’t define your own stopping criteria, the R&D can go on forever. And the business world tends to frown on “unbounded costs.”

To add more structure to the model R&D process, I’ve borrowed the idea of “sprints” from agile software development:

  • At the start of the project, I work with the client to pick some block of time (2-3 weeks, usually) in which I will try building and tuning a model.
  • During the sprint, I try certain techniques and note the model’s performance thus far.
  • At the end of the sprint, I review these results with the client. If we think we have a chance to improve performance, we give it a thumbs-up and try another sprint. If it’s good enough, or we think it’s just not going to go anywhere, we give it a thumbs-down and pull the plug.

This approach means that we get the best of both worlds: cost controls are baked in, and we get to experiment. Having a periodic stopping point also gives us a “pencils down” moment in which to reflect on how the model’s doing while we’re not wrapped up in the process of building it.

After the model goes live

Monitor your model. Your model can and will make mistakes. It’s important to keep an eye on it as it runs in production. You may spot “model drift”: performance degrading because the outside world has changed over time. That’s your indication that it’s time to pull more data and retrain the model.
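What does that monitoring look like in practice? One lightweight approach is to compare the distribution of a model input in production against what the model saw at training time. Below is a sketch of a population stability index (PSI) check; the bin count and the 0.25 alert threshold are common rules of thumb, not hard requirements.

```python
# Sketch: a population stability index (PSI) check for a single numeric feature.
# `baseline` is the feature as the model saw it at training time;
# `current` is the same feature from recent production traffic.
import numpy as np

def psi(baseline, current, bins=10):
    """Higher PSI means production data has drifted further from the training data."""
    edges = np.quantile(baseline, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch values outside the training range
    base_pct = np.histogram(baseline, edges)[0] / len(baseline)
    curr_pct = np.histogram(current, edges)[0] / len(current)
    base_pct = np.clip(base_pct, 1e-6, None)  # avoid log(0) and divide-by-zero
    curr_pct = np.clip(curr_pct, 1e-6, None)
    return float(np.sum((curr_pct - base_pct) * np.log(curr_pct / base_pct)))

# Example with made-up data: a feature whose mean shifted after launch.
rng = np.random.default_rng(0)
training_feature = rng.normal(0.0, 1.0, 10_000)
production_feature = rng.normal(0.5, 1.0, 10_000)
if psi(training_feature, production_feature) > 0.25:  # a common rule-of-thumb threshold
    print("significant drift: time to pull fresh data and retrain")
```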

For bonus points, implement an instant-off switch. Sometimes the world changes too quickly and your model can’t handle it. You need a way to instantly disconnect the model from, say, pricing recommendations or automated purchases. (Some trading shops refer to their instant-off tool as a “dead man’s switch” or “big red button.”)
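A bare-bones version of that switch is just a flag checked on every request, with a non-ML fallback when the model is turned off. In the sketch below, the MODEL_ENABLED environment variable and the fallback_price() rule are hypothetical stand-ins; in practice the flag usually lives in a feature-flag service or config store so you can flip it without a deploy.

```python
# Sketch: an instant-off switch around model-driven pricing.
# MODEL_ENABLED is a hypothetical flag; model.predict() and fallback_price()
# stand in for your real scoring call and your non-ML backup rule.
import os

def fallback_price(item: dict) -> float:
    # Backup rule for when the model is switched off (e.g., the standing list price).
    return item["list_price"]

def price_for(item: dict, model) -> float:
    if os.environ.get("MODEL_ENABLED", "true").lower() != "true":
        return fallback_price(item)  # the big red button has been pressed
    try:
        return model.predict(item)
    except Exception:
        # If the model itself fails, degrade gracefully rather than block sales.
        return fallback_price(item)
```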

Now, what will you do about it?

To recap, here’s the list of “What are you worried about?” (the risk assessment items) paired with “What are you going to do about it?” (the associated risk mitigations).

  1. Not enough training data: have a source for more training data.
  2. Lack of “signal” in the training data: develop a plan (specifically: devise an alternate solution), review your dataset.
  3. Lack of budget (time, money, or anything else): develop a plan, structure model R&D.
  4. The world changes: develop a plan, have a source for more training data, review your dataset, monitor your model.
  5. Freak correlations: develop a plan, have a source for more training data, review your dataset, monitor your model.
  6. A grab-bag of non-technical issues, such as PR fallout: develop a plan.

(Note that “work with an experienced professional” applies to all of the above.)

What you see here is that most of the risk mitigation work takes place long before the model is released to production. This is your best chance to establish guard rails and padding to protect yourself from the inevitable problems.

The catch is that “develop a plan” can be a rather involved step because it requires input from multiple teams. Working through these matters will certainly slow you down. Resist the temptation to cut corners: the work you invest up-front will pay dividends in time, money, and effort.