Building that ML/AI model is not the first step; nor is it the last.
I often speak with consulting prospects who want to build a predictive model. Amid all the excitement of developing the model, it's easy for people to forget that there are several steps both before and after that.
I explain to these prospects that developing a model is not just research and development, or "R&D." The full process is what I call R&D&D: Research (sorting out what to build) and Development (creating the model) and Deploy (keeping it aloft in production).
Every ML/AI model should start with an intent or a goal, expressed as a question: "Can we predict Situation ABC?" "Can we classify these XYZ Objects?" These questions should tie back to your business model and, by extension, your product strategy.
That's just the first question. After that, you and your product team should explore:
Is this something an ML/AI model can reliably do? Some tasks are not (yet) reliably within reach of a model, or they require more transparency than certain models can provide. You're better off with human minds performing that work.
Do we need a custom model for this? Remember that your goal is not to build a model, but to get the answers that the model would provide. And you need those answers in a timely fashion. Take the time to compare the pricing, time-to-market, and risk of developing a model in-house to that of a vendor service or off-the-shelf tool.
What's our baseline? You're probably building this model as an alternative to (or a replacement for) some incumbent solution. This can be code, human labor, or even another model. The incumbent solution represents a baseline of performance against which to measure the new model: it should offer an improvement in terms of reduced dollar/time/effort costs and/or fewer mistakes. Otherwise, why bother? (For one way to frame that comparison, see the sketch after this list.)
Where do we get the training data? Do you already have it in-house? Will you need to update internal systems and then wait for the data to accumulate? Or will you have to go to the marketplace to acquire it?
How well does our training data match the real world? Are there gaps in the data? Of those, which stem from real-world corner cases as opposed to your blind spots? Consider the facial recognition system that didn't account for twins or the music copyright detection system that didn't account for classical music.
How can this go wrong? Any ML/AI model is guaranteed to produce some number of wrong answers. How many times can that happen before it impacts your revenue, or even your company's reputation?
What are our alternatives? Sometimes, despite your best efforts, you can't develop a model that performs as well as you'd like. What will you do if that is the case here?
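To make that baseline comparison concrete, here's a back-of-the-envelope sketch. Every number in it (the error rates, decision volume, and costs) is an illustrative assumption; substitute your own figures:

```python
# A rough, illustrative comparison of an incumbent solution and a candidate
# model. All figures are made-up assumptions for the sake of the sketch.
def annual_cost(error_rate, decisions_per_year, cost_per_mistake, operating_cost):
    """Total yearly cost: mistakes made, plus the cost of running the solution."""
    return error_rate * decisions_per_year * cost_per_mistake + operating_cost

baseline = annual_cost(error_rate=0.12, decisions_per_year=100_000,
                       cost_per_mistake=5.0, operating_cost=40_000)
candidate = annual_cost(error_rate=0.08, decisions_per_year=100_000,
                        cost_per_mistake=5.0, operating_cost=25_000)

print(f"Incumbent: ${baseline:,.0f}/year")
print(f"Candidate: ${candidate:,.0f}/year")
print(f"Savings that would justify the build: ${baseline - candidate:,.0f}/year")
```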
Once you've mapped out the higher-order, non-technical concerns of the model, you're ready to start building. This is a straightforward process:
Pull and prepare the training data: This may require that your data engineers develop new data pipelines.
Build and test the model: This is the typical train-test-evaluate cycle of building a model (see the sketch after this list). It's entirely possible that things fall apart here if the model doesn't perform well. You might need more or different data, or to try different tools and techniques. You might also, unfortunately, need to call it quits if you can't develop a sufficiently performant model in due time.
Review the final model's performance: After enough turns of the train-test-evaluate cycle, you'll (hopefully) have a model that performs well, and some metrics around what that means.
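To illustrate that train-test-evaluate cycle, here's a minimal sketch using scikit-learn on a synthetic dataset. The model family, metric, and data are placeholders, not recommendations:

```python
# A minimal train-test-evaluate loop. Synthetic data stands in for your own.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)                               # train
accuracy = accuracy_score(y_test, model.predict(X_test))  # test and evaluate
print(f"Held-out accuracy: {accuracy:.2%}")

# If this number falls short of the baseline you set during Research,
# iterate: more data, different features, or another model family.
```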
To say "the model performs at 90% accuracy" doesn't tell you much, unless you connect that accuracy metric to your product strategy and business goals. Circling back to what you explored during the Research phase:
Having created the model, you now want to put it to use. That means the model leaves the research phase and you deploy it to production:
Hosting: Your developers, data scientists, and/or MLOps (machine learning operations) teams connect the model to the real world. This usually involves plugging it into some workflow, such that data comes in from one location and the model sends its inference results to some other system for taking action.
Monitoring: I've mentioned elsewhere that an ML/AI model is a piece of factory equipment that churns out decisions. As with any other such equipment, you'd do well to keep an eye on it. At a bare minimum, that means making sure the model is always available to accept new inputs. You'll also want to keep track of those inputs and outputs, to detect when either one starts to drift or experiences a sudden shock, and alert humans accordingly.
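As one example of what that monitoring can look like, the sketch below compares a numeric input feature's recent production values against its training-time distribution using a two-sample Kolmogorov-Smirnov test. The data is synthetic and the alert threshold is an assumption to tune for your own traffic:

```python
# Detecting input drift: do recent production inputs still resemble the
# training data? Synthetic data simulates a distribution shift.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
training_values = rng.normal(loc=0.0, scale=1.0, size=5_000)  # historical
live_values = rng.normal(loc=0.4, scale=1.0, size=500)        # recent traffic

statistic, p_value = ks_2samp(training_values, live_values)
if p_value < 0.01:  # threshold is a starting point, not a rule
    # In a real system, this would page a human or open a ticket.
    print(f"Possible input drift: KS={statistic:.3f}, p={p_value:.4f}")
```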
Perhaps I should call this "R&D&D&R," where the last letter stands for "repeat." A model is never done; it's just "a good fit for now."
The monitoring you established as part of the Deployment phase will tell you when the model's training data has diverged from how the real world looks. Perhaps hotel booking trends have changed due to a shift in travel habits, or consumers have moved to the hot new snack food in the grocery store. Since your model was built on training data that you collected before this change, it's unlikely that your model will handle it well. It's now time to rebuild the model.
Rebuilding the model means running through the R&D&D cycle all over again. The Research phase will likely be shorter this time, but you'll still need to confirm that the business use case applies and that you have a backup plan in case you can't develop a sufficiently performant replacement model.
For this next tour through the Development phase, ask yourself what it means to update the model. Will the new regression model provide a different range of numeric results? Will that new classifier add or remove any classes? If so, what impact will that have on the downstream business processes?
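For a classifier, one concrete check is whether the replacement model's label set still matches what downstream systems expect. A small sketch, with hypothetical labels:

```python
# Compare the label sets of the old and new classifiers; any difference
# means the downstream business processes need a closer look.
def compare_label_sets(old_labels, new_labels):
    old, new = set(old_labels), set(new_labels)
    return {"added": sorted(new - old), "removed": sorted(old - new)}

diff = compare_label_sets(
    old_labels=["approve", "review", "reject"],
    new_labels=["approve", "review", "escalate"],
)
print(diff)  # {'added': ['escalate'], 'removed': ['reject']}
```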
As for Deployment, your MLOps team should have developed procedures to push a new model to production, confirm that it was properly installed, and roll it back if it encounters any trouble. While a model may misbehave once it goes live, installation and removal should be a smooth process.
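The exact mechanics depend on your MLOps stack; model registries such as MLflow's handle versioning and promotion for you. As a toy illustration of the promote-and-roll-back idea, with all names hypothetical:

```python
# A toy model registry: one pointer marks the live version, and history
# makes rollback a one-step operation.
class ModelRegistry:
    def __init__(self):
        self.history = []  # previously live versions, most recent last
        self.live = None

    def promote(self, version):
        if self.live is not None:
            self.history.append(self.live)
        self.live = version
        print(f"Live model is now {version}")

    def rollback(self):
        if not self.history:
            raise RuntimeError("No previous version to roll back to")
        self.live = self.history.pop()
        print(f"Rolled back to {self.live}")

registry = ModelRegistry()
registry.promote("churn-model-v1")
registry.promote("churn-model-v2")
registry.rollback()  # v2 misbehaves in production; v1 goes back in
```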
The life of an ML/AI model begins with a series of well-thought-out questions in the Research phase, and may cycle through several iterations of the R&D&D process before you replace it.
Keep in mind that you'll spend most of your time and effort in the Research phase. That is the time to bring various teams together -- business stakeholders, data scientists, and product -- to explore what is realistic and whether an ML/AI model is even the right solution to the problem at hand.