This post is part of a series on undervalued practices in ML/AI.
In the first post of this series, I said that planning -- taking the time to develop a data strategy or road map -- is an undervalued practice in ML/AI. And it is. You should definitely plan, so long as you also consider the old adage: "No plan survives first contact with the enemy."
Remember that AI involves question marks, not periods. Every ML/AI project is an experiment, which means it can go awry. And every modeling exercise can fail early on (due to insufficient data or insufficient "signal" in the features) as well as after its release to production (because your training data no longer reflects the real world).
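As a concrete example of that second failure mode, here's a minimal sketch of a drift check -- purely illustrative, assuming a tabular model and Python with numpy and scipy, none of which this post prescribes -- that compares one feature's live distribution against the data the model was trained on:

```python
import numpy as np
from scipy.stats import ks_2samp

def feature_drift_report(train_col, live_col, alpha=0.01):
    """Compare one feature's training vs. live distribution with a two-sample KS test."""
    stat, p_value = ks_2samp(train_col, live_col)
    return {
        "ks_statistic": stat,
        "p_value": p_value,
        "drift_suspected": p_value < alpha,  # a small p-value suggests the distributions differ
    }

# Hypothetical data: the live feature has shifted upward since training.
rng = np.random.default_rng(42)
training_values = rng.normal(loc=0.0, scale=1.0, size=5_000)
production_values = rng.normal(loc=0.4, scale=1.0, size=5_000)

print(feature_drift_report(training_values, production_values))
```

In practice you'd run a check like this per feature, on a schedule, alongside monitoring of the model's actual performance.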
You can spot some of those problems early on by performing a brief risk assessment. Constantly ask yourself, "How can this go wrong? And what do we do when that happens?"
Other problems will certainly creep in over time, yes. But by keeping an open mind to how reality might deviate from your intended result, you'll not only spot problems before they happen, but also remain flexible in the face of the inevitable surprises.
ML/AI exercises, especially predictive models, have no fixed end state. That means it's up to you to keep an eye on the R&D effort and determine when it's gone far enough. Unless you define boundaries for a project early on, it can easily become a drain on your budget, consuming time and money but never yielding any useful results.
This is why I approach a modeling exercise as a series of sprints. The first sprint is a discovery effort that ends with a baseline model and an evaluation: "do we think this will yield fruit?" If so, we do another sprint. This repeats until the model performs well enough for our needs, or we decide it's time to stop.
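To make that first sprint concrete, here's a hypothetical sketch of the evaluation gate -- assuming scikit-learn and a simple classification task, neither of which the post specifies -- that compares a plain baseline model against a no-signal dummy before committing to another sprint:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in for the project's real data.
X, y = make_classification(n_samples=2_000, n_features=20, n_informative=5, random_state=0)

# "No-signal" reference: always predicts the most frequent class.
dummy_score = cross_val_score(DummyClassifier(strategy="most_frequent"), X, y, cv=5).mean()

# First-sprint baseline: the simplest plausible model, no tuning.
baseline_score = cross_val_score(LogisticRegression(max_iter=1_000), X, y, cv=5).mean()

print(f"dummy accuracy:    {dummy_score:.3f}")
print(f"baseline accuracy: {baseline_score:.3f}")

# Go/no-go gate for the next sprint: require a meaningful lift over the dummy model.
if baseline_score - dummy_score < 0.05:  # the threshold is a judgment call, not a rule
    print("Little evident signal -- consider stopping or revisiting the data.")
else:
    print("Enough signal to justify another sprint.")
```

The 0.05 lift threshold here is an arbitrary placeholder; the real gate is whatever "performs well enough for our needs" means for your project.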
You're not applying ML/AI at random, but in pursuit of a particular business purpose, right? Maybe you're trying to classify documents, predict customer churn, or sort through images.
A model is one of many ways to reach your goal. Other ways include software development and human labor. Each path has benefits and drawbacks, as well as a price tag in terms of time, effort, reputation, and money.
You'd do well to price out those different paths in order to make an informed decision. That price extends beyond the dollar cost for R&D. Also consider the cost of model maintenance, the business risk if the model fails in production, and the reputation risk if it fails in certain ways. (My post on "TCM: Total Cost of Model" explores this in detail.)
Even if you know that, say, the cost of human labor will far outweigh that of an ML/AI model, it's still worth the effort to price out the model. Perhaps the modeling doesn't go as planned, and sudden increases in the projected cost put it in the same range as the manual solution. That's important to know.
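For illustration only, here's what pricing out the two paths might look like as back-of-the-envelope arithmetic. Every figure, category, and the three-year horizon below is a made-up assumption, not guidance from this post or from the TCM post:

```python
# All figures are hypothetical, in dollars, over a three-year horizon.
model_path = {
    "r_and_d": 150_000,          # initial modeling sprints
    "maintenance": 3 * 40_000,   # retraining, monitoring, and infrastructure per year
    "failure_risk": 60_000,      # expected cost of production failures (probability x impact)
}
manual_path = {
    "labor": 3 * 120_000,        # people doing the task by hand each year
    "training_and_turnover": 30_000,
}

model_total = sum(model_path.values())
manual_total = sum(manual_path.values())

print(f"ML/AI path:  ${model_total:,}")
print(f"Manual path: ${manual_total:,}")
print("Cheaper on paper:", "the model" if model_total < manual_total else "manual labor")
```

Reputation risk rarely fits neatly into a dollar figure, so treat a total like this as a floor rather than a full accounting.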
There's no shortage of shady business models where ML/AI is concerned. It's become too easy for companies to collect, analyze, and resell data, all through unscrupulous means.
This behavior always comes to light. And rarely because the company chooses to disclose it.
It's bad enough when your company is actively trying to hide what it's doing. Far worse when there's genuinely no ill intent, but your ML/AI project is seen in a negative light. By the time this becomes public, you look as though you were hiding something nefarious. You've lost control of the narrative and it will cost you a lot of time, effort, money, and PR to get it back.
That's why it's best to identify and handle these problems early. I go into detail in my series "Data Ethics: A Risk Approach," but the gist is that you should explore how your company will use data and spot potential ethical issues long before you build anything. This gives you a chance to change your plans well before the project comes back to haunt you.
Consider the old "newspaper rule" -- "If our plans were front-page news tomorrow, would we be comfortable with that?" -- as you plan every project, every model, every data product, and you will spare yourself a lot of trouble.
Undervalued Practices in ML/AI, Part 2: Hiring and Team Structure
Go off the beaten path to make the most of your data-related hiring.
Undervalued Practices in ML/AI, Part 4: Project Execution
There's a lot more to this than just building models.