This post is part of a series in which I explain ML/AI concepts at an executive level.
I opened Part 1 of this series with a declaration: business stakeholders must develop an understanding of ML/AI if their companies are to succeed with this new capability. This understanding is a key element of data literacy, which is in turn the first step to developing a data culture in a company.
I followed with an explanation of machine learning that I call “high-dimensional pattern-matching.” In essence: ML models do what we do as humans – compare objects according to their attributes, in search of generalizable patterns – but on a much larger scale.
Today we’ll build on that knowledge to explore, at a conceptual level, the mechanics of building an ML model. Knowing this will help you communicate with your company’s data team and set more realistic expectations for their work.
So when it comes to ML models, where do the data scientists and machine learning engineers fit in? In short, they:
- Translate real-world and business concepts into numbers.
- Apply algorithms to that data, in order to find patterns.
- Save those patterns into a model.
- Deploy that model so it can make predictions.
- Translate those predictions (which are, deep down, numbers) back into real-world business insights.
I’ve summarized that in just five bullet points, but there’s a lot more to it than it may seem on the surface. Let’s dig in.
(Please note that these are the technical steps involved in building a model. They are part of a wider sequence in which your stakeholders and product team have prioritized a particular business need, confirmed that an ML model would be a feasible solution, evaluated it for risks and ethical problems, gathered a training dataset, and so on. Be sure to involve the data scientists in that planning process, too! The sooner, the better.)
Humans see the world in terms of concepts like “airplane ticket” or “house.” Deep down, ML models (and the algorithms used to build them) only see groups of numbers. So the first step in building a model is for your data scientists to translate the attributes of real-world concepts – called “features” – into numbers which the machines will use to make comparisons.
Doing this requires understanding your business domain, in order to evaluate which features are worthwhile to include in a model. (That is, which features are likely to have predictive power.) This is known as feature selection. Let’s say that you manage a real estate firm. Square footage and the number of floors in a home might make good features for predicting property prices.
In addition to feature selection, your data scientists and ML engineers will perform feature engineering. This is a fancy term for modifying the features so they are more amenable to analysis. For example, you could translate the feature “square footage” to “number of square feet, relative to some median value.” Instead of three properties of 4,000, 4,050, and 4,100 square feet (which are very close in value), your data team might subtract the median value of 4,050 to yield -50, 0, and 50 square feet (numbers whose differences stand out more clearly to an ML model, and are therefore easier to use for pattern-matching).
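For the curious, here is a minimal sketch of that median adjustment in Python. The function name `center_on_median` is my own invention for illustration, and a real data team would likely use a data-processing library rather than plain lists.

```python
from statistics import median

def center_on_median(values):
    """Subtract the median from every value, so each number
    describes its distance from a 'typical' property."""
    mid = median(values)
    return [v - mid for v in values]

# The three properties from the example above
square_feet = [4000, 4050, 4100]
centered = center_on_median(square_feet)
print(centered)  # [-50, 0, 50]
```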
Having chosen and modified the features, data scientists feed the resultant dataset into an algorithm. An algorithm, such as a random forest or a custom neural network, looks through the data in search of patterns.
All of the patterns that the algorithm “learns” get packaged up into a model. You can then feed new data to that model to make predictions: “Given this square footage, number of floors, and so on … what should the price of this house be?”
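To make the train-then-predict idea concrete, here is a small sketch. Note the assumptions: I have swapped in a simple straight-line fit (ordinary least squares on one feature) as a stand-in for a random forest or neural network, the function names `train` and `predict` are hypothetical, and the prices are made-up numbers.

```python
from statistics import mean

def train(square_feet, prices):
    """'Learn' a straight-line pattern: price = slope * square_feet + intercept."""
    x_bar, y_bar = mean(square_feet), mean(prices)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, prices))
    variance = sum((x - x_bar) ** 2 for x in square_feet)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    # The "model" is nothing more than the saved pattern.
    return {"slope": slope, "intercept": intercept}

def predict(model, square_feet):
    """Apply the saved pattern to a property the model has not seen."""
    return model["slope"] * square_feet + model["intercept"]

# Made-up training data, for illustration only
training_square_feet = [1000, 2000, 3000, 4000]
training_prices = [200_000, 300_000, 400_000, 500_000]

model = train(training_square_feet, training_prices)
estimate = predict(model, 2500)
print(estimate)
```

The point to notice is the separation: `train` runs once over historical data, and `predict` is what gets called again and again on new properties.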
Training a model isn’t just a one-shot affair. It’s more of a cycle. Data scientists and ML engineers will spend time choosing features, performing feature engineering, and testing the resultant model. They’ll repeat that until they either see the performance they want, or they decide that this model isn’t going to pan out.
One key point is that there’s no intrinsic end to a model’s development; it’s up to you and the data team to decide what is “good enough” performance and what constitutes “too much time” to get there.
A model is sort of like a web application: it needs to run somewhere so it can service requests for prediction.
Here, the data scientists and ML engineers coordinate with your software developers and operations team to connect the model to the rest of your technology infrastructure. This is called deploying the model. Doing so may entail plugging it into an existing application, or putting it behind a web server to accept requests.
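As a rough sketch of the "behind a web server" option, the snippet below wraps a model in a tiny HTTP service using only Python's standard library. Everything named here is an assumption for illustration: `predict_price` is a hard-coded stand-in for a trained model, and the JSON field names `square_feet` and `predicted_price` are invented. A production deployment would look quite different.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict_price(square_feet):
    """Hypothetical stand-in for a trained model: a fixed straight-line formula."""
    return 100 * square_feet + 100_000

def handle_prediction_request(body):
    """Turn a JSON request body into a JSON response body."""
    features = json.loads(body)
    price = predict_price(features["square_feet"])
    return json.dumps({"predicted_price": price})

class PredictionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the request, ask the model, send back the answer.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        response = handle_prediction_request(body).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)

def serve(port=8000):
    """Start the server; it runs until interrupted."""
    HTTPServer(("localhost", port), PredictionHandler).serve_forever()
```

Calling `serve()` would start the service; other applications could then send it a property's features and get a predicted price back.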
(I’ve glossed over a key point here, which is the handoff of the model from the ML team to the software dev or operations team. I’ll cover that in a future post.)
By this point the data scientists have trained a model and worked with your operations team to deploy it. All done, right? Not quite.
Remember what I said earlier, that a model only sees numbers? That holds for output as well as input. The model returns a numeric score and it’s up to you to figure out what to do with that.
For example, you may think that the model tells you which stores to close, or which candidates to hire, or whether to buy this company’s stock. But what it really tells you is, quite simply: “this store, candidate, or company share price is rated
What should you do with that number? The model doesn’t have any knowledge about your business, so it can’t decide for you. Your company’s data scientists, product managers, and stakeholders need to meet up and decide how to interpret that number in the context of a business decision.
The data scientists can shed some light on what went into the model and how it may have reached that conclusion. Depending on the algorithm used to build the model, they may also be able to tell you how confident that model is in its prediction. The product managers and stakeholders then have to decide on the cutoff, or threshold value, that separates “yes” from “no” for closing that store, hiring that candidate, or buying those shares.
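In code, the threshold decision is almost trivially small, which is exactly the point: the hard part is choosing the number, not applying it. The sketch below uses made-up store scores and a hypothetical threshold of 0.8; both are assumptions for illustration.

```python
def decide(score, threshold):
    """Translate a model's numeric score into a yes/no business decision."""
    return "yes" if score >= threshold else "no"

# Hypothetical cutoff chosen by product managers and stakeholders
CLOSE_STORE_THRESHOLD = 0.8

# Made-up scores, as a model might return them
store_scores = {"Store A": 0.91, "Store B": 0.42}

decisions = {
    store: decide(score, CLOSE_STORE_THRESHOLD)
    for store, score in store_scores.items()
}
print(decisions)  # {'Store A': 'yes', 'Store B': 'no'}
```

Move the threshold to 0.95 and Store A stays open. The model's output has not changed; only the business decision has.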
And that is how your data scientists and machine learning engineers build a predictive model.
There’s certainly a lot more detail involved here, but that’s the gist.
One key point about an ML model is that its entire “knowledge” and “experience” come from its training data. It doesn’t know what it doesn’t know. If the present-day world no longer matches what was in the training data, the model will serve bad predictions.
Even the most carefully-built training dataset will eventually diverge from reality. That’s the time to develop a new dataset and start the model training process anew.
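One crude way to spot that divergence is to compare recent data against the training data. The sketch below is a simple heuristic of my own choosing, not a standard method: it flags a feature when its recent average has moved more than a chosen number of training standard deviations. The function name `has_drifted`, the `max_shift` value of 2.0, and all of the figures are assumptions.

```python
from statistics import mean, stdev

def has_drifted(training_values, recent_values, max_shift=2.0):
    """Flag a feature whose recent average sits far from its training average,
    measured in units of the training data's standard deviation."""
    shift = abs(mean(recent_values) - mean(training_values)) / stdev(training_values)
    return shift > max_shift

# Made-up square footage figures, for illustration only
training_square_feet = [1800, 2000, 2200, 2400, 2600]
similar_recent = [1900, 2100, 2300]
different_recent = [3900, 4100, 4300]

print(has_drifted(training_square_feet, similar_recent))    # False
print(has_drifted(training_square_feet, different_recent))  # True
```

A check like this would not tell you why the world changed, only that the model is now being asked about properties unlike the ones it learned from.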
In the next post, I’ll share some thoughts on when a predictive model is a good fit for a given problem.
(Would you like your stakeholders to develop a deeper understanding of ML/AI? Contact me to get started.)