Data Lessons from the World of Algorithmic Trading (part 6): "Monitor Your Models"

Posted by Q McCallum on 2020-05-28

This article is part of a series. In Part 1, I outlined the premise: ML/AI shops can borrow tips and best practices from how algorithmic trading (“algo trading”) shops operate. The rest of the articles explore those ideas in more detail.

You can think of an ML/AI model as a piece of factory equipment that churns out decisions. It operates faster and more consistently than a human doing the same work, with the caveat that it reacts in the moment and is unaware when it’s out of its depth. We can say that the model makes snap judgements based on a very limited world view (which it generalized from its training data).

If that doesn’t make you even a little bit uncomfortable, it should. It’s why deploying your model is not the end of the race, but the first milestone. You then need to watch over that model and stop listening to its decisions when it goes awry.

The view from the trading floor: trust, but verify

Traders live in a similar world: they develop strategy and execution algorithms (algos) to handle trading activity on their behalf. Every algo output amounts to a decision to buy, hold, or sell shares, which means bad decisions can prove costly, especially if they aren’t caught in short order. Therefore, while traders trust the algos they build, they still rely on several levels of padding:1

  • Monitors to detect problems early on and alert a human to intervene. These range from “hey, you should check this when you get a moment” to “something is very wrong, look at it right now!”
  • Automated circuit breakers to halt one or all of their execution models if things go too far out of bounds.
  • Emergency, manual overrides (such as a “dead man’s switch” or “big red button”) so a person can stop the system from executing trades.
  • Automated measures, supplied by the exchange, to stop their trading account (or trading for all accounts) under exceptionally abnormal conditions.

This is how they reap the benefit of fast, automated decisions and actions without the risk of the algos running amok. (Traders are very adept at assessing and handling risk. I explore that later in this series.)

Similarly, you want to catch problems before your customers do. Or before the models unwittingly trigger a bunch of mistaken purchases. Or whatever else the models do on your behalf. To do that, you take a page from the algo traders’ handbook to protect yourself against small disconnects and shocks.
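
To make that concrete, here’s a rough sketch of what that padding could look like wrapped around a deployed model. Everything in it – the ModelGuard name, the thresholds, the error-rate signal – is invented for illustration, not pulled from any particular library:

```python
from dataclasses import dataclass, field

@dataclass
class ModelGuard:
    """Illustrative 'levels of padding' around a deployed model."""
    warn_threshold: float = 0.05    # error rate that pages a human "when you get a moment"
    halt_threshold: float = 0.15    # error rate that trips the automated circuit breaker
    manual_override: bool = False   # the "big red button" a person can flip
    halted: bool = field(default=False, init=False)

    def record_error_rate(self, error_rate: float) -> None:
        """Monitor: compare the latest observed error rate against the thresholds."""
        if error_rate >= self.halt_threshold:
            self.halted = True  # circuit breaker: stop acting on the model's output
            print(f"CRITICAL: error rate {error_rate:.2%} -- model halted, investigate now")
        elif error_rate >= self.warn_threshold:
            print(f"WARNING: error rate {error_rate:.2%} -- check this when you get a moment")

    def allow_decisions(self) -> bool:
        """Only let the model's output drive business activity if nothing has tripped."""
        return not (self.halted or self.manual_override)


guard = ModelGuard()
guard.record_error_rate(0.08)   # raises a low-urgency warning
guard.record_error_rate(0.20)   # trips the circuit breaker
print(guard.allow_decisions())  # False: stop listening to the model
```

The exact mechanics matter less than the layering: a low-urgency warning, an automated halt, and a manual override all exist independently of the model itself.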

Small disconnects

The small disconnects reflect a gap between the model’s training data and the real-world activity it encounters in production. Maybe the training data wasn’t sufficiently robust (it fails to account for seasonality, or for a segment of your customer base) or the world has changed a little since you developed the training set. Going back to the factory equipment analogy, the model isn’t suitable for its operating environment. It’s churning out bad decisions – it expresses a higher-than-expected defect rate – and that will cost you money.

You can limit the damage by implementing suitably aggressive alerts that get a human to perform corrective action. For example, the human could (quickly!) pull the model from production and retrain it on data that better reflects the current state of the world.2 3

You only get alerts if you monitor the model’s performance, so that you know what its normal defect rate looks like. A proper monitoring system will note a drift in numeric values, or a sudden affinity for one class in a multiclass classifier, or a drop in the model’s confidence. (If you accept an answer without checking the score that comes with it, you’ll never know when the model was trying to say: “I kinda maybe sorta think this is right.”)
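
As a minimal sketch – assuming you log each prediction’s class label and confidence score, and with baseline figures and tolerances that are placeholders for numbers you’d derive from your own validation data – such a check might look like this:

```python
import numpy as np

def check_predictions(labels, confidences,
                      baseline_class_freq, baseline_confidence,
                      freq_tolerance=0.15, confidence_tolerance=0.10):
    """Return a list of alert messages based on a recent window of predictions."""
    alerts = []
    labels = np.asarray(labels)

    # 1. Sudden affinity for one class: compare observed class frequencies
    #    against what you saw during validation.
    for cls, expected in baseline_class_freq.items():
        observed = float(np.mean(labels == cls))
        if abs(observed - expected) > freq_tolerance:
            alerts.append(f"class '{cls}' frequency {observed:.2f} vs expected {expected:.2f}")

    # 2. Drop in model confidence: the model "kinda maybe sorta" thinks it's right.
    mean_conf = float(np.mean(confidences))
    if mean_conf < baseline_confidence - confidence_tolerance:
        alerts.append(f"mean confidence {mean_conf:.2f} below baseline {baseline_confidence:.2f}")

    return alerts


# Example window: the model suddenly favors "approve" and is less sure of itself.
alerts = check_predictions(
    labels=["approve"] * 80 + ["deny"] * 20,
    confidences=np.random.uniform(0.5, 0.7, size=100),
    baseline_class_freq={"approve": 0.55, "deny": 0.45},
    baseline_confidence=0.85,
)
for a in alerts:
    print("ALERT:", a)
```

The same pattern extends to numeric outputs: track the running mean and spread of the model’s predictions and alert when they wander away from the training-time baseline.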

Large disconnects (shocks)

Sometimes the training data and real-world inputs differ by such a wide margin that they don’t count as small disconnects. These are outright shocks. The model, our piece of factory equipment, was well-suited for its intended operating environment, but then that environment changed. Natural disasters, sudden political upheaval, black swans (including stock market “flash crashes”), or the fallout from certain gray rhinos all sabotage the model’s ability to operate properly. And since it isn’t self-aware, it doesn’t know that it’s mishandling the situation, so it will churn out inappropriate results without a care in the world.4

To address a small disconnect, you fix the model by retraining it on better data. To address a shock, you protect the model by building infrastructure and procedures around its inputs and operating environment. You would protect a piece of physical equipment by tracking the humidity on the factory floor and installing a tamper-proof cap on the fuel intake valve. For your ML/AI model, you develop a robust, aggressive alert system that notifies human operators and disconnects the model from any business activity when the inputs change beyond expectations.
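
Here’s a minimal sketch of that input-side protection. The feature names, training-time ranges, and stand-in model are all invented for illustration:

```python
# If incoming feature values drift outside the ranges seen in training,
# stop forwarding the model's decisions to downstream systems.

TRAINING_RANGES = {
    "order_value": (0.0, 5_000.0),   # min/max observed in the training data
    "items_in_cart": (1, 60),
}

def input_within_expectations(features):
    """Return False if any feature falls outside its training-time range."""
    for name, (low, high) in TRAINING_RANGES.items():
        value = features.get(name)
        if value is None or not (low <= value <= high):
            return False
    return True


def serve(features, model_predict):
    """Only act on the model's prediction when the inputs resemble its training world."""
    if not input_within_expectations(features):
        print("ALERT: inputs outside training range -- disconnecting model, paging a human")
        return None  # fall back to a safe default or a manual process
    return model_predict(features)


# Example call with a stand-in model and a shock-like input.
result = serve({"order_value": 250_000.0, "items_in_cart": 3},
               model_predict=lambda f: "approve")
print(result)  # None: the guard refused to pass the decision along
```

The point is that the guard sits outside the model: when the inputs no longer resemble the training world, the decision never reaches your business systems, no matter how confident the model claims to be.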

The wrap-up

Despite the terms “machine learning” and “artificial intelligence,” an ML/AI model is more a piece of equipment than a thinking brain. It’s an advanced form of automation, made of a mix of code, matrix algebra, and training data.

The model cannot adapt on its own, so when its world changes, it can only misbehave. It is unfit to operate without supervision. Learn from the way traders handle their algos to protect your business from a model gone awry: an ML/AI model is not properly deployed until it has monitoring and padding around it, plus human operators to intervene when it outputs bad decisions.



  1. The exchange requires traders to demonstrate some rudimentary safety measures before they’re permitted to trade. Good traders – ones who plan to stay in business – add more of their own. ↩︎

  2. We can argue that models that perform online learning – updating their body of knowledge without a full build/redeploy cycle – still “retrain,” in that a human must develop the feedback loop that sends new records to the model. ↩︎

  3. If the real world has changed enough, you may even need to change the number of target classes. ↩︎

  4. You know how stress-testing your app will uncover the bugs and corner cases? A shock provides a similar service for your ML/AI model. ↩︎