Misuse of Models: Recent Facial Recognition Failures

2020-09-22 | tags: AI risk AI safety

A real-world example of weighing a model's total cost (TCM) against the alternatives.

Facial recognition models are in the news again. As usual, it's not for good reasons:

Twitter has deployed a model to crop and center images in tweets displayed in the main timeline view. It centers on a face when it detects one. People have demonstrated that it will focus on and crop around a Caucasian face, even if there are other, non-Caucasian faces in the image.

Zoom's "virtual background" feature -- which transplants the person on-screen to a locale of their choosing -- has stirred similar controversy. In at least one case, it completely erased the head of a person who is not white.

Like a lot of -- frankly, far too many ^[1] -- facial-recognition models, these likely suffer from very imbalanced training datasets. ^[2] And Twitter and Zoom are taking some well-deserved lumps for this: creating a balanced training dataset (or, handling a known-imbalanced one) is Machine Learning 101. This amateur hour is compounded by the fact that facial-recognition systems had already developed a well-publicized track record of bias and discrimination. At this point, there's no excuse for an imbalanced training dataset for a facial-recognition model.

What I don't see in the list of grievances, however, is the notion of the cost associated with these application features and the ML/AI models behind them.

Pricing it out

I think a lot about alternatives, tradeoffs, and calculating total costs. I suspect, in these cases, Twitter and Zoom did not. Zoom missed a chance to account for the model risk -- the cost of the model being wrong, which is just one component of TCM -- but they are stuck with using a model since there's no other way to implement the virtual background. Fair enough.

Twitter, on the other hand, had two alternatives that didn't involve ML/AI:

Alternative 1: scale the image. If Twitter has decided that all images must meet some predetermined dimensions for the timeline view, they could shrink (scale) them accordingly. This is especially useful for people who want their original image -- or one as close as possible to it -- to accompany their tweet.

Alternative 2: crop the image based solely on coordinates. This is even easier than the first option. If your maximum image size is X by Y pixels, capture the area from (0, 0) to (X, Y) and save the result. You can even get fancy and calculate the crop based on the image's geometric center.
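
To show how little is involved, here is a minimal sketch of both alternatives in Python. It only does the geometry: the function names and the example dimensions are my own illustrations, not anything Twitter uses. The values it returns would be handed to an image library to do the pixel work (for instance, Pillow's `Image.resize` accepts a `(width, height)` size and `Image.crop` accepts a `(left, upper, right, lower)` box).

```python
# Illustrative sketch only: function names and sizes are hypothetical.

def scaled_size(width, height, max_width, max_height):
    """Alternative 1: shrink to fit within the bounds, keeping the aspect ratio.

    Images that already fit are left at their original size.
    """
    ratio = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * ratio)), max(1, round(height * ratio))


def top_left_crop_box(width, height, max_width, max_height):
    """Alternative 2: capture the area from (0, 0) to (X, Y)."""
    return (0, 0, min(width, max_width), min(height, max_height))


def center_crop_box(width, height, max_width, max_height):
    """Alternative 2, the fancy version: crop around the geometric center."""
    crop_w = min(width, max_width)
    crop_h = min(height, max_height)
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


if __name__ == "__main__":
    # A 1200x800 upload, with an assumed 600x400 timeline slot.
    print(scaled_size(1200, 800, 600, 600))        # (600, 400)
    print(top_left_crop_box(1200, 800, 600, 400))  # (0, 0, 600, 400)
    print(center_crop_box(1200, 800, 600, 400))    # (300, 200, 900, 600)
```

Nothing in here looks at what the image contains, which is the point: the output depends only on the image's dimensions, so it treats every photo the same way.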

Neither approach will earn points for using exciting ML/AI techniques. But they win out in terms of simplicity. There exist open-source libraries to scale and crop images, and in my experience they offer a shallow learning curve. It's also hard to get into trouble for scaling or cropping an image based purely on geometry. There's no controversy, no reputation risk, no sheepish apologies pushed through PR channels.

These are very low-cost solutions, especially when you consider that the problem here was to make an image fit a certain size. (I can't figure out why Twitter chose to add "... and center on faces" to the requirements. If the person Tweeting wanted the image cropped and centered that way, wouldn't they have done it themselves?)

You can imagine comparing these, side by side, in a product planning meeting:

| cost | ML/AI model | scale image | crop image |
|---|---|---|---|
| cost to develop | medium to high | near-zero | near-zero |
| cost of being wrong, on a technical level (model risk) | low to medium | near-zero | near-zero |
| cost of being wrong, on a PR level (reputation risk) | high | near-zero | near-zero |

With just this table, we can see that the simpler approaches win out in terms of costs. Unless Twitter saw significant benefit to deploying a model for this -- and I can't see what that benefit would have been -- then it didn't make sense for them to do so. The ML/AI approach will always be a net negative, both in terms of up-front costs and accounting for risks.

Absent any contribution to a wider strategic goal, developing a facial-recognition model to crop photos was a boondoggle at best and busywork at worst. I see it as a poor use of time, effort, and money. We can only imagine what other problems Twitter could have addressed if they had used a simple approach to cropping the images, instead of applying machine learning where it served no discernible purpose.

Sum total

One could argue that Twitter is taking heat for having built an imbalanced training dataset. Maybe. I'd contend that Twitter is really paying the price because it employed ML/AI when it didn't need to, and the associated reputation risk materialized into a reality.

At a higher level, maybe it's time to stop developing facial recognition models altogether? The companies that deploy them can't get them right on a technical level, and also fail to grasp the social impacts of the models being wrong. ^[3]

[1] "More than zero" counts as "far too many" in this case.

[2] That, and image recognition is still a relatively new and untested domain. The reality/expectation spread is higher here than in pretty much any other area of ML/AI, which leads to disturbingly unreliable results whenever someone deploys such a model.

[3] We also have a lot of very real concerns about these models being consistently right. But we're pretty far from that point.
