(Photo by Kai Pilger on Unsplash)
This is your periodic reminder that (AI-Driven) Content Moderation Is Hard.
Even if your models exhibit damned-near-perfect performance (which is rare), you still need a lot of human involvement in the planning stages. Someone has to define "acceptable" and "objectionable" for your platform, and also decide what criteria the models will use to distinguish between the two. Oh, and then they need to take responsibility for those decisions.
Those decisions are moving targets. Language changes (especially slang) and the news cycle changes. Plus you have the people who will go out of their way to evade your moderation systems.
Anyway – and longtime readers will already know where I'm going here – this is not unique to content moderation. Sure, content moderation makes the problem more visible because everyone on a platform can see the models' performance in real time. But if we zoom out, we're also reminded of challenges common to all AI models:
1/ Every AI model represents the collective decisions, actions, and inactions of those who built it.
2/ We need to reframe #1 as "attempts to represent" because the model can and will be wrong from time to time.
3/ Because of #2, it's very rare that you'll have a "set it and forget it" kind of model. Do yourself a favor: make sure you have tools in place to monitor the model's performance, and also the staffing to adjust/override/retrain the model as needed.
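For a rough idea of what "tools in place to monitor" could look like, here's a minimal sketch. It assumes you have humans reviewing a sample of the model's decisions, and it tracks how often the model agrees with them over a sliding window. The class name, the window size, and the 0.9 threshold are all made up for illustration; a real setup would pick its own metrics and alerting.

```python
from collections import deque


class RollingAgreementMonitor:
    """Hypothetical monitor: tracks how often the model's label matches
    a human reviewer's label over the most recent reviewed items."""

    def __init__(self, window_size=100, min_agreement=0.9):
        # Both defaults are illustrative assumptions, not recommendations.
        self.window = deque(maxlen=window_size)
        self.min_agreement = min_agreement

    def record(self, model_label, human_label):
        # Store True when the model agreed with the human reviewer.
        self.window.append(model_label == human_label)

    def agreement(self):
        # Fraction of recent reviewed items where model and human agreed.
        if not self.window:
            return None
        return sum(self.window) / len(self.window)

    def needs_attention(self):
        # Flag the model for human follow-up (adjust/override/retrain)
        # once recent agreement drops below the chosen threshold.
        rate = self.agreement()
        return rate is not None and rate < self.min_agreement


monitor = RollingAgreementMonitor(window_size=4, min_agreement=0.75)
for model_label, human_label in [
    ("acceptable", "acceptable"),
    ("objectionable", "objectionable"),
    ("acceptable", "objectionable"),
    ("acceptable", "objectionable"),
]:
    monitor.record(model_label, human_label)

if monitor.needs_attention():
    print("Agreement with reviewers has dropped; time for a human to take a look.")
```

The code only raises the flag. Deciding what to do about it is still a job for people, which is the staffing half of point #3.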
I'm thinking about this in the context of a recent WSJ article on Meta's content moderation challenges:
"Inside Meta, Debate Over What’s Fair in Suppressing Comments in the Palestinian Territories" (WSJ)
Of special note:
Meta has long had trouble building an automated system to enforce its rules outside of English and a handful of languages spoken in large, wealthy countries. The human moderation staff is generally thinner overseas as well.