What you see here is the last week's worth of links and quips I have shared on LinkedIn, from Monday through Sunday.
For now I'll post the notes as they appeared on LinkedIn, including hashtags and sentence fragments. Over time I might expand on these thoughts as they land here on my blog.
The Microsoft Excel World Championship.
Or, as I'm sure some data scientist will call it: "Hey I can do that faster in Pandas!"
Someone else will say: "I think you mean R. It's better in R."
And while those two bicker, the Excel analyst will get their job done and go home on time.
"Le tableur Microsoft Excel aussi a son championnat du monde" [The Microsoft Excel spreadsheet has its own world championship, too] (Le Monde)
Solid post from Andrew McAdams.
This underscores a key point about LLMs (not just ChatGPT):
The text they generate sounds great because the underlying model has picked up on grammatical patterns, not patterns of logic or reason.
If a decision requires nuance and additional context, AI is usually a poor fit.
(And, as always, be mindful of the data you send to third-party websites. LLMs or otherwise.)
Here we have another example of "AI system spouts nonsense."
"Amazon’s Alexa has been claiming the 2020 election was stolen" (Washington Post)
Of note:
Amazon declined to explain why its voice assistant draws 2020 election answers from unvetted sources.
But later, in that same article:
[Amazon spokeswoman Lauren] Raemhild said that Alexa draws data from “Amazon, licensed content providers and websites like Wikipedia.”
None of these strike me as particularly vetted sources of fact. In part because the list is so vague.
People love to shun BI/analytics/summary statistics (in favor of the supposedly more exciting world of ML/AI) but I've always been a fan. Organizing and presenting data in a way that helps people gain new insights and guide their decisions is tremendously valuable.
Once in a while I come across something that highlights the power of that kind of analysis. Like this:
"[OC] The Highs and Lows of Popular Comedy Shows" (from Reddit's "r/dataisbeautiful")
Do you see how much data gets packed into these charts? Each image combines:
the different shows (vertical axis)
the number of seasons (number of boxes)
a metric calculated for each season, displayed in a way that we can compare against other seasons of the same show and even other shows (color coding)
(second image) this data expressed by the year, rather than just by the season numbers, so we could see that (say) one year was just an exceptionally good/bad year for TV in general.
This analysis was based on a select set of comedy shows. A studio exec could run this same analysis across a different grouping of shows, even different genres, and get a sense of (say) the ideal number of seasons. (If you're using TV shows to sell ad space or subscriptions, it'd be helpful to develop a rule of thumb – or, for an experienced exec, confirm your hunch – of how long you expect a show to be commercially viable.)
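For the curious, here is a minimal sketch of the kind of summary that sits behind a chart like that: one row per show, one cell per season, each cell holding a single metric. Everything in it is an assumption for illustration. The show names and ratings are made up (not taken from the Reddit post), the metric is a plain mean of episode ratings, and `season_grid`, `best_season`, and `render` are hypothetical helper names.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical episode ratings as (show, season, rating).
# These numbers are invented for illustration only.
EPISODES = [
    ("Show A", 1, 7.0), ("Show A", 1, 8.0),
    ("Show A", 2, 9.0), ("Show A", 2, 8.0),
    ("Show B", 1, 6.0), ("Show B", 1, 7.0),
    ("Show B", 2, 5.0),
    ("Show B", 3, 4.0), ("Show B", 3, 5.0),
]


def season_grid(episodes):
    """Return {show: {season: mean rating}}, with seasons in ascending order."""
    buckets = defaultdict(lambda: defaultdict(list))
    for show, season, rating in episodes:
        buckets[show][season].append(rating)
    return {
        show: {season: mean(ratings) for season, ratings in sorted(seasons.items())}
        for show, seasons in buckets.items()
    }


def best_season(grid, show):
    """Season number with the highest mean rating for one show."""
    seasons = grid[show]
    return max(seasons, key=seasons.get)


def render(grid):
    """Plain-text stand-in for the chart: one row per show, one cell per season."""
    lines = []
    for show, seasons in grid.items():
        cells = "  ".join(f"S{number}:{average:.1f}" for number, average in seasons.items())
        lines.append(f"{show}  {cells}")
    return "\n".join(lines)


grid = season_grid(EPISODES)
print(render(grid))
```

Swap in a different set of shows, or a different metric per season, and the same grid answers the studio exec's question about where a show tends to peak. Note that nothing here is a prediction: each cell is just a summary of what already happened.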
If anyone tells you that BI/analytics isn't useful, feel free to remind them: the fact that BI focuses on the past is a strength, not a weakness. So long as the data is clean and correct, anything that BI shows you is a fact. That's a far cry from the probabilistic answers you get from AI.
ChatGPT and other AI chatbots are great learning tools. But not in the way you might think.
High-profile chatbot failures (like saying inappropriate things, or telling lies, or revealing company secrets) don't just teach us about the limitations of chatbots. They also demonstrate some truths about all ML/AI models.
Earlier this week I posted a quick writeup on what I mean:
"What LLM chatbots teach us about AI in general"