(Photo by Arseny Togulev on Unsplash)
I've often said that AI chatbots like ChatGPT and Midjourney are just "replaying their training data." What does that mean? How should that knowledge set your expectations of what these tools can (and cannot) do?
Under the hood, those chatbots use what are called large language models (LLMs). That word "model" is a hint: an LLM is an oversized version of your standard machine learning (ML) model.
A quick primer on ML models will lay the foundation for understanding chatbots, then. Let's start with some everyday pattern-finding:
You've just moved to a new town. After a few weeks of commuting to work, you get a feel for when traffic is heavier. You can use this knowledge – this understanding of the past and present – to make some educated guesses about the future. "If I leave a little later on Friday, traffic will be lighter, and I'll have a less-stressful drive." There are no guarantees that you're correct, mind you, but it's a pretty safe bet.
Congratulations: you have just simulated the work of a machine learning model. You did this by:

- Collecting data: several weeks of observing your commute
- Finding patterns in that data: which days and times tend to have heavier traffic
- Using those patterns to make a prediction about something you haven't yet observed: Friday's lighter traffic
An ML model takes a more formal path than a person, and it can sift through a lot more attributes ("features") in search of patterns, but it's the same overall process:

- Collect training data
- Let an algorithm find patterns – a formula – in that data
- Replay that formula to fill in new data points (predictions)
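Here's a minimal sketch of those three steps in code, using Python and scikit-learn. (The commute numbers are invented purely for illustration.)

```python
# A toy illustration of "collect data, find the formula, replay it."
from sklearn.linear_model import LinearRegression

# Step 1: collect training data -- (day of week, departure hour) plus the
# commute time we observed. These numbers are made up for the example.
X = [
    [1, 8],   # Monday, leaving at 8:00
    [3, 8],   # Wednesday, 8:00
    [5, 8],   # Friday, 8:00
    [5, 9],   # Friday, 9:00
]
y = [35, 40, 30, 25]   # commute time in minutes

# Step 2: let an algorithm find a formula relating the features to commute time.
model = LinearRegression()
model.fit(X, y)

# Step 3: replay that formula on a situation we never observed.
prediction = model.predict([[5, 10]])   # Friday, leaving at 10:00
print(f"Estimated commute: {prediction[0]:.0f} minutes")
```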
A prediction is another way of saying "a synthetic data point that we didn't have before." Specifically, an ML model's prediction gives you points that were not in the original training dataset but – here's the important bit – look like they could have been.
Those LLM chatbots? They're oversized ML models. They're a larger version of the "collect training data, let an algorithm find the formula, replay that formula to fill in new data points" steps I described above. So when you use an LLM you are asking it to fill in a new data point based on the patterns it found in its training data.
The LLM won't return the exact documents on which it was trained (well, not usually – specific text or images do indeed surface now and then, which adds fuel to some ongoing lawsuits), but you do get documents that look like they could have come from the training data.
Why is it important to understand the mechanics of these AI chatbots?
1/ While a chatbot's underlying LLM is indeed "creating" text or images, it won't stray too far from its training data. It can't. Its entire job is to return documents or images that (on a statistical level) look like they could have been in the training data.
That said…
2/ The LLM doesn't see "facts" or "logic" or "train of thought" when it generates text. It only sees the linguistic patterns found in its training data. When I said that the ML algorithm looks for patterns in the data, I glossed over an important step: it first transforms all of that text data into mountains of numbers.
I'll spare you the technical details. You can do a web search for "vectorization" or "embeddings" if you're curious. But as far as the LLM is concerned: "Word 142 is usually followed by word 435345 or maybe 324, which is sometimes followed by word 798 or even 32498384." It has no idea which combinations of words might be considered awkward, controversial, or socially acceptable.
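If you want a feel for what that looks like, here's a toy sketch of turning words into numbers and then tracking which number tends to follow which. (Real LLMs use learned tokenizers and high-dimensional embeddings; this only illustrates the idea.)

```python
# Toy sketch: assign each word an ID, then count which ID follows which.
# The model "sees" numbers and their co-occurrence, not meaning.
from collections import Counter, defaultdict

corpus = "the tractor drove to the field the rocket flew to the moon".split()

# "Vectorize": map each distinct word to an integer ID.
word_to_id = {word: i for i, word in enumerate(dict.fromkeys(corpus))}
ids = [word_to_id[w] for w in corpus]
print(word_to_id)   # {'the': 0, 'tractor': 1, 'drove': 2, ...}

# Count which ID tends to follow which -- the only "knowledge" here.
follows = defaultdict(Counter)
for current, nxt in zip(ids, ids[1:]):
    follows[current][nxt] += 1

print(dict(follows[word_to_id["the"]]))   # IDs that tend to follow "the"
```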
Most of all …
3/ This explains why chatbots seem to make things up or "hallucinate." (I prefer the terms "fabricate" or "lie" but … fine.) Since the underlying LLM has no concept of facts or logic, and since it's trying to replay the grammatical patterns from its training data, it will sometimes produce text that is grammatically correct but is complete and utter nonsense.
It's possible to train an LLM chatbot on text that is factually incorrect, in which case it will emit a lot of text that is similarly incorrect. But it's also possible for it to hallucinate when it's been fed a diet of purely truthful material. "John Smith drove his tractor to the moon" could easily come from an LLM that was trained on documents about space exploration and farming.
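Here's a toy next-word generator to show how that can happen. (This is my own illustration, far simpler than a real LLM; both training "documents" below are truthful on their own.)

```python
# Toy next-word generator trained on two truthful "documents". It can still
# stitch them into a sentence that appears in neither.
import random
from collections import defaultdict

documents = [
    "john smith drove his tractor to the barn",
    "the astronauts flew their rocket to the moon",
]

# Learn which word follows which across all the training text.
follows = defaultdict(list)
for doc in documents:
    words = doc.split()
    for current, nxt in zip(words, words[1:]):
        follows[current].append(nxt)

# Generate by repeatedly picking a plausible next word.
word = "john"
sentence = [word]
while word in follows and len(sentence) < 10:
    word = random.choice(follows[word])
    sentence.append(word)

print(" ".join(sentence))
# One run might print "john smith drove his tractor to the moon" --
# grammatically fine, yet stated nowhere in the training data.
```

Every sentence the model saw was accurate; the fabrication comes entirely from replaying "what tends to follow what."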
The sum total is that an AI chatbot (and its underlying LLM) doesn't really "know" anything. It doesn't have opinions and certainly isn't trying to convince you of any political view. The chatbot is simply giving you a block of text that could have plausibly fit into its training dataset. But the notion of "plausible" is based on grammatical patterns, not logical trains of thought.