Google researchers introduced the transformer in 2017. In 2018, a then-tiny startup called OpenAI took the idea of a transformer and created a large language model (LLM) called GPT.
A large language model = a computer algorithm that a) understands and b) generates A LOT of human language by predicting the next word in a sentence.
For example, in “The quick brown fox jumps over the lazy __”, the LLM can predict that the next word might be “dog.”
This is how ChatGPT can finish a sentence or write a completely new one.
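For the curious, here's a tiny Python sketch of that exact trick. It uses the free, open-source GPT-2 model via Hugging Face's transformers library (a much smaller, older cousin of the models behind ChatGPT, so don't expect magic), but the "predict the next word" idea is the same.

```python
# A minimal sketch of "predict the next word," using the open-source
# GPT-2 model through Hugging Face's transformers library.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "The quick brown fox jumps over the lazy"

# Ask for just one more token (roughly, the next word), picked greedily.
result = generator(prompt, max_new_tokens=1, do_sample=False)

# Prints the prompt plus the model's single most likely continuation.
print(result[0]["generated_text"])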
But you don’t just go from a transformer to GPT overnight.
What OpenAI did was:
- They fed the transformer A LOT of human sentences. GPT-1 was trained on 7,000+ books, and GPT-3 was trained on the equivalent of 164,000 copies of Lord of the Rings!
- The transformer read all of this and used its “attention mechanism” (read: contextual understanding) to figure out which words in a sentence matter most to which other words (there’s a toy sketch of this just below).
- The transformer got really good at predicting the next word, so it could start writing new sentences that also make sense.
That’s how you get an LLM, and that’s how you get GPT.
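If you want to peek under the hood of that “attention mechanism” bullet, here's a toy NumPy sketch of the core math (scaled dot-product attention). This is a simplification, not OpenAI's actual code, but it's the same basic idea: every word scores every other word, and those scores decide whose information gets blended in.

```python
import numpy as np

def attention(Q, K, V):
    """Toy scaled dot-product attention: each word scores every other word,
    the scores get softmaxed, and the result is a weighted mix of the values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # how relevant is each word to each other word?
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax -> attention weights
    return weights @ V  # each word gets a blend of the other words' information

# Pretend we have 4 words, each represented by a vector of 8 numbers.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(attention(x, x, x).shape)  # (4, 8): one context-aware vector per word
```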
Of course, it’s technically way more complicated than that. You have parameters, context windows, fine-tuning, and much more.
Those sorts of things turn a very basic LLM (GPT-1) into a very advanced LLM (GPT-4o).
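Quick illustration of one of those terms: a “context window” is just a hard cap on how much text the model can look at in one go. Here's a toy sketch with a made-up limit of 8 tokens (real models handle anywhere from thousands to over a million).

```python
# Toy illustration of a context window: the model can only "see"
# the most recent MAX_CONTEXT tokens of the text you give it.
MAX_CONTEXT = 8  # made-up number for illustration

tokens = "the quick brown fox jumps over the lazy dog and runs away".split()
visible = tokens[-MAX_CONTEXT:]  # everything earlier is effectively forgotten

print(visible)  # ['jumps', 'over', 'the', 'lazy', 'dog', 'and', 'runs', 'away']
```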
Next, OpenAI took this GPT that could predict words and made it into something you can talk to: ChatGPT.
They productized a technology. It’s like taking potatoes, adding spices, and turning them into french fries. GPT = potatoes. ChatGPT = french fries.
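And if you're wondering what the potatoes look like before the spices: here's roughly what “talking to GPT” looks like in code, using OpenAI's official Python SDK. (You'd need your own API key, and gpt-4o-mini is just one of several model options.)

```python
# Rough sketch of talking to a GPT model through OpenAI's Python SDK.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # one of several available models
    messages=[
        {"role": "user", "content": "Finish this: The quick brown fox jumps over the lazy"},
    ],
)

print(response.choices[0].message.content)
```

ChatGPT is basically that, wrapped in a friendly chat interface with a lot of extra polish on top.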
But GPT ain’t the only french fries on the block, folks. These days, there are loads of AI models (LLMs and their cousins) that are good at many different things: art, music, video, and much more.
Sora is a great video producer. ElevenLabs is a great speaker. Midjourney is a great artist.
These models are trained on different data to be good at different things.
As for the “things” they’re actually good at? Well, just wait until tomorrow—can’t wait to blow your minds :)
See you cool cats soon,
Noah