Model distillation might be the most important shift happening in AI right now, and it's reshaping the entire tech industry. It's becoming a MASSIVE topic, and DeepSeek's R1 model, released yesterday, only reinforced this.
Model distillation is a process where a smaller, simpler model (the "student") is trained to replicate the behavior and capabilities of a larger, more complex model (the "teacher"). This is achieved by using the teacher model's outputs (e.g., predictions or reasoning processes) as training data, allowing the student to inherit high performance with reduced size and computational demands.
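To make that concrete, here's a minimal sketch of one common way to score how closely a student matches its teacher: soften both models' raw scores with a temperature, then measure the KL divergence between the two distributions. This is an illustration of the general idea, not how any particular lab does it. The logits, the temperature value, and the function names are all made up for the example.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) computed on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits over three possible answers (illustrative values only)
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # prefers a different answer entirely

print("close student loss:", distillation_loss(teacher, close_student))
print("far student loss:  ", distillation_loss(teacher, far_student))
```

Training the student means nudging its weights to push this number down across lots of examples. The student that already agrees with the teacher gets a much smaller loss than the one that disagrees.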
So why is this important? For large AI labs, capital and scale were moats. It took literally billions of dollars of compute and data to pre-train a state-of-the-art model, let alone all the dollars it took to pay the researchers (if you could even hire them)! There were only a small handful of companies that had access to this kind of capital and research talent. And when it takes that much money to create something, you'll want to charge for it.
However, model distillation changes this equation. And reasoning models are just as easy (if not easier) to distill. So what does this mean? It means anyone can take that super complex, SOTA model (that someone else spent billions on), spend a FRACTION of the cost and time distilling it, and end up with their own model that's nearly as good.
Put differently, the large AI labs (in the most pessimistic view) are providing free outsourced R&D and CapEx to the rest of the world.
OpenAI o1 charges $15 per million input tokens and $60 per million output tokens. The same usage on the DeepSeek R1 model? $0.14 per million input tokens and $2.19 per million output tokens. That's roughly 107x cheaper on input and 27x cheaper on output.
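Here's a quick back-of-the-envelope sketch of what those prices mean for a bill. The per-token prices are the ones quoted above. The workload (10M input tokens, 2M output tokens) and the helper names are hypothetical, picked just to show the arithmetic.

```python
# Prices in USD per million tokens, as quoted in this post
PRICES = {
    "o1": {"input": 15.00, "output": 60.00},
    "r1": {"input": 0.14, "output": 2.19},
}

def cost_usd(model, input_tokens, output_tokens):
    """Total cost in USD for a given number of input and output tokens."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

def price_ratio(kind):
    """How many times more o1 charges than R1 for 'input' or 'output' tokens."""
    return PRICES["o1"][kind] / PRICES["r1"][kind]

# Hypothetical workload: 10M input tokens and 2M output tokens
o1_cost = cost_usd("o1", 10_000_000, 2_000_000)
r1_cost = cost_usd("r1", 10_000_000, 2_000_000)

print(f"input price ratio:  {price_ratio('input'):.0f}x")
print(f"output price ratio: {price_ratio('output'):.0f}x")
print(f"o1: ${o1_cost:.2f} vs R1: ${r1_cost:.2f}")
```

For this made-up workload the bill is about $270 on o1 versus under $6 on R1. The exact gap depends on your input/output mix, since the output ratio is smaller than the input ratio.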
So what could the implications be?
- First, let's talk about the general positives of distilled models. They provide a cheaper (see the pricing above) and more accessible alternative to the large models: distilled models are smaller (fewer parameters), making them easier to store, deploy, and run on less powerful hardware. These are all great for the ecosystem!
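To put rough numbers on the "easier to store and run" point, here's a small sketch of how parameter count translates into memory for the weights alone. The parameter counts below are hypothetical, and the 2 bytes per parameter assumes 16-bit weights. Real deployments also need memory for activations and caches, which this ignores.

```python
def weights_gb(num_params, bytes_per_param=2):
    """Approximate memory for model weights only (2 bytes per parameter ~ 16-bit precision)."""
    return num_params * bytes_per_param / 1e9

# Hypothetical model sizes, chosen only to illustrate the scale difference
models = [
    ("large teacher", 600e9),
    ("mid-size student", 30e9),
    ("small student", 7e9),
]

for name, params in models:
    print(f"{name}: ~{weights_gb(params):,.0f} GB of weights")
```

Under these assumptions the teacher needs a multi-GPU server just to hold its weights, while the small student fits on a single high-end consumer GPU.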
Other implications I'm thinking through:
- Rise of domain-specific models: Distilled models enable companies to focus on creating smaller, domain-specific models for specialized tasks (e.g., healthcare, finance, gaming). These could outperform general-purpose models in their niches.
- If anyone can distill a high-performing model, the differentiator shifts to proprietary data for fine-tuning (and the data platforms that store that data). Companies with unique, high-quality datasets will hold a significant competitive advantage. I could also see the data platforms offering model distillation services / products.
- Will the large providers stop spending tons of money to develop SOTA models? I doubt it... but it does call into question the ultimate business model of essentially selling API calls. I think the business models will have to slowly move toward packaged products around a model vs selling infra. Or just selling a lot more infra around the API call... The huge benefit the large labs have is that they are pushing the industry forward while others are fast following. Fast followers rarely win, but if there are enough of them they can dilute the leaders.
- I think we'll see a huge jump in large tech companies offering their own distilled models. Apple? Others? We could have 10+ others in the model race a year from now. What will this do to the prices providers charge?
- Do the labs all end up looking like open source infra, where the hyperscalers can just pick up their models and offer them on their own platform (like they would with a piece of open source software that they offer their own hosted version of)? But instead of hosting a piece of open source software (like Confluent, Elastic, Mongo, etc.), they distill a SOTA model from a large lab.
- There will certainly be a response to all of the distillation. The most natural to me is that the "product" starts evolving beyond an API call, toward more fully packaged products around an API call, like I mentioned above. But building products is a lot different than building infra.
- Collaboration: Open source and distillation could create ecosystems of collaboration rather than competition. Think of GitHub-like hubs for model distillation and fine-tuning.
Model distillation isn't just a tech trend; it's a shift in power. The future of AI is accessible, decentralized, and anyone's game. I still think the large AI labs are in a good spot: as I said earlier, fast followers rarely win out. It just puts more pressure on the ones pushing the models forward to build more around the model (and of anyone, they're in the best position to do this).