Distillation Can Make AI Models Smaller and Cheaper
Briefly

"The Chinese AI company DeepSeek released a chatbot earlier this year called R1, which drew a huge amount of attention. Most of it focused on the fact that a relatively small and unknown company said it had built a chatbot that rivaled the performance of those from the world's most famous AI companies, but using a fraction of the computer power and cost."
"Some of that attention involved an element of accusation. Sources alleged that DeepSeek had obtained, without permission, knowledge from OpenAI's proprietary o1 model by using a technique known as distillation. Much of the news coverage framed this possibility as a shock to the AI industry, implying that DeepSeek had discovered a new, more efficient way to build AI. But distillation,"
DeepSeek released a chatbot called R1 that claimed performance comparable to leading AI systems at a fraction of the compute and cost. The release triggered sharp market reactions, including a steep one-day drop in major tech stocks and a record single-day loss in market value for Nvidia. Sources alleged that DeepSeek had used distillation to extract knowledge from OpenAI's proprietary o1 model without permission. Knowledge distillation, however, is a commonly used technique in AI that compresses or transfers the performance of a larger model or an ensemble into a smaller, more efficient model. The approach originated in a 2015 paper by Google researchers, including Geoffrey Hinton, as a way to replace cumbersome ensembles with single models.
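The 2015 formulation trains a small "student" model to match the softened output probabilities of a large "teacher" model in addition to the ground-truth labels. The sketch below, written in PyTorch purely for illustration, shows that loss; the toy teacher and student networks, the temperature, and the weighting factor are illustrative assumptions, not details of DeepSeek's or OpenAI's systems.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend of a soft-target term (match the teacher) and a hard-label term.

    The temperature (> 1) softens both distributions so the student can learn
    from the relative probabilities the teacher assigns to incorrect classes;
    alpha weights the soft-target term against ordinary cross-entropy.
    Both values here are illustrative defaults, not figures from any paper.
    """
    # Soften teacher and student outputs with the same temperature.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)

    # KL divergence between the softened distributions; the T^2 factor keeps
    # gradient magnitudes comparable as the temperature changes.
    soft_loss = F.kl_div(log_student, soft_targets,
                         reduction="batchmean") * temperature ** 2

    # Standard cross-entropy against the true labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    return alpha * soft_loss + (1.0 - alpha) * hard_loss


if __name__ == "__main__":
    torch.manual_seed(0)
    # A larger frozen "teacher" and a smaller trainable "student" (toy sizes).
    teacher = torch.nn.Sequential(torch.nn.Linear(32, 256), torch.nn.ReLU(),
                                  torch.nn.Linear(256, 10))
    student = torch.nn.Sequential(torch.nn.Linear(32, 16), torch.nn.ReLU(),
                                  torch.nn.Linear(16, 10))
    optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

    x = torch.randn(64, 32)            # a batch of inputs
    y = torch.randint(0, 10, (64,))    # ground-truth labels
    with torch.no_grad():
        t_logits = teacher(x)          # teacher predictions; teacher is not trained

    loss = distillation_loss(student(x), t_logits, y)
    loss.backward()
    optimizer.step()
    print(f"distillation loss: {loss.item():.4f}")
```

Raising the temperature exposes the teacher's relative confidence across the wrong answers, which is where much of the transferred knowledge lives; at deployment the student runs on its own, at its smaller size and cost.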
Read at WIRED