AI companies are copying each other's homework to make cheap models


Emma Cosgrove and Hugh Langley
Mar 7, 2025, 3:00 PM GMT+5

- The price of building AI is falling to new lows.
- New, cheaper AI-development techniques have developers rejoicing, but it's not all upside.
- As costs hit rock bottom, Big Tech's foundation model builders must justify their expensive offerings.

How much does it cost to start an AI company? The answer shrinks by the day as large language models are trained for smaller and smaller sums.

The cost of AI computing is falling. On top of that, distillation, a technique for making decent LLMs at discount prices, is spreading. This has sent a spark through parts of the AI ecosystem and a chill through others.

Distillation is an old concept gaining new significance. For most, that's good news. For a select few, it's complicated. And for the future of AI, it's important.

Distillation defined

AI developers and experts say distillation is, at its core, using one model to improve another. A larger "teacher" model is prompted to generate responses and paths of reasoning, and a smaller "student" model is trained to mimic its behavior.

The Chinese firm DeepSeek caused a stir with OpenAI-competitive models it's reported to have trained for about $5 million. The news sent the stock market into a panic, punishing Nvidia with a loss of $600 billion in market capitalization over the prospect of a downshift in chip demand. (Such a decline has yet to materialize.)

Flying further under the radar, a team of researchers at the University of California, Berkeley, trained two new models for under $1,000 in computing costs, according to research released in January. In early February, researchers from Stanford University, the University of Washington, and the Allen Institute for AI trained a serviceable reasoning model for a fraction of that, a research paper said.

Distillation was an unlock for all of these developments. Alongside fine-tuning, it's a tool developers use to improve models in the training phase at a much lower cost than other methods. Both techniques give models specific expertise or skills. That could mean taking a generic foundation model like Meta's Llama and using another model to distill it into an expert on US tax law, for example. It could also mean using DeepSeek's R1 reasoning model to distill Llama so it gains reasoning capabilities, meaning the model spends more time generating an answer, questioning its own logic and laying out its path to an answer step by step.
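In code, the core teacher-student loop is short. Below is a minimal sketch of the sequence-level variant described above, written with Hugging Face's transformers library; the model names ("gpt2-large" standing in for the teacher, "gpt2" for the student), the prompt, and the hyperparameters are illustrative placeholders, not any lab's actual setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small open models as stand-ins for a large teacher and a small student.
teacher = AutoModelForCausalLM.from_pretrained("gpt2-large")
student = AutoModelForCausalLM.from_pretrained("gpt2")
tok = AutoTokenizer.from_pretrained("gpt2")  # both models share this vocabulary

prompt = "Q: How does distillation lower the cost of training a model? A:"

# Step 1: the frozen teacher generates a response. Real pipelines do this
# at scale, over thousands or millions of prompts.
teacher.eval()
with torch.no_grad():
    ids = tok(prompt, return_tensors="pt").input_ids
    generated = teacher.generate(ids, max_new_tokens=64, do_sample=False)
teacher_text = tok.decode(generated[0], skip_special_tokens=True)

# Step 2: fine-tune the student to reproduce the teacher's output with
# ordinary next-token prediction (the model shifts the labels internally).
optimizer = torch.optim.AdamW(student.parameters(), lr=5e-5)
student.train()
batch = tok(teacher_text, return_tensors="pt").input_ids
loss = student(batch, labels=batch).loss
loss.backward()
optimizer.step()
```

Production runs repeat step one across huge prompt sets and filter the teacher's answers for quality, and some setups train the student on the teacher's full token probabilities ("soft labels") rather than only its generated text. Either way, the expensive work happens once, in the teacher, and the student inherits it cheaply.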