DeepSeek: In recent days, R1, the powerful new open-source artificial intelligence model created by the Chinese startup DeepSeek, has shaken Silicon Valley and the rest of the world. Equipped with cutting-edge capabilities and developed on a seemingly shoestring budget, the system powers an AI chatbot that has jumped to the top of the app download charts (even though it has just disappeared from Italian app stores), opening a debate about an imminent upheaval in the technology industry.
While some argue that DeepSeek's rise indicates that the United States has lost its edge in the sector, other experts - including executives of companies that build and customize some of the most powerful frontier models in the world - see the recent boom instead as a sign that another type of technological transition is underway.
DeepSeek Revolution:
Instead of trying to build ever-larger models that require staggering amounts of computational resources, AI companies are focusing more on developing advanced capabilities, like reasoning. This approach has created an opportunity for smaller, more innovative startups like DeepSeek that haven't had billions of dollars in outside investment. "It's a paradigm shift toward reasoning, which is going to be much more democratized," says Ali Ghodsi, CEO of Databricks, a company that specializes in building and hosting custom AI models.
“It’s been clear for a while that innovation and creating greater efficiencies […] are going to be the next round of technological breakthroughs,” says Nick Frosst, cofounder of Cohere, a startup that builds cutting-edge AI models.
In recent days, thousands of developers and AI enthusiasts have flocked to DeepSeek’s website and official app to try out the company’s latest model and then shared examples of its sophisticated capabilities on social media. Meanwhile, shares of US tech companies, including chipmaker Nvidia, tumbled on Monday, as investors began to question the need for massive investments in AI development.
The DeepSeek Cost Question:
The technology behind DeepSeek's AI was developed by a relatively small Chinese research lab spun out of a major hedge fund. A research paper published online in December reported that the large DeepSeek-V3 language model cost just $5.6 million to build, a fraction of what competitors have spent on similar projects. OpenAI has previously said that some of its models have cost as much as $100 million each, a figure still far below the cost of the company's most recent systems (and those of Google, Anthropic, and Meta).
The performance and efficiency of DeepSeek’s models have already prompted some big tech companies to talk about cutting costs. A Meta engineer, who asked not to be identified because he wasn’t authorized to speak publicly, believes the giant will likely look into DeepSeek’s techniques to find a way to reduce its AI outlay. “We believe open-source models are driving significant change in the industry and that this will bring the benefits of AI to everyone more quickly,” a Meta spokesperson said in a statement. “We want the US, not China, to continue to lead the way in open-source AI, which is why Meta is building it with Llama models, which have been downloaded more than 800 million times.”
However, the true cost of developing DeepSeek's new models remains unknown, as a figure cited in a single research paper may not tell the full story. "I don't think it's $6 million, but even if it's $60 million, that would be a game-changer," said Umesh Padval, managing director of Thomvest Ventures, a firm that has invested in several AI companies. "On the profitability front, [DeepSeek] will put pressure on companies that are focused on consumer AI."
Databricks’ Ghodsi says his clients have started asking if they can use DeepSeek’s latest model and underlying techniques to reduce costs in their organizations, adding that so-called distillation — an approach used by the Chinese startup’s engineers, which involves using the results of one large language model to train another model — is relatively cheap and easy.
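The distillation technique Ghodsi describes can be sketched in a few lines. The following is a toy illustration of the general pattern only, not DeepSeek's actual pipeline: a stand-in "teacher" labels prompts, and a "student" is trained on those outputs. All function and class names here are invented for the example.

```python
# Toy sketch of knowledge distillation: a "teacher" model answers
# prompts, and a "student" is trained on those teacher outputs.
# Illustrative only; real distillation trains a smaller neural
# network on a large model's generations or output distributions.

def teacher(prompt: str) -> str:
    # Stand-in for a large model: maps prompts to answers.
    answers = {"2+2": "4", "capital of France": "Paris"}
    return answers.get(prompt, "unknown")

def build_distillation_dataset(prompts):
    # The teacher's outputs become the student's training targets.
    return [(p, teacher(p)) for p in prompts]

class Student:
    # Stand-in for a smaller model: here it simply memorizes pairs.
    def __init__(self):
        self.memory = {}

    def train(self, dataset):
        for prompt, target in dataset:
            self.memory[prompt] = target

    def predict(self, prompt: str) -> str:
        return self.memory.get(prompt, "unknown")

dataset = build_distillation_dataset(["2+2", "capital of France"])
student = Student()
student.train(dataset)
print(student.predict("2+2"))  # → 4
```

In practice the teacher is a large language model and the student is a smaller network trained by gradient descent, but the pattern is the same: the big model's answers become the small model's training data, which is why the approach is, as Ghodsi notes, relatively cheap and easy.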
Padval argues that the existence of models like DeepSeek’s will ultimately benefit companies that want to spend less on AI, but many of them may have reservations about relying on a Chinese model for sensitive tasks. So far, only Perplexity, among the big players in the field, has publicly announced that it uses R1, but it stressed that it hosts the model “completely independently of China.”
DeepSeek Reasoning Capabilities:
Amjad Masad, CEO of Replit, a startup that offers AI tools for programming, told WIRED US that DeepSeek's latest models are impressive. While he considers Anthropic's Sonnet superior at many software engineering tasks, Masad found R1 particularly adept at turning text commands into code that a computer can execute. "We're looking at using it primarily for agent reasoning," he added.
DeepSeek's two latest offerings, R1 and R1-Zero, can simulate reasoning at the level of OpenAI's and Google's most advanced systems, breaking problems down into smaller building blocks to tackle them more effectively. This process requires a significant amount of additional training to ensure that the AI reliably arrives at the correct answer.
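The idea of breaking a problem into smaller building blocks can be illustrated with a hypothetical toy example, not tied to any model's internals: instead of producing an answer in one shot, the solver emits intermediate steps whose results feed the final answer.

```python
# Toy illustration of reasoning by decomposition: the problem is
# broken into intermediate steps rather than answered in one shot.
# Real models learn to produce such step chains during training;
# this hand-written version only mirrors the shape of the idea.

def solve_step_by_step(apples: int, eaten: int, bought: int):
    steps = []
    # Step 1: account for the apples that were eaten.
    remaining = apples - eaten
    steps.append(f"Start with {apples}, eat {eaten}: {remaining} left")
    # Step 2: add the newly bought apples to get the total.
    total = remaining + bought
    steps.append(f"Buy {bought} more: {total} total")
    return steps, total

steps, answer = solve_step_by_step(5, 2, 3)
for step in steps:
    print(step)
print("Answer:", answer)  # → Answer: 6
```

Each intermediate result is made explicit before the final answer, which is what makes errors easier to catch and correct, both for the model during training and for anyone reading its output.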
A paper published last week by DeepSeek researchers outlines the approach they used to create R1, which the company claims outperforms OpenAI’s most advanced model, o1, on certain benchmarks. DeepSeek’s tactics include a more automated method for learning to solve problems correctly and a strategy for transferring skills from larger to smaller models.
The DeepSeek Chip Issue:
Another hot topic surrounding DeepSeek is the hardware the company may have used. The issue is particularly relevant because the US government has in recent years implemented a series of export controls and other trade restrictions to limit China's ability to acquire and produce cutting-edge chips, which are needed to build advanced AI.
In an August 2024 research paper, DeepSeek said it had a cluster of 10,000 Nvidia A100 chips, which fall under the restrictions announced by the United States in October 2022. In another paper, published in June of the same year, the Chinese startup said its previous model, DeepSeek-V2, was developed using Nvidia H800 chips, a less capable chip the manufacturer developed to comply with American export restrictions.
A source at an AI training company, who asked not to be identified to protect their professional relationships, estimated that DeepSeek used about 50,000 Nvidia chips to build its technology.
Nvidia declined to comment on the matter in detail, calling DeepSeek “an excellent advancement in AI” through a spokesperson, adding that the startup’s reasoning-based approach “requires a significant number of Nvidia GPUs and high-performance networks.”
Regardless of how they were built, DeepSeek's models appear to demonstrate a more open approach to AI development. In December, Clem Delangue, CEO of Hugging Face, a platform that hosts AI models, predicted that a Chinese company would come to dominate the AI field, thanks to the speed of innovation around open-source models, which China has largely embraced. "It was faster than I thought it would be," Delangue later said.