DeepSeek reveals how much it actually cost to train the Chinese chatbot that shook the artificial intelligence market

The Chinese developer DeepSeek said it spent $294,000 to train its R1 model, far less than the sums reported by American rivals, a disclosure likely to reignite the debate over Beijing's place in the race to develop artificial intelligence, Reuters reports.

The updated figure from the Hangzhou-based company, the first estimate it has published of R1's training costs, appeared in a peer-reviewed article published on Wednesday in the academic journal Nature.

DeepSeek's release in January of what it called lower-cost systems prompted global investors to sell technology stocks, fearing that the new models could threaten the dominance of AI leaders, including Nvidia.

Since then, the Chinese company and its founder, Liang Wenfeng, have largely disappeared from public view, apart from a few new product releases.

The Nature article, which lists Liang as one of the co-authors, said that DeepSeek's reasoning-focused R1 model cost $294,000 to train and used 512 Nvidia H800 chips. A previous version of the article, published in January, did not contain this information.

The training costs of the large language models that power AI chatbots refer to the expense of running a cluster of powerful chips for weeks or months to process huge amounts of text and code.

OpenAI: costs of over $100 million

Sam Altman, CEO of the American giant OpenAI, said in 2023 that training foundation models cost “much more” than $100 million, although his company has not provided detailed figures for any of its releases.

Some of DeepSeek's statements about its development costs and the technology it used have been questioned by American companies and officials.

The H800 chips in question were designed by Nvidia for the Chinese market after the US, in October 2022, barred the company from exporting its H100 and A100 chips to China.

US officials told Reuters in June that DeepSeek has access to “large volumes” of H100 chips that were purchased after the American export controls took effect. Nvidia told Reuters at the time that DeepSeek used legally purchased H800 chips, not H100 chips.

In a supplementary information document accompanying the Nature article, the company acknowledged for the first time that it owns A100 chips and said it had used them in the preparatory stages of development.

“As for our research on DeepSeek-R1, we used the A100 GPUs to prepare the experiments with a smaller model,” the researchers wrote. After this initial phase, R1 was trained for 80 hours on the 512 H800 chips, they added.
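
The figures reported above can be combined into a rough back-of-the-envelope calculation. The short Python sketch below computes the GPU-hours implied by 80 hours on 512 chips, then divides the reported $294,000 by that total. The function names are illustrative, and the per-hour figure holds only under the assumption that the entire reported sum paid for this single 80-hour run, which the article does not state.

```python
def gpu_hours(num_chips: int, hours: float) -> float:
    """Total GPU-hours for a run: number of chips multiplied by wall-clock hours."""
    return num_chips * hours


def implied_hourly_rate(total_cost_usd: float, total_gpu_hours: float) -> float:
    """Cost per GPU-hour, ASSUMING the whole cost was spent on these GPU-hours."""
    return total_cost_usd / total_gpu_hours


# Figures as reported in the article: 512 H800 chips, 80 hours, $294,000.
run_gpu_hours = gpu_hours(512, 80)
rate = implied_hourly_rate(294_000, run_gpu_hours)

print(f"{run_gpu_hours:,.0f} GPU-hours")
print(f"${rate:.2f} per GPU-hour (illustrative, see assumption above)")
```

If the reported cost also covers other training phases or a longer total run, the true per-hour rate would be lower than this figure.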

Reuters has previously reported that one reason DeepSeek managed to attract the brightest minds in China was that it was one of the few domestic companies operating a supercomputing cluster.

“Distillation” of OpenAI models

DeepSeek also responded for the first time, though not directly, to claims made in January by a top White House adviser and other US figures that it had deliberately “distilled” OpenAI's models into its own.

DeepSeek has consistently defended distillation, saying it yields better models that are also much cheaper to train and run, which allows broader access to AI-based technologies given how energy-intensive these models are.

The term refers to a technique in which one AI system learns from another, allowing the newer model to benefit from the time and computing power invested in building the earlier model, but without the associated costs.
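
To make the idea concrete, here is a minimal sketch of one common textbook form of distillation, in which a smaller "student" model is trained to match the softened output probabilities of a larger "teacher" model. The article does not say which form of distillation DeepSeek uses, so the function names, the temperature value, and the example scores below are all assumptions for illustration, not a description of the company's method.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    The loss is zero when the student reproduces the teacher exactly and
    grows as their predictions diverge.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical scores over three candidate next tokens.
teacher = [1.0, 2.0, 3.0]
student = [3.0, 2.0, 1.0]
print(distillation_loss(teacher, teacher))  # student identical to the teacher
print(distillation_loss(teacher, student))  # student disagrees with the teacher
```

In training, the student's parameters would be adjusted to reduce this loss, so the student inherits behavior the teacher acquired at far greater cost.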

DeepSeek said in January that it had used Meta's open-source Llama AI model for some “distilled” versions of its own models.

DeepSeek said in Nature that the training data for its V3 model relied on crawled web pages containing “a significant number of responses generated by OpenAI models, which can cause the base model to indirectly acquire knowledge from other powerful models.”

But it said this was not intentional, but rather incidental.

Ashley Davis
