SINGAPORE -- A Chinese artificial-intelligence company has Silicon Valley marveling at how its programmers nearly matched American rivals despite using inferior chips.
AI models from DeepSeek, the Chinese company, have zoomed to the global top 10 in performance, according to a popular ranking, suggesting Washington's export curbs are struggling to block rapid AI advances in China.
On Jan. 20, DeepSeek introduced R1, a specialized model designed for complex problem-solving.
"Deepseek R1 is one of the most amazing and impressive breakthroughs I've ever seen," said Marc Andreessen, the Silicon Valley venture capitalist who has been advising President Trump, in an X post on Friday.
DeepSeek's development was led by a Chinese hedge-fund manager, Liang Wenfeng, who has become the face of the country's AI push. On Jan. 20, Liang met China's premier and discussed how homegrown companies could narrow the gap with the U.S.
Specialists said DeepSeek's technology still trails that of OpenAI and Google. But it is a close rival despite using fewer and less-advanced chips, and in some cases skipping steps that U.S. developers considered essential.
DeepSeek said training one of its latest models cost $5.6 million, compared with the $100 million to $1 billion range cited last year by Dario Amodei, chief executive of the AI developer Anthropic, as the cost of building a model.
Barrett Woodside, co-founder of the San Francisco AI hardware company Positron, said he and his colleagues have been abuzz about DeepSeek. "It's very cool," said Woodside, pointing to DeepSeek's open-source models, whose underlying software code is made freely available.
Users of DeepSeek's latest flagship model, called V3 and released in December, have noticed that it refuses to answer sensitive political questions about China and leader Xi Jinping. In some cases, the product gives responses in line with Beijing's official propaganda rather than including the perspective of government critics, as ChatGPT does.
"The only strike against it is some half-baked PRC censorship," said Woodside, referring to the People's Republic of China, but he said this could be removed because other developers can freely modify the code.
DeepSeek said R1 and V3 both performed better than or close to leading Western models. As of Saturday, the two models were ranked in the top 10 on Chatbot Arena, a platform hosted by University of California, Berkeley, researchers that rates chatbot performance. A Google Gemini model was in the top spot, while DeepSeek bested Anthropic's Claude and Grok from Elon Musk's xAI.
DeepSeek grew out of the AI research unit of High-Flyer, a hedge-fund manager with $8 billion in assets that is known for leveraging AI to trade.
"When humans make investment decisions, it's an art, and they just do it by the seat of their pants. When computer programs make such decisions, it's a science, and it has the optimal solution," said Liang in a 2019 speech.
Born in 1985, Liang grew up in China's southern province of Guangdong. He went to China's prestigious Zhejiang University and specialized in machine vision. In 2015, a few years after graduating, he founded High-Flyer with two college friends.
Liang prefers to be thought of as an engineer rather than a trader, according to people close to him. His High-Flyer was a pioneer in China in applying deep learning to computerized trading. The technique, modeled on the human brain, allows computers to analyze more diverse types of data.
While DeepSeek's flagship model is free, the company charges users who connect their own applications to DeepSeek's model and computing infrastructure -- say, a business that wants to tap the technology to give AI answers to customers' queries.
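The integration the article describes typically works through a programmatic interface. The sketch below is a minimal, hypothetical example of such a setup, written against an OpenAI-style chat-completions client; the endpoint, model name and key variable are assumptions for illustration, not details reported here.

```python
import os
from openai import OpenAI

# Hypothetical configuration: the endpoint URL and environment-variable name
# are assumptions for this sketch, not details from the article.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

def answer_customer_query(question: str) -> str:
    """Send one customer question to the hosted model and return its reply."""
    response = client.chat.completions.create(
        model="deepseek-chat",  # assumed model identifier
        messages=[
            {"role": "system", "content": "You are a helpful support agent."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(answer_customer_query("When will my order ship?"))
```

In a setup like this, the business is typically billed by the volume of text the hosted model processes, which is where the pricing described next comes in.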
Early last year, DeepSeek cut its prices for this service to a fraction of what other vendors charged, prompting the industry in China to start a price battle.
Anthony Poo, co-founder of a Silicon Valley-based startup using generative AI to predict financial returns, said his company moved to DeepSeek from Anthropic's Claude model in September. Tests showed DeepSeek performed similarly for around one-fourth of the cost.
"OpenAI's model is the best in performance, but we also don't want to pay for capacities we don't need," said Poo.
At their Jan. 20 meeting, DeepSeek's Liang told Chinese Premier Li Qiang that while Chinese companies were working to catch up, American restrictions on the export of advanced chips to China were still a bottleneck, according to people familiar with the meeting.
In 2019, High-Flyer began building a cluster of chips for AI research, in part with funds generated by its finance business. The company has said it later built a bigger cluster of around 10,000 Nvidia graphics-processing units that can be used to train large language models.
Only a handful of companies in China had computing infrastructure powerful enough to develop such models by late 2022, when OpenAI released ChatGPT.
DeepSeek said in a technical report that it used a cluster of more than 2,000 Nvidia chips to train its V3 model, compared with the tens of thousands of chips typically used to train models of similar size. A few U.S. AI specialists have recently questioned whether High-Flyer and DeepSeek have assembled more computing power than they have announced.
Some external researchers said the DeepSeek model lacks certain capabilities of its more expensively trained rivals, for example, in keeping track of the context of long conversations.
For its latest reasoning model released Jan. 20, DeepSeek skipped a process known as supervised fine-tuning, in which programmers feed in the knowledge of human experts to give the model a head start. DeepSeek said its model, designed to solve tricky math word problems and similar challenges, was comparable to OpenAI's reasoning model o1 even though it omitted supervised fine-tuning and focused on reinforcement learning -- essentially directed trial and error.
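The distinction can be sketched with a toy example: under reinforcement learning the model proposes answers on its own and an automatic check rewards the correct ones, whereas supervised fine-tuning would hand the model an expert-written answer up front. The snippet below is only an illustration of that idea, not DeepSeek's actual method; the question, candidate answers and update rule are invented for the example.

```python
import random

QUESTION = "7 * 8"
CANDIDATES = ["54", "56", "63"]          # possible answers the toy "model" can give
weights = {c: 1.0 for c in CANDIDATES}   # the model's (toy) policy

def sample_answer():
    """Pick an answer in proportion to its current weight (the 'trial')."""
    total = sum(weights.values())
    r = random.uniform(0, total)
    for ans, w in weights.items():
        r -= w
        if r <= 0:
            return ans
    return CANDIDATES[-1]

def reward(answer: str) -> float:
    """Automatic check, no human expert involved: 1.0 if correct, else 0.0."""
    return 1.0 if int(answer) == 7 * 8 else 0.0

# Reinforcement learning: directed trial and error, no expert labels fed in.
for _ in range(200):
    attempt = sample_answer()
    weights[attempt] += reward(attempt)  # reinforce answers that score well

print(QUESTION, "->", max(weights, key=weights.get))  # converges on "56"

# Supervised fine-tuning, by contrast, would start from an expert-provided
# pair such as ("7 * 8", "56") and push the model toward that label directly.
```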
Jim Fan, a senior research scientist at Nvidia, hailed as a breakthrough the DeepSeek paper reporting the results. He said on X it reminded him of earlier pioneering AI programs that mastered board games such as chess "from scratch, without imitating human grandmasters first."
Zack Kass, a former executive at OpenAI, said DeepSeek's advances despite American restrictions "underscore a broader lesson: Resource constraints often fuel creativity."