According to six sources with direct knowledge of the matter, Meta's in-house AI chip development project has encountered severe difficulties, even as the company signed new chip supply agreements with AMD and NVIDIA.
Sources indicate that due to significant design challenges, Meta canceled its most advanced in-development AI training chip last week, shifting focus to a structurally simpler version. Employees in the company's AI infrastructure division were informed of the adjustment at the time.
This decision underscores the immense difficulty technology giants face in designing AI chips capable of competing with market leader NVIDIA.
Key points:
- Meta has scrapped its high-end, self-developed AI training chip, codenamed Olympus, because of design obstacles.
- The cancellation highlights how difficult it is to challenge NVIDIA's dominance in AI chips.
- Meta has already established data center chip supply agreements with both AMD and NVIDIA.
The revision to Meta's chip roadmap follows partnerships formed with Advanced Micro Devices (AMD) and NVIDIA in recent weeks:
- On Tuesday, Meta and AMD announced that Meta would procure AMD chips with a capacity of up to 6 gigawatts, roughly enough to power several large data centers.
- Earlier this month, Meta entered into a multi-generational, long-term agreement with NVIDIA, committing to deploy millions of NVIDIA's current and next-generation chips in its data centers.
Meta's in-house AI chip effort is part of the MTIA (Meta Training and Inference Accelerator) project. This initiative is a key part of the company's strategy to develop its own AI hardware, reduce reliance on external chip suppliers like NVIDIA, lower costs, and gain greater control over its data center infrastructure. For instance, Meta anticipates its capital expenditures will reach $115 billion to $135 billion by 2026, with a significant portion allocated to chips and servers.
A Meta spokesperson said in a statement: "We remain committed to investing in a diverse portfolio of chips to meet our needs, including advancing the MTIA family of products. We will share more details this year."
Other technology companies, including Microsoft, have faced similar challenges when developing their own AI chips. Last year, NVIDIA CEO Jensen Huang publicly suggested that most major tech firms would ultimately abandon plans for competing in-house chips, predicting their performance would continue to lag behind NVIDIA's offerings.
Meta has experienced setbacks with multiple custom chip projects:
- The company canceled one version of its second-generation training chip, internally codenamed Iris.
- It subsequently initiated a more advanced training chip project, codenamed Olympus, which has now also been canceled.
An individual involved in Meta's chip programs said there is internal skepticism about whether self-developed chips can match NVIDIA's capabilities, citing risks of delays and redesigns. The person noted that such work requires a large team of engineers for design, debugging, and keeping power consumption in check; excessive power draw would negate any cost advantage over NVIDIA's chips.
The Iris chip utilized a SIMD (Single Instruction, Multiple Data) architecture, which is relatively simpler in hardware design but poses greater challenges for software programming when training AI models. In contrast, the Olympus chip was planned to use a SIMT (Single Instruction, Multiple Threads) architecture, similar to NVIDIA's AI chips, which is more software-friendly but presents extreme hardware design difficulties. Many tech companies favor the SIMT architecture popularized by NVIDIA due to its flexibility and suitability for training modern AI models.
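The programming-model difference described above can be illustrated with a small sketch. This is not Meta's or NVIDIA's code; it is a simplified Python analogy in which whole-array operations stand in for SIMD (one instruction applied to an entire vector in lockstep) and a per-element "kernel" run by many threads stands in for SIMT (each thread executes the same scalar program on its own index). The function and variable names are illustrative only.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def scale_add_simd(a, b, alpha):
    # SIMD analogy: a single vectorized expression operates on every
    # element at once. Simple in hardware, but the programmer must
    # recast the algorithm into lockstep whole-array operations.
    return alpha * a + b

def scale_add_kernel(thread_id, a, b, out, alpha):
    # SIMT analogy: each logical thread runs this same scalar program
    # on its own element index, the style CUDA popularized. Easier to
    # program against, but far harder to implement in hardware.
    out[thread_id] = alpha * a[thread_id] + b[thread_id]

def scale_add_simt(a, b, alpha, num_threads=4):
    out = np.empty_like(a)
    with ThreadPoolExecutor(num_threads) as pool:
        # Launch one logical "thread" per element, all running the
        # same kernel; consume the iterator to force completion.
        list(pool.map(lambda i: scale_add_kernel(i, a, b, out, alpha),
                      range(len(a))))
    return out
```

Both paths compute the same result; the contrast is in who absorbs the complexity. In the SIMD style the software must be reshaped around the hardware, while in the SIMT style ordinary per-element code is mapped onto parallel hardware, which is part of why training frameworks favor it.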
According to four sources, Meta had initially aimed to finalize the Olympus design as early as the fourth quarter of 2026. The transition from initial design to mass production for a new chip typically takes an additional nine months or longer. The core GPU component of Olympus, responsible for AI computation, was planned to incorporate technology from chip startup Rivos, which Meta acquired last year. Rivos had claimed its GPU could efficiently run NVIDIA's proprietary CUDA software code, which is the dominant software ecosystem for training and running machine learning models.
One source mentioned that Meta initially planned to use Olympus to build large server clusters. However, executives ultimately determined that, amid intense competition with established rivals like OpenAI and Google, this approach would introduce significant risks for training new models. Multiple individuals indicated that the training software for Olympus was less stable than NVIDIA's, and its complex design could pose challenges for large-scale manufacturing.
Consequently, Meta is now opting to continue relying on training chips from third-party suppliers, whose software ecosystems are already mature and proven.