Xiaomi Unveils New AI Model MiMo-V2-Flash with "Extreme Cost-Effectiveness," Luo Fuli: "This is Just Step Two in Our AGI Roadmap"

Deep News · 12-17 10:56

Eleven hours ago, Xiaomi made a surprise late-night announcement, releasing and open-sourcing its latest Mixture of Experts (MoE) large language model, MiMo-V2-Flash. The model boasts a total of 309 billion parameters, with 15 billion active parameters, and adopts the developer-friendly MIT open-source license. Its base model weights have already been published on Hugging Face.

Luo Fuli, head of Xiaomi's MiMo team, stated on social media: "MiMo-V2-Flash is now live. This is just step two in our AGI roadmap." The declaration underscores Xiaomi's long-term vision and technological ambitions in AI.

From a market perspective, MiMo-V2-Flash could disrupt the competitive landscape of open-source AI models. With an ultra-low cost of $0.1 per million input tokens and $0.3 per million output tokens, combined with an inference speed of up to 150 tokens per second, the model presents an attractive option for developers and enterprises. This could accelerate the adoption of high-performance AI across broader applications, particularly empowering Xiaomi’s expansive "Smartphone x AIoT" ecosystem.
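
For a sense of what that pricing implies in practice, here is a rough back-of-the-envelope sketch. Only the per-million-token rates come from the announcement; the workload figures below are hypothetical.

```python
# Rough monthly-cost estimate at the published MiMo-V2-Flash API rates.
# The request volume and per-request token counts are hypothetical
# assumptions for illustration, not figures from Xiaomi.

INPUT_PRICE_PER_M = 0.10   # USD per million input tokens (published rate)
OUTPUT_PRICE_PER_M = 0.30  # USD per million output tokens (published rate)

def monthly_cost(requests_per_day, input_tokens, output_tokens, days=30):
    """Estimate monthly API spend for a fixed per-request token budget."""
    total_in = requests_per_day * input_tokens * days / 1e6    # million tokens
    total_out = requests_per_day * output_tokens * days / 1e6
    return total_in * INPUT_PRICE_PER_M + total_out * OUTPUT_PRICE_PER_M

# Example: 100,000 requests/day, 2,000-token prompts, 500-token completions.
print(f"${monthly_cost(100_000, 2_000, 500):,.2f} per month")  # ~ $1,050
```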

**Performance Rivals DeepSeek-V3.2 with “Extreme Cost-Effectiveness”**

MiMo-V2-Flash has demonstrated robust capabilities across multiple benchmarks, competing with top-tier open-source and proprietary models. According to Xiaomi's official data, the model scored 73.4% on SWE-bench Verified, a programming benchmark, surpassing all known open-source models and approaching the level of leading proprietary models.

In reasoning-intensive evaluations such as the AIME 2025 math competition and GPQA-Diamond science knowledge test, the model ranked among the top two open-source performers. A Morgan Stanley research chart also indicates that MiMo-V2-Flash is competitive in overall performance against mainstream models like DeepSeek-V3.2.

MiMo-V2-Flash also excels at increasingly important agent-based tasks. It posts high scores on τ²-Bench across domains such as telecommunications, retail, and aviation, demonstrating its ability to follow complex task logic and sustain multi-turn interactions. Xiaomi highlights that the model's strong performance, 150-token-per-second inference speed, and ultra-low operating costs make it one of the most cost-effective high-performance models available. The model is temporarily free to use via API platforms, with base weights open-sourced on Hugging Face under the MIT license.

**Innovations Behind “Extreme Cost-Effectiveness”: Efficiency and Long-Context Capabilities**

MiMo-V2-Flash combines high performance, low cost, and high efficiency through several key architectural and training innovations.

First is the “Hybrid Sliding Window Attention” mechanism. Xiaomi employs a 5:1 hybrid ratio, pairing five layers of sliding window attention (SWA) with each layer of global attention, which cuts KV cache storage by a factor of nearly six while supporting a context window of up to 256K tokens.
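
To see where the roughly sixfold saving comes from, here is a minimal sketch of the KV-cache arithmetic under a 5:1 layout. Only the 5:1 ratio and the 256K context length come from the announcement; the layer count and window size are placeholder assumptions.

```python
# KV-cache footprint under a 5:1 hybrid attention layout: global layers cache
# keys/values for every token in context, while sliding-window attention (SWA)
# layers cache only the most recent `window` tokens.
# n_layers=48 and window=128 are illustrative assumptions.

def kv_cache_entries(context_len, n_layers, window, swa_per_global=5):
    group = swa_per_global + 1                       # 5 SWA layers + 1 global layer
    n_global = n_layers // group
    n_swa = n_layers - n_global
    all_global = n_layers * context_len              # baseline: every layer global
    hybrid = n_global * context_len + n_swa * min(window, context_len)
    return all_global, hybrid

baseline, hybrid = kv_cache_entries(context_len=256_000, n_layers=48, window=128)
print(f"KV cache reduction: {baseline / hybrid:.1f}x")   # ~6.0x when window << context
```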

Luo shared engineering details in an X post: "We ultimately chose hybrid SWA. It’s simple, elegant, and outperforms other linear attention variants in our internal benchmarks for long-context reasoning." She noted a counterintuitive finding: a 128-token window size is optimal, while expanding to 512 tokens degrades performance. She also emphasized that "sink values are indispensable."
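
As an illustration of what sliding-window attention with sink tokens looks like, here is a toy mask construction. The 128-token window is the figure Luo cited; the number of sink tokens is a hypothetical choice, not a disclosed hyperparameter.

```python
import torch

def swa_sink_mask(seq_len, window=128, n_sink=4):
    """Boolean attention mask: each query attends causally to the last `window`
    keys plus a handful of initial "sink" positions that stay visible forever.
    window=128 follows Luo's comment; n_sink=4 is an illustrative guess."""
    q = torch.arange(seq_len).unsqueeze(1)   # query positions (column)
    k = torch.arange(seq_len).unsqueeze(0)   # key positions (row)
    causal = k <= q
    in_window = (q - k) < window
    is_sink = k < n_sink
    return causal & (in_window | is_sink)

mask = swa_sink_mask(seq_len=1024)
# The last query position sees its 128 most recent keys plus the 4 sink keys.
print(mask.shape, int(mask[-1].sum()))       # torch.Size([1024, 1024]) 132
```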

Second is Lightweight Multi-Token Prediction (MTP), which predicts several tokens in parallel rather than generating them one at a time, boosting inference speed by 2x to 2.6x.

Luo revealed: "With three-layer MTP, we observed an average acceptance of over three tokens, accelerating coding tasks by ~2.5x." She added that while GPU idle time was mitigated, full integration into reinforcement learning (RL) loops was deferred due to tight timelines. Xiaomi has open-sourced the three-layer MTP for developers.
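
Conceptually, the speedup comes from drafting several tokens per step with the MTP heads and keeping only the prefix the main model agrees with, much like speculative decoding. The toy loop below sketches that accept-and-correct cycle; the `draft` and `verify` callables stand in for the MTP heads and the main forward pass, and none of this is Xiaomi's actual implementation.

```python
def generate_with_mtp(prompt, max_new, draft, verify, n_draft=3):
    """Toy multi-token-prediction decoding loop (speculative-decoding style).
    draft(seq, n)      -> n candidate next tokens from lightweight MTP heads
    verify(seq, cands) -> (number of candidates accepted, main model's own next token)
    Both callables are placeholders for real models."""
    seq, steps = list(prompt), 0
    while len(seq) - len(prompt) < max_new:
        candidates = draft(seq, n_draft)
        accepted, correction = verify(seq, candidates)
        seq += candidates[:accepted] + [correction]   # at least one token per step
        steps += 1
    return seq, (len(seq) - len(prompt)) / steps      # tokens produced per step

# Idealized toy "models" that always agree, just to exercise the loop.
toy_draft = lambda seq, n: [seq[-1] + i + 1 for i in range(n)]
toy_verify = lambda seq, cands: (len(cands), seq[-1] + len(cands) + 1)

_, tokens_per_step = generate_with_mtp([0], 20, toy_draft, toy_verify)
print(tokens_per_step)   # 4.0 in this toy; ~3+ accepted tokens per step is what maps to ~2.5x
```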

In November, Luo, formerly of DeepSeek, announced her high-profile move to Xiaomi as head of the MiMo team. MiMo is Xiaomi’s flagship initiative for advancing large-model R&D, with Luo’s appointment signaling ambitions in cutting-edge spatial intelligence.

**Training Breakthroughs: 1/50 Compute for Performance Parity**

During training, Xiaomi leveraged state-of-the-art techniques to maximize efficiency. The model was pre-trained on 27 trillion tokens using FP8 mixed precision.
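
The basic building block of FP8 mixed precision is quantizing tensors into the 8-bit e4m3 format with a per-tensor scale while accumulating in higher precision. The sketch below shows only that round trip; details of Xiaomi's actual recipe (scaling granularity, which operations stay in BF16, and so on) are not disclosed.

```python
import torch

E4M3_MAX = 448.0   # largest finite value representable in float8_e4m3fn

def to_fp8(t: torch.Tensor):
    """Per-tensor dynamic scaling into the e4m3 range, then an 8-bit cast."""
    scale = t.abs().max().clamp(min=1e-12) / E4M3_MAX
    return (t / scale).to(torch.float8_e4m3fn), scale

def from_fp8(t_fp8: torch.Tensor, scale: torch.Tensor):
    """Dequantize back to BF16 for higher-precision accumulation."""
    return t_fp8.to(torch.bfloat16) * scale

w = torch.randn(4096, 4096, dtype=torch.bfloat16)
w_fp8, s = to_fp8(w)
err = (from_fp8(w_fp8, s) - w).abs().mean()
print(w_fp8.dtype, f"mean abs round-trip error: {err.item():.4f}")
```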

A groundbreaking addition was the Multi-teacher Online Policy Distillation (MOPD) framework used in post-training. Inspired by Thinking Machines' On-Policy Distillation, MOPD lets the student model receive dense, per-token reward signals from multiple expert teacher models. This approach requires just 1/50 of the compute used in traditional supervised fine-tuning (SFT) combined with RL, yet matches the peak performance of its teacher models.
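
As described in public write-ups of on-policy distillation, the core loop has the student sample its own rollouts while a frozen teacher scores every generated token, and the student minimizes a per-token reverse KL to the teacher. The sketch below extends that idea to multiple teachers in the most naive way; the model interfaces (HF-style `.generate()` and `.logits`), the per-token lowest-KL teacher selection, and all hyperparameters are assumptions for illustration, not Xiaomi's published MOPD recipe.

```python
import torch
import torch.nn.functional as F

def mopd_step(student, teachers, prompt_ids, optimizer, max_new_tokens=256):
    """One illustrative multi-teacher on-policy distillation step.
    Assumes HF-style causal LMs (.generate(), forward(...).logits).
    A real recipe would also mask prompt and padding positions; this sketch
    back-propagates only through the student's token distributions."""
    # 1) On-policy rollouts: the student generates its own continuations.
    with torch.no_grad():
        rollouts = student.generate(prompt_ids, max_new_tokens=max_new_tokens,
                                    do_sample=True)

    # 2) Student distributions over its own rollout (gradients flow here).
    s_logits = student(rollouts).logits[:, :-1]
    p_s = F.softmax(s_logits, dim=-1)
    logp_s = F.log_softmax(s_logits, dim=-1)

    # 3) Dense per-token signal: reverse KL(student || teacher) for each teacher.
    kls = []
    for teacher in teachers:
        with torch.no_grad():
            logp_t = F.log_softmax(teacher(rollouts).logits[:, :-1], dim=-1)
        kls.append((p_s * (logp_s - logp_t)).sum(dim=-1))   # [batch, seq-1]

    # 4) Every position learns from whichever teacher it is currently closest to
    #    (an illustrative multi-teacher rule), averaged into a scalar loss.
    loss = torch.stack(kls).min(dim=0).values.mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```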

Luo noted that this framework lays the groundwork for a "self-reinforcing loop," where today’s student model evolves into tomorrow’s stronger teacher, enabling continuous and efficient iteration.

**Xiaomi’s AI Blueprint: From Smartphones to AGI**

MiMo-V2-Flash is not an isolated showcase but a strategic component of Xiaomi’s AI ambitions. As Luo stated, this is merely "step two" in its AGI roadmap, hinting at deeper developments ahead.

Morgan Stanley’s research views this as a demonstration of Xiaomi’s "commitment to AI R&D," anticipating further progress in cloud and edge AI. Strong in-house AI capabilities could enhance intelligent experiences across Xiaomi’s smartphones, IoT devices, and even EVs, reinforcing its ecosystem moat.

The launch of MiMo-V2-Flash may reshape the open-source AI market while revealing Xiaomi’s strategy to deeply integrate self-developed AI across its "human-vehicle-home" ecosystem. Fourteen years ago, Xiaomi redefined flagship smartphones with its ¥1,999 pricing. Today, with MiMo-V2-Flash’s exceptional performance and disruptive cost, Xiaomi may be poised to create another "Xiaomi moment" in open-source AI.

Try the model here: https://aistudio.xiaomimimo.com/#/
