Deepline | 'Hunter Alpha' knows tricks: Xiaomi unveils AI model MiMo-V2-Pro, redefining 'execution' over conversation

Deepline

2026.03.19 16:00

Last week, two anonymous, unattributed models quietly launched on the well-known API aggregation platform OpenRouter under the codenames "Hunter Alpha" and "Healer Alpha." With no promotion at all, their usage climbed at an unusually fast pace.

Of the two, Hunter Alpha topped the daily rankings for several days, with cumulative usage exceeding 1 trillion tokens. Speculation spread through the community, and the most common guess was that these were internal beta versions of DeepSeek V4.

Peter Steinberger, founder of OpenClaw, also posted inquiries on X, further fueling the community's speculative fervor.

Just hours ago, however, Chinese technology company Xiaomi officially announced that both "the Hunter" and "the Healer" are early internal beta versions of Xiaomi's MiMo-V2 series large models.

As the mystery unraveled, Luo Fuli, head of Xiaomi's MiMo large-model team, publicly acknowledged the models on X. Coincidentally, Luo, who came from DeepSeek, created a model at Xiaomi that the entire internet mistook for a DeepSeek product.

MiMo-V2-Pro, with its internal beta version codenamed Hunter Alpha, is Xiaomi's flagship foundational model designed for complex real-world tasks. Its core positioning is no longer that of a "conversational tool" but rather the brain of an agent system—capable of understanding tasks, invoking tools, executing multi-step workflows, and ultimately delivering results.
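To make the "brain of an agent system" idea concrete, the following is a minimal sketch of a generic agent loop: a planner decides the next tool call, the loop executes it, and the result feeds the next decision until a final answer is returned. It is not Xiaomi's API or OpenClaw's implementation. The names `TOOLS`, `scripted_planner`, and `run_agent` are hypothetical, and the planner here is a hard-coded stand-in for where a model call would go.

```python
from typing import Any, Callable, Dict, List


# Hypothetical tools; a real agent framework would register its own.
def add(a: float, b: float) -> float:
    return a + b


def multiply(a: float, b: float) -> float:
    return a * b


TOOLS: Dict[str, Callable[..., float]] = {"add": add, "multiply": multiply}


def scripted_planner(task: str, history: List[float]) -> Dict[str, Any]:
    """Stand-in for a model call: picks the next action from prior results."""
    if len(history) == 0:
        return {"tool": "add", "args": (2, 3)}
    if len(history) == 1:
        # Uses the previous tool result as input to the next step.
        return {"tool": "multiply", "args": (history[-1], 4)}
    return {"final": history[-1]}


def run_agent(task: str, planner: Callable[..., Dict[str, Any]], max_steps: int = 10) -> float:
    """Run plan -> tool call -> observe until the planner returns a final answer."""
    history: List[float] = []
    for _ in range(max_steps):
        action = planner(task, history)
        if "final" in action:
            return action["final"]
        result = TOOLS[action["tool"]](*action["args"])
        history.append(result)
    raise RuntimeError("step limit reached without a final answer")


if __name__ == "__main__":
    print(run_agent("compute (2 + 3) * 4", scripted_planner))
```

The step limit is a design choice in this sketch: it keeps a misbehaving planner from looping forever, which matters more as task chains get longer.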

From an architectural perspective, the model has a total parameter count exceeding 1 trillion (1T), with 42 billion activated parameters. It employs an improved hybrid attention mechanism, significantly enhancing model capacity while maintaining inference efficiency. Its context window has been further expanded to 1 million tokens, enabling support for ultra-long task chains and complex workflows.
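The gap between total and activated parameters can be put in proportion with a short calculation. This is a rough sketch only: the article says the total "exceeds" 1 trillion, so 1 trillion is used here as a lower bound, which makes the result an upper bound on the active share. The variable and function names are illustrative.

```python
# Figures quoted in the article; the total is stated as "exceeding" 1 trillion,
# so using exactly 1T gives an upper bound on the activated share.
TOTAL_PARAMS = 1_000_000_000_000
ACTIVE_PARAMS = 42_000_000_000


def active_fraction(active: int, total: int) -> float:
    """Share of the model's parameters activated during inference."""
    return active / total


fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
print(f"At most {fraction:.1%} of parameters are active")
```

Under that assumption, no more than about 4.2% of the parameters are activated at a time, which is how a model of this size can keep inference costs closer to those of a much smaller one.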

During the earlier testing phase of Hunter Alpha, these capabilities had already begun to emerge.

In multiple Agent evaluations, the performance of MiMo-V2-Pro has entered the global top tier.

On the agent capability assessment Claw Eval, MiMo-V2-Pro (Hunter Alpha) scored 61.5, approaching Claude Opus 4.6. In the PinchBench test, it ranked among the top three globally. In coding ability, its performance even surpassed Claude Sonnet 4.6, nearing the more advanced Opus level.

More importantly, Xiaomi emphasizes that the model's optimization focus is not on "benchmark scores" but on real-world performance.

Through supervised fine-tuning (SFT) and reinforcement learning (RL) on numerous agent task frameworks (such as OpenClaw), MiMo-V2-Pro has achieved significant improvements in tool invocation stability, multi-step reasoning capabilities, and task completion rates.

In other words, it is not just "capable of answering questions" but "capable of getting things done."

On Artificial Analysis, a widely cited comprehensive intelligence ranking of large models, MiMo-V2-Pro ranks eighth worldwide and second among Chinese models.

A key transformation of MiMo-V2-Pro is its leap from a traditional conversational model to an "execution-oriented intelligent agent."

In practical tests, MiMo-V2-Pro has demonstrated "execution capabilities" that are distinctly different from traditional conversational models. It not only understands complex instructions but can also complete a full task cycle from design to implementation within a single prompt.

For example, a developer asked it to generate a complete 3D tower defense game, including multiple types of defense towers, various enemy mechanics, level designs, and special effects such as explosions and flames, with rendering based on Three.js, while also providing features like pause, restart, and scoring.

The model was able to directly deliver a structurally complete code solution covering game logic and front-end implementation.

In another type of task leaning more toward creative and front-end design, MiMo-V2-Pro also demonstrated strong cross-domain capabilities.

The test required it to "simulate the visual style of 1990s print magazines, including irregular multi-column layouts, bleeding headlines, paper texture backgrounds, and interactive designs with page-turning animations." The model not only understood this complex aesthetic description but also generated a complete front-end implementation solution encompassing font selection, layout structure, and dynamic effects.

These cases show that MiMo-V2-Pro is evolving from "generating content" to "generating systems," with its capability boundaries expanding to encompass the entire process of software engineering and digital creative production.

Furthermore, its ultra-long context window of 1 million tokens enables it to handle long-chain tasks, such as cross-file code understanding, large-scale document analysis, and even continuous multi-round task planning—scenarios where traditional models often fall short.

With its official release, Xiaomi has simultaneously opened up the MiMo-V2-Pro API service (platform.xiaomimimo.com) and introduced a relatively aggressive pricing strategy.

Pricing is tiered by context length: within a 256K context window, input and output cost US$1 and US$3 per million tokens, respectively. For long contexts of up to 1 million tokens, the rates rise to US$2 for input and US$6 for output.
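As a rough illustration of what these tiers mean for a single request, the sketch below estimates cost from token counts using the quoted prices. Two points are assumptions rather than published billing rules: that "256K" means 256,000 tokens, and that the tier is selected by input length alone. The names `PRICING`, `STANDARD_TIER_LIMIT`, and `estimate_cost` are illustrative, and cache pricing is ignored.

```python
# Prices in US dollars per million tokens, as quoted in the article.
PRICING = {
    "standard": {"input": 1.0, "output": 3.0},  # context within 256K tokens
    "long": {"input": 2.0, "output": 6.0},      # context up to 1M tokens
}

# Assumption: "256K" means 256,000 tokens and the tier depends on input length only.
STANDARD_TIER_LIMIT = 256_000


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in US dollars for one request."""
    tier = "standard" if input_tokens <= STANDARD_TIER_LIMIT else "long"
    rates = PRICING[tier]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


if __name__ == "__main__":
    print(f"${estimate_cost(100_000, 10_000):.2f}")  # standard tier
    print(f"${estimate_cost(800_000, 10_000):.2f}")  # long-context tier
```

Under these assumptions, a request with 100,000 input tokens and 10,000 output tokens comes to about US$0.13, while one with 800,000 input tokens and the same output comes to about US$1.66.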

This pricing is significantly lower than that of comparable models (such as Claude Opus), aiming to rapidly attract the developer ecosystem. In particular, the strategy of "temporarily free cache writes" will greatly reduce operational costs for agent developers who frequently need to invoke long-context prompts.

Additionally, Xiaomi has partnered with multiple mainstream agent frameworks, including OpenClaw and Cline, to launch a limited-time free usage plan, further promoting its penetration into the developer community.

MiMo-V2-Pro is responsible for reasoning and planning, while the other two models in the series cover the rest: MiMo-V2-Omni handles multimodal perception and execution, and MiMo-V2-TTS manages speech expression. Together, the three form a complete AI capability stack driving the entire ecosystem.

miclaw, Xiaomi's on-device AI agent for mobile phones, has already integrated the MiMo large model. With system-level execution capabilities and deep integration with Xiaomi's "Human x Car x Home" ecosystem, it is the first concrete manifestation of this capability stack in action. The subsequent integrations of WPS Lingxi and the Xiaomi Browser point in the same direction: MiMo is not just a conversational product but a foundational capability layer being embedded into various application scenarios.

For years, major tech companies have been painting a vision of "AI connecting everything." Now, with the MiMo-V2 series models and a deep understanding of ecosystem strategies, Xiaomi is ready.

(With input from APPSO and Tencent Technology)
