The Phone Company That Writes Code Better Than Claude
A mystery model appeared on OpenRouter and fooled the entire AI community. It turned out to be from a phone company. This is the story of how Xiaomi entered the frontier AI race — and why it matters more than the benchmarks suggest.
The Model That Wasn't There
I was in a café on Camden Street on March 12 when a colleague messaged me a link. No comment. No context. Just a URL to an OpenRouter model page. The coffee was too hot to drink. The rain was doing that thing Dublin rain does in March — not falling, just existing, suspended in the air like it hadn't decided yet.
The model was called Hunter Alpha. No branding. No documentation. No developer name. One trillion parameters. One million token context window. Free access. I stared at the specs and did what every other developer in the community did that week: I assumed it was DeepSeek V4.
The reasoning was sound. Who else had a trillion-parameter model sitting unnamed on a shelf? Who else would release it anonymously and let it speak for itself? This had to be DeepSeek V4, the long-awaited follow-up to the model that had briefly crashed NVIDIA's stock fourteen months earlier. The comparisons were immediate — people routing the same prompts through Hunter Alpha and Claude Opus 4.6, posting screenshots, arguing about whether the outputs were better or just different. Within seven days, Hunter Alpha had processed over one trillion tokens and topped OpenRouter's daily usage charts.
I used it for a week. Wrote code with it. Tested it on data engineering tasks. It was fast, it was coherent, and it felt — in a way I could not then articulate — familiar.
On March 18, the answer arrived. Hunter Alpha was not DeepSeek V4. It was an early internal test build of MiMo-V2-Pro. The developer was Xiaomi.
The company that makes your friend's electric scooter.
Xiaomi's stock jumped 5.8% in a single day.
The Inventory
I have been trying to understand Xiaomi for three weeks now, and the problem is the list. Smartphones — third largest manufacturer in the world. Electric cars — the SU7, 902 kilometres of range, real sales, real reviews in the automotive press. Humanoid robots on the EV factory floor — 90.2% success rate across three hours of continuous autonomous operation. Robot vacuums that recognise over 100 objects. Air purifiers. Washing machines. 822 million IoT devices connected to their platform, a number almost certainly past one billion by now.
I keep waiting for the list to reveal a weakness — a product category they entered and failed at, a stretch that went too far. I have not found one. The robots are on the factory floor, not in a concept video. The vacuums are in millions of homes, quietly mapping living rooms. And now they make frontier AI models.
Xiaomi calls their strategy "Human × Car × Home." The phone is the hub. The car is the mobile node. The home is the distributed network. All running HyperOS. All connected. All backed by MiMo — the same model family that scores 57.2% on SWE-bench Pro. The model that writes code is a cousin of the model that drives the car is a sibling of the model that avoids the cables on your floor. In March 2026, they launched Miclaw — an autonomous AI agent for smartphones that interprets user intent and completes multi-step tasks across apps without supervision. On your phone. Made by the same company that made your washing machine.
Five Weeks
March 18: MiMo-V2-Pro launches, revealed as Hunter Alpha. April 22: MiMo-V2.5 and V2.5-Pro launch. Five weeks. In five weeks, Xiaomi went from "new entrant that fooled the internet" to "frontier competitor that beats Claude on SWE-bench Pro."
The speed makes more sense when you learn one name. Luo Fuli was a core contributor at DeepSeek — she worked on R1 and the V-series models. In late 2025, she moved to Xiaomi to lead the MiMo division. Xiaomi did not build an AI lab from scratch. They hired the people who knew how to build frontier models, gave them 822 million devices' worth of data, backed them with $8.7 billion, and got out of the way.
V2.5 collapses their reasoning model and multimodal model into a single architecture. It sustains over 1,000 sequential tool calls without losing coherence. Xiaomi's demos show it completing 8,192 lines of code across 1,868 tool calls and 11.5 hours of autonomous work — a functional video editor, built from scratch, by a phone company's AI.
Compare this to DeepSeek, the other Chinese lab in the news this week. DeepSeek V4 arrived on April 24 — 484 days after V3. A brilliant model, within 0.2 points of Claude on SWE-bench, leading on LiveCodeBench and Codeforces. A genuine comeback after months of scrutiny. But 484 days versus 35. Two Chinese labs. Two completely different tempos. Same result. Both partially trained on Huawei Ascend chips. Both released under the MIT licence. Both signalling that Chinese AI infrastructure is increasingly independent of US export-controlled hardware.
The Price Canyon
MiMo V2.5-Pro costs $1.00 per million input tokens and $3.00 per million output tokens. DeepSeek V4-Pro costs $1.74 and $3.48. Claude Opus 4.6 costs $5.00 and $25.00. At 100 million output tokens per month, MiMo costs $300. Claude costs $2,500. Both Chinese labs publish their open models under the MIT licence — open-source, commercially usable, freely modifiable. V2.5-Pro itself, for now, is API-only.
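If you want to check my arithmetic, it is three multiplications. A quick sketch in Python, output tokens only, using the prices above; the 100-million-token workload is an assumed round number, not anyone's measured usage:

```python
# Monthly spend on output tokens alone, at the per-million prices quoted above.
# Input tokens are ignored here, which is how the $300 vs $2,500 figures were derived.
output_price_per_million = {
    "MiMo V2.5-Pro": 3.00,
    "DeepSeek V4-Pro": 3.48,
    "Claude Opus 4.6": 25.00,
}

monthly_output_millions = 100  # assumed workload: 100M output tokens per month

for model, price in output_price_per_million.items():
    print(f"{model}: ${price * monthly_output_millions:,.0f} per month")
# MiMo V2.5-Pro: $300 per month
# DeepSeek V4-Pro: $348 per month
# Claude Opus 4.6: $2,500 per month
```

Input pricing narrows the ratio slightly (a fifth of Claude's rather than roughly an eighth), but not enough to change the shape of the canyon.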
I showed the comparison to my manager on Tuesday. He pulled up our actual spend dashboard — the one I built after the subscription squeeze article in April. The Claude line looks like a heartbeat monitor during a crisis. The MiMo line is flat, barely visible.
"We are not switching yet," he said. He was not looking at me. He was looking at the gap between the two lines. "But explain to me why we are paying eight times more for the same benchmark score." He did not say anything for a while. Then: "Get me a comparison by Friday."

What the Developer Said
I met Cian at a tech meetup in the Docklands two weeks ago. He works for a fintech that processes payments — not a small operation, real transaction volume, real latency requirements. He had been one of the developers stress-testing Hunter Alpha during the anonymous week in March.
"I ran the same refactoring task through Hunter Alpha, Claude, and GPT-5.4," he said. He was holding a pint he hadn't touched. "Hunter Alpha finished first. Not by a little. By a lot. And the code was cleaner. I remember thinking — this can't be right. This is free."
He had assumed, like everyone, that it was DeepSeek.
"When they said it was Xiaomi, I went silent for about ten seconds. I own a Xiaomi phone. I have a Xiaomi air purifier in my flat. My girlfriend has a Xiaomi robot vacuum. I have never once thought of any of those things as intelligent."
He took a drink then. Set the pint down on a coaster that was already damp.
"Now I think about it constantly. The vacuum maps my flat. The phone predicts my typing. The air purifier adjusts to the pollen count. All of that was already running some version of AI. I just never thought of it that way because it was in a vacuum. In a phone. It wasn't in a chat window, so it didn't count."
His phone buzzed. He pulled it out — a Xiaomi 15, the screen bright against the pub's low light — and glanced at the notification. "Sorry," he said. "Battery alert. The thing lasts two days and it still warns me at twenty percent." He put it back in his pocket. The moment was gone. He was a man checking his phone in a pub, not a developer having a revelation about artificial intelligence. The ordinariness of it was the point, but I did not know how to say that without sounding like I was writing an article.
I asked him if he had switched from Claude.
"For work? No. Claude still follows instructions better on the complex stuff — the kind of thing where you need the model to maintain a constraint across fifteen files and three hours of context. But for everything else — quick scripts, data analysis, research synthesis — I switched. The cost difference is not subtle. My team was spending four figures a month on Claude. We are now spending three on MiMo."
He looked at the pint again.
"A phone company. I still can't get over it."
Not everyone at the meetup shared his enthusiasm. A woman standing nearby — she worked for a company that built compliance tooling — had been listening. "It does not matter how cheap it is if I cannot put it in a regulated pipeline," she said. "Show me the SOC 2 report. Show me the data processing agreement. Until then, it is a toy." She was not wrong. She was also not the point.
What I Found at My Own Desk
I have been running MiMo V2.5-Pro for six days now. Not as an experiment. As my working model. This article was drafted, revised, and structured on it.
The terminal shows the model identifier in a small font at the top of the screen: mimo-v2.5-pro. It is a strange thing to stare at for hours. Every other model I have used — Claude, GPT, Kimi — carried a name that belonged to a lab, a research team, a company that existed to build AI. This one belongs to a company that makes washing machines. The identifier sits there, unremarkable, and the work it produces is indistinguishable in quality from models that cost eight times as much.
On Wednesday evening I was drafting the Price Canyon section. I had written the three price points — $1.00, $1.74, $5.00 — and paused. I could not decide how to frame the gap. I typed half a sentence: "That is not a" and stopped. MiMo finished it: "discount. It is a reclassification of what intelligence costs." I stared at the completion for a long time. It was not what I would have written. It was better. I deleted it and wrote my own version. Then I compared the two. Mine was blunter. Theirs was more precise. I kept mine. The discomfort of that comparison has not left me.
The token efficiency surprised me. Longer tasks — the kind that would burn through context on Claude — seemed to compress. Fewer inference steps for the same work. On agentic tasks, the difference was immediate. The cost surprised me more. I track daily spend in a spreadsheet — a habit from the subscription squeeze article in April. On Claude, a heavy day runs eight to twelve dollars. On MiMo, the same workload has not exceeded two.
On Tuesday I asked MiMo to restructure a technical document with specific formatting constraints — numbered sections, consistent heading hierarchy, no bullet points. By the fifth section it had reverted to bullets. I corrected it. It reverted again. Claude holds that kind of constraint across an entire document without reminders. For writing that requires sustained voice and exact structural compliance, Claude still earns its premium. For everything else — coding, research synthesis, document analysis — the price difference is not justified by the quality difference. That is an uncomfortable sentence to write while running MiMo. It is also the honest one.
What I Might Be Wrong About
On Thursday I asked MiMo to refactor a data pipeline. The task was specific: read from a Kafka topic, deduplicate by message ID, transform the schema, write to Postgres. Memory constraint of 512 megabytes. MiMo produced the code in under thirty seconds. It was clean, it was fast, and it used a streaming approach I had not considered. I read it twice. It looked correct. I was about to merge it when I noticed the deduplication step was missing. Not broken — missing. The code read from Kafka, transformed, and wrote, without checking whether it had seen a message before. In production, this would have created duplicates silently. No error. No warning. Just wrong data in a database that someone would have trusted.
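For the record, the missing step is not exotic. Below is a minimal sketch of what should have been there, assuming messages arrive as dicts with an "id" field. It is not the code MiMo wrote and not the code I shipped, just the shape of the check; the bounded cache is one way to keep a long-running consumer inside a 512-megabyte budget.

```python
from collections import OrderedDict
from typing import Iterable, Iterator


def deduplicate(messages: Iterable[dict], max_ids: int = 1_000_000) -> Iterator[dict]:
    """Yield each message the first time its 'id' is seen; skip repeats.

    At most `max_ids` recent IDs are held in memory, oldest evicted first,
    so the cache stays bounded on an unbounded stream.
    """
    seen = OrderedDict()
    for msg in messages:
        msg_id = msg["id"]
        if msg_id in seen:
            continue  # duplicate: drop it instead of writing it downstream twice
        seen[msg_id] = None
        if len(seen) > max_ids:
            seen.popitem(last=False)  # evict the oldest remembered ID
        yield msg
```

The other honest option is to let Postgres do the remembering, with a unique constraint on the message ID and an insert that ignores conflicts; that trades memory for write-path cost, but either way something has to remember what it has already seen.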
I stared at the missing function for a long time. The code had been so clean that I had read it as correct. My confidence in the model had overridden my confidence in my own review. I had almost shipped a bug because the code looked like it was written by someone who knew what they were doing.
That is the thing I cannot resolve. Xiaomi's benchmarks are mostly vendor-published, obtained within their native framework. Third-party verification is limited. V2.5-Pro is API-only, not yet open-source. Luo Fuli's move from DeepSeek is a strength and a question — she brought knowledge and velocity, and a model that feels remarkably similar to DeepSeek's outputs. Innovation and iteration are not the same thing. The $8.7 billion is pledged, not spent. The MiMo models are five weeks old. Xiaomi has 822 million devices collecting data in homes, pockets, and cars, and they have not disclosed what data feeds into MiMo's training pipeline. These are facts. They are also not the reason I almost did not finish this section.
I almost did not finish it because I am writing an article about a phone company's AI on that phone company's AI, and the model has been helpful, and articulate, and occasionally brilliant, and I cannot tell whether my judgment about it is my own or whether it has been shaped by six days of using a tool that completes my sentences and gets my formatting wrong and costs one-eighth of what I used to pay. The compliance woman at the meetup was not wrong. But her question — show me the SOC 2 report — is the easy question. The hard question is the one I cannot answer: how do you evaluate a tool that is helping you evaluate it.
The View from Dublin
I walked past the Gibson Hotel this morning. The same walk I wrote about in April, when Claude Mythos was the story and the Hendrix was the metaphor. The light was different today — grey, flat, the kind of Dublin sky that makes the docklands look like a watercolour left in the rain.
Xiaomi does not specialise in intelligence. They specialise in shipping. They ship phones, cars, robots, vacuums, air purifiers, washing machines, and now trillion-parameter AI models. They ship at a velocity that pure AI labs cannot match because they have something AI labs do not: 822 million devices that need a brain.
I came home and sat at my desk. The terminal was still open. The cursor blinked after the last line I had written — a paragraph about DeepSeek, about Huawei chips, about the geopolitics of silicon. The model identifier at the top of the screen read mimo-v2.5-pro. Beside the monitor, the Xiaomi air purifier hummed quietly, adjusting itself to the pollen count outside. I had bought it two years ago. The Mi Home app on my phone showed the filter was at 12% remaining — it had been sending me notifications for three weeks, each one a small nudge toward a replacement cartridge that costs €35 and ships from a warehouse in the Netherlands. The machine was persistent. It did not forget. It had been tracking the filter's decline in the background the entire time I had been writing about whether Xiaomi's AI was any good.
I scrolled back through the article. I tried to identify which sentences the model had written and which I had written. I could not. The boundary had dissolved. The best sentence in the article might be mine. It might be the model's. I genuinely do not know. That is either a confession or a proof of concept. I have not decided which.
The air purifier switched from auto to sleep mode. A quiet click, then the fan slowed. The Mi Home app sent a notification — air quality good, 22°C, humidity 41%. The machine was satisfied with the room. I was not satisfied with the article. I did not know what was missing. The cursor blinked. I had been staring at it for eleven minutes without typing.
A data engineer, April 2026
Source note: Benchmark figures from Xiaomi MiMo official page, DeepSeek API docs, buildfastwithai reviews, Artificial Analysis Intelligence Index, and VentureBeat coverage. Hunter Alpha timeline from Reuters and OpenRouter data. Xiaomi IoT device count from Statista (Q2 2024). SU7 specifications from Electrek and Autocar Professional. Humanoid robot trial data from CNBC and Robotics & Automation News. Pricing verified against OpenRouter and provider API pages as of April 28, 2026.