AI Tools Review
Kimi K2.7 Code: Open-Source Coding That Beats Opus

21 June 2026

Quick Answer:

Kimi K2.7 Code is Moonshot AI's open-weights coding model, released in June 2026. It is a 1 trillion parameter Mixture-of-Experts model with 32 billion active parameters, a 256K context window and a Modified MIT licence, priced at about £0.75 (0.95 US dollars) per million input tokens and £3.15 (4.00 US dollars) per million output. On Moonshot's own benchmarks it is the strongest open coding model yet, narrowing the gap to Claude Opus 4.8 and GPT-5.5 and even beating Opus 4.8 on one agentic test, while costing a fraction of either. The headline "beats Opus" claim is true on price and on a couple of tasks, not across every benchmark.

The cheapest serious coding model of 2026 is not from a US lab. Its weights are open, it runs to a trillion parameters, and you can download it.

This is a full breakdown of Kimi K2.7 Code: what Moonshot actually shipped, the real numbers from the model card, how the open weights and licence work, what the API costs in pounds and dollars, and the honest version of how it stacks up against the closed frontier.

Executive Summary

On 12 June 2026 Moonshot AI released Kimi K2.7 Code, the latest in its rapid K2 series and the first variant explicitly tuned for software work rather than general chat. It is the successor to Kimi K2.5 and the April 2026 K2.6, and it carries the same enormous Mixture-of-Experts body: 1 trillion parameters in total, 32 billion of them active on any given token.

The pitch is efficiency and price. K2.7 Code uses roughly 30 percent fewer thinking tokens than K2.6 while posting double-digit gains on Moonshot's coding suites, and it lands on the API at a fraction of what the closed leaders charge. The weights are open under a Modified MIT Licence, so you can self-host the model and pay nothing per token if you have the hardware.

  • Architecture: 1T-parameter MoE, 32B active, 384 experts with 8 active, 256K context, thinking-only.
  • Coding scores (Moonshot): Kimi Code Bench v2 62.0, Program Bench 53.6, MLS Bench Lite 35.1, all up sharply on K2.6.
  • One real win: MCP Mark Verified 81.1, ahead of Opus 4.8's 76.4 (though behind GPT-5.5's 92.9).
  • Price: about £0.75 ($0.95) input and £3.15 ($4.00) output per million tokens, undercutting Opus 4.8 and GPT-5.5.
  • The caveat: every figure is Moonshot's own, on its own benchmarks, with no independent leaderboard runs yet.

What Kimi K2.7 Code Is

Moonshot AI is the Beijing lab behind the Kimi family, and its K2 line has built a reputation as the most credible open-weights challenger to the Western labs. K2.7 Code is a deliberate narrowing of that line. Where earlier releases such as K2.5 were broad, multimodal assistants, K2.7 Code is a coding-first model, built on top of K2.6 and post-trained for "real-world long-horizon coding tasks": multi-file edits, debugging, tool use and the kind of multi-step agentic workflows that a developer actually runs.

It is designed to be driven by an agent harness rather than chatted with. Moonshot's recommended front end is the Kimi Code CLI, and the model "always runs with thinking enabled" and does not support a non-thinking mode. In other words, this is not a general-purpose Q&A model that happens to code. It is a coding agent with a reasoning loop baked in, and the efficiency claim (30 percent fewer thinking tokens than K2.6 for the same or better results) is central to its appeal.

The release continues Moonshot's roughly quarterly cadence: K2 in July 2025, K2-Instruct and K2-Thinking through late 2025, K2.5 in January 2026, K2.6 in April, and now K2.7 Code in June. Each step has pushed the open-weights ceiling a little higher, and K2.7 Code is the first to be positioned squarely as a coding tool.

Architecture & Training

The model card is unusually specific about the architecture, which is part of the point of an open release. K2.7 Code is a sparse Mixture-of-Experts model. Of its 1 trillion total parameters, only 32 billion are active on any given token, which is what keeps inference cost down despite the headline size. It has 384 routed experts, of which 8 are selected per token, plus 1 shared expert that is always active, across 61 layers, one of which is a dense layer.
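
To make the routing numbers concrete, here is a minimal sketch of top-k expert selection. It assumes a generic softmax gate over the selected experts; this review does not describe Moonshot's actual gating function, and `route_token` and the random logits are hypothetical, for illustration only.

```python
import math
import random

NUM_ROUTED_EXPERTS = 384   # routed experts, per the figures quoted above
EXPERTS_PER_TOKEN = 8      # routed experts selected for each token
NUM_SHARED_EXPERTS = 1     # always active, bypasses the router


def route_token(router_logits, k=EXPERTS_PER_TOKEN):
    """Pick the k highest-scoring routed experts and normalise their weights.

    Hypothetical helper: a generic top-k softmax gate, not Moonshot's code.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)          # for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_ROUTED_EXPERTS)]  # stand-in router scores
selected = route_token(logits)

experts_run = len(selected) + NUM_SHARED_EXPERTS
print(f"experts run for this token: {experts_run} of "
      f"{NUM_ROUTED_EXPERTS + NUM_SHARED_EXPERTS}")
print(f"active parameter share: {32e9 / 1e12:.1%}")
```

Only the selected experts do any work for a given token, which is why the per-token compute tracks the 32 billion active parameters rather than the full trillion.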

The attention mechanism is Multi-head Latent Attention (MLA) with 64 heads and a 7,168-dimension attention hidden size, paired with a SwiGLU activation and a 160K-token vocabulary. The context window is 256K tokens (262,144), enough to hold a sizeable codebase and a long agentic trace at once. A MoonViT vision encoder of around 400 million parameters gives the model the ability to read screenshots and diagrams, useful for agentic work even in a coding-first release.

Moonshot describes K2.7 Code as built "upon Kimi K2.6", so this is a targeted post-training pass on the existing K2.6 base rather than a fresh pretraining run, focused on long-horizon coding behaviour and token efficiency. The 30 percent reduction in thinking tokens is the most concrete training claim: the model reaches its answers with less internal deliberation, which translates directly into lower latency and lower cost for the same task.
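
As a rough illustration of what the 30 percent figure means for a bill, the sketch below applies it to a made-up task. It assumes thinking tokens are billed as ordinary output tokens, which this review does not confirm, and the token counts are hypothetical.

```python
OUTPUT_USD_PER_MILLION = 4.00   # K2.7 Code output price quoted in this review
THINKING_REDUCTION = 0.30       # Moonshot's claimed cut versus K2.6


def output_tokens_after_cut(thinking_tokens, answer_tokens,
                            reduction=THINKING_REDUCTION):
    """Total output tokens once the thinking portion shrinks by `reduction`."""
    return thinking_tokens * (1.0 - reduction) + answer_tokens


# Hypothetical task: 40,000 thinking tokens plus 10,000 tokens of final code.
before = 40_000 + 10_000
after = output_tokens_after_cut(40_000, 10_000)

saving = 1.0 - after / before
print(f"output tokens: {before:,} -> {after:,.0f} ({saving:.0%} fewer overall)")
print(f"output cost at K2.7 Code rates: ${after / 1e6 * OUTPUT_USD_PER_MILLION:.3f}")
```

The overall saving is smaller than 30 percent whenever part of the output is final code rather than deliberation, so the benefit depends on how thinking-heavy a workload is.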

Coding Benchmarks

Here is the most important caveat up front, and it shapes how every number below should be read. Moonshot reports K2.7 Code against its own benchmark suites, chiefly Kimi Code Bench v2, Program Bench and MLS Bench Lite for coding, and Kimi Claw 24/7 Bench, MCP Atlas and MCP Mark Verified for agentic work. These are not the public leaderboards. As of mid-June 2026 there are no independent third-party results for K2.7 Code on SWE-bench Verified, SWE-bench Pro, LiveCodeBench, Terminal-Bench or any neutral suite that runs all models under identical conditions. Treat the figures as the vendor's own.

Kimi K2.7 Code is Moonshot AI's coding-first open-weights model. Image credit: Moonshot AI, official Kimi-K2.7-Code model card on Hugging Face.

With that framing, the picture is one of strong generation-on-generation gains. On Moonshot's coding tests, K2.7 Code improves on K2.6 by a wide margin: Kimi Code Bench v2 rises from 50.9 to 62.0, a 21.8 percent jump; Program Bench from 48.3 to 53.6; and MLS Bench Lite from 26.7 to 35.1, up 31.5 percent. The agentic suites move in the same direction: Kimi Claw 24/7 Bench from 42.9 to 46.9, MCP Atlas from 69.4 to 76.0, and MCP Mark Verified from 72.8 to 81.1.
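
The percentage jumps quoted above are relative gains, not point differences. A short sketch that recomputes them from Moonshot's reported scores, using only the figures in this section:

```python
# (K2.6 score, K2.7 Code score) as reported by Moonshot and quoted above.
SCORES = {
    "Kimi Code Bench v2": (50.9, 62.0),
    "Program Bench": (48.3, 53.6),
    "MLS Bench Lite": (26.7, 35.1),
    "Kimi Claw 24/7 Bench": (42.9, 46.9),
    "MCP Atlas": (69.4, 76.0),
    "MCP Mark Verified": (72.8, 81.1),
}


def relative_gain(old, new):
    """Relative improvement of `new` over `old`, as a percentage."""
    return (new - old) / old * 100.0


gains = {name: relative_gain(old, new) for name, (old, new) in SCORES.items()}
for name, pct in gains.items():
    old, new = SCORES[name]
    print(f"{name:<22} {old:>5.1f} -> {new:>5.1f}  "
          f"(+{new - old:.1f} points, +{pct:.1f}%)")
```

Reading both columns side by side helps: a benchmark with a low starting score, such as MLS Bench Lite, shows a large relative gain from a modest change in points.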

The cross-model comparisons are where the "beats Opus" headline needs care. On raw coding, K2.7 Code still trails both closed leaders, and on the agentic suites it is ahead of Opus 4.8 on only one of three:

  • Kimi Code Bench v2: K2.7 Code 62.0, Opus 4.8 67.4, GPT-5.5 69.0.
  • Program Bench: K2.7 Code 53.6, Opus 4.8 63.8, GPT-5.5 69.1.
  • MLS Bench Lite: K2.7 Code 35.1, GPT-5.5 35.5, Opus 4.8 42.8.
  • MCP Atlas (agentic): K2.7 Code 76.0, GPT-5.5 79.4, Opus 4.8 81.3.
  • MCP Mark Verified (agentic): K2.7 Code 81.1, Opus 4.8 76.4, GPT-5.5 92.9.
  • Kimi Claw 24/7 Bench (agentic): K2.7 Code 46.9, Opus 4.8 50.4, GPT-5.5 52.8.
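
The list above can be reduced to point gaps with a few lines of code. This sketch uses only Moonshot's reported scores as quoted here and makes no claim about independent results:

```python
# (K2.7 Code, Opus 4.8, GPT-5.5): Moonshot's own reported scores from the list above.
TABLE = {
    "Kimi Code Bench v2": (62.0, 67.4, 69.0),
    "Program Bench": (53.6, 63.8, 69.1),
    "MLS Bench Lite": (35.1, 42.8, 35.5),
    "MCP Atlas": (76.0, 81.3, 79.4),
    "MCP Mark Verified": (81.1, 76.4, 92.9),
    "Kimi Claw 24/7 Bench": (46.9, 50.4, 52.8),
}

wins_vs_opus = [name for name, (k, opus, _) in TABLE.items() if k > opus]
wins_vs_gpt = [name for name, (k, _, gpt) in TABLE.items() if k > gpt]

for name, (k, opus, gpt) in TABLE.items():
    # A negative gap means K2.7 Code is behind on that benchmark.
    print(f"{name:<22} vs Opus 4.8: {k - opus:+5.1f}   vs GPT-5.5: {k - gpt:+5.1f}")

print("ahead of Opus 4.8 on:", wins_vs_opus)
print("ahead of GPT-5.5 on:", wins_vs_gpt)
```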

So the accurate version is this: K2.7 Code beats Claude Opus 4.8 on MCP Mark Verified (81.1 against 76.4) and otherwise trails both leaders, by margins that run from 0.4 points (GPT-5.5 on MLS Bench Lite) to 15.5 points (GPT-5.5 on Program Bench), while costing a small fraction of either. It does not beat them across the board on quality. What it does beat them on, decisively, is value, which is the real story of an open model that gets this close to the closed frontier. For the wider picture of where the closed leaders sit, see our Claude vs ChatGPT vs Gemini vs Grok comparison.

Open Weights & Licence

The defining feature of K2.7 Code, and the reason it matters beyond the leaderboard, is that the weights are open. Moonshot publishes the full model on Hugging Face at moonshotai/Kimi-K2.7-Code, along with deployment documentation and configuration files. You can download it, inspect it, fine-tune it and run it on your own infrastructure.

The licence is a Modified MIT Licence. For the overwhelming majority of users this behaves exactly like a standard permissive open licence: free to use, modify and deploy, including commercially. The one modification is an attribution clause aimed at very large operators. Deployments above a high threshold, reported as roughly 100 million monthly active users or 20 million US dollars in monthly revenue, must prominently display "Kimi" on the product's user interface. Below that, there is no such obligation. This is the same pattern Moonshot has used across the K2 line to allow genuinely open use while asking the largest commercial beneficiaries to credit the model.
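
For teams that want a quick sanity check, the attribution rule can be sketched as a simple predicate. This is illustrative only: the thresholds are the approximate figures reported above, `must_display_kimi` is a hypothetical helper, and the licence text itself remains the authority.

```python
MAU_THRESHOLD = 100_000_000              # monthly active users, as reported above
MONTHLY_REVENUE_THRESHOLD_USD = 20_000_000


def must_display_kimi(monthly_active_users, monthly_revenue_usd):
    """Rough check of whether the attribution clause is likely to apply.

    Either threshold on its own is treated as enough to trigger the clause,
    following the "or" in the description above.
    """
    return (
        monthly_active_users > MAU_THRESHOLD
        or monthly_revenue_usd > MONTHLY_REVENUE_THRESHOLD_USD
    )


# Hypothetical deployments.
print(must_display_kimi(250_000, 40_000))          # small startup
print(must_display_kimi(150_000_000, 5_000_000))   # very large consumer app
print(must_display_kimi(2_000_000, 30_000_000))    # high-revenue enterprise product
```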

Open weights at this scale change the deployment calculus. A bank, government department or privacy-sensitive enterprise can run K2.7 Code entirely inside its own network with no data leaving the building, which is something no amount of closed-API discounting can match. The community has already produced quantised GGUF builds and third-party hosts, lowering the hardware bar that a raw 1T MoE model would otherwise impose.

Pricing & How to Access

There are three ways to use K2.7 Code. The simplest is the Kimi Code chat and CLI at kimi.com/code, where it is the default model with thinking enabled. The second is Moonshot's hosted API on its developer platform. The third is to self-host the open weights, which removes per-token cost altogether if you have the GPUs.

Moonshot's API pricing is the headline number. It charges roughly £0.75 (0.95 US dollars) per million input tokens on a cache miss, about £0.15 (0.19 US dollars) per million input tokens on a cache hit, and about £3.15 (4.00 US dollars) per million output tokens. Against the closed leaders that is dramatic. Claude Opus 4.8 lists at roughly 5 and 25 US dollars per million input and output; GPT-5.5 at roughly 5 and 30; and Claude Fable 5 at 10 and 50. On output tokens, the most expensive part of any agentic coding run, K2.7 Code is more than twelve times cheaper than Fable 5 and around six times cheaper than Opus 4.8.
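
To see how those list prices play out on a single job, the sketch below prices one hypothetical agentic session on each model. The prices are the ones quoted above; the token counts are invented for illustration, and cache hits are ignored except in the final line.

```python
# USD per million tokens, as quoted in this review (input on a cache miss, output).
PRICES = {
    "Kimi K2.7 Code": (0.95, 4.00),
    "Claude Opus 4.8": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
    "Claude Fable 5": (10.00, 50.00),
}
K27_CACHE_HIT_INPUT = 0.19  # K2.7 Code input price on a cache hit


def run_cost(model, input_tokens, output_tokens):
    """Cost in USD of one run at list prices, ignoring cache hits."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical agentic session: 2M input tokens, 500K output tokens.
IN_TOKENS, OUT_TOKENS = 2_000_000, 500_000
costs = {model: run_cost(model, IN_TOKENS, OUT_TOKENS) for model in PRICES}
kimi_out = PRICES["Kimi K2.7 Code"][1]

for model, cost in costs.items():
    ratio = PRICES[model][1] / kimi_out
    print(f"{model:<16} ${cost:6.2f} per session, output price {ratio:.2f}x K2.7 Code")

# Best case for K2.7 Code: every input token served from cache.
fully_cached = (IN_TOKENS * K27_CACHE_HIT_INPUT + OUT_TOKENS * kimi_out) / 1_000_000
print(f"K2.7 Code with all input cached: ${fully_cached:.2f}")
```

The output-price ratios come out at 6.25 for Opus 4.8, 7.5 for GPT-5.5 and 12.5 for Fable 5, which is where the "around six times" and "more than twelve times" figures come from.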

For an agentic coding workload, where the model emits large volumes of tokens across many turns, that output gap compounds fast. A task that costs a few pounds on a closed frontier model can cost well under a pound on K2.7 Code, and nothing per token at all if you self-host. The trade is quality: you are paying far less for a model that is behind on most benchmarks, often by only a few points, which for a great deal of real coding work is an easy trade to accept.

How It Compares

Versus Claude Opus 4.8. Opus 4.8 remains ahead on most coding and agentic benchmarks, sometimes by several points, and it has the deeper safety testing and ecosystem of a closed flagship. But K2.7 Code beats it on MCP Mark Verified, runs at roughly a sixth of the output price, and can be self-hosted. For a team that values cost and data control over the last few points of capability, the case for K2.7 Code is real.

Versus GPT-5.5. GPT-5.5 is the strongest of this group on Moonshot's own table, leading on four of the six listed benchmarks: Kimi Code Bench v2, Program Bench, MCP Mark Verified and Kimi Claw 24/7 Bench. K2.7 Code does not catch it on quality on any of the six. It competes purely on openness and price, and on those two axes it wins comfortably.

Versus GLM 5.2. The other heavyweight open-weights coding model of 2026 is Zhipu's GLM line, covered in our GLM 5.2 analysis. The two are direct rivals for the title of best open coding model, and the choice between them increasingly comes down to harness fit, licence terms and which one your hosting provider serves cheaply. Both prove the same point: the open-weights gap to the closed frontier has shrunk to a margin, not a chasm.

Versus DeepSeek. DeepSeek's MoE models were the breakout open releases of 2025 and set the template for cheap, sparse, downloadable frontier-class models. K2.7 Code is the coding-specialised continuation of that lineage from a different lab, and its 1T MoE body, aggressive pricing and permissive licence are all recognisably in the DeepSeek mould. The competition between these Chinese open labs is what keeps pushing open-weights coding quality up and prices down.

Limitations

  • Vendor-only benchmarks: every score is Moonshot's own, on Moonshot's suites. Until independent SWE-bench Verified or LiveCodeBench runs land, the comparisons to Opus 4.8 and GPT-5.5 cannot be verified by a neutral party.
  • Coding-specialised and thinking-only: the model has no non-thinking mode and is tuned for software work, so it is not a drop-in general assistant and its always-on reasoning adds latency for trivial queries.
  • Heavy to self-host: a 1T-parameter MoE is large. Running the full weights needs serious GPU memory, and quantised community builds trade some quality for accessibility.
  • Behind on raw quality: on most listed benchmarks it trails both closed leaders. The win is price and openness, not a clean sweep of the leaderboard.
  • Licence threshold: the Modified MIT attribution clause only bites at very large scale, but operators near that threshold should read the terms carefully.
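
The self-hosting point can be put in rough numbers. The sketch below estimates the size of the weights alone at a few uniform precisions. The bit widths are illustrative assumptions, not the formats Moonshot or the community builds actually ship, and the figures exclude the KV cache and activations.

```python
TOTAL_PARAMS = 1_000_000_000_000  # 1T total parameters, per the model card figures


def weight_footprint_gb(bits_per_param, params=TOTAL_PARAMS):
    """Approximate size of the weights alone, in decimal gigabytes."""
    return params * bits_per_param / 8 / 1e9


# Illustrative uniform precisions; real quantised builds often mix bit widths.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>6}: about {weight_footprint_gb(bits):,.0f} GB of weights")
```

Although only 32 billion parameters are active per token, any expert may be selected, so in a straightforward deployment the full set of weights still has to be resident or quickly reachable.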

Who Should Use It

Use Kimi K2.7 Code if cost or data control is a priority. Teams running high-volume agentic coding, where output tokens dominate the bill, will see dramatic savings against the closed frontier, and the open weights let privacy-sensitive organisations keep everything in-house. It is also the natural choice for anyone who wants to fine-tune a frontier-class coding model on their own codebase.

Stick with a closed model if you need the absolute top of the coding leaderboard, the deepest safety and reliability track record, or a polished general assistant rather than a coding-specialist agent. On pure quality, Opus 4.8 and GPT-5.5 still lead on most of Moonshot's own numbers, and they remove the operational burden of hosting a 1T model. As ever with agentic tools, deploy with scoped permissions and human checkpoints for irreversible actions, whichever model sits underneath.

Frequently Asked Questions

Does Kimi K2.7 Code beat Claude Opus 4.8 and GPT-5.5?

Not across the board. On Moonshot's figures it trails both on most coding tests (Kimi Code Bench v2 62.0 against 67.4 and 69.0; Program Bench 53.6 against 63.8 and 69.1) but beats Opus 4.8 on MCP Mark Verified, 81.1 against 76.4, although GPT-5.5 leads that test at 92.9. On those same figures it is the strongest open coding model and competitive on quality, while winning clearly on price. There are no independent third-party scores yet.

What is the architecture of Kimi K2.7 Code?

A 1 trillion parameter Mixture-of-Experts model with 32 billion active per token, 384 experts (8 selected) plus 1 shared, 61 layers, Multi-head Latent Attention, SwiGLU, a 160K vocabulary, a 256K context window and a MoonViT vision encoder. It always runs with thinking enabled and uses about 30 percent fewer thinking tokens than K2.6.

Is Kimi K2.7 Code free and open source?

The weights are open on Hugging Face at moonshotai/Kimi-K2.7-Code under a Modified MIT Licence, free to download, self-host and use commercially. The only added condition is an attribution clause for very large deployments (roughly 100 million monthly users or 20 million US dollars monthly revenue), who must display Kimi in the interface.

How much does the Kimi K2.7 Code API cost?

About £0.75 (0.95 US dollars) per million input tokens on a cache miss, £0.15 (0.19 US dollars) on a cache hit, and £3.15 (4.00 US dollars) per million output. That undercuts Opus 4.8 (5 and 25 US dollars) and GPT-5.5 (5 and 30), and is more than twelve times cheaper than Claude Fable 5 on output. Self-hosting removes per-token cost.

What is the catch with Kimi K2.7 Code?

Benchmarks are vendor-run on Moonshot's own suites, not public leaderboards. The model is coding-specialised and thinking-only, so not a general assistant. And the 1T MoE weights are heavy to self-host, so most users will rely on the API or a hosting provider.

The Bottom Line

Kimi K2.7 Code is the clearest sign yet that open-weights coding models have caught the closed frontier on value, if not quite on raw capability. It is behind Opus 4.8 and GPT-5.5 on most of Moonshot's own benchmarks, usually by single digits, ahead of Opus 4.8 on one agentic test, and a fraction of the price of both. For high-volume agentic coding, for fine-tuning, and for anyone who needs the model to run inside their own walls, that combination is hard to argue with.

The honest caveat stands: the numbers are Moonshot's own and want independent confirmation. But the direction of travel is unmistakable. A downloadable, permissively licensed, trillion-parameter coding model that gets close to Opus 4.8 and GPT-5.5 on most tests, at roughly a sixth to an eighth of their output cost and one twelfth of Fable 5's, is a genuinely new option for developers. The closed labs still lead on quality. Moonshot has made them lead on quality alone.

Last updated: June 2026. This review draws on Moonshot AI's Kimi K2.7 Code model card on Hugging Face and its official documentation; benchmark figures are Moonshot's own reported results and may be refined as independent evaluations land.

AI Tools Review Editorial Team
