AI & LLMs · June 13, 2026 · 14 min read

Kimi K2.7 Code Developer Guide: Benchmarks, API & Hermes Agent

Moonshot AI's Kimi K2.7 Code is an open-source, coding-focused 1T-parameter MoE model that cuts thinking tokens by roughly 30% versus K2.6. This guide covers the architecture, the Kimi Code Bench gains, the 256K context window, Moonshot API pricing, and how to wire it into Hermes Agent as a self-improving coding backend.

Lushbinary Team


AI & Cloud Solutions


Moonshot AI shipped Kimi K2.7 Code on June 12, 2026, and it landed as one of the most capable open-source coding models available. Built on top of Kimi K2.6 and released under a Modified MIT license, it pairs a trillion-parameter Mixture-of-Experts backbone with a coding-first training recipe aimed squarely at long-horizon programming, frontend work, DevOps, and performance optimization. For teams that want frontier-class coding assistance without committing to a fully closed model, it is a serious option.

This guide walks through what changed versus K2.6, the architecture that makes it tick, the benchmark gains Moonshot reports, why its token efficiency matters for your bill, and exactly how to call it from your own code. We also cover how to wire it into the open-source Hermes Agent so you get a self-improving coding agent backed by an open model you can inspect and self-host.

If you are evaluating Kimi K2.7 Code against closed alternatives, you may also want our Kimi K2.7 Code vs Claude Fable 5 and GPT-5.5 coding comparison, and if cost is your main concern, the cost optimization and token efficiency guide digs deeper into the math.

1. What Is Kimi K2.7 Code

Kimi K2.7 Code is Moonshot AI's coding-focused agentic model, released on June 12, 2026. Moonshot is a Beijing-based AI lab, and the model continues their pattern of shipping open weights: K2.7 Code is open-source under a Modified MIT license, with the weights published on Hugging Face as moonshotai/Kimi-K2.7-Code. That means you can download, inspect, fine-tune, and self-host the model rather than relying solely on a hosted API.

The "Code" in the name is not marketing. This is a variant built specifically for software engineering work, trained on top of the general-purpose Kimi K2.6 release. Moonshot tuned it for long-horizon programming tasks, frontend development, DevOps automation, and performance optimization, the kinds of jobs that require an agent to hold a lot of context and stay coherent across many steps.

Compared to K2.6, the headline changes are threefold: stronger coding benchmark scores, materially better token efficiency, and a continued commitment to open weights. K2.6 was already a capable general model. K2.7 Code takes that foundation and specializes it, trimming the tendency to overthink while pushing coding accuracy higher. The result is a model that is both more accurate on code tasks and cheaper to run per task, which is a rare combination.

Quick facts

  • Released June 12, 2026 by Moonshot AI.
  • Open-source under a Modified MIT license.
  • Mixture-of-Experts with 1 trillion total parameters and 32 billion active per token.
  • 256K token context.
  • Native multimodal input (text and vision).
  • Always runs in thinking mode.

2. Architecture: 1T-Parameter MoE, 256K Context, Multimodal, Always-Thinking

Kimi K2.7 Code uses a Mixture-of-Experts (MoE) architecture with 1 trillion total parameters, of which 32 billion are active per token. The MoE design is what makes a model this large practical to run: only a small fraction of the parameters are engaged for any given token, so you get the knowledge capacity of a trillion-parameter model with per-token compute closer to that of a 32-billion-parameter dense model. For inference economics, that ratio matters a lot.
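To make the total-versus-active distinction concrete, here is a toy top-k routing sketch. It is a generic illustration of how an MoE layer picks a few experts per token, not Moonshot's router: the four scalar "experts", the gate logits, and k = 2 are all made up. The last line works out the active share from this guide's figures, 32 billion of 1 trillion, or about 3.2%.

```javascript
// Toy top-k Mixture-of-Experts routing, for illustration only.
// Generic MoE sketch, NOT Moonshot's router: the expert count, k, and the
// scalar "experts" below are hypothetical.

// Numerically stable softmax over an array of logits.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Route one token: rank experts by gate logit, keep the top k, renormalize
// their gate weights with a softmax, and combine only those experts' outputs.
function routeToken(input, gateLogits, experts, k) {
  const ranked = gateLogits
    .map((logit, index) => ({ logit, index }))
    .sort((a, b) => b.logit - a.logit)
    .slice(0, k);
  const weights = softmax(ranked.map((r) => r.logit));
  let output = 0;
  ranked.forEach((r, i) => {
    output += weights[i] * experts[r.index](input); // only k experts are evaluated
  });
  return { output, activeExperts: ranked.map((r) => r.index) };
}

// Share of parameters touched per token.
function activeFraction(activeParams, totalParams) {
  return activeParams / totalParams;
}

// Four hypothetical scalar "experts"; a real expert is a feed-forward block.
const experts = [(x) => x + 1, (x) => 2 * x, (x) => x - 3, (x) => x * x];
const result = routeToken(3, [0.1, 2.0, -1.0, 1.5], experts, 2);
console.log(result.activeExperts); // [ 1, 3 ]
console.log(activeFraction(32e9, 1e12)); // 0.032, i.e. about 3.2%
```

Only the two selected experts run for this token; the other two cost nothing, which is the same effect that lets a 1T-parameter model run with 32B active.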

The context window is 256K tokens. For coding work that is enough to hold a substantial slice of a real repository in context at once: multiple source files, the relevant tests, configuration, and a long back-and-forth conversation. Long-horizon agentic tasks live or die on context, and 256K gives an agent room to plan, read, edit, and verify without constantly losing the thread.
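As a rough planning aid, the sketch below estimates whether a set of files fits in the window before you send them. It assumes about four characters per token, reads 256K as 256,000 tokens, and holds back a hypothetical 32,000 tokens for thinking and the answer; all three are assumptions, so use a real tokenizer and Moonshot's documented limit for anything precise. The helper names are hypothetical.

```javascript
// Rough check of which files fit in the 256K-token window.
// Assumptions (not from Moonshot): ~4 characters per token as a crude
// heuristic, 256K read as 256,000 tokens, and a hypothetical reserve for
// the model's thinking and answer.

const CONTEXT_WINDOW = 256000;
const CHARS_PER_TOKEN = 4;

function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Walk files in priority order; include each one that still fits in the
// remaining budget and skip the ones that do not.
function packContext(files, reserveForOutput = 32000) {
  const budget = CONTEXT_WINDOW - reserveForOutput;
  const included = [];
  const skipped = [];
  let used = 0;
  for (const file of files) {
    const cost = estimateTokens(file.content);
    if (used + cost <= budget) {
      included.push(file.path);
      used += cost;
    } else {
      skipped.push(file.path);
    }
  }
  return { included, skipped, usedTokens: used, budget };
}

// Hypothetical repository slice, listed from most to least important.
const files = [
  { path: "src/app.ts", content: "x".repeat(400000) }, // ~100K tokens
  { path: "src/app.test.ts", content: "x".repeat(200000) }, // ~50K tokens
  { path: "vendor/big.js", content: "x".repeat(600000) }, // ~150K tokens
];
console.log(packContext(files));
```

Here the source and its tests fit, while the vendored bundle is left out; ordering files by importance is what decides which ones make the cut.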

K2.7 Code is natively multimodal, accepting both text and vision input. In a coding context, that means you can hand it a screenshot of a broken UI, a design mockup, an architecture diagram, or a chart and have it reason about the image alongside your code. For frontend work in particular, being able to point at a rendered page and describe the change you want is a meaningful workflow upgrade.
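A minimal sketch of attaching a screenshot to a request, assuming Moonshot accepts OpenAI-style `image_url` content parts with base64 data URLs for kimi-k2.7-code; check Moonshot's docs for supported image formats and size limits. The helper name `imageMessage` and the file name in the usage comment are hypothetical.

```javascript
// Build an OpenAI-style chat message that pairs an image with a text
// instruction. Assumption: the endpoint accepts "image_url" content parts
// carrying base64 data URLs for this model.
function imageMessage(instruction, imageBase64, mimeType = "image/png") {
  return {
    role: "user",
    content: [
      { type: "image_url", image_url: { url: `data:${mimeType};base64,${imageBase64}` } },
      { type: "text", text: instruction },
    ],
  };
}

// Usage with the OpenAI-compatible client from the API section
// (requires `import fs from "node:fs"`; the file name is hypothetical):
//   const png = fs.readFileSync("broken-navbar.png").toString("base64");
//   await client.chat.completions.create({
//     model: "kimi-k2.7-code",
//     messages: [imageMessage("Fix the overlapping nav items in this screenshot.", png)],
//   });

const msg = imageMessage("Describe the layout bug.", "iVBORw0KGgo=");
console.log(msg.content[0].image_url.url); // data:image/png;base64,iVBORw0KGgo=
```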

Finally, the model always runs in thinking mode. There is no separate fast non-reasoning path to toggle: every response goes through an internal reasoning process before it answers. That is what makes the token-efficiency work in the next sections so important. Always-on thinking improves quality on hard tasks, but it also means the number of reasoning tokens the model spends directly shapes your cost and latency.

3. Benchmarks: Kimi Code Bench v2, Program Bench, MLS Bench Lite

Moonshot reports gains across three coding-oriented benchmarks relative to K2.6. These are vendor-reported numbers, so treat them with the same caution as any first-party benchmark, but they are consistent with the model's coding-specialized training.

| Benchmark | Improvement over K2.6 | What it measures |
|---|---|---|
| Kimi Code Bench v2 | +21.8% | General coding and agentic software tasks |
| Program Bench | +11.0% | Program synthesis and correctness |
| MLS Bench Lite | +31.5% | Multi-language coding tasks |

The standout figure is the 31.5% jump on MLS Bench Lite, which tracks multi-language coding. If your codebase spans several languages, a model that improved most on multi-language tasks is worth a close look. The 21.8% gain on Kimi Code Bench v2 is the broadest signal, since that benchmark covers general coding and agentic workflows, while the 11.0% gain on Program Bench reflects steadier improvement on raw program synthesis and correctness.

As always, your own evaluation on your own tasks beats any leaderboard. Benchmarks are a starting point for shortlisting, not a substitute for running the model against your real workload.
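A first pass at your own evaluation can be as small as the sketch below: record pass or fail per task for each model, then compare pass rates. The task names and results are hypothetical placeholders; in a real harness each result would come from applying the model's patch and running your test suite. Note that it reports a relative change, which is not the same as a percentage-point difference.

```javascript
// Minimal sketch: compare two models' pass rates on your own tasks.
// Task ids and results are hypothetical placeholders.

function passRate(results) {
  if (results.length === 0) throw new Error("no results to score");
  const passed = results.filter((r) => r.passed).length;
  return passed / results.length;
}

// Relative change, e.g. 0.5 -> 0.75 is +50% relative (but +25 points).
function relativeImprovement(baselineRate, candidateRate) {
  return (candidateRate - baselineRate) / baselineRate;
}

const baseline = [
  { task: "fix-null-check", passed: true },
  { task: "add-pagination", passed: false },
  { task: "port-to-go", passed: false },
  { task: "speed-up-query", passed: true },
];
const candidate = [
  { task: "fix-null-check", passed: true },
  { task: "add-pagination", passed: true },
  { task: "port-to-go", passed: false },
  { task: "speed-up-query", passed: true },
];

const b = passRate(baseline); // 0.5
const c = passRate(candidate); // 0.75
console.log(`baseline ${b}, candidate ${c}, relative ${relativeImprovement(b, c)}`);
```

Four tasks is far too few to trust; the point is the shape of the harness, which scales to a few hundred tasks drawn from your own backlog.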

4. Token Efficiency: 30% Fewer Thinking Tokens

Because K2.7 Code always reasons before answering, the cost of a task is dominated by how many thinking tokens it generates. Moonshot reports that K2.7 Code uses roughly 30% fewer thinking and reasoning tokens than K2.6 for comparable tasks. The model overthinks less and reaches a confident answer with a shorter internal chain.

This matters for two reasons: cost and latency. Reasoning tokens are billed at the output rate, so cutting them reduces your bill on the most expensive part of every request. Fewer tokens also means the model finishes sooner, which lowers end-to-end latency for interactive coding sessions where a developer is waiting on each response.

  • Lower cost per task: fewer output-billed reasoning tokens for the same result.
  • Lower latency: shorter internal chains mean faster responses in interactive sessions.
  • Higher effective throughput: an agent that spends fewer tokens per step can complete more steps within the same budget and the same context window.

For long-horizon agentic runs, where a single task can involve dozens of model calls, a 30% reduction in reasoning tokens compounds. It is one of the most practical reasons to prefer K2.7 Code over its predecessor for production agents. For a deeper treatment of the cost math, see our cost optimization and token efficiency guide.
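To see how the reduction flows through to a bill, here is a back-of-the-envelope sketch at the Moonshot list prices covered in the pricing section ($0.95 per million input tokens, $4.00 per million output tokens), with reasoning tokens billed as output. The per-call token counts and the 40-call run are hypothetical, Moonshot's roughly 30% figure is applied uniformly for illustration, and prices are held constant to isolate the token effect.

```javascript
// Back-of-the-envelope agent-run cost, USD, at the list prices in this guide.
// Reasoning tokens are treated as output-billed. Token counts are hypothetical.
const INPUT_PER_M = 0.95;
const OUTPUT_PER_M = 4.0;

function callCost({ inputTokens, reasoningTokens, answerTokens }) {
  const outputTokens = reasoningTokens + answerTokens;
  return (inputTokens * INPUT_PER_M + outputTokens * OUTPUT_PER_M) / 1e6;
}

function runCost(call, calls) {
  return callCost(call) * calls;
}

// Same hypothetical call, before and after a 30% cut in reasoning tokens.
const k26Call = { inputTokens: 20000, reasoningTokens: 6000, answerTokens: 1000 };
const k27Call = { ...k26Call, reasoningTokens: Math.round(k26Call.reasoningTokens * 0.7) };

const calls = 40; // one long-horizon task
const before = runCost(k26Call, calls);
const after = runCost(k27Call, calls);
console.log(before.toFixed(4), after.toFixed(4), (1 - after / before).toFixed(3));
```

With these made-up numbers the run drops from about $1.88 to about $1.59, roughly 15% rather than 30%, because input tokens and the visible answer are unchanged. The larger the share of a call's tokens that goes to reasoning, the closer the saving gets to the full 30%.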

5. Pricing & Where to Access It

On the Moonshot API, Kimi K2.7 Code is priced as follows. Cache-hit pricing applies to cached input tokens, which is a substantial saving for agentic workloads that resend a large stable prompt prefix across many calls.

| Token type | Price (USD per million tokens) |
|---|---|
| Input | $0.95 |
| Output | $4.00 |
| Cache-hit (cached input) | $0.19 |
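A minimal sketch of what cache-hit pricing is worth for an agent that resends the same prefix on every call, using the prices in the table above. It assumes the whole prefix (system prompt, tool definitions, repository context) is billed at the cache-hit rate from the second call onward; real cache behavior, such as expiry and what counts as a hit, is defined by the provider. Token counts are hypothetical.

```javascript
// Run cost with and without prompt caching, USD.
// Assumption: the stable prefix is a cache hit on every call after the first.
const PRICE = { input: 0.95, output: 4.0, cacheHit: 0.19 }; // USD per 1M tokens

function runCost({ prefixTokens, freshInputTokens, outputTokens, calls }, cached) {
  let total = 0;
  for (let i = 0; i < calls; i++) {
    const prefixRate = cached && i > 0 ? PRICE.cacheHit : PRICE.input;
    total += prefixTokens * prefixRate; // stable prefix
    total += freshInputTokens * PRICE.input; // per-step content, never cached
    total += outputTokens * PRICE.output; // thinking + answer
  }
  return total / 1e6;
}

// Hypothetical 30-call agent run with a 50K-token stable prefix.
const run = { prefixTokens: 50000, freshInputTokens: 2000, outputTokens: 3000, calls: 30 };
console.log(runCost(run, false).toFixed(2), runCost(run, true).toFixed(2));
```

For this hypothetical run, caching cuts the total from about $1.84 to about $0.74. The practical lesson is to keep the stable part of the prompt at the front and unchanged between calls, and append per-step content after it.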

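You can reach the model through several channels, which gives you flexibility on billing, routing, and edge deployment:

  • Moonshot API: OpenAI-compatible base URL https://api.moonshot.ai/v1, model id kimi-k2.7-code.
  • OpenRouter: model id moonshotai/kimi-k2.7-code.
  • Cloudflare Workers AI: model id @cf/moonshotai/kimi-k2.7-code.
  • Vercel AI Gateway.
  • Hugging Face: open weights at moonshotai/Kimi-K2.7-Code for self-hosting, under a Modified MIT license.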

Separately from the raw API, Moonshot offers Kimi Code, a terminal-first coding agent, with membership plans starting at $19 per month. API usage is billed separately from the membership. If you want to drive the model from a polished CLI rather than build your own harness, see our Kimi Code CLI developer guide.

6. Calling the API

The Moonshot API is OpenAI-compatible, so you can use the official OpenAI SDKs or any OpenAI-compatible client by overriding the base URL. Get an API key at platform.moonshot.ai and point the client at https://api.moonshot.ai/v1.

Here is a minimal JavaScript example using the OpenAI SDK:

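// Requires the openai package (npm install openai). Run as an ES module
// (for example, save as main.mjs) so the top-level await below is allowed.
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.MOONSHOT_API_KEY, // key from platform.moonshot.ai
  baseURL: "https://api.moonshot.ai/v1", // Moonshot's OpenAI-compatible endpoint
});

const response = await client.chat.completions.create({
  model: "kimi-k2.7-code",
  messages: [
    { role: "system", content: "You are a senior software engineer." },
    { role: "user", content: "Refactor this function for readability and add tests." },
  ],
});

// Print the final answer text of the first choice.
console.log(response.choices[0].message.content);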

And the same request with curl:

curl https://api.moonshot.ai/v1/chat/completions \
  -H "Authorization: Bearer $MOONSHOT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2.7-code",
    "messages": [
      { "role": "user", "content": "Write a TypeScript debounce function with tests." }
    ]
  }'
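For interactive sessions, streaming lets a developer watch the answer arrive instead of waiting for the whole response, which matters for a model that always thinks first. Below is a sketch using the OpenAI SDK's `stream: true` option, assuming Moonshot streams standard chat-completions chunks. The client is passed in so the function can be tested with a fake. It collects only `delta.content`, so if the endpoint streams reasoning in a separate field, that text is ignored here. The helper name `streamCompletion` is hypothetical.

```javascript
// Stream a chat completion and return the full answer text.
// Assumption: the endpoint emits standard chat-completions chunks.
async function streamCompletion(client, prompt, onText = (t) => process.stdout.write(t)) {
  const stream = await client.chat.completions.create({
    model: "kimi-k2.7-code",
    messages: [{ role: "user", content: prompt }],
    stream: true,
  });
  let full = "";
  for await (const chunk of stream) {
    // Chunks without answer text (e.g. role-only or final chunks) are skipped.
    const text = chunk.choices[0]?.delta?.content ?? "";
    if (text) {
      full += text;
      onText(text);
    }
  }
  return full;
}

// Usage with the client from the JavaScript example above:
//   const answer = await streamCompletion(client, "Explain what the regex ^a+$ matches.");
```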

Because the surface is OpenAI-compatible, most existing tooling that targets the chat completions endpoint works with a base URL and model id swap. That is also what makes the model so easy to drop into agent frameworks, which we cover next.
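In practice the swap can live in a small lookup table. The base URLs, model ids, and API key variable names below come from this guide's examples; the `resolveProvider` helper and the `LLM_PROVIDER` variable in the usage comment are hypothetical.

```javascript
// Provider table: switching providers is a base URL and model id change.
const PROVIDERS = {
  moonshot: {
    baseURL: "https://api.moonshot.ai/v1",
    model: "kimi-k2.7-code",
    apiKeyEnv: "MOONSHOT_API_KEY",
  },
  openrouter: {
    baseURL: "https://openrouter.ai/api/v1",
    model: "moonshotai/kimi-k2.7-code",
    apiKeyEnv: "OPENROUTER_API_KEY",
  },
};

// Look up a provider and read its key from the environment, failing loudly
// on an unknown provider or a missing key.
function resolveProvider(name, env = process.env) {
  const provider = PROVIDERS[name];
  if (!provider) throw new Error(`Unknown provider: ${name}`);
  const apiKey = env[provider.apiKeyEnv];
  if (!apiKey) throw new Error(`Missing ${provider.apiKeyEnv}`);
  return { apiKey, baseURL: provider.baseURL, model: provider.model };
}

// Usage with the OpenAI SDK (LLM_PROVIDER is a hypothetical variable):
//   const { apiKey, baseURL, model } = resolveProvider(process.env.LLM_PROVIDER ?? "moonshot");
//   const client = new OpenAI({ apiKey, baseURL });
//   await client.chat.completions.create({ model, messages });
```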

7. Using Kimi K2.7 Code with Hermes Agent

Hermes Agent is an open-source, self-improving AI agent framework from Nous Research, hosted at github.com/nousresearch/hermes-agent. It runs in your terminal as a TUI, in messaging platforms, and inside IDEs. What sets it apart is a built-in learning loop: it creates skills from experience and persists memory across sessions, so the agent gets better at your specific workflows over time.

Hermes is provider-agnostic. You configure the model with the hermes model command, and it supports OpenAI-compatible endpoints and custom providers. That makes pointing it at Kimi K2.7 Code straightforward: target Moonshot's OpenAI-compatible endpoint with the model id kimi-k2.7-code, or route through OpenRouter using moonshotai/kimi-k2.7-code. There is no need for any framework-specific Kimi integration; the OpenAI-compatible surface is the bridge.

A self-improving agent paired with an open model is a powerful combination. The agent compounds skills and memory across runs, while the open weights mean you can inspect the model, self-host it for privacy or cost control, and avoid being locked to a single closed vendor. You own both halves of the stack.

Here is an example of configuring Hermes to use Kimi K2.7 Code through the Moonshot endpoint as a custom OpenAI-compatible provider:

# Illustrative only: subcommands and flags may differ in your Hermes version.
# Configure a custom OpenAI-compatible provider
export MOONSHOT_API_KEY="sk-..."

hermes model add moonshot \
  --base-url "https://api.moonshot.ai/v1" \
  --api-key-env MOONSHOT_API_KEY \
  --model "kimi-k2.7-code"

# Select it as the active model
hermes model use moonshot/kimi-k2.7-code

# Or route through OpenRouter instead (needs your OpenRouter key)
export OPENROUTER_API_KEY="sk-or-..."

hermes model add openrouter \
  --base-url "https://openrouter.ai/api/v1" \
  --api-key-env OPENROUTER_API_KEY \
  --model "moonshotai/kimi-k2.7-code"

Once configured, Hermes gives you three coding modes to choose from depending on the task:

  • Direct execution: the agent writes and runs code itself through its execute_code capability.
  • Delegation: the agent hands off to external coding CLIs such as Claude Code or OpenCode when that is the better tool for the job.
  • Structured planning: the agent uses bundled skills to break a large task into a plan before executing.

The exact provider flags can evolve, so confirm the current commands in the Hermes Agent documentation. For a full walkthrough, see our Hermes Agent autonomous coding setup guide and the broader Hermes Agent developer guide.

8. When to Choose Kimi K2.7 Code

Kimi K2.7 Code is a strong default when these things matter to you:

  • Open weights: you want the option to inspect, fine-tune, or self-host rather than depending only on a hosted API.
  • Long-horizon agentic coding: multi-step tasks that benefit from the 256K context and efficient reasoning.
  • Multi-language codebases: the largest reported gain is on multi-language tasks.
  • Cost-sensitive production agents: fewer reasoning tokens and cache-hit pricing keep per-task costs down.
  • Vision-assisted frontend work: native multimodal input lets you reason about screenshots and mockups.

It may not be the right pick when:

  • You need a non-reasoning fast path for trivial, latency-critical calls, since the model always thinks before answering.
  • Your task is general, non-coding work where a broader generalist model may fit better than a coding-specialized one.
  • You are contractually tied to a specific closed vendor and cannot add another provider to your stack.

To weigh it head-to-head against closed alternatives, see our Kimi K2.7 Code vs Claude Fable 5 and GPT-5.5 comparison.

9. Why Lushbinary for AI Coding Agent Builds

Picking a model is the easy part. Turning it into a reliable coding agent that ships real work is where most projects stall. Lushbinary builds production AI coding agents end to end, from model selection and provider routing to the agent harness, memory, evaluation, and the AWS infrastructure that runs it.

Here is what we bring to a Kimi-powered agent build:

  • Model and provider strategy: we match the workload to the right model and route across Moonshot, OpenRouter, Cloudflare Workers AI, or a self-hosted deployment for cost and reliability.
  • Agent architecture: we design the harness, tool calling, memory, and self-improvement loop, including framework choices like Hermes Agent.
  • Token efficiency and cost control: we exploit cache-hit pricing and prompt design to keep your bill predictable.
  • Evaluation: we build the test harness that proves the agent works on your tasks, not just on a leaderboard.
  • AWS infrastructure: production deployment with isolation, monitoring, and auto-scaling.

🚀 Free Consultation

Thinking about building a Kimi K2.7 Code agent? Lushbinary offers a free consultation for AI coding agent projects. Tell us what you want to automate and we will scope the right model and agent stack with you, no obligation.

Ready to Build Something Great?

Get a free 30-minute strategy call. We'll map out your project, timeline, and tech stack - no strings attached.


❓ Frequently Asked Questions

What is Kimi K2.7 Code and who made it?

Kimi K2.7 Code is a coding-focused agentic large language model released on June 12, 2026 by Moonshot AI, a Beijing-based AI lab. It is built on Kimi K2.6 and is open-source under a Modified MIT license, with weights published on Hugging Face at moonshotai/Kimi-K2.7-Code. It uses a Mixture-of-Experts architecture with 1 trillion total parameters and 32 billion active per token, a 256K token context window, native multimodal input, and always runs in thinking mode.

How is Kimi K2.7 Code different from K2.6?

Kimi K2.7 Code improves on K2.6 with measurable coding gains and better token efficiency. Moonshot reports a 21.8% improvement on Kimi Code Bench v2, an 11.0% improvement on Program Bench, and a 31.5% improvement on MLS Bench Lite for multi-language tasks. It also uses about 30% fewer thinking and reasoning tokens than K2.6, which reduces overthinking, inference cost, and latency.

How much does Kimi K2.7 Code cost?

On the Moonshot API, Kimi K2.7 Code is priced at $0.95 per million input tokens, $4.00 per million output tokens, and $0.19 per million cache-hit (cached input) tokens. Moonshot also offers Kimi Code, a terminal-first coding agent, with membership plans starting at $19 per month, billed separately from API usage. The model is also available through OpenRouter, Cloudflare Workers AI, and the Vercel AI Gateway.

Can I use Kimi K2.7 Code with the Hermes Agent?

Yes. Hermes Agent is an open-source self-improving AI agent framework from Nous Research that is provider-agnostic and supports OpenAI-compatible endpoints and custom providers. You configure the model with the hermes model command and point it at Moonshot's OpenAI-compatible endpoint using the model id kimi-k2.7-code, or route through OpenRouter using moonshotai/kimi-k2.7-code. Hermes offers three coding modes: direct execute_code, delegation to external coding CLIs such as Claude Code or OpenCode, and structured planning via bundled skills.

Where can I access Kimi K2.7 Code?

Kimi K2.7 Code is available on the Moonshot API at the OpenAI-compatible base URL https://api.moonshot.ai/v1 with the model id kimi-k2.7-code. It is also available on OpenRouter as moonshotai/kimi-k2.7-code, on Cloudflare Workers AI as @cf/moonshotai/kimi-k2.7-code, and through the Vercel AI Gateway. The open weights can be downloaded from Hugging Face at moonshotai/Kimi-K2.7-Code under a Modified MIT license.

📚 Sources

Content was rephrased for compliance with licensing restrictions. Benchmark and pricing data sourced from Moonshot AI and Hugging Face as of June 2026. Figures may change; always verify on the vendor's website.

Build Your Kimi-Powered Coding Agent

Tell us what you want to automate and we will scope it with the right model and agent stack.

