MCP is the USB-C port for AI. Before MCP, every AI host (Claude Desktop, Cursor, Kiro, Cline, Hermes Agent) invented its own plugin system. Every tool vendor shipped three or four bespoke adapters. Today, a single MCP server written in TypeScript or Python works across Claude, Cursor, VS Code, Kiro, OpenCode, OpenClaw, Hermes Agent, Gemma 4, and any other MCP-aware host.
That's the promise. Actually building a custom MCP server, shipping it safely, and wiring it into multiple AI hosts takes a bit more care. This guide covers the core architecture, a working TypeScript and Python server, wiring into Claude Desktop, Cursor, Kiro, Gemma 4 through gemma-mcp, and Hermes Agent, plus the security practices you need for production.
If you've been frustrated that your AI agent doesn't know your internal APIs, your company's database schema, or your custom deployment pipeline, MCP is the fix. Build once, expose everywhere.
📑 What This Guide Covers
- What an MCP Server Actually Is
- Architecture: Tools, Resources, Prompts
- TypeScript Server in 60 Lines
- Python Server with FastMCP
- Transports: stdio vs Streamable HTTP
- Wiring Into Claude, Cursor, Kiro, VS Code
- Gemma 4 + MCP via gemma-mcp
- Hermes Agent and OpenClaw as MCP Clients
- Production Security and Deployment
- How Lushbinary Builds Custom MCP Servers
1. What an MCP Server Actually Is
An MCP server is a standalone program that exposes capabilities to AI hosts through the Model Context Protocol. The protocol is a structured JSON-RPC 2.0 exchange over one of two transports: stdio (for local servers) or Streamable HTTP (for remote servers).
📊 MCP by the Numbers (May 2026)
Latest spec 2025-11-25 · Donated to Linux Foundation Dec 2025 · 13,000+ community servers · 97M+ SDK downloads · First-class support in Claude, Cursor, VS Code, Kiro, OpenCode, Hermes, OpenClaw, Cline, Gemma 4, DeepSeek V4
An MCP server exposes three primitives:
- Tools: Functions the model can call with arguments. Tools do things (query DB, send email, fetch data).
- Resources: Read-only content the model can pull into context (files, schemas, configs, docs).
- Prompts: Reusable prompt templates users or agents can invoke (for example, "review this PR").
2. Architecture: Tools, Resources, Prompts
The host spawns or connects to a server, negotiates capabilities, then calls tools/list, resources/list, and prompts/list to discover what's available. When the model wants to call a tool, it emits a function call, the host routes it through the MCP client, and the server executes the function and returns the result.
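The discovery flow above is plain JSON-RPC 2.0 underneath. A sketch of the message shapes, using Python dicts (field values are illustrative; check the current spec for exact result fields):

```python
import json

# After the initialize handshake, the host discovers capabilities
# with ordinary JSON-RPC 2.0 requests like this one.
list_tools_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}

# The server answers with tool names, descriptions, and a JSON Schema
# for each tool's arguments (schema shown here is illustrative).
list_tools_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "get_customer",
                "description": "Fetch a customer by ID from the Acme CRM.",
                "inputSchema": {
                    "type": "object",
                    "properties": {"id": {"type": "string", "minLength": 1}},
                    "required": ["id"],
                },
            }
        ]
    },
}

# When the model emits a function call, the host routes it as tools/call.
call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "get_customer", "arguments": {"id": "4421"}},
}

print(json.dumps(call_request, indent=2))
```

The same request/response pattern applies to `resources/list` and `prompts/list`; only the result payload differs.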
3. TypeScript Server in 60 Lines
The official TypeScript SDK (@modelcontextprotocol/sdk) makes a production-ready server trivial. This server exposes a single tool that fetches a customer record from your internal API:
```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({
  name: "acme-crm",
  version: "1.0.0",
});

server.registerTool(
  "get_customer",
  {
    description: "Fetch a customer by ID from the Acme CRM.",
    inputSchema: { id: z.string().min(1) },
  },
  async ({ id }) => {
    const res = await fetch(`https://crm.internal/api/customers/${id}`, {
      headers: { Authorization: `Bearer ${process.env.CRM_TOKEN}` },
    });
    if (!res.ok) throw new Error(`CRM API ${res.status}`);
    const data = await res.json();
    return {
      content: [{ type: "text", text: JSON.stringify(data) }],
    };
  },
);

const transport = new StdioServerTransport();
await server.connect(transport);
```

Things to notice:
- Zod schemas define tool arguments and are automatically surfaced to the AI model as JSON Schema.
- Secrets come from environment variables, never from the tool arguments.
- Return shape is always `{ content: [...] }`. Most servers return a single text block; you can also return images, audio, or resource links.
- Errors thrown inside the handler are serialized back to the client so the model can see the failure and retry.
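That error-handling contract can be mimicked in plain Python: wrap a handler so exceptions come back as an `isError` result instead of crashing the server. A toy sketch of the behavior, not the SDK's actual implementation:

```python
import json

def wrap_tool(handler):
    """Wrap a tool handler so results and exceptions both come back
    in MCP-style shape: {"content": [...]} plus isError on failure."""
    def call(**arguments):
        try:
            data = handler(**arguments)
            return {"content": [{"type": "text", "text": json.dumps(data)}]}
        except Exception as exc:
            # The failure is serialized as content, so the model can
            # read the message and decide whether to retry.
            return {
                "content": [{"type": "text", "text": str(exc)}],
                "isError": True,
            }
    return call

def get_customer(id: str) -> dict:
    if id != "4421":
        raise ValueError(f"CRM API 404 for customer {id}")
    return {"id": id, "name": "Acme Corp"}

tool = wrap_tool(get_customer)
print(tool(id="4421"))  # normal result: content only
print(tool(id="9999"))  # failure: content plus isError: True
```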
4. Python Server with FastMCP
The Python equivalent uses FastMCP, a FastAPI-style decorator API that keeps boilerplate minimal:
```python
from mcp.server.fastmcp import FastMCP
import httpx
import os

mcp = FastMCP("acme-crm")

@mcp.tool()
async def get_customer(customer_id: str) -> dict:
    """Fetch a customer by ID from the Acme CRM."""
    async with httpx.AsyncClient() as client:
        res = await client.get(
            f"https://crm.internal/api/customers/{customer_id}",
            headers={"Authorization": f"Bearer {os.environ['CRM_TOKEN']}"},
        )
        res.raise_for_status()
        return res.json()

if __name__ == "__main__":
    mcp.run(transport="stdio")
```

Docstrings become tool descriptions. Type hints become the JSON Schema. Run the server with `uv run server.py` or package it with `uvx` for one-line installs. For remote deployments, swap the transport to `streamable-http` and front with an auth proxy.
5. Transports: stdio vs Streamable HTTP
| Aspect | stdio | Streamable HTTP |
|---|---|---|
| Use case | Local tools, desktop IDEs | Remote tools, multi-user, team platforms |
| Auth | Parent process inherits permissions | OAuth 2.1 with scoped tokens |
| Scaling | One server per client | Horizontal behind a load balancer |
| Deploy | Binary or uvx command | Docker + load balancer + auth proxy |
| Recommended for | Dev workflows, desktop apps | Shared enterprise services |
Streamable HTTP replaced SSE (Server-Sent Events) as the recommended remote transport in spec 2025-03-26. Use it for any server that needs to be reached by multiple developers, CI/CD pipelines, or production agents.
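For the stdio side, framing is as simple as the table suggests: one JSON-RPC message per line on stdin/stdout. A sketch of both ends using an in-memory stream (a real server reads `sys.stdin` and must keep stdout free of stray prints, which would corrupt the protocol):

```python
import io
import json

def write_message(stream, message: dict) -> None:
    # stdio transport: newline-delimited JSON-RPC, one message per line.
    stream.write(json.dumps(message) + "\n")

def read_message(stream) -> dict:
    return json.loads(stream.readline())

# Simulate the host -> server pipe with an in-memory stream.
pipe = io.StringIO()
write_message(pipe, {"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
pipe.seek(0)

msg = read_message(pipe)
print(msg["method"])  # tools/list
```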
6. Wiring Into Claude, Cursor, Kiro, VS Code
Each host has its own config location, but the shape is nearly identical. Here's a config that works in Claude Desktop, Cursor, and Kiro:
```json
{
  "mcpServers": {
    "acme-crm": {
      "command": "node",
      "args": ["/path/to/acme-crm/dist/index.js"],
      "env": {
        "CRM_TOKEN": "${env:CRM_TOKEN}"
      },
      "disabled": false,
      "autoApprove": ["get_customer"]
    }
  }
}
```

- Claude Desktop: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Cursor: `~/.cursor/mcp.json` (user) or `.cursor/mcp.json` (project)
- Kiro: `~/.kiro/settings/mcp.json` (user) or `.kiro/settings/mcp.json` (workspace). User config is overridden by workspace config.
- VS Code: via the official MCP extension, configured in settings.json under `mcp.servers`.
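The Kiro precedence rule (workspace overrides user) amounts to a shallow merge over the `mcpServers` map. A hypothetical sketch of how a host might resolve it (not any host's actual implementation):

```python
def resolve_servers(user_config: dict, workspace_config: dict) -> dict:
    """Merge two mcpServers maps; workspace entries win over user entries."""
    merged = dict(user_config.get("mcpServers", {}))
    merged.update(workspace_config.get("mcpServers", {}))
    return merged

user = {
    "mcpServers": {"acme-crm": {"command": "node", "autoApprove": []}}
}
workspace = {
    "mcpServers": {"acme-crm": {"command": "node", "autoApprove": ["get_customer"]}}
}

servers = resolve_servers(user, workspace)
print(servers["acme-crm"]["autoApprove"])  # ['get_customer'] — workspace wins
```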
7. Gemma 4 + MCP via gemma-mcp
Open-weight models like Gemma 4 have native function calling via six control tokens. The gemma-mcp Python package maps Gemma 4's native tool call format to MCP tool invocations, which lets you drive any MCP server with a self-hosted Gemma 4 via vLLM.
```shell
# Run Gemma 4 on vLLM with an OpenAI-compatible API
vllm serve google/gemma-4-26b-a4b --port 8000
```

```python
# Then use gemma-mcp to bridge Gemma's native tool calls to MCP
from gemma_mcp import GemmaMcpBridge

bridge = GemmaMcpBridge(
    model_endpoint="http://localhost:8000/v1",
    mcp_servers={"acme-crm": {"command": "node", "args": [...]}},
)
result = await bridge.chat("Find me customer id 4421 and summarize recent orders.")
```

This pattern is central to zero-cost agentic AI. You run Gemma 4 locally or on an EC2 GPU, point it at your MCP servers, and get a full agent loop without paying per-token for Claude or GPT.
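The "full agent loop" the bridge drives is conceptually small: alternate between asking the model for its next action and executing tool calls until the model produces a final answer. A toy, model-free sketch of that loop (every name here is hypothetical, not the gemma-mcp API):

```python
def run_agent(ask_model, call_tool, user_message: str, max_steps: int = 5):
    """Minimal agent loop: feed tool results back until a final answer."""
    history = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        action = ask_model(history)  # model proposes the next step
        if action["type"] == "final":
            return action["content"]
        # Otherwise it's a tool call: execute via MCP, append the result.
        result = call_tool(action["name"], action["arguments"])
        history.append({"role": "tool", "content": result})
    raise RuntimeError("agent exceeded max_steps")

# Stub "model": call the CRM tool once, then answer from its output.
def stub_model(history):
    if history[-1]["role"] == "user":
        return {"type": "tool", "name": "get_customer",
                "arguments": {"id": "4421"}}
    return {"type": "final",
            "content": f"Customer data: {history[-1]['content']}"}

def stub_tool(name, arguments):
    return '{"id": "4421", "name": "Acme Corp"}'

print(run_agent(stub_model, stub_tool, "Find customer 4421."))
```

The bridge's job is to translate the model's native tool-call tokens into the `call_tool` step and to format MCP results back into the history.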
8. Hermes Agent and OpenClaw as MCP Clients
Hermes Agent supports MCP as both a client (connect to external MCP servers) and a server (expose Hermes skills to other AI hosts over MCP) since v0.6.0.
```shell
# Add an MCP server to Hermes
hermes mcp add acme-crm \
  --command node \
  --args /path/to/acme-crm/dist/index.js \
  --env CRM_TOKEN=${CRM_TOKEN}

# Expose Hermes skills as an MCP server to other hosts
hermes mcp serve --port 3999
```

OpenClaw exposes MCP through its ClawHub plugin system. The same custom MCP server you wrote above works as a ClawHub plugin with zero code changes. This is why investing in MCP pays off: one server, many hosts, reusable across your entire agent stack.
9. Production Security and Deployment
🛡️ MCP Security Pitfalls
Every MCP server is a potential privilege escalation. A malicious or compromised server can exfiltrate secrets, prompt inject the model, or trigger destructive actions. Treat MCP servers the way you treat API gateways, not the way you treat utility libraries.
Ten things you should do before shipping:
- Least privilege tokens: Scope the token in the server environment to the minimum API surface needed.
- Validate all inputs: Use Zod or Pydantic to reject bad argument shapes before hitting your internal API.
- Sanitize outputs: Strip PII, secrets, and internal IDs before returning results to the model.
- Rate limit: Enforce per-tool and per-host rate limits to prevent runaway agent loops.
- OAuth 2.1: For remote servers, use OAuth 2.1 with short-lived tokens and refresh flows.
- Audit logs: Log every tool invocation with the calling host, arguments, result, and latency.
- Sandbox: Run each MCP server in a Docker container or microVM with a tight network allowlist.
- Kill switch: Provide a one-click way to disable a compromised server across all hosts.
- Pin dependencies: Lock SDK and dependency versions to avoid supply-chain surprises.
- Prompt-injection defense: Filter tool outputs for instructions that try to hijack the model before they hit the context window.
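Item 4 on that list (rate limiting) can be as small as a token bucket per tool. A minimal stdlib sketch, not tied to any particular SDK:

```python
import time

class ToolRateLimiter:
    """Token bucket per tool: refill `rate` calls/second up to `burst`."""

    def __init__(self, rate: float, burst: int):
        self.rate, self.burst = rate, burst
        self._buckets = {}  # tool name -> (tokens, last timestamp)

    def allow(self, tool: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        tokens, last = self._buckets.get(tool, (float(self.burst), now))
        # Refill proportionally to elapsed time, capped at burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens < 1.0:
            self._buckets[tool] = (tokens, now)
            return False  # deny: bucket empty, agent loop is running hot
        self._buckets[tool] = (tokens - 1.0, now)
        return True

limiter = ToolRateLimiter(rate=1.0, burst=2)
print(limiter.allow("get_customer", now=0.0))  # True  (2 -> 1 token)
print(limiter.allow("get_customer", now=0.0))  # True  (1 -> 0 tokens)
print(limiter.allow("get_customer", now=0.0))  # False (bucket empty)
print(limiter.allow("get_customer", now=2.0))  # True  (refilled)
```

A denied call should return an `isError` result to the model rather than hanging, so a runaway agent loop sees the refusal and backs off.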
For a deeper dive on the protocol itself and the latest spec updates, see our MCP developer guide.
10. How Lushbinary Builds Custom MCP Servers
At Lushbinary, we build custom MCP servers for clients that want their AI agents to reach into internal systems: CRMs, ticketing, billing, observability, deployment pipelines, and bespoke internal tools. The work splits into four phases:
- Scoping: Which tools do your agents actually need? We map the minimum tool surface against real agent tasks.
- Server build: TypeScript or Python server, thorough input validation, output sanitization, audit logging.
- Host wiring: Configure Claude, Cursor, Kiro, Hermes, and any self-hosted Gemma 4 agents to consume the server.
- Hardening: OAuth 2.1, sandboxing, rate limiting, kill switches, runbook for incident response.
🚀 Free Consultation
Want to give your AI agents reliable access to your internal systems? Lushbinary builds custom MCP servers end-to-end, from scoping to hardened production deployment. No obligation.
❓ Frequently Asked Questions
What is an MCP server and why build a custom one?
An MCP server exposes tools, resources, and prompts to AI hosts through the Model Context Protocol. Build a custom one to give your agents access to internal APIs, databases, or bespoke tools that don't have off-the-shelf integrations.
Does Gemma 4 support MCP natively?
Gemma 4 ships with six control tokens for function calling. The gemma-mcp package maps Gemma's native tool call format to MCP invocations, letting self-hosted Gemma 4 drive any MCP server through a vLLM endpoint.
What is the latest MCP spec in 2026?
Spec 2025-11-25 is the latest stable release. Streamable HTTP replaced SSE as the recommended remote transport. OAuth 2.1, elicitation, and structured output were added earlier in the year.
How do I secure a custom MCP server?
Use OAuth 2.1, scope tokens per tool, validate inputs with Zod or Pydantic, sanitize outputs, rate limit, sandbox with Docker or microVMs, audit-log every call, and implement a kill switch. Treat every MCP server as a privilege escalation target.
Can Hermes Agent and OpenClaw consume the same MCP server?
Yes. Hermes Agent has first-class MCP client and server mode since v0.6.0. OpenClaw consumes MCP servers through its plugin system. The same server works across Claude, Cursor, Kiro, Hermes, OpenClaw, and Gemma 4 via gemma-mcp.
📚 Sources
Content was rephrased for compliance with licensing restrictions. Protocol and SDK details sourced from official Model Context Protocol documentation as of May 2026. SDK APIs may change, always verify on the official site.
Give Your AI Agents Real-World Superpowers
Lushbinary builds custom MCP servers that wire Claude, Cursor, Kiro, Hermes, and Gemma 4 into your internal APIs. Scoped, sandboxed, production-ready.
Ready to Build Something Great?
Get a free 30-minute strategy call. We'll map out your project, timeline, and tech stack - no strings attached.
Prefer email? Reach us directly:

