MCP is the USB-C port for AI. Before MCP, every AI host (Claude Desktop, Cursor, Kiro, Cline, Hermes Agent) invented its own plugin system. Every tool vendor shipped three or four bespoke adapters. Today, a single MCP server written in TypeScript or Python works across Claude, Cursor, VS Code, Kiro, OpenCode, OpenClaw, Hermes Agent, Gemma 4, and any other MCP-aware host.
That's the promise. Actually building a custom MCP server, shipping it safely, and wiring it into multiple AI hosts takes a bit more care. This guide covers the core architecture, a working TypeScript and Python server, wiring into Claude Desktop, Cursor, Kiro, Gemma 4 through gemma-mcp, and Hermes Agent, plus the security practices you need for production.
If you've been frustrated that your AI agent doesn't know your internal APIs, your company's database schema, or your custom deployment pipeline, MCP is the fix. Build once, expose everywhere.
📑 What This Guide Covers
- What an MCP Server Actually Is
- Architecture: Tools, Resources, Prompts
- TypeScript Server in 60 Lines
- Python Server with FastMCP
- Transports: stdio vs Streamable HTTP
- Wiring Into Claude, Cursor, Kiro, VS Code
- Gemma 4 + MCP via gemma-mcp
- Hermes Agent and OpenClaw as MCP Clients
- Production Security and Deployment
- How Lushbinary Builds Custom MCP Servers
1. What an MCP Server Actually Is
An MCP server is a standalone program that exposes capabilities to AI hosts through the Model Context Protocol. The protocol is a structured JSON-RPC 2.0 exchange over one of two transports: stdio (for local servers) or Streamable HTTP (for remote servers).
📊 MCP by the Numbers (May 2026)
Latest spec 2025-11-25 · Donated to Linux Foundation Dec 2025 · 13,000+ community servers · 97M+ SDK downloads · First-class support in Claude, Cursor, VS Code, Kiro, OpenCode, Hermes, OpenClaw, Cline, Gemma 4, DeepSeek V4
An MCP server exposes three primitives:
- Tools: Functions the model can call with arguments. Tools do things (query DB, send email, fetch data).
- Resources: Read-only content the model can pull into context (files, schemas, configs, docs).
- Prompts: Reusable prompt templates users or agents can invoke (for example, "review this PR").
2. Architecture: Tools, Resources, Prompts
The host spawns or connects to a server, negotiates capabilities, then calls tools/list, resources/list, and prompts/list to discover what's available. When the model wants to call a tool, it emits a function call, the host routes it through the MCP client, and the server executes the function and returns the result.
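The discovery flow above is plain JSON-RPC 2.0 underneath. A sketch of the message shapes, using Python dicts (field values are illustrative; check the current spec for exact result fields):

```python
import json

# After the initialize handshake, the host discovers capabilities
# with ordinary JSON-RPC 2.0 requests like this one.
list_tools_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}

# The server answers with tool names, descriptions, and a JSON Schema
# for each tool's arguments (schema shown here is illustrative).
list_tools_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "get_customer",
                "description": "Fetch a customer by ID from the Acme CRM.",
                "inputSchema": {
                    "type": "object",
                    "properties": {"id": {"type": "string", "minLength": 1}},
                    "required": ["id"],
                },
            }
        ]
    },
}

# When the model emits a function call, the host routes it as tools/call.
call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "get_customer", "arguments": {"id": "4421"}},
}

print(json.dumps(call_request, indent=2))
```

The same request/response pattern applies to `resources/list` and `prompts/list`; only the result payload differs.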
3. TypeScript Server in 60 Lines
The official TypeScript SDK (@modelcontextprotocol/sdk) makes a production-ready server trivial. This server exposes a single tool that fetches a customer record from your internal API:
```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({
  name: "acme-crm",
  version: "1.0.0",
});

server.registerTool(
  "get_customer",
  {
    description: "Fetch a customer by ID from the Acme CRM.",
    inputSchema: { id: z.string().min(1) },
  },
  async ({ id }) => {
    const res = await fetch(`https://crm.internal/api/customers/${id}`, {
      headers: { Authorization: `Bearer ${process.env.CRM_TOKEN}` },
    });
    if (!res.ok) throw new Error(`CRM API ${res.status}`);
    const data = await res.json();
    return {
      content: [{ type: "text", text: JSON.stringify(data) }],
    };
  },
);

const transport = new StdioServerTransport();
await server.connect(transport);
```

Things to notice:
- Zod schemas define tool arguments and are automatically surfaced to the AI model as JSON Schema.
- Secrets come from environment variables, never from the tool arguments.
- Return shape is always `{ content: [...] }`. Most servers return a single text block; you can also return images, audio, or resource links.
- Errors thrown inside the handler are serialized back to the client so the model can see the failure and retry.
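That error-handling contract can be mimicked in plain Python: wrap a handler so exceptions come back as an `isError` result instead of crashing the server. A toy sketch of the behavior, not the SDK's actual implementation:

```python
import json

def wrap_tool(handler):
    """Wrap a tool handler so results and exceptions both come back
    in MCP-style shape: {"content": [...]} plus isError on failure."""
    def call(**arguments):
        try:
            data = handler(**arguments)
            return {"content": [{"type": "text", "text": json.dumps(data)}]}
        except Exception as exc:
            # The failure is serialized as content, so the model can
            # read the message and decide whether to retry.
            return {
                "content": [{"type": "text", "text": str(exc)}],
                "isError": True,
            }
    return call

def get_customer(id: str) -> dict:
    if id != "4421":
        raise ValueError(f"CRM API 404 for customer {id}")
    return {"id": id, "name": "Acme Corp"}

tool = wrap_tool(get_customer)
print(tool(id="4421"))  # normal result: content only
print(tool(id="9999"))  # failure: content plus isError: True
```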
4. Python Server with FastMCP
The Python equivalent uses FastMCP, a FastAPI-style decorator API that keeps boilerplate minimal:
```python
from mcp.server.fastmcp import FastMCP
import httpx
import os

mcp = FastMCP("acme-crm")

@mcp.tool()
async def get_customer(customer_id: str) -> dict:
    """Fetch a customer by ID from the Acme CRM."""
    async with httpx.AsyncClient() as client:
        res = await client.get(
            f"https://crm.internal/api/customers/{customer_id}",
            headers={"Authorization": f"Bearer {os.environ['CRM_TOKEN']}"},
        )
        res.raise_for_status()
        return res.json()

if __name__ == "__main__":
    mcp.run(transport="stdio")
```

Docstrings become tool descriptions. Type hints become the JSON Schema. Run the server with `uv run server.py` or package it with `uvx` for one-line installs. For remote deployments, swap the transport to `streamable-http` and front with an auth proxy.
5. Transports: stdio vs Streamable HTTP
| Aspect | stdio | Streamable HTTP |
|---|---|---|
| Use case | Local tools, desktop IDEs | Remote tools, multi-user, team platforms |
| Auth | Parent process inherits permissions | OAuth 2.1 with scoped tokens |
| Scaling | One server per client | Horizontal behind a load balancer |
| Deploy | Binary or uvx command | Docker + load balancer + auth proxy |
| Recommended for | Dev workflows, desktop apps | Shared enterprise services |
Streamable HTTP replaced SSE (Server-Sent Events) as the recommended remote transport in spec 2025-03-26. Use it for any server that needs to be reached by multiple developers, CI/CD pipelines, or production agents.
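For the stdio side, framing is as simple as the table suggests: one JSON-RPC message per line on stdin/stdout. A sketch of both ends using an in-memory stream (a real server reads `sys.stdin` and must keep stdout free of stray prints, which would corrupt the protocol):

```python
import io
import json

def write_message(stream, message: dict) -> None:
    # stdio transport: newline-delimited JSON-RPC, one message per line.
    stream.write(json.dumps(message) + "\n")

def read_message(stream) -> dict:
    return json.loads(stream.readline())

# Simulate the host -> server pipe with an in-memory stream.
pipe = io.StringIO()
write_message(pipe, {"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
pipe.seek(0)

msg = read_message(pipe)
print(msg["method"])  # tools/list
```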
6. Wiring Into Claude, Cursor, Kiro, VS Code
Each host has its own config location, but the shape is nearly identical. Here's a config that works in Claude Desktop, Cursor, and Kiro:
```json
{
  "mcpServers": {
    "acme-crm": {
      "command": "node",
      "args": ["/path/to/acme-crm/dist/index.js"],
      "env": {
        "CRM_TOKEN": "${env:CRM_TOKEN}"
      },
      "disabled": false,
      "autoApprove": ["get_customer"]
    }
  }
}
```

- Claude Desktop: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Cursor: `~/.cursor/mcp.json` (user) or `.cursor/mcp.json` (project)
- Kiro: `~/.kiro/settings/mcp.json` (user) or `.kiro/settings/mcp.json` (workspace). User config is overridden by workspace config.
- VS Code: via the official MCP extension, configured in settings.json under `mcp.servers`.
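The Kiro precedence rule (workspace overrides user) amounts to a shallow merge over the `mcpServers` map. A hypothetical sketch of how a host might resolve it (not any host's actual implementation):

```python
def resolve_servers(user_config: dict, workspace_config: dict) -> dict:
    """Merge two mcpServers maps; workspace entries win over user entries."""
    merged = dict(user_config.get("mcpServers", {}))
    merged.update(workspace_config.get("mcpServers", {}))
    return merged

user = {
    "mcpServers": {"acme-crm": {"command": "node", "autoApprove": []}}
}
workspace = {
    "mcpServers": {"acme-crm": {"command": "node", "autoApprove": ["get_customer"]}}
}

servers = resolve_servers(user, workspace)
print(servers["acme-crm"]["autoApprove"])  # ['get_customer'] — workspace wins
```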
7. Gemma 4 + MCP via gemma-mcp
Open-weight models like Gemma 4 have native function calling via six control tokens. The gemma-mcp Python package maps Gemma 4's native tool call format to MCP tool invocations, which lets you drive any MCP server with a self-hosted Gemma 4 via vLLM.
```shell
# Run Gemma 4 on vLLM with an OpenAI-compatible API
vllm serve google/gemma-4-26b-a4b --port 8000
```

```python
# Then use gemma-mcp to bridge Gemma's native tool calls to MCP
from gemma_mcp import GemmaMcpBridge

bridge = GemmaMcpBridge(
    model_endpoint="http://localhost:8000/v1",
    mcp_servers={"acme-crm": {"command": "node", "args": [...]}},
)
result = await bridge.chat("Find me customer id 4421 and summarize recent orders.")
```

This pattern is central to zero-cost agentic AI. You run Gemma 4 locally or on an EC2 GPU, point it at your MCP servers, and get a full agent loop without paying per-token for Claude or GPT.
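The "full agent loop" the bridge drives is conceptually small: alternate between asking the model for its next action and executing tool calls until the model produces a final answer. A toy, model-free sketch of that loop (every name here is hypothetical, not the gemma-mcp API):

```python
def run_agent(ask_model, call_tool, user_message: str, max_steps: int = 5):
    """Minimal agent loop: feed tool results back until a final answer."""
    history = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        action = ask_model(history)  # model proposes the next step
        if action["type"] == "final":
            return action["content"]
        # Otherwise it's a tool call: execute via MCP, append the result.
        result = call_tool(action["name"], action["arguments"])
        history.append({"role": "tool", "content": result})
    raise RuntimeError("agent exceeded max_steps")

# Stub "model": call the CRM tool once, then answer from its output.
def stub_model(history):
    if history[-1]["role"] == "user":
        return {"type": "tool", "name": "get_customer",
                "arguments": {"id": "4421"}}
    return {"type": "final",
            "content": f"Customer data: {history[-1]['content']}"}

def stub_tool(name, arguments):
    return '{"id": "4421", "name": "Acme Corp"}'

print(run_agent(stub_model, stub_tool, "Find customer 4421."))
```

The bridge's job is to translate the model's native tool-call tokens into the `call_tool` step and to format MCP results back into the history.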
8. Hermes Agent and OpenClaw as MCP Clients
Hermes Agent supports MCP as both a client (connect to external MCP servers) and a server (expose Hermes skills to other AI hosts over MCP) since v0.6.0.
```shell
# Add an MCP server to Hermes
hermes mcp add acme-crm \
  --command node \
  --args /path/to/acme-crm/dist/index.js \
  --env CRM_TOKEN=${CRM_TOKEN}

# Expose Hermes skills as an MCP server to other hosts
hermes mcp serve --port 3999
```

OpenClaw exposes MCP through its ClawHub plugin system. The same custom MCP server you wrote above works as a ClawHub plugin with zero code changes. This is why investing in MCP pays off: one server, many hosts, reusable across your entire agent stack.
9. Production Security and Deployment
🛡️ MCP Security Pitfalls
Every MCP server is a potential privilege escalation. A malicious or compromised server can exfiltrate secrets, prompt inject the model, or trigger destructive actions. Treat MCP servers the way you treat API gateways, not the way you treat utility libraries.
Ten things you should do before shipping:
- Least privilege tokens: Scope the token in the server environment to the minimum API surface needed.
- Validate all inputs: Use Zod or Pydantic to reject bad argument shapes before hitting your internal API.
- Sanitize outputs: Strip PII, secrets, and internal IDs before returning results to the model.
- Rate limit: Enforce per-tool and per-host rate limits to prevent runaway agent loops.
- OAuth 2.1: For remote servers, use OAuth 2.1 with short-lived tokens and refresh flows.
- Audit logs: Log every tool invocation with the calling host, arguments, result, and latency.
- Sandbox: Run each MCP server in a Docker container or microVM with a tight network allowlist.
- Kill switch: Provide a one-click way to disable a compromised server across all hosts.
- Pin dependencies: Lock SDK and dependency versions to avoid supply-chain surprises.
- Prompt-injection defense: Filter tool outputs for instructions that try to hijack the model before they hit the context window.
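Item 4 on that list (rate limiting) can be as small as a token bucket per tool. A minimal stdlib sketch, not tied to any particular SDK:

```python
import time

class ToolRateLimiter:
    """Token bucket per tool: refill `rate` calls/second up to `burst`."""

    def __init__(self, rate: float, burst: int):
        self.rate, self.burst = rate, burst
        self._buckets = {}  # tool name -> (tokens, last timestamp)

    def allow(self, tool: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        tokens, last = self._buckets.get(tool, (float(self.burst), now))
        # Refill proportionally to elapsed time, capped at burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens < 1.0:
            self._buckets[tool] = (tokens, now)
            return False  # deny: bucket empty, agent loop is running hot
        self._buckets[tool] = (tokens - 1.0, now)
        return True

limiter = ToolRateLimiter(rate=1.0, burst=2)
print(limiter.allow("get_customer", now=0.0))  # True  (2 -> 1 token)
print(limiter.allow("get_customer", now=0.0))  # True  (1 -> 0 tokens)
print(limiter.allow("get_customer", now=0.0))  # False (bucket empty)
print(limiter.allow("get_customer", now=2.0))  # True  (refilled)
```

A denied call should return an `isError` result to the model rather than hanging, so a runaway agent loop sees the refusal and backs off.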
For a deeper dive on the protocol itself and the latest spec updates, see our MCP developer guide.
10. How Lushbinary Builds Custom MCP Servers
At Lushbinary, we build custom MCP servers for clients that want their AI agents to reach into internal systems: CRMs, ticketing, billing, observability, deployment pipelines, and bespoke internal tools. The work splits into four phases:
- Scoping: Which tools do your agents actually need? We map the minimum tool surface against real agent tasks.
- Server build: TypeScript or Python server, thorough input validation, output sanitization, audit logging.
- Host wiring: Configure Claude, Cursor, Kiro, Hermes, and any self-hosted Gemma 4 agents to consume the server.
- Hardening: OAuth 2.1, sandboxing, rate limiting, kill switches, runbook for incident response.
🚀 Free Consultation
Want to give your AI agents reliable access to your internal systems? Lushbinary builds custom MCP servers end-to-end, from scoping to hardened production deployment. No obligation.
❓ Frequently Asked Questions
What is an MCP server and why build a custom one?
An MCP server exposes tools, resources, and prompts to AI hosts through the Model Context Protocol. Build a custom one to give your agents access to internal APIs, databases, or bespoke tools that don't have off-the-shelf integrations.
Does Gemma 4 support MCP natively?
Gemma 4 ships with six control tokens for function calling. The gemma-mcp package maps Gemma's native tool call format to MCP invocations, letting self-hosted Gemma 4 drive any MCP server through a vLLM endpoint.
What is the latest MCP spec in 2026?
Spec 2025-11-25 is the latest stable release. Streamable HTTP replaced SSE as the recommended remote transport. OAuth 2.1, elicitation, and structured output were added earlier in the year.
How do I secure a custom MCP server?
Use OAuth 2.1, scope tokens per tool, validate inputs with Zod or Pydantic, sanitize outputs, rate limit, sandbox with Docker or microVMs, audit-log every call, and implement a kill switch. Treat every MCP server as a privilege escalation target.
Can Hermes Agent and OpenClaw consume the same MCP server?
Yes. Hermes Agent has first-class MCP client and server mode since v0.6.0. OpenClaw consumes MCP servers through its plugin system. The same server works across Claude, Cursor, Kiro, Hermes, OpenClaw, and Gemma 4 via gemma-mcp.
📚 Sources
Content was rephrased for compliance with licensing restrictions. Protocol and SDK details sourced from official Model Context Protocol documentation as of May 2026. SDK APIs may change, always verify on the official site.
Give Your AI Agents Real-World Superpowers
Lushbinary builds custom MCP servers that wire Claude, Cursor, Kiro, Hermes, and Gemma 4 into your internal APIs. Scoped, sandboxed, production-ready.
Ready to Build Something Great?
Get a free 30-minute strategy call. We'll map out your project, timeline, and tech stack - no strings attached.
Prefer email? Reach us directly:

