AI & Automation · April 17, 2026 · 12 min read

Using Hermes Agent with Qwen 3.6: Self-Improving AI with Alibaba's Latest Model

Hermes Agent's self-improving loop gets a major upgrade with Qwen 3.6-35B-A3B — 37.0 MCPMark for tool calling, 73.4% SWE-bench, and thinking preservation across turns. We cover local Ollama, DashScope, OpenRouter setup, multi-agent profiles, and production deployment.

Lushbinary Team

AI & Cloud Solutions

Hermes Agent is the self-improving AI agent from Nous Research that learns from every task, creates reusable skills, and gets smarter the longer it runs. But its intelligence is only as good as the LLM behind it. With Alibaba's Qwen 3.6-35B-A3B now available — scoring 73.4% on SWE-bench Verified and 37.0 on MCPMark — you can pair Hermes with one of the strongest open-weight models for agentic coding at zero API cost.

Hermes already supports Qwen models natively through its Alibaba Cloud provider (DashScope), OpenRouter, and Hugging Face. But the real power move is running Qwen 3.6 locally via Ollama — giving Hermes a private, zero-cost brain that activates only 3B parameters per token while drawing on 35B of learned capacity.

This guide walks through every integration path: local Ollama, cloud DashScope, and OpenRouter. We cover provider configuration, skill development, the self-improving loop, multi-agent profiles, and production deployment.

What this guide covers:

  1. Why Qwen 3.6 Is a Strong Fit for Hermes Agent
  2. Prerequisites & Installation
  3. Option A: Local Qwen 3.6 via Ollama
  4. Option B: DashScope API (Alibaba Cloud)
  5. Option C: OpenRouter Multi-Provider
  6. Hermes Self-Improving Loop with Qwen 3.6
  7. Multi-Agent Profiles
  8. Qwen 3.6 vs Other Models for Hermes
  9. Troubleshooting & Optimization
  10. Why Lushbinary for Your Hermes Deployment

1. Why Qwen 3.6 Is a Strong Fit for Hermes Agent

Hermes Agent's core loop — receive task, plan steps, call tools, evaluate results, create skills — demands a model that excels at structured tool calling and multi-step reasoning. Qwen 3.6-35B-A3B delivers exactly that:

MCPMark: 37.0

Up from 27.0 on Qwen 3.5-35B-A3B. Directly measures tool-calling accuracy — the core operation Hermes uses for every task.

SWE-bench Verified: 73.4%

Handles real-world GitHub issues end-to-end. Hermes can use this for automated bug fixing and PR generation.

Terminal-Bench 2.0: 51.5%

Shell-command execution accuracy, up from 40.5% on Qwen 3.5. Critical for Hermes's exec tool and system automation tasks.

Thinking Preservation

New in Qwen 3.6 — retains reasoning context across turns. Hermes's iterative skill refinement benefits directly from this.

The architecture matters too. Qwen 3.6's hybrid Gated DeltaNet + Gated Attention design with 256 experts (8 routed + 1 shared) means only 3B parameters activate per token. On a 24 GB GPU, you get dense-model quality at MoE speed — fast enough for Hermes's real-time agent interactions.
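The sparsity figures above can be sanity-checked with simple arithmetic. This is a back-of-envelope illustration only: it assumes the "3B active of 35B total" figure covers all always-on weights (attention, embeddings) plus the active experts, which is why the parameter ratio is higher than the raw expert ratio.

```python
# Rough, illustrative estimate of Qwen 3.6's MoE sparsity.
# Numbers come from the model card; the breakdown is simplified.
TOTAL_EXPERTS = 256
ACTIVE_EXPERTS = 8 + 1  # 8 routed + 1 shared per token

active_fraction = ACTIVE_EXPERTS / TOTAL_EXPERTS
print(f"Experts active per token: {active_fraction:.1%}")  # 3.5%

# Active parameters as a share of total capacity (3B of 35B).
active_b, total_b = 3, 35
print(f"Parameter activation ratio: {active_b / total_b:.1%}")  # 8.6%
```

The gap between the two ratios (3.5% of experts vs. 8.6% of parameters) reflects the non-expert weights that run for every token.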

2. Prerequisites & Installation

Install Hermes Agent

# Linux / macOS / WSL2
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

# Reload your shell
source ~/.bashrc  # or source ~/.zshrc

Hermes Agent was released February 25, 2026 under the MIT license and has accumulated over 64,000 GitHub stars as of April 2026. It requires a Unix-like environment — on Windows, use WSL2.

Install Ollama (for local path)

# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Pull Qwen 3.6
ollama pull qwen3.6:35b-a3b

The Qwen 3.6-35B-A3B model is a 24 GB download (Q4_K_M quantization). It requires 24 GB VRAM (NVIDIA) or 32 GB unified memory (Apple Silicon). See the OpenClaw + Qwen 3.6 guide for detailed hardware requirements.

3. Option A: Local Qwen 3.6 via Ollama

Hermes Agent supports any OpenAI-compatible endpoint as a custom provider. Ollama exposes exactly that at http://localhost:11434/v1.

Step 1: Configure via Interactive Wizard

# Run the model setup wizard
hermes model

# Select "Custom endpoint" from the provider list
# Base URL: http://127.0.0.1:11434/v1
# Model: qwen3.6:35b-a3b
# API Key: ollama (any non-empty string works)

Step 2: Or Configure Manually

Edit ~/.hermes/.env directly:

# ~/.hermes/.env
CUSTOM_API_KEY=ollama
CUSTOM_BASE_URL=http://127.0.0.1:11434/v1
CUSTOM_MODEL=qwen3.6:35b-a3b

Step 3: Verify the Connection

# Start a chat session
hermes chat

# Test with a simple prompt
> What model are you? List your capabilities.

Hermes auto-detects capabilities like streaming and tool use per provider. With Ollama serving Qwen 3.6, it will detect streaming support and function calling automatically.
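If you want to rule Hermes out while debugging, you can hit the same OpenAI-compatible endpoint directly. This is a minimal sketch using only the standard library; the endpoint path and payload shape follow the standard /v1/chat/completions API that Ollama exposes.

```python
# Query Ollama's OpenAI-compatible endpoint directly, bypassing Hermes,
# to confirm the model is loaded and responding.
import json
import urllib.request

BASE_URL = "http://127.0.0.1:11434/v1"
MODEL = "qwen3.6:35b-a3b"

def build_payload(prompt: str) -> dict:
    """Standard chat-completions request body."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(prompt: str) -> str:
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer ollama",  # any non-empty key works
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Against a live Ollama server:
# print(chat("What model are you? List your capabilities."))
```

If this call succeeds but `hermes chat` fails, the problem is in the Hermes provider configuration rather than Ollama.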

4. Option B: DashScope API (Alibaba Cloud)

Hermes has native support for Alibaba Cloud's DashScope as a first-class provider. This gives you access to Qwen 3.6 Plus with its 1M token context window without running anything locally.

# Set the DashScope API key
echo "DASHSCOPE_API_KEY=sk-your-key" >> ~/.hermes/.env

# Configure via wizard
hermes model
# Select "Alibaba Cloud" (aliases: dashscope, qwen)
# Model will default to the latest Qwen Plus

Or configure directly in ~/.hermes/.env:

DASHSCOPE_API_KEY=sk-your-dashscope-key

DashScope Free Tier

Alibaba offers 2,000 free daily API calls on DashScope. For moderate Hermes Agent use (10–30 tasks per day), this is often enough to stay within the free tier entirely.
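Whether you stay inside the free tier depends on how many API calls each agent task makes. The ~20 calls-per-task figure below is a hypothetical average for illustration; real tasks vary widely with tool-call depth.

```python
# Back-of-envelope: how many Hermes tasks fit in DashScope's free tier?
FREE_DAILY_CALLS = 2000
CALLS_PER_TASK = 20  # assumed average; varies per task

def tasks_within_free_tier() -> int:
    return FREE_DAILY_CALLS // CALLS_PER_TASK

print(tasks_within_free_tier())  # 100 tasks/day at this rate
```

Even at this assumed call rate, roughly 100 tasks per day fit in the free tier, comfortably above the 10–30 tasks cited for moderate use.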

5. Option C: OpenRouter Multi-Provider

OpenRouter gives you access to Qwen 3.6 Plus Preview (currently free during preview as of April 2026) with automatic fallback routing across providers.

# Set OpenRouter API key
echo "OPENROUTER_API_KEY=sk-or-your-key" >> ~/.hermes/.env

# Configure via wizard
hermes model
# Select "OpenRouter"
# Model: qwen/qwen3.6-plus-preview

An added benefit: Hermes uses a separate "auxiliary" model for vision, web summarization, and Mixture-of-Agents tasks. With an OpenRouter key configured, these auxiliary features activate automatically using Gemini Flash by default.

⚠️ Auxiliary Model Note

Even when using DashScope or a custom endpoint as your primary provider, some Hermes tools (vision, web summarization) use a separate auxiliary model — by default Gemini Flash via OpenRouter. Setting an OPENROUTER_API_KEY enables these tools automatically.

6. Hermes Self-Improving Loop with Qwen 3.6

Hermes Agent's defining feature is its closed learning loop. When it completes a task, it writes a reusable Markdown skill file, stores the outcome in persistent memory, and adjusts its approach for next time. Every 15 tasks, it runs a self-evaluation cycle.

Qwen 3.6's thinking preservation feature makes this loop more effective. Instead of re-deriving reasoning context from scratch on each turn, the model retains chain-of-thought from previous messages. For Hermes, this means:

  • Faster skill creation: The model remembers what worked in earlier steps when writing the skill summary
  • Better self-evaluation: The 15-task review cycle can reference reasoning from all 15 tasks without context window pressure
  • More accurate tool sequences: Multi-step tool chains maintain coherent reasoning across calls
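The 15-task evaluation cadence described above amounts to a simple periodic trigger. This counter logic is an assumption for illustration, not Hermes source code.

```python
# Illustrative sketch of the periodic self-evaluation trigger.
EVAL_INTERVAL = 15

def should_self_evaluate(completed_tasks: int) -> bool:
    """True on every 15th completed task."""
    return completed_tasks > 0 and completed_tasks % EVAL_INTERVAL == 0

triggers = [n for n in range(1, 46) if should_self_evaluate(n)]
print(triggers)  # [15, 30, 45]
```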

How Skills Work

When Hermes solves a task, it generates a Markdown skill file like this:

# ~/.hermes/skills/deploy-nextjs-vercel.md

## Task
Deploy a Next.js app to Vercel from the CLI

## Steps
1. Verify `vercel` CLI is installed: `which vercel`
2. Run `vercel --prod` in the project root
3. Confirm deployment URL in output
4. Verify with `curl -I <deployment-url>`

## Notes
- Requires VERCEL_TOKEN in environment
- Use --yes flag to skip confirmation prompts
- If build fails, check next.config.ts for output: "export"

Next time you ask Hermes to deploy a Next.js app, it loads this skill and follows the proven steps instead of reasoning from scratch. Over time, Hermes builds a library of skills tailored to your specific workflows.

7. Multi-Agent Profiles

Hermes supports running multiple independent agent profiles on the same machine. This is powerful when combined with Qwen 3.6 — you can run separate agents for different purposes without them sharing memory or skills.

# Create a coding agent profile with local Qwen 3.6
hermes profile create coding-agent
hermes profile use coding-agent
hermes model  # Configure with Ollama + qwen3.6:35b-a3b

# Create a research agent profile with cloud Qwen 3.6 Plus
hermes profile create research-agent
hermes profile use research-agent
hermes model  # Configure with DashScope + qwen-plus-latest

# Switch between profiles
hermes profile use coding-agent
hermes chat

Each profile maintains its own skills, memory, tool configuration, and provider settings. The coding agent uses local Qwen 3.6 for fast, private code generation. The research agent uses cloud Qwen 3.6 Plus for its 1M token context window when analyzing large codebases or documents.
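Profile isolation boils down to each profile owning its own directory tree for settings, skills, and memory. The layout below is an assumption for the sketch, not Hermes's documented storage scheme; consult the Hermes docs for the actual locations.

```python
# Hypothetical illustration of per-profile isolation. The directory
# layout is an assumption, not Hermes's documented structure.
from pathlib import Path

def profile_paths(profile: str, home: str = "~/.hermes") -> dict[str, Path]:
    root = Path(home).expanduser() / "profiles" / profile
    return {
        "env": root / ".env",       # provider settings
        "skills": root / "skills",  # learned skill files
        "memory": root / "memory",  # persistent task outcomes
    }

paths = profile_paths("coding-agent", home="/tmp/hermes-demo")
print(paths["skills"])  # /tmp/hermes-demo/profiles/coding-agent/skills (POSIX)
```

Because no path is shared between profiles, the coding agent's skills can never leak into the research agent's context.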

8. Qwen 3.6 vs Other Models for Hermes

How does Qwen 3.6-35B-A3B compare to other open-weight models you might use with Hermes Agent?

| Model | SWE-bench Verified | MCPMark | VRAM | License |
|---|---|---|---|---|
| Qwen 3.6-35B-A3B | 73.4% | 37.0 | 24 GB | Apache 2.0 |
| Qwen 3.5-27B Dense | 75.0% | 36.3 | 20 GB | Apache 2.0 |
| Qwen 3.5-35B-A3B | 70.0% | 27.0 | 24 GB | Apache 2.0 |
| Gemma 4-31B Dense | 52.0% | 18.1 | 24 GB | Apache 2.0 |
| Gemma 4-26B MoE | n/a | n/a | 20 GB | Apache 2.0 |

Benchmarks sourced from the Qwen 3.6-35B-A3B model card. Qwen 3.6-35B-A3B offers the best MCPMark score among MoE models at this VRAM tier, making it the strongest choice for Hermes's tool-heavy workflows.
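A quick check from the table puts the generation-over-generation tool-calling gain in perspective, comparing the two 35B-A3B MoE releases directly.

```python
# Relative MCPMark gain of Qwen 3.6-35B-A3B over Qwen 3.5-35B-A3B,
# using the scores from the table above.
old_score, new_score = 27.0, 37.0
gain_pct = (new_score - old_score) / old_score * 100
print(f"MCPMark improvement: {gain_pct:.0f}%")  # 37%
```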

9. Troubleshooting & Optimization

Common Issues

  • "Provider not configured": If you set up DashScope but Hermes doesn't recognize it, make sure you ran hermes model from your terminal (not inside a Hermes chat session). The /model command inside a session only switches between already-configured providers.
  • Slow local inference: Qwen 3.6-35B-A3B with Ollama on an RTX 4090 should give ~15–20 tok/s. If you're getting less, check that GPU offloading is enabled with OLLAMA_GPU_LAYERS=-1.
  • Tool calls failing: Ensure tools are enabled with hermes tools. Qwen 3.6's MCPMark score of 37.0 means it handles tool calling well, but some tools (browser, vision) require the auxiliary model via OpenRouter.
  • Memory/skills not persisting: Check that ~/.hermes/skills/ and ~/.hermes/memory/ directories exist and are writable. Hermes stores all learning artifacts as local files.
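The persistence check in the last bullet can be scripted. This sketch verifies that the directories mentioned above exist and are writable by the current user.

```python
# Check that Hermes's skills and memory directories exist and are writable.
import os
from pathlib import Path

def check_hermes_dirs(home: str = "~/.hermes") -> dict[str, bool]:
    """Map each Hermes data directory to whether it exists and is writable."""
    base = Path(home).expanduser()
    return {
        name: (p := base / name).is_dir() and os.access(p, os.W_OK)
        for name in ("skills", "memory")
    }

# Any False value points at a missing or unwritable directory.
print(check_hermes_dirs())
```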

Performance Tips

  • Use llama.cpp for faster local inference: Ollama is convenient but adds overhead. For production Hermes deployments, serving Qwen 3.6 directly via llama.cpp can give 2–5x faster token generation.
  • Set context window appropriately: Qwen 3.6 supports 262K natively, but most Hermes tasks don't need that much. Setting OLLAMA_NUM_CTX=32768 saves memory and speeds up inference.
  • Enable MCP servers: Hermes supports MCP server mode for connecting to external tools. With Qwen 3.6's strong MCPMark score, it handles MCP tool routing reliably.

10. Why Lushbinary for Your Hermes Deployment

Hermes Agent with Qwen 3.6 is a powerful combination for individual developers. But scaling it to a team, integrating it with your CI/CD pipeline, or deploying it as a 24/7 service on your infrastructure requires production engineering.

Lushbinary has deployed both Hermes Agent and OpenClaw setups for clients across SaaS, e-commerce, and healthcare. We handle:

  • Hermes Agent deployment on AWS with systemd, auto-restart, and monitoring
  • Custom skill development for your team's specific workflows
  • Multi-agent profile architecture for different team roles
  • Hybrid local/cloud model routing for cost optimization
  • Security hardening and access control for shared agent instances

🚀 Free Consultation

Want to deploy Hermes Agent with Qwen 3.6 for your team? Lushbinary specializes in self-improving AI agent infrastructure. We'll scope your setup, recommend the right model and deployment strategy, and give you a realistic timeline — no obligation.

❓ Frequently Asked Questions

Can Hermes Agent use Qwen 3.6 as its LLM backend?

Yes. Hermes supports Qwen models through DashScope (set DASHSCOPE_API_KEY), OpenRouter (set OPENROUTER_API_KEY), Hugging Face (set HF_TOKEN), and custom endpoints (Ollama at http://127.0.0.1:11434/v1).

How do I set up Hermes Agent with Qwen 3.6 locally via Ollama?

Install Hermes with the one-line installer, pull qwen3.6:35b-a3b via Ollama, then run 'hermes model' and select 'Custom endpoint'. Set the base URL to http://127.0.0.1:11434/v1 and the model to qwen3.6:35b-a3b.

Does Hermes Agent's self-improving loop work with Qwen 3.6?

Yes. The learning loop is model-agnostic. Hermes creates skill files, stores outcomes in memory, and self-evaluates every 15 tasks regardless of which model is configured. Qwen 3.6's MCPMark score of 37.0 makes it effective at the tool-calling patterns Hermes relies on.

What is the cheapest way to run Hermes Agent with Qwen 3.6?

Local inference via Ollama is completely free after hardware costs. Qwen 3.6-35B-A3B runs on a 24 GB GPU or 32 GB Apple Silicon Mac. For cloud, Qwen 3.6 Plus Preview is free on OpenRouter, and DashScope offers 2,000 free daily API calls.

How does Qwen 3.6 compare to other models for Hermes Agent?

Qwen 3.6-35B-A3B scores 73.4% on SWE-bench Verified and 37.0 on MCPMark, outperforming Gemma 4-31B (52.0% SWE-bench, 18.1 MCPMark) at similar VRAM cost. It offers the best open-weight performance per GB for agentic workflows as of April 2026.

Sources

Content was rephrased for compliance with licensing restrictions. Benchmark data sourced from official Qwen model cards and Hermes Agent documentation as of April 2026. Pricing and availability may change — always verify on the vendor's website.

Need Help Deploying Hermes Agent with Qwen 3.6?

From local setup to production-grade self-improving AI agents, Lushbinary builds agent infrastructure that scales.


Tags: Hermes Agent, Qwen 3.6, Nous Research, Qwen 3.6-35B-A3B, Ollama, DashScope, Self-Improving AI, AI Agent, Local LLM, MoE Model, Agentic Coding, Open-Source AI
