OpenClaw has crossed 250,000 GitHub stars and become the default open-source AI agent framework in 2026. But most guides assume you're plugging in a Claude or GPT API key — which means $20–60/month in token costs before you've even done anything interesting. What if you could run the same autonomous agent with open-source models for almost nothing?
You can. OpenClaw is provider-agnostic. It supports 12+ LLM providers including Ollama for fully local inference, OpenRouter for multi-model access, and direct API connections to DeepSeek, Together AI, and Fireworks AI — all of which serve open-weight models at a fraction of frontier API costs.
This guide covers every approach: Llama 4 Scout, DeepSeek V3.2, and Qwen 3.5 running locally via Ollama, served through cheap cloud APIs, or self-hosted on AWS with vLLM. We'll break down the exact monthly cost for each setup so you can pick the one that fits your budget.
What This Guide Covers
- Why Open-Source Models for OpenClaw
- Best Open-Source Models for OpenClaw in 2026
- Approach 1: Local Ollama — $0/Month Token Cost
- Approach 2: Cheap Cloud APIs — $1–5/Month
- Approach 3: Self-Hosted vLLM on AWS — Full Control
- Deploying OpenClaw on AWS (Server Setup)
- Smart Model Routing: Mix Cheap + Premium
- Full Monthly Cost Comparison Table
- AWS Bedrock: Managed Open-Weight Models
- Architecture Diagram
- Why Lushbinary for OpenClaw Deployment
Why Open-Source Models for OpenClaw
OpenClaw's default setup points you at Claude or GPT — models that cost $3–15 per million output tokens. For a personal AI agent that runs shell commands, manages your calendar, and answers questions throughout the day, that adds up fast. A moderate user generating 200K tokens/day hits $90–900/month on frontier APIs.
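To sanity-check that figure, here's the arithmetic as a tiny shell helper. It's a sketch: it prices every token at the output rate, which is the worst case, and the rates passed in are the ones quoted above.

```shell
# Rough monthly token bill: tokens_per_day * 30 days * price per million.
# Worst case: everything billed at the output rate.
monthly_cost() {
  tokens_per_day=$1   # e.g. 200000
  price_per_m=$2      # dollars per million tokens
  awk -v t="$tokens_per_day" -v p="$price_per_m" \
    'BEGIN { printf "%.2f\n", t * 30 / 1000000 * p }'
}

monthly_cost 200000 15      # frontier output rate -> 90.00
monthly_cost 200000 0.42    # DeepSeek V3.2 output rate -> 2.52
```

At 200K tokens/day, $90/month is the floor on frontier APIs; agents that re-read long context on every turn multiply that quickly, which is where the upper end of the range comes from.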
Open-source models have closed the gap dramatically. Llama 4 Scout's MoE architecture activates only 17B parameters per token while delivering performance that rivals GPT-4o on many benchmarks. DeepSeek V3 offers API access at $0.28/M input tokens — roughly 10–50x cheaper than frontier models.
The tradeoff? Open-source models are slightly weaker at complex multi-step reasoning and tool calling compared to Claude Opus 4.6 or GPT-5.4. But for 80% of what a personal AI agent does — answering questions, summarizing content, drafting messages, running simple automations — they're more than capable. And you can always route the hard tasks to a premium model while keeping the cheap model as your default.
Best Open-Source Models for OpenClaw in 2026
Not all open-source models work well as OpenClaw backends. You need solid tool-calling support, reasonable context windows, and good instruction following. Here are the top picks as of March 2026:
| Model | Params (Active) | Context | API Cost (per 1M tokens) | Local VRAM |
|---|---|---|---|---|
| Llama 4 Scout | 109B (17B active) | 10M tokens | ~$0.10 in / $0.25 out | ~24GB (Q4) |
| Llama 4 Maverick | 400B (17B active) | 1M tokens | ~$0.20 in / $0.60 out | ~80GB+ (multi-GPU) |
| DeepSeek V3.2 | 671B MoE | 128K tokens | $0.28 in / $0.42 out | Too large for local |
| Qwen 3.5 9B | 9B dense | 128K tokens | ~$0.05 in / $0.15 out | ~6GB (Q4) |
| Qwen 3.5 35B-A3B | 35B (3B active) | 1M tokens | ~$0.08 in / $0.20 out | ~20GB (Q4) |
💡 Our Pick for Most Users
Llama 4 Scout via a cloud API (Together AI, Fireworks AI, or OpenRouter) gives you the best quality-to-cost ratio. It fits on a single H100 GPU, has a massive 10M token context window, and its MoE architecture means you only pay for 17B active parameters per token. For local-only setups, Qwen 3.5 9B runs well on a Mac with 16GB RAM.
Approach 1: Local Ollama — $0/Month Token Cost
The cheapest possible setup: run the LLM on the same machine as OpenClaw (or on a local server) using Ollama. Zero API costs. Complete privacy. The only cost is your electricity bill.
Hardware Requirements
CPU-Only (8–16GB RAM)
Qwen 3.5 0.8B–9B, Llama 3.2 3B. Slow but functional for light personal use.
GPU with 8–12GB VRAM
Qwen 3.5 9B Q4, Llama 3.2 8B. Good for moderate daily use.
GPU with 24GB VRAM
Llama 4 Scout Q4, Qwen 3.5 35B-A3B. Best local experience.
Mac M-series (16–32GB)
Unified memory lets you run larger models. M2/M3/M4 with 32GB handles Scout well.
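A rough rule of thumb behind these VRAM figures: Q4 quantization needs about 0.5 bytes per parameter, plus overhead for KV cache and runtime buffers. The ~20% overhead multiplier here is our assumption; actual usage varies with context length and runtime. Note that MoE models must load all parameters, not just the active ones.

```shell
# Rough Q4 memory estimate: params (billions) * 0.5 bytes * ~20% overhead.
# Treat the result as a lower bound, not a guarantee.
vram_q4_gb() {
  awk -v b="$1" 'BEGIN { printf "%.1f\n", b * 0.5 * 1.2 }'
}

vram_q4_gb 9    # Qwen 3.5 9B  -> 5.4 GB, fits an 8GB card
vram_q4_gb 35   # 35B MoE      -> 21.0 GB, wants a 24GB card
```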
Setup Steps
Install Ollama and pull a model:
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model (pick one)
ollama pull llama4-scout    # 24GB VRAM recommended
ollama pull qwen3.5:9b      # 8GB VRAM or 16GB RAM
ollama pull qwen3.5:0.8b    # CPU-only, minimal hardware

# Verify it's running
ollama list
```
Then configure OpenClaw to use Ollama. During onboarding (openclaw onboard), select Ollama as your provider. Or edit your config directly:
```bash
# ~/.openclaw/.env
OLLAMA_API_KEY="ollama-local"
OLLAMA_BASE_URL="http://localhost:11434"
```
⚠️ Important: Use the Native Ollama API
Do not use the /v1 OpenAI-compatible URL (http://localhost:11434/v1) with OpenClaw. This breaks tool calling and models may output raw tool JSON as plain text. Use the native Ollama API URL: http://localhost:11434 (no /v1).
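A quick way to confirm the native endpoint is answering is to hit the API Ollama actually exposes. This sketch assumes you pulled qwen3.5:9b in the steps above; substitute whichever model you installed.

```shell
# Native Ollama API: POST /api/generate (note: no /v1 prefix).
curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen3.5:9b",
  "prompt": "Reply with the single word: ok",
  "stream": false
}'

# If this returns JSON with a "response" field, point OLLAMA_BASE_URL
# at the same host:port, without /v1.
```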
Monthly Cost: Local Ollama
| Item | Cost |
|---|---|
| LLM token costs | $0 |
| Electricity (GPU idle + inference) | ~$5–15/month |
| Hardware (amortized) | Depends on existing setup |
| Total | ~$5–15/month (electricity only) |
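The electricity line is easy to check against your own hardware and utility rate. This sketch assumes $0.20/kWh and an average draw you'd measure at the wall; both numbers are assumptions to adjust.

```shell
# Monthly electricity: average watts * 730 hours / 1000 * rate per kWh.
electricity_cost() {
  watts=$1; rate_per_kwh=$2
  awk -v w="$watts" -v r="$rate_per_kwh" \
    'BEGIN { printf "%.2f\n", w * 730 / 1000 * r }'
}

electricity_cost 50 0.20    # mostly-idle GPU   -> 7.30
electricity_cost 100 0.20   # daily inference   -> 14.60
```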
Approach 2: Cheap Cloud APIs — $1–5/Month
If you don't have a GPU or want your agent running 24/7 on a cloud server, the cheapest approach is using open-source model APIs. These providers host the same open-weight models on optimized infrastructure and charge per token, at rates roughly 10–50x cheaper than frontier APIs.
Top Providers for Open-Source Model APIs
| Provider | Llama 4 Scout (in/out per 1M) | DeepSeek V3.2 (in/out per 1M) |
|---|---|---|
| DeepSeek API (direct) | N/A | $0.28 / $0.42 |
| Together AI | ~$0.10 / $0.25 | ~$0.20 / $0.60 |
| Fireworks AI | ~$0.10 / $0.25 | ~$0.20 / $0.60 |
| OpenRouter | Varies (+ 5.5% fee) | Varies (+ 5.5% fee) |
| Groq | ~$0.05 / $0.10 | N/A |
Configuring OpenClaw with DeepSeek
DeepSeek V3.2 is the cheapest high-quality option. Get an API key from platform.deepseek.com and configure OpenClaw:
```bash
# ~/.openclaw/.env
DEEPSEEK_API_KEY="sk-your-key-here"

# Or use OpenRouter for multi-model access
OPENROUTER_API_KEY="sk-or-your-key-here"
```
Configuring OpenClaw with OpenRouter
OpenRouter gives you access to 300+ models through a single API key. It adds a 5.5% platform fee but lets you switch models without changing your config. There's even a free-ride skill on ClawHub that configures OpenClaw to use free models from OpenRouter with ranked fallbacks:
```bash
# Install the free-ride skill
openclaw skill install free-ride

# Or manually set OpenRouter
OPENROUTER_API_KEY="sk-or-your-key-here"

# Then select models like:
#   meta-llama/llama-4-scout
#   deepseek/deepseek-chat
#   qwen/qwen-3.5-9b
```
Monthly Cost: Cloud API (Casual Personal Use)
Assuming ~100K tokens/day (a few dozen messages, some automations):
| Item | Cost |
|---|---|
| DeepSeek V3.2 tokens (~3M/month) | ~$0.84–1.26 |
| OpenClaw server (AWS t2.micro Free Tier) | $0 |
| Total | ~$1–3/month |
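Here's where that token figure comes from, assuming a 60/40 input/output split. The split is our assumption; agents that paste a lot of context skew input-heavy, which is the cheaper direction.

```shell
# DeepSeek V3.2 monthly bill for N million tokens, assuming a 60/40
# input/output split at $0.28/M in and $0.42/M out.
deepseek_monthly() {
  millions=$1
  awk -v m="$millions" \
    'BEGIN { printf "%.2f\n", m * (0.6 * 0.28 + 0.4 * 0.42) }'
}

deepseek_monthly 3     # ~100K tokens/day -> 1.01
deepseek_monthly 15    # ~500K tokens/day -> 5.04
```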
Even heavy users generating 500K tokens/day (~15M tokens/month) would spend only about $4–7/month with DeepSeek V3.2. Compare that to well over $200/month on Claude Opus 4.6 at the same volume.
Approach 3: Self-Hosted vLLM on AWS — Full Control
For teams that need data sovereignty, custom fine-tuned models, or predictable costs at high volume, self-hosting the LLM on an AWS GPU instance with vLLM is the way to go. vLLM's V1 engine delivers up to 111% higher throughput compared to V0 for smaller models at high concurrency.
AWS GPU Instance Options
| Instance | GPU | VRAM | On-Demand $/hr | Monthly (24/7) | Best For |
|---|---|---|---|---|---|
| g6.xlarge | 1x L4 | 24GB | $0.805 | ~$588 | Qwen 3.5 9B, small models |
| g5.xlarge | 1x A10G | 24GB | $1.006 | ~$735 | Llama 4 Scout Q4 |
| g5.2xlarge | 1x A10G | 24GB | $1.212 | ~$885 | Scout with more CPU/RAM |
| p5.48xlarge | 8x H100 | 640GB | ~$98.00 | ~$71,540 | Maverick, large models (team/enterprise) |
💰 Cost Optimization Tip
Use Spot Instances for up to 90% savings on GPU instances. A g5.xlarge Spot Instance often runs at $0.30–0.40/hr instead of $1.006 on-demand — bringing the monthly cost down to ~$220–290. Spot works well for personal agents since brief interruptions are tolerable. Pair with an auto-restart script.
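The auto-restart script can be a simple watchdog that relaunches vLLM after a Spot reclaim and reboot. This is a sketch: the process match string and log path are assumptions, and in production a systemd unit with Restart=always is the cleaner choice.

```shell
# Watchdog: relaunch vLLM if its process disappears.
# Start this at boot (cron @reboot or a systemd unit) so the model
# comes back automatically after a Spot interruption.
while true; do
  if ! pgrep -f "vllm serve" > /dev/null; then
    nohup vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct \
      --max-model-len 32768 --port 8000 \
      >> /var/log/vllm.log 2>&1 &
  fi
  sleep 30
done
```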
vLLM Setup on EC2
```bash
# Launch a g5.xlarge with Deep Learning AMI (Ubuntu)
# SSH in, then:
pip install vllm

# Serve Llama 4 Scout
vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct \
  --tensor-parallel-size 1 \
  --max-model-len 32768 \
  --port 8000

# Or serve Qwen 3.5 9B
vllm serve Qwen/Qwen3.5-9B-Instruct \
  --port 8000
```
Then point OpenClaw at your vLLM server using the OpenAI-compatible endpoint:
```bash
# ~/.openclaw/.env
OPENAI_API_KEY="not-needed"
OPENAI_BASE_URL="http://<your-ec2-ip>:8000/v1"

# Set the model name in OpenClaw config to match vLLM
```
🎤 AWS re:Invent 2025 Update
At re:Invent 2025, AWS announced the SageMaker Large Model Inference container v15 with vLLM V1 engine support, delivering up to 111% higher throughput for smaller models. AWS also launched Graviton5 processors (192 cores, 25% higher performance) and Trainium3 UltraServers for AI workloads. For managed deployments, Amazon Bedrock now offers nearly 100 serverless models including 18 new open-weight models from Qwen, Mistral, and others.
Deploying OpenClaw on AWS (Server Setup)
Regardless of which LLM approach you choose, you need a server to run OpenClaw itself. The agent process is lightweight — it just needs Node.js and a network connection to your messaging apps (WhatsApp, Telegram, etc.) and your LLM provider.
Recommended EC2 Setup
```bash
# Instance: t3.small (2 vCPU, 2GB RAM) — ~$15/month
# Or t2.micro for AWS Free Tier — $0/month (first 12 months)
# AMI: Ubuntu Server 24.04 LTS
# Storage: 8GB gp3

# SSH in and install OpenClaw
curl -fsSL https://openclaw.ai/install.sh | bash

# Run the onboarding wizard
openclaw onboard

# Select your LLM provider:
#   - Ollama (if running locally or on same instance)
#   - DeepSeek (cheapest cloud API)
#   - OpenRouter (multi-model access)
#   - Custom OpenAI-compatible (for self-hosted vLLM)

# Connect your messaging app (Telegram, WhatsApp, etc.)
# OpenClaw will guide you through the setup

# Keep it running with systemd or pm2
pm2 start openclaw --name "openclaw-agent"
pm2 save
pm2 startup
```
Security Essentials
- Security Group: Allow SSH (22) from your IP only. Allow HTTPS (443) if using webhooks. Block everything else.
- IAM Role: Don't store AWS credentials on the instance. Use an IAM instance profile with minimal permissions.
- Updates: Enable unattended-upgrades for security patches.
- Isolation: Run OpenClaw in a Docker container to limit blast radius if the agent executes unexpected commands.
🔒 Security Note
OpenClaw runs arbitrary shell commands by design — that's its power. But giving an AI agent shell access to a server is a risk. A dedicated EC2 instance limits the blast radius. If something goes wrong, you nuke the instance and spin up a new one. Never run OpenClaw on a server with production databases or sensitive credentials.
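A minimal sketch of that Docker isolation. There's no official OpenClaw image assumed here; the image name is a placeholder for whatever you build with OpenClaw installed, and the limits are illustrative starting points.

```shell
# Run the agent with resource limits, no privilege escalation, and a
# named volume for config, so commands it executes stay off the host.
docker run -d --name openclaw-agent \
  --memory 1g --cpus 1 \
  --pids-limit 256 \
  --security-opt no-new-privileges \
  -v openclaw-data:/root/.openclaw \
  your-openclaw-image \
  openclaw
```

If the container misbehaves, `docker rm -f openclaw-agent` gives you the same nuke-and-restart story as a dedicated instance, just faster.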
Smart Model Routing: Mix Cheap + Premium
The smartest cost strategy isn't picking one model — it's routing different tasks to different models. OpenClaw's model-router skill on ClawHub automatically sends simple tasks (quick answers, message drafts) to cheap models and complex tasks (multi-step reasoning, code generation) to premium ones.
```bash
# Install the model-router skill
openclaw skill install model-router

# Example routing config:
#   Simple tasks  → DeepSeek V3.2 ($0.28/M input)
#   Complex tasks → Claude 3.5 Sonnet ($3/M input)
#   Coding tasks  → Llama 4 Scout ($0.10/M input)
```
Third-party tools like ClawPane sit between OpenClaw and your model providers, routing each request to the cheapest model that meets a quality threshold — no per-agent configuration needed.
A typical routing setup might look like:
- 80% of requests → DeepSeek V3.2 or Qwen 3.5 (~$0.28–0.42/M tokens)
- 15% of requests → Llama 4 Scout (~$0.10–0.25/M tokens)
- 5% of requests → Claude Sonnet or GPT-4o for hard reasoning tasks
This hybrid approach can cut your monthly bill by 60–80% compared to using a frontier model for everything, while maintaining quality where it matters.
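Plugging the input prices quoted above into that 80/15/5 split gives a blended rate. This is back-of-the-envelope: output tokens and retries push the real bill higher, which is why headline savings land at 60–80% rather than the ~87% the input-only figure implies.

```shell
# Blended input rate for an 80/15/5 split across DeepSeek V3.2
# ($0.28/M), Llama 4 Scout ($0.10/M), and a frontier model ($3/M).
blended_rate() {
  awk 'BEGIN { printf "%.3f\n", 0.80*0.28 + 0.15*0.10 + 0.05*3.00 }'
}

blended_rate   # -> 0.389 dollars per million input tokens
```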
Full Monthly Cost Comparison Table
Here's the complete picture. We assume a personal agent with moderate daily use (~100K tokens/day, ~3M tokens/month):
| Approach | Server Cost | LLM Cost | Total/Month |
|---|---|---|---|
| Local Mac + Ollama | $0 (your machine) | $0 | ~$5–15 (electricity) |
| AWS Free Tier + DeepSeek V3.2 | $0 (t2.micro) | ~$2–5 | ~$2–5 |
| AWS t3.small + DeepSeek V3.2 | ~$15 | ~$2–5 | ~$17–20 |
| AWS t3.small + OpenRouter (Llama 4) | ~$15 | ~$2–5 | ~$17–20 |
| AWS t3.small + Model Routing (hybrid) | ~$15 | ~$3–8 | ~$18–23 |
| AWS g6.xlarge + vLLM (Qwen 3.5 9B) | ~$588 | $0 | ~$588 |
| AWS g5.xlarge Spot + vLLM (Scout) | ~$220–290 | $0 | ~$220–290 |
| Baseline: Claude Opus 4.6 API | ~$15 | ~$45–135 | ~$60–150 |
🏆 Best Value
For most personal users, AWS Free Tier + DeepSeek V3.2 API at $2–5/month is unbeatable. You get a 24/7 AI agent with solid reasoning capabilities for less than a cup of coffee. If you need better quality, add model routing to send 5% of hard tasks to Claude — still under $10/month total.
AWS Bedrock: Managed Open-Weight Models
If you prefer a fully managed AWS-native approach, Amazon Bedrock now offers nearly 100 serverless models including open-weight options from Meta (Llama), Mistral, and Qwen. At re:Invent 2025, AWS added 18 new fully managed open-weight models to Bedrock.
Bedrock pricing for open-weight models is higher than direct API providers (you're paying for the managed infrastructure), but it integrates natively with IAM, VPC, CloudWatch, and other AWS services. This makes it a good fit for enterprise teams that need governance and compliance.
To use Bedrock with OpenClaw, you'd configure the OpenAI-compatible endpoint via the Bedrock Runtime API or use a proxy layer. Bedrock also supports Custom Model Import, letting you bring your own fine-tuned open-weight models.
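One common pattern for that proxy layer is an OpenAI-compatible gateway such as LiteLLM in front of Bedrock. This is a sketch, not something OpenClaw ships with; the Bedrock model ID below is illustrative, so check the model catalog in your Bedrock console for the real identifier.

```shell
# LiteLLM proxy: translates OpenAI-style requests to Bedrock calls.
pip install 'litellm[proxy]'

cat > config.yaml <<'EOF'
model_list:
  - model_name: llama4-scout
    litellm_params:
      model: bedrock/meta.llama4-scout-17b-instruct-v1:0  # illustrative ID
      aws_region_name: us-east-1
EOF

litellm --config config.yaml --port 4000
# Then point OpenClaw's OPENAI_BASE_URL at http://localhost:4000/v1
```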
📺 Recommended re:Invent Session
Deep dive into Amazon Bedrock's expanded model catalog, including open-weight model deployment, custom model import, and reinforcement fine-tuning capabilities announced at re:Invent 2025.
Search re:Invent 2025 Bedrock Sessions on YouTube →
Architecture Diagram
Here's how the pieces fit together across all three approaches:
[Architecture diagram: messaging apps (WhatsApp, Telegram) → OpenClaw agent on EC2 → one of three LLM backends: local Ollama, a cloud API (DeepSeek/OpenRouter), or self-hosted vLLM on a GPU instance.]
Why Lushbinary for OpenClaw Deployment
We've deployed OpenClaw for clients across multiple AWS architectures — from Free Tier personal agents to multi-GPU enterprise setups with custom fine-tuned models. Our team handles:
- Architecture design: Choosing the right LLM approach (local, API, self-hosted) based on your usage patterns and budget
- AWS infrastructure: EC2, VPC, security groups, IAM roles, and cost optimization with Spot Instances and Savings Plans
- Model routing: Setting up intelligent routing to minimize costs while maintaining quality for critical tasks
- Custom skills: Building OpenClaw skills tailored to your business workflows
- Security hardening: Docker isolation, network policies, and monitoring for production deployments
🚀 Free Consultation
Want to run OpenClaw with open-source models on AWS but not sure which approach fits your needs? Book a free 30-minute call with our team. We'll review your use case and recommend the most cost-effective setup.
❓ Frequently Asked Questions
Can you run OpenClaw with open-source LLMs for free?
Yes. OpenClaw supports Ollama as a local LLM provider, letting you run models like Llama 4 Scout or Qwen 3.5 on your own hardware with zero API costs. You need at least 8GB RAM for small models or a GPU with 24GB VRAM for larger ones.
What is the cheapest way to run OpenClaw with a cloud LLM API?
DeepSeek V3.2 via the DeepSeek API is the cheapest high-quality option at $0.28 per million input tokens and $0.42 per million output tokens. For casual personal use (~100K tokens/day), that works out to roughly $2–5 per month.
How much does it cost to run OpenClaw on AWS per month?
The OpenClaw server itself runs on a t3.small EC2 instance for about $15/month (or free on the AWS Free Tier with t2.micro). Token costs depend on your LLM choice: $0/month with local Ollama, $2–5/month with DeepSeek V3.2 API, or $588–735/month if self-hosting a GPU instance with vLLM.
Which open-source LLM is best for OpenClaw in 2026?
For the best balance of quality and cost, Llama 4 Scout (17B active parameters, 10M context) is the top pick — it fits on a single GPU and outperforms many larger models. DeepSeek V3.2 offers the cheapest API pricing. Qwen 3.5 9B is ideal for CPU-only local setups.
Does OpenClaw support model routing to use cheap models for simple tasks?
Yes. OpenClaw has a model-router skill on ClawHub that automatically routes simple tasks to cheap models and complex tasks to premium models. You can also configure OpenRouter as a provider to access 300+ models and manually set routing rules.
📚 Sources
- OpenClaw GitHub Repository — v2026.3.8, 250K+ stars
- DeepSeek API Pricing — V3.2: $0.28/M input (cache miss), $0.42/M output (as of March 2026)
- Llama 4 Scout on Hugging Face — 109B total, 17B active, 10M context
- AWS EC2 On-Demand Pricing — g5.xlarge: $1.006/hr, g6.xlarge: $0.805/hr (us-east-1)
- Amazon Bedrock Open-Weight Models — 18 new models added at re:Invent 2025
- Ollama — Local LLM runtime
Pricing data sourced from official vendor pages as of March 2026. Prices may change — always verify on the vendor's website.
Deploy OpenClaw on AWS — The Right Way
Whether you need a $3/month personal agent or a production-grade enterprise deployment with custom models, we'll architect the most cost-effective solution for your use case.
Build Smarter, Launch Faster.
Book a free strategy call and explore how LushBinary can turn your vision into reality.
