Insights on AI, Cloud & Modern Engineering
We write about AI agents, cloud architecture, cost optimization, and the tools we use every day to build software.
Cursor Composer 2.5 Developer Guide: Benchmarks, Pricing & What's New in May 2026
Cursor's Composer 2.5 shipped May 18, 2026. Built on Kimi K2.5 with 25x more synthetic training tasks, it scores 79.8% on SWE-Bench Multilingual and 63.2% on CursorBench v3.1, matching Opus 4.7 and GPT-5.5 at 1/10th the cost. Full breakdown of training, pricing tiers, behavioral improvements, and how to wire it into Cursor and the SDK.
Composer 2.5 vs Claude Opus 4.7 vs GPT-5.5: The Real Coding Model Comparison
Composer 2.5 ties Opus 4.7 and GPT-5.5 on SWE-Bench Multilingual (79.8%) and CursorBench v3.1 (63.2%) at one tenth the cost. GPT-5.5 still leads Terminal-Bench 2.0 by 13 points. Full benchmark, pricing, and harness comparison plus a per-task decision framework.
Composer 2 to Composer 2.5 Migration Guide: Phased Rollout, Eval Harness & Hooks Audit
Composer 2.5 is a drop-in upgrade with real behavior changes. SWE-Bench jumps from 73.7% to 79.8%, Fast tier pricing doubled, and effort calibration changes how the agent runs. Phased rollout plan, pre-migration audit, eval harness setup, and rollback plan for production teams.
Long-Horizon Agents with Composer 2.5 & Cursor SDK: Production Patterns, Guardrails, Cost Modeling
A 2-hour Composer 2.5 agent run costs $2.20 vs $66 on Opus 4.7 for the same workload. Full guide to building long-horizon agents with the Cursor SDK: architecture, guardrails, sandboxing, observability, and a complete CI auto-fix example.
Composer 2.5 Cost Optimization: Fast vs Standard Tier and How to Cut Spend by 50-70%
Composer 2.5 Fast and Standard run the same model with the same intelligence. Fast costs 6x more per token. Most teams default to Fast and pay for it. Tactical playbook for routing background work to Standard with hooks, worked cost examples, and team-level controls.
TanStack npm Supply Chain Attack: How Cache Poisoning Compromised 42 Packages in 6 Minutes
On May 11, 2026, an attacker published 84 malicious versions across 42 @tanstack/* packages by chaining pull_request_target abuse, GitHub Actions cache poisoning, and OIDC token extraction from runner memory. Part of the Mini Shai-Hulud campaign that hit 170+ packages across npm and PyPI.
Hermes Agent v0.12 Curator Release: Complete Upgrade Guide & New Features
Nous Research shipped Hermes Agent v0.12.0 with an autonomous Curator that maintains your skill library, a rubric-based self-improvement rewrite, 4 new providers, 19 messaging platforms, and a 57% faster cold start. 1,096 commits from 213 contributors.
Multi-Agent AI Orchestration Patterns: Supervisor, Swarm, Pipeline & Router Production Guide
Four production-proven patterns for coordinating AI agent fleets. Gartner predicts that 40% of enterprise apps will embed agents by the end of 2026. We cover architecture, trade-offs, and implementation with Hermes Agent, LangGraph, and Kimi K2.6.
Best Open-Source LLMs for AI Agents in May 2026: DeepSeek V4 vs Kimi K2.6 vs Qwen 3.6 vs GLM 5.1
Four open-source LLMs dominate AI agent workloads in May 2026. We compare DeepSeek V4 (1.6T params, 1M context, MIT), Kimi K2.6 (58.6% SWE-Bench Pro, 300 sub-agents), Qwen 3.6 (37.0 MCPMark tool calling), and GLM 5.1 (58.4% SWE-Bench Pro, MIT) on pricing, hardware, and Hermes Agent compatibility.
AI Agent Prompt Injection Defense: The 2026 Production Security Playbook
OWASP ranks prompt injection as the #1 LLM vulnerability, affecting 73% of production deployments. We cover the Gemini CLI CVSS-10 supply chain attack, OpenAI's April 2026 defense guide, and 10 defense layers from input validation to human-in-the-loop.
Hermes Agent vs OpenClaw in May 2026: The Definitive Comparison After v0.12 & ClawHub
Fresh comparison after Hermes v0.12 Curator release (110K stars in 10 weeks, self-improving skills, 19 platforms) and OpenClaw ClawHub migration (345K+ stars, 19.9T tokens, plugin marketplace). Decision framework: compounding intelligence vs largest ecosystem.
AI Engineering Transformation: How to Restructure Your Team Without Breaking Shipping Velocity
DORA data shows AI raised PR output by 98% but increased incidents by 242%. DX Research found median gains of only 7.76% despite 65% more AI usage. Here is a framework that actually works for restructuring engineering teams around AI.
