AI & Automation · April 6, 2026 · 14 min read

Gemma 4 + MCP + AWS: Build Self-Hosted Agentic AI with Function Calling & Tool Use

Complete guide to connecting Gemma 4's native function calling to MCP servers on AWS. Covers the gemma-mcp package, custom MCP server development, EC2/SageMaker/Bedrock AgentCore deployment, multi-tool agent architecture, and production cost optimization.

Lushbinary Team

AI & Cloud Solutions

Building AI agents that can discover and use external tools is one of the most powerful patterns in modern software. But most implementations lock you into proprietary APIs with unpredictable pricing and rate limits. What if you could run the entire stack yourself β€” an open-weight model with native function calling, connected to any tool via the Model Context Protocol (MCP), deployed on AWS infrastructure you control?

That's exactly what Gemma 4 enables. Released April 2, 2026 under Apache 2.0, Gemma 4 ships with 6 dedicated control tokens for function calling, configurable thinking modes, and 256K context windows. Combined with MCP's standardized tool protocol and AWS's GPU infrastructure, you get a production-grade agentic AI stack with zero vendor lock-in.

This guide walks through the complete architecture: how Gemma 4's function calling maps to MCP, building MCP servers that Gemma 4 can use, deploying the full stack on AWS (EC2, SageMaker, Bedrock AgentCore), and production patterns for multi-tool agentic workflows.

πŸ“‹ Table of Contents

  1. Why Gemma 4 + MCP + AWS
  2. Gemma 4's Function Calling ↔ MCP Mapping
  3. The gemma-mcp Python Package
  4. Building Custom MCP Servers for Gemma 4
  5. Deploying Gemma 4 on AWS for MCP Workloads
  6. AWS Bedrock AgentCore & MCP Gateway
  7. Multi-Tool Agent Architecture
  8. Production Patterns & Cost Optimization
  9. Security & Guardrails
  10. Limitations & Workarounds
  11. Why Lushbinary for Gemma 4 + MCP on AWS

1. Why Gemma 4 + MCP + AWS

Three technologies converge to create the most flexible agentic AI stack available in 2026:

Gemma 4

Open-weight model with native function calling (6 control tokens), thinking modes, 256K context, Apache 2.0 license. The 26B MoE activates only 3.8B parameters per token.

MCP

Open standard (Anthropic, now Linux Foundation) for connecting AI to tools via JSON-RPC 2.0. 13,000+ servers, 97M+ SDK downloads. Supported by Claude, Cursor, Kiro, VS Code.

AWS

GPU instances (g6, p5), SageMaker managed endpoints, Bedrock AgentCore with native MCP Gateway, and Inferentia2 chips for cost-efficient inference.

The key insight: Gemma 4's function calling tokens map directly to MCP's tool protocol. When you serve Gemma 4 via an OpenAI-compatible API (vLLM or llama.cpp), any MCP client can use it as the inference backend. You get the same tool-use capabilities as Claude or GPT, but running on your own infrastructure.

Cost comparison

Claude Opus 4.6 API: $15/M input, $75/M output tokens. GPT-5.4: $2.50/M input, $15/M output. Gemma 4 26B MoE self-hosted on AWS g6.2xlarge: ~$0.98/hr flat, unlimited tokens. For high-volume agent workloads processing 10M+ tokens/day, self-hosting can cut costs by 80-95%.
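As a back-of-envelope check on that claim (the 10M-input / 2M-output daily split is an assumed workload; the rates are the ones quoted above):

```python
def api_cost_per_day(in_tokens_m: float, out_tokens_m: float,
                     in_rate: float, out_rate: float) -> float:
    """Daily API cost in USD for a token volume given in millions."""
    return in_tokens_m * in_rate + out_tokens_m * out_rate

def self_hosted_cost_per_day(hourly_rate: float, hours: float = 24) -> float:
    """Flat daily cost of a dedicated instance, regardless of token volume."""
    return hourly_rate * hours

# Assumed workload: 10M input + 2M output tokens per day
claude = api_cost_per_day(10, 2, 15.0, 75.0)   # Claude Opus 4.6 rates -> $300/day
gemma = self_hosted_cost_per_day(0.98)         # g6.2xlarge on-demand -> $23.52/day
savings = 1 - gemma / claude                   # ~0.92, i.e. ~92% cheaper
```

At this volume the self-hosted instance lands near the top of the quoted 80-95% range; lower daily volumes shrink the advantage, since the instance cost is flat.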

2. Gemma 4's Function Calling ↔ MCP Mapping

Gemma 4 uses 6 special tokens for its tool-use lifecycle. These map cleanly to MCP's three primitives (tools, resources, prompts). Here's how the two protocols align:

Gemma 4 Token | Purpose | MCP Equivalent
<|tool> / <tool|> | Define a tool | tools/list response
<|tool_call> / <tool_call|> | Model requests tool use | tools/call request
<|tool_response> / <tool_response|> | Return tool result | tools/call response

The translation layer is straightforward. When an MCP client sends a tools/list request, you convert each tool definition into Gemma 4's <|tool> format and inject it into the system prompt. When Gemma 4 emits a <|tool_call>, you parse the function name and arguments, execute the MCP tools/call, and feed the result back as a <|tool_response>.

Gemma 4 Tool Definition Format

<|turn>system
<|think|>You are a helpful assistant.
<|tool>declaration:get_weather{
  description:<|"|>Get current weather for a location<|"|>,
  parameters:{
    location:{type:<|"|>string<|"|>,required:true},
    units:{type:<|"|>string<|"|>,default:<|"|>celsius<|"|>}
  }
}<tool|>
<|tool>declaration:query_database{
  description:<|"|>Run a SQL query against the analytics DB<|"|>,
  parameters:{
    query:{type:<|"|>string<|"|>,required:true}
  }
}<tool|><turn|>

Note the <|"|> delimiter token β€” this is Gemma 4's way of escaping string values so special characters inside strings don't break the structured format. Every string literal in tool declarations, calls, and responses must use this delimiter.

Tool Call β†’ MCP Execution β†’ Response

# Gemma 4 emits:
<|tool_call>call:get_weather{location:<|"|>London<|"|>}<tool_call|>

# Your middleware:
# 1. Parse function name: "get_weather"
# 2. Parse args: {"location": "London"}
# 3. Execute MCP tools/call
# 4. Inject response:

<|tool_response>result:get_weather{
  temperature:<|"|>18Β°C<|"|>,
  condition:<|"|>partly cloudy<|"|>
}<tool_response|>
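A minimal middleware sketch for this loop, in stdlib Python. The regexes assume the exact token layout shown above with flat, string-only arguments; nested objects or escaped delimiters would need a real parser:

```python
import re

# Matches <|tool_call>call:NAME{...}<tool_call|>
TOOL_CALL_RE = re.compile(r"<\|tool_call>call:(\w+)\{(.*?)\}<tool_call\|>", re.S)
# Matches key:<|"|>value<|"|> pairs inside the braces
ARG_RE = re.compile(r'(\w+):<\|"\|>(.*?)<\|"\|>', re.S)

def parse_tool_call(model_output: str):
    """Extract (function_name, args) from Gemma 4's tool_call block,
    ready to forward as an MCP tools/call request."""
    m = TOOL_CALL_RE.search(model_output)
    if m is None:
        return None
    name, body = m.group(1), m.group(2)
    return name, dict(ARG_RE.findall(body))

def format_tool_response(name: str, result: dict) -> str:
    """Wrap an MCP tools/call result in Gemma 4's tool_response tokens,
    escaping string values with the <|"|> delimiter."""
    fields = ",\n  ".join(f'{k}:<|"|>{v}<|"|>' for k, v in result.items())
    return f"<|tool_response>result:{name}{{\n  {fields}\n}}<tool_response|>"
```

Running `parse_tool_call` on the example above yields `("get_weather", {"location": "London"})`, and `format_tool_response` produces the block you feed back into the context.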

3. The gemma-mcp Python Package

The gemma-mcp package is the fastest way to connect Gemma models to MCP servers. It handles tool discovery, registration, and the function calling loop automatically.

Installation & Setup

# Install with uv (recommended) or pip
uv add gemma-mcp
# or
pip install gemma-mcp

# Requirements: Python 3.10+, google-genai SDK, FastMCP

Connecting to MCP Servers

from gemma_mcp import GemmaMCPClient

mcp_config = {
    "mcpServers": {
        "weather": {
            "url": "https://weather-api.example.com/mcp"
        },
        "database": {
            "command": "python",
            "args": ["./db_server.py"]
        }
    }
}

async with GemmaMCPClient(
    model="gemma-4-27b-it",  # or gemma-4-4b-it for lighter workloads
    mcp_config=mcp_config,
    temperature=0.3  # lower for deterministic tool calls
).managed() as client:
    response = await client.chat(
        "What's the weather in Tokyo and how many users signed up today?",
        execute_functions=True  # auto-execute tool calls
    )
    print(response)

Key features of gemma-mcp:

  • Automatic tool discovery β€” connects to all configured MCP servers and registers their tools with Gemma
  • Both transports β€” supports SSE (HTTP) and stdio MCP servers
  • Local + remote tools β€” mix Python functions with MCP server tools in the same conversation
  • Async context management β€” proper resource cleanup with async with
  • Multi-server support β€” connect to multiple MCP servers simultaneously

Adding Local Functions

# Add a local Python function alongside MCP tools
async def calculate_cost(
    instance_type: str,
    hours: int,
    region: str = "us-east-1"
) -> dict:
    """Calculate AWS EC2 cost for a given instance and duration."""
    prices = {"g6.xlarge": 0.80, "g6.2xlarge": 0.98, "p5.xlarge": 3.22}
    hourly = prices.get(instance_type, 0)
    return {"total_cost": hourly * hours, "hourly_rate": hourly}

client.add_function(calculate_cost)

# Gemma 4 can now call both MCP tools AND local functions
response = await client.chat(
    "How much would it cost to run a g6.2xlarge for 720 hours?",
    execute_functions=True
)

4. Building Custom MCP Servers for Gemma 4

While gemma-mcp connects Gemma to existing MCP servers, you'll often need to build custom servers that expose your own APIs, databases, or internal tools. The MCP Python SDK (requires Python 3.10+) and FastMCP make this straightforward.

Example: AWS Resource MCP Server

# aws_mcp_server.py
from fastmcp import FastMCP
import boto3

mcp = FastMCP("AWS Resources")

@mcp.tool()
def list_ec2_instances(region: str = "us-east-1") -> list[dict]:
    """List all EC2 instances in a region with their status."""
    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_instances()
    instances = []
    for reservation in response["Reservations"]:
        for inst in reservation["Instances"]:
            instances.append({
                "id": inst["InstanceId"],
                "type": inst["InstanceType"],
                "state": inst["State"]["Name"],
                "launch_time": str(inst.get("LaunchTime", ""))
            })
    return instances

@mcp.tool()
def get_cloudwatch_metric(
    instance_id: str,
    metric: str = "CPUUtilization",
    period_hours: int = 1,
    region: str = "us-east-1"
) -> dict:
    """Get a CloudWatch metric for an EC2 instance."""
    from datetime import datetime, timedelta
    cw = boto3.client("cloudwatch", region_name=region)
    response = cw.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName=metric,
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=datetime.utcnow() - timedelta(hours=period_hours),
        EndTime=datetime.utcnow(),
        Period=300,
        Statistics=["Average"]
    )
    return {
        "metric": metric,
        "datapoints": response.get("Datapoints", [])
    }

@mcp.tool()
def estimate_monthly_cost(
    instance_type: str,
    hours_per_day: int = 24
) -> dict:
    """Estimate monthly EC2 cost for an instance type."""
    prices = {
        "g6.xlarge": 0.80, "g6.2xlarge": 0.98,
        "g5.xlarge": 1.006, "p5.xlarge": 3.22,
        "inf2.xlarge": 0.76, "t3.small": 0.0208
    }
    hourly = prices.get(instance_type, 0)
    monthly = hourly * hours_per_day * 30
    return {
        "instance_type": instance_type,
        "hourly_rate": hourly,
        "monthly_estimate": round(monthly, 2)
    }

if __name__ == "__main__":
    mcp.run()  # starts stdio transport by default

Connecting Gemma 4 to Your Custom Server

from gemma_mcp import GemmaMCPClient

config = {
    "mcpServers": {
        "aws-resources": {
            "command": "python",
            "args": ["./aws_mcp_server.py"]
        }
    }
}

async with GemmaMCPClient(
    model="gemma-4-27b-it",
    mcp_config=config
).managed() as client:
    response = await client.chat(
        "List all EC2 instances in us-west-2 and estimate "
        "the monthly cost for each instance type",
        execute_functions=True
    )
    print(response)

5. Deploying Gemma 4 on AWS for MCP Workloads

For production MCP agent workloads, you need Gemma 4 running on AWS with an OpenAI-compatible API. Three deployment paths, each with different cost and complexity tradeoffs:

Approach | Instance | Cost/hr | Best For
EC2 + vLLM | g6.2xlarge (L4 24GB) | ~$0.98 | Full control, custom configs
SageMaker Endpoint | ml.g6.2xlarge | ~$1.21 | Managed scaling, monitoring
Inferentia2 | inf2.xlarge | ~$0.76 | Cost-optimized inference

Option A: EC2 + vLLM (Recommended for MCP)

vLLM provides an OpenAI-compatible API out of the box, which is exactly what MCP middleware needs. Here's the setup for the 26B MoE model:

# Launch EC2 g6.2xlarge with Deep Learning AMI (Ubuntu)
# SSH in, then:

# Install vLLM
pip install vllm

# Serve Gemma 4 26B MoE with OpenAI-compatible API
vllm serve google/gemma-4-26b-a4b-it \
  --host 0.0.0.0 \
  --port 8000 \
  --max-model-len 32768 \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.90 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes

# Test the endpoint
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemma-4-26b-a4b-it",
    "messages": [{"role": "user", "content": "Hello"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string"}
          },
          "required": ["location"]
        }
      }
    }]
  }'

⚠️ VRAM Requirements

The 26B MoE model needs ~16GB VRAM with Q4 quantization or ~24GB at FP16. A single L4 GPU (g6.2xlarge) handles Q4 comfortably. The 31B Dense model requires ~24GB+ VRAM at Q4 or ~60GB at FP16 β€” use a g6e.2xlarge (L40S 48GB) for Q4 or a multi-GPU setup for FP16.

Option B: SageMaker Managed Endpoint

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

model = HuggingFaceModel(
    model_data="s3://your-bucket/gemma-4-26b-a4b-it/",
    role=role,
    transformers_version="4.51",
    pytorch_version="2.6",
    py_version="py312",
    env={
        "HF_MODEL_ID": "google/gemma-4-26b-a4b-it",
        "SM_NUM_GPUS": "1",
        "MAX_INPUT_LENGTH": "32768",
        "MAX_TOTAL_TOKENS": "65536"
    }
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g6.2xlarge",
    endpoint_name="gemma4-mcp-endpoint"
)

SageMaker adds auto-scaling, CloudWatch monitoring, and A/B testing out of the box. The tradeoff is ~23% higher hourly cost ($1.21 vs $0.98) and less control over the serving configuration. For MCP workloads where you need custom vLLM flags like --enable-auto-tool-choice, EC2 is usually the better path.
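Invoking the managed endpoint then looks like the sketch below. The inputs/parameters body follows the Hugging Face inference container convention; treat it as a starting point rather than a tested deployment:

```python
import json

def build_request(prompt: str, max_new_tokens: int = 512) -> dict:
    """Request body in the Hugging Face inference container format."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}

def invoke(prompt: str, endpoint_name: str = "gemma4-mcp-endpoint") -> dict:
    """Call the SageMaker endpoint deployed above via boto3."""
    import boto3  # deferred: only needed for the live call
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(build_request(prompt)),
    )
    return json.loads(response["Body"].read())
```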

6. AWS Bedrock AgentCore & MCP Gateway

AWS has gone all-in on MCP. At re:Invent 2025 and throughout early 2026, they shipped several MCP-native services that integrate directly with Gemma 4 deployments:

🎀 AWS re:Invent 2025 Update

AWS announced Bedrock AgentCore at re:Invent 2025, providing managed infrastructure for deploying AI agents with built-in MCP support. AgentCore Gateway can route MCP requests to custom endpoints, including self-hosted Gemma 4 instances. They also added 18 new open-weight models to Bedrock (from Google, Mistral, OpenAI, Qwen, and others), bringing the total to nearly 100 serverless models.

AgentCore Gateway + MCP

AgentCore Gateway acts as a single control plane for routing, authentication, and tool management across MCP servers. Key capabilities:

  • MCP proxy for API Gateway β€” transform existing REST APIs into MCP-compatible endpoints without rewriting code (launched December 2025)
  • MCP server deployment in AgentCore Runtime β€” deploy MCP servers as managed services with automatic session management
  • Custom endpoint routing β€” route MCP requests to your self-hosted Gemma 4 vLLM endpoint on EC2
  • Built-in auth β€” OAuth 2.1 and IAM-based authentication for MCP connections

Architecture: Gemma 4 + AgentCore Gateway

MCP Clients (Claude Desktop, Cursor / Kiro, custom apps)
    → AWS Bedrock AgentCore Gateway (MCP router)
        → Gemma 4 26B MoE on vLLM (EC2 g6.2xlarge, OpenAI-compatible API)
        → MCP Servers (custom tools, AWS APIs)
            → AWS services via MCP tools: S3 / DynamoDB, CloudWatch, RDS / Aurora, Lambda

Gemma 4 reasons → calls MCP tools → tools execute against AWS services → results flow back.

πŸ“Ί Recommended re:Invent Session

"Modernize containers for AI agents using AgentCore Gateway" covers how to expose existing Kubernetes microservices to AI agents via MCP without rewriting application code.

Search re:Invent Sessions on YouTube β†’

7. Multi-Tool Agent Architecture

Real-world agents need multiple tools. Here's a complete architecture for a DevOps agent that uses Gemma 4 with multiple MCP servers to monitor infrastructure, query logs, and take remediation actions:

# devops_agent.py β€” Multi-tool agent with Gemma 4 + MCP
from gemma_mcp import GemmaMCPClient

MCP_CONFIG = {
    "mcpServers": {
        # AWS infrastructure tools
        "aws-infra": {
            "command": "python",
            "args": ["./servers/aws_mcp_server.py"]
        },
        # Log analysis tools
        "logs": {
            "command": "python",
            "args": ["./servers/cloudwatch_logs_server.py"]
        },
        # PagerDuty integration
        "pagerduty": {
            "url": "https://mcp.pagerduty.example.com/sse"
        },
        # GitHub for PR creation
        "github": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-github"]
        }
    }
}

SYSTEM_PROMPT = """You are a DevOps agent. When investigating issues:
1. Check CloudWatch metrics first
2. Query relevant logs
3. Identify root cause
4. Propose and execute remediation
5. Update the PagerDuty incident
Always explain your reasoning before taking action."""

async def run_devops_agent():
    async with GemmaMCPClient(
        model="gemma-4-27b-it",
        mcp_config=MCP_CONFIG,
        system_prompt=SYSTEM_PROMPT,
        temperature=0.2
    ).managed() as client:
        response = await client.chat(
            "CPU on prod-api-3 has been above 95% for 20 minutes. "
            "Investigate and fix.",
            execute_functions=True
        )
        print(response)

The agent flow for this scenario:

  1. Gemma 4 activates thinking mode to plan the investigation
  2. Calls get_cloudwatch_metric (aws-infra MCP) to confirm CPU spike
  3. Calls search_logs (logs MCP) to find error patterns
  4. Identifies a memory leak from a recent deployment
  5. Calls create_pull_request (github MCP) with a fix
  6. Calls update_incident (pagerduty MCP) with root cause and remediation status

Thinking mode is critical for multi-tool agents

Enable thinking with <|think|> in the system prompt. Gemma 4 will reason through which tools to call and in what order before executing. This dramatically reduces hallucinated tool calls and improves multi-step accuracy. The thinking output appears in <|channel>thought...<channel|> blocks that you can log for debugging.
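A small helper to strip and log those blocks, assuming the `<|channel>thought...<channel|>` layout described above:

```python
import re

THOUGHT_RE = re.compile(r"<\|channel>thought(.*?)<channel\|>", re.S)

def split_thoughts(model_output: str):
    """Separate thinking-mode blocks from the user-visible reply so the
    reasoning can be logged for debugging without being shown to users."""
    thoughts = [t.strip() for t in THOUGHT_RE.findall(model_output)]
    visible = THOUGHT_RE.sub("", model_output).strip()
    return thoughts, visible
```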

8. Production Patterns & Cost Optimization

Model Routing for Cost Efficiency

Not every MCP tool call needs the full 31B Dense model. Use a routing layer to match request complexity to model size:

Task Complexity | Model | AWS Instance | Monthly Cost (24/7)
Simple lookups, single tool | Gemma 4 E4B | g6.xlarge | ~$580
Multi-tool, moderate reasoning | Gemma 4 26B MoE | g6.2xlarge | ~$706
Complex multi-step, planning | Gemma 4 31B Dense | g6e.2xlarge (L40S 48GB) | ~$1,614
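The routing layer itself can start as a heuristic on the incoming request; the thresholds, tier names, and endpoint hostnames below are illustrative assumptions, not part of any package:

```python
# Map each tier to its serving endpoint (hypothetical hostnames)
ENDPOINTS = {
    "e4b": "http://gemma-e4b:8000/v1",        # g6.xlarge
    "26b-moe": "http://gemma-26b:8000/v1",    # g6.2xlarge
    "31b-dense": "http://gemma-31b:8000/v1",  # g6e.2xlarge
}

def route(num_tools: int, needs_planning: bool) -> str:
    """Pick a model tier from rough request complexity."""
    if needs_planning or num_tools > 3:
        return "31b-dense"   # complex multi-step planning
    if num_tools > 1:
        return "26b-moe"     # multi-tool, moderate reasoning
    return "e4b"             # simple single-tool lookups
```

In production the complexity signal usually comes from the calling application (how many MCP tools are registered, whether thinking mode is requested) rather than from inspecting the prompt.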

Spot Instances for Non-Critical Workloads

For development, testing, and batch agent workloads, EC2 Spot Instances can cut costs by 70-90%. The g6 family typically sees 60-75% savings in us-east-1:

# Launch a Spot Instance for Gemma 4 MCP workloads
# (substitute the Deep Learning AMI ID for your region)
aws ec2 run-instances \
  --instance-type g6.2xlarge \
  --image-id ami-0abcdef1234567890 \
  --instance-market-options '{"MarketType":"spot","SpotOptions":{"SpotInstanceType":"persistent","InstanceInterruptionBehavior":"stop"}}' \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":100,"VolumeType":"gp3"}}]'

# Estimated Spot price: ~$0.29/hr (vs $0.98 On-Demand)
# Monthly savings: ~$497/month

Auto-Scaling MCP Endpoints

For variable workloads, use SageMaker auto-scaling or an EC2 Auto Scaling Group behind an ALB:

  • Scale-to-zero β€” use SageMaker Serverless Inference for sporadic MCP workloads (cold start ~60s)
  • Scheduled scaling β€” scale up during business hours, scale down at night (saves ~50%)
  • Request-based scaling β€” use CloudWatch metrics on vLLM's /metrics endpoint to trigger scaling
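For the request-based option, a cron job or sidecar can scrape vLLM's Prometheus endpoint and republish the queue depth as a CloudWatch custom metric to drive scaling alarms. The `vllm:num_requests_waiting` gauge is a standard vLLM metric; the CloudWatch namespace here is an assumption:

```python
import re
import urllib.request

def scrape_metric(metrics_text: str, name: str = "vllm:num_requests_waiting") -> float:
    """Pull one gauge value out of vLLM's Prometheus /metrics output."""
    m = re.search(rf"^{re.escape(name)}\S*\s+([0-9.eE+-]+)$", metrics_text, re.M)
    return float(m.group(1)) if m else 0.0

def publish_queue_depth(endpoint: str = "http://localhost:8000/metrics") -> float:
    """Push the vLLM queue depth to CloudWatch for scaling alarms."""
    import boto3  # deferred: only needed for the live call
    text = urllib.request.urlopen(endpoint).read().decode()
    depth = scrape_metric(text)
    boto3.client("cloudwatch").put_metric_data(
        Namespace="Gemma4/vLLM",
        MetricData=[{"MetricName": "RequestsWaiting", "Value": depth}],
    )
    return depth
```

A target-tracking scaling policy on `RequestsWaiting` then adds or removes instances as the queue grows or drains.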

🎀 AWS re:Invent 2025 Update

AWS launched Graviton5 processors with 192 cores and 25% higher performance than Graviton4. While Graviton5 is CPU-only (no GPU), the new M9g instances are ideal for running MCP server middleware, API gateways, and orchestration layers at lower cost. Pair a Graviton5 instance for your MCP servers with a GPU instance for Gemma 4 inference.

9. Security & Guardrails

Running an open-weight model with tool access to your AWS infrastructure requires careful security design. Here are the non-negotiable guardrails:

Network Isolation

  • Run Gemma 4 vLLM in a private subnet β€” no public IP, no internet access
  • MCP servers in the same VPC, communicating over private IPs
  • Use VPC endpoints for S3, DynamoDB, CloudWatch (free for Gateway endpoints, avoids NAT Gateway costs)
  • ALB or API Gateway as the only public-facing entry point, with WAF rules

IAM Least Privilege

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "cloudwatch:GetMetricStatistics",
        "logs:FilterLogEvents"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": "us-east-1"
        }
      }
    }
  ]
}
// Read-only by default. Write actions (ec2:StopInstances,
// ec2:StartInstances) require explicit human approval.

Tool Execution Guardrails

  • Allowlist tools β€” only register specific MCP tools with Gemma 4, never expose a wildcard
  • Human-in-the-loop for destructive actions β€” any tool that modifies state (delete, stop, terminate) should require human approval
  • Rate limiting β€” cap tool calls per minute to prevent runaway agent loops
  • Output validation β€” validate Gemma 4's tool call arguments against the schema before execution
  • Audit logging β€” log every tool call, its arguments, and the result to CloudWatch Logs or S3
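For the output-validation point, a minimal stdlib checker against the simple `type`/`required` schema shape used in this guide (a library like `jsonschema` is the fuller option for real JSON Schema):

```python
def validate_args(args: dict, schema: dict) -> list[str]:
    """Check Gemma 4's proposed arguments against a tool's parameter schema
    before executing the MCP call. Returns a list of problems (empty = OK)."""
    problems = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required argument: {name}")
    for name, value in args.items():
        if name not in props:
            problems.append(f"unexpected argument: {name}")
        elif props[name].get("type") == "string" and not isinstance(value, str):
            problems.append(f"{name}: expected string")
    return problems
```

Reject the tool call and re-prompt the model whenever the list is non-empty, rather than passing malformed arguments through to the MCP server.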

MCP Authentication

MCP spec 2025-06-18 added OAuth 2.1 support. For AWS deployments, use AgentCore Gateway's built-in IAM authentication for MCP connections. For custom MCP servers, implement token-based auth with short-lived JWTs and rotate credentials via AWS Secrets Manager.

10. Limitations & Workarounds

Limitation | Impact | Workaround
No native MCP client in Gemma 4 | Need middleware to translate between Gemma 4 tokens and the MCP protocol | Use the gemma-mcp package or vLLM's OpenAI-compatible API with MCP client libraries
Tool call accuracy varies by model size | E2B/E4B may hallucinate tool arguments on complex schemas | Use 26B MoE or 31B Dense for production; enable thinking mode; validate args before execution
No streaming tool calls | Model must finish generating the full tool_call block before execution | Acceptable for most MCP workloads; stream the final response after tool execution
Context window consumed by tool definitions | Many tools = less room for conversation history | Dynamic tool loading: only inject relevant tools per request; use 256K context (26B/31B)
Cold start on SageMaker Serverless | ~60s cold start for GPU inference | Keep a warm instance for latency-sensitive workloads; use provisioned concurrency
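The dynamic-tool-loading workaround can start as simple keyword scoring over tool descriptions before injecting definitions into the prompt. This is a hypothetical sketch; embedding similarity over descriptions is the natural upgrade:

```python
def select_tools(query: str, tool_index: dict[str, str], max_tools: int = 5) -> list[str]:
    """Rank tools by keyword overlap between the user query and each tool's
    description, keeping only the top N so tool definitions don't crowd out
    conversation history in the context window."""
    words = set(query.lower().split())
    scored = []
    for name, description in tool_index.items():
        overlap = len(words & set(description.lower().split()))
        if overlap:
            scored.append((overlap, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:max_tools]]
```

Only the selected tools' `<|tool>` declarations get injected for that request; the rest stay out of the context entirely.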

11. Why Lushbinary for Gemma 4 + MCP on AWS

We've been building AI agent infrastructure since the early days of MCP and open-weight models. Our team has deployed Gemma 4 on AWS for production workloads, built custom MCP servers for enterprise clients, and optimized inference costs across EC2, SageMaker, and Inferentia.

  • End-to-end architecture β€” from model selection and deployment to MCP server development and AgentCore integration
  • AWS cost optimization β€” we've helped teams cut inference costs by 60-80% through model routing, Spot Instances, and right-sizing
  • Security-first β€” IAM least privilege, VPC isolation, audit logging, and human-in-the-loop guardrails built into every deployment
  • Open-weight expertise β€” deep experience with Gemma 4, Llama 4, Qwen 3.5, and multi-model routing architectures

πŸš€ Free Architecture Consultation

Planning a Gemma 4 + MCP deployment on AWS? Book a free 30-minute call with our AI infrastructure team. We'll review your use case, recommend the right model size and deployment strategy, and estimate your monthly AWS costs. Book now β†’

❓ Frequently Asked Questions

Can Gemma 4 act as an MCP client to call external tools?

Yes. Gemma 4 has native function calling with 6 dedicated control tokens. When served via an OpenAI-compatible API (vLLM, llama.cpp), MCP clients can route tool calls through Gemma 4 as the inference backend.

How do I build an MCP server powered by Gemma 4 on AWS?

Deploy Gemma 4 on an EC2 GPU instance (g6.2xlarge with L4 GPU, ~$0.98/hr) or SageMaker endpoint using vLLM. Then build an MCP server in Python or TypeScript that exposes tools, and connect it to Gemma 4 for inference.

What is the gemma-mcp Python package?

gemma-mcp is an open-source Python package that combines Gemma models with MCP server integration. It supports both local Python functions and remote MCP tools, automatic tool discovery, and async context management. Install with pip install gemma-mcp.

Which Gemma 4 model size is best for MCP tool use on AWS?

The 26B MoE model offers the best cost-to-performance ratio. It activates only 3.8B parameters per token while scoring 82.6% on MMLU Pro. It runs on a single L4 GPU (g6.2xlarge, ~$0.98/hr). The 31B Dense is better for complex multi-step reasoning.

Does AWS Bedrock support Gemma 4 models?

AWS Bedrock added 18 new open-weight models at re:Invent 2025 from providers including Google. You can also deploy Gemma 4 via SageMaker JumpStart or Bedrock Marketplace for managed inference with Bedrock's agent tooling.

πŸ“š Sources

Content was rephrased for compliance with licensing restrictions. Technical specifications sourced from official Google AI and AWS documentation as of April 2026. Pricing and feature availability may change β€” always verify on the vendor's website.

Build Your Gemma 4 + MCP Agent on AWS

From model deployment to custom MCP server development and AWS cost optimization β€” let's architect your agentic AI stack together.
