When an AI agent needs to call 5 MCP tools to answer a question, that’s 5 round-trips between the LLM and the client. Each one means waiting for a new LLM generation, parsing the tool call, executing it, sending the result back, and waiting for the next generation. For complex workflows this adds up fast.
Code Mode flips this around. Instead of calling tools one at a time, the agent writes a JavaScript script that calls all the tools it needs - and VoidLLM executes it in a single request.
graph TD
A[LLM generates JS code] --> B[VoidLLM receives script]
B --> C[WASM sandbox starts]
C --> D{Execute JS}
D -->|tool call 1| T1[MCP Server A]
D -->|tool call 2| T2[MCP Server B]
D -->|tool call 3| T1
D -->|tool call 4| T3[MCP Server C]
T1 -.-> D
T2 -.-> D
T3 -.-> D
D --> E[Return result]
E --> F[LLM receives result]
style B fill:#8b5cf6,stroke:#6366f1,color:#fff
style C fill:#8b5cf6,stroke:#6366f1,color:#fff
style D fill:#8b5cf6,stroke:#6366f1,color:#fff
style T1 fill:#1a1a24,stroke:#22c55e,color:#e2e8f0
style T2 fill:#1a1a24,stroke:#22c55e,color:#e2e8f0
style T3 fill:#1a1a24,stroke:#22c55e,color:#e2e8f0
Code Mode builds on top of the MCP Gateway - every tool registered there is available inside the sandbox.
The script runs inside QuickJS compiled to WebAssembly via Wazero - pure Go, no CGO, no system dependencies. The sandbox has no filesystem access, no network access, and a configurable memory limit.
When the JS calls tools.aws.search({query: "..."}), an ES6 Proxy intercepts it and calls a Go host function. Go then makes the actual MCP request to the external server - the WASM sandbox never talks to the network directly.
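The interception described above can be sketched in plain JavaScript. This is a minimal illustration, not VoidLLM's actual source: `hostCall` is a hypothetical stand-in for the Go host function and only echoes its arguments instead of making an MCP request, and `makeTools` is a name invented for this sketch.

```javascript
// Hypothetical stand-in for the Go host function. In the real sandbox this
// call would cross the WASM boundary; here it only echoes what it was given.
async function hostCall(server, tool, args) {
  return { server, tool, args };
}

// Build a `tools` object where any `tools.<server>.<tool>(args)` access is
// forwarded to `call`, without knowing server or tool names in advance.
function makeTools(call) {
  return new Proxy({}, {
    get(_target, server) {
      // Ignore symbol lookups (for example from console.log or inspection).
      if (typeof server !== "string") return undefined;
      return new Proxy({}, {
        get(_inner, tool) {
          if (typeof tool !== "string") return undefined;
          return (args = {}) => call(server, tool, args);
        },
      });
    },
  });
}

const tools = makeTools(hostCall);
```

With this in place, `await tools.aws.search({ query: "IAM" })` resolves to whatever the host function returns. The script itself never needs a network API, which is consistent with the sandbox having none.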
Code Mode exposes three tools on /api/v1/mcp:
list_servers - shows which MCP servers are available and how many tools each has
search_tools - finds tools by keyword across all accessible servers, returns names, descriptions, and input schemas
execute_code - runs JavaScript with all MCP tools available as async functions
The agent uses list_servers and search_tools to discover what’s available, then writes a script for execute_code.
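As a rough sketch of that discover-then-execute flow, the three calls can be written as standard MCP `tools/call` requests (JSON-RPC 2.0). The argument names `query` and `code` are assumptions made for illustration and are not confirmed by the text above. A real client would also perform the MCP initialization handshake and any authentication first, both omitted here.

```javascript
// Build an MCP `tools/call` request (JSON-RPC 2.0) for a named tool.
let nextId = 1;
function toolCall(name, args) {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Discovery first, then one execution. `query` and `code` are assumed
// argument names, used here only to make the sketch concrete.
const requests = [
  toolCall("list_servers", {}),
  toolCall("search_tools", { query: "search" }),
  toolCall("execute_code", {
    code: 'const docs = await aws_search({ query: "IAM best practices" })\nreturn docs.slice(0, 3)',
  }),
];

// Each object would be sent as a JSON body to /api/v1/mcp.
console.log(JSON.stringify(requests[2], null, 2));
```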
The JS runtime provides every MCP tool as an async function. The agent just calls them by name:
// Fetch data from two different MCP servers in one execution
const docs = await aws_search({ query: "IAM best practices" })
const related = await exa_search({ query: docs[0].title })
return { docs: docs.slice(0, 3), related: related.slice(0, 5) }
TypeScript types included
VoidLLM auto-generates TypeScript declarations from tool schemas and injects them at discovery time. The LLM knows the exact parameter types and return shapes before writing a single line of code.
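To make the idea concrete, here is a minimal sketch of turning an MCP tool's `inputSchema` (JSON Schema) into a TypeScript-style signature. It handles only a small subset of JSON Schema. The function names `schemaToType` and `declareTool` are invented for this example, and the output format is an assumption rather than what VoidLLM actually emits.

```javascript
// Map a small subset of JSON Schema to a TypeScript type expression.
function schemaToType(schema) {
  if (!schema || typeof schema !== "object") return "unknown";
  switch (schema.type) {
    case "string": return "string";
    case "number":
    case "integer": return "number";
    case "boolean": return "boolean";
    case "array": return `${schemaToType(schema.items)}[]`;
    case "object": {
      const required = new Set(schema.required || []);
      const fields = Object.entries(schema.properties || {}).map(
        ([key, value]) => `${key}${required.has(key) ? "" : "?"}: ${schemaToType(value)}`
      );
      return fields.length ? `{ ${fields.join("; ")} }` : "Record<string, unknown>";
    }
    default: return "unknown";
  }
}

// Render one tool as a function signature the LLM can read.
function declareTool(tool) {
  return `${tool.name}(args: ${schemaToType(tool.inputSchema)}): Promise<unknown>`;
}

const decl = declareTool({
  name: "search",
  inputSchema: {
    type: "object",
    properties: { query: { type: "string" }, limit: { type: "integer" } },
    required: ["query"],
  },
});
console.log(decl);
// prints: search(args: { query: string; limit?: number }): Promise<unknown>
```

The return type is left as `Promise<unknown>` because MCP tool definitions describe inputs only.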
Everything is configurable:
A pool of pre-initialized WASM runtimes means there’s no cold start. The first tool call in a script hits the MCP server, not a runtime boot. See the full numbers in “Proxy Overhead Benchmarks: How Fast is VoidLLM?”
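The pooling idea can be sketched generically as follows. This is an illustration of the pattern only: VoidLLM's pool lives in Go, and `RuntimePool` and `createRuntime` are hypothetical names standing in for "boot QuickJS inside WASM".

```javascript
// A fixed-size pool that builds every runtime up front, so no request
// pays the boot cost. `createRuntime` is a hypothetical factory.
class RuntimePool {
  constructor(size, createRuntime) {
    this.idle = Array.from({ length: size }, () => createRuntime());
    this.waiters = [];
  }

  // Resolve immediately with an idle runtime, or wait until one is released.
  acquire() {
    const runtime = this.idle.pop();
    if (runtime !== undefined) return Promise.resolve(runtime);
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  // Hand the runtime to the next waiter, or return it to the idle list.
  release(runtime) {
    const next = this.waiters.shift();
    if (next) next(runtime);
    else this.idle.push(runtime);
  }
}

let booted = 0;
const pool = new RuntimePool(2, () => ({ id: ++booted }));
```

A real pool would also have to reset or replace a runtime before reuse so that state from one script cannot leak into the next. That step is left out of this sketch.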
The concept of having AI agents write code instead of calling tools individually was pioneered by Cloudflare. They built it to solve a specific problem: their API has over 2,500 endpoints, and exposing each one as an MCP tool would consume over a million tokens. Their solution - two tools (search and execute) that let the agent query an OpenAPI spec and then write code against it - is elegant for that use case.
We took the same core idea but adapted it for a different problem. VoidLLM isn’t wrapping a single large API - it’s aggregating tools from many different MCP servers. So instead of working against an OpenAPI spec, our agents work against MCP tool schemas exposed as TypeScript declarations. The LLM sees typed function signatures like tools.aws.search(args: { query: string }) and writes code that feels natural - no HTTP paths, no request bodies, no status codes to worry about.
The tradeoffs follow from there: OpenAPI is the right representation when you need full HTTP semantics for a large REST API. Compact tool schemas are the right representation when you’re orchestrating actions across many smaller services and want to keep the context window lean.
Our runtime is also different - QuickJS in WASM instead of V8 isolates. It is lighter, has no platform dependency, and runs anywhere VoidLLM runs. V8 is faster at raw execution, but for tool orchestration scripts the bottleneck is typically the MCP server response time, not the JS runtime.
Not every tool should be available in Code Mode. Admins can block specific tools per server - a tool that’s fine for interactive use might be too dangerous in an automated script. Blocked tools are invisible to Code Mode but still work through the regular MCP proxy.
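A per-server blocklist of this kind amounts to a filter applied before tools are exposed to the sandbox. The sketch below assumes a hypothetical blocklist shape (server name mapped to a set of tool names), and the `github` server and its tools are made up for the example.

```javascript
// Hypothetical blocklist shape: server name -> set of blocked tool names.
const blocked = {
  github: new Set(["delete_repo"]),
};

// Return only the tools Code Mode is allowed to see for a given server.
function visibleTools(server, toolNames, blocklist) {
  const denied = blocklist[server] || new Set();
  return toolNames.filter((name) => !denied.has(name));
}

const visible = visibleTools("github", ["search_code", "delete_repo"], blocked);
console.log(visible); // prints: [ 'search_code' ]
```

Filtering at discovery time means a blocked tool never appears in the declarations the LLM sees, so the script has nothing to call.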
MCP tools advertise inputs but not outputs. We taught Code Mode to learn return types from the first successful call and surface them as TypeScript on the next discovery.
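One simple way to learn a return type from a single successful call is to walk the returned JSON value and emit a TypeScript type expression. The sketch below is an assumption about how such inference could work, not a description of VoidLLM's implementation. `inferType` is a name invented for this example.

```javascript
// Infer a TypeScript type expression from one sample JSON value.
function inferType(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) {
    // Collect the distinct element types; an empty array tells us nothing.
    const kinds = [...new Set(value.map(inferType))];
    if (kinds.length === 0) return "unknown[]";
    return kinds.length === 1 ? `${kinds[0]}[]` : `(${kinds.join(" | ")})[]`;
  }
  if (typeof value === "object") {
    const fields = Object.entries(value).map(([key, v]) => `${key}: ${inferType(v)}`);
    return fields.length ? `{ ${fields.join("; ")} }` : "Record<string, unknown>";
  }
  return typeof value; // "string", "number", or "boolean" for JSON scalars
}

const sample = [{ title: "IAM best practices", score: 0.92 }];
const returnType = inferType(sample);
console.log(returnType);
// prints: { title: string; score: number }[]
```

A single sample cannot reveal optional fields or alternative shapes, so a type learned this way is best treated as a hint.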