
Code Mode: Let AI Agents Write Scripts, Not Chat


When an AI agent needs to call 5 MCP tools to answer a question, that’s 5 round-trips between the LLM and the client. Each one means waiting for a new LLM generation, parsing the tool call, executing it, sending the result back, and waiting for the next generation. For complex workflows this adds up fast.

Code Mode flips this around. Instead of calling tools one at a time, the agent writes a JavaScript script that calls all the tools it needs - and VoidLLM executes it in a single request.

How it works

graph TD
  A[LLM generates JS code] --> B[VoidLLM receives script]
  B --> C[WASM sandbox starts]
  C --> D{Execute JS}
  D -->|tool call 1| T1[MCP Server A]
  D -->|tool call 2| T2[MCP Server B]
  D -->|tool call 3| T1
  D -->|tool call 4| T3[MCP Server C]
  T1 -.-> D
  T2 -.-> D
  T3 -.-> D
  D --> E[Return result]
  E --> F[LLM receives result]

  style B fill:#8b5cf6,stroke:#6366f1,color:#fff
  style C fill:#8b5cf6,stroke:#6366f1,color:#fff
  style D fill:#8b5cf6,stroke:#6366f1,color:#fff
  style T1 fill:#1a1a24,stroke:#22c55e,color:#e2e8f0
  style T2 fill:#1a1a24,stroke:#22c55e,color:#e2e8f0
  style T3 fill:#1a1a24,stroke:#22c55e,color:#e2e8f0
One LLM generation produces a script. VoidLLM executes it in a WASM sandbox, calling multiple tools, and returns the final result.

Code Mode builds on top of the MCP Gateway - every tool registered there is available inside the sandbox.

The script runs inside QuickJS compiled to WebAssembly, executed by Wazero - a pure-Go WebAssembly runtime with no CGO and no system dependencies. The sandbox has no filesystem access, no network access, and a configurable memory limit.

When the JS calls tools.aws.search({query: "..."}), an ES6 Proxy intercepts it and calls a Go host function. Go then makes the actual MCP request to the external server - the WASM sandbox never talks to the network directly.
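
To make the interception concrete, here is a minimal sketch of the Proxy pattern in plain JavaScript. It is not VoidLLM's actual code: hostCallTool is a hypothetical stand-in for the Go host function, and it only records the call instead of making an MCP request.

```javascript
// Hypothetical stand-in for the Go host function. In VoidLLM the real call
// crosses the WASM boundary and Go performs the MCP request.
const hostCalls = [];
async function hostCallTool(server, tool, args) {
  hostCalls.push({ server, tool, args });
  return { server, tool, echoed: args };
}

// tools.<server> yields a second Proxy; tools.<server>.<tool> yields an
// async function that forwards everything to the host.
function makeToolsProxy(callHost) {
  return new Proxy({}, {
    get(_target, server) {
      if (typeof server !== "string") return undefined;
      return new Proxy({}, {
        get(_inner, tool) {
          // Returning a function for "then" would make `await tools.aws`
          // treat the server object as a promise, so skip it.
          if (typeof tool !== "string" || tool === "then") return undefined;
          return (args = {}) => callHost(server, tool, args);
        },
      });
    },
  });
}

const tools = makeToolsProxy(hostCallTool);
const pending = tools.aws.search({ query: "IAM best practices" });
pending.then((result) => console.log(result.server, result.tool));
```

Because the Proxy resolves names lazily, nothing has to be registered inside the sandbox per tool; the host side decides whether a given server and tool pair actually exists.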

Three dedicated tools

Code Mode exposes three tools on /api/v1/mcp: list_servers, search_tools, and execute_code.

The agent uses list_servers and search_tools to discover what’s available, then writes a script for execute_code.
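
As a rough sketch of that flow, the three calls could look like this on the wire. The JSON-RPC envelope and the tools/call method come from the MCP specification; the argument names ("query", "code") are assumptions for illustration, not VoidLLM's documented schema.

```javascript
// Build an MCP tools/call request (JSON-RPC 2.0 envelope as defined by MCP).
let nextId = 1;
function toolCall(name, args = {}) {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// The script the agent wants executed, passed along as a plain string.
const script = [
  'const docs = await tools.aws.search({ query: "IAM best practices" })',
  "return docs.slice(0, 3)",
].join("\n");

// Argument names below are hypothetical; check the tool schemas for the real ones.
const discoveryFlow = [
  toolCall("list_servers"),
  toolCall("search_tools", { query: "search" }),
  toolCall("execute_code", { code: script }),
];

// Each entry would be sent as a JSON body to /api/v1/mcp, in order.
console.log(JSON.stringify(discoveryFlow[2], null, 2));
```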

What the agent writes

The JS runtime provides every MCP tool as an async function. The agent just calls them by name:

// Fetch data from two different MCP servers in one execution
const docs = await aws_search({ query: "IAM best practices" })
const related = await exa_search({ query: docs[0].title })

return { docs: docs.slice(0, 3), related: related.slice(0, 5) }
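
The script above uses await and return at the top level, which only works if the runtime wraps it in an async function. The sketch below shows one way that wrapping can be done in plain JavaScript, with stub tools in place of real MCP calls. It illustrates the execution model only: the stub return shapes are invented, and this is not how VoidLLM sandboxes code (AsyncFunction provides no isolation at all).

```javascript
// AsyncFunction is not a global; it is reachable through an async function's prototype.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor;

// Stub tools standing in for real MCP calls. Names match the script above,
// return shapes are made up for the demo.
const stubTools = {
  aws_search: async ({ query }) => [{ title: `${query} guide` }, { title: "Other" }],
  exa_search: async ({ query }) => [{ url: "https://example.com", about: query }],
};

const scriptBody = `
const docs = await aws_search({ query: "IAM best practices" })
const related = await exa_search({ query: docs[0].title })
return { docs: docs.slice(0, 3), related: related.slice(0, 5) }
`;

// Tool names become function parameters, so the body can call them as bare identifiers.
async function runScript(body, tools) {
  const names = Object.keys(tools);
  const fn = new AsyncFunction(...names, body);
  return fn(...names.map((name) => tools[name]));
}

const resultPromise = runScript(scriptBody, stubTools);
resultPromise.then((result) => console.log(JSON.stringify(result)));
```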

TypeScript types included

VoidLLM auto-generates TypeScript declarations from tool schemas and injects them at discovery time. The LLM knows the exact parameter types and return shapes before writing a single line of code.
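
To give a feel for what that generation step involves, here is a simplified sketch that turns an MCP tool definition (name plus inputSchema, as in the MCP spec) into a TypeScript method signature. The helper names tsType and methodSignature are hypothetical, only a handful of JSON Schema types are handled, and the return type is left as unknown; VoidLLM's real generator may differ.

```javascript
// Map a (simplified) JSON Schema node to a TypeScript type string.
function tsType(schema) {
  if (!schema) return "unknown";
  switch (schema.type) {
    case "string":
      return "string";
    case "number":
    case "integer":
      return "number";
    case "boolean":
      return "boolean";
    case "array":
      return `${tsType(schema.items)}[]`;
    case "object": {
      const required = new Set(schema.required || []);
      const props = Object.entries(schema.properties || {}).map(
        // Properties missing from "required" become optional with "?".
        ([key, value]) => `${key}${required.has(key) ? "" : "?"}: ${tsType(value)}`
      );
      return `{ ${props.join("; ")} }`;
    }
    default:
      return "unknown";
  }
}

// Render one tool as a method signature the LLM can read.
function methodSignature(tool) {
  return `${tool.name}(args: ${tsType(tool.inputSchema)}): Promise<unknown>;`;
}

const searchTool = {
  name: "search",
  inputSchema: {
    type: "object",
    properties: { query: { type: "string" }, limit: { type: "integer" } },
    required: ["query"],
  },
};

console.log(methodSignature(searchTool));
// prints: search(args: { query: string; limit?: number }): Promise<unknown>;
```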

Sandbox limits

The sandbox limits are configurable, including the memory limit mentioned above.

A pool of pre-initialized WASM runtimes means there’s no cold start. The first tool call in a script hits the MCP server, not a runtime boot. See the full numbers in "Proxy Overhead Benchmarks: How Fast is VoidLLM?"
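
The pooling idea itself is simple, and a generic sketch shows why it removes the cold start. This is an illustration in JavaScript, not VoidLLM's Go implementation: RuntimePool, createRuntime, and the choice to discard a runtime after use and boot a fresh one are all assumptions.

```javascript
class RuntimePool {
  constructor(size, createRuntime) {
    this.createRuntime = createRuntime;
    // Pay the initialization cost up front, before any request arrives.
    this.idle = Array.from({ length: size }, () => createRuntime());
  }

  // Hand out a warm runtime; fall back to a cold start only if the pool is empty.
  acquire() {
    return this.idle.pop() ?? this.createRuntime();
  }

  // Assumption: used runtimes are thrown away and replaced with a fresh one,
  // so no state leaks from one script to the next.
  replenish() {
    this.idle.push(this.createRuntime());
  }
}

// Count boots to show that acquire() does not trigger one while the pool has capacity.
let booted = 0;
const pool = new RuntimePool(2, () => ({ id: ++booted }));
const first = pool.acquire();
console.log(`booted=${booted}, idle=${pool.idle.length}`);
```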

The idea behind Code Mode

The concept of having AI agents write code instead of calling tools individually was pioneered by Cloudflare. They built it to solve a specific problem: their API has over 2,500 endpoints, and exposing each one as an MCP tool would consume over a million tokens. Their solution - two tools (search and execute) that let the agent query an OpenAPI spec and then write code against it - is elegant for that use case.

We took the same core idea but adapted it for a different problem. VoidLLM isn’t wrapping a single large API - it’s aggregating tools from many different MCP servers. So instead of working against an OpenAPI spec, our agents work against MCP tool schemas exposed as TypeScript declarations. The LLM sees typed function signatures like tools.aws.search(args: { query: string }) and writes code that feels natural - no HTTP paths, no request bodies, no status codes to worry about.

The tradeoffs follow from there: OpenAPI is the right representation when you need full HTTP semantics for a large REST API. Compact tool schemas are the right representation when you’re orchestrating actions across many smaller services and want to keep the context window lean.

Our runtime is also different - QuickJS in WASM instead of V8 isolates. Lighter, no platform dependency, runs anywhere VoidLLM runs. V8 is faster at raw execution, but for tool orchestration scripts the bottleneck is always the MCP server response time, not the JS runtime.

Per-tool blocklist

Not every tool should be available in Code Mode. Admins can block specific tools per server - a tool that’s fine for interactive use might be too dangerous in an automated script. Blocked tools are invisible to Code Mode but still work through the regular MCP proxy.
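
Conceptually the blocklist is a filter applied only on the Code Mode path. The sketch below assumes a hypothetical shape (server name mapped to a set of blocked tool names) and invented tool names; the real configuration format may look different.

```javascript
// Hypothetical blocklist shape: server name -> set of blocked tool names.
const blocklist = {
  aws: new Set(["delete_bucket"]),
};

// Tools that Code Mode discovery and execution are allowed to see.
function visibleInCodeMode(server, toolNames, blocked) {
  const blockedForServer = blocked[server] ?? new Set();
  return toolNames.filter((name) => !blockedForServer.has(name));
}

const awsTools = ["search", "delete_bucket"];

// Code Mode sees the filtered list; the regular MCP proxy would still see awsTools as-is.
const codeModeTools = visibleInCodeMode("aws", awsTools, blocklist);
console.log(codeModeTools);
```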
