When an LLM has a tool that can call five downstream services in one round-trip, you expect it to use that tool. In practice it often doesn’t - it calls the services one at a time, with a full LLM generation between calls. For five tools that’s five generations instead of one. The tool description was technically correct, but it wasn’t driving the behavior.
We ran into this with Code Mode. The execute_code tool lets an LLM write JavaScript that runs in a WASM sandbox and calls multiple MCP tools in a single execution. The old description explained what the tool did. LLMs read it and then called tools individually anyway. We rewrote the description and saw measurable improvement. This post is about the techniques that worked.
Each individual tool call is a complete LLM round-trip: generate the call, execute it, send the result back, wait for the next generation. For agentic workflows with multiple tools this isn’t a minor inefficiency.
With chaining - 1 LLM round-trip:
graph LR
  L5[LLM writes script] -->|execute_code| SB["sandbox: search + fetch_doc + summarize"]
  SB -->|all results| L6[LLM final answer]
  style L5 fill:#1a1a24,stroke:#8b5cf6,color:#e2e8f0
  style L6 fill:#1a1a24,stroke:#8b5cf6,color:#e2e8f0
  style SB fill:#1a1a24,stroke:#22c55e,color:#e2e8f0
Without chaining - 3 LLM round-trips:
graph LR
  L1[LLM generates] -->|tool call| T1[search]
  T1 -->|result| L2[LLM thinks]
  L2 -->|tool call| T2[fetch_doc]
  T2 -->|result| L3[LLM thinks]
  L3 -->|tool call| T3[summarize]
  T3 -->|result| L4[LLM final answer]
  style L1 fill:#1a1a24,stroke:#8b5cf6,color:#e2e8f0
  style L2 fill:#1a1a24,stroke:#8b5cf6,color:#e2e8f0
  style L3 fill:#1a1a24,stroke:#8b5cf6,color:#e2e8f0
  style L4 fill:#1a1a24,stroke:#8b5cf6,color:#e2e8f0
  style T1 fill:#1a1a24,stroke:#ef4444,color:#e2e8f0
  style T2 fill:#1a1a24,stroke:#ef4444,color:#e2e8f0
  style T3 fill:#1a1a24,stroke:#ef4444,color:#e2e8f0
The original execute_code description read like documentation:
Execute JavaScript code in a sandboxed WASM runtime with MCP tools available as async functions. Tools are accessible via tools.serverAlias.toolName(args). Use await for tool calls. Return a value to send it back as the result.
Everything in it is accurate. It doesn’t tell the LLM when to use the tool, how to sequence discovery before execution, what to do when tool calls fail, or why chaining matters. An LLM reading this description treats execute_code as one option among several roughly equivalent options.
The new description shipped in v0.0.17 (April 29, 2026) is structured differently. It uses labeled sections that give the LLM distinct signal types: preference, process, rules, and patterns. Each section does a specific job.
The new description opens with:
STRONG PREFERENCE: Whenever a task requires more than one downstream MCP tool call, use execute_code. Two or more downstream calls = use execute_code. One downstream call = call it directly.
LLMs appear to weight all-caps text more heavily than the surrounding prose. This isn’t a documented behavior in any model card, but it’s reproducible - the same instruction in sentence case gets followed less reliably than in caps. The label STRONG PREFERENCE states the nature of the instruction before the instruction itself arrives. WORKFLOW, RULES, PATTERNS, and NOTE follow the same pattern throughout the description.
The label tells the LLM how to classify what it’s about to read. That classification happens before interpretation, so the content lands in the right mental category.
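As a rough illustration of the labeled-section structure, the sketch below assembles a description from named sections. The `buildDescription` helper and the `inputSchema` are hypothetical and are not the shipped VoidMCP code. The STRONG PREFERENCE and WORKFLOW sentences are quoted from this post, and the RULES text is borrowed from the original description purely as filler.

```javascript
// Hypothetical sketch: assemble a tool description from labeled sections.
// This is NOT the full VoidMCP description, only an illustration of the layout.
const sections = [
  ["STRONG PREFERENCE", "Whenever a task requires more than one downstream MCP tool call, use execute_code."],
  ["WORKFLOW", "Always call search_tools() first to find what's available, then call execute_code with what you found."],
  ["RULES", "Use await for tool calls. Return a value to send it back as the result."],
];

// Each section becomes "LABEL: text"; blank lines keep the labels visually distinct.
function buildDescription(parts) {
  return parts.map(([label, text]) => `${label}: ${text}`).join("\n\n");
}

// An MCP tool definition carries name, description, and inputSchema.
// The schema below is an assumption made for this example.
const executeCodeTool = {
  name: "execute_code",
  description: buildDescription(sections),
  inputSchema: {
    type: "object",
    properties: { code: { type: "string" } },
    required: ["code"],
  },
};

console.log(executeCodeTool.description);
```

Keeping the sections as data makes it easy to reorder or reword one signal type without touching the others.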
Vague guidance like “prefer to batch tool calls” leaves the LLM deciding what “prefer” means in each situation. The new description replaces that with a threshold:
Two or more downstream calls = use execute_code. One downstream call = call it directly.
The LLM doesn’t have to weigh tradeoffs. It counts. Two is the boundary. This kind of concrete numeric rule generalizes well because the LLM can apply it to situations that don’t pattern-match anything in training data.
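The rule is simple enough to write as a single comparison, which is part of why it is easy to follow. A minimal sketch: `chooseInvocation` and its return labels are invented for illustration and do not exist in VoidMCP.

```javascript
// Hypothetical helper: the "count, don't weigh" rule as a pure function.
function chooseInvocation(plannedDownstreamCalls) {
  if (!Number.isInteger(plannedDownstreamCalls) || plannedDownstreamCalls < 1) {
    throw new RangeError("expected a positive integer number of planned calls");
  }
  // Two is the boundary: one call goes direct, two or more are chained.
  return plannedDownstreamCalls >= 2 ? "execute_code" : "direct";
}

console.log(chooseInvocation(1)); // "direct"
console.log(chooseInvocation(3)); // "execute_code"
```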
The PATTERNS section includes two code examples the LLM can match against:
// Fan-out: run calls in parallel
const [a, b, c] = await Promise.allSettled([
  tools.search.query({ q: "topic A" }),
  tools.search.query({ q: "topic B" }),
  tools.docs.fetch({ id: "abc" })
])

// Continue-on-partial-failure
const results = []
for (const id of ids) {
  try {
    results.push(await tools.db.get({ id }))
  } catch (e) {
    console.log("skipped", id, e.message)
  }
}
The key is that these are complete, runnable patterns - not pseudocode. The LLM doesn’t need to invent the error handling approach or the parallelism strategy. It pattern-matches and adapts. Asking an LLM to invent patterns from a prose description produces inconsistent results. Giving it concrete patterns to adapt produces consistent ones.
The description also notes: “unhandled rejection aborts the run” - because LLMs frequently write Promise.all when they mean Promise.allSettled, and one failing tool call should not cancel the whole script.
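To make the difference concrete, here is a small sketch that runs the same three calls through both combinators. The `stubTools` object is a hypothetical stand-in for real MCP tools; only `Promise.all` and `Promise.allSettled` are standard JavaScript.

```javascript
// Stub tools standing in for real MCP tools (hypothetical, for illustration).
const stubTools = {
  ok: async (value) => ({ value }),
  broken: async () => { throw new Error("upstream timeout"); },
};

// Promise.all rejects as soon as any call rejects; the successful results are discarded.
async function withAll() {
  try {
    return await Promise.all([stubTools.ok(1), stubTools.broken(), stubTools.ok(3)]);
  } catch (e) {
    return `aborted: ${e.message}`;
  }
}

// Promise.allSettled never rejects; every call reports its own outcome.
async function withAllSettled() {
  const settled = await Promise.allSettled([stubTools.ok(1), stubTools.broken(), stubTools.ok(3)]);
  return settled.map((s) =>
    s.status === "fulfilled" ? s.value : { error: s.reason.message }
  );
}

withAll().then((r) => console.log(r));
withAllSettled().then((r) => console.log(JSON.stringify(r)));
```

Each element returned by `Promise.allSettled` is a `{ status, value }` or `{ status, reason }` record, so the `a`, `b`, `c` in the fan-out pattern need to be unwrapped before use.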
The description explains the cost model explicitly:
Calling tools individually means one round-trip each. Calling them inside execute_code means one round-trip total, regardless of how many tools the script calls.
This sounds redundant if you’re thinking of tool descriptions as an instruction set. But LLMs that understand why a rule exists apply it to novel situations that don’t match the original pattern. An LLM that only knows “two or more calls = use execute_code” might still call tools sequentially when the task is framed differently. An LLM that understands “sequential calls are expensive because of per-call LLM overhead” will chain them in situations the description never explicitly anticipated.
The reasoning is the generalization mechanism.
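The cost model can be put into rough numbers. The sketch below counts round-trips the way this post does (one per individual tool call, one in total for a chained script). The latency figures are made-up placeholders rather than measurements, and the model assumes tools run one after another inside the sandbox.

```javascript
// Round-trips as counted in this post: one per tool call, or one for a chained script.
function llmRoundTrips(toolCalls, chained) {
  return chained ? 1 : toolCalls;
}

// Rough latency estimate. roundTripMs and toolMs are placeholder inputs, not measured values.
function estimateMs({ toolCalls, chained, roundTripMs, toolMs }) {
  return llmRoundTrips(toolCalls, chained) * roundTripMs + toolCalls * toolMs;
}

const assumptions = { toolCalls: 5, roundTripMs: 2000, toolMs: 300 };
console.log(estimateMs({ ...assumptions, chained: false })); // 11500
console.log(estimateMs({ ...assumptions, chained: true }));  // 3500
```

Under these assumed numbers the tool time is identical in both cases; the whole difference comes from the four round-trips that chaining removes.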
The WORKFLOW section in execute_code says:
Always call search_tools() first to find what’s available, then call execute_code with what you found. Do not skip discovery.
And the search_tools description says:
After searching, call execute_code with the tools you found. Calling tools individually after searching wastes round-trips; chain them inside execute_code instead.
Neither description works as well alone. The execute_code description tells the LLM to start with discovery. The search_tools description closes the loop by pointing back to execution. An LLM reading both descriptions gets a consistent workflow from two independent angles, which reinforces the behavior more than either one could on its own.
💡 The discovery step matters
A common failure mode is LLMs calling execute_code without knowing the exact tool names available. The WORKFLOW section enforces search_tools() before execute_code. The two-step pattern - discover, then act - prevents “I’ll guess the tool name” hallucinations inside the sandbox.
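The discover-then-act sequence can be sketched with a toy registry. Everything here is hypothetical: `registry`, `searchTools`, and `callTool` are simplified stand-ins and not the real `search_tools` or sandbox API. The point is only that a discovered name resolves while a guessed one does not.

```javascript
// Hypothetical in-memory registry standing in for downstream MCP servers.
const registry = {
  "docs.fetch": async ({ id }) => ({ id, body: "stub body" }),
  "search.query": async ({ q }) => ({ hits: [q] }),
};

// Stand-in for search_tools(): returns the exact tool names matching a keyword.
function searchTools(keyword) {
  return Object.keys(registry).filter((name) => name.includes(keyword));
}

// Stand-in for sandbox dispatch: unknown names fail loudly.
async function callTool(name, args) {
  const tool = registry[name];
  if (!tool) throw new Error(`unknown tool: ${name}`);
  return tool(args);
}

async function discoverThenAct() {
  const [fetchName] = searchTools("fetch");   // step 1: discover the real name
  return callTool(fetchName, { id: "abc" });  // step 2: act on what was found
}

async function guessAndFail() {
  // A plausible-sounding name that was guessed, not discovered.
  return callTool("docs.fetch_doc", { id: "abc" });
}

discoverThenAct().then((doc) => console.log("fetched", doc.id));
guessAndFail().catch((e) => console.log(e.message));
```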
| Technique | Example in the description | Why it works |
|---|---|---|
| ALL-CAPS labels | STRONG PREFERENCE, WORKFLOW, RULES | Visual anchors - the LLM weighs them higher than prose |
| Numeric rules | “2+ calls = use execute_code” | No room for interpretation - the LLM counts instead of weighing |
| Working code snippets | Promise.allSettled(...), try/catch loop | Pattern matching, not invention |
| Reasoning included | “one round-trip total vs one each” | LLM generalizes to novel situations |
| Cross-links between tools | search_tools points to execute_code and back | Two descriptions reinforce the same behavior |
The description includes a section explaining what Promise<any> means in the TypeScript declarations the LLM receives at discovery time:
NOTE: Tool return types are declared as Promise<any> because the actual shape is only known at runtime. Use console.log to inspect what a tool returns before building logic on top of it. console.log output appears in the execution result.
This matters because LLMs frequently write code that assumes a specific return shape (result.items[0].title) without knowing whether the tool actually returns that. The NOTE section teaches the right debugging behavior - inspect first, then build - rather than hoping the LLM guesses correctly.
The console.log output appearing in the result is also non-obvious behavior worth stating explicitly. LLMs don’t naturally reach for console.log in a tool-call context the way they would in a Node.js script. Telling them it works, and that the output is visible, makes them use it.
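A short sketch of "inspect first, then build", using a hypothetical `searchStub` whose return shape differs from what a script might assume. The field names are invented for the example.

```javascript
// Stub tool with a shape the caller has not seen yet (hypothetical).
const searchStub = async () => ({ results: [{ name: "Intro to MCP" }] });

// Assumes a shape without checking: throws, because `items` does not exist.
async function assumeShape() {
  const raw = await searchStub();
  return raw.items[0].title;
}

// Logs the real shape first, then reads only the fields that were observed.
async function inspectThenBuild() {
  const raw = await searchStub();
  console.log(JSON.stringify(raw)); // in Code Mode, this output comes back with the result
  const first = Array.isArray(raw.results) ? raw.results[0] : undefined;
  return first?.name ?? null;
}

assumeShape().catch((e) => console.log("assumed shape failed:", e.name));
inspectThenBuild().then((name) => console.log("first result:", name));
```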
There’s a companion post on how Code Mode learns tool return types at discovery time that goes deeper on the TypeScript declaration injection.
None of these techniques are VoidLLM-specific. If you’re writing tool descriptions for your own MCP server, the same patterns apply.
Tool descriptions are the primary interface between your API design and LLM behavior. The LLM follows what’s in the description more reliably than anything you put in a system prompt, because descriptions are right next to the tool in the context window when the model decides whether to call it.
The techniques above landed in VoidLLM and VoidMCP v0.0.17 on April 29, 2026. The full Code Mode description is in the VoidMCP repo if you want to read the complete version.