If you’re calling OpenAI or Anthropic directly, switching to VoidLLM is a two-argument change: point the client at the proxy’s base URL and swap in a VoidLLM key. The proxy speaks the same API, so your SDK doesn’t know the difference.
Before:
from openai import OpenAI
client = OpenAI(api_key="sk-...")
After:
from openai import OpenAI
client = OpenAI(
    base_url="https://your-voidllm/v1",
    api_key="vl_uk_your_key_here",
)
Everything else stays the same - client.chat.completions.create(), streaming, function calling, all of it.
If you’re using Anthropic models through VoidLLM, you use the OpenAI SDK - VoidLLM translates the request format to Anthropic’s Messages API on the upstream side:
from openai import OpenAI

client = OpenAI(
    base_url="https://your-voidllm/v1",
    api_key="vl_uk_your_key_here",
)

# This hits Anthropic through VoidLLM - same OpenAI SDK call
response = client.chat.completions.create(
    model="claude-sonnet",  # alias configured in VoidLLM
    messages=[{"role": "user", "content": "hello"}],
)
Your code uses one SDK format. VoidLLM handles the translation per provider.
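To make "translation" concrete, here is a minimal sketch of the kind of reshaping an OpenAI-style chat request needs before it can go to Anthropic's Messages API. It is not VoidLLM's code: the function name, the `upstream_model` parameter, and the fallback of 1024 tokens are assumptions for illustration. What it relies on is that the Messages API takes the system prompt as a top-level `system` field and requires `max_tokens`.

```python
from typing import Any, Dict, List


def to_anthropic_messages(openai_request: Dict[str, Any],
                          upstream_model: str,
                          default_max_tokens: int = 1024) -> Dict[str, Any]:
    """Reshape an OpenAI-style chat request into a Messages-style body (sketch)."""
    system_parts: List[str] = []
    messages: List[Dict[str, Any]] = []

    for message in openai_request["messages"]:
        if message["role"] == "system":
            # Anthropic expects the system prompt as a top-level field,
            # not as an entry in the messages list.
            system_parts.append(message["content"])
        else:
            messages.append({"role": message["role"], "content": message["content"]})

    body: Dict[str, Any] = {
        "model": upstream_model,  # hypothetical: whatever the alias maps to upstream
        "messages": messages,
        # max_tokens is optional for OpenAI but required by the Messages API,
        # so fall back to an assumed default when the caller omits it.
        "max_tokens": openai_request.get("max_tokens", default_max_tokens),
    }
    if system_parts:
        body["system"] = "\n\n".join(system_parts)
    return body
```

A real proxy also has to map tool calls, streaming events, and the response shape back to the OpenAI format; this sketch covers only plain text messages.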
In JavaScript or TypeScript:
import OpenAI from 'openai'

const client = new OpenAI({
  baseURL: 'https://your-voidllm/v1',
  apiKey: 'vl_uk_your_key_here',
})
With curl:
curl https://your-voidllm/v1/chat/completions \
  -H "Authorization: Bearer vl_uk_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"model": "default", "messages": [{"role": "user", "content": "hello"}]}'
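If you would rather not pull in an SDK at all, the curl call maps directly onto Python's standard library. The sketch below builds the same request with `urllib.request`; the helper names `build_chat_request` and `send` are made up for this example, and the URL and key are the same placeholders used above.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for the proxy."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Placeholder host and key, as in the curl example; nothing is sent here.
request = build_chat_request("https://your-voidllm/v1", "vl_uk_your_key_here", "default", "hello")
```

Calling `send(request)` against a reachable deployment should return the usual chat completion JSON, since the proxy answers in the OpenAI format.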
graph LR
A[Your App] -->|same SDK call| B[VoidLLM]
B -->|model: default| C{Resolve alias}
C -->|claude-sonnet| D[Anthropic]
C -->|gpt-4o| E[OpenAI]
C -->|llama-70b| F[vLLM]
style B fill:#8b5cf6,stroke:#6366f1,color:#fff
style A fill:#1a1a24,stroke:#333,color:#e2e8f0
style D fill:#1a1a24,stroke:#22c55e,color:#e2e8f0
style E fill:#1a1a24,stroke:#22c55e,color:#e2e8f0
style F fill:#1a1a24,stroke:#22c55e,color:#e2e8f0
When you send model: "default", VoidLLM resolves the alias to the actual model, builds the correct upstream request (including provider-specific translation for Anthropic and Azure), and streams the response back in the format your SDK expects.
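The alias step is easiest to picture as a two-level lookup. The sketch below is a toy model of that idea, not VoidLLM's implementation: the `ALIASES` and `MODELS` tables, the `Upstream` type, and the choice to point `default` at `claude-sonnet` are all assumptions taken from the diagram's labels.

```python
from typing import NamedTuple


class Upstream(NamedTuple):
    provider: str
    model: str


# Hypothetical routing tables mirroring the diagram above.
ALIASES = {"default": "claude-sonnet"}  # assumed: "default" could point at any model
MODELS = {
    "claude-sonnet": Upstream("anthropic", "claude-sonnet"),
    "gpt-4o": Upstream("openai", "gpt-4o"),
    "llama-70b": Upstream("vllm", "llama-70b"),
}


def resolve(model: str) -> Upstream:
    """Map a requested model name or alias to the provider that serves it."""
    name = ALIASES.get(model, model)  # aliases resolve first; other names pass through
    try:
        return MODELS[name]
    except KeyError:
        raise ValueError(f"unknown model or alias: {model!r}") from None
```

The point of the indirection is that changing what `default` means is a config change on the proxy, with no redeploy of the services that send `model: "default"`.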
By routing through VoidLLM instead of calling providers directly:
Under 500 microseconds of overhead
The proxy adds less than 0.1% latency to a typical LLM response. Your users won’t notice it. See our benchmark numbers.
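The two figures fit together with simple arithmetic: 500 microseconds is 0.1% of half a second, so any response that takes longer than 0.5 s keeps the overhead below 0.1%. The helpers below just spell that out; the 2-second response time is an assumed example, not a benchmark result.

```python
def overhead_percent(overhead_us: float, response_seconds: float) -> float:
    """Proxy overhead as a percentage of the total response time."""
    return (overhead_us / 1_000_000) / response_seconds * 100


def min_response_seconds(overhead_us: float, max_percent: float) -> float:
    """Shortest response time for which the overhead stays at or below max_percent."""
    return (overhead_us / 1_000_000) / (max_percent / 100)


# Assumed example: a 2-second LLM response with 500 microseconds of proxy overhead.
example_percent = overhead_percent(500, 2.0)

# Response time at which 500 microseconds is exactly 0.1% of the total.
break_even_seconds = min_response_seconds(500, 0.1)
```

For the assumed 2-second response the overhead works out to about 0.025%, and the break-even point is 0.5 seconds.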
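Most OpenAI-compatible SDKs, including the official OpenAI Python and Node.js clients, read their base URL and API key from environment variables, so you don’t even need code changes: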
export OPENAI_BASE_URL=https://your-voidllm/v1
export OPENAI_API_KEY=vl_uk_your_key_here
Set these in your deployment config and every service in your cluster goes through VoidLLM automatically - regardless of whether the upstream is OpenAI, Anthropic, Azure, or self-hosted.
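For code that does not go through an SDK, the same convention is easy to honor by hand. The sketch below imitates the lookup: use `OPENAI_BASE_URL` if it is set, otherwise fall back to OpenAI's public endpoint. The `client_settings` helper is a made-up name for this example and is not part of any SDK.

```python
import os
from typing import Dict, Mapping, Optional

# OpenAI's public API endpoint, used when no override is configured.
OPENAI_DEFAULT_BASE_URL = "https://api.openai.com/v1"


def client_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Optional[str]]:
    """Pick the base URL and API key from the environment (sketch)."""
    return {
        "base_url": env.get("OPENAI_BASE_URL", OPENAI_DEFAULT_BASE_URL),
        "api_key": env.get("OPENAI_API_KEY"),  # None when the variable is unset
    }


# With the two exports above in place, requests are addressed to the proxy.
proxied = client_settings({
    "OPENAI_BASE_URL": "https://your-voidllm/v1",
    "OPENAI_API_KEY": "vl_uk_your_key_here",
})
```

Passing the environment in as a mapping keeps the helper easy to test and makes the fallback explicit.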