Provider Setup

Configure OpenAI, Anthropic, Azure, Ollama, vLLM, and custom providers

Provider Setup

VoidLLM supports 6 providers. Each handles request/response translation differently.

OpenAI

Direct passthrough - no translation needed.

models:
  - name: gpt-4o
    provider: openai
    base_url: https://api.openai.com/v1
    api_key: ${OPENAI_KEY}
    aliases: [default, smart]

Anthropic

VoidLLM translates between the OpenAI request format and Anthropic’s Messages API automatically. Your clients send OpenAI-format requests, VoidLLM handles the conversion.

models:
  - name: claude-sonnet
    provider: anthropic
    base_url: https://api.anthropic.com
    api_key: ${ANTHROPIC_KEY}

Note: Claude Code talks directly to Anthropic’s API for LLM access - you can’t route its LLM requests through VoidLLM. But you can add VoidLLM as an MCP server in Claude Code to manage your proxy and access external tools.

Azure OpenAI

Azure uses deployment names instead of model names in the URL. VoidLLM handles the URL mapping.

models:
  - name: gpt-4o-azure
    provider: azure
    base_url: https://mycompany.openai.azure.com
    api_key: ${AZURE_KEY}
    azure_deployment: my-gpt4o-deployment
    azure_api_version: "2024-10-21"

The azure_deployment is your Azure deployment name (not the model name). VoidLLM constructs the URL as {base_url}/openai/deployments/{deployment}/chat/completions?api-version={version}.

Azure uses the api-key header instead of Authorization: Bearer. VoidLLM handles this automatically.

Ollama

Local Ollama instances work as passthrough.

models:
  - name: llama3
    provider: ollama
    base_url: http://localhost:11434/v1
    aliases: [local]

No API key needed for local Ollama.

vLLM

Self-hosted vLLM instances are OpenAI-compatible.

models:
  - name: llama-70b
    provider: vllm
    base_url: http://vllm-large:8000/v1
    aliases: [large]

No API key needed - vLLM doesn’t have built-in auth. Use network policies to restrict access to the vLLM endpoint.

Custom (any OpenAI-compatible endpoint)

For any endpoint that speaks the OpenAI API format.

models:
  - name: my-model
    provider: custom
    base_url: https://my-provider.com/v1
    api_key: ${MY_KEY}

This works with any service that implements /v1/chat/completions in OpenAI format: Together AI, Fireworks, Groq, Replicate, local inference servers, etc.

Model Types

Models can be typed to distinguish their capabilities:

models:
  - name: gpt-4o
    type: chat              # default
    provider: openai
    base_url: https://api.openai.com/v1

  - name: text-embedding-3
    type: embedding
    provider: openai
    base_url: https://api.openai.com/v1

Supported types: chat (default), embedding, completion, image, audio_transcription, tts.

The type affects:

  • Which models appear in the Playground (tabs per type)
  • Health check probe format (embeddings use /embeddings, not /chat/completions)
  • Create Model dialog in the UI

Per-Model Timeout

Override the global timeout for specific models:

models:
  - name: slow-model
    provider: custom
    base_url: https://slow-provider.com/v1
    timeout: 120s           # this model gets 2 minutes instead of the global default

Upstream API Key Storage

Upstream provider API keys are encrypted at rest with AES-256-GCM. The encryption key is derived from VOIDLLM_ENCRYPTION_KEY. Keys are decrypted in memory only when needed for upstream requests.