It’s the last day of the month. Your CFO walks over and asks a simple question: how much did engineering, marketing, and support spend on LLMs in March? You open the OpenAI dashboard and see one number. Good luck breaking that down.
If you’ve deployed OpenAI, Anthropic, or Azure OpenAI at any real scale, you’ve hit this wall. One invoice. One usage graph. No idea who spent what. The bill keeps growing and nobody can explain why. Finance wants chargeback. Team leads want budgets. You want to sleep.
This post walks through what good LLM cost allocation actually looks like, and how to set it up without rewriting your stack.
Provider API keys are flat. OpenAI gives you a key, you give it to your team, everyone shares it. Anthropic is the same. Azure at least lets you create multiple deployments, but the per-team breakdown is still manual work in Cost Management.
Some teams try to solve this by minting one key per person. That works until you have thirty people across five teams and rotating keys becomes a second job. And you still cannot tell from the provider dashboard which key belongs to which team, because the provider has no concept of teams.
A gateway or proxy can solve this, and most of them do to some degree. But many stop at per-key tracking and leave the aggregation (by team, by department, by cost center) as an exercise for your BI tool.
Before evaluating tools, it helps to write down what “good” means. For cost allocation to work end to end, you need a real hierarchy, per-key attribution, budgets with inheritance, and an export finance can use.
VoidLLM is built around exactly this shape. The core model is org to team to user to key, four levels deep, with four roles (system_admin, org_admin, team_admin, member) controlling who can see and change what.
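To make the four-level shape concrete, here is a minimal Python sketch of an org, team, user, key hierarchy and the attribution walk it enables. The class and field names (`Org`, `Team`, `User`, `ApiKey`, `attribution`) are assumptions made for illustration and are not VoidLLM's actual schema. The sketch also simplifies by hanging every key off a user, which may not hold for team keys or service accounts.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    # The four role names come from the post; their permissions are not modeled here.
    SYSTEM_ADMIN = "system_admin"
    ORG_ADMIN = "org_admin"
    TEAM_ADMIN = "team_admin"
    MEMBER = "member"


@dataclass(frozen=True)
class Org:
    id: str
    name: str


@dataclass(frozen=True)
class Team:
    id: str
    name: str
    org: Org


@dataclass(frozen=True)
class User:
    id: str
    email: str
    team: Team
    role: Role


@dataclass(frozen=True)
class ApiKey:
    id: str
    user: User  # simplification: every key belongs to exactly one user


def attribution(key: ApiKey) -> dict[str, str]:
    """Walk key -> user -> team -> org so a request can be attributed at every level."""
    user = key.user
    team = user.team
    return {
        "key_id": key.id,
        "user_id": user.id,
        "team_id": team.id,
        "org_id": team.org.id,
    }


# Hypothetical IDs and email, for illustration only.
acme = Org("org_1", "Acme Corp")
engineering = Team("team_eng", "Engineering", acme)
dana = User("user_dana", "dana@example.com", engineering, Role.MEMBER)
key = ApiKey("key_123", dana)
print(attribution(key))
```

The point of the walk is that a single key ID on a request is enough to recover the team and org, which is what the provider dashboards cannot do.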
Here is what a setup looks like in practice.
1. Create an org. “Acme Corp” is the top-level tenant. It holds the billing relationship, the default model allowlist, and the org-wide budget.
2. Create teams. Engineering, Marketing, and Support each get their own team inside Acme Corp. Team admins manage their own members and keys without touching the others.
3. Create users and keys. Each team member gets a user account and one or more API keys. Keys carry the prefix vl_uk_ for user keys, vl_tk_ for team keys, and vl_sa_ for service accounts used by CI and background jobs.
4. Set limits at the right level. On each key (and at the team and org level) you can set:
5. Let inheritance do the work. VoidLLM applies the most-restrictive-wins rule. If the org allows GPT-4o and the team does not, the team blocks it. If the key has a $100 monthly cap and the team has a $500 cap, the key stops at $100.
6. Watch the usage roll in. Every proxied request writes a usage event with key ID, org ID, team ID, user ID, model name, prompt tokens, completion tokens, total cost, duration, TTFT, and TPS. No prompt or response content is ever stored; see Zero-Knowledge by Architecture for why that is a hard design rule and not a toggle.
7. Look at the dashboard. The admin UI has a usage page that lets you group by org, team, user, key, model, day, or hour. Switch the grouping, pick a date range, and you have the answer to your CFO’s question in about four clicks.
8. Export for finance. The same usage view has a CSV export button. The CSV carries the grouped totals and drops straight into Excel, Google Sheets, or whatever your finance team uses for chargeback.
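The most-restrictive-wins rule from step 5 can be sketched in a few lines of Python. This is an illustration of the rule as described, not VoidLLM's implementation: `Limits` and `effective_limits` are hypothetical names, and the sketch assumes that a level with no cap or no allowlist simply does not constrain the result.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limits:
    # None means "nothing set at this level", so the level adds no constraint.
    monthly_cap_usd: Optional[float] = None
    allowed_models: Optional[frozenset[str]] = None


def effective_limits(*levels: Limits) -> Limits:
    """Combine org, team, and key limits so the most restrictive value wins."""
    caps = [lv.monthly_cap_usd for lv in levels if lv.monthly_cap_usd is not None]
    allowlists = [lv.allowed_models for lv in levels if lv.allowed_models is not None]
    # Lowest cap wins; a model must be allowed at every level that has an allowlist.
    cap = min(caps) if caps else None
    models = frozenset.intersection(*allowlists) if allowlists else None
    return Limits(cap, models)


# Illustrative values: the org allows GPT-4o but the team does not,
# and the key's $100 cap is tighter than the team's $500 cap.
org = Limits(3000.0, frozenset({"gpt-4o", "gpt-4o-mini"}))
team = Limits(500.0, frozenset({"gpt-4o-mini"}))
key = Limits(100.0)

result = effective_limits(org, team, key)
print(result.monthly_cap_usd)         # 100.0
print(sorted(result.allowed_models))  # ['gpt-4o-mini']
```

Taking the minimum of the caps and the intersection of the allowlists means a lower level can tighten what it inherits but never loosen it.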
Acme Corp has a $3,000 monthly LLM budget. Here is how they split it:
| Team | Monthly cap | Models allowed | Typical use |
|---|---|---|---|
| Engineering | $2,000 | GPT-4o, text-embedding-3-small | Code assist, RAG, internal tools |
| Marketing | $500 | GPT-4o-mini | Blog drafts, ad copy, translations |
| Support | $500 | Claude Haiku | Ticket summaries, auto-replies |
The org admin sets an org-level cap of $3,000. Each team admin sets the team cap. Inside Engineering, the team admin further splits the budget: $1,500 for the backend team (three engineers, one service account for CI) and $500 for data science (two users with their own keys).
Three things happen automatically:
No spreadsheets. No manual tagging. No one arguing about whose bot ate the budget.
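For a feel of what the chargeback report amounts to, here is a rough Python sketch that checks the Acme split adds up and rolls usage events up by team into a CSV. The caps come from the table above. The usage events, their field names, and the CSV columns are made up for illustration and do not reflect VoidLLM's real export format.

```python
import csv
import io
from collections import defaultdict

# Caps from the Acme Corp example above.
ORG_CAP_USD = 3000
TEAM_CAPS_USD = {"Engineering": 2000, "Marketing": 500, "Support": 500}
ENGINEERING_SPLIT_USD = {"backend": 1500, "data science": 500}

# The splits should add up at every level of the hierarchy.
assert sum(TEAM_CAPS_USD.values()) == ORG_CAP_USD
assert sum(ENGINEERING_SPLIT_USD.values()) == TEAM_CAPS_USD["Engineering"]

# Hypothetical usage events; the costs are invented sample data.
events = [
    {"team": "Engineering", "model": "gpt-4o", "cost_usd": 12.50},
    {"team": "Engineering", "model": "text-embedding-3-small", "cost_usd": 0.25},
    {"team": "Marketing", "model": "gpt-4o-mini", "cost_usd": 3.50},
    {"team": "Support", "model": "claude-haiku", "cost_usd": 1.25},
    {"team": "Engineering", "model": "gpt-4o", "cost_usd": 7.25},
]


def rollup_by_team(rows):
    """Sum cost per team, the same grouping a usage page would apply."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["team"]] += row["cost_usd"]
    return dict(totals)


def to_csv(totals, caps):
    """Render grouped totals as CSV text with an assumed column layout."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["team", "spend_usd", "cap_usd", "remaining_usd"])
    for team in sorted(totals):
        spend = totals[team]
        writer.writerow([team, f"{spend:.2f}", caps[team], f"{caps[team] - spend:.2f}"])
    return buf.getvalue()


totals = rollup_by_team(events)
print(to_csv(totals, TEAM_CAPS_USD))
```

Because every event already carries its team, the report is a plain group-by and needs no key-to-team lookup table maintained by hand.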
VoidLLM is not the only answer here, and it would be dishonest to pretend otherwise. LiteLLM has virtual keys with budgets and team objects (see our honest comparison for where each fits). Portkey offers virtual keys with budget enforcement as part of its SaaS control plane. Bifrost has customer-level budgets. Each of them can get you most of the way to per-team cost tracking.
Where they differ is in the parts around the hierarchy: self-hosted versus SaaS, pricing models that scale with traffic versus flat fees, whether prompt and response content is stored on the control plane, and how much of the admin experience is included out of the box versus assembled from other tools.
Cost allocation is not a nice-to-have once you are past a handful of users on a shared key. It is the difference between “LLMs are a line item we understand” and “LLMs are a line item we fight about.” The ingredients are the same everywhere: a real hierarchy, per-key attribution, budgets with inheritance, and an export finance can use.
VoidLLM ships all of that in a single binary with an embedded UI, and the pricing is flat: no per-token markup, no per-user fees, no surprise at the end of the quarter. If you want to try it on your own infrastructure, the getting started guide gets you from Docker pull to first request in about five minutes.
Your CFO will still ask the question. You will just have an answer ready.