
How to Allocate LLM Costs Across Teams and Departments


It’s the last day of the month. Your CFO walks over and asks a simple question: how much did engineering, marketing, and support spend on LLMs in March? You open the OpenAI dashboard and see one number. Good luck breaking that down.

If you’ve deployed OpenAI, Anthropic, or Azure OpenAI at any real scale, you’ve hit this wall. One invoice. One usage graph. No idea who spent what. The bill keeps growing and nobody can explain why. Finance wants chargeback. Team leads want budgets. You want to sleep.

This post walks through what good LLM cost allocation actually looks like, and how to set it up without rewriting your stack.

Why this is harder than it should be

Provider API keys are flat. OpenAI gives you a key, you give it to your team, everyone shares it. Anthropic is the same. Azure at least lets you create multiple deployments, but the per-team breakdown is still manual work in Cost Management.

Some teams try to solve this by minting one key per person. That works until you have thirty people across five teams and rotating keys becomes a second job. And you still cannot tell from the provider dashboard which key belongs to which team, because the provider has no concept of teams.

A gateway or proxy can solve this, and most of them do to some degree. But many stop at per-key tracking and leave the aggregation (by team, by department, by cost center) as an exercise for your BI tool.

What good allocation actually needs

Before evaluating tools, it helps to write down what “good” means. For cost allocation to work end to end, you need:

  1. Hierarchical entities. An org contains teams. Teams contain users. Users own keys. You cannot fake this with naming conventions - the hierarchy needs to be first-class so you can roll up usage at any level.
  2. Per-key usage capture. Every request needs to be attributed to a key, and through that key to a user, a team, and an org. Token counts, cost, model, duration - all tagged with who spent them.
  3. Budgets at every level. Org-wide caps protect the bottom line. Team budgets protect the org from one runaway team. Per-key budgets protect teams from one runaway script.
  4. Inheritance rules. When budgets exist at multiple levels, the most restrictive one should win. A team with a $500 monthly cap should not be allowed to burn $2000 just because the org cap is $10,000.
  5. Model-level cost awareness. Spend on GPT-4o is not the same as spend on GPT-4o-mini. A good system tracks which model each dollar went to, so you can spot workloads that could move to a cheaper model.
  6. Exportable data. Finance runs on spreadsheets and ERPs. CSV export is the lowest common denominator and the one that actually gets used.
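
To make the first two requirements concrete, here is a minimal Python sketch of a first-class hierarchy with roll-up. The `Key` class, the `roll_up` function, and the sample names and amounts are all made up for illustration; they are not taken from any particular gateway.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Key:
    """An API key and the chain of owners it resolves to (hypothetical model)."""
    key_id: str
    user: str
    team: str
    org: str


def roll_up(keys, spend_by_key):
    """Sum per-key spend at the user, team, and org levels."""
    totals = {
        "user": defaultdict(float),
        "team": defaultdict(float),
        "org": defaultdict(float),
    }
    for key in keys:
        spent = spend_by_key.get(key.key_id, 0.0)
        totals["user"][key.user] += spent
        totals["team"][key.team] += spent
        totals["org"][key.org] += spent
    return totals


# Sample data, invented for this example.
keys = [
    Key("k1", "alice", "engineering", "acme"),
    Key("k2", "alice", "engineering", "acme"),
    Key("k3", "bob", "marketing", "acme"),
]
spend = {"k1": 12.5, "k2": 7.5, "k3": 4.0}

totals = roll_up(keys, spend)
print(dict(totals["team"]))  # per-team totals
print(dict(totals["org"]))   # org-wide total
```

Because each key carries its full ownership chain, the same spend record can be summed at any level without relying on naming conventions.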

How VoidLLM handles it

VoidLLM is built around exactly this shape. The core model is org to team to user to key, four levels deep, with four roles (system_admin, org_admin, team_admin, member) controlling who can see and change what.

Here is what a setup looks like in practice.

1. Create an org. “Acme Corp” is the top-level tenant. It holds the billing relationship, the default model allowlist, and the org-wide budget.

2. Create teams. Engineering, Marketing, and Support each get their own team inside Acme Corp. Team admins manage their own members and keys without touching the others.

3. Create users and keys. Each team member gets a user account and one or more API keys. Keys carry the prefix vl_uk_ for user keys, vl_tk_ for team keys, and vl_sa_ for service accounts used by CI and background jobs.
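
The prefixes make a key's type visible at a glance, which helps when scanning logs or config files. A small sketch of that idea follows, assuming only the three prefixes named above. The `key_kind` helper and the sample key suffixes are hypothetical, not part of VoidLLM.

```python
# Prefixes as described in the text; the labels on the right are our own.
KEY_PREFIXES = {
    "vl_uk_": "user",
    "vl_tk_": "team",
    "vl_sa_": "service_account",
}


def key_kind(api_key: str) -> str:
    """Classify a key by its prefix; anything else is reported as unknown."""
    for prefix, kind in KEY_PREFIXES.items():
        if api_key.startswith(prefix):
            return kind
    return "unknown"


# The suffixes below are invented placeholders, not real key formats.
for sample in ["vl_uk_example", "vl_tk_example", "vl_sa_example", "sk-example"]:
    print(sample, "->", key_kind(sample))
```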

4. Set limits at the right level. On each key, and at the team and org level, you can set limits such as a monthly spending cap and a model allowlist.

5. Let inheritance do the work. VoidLLM applies the most-restrictive-wins rule. If the org allows GPT-4o and the team does not, the team blocks it. If the key has a $100 monthly cap and the team has a $500 cap, the key stops at $100.
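
One way to picture most-restrictive-wins is as a check that every level has to pass. The sketch below is an illustration of the rule, not VoidLLM's implementation. The `Level` class, the `check_request` function, and the spend figures are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Level:
    """One level of the hierarchy (org, team, or key). All fields are hypothetical."""
    name: str
    monthly_cap: Optional[float]              # None: this level sets no cap
    spent: float                              # spend so far this month
    allowed_models: Optional[FrozenSet[str]]  # None: this level does not restrict models


def check_request(levels: Sequence[Level], model: str, estimated_cost: float) -> Tuple[bool, str]:
    """Allow a request only if every level permits it; report the first level that blocks."""
    for level in levels:
        if level.allowed_models is not None and model not in level.allowed_models:
            return False, f"{level.name}: model not allowed"
        if level.monthly_cap is not None and level.spent + estimated_cost > level.monthly_cap:
            return False, f"{level.name}: monthly cap would be exceeded"
    return True, "ok"


# Invented figures: the key is nearly at its $100 cap, the team and org have room.
org = Level("org", 10_000.0, 2_400.0, frozenset({"gpt-4o", "gpt-4o-mini"}))
team = Level("team", 500.0, 120.0, frozenset({"gpt-4o-mini"}))
key = Level("key", 100.0, 99.5, None)

print(check_request([org, team, key], "gpt-4o", 0.25))       # blocked by the team allowlist
print(check_request([org, team, key], "gpt-4o-mini", 1.0))   # blocked by the key cap
print(check_request([org, team, key], "gpt-4o-mini", 0.25))  # allowed at every level
```

Checking each level against its own counter, instead of comparing cap values alone, means a team that has used up its budget blocks its keys even when an individual key still has headroom.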

6. Watch the usage roll in. Every proxied request writes a usage event with key ID, org ID, team ID, user ID, model name, prompt tokens, completion tokens, total cost, duration, TTFT, and TPS. No prompt or response content is ever stored - see Zero-Knowledge by Architecture for why that is a hard design rule and not a toggle.
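
As a rough picture of what such an event might contain, here is a sketch with one field per item in the list above. The field names, the units, the price table, and the way TPS is derived (completion tokens divided by the time after the first token) are all assumptions for illustration. The actual schema may differ.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class UsageEvent:
    """Metadata only: there is deliberately no prompt or response field."""
    key_id: str
    org_id: str
    team_id: str
    user_id: str
    model: str
    prompt_tokens: int
    completion_tokens: int
    total_cost: float   # USD
    duration_ms: int
    ttft_ms: int        # time to first token
    tps: float          # tokens per second (assumed definition, see build_event)


# Placeholder prices in USD per million tokens; not real provider rates.
PRICES = {"example-model": {"prompt": 1.00, "completion": 4.00}}


def build_event(key_id, org_id, team_id, user_id, model,
                prompt_tokens, completion_tokens, duration_ms, ttft_ms):
    """Price the request and derive throughput from the timing fields."""
    price = PRICES[model]
    cost = (prompt_tokens * price["prompt"]
            + completion_tokens * price["completion"]) / 1_000_000
    generation_s = max(duration_ms - ttft_ms, 1) / 1000  # avoid dividing by zero
    return UsageEvent(key_id, org_id, team_id, user_id, model,
                      prompt_tokens, completion_tokens, cost,
                      duration_ms, ttft_ms, completion_tokens / generation_s)


event = build_event("key-1", "acme", "engineering", "alice", "example-model",
                    prompt_tokens=1000, completion_tokens=500,
                    duration_ms=2500, ttft_ms=500)
print(asdict(event))
```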

7. Look at the dashboard. The admin UI has a usage page that lets you group by org, team, user, key, model, day, or hour. Switch the grouping, pick a date range, and you have the answer to your CFO’s question in about four clicks.
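
Under the hood, a grouped view like this is a filter plus a sum. A minimal sketch of that aggregation follows. The `group_usage` function, the dictionary keys, and the sample events are hypothetical and only stand in for whatever the usage store actually holds.

```python
from collections import defaultdict


def group_usage(events, by, start=None, end=None):
    """Sum cost and tokens per value of `by`, limited to days in [start, end]."""
    totals = defaultdict(lambda: {"cost": 0.0, "tokens": 0})
    for event in events:
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        if start is not None and event["day"] < start:
            continue
        if end is not None and event["day"] > end:
            continue
        bucket = totals[event[by]]
        bucket["cost"] += event["cost"]
        bucket["tokens"] += event["tokens"]
    return dict(totals)


# Sample events, invented for this example.
events = [
    {"team": "engineering", "model": "gpt-4o", "day": "2024-03-01", "cost": 1.5, "tokens": 1000},
    {"team": "engineering", "model": "gpt-4o", "day": "2024-03-02", "cost": 2.25, "tokens": 1500},
    {"team": "marketing", "model": "gpt-4o-mini", "day": "2024-03-02", "cost": 0.5, "tokens": 4000},
    {"team": "support", "model": "claude-haiku", "day": "2024-04-01", "cost": 0.75, "tokens": 3000},
]

march_by_team = group_usage(events, by="team", start="2024-03-01", end="2024-03-31")
march_by_model = group_usage(events, by="model", start="2024-03-01", end="2024-03-31")
print(march_by_team)
print(march_by_model)
```

Switching the grouping is just a different `by` value over the same events, which is why the per-event attribution matters more than any single report.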

8. Export for finance. The same usage view has a CSV export button. The CSV carries the grouped totals and drops straight into Excel, Google Sheets, or whatever your finance team uses for chargeback.
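
If you ever need to produce the same kind of file yourself, for example from a database query, Python's standard `csv` module is enough. The sketch below assumes grouped totals are already available as a dictionary. The column names and amounts are invented, and the real export's columns may differ.

```python
import csv
import io


def totals_to_csv(totals_by_team):
    """Render {team: cost} as CSV text with a header row, sorted by team name."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["team", "cost_usd"], lineterminator="\n")
    writer.writeheader()
    for team in sorted(totals_by_team):
        # Two decimal places keeps the column spreadsheet-friendly.
        writer.writerow({"team": team, "cost_usd": f"{totals_by_team[team]:.2f}"})
    return buffer.getvalue()


# Invented totals for illustration.
csv_text = totals_to_csv({"marketing": 0.5, "engineering": 3.75})
print(csv_text)
```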

A concrete example

Acme Corp has a $3,000 monthly LLM budget. Here is how they split it:

| Team | Monthly cap | Models allowed | Typical use |
| --- | --- | --- | --- |
| Engineering | $2,000 | GPT-4o, text-embedding-3-small | Code assist, RAG, internal tools |
| Marketing | $500 | GPT-4o-mini | Blog drafts, ad copy, translations |
| Support | $500 | Claude Haiku | Ticket summaries, auto-replies |

The org admin sets an org-level cap of $3,000. Each team admin sets the team cap. Inside Engineering, the team admin further splits the budget: $1,500 for the backend team (three engineers, one service account for CI) and $500 for data science (two users with their own keys).
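
A quick sanity check on the split: the team caps add up to the org cap, and the Engineering sub-budgets add up to the Engineering cap. The helper below is a hypothetical sketch of that arithmetic using the figures from this example.

```python
def over_allocation(parent_cap: int, child_caps: dict) -> int:
    """Amount by which the child caps exceed the parent cap (0 if they fit)."""
    return max(sum(child_caps.values()) - parent_cap, 0)


org_cap = 3000
team_caps = {"Engineering": 2000, "Marketing": 500, "Support": 500}
engineering_split = {"backend": 1500, "data science": 500}

print(over_allocation(org_cap, team_caps))                          # 0
print(over_allocation(team_caps["Engineering"], engineering_split)) # 0
```

Over-allocating is not necessarily a mistake when the most restrictive cap wins. It only means the children cannot all reach their own caps in the same month.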

Three things happen automatically:

No spreadsheets. No manual tagging. No one arguing about whose bot ate the budget.

What other tools offer

VoidLLM is not the only answer here, and it would be dishonest to pretend otherwise. LiteLLM has virtual keys with budgets and team objects - see our honest comparison for where each fits. Portkey offers virtual keys with budget enforcement as part of its SaaS control plane. Bifrost has customer-level budgets. Each of them can get you most of the way to per-team cost tracking.

Where they differ is in the parts around the hierarchy: self-hosted versus SaaS, pricing models that scale with traffic versus flat fees, whether prompt and response content is stored on the control plane, and how much of the admin experience is included out of the box versus assembled from other tools.

Wrapping up

Cost allocation is not a nice-to-have once you are past a handful of users on a shared key. It is the difference between “LLMs are a line item we understand” and “LLMs are a line item we fight about.” The ingredients are the same everywhere: a real hierarchy, per-key attribution, budgets with inheritance, and an export finance can use.

VoidLLM ships all of that in a single binary with an embedded UI, and the pricing is flat - no per-token markup, no per-user fees, no surprise at the end of the quarter. If you want to try it on your own infrastructure, the getting started guide gets you from Docker pull to first request in about five minutes.

Your CFO will still ask the question. You will just have an answer ready.
