In May 2026 an AI consultant told Axios that one of their clients spent half a billion dollars on Claude in a single month. Not over a year. One month. The cause was almost stupidly simple: the company handed out Claude licenses to employees with no usage limit, and people used them for everything, including, one CTO said, checking the weather.
It is easy to read that as a story about a giant company with more money than sense. It is not. It is a story about a control that nobody turned on, and that same control is missing from most small teams right now.
TL;DR: Token-based billing is now the default for AI coding and agent tools, and it has blown up budgets at Microsoft, Uber, and one company that spent $500M on Claude in a single month after forgetting to cap employee usage. Flat seat pricing hid the real cost; usage pricing exposed it. Five controls fix this for small teams: hard per-seat usage caps, budget alerts with a kill switch, a deliberate seat-vs-token billing choice, per-team token allocation, and a monthly spend review with a named owner.
Why AI spend is suddenly exploding
The shift that broke everyone's budget is the move from seat pricing to token pricing.
For the first couple of years of the AI boom, you paid a flat monthly fee per user. Whether it was twenty dollars a seat or thirty, the number did not move no matter how hard anyone leaned on the tool. Finance liked it because it was predictable. The downside was invisible: you had no idea whether a seat was worth it, because everyone cost the same.
Then the coding assistants and agents moved to token-based billing, where you pay for what you actually consume. Suddenly the real cost showed up, and it was not flat at all. A developer running an agent that spawns sub-tasks, reads a whole codebase, and retries on failure can burn through thousands of dollars of tokens in a day without noticing. Multiply that by a team, leave it uncapped, and the invoice stops looking like software and starts looking like payroll.
This is not hypothetical. Microsoft canceled its internal Claude Code pilot effective June 30, 2026, after the switch to usage-based pricing made the true cost unmanageable, and steered its developers to GitHub's Copilot CLI instead. Uber warned employees it had burned through its entire 2026 Claude Code budget in four months. Fortune ran a piece in May arguing that, for some workloads, running the AI now costs more than paying a human to do the same task. The common thread is not bad tools. It is missing controls.
The reason small teams should pay attention is that the controls that would have saved Microsoft are the same ones a ten-person startup needs, and they are cheaper and faster to put in place when you are small. You do not have a procurement department slowing you down. You also do not have a finance team watching the invoice, which is exactly why an uncapped tool can run for a full billing cycle before anyone notices.
The 5 controls that stop runaway AI spend
None of these require a FinOps team or special software. Most are toggles inside the vendor admin console that take an afternoon to set up.
1. Set a hard per-seat usage cap
This is the control the $500M company skipped. Before you roll out any usage-priced AI tool, set a monthly ceiling per seat, either a token limit or a dollar limit, in the vendor's admin settings. When a user hits the cap, they should be blocked or throttled, not allowed to keep spending.
Pick the cap from a real number: estimate what a productive user consumes in a normal week, multiply by four, add headroom, and set the limit there. You can always raise it for a specific person who has a real reason. The default, though, should be a ceiling, not the sky.
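The sizing rule above is simple arithmetic, and a minimal sketch makes it concrete. The function name, the 25% headroom default, and the rounding step are illustrative assumptions, not vendor settings; the only inputs from the rule itself are "one normal week, times four, plus headroom".

```python
import math


def monthly_seat_cap(weekly_usage_usd: float, headroom: float = 0.25,
                     round_to: int = 10) -> int:
    """Size a monthly per-seat dollar cap from one normal week of usage.

    headroom (25%) and round_to ($10) are assumed defaults for illustration.
    """
    if weekly_usage_usd < 0 or headroom < 0 or round_to <= 0:
        raise ValueError("usage and headroom must be >= 0, round_to must be > 0")
    baseline = weekly_usage_usd * 4        # one normal week, multiplied by four
    padded = baseline * (1 + headroom)     # add headroom for busier weeks
    # Round up to a tidy number that is easy to enter in an admin console.
    return math.ceil(padded / round_to) * round_to
```

With these assumed defaults, a user who spends about $40 in a normal week would get a $200 monthly cap. The same arithmetic works for a token limit if you feed it tokens instead of dollars.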
2. Turn on budget alerts and a kill switch
A cap stops one user. A budget alert protects the whole team. Enable the vendor's spend alerts so you get an email at 50% and 80% of your monthly budget, and again if a single day's spend is abnormally high. Most AI platforms now support this; if yours does not, that is a reason to reconsider the vendor.
Pair the alert with a kill switch: know, in advance, how to pause all usage on the account in one click. The point of the alert is to give you time to act before the cycle closes. The point of the kill switch is that acting takes seconds, not a support ticket.
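If your vendor's built-in alerts are limited, the same logic is easy to run against exported spend numbers. The sketch below is an assumption-heavy illustration: `check_spend`, the 3x daily spike factor, and the rule "recommend a pause at 100% of budget" are all hypothetical choices, and the actual pause still happens in the vendor console.

```python
from dataclasses import dataclass, field


@dataclass
class SpendCheck:
    alerts: list = field(default_factory=list)
    recommend_pause: bool = False


def check_spend(month_to_date: float, monthly_budget: float,
                today: float, typical_day: float,
                thresholds_pct=(50, 80), spike_factor: float = 3.0) -> SpendCheck:
    """Return the alerts that should fire for the current spend figures."""
    if monthly_budget <= 0:
        raise ValueError("monthly_budget must be positive")
    result = SpendCheck()
    used_pct = 100 * month_to_date / monthly_budget
    # Budget alerts at 50% and 80% of the monthly budget.
    for pct in thresholds_pct:
        if used_pct >= pct:
            result.alerts.append(f"{pct}% of monthly budget reached")
    # Daily anomaly: spike_factor is an assumed definition of "abnormally high".
    if typical_day > 0 and today > spike_factor * typical_day:
        result.alerts.append("daily spend abnormally high")
    # Assumed rule: once the budget is fully used, recommend the kill switch.
    result.recommend_pause = used_pct >= 100
    return result
```

For example, $850 spent against a $1,000 budget fires both the 50% and 80% alerts but does not yet recommend a pause.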
3. Choose seat pricing or token pricing on purpose
Do not let the billing model be an accident of whatever the vendor defaulted you into. Decide.
Flat seats are right for broad, light usage: the marketer who summarizes a few documents a week, or the support rep who drafts replies. The cost is predictable and the usage is low, so the hidden-usage downside does not matter. Token pricing is right for power users and agent workloads where usage is spiky and you want to see the truth, but only if you have caps and alerts in place. The mistake is using token pricing with no guardrails, which is how you get the headline.
4. Allocate a token budget per team, not just per company
A single company-wide limit tells you that you are over budget, but not where the spend went. Give each team or function (engineering, support, marketing) its own monthly allocation. Now the engineering lead owns the engineering number and notices when an agent experiment doubles it. Ownership at the team level catches problems weeks earlier than a finance review at the company level.
This also makes the spend a management conversation instead of a surprise. A team that knows its allocation will self-police, because going over means a conversation with their own lead, not an anonymous line on a corporate invoice.
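A per-team check can be as small as comparing two dictionaries. This is a sketch under assumptions: the function name and the choice to treat a team with no allocation as having a budget of zero are illustrative, and the spend figures would come from whatever export your vendor provides.

```python
def team_overruns(allocations: dict, spend: dict) -> dict:
    """Return {team: amount over allocation} for every team that is over."""
    overruns = {}
    for team, spent in spend.items():
        # Assumption: spend from a team with no allocation counts fully as overrun.
        allocated = allocations.get(team, 0.0)
        if spent > allocated:
            overruns[team] = spent - allocated
    return overruns
```

Given allocations of $2,000 for engineering and $500 for support, and actual spend of $4,100 and $420, only engineering is reported, as $2,100 over.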
5. Run a 30-minute monthly spend review
Put one recurring meeting on the calendar. One named owner pulls spend by tool and by user, looks for anything that grew more than expected, and asks a single question for each tool: is this spend producing value we can point to? Tools that cannot answer that question are candidates for a lower cap or removal.
Thirty minutes a month is the entire overhead. It is the difference between catching a runaway in week one and discovering it on an invoice you cannot dispute.
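The "anything that grew more than expected" step of the review can be sketched as a month-over-month comparison. The names here and the 25% growth threshold are hypothetical; keys are assumed to be `(tool, user)` pairs taken from a spend export.

```python
def flag_growth(previous: dict, current: dict, max_growth: float = 0.25) -> list:
    """Return (key, before, now) rows whose spend grew more than max_growth.

    Rows are sorted by absolute increase, largest first.
    """
    flagged = []
    for key, now in current.items():
        before = previous.get(key, 0.0)
        if before == 0:
            grew = now > 0   # brand-new spend is always worth a look
        else:
            grew = (now - before) / before > max_growth
        if grew:
            flagged.append((key, before, now))
    flagged.sort(key=lambda row: row[2] - row[1], reverse=True)
    return flagged
```

The output is the agenda for the meeting: the owner walks the list from the top and asks the value question for each row.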
Why agents spend the fastest
If you only harden one category, make it agents. A chat assistant spends roughly what a person reads and types. An agent spends what a machine can consume at machine speed, which is a different order of magnitude.
Three things make agentic and code-generation workloads the main source of spikes. They read large context: a single task can pull an entire repository or document set into the prompt, and you pay for every token of it. They retry: when a step fails, the agent often tries again, sometimes in a loop, with each attempt billed in full. And they spawn sub-tasks: one instruction can fan out into many model calls that the user never sees and never approved.
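The three multipliers compound, which a rough estimate makes visible. This is a deliberately simplified sketch: it assumes every call re-sends the full context, counts input tokens only, and uses a hypothetical price per million tokens rather than any vendor's real rate.

```python
def estimate_task_cost(context_tokens: int, sub_tasks: int,
                       attempts_per_sub_task: int,
                       usd_per_million_tokens: float) -> float:
    """Rough input-token cost of one agent task, in dollars."""
    calls = sub_tasks * attempts_per_sub_task   # fan-out multiplied by retries
    billed_tokens = calls * context_tokens      # assumes full context on every call
    return billed_tokens / 1_000_000 * usd_per_million_tokens
```

At an assumed $3 per million tokens, a single call with a 200,000-token context costs about $0.60. The same context fanned out to 10 sub-tasks with 3 attempts each costs about $18, a 30x difference from one instruction.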
The practical implication is that your tightest cap belongs on the agent tools, not the chat tools. For anything that can run autonomously, set a low default cap, require explicit approval to raise it, and watch the first week of real usage closely before you trust the number. The weekend run that drains a budget is almost always an agent left to iterate without a ceiling, not a person typing too much.
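Where you control the agent loop yourself, a ceiling can also live in code. The sketch below is one possible shape, not a vendor feature: `RunBudget`, `BudgetExceeded`, and `run_with_retries` are hypothetical names, and the per-attempt cost is assumed to be known up front.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a run would spend past its ceiling."""


class RunBudget:
    def __init__(self, ceiling_usd: float):
        self.ceiling_usd = ceiling_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        # Refuse the charge before spending, so the ceiling is never crossed.
        if self.spent_usd + cost_usd > self.ceiling_usd:
            raise BudgetExceeded(
                f"would exceed ${self.ceiling_usd:.2f} ceiling")
        self.spent_usd += cost_usd


def run_with_retries(step, budget: RunBudget, cost_per_attempt_usd: float,
                     max_attempts: int = 3):
    """Call step() until it succeeds, charging the budget for every attempt.

    Returns the attempt number that succeeded, or None if all attempts failed.
    """
    for attempt in range(1, max_attempts + 1):
        budget.charge(cost_per_attempt_usd)   # each attempt is billed in full
        if step():
            return attempt
    return None
```

With a $1.00 ceiling and $0.40 per attempt, a step that keeps failing is stopped on the third attempt with $0.80 spent, no matter how many retries were allowed.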
Flat seat versus token billing: the risk tradeoff
| Dimension | Flat seat pricing | Token-based pricing |
|---|---|---|
| Cost predictability | High, fixed per user | Low, varies with usage |
| Visibility into real value | Poor, everyone costs the same | High, you see exactly who uses what |
| Spike risk | None | High without caps |
| Best for | Broad, light usage | Power users and agent workloads |
| Required guardrail | Periodic seat audit (remove unused seats) | Hard caps + budget alerts + kill switch |
| Failure mode | Paying for seats nobody uses | One uncapped user drains the budget |
The honest answer is that both models are fine and both are dangerous, in opposite directions. Flat seats quietly waste money on licenses nobody touches. Token pricing loudly wastes money when one person runs an agent over the weekend. Match the model to the usage pattern and add the matching guardrail, and neither failure mode can hurt you much.
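Both failure modes in the table can be put in dollars with two one-line calculations. The function names are illustrative, and the second figure only holds if a hard per-seat cap is actually enforced.

```python
def idle_seat_waste(seat_price_usd: float, seats: int, active_users: int) -> float:
    """Monthly spend on flat seats that nobody uses."""
    return seat_price_usd * max(seats - active_users, 0)


def capped_token_exposure(per_seat_cap_usd: float, seats: int) -> float:
    """Worst-case monthly token spend when every seat hits its hard cap."""
    return per_seat_cap_usd * seats
```

For 20 seats at $30 with 12 active users, the idle waste is $240 a month. For 20 token-billed seats capped at $200 each, the worst case is $4,000 a month; without a cap, that second number has no upper bound.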
Copy-paste AI spend policy
Adapt the bracketed fields. One page is enough for a team under 50 people.
[Company Name] AI Spend Policy
Approved tools and billing model. The following AI tools are approved, with the billing model noted: [Tool, seat / token]. No other paid AI tool may be expensed without approval from [role].
Per-seat usage cap. Every usage-priced AI seat has a monthly cap of [$X or N tokens]. Users who reach the cap are throttled until the next cycle or until an increase is approved by [role].
Team budgets. Each team has a monthly AI budget: [Engineering $X, Support $X, Marketing $X]. The team lead owns their number.
Alerts and kill switch. Budget alerts are enabled on every account at 50% and 80% of the monthly budget. [Named owner] can pause all usage on any account if spend is abnormal.
Approvals. A cap or budget increase requires written approval from [role] and a one-line reason.
Monthly review. [Named owner] reviews spend by tool and by user on the [first Monday] of each month and reports anything that grew unexpectedly.
Save it in your shared docs next to your AI acceptable use policy. The act of writing down a cap and an owner is what turns "we should watch our AI spend" into something that actually happens.
Checklist
- Listed every paid AI tool and its billing model (seat or token)
- Set a hard per-seat usage cap on every usage-priced tool
- Enabled budget alerts at 50% and 80% on every account
- Confirmed you know how to pause/kill usage in one click
- Chose seat vs token pricing deliberately for each tool
- Allocated a monthly token budget to each team
- Assigned one owner for the monthly spend review
- Scheduled the recurring 30-minute review
- Documented all of the above in a one-page AI spend policy
Where this fits in your governance
AI spend governance is the financial side of the same discipline that covers data and risk. The tools you are capping are the same ones in your AI tool register, governed by the same AI acceptable use policy, and approved through the same CEO AI tool approval checklist. If your tools are specifically coding assistants like Copilot or Cursor, the AI coding tool governance and cost control guide goes deeper on that category.
The $500M invoice is a useful reminder precisely because the company that paid it was not reckless; it was just missing a toggle. Turn the toggle on while you are small and the headline can never be about you.
