Brief
Executive Summary
- The opex mix is shifting from headcount to tokens, but organizations face different bottlenecks at each level.
- Despite falling model prices, total spending remains stubbornly high as usage scales, complexity increases, and enterprises default to premium frontier models.
- The winners will actively manage token economics, rightsizing models, instrumenting cost per task, and redesigning operating models to move faster than the technology curve.
In Part I, we laid out the provocation that the opex mix is shifting from headcount to tokens and that there is no clear transition path.
In software engineering, the domain furthest along in AI enablement, spending on tokens is only about 1% to 2% of the cost of headcount. The pattern is the same across sales, support, strategy, and ops: Spending sits at about 1%, while the conversation is about getting to 20% to 30% (see Figure 1). But the current state looks radically different depending on which industry you’re in and where you sit within the organization. Three levels, three different problems, one transformation.
At the CEO level, the destination is clear. Intuit’s CEO has set a vision to become a builder powerhouse by tripling developer productivity. Across our tech-forward client base, the mindset has flipped from “you helping AI” to “AI helping you.” Alpha teams redesign the workflow first, prove the velocity gain, then expand to the broader organization. These aren’t experiments; they’re mandates. And the leaders setting them aren’t debating the returns on investment. They’re impatient with the pace of organizational change beneath them.
At the general manager (GM) level, it’s a budget-and-speed problem. Where do I find the unbudgeted millions this quarter? Token spending doesn’t fit an existing line item and requires an approval chain that doesn’t exist. The pilot was stellar, but scaling to 10,000 seats means navigating procurement, security, legal, data, and IT cycles that were built for an annual cadence. The GM knows the destination, but the path is littered with quarterly budget reviews, vendor approvals, and organizational redesigns that can’t happen fast enough.
At the individual level, there’s a chasm between the heroes and everyone else. Early data suggests that the top 5% of users in each company often consume more tokens than the other 95% combined. In some cases, senior staff engineers, chief architects, sales rainmakers, and strategy directors say, “I don’t need a team; they only slow me down.” It’s unclear how we get everyone to cross the chasm or how to manage superusers’ costs without slowing them down.
The CEO sees the destination. The general manager sees the obstacles. The individual sees a tool that works for some people but not yet for them. Same transformation, three different bottlenecks.
The cost-per-task paradox
Here’s what makes this truly uncomfortable: In some domains, agent and token costs are already higher than the cost of offshore human resources. Not everywhere: These domains are still sparse. But they exist. Agents consume significant tokens on multistep reasoning, error correction, and context loading, which add up fast on complex workflows. The speed and quality gains are real, but the per-task economics don’t always pencil out, particularly when agents are applied to the wrong tasks or run without the appropriate orchestration. The question is where you’re paying a premium for capability vs. where you should be getting a cost arbitrage.
So, will the cost come down? The hope is obvious: Model prices are falling by roughly a factor of 10 per generation. But here’s the reality of what we’re actually seeing: The effective cost per task is often staying flat. Why? Three forces are working against the headline price declines. First, everyone stays on frontier. When the next Claude or GPT ships, nobody says, “Great, I’ll keep using the old model and pocket the savings.” They upgrade. Last generation gets cheaper; frontier stays expensive (see Figure 2). Second, tokens per query keep climbing as agents take on more complex, multistep work, such as orchestrating tool calls, correcting errors, and loading context, and more advanced models consume more tokens on more difficult problems. Third, usage expands: Once a team discovers what agents can do, they find 10 more workflows to throw at them.
Note: MMLU-Pro is an extension of Massive Multitask Language Understanding (MMLU), a benchmark used to evaluate large language models
Source: Bain analysis
The trend is more nuanced than the headline. The cost of tokens (typically priced per million tokens) fell by half from December 2024 to December 2025, while tokens consumed grew by 4.5 times over the same period (see Figure 3).
Net-net: The models get less expensive per token, the usage gets heavier per task, and the bill stays stubbornly high. The hope is that this is like 2G/3G costs circa 2009: expensive now, destined to plummet. And maybe it is. But every six months feels like it should be the inflection point. It never quite is.
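To make the arithmetic behind that claim concrete, here is a minimal sketch using only the two figures quoted above. It assumes total spending is simply price per token multiplied by tokens consumed, and that both figures describe the same pool of usage; the function name is illustrative, not taken from the underlying analysis.

```python
# Figures quoted in the text (December 2024 to December 2025).
price_multiplier = 0.5    # cost per token fell by half
volume_multiplier = 4.5   # tokens consumed grew 4.5 times

# Assumption: total spend = price per token x tokens consumed.
spend_multiplier = price_multiplier * volume_multiplier


def price_drop_needed_for_flat_spend(volume_growth: float) -> float:
    """Fractional price cut that would hold total spend flat at a given volume growth."""
    return 1.0 - 1.0 / volume_growth


print(f"Spend multiplier: {spend_multiplier:.2f}x")
print(
    "Price cut needed for flat spend: "
    f"{price_drop_needed_for_flat_spend(volume_multiplier):.0%}"
)
```

Under these assumptions, a halving of price against 4.5 times the volume leaves the bill at 2.25 times its starting level, and price would have had to fall by about 78% just to keep spending flat.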
What would swing it
The 70/30 hypothesis, a split of opex between the cost of headcount and the cost of tokens, is a scenario, not a forecast. Whether it materializes, and how fast, depends on a handful of variables that interact in nonlinear ways. The range of outcomes is enormous. AT&T learned this firsthand: At 8 billion tokens a day, the company said it reorchestrated so that large "super agents" route tasks to smaller, domain-specific worker models instead of pushing everything through frontier. The company reported a 90% cost reduction and three times the throughput, not by using less AI but by right-fitting the model to the job. One architectural decision moved the economics by an order of magnitude. That's the kind of nonlinearity we're talking about. Here are some of the variables we watch most closely.
- Token cost trajectory: This is the most obvious lever. If frontier pricing falls by a factor of 10 per generation cycle, the economics tip fast. But the pattern so far is that the last generation gets cheaper while frontier stays expensive, and tokens per task scale with complexity.
- On-premise inference: This could be the wildcard, shifting cost from opex to capex, but few enterprises have even evaluated the true cost.
- Rightsizing models to tasks: The three variables that matter are cost per token (model complexity), number of tokens (task complexity), and return (value of output). A draft email doesn’t need frontier reasoning; a customer-facing financial model does. Organizations that flex model complexity to the task will have fundamentally different economics than those that run everything on the most powerful model available. Most enterprises haven't started this work.
- Enterprise software-as-a-service (SaaS) lock-in: Salesforce, ServiceNow, Workday, and SAP control the data fabric. How quickly they open agent-friendly APIs (vs. defending the moat) governs how fast the mix can shift. This is the variable that is most outside your control and the one most likely to move suddenly when a major vendor breaks ranks.
- Organizational clock speed: Annual planning cycles don’t accommodate weekly capability shifts. The companies that move fastest will decouple organizational design cadence from the fiscal calendar, so that organizational design and technology-led transformation move in lockstep.
- Regulatory and trust thresholds: An agent that can do the work of a compliance analyst still needs a human to sign the attestation. How regulators adapt and how quickly institutional trust in AI-generated outputs builds will determine the ceiling.
- Quality at scale: Agents work brilliantly in pilots. At 1,000 concurrent production workflows with cascading errors, the picture changes. Until enterprise-grade reliability is proven, the human backstop stays.
The 70/30 scenario on headcount and token cost isn’t a forecast; it’s a stress test. Which assumptions are you betting on?
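The rightsizing variable lends itself to a small illustration. The sketch below picks the cheapest model whose capability tier is adequate for each task and compares that with running everything on frontier. The model names, tiers, prices, token counts, and output values are all hypothetical placeholders, not vendor quotes or client data, and real routing would also need a quality check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    tier: int                      # hypothetical capability tier; higher is more capable
    usd_per_million_tokens: float  # placeholder price, not a real quote


@dataclass(frozen=True)
class Task:
    name: str
    tokens: int       # tokens consumed per run (task complexity)
    min_tier: int     # lowest tier judged adequate for this task
    value_usd: float  # estimated value of the output (return)


# Hypothetical model portfolio, ordered from cheapest to most capable.
MODELS = [
    Model("small-open-weight", tier=1, usd_per_million_tokens=0.50),
    Model("last-generation", tier=2, usd_per_million_tokens=3.00),
    Model("frontier", tier=3, usd_per_million_tokens=15.00),
]


def run_cost(model: Model, task: Task) -> float:
    """Cost of one run: cost per token times number of tokens."""
    return model.usd_per_million_tokens * task.tokens / 1_000_000


def cheapest_adequate(task: Task, models: list[Model] = MODELS) -> Model:
    """Return the lowest-cost model that meets the task's minimum tier."""
    candidates = [m for m in models if m.tier >= task.min_tier]
    if not candidates:
        raise ValueError(f"No model meets tier {task.min_tier} for {task.name!r}")
    return min(candidates, key=lambda m: run_cost(m, task))


TASKS = [
    Task("draft email", tokens=2_000, min_tier=1, value_usd=1.0),
    Task("customer-facing financial model", tokens=400_000, min_tier=3, value_usd=500.0),
]

for task in TASKS:
    chosen = cheapest_adequate(task)
    frontier_cost = run_cost(MODELS[-1], task)
    print(
        f"{task.name}: {chosen.name} at ${run_cost(chosen, task):.4f} per run "
        f"(all-frontier: ${frontier_cost:.4f}, value: ${task.value_usd:.2f})"
    )
```

The design choice worth noting is that the routing decision depends on all three variables at once: the draft email is routed down because the cheapest tier is adequate, while the financial model stays on frontier because its value justifies the higher cost per run.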
Navigating the token economics
Uncertainty is real; it’s not an excuse for inaction. Here are five moves that hold up across scenarios.
- Create a dedicated AI compute budget. Stop funding tokens from existing line items. Treat it like cloud migration spending in 2015: a protected transformation line with its own governance and far more control and visibility than most companies currently have. If GMs have to find it midquarter, it dies in every budget review. This is table stakes, and most haven’t done it.
- Portfolio your model spending. Frontier for high-stakes work. Last generation or open weight for volume. Using one model for everything is like flying first class for every trip. We see cost differences of three to five times between companies that match models to tasks and those that don’t. Pull last month's API invoices. What percentage of calls went to frontier models? For how many of those was frontier actually necessary?
- Instrument your token economics. Most companies have no idea what they spend per task, per workflow, per outcome. Build the metering now. You need to know: What does it cost to generate a proposal? To resolve a Tier-1 ticket? To draft a contract? Without this data, you’re optimizing blind. Painful to do early. Impossible to retrofit.
- Plan for the dual-cost transition. There will be a period during which you’re paying for both the legacy workforce and the scaling token bill. Know where that overlap is, and what the ROI commitments are. Build it into the board narrative now, not when the CFO notices.
- Test on-premise. For highest-volume workflows, run open-weight models on your own silicon. Understand true cost (capex, energy, ops) vs. API pricing. This is potentially the biggest lever in the entire opex equation and the one most enterprises haven’t evaluated. Don’t wait for the bill to force your hand.
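For the on-premise test, a rough break-even calculation is a reasonable first pass. The sketch below compares a fixed monthly on-premise cost (amortized capex plus energy and ops) with usage-based API pricing. Every number is a placeholder assumption, and the sketch ignores hardware capacity limits, utilization, and any quality gap between open-weight and API models.

```python
def onprem_monthly_cost(
    capex_usd: float,
    amortization_months: int,
    energy_usd_per_month: float,
    ops_usd_per_month: float,
) -> float:
    """Fixed monthly cost: straight-line amortized capex plus energy and ops."""
    return capex_usd / amortization_months + energy_usd_per_month + ops_usd_per_month


def api_monthly_cost(tokens_per_month: float, usd_per_million_tokens: float) -> float:
    """Usage-based monthly cost at a flat blended price per million tokens."""
    return tokens_per_month / 1_000_000 * usd_per_million_tokens


def breakeven_tokens_per_month(onprem_cost: float, usd_per_million_tokens: float) -> float:
    """Monthly token volume above which on-premise is cheaper than the API."""
    return onprem_cost / usd_per_million_tokens * 1_000_000


# Placeholder inputs for illustration only.
onprem = onprem_monthly_cost(
    capex_usd=360_000,
    amortization_months=36,
    energy_usd_per_month=2_000,
    ops_usd_per_month=8_000,
)
api_price = 2.00  # hypothetical blended USD per million tokens

breakeven = breakeven_tokens_per_month(onprem, api_price)
print(f"On-premise monthly cost: ${onprem:,.0f}")
print(f"Break-even volume: {breakeven / 1e9:.1f} billion tokens per month")
```

With these placeholder inputs, on-premise only wins above roughly 10 billion tokens per month for the workflow in question, which is why the comparison belongs with the highest-volume workflows first.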
The bottom line
The opex shift from headcount to tokens isn’t a budget problem; it’s a structural transformation. The economics are unsettled, and the path is nonlinear. That's not a reason to wait. It's the reason to start instrumenting now so that when the curve breaks, you're navigating with data instead of intuition.
Next Monday: Pull your top 10 SaaS contracts and your token spend to date. Instrument one workflow end to end: What does it actually cost per task, per outcome? That’s the number nobody in your org knows yet. Once you have it, every other decision gets easier.
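As a starting point for that exercise, here is a minimal sketch of per-task metering. It assumes each model call can be logged with a workflow name, a task ID, the model used, and input and output token counts; the price table, field names, and sample events are hypothetical, and a production version would read prices and usage from your provider's billing data.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical (input, output) prices in USD per million tokens.
PRICE_PER_MILLION = {
    "frontier": (15.00, 75.00),
    "last-generation": (3.00, 15.00),
}


@dataclass(frozen=True)
class UsageEvent:
    workflow: str
    task_id: str
    model: str
    input_tokens: int
    output_tokens: int


def event_cost(event: UsageEvent) -> float:
    """Cost of a single model call from the price table."""
    in_price, out_price = PRICE_PER_MILLION[event.model]
    return (event.input_tokens * in_price + event.output_tokens * out_price) / 1_000_000


def cost_per_task(events: list[UsageEvent]) -> dict[tuple[str, str], float]:
    """Sum every call (retries included) for each (workflow, task) pair."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for event in events:
        totals[(event.workflow, event.task_id)] += event_cost(event)
    return dict(totals)


def cost_per_outcome(events: list[UsageEvent], successful: set[str]) -> dict[str, float]:
    """Total workflow spend divided by the number of successful tasks."""
    spend: dict[str, float] = defaultdict(float)
    wins: dict[str, set[str]] = defaultdict(set)
    for event in events:
        spend[event.workflow] += event_cost(event)
        if event.task_id in successful:
            wins[event.workflow].add(event.task_id)
    return {w: spend[w] / len(wins[w]) for w in spend if wins[w]}


# Sample log: one ticket resolved in two agent steps, one escalated after a frontier call.
events = [
    UsageEvent("tier1-ticket", "T-1", "last-generation", 10_000, 1_000),
    UsageEvent("tier1-ticket", "T-1", "last-generation", 10_000, 1_000),
    UsageEvent("tier1-ticket", "T-2", "frontier", 20_000, 2_000),
]

for key, cost in cost_per_task(events).items():
    print(key, f"${cost:.3f}")
print(cost_per_outcome(events, successful={"T-1"}))
```

Dividing total workflow spend by successful outcomes, rather than by tasks attempted, is a deliberate choice in this sketch: it charges the cost of failed or escalated runs to the outcomes that actually landed.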