Is AI really a net cost saving, or an OpEx disaster?

Analysts, consultancies, and vendors (Bain, Deloitte, Gartner, Workday), along with researchers and companies themselves, are actively dissecting this. The data shows a messy, mixed reality: real productivity gains and cost wins in specific cases, but also widespread waste, rework, measurement failures, and token spend that often undercuts net labor savings at scale or fails to deliver them at all. It’s not a complete shit show, but the “insane token spend → automatic human labor replacement” story has major holes right now.

Token Spend Is Real, Growing Fast, and Becoming a New OpEx Line

Companies are burning serious money on API tokens (especially with agentic/multi-step systems that loop, self-check, and consume far more tokens per task).

Token usage has exploded (e.g., one report showed business token use up ~1,000% while spend rose only ~500%, because per-token prices fell over the same period). Some enterprises hit billions or trillions of tokens monthly.
In AI-heavy teams, compute/token costs can already exceed or rival employee salaries. Uber’s CTO reportedly blew through the full 2026 AI budget on tokens. Nvidia’s applied deep learning VP noted compute costs for his team far outstrip employee costs.
Startups have reported eye-watering bills (e.g., one 4-person team at ~$113k/month). Agentic AI is a particular culprit — it can burn 10-1,000x more tokens than simple queries.

Per-token prices have fallen (roughly halved in one recent 12-month period), but consumption often rises faster due to more complex use, premium models, and expanding workflows. This keeps total spend high or growing.
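The usage and spend figures above imply a price change that can be checked with simple arithmetic. The sketch below is only an illustration: it assumes "up ~1,000%" means usage reached 11x the baseline and "up ~500%" means spend reached 6x, and the function name is made up for this example.

```python
def implied_price_change(usage_growth_pct: float, spend_growth_pct: float) -> float:
    """Return the implied percentage change in average price per token.

    Inputs are percentage increases, so +1000% means usage is 11x the baseline.
    """
    usage_multiple = 1 + usage_growth_pct / 100
    spend_multiple = 1 + spend_growth_pct / 100
    # Average price per token = spend / usage, so the multiples divide.
    price_multiple = spend_multiple / usage_multiple
    return (price_multiple - 1) * 100


change = implied_price_change(1000, 500)
print(f"Implied change in average price per token: {change:.1f}%")
```

Under that reading, the average price per token fell by about 45%, which is broadly consistent with per-token prices roughly halving while total spend still grew sixfold.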

Labor Cost Savings? Real in Pockets, Elusive or Offset at Scale

Task-level wins exist and can be substantial:

In software engineering, current token spend is often just 1-2% of headcount cost. Bain sees potential for agents (tokens plus data) to replace 20-30% of headcount OpEx in domains like engineering, support, sales, and ops (a scenario/stress test, not a guarantee).
Strong examples: AT&T used a multi-agent system (super agents overseeing worker agents) and smart model routing (smaller/cheaper models for simpler tasks). Result: 90% cost reduction while tripling throughput and increasing token volume from 8B to 27B/day. A small startup spent ~$2,000 in tokens to generate ~300k lines of code across workflow tools.
Vercel’s CEO noted top token spenders are often the most productive employees; a $10k token day can save millions in delivery costs.
Broader projections (e.g., Wharton/Penn) estimate average labor cost savings of ~25% now, potentially rising to 40% as tools improve. Customer service automation has shown 30-90% labor/operational cost drops in targeted cases.
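To put the 1-2% token share and the 20-30% replacement scenario side by side, here is a rough break-even sketch. It assumes the only new cost is token spend (it ignores oversight, integration, and governance), and the function name and inputs are hypothetical, chosen for illustration.

```python
def break_even_token_multiple(token_share: float, replaced_share: float) -> float:
    """How many times current token spend can grow before it cancels the labor saved.

    token_share: current token spend as a fraction of headcount cost (e.g. 0.02)
    replaced_share: fraction of headcount cost assumed to be replaced (e.g. 0.20)
    """
    if token_share <= 0:
        raise ValueError("token_share must be positive")
    # Break-even point: new token spend = old token spend + labor saved.
    return (token_share + replaced_share) / token_share


for token_share, replaced_share in [(0.02, 0.20), (0.01, 0.30)]:
    multiple = break_even_token_multiple(token_share, replaced_share)
    print(
        f"tokens at {token_share:.0%} of headcount, {replaced_share:.0%} replaced: "
        f"break-even at about {multiple:.0f}x current token spend"
    )
```

Under these simplified assumptions, token spend could grow roughly 11x to 31x before it wipes out the labor saved. Any oversight or rework cost lowers that ceiling.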

But net enterprise impact is much weaker:

Rework eats a huge chunk of gains. Workday’s early 2026 global study (3,200+ employees/leaders): 85% of employees save 1-7 hours/week with AI, but ~37-40% of that saved time is lost to fixing errors, rewriting, verifying, or managing low-quality output. That’s roughly 1.5 weeks per employee per year lost to the “AI tax.” Only ~14% consistently see net-positive results.
Many organizations see no clear headcount reduction or bottom-line labor savings. Labor markets have remained relatively stable so far; AI is more often augmenting work or enabling smaller teams/new entrants than causing mass displacement. Some replacements happen (e.g., junior/routine tasks), but new work (oversight, prompt work, validation, integration) often appears.
Gartner: Only ~41% of agentic AI rollouts hit positive ROI within 12 months; ~19% never do. They predict over 40% of agentic AI projects could be canceled by end of 2027 due to escalating costs, unclear value, and weak controls. Many lack maturity for complex autonomous work.
Broader studies/meta-analyses often find weak or no robust aggregate productivity gains at the economy or firm level once you account for offsets, methodological issues, and the fact that many “gains” don’t translate to measurable financial outcomes. In some surveys, 25% or fewer AI initiatives deliver the expected ROI; proving value remains a top barrier.
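The rework effect described above is easy to work through numerically. This is a minimal sketch, assuming 48 working weeks per year (an assumption, not a figure from the Workday study) and sampling the low, middle, and high ends of the reported 1-7 hours/week range; the function name is illustrative.

```python
def rework_breakdown(gross_hours_per_week: float, rework_share: float,
                     working_weeks: int = 48) -> dict:
    """Split gross AI time savings into hours lost to rework and hours kept."""
    lost_per_week = gross_hours_per_week * rework_share
    net_per_week = gross_hours_per_week - lost_per_week
    return {
        "lost_per_week": lost_per_week,
        "net_per_week": net_per_week,
        "lost_per_year": lost_per_week * working_weeks,
        "net_per_year": net_per_week * working_weeks,
    }


for gross in (1, 4, 7):
    for share in (0.37, 0.40):
        r = rework_breakdown(gross, share)
        print(
            f"gross {gross} h/wk, rework {share:.0%}: "
            f"net {r['net_per_week']:.2f} h/wk, "
            f"lost {r['lost_per_year']:.0f} h/yr"
        )
```

The point of the exercise is that headline "hours saved" figures overstate the benefit by the rework share. A cost comparison should use the net figure.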

Why It Feels Like a Partial Disaster

Naive deployment wastes money. Bolting AI onto legacy processes without redesign creates rework, quality issues, and new costs. “Tokenmaxxing” (pushing heavy internal use for productivity metrics) backfired for some companies.
Measurement is hard. Many firms can’t reliably quantify ROI net of token costs. Productivity gains at the individual/task level don’t automatically become enterprise labor savings or profit.
Costs shift, not disappear. You trade (some) human labor for token/compute spend + oversight labor + integration/governance costs. In poorly managed cases, the new variable costs (tokens) grow unpredictably and can rival or exceed the old fixed ones (salaries).
Agentic systems amplify both upside and downside. They enable bigger automation but are token-hungry and error-prone without strong orchestration.
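The "costs shift, not disappear" point can be written as a simple accounting identity: net savings equal labor saved minus token spend, oversight labor, and integration/governance costs. The sketch below uses entirely hypothetical annual dollar figures to contrast a managed deployment with an unmanaged one. The class and field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class DeploymentCosts:
    """Annual figures in USD; all values used below are made up for illustration."""
    labor_saved: float
    token_spend: float
    oversight_labor: float
    integration_governance: float

    def new_costs(self) -> float:
        # Costs that appear because of the AI deployment.
        return self.token_spend + self.oversight_labor + self.integration_governance

    def net_savings(self) -> float:
        # Positive means the deployment saves money overall.
        return self.labor_saved - self.new_costs()


managed = DeploymentCosts(labor_saved=1_000_000, token_spend=150_000,
                          oversight_labor=200_000, integration_governance=100_000)
unmanaged = DeploymentCosts(labor_saved=1_000_000, token_spend=700_000,
                            oversight_labor=350_000, integration_governance=150_000)

for name, d in [("managed", managed), ("unmanaged", unmanaged)]:
    print(f"{name}: new costs ${d.new_costs():,.0f}, net savings ${d.net_savings():,.0f}")
```

With identical gross labor savings, the hypothetical unmanaged case ends up net negative. That is the pattern the list above describes.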

The Path Forward (Where It’s Working)

Winners treat this as a structural OpEx shift requiring active management:

Rightsizing models (cheap/small for simple tasks, frontier only when needed).
Multi-agent orchestration and smart routing (like AT&T).
Strong governance, FinOps-style tracking of cost-per-task, chargebacks, and clear KPIs tied to business outcomes (not just tokens used).
Workforce redesign: AI amplifies capable people, but it also exposes weak processes and oversight, and creates drag where they exist.
Hybrid approaches (private cloud/self-hosting for scale, caching, open-source where viable).
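Model rightsizing and cost-per-task tracking can be sketched in a few lines. Everything here is an assumption made for illustration: the per-million-token prices are not real vendor pricing, the complexity score is assumed to come from some upstream classifier, and the threshold is arbitrary.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens; not real vendor pricing.
PRICE_PER_MILLION = {"small": 0.50, "frontier": 10.00}


@dataclass
class Task:
    name: str
    tokens: int
    complexity: float  # 0.0 (trivial) to 1.0 (hard); assumed to be supplied upstream


def route(task: Task, threshold: float = 0.7) -> str:
    """Send only tasks at or above the threshold to the expensive model."""
    return "frontier" if task.complexity >= threshold else "small"


def task_cost(task: Task, model: str) -> float:
    """Cost of one task in USD on the given model tier."""
    return task.tokens / 1_000_000 * PRICE_PER_MILLION[model]


tasks = [
    Task("classify ticket", 2_000, 0.1),
    Task("summarize call", 8_000, 0.3),
    Task("refactor module", 60_000, 0.9),
]

routed_total = sum(task_cost(t, route(t)) for t in tasks)
frontier_total = sum(task_cost(t, "frontier") for t in tasks)

for t in tasks:
    model = route(t)
    print(f"{t.name}: {model}, ${task_cost(t, model):.4f} per task")
print(f"routed total: ${routed_total:.3f} vs all-frontier: ${frontier_total:.3f}")
```

Logging cost per task in this way, rather than total tokens consumed, is what makes chargebacks and outcome-based KPIs possible. The size of the saving from routing depends entirely on the task mix and the real price gap between tiers.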

Deloitte and Bain both emphasize that CFOs need visibility and partnership with tech leaders now — token economics are becoming strategic, like electricity or raw materials.

Bottom line: There are credible analyses showing AI can deliver strong ROI and meaningful labor cost displacement when executed with discipline (especially in technical domains where tokens are still cheap relative to skilled humans). However, for a large portion of deployments today, the insane token spend is delivering partial or illusory gains, high waste, and unclear net savings on human labor. It’s more “expensive experiment with real pockets of value” than proven, scalable replacement. The economics are improving with falling prices and better tooling, but success depends heavily on how companies use it, not just how much they spend. Many are still figuring out the “how.”

The hype cycle is colliding with reality in 2026, and the survivors will be the ones measuring rigorously and redesigning work rather than just adding tokens.