AI Token Usage: From Abundance to Caution

Companies are now facing unexpected expenses from AI token usage, prompting a shift from encouragement to caution in their AI strategies.

Just a year ago, large companies encouraged employees to maximize their use of AI—tokens were handed out as corporate bonuses, and internal rankings rewarded those who consumed the most. Now, executives are frantically cutting access to models or imposing strict usage limits while simultaneously revising budgets.

We explore how tokens transitioned from "free fuel" to a major unexpected expense and why corporations overlooked this shift.

What is a Token and Why Do We Pay for It?

In the context of artificial intelligence, a token is the smallest unit of text that a language model processes. Depending on the model's tokenizer, one token can correspond to a few characters or an entire word.

Models use tokens to process user queries and generate text responses, code snippets, and images. On average, 100 words break down into about 130–140 tokens.

Tokens are divided into two types: input (the user's prompt) and output (the model's response).

For a long time, tokens were seen as a purely technical unit inside the APIs of neural networks, with little bearing on budgets. That changed once they were monetized.

Previously, AI developers such as OpenAI offered access to their products mainly through subscriptions or corporate plans. In March 2023, however, Sam Altman's company released GPT-4 with public per-token pricing: $30 per 1 million input tokens and $60 per 1 million output tokens.

From that point on, tokens gradually became a unit of measurement for the cost of AI-based tools.
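
To make the per-token pricing concrete, here is a minimal sketch that turns word counts into an estimated bill. It assumes the midpoint of the 130-140 tokens per 100 words mentioned above and the GPT-4 launch prices; the function names are illustrative, and real token counts depend on the tokenizer.

```python
# Illustrative constants taken from the figures in the text.
TOKENS_PER_WORD = 1.35      # assumed midpoint of 130-140 tokens per 100 words
INPUT_PRICE_PER_M = 30.0    # USD per 1M input tokens (GPT-4 launch pricing)
OUTPUT_PRICE_PER_M = 60.0   # USD per 1M output tokens (GPT-4 launch pricing)


def estimate_tokens(words: int) -> int:
    """Rough token count for a given number of words."""
    return round(words * TOKENS_PER_WORD)


def request_cost(prompt_words: int, response_words: int) -> float:
    """Estimated USD cost of one request; input and output are billed separately."""
    input_tokens = estimate_tokens(prompt_words)
    output_tokens = estimate_tokens(response_words)
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # A 1,000-word prompt with a 500-word answer.
    print(f"${request_cost(1000, 500):.3f}")
```

Under these assumptions, a 1,000-word prompt with a 500-word answer comes to roughly eight cents, with the shorter response costing as much as the longer prompt because output tokens are priced at twice the input rate.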

From 2023 to 2025, there was an "era of subsidization": companies continued to offer services to regular users via subscriptions, but corporate developers were already paying for tokens. During this time, pricing was more lenient, allowing for the widespread adoption of LLM systems.

Starting in spring 2026, all major providers fully transitioned to monetizing requests, citing high data processing costs. In April, Anthropic made this decision, followed by GitHub Copilot in July.

Average token prices for various LLMs as of December 2025. Source: FourWeekMBA.

Some observers now call the current period the beginning of the "tokenpocalypse": on top of computing costs, companies must urgently work a new expense into their budgets and rethink their strategies.

Prices Fall, Demand Rises

The paradox of the current situation is that, over the past three years, the actual cost of tokens has only decreased. The average price of an input token has dropped by about 98%, according to The Next Web.

Nevertheless, numerous industry analyses indicate that enterprise AI spending has surged by approximately 320%. The average annual budget for AI has increased from $1.2 million in 2024 to $7 million in 2026.

This surge is attributed to the increased adoption of AI solutions and, consequently, the skyrocketing volume of token usage. Agent technologies, in particular, have significantly contributed to this load.

A simple linear workflow in 2023 cost about $0.04 per interaction. A request to a full-fledged agent system costs around $1.20, or 30 times more.

Analysts estimate that agent workflows multiply token consumption several times compared to "manual usage." For the most complex workflows, costs can increase by 50 to 500 times.
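
One plausible mechanism behind such multipliers is that an agent typically resends its growing history on every step. The sketch below is a simplified model of that effect, not a description of any particular product: the token counts are hypothetical, and it ignores discounts such as prompt caching.

```python
def single_call_tokens(prompt_tokens: int, output_tokens: int) -> int:
    """Tokens billed for one prompt and one response."""
    return prompt_tokens + output_tokens


def agent_run_tokens(prompt_tokens: int, output_per_step: int,
                     tool_result_tokens: int, steps: int) -> int:
    """Tokens billed when every step resends the whole accumulated context."""
    context = prompt_tokens
    total = 0
    for _ in range(steps):
        # Each step pays for the full context as input plus its own output.
        total += context + output_per_step
        # The model's output and the tool result join the history.
        context += output_per_step + tool_result_tokens
    return total


if __name__ == "__main__":
    single = single_call_tokens(1_000, 300)
    agent = agent_run_tokens(1_000, 300, 700, steps=10)
    print(single, agent, round(agent / single, 1))
```

With these made-up numbers, a ten-step run bills 58,000 tokens against 1,300 for a single call, about 45 times more, because the input side grows with every step.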

While cheaper models and token optimization methods continue to gain popularity, corporate buyers still prefer to pay for advanced model features, despite inflated budgets, according to researchers from YipitData.

Token costs in the APIs of different models. Source: YipitData.

Economists call this the Jevons paradox: when a resource becomes cheaper, its overall consumption rises. In the case of AI, the effect has been especially pronounced.
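
A back-of-the-envelope calculation shows how strong the effect would have to be. The sketch below treats spending as price multiplied by volume and plugs in the two figures cited above. This is only an illustration: the numbers come from different sources, and enterprise AI budgets cover more than tokens.

```python
def implied_volume_multiplier(price_drop_pct: float, spend_rise_pct: float) -> float:
    """How much usage must grow if spend = price * volume.

    price_drop_pct: percentage fall in unit price (98 means prices fell by 98%).
    spend_rise_pct: percentage rise in total spend (320 means spend rose by 320%).
    """
    price_factor = 1 - price_drop_pct / 100   # new price relative to old
    spend_factor = 1 + spend_rise_pct / 100   # new spend relative to old
    return spend_factor / price_factor


if __name__ == "__main__":
    print(round(implied_volume_multiplier(98, 320)))
```

If both figures described the same basket of tokens, usage would have had to grow roughly 210-fold to turn a 98% price cut into a 320% rise in spending.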

Who Miscalculated?

The token crisis is not just theoretical. Many large companies have publicly acknowledged problems in this area and are revising their strategies for using AI tools.

Limits at Uber

Ride-hailing company Uber has actively integrated AI tools like Claude Code and Cursor into its workflows over the past few years. In May, CEO Dara Khosrowshahi claimed that autonomous agents were already writing about 10% of the company's code.

However, less than a month after this statement, news emerged about restrictions on AI usage within Uber. The company set a limit of $1,500 per month per employee for using such services.

The restrictions followed heavy, largely unchecked use of the technology. In April, CTO Pravin Neppalli Naga told The Information that the firm had already exhausted its annual budget for AI coding tools. He recalled a case where he personally spent $1,200 on tokens during a two-hour demonstration.

"I had to start all over: the budget I calculated turned out to be insufficient," Naga noted.

Before discovering unexpected expenses, Uber encouraged employees to use AI "as much as possible" and maintained internal activity rankings for teams. Now, developers must monitor their own expenses and request additional funding if they exceed the established limit.
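
A per-employee cap of this kind can be sketched in a few lines. The class below is a hypothetical illustration, not Uber's actual system: it tracks spending per employee and month and rejects any charge that would push the total past the $1,500 limit mentioned above.

```python
from collections import defaultdict

MONTHLY_LIMIT_USD = 1500.0  # per-employee cap reported in the text


class BudgetTracker:
    """Hypothetical tracker for monthly AI spending per employee."""

    def __init__(self, limit: float = MONTHLY_LIMIT_USD) -> None:
        self.limit = limit
        # Maps (employee, "YYYY-MM") to USD spent so far.
        self.spent = defaultdict(float)

    def record(self, employee: str, month: str, cost_usd: float) -> bool:
        """Record a charge; return False and record nothing if it would exceed the cap."""
        key = (employee, month)
        if self.spent[key] + cost_usd > self.limit:
            return False
        self.spent[key] += cost_usd
        return True

    def remaining(self, employee: str, month: str) -> float:
        """Budget left for the employee in the given month."""
        return self.limit - self.spent[(employee, month)]


if __name__ == "__main__":
    tracker = BudgetTracker()
    print(tracker.record("dev-1", "2026-05", 1200.0))  # within the cap
    print(tracker.record("dev-1", "2026-05", 400.0))   # would exceed the cap
    print(tracker.remaining("dev-1", "2026-05"))
```

In this sketch, a single $1,200 demo like the one Naga described would consume four-fifths of a month's allowance, and a rejected charge is the point at which a developer would have to request additional funding.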

Microsoft's Failed Experiment

In December 2025, Microsoft provided thousands of its developers, managers, and designers access to Claude Code from Anthropic—at the company's expense. By spring, the tool had spread far beyond engineering teams.

Internally, the rollout was presented as a learning process. From the outside, though, the situation looked odd: despite having its own tool, GitHub Copilot, the company was using a competitor's product.

Six months later, Microsoft ultimately curtailed the experiment, canceling most direct licenses for Claude Code in its Experiences and Devices division (Windows, Microsoft 365, Outlook, Teams, Surface). Developers were instructed to migrate to Copilot CLI by June 30—the last day of the fiscal year.

In an interview with The Verge, Rajesh Jha, who heads Microsoft's Experiences and Devices division, said the goal of rolling out Claude Code was to get up to speed with such tools quickly and assess their effectiveness.

However, according to internal surveys, most engineers preferred the tool from Anthropic. Journalists speculate that its heavy use may have driven up the corporation's costs.

The Next Web argues that Microsoft's retreat from Claude Code is a strong signal that the economics of corporate AI coding at current token prices are not viable. This is not because the tools are poor. On the contrary, they are so good that they are used constantly.

Additionally, in April, GitHub suspended new sign-ups for its Copilot Pro and Pro+ plans, as the platform's costs for running agents for paying users exceeded the price of their monthly subscriptions.

New Contract with Amazon

In June, The Information reported on a restructuring of the partnership between Amazon and Anthropic. Starting next year, the corporation will pay not for used computing hours but for consumed tokens.

The publication's sources believe this change may increase Amazon's costs for the Claude family of language models. However, Amazon has denied this claim.

Jeff Bezos's company uses AI models in several of its services, including the voice assistant Alexa, the programming tool Kiro, and the assistant Quick.

As part of the collaboration, Anthropic has also integrated with Amazon Web Services systems and used its partner's chips to train LLMs.

Of course, the "tokenpocalypse" is unlikely to significantly impact giants like Microsoft and Amazon, partly due to their direct investment in many AI projects. However, smaller firms risk facing or are already experiencing financial difficulties.

Costly Mistakes

As an example of excessive token spending, journalists from Axios cited an unnamed company that forgot to set limits on employees' use of Claude Opus. By the end of the month, the firm received a bill of $500 million for AI computations, leaving it in debt.

Representatives of Priceline also described difficulties with AI solutions:

"It's like a crack cocaine epidemic. They let you try it to get you hooked, and now you're somewhat dependent," said senior IT director Chris Reed, noting that the company began imposing token limits for certain employee groups.

Meanwhile, Faros AI CEO Vitaly Gordon recalled one developer who spent over $40,000 on tokens in a month.

Nicholas Arcolano, head of research at Jellyfish, told TechCrunch that AI expenses are rising rapidly, largely because of agentic features. Token consumption per developer has grown about 18.6-fold over nine months.
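
To put that figure in monthly terms, the sketch below converts an overall multiplier into a constant compound growth rate. It assumes, purely for illustration, that consumption grew at a steady pace throughout the nine months; the source reports only the overall change.

```python
def monthly_growth_rate(total_multiplier: float, months: int) -> float:
    """Constant monthly growth rate that compounds to total_multiplier."""
    return total_multiplier ** (1 / months) - 1


if __name__ == "__main__":
    rate = monthly_growth_rate(18.6, 9)
    print(f"{rate:.1%}")
```

Under that assumption, an 18.6-fold increase over nine months corresponds to token consumption rising by about 38% every month.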

"At the same time, the return on excessive spending depends on the ultimate commercial value of the released code (for example, revenue), which most companies still cannot measure," Arcolano emphasized.

A Not-So-Dim Future

It's challenging to assess the real financial implications of the "tokenpocalypse," but the term clearly exaggerates the problem. Large corporations will easily cover their "bills," while smaller companies can adapt if they consume wisely.

Against this backdrop, the Linux Foundation has introduced the Tokenomics Foundation, a new initiative aimed at instilling "discipline" in AI spending. FinOps has applied a similar model to cloud service costs.

Tokens have already become a new budget item, and enterprises looking to keep pace with technology should take this into account. As AI solutions evolve, such expenses may become as predictable as costs for basic corporate subscriptions.
