Hi {{first_name|Investor}},
Yesterday’s issue made the case that the seat is no longer where software revenue lives. The seat is the floor; consumption is the upside. Microsoft's CFO Amy Hood put the architecture explicitly on the record this quarter:
"I start to think about it as a license business plus a consumption business. It'll still have that per-seat license logic, but it'll also have a meter, just like you see in Azure, and it may not all flow through bookings in the same way. You'll just bill for usage."
This is more than an admission that consumption pricing now exists alongside seats. Hood is also saying the consumption layer doesn't always flow through bookings, which means RPO (remaining performance obligation, the dollar value of contracts signed but not yet delivered) becomes an increasingly incomplete proxy for the consumption story. The CFO of the company that defined modern enterprise software pricing just told investors that the metric they've been using is going to get less informative.
If that's the architecture, ARR — annual recurring revenue, the metric the entire SaaS industry was built to report — is no longer the right unit of measurement. ARR was designed to count seats and contracts. It was designed for a world where a customer's revenue contribution was bounded by their headcount. That's the gap this issue is about — and the unit replacing it is the token.
The Hyperscalers Are Already Reporting in Tokens
The quiet pivot in this earnings cycle wasn't what the hyperscalers — the major cloud and AI infrastructure operators — said. It was what they reported.
Google's first-party AI models now process 16 billion tokens per minute via direct API (the programmatic interface that lets developers send queries directly to a model) — up from 10 billion the prior quarter. A 60% sequential jump. 330 Google Cloud customers each processed more than one trillion tokens over the past twelve months. Thirty-five crossed 10 trillion. Cloud GenAI model revenue grew nearly 800% year over year.
Microsoft's Foundry, the developer platform sitting on top of Azure, is on track to process more than one trillion tokens in aggregate across more than 300 customers this year — accelerating 30% quarter over quarter. The Copilot credit-consumptive offer, the metered usage tier sitting on top of seat licenses, grew nearly 2x quarter over quarter. Tens of thousands of companies are managing tens of millions of agents in Agent 365 — Microsoft's governance platform for AI agents inside enterprises — and every agent action is a chargeable event.
Amazon's Bedrock — AWS's managed service for running AI models — processed more tokens in Q1 alone than in all prior years combined. Customer spend grew 170% quarter over quarter. AgentCore, the AWS infrastructure layer for agents, deploys a new agent every 10 seconds.
Meta is the last to formalize a consumption layer, but the foundations are there. WhatsApp Business AI grew weekly conversations 10x. Family of Apps "Other" revenue — mostly WhatsApp paid messaging and subscriptions — grew 74% to $885 million.
None of those numbers fit on an ARR dashboard. They aren't seats. They aren't contracts. They're tokens, agent calls, conversation volumes — the units that scale with what the software actually does.
Why Tokens Beat ARR
ARR has done useful work for fifteen years. It's predictable, comparable across companies, easy to forecast. But it has blind spots that tokens don't.
ARR doesn't capture intensity. A customer paying $10,000 a year for ten seats counts the same in ARR whether those seats run agents at industrial scale or whether they're collecting dust in a directory. The customer running ten million tokens a day is doing real work in the platform; the customer running ten thousand tokens a month is not. ARR can't distinguish them, but tokens can.
ARR doesn't capture growth velocity. When a customer's usage 10x's because they deployed a new agent workflow, ARR moves only when their contract resizes at renewal — twelve months later. Tokens move in real time. By the time ARR catches up, the consumption story has already been priced in by anyone reading the token numbers.
ARR doesn't capture pricing power. In a seat-based world, the only way to grow per-customer revenue was to sell more seats or raise the price per seat. Both invite procurement pushback. In a consumption world, the customer's bill grows because they ran more workloads — they pay more because they got more value. Pricing power is embedded in the workload, not negotiated at renewal.
The implication is direct. Software companies that grow ARR at 20% but tokens at 60% are compounding much faster than the headline suggests. Software companies that grow ARR at 20% but have no token disclosure at all are companies whose growth might be entirely seat-driven — which means it carries the structural ceiling Sunday's issue described.
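A toy compounding sketch makes the gap concrete. The starting revenue figures below are invented for illustration; only the 20% and 60% growth rates come from the text:

```python
def project(seat_rev: float, token_rev: float,
            seat_growth: float, token_growth: float, years: int):
    """Compound each revenue layer independently; return final revenues ($M)."""
    for _ in range(years):
        seat_rev *= 1 + seat_growth
        token_rev *= 1 + token_growth
    return seat_rev, token_rev

# Hypothetical: $100M seat base growing 20%/yr, $20M consumption layer growing 60%/yr.
seats, tokens = project(100.0, 20.0, 0.20, 0.60, years=5)
total = seats + tokens
print(f"total ${total:,.0f}M, consumption share {tokens / total:.0%}")
# total $459M, consumption share 46%
```

In five years the consumption layer goes from 17% of the mix to nearly half, even though the headline number never looked like a hypergrowth story in any single year.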
The Token Economics
A token is a unit of input or output in a language model — roughly four characters of text, or about three-quarters of a word. Every prompt sent to a model is some number of input tokens. Every response is some number of output tokens. The cost of running a model is approximately linear in tokens processed.
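The arithmetic is simple enough to sketch. Both constants below are illustrative assumptions, not any vendor's actual tokenizer or price list:

```python
CHARS_PER_TOKEN = 4               # rough average for English text
PRICE_PER_MILLION_TOKENS = 2.00   # illustrative blended $ per 1M tokens

def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def estimate_cost(input_text: str, output_chars: int) -> float:
    """Approximate bill for one request: cost is linear in tokens processed."""
    total_tokens = estimate_tokens(input_text) + output_chars // CHARS_PER_TOKEN
    return total_tokens * PRICE_PER_MILLION_TOKENS / 1_000_000

prompt = "Summarize the Q3 earnings call in three bullet points." * 20
print(estimate_tokens(prompt))          # 270
print(estimate_cost(prompt, 2000))      # 0.00154
```

Fractions of a cent per request, which is why the unit only becomes an investable number at hyperscaler volume: 16 billion tokens per minute is where linear-in-tokens starts to matter.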
Inference — the cost of running a trained model to produce an output — is the dominant compute expense in production AI today. Training is one-time and amortizes over the life of the model. Inference is recurring and scales with usage. Every token represents a tiny slice of compute time on a GPU (graphics processing unit, the chips most AI runs on) or TPU (tensor processing unit, Google's custom AI chip) somewhere on a data center floor. The hyperscaler business is: bill the customer per token at one price, run the model on infrastructure at a lower per-token cost, capture the margin in the spread.
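The spread itself reduces to one line of arithmetic. The prices below are made-up illustrations, chosen only to show how the same metered revenue can sit on very different unit economics:

```python
def consumption_gross_margin(price_per_1m: float, cost_per_1m: float) -> float:
    """Gross margin on the metered layer, as a fraction of revenue."""
    return (price_per_1m - cost_per_1m) / price_per_1m

# Hypothetical: vendor on its own silicon vs. vendor renting someone else's GPUs.
own_chip = consumption_gross_margin(price_per_1m=2.00, cost_per_1m=0.60)
rented   = consumption_gross_margin(price_per_1m=2.00, cost_per_1m=1.40)
print(f"own silicon: {own_chip:.0%}, rented compute: {rented:.0%}")
# own silicon: 70%, rented compute: 30%
```

Same billed price to the customer, wildly different margin capture, depending entirely on what it costs the vendor to serve a token.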
This is why custom silicon — chips designed in-house by the cloud providers themselves, rather than purchased from Nvidia or AMD — matters so much in this earnings cycle. Microsoft's Maia 200 AI accelerator delivers over 30% better tokens per dollar versus the latest silicon in their fleet. Amazon's Trainium (its custom AI chip family) provides "several hundred basis points of operating margin advantage" on inference workloads versus NVIDIA-heavy alternatives. Google's TPU 8i is a low-latency (fast-responding) inference chip designed specifically for the agentic workload. Each is a margin lever on the per-token spread. The vendor that owns the chip controls the unit economics; the vendor renting compute pays the spread to someone else.
Agentic workloads also change the math underneath. A traditional chatbot interaction is one prompt, one response — call it 5,000 tokens. An agent doing a multi-step research task can chain through dozens of model calls, each one feeding the next, each one re-sending accumulated context — call it 500,000 tokens for a single user request. Two orders of magnitude, a hundred times more, from the same human user. The agent multiplies the consumption layer in a way per-seat pricing can't capture.
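The multiplier above can be sketched directly. Step counts and per-step token sizes are illustrative assumptions chosen to match the rough figures in the text:

```python
def chat_tokens(prompt: int = 2_000, response: int = 3_000) -> int:
    """One chatbot round trip: a single prompt and a single response."""
    return prompt + response

def agent_tokens(steps: int = 40, context_per_step: int = 10_000,
                 output_per_step: int = 2_500) -> int:
    """An agent chains many calls; each step re-sends context and adds output."""
    return steps * (context_per_step + output_per_step)

print(chat_tokens())                        # 5000
print(agent_tokens())                       # 500000
print(agent_tokens() // chat_tokens())      # 100
```

The multiplier is driven almost entirely by the re-sent context: every step pays again for everything the agent has already read, which is why chained workflows inflate token counts far faster than user counts.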
How to Read a Software Earnings Call in Tokens
Three questions to ask when a software CEO talks about AI revenue.
One — is the AI revenue showing up in tokens, agent calls, or some other consumption unit, or is it showing up in seat upgrades? Token-denominated AI revenue is consumption-driven. Seat-upgrade AI revenue is a more expensive seat. They deserve different multiples — the valuation ratios investors are willing to pay.
Two — what's the growth rate of the consumption layer versus the seat layer? The companies whose consumption is growing materially faster than their seat base are repricing themselves upward. Microsoft made this visible this quarter — seats up 250% year over year, but Foundry tokens up 30% quarter over quarter and Copilot credit-consumptive up 2x quarter over quarter. Different velocities. Different stories.
Three — what's the gross margin on the consumption layer? The most important and least disclosed. A vendor running consumption at 70% gross margin is making money on the workload. A vendor running consumption at 30% gross margin is renting GPUs from someone else and barely covering the spread. The same revenue line tells very different stories depending on the margin underneath.
Most companies don't disclose all three cleanly. The ones that do are flagging confidence in the unit economics. The ones that don't are usually hiding something.
The Bridge to Tuesday
Two issues in, the framework is built. Software pricing is shifting from seats to consumption. The unit of measurement that replaces ARR is the token. The companies reporting consumption metrics — and growing them faster than their seats — are repricing themselves into a different multiple. The companies that aren't are slowly being repriced down.
Tuesday's issue is the portfolio map. Three buckets — platform owners, structural beneficiaries, and exposed names — and a scoring framework I'm using to think about my own positions across all three. It also pulls in the second-order beneficiaries that haven't shown up in the first two issues: the memory, custom silicon, and power infrastructure names that benefit regardless of which platform wins.
As always: I'm not telling you what to buy. I'm sharpening the lens you use to look.
Stay disciplined,
Koh
Disclaimer: Nothing in this newsletter constitutes investment advice or a recommendation to buy or sell any security. Numbers and observations are as of publication. I may hold positions in companies discussed above. Always do your own research and consult a licensed financial advisor before making investment decisions.
