The Meter Is Running

Every engineer I've worked with was really managing a budget. Before AI, the budgets were just easier to name: memory, CPU cycles, bandwidth, disk, the cloud bill, and underneath them all, time. Good engineering was never about having infinite resources. It was about knowing which one was scarce and spending it well.

There's a new line in the budget now, and it's the strangest one yet: tokens. When you code with a model, you're spending a finite allowance of its attention, and the model is perfectly happy to help you spend it badly. It will chase a bug in circles, rewrite the same function four ways, and confidently burn your whole context window on a dead end — cheerfully, because the responsibility to stop was never its. It's yours. Knowing which prompt actually gets the result. Knowing when the context is polluted and you should clear it and start clean. Knowing when to keep steering the model and when to close the lid on it, write the thing yourself, or ask a human who already knows. The model doesn't manage the budget. You do.

The hard part is that this budget is nearly impossible to see. You can profile memory and stare at a cloud invoice, but token spend hides behind a chat window that feels free. For a while it basically was — the industry sold AI coding flat-rate, one monthly price, prompt all you want, and everyone happily overspent because overspending had no cost. That era is ending. The all-you-can-eat pricing that trained a generation of engineers to be wasteful is quietly going away. The meter was always running. Now you can finally hear it.

The shift has two opposite currents. Per token, intelligence collapsed in price: GPT-4 launched in 2023 near thirty dollars a million tokens, and the frontier is an order of magnitude cheaper now, with smaller models hundreds of times cheaper still — Anthropic even cut the price of its most expensive model by two-thirds overnight. Yet the tools built on those tokens got costlier to run, because agents burn through them at a scale autocomplete never did. So the meter came out. GitHub Copilot started life as a flat ten-dollar autocomplete, added "premium requests" in 2025, and recently moved every plan to metered billing against a token allowance. Cursor swapped its fixed quota for a usage pool priced at raw API rates, and some bills jumped tenfold overnight — enough backlash to force a public apology and refunds. Cheaper tokens, bigger bills: the flat-rate subsidy is what's vanishing, and that direction runs one way.

So the skill to build is spending. Treat context like the scarce thing it is — clear it often, keep it clean, don't drag a thousand irrelevant lines into every question. Match the model to the job; you don't need the biggest, priciest one to rename a variable. Write the prompt that lands the answer in one shot instead of five. Learn the moment a session has gone sideways and cut your losses. And keep a human in your back pocket for the problems a model will only ever circle. None of this is about using AI less. It's about not setting money — and time — on fire.

The engineers who learn to spend tokens well out-ship the ones who don't. This is the sharper version of that. Knowing how to use these tools was the first advantage, and it's already becoming table stakes. The next one is knowing what they cost. Soon the budget-aware engineer will out-build the merely AI-knowledgeable one — because the meter is running, and it always was.