Claude: how to stop wasting your tokens

Since late March 2026, Anthropic has tightened its five-hour rolling usage window during weekday peak hours. Pro and Max users are hitting their limits faster than before. The main reason isn’t the number of messages sent. It’s how Claude reads a conversation.

Key Takeaways

Claude rereads the entire conversation history with every new message, making consumption grow as the thread gets longer
Limits run on a 5-hour rolling window and a weekly cap, both measured in tokens, not messages
Opening one thread per topic, grouping questions, and picking the right model can recover a significant share of your credits

The hidden cost of every message

Most users assume their credits deplete based on the number of messages they send. That’s not how it works. Claude doesn’t process each message in isolation: it rereads the entire conversation from the beginning before responding to every new input. The first message costs almost nothing. The thirtieth forces the model to reprocess twenty-nine full exchanges before it gets to the new question.

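This behavior is inherent to the architecture of large language models, not specific to Claude. But it has a direct consequence: the cost of each new message rises with the length of the history, so the total cost of a thread grows roughly with the square of the number of exchanges, not exponentially. A thread twice as long costs about four times as much, even if the content of each individual message stays simple.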
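To make that growth concrete, here is a minimal sketch. It assumes every exchange (a message plus its reply) has the same size and that each turn reprocesses the whole thread so far; `cumulative_input_tokens` and the 200-token figure are illustrative, not Anthropic's accounting.

```python
def cumulative_input_tokens(num_messages: int, tokens_per_exchange: int) -> int:
    """Total tokens processed over a whole thread.

    Simplifying assumption: turn n reprocesses the n - 1 earlier exchanges
    plus the current one, and every exchange has the same size.
    """
    total = 0
    for n in range(1, num_messages + 1):
        total += n * tokens_per_exchange
    return total


if __name__ == "__main__":
    short_thread = cumulative_input_tokens(10, 200)
    long_thread = cumulative_input_tokens(30, 200)
    print(short_thread, long_thread, round(long_thread / short_thread, 1))
```

Under these assumptions, a 10-message thread processes 11,000 tokens in total and a 30-message thread processes 93,000: three times the messages, roughly 8.5 times the consumption.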

Several factors amplify this effect: the size of attached files, the use of tools like web search or Research mode, and the generation of Artifacts (documents, spreadsheets, presentations). Anthropic confirms in its documentation that these tools are particularly token-intensive. A single Research mode call inside a long conversation can account for a significant share of the session limit.

The total context window available is 200,000 tokens across all paid plans, except Enterprise, which gets 500,000 tokens on certain models. This is Claude’s working memory for a given conversation. But this ceiling isn’t what triggers the most common blocks: the session and weekly limits kick in well before it is reached.

The rise of agentic use cases makes things worse. Claude Code, long sessions, and multi-step tasks consume far more resources than simple chat. What Pro and Max users have been experiencing since late March isn’t a quota reduction. It’s the mechanical effect of these new usage patterns on tightened rolling windows.

Two counters, two logics

Claude’s limit system relies on two distinct mechanisms. The 5-hour rolling limit works like a moving counter: it measures the volume of resources consumed over a continuous five-hour window. Once the limit is reached, you wait for the gauge to reset progressively.
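A rolling window of this kind can be sketched as a sliding-window counter. This is a hypothetical illustration, not Anthropic's implementation: the class name `RollingWindowCounter`, the 100,000-token limit, and the rule that usage expires exactly five hours after it was recorded are all assumptions.

```python
from collections import deque


class RollingWindowCounter:
    """Hypothetical sliding-window token counter (illustration only)."""

    def __init__(self, window_seconds: float, limit_tokens: int):
        self.window_seconds = window_seconds
        self.limit_tokens = limit_tokens
        self._events = deque()  # (timestamp, tokens), oldest first
        self._used = 0

    def _expire(self, now: float) -> None:
        # Drop usage that has aged out of the window.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            _, tokens = self._events.popleft()
            self._used -= tokens

    def used(self, now: float) -> int:
        self._expire(now)
        return self._used

    def record(self, now: float, tokens: int) -> bool:
        """Record usage if it fits in the window; return False if blocked."""
        self._expire(now)
        if self._used + tokens > self.limit_tokens:
            return False
        self._events.append((now, tokens))
        self._used += tokens
        return True


if __name__ == "__main__":
    counter = RollingWindowCounter(window_seconds=5 * 3600, limit_tokens=100_000)
    print(counter.record(0, 60_000))        # heavy session at hour 0
    print(counter.record(3600, 40_000))     # fills the window at hour 1
    print(counter.record(7200, 1_000))      # blocked at hour 2
    print(counter.record(5 * 3600, 1_000))  # hour-0 usage has expired
```

In this model the gauge frees up progressively: each block of usage drops out of the count five hours after it was recorded, rather than everything resetting at once.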

The weekly limit is an envelope renewed once per week. When it’s reached, access to Claude is suspended until it resets. Both limits apply independently and can trigger in very different contexts.

Neither limit is measured in number of messages. Both are measured exclusively in tokens. A token corresponds roughly to three-quarters of an English word, or three to four characters. What feels like a light conversation can represent thousands of tokens once the accumulated history is factored in.
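For a back-of-the-envelope estimate, the characters-per-token rule of thumb can be turned into a small helper. This is only a heuristic sketch: `estimate_tokens` and `estimate_turn_cost` are hypothetical names, the default of four characters per token is an assumption, and a real tokenizer will give different counts, especially for code or non-English text.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (heuristic, not a tokenizer)."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


def estimate_turn_cost(history: list[str], new_message: str) -> int:
    """Estimated input tokens for one turn: the full history plus the new message."""
    return sum(estimate_tokens(m) for m in history) + estimate_tokens(new_message)


if __name__ == "__main__":
    # Ten earlier messages of about 400 characters each, then a short question.
    history = ["x" * 400] * 10
    print(estimate_turn_cost(history, "x" * 40))
```

Here a 40-character question, about 10 tokens on its own, is estimated at around 1,010 tokens once the ten earlier messages are counted with it.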

Claude offers a usage dashboard accessible under Settings > Usage. It shows real-time limits broken down by active tools. Checking it regularly helps anticipate blocks and schedule intensive sessions outside peak hours when needed.

The habits that actually make a difference

The first mistake to fix is also the most common: mixing multiple topics in the same conversation thread. Every new topic adds to a history that Claude fully reprocesses with each exchange. Opening a new thread when switching topics is the simplest and most effective habit to build. For long sessions on the same subject, asking Claude for a summary of key decisions at the end, then starting a new thread with that summary as the first message, carries the essential context without paying for the full history.
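The saving from the summary-and-restart habit can be estimated with a small model. The sketch below assumes fixed-size exchanges and that each turn reprocesses the whole context; `thread_cost`, `restart_cost`, and all token figures are illustrative assumptions, not measured values.

```python
def thread_cost(start_tokens: int, exchange_tokens: int, turns: int) -> int:
    """Total tokens processed over `turns` further turns.

    Simplifying assumption: each turn reprocesses the current context plus
    one new exchange, and that exchange then joins the context.
    """
    total = 0
    context = start_tokens
    for _ in range(turns):
        total += context + exchange_tokens
        context += exchange_tokens
    return total


def restart_cost(history_tokens: int, summary_tokens: int,
                 exchange_tokens: int, turns: int) -> int:
    """Cost of asking for a summary in the old thread, then continuing in a
    new thread that starts from the summary alone."""
    summary_request = thread_cost(history_tokens, exchange_tokens, 1)
    return summary_request + thread_cost(summary_tokens, exchange_tokens, turns)


if __name__ == "__main__":
    # A 20,000-token history, 500-token exchanges, 10 more turns of work.
    print(thread_cost(20_000, 500, 10))        # keep going in the same thread
    print(restart_cost(20_000, 800, 500, 10))  # summarize, then restart
```

With these numbers, ten more turns in the old thread process 227,500 tokens, against 56,000 when restarting from an 800-token summary, even after paying for the summary request itself.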

Second habit: group your questions. Sending three separate messages for three related questions forces Claude to reread the full history three times. Grouping them into one message produces the same result for roughly a third of the input consumption, since the history is read once instead of three times. Formatting the questions as a list inside a single message is enough to structure the request without multiplying back-and-forth exchanges.
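The arithmetic behind grouping can be sketched as follows. The model counts input tokens only and assumes each reply is added to the history before the next question; `separate_cost`, `grouped_cost`, and the token figures are hypothetical.

```python
def separate_cost(history_tokens: int, question_tokens: list[int],
                  answer_tokens: list[int]) -> int:
    """Input tokens when each question is sent as its own message."""
    total = 0
    context = history_tokens
    for question, answer in zip(question_tokens, answer_tokens):
        total += context + question   # history is reread for every question
        context += question + answer  # the exchange joins the history
    return total


def grouped_cost(history_tokens: int, question_tokens: list[int]) -> int:
    """Input tokens when all questions are sent in a single message."""
    return history_tokens + sum(question_tokens)


if __name__ == "__main__":
    questions = [50, 50, 50]
    answers = [300, 300, 300]
    apart = separate_cost(6_000, questions, answers)
    together = grouped_cost(6_000, questions)
    print(apart, together, round(together / apart, 2))
```

With a 6,000-token history, three separate questions process 19,200 input tokens, against 6,150 for one grouped message, or about 32% of the cost. The longer the history, the closer the ratio gets to one third.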

Editing a request rather than correcting it inside the thread is an underused lever. When Claude doesn’t deliver the expected answer, every follow-up message like “no, I meant…” piles onto the history and gets reprocessed with every later message. The pencil button lets you modify a request directly: the exchange is replaced, not stacked.

Model selection has a direct impact on consumption. Opus is the most capable model in the lineup, but also the most resource-intensive. For a spelling correction, reformatting, or a simple factual question, Sonnet delivers very similar results at a much lower cost. Haiku is even more economical for short, repetitive tasks. Reserving Opus for tasks that genuinely require deep reasoning keeps the available budget for what actually needs it.

Two practices are worth building into long-term workflows. Claude’s Projects feature solves the recurring document problem: a file uploaded once to a Project is cached and stays available across all conversations in the Project without consuming tokens again. And for files that need analysis, converting PDFs to plain text before uploading significantly cuts consumption: Claude otherwise extracts the text and converts each page to an image for visual analysis, two operations instead of one.
