
The context window myth: why "1 million tokens" is mostly marketing

I'm tired of the hype

"Jag är så trött på att höra om miljontals tokens." (I'm so tired of hearing about millions of tokens.)

Every AI announcement is the same: "Now with 1 MILLION token context window!" The numbers keep getting bigger. The marketing gets louder. And users keep experiencing the same frustration:

"Why did my AI forget what I told it 5 minutes ago?"

Let me explain what's actually happening - and why the big numbers are mostly irrelevant.


The marketing trick nobody explains

When Google announces Gemini's 1 million token context, or Claude promotes 200k tokens, they're technically telling the truth. You CAN input that much text.

But here's what they conveniently omit:

Input capacity ≠ useful processing capacity

It's like a library that accepts a million books but can only read three at a time. The shelf space is real. The reading comprehension isn't.

"Du kan stoppa in hur mycket du vill, men du får inte ut lika mycket." (You can stuff in as much as you want, but you don't get as much out.)


The three lies of context window marketing

Lie 1: "More tokens means better understanding"

Reality: AI models don't "understand" your context linearly. Information in the middle of very long contexts often gets lost. Research shows that retrieval accuracy can drop by 30-50% for information buried in the middle of maximum-length contexts.

The AI is optimized to pay attention to:

  • The very beginning
  • The very end
  • Whatever seems most recent

Everything in between? It's there, technically. Used effectively? Often not.
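You can probe this "lost in the middle" effect yourself. The sketch below builds three prompts that hide the same fact at the start, middle, and end of a wall of filler text; send each to whatever model you're evaluating and compare how reliably it retrieves the fact. The needle and filler sentences here are made-up test data, not from any benchmark.

```python
def build_probe(needle: str, filler: list[str], position: float) -> str:
    """Place `needle` at a fractional `position` (0.0 = start, 1.0 = end)
    inside filler text, to test whether the model can retrieve it later."""
    i = int(position * len(filler))
    sentences = filler[:i] + [needle] + filler[i:]
    return " ".join(sentences) + "\n\nQuestion: what is the secret code?"

filler = [f"Unrelated filler sentence number {n}." for n in range(200)]
needle = "The secret code is 7481."

# Same fact, three placements; send each to your model and compare answers.
start_prompt = build_probe(needle, filler, 0.0)
middle_prompt = build_probe(needle, filler, 0.5)
end_prompt = build_probe(needle, filler, 1.0)
```

In practice, the middle placement is where retrieval tends to fail first, which is exactly the effect the accuracy-drop numbers above describe.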

Lie 2: "You can work with entire codebases/document sets"

Reality: You can INPUT entire codebases. The AI will produce outputs of 4,000-8,000 tokens regardless. That's about 3-6 A4 pages.

So you feed it 750 pages of documentation. You get back 5 pages of response. Was it reading all 750 pages when generating that response? Mostly not - it was statistically sampling from patterns.

Lie 3: "The free/cheap tier gives you a real AI experience"

Reality: Free tiers have ~4k tokens. That's 3 A4 pages total - including your question AND the AI's answer.

"Gratis ger gratis resultat." (Free gives free results.)

People evaluate AI on free tiers and conclude "AI is overhyped." No - you're testing a race car in first gear with the parking brake on.


What actually happens in your context window

Let me be concrete about what this "senildemens" (senile dementia) looks like in practice:

Tokens 1-1,000: AI remembers everything. Sharp, coherent, follows instructions perfectly.

Tokens 1,000-5,000: Still good. Minor inconsistencies might appear.

Tokens 5,000-20,000: AI starts "drifting." Earlier instructions begin to fade. You notice it contradicting itself occasionally.

Tokens 20,000+: Active degradation. The AI may directly contradict its earlier statements. Complex instructions from the beginning? Gone.

This happens on EVERY tier. The only difference is how quickly you hit the cliff.

Tier          How fast you hit the wall
Free          10-15 exchanges
$20/month     20-30 exchanges
$200/month    50-100 exchanges
API max       200+ exchanges (it still happens)

The output capacity deception

This is what makes me genuinely frustrated. Vendors brag about INPUT capacity while hiding OUTPUT limits.

Every major model, regardless of context window size:

  • ChatGPT: ~4,000 tokens output max
  • Claude: ~8,000-10,000 tokens output max
  • Gemini: ~30,000 tokens output max (claimed)

You can't get a 100,000 token summary. You can't get a complete 500-page analysis. The output tap is limited regardless of how big the input bucket is.
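The asymmetry is easy to put in numbers using this article's own rough conversion (~4,000 tokens ≈ 3 A4 pages, so roughly 1,333 tokens per page; both figures are estimates, not vendor specs):

```python
TOKENS_PER_A4_PAGE = 1_333  # rough conversion: ~4k tokens ≈ 3 A4 pages

def pages(tokens: int) -> float:
    """Convert a token count to an approximate A4 page count."""
    return tokens / TOKENS_PER_A4_PAGE

advertised_input = 1_000_000   # the marketed context window
typical_output = 8_000         # a typical hard output cap

print(f"Input capacity:  ~{pages(advertised_input):.0f} pages")
print(f"Output capacity: ~{pages(typical_output):.0f} pages")
```

Roughly 750 pages in, 6 pages out: a 125-to-1 asymmetry that no amount of context window growth changes.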

"Marketing focuses on how much you can stuff in, not on how much you get out."


The 2-document precision loss that nobody talks about

From my production work, here's what I consistently see:

On a free tier (~4k tokens):

  • Upload 1 short document → works okay
  • Upload 2 documents → immediate 25% precision loss
  • Upload 3 documents → the AI starts conflating them, mixing up facts

On a paid tier (~32k tokens):

  • 1-2 documents → good
  • 3-5 documents → precision drops noticeably
  • 5+ documents → you need the expensive tier

This isn't theoretical. I've measured this across dozens of client projects. The "2 documents = -25% precision" pattern is remarkably consistent.


What vendors won't tell you

About context windows:

  • Bigger isn't always better: attention degrades over length
  • The "middle" of long contexts often gets ignored
  • Performance benchmarks are run on optimal content, not your messy real-world data

About pricing:

  • The free tier is designed for addiction, not evaluation
  • $20/month is a loss leader; they want you on $200/month
  • Enterprise pricing often gives you 5x the capability at 50x the price

About "unlimited" claims:

  • There's no unlimited, just larger limits
  • "Fair use" policies kick in faster than you think
  • Heavy users get throttled without warning


What you should actually do

Stop chasing bigger context windows. Start designing for the reality.

Accept the constraints

  • AI has memory limits. This won't change fundamentally.
  • More tokens don't solve the attention problem.
  • Output capacity is the real bottleneck.

Design workflows accordingly

  1. Break tasks into focused chunks - Don't try to do everything in one conversation
  2. Start fresh frequently - New task = new conversation
  3. Summarize as you go - Ask the AI to condense before continuing
  4. Front-load critical context - Most important stuff goes at the START
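Steps 2 and 3 can be sketched as a rolling-summary loop. This is a minimal illustration, not a real client: `rough_tokens` uses a crude 4-characters-per-token heuristic, and `summarize` is a stand-in callback where you would plug in an actual model call that condenses old messages.

```python
def rough_tokens(text: str) -> int:
    # crude heuristic: roughly 4 characters per token for English text
    return len(text) // 4

def compact_history(messages: list[str], budget: int, summarize) -> list[str]:
    """Keep a conversation under `budget` tokens by folding the oldest
    two messages into a running summary until it fits."""
    while sum(rough_tokens(m) for m in messages) > budget and len(messages) > 2:
        merged = summarize(messages[0], messages[1])
        messages = [merged] + messages[2:]
    return messages

# Stub summarizer for illustration; a real one would ask the model to condense.
stub = lambda a, b: "[summary] " + (a + " " + b)[:40]
history = ["x " * 200, "y " * 200, "z " * 200, "latest question"]
compacted = compact_history(history, budget=150, summarize=stub)
```

Note that the newest message survives untouched while older material gets condensed, which matches how attention actually behaves: recency is preserved, history is compressed deliberately instead of silently lost.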

Budget honestly

If you need professional results:

  • Minimum: ~220 kr/month per person
  • Realistic: ~2,200 kr/month per power user
  • Production systems: 500-5,000 kr/day in API costs

"Det kostar detta mycket." (It costs this much.)


Your responsibility

Here's what the vendors will never tell you: You are responsible for understanding these limits.

When the AI "forgets" your instructions, that's not the AI failing. That's you exceeding the tool's capabilities. When quality degrades mid-conversation, that's not a bug. That's architecture.

"Du äger varje rad output. Du måste förstå verktygen." (You own every line of output. You must understand the tools.)

AI is not magic. Context windows are not unlimited memory. The million-token marketing is mostly irrelevant to your actual work.

The sooner you accept this, the sooner you'll get real value from AI.


The bottom line

Next time you see a vendor announce a bigger context window, ask these questions:

  1. What's the OUTPUT limit?
  2. How does attention degrade over length?
  3. What happens with realistic, messy content (not benchmark-optimized text)?
  4. What's the real pricing at production scale?

If they can't answer, they're selling you marketing, not capability.

Verktyg, inte magi. (Tool, not magic.)


See also: Context Window Economics for detailed tier comparisons and practical guidance.


Based on 30 years of production development and 2+ years of intensive AI workflow development.
Published: December 2025