10 Context Management Techniques That Actually Work

May 20, 2026

Good morning!

After a (long) 2-week vacation, here’s finally an exciting new video for you!

Okay, as you may know, this wasn’t only a vacation. I actually gave a talk at the amazing UphillConf! It went extremely well, even considering how sick I was with the flu. Still, I’m not saying that to share a success with you, but to remind you that, if you weren’t in Switzerland on May 8th, the talk (and many others to come) is available to all my paid subscribers, if you are up for (1) supporting my work and (2) getting even more AI engineering content! Check the conference talk here: https://louisbouchard.substack.com/p/a-private-look-at-my-next-ai-agents

Now, back to this week’s iteration.

We can now paste our entire codebase into Claude, a two-hour meeting transcript into Gemini, or every project note you have into GPT… and somehow get worse answers.

That’s the frustrating reality of recent long-context LLMs.

We’ve been told that bigger context windows solve everything. One million tokens sounds like magic: just give the model all the information, and it should figure things out. But in practice, the opposite often happens. The model misses key details, contradicts itself, forgets decisions you made earlier, and quietly becomes less reliable while your API bill keeps climbing.

The problem is not just cost. It’s context quality.

In this week’s video, I break down why long context windows can be misleading, why models tend to overuse the beginning and end of a prompt while underusing the middle, and why serious AI agents like Claude Code, Codex-style systems, and production agent workflows rely on context compaction instead of blindly stuffing everything into the prompt.

I walk through 10 practical context compaction techniques every AI engineer should know: from simple trimming and source-level filtering, to semantic summarization, retrieval-based memory, knowledge graphs, learned prompt compression, multi-tier memory, and agentic context engineering.

More importantly, I explain when to use each one, when not to, and the order I recommend applying them in real projects to reduce cost, improve reliability, and prevent agents from falling apart after a few turns.

If you’re building with LLMs, using AI coding agents, or trying to make agents work beyond a short demo, this is one of the most important engineering skills to learn.

Long context is not enough. You need context management.

Watch the full video here:

Discussion about this post

Ready for more?