Good morning everyone! A few weeks ago, I partnered with a friend to finally settle the RAG vs. CAG debate. I've already made a stand-alone deep dive on RAG, another one on CAG, and a comparison of both, and I thought that was enough acronyms for one lifetime.
Then Miguel, the amazing human being behind The Neural Maze, and I decided to do a guest post comparing the two.
This time, in a practical format. Not just recommendations and theory, but an actual comparison (with code!)...
But first, we have a very cool sponsor for the newsletter this week: AI Video Cut!
1️⃣ (sponsor) Image might be a carousel made of the 4-step images you sent
Editing my podcast interviews into punchy reels used to eat an afternoon for my editor, and even more when I did it myself.
Now I can just drop the podcast link into AI Video Cut and let the usual AI magic happen.
Their new in-browser editor lets you edit transcripts, trim video segments, and customize captions in just a few clicks, and it exports vertical, horizontal, or square formats for every platform, from clips up to 50 minutes long, for free.
Paste a URL, sip your coffee, and download ready-to-post Shorts. Click the link below to try it now and level-up your content workflow!
Try AI Video Cut now and level-up your content workflow:
https://www.aivideocut.com/?utm_source=WhatsAI&utm_medium=video&utm_campaign=youtube
2️⃣ RAG vs CAG - A hands-on technical breakdown to choosing the right approach
First, a one-breath recap for anyone who wandered in from cat-video YouTube:
RAG turns every user question into a scavenger hunt. Your query becomes an embedding, that embedding searches a vector database, out pop the top passages, and those passages get glued to the prompt before the LLM even thinks about answering. It's flexible, it's modular, and it keeps the model focused on the information it actually needs.
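To make the scavenger hunt concrete, here is a minimal, dependency-free sketch of that loop. The bag-of-words "embedding" is a toy stand-in for a real embedding model, and the list of documents stands in for a vector database; only the three-step shape (embed query, search, glue passages to the prompt) is the point.

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": word counts. A real pipeline would call an
    # embedding model here instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy "vector database": documents stored alongside their embeddings.
docs = [
    "CAG preloads documents into the model's KV cache",
    "RAG retrieves passages from a vector database per query",
    "Transformers emit key and value tensors at every layer",
]
index = [(doc, embed(doc)) for doc in docs]

def rag_prompt(query, top_k=1):
    q = embed(query)
    # Retrieve the top-k most similar passages...
    hits = sorted(index, key=lambda d: cosine(q, d[1]), reverse=True)[:top_k]
    context = "\n".join(doc for doc, _ in hits)
    # ...and glue them onto the prompt before the LLM answers.
    return f"Context:\n{context}\n\nQuestion: {query}"

print(rag_prompt("What does RAG retrieve from the vector database?"))
```

Note that the retrieval trip happens on every single query; that per-question search is exactly what CAG removes.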
With CAG, you dump all your documents into the model once, capture every key and value tensor the transformer emits (that's the KV cache), and reuse those tensors forever. No more retrieval trip per query, no more "hold on, I'm searching." All the information is pre-processed and sitting ready in the cache, so the model is basically living in a perpetual open-book exam.
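The workflow above boils down to: pay the "read everything" cost once, then answer any number of queries from the cached state. Here is a conceptual toy sketch of that shape (not a real transformer; in Hugging Face `transformers`, the analogue is capturing `past_key_values` from a forward pass with `use_cache=True` and reusing it on later calls):

```python
# Toy CAG simulation: the corpus is processed exactly once, and every
# later query reuses the cached result instead of re-reading it.
class ToyModel:
    def __init__(self):
        self.prefill_calls = 0  # how many times we processed the corpus

    def prefill(self, documents):
        # Stand-in for running all documents through the model and
        # keeping every layer's key/value tensors (the KV cache).
        self.prefill_calls += 1
        return {"tokens": " ".join(documents).split()}

docs = [
    "CAG preloads the whole corpus into the KV cache.",
    "RAG retrieves passages per query instead.",
]

model = ToyModel()
kv_cache = model.prefill(docs)  # one-time cost at startup

def answer(query, cache):
    # No retrieval step: generation just continues from the cached state.
    return f"answered {query!r} using {len(cache['tokens'])} cached tokens"

print(answer("When should I use CAG?", kv_cache))
print(answer("What is a KV cache?", kv_cache))
print(model.prefill_calls)  # stays at 1 no matter how many queries run
```

The trade-off this makes visible: CAG front-loads all the work (and the whole corpus must fit in the context window), while RAG spreads the cost across queries.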
You can learn more about RAG here and CAG here.
Once you're ready, let's jump in and finally settle when to use which!
So, grab a coffee, mute your Slack or Discord, and let’s talk about what really happens when Cache-Augmented Generation and Retrieval-Augmented Generation step into the same octagon...
P.S. The code and article links are in the video description!
Thanks again for working with us on this great piece!

And that's it for this iteration! I'm incredibly grateful that the What's AI newsletter is now read by over 30,000 incredible human beings. Click here to share this iteration with a friend if you learned something new!
Looking for more cool AI stuff? 👇
Looking for AI news, code, learning resources, papers, memes, and more? Follow our weekly newsletter at Towards AI!
Looking to connect with other AI enthusiasts? Join the Discord community: Learn AI Together!
Want to share a product, event or course with my AI community? Reply directly to this email, or visit my Passionfroot profile to see my offers.
Thank you for reading, and I wish you a fantastic week! Be sure to get enough sleep and physical activity next week!
Louis-François Bouchard