Good morning everyone! This is Louis-François from Towards AI, and if you’ve watched my previous videos on embeddings, Mixture of Experts, infinite context attention, or CAG, you know I’m always excited to dig deep into clever solutions that make large language models run better and faster.
Well, hold onto your seat, because DeepSeek has released a new technique that does exactly that: FlashMLA.
Just before we dive in, I want to mention a program I am part of, which is sponsoring this iteration: NVIDIA’s Inception program. This free program is, first and foremost, an amazing community for startups. It connects us with exclusive events (including GTC, where I’m sharing this post from right now), networking opportunities, and more. It also provides real-time technical support through NVIDIA’s forums if your solutions rely on their products. Whether you’re refining a product or scaling up, Inception offers amazing support, all for free. Learn more and join the program here: https://nvda.ws/3WTw7EO
Now, let’s explore what FlashMLA is, why it matters, and how it builds on key innovations we’ve already discussed! Read the article here or watch the video:
And that's it for this iteration! I'm incredibly grateful that the What's AI newsletter is now read by over 20,000 amazing human beings. Click here to share this iteration with a friend if you learned something new!
Looking for more cool AI stuff? 👇
Looking for AI news, code, learning resources, papers, memes, and more? Follow our weekly newsletter at Towards AI!
Looking to connect with other AI enthusiasts? Join the Discord community: Learn AI Together!
Want to share a product, event or course with my AI community? Reply directly to this email, or visit my Passionfroot profile to see my offers.
Thank you for reading, and I wish you a fantastic week! Be sure to get enough sleep and physical activity next week!
Louis-François Bouchard