This iteration is sponsored by my friends at Qdrant!
Good morning everyone!
What you know about Mixture of Experts (MoE) is wrong.
We are not using this technique because each "model" is an expert on a specific topic.
In fact, each of these so-called experts is not an individual model but something much simpler...
So, if MoEs are not experts or even separate models... what are they, and why does the technique work so well?
Let's dive into MoEs and discover why they are so powerful!
But first, here's a new, very cool product for all of us experimenting with and building RAG-based applications!
1️⃣ Qdrant Hybrid Cloud: The First Managed Vector Database You Can Run Anywhere with Unmatched Flexibility and Control (Sponsor)
Qdrant, the leading open-source vector database, today announced Qdrant Hybrid Cloud, a groundbreaking managed service for deployment across cloud, on-premises, or edge settings. Built on a Kubernetes-native architecture, it offers flexibility in setup and ensures full database isolation, enhancing data privacy in AI. This allows developers like us to choose where to process vector search workloads, making it easier to work on RAG-based applications, advanced semantic search, or recommendation systems in a data-driven world. Read more about Qdrant Hybrid Cloud in Qdrant’s official release announcement.
2️⃣ Mixture of Experts: A Dive into Mixtral 8x7B
Thanks to Jensen Huang, we can now assume that the rumour of GPT-4 having 1.8 trillion parameters is true…
1.8 trillion is 1,800 billion, or 1.8 million million. If one person processed a single parameter per second, roughly the effort of doing one multiplication by hand with numbers this large, it would take them about 57,000 years. Even if all 8 billion of us worked in parallel, each handling one parameter per second, it would still take almost four minutes (1.8 trillion ÷ 8 billion = 225 seconds). Yet, transformer models do this in milliseconds.
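If you want to sanity-check those numbers yourself, here is the back-of-the-envelope arithmetic in a few lines of Python. The figures are just the ones quoted above (the rumoured parameter count and the world population), nothing official:

```python
# Back-of-the-envelope math for the numbers above.
PARAMS = 1.8e12                     # rumoured GPT-4 parameter count
PEOPLE = 8e9                        # rough world population
SECONDS_PER_YEAR = 365.25 * 24 * 3600

one_person_years = PARAMS / SECONDS_PER_YEAR   # one parameter per second, alone
everyone_seconds = PARAMS / PEOPLE             # everyone working in parallel

print(f"One person: ~{one_person_years:,.0f} years")          # ~57,000 years
print(f"Everyone on Earth: ~{everyone_seconds:,.0f} seconds "
      f"(~{everyone_seconds / 60:.1f} minutes)")               # ~225 s, under 4 minutes
```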
This is thanks to a lot of engineering, including what we call a “mixture of experts,” where we supposedly have eight smaller models combined into this one ginormous model. But do we really? Learn more in the article here or the video:
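To make the idea concrete before you dive into the article or video, here is a minimal sketch of a sparse MoE layer in PyTorch, in the spirit of Mixtral 8x7B. Every name and dimension here is illustrative, not Mistral's actual code: the point is simply that each "expert" is an ordinary feed-forward block inside a transformer layer, and a small router picks a couple of them per token.

```python
# A minimal sketch of a sparse Mixture-of-Experts layer, in the spirit of
# Mixtral 8x7B. All names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExpertFFN(nn.Module):
    """One 'expert': just a plain feed-forward block, not a separate model."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x):
        return self.net(x)


class SparseMoELayer(nn.Module):
    """A router picks the top-k experts per token and mixes their outputs."""
    def __init__(self, d_model=512, d_hidden=2048, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            [ExpertFFN(d_model, d_hidden) for _ in range(n_experts)]
        )
        self.router = nn.Linear(d_model, n_experts, bias=False)  # the "gate"
        self.top_k = top_k

    def forward(self, x):                        # x: (num_tokens, d_model)
        logits = self.router(x)                  # (num_tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # mixing weights for chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e         # tokens sent to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out


# Each token only runs through 2 of the 8 feed-forward blocks,
# so most of the layer's parameters sit idle for any given token.
tokens = torch.randn(4, 512)
print(SparseMoELayer()(tokens).shape)  # torch.Size([4, 512])
```

Because only two of the eight feed-forward blocks run for any given token, the model carries far more parameters than it actually computes with at each step, which is exactly how a parameter count that huge and millisecond inference can coexist.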
And that's it for this iteration! I'm incredibly grateful that the What's AI newsletter is now read by over 16,000 incredible human beings. Click here to share this iteration with a friend if you learned something new!
Looking for more cool AI stuff? 👇
Looking for AI news, code, learning resources, papers, memes, and more? Follow our weekly newsletter at Towards AI!
Looking to connect with other AI enthusiasts? Join the Discord community: Learn AI Together!
Want to share a product, event or course with my AI community? Reply directly to this email, or visit my Passionfroot profile to see my offers.
Thank you for reading, and I wish you a fantastic week! Be sure to get enough sleep and physical activity next week!
Louis-François Bouchard