Louis' (AI) Learnings: Weekend edition

Sora, the GPT Tokenizer, win $10,000 USD and more...

Feb 23, 2024

Good morning everyone! This is my first iteration of the weekend edition of the newsletter, specially tailored for my preferred (AI) learning resources.

I aim to provide some cool multi-modality (AI) learning pieces with this special, potentially weekly, weekend edition. As you (may) know, I find learning through different formats (modalities) very valuable for better understanding.

Inspired by Tim Ferriss's Five-Bullet Friday newsletter and my friend Neil Leiser's most recent LinkedIn post, I will share one learning piece per type of format (modality) that I learned from the past week. One of each, not more, not less. Sometimes, it will be a podcast, a book and an article; other times, it will be a video, an article and a course—one piece per type of format to make you learn something cool.

Before we dive into this first iteration: Yes, the AI in parentheses in the title is important. Of course, I'll share AI-related resources, whether it's a cool course, video, article, tool, or book. This is our favourite field! Yet, I love learning about many things that don't necessarily involve an intelligent machine, and I'm excited to share those with you too.

I'm super excited to kick off this iteration with one amazing tool I recently discovered and have the pleasure of working with, SciSpace. This would've been incredible during my PhD...

1️⃣ SciSpace: AI Powerhouse for Research (sponsor)

SciSpace is a next-gen AI platform for researchers where you can easily discover 280 million+ papers, do effortless literature reviews, chat with, understand, and summarise PDFs with its AI copilot, and so much more.

Get unlimited access to all the best AI features of SciSpace at 40% off (on the annual plan) with the code LFB40. Get 20% off on monthly plans with LFB20.

Try out SciSpace now!

2️⃣ Video: Let's build the GPT Tokenizer

Let's build the GPT Tokenizer is a fantastic new video by Andrej Karpathy. Of course, this one is for the more applied people wanting to learn the coding and true engineering behind what makes GPT models work.

In this one, Andrej focuses on the tokenizer, which translates words and language into chunks of text (tokens). But what better way to introduce this video than with Andrej's own words:

The Tokenizer is a necessary and pervasive component of Large Language Models (LLMs), where it translates between strings and tokens (text chunks). Tokenizers are a completely separate stage of the LLM pipeline: they have their own training sets, training algorithms (Byte Pair Encoding), and after training implement two fundamental functions: encode() from strings to tokens, and decode() back from tokens to strings. In this lecture we build from scratch the Tokenizer used in the GPT series from OpenAI. In the process, we will see that a lot of weird behaviors and problems of LLMs actually trace back to tokenization. We'll go through a number of these issues, discuss why tokenization is at fault, and why someone out there ideally finds a way to delete this stage entirely.
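To make the idea concrete, here is a minimal sketch of byte-level Byte Pair Encoding in Python. It is my own toy illustration of the technique, not the code from Andrej's video or OpenAI's actual tokenizer: the function names (train, encode, decode, merge, most_common_pair) and the tiny training string are assumptions made up for this example, and real GPT tokenizers add details such as regex pre-splitting and special tokens.

```python
from collections import Counter


def most_common_pair(ids):
    """Return the most frequent adjacent pair of token ids, or None if there is none."""
    pairs = Counter(zip(ids, ids[1:]))
    return max(pairs, key=pairs.get) if pairs else None


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train(text, num_merges):
    """Learn up to `num_merges` merge rules, starting from raw UTF-8 bytes (ids 0-255)."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (id, id) -> new id, stored in the order the rules were learned
    for step in range(num_merges):
        pair = most_common_pair(ids)
        if pair is None:
            break
        new_id = 256 + step  # new tokens get ids above the 256 byte values
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges


def encode(text, merges):
    """String -> token ids: apply the learned merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():  # dicts preserve insertion order
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Token ids -> string: expand every token back into its bytes."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


# Hypothetical toy data, chosen only to show repeated pairs being merged.
merges = train("aaabdaaabac", num_merges=3)
tokens = encode("aaabdaaabac", merges)
print(tokens, "->", decode(tokens, merges))
```

The training loop is the whole trick: repeatedly find the most frequent pair of neighbouring tokens and replace it with a new token, so frequent character sequences end up as single tokens. Encoding replays those merges on new text, and decoding simply expands each token back into bytes.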

3️⃣ Article: Creating video from text

If you have not seen it, this week, OpenAI released Sora, their new text-to-video model... and it's incredible. Sora is impressively realistic, able to generate high-quality videos nearly a minute long.

In their article, the OpenAI team shared both the current results and the model's limitations. Of course, the model is far from perfect, but, as Marques Brownlee showed so clearly, the progress since just a year ago is unbelievable. They even share a bit of detail about the research techniques used in this model, though not enough for me to do a full video on it! 😢

Read more about Sora and check out their early results here.

4️⃣ Competition: Google – AI Assistants for Data Tasks with Gemma

Practice, and especially building something real, counts the most when learning a new skill. I learned programming because I wanted to turn a mobile app idea into reality. I won't share the app here; it's pretty ugly but still exists on the App Store! You need a goal and a reason to truly learn something.

This week, my friend Fabio Chiusano sent me a cool new Kaggle competition by Google to promote their new Gemma open models suite.

I think this is a very good opportunity for those of you who want to learn and practice working with large language models. It already has a great incentive to learn: five $10,000 USD prizes.

The main goal is to work with the Gemma open models to improve coding-related question-answering tasks. You'll (1) contribute to making models better for future programming students, (2) learn to fine-tune or leverage retrieval-augmented generation, AND (3) potentially get paid for it. What more could you want? Check it out here.

5️⃣ Audiobook: Bird by Bird

A few months ago, a dear friend, Rucha Bhide, suggested I read the book Bird by Bird: Some Instructions on Writing and Life by Anne Lamott.

I finally got to it... and it was amazing! If you are ever looking to write, share your ideas better, or simply learn more about what makes a good story, check it out! It's a crucial skill to develop, even for your work. How you communicate, share ideas or even share data or results is just as important as what you actually share, if not more so.

The best tip from the book is to focus on the characters. They make the story. Simply develop your characters, and your story will get interesting. This applies to non-fiction writing and even work reports. Your next report should have a progression that makes sense to the audience (your boss): a full story from why to what and how, concluding with results or hypotheses for future work.

I've listened to the audiobook, but I will also give it a real read sometime soon.

And that's it for this iteration! Please let me know if you've enjoyed this special iteration and want to see it again next week! I'd love to read your feedback if you check out any of these resources - just hit reply!

I'm incredibly grateful that the What's AI newsletter is now read by over 16,000 incredible human beings. Click here to share this iteration with a friend if you learned something new!

Looking for more cool AI stuff? 👇

Looking for AI news, code, learning resources, papers, memes, and more? Follow our weekly newsletter at Towards AI!
Looking to connect with other AI enthusiasts? Join the Discord community: Learn AI Together!

Want to share a product, event or course with my AI community? Reply directly to this email, or visit my Passionfroot profile to see my offers.

Thank you for reading, and I wish you a fantastic week! Be sure to get enough sleep and physical activity next week!

Louis-François Bouchard