Good morning, fellow AI enthusiast! This week's iteration focuses on a very hot GitHub repository and research project called LLaVA, an end-to-end large multimodal model that connects a vision encoder and an LLM for general-purpose visual and language understanding.
GPT-4 is powerful, but did you know that some AIs are built entirely thanks to it? Yes, GPT-4 is so good that it can generate data of high enough quality to train other AI models. And not just any models, but models with capabilities it lacks, like understanding images!
Liu et al. just used GPT-4 to create a general-purpose vision-language model called LLaVA, the first general-purpose model that understands and follows both visual and language-based instructions. Yes, they didn't use GPT-4 as the base model, but to train their own model! As we will see in the video, GPT-4 was used to generate a large, high-quality dataset for training a new model that understands images. Oh, and obviously it not only understands images but also text (there's the multimodality), which means it can answer a wide variety of questions about them! Learn more in the full article or in the video...
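To give a feel for the trick, here is a minimal sketch of how such data generation can work: since GPT-4 (as used in the paper) cannot see images, it is given only text about an image, such as captions and object bounding boxes, and asked to invent a visual conversation about it. The function name, prompt wording, and sample data below are illustrative assumptions, not the paper's exact prompts.

```python
# Sketch of LLaVA-style instruction-data generation. The language model
# never sees pixels; it only receives symbolic context (captions plus
# object labels with normalized bounding-box coordinates) and is asked
# to produce a question-answer conversation "about" the image.
# All names and prompt text here are hypothetical, for illustration only.

def build_data_generation_prompt(captions, boxes):
    """Assemble the text-only context that would be sent to GPT-4."""
    caption_block = "\n".join(captions)
    box_block = "\n".join(
        f"{label}: [{x1:.2f}, {y1:.2f}, {x2:.2f}, {y2:.2f}]"
        for label, (x1, y1, x2, y2) in boxes
    )
    return (
        "You are describing an image you cannot see, using only this context.\n"
        f"Captions:\n{caption_block}\n"
        f"Objects (normalized coordinates):\n{box_block}\n"
        "Generate a conversation between a user asking about the image "
        "and an assistant answering as if it were looking at it."
    )

# Example context for one image (made-up sample data).
captions = ["A man rides a bicycle down a busy street."]
boxes = [
    ("person", (0.21, 0.10, 0.55, 0.90)),
    ("bicycle", (0.18, 0.45, 0.60, 0.95)),
]
prompt = build_data_generation_prompt(captions, boxes)
print(prompt)
```

The generated conversations then become (image, instruction, response) training triples: the new model sees the actual image while learning to produce the answers GPT-4 wrote from text alone.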
We are incredibly grateful that the newsletter is now read by over 12,000 incredible human beings across our email list and LinkedIn subscribers. Reach out to contact@louisbouchard.ai with any questions or sponsorship details, or visit my Passionfroot profile. Follow our newsletter at Towards AI, sharing the most exciting news, learning resources, articles, and memes from our Discord community weekly.
If you need more content to go through your week, check out the podcast!
Thank you for reading, and we wish you a fantastic week! Be sure to have enough rest and sleep!
Louis