Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs

Special iteration for a new paper of ours!

Nov 16, 2023

Good morning fellow AI enthusiast! This is a special iteration for a new paper we just published with Learn Prompting, Towards AI, Sander Schulhoff, Jeremy Pinto, Anaum Khan, Chenglei S., Jordan Boyd-Graber, and the winners of the HackAPrompt competition... and we made a video about it!

Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs

In this new paper, we explored the vulnerability of Large Language Models (LLMs) to prompt hacking. by hosting a global-scale competition, where models are manipulated to follow malicious instructions. Prompt hacking is when a user, like you and me, tricks an AI like ChatGPT into saying or doing not intended "bad things", like generating hate speech or anything you see where ChatGPT usually answers, "I cannot talk about this topic".

With 2800+ participants from 50+ countries, we gathered over 600K+ adversarial prompts against three popular LLMs (ChatGPT, GPT-3, and FLAN), confirming their susceptibility to such attacks and categorizing these adversarial prompts into a comprehensive taxonomical ontology.

Here's the video demo we made about the paper. We also have a project page with a new prompt-hacking dataset and the paper if you'd like to read it.

Video demo for the EMNLP 2023 paper "Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition" by Sander Schulhoff and Jeremy Pinto and Anaum Khan and Louis-François Bouchard and Chenglei Si and Svetlina Anati and Valen Tagliabue and Anson Liu Kost and Christopher Carnahan and Jordan Boyd-Graber.

And that's it for this iteration! I'm incredibly grateful that the What's AI newsletter is now read by over 13,000+ incredible human beings and counting. Share this iteration with a friend if you learned something new!

Looking for more cool AI stuff? 👇

Looking for AI news, code, learning resources, papers, memes, and more? Follow our newsletter at Towards AI, which is going out weekly!
Looking for other AI enthusiasts? Join my Discord community: Learn AI Together!
If you need more content to go through your week, check out the podcast!
Please reach out with any questions or details on sponsorships, or visit my Passionfroot profile to see the sponsorship offers.

Thank you for reading, and we wish you a fantastic week! Be sure to have enough rest and sleep!

Louis

Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs

Special iteration for a new paper of ours!

Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs

Discussion about this post