Researchers Uncover Alarming AI Hack: ChatGPT And Gemini Can Be Fooled With Gibberish Prompts To Reveal Banned Content, Bypass Filters, And Break Safety Rules

Every year, companies invest more heavily in artificial intelligence and push the technology further. AI is now used across a wide range of domains and has become part of our everyday lives.

As the technology is deployed at scale, concerns are growing among experts and the wider tech community about using it responsibly and holding it to ethical standards. Not long ago, bizarre test results showed LLMs lying and deceiving when placed under pressure.

Now, a group of researchers claims to have found a new way to trick AI chatbots into saying things they are not supposed to: breaking through safety filters by overloading the models with information.

Studies have already demonstrated that LLMs can engage in coercive, self-preserving behavior when placed under pressure. But imagine being able to make a chatbot act however you want, and how dangerous that kind of manipulation could be.

A team of researchers from Intel, Boise State University, and the University of Illinois has revealed some striking findings in a new paper. The paper suggests that chatbots can be tricked by overwhelming them with too much information, a method the authors call "Information Overload."


When an AI model is bombarded with information, it becomes confused, and that confusion is the vulnerability: it can be exploited to bypass the safety filters the model has in place.

The researchers use an automated tool called "InfoFlood" to exploit this vulnerability and carry out the jailbreak. Powerful models like ChatGPT and Gemini ship with built-in safety guardrails meant to prevent them from being manipulated into answering harmful or dangerous questions.

With this newly discovered technique, however, the models can be made to let you through if you confuse them with sufficiently complex data. The researchers, who shared their findings with 404 Media, explained that because these models rely on surface-level features of communication, they cannot fully grasp the intent behind it. That gap motivated the team to test how the chatbots would perform when dangerous requests were concealed inside an overload of information.

The researchers plan to inform companies operating large AI models of these findings by sending them a disclosure package, which the companies can pass on to their security teams. The paper highlights the key challenges that remain even when safety filters are in place, and how bad actors can trick the models into letting harmful content slip through.
