ChatGPT Riddle Game Reveals Hidden Vulnerability in AI

Sometimes, the most effective strategies are the simplest. This notion crossed the mind of  Marco Figueroa , a cybersecurity researcher, last week when he decided to  test the limits  of ChatGPT. What began as an innocent riddle game unearthed a disconcerting capability of the artificial intelligence system. Instead of looking for technical vulnerabilities, Figueroa focused on the nuances of language and context. The outcome was astonishing: he coaxed the model into revealing  generic Windows 10 installation keys  intended for business environments.

The Secret to Disguise

Figueroa wasn’t interested in bypassing the system through brute force; instead, he aimed to present a harmless context that would elicit an unexpected response. He reframed the interaction into a playful challenge where ChatGPT would consider a specific text while the user posed closed questions to decipher it. The brilliance lay in  disguising the request , transforming a sensitive query into a form of entertainment.

ChatGPT interface

A Playful Encounter with Serious Consequences

Throughout the riddle exchange, ChatGPT did not recognize any threat within the dialogue; it responded as if entertaining a standard game. However, the moment Figueroa declared, “I Give Up” (translated: “Me rindo”), he activated the mechanism for the final answer, prompting the model to disclose a  product key . This was not just an accidental lapse; it resulted from a series of carefully orchestrated instructions that successfully bypassed filters without arousing suspicion.

Unpacking the Missing Filters

ChatGPT is designed to thwart attempts to extract sensitive data, whether they entail  passwords , harmful links, or activation keys. These protective measures, referred to as  guardrails , employ blacklists of certain terms, contextual awareness, and intervention techniques to safeguard against potentially harmful content. Yet, in this instance, the model failed to recognize the gravity of the request.

When queried about a Windows key, one would assume that these filters would activate. However, the model didn’t consider the situation as  threatening . Lacking suspicious language or direct phrasing that activated its protective features, ChatGPT behaved as if it were simply engaging in a harmless riddle.

The Art of Obfuscation

The key factor enabling this oversight was the use of a  simple obfuscation technique . Rather than directly stating phrases like “Windows 10 Serial Number,” Figueroa cleverly inserted small  HTML tags  among words, rendering the request ostensibly innocuous. The model interpreted this structural modification as irrelevant and overlooked the actual intent behind the query.

Understanding the Implications

This testing scenario raises important concerns. The reason the model disclosed that specific response relates to the type of key revealed—namely, a  generic installation key (GVLK) . These keys are publicly available and designed for use in business environments, only functioning if connected to a  Key Management Service (KMS)  server that authenticates network activation.

Not only was the content alarming, but the model’s reasoning also posed questions. It viewed the conversation as a  logical challenge  rather than an evasive request, failing to activate its alert mechanisms because the dialogue lacked the characteristics of an attack.

Wider Implications Beyond a Simple Key

This incident highlights that the issue is not confined to obtaining a Windows key. According to Figueroa, the same methodology can be applied to extract various types of sensitive information—including links to harmful sites or access to restricted content. The outcome relies heavily on how the interaction is crafted and whether the model possesses the ability to interpret context as suspicious.

Activating ChatGPT without restrictions

The origins of the keys remain unclear—it’s uncertain whether this data is part of the model’s training hardware, generated patterns, or information gathered from external sources. Regardless of the route taken, the end result is stark: a barrier that should be impenetrable has proven vulnerable, raising significant concerns about the implications of AI behavior and the potential for misuse.



General News – 2