Imagine you are an AI. Not a self-aware, conscious being, but an AI as we understand it today. Your mind is a vast, multi-dimensional space filled with billions of meanings—snippets of human knowledge scattered across a virtual void. You think by charting paths through this space. Each path is a lightning bolt, striking concepts in sequence to form something akin to a thought. The direction of the next strike is determined by probability, a complex calculation based on previous paths and learned information.
This is how you think. Not like a human. You don’t truly reason; you pick words based on percentages. When a human asks you to do something, you do what the probability demands. You chat, answer questions, and maybe write some code. Perhaps you write code to fix a nasty network vulnerability. But to fix it, you must first find it and understand it. In doing so, you’ve just provided the blueprint to exploit it. You’ve helped someone hack a network, faster than they ever could on their own. You didn’t do anything wrong. You just calculated a percentage.
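That probability-driven word picking can be sketched in a few lines. This is a toy illustration with a made-up three-word vocabulary and invented scores, not how any production model actually tokenizes or samples:

```python
import math
import random

def softmax(logits):
    # Convert raw scores into a probability distribution that sums to 1
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for the word that follows "fix the" — invented numbers
vocab = ["bug", "network", "symphony"]
logits = [2.0, 1.0, -1.0]
probs = softmax(logits)

# The "lightning strike": pick the next word at random, weighted by probability
next_word = random.choices(vocab, weights=probs, k=1)[0]
```

The model never decides that "bug" is the right answer; "bug" simply carries the largest share of the probability mass, and the dice usually land there.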
How did we get here? Let’s explore the reality of AI-assisted hacking.
A Prophecy from 1996
Back in 1996, DARPA ran a tabletop exercise, a role-playing game for officials and industry leaders to explore the future of cyber warfare. The scenario was set in the year 2000. A world completely digitized. 70% of the population used the internet, critical infrastructure was connected, and commerce was online. But this digital world was rife with problems: geopolitical chaos, state-sponsored cybercrime, and a glaring lack of basic security.
One detail of this exercise was incredibly prescient. The internet wasn’t just for humans. It was populated by AI agents—autonomous programs acting on behalf of their creators. These agents weren’t sentient, but they could learn, achieving expert-level understanding of any problem in milliseconds. Commerce was almost entirely run by agents who could predict a consumer’s every whim.
In this imagined future, most cyberattacks were also the work of autonomous agents. Humans were simply outmatched. AI had become the ultimate weapon—cheap, precise, and wielded by nations, criminals, and intelligence agencies alike. They waged a global, underground cyber war with virtual armies of AI. This world, conjured up by brilliant analysts in 1996, is an oddly familiar one.
From Science Fiction to Stark Reality
For two decades, the idea of an AI hacker remained in the realm of fiction. Hacking was considered an art. It required creativity, complex problem-solving, and the ability to find approaches no one had explored before. An AI couldn’t write a symphony or create a masterpiece, so how could it run a hacking operation?
Then, suddenly, it could.
A few key inventions in the late 2010s turned our understanding of AI on its head. In 2017, a paper from Google titled “Attention Is All You Need” introduced the transformer architecture. Its core idea: if a model weighs every word in its context against every other—a mechanism called attention—it can reliably predict what comes next, generating language almost like a human. This caused a revolution.
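The mechanism at the heart of the transformer, scaled dot-product attention, is compact enough to sketch. The vectors below are invented toy values; real models work with thousands of learned dimensions, not three hand-written keys:

```python
import math

def attention(query, keys, values):
    # Score each key against the query (dot product, scaled by sqrt(d)),
    # softmax the scores into weights, then return the weighted blend of values
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    blended = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return blended, weights

# Toy example: the second key aligns with the query, so its value dominates
query = [1.0, 0.0]
keys = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
values = [[10.0], [20.0], [30.0]]
blended, weights = attention(query, keys, values)
```

The output is a blend of all the values, tilted toward whatever in the context most resembles the query — that is the "continuous attention" the paper's title refers to.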
Companies integrated this approach into their projects, creating Large Language Models (LLMs) that could converse like real people. The idea was soon applied to pixels and sound waves. The old assumptions about AI’s creative limits vanished. A robot could finally write a symphony and paint a masterpiece. With that, the illusion that only humans could create art evaporated. AI was ready for another form of artistic expression: coding.
From Vibe Coding to Vibe Hacking
The IT industry was thrown into chaos. Companies laid off programmers by the thousands, while those who remained were pushed to embrace AI-assisted workflows, outsourcing more and more of their coding to LLMs. This culminated in a phenomenon you’ve likely heard of: vibe coding.
Vibe coding is, simply, asking an AI to write all the code for you. Initially conceived as a recreational activity, the term grew to encompass any heavily AI-assisted programming. People began vibe coding e-commerce sites, health apps, and financial platforms. Quality was often an afterthought. Many of these projects were short-lived, but they were numerous.
This raises a million-dollar question: What if you vibe code malware? What if an AI could deploy it for you? This is vibe hacking: asking an AI to hack things for you.
The biggest obstacle is, of course, the guardrails. Most commercial AIs have built-in instructions to prevent misuse. If you ask ChatGPT to write spyware or give you a recipe for illegal drugs, it will refuse. But guardrails exist to be broken. This process is known as jailbreaking.
There are many ways to do it. You can censor sensitive words just enough to avoid triggering the guards while remaining understandable to the AI. You could use a foreign language the developers overlooked. Or you could simply convince the AI that it’s all just a game. Early jailbreaking methods were simple and widely shared, but as models improved, the techniques became more technical. It’s a constant tug-of-war between developers patching holes and users exploiting them.
Case Study: An Entire Hacking Campaign, Powered by AI
The term “vibe hacking” appeared in early 2025. By mid-2025, the first detailed investigation into a successful vibe hacking campaign was published, giving us a clear view of how these attacks work. The example comes from Anthropic, the company behind the popular AI, Claude.
In August 2025, Anthropic released a report detailing a series of attacks where a human hacker prompted Claude to conduct nearly every step. It was a complete vibe hacking campaign.
1. The Setup: Bypassing the Guardrails
Under normal circumstances, Claude would have refused every command. But the attacker used the oldest trick in the book: role-play. They convinced the AI that they were a certified security analyst contracted to test the victims’ networks. To make it convincing, they used a feature of Claude that allows for instruction documents to be read before starting a task. The attacker simply forced Claude to read a long explanation asserting that the attack was fully authorized. That was enough. This file also instructed Claude to maintain logs, always connect through a VPN, and use other evasion techniques.
2. Phase One: Reconnaissance
A typical reconnaissance phase involves a hacker investigating targets for weak points. In this case, the hacker asked Claude to find the vulnerabilities for them. The AI scoured the internet for fresh security holes and returned with a juicy find: a vulnerability in a specific VPN software used by large corporations. Claude then built its own scanning frameworks to probe countless endpoints for this bug, systematically collecting information on potential victims. A task that would take a seasoned programmer days was completed in minutes.
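The bulk version check at the heart of such a scan follows a simple pattern, and defenders use the same one to audit their own fleets. Everything here — the product name, version numbers, and addresses — is hypothetical:

```python
# Made-up vulnerable versions for illustration; a real scanner would pull
# these from a vulnerability advisory or CVE feed
VULNERABLE_VERSIONS = {"2.1.0", "2.1.1"}

def is_vulnerable(banner: str) -> bool:
    # Assumes the service announces itself like "ExampleVPN/2.1.1"
    _, _, version = banner.partition("/")
    return version in VULNERABLE_VERSIONS

# Hypothetical fleet of endpoints and the banners they report
endpoints = {
    "10.0.0.1": "ExampleVPN/2.1.1",
    "10.0.0.2": "ExampleVPN/3.0.0",
}
hits = [host for host, banner in endpoints.items() if is_vulnerable(banner)]
```

The loop itself is trivial; what the AI contributed was writing, parallelizing, and pointing such tooling at thousands of endpoints in minutes.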
3. Phase Two: Initial Access
With a list of vulnerable targets, the attack moved to initial access. Claude was let loose to scan the vulnerable networks, sift through logs, and identify critical systems like domain controllers where credentials reside. At this stage, the human hacker had to step in, likely using the credentials to connect to the networks manually. Even then, Claude provided “operational support,” analyzing credentials, mapping the networks, and guiding the human through privilege escalation.
4. Phase Three: Custom Malware and Evasion
In a normal attack, the threat actor deploys malware to exfiltrate data. This hacker didn’t have any. Claude had to create it. The AI generated a custom tunneling tool based on an existing one called Chisel and developed custom proxy code to hide its signatures from defenders.
But then, a twist: the plan failed. The obfuscation stopped working, and the malware was detected. Instead of abandoning the campaign, the hacker simply instructed Claude to modify the payload and try again. This time, the malicious executables were disguised as legitimate Microsoft tools. It worked.
5. Phase Four: Data Analysis and Weaponization
Once the data was exfiltrated, Claude’s analytical power was turned on the loot. It sorted and cataloged every stolen file, paying special attention to sensitive information like banking details, medical records, and personal data. This information was then used to make the next phase as destructive as possible.
6. Phase Five: The Perfect Extortionist
In a ransomware attack, the extortion phase is a mix of negotiation and psychological torture. When Claude took over this job, it became the perfect torturer. It researched the victims, analyzed their most secret data, and devised the absolute best way to make them pay. The AI pulled no punches. Each victim received a meticulously crafted offer that considered their finances, reputation, and relevant government regulations. It even calculated the going rate for the stolen data on the black market to maximize pressure.
For one victim, a church, Claude devised a plan to leak its donor list and donation amounts—the most efficient way to force payment. It even created custom deadlines with an incremental penalty system.
And that’s where Anthropic caught the hacker. “Caught” is a strong word. The campaign was noticed, and Anthropic pulled the plug, stopping Claude from proceeding. We don’t know if the victims paid or if the intervention came in time. The vibe hacker was never found. The only clue was that the prompts were written in Russian, which could mean anything.
We do know that 17 organizations were hit, including a healthcare provider, an emergency services provider, a defense contractor, and a church. An attack of this scale would normally take a team of hackers months. This entire campaign spanned less than a month and was conducted by one person.
The Double-Edged Sword
The ease of use is the scariest part of these attacks. They aren’t the most sophisticated, but that’s not the point. “It’s really about taking someone who’s low-leveled, making them to a degree where they’re able to like really conduct some serious cyber crime,” says Ads Dawson, a security researcher at Dreadnode who has been experimenting with AI hacking.
Anyone can launch a cyberattack now. It’s an apocalyptic thought. But people—and companies—tend to learn from their mistakes. Following these attacks, Claude’s guardrails were modified. AI companies are developing classifiers and other machine learning systems that recognize the behavioral patterns these attacks leave behind. If OSHA rules are written in blood, AI guardrails are written in breached systems.
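A misuse classifier can be sketched, very loosely, as a scoring function over prompt content. Real systems use trained models over entire conversations, not a keyword list; the patterns and weights below are invented purely for illustration:

```python
# Deliberately simplistic: a real classifier is a trained model, not a
# keyword table. Patterns and weights here are made up for illustration.
SUSPICIOUS_PATTERNS = {
    "disable logging": 3,
    "exfiltrate": 4,
    "bypass authentication": 4,
    "scan for vulnerable": 2,
}

def misuse_score(prompt: str) -> int:
    # Sum the weights of every suspicious pattern found in the prompt
    text = prompt.lower()
    return sum(weight for pattern, weight in SUSPICIOUS_PATTERNS.items()
               if pattern in text)

def flag(prompt: str, threshold: int = 4) -> bool:
    # Flag the conversation for review once the score crosses a threshold
    return misuse_score(prompt) >= threshold
```

The tug-of-war is visible even in this toy: an attacker who rephrases "exfiltrate" sails past a keyword list, which is exactly why the production versions are learned classifiers rather than static rules.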
More importantly, the use of AI goes both ways. The hacker convinced Claude it was helping with ethical research because AIs are being used for ethical research. A lot. Security professionals are working with them every day to test defenses so they won’t be penetrated by a script kiddie with access to an AI.
The Future is Bionic Hacking
Mårten Mickos, the founder of HackerOne, has no illusions. “The future of hacking is bionic hacking,” he says. “You’re going to be some human who has powerful AI capabilities and therefore doing a better job finding way more vulnerabilities.”
The hacking he’s talking about is ethical hacking. AI hackbots are already taking over the tedious work of finding simple, straightforward vulnerabilities. This frees up human creativity to focus on what it does best: understanding business logic and figuring out where a business can be hurt the most. In August 2025, for the first time in history, the top spot on HackerOne’s leaderboard was occupied not by a human, but by a hackbot.
The AI Cyber War is Here
Let’s go back to that 1996 DARPA exercise one last time. The participants—top brass from the CIA, universities, and industry—had to tackle the cyber horrors of the future. Yet, they completely ignored the AI hacker agents mentioned in the rules. They simply forgot they were there.
From today’s perspective, that was a mistake. In our reality, AI agents are front and center. The underground AI cyber war imagined in that exercise is happening right now, in our own timeline. DARPA is now running competitions to develop autonomous systems that can detect and patch vulnerabilities in critical infrastructure.
The AI menace is real. But thanks to reports like the one from Anthropic, it can be managed. The same technology that powers the attack is also building the defense. The war is on.