Vulnerability Research Isn’t Cooked; It’s Burned Beyond Recognition

On March 30, 2026, Thomas & Erin Ptacek published a blog post titled “Vulnerability Research Is Cooked”. I don’t believe I know Erin, but I know of Thomas as an old-school vulnerability researcher who has been well respected for a long, long time. When he speaks about vulnerability research, I certainly listen. So this post interested me for a variety of reasons, since it primarily discusses the use of so-called AI in vulnerability research and its implications.

I want to chime in on a few things he said with some additional perspective, then quickly dive into why I think that, while he is right, he isn’t covering another major thing happening in the world of “AI” and security research. Many people are starting to notice what I will talk about, despite it being older than most realize. It’s a trend that won’t be going away any time soon, and it has just as many implications as the Ptaceks’ topic.

Additional Commentary

“I spent the last week bouncing these thoughts off veteran vulnerability researchers. Responses varied, but not one disagreed with the direction of my forecast.”

Sure, I would hope you get universal agreement on the “direction”! But “how fast” has been the question for a while now. I am still an “AI” skeptic, and for good reason, but I know it is improving and will become as ubiquitous as any other technology. Hell, it already is, whether we know it or not. For me, the question in a majority of “AI”-centered discussions is not ‘if’ but ‘when’ we hit the point where it can reliably do $whatever.

I got to talk with Nicholas Carlini at Anthropic about this. Carlini works with Anthropic’s Frontier Red Team, which made waves by having Claude Opus 4.6 generate 500 validated high-severity vulnerabilities. He described the process for me.

Yes, Anthropic did this. Yes, an LLM can do a lot of this, but you don’t just run ./install.sh and feed it a few simple prompts. A lot of the cool things this technology is doing require installing the software, spending considerable time training it with specific data (including bug classes of interest), traversing code to know if something is user-reachable, and then “verifying” it is exploitable (mileage may vary). Basically, you are creating a one-off “AI” that is specifically geared to do one task and not drown in a world of additional, irrelevant data. That is a core problem of the broader tools we use, like Gemini, Copilot, and ChatGPT: they are answering based on a literal world of data, including all the wrong information and LLM-poisoned data.

When I tried a simple version of the process they describe Carlini using, with ChatGPT just a couple of years ago, I had very different results. I too gave it a specific code repository and asked it to find the simplest of bug classes: cross-site scripting (XSS). It hallucinated 100% of the vulnerabilities. Not only was it wrong about files being vulnerable, it made up files that didn’t exist in the repository at all and then said they were vulnerable.

Gemini prompt: Create an image of a shitty looking robot named “Clanker” daydreaming about vulnerabilities like “XSS” and “SQLi”

One old friend at a big vendor doubted that the transition I’m predicting will be as easy as I’ve made it sound. Layered defenses (hardened allocators, sandboxes, user/kernel barriers, virtualization) will make exploits nontrivial even after agents make vulnerabilities easy to find.

The big vendor friend they cite is basically wrong in the big picture, but likely right(ish) when it comes to high-end memory corruption vulns. But you have to remember, all those defenses mean precisely nothing for an Internet-facing SQL injection vulnerability that reaches your production database. The same goes for a lot of remote code execution, file inclusion, SSRF, XXE, and many other bug classes. We can’t talk about “vulnerabilities” as a broad term while discussing more precise use cases of a given technology (offense or defense).

Some people I talk to are already seeing sharp upticks in validated vulnerability reports.

Yep! And others are seeing sharp upticks in waves of absolute slop.

The smartest vulnerability researcher I know called me out on the strength of my prediction. They agree agents will generate working zero-days in, well, everything. But to them, it’s merely a product of settled science, variations on well-documented themes. Lots of technique isn’t documented at all, and it remains to be seen if LLM agents will be able to recapitulate any of it.

I think that is largely correct too, but see above. You still don’t need those high-end vulnerabilities for a majority of hacking. Every day I see red teamers talk about engagements where they get domain controller access with all the old favorite methods. Sure, they find novel things from time to time, but they are almost always talking about internal engagements focused on Windows environments, which are notoriously easy-mode. But if we are talking about the high-end 0-days that sell for one million dollars or more, then yes, that remains to be seen.

So that covers the “Vulnerability Research Is Cooked” bit from the Ptaceks, which I think is a great read. But let’s explore the other side of this coin.

Vulnerability Research Is Burned Beyond Recognition

Time to look at a different angle on how “AI” is impacting the vulnerability landscape, one that isn’t along the lines most people are talking about. While the Ptaceks et al. are correct that an LLM can be trained to find vulnerabilities, what about the “AI” software itself? For years now we’ve been seeing a sharp uptick in vulnerabilities reported against “AI” / LLM / agentic software itself. A few days ago this became front-and-center in some circles when a fun site was created: https://days-since-openclaw-cve.com/.

While this site is certainly fun, it’s also just wrong in its intent. As usual, people who rely entirely on CVE use imprecise data that leads to inaccurate conclusions. That “streak” from February 7 to 18, for example, is skewed because it uses CVE publication dates, not vulnerability disclosure dates. In reality, vulnerabilities in OpenClaw were disclosed on February 12, 13, 14, 15, 16, 17, and 18. Bonus points: two vulnerabilities in that time period don’t have CVEs assigned at all.
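To make the date-field problem concrete, here is a minimal Python sketch of a “longest quiet stretch” calculation run over two feeds. The helper name longest_quiet_stretch and both date lists are hypothetical, for illustration only. The CVE-keyed list assumes the streak’s endpoints (February 7 and 18, 2026) were the only CVE publications in that window, and the disclosure-keyed list adds the February 12 to 18 disclosures mentioned above to an assumed February 7 disclosure. Neither list is real VulnDB or CVE data.

```python
from datetime import date


def longest_quiet_stretch(dates):
    """Return the largest gap, in days, between consecutive distinct event dates."""
    ordered = sorted(set(dates))
    if len(ordered) < 2:
        return 0
    return max((later - earlier).days for earlier, later in zip(ordered, ordered[1:]))


# Hypothetical feed keyed on CVE publication date: assumes only the streak endpoints.
cve_published = [date(2026, 2, 7), date(2026, 2, 18)]

# Hypothetical feed keyed on disclosure date: an assumed Feb 7 disclosure,
# plus the daily disclosures from Feb 12 through Feb 18 described in the post.
disclosed = [date(2026, 2, 7)] + [date(2026, 2, d) for d in range(12, 19)]

print(longest_quiet_stretch(cve_published))  # 11
print(longest_quiet_stretch(disclosed))      # 5
```

Same vulnerabilities, different date field, and the apparent quiet period drops from 11 days to 5. Any counter built on CVE publication dates inherits whatever lag exists between disclosure and CVE assignment, and it never sees vulnerabilities that have no CVE at all.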

Anyway, my point is that OpenClaw alone has had 265 vulnerabilities disclosed since January 26th of this year. That is 265 in 69 days, an average of 3.84 a day, in a single “AI” software package. That should be a warning sign unto itself. A fuzzy search for “LLM” in VulnDB titles returns over 700 results right now. Note that the search also matches non-LLM software (e.g., titles containing “Hellman”), so it only gives a rough idea, but a majority of the matches are in scope.
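The 3.84 figure is simple arithmetic, but it is easy to sanity-check. This Python sketch assumes the count was taken on April 5, 2026. The post doesn’t state its as-of date; that date is an assumption chosen only because the plain difference between it and January 26, 2026 matches the 69 days cited above.

```python
from datetime import date

# Assumption: the counting window ends April 5, 2026 (not stated in the post).
start = date(2026, 1, 26)
as_of = date(2026, 4, 5)

days = (as_of - start).days  # whole days between the two dates
disclosed = 265              # OpenClaw disclosures cited in the post
rate = disclosed / days      # average disclosures per day

print(days)            # 69
print(round(rate, 2))  # 3.84
```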

They span a variety of products, most of which you likely haven’t heard of. From WordPress plugins to third-party apps for widely used platforms like Anthropic and Gemini, the vulnerabilities range from the lowest severity (literally none) to the highest (remote code/command execution). Basically, just like any other software. Remember, that is just a subset of the bigger picture, since a lot of “AI” software doesn’t include ‘LLM’ in its name. Searching titles for “AI” instead yields too many false positives mixed in with the genuine “AI” software like PraisonAI, vanna-ai, CrewAI, Spring AI, and so many others.
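Both title searches fail the same way: a plain substring match has no idea where words begin and end. Here is a minimal Python sketch using made-up titles and fictional product names (none of these are real VulnDB entries). A case-insensitive substring test for “LLM” hits “Diffie-Hellman”, and one for “AI” hits ordinary words like “email”. Switching to a word-boundary regex removes those false positives, but it then misses product names that glue “AI” onto another word, in the style of PraisonAI or CrewAI. The helper names substring_hits and word_hits are hypothetical.

```python
import re

# Hypothetical, made-up titles for illustration only; not real VulnDB entries.
TITLES = [
    "Acme LLM Proxy Remote Command Execution",
    "Acme TLS Library Diffie-Hellman Parameter Weakness",
    "FooAI Agent Path Traversal",
    "Acme AI Assistant Stored XSS",
    "Acme Webmail Email Header Injection",
]


def substring_hits(term: str, titles: list[str]) -> list[str]:
    """Case-insensitive substring match (the 'fuzzy' behavior)."""
    needle = term.lower()
    return [t for t in titles if needle in t.lower()]


def word_hits(term: str, titles: list[str]) -> list[str]:
    """Case-insensitive whole-word match using regex word boundaries."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return [t for t in titles if pattern.search(t)]


for term in ("LLM", "AI"):
    print(term, "substring:", substring_hits(term, TITLES))
    print(term, "word:     ", word_hits(term, TITLES))
```

In this toy list, the substring search for “LLM” picks up the Diffie-Hellman entry, while the word-boundary search for “AI” drops FooAI even though it is exactly the kind of software being counted. Neither mode is right on its own, which is why these counts only give a rough idea and still need a human to review the matches.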

So when we talk about “AI and vulnerabilities”, it’s fun to guess how fast the technology will get us to reliable, quality vulnerability research that doesn’t need to be triple-checked. But in my opinion it is more important that we recognize all this new software we are using is often riddled with vulnerabilities. And just like with any other software package, some of those vulnerabilities have no CVE ID, some are not patched promptly, and some vendors are diligent in responding to such reports.

Basically, it’s like the rest of the software in the world, except too many people are fawning over this new technology while blindly stumbling past the reality of the situation. Disclosures range in quality, all the way down to absolute trash that cannot be trusted. Kidiots are farming the new LLM software for vulnerabilities just like they continue to farm junk software and personal routers. In short, vulnerability research is burned beyond recognition, which is business as usual.

Discover more from Rants of a deranged squirrel.