AI is everywhere in offensive security right now. HackerOne's 2025 Hacker-Powered Security Report found that 70% of researchers now use AI tools in their workflow, applying them to recon, exploit development, report writing, and every phase of the bug bounty process. That's not inherently a problem. These tools do make skilled researchers faster and more effective.
The problem starts when the AI gets it wrong and nobody checks before submitting. That is where ethics and researcher responsibility come in.
Bug bounty programs are seeing a surge in submissions that look polished, read professionally, and fall apart the moment a triage analyst tries to reproduce the finding. Fabricated submissions. Attack chains that reference real techniques but don't actually work against the target. Detailed reports about vulnerabilities that simply don't exist. These aren't edge cases. They're a growing pattern, and they're wasting real time and costing real trust on both sides of the program.
AI Doesn't Know What's Real
Large language models don't verify claims. They predict the next plausible token in a sequence. When you ask one to analyze a target and identify vulnerabilities, it will produce output that looks like expert analysis because it's been trained on thousands of real vulnerability reports. But it has no mechanism for confirming that the vulnerability exists, that the CVE it cited applies to that software, or that the attack chain it described would actually execute.
This is the hallucination problem. And in bug bounty research, it has teeth.
When a hallucinated finding gets submitted to a program, a human defender has to stop working on real threats to investigate a ghost. Multiply that by dozens of AI-assisted submissions per week, and you've created an operational cost: not just in hours, but in the trust that makes bug bounty programs work at all. Triage teams that spend their days chasing phantom findings start treating all external submissions with suspicion. That's bad for everyone, including the skilled researchers submitting legitimate work.
We’re already seeing the fallout. cURL shut down its bug bounty program entirely in January 2026, citing a flood of AI-generated reports that drove its confirmed vulnerability rate below 5%. A program that had run successfully since 2019 and paid out over $100,000 to legitimate researchers is gone because the noise became unmanageable.
cURL isn't alone. Other open-source projects have scaled back or restructured their vulnerability intake for the same reasons.
We're past the 'what-if' stage. This is happening now.
The Junior Analyst Mental Model
The most useful way to think about AI in your research workflow is as a junior analyst on your team. It's fast, eager, and capable of producing work that looks competent. But you would never let a junior submit a finding to a client without reviewing it yourself. And when the junior gets something wrong, the accountability doesn't rest with them. It rests with you.
Here's what's changed recently: the junior is getting promoted. Six months ago, AI-generated findings were often obviously wrong: nonexistent endpoints, nonsensical parameters, vulnerabilities that didn't exist. Today's models produce fewer hallucinations, but the ones that slip through are significantly more convincing. A valid-looking attack chain where one step doesn't work against the target's stack. An injection point that looks exploitable on the surface but is sanitized further down in the real implementation. A plausible report about a vulnerability class the target could have, but doesn't.
That shift from "obviously wrong" to "plausibly wrong" is the real danger. It means your review process has to improve at the same rate as the models, or faster.
A quick example: I once asked an internal AI tool six times whether a set of fixes had landed on a branch. It flagged three items as unresolved, then reversed course and confirmed everything was fixed. That confident final answer would have closed the loop if I hadn't kept pushing. The hallucinations don't disappear. They just get better at sounding right.
Context Engineering: Controlling What the AI Knows
Hallucinations are best controlled by watching what goes into the model's context window, and how it’s formatted. The industry is increasingly calling this context engineering, the discipline of loading the right information into the AI's working memory so it reasons from facts rather than patterns.
In practice, this means grounding every prompt in real data from your actual reconnaissance. Don't ask the AI to "find vulnerabilities in this application." Instead, feed it your actual HTTP responses, server headers, endpoint documentation, and scope definition, then ask it to analyze what you've provided. The instruction "Use only the provided input and do not assume details not present" is one of the most effective anti-hallucination guardrails available.
Structured input matters too. Wrapping your inputs in clear tags like <recon_data>, <scope>, and <server_response> noticeably improves output quality across current models. It's the difference between handing the junior a pile of unmarked papers and handing them a labeled folder. The model performs better when it can distinguish what's context, what's instruction, and what's constraint.
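As a minimal sketch of grounding plus structured input, the helper below wraps recon artifacts in the tags mentioned above and appends the guardrail instruction. The function name build_grounded_prompt, its parameters, and the sample values are illustrative assumptions, not part of any specific tool.

```python
def build_grounded_prompt(recon_data: str, scope: str, server_response: str, question: str) -> str:
    """Keep context, constraint, and instruction visibly separate for the model."""
    guardrail = "Use only the provided input and do not assume details not present."
    sections = [
        ("recon_data", recon_data),
        ("scope", scope),
        ("server_response", server_response),
    ]
    # Each artifact gets its own labeled block, like a labeled folder for the junior.
    tagged = "\n".join(f"<{tag}>\n{body.strip()}\n</{tag}>" for tag, body in sections)
    return f"{tagged}\n\n{guardrail}\n{question.strip()}"


# Hypothetical sample values; replace with output from your own recon.
prompt = build_grounded_prompt(
    recon_data="GET /api/v1/users/42 -> 200",
    scope="*.example.com, authenticated testing only",
    server_response="HTTP/1.1 200 OK\nServer: nginx",
    question="Which observations here are worth manual follow-up, and why?",
)
print(prompt)
```

The question asks what is worth following up rather than what is vulnerable, which keeps the model's output in the role of a lead to verify, not a finding.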
Three More Techniques That Actually Work
Five techniques form the practical toolkit for reducing hallucinations in security research. The first two, grounding and structured input, were covered above. The remaining three:
Chain-of-thought prompting. Instead of asking for a conclusion, ask the model to show its reasoning step by step. "Walk through how this vulnerability would be exploited, including each prerequisite and what could prevent it from working." This forces the model to surface its assumptions, which makes the wrong ones visible.
Recent research shows that while chain-of-thought reduces hallucination frequency, the hallucinations that survive tend to be more confident and harder to catch, which is exactly why you pair this technique with validation.
Self-validation. Ask the model to critique its own output before you accept it. "Now review your analysis. Identify any assumptions you made that aren't supported by the provided data, and flag any claims you're less than 90% confident in." Models are surprisingly effective at catching their own hallucinations when explicitly asked to look for them. There's a peer-reviewed method called Chain-of-Verification (Dhuliawala et al., 2023) that showed this approach improves accuracy by up to 23% on factual tasks. This logic drives the validator agent pattern emerging in more complex agentic workflows, where a dedicated subagent automatically checks outputs before the pipeline moves forward (Li et al., 2026).
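A self-validation pass can be sketched as two calls: one for the draft analysis, one for the critique. This is a simple two-pass critique, not the Chain-of-Verification method itself. The ask_model parameter is a hypothetical stand-in for whatever model client you use, and fake_model exists only so the sketch runs offline.

```python
from typing import Callable

CRITIQUE_PROMPT = (
    "Now review your analysis. Identify any assumptions you made that aren't "
    "supported by the provided data, and flag any claims you're less than 90% "
    "confident in."
)


def analyze_with_self_check(ask_model: Callable[[str], str], grounded_prompt: str) -> dict:
    """Run one analysis pass, then a second pass where the model critiques it."""
    draft = ask_model(grounded_prompt)
    critique = ask_model(
        f"{grounded_prompt}\n\n<draft_analysis>\n{draft}\n</draft_analysis>\n\n{CRITIQUE_PROMPT}"
    )
    # Neither pass is evidence: both still need manual verification on the target.
    return {"draft": draft, "critique": critique}


def fake_model(prompt: str) -> str:
    """Offline stand-in (assumption): returns canned text instead of calling a model."""
    return "critique text" if CRITIQUE_PROMPT in prompt else "draft text"


result = analyze_with_self_check(fake_model, "<recon_data>sample</recon_data>")
print(result)
```

The critique call includes the original grounded prompt, so the model reviews its draft against the same data rather than from memory.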
Few-shot examples. When a well-structured prompt isn't producing the right output format or reasoning depth, show the model what a correct answer looks like. Provide an example of a good vulnerability analysis alongside the data that supported it, then ask it to follow the same pattern with your target. This is an escalation technique: try the simpler approaches first and reach for few-shot when you need tighter control over output quality.
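One way to assemble a few-shot prompt is to pair each example's input data with the analysis it supported, then append the real target data. The helper name, the tag names, and the cookie example below are assumptions for illustration only.

```python
def build_few_shot_prompt(examples, target_data):
    """Show worked (input, analysis) pairs first, then the real target data."""
    parts = []
    for i, (data, analysis) in enumerate(examples, start=1):
        parts.append(
            f'<example id="{i}">\n'
            f"<input>\n{data}\n</input>\n"
            f"<analysis>\n{analysis}\n</analysis>\n"
            f"</example>"
        )
    parts.append(f"<target_input>\n{target_data}\n</target_input>")
    parts.append(
        "Analyze the target input using the same structure as the examples. "
        "Use only the provided input and do not assume details not present."
    )
    return "\n\n".join(parts)


# Hypothetical example pair; a real one should come from a verified past finding.
few_shot_prompt = build_few_shot_prompt(
    examples=[(
        "Set-Cookie: session=abc123; Path=/",
        "Observation: cookie lacks HttpOnly and Secure flags. "
        "Evidence: the Set-Cookie header above. "
        "Unverified: whether the cookie is actually a session token.",
    )],
    target_data="Set-Cookie: prefs=dark; Path=/; Secure",
)
print(few_shot_prompt)
```

The example analysis separates observation, evidence, and what remains unverified, so the model is shown the reasoning depth you expect as well as the output format.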
None of these techniques are magic. They're the review process. They're how you keep pace with a junior analyst whose work looks increasingly credible.
The Ethics That Don't Get Talked About Enough
As more people try to break into full-time pentesting, we often forget the “ethical” part of “ethical hacking.”
The conversation about AI in bug bounty has mostly focused on detection: can programs tell when a submission is AI-generated? I believe this is the wrong place to start.
The real question is about the researcher's responsibility.
Let’s look at the range of reports researchers are using AI to generate.
Using AI to accelerate analysis while independently verifying every finding before submitting. That's responsible use.
Using AI to translate from a researcher’s native language to English. Also responsible use.
Submitting AI-generated findings without full verification. This starts to border on unethical. Perhaps it's just lazy, but it's still costly to the defenders who have to triage the result.
Inventing endpoints that don’t exist anywhere in the target application. An unethical submission.
Submitting “proof-of-concept” scripts that print the word “success” to the terminal because that is precisely what they were scripted to do (e.g., print("success")). Also an unethical submission, and evidence that the researcher doesn’t understand their own work.
Using AI to scrape a public disclosure or security document, rewrite it, and submit it as original work.
That last one is not a prompt engineering problem. That's fraud.
Bug bounty platforms are moving from exploration to enforcement on these issues. Detection mechanisms for AI-generated submissions are maturing, and the reputational consequences for researchers caught submitting unverified work are getting worse. Platforms are increasingly treating that last category as a policy violation on par with plagiarism or duplicate disclosure abuse.
Every program owner I've spoken with says the same thing: they don't mind AI-assisted reports. They mind lazy ones.
But let’s forget the platform policies for a second. This is about reputation.
Your name on a report is your testimony that the finding is real. Every shortcut that undermines that testimony costs a real person their time: a defender who stopped working on actual threats to investigate your phantom finding. The quality of your work when nobody is auditing it is the truest measure of your professional integrity.
Your Proof Comes from the Product, Not the Model
Here's the practical standard: every finding you submit should be reproducible without AI. Your proof-of-concept should execute against the target. Your screenshots should come from your testing environment. Your technical details should match what the application actually does, not what the model predicted it would do.
AI can help you get there faster. It can surface patterns you might miss, suggest attack vectors worth investigating, and help you articulate findings more clearly. Those are real advantages. But the evidence has to come from the product itself. If your finding only exists in the model's output and you can't reproduce it independently, it's not a finding. It's a hallucination.
Before you hit submit:
Personally verify the finding, don’t delegate validation to an AI
Reproduce the issue on the real system, confirm it actually manifests
Watch for speculative language, “might”, “possibly”, or “could” signals that more investigation is necessary
Record environment details, OS, version, config, and required permissions
Provide deterministic reproduction steps, so another person can follow them
Attach concrete proof, for example logs, exploit screenshots, or pcap, not LLM output
If anything is uncertain, state the assumption and the next validation action
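The mechanical parts of the checklist above can be turned into a small pre-submit check. This is a sketch under assumptions: the Submission fields and the presubmit_issues name are hypothetical, and the script only flags gaps. It cannot verify a finding for you.

```python
import re
from dataclasses import dataclass, field

# Words the checklist treats as a sign that more investigation is needed.
SPECULATIVE = re.compile(r"\b(might|possibly|could)\b", re.IGNORECASE)


@dataclass
class Submission:
    report_text: str
    reproduced_on_target: bool  # set by the researcher after manual reproduction
    environment: dict = field(default_factory=dict)
    repro_steps: list = field(default_factory=list)
    evidence_files: list = field(default_factory=list)


def presubmit_issues(sub: Submission) -> list:
    """Return a list of checklist gaps; an empty list means nothing was flagged."""
    issues = []
    if not sub.reproduced_on_target:
        issues.append("not reproduced on the real system")
    hedges = sorted({m.lower() for m in SPECULATIVE.findall(sub.report_text)})
    if hedges:
        issues.append("speculative language: " + ", ".join(hedges))
    required = ("os", "version", "config", "permissions")
    missing = [key for key in required if not sub.environment.get(key)]
    if missing:
        issues.append("missing environment details: " + ", ".join(missing))
    if not sub.repro_steps:
        issues.append("no reproduction steps")
    if not sub.evidence_files:
        issues.append("no concrete proof attached")
    return issues


draft = Submission(
    report_text="The endpoint might be vulnerable to IDOR.",
    reproduced_on_target=False,
)
for issue in presubmit_issues(draft):
    print(issue)
```

An empty result is a reminder that the paperwork is complete, not proof that the finding is real. That proof still comes from reproducing it yourself.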
Using AI to Think Faster, Not Lazier
AI tools aren't going away, and they shouldn't. When used right, they actually make security research easier and faster. The researchers who will thrive in this environment aren't the ones who use AI the most. They're the ones who verify the hardest, treat the model as a peer reviewer rather than an oracle, and understand that the tools are only as trustworthy as the process wrapped around them.
You're not just responsible for what you submit. You're responsible for what you unleash. Every tool this powerful demands stewardship, not just skill.
The junior keeps getting promoted. Make sure your review process does too.