Project Glasswing: Anthropic's AI Finds Over 10,000 Security Flaws in One Month

Claude Mythos Preview autonomously scanned millions of lines of code and flagged more than 10,000 distinct vulnerabilities in 30 days — a result that is forcing a rethink of what AI can do in offensive and defensive security research.

TL;DR: Anthropic's Project Glasswing tasked Claude Mythos Preview — a specialized experimental model — with autonomously hunting software vulnerabilities across real-world codebases. In just 30 days, the system flagged over 10,000 distinct security flaws, from memory corruption and injection vulnerabilities to logic errors and misconfigured access controls. All findings were shared under responsible disclosure protocols. The result is the most significant public demonstration to date of a commercial AI operating as a primary vulnerability discovery engine — and a clear signal that AI-powered security is no longer supplementary.

Anthropic has reached a milestone in AI-powered security research that few in the industry saw coming at this scale or speed. Project Glasswing, a focused internal research initiative, deployed Claude Mythos Preview — an experimental model variant optimized for deep code analysis — to autonomously discover software vulnerabilities across multiple real-world codebases. In 30 days, the system identified more than 10,000 distinct security flaws.

The number is striking. But the implications run deeper than the headline figure. Project Glasswing is not just a performance benchmark — it is a proof of concept for a fundamentally different model of security research, one in which AI transitions from assistant to primary investigator. For defenders and attackers alike, that shift changes the calculus entirely.

10,000+ distinct security flaws identified in 30 days
30 days of autonomous vulnerability discovery
100% of findings shared via responsible disclosure protocols

What Is Project Glasswing?

Project Glasswing is an Anthropic research program designed to evaluate how frontier AI models perform on real-world offensive security tasks — not in controlled lab environments, but against production-grade codebases with the complexity and ambiguity that implies. Its centerpiece is Claude Mythos Preview, a specialized model variant built for deep technical reasoning, structured vulnerability assessment, and code-level pattern recognition across large codebases.

The name references the glasswing butterfly — an insect whose wings are almost entirely transparent, making visible what would otherwise be hidden. That metaphor captures the project's core aim precisely: using AI to make opaque, complex codebases legible, surfacing flaws that would otherwise remain invisible to defenders for months or years.

Model Design

Claude Mythos Preview — Built for Security Reasoning

Unlike general-purpose Claude models, Mythos Preview was specifically tuned for deep code analysis, multi-file context tracking, and vulnerability pattern recognition. The model reasons across large codebases, maintaining context across function calls, dependency chains, and data flows. This is the kind of multi-hop reasoning that classic pattern-matching static analysis tools struggle to replicate.

Scope

Real-World Codebases, Not Synthetic Benchmarks

Project Glasswing operated against real production software, not curated CTF challenges or synthetic benchmark datasets. Anthropic has not disclosed the full list of systems audited, citing responsible disclosure obligations, but confirmed the scope spanned multiple industries and codebases of varying size and complexity.

10,000 Flaws in 30 Days: What the Numbers Mean

Over one calendar month, Claude Mythos Preview flagged more than 10,000 distinct security issues. The vulnerability categories spanned the full spectrum of common flaw classes: memory corruption, injection vulnerabilities (SQL, command, LDAP), authentication bypass patterns, insecure deserialization, logic errors in access control, and misconfigured cryptographic implementations.
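To make one of these flaw classes concrete, here is a minimal illustrative sketch of SQL injection using Python's standard `sqlite3` module. It is not drawn from Glasswing's findings; the table, function names, and payload are hypothetical and chosen only to show the pattern an auditor, human or AI, would look for.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn

def lookup_vulnerable(conn: sqlite3.Connection, name: str) -> list[str]:
    # Flawed: user input is concatenated straight into the SQL text,
    # so quote characters in `name` can change the query's structure.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]

def lookup_safe(conn: sqlite3.Connection, name: str) -> list[str]:
    # Fixed: a bound parameter is always treated as data, never as SQL.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]

conn = make_db()
payload = "nobody' OR '1'='1"  # classic tautology payload

print(sorted(lookup_vulnerable(conn, payload)))  # leaks every user's secret
print(lookup_safe(conn, payload))                # matches no user
```

The vulnerable and safe versions differ by a single line, which is part of why this class is easy to introduce and easy to overlook in a large codebase.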

For context: a skilled human security researcher conducting a focused audit typically identifies between 5 and 20 significant vulnerabilities per week, depending on codebase complexity. At that pace, surfacing 10,000 findings in 30 days (about 4.3 weeks) would require a team of roughly 120 to 470 researchers working full-time. Claude Mythos Preview did it autonomously.
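The headcount comparison can be checked with a few lines of arithmetic. This is a rough sketch, assuming a 30-day window equals 30/7 weeks and that every researcher sustains the quoted rate of 5 to 20 findings per week for the whole period; the function name is illustrative.

```python
import math

def researchers_needed(total_findings: int, findings_per_week: int, days: int = 30) -> int:
    """Smallest whole team size that reaches total_findings within `days`."""
    # One researcher produces findings_per_week * days / 7 findings in the window,
    # so the team size is total / (rate * days / 7), rounded up.
    return math.ceil(total_findings * 7 / (findings_per_week * days))

for rate in (5, 20):
    team = researchers_needed(10_000, rate)
    print(f"{rate} findings/week -> {team} researchers")
# 5 findings/week -> 467 researchers
# 20 findings/week -> 117 researchers
```

The estimate is sensitive to the assumed rate, and it treats every flagged issue as equivalent to a "significant" human finding, which the article does not establish.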

This volume reflects a core structural advantage of LLM-based security tooling: coverage at scale. A language model does not fatigue, does not miss a file because it ran out of time, and does not develop tunnel vision after a long session. It sweeps every line of code within its context window with consistent attention, surfacing anomalies that human teams would realistically take weeks to reach.

Anthropic confirmed that all findings were shared with affected parties through coordinated responsible disclosure protocols before any public announcement — a deliberate choice that positions Project Glasswing not as a threat demonstration, but as a model for how AI-assisted security research should be conducted.

The Dual-Edged Dimension

Project Glasswing carries an uncomfortable implication alongside its impressive numbers. If a safety-focused AI lab can use its own model to discover this many vulnerabilities in a single month, the same capability in less responsible hands is not hypothetical — it is imminent.

Offensive Risk

The Same Capability, Misused

A threat actor with access to a similarly capable model — whether through a fine-tuned open-source variant or a jailbroken commercial API — could deploy it to discover unpatched vulnerabilities in target systems at machine speed. The attack surface discovery phase, which traditionally requires significant human expertise and time, becomes dramatically cheaper and faster.

Defensive Opportunity

Finding Flaws Before Attackers Do

The flip side is equally significant. Organizations that deploy AI-assisted security tooling proactively — before a breach — gain a structural advantage over those that do not. Project Glasswing demonstrates that the gap between a well-resourced security team and an under-resourced one could be substantially closed by AI, provided the tooling reaches teams that need it most.

Anthropic acknowledges this tension directly and frames it as central to the project's purpose. Understanding AI's offensive potential in controlled conditions is a prerequisite to building stronger model guardrails and more effective defenses. The findings from Glasswing feed directly into Anthropic's alignment research on Claude, shaping how future model versions reason about, and decline, requests that cross into genuinely harmful security territory.

What This Means for the Security Industry

Project Glasswing is not an isolated experiment. It is part of a broader wave of AI-native security tooling that is rapidly maturing: Google's Project Zero has been integrating LLMs into its vulnerability research pipeline, Microsoft Security Copilot has moved from preview to production, and a growing set of startups are building AI-first penetration testing and code auditing platforms.

What separates Glasswing from those efforts is scale and autonomy. Most AI security tools today function as accelerators for human researchers — surfacing candidates that a human then validates. Glasswing operated as a primary investigator across a broad codebase scope, with human review concentrated at the output stage rather than distributed throughout the process.

For CISOs and security teams, the strategic takeaway is clear: organizations that treat AI as a supplementary layer in their security workflow are already behind the curve. The question is no longer whether to integrate AI into vulnerability research — it is how quickly that integration can be operationalized at meaningful depth.

TechVernia Verdict

Project Glasswing is a benchmark moment for AI-powered cybersecurity — and a warning shot for the entire industry. The 10,000-flaw figure is not just a performance metric; it is evidence that the economics of vulnerability discovery have fundamentally changed. What previously required large, expensive human teams can now be accomplished, in part, by a single well-tuned model running continuously over a month.

Anthropic's decision to publish this work — and to conduct it under responsible disclosure — sets a standard for how AI labs should approach offensive capability research. The dual-use risk is real and acknowledged. But the alternative — ignoring this capability until it appears in adversarial hands — is worse. The security industry needs to move faster. Project Glasswing just showed everyone exactly how much faster.

Frequently Asked Questions

What is Claude Mythos Preview?

Claude Mythos Preview is a specialized experimental variant of Anthropic's Claude model family, optimized for deep code analysis, multi-file context reasoning, and structured vulnerability assessment. It is distinct from general-purpose Claude models and was specifically developed for Project Glasswing's security research objectives.

Were the 10,000 vulnerabilities all critical?

Anthropic has not published a full severity breakdown of the findings. The 10,000+ figure represents distinct security issues across all severity levels — from critical memory corruption bugs and authentication bypasses to lower-severity logic errors and configuration weaknesses. In any large-scale audit, findings naturally span a severity spectrum, with critical issues typically representing a fraction of the total volume.

Which systems were audited in Project Glasswing?

Anthropic has not disclosed the specific systems or organizations included in the audit scope, citing responsible disclosure obligations. The company confirmed that the scope spanned multiple codebases across different industries, and that all affected parties were notified through coordinated disclosure before any public announcement.

Can organizations access Claude Mythos Preview for their own security audits?

As of June 2026, Claude Mythos Preview remains an internal research model and is not available through Anthropic's public API or enterprise offerings. Anthropic has not announced a commercial release timeline. Organizations looking for AI-assisted security tooling today should explore existing solutions from vendors who have integrated LLMs into static analysis and penetration testing workflows.

Does Project Glasswing mean AI will replace human security researchers?

Not in the near term — but the role is changing. AI systems like Claude Mythos Preview excel at breadth: sweeping large codebases continuously for known vulnerability patterns. Human researchers still provide irreplaceable value in novel attack chain discovery, contextual business logic analysis, and the kind of creative adversarial thinking that identifies entirely new vulnerability classes. The most effective security teams will be those that combine AI-driven coverage with human-led depth.

Kodjo Apedoh

Network Engineer & AI Entrepreneur

Founder of TechVernia & SankaraShield. Certified Network Security Engineer with 4+ years of experience specializing in network automation (Python), AI tools research, and advanced security implementations. Holds certifications from Palo Alto Networks, Fortinet, and 15+ other vendors. Based in Arlington, Virginia.
