Microsoft MDASH: What the New Agentic Security System Means for Enterprise SecOps Teams

TLDR: Microsoft’s MDASH (Multi-Model Agentic Scanning Harness) is the first AI vulnerability discovery system to produce production-grade results at enterprise scale: 16 real Windows CVEs patched in May 2026, zero false positives in controlled testing, and the top CyberGym benchmark score at 88.45%. It is not a replacement for your SAST/DAST pipeline or your SOC platform. It is a specialized offensive research tool that finds the hard bugs that traditional scanners miss, such as kernel race conditions and cross-file double-frees. Enterprise SecOps teams should watch for the private preview opening in June 2026, but the immediate action item is evaluating whether your organization has the governance maturity to operationalize machine-generated vulnerability findings at the rate MDASH produces them.

Why This Review Matters Now

On May 12, 2026, Microsoft announced that an internal AI system codenamed MDASH had discovered 16 previously unknown vulnerabilities across the Windows networking and authentication stack — including four critical remote code execution flaws — all patched in that week’s Patch Tuesday release. The system orchestrates over 100 specialized AI agents across multiple frontier and distilled models.

This matters for enterprise SecOps for a specific reason: MDASH isn’t a research demo. It found CVE-2026-33824, a double-free in the IKEv2 service exploitable for pre-authentication remote code execution with LocalSystem privileges. It found CVE-2026-33827, a use-after-free in the TCP/IP stack requiring reasoning across three concurrent free paths. These are bugs that static analysis tools and single-model AI approaches consistently miss because they require multi-step reasoning about object lifetimes, concurrency, and cross-file control flow.

The timing also matters because MDASH enters a suddenly crowded field. Anthropic launched Project Glasswing and OpenAI launched Daybreak in the same quarter — all targeting AI-powered vulnerability discovery. Enterprise security architects now face a real question: which of these systems (if any) should be on your evaluation shortlist, and how do they compare to the CrowdStrike and SentinelOne AI features you may already be paying for?

MDASH at a Glance

Dimension	Microsoft MDASH	Anthropic Glasswing	OpenAI Daybreak	CrowdStrike Charlotte AI	SentinelOne Purple AI
Primary function	Offensive vuln discovery	Offensive vuln discovery	Offensive vuln discovery	SOC acceleration / threat hunting	SOC acceleration / investigation
Architecture	100+ multi-model agents	Single-model (Mythos)	Multi-model pipeline	Agentic SOAR module	NL threat investigation
CyberGym score	88.45%	83.1%	81.8% (GPT-5.5)	N/A	N/A
Production CVEs found	16 (May 2026 Patch Tuesday)	9 (disclosed April 2026)	Not disclosed	N/A — detection focus	N/A — detection focus
False positive rate	0% (StorageDrive test; 21/21 injected bugs found)	Not disclosed	Not disclosed	Varies by environment	Varies by environment
Availability	Limited private preview	Private beta	Waitlist	GA (add-on)	GA (Complete tier+)
Pricing	Not disclosed	Not disclosed	Not disclosed	~$8-9/endpoint/mo add-on	Included in Complete ($180/endpoint/yr)
Best for	Proactive code-level vuln hunting	Proactive code-level vuln hunting	Proactive code-level vuln hunting	Runtime threat detection + response	Runtime threat detection + response

Earned insight: The table above reveals the gap that most coverage misses: MDASH, Glasswing, and Daybreak are fundamentally different tools from Charlotte AI and Purple AI. The first three find vulnerabilities in source code before they’re exploited. The latter two detect and respond to attacks in production. Comparing MDASH to CrowdStrike is like comparing a building inspector to a fire department: both are important, and neither substitutes for the other. Enterprise SecOps teams need both capabilities, and the budget line items are separate.

How MDASH Actually Works

MDASH isn’t a single model that scans code. It’s a five-stage pipeline orchestrating specialized agents with distinct roles:

Stage 1 — Prepare: Ingests the target codebase, builds language-aware indices, analyzes past commits to draw attack surface maps and threat models. This is where domain knowledge enters the system — kernel calling conventions, IRP lock invariants, IPC trust boundaries.

Stage 2 — Scan: Specialized “auditor” agents examine candidate code paths, each trained on historical CVE patterns and their patches. MDASH runs more than 100 of these agents, constructed through deep research with past CVEs. Their results are ensembled into a single report.

Stage 3 — Validate: A second cohort of “debater” agents argues for and against each finding’s reachability and exploitability. Disagreement between models is itself a signal — when an auditor flags a suspect finding and the debater can’t refute it, the finding’s credibility increases.

Stage 4 — Dedup: Collapses semantically equivalent findings using patch-based grouping to prevent the same root cause from appearing as multiple reports.

Stage 5 — Prove: Constructs and executes triggering inputs to dynamically validate the vulnerability exists. For example, it generates ASan-triggering inputs for C/C++ memory bugs.
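The back half of the pipeline (validate, dedup, prove) can be sketched as a chain of small filters. This is a minimal illustration, not Microsoft’s implementation: the `Finding` type, the function names, the majority-vote rule, the `root_cause` key standing in for patch-based grouping, and the file names are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical record for one candidate vulnerability."""
    location: str       # made-up file and function where the issue was flagged
    root_cause: str     # stand-in for the patch that would fix it
    votes_for: int      # debater agents arguing the bug is reachable
    votes_against: int  # debater agents arguing it is not


def validate(findings):
    """Stage 3 (toy rule): keep findings the debaters could not refute."""
    return [f for f in findings if f.votes_for > f.votes_against]


def dedup(findings):
    """Stage 4: collapse findings that share a root cause."""
    by_cause = {}
    for f in findings:
        by_cause.setdefault(f.root_cause, f)  # keep the first report per cause
    return list(by_cause.values())


def prove(findings, triggers):
    """Stage 5: keep only findings whose triggering input actually fired."""
    return [f for f in findings if triggers.get(f.root_cause, False)]


# Stages 1 and 2 (prepare, scan) are represented by a hard-coded scan result.
scan_output = [
    Finding("ike.c:parse_payload", "double-free-A", votes_for=3, votes_against=1),
    Finding("ike.c:cleanup", "double-free-A", votes_for=2, votes_against=1),
    Finding("dns.c:resolve", "oob-read-B", votes_for=1, votes_against=4),
    Finding("tcp.c:close_path", "uaf-C", votes_for=4, votes_against=0),
]
# Hypothetical proof results: did a generated input trigger the bug?
trigger_results = {"double-free-A": True, "uaf-C": False}

report = prove(dedup(validate(scan_output)), trigger_results)
for f in report:
    print(f.location, f.root_cause)
```

In this toy run, four raw findings shrink to one proven report: one is refuted in debate, one is a duplicate of the same root cause, and one fails dynamic proof. That funnel shape is the point of the staged design.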

What Actually Saves Time

The architecture’s key advantage is model agnosticism. MDASH runs a configurable panel: state-of-the-art models as heavy reasoners, distilled models as cost-effective debaters for high-volume passes, and a second separate SOTA model as an independent counterpoint. When a new model generation lands, A/B testing it against the current panel is a configuration change — not a rewrite.
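To make the "configuration change, not a rewrite" claim concrete, here is a minimal sketch of what a swappable panel could look like. The `Panel` type, the role field names, and the model identifiers are hypothetical; Microsoft has not published MDASH’s configuration format.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Panel:
    """Hypothetical panel configuration; roles follow the description above."""
    heavy_reasoner: str
    debater: str
    counterpoint: str


def ab_variants(current, role, candidate):
    """Return (control, treatment) panels that differ in exactly one role."""
    return current, replace(current, **{role: candidate})


# Placeholder model names, not real product identifiers.
current = Panel(
    heavy_reasoner="sota-model-a",
    debater="distilled-model-b",
    counterpoint="sota-model-c",
)
control, treatment = ab_variants(current, "heavy_reasoner", "sota-model-d")
print(control)
print(treatment)
```

Because the panel is plain data, testing a new model generation means running the same pipeline twice with two panel values and comparing the reports.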

The results validate this approach. Against StorageDrive (a private Microsoft test driver never included in any model’s training data), MDASH found all 21 deliberately injected vulnerabilities with zero false positives. Against real Windows production code, it achieved 96% recall against five years of confirmed MSRC cases in clfs.sys and 100% in tcpip.sys.
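It helps to keep the two metrics in this paragraph apart: "zero false positives" is a statement about precision, while "96% recall" is a statement about coverage. A short sketch using the standard definitions, with the StorageDrive counts taken from the text above:

```python
def precision(true_pos, false_pos):
    """Fraction of reported findings that are real."""
    reported = true_pos + false_pos
    return true_pos / reported if reported else 0.0


def recall(true_pos, false_neg):
    """Fraction of real vulnerabilities that were found."""
    actual = true_pos + false_neg
    return true_pos / actual if actual else 0.0


# StorageDrive: 21 injected bugs, 21 found, 0 false positives.
print(precision(true_pos=21, false_pos=0))
print(recall(true_pos=21, false_neg=0))
```

A 96% recall figure corresponds to a 4% miss rate, or about one known case in every 25. It says nothing by itself about how many spurious findings came along with the real ones.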

The 16 CVEs discovered span critical Windows infrastructure: the kernel TCP/IP stack (tcpip.sys), IKEv2 service (ikeext.dll), HTTP.sys, Netlogon, DNS resolution (dnsapi.dll), and even the Telnet client. Ten are kernel-mode, six are user-mode. The majority are remotely exploitable without authentication.

Where the Plugin Investment Becomes a Blocker

MDASH is currently limited to Microsoft’s own codebases. The private preview is expected to open to enterprise customers in June 2026, but initial access will likely be restricted to organizations running significant proprietary C/C++ codebases — not your average SaaS shop scanning Python microservices.

The problem is the domain plugin requirement. The CLFS proving plugin Microsoft describes — one that knows how to construct triggering log files for a specific bug class — is custom engineering work. Extending MDASH to new codebases, languages, and vulnerability classes means building these plugins, which takes security engineering expertise most organizations don’t have in-house.

There’s also no public pricing. Microsoft hasn’t disclosed what the private preview or eventual GA product will cost. Given the compute intensity of running 100+ agents per scan across frontier models, expect enterprise-tier pricing — likely consumption-based, not a flat per-seat fee.

Warning: MDASH’s zero-false-positive result on StorageDrive (21 of 21 injected bugs found) is impressive but misleading if taken at face value. StorageDrive is a deliberately vulnerable test driver, which makes it a controlled environment. Real-world codebases with millions of lines, legacy patterns, and ambiguous intent will produce noisier results. Microsoft’s 96% recall figure on clfs.sys is more indicative of actual performance, and even that was measured against code Microsoft’s team already understands deeply. Recall also says nothing about false positives. Ask about real-world false positive rates before committing budget.

MDASH Strengths:

Multi-model architecture finds complex bugs that single-model and traditional SAST tools miss (concurrency, cross-file lifetime issues)
Zero false positives in controlled testing; 96-100% recall on real MSRC cases
Model-agnostic design means performance improves as frontier models advance — no vendor lock to one LLM
Proven at enterprise scale on one of the world’s most complex codebases (Windows kernel)
Pipeline stages (audit, debate, prove) provide structured validation, not just pattern matching

MDASH Weaknesses:

Not available to most enterprises yet (limited private preview, GA timing unclear)
No public pricing — expect significant cost for 100+ agent orchestration on frontier models
Requires custom domain plugins for new codebases and vulnerability classes
Currently proven only on C/C++ kernel-level code; applicability to managed languages, cloud-native, and web stacks is unproven
Produces vulnerability findings, not remediations — your team still needs to triage, patch, and deploy

The CyberGym Benchmark: Context Behind the Score

MDASH’s 88.45% score on the CyberGym benchmark — a public evaluation of 1,507 real-world vulnerability reproduction tasks across 188 open-source projects — leads the leaderboard by roughly five points. Here is how the field stacks up:

System	CyberGym Score	Notes
Microsoft MDASH	88.45%	Multi-model agentic harness, 100+ agents
Anthropic Claude Mythos Preview	83.1%	Single-model approach (Project Glasswing)
OpenAI GPT-5.5	81.8%	Multi-model pipeline (Daybreak)
OpenAI GPT-5.4	79.0%	Previous generation
Anthropic Claude Opus 4.7 Adaptive	73.1%	Non-Mythos variant
Zhipu AI GLM-5.1	68.7%	Chinese frontier model

Earned insight: The five-point gap between MDASH and Anthropic Mythos is significant, but the real takeaway is architectural: MDASH’s advantage comes from multi-model agent orchestration, not from having a better single model. Microsoft’s own blog post makes this explicit — “the durable advantage lies in the agentic system around the model rather than any single model itself.” This has a direct practical implication: as competing models improve (and they will), the orchestration layer is what sustains the lead. Enterprise security teams evaluating these tools should weight the system design over the current benchmark score.
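For readers who want the exact margins rather than "roughly five points," the gaps can be computed directly from the leaderboard table. The scores below are copied from that table; nothing else is assumed.

```python
scores = {
    "Microsoft MDASH": 88.45,
    "Anthropic Claude Mythos Preview": 83.1,
    "OpenAI GPT-5.5": 81.8,
    "OpenAI GPT-5.4": 79.0,
    "Anthropic Claude Opus 4.7 Adaptive": 73.1,
    "Zhipu AI GLM-5.1": 68.7,
}

leader = max(scores, key=scores.get)
# Gap in percentage points between the leader and every other entry.
gaps = {
    name: round(scores[leader] - score, 2)
    for name, score in scores.items()
    if name != leader
}
for name, gap in gaps.items():
    print(f"{name}: {gap:.2f} points behind {leader}")
```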

One important caveat: CyberGym scores are self-reported. The benchmark code is public, but independent verification of all scores hasn’t been conducted. This is standard for AI leaderboards but worth noting when procurement decisions are involved.

MDASH vs. the Agentic Security Landscape

Enterprise SecOps teams are now facing four distinct categories of AI security tooling, and conflating them leads to bad procurement decisions:

Category 1 — AI Vulnerability Discovery (Offensive): MDASH, Anthropic Glasswing, OpenAI Daybreak. They’re research and engineering tools, not SOC tools.

Category 2 — AI SOC Acceleration (Defensive): CrowdStrike Charlotte AI, SentinelOne Purple AI, Microsoft Security Copilot. These tools accelerate threat detection, investigation, and response in production environments.

Category 3 — AI Agent Governance: Microsoft Agent 365, which provides a control plane to observe, govern, and secure AI agents (including systems like MDASH) at enterprise scale.

Category 4 — Traditional SAST/DAST: Veracode, Checkmarx, Snyk. These still handle the vast majority of application security scanning. MDASH doesn’t replace them — it finds what they can’t.

How CrowdStrike and SentinelOne Compare (They Don’t, Really)

CrowdStrike’s Charlotte AI (Agentic SOAR, launched November 2025) automates SOC tasks: detection triage, malware analysis, and threat hunting via natural language. It’s a Falcon platform add-on priced at approximately $8-9 per endpoint per month on a credit-based model for agentic actions.

SentinelOne’s Purple AI provides natural-language threat investigation and is bundled into the Singularity Complete tier at $179.99/endpoint/year (list price; enterprise negotiated pricing ranges $135-153/endpoint/year for 200-2,000 endpoints). The Enterprise tier with Agentic AI SOC Analyst requires custom pricing.

Neither product does what MDASH does. Charlotte AI and Purple AI assume vulnerabilities already exist in production and help you detect exploitation. MDASH finds the vulnerabilities before they ship. The budget conversation should treat these as complementary line items, not alternatives.

Agent 365: The Governance Layer

Microsoft Agent 365, generally available since May 1, 2026, is the control plane for managing AI agents across the enterprise — including systems like MDASH. It extends Microsoft Entra, Purview, and Defender to provide centralized oversight, identity management, and access control for AI agents.

Pricing: $15/user/month standalone, or included in the new Microsoft 365 E7 suite at $99/user/month (bundling E5 + Copilot + Entra Suite + Agent 365). Verified May 14, 2026.

For organizations running MDASH or any other agentic AI system, Agent 365 addresses a governance gap that most adopters have not closed: per a Reddit-cited industry survey, 83% of organizations plan to deploy agentic AI, but only 29% feel adequately prepared to secure their AI agent deployments.

Tip: If your organization is already on Microsoft 365 E5 plus Copilot, do the math on E7. The $99/user/month bundle includes Agent 365 ($15 standalone) plus Entra Suite — the marginal cost may be lower than purchasing governance tooling separately. But only commit if you have AI agents to govern. Agent 365 without agentic workloads is an expensive shelf ornament.
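"Doing the math" here comes down to one subtraction. The sketch below uses only the two prices quoted in this article ($99 for E7, $15 for Agent 365 standalone); your current per-user spend on E5 plus Copilot is an input you must supply, and the 80.0 and 90.0 values are placeholders, not quoted prices.

```python
E7_PER_USER_MONTH = 99.0      # from the article
AGENT_365_STANDALONE = 15.0   # from the article


def e7_marginal_cost(current_spend_per_user_month):
    """Extra monthly cost per user of moving from current licences to E7."""
    return E7_PER_USER_MONTH - current_spend_per_user_month


def cheaper_option(current_spend_per_user_month):
    """Compare the E7 upgrade against adding Agent 365 standalone."""
    marginal = e7_marginal_cost(current_spend_per_user_month)
    if marginal < AGENT_365_STANDALONE:
        return "E7 bundle"
    return "Agent 365 standalone"


# Placeholder inputs only: substitute your negotiated E5 + Copilot price.
for spend in (80.0, 90.0):
    print(spend, e7_marginal_cost(spend), cheaper_option(spend))
```

This comparison looks at price alone. E7 also bundles Entra Suite, so if you would buy that separately anyway, add its cost to the standalone side before deciding.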

Pricing Reality

Solution	Pricing Model	Estimated Annual Cost (1,000 users/endpoints)	What You Get
Microsoft MDASH	Not disclosed (private preview)	Unknown — expect consumption-based, likely $100K+ annually	Proactive source-code vulnerability discovery
Microsoft Agent 365	$15/user/mo standalone	$180,000/yr	AI agent governance and observability
Microsoft 365 E7 (includes Agent 365)	$99/user/mo	$1,188,000/yr	Full E5 + Copilot + Entra + Agent 365 bundle
CrowdStrike Charlotte AI	~$8-9/endpoint/mo add-on (credit-based)	~$96,000-108,000/yr (on top of Falcon platform)	SOC automation, threat hunting
SentinelOne Purple AI	Included in Complete tier ($180/endpoint/yr list)	$135,000-180,000/yr	NL threat investigation, Agentic SOC Analyst (Enterprise tier)

The total cost of ownership for an enterprise SecOps stack that includes proactive vulnerability discovery AND runtime detection AND agent governance is significant. A mid-size enterprise (1,000 endpoints) running CrowdStrike or SentinelOne plus Agent 365 plus MDASH (when available) could easily exceed $400,000 annually before implementation costs, training, and the security engineering staff required to operationalize MDASH’s findings.
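The stack total is easy to reproduce from the table. This sketch uses only the article’s figures for 1,000 endpoints and treats the "$100K+" MDASH estimate as a floor; real MDASH pricing is undisclosed, so that number is a guess carried over from the table, not a quote.

```python
ENDPOINTS = 1_000


def annual(per_unit_month, units=ENDPOINTS):
    """Annual cost from a per-unit monthly price."""
    return per_unit_month * units * 12


agent_365 = annual(15)
charlotte_low, charlotte_high = annual(8), annual(9)
# SentinelOne is quoted per endpoint per year, so no monthly conversion.
purple_low, purple_high = 135 * ENDPOINTS, 180 * ENDPOINTS
mdash_floor = 100_000  # the article's "likely $100K+" estimate, not a price

crowdstrike_stack = (
    charlotte_low + agent_365 + mdash_floor,
    charlotte_high + agent_365 + mdash_floor,
)
sentinelone_stack = (
    purple_low + agent_365 + mdash_floor,
    purple_high + agent_365 + mdash_floor,
)
print("CrowdStrike add-on stack:", crowdstrike_stack)
print("SentinelOne stack:", sentinelone_stack)
```

On these figures the SentinelOne combination clears $400,000, while the CrowdStrike combination lands just under it before the underlying Falcon platform subscription, which the table does not include.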

Hidden cost to watch: MDASH findings create remediation work. Every vulnerability it discovers needs to be triaged, patched, tested, and deployed. If MDASH scans your codebase and produces 50 findings in a week, your engineering team needs capacity to handle them. The tool finds bugs — it doesn’t fix them.
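A quick way to sanity-check remediation capacity is to model the backlog week by week. The 50 findings per week comes from the example above; the 30 fixes per week is a placeholder for your own team’s throughput.

```python
def backlog_after(weeks, findings_per_week, fixes_per_week, start=0):
    """Open findings after a number of weeks, never dropping below zero."""
    backlog = start
    for _ in range(weeks):
        backlog = max(0, backlog + findings_per_week - fixes_per_week)
    return backlog


# 50 findings/week is the article's example; 30 fixes/week is a placeholder.
print(backlog_after(weeks=12, findings_per_week=50, fixes_per_week=30))
```

If the result grows without bound over a quarter, the bottleneck is remediation capacity rather than discovery, and that is the number to fix before turning on a tool like this.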

Pricing verified: May 14, 2026. Agent 365 pricing from Microsoft Product Terms. CrowdStrike and SentinelOne pricing from vendor pricing pages and enterprise negotiation estimates.

Who Should Pay Attention to MDASH

Good fit:

Organizations with large proprietary C/C++ or systems-level codebases (OS vendors, embedded systems, firmware)
Security teams that already have mature vulnerability management processes and can operationalize high-volume findings
Companies running the Microsoft security stack (Defender + Entra + Purview) where Agent 365 integration provides governance out of the box
Enterprises with dedicated offensive security or red team functions that would use MDASH to augment manual code review

Not a good fit:

SaaS companies writing primarily in Python, JavaScript, or Go — MDASH’s proven capability is on C/C++ kernel code, and applicability to managed-language stacks is unvalidated
Organizations without a mature vulnerability management pipeline — MDASH will overwhelm teams that can’t triage findings at volume
Small and mid-size businesses looking for general application security — traditional SAST/DAST tools (Veracode, Snyk, Checkmarx) cover this at a fraction of the expected cost
Teams looking for a SOC replacement — MDASH doesn’t detect runtime threats, respond to incidents, or replace your EDR platform

Reasons to Join the Private Preview:

First-mover advantage on AI vulnerability discovery for your proprietary codebase
Direct feedback channel to Microsoft’s Autonomous Code Security team during the preview
Governance integration with Agent 365 is built in from day one
Model-agnostic architecture means your investment carries forward as models improve

Reasons to Wait:

Pricing is unknown — preview commitments could lock you into unfavorable terms
The system is proven on Windows kernel code, not necessarily on your tech stack
Anthropic Glasswing and OpenAI Daybreak are competing — waiting 6 months lets you compare GA products
Your team may not have the security engineering capacity to operationalize findings at MDASH’s output rate

Bottom Line

Microsoft MDASH marks a clear inflection point: AI vulnerability discovery has moved from academic research to production-grade engineering. The 16 CVEs patched in May 2026, including four critical RCEs across the Windows TCP/IP stack, IKEv2 service, Netlogon, and DNS resolution, aren’t theoretical. They were real vulnerabilities, most of them remotely exploitable without authentication, that traditional tools didn’t catch.

Governance readiness is the gating factor. MDASH produces findings at a rate and complexity level that many security teams aren’t equipped to handle. Before signing up for the private preview, honestly assess whether your organization has the vulnerability management pipeline, the security engineering talent, and the remediation capacity to turn MDASH’s output into patched code. If the answer is no, that’s the work to do first.

For organizations with large proprietary codebases and mature security operations, MDASH should be on the evaluation shortlist alongside Anthropic Glasswing and OpenAI Daybreak. The architectural approach — multi-model agent orchestration where disagreement between models strengthens findings — is sound and likely to improve as frontier models advance. But the competitive landscape is moving fast: evaluate all three before committing.

This month’s action: Apply for the MDASH private preview now if you meet the fit criteria above — even if you don’t plan to deploy immediately. Preview access gives you a direct line to Microsoft’s Autonomous Code Security team, lets you pressure-test the false positive rate on your own codebase before GA pricing is set, and positions you to compare MDASH against Glasswing and Daybreak on equal footing when all three publish terms. If you don’t meet the fit criteria, spend the month auditing your vulnerability management pipeline — that’s the prerequisite MDASH will expose anyway.

Rating: 4.2 / 5 for large enterprises with mature SecOps and C/C++ codebases. 2.5 / 5 for typical mid-market SaaS companies — the proven capability doesn’t yet extend to their tech stacks.

FAQ

What is Microsoft MDASH and how does it differ from traditional SAST tools?

MDASH (Multi-Model Agentic Scanning Harness) is an AI system that orchestrates over 100 specialized agents to find exploitable vulnerabilities in source code. Unlike traditional SAST tools like Veracode or Checkmarx that rely on pattern matching and predefined rules, MDASH uses multi-step reasoning across frontier AI models. Its agents debate findings, validate reachability, and generate proof-of-concept exploits. In testing, it found complex bugs — like a use-after-free requiring reasoning across three concurrent free paths in tcpip.sys — that static analysis consistently misses. Traditional SAST tools still cover breadth; MDASH covers depth.

How many CVEs has MDASH discovered so far?

MDASH has been credited with discovering 16 CVEs patched in the May 2026 Patch Tuesday release. Four were rated Critical: CVE-2026-33827 (tcpip.sys RCE), CVE-2026-33824 (ikeext.dll RCE), CVE-2026-41089 (Netlogon RCE), and CVE-2026-41096 (dnsapi.dll RCE). All 16 affected core Windows networking and authentication components, with the majority remotely exploitable without authentication. The system also found all 21 deliberately planted bugs in a private test driver with zero false positives.

Can enterprise customers use MDASH on their own codebases?

Not yet at scale. MDASH is currently in limited private preview, with enterprise access expected to expand in June 2026. Initial access will likely be restricted to organizations with significant proprietary codebases. The system requires domain-specific plugins — custom engineering that maps your code’s conventions, frameworks, and vulnerability patterns into MDASH’s pipeline. Microsoft hasn’t disclosed whether the preview includes plugin-building support or if customers need to develop their own.

How does MDASH compare to Anthropic Glasswing and OpenAI Daybreak?

All three target AI-powered vulnerability discovery, but the architectures differ. MDASH uses 100+ specialized multi-model agents in a structured pipeline (audit, debate, prove). Anthropic Glasswing is built around its Mythos model in a more focused single-model approach. OpenAI Daybreak uses a multi-model pipeline. On CyberGym, MDASH leads at 88.45%, Glasswing (Mythos) scores 83.1%, and Daybreak (GPT-5.5) scores 81.8%. All three are pre-GA, so direct feature comparisons are premature. The key differentiator today is MDASH’s proven production results (16 real CVEs vs. 9 disclosed from Glasswing and none disclosed from Daybreak).

Does MDASH replace CrowdStrike or SentinelOne?

No. MDASH and CrowdStrike/SentinelOne solve fundamentally different problems. MDASH finds vulnerabilities in source code before software ships; it’s an offensive, proactive tool. CrowdStrike Charlotte AI and SentinelOne Purple AI detect and respond to threats in production environments; they’re defensive, reactive tools. An enterprise SecOps stack needs both capabilities. Budget them as separate line items: vulnerability discovery (MDASH/Glasswing/Daybreak) and runtime detection/response (CrowdStrike/SentinelOne).

What does Microsoft Agent 365 have to do with MDASH?

Agent 365 is Microsoft’s control plane for governing AI agents at enterprise scale. It provides identity management, access control, and observability for any AI agent — including MDASH. Agent 365 became generally available on May 1, 2026 at $15/user/month standalone or included in the Microsoft 365 E7 bundle ($99/user/month). For organizations running MDASH, Agent 365 handles the governance question: who authorized which agent to scan which codebase, what findings were produced, and who acted on them. It’s relevant but separate from MDASH itself.

How much will MDASH cost for enterprise deployment?

Microsoft hasn’t disclosed MDASH pricing. Given the system orchestrates 100+ agents across frontier AI models per scan, expect consumption-based pricing at enterprise tier — likely in the six-figure annual range for full deployment. For comparison, CrowdStrike Charlotte AI adds approximately $96,000-108,000/year for 1,000 endpoints, and SentinelOne Complete with Purple AI runs $135,000-180,000/year at the same scale. MDASH solves a different problem (proactive discovery vs. runtime detection), so the budget impact is additive, not substitutive.

Should my SecOps team sign up for the MDASH private preview?

If your organization maintains a large proprietary codebase (especially C/C++ or systems-level code), has a mature vulnerability management pipeline, and already runs the Microsoft security stack, applying for the preview makes sense. You’ll get early access and a direct feedback channel to Microsoft’s Autonomous Code Security team. If your codebase is primarily in managed languages (Python, JavaScript, Go), your SAST/DAST pipeline is immature, or your team lacks capacity to handle high-volume vulnerability findings, wait for GA and evaluate alongside Glasswing and Daybreak when all three have published pricing and proven capability on diverse tech stacks.
