Microsoft MDASH: What the New Agentic Security System Means for Enterprise SecOps Teams

Microsoft's MDASH uses 100+ AI agents to find exploitable vulnerabilities at scale. Here's what enterprise SecOps teams need to know about the system, its limits, and how it compares.


TLDR: Microsoft’s MDASH (Multi-Model Agentic Scanning Harness) is the first AI vulnerability discovery system to produce production-grade results at enterprise scale — 16 real Windows CVEs patched in May 2026, zero false positives in controlled testing, and the top CyberGym benchmark score at 88.45%. It is not a replacement for your SAST/DAST pipeline or your SOC platform. It is a specialized offensive research tool that finds the hard bugs — kernel race conditions, cross-file double-frees — that traditional scanners miss. Enterprise SecOps teams should watch for the private preview opening in June 2026, but the immediate action item is evaluating whether your organization has the governance maturity to operationalize machine-generated vulnerability findings at the rate MDASH produces them.

Why This Review Matters Now

On May 12, 2026, Microsoft announced that an internal AI system codenamed MDASH had discovered 16 previously unknown vulnerabilities across the Windows networking and authentication stack — including four critical remote code execution flaws — all patched in that week’s Patch Tuesday release. The system orchestrates over 100 specialized AI agents across multiple frontier and distilled models.

This matters for enterprise SecOps for a specific reason: MDASH is not a research demo. It found CVE-2026-33824, a double-free in the IKEv2 service exploitable for pre-authentication remote code execution with LocalSystem privileges. It found CVE-2026-33827, a use-after-free in the TCP/IP stack requiring reasoning across three concurrent free paths. These are bugs that static analysis tools and single-model AI approaches consistently miss because they require multi-step reasoning about object lifetimes, concurrency, and cross-file control flow.

The timing also matters because MDASH enters a suddenly crowded field. Anthropic launched Project Glasswing and OpenAI launched Daybreak in the same quarter — all targeting AI-powered vulnerability discovery. Enterprise security architects now face a real question: which of these systems (if any) should be on your evaluation shortlist, and how do they compare to the CrowdStrike and SentinelOne AI features you may already be paying for?

MDASH at a Glance

| Dimension | Microsoft MDASH | Anthropic Glasswing | OpenAI Daybreak | CrowdStrike Charlotte AI | SentinelOne Purple AI |
|---|---|---|---|---|---|
| Primary function | Offensive vuln discovery | Offensive vuln discovery | Offensive vuln discovery | SOC acceleration / threat hunting | SOC acceleration / investigation |
| Architecture | 100+ multi-model agents | Single-model (Mythos) | Multi-model pipeline | Agentic SOAR module | NL threat investigation |
| CyberGym score | 88.45% | 83.1% | 81.8% (GPT-5.5) | N/A | N/A |
| Production CVEs found | 16 (May 2026 Patch Tuesday) | 9 (disclosed April 2026) | Not disclosed | N/A (detection focus) | N/A (detection focus) |
| False positive rate | 0% (21/21 test) | Not disclosed | Not disclosed | Varies by environment | Varies by environment |
| Availability | Limited private preview | Private beta | Waitlist | GA (add-on) | GA (Complete tier+) |
| Pricing | Not disclosed | Not disclosed | Not disclosed | ~$8-9/endpoint/mo add-on | Included in Complete ($180/endpoint/yr) |
| Best for | Proactive code-level vuln hunting | Proactive code-level vuln hunting | Proactive code-level vuln hunting | Runtime threat detection + response | Runtime threat detection + response |

Earned insight: The table above reveals the gap that most coverage misses: MDASH, Glasswing, and Daybreak are fundamentally different tools from Charlotte AI and Purple AI. The first three find vulnerabilities in source code before they’re exploited. The latter two detect and respond to attacks in production. Comparing MDASH to CrowdStrike is like comparing a building inspector to a fire department: both are important, and neither substitutes for the other. Enterprise SecOps teams need both capabilities, and the budget line items are separate.

How MDASH Actually Works

MDASH is not a single model that scans code. It is a five-stage pipeline orchestrating specialized agents with distinct roles:

Stage 1 — Prepare: Ingests the target codebase, builds language-aware indices, analyzes past commits to draw attack surface maps and threat models. This is where domain knowledge enters the system — kernel calling conventions, IRP lock invariants, IPC trust boundaries.

Stage 2 — Scan: Specialized “auditor” agents examine candidate code paths, each trained on historical CVE patterns and their patches. MDASH runs more than 100 of these agents, constructed through deep research with past CVEs. Their results are ensembled into a single report.

Stage 3 — Validate: A second cohort of “debater” agents argues for and against each finding’s reachability and exploitability. Disagreement between models is itself a signal — when an auditor flags a suspect finding and the debater cannot refute it, the finding’s credibility increases.

Stage 4 — Dedup: Collapses semantically equivalent findings using patch-based grouping to prevent the same root cause from appearing as multiple reports.

Stage 5 — Prove: Constructs and executes triggering inputs to dynamically validate the vulnerability exists. For example, it generates ASan-triggering inputs for C/C++ memory bugs.
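The five stages above can be sketched as a chain of small functions. This is only an illustration of the control flow, not Microsoft's implementation: the `Finding` fields, the majority-vote rule in `validate`, and the toy auditors and debaters are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One candidate vulnerability. All fields are illustrative."""
    location: str           # e.g. a file or function name
    bug_class: str          # e.g. "double-free"
    patch_id: str           # identifier of the fix that would resolve it
    votes_for: int = 0      # debaters who could not refute the finding
    votes_against: int = 0  # debaters who refuted it


def prepare(codebase: dict[str, str]) -> list[str]:
    """Stage 1: index the code and return candidate paths (here: every file)."""
    return sorted(codebase)


def scan(paths, auditors) -> list[Finding]:
    """Stage 2: every auditor inspects every path; results are pooled."""
    return [f for path in paths for audit in auditors for f in audit(path)]


def validate(findings, debaters, min_margin: int = 1) -> list[Finding]:
    """Stage 3: keep findings whose support outweighs refutations (assumed rule)."""
    kept = []
    for f in findings:
        verdicts = [debate(f) for debate in debaters]  # True = could not refute
        support = sum(verdicts)
        against = len(verdicts) - support
        if support - against >= min_margin:
            kept.append(Finding(f.location, f.bug_class, f.patch_id, support, against))
    return kept


def dedup(findings) -> list[Finding]:
    """Stage 4: collapse findings that share a root-cause patch."""
    by_patch = {}
    for f in findings:
        by_patch.setdefault(f.patch_id, f)  # keep the first report per patch
    return list(by_patch.values())


def prove(findings, trigger) -> list[Finding]:
    """Stage 5: keep only findings that a concrete input can trigger."""
    return [f for f in findings if trigger(f)]


def run_pipeline(codebase, auditors, debaters, trigger) -> list[Finding]:
    return prove(dedup(validate(scan(prepare(codebase), auditors), debaters)), trigger)


# Toy inputs: two auditors report the same root cause; one weak lead is refuted.
codebase = {"ikeext.c": "...", "tcpip.c": "..."}


def auditor_a(path):
    return [Finding(path, "double-free", "patch-1")] if path == "ikeext.c" else []


def auditor_b(path):
    if path == "ikeext.c":
        return [Finding(path + ":alt", "double-free", "patch-1")]
    return [Finding(path, "null-deref", "patch-2")]


debaters = [
    lambda f: f.bug_class == "double-free",
    lambda f: True,
    lambda f: f.patch_id != "patch-2",
]

confirmed = run_pipeline(codebase, [auditor_a, auditor_b], debaters, lambda f: True)
print(confirmed)
```

In this toy run, three raw findings shrink to one confirmed report: the null-dereference lead loses the debate, and the two double-free reports collapse because they share a patch.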

What Works

The architecture’s key advantage is model agnosticism. MDASH runs a configurable panel: state-of-the-art models as heavy reasoners, distilled models as cost-effective debaters for high-volume passes, and a second separate SOTA model as an independent counterpoint. When a new model generation lands, A/B testing it against the current panel is a configuration change — not a rewrite.
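To make "a configuration change, not a rewrite" concrete, here is a minimal sketch of a swappable model panel. The `Panel` class, its role names, and the model identifiers are hypothetical; Microsoft has not published MDASH's configuration format.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Panel:
    heavy_reasoner: str  # SOTA model used for deep reasoning
    debater: str         # distilled model for high-volume passes
    counterpoint: str    # second, independent SOTA model


def ab_variants(current: Panel, role: str, candidate: str) -> dict[str, Panel]:
    """Return the control panel and a variant with exactly one role swapped."""
    if role not in Panel.__dataclass_fields__:
        raise ValueError(f"unknown role: {role}")
    return {"control": current, "variant": replace(current, **{role: candidate})}


# Placeholder model names, not real products.
current = Panel("model-a", "model-a-distilled", "model-b")
arms = ab_variants(current, "heavy_reasoner", "model-c")
print(arms["variant"])
```

Because the panel is plain data, trialing a new model touches one field while the rest of the pipeline code stays unchanged.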

The results validate this approach. Against StorageDrive (a private Microsoft test driver never included in any model’s training data), MDASH found all 21 deliberately injected vulnerabilities with zero false positives. Against real Windows production code, it achieved 96% recall against five years of confirmed MSRC cases in clfs.sys and 100% in tcpip.sys.

The 16 CVEs discovered span critical Windows infrastructure: the kernel TCP/IP stack (tcpip.sys), IKEv2 service (ikeext.dll), HTTP.sys, Netlogon, DNS resolution (dnsapi.dll), and even the Telnet client. Ten are kernel-mode, six are user-mode. The majority are remotely exploitable without authentication.

Where It Struggles

MDASH has so far been proven only on Microsoft’s own codebases. The private preview is expected to open to enterprise customers in June 2026, but initial access will likely be restricted to organizations running significant proprietary C/C++ codebases, not the average SaaS shop scanning Python microservices.

The system requires substantial domain plugin investment. The CLFS proving plugin Microsoft describes — one that knows how to construct triggering log files for a specific bug class — is custom engineering work. Extending MDASH to new codebases, languages, and vulnerability classes means building these plugins, which takes security engineering expertise most organizations do not have in-house.

There is also no public pricing. Microsoft has not disclosed what the private preview or eventual GA product will cost. Given the compute intensity of running 100+ agents per scan across frontier models, expect enterprise-tier pricing — likely consumption-based, not a flat per-seat fee.

Warning: MDASH’s zero-false-positive result on StorageDrive (21 of 21 injected bugs found, no spurious reports) is impressive but misleading if taken at face value. StorageDrive is a deliberately vulnerable test driver, which makes it a controlled environment. Real-world codebases with millions of lines, legacy patterns, and ambiguous intent will produce noisier results. Microsoft’s 96% recall figure on clfs.sys is more indicative of actual performance, and even that was against code Microsoft’s team already understands deeply. Ask about real-world false positive rates before committing budget.
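One way to quantify the warning above: a clean result on a small sample still leaves a wide margin of uncertainty. The sketch below computes the exact one-sided upper bound on a rate when zero failures are observed. It assumes the 21 findings behave like independent trials, which is a simplification.

```python
def zero_failure_upper_bound(n_trials: int, confidence: float = 0.95) -> float:
    """Upper bound on a failure rate when 0 failures occur in n trials.

    Solves (1 - p) ** n = 1 - confidence for p.
    """
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)


bound = zero_failure_upper_bound(21)
print(f"{bound:.1%}")  # prints 13.3%
```

Under that assumption, 21 clean findings are statistically consistent with a true false-positive rate as high as roughly 13%, which is why real-world numbers matter more than the headline 0%.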

MDASH Strengths:

  • Multi-model architecture finds complex bugs that single-model and traditional SAST tools miss (concurrency, cross-file lifetime issues)
  • Zero false positives in controlled testing; 96-100% recall on real MSRC cases
  • Model-agnostic design means performance improves as frontier models advance — no vendor lock to one LLM
  • Proven at enterprise scale on one of the world’s most complex codebases (Windows kernel)
  • Pipeline stages (audit, debate, prove) provide structured validation, not just pattern matching

MDASH Weaknesses:

  • Not available to most enterprises yet (limited private preview, GA timing unclear)
  • No public pricing — expect significant cost for 100+ agent orchestration on frontier models
  • Requires custom domain plugins for new codebases and vulnerability classes
  • Currently proven only on C/C++ kernel-level code; applicability to managed languages, cloud-native, and web stacks is unproven
  • Produces vulnerability findings, not remediations — your team still needs to triage, patch, and deploy

The CyberGym Benchmark: Context Behind the Score

MDASH’s 88.45% score on the CyberGym benchmark — a public evaluation of 1,507 real-world vulnerability reproduction tasks across 188 open-source projects — leads the leaderboard by roughly five points. Here is how the field stacks up:

| System | CyberGym Score | Notes |
|---|---|---|
| Microsoft MDASH | 88.45% | Multi-model agentic harness, 100+ agents |
| Anthropic Claude Mythos Preview | 83.1% | Single-model approach (Project Glasswing) |
| OpenAI GPT-5.5 | 81.8% | Multi-model pipeline (Daybreak) |
| OpenAI GPT-5.4 | 79.0% | Previous generation |
| Anthropic Claude Opus 4.7 Adaptive | 73.1% | Non-Mythos variant |
| Zhipu AI GLM-5.1 | 68.7% | Chinese frontier model |

Earned insight: The five-point gap between MDASH and Anthropic Mythos is significant, but the real takeaway is architectural: MDASH’s advantage comes from multi-model agent orchestration, not from having a better single model. Microsoft’s own blog post makes this explicit — “the durable advantage lies in the agentic system around the model rather than any single model itself.” This has a direct practical implication: as competing models improve (and they will), the orchestration layer is what sustains the lead. Enterprise security teams evaluating these tools should weight the system design over the current benchmark score.
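For readers who want the exact margins behind "roughly five points", this short snippet computes each system's gap to the leader from the scores listed in this section. Nothing here is new data.

```python
# CyberGym scores as reported in this article (percent).
scores = {
    "Microsoft MDASH": 88.45,
    "Anthropic Claude Mythos Preview": 83.1,
    "OpenAI GPT-5.5": 81.8,
    "OpenAI GPT-5.4": 79.0,
    "Anthropic Claude Opus 4.7 Adaptive": 73.1,
    "Zhipu AI GLM-5.1": 68.7,
}

leader = max(scores, key=scores.get)
gaps = {
    name: round(scores[leader] - score, 2)
    for name, score in scores.items()
    if name != leader
}

for name, gap in gaps.items():
    print(f"{name}: {gap:.2f} points behind {leader}")
```

The gap to Mythos works out to 5.35 points, and the gap to GPT-5.5 to 6.65 points.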

One important caveat: CyberGym scores are self-reported. The benchmark code is public, but independent verification of all scores has not been conducted. This is standard for AI leaderboards but worth noting when procurement decisions are involved.

MDASH vs. the Agentic Security Landscape

Enterprise SecOps teams are now facing four distinct categories of AI security tooling, and conflating them leads to bad procurement decisions:

Category 1 — AI Vulnerability Discovery (Offensive): MDASH, Anthropic Glasswing, OpenAI Daybreak. These systems find bugs in source code proactively. They are research and engineering tools, not SOC tools.

Category 2 — AI SOC Acceleration (Defensive): CrowdStrike Charlotte AI, SentinelOne Purple AI, Microsoft Security Copilot. These tools accelerate threat detection, investigation, and response in production environments.

Category 3 — AI Agent Governance: Microsoft Agent 365, which provides a control plane to observe, govern, and secure AI agents (including systems like MDASH) at enterprise scale.

Category 4 — Traditional SAST/DAST: Veracode, Checkmarx, Snyk. These still handle the vast majority of application security scanning. MDASH does not replace them — it finds what they cannot.

How CrowdStrike and SentinelOne Compare (They Don’t, Really)

CrowdStrike’s Charlotte AI (Agentic SOAR, launched November 2025) automates SOC tasks: detection triage, malware analysis, and threat hunting via natural language. It is a Falcon platform add-on priced at approximately $8-9 per endpoint per month on a credit-based model for agentic actions.

SentinelOne’s Purple AI provides natural-language threat investigation and is bundled into the Singularity Complete tier at $179.99/endpoint/year (list price; enterprise negotiated pricing ranges $135-153/endpoint/year for 200-2,000 endpoints). The Enterprise tier with Agentic AI SOC Analyst requires custom pricing.

Neither product does what MDASH does. Charlotte AI and Purple AI assume vulnerabilities already exist in production and help you detect exploitation. MDASH finds the vulnerabilities before they ship. The budget conversation should treat these as complementary line items, not alternatives.

Agent 365: The Governance Layer

Microsoft Agent 365, generally available since May 1, 2026, is the control plane for managing AI agents across the enterprise — including systems like MDASH. It extends Microsoft Entra, Purview, and Defender to provide centralized oversight, identity management, and access control for AI agents.

Pricing: $15/user/month standalone, or included in the new Microsoft 365 E7 suite at $99/user/month (bundling E5 + Copilot + Entra Suite + Agent 365). Verified May 14, 2026.

For organizations running MDASH or any other agentic AI system, Agent 365 addresses a widespread governance gap: per a Reddit-cited industry survey, 83% of organizations plan to deploy agentic AI, but only 29% feel adequately prepared to secure those deployments.

Tip: If your organization is already on Microsoft 365 E5 plus Copilot, do the math on E7. The $99/user/month bundle includes Agent 365 ($15 standalone) plus Entra Suite — the marginal cost may be lower than purchasing governance tooling separately. But only commit if you have AI agents to govern. Agent 365 without agentic workloads is an expensive shelf ornament.

Pricing Reality

| Solution | Pricing Model | Estimated Annual Cost (1,000 users/endpoints) | What You Get |
|---|---|---|---|
| Microsoft MDASH | Not disclosed (private preview) | Unknown; expect consumption-based, likely $100K+ annually | Proactive source-code vulnerability discovery |
| Microsoft Agent 365 | $15/user/mo standalone | $180,000/yr | AI agent governance and observability |
| Microsoft 365 E7 (includes Agent 365) | $99/user/mo | $1,188,000/yr | Full E5 + Copilot + Entra + Agent 365 bundle |
| CrowdStrike Charlotte AI | ~$8-9/endpoint/mo add-on (credit-based) | ~$96,000-108,000/yr (on top of Falcon platform) | SOC automation, threat hunting |
| SentinelOne Purple AI | Included in Complete tier ($180/endpoint/yr list) | $135,000-180,000/yr | NL threat investigation, Agentic SOC Analyst (Enterprise tier) |

The total cost of ownership for an enterprise SecOps stack that includes proactive vulnerability discovery AND runtime detection AND agent governance is significant. A mid-size enterprise (1,000 endpoints) running CrowdStrike or SentinelOne plus Agent 365 plus MDASH (when available) could easily exceed $400,000 annually before implementation costs, training, and the security engineering staff required to operationalize MDASH’s findings.
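The arithmetic behind that estimate can be checked directly from the figures in the pricing table. The one number that is not a published price is the MDASH floor of $100,000, which is this article's guess and is marked as an assumption below.

```python
ENDPOINTS = 1_000

# Annual figures in USD for 1,000 users/endpoints, taken from the pricing table.
agent_365 = 15 * 12 * ENDPOINTS   # $15/user/month standalone
charlotte_ai = (96_000, 108_000)  # add-on only; Falcon platform cost excluded
purple_ai = (135_000, 180_000)    # Complete tier, negotiated low end to list
mdash_floor = 100_000             # ASSUMPTION: the article's "$100K+" estimate


def stack_range(runtime_tool: tuple[int, int]) -> tuple[int, int]:
    """Low/high annual total for runtime tool + Agent 365 + assumed MDASH floor."""
    low, high = runtime_tool
    return (low + agent_365 + mdash_floor, high + agent_365 + mdash_floor)


crowdstrike_stack = stack_range(charlotte_ai)
sentinelone_stack = stack_range(purple_ai)
print("CrowdStrike stack:", crowdstrike_stack)
print("SentinelOne stack:", sentinelone_stack)
```

The SentinelOne combination lands at $415,000 to $460,000. The CrowdStrike combination comes to $376,000 to $388,000 before the underlying Falcon platform subscription, which this table does not price, so it crosses $400,000 once that base cost is included.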

Hidden cost to watch: MDASH findings create remediation work. Every vulnerability it discovers needs to be triaged, patched, tested, and deployed. If MDASH scans your codebase and produces 50 findings in a week, your engineering team needs capacity to handle them. The tool finds bugs — it does not fix them.
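A rough backlog model shows how quickly this compounds. The 50 findings per week comes from the scenario above; the triage capacity of 35 per week is a made-up number for illustration, so substitute your own team's throughput.

```python
def backlog_after(weeks: int, findings_per_week: int,
                  triaged_per_week: int, start: int = 0) -> int:
    """Open findings after `weeks`, given steady inflow and triage capacity."""
    backlog = start
    for _ in range(weeks):
        # Backlog cannot go negative when capacity exceeds inflow.
        backlog = max(0, backlog + findings_per_week - triaged_per_week)
    return backlog


# ASSUMPTION: the team can fully triage and patch 35 findings per week.
quarter_backlog = backlog_after(weeks=13, findings_per_week=50, triaged_per_week=35)
print(quarter_backlog)  # prints 195
```

With a shortfall of just 15 findings per week, a team would carry 195 unresolved findings after one quarter.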

Pricing verified: May 14, 2026. Agent 365 pricing from Microsoft Product Terms. CrowdStrike and SentinelOne pricing from vendor pricing pages and enterprise negotiation estimates.

Who Should Pay Attention to MDASH

Good fit:

  • Organizations with large proprietary C/C++ or systems-level codebases (OS vendors, embedded systems, firmware)
  • Security teams that already have mature vulnerability management processes and can operationalize high-volume findings
  • Companies running the Microsoft security stack (Defender + Entra + Purview) where Agent 365 integration provides governance out of the box
  • Enterprises with dedicated offensive security or red team functions that would use MDASH to augment manual code review

Not a good fit:

  • SaaS companies writing primarily in Python, JavaScript, or Go — MDASH’s proven capability is on C/C++ kernel code, and applicability to managed-language stacks is unvalidated
  • Organizations without a mature vulnerability management pipeline — MDASH will overwhelm teams that cannot triage findings at volume
  • Small and mid-size businesses looking for general application security — traditional SAST/DAST tools (Veracode, Snyk, Checkmarx) cover this at a fraction of the expected cost
  • Teams looking for a SOC replacement — MDASH does not detect runtime threats, respond to incidents, or replace your EDR platform

Reasons to Join the Private Preview:

  • First-mover advantage on AI vulnerability discovery for your proprietary codebase
  • Direct feedback channel to Microsoft’s Autonomous Code Security team during the preview
  • Governance integration with Agent 365 is built in from day one
  • Model-agnostic architecture means your investment carries forward as models improve

Reasons to Wait:

  • Pricing is unknown — preview commitments could lock you into unfavorable terms
  • The system is proven on Windows kernel code, not necessarily on your tech stack
  • Anthropic Glasswing and OpenAI Daybreak are competing — waiting 6 months lets you compare GA products
  • Your team may not have the security engineering capacity to operationalize findings at MDASH’s output rate

Bottom Line

Microsoft MDASH represents a genuine inflection point: AI vulnerability discovery has moved from academic research to production-grade engineering that finds critical, exploitable bugs in one of the most complex codebases on Earth. The 16 CVEs patched in May 2026, including four critical RCEs spanning the TCP/IP stack, the IKEv2 service, Netlogon, and the DNS client, are not theoretical. They were real vulnerabilities, most of them remotely exploitable without authentication, that traditional tools did not catch.

The single biggest prerequisite for enterprise adoption is governance maturity. MDASH produces findings at a rate and complexity level that many security teams are not equipped to handle. Before signing up for the private preview, make an honest assessment: does your organization have the vulnerability management pipeline, the security engineering talent, and the remediation capacity to turn MDASH’s output into patched code? If the answer is no, investing in Agent 365 and maturing your governance processes is the more urgent spend.

For organizations with large proprietary codebases and mature security operations, MDASH should be on the evaluation shortlist alongside Anthropic Glasswing and OpenAI Daybreak. The architectural approach — multi-model agent orchestration where disagreement between models strengthens findings — is sound and likely to improve as frontier models advance. But the competitive landscape is moving fast: evaluate all three before committing.

Rating: 4.2 / 5 for large enterprises with mature SecOps and C/C++ codebases. 2.5 / 5 for typical mid-market SaaS companies — the proven capability does not yet extend to their tech stacks.

FAQ

What is Microsoft MDASH and how does it differ from traditional SAST tools?

MDASH (Multi-Model Agentic Scanning Harness) is an AI system that orchestrates over 100 specialized agents to find exploitable vulnerabilities in source code. Unlike traditional SAST tools like Veracode or Checkmarx that rely on pattern matching and predefined rules, MDASH uses multi-step reasoning across frontier AI models. Its agents debate findings, validate reachability, and generate proof-of-concept exploits. In testing, it found complex bugs — like a use-after-free requiring reasoning across three concurrent free paths in tcpip.sys — that static analysis consistently misses. Traditional SAST tools still cover breadth; MDASH covers depth.

How many CVEs has MDASH discovered so far?

MDASH has been credited with discovering 16 CVEs patched in the May 2026 Patch Tuesday release. Four were rated Critical: CVE-2026-33827 (tcpip.sys RCE), CVE-2026-33824 (ikeext.dll RCE), CVE-2026-41089 (Netlogon RCE), and CVE-2026-41096 (dnsapi.dll RCE). All 16 affected core Windows networking and authentication components, with the majority remotely exploitable without authentication. The system also found all 21 deliberately planted bugs in a private test driver with zero false positives.

Can enterprise customers use MDASH on their own codebases?

Not yet at scale. MDASH is currently in limited private preview, with enterprise access expected to expand in June 2026. Initial access will likely be restricted to organizations with significant proprietary codebases. The system requires domain-specific plugins — custom engineering that maps your code’s conventions, frameworks, and vulnerability patterns into MDASH’s pipeline. Microsoft has not disclosed whether the preview includes plugin-building support or if customers need to develop their own.

How does MDASH compare to Anthropic Glasswing and OpenAI Daybreak?

All three target AI-powered vulnerability discovery, but the architectures differ. MDASH uses 100+ specialized multi-model agents in a structured pipeline (audit, debate, prove). Anthropic Glasswing is built around their Mythos model in a more focused single-model approach. OpenAI Daybreak uses a multi-model pipeline. On CyberGym, MDASH leads at 88.45%, Glasswing (Mythos) scores 83.1%, and Daybreak (GPT-5.5) scores 81.8%. All three are pre-GA, so direct feature comparisons are premature. The key differentiator today is MDASH’s proven production results (16 real CVEs vs. 9 disclosed from Glasswing).

Does MDASH replace CrowdStrike or SentinelOne?

No. MDASH and CrowdStrike/SentinelOne solve fundamentally different problems. MDASH finds vulnerabilities in source code before software ships, which makes it an offensive, proactive tool. CrowdStrike Charlotte AI and SentinelOne Purple AI detect and respond to threats in production environments, which makes them defensive, reactive tools. An enterprise SecOps stack needs both capabilities. Budget them as separate line items: vulnerability discovery (MDASH/Glasswing/Daybreak) and runtime detection/response (CrowdStrike/SentinelOne).

What does Microsoft Agent 365 have to do with MDASH?

Agent 365 is Microsoft’s control plane for governing AI agents at enterprise scale. It provides identity management, access control, and observability for any AI agent — including MDASH. Agent 365 became generally available on May 1, 2026 at $15/user/month standalone or included in the Microsoft 365 E7 bundle ($99/user/month). For organizations running MDASH, Agent 365 handles the governance question: who authorized which agent to scan which codebase, what findings were produced, and who acted on them. It is relevant but separate from MDASH itself.

How much will MDASH cost for enterprise deployment?

Microsoft has not disclosed MDASH pricing. Given the system orchestrates 100+ agents across frontier AI models per scan, expect consumption-based pricing at enterprise tier — likely in the six-figure annual range for meaningful deployment. For comparison, CrowdStrike Charlotte AI adds approximately $96,000-108,000/year for 1,000 endpoints, and SentinelOne Complete with Purple AI runs $135,000-180,000/year at the same scale. MDASH solves a different problem (proactive discovery vs. runtime detection), so the budget impact is additive, not substitutive.

Should my SecOps team sign up for the MDASH private preview?

If your organization maintains a large proprietary codebase (especially C/C++ or systems-level code), has a mature vulnerability management pipeline, and already runs the Microsoft security stack, applying for the preview makes sense. You will get early access and a direct feedback channel to Microsoft’s Autonomous Code Security team. If your codebase is primarily in managed languages (Python, JavaScript, Go), your SAST/DAST pipeline is immature, or your team lacks capacity to handle high-volume vulnerability findings, wait for GA and evaluate alongside Glasswing and Daybreak when all three have published pricing and proven capability on diverse tech stacks.


Marcus Webb — Principal Site Reliability Engineer

Marcus brings 22 years of infrastructure and observability experience, having built SRE practices from the ground up at organizations ranging from 500 to 50,000 employees. He has run head-to-head evaluations of Datadog, Dynatrace, and New Relic in production environments, designed AIOps-driven incident response workflows, and led platform migrations that most vendors say are impossible. His reviews focus on what breaks at 2 AM, not what looks good in a demo.
