**Tipping the Cyber Balance: How AI Benchmarks Could Make Software Safer**

In February 2024, a ransomware attack on Change Healthcare disrupted medical claims processing for nearly half of all U.S. healthcare transactions. The breach cost UnitedHealth Group over $2.8 billion, exposed the personal data of 190 million Americans, and forced hospitals nationwide to delay patient care.

The cause? A remote access portal without multi-factor authentication. As one senator put it: “This hack could have been stopped with cybersecurity 101.” This attack illustrates a broader pattern. Critical infrastructure depends on complex systems with sprawling attack surfaces—misconfigurations, excessive privileges, inadequate monitoring, and software vulnerabilities—and attackers are exploiting these weaknesses faster than defenders can address them.

Artificial intelligence is accelerating this dynamic: the same technology that helps developers build applications faster also enables attackers to find and exploit flaws more quickly. According to Google’s Mandiant Threat Intelligence, the average time-to-exploit for vulnerabilities dropped from 63 days in 2018–2019 to just five days in 2023. Some claim that AI systems can now generate working exploits as quickly as 15 minutes following a Common Vulnerabilities and Exposures (CVE) disclosure, a standardized public announcement of a specific software vulnerability.

AI is intensifying the arms race, but it could also tip the balance toward defenders if we create the right incentives today.

**A New Approach: Automated Reasoning**

Automated reasoning could help us harden critical software, build trustworthy infrastructure for AI systems, and lay technical foundations for governing advanced AI. Here’s what that would look like:

### Systematic Hardening of Critical Infrastructure

Hospitals, power grids, and telecommunications depend on legacy software that attackers exploit routinely. AI systems capable of identifying flaws, generating verified fixes, and mapping the tangled dependencies in legacy codebases could make large-scale hardening practical.

Translating code to memory-safe languages like Rust, which rule out entire classes of flaws such as buffer overflows by design, is only part of the challenge; understanding how changes ripple through interconnected systems is where AI could bring additional leverage. CISA's guidance on memory-safe languages points the way, and AI could accelerate the transition.
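
To make "safe by design" concrete, here is a minimal Rust sketch (the `read_byte` helper is hypothetical, introduced only for illustration): an out-of-bounds read is surfaced as a recoverable `Option` by the language itself, rather than the silent memory overread an equivalent C program could perform.

```rust
// Minimal sketch: Rust's bounds checking turns a classic C buffer
// overread into a recoverable Option, not undefined behavior.
fn read_byte(buf: &[u8], i: usize) -> Option<u8> {
    buf.get(i).copied() // None when i is out of bounds
}

fn main() {
    let buf = [0x41u8, 0x42, 0x43];
    assert_eq!(read_byte(&buf, 1), Some(0x42)); // in-bounds read succeeds
    assert_eq!(read_byte(&buf, 10), None); // out-of-bounds read is caught
}
```

The point is not this particular helper but the guarantee behind it: the compiler and runtime enforce the bounds check everywhere, so a whole category of exploits never arises.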

### Secure Execution Environments

As AI systems become more autonomous, we need infrastructure we can trust to enforce boundaries. Memory-safe runtimes, minimal operating systems, and formally verified execution environments would ensure that AI agents cannot exceed authorized access, exfiltrate sensitive data, or take actions outside permitted bounds.

### Technical Foundations for International Agreements

Managing risks from advanced AI may eventually require international cooperation—and cooperation requires verification. A recent RAND study identified six independent layers of verification for compliance with agreements on large-scale AI development, including tamper-resistant hardware, on-chip monitoring, and whistleblower programs. International agreements take years to negotiate, and the underlying technologies must mature in parallel.

Advancing formal methods now could enable cryptographic proof that AI models underwent approved evaluations without tampering—laying the technical groundwork so that credible verification is ready when the diplomatic moment arrives.
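
The tamper-evidence idea can be sketched with a toy hash chain: each evaluation record is hashed together with the previous digest, so altering any earlier record changes the final digest. This Rust sketch uses the standard library's `DefaultHasher` purely as a stand-in; a real attestation scheme would use a cryptographic hash such as SHA-256 plus digital signatures, and the record format shown is invented for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy tamper-evidence sketch: fold each record into a running digest.
// NOTE: DefaultHasher is NOT cryptographic; it is a stand-in so the
// example stays self-contained.
fn chain_digest(records: &[&str]) -> u64 {
    let mut prev = 0u64;
    for r in records {
        let mut h = DefaultHasher::new();
        prev.hash(&mut h); // bind this record to everything before it
        r.hash(&mut h);
        prev = h.finish();
    }
    prev
}

fn main() {
    let log = ["eval:redteam:pass", "eval:capability:pass"];
    let tampered = ["eval:redteam:FAIL", "eval:capability:pass"];
    // Changing any earlier record yields a different final digest.
    assert_ne!(chain_digest(&log), chain_digest(&tampered));
}
```

Formal methods enter where this sketch stops: proving that the logging, hashing, and signing components themselves behave as specified, so the resulting attestation is trustworthy end to end.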

### Higher-Assurance AI Control, Alignment, and Interpretability

Rather than relying solely on empirical testing, automated reasoning could provide mathematical guarantees about certain types of AI system behavior. Specifications like “only authorized actions are allowed” can be translated into machine-checkable properties and enforced at critical interfaces—API calls, data access, and system commands—where AI systems interact with external resources.
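
A minimal sketch of what enforcing such a property at an interface could look like, in Rust (the `ActionGate` type and the action names are hypothetical, introduced only for illustration): every action an agent requests must pass through one checkable gate, so "only authorized actions are allowed" becomes a property of a single, small piece of code.

```rust
use std::collections::HashSet;

// Hypothetical sketch: enforce "only authorized actions are allowed"
// at the interface where an agent issues system commands.
struct ActionGate {
    allowed: HashSet<String>, // explicit, machine-checkable allowlist
}

impl ActionGate {
    fn new(allowed: &[&str]) -> Self {
        Self {
            allowed: allowed.iter().map(|s| s.to_string()).collect(),
        }
    }

    // Every requested action is checked here; anything not on the
    // allowlist is rejected before it reaches external resources.
    fn authorize(&self, action: &str) -> Result<(), String> {
        if self.allowed.contains(action) {
            Ok(())
        } else {
            Err(format!("action not permitted: {action}"))
        }
    }
}

fn main() {
    let gate = ActionGate::new(&["read_file", "run_query"]);
    assert!(gate.authorize("run_query").is_ok());
    assert!(gate.authorize("delete_database").is_err());
}
```

Concentrating enforcement in one small component is what makes formal verification tractable: proving properties of a fifty-line gate is feasible today, while proving properties of the model behind it is not.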

Research on AI control has shown that safety protocols can remain effective even when models intentionally try to subvert them. While interpretability of neural networks remains nascent, and formal verification of the networks themselves is still in its early stages, the interplay between these fields offers practical insights.

### Measuring Security: The Role of Benchmarks

The key to unlocking these capabilities is measurement. Today, most AI evaluations test general reasoning or code generation ability, not security. Rigorous benchmarks evaluating AI’s ability to assist with automated reasoning tasks (writing specifications, proving theorems, generating verified code) would fill that gap and reshape the competitive landscape.

If the market comes to value security benchmarks the way it now values math and coding tests, competitive pressure will drive AI labs to invest in verifiable safety, and the entire ecosystem will benefit. Automated reasoning and other formal-methods approaches have long been too labor-intensive for widespread use. But AI can change that equation.

### Accelerating Progress with Incentives

Models that excel at reasoning about the security properties of software designs, and at faithfully implementing those designs with memory-safe code and rigorously validated inputs, could make verification practical at scale. The missing piece is incentives: comprehensive benchmarks that make these capabilities visible and valued.

The foundation already exists. Projects like seL4, a formally verified microkernel (the core software layer that manages hardware access), demonstrated over a decade ago that mathematical proof of software correctness is achievable. DARPA's HACMS program leveraged seL4 to create software for a military helicopter that withstood attack by a world-class red team with full knowledge of the system.

### Ensuring Continued American AI Leadership

By investing in benchmarks that reward verifiable security, and advancing the defensive applications in parallel, we can convert AI’s disruptive capabilities into structured defensive advantages. The goal: ensuring that the balance tips toward those who build, not those who break.

**Gopal Sarma is a member of the AI Security Cluster in RAND’s Center for AI, Security, and Technology (CAST). Kathleen Fisher is an expert in formal methods, programming languages, and cybersecurity, and was previously a technical resident at RAND CAST.**

References:

* Verifying International Agreements on AI: Six Layers of Verification for Rules on Large-Scale AI Development and Deployment
* Four Governance Approaches to Securing Advanced AI
* RAND Center on AI, Security, and Technology (RAND CAST)