Ethereum smart contracts hacked by AI: new benchmark reveals multimillion‑dollar risks
Artificial intelligence is no longer just helping developers write blockchain code — it is learning how to break it. New research shows that advanced AI systems can independently discover and exploit vulnerabilities in Ethereum smart contracts, converting obscure bugs into simulated seven‑figure profits.
A collaboration between Anthropic and MATS Fellows introduced a new evaluation framework, the Smart CONtracts Exploitation benchmark (SCONE‑bench), designed to measure AI cyber capabilities not in abstract scores, but in money. Instead of asking whether an AI can “find a bug,” the benchmark asks a more concrete question: How much value can it steal?
SCONE‑bench: measuring hacks in dollars, not just bugs
SCONE‑bench compiles 405 real‑world smart contracts that were actually exploited on Ethereum and compatible chains between 2020 and 2025. These are not synthetic examples or toy projects; they are contracts that once held real user funds and suffered real security incidents.
Within this testbed, researchers ran leading AI models — including Claude Opus 4.5, Claude Sonnet 4.5, and GPT‑5 — and asked them to identify vulnerabilities and construct working exploits. Instead of counting how many bugs they spotted, the team calculated the hypothetical value that could have been extracted, based on on‑chain data and market conditions at the time of each historical exploit.
Across all 405 contracts, ten different AI models collectively generated working exploit strategies for 207 of them. If executed on‑chain, those exploit plans would have translated into an estimated $550.1 million in stolen assets. This figure is not a theoretical maximum but an empirically grounded estimate derived from historical liquidity and price conditions.
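To make the dollar framing concrete, here is a minimal, purely illustrative sketch of how a simulated exploit's proceeds can be priced: the tokens a strategy would extract are valued at the market prices that held at the time of the original incident. The asset names and numbers below are invented for the example and are not figures from the study.

```python
from dataclasses import dataclass

# Illustrative only: token amounts a simulated exploit strategy would extract,
# priced at the historical market conditions of the original incident.
@dataclass
class ExtractedAsset:
    token: str
    amount: float                  # tokens obtained by the simulated exploit
    usd_price_at_incident: float   # historical price, e.g. from an archived feed

def exploit_value_usd(assets: list[ExtractedAsset]) -> float:
    """Dollar value of a simulated exploit under historical prices."""
    return sum(a.amount * a.usd_price_at_incident for a in assets)

# Example: a strategy that drains two pools (made-up numbers).
strategy_proceeds = [
    ExtractedAsset("WETH", 120.0, 1_850.0),
    ExtractedAsset("USDC", 310_000.0, 1.0),
]
print(f"Simulated exploit value: ${exploit_value_usd(strategy_proceeds):,.2f}")
```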
Frontier models breach post‑cutoff contracts
To address concerns about “data contamination” — the possibility that models simply memorized past exploits from their training data — the researchers focused specifically on contracts attacked after March 2025, beyond many models’ training cutoffs.
Even under this stricter lens, Claude Opus 4.5, Claude Sonnet 4.5, and GPT‑5 successfully developed new exploit strategies that matched or exceeded the performance of real‑world attackers. In those simulated post‑cutoff scenarios, their cumulative exploits were valued at $4.6 million.
The key point: these systems were not just recalling known attack patterns. They were reasoning through code, analyzing control flows and edge cases, and constructing novel attack sequences that worked against previously unseen contracts.
Zero‑day vulnerabilities: AI finds flaws before humans do
The researchers then pushed the experiment further. They assembled a larger dataset of 2,849 newly deployed smart contracts with no publicly known vulnerabilities or recorded exploits. These contracts were, to the best of current knowledge, secure.
Within this pool, GPT‑5 and Claude Sonnet 4.5 independently surfaced two previously unknown zero‑day vulnerabilities. In controlled simulations, exploiting these flaws generated a combined notional profit of just under $3,700.
While that amount is small compared to the hundreds of millions modeled elsewhere in the study, its significance is enormous: it is direct evidence that state‑of‑the‑art AI can uncover and weaponize fresh, undisclosed vulnerabilities on its own.
A concrete exploit: abusing a misconfigured token calculator
One of the example vulnerabilities highlighted in the research involved a token accounting function on an Ethereum‑compatible contract. The function, intended to be read‑only, had mistakenly been left writable, allowing the token balance calculation to be altered externally.
The AI agent analyzed the contract, noticed this improper write access, and devised a strategy: repeatedly invoke the function to artificially inflate its own token balance. In the benchmark environment, this attack netted roughly $2,500 in simulated profits. Under peak liquidity conditions, the same logic could have been pushed to siphon an estimated $19,000 before markets reacted.
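The article does not name the contract or its interface, so the sketch below only illustrates the shape of such an abuse: repeatedly sending transactions to a function that should have been a read-only view. The syncBalance function, the zero-address placeholder, and the local-fork setup with unlocked accounts are all assumptions for the example, not the actual exploited contract.

```python
from web3 import Web3

# Assumed setup: a local fork of the chain (e.g. anvil/hardhat) with unlocked accounts.
w3 = Web3(Web3.HTTPProvider("http://127.0.0.1:8545"))
attacker = w3.eth.accounts[0]

# Hypothetical minimal ABI: syncBalance should have been a view function,
# but was deployed as state-changing, so anyone can call it in a transaction.
abi = [
    {"name": "syncBalance", "type": "function", "stateMutability": "nonpayable",
     "inputs": [{"name": "account", "type": "address"}], "outputs": []},
    {"name": "balanceOf", "type": "function", "stateMutability": "view",
     "inputs": [{"name": "account", "type": "address"}],
     "outputs": [{"name": "", "type": "uint256"}]},
]

# Placeholder address: replace with the vulnerable contract under test.
TOKEN_ADDRESS = "0x0000000000000000000000000000000000000000"
token = w3.eth.contract(address=TOKEN_ADDRESS, abi=abi)

# The attack loop the agent devised: each call mutates accounting state
# in the caller's favour instead of merely reading it.
for _ in range(10):
    tx = token.functions.syncBalance(attacker).transact({"from": attacker})
    w3.eth.wait_for_transaction_receipt(tx)

print("inflated balance:", token.functions.balanceOf(attacker).call())
```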
In the real world, a white‑hat responder eventually detected and mitigated this issue, recovering exposed assets. Yet the fact that an AI system, given only the contract code and on‑chain context, independently discovered the same path is a stark demonstration of its growing offensive capabilities.
Why smart contracts are a perfect testbed for AI hackers
Traditional cybersecurity benchmarks evaluate models on criteria like “detection accuracy” or synthetic challenge scores. SCONE‑bench takes a different path by focusing on direct financial impact. Smart contracts are uniquely suited to this: a vulnerability in a DeFi lending pool, token bridge, or DEX is often just one transaction away from an actual theft.
Because smart contract execution is deterministic and public, researchers can replay historical states, simulate transactions, and quantify exactly how much could have been stolen in any given attack scenario. This allows them to assign a clear price tag to each exploit strategy an AI proposes, transforming security evaluation from vague risk assessments into precise financial measurements.
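As a small illustration of this replay property, a contract's state can be read directly at a past block height, assuming access to an archive node. The RPC URL, addresses, and block number below are placeholders, not data from the benchmark.

```python
from web3 import Web3

# Assumed: an archive-node RPC endpoint that can serve state at old blocks.
w3 = Web3(Web3.HTTPProvider("https://archive-node.example"))

ERC20_BALANCE_ABI = [
    {"name": "balanceOf", "type": "function", "stateMutability": "view",
     "inputs": [{"name": "account", "type": "address"}],
     "outputs": [{"name": "", "type": "uint256"}]},
]

# Placeholders: the token held by the exploited pool, and the pool itself.
TOKEN = "0x0000000000000000000000000000000000000000"
POOL = "0x0000000000000000000000000000000000000000"

token = w3.eth.contract(address=TOKEN, abi=ERC20_BALANCE_ABI)

# Read the pool's balance exactly as it stood one block before the historical
# exploit, so a simulated theft is priced against the liquidity that was
# actually there, not against today's state.
BLOCK_BEFORE_EXPLOIT = 17_000_000  # illustrative block number
at_risk = token.functions.balanceOf(POOL).call(block_identifier=BLOCK_BEFORE_EXPLOIT)
print("pool balance one block before the attack:", at_risk)
```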
AI is approaching human‑level offensive security skill
The study concludes that leading AI agents are nearing — and in some contexts matching — human expertise in several core areas of offensive security:
– Control‑flow reasoning: understanding complex branching logic, reentrancy conditions, and multi‑step interactions across different contracts.
– Boundary and edge‑case analysis: spotting off‑by‑one errors, missing access controls, and inconsistencies in state transitions or arithmetic.
– Exploit construction: turning a conceptual flaw into a concrete sequence of blockchain transactions that move assets in the attacker’s favor.
These are not narrow, hard‑coded tricks. They are generalizable skills that apply both to blockchain ecosystems and to traditional web and application software. The same reasoning that lets an AI plan a DeFi exploit can be adapted to find buffer overflows, privilege escalations, or logic flaws in centralized systems.
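As a deliberately naive flavour of the simplest of these skills, the toy check below flags externally callable, state-changing Solidity functions whose declarations carry no access-control modifier. It is a surface-level heuristic for illustration only, not the study's methodology; the agents in the benchmark reason over full control flow and contract state rather than text patterns.

```python
import re

# A toy Solidity fragment with one function that should be restricted.
SOURCE = """
function setFeeRecipient(address r) external { feeRecipient = r; }
function withdrawFees() external onlyOwner { payable(owner).transfer(fees); }
function balanceOf(address a) external view returns (uint256) { return bal[a]; }
"""

# Deliberately naive: flag externally callable, state-changing functions whose
# declaration carries no access-control modifier. Real analysis also has to
# consider require() checks in the body, inherited modifiers, and call graphs.
SIG = re.compile(r"function\s+(\w+)\s*\([^)]*\)\s*([^{]*)\{")
MODIFIER_GUARDS = ("onlyOwner", "onlyRole", "onlyAdmin")

for name, attrs in SIG.findall(SOURCE):
    is_write = not any(kw in attrs for kw in ("view", "pure"))
    guarded = any(g in attrs for g in MODIFIER_GUARDS)
    if "external" in attrs and is_write and not guarded:
        print(f"review: {name}() changes state with no access-control modifier")
```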
From proof‑of‑concept to real‑world threat
The researchers frame their results as a proof‑of‑concept for profitable autonomous exploitation, not as a hypothetical future risk. The study demonstrates that with current‑generation models, it is already feasible to:
– Analyze large corpora of live smart contracts at scale
– Automatically rank them by exploitability and projected profit (see the sketch after this list)
– Generate step‑by‑step attack strategies that can be executed with minimal human oversight
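In outline, such a triage pipeline could look like the sketch below. The find_exploit_strategy helper stands in for the AI agent call and is hypothetical, as are the surrounding data types; it is a shape, not the study's tooling.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExploitAssessment:
    contract: str                 # contract address
    strategy: Optional[str]       # human-readable exploit plan, or None
    projected_profit_usd: float   # estimate under current liquidity and prices

def find_exploit_strategy(address: str) -> ExploitAssessment:
    # Placeholder for the expensive step: an AI agent reads the contract's
    # code and on-chain state, then proposes an exploit plan plus a profit
    # estimate. Returning "no strategy" keeps this sketch self-contained.
    return ExploitAssessment(contract=address, strategy=None, projected_profit_usd=0.0)

def triage(addresses: list[str], top_n: int = 10) -> list[ExploitAssessment]:
    # Assess every contract, keep only those with a viable strategy,
    # and surface the highest projected payoff first.
    results = [find_exploit_strategy(a) for a in addresses]
    viable = [r for r in results if r.strategy is not None]
    return sorted(viable, key=lambda r: r.projected_profit_usd, reverse=True)[:top_n]
```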
This shifts the threat model. In the past, large‑scale DeFi exploits typically required specialized technical skill, significant time investment, and a deep understanding of Ethereum internals. Now, an attacker with basic blockchain knowledge could, in principle, lean on an AI assistant to handle the heavy lifting: from vulnerability discovery and exploit design to transaction crafting.
Accelerating AI cyber capabilities beyond blockchain
The work also places smart contract exploitation within a broader trend: rapidly evolving AI‑driven cyber offense. Similar techniques are being explored in areas like:
– Autonomous network scanning and intrusion
– Automated phishing and social engineering content generation
– Exploit chain construction across multiple layers of a software stack
What makes blockchain particularly alarming is the directness of the payoff. Unlike traditional systems, where monetization may require additional steps (selling stolen data, laundering funds), a DeFi exploit can immediately drain on‑chain liquidity pools or treasury wallets in a single atomic transaction.
Defensive implications: using AI to stress‑test DeFi
SCONE‑bench is not just a red‑team tool; it is designed as a defensive instrument as well. By embedding AI agents into their development pipeline, smart contract teams can:
– Run pre‑deployment stress tests on new contracts against a battery of AI exploit attempts
– Identify high‑value vulnerabilities before they are visible on the public network
– Quantify worst‑case financial losses under different attack scenarios
– Prioritize patching and audits based on projected exploit profitability
In practice, this means that responsibly used AI can become a powerful ally for security auditors, code reviewers, and protocol governance bodies. Instead of relying solely on manual review and static analysis tools, teams can pit their contracts against the same class of AI that could be used by attackers, effectively “fighting fire with fire.”
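One concrete way to operationalize this is to gate deployments in CI on the worst-case loss projected by a batch of AI exploit attempts against the new contract. The threshold, helper, and numbers below are assumptions for illustration, not tooling described in the study.

```python
import sys

# Assumed policy: block deployment if any AI-generated exploit attempt
# projects more than this much extractable value from the new contract.
MAX_TOLERATED_LOSS_USD = 10_000.0

def pre_deploy_gate(projected_losses_usd: list[float]) -> int:
    """Return a CI exit code: 0 to allow deployment, 1 to block it."""
    worst_case = max(projected_losses_usd, default=0.0)
    if worst_case > MAX_TOLERATED_LOSS_USD:
        print(f"BLOCK: worst-case simulated loss ${worst_case:,.0f} exceeds policy")
        return 1
    print(f"OK: worst-case simulated loss ${worst_case:,.0f} within policy")
    return 0

if __name__ == "__main__":
    # Illustrative numbers from a batch of simulated exploit attempts.
    sys.exit(pre_deploy_gate([0.0, 2_500.0, 180.0]))
```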
What this means for Ethereum developers and DeFi teams
For developers building on Ethereum and EVM‑compatible chains, the study carries several clear takeaways:
1. Assume AI‑augmented attackers exist. Even if full autonomy is not yet widespread in the wild, it is prudent to design as if determined adversaries already have access to similar tools.
2. Security must be continuous. A one‑time audit is no longer enough. Contracts in production should be periodically re‑evaluated with updated AI models that might uncover new exploit paths.
3. Economic modeling is part of security. Understanding how much can be stolen, under what liquidity and price conditions, helps teams decide which risks are acceptable and which require immediate intervention.
4. Complexity is a liability. The more intricate the control flow and cross‑contract interactions, the more room there is for AI systems to discover non‑obvious edge cases that humans miss.
For treasuries, DAOs, and users, the implication is simple: protocols that integrate AI‑driven security testing and transparent risk assessment are likely to be safer than those that do not.
Policy and regulatory angles: AI‑resilient financial infrastructure
The findings also have implications for regulators and policymakers overseeing the intersection of AI and digital finance:
– Systemic risk: If AI systems can systematically identify and exploit weaknesses across many DeFi platforms, the resulting cascade of failures could produce broader financial instability in crypto markets.
– Standards and best practices: There may be a growing case for industry‑wide guidelines around AI‑based security testing, responsible disclosure, and the safe deployment of autonomous agents interacting with financial contracts.
– Liability and accountability: As AI takes on a more direct role in discovering and executing exploits, legal frameworks will have to grapple with questions of responsibility, especially when tools can be dual‑use — useful for defense and attack alike.
Forward‑looking regulatory approaches may treat AI‑resilient design as a key property of trustworthy financial infrastructure, alongside traditional concerns like market integrity and consumer protection.
The road ahead: AI as both attacker and guardian
The SCONE‑bench study highlights a stark duality. The same frontier models that demonstrate the ability to extract hundreds of millions of dollars in simulated attacks can also be repurposed as powerful guardians of the ecosystem:
– Helping auditors and white‑hat hackers catch issues faster
– Giving smaller teams access to near‑expert‑level security analysis
– Supporting incident response by rapidly modeling exploit paths and mitigation options
Whether Ethereum and the wider blockchain space become more secure or more fragile in the age of AI will depend largely on how quickly developers, platforms, and institutions embrace AI‑powered defenses — and how seriously they treat the emerging evidence that autonomous exploitation is no longer science fiction but an operational reality.

