The Axios Compromise Is the Warning Shot. AI Is the Real Story.

The Axios npm compromise on March 31 hit a library with 100 million weekly downloads. The attack wasn't technically sophisticated — it exploited assumptions baked into how we build software. AI is about to change what both attackers and defenders can do with those assumptions.

Fig 01: A three-hour window. That was long enough.

On March 31, a malicious version of Axios — a library shipping in roughly 80% of cloud and code environments — was live on npm for three hours. That was long enough. The attack wasn't technically sophisticated. It worked because of structural assumptions in how we build software. AI is about to change what both attackers and defenders can do with those assumptions.

A quick roadmap. Software supply chain security has rested on one assumption for twenty years — if you use someone else's code, you're effectively stuck with it. Replacing it is too expensive. So you build defenses around it instead. AI is taking that assumption away. When rewriting a library becomes cheap, the old defenses matter less and the code you're carrying matters more. The Axios compromise is exactly the kind of attack the old model was built to prevent, and didn't. Below is what happened, why the current toolkit missed it, and what changes when AI shows up on both sides of the fight.

Three Hours on March 31

The Axios incident is worth walking through in detail, because the details are the argument.

Axios is the most popular HTTP client library for JavaScript. It is downloaded roughly 100 million times a week. It ships inside roughly 80% of cloud and code environments. If your application makes HTTP calls from JavaScript, there's a good chance Axios is somewhere in your dependency tree — even if you didn't install it directly.

On March 31, 2026, a North Korean state actor Microsoft has named Sapphire Sleet published two malicious releases of the package: axios@1.14.1 and axios@0.30.4. They didn't exploit a bug in npm. They didn't exploit a bug in Axios. They targeted a human. Through a social engineering campaign that started with RAT malware on the maintainer's personal computer, they stole the npm credentials of jasonsaayman — one of the project's primary maintainers — and used them to publish.

The malicious payload wasn't in the Axios code itself. The attackers added a new dependency to the package.json file of the new Axios releases: plain-crypto-js@4.2.1. An earlier "clean" version of the same package, 4.2.0, had been published eighteen hours before — just long enough to give the package a plausible-looking registry history. The real payload lived in 4.2.1. Its post-install script acted as a dropper for a cross-platform Remote Access Trojan, selecting a Windows, macOS, or Linux payload based on the victim's operating system. The RAT gave Sapphire Sleet arbitrary PowerShell execution, filesystem enumeration, and the ability to inject additional binaries directly into memory without ever writing them to disk.
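The mechanics are easy to see in miniature. The sketch below compares two release manifests and flags any newly added runtime dependency, then checks whether a package declares install-time scripts. It is an illustration rather than a real detection tool: the function names and the trimmed manifests are hypothetical, and only the plain-crypto-js name and version number come from the incident as described above.

```python
# npm lifecycle hooks that run automatically during `npm install`.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def added_dependencies(old_manifest: dict, new_manifest: dict) -> set:
    """Return runtime dependency names that appear only in the new manifest."""
    old = set(old_manifest.get("dependencies", {}))
    new = set(new_manifest.get("dependencies", {}))
    return new - old


def install_hooks(manifest: dict) -> list:
    """Return the install-time lifecycle scripts a package declares."""
    scripts = manifest.get("scripts", {})
    return [hook for hook in INSTALL_HOOKS if hook in scripts]


# Hypothetical, trimmed-down manifests. "existing-dep", the version specs,
# and the script command are made up for illustration.
previous_release = {
    "name": "axios",
    "version": "1.14.0",
    "dependencies": {"existing-dep": "^1.0.0"},
}
new_release = {
    "name": "axios",
    "version": "1.14.1",
    "dependencies": {"existing-dep": "^1.0.0", "plain-crypto-js": "4.2.1"},
}
new_dependency_manifest = {
    "name": "plain-crypto-js",
    "version": "4.2.1",
    "scripts": {"postinstall": "node setup.js"},
}

for name in sorted(added_dependencies(previous_release, new_release)):
    print(f"new dependency: {name}")
print("install hooks:", install_hooks(new_dependency_manifest))
```

A patch release that adds a brand-new dependency with an install hook is not proof of compromise, but it is a cheap signal worth a human look.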

Three hours between publication and removal. During those three hours, every CI pipeline, every developer machine, and every deploy job that pulled the latest Axios was exposed. Microsoft, Elastic Security Labs, and Palo Alto's Unit 42 confirmed actual execution in about 3% of the affected environments — which sounds small until you multiply it across a userbase of this size.

Three hours was enough.

What's Actually In Your Code

Most modern applications are mostly other people's code.

When a developer adds a library — a packaged chunk of code that does something useful — that library usually depends on other libraries. Those libraries depend on other libraries. A single npm install in a JavaScript project can pull in more than a thousand of these indirect packages. They're called transitive dependencies, because they come along for the ride with something you installed on purpose.
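To make "transitive" concrete, here is a minimal sketch that walks a toy dependency graph and collects everything that arrives indirectly. The graph and package names are invented for illustration; a real tree would come from your package manager.

```python
from collections import deque


def transitive_dependencies(graph: dict, root: str) -> set:
    """Return packages reachable from root only through other packages."""
    direct = set(graph.get(root, []))
    seen = set(direct)
    queue = deque(direct)
    while queue:
        package = queue.popleft()
        for child in graph.get(package, []):
            if child not in seen and child != root:
                seen.add(child)
                queue.append(child)
    # Everything reached beyond the first hop came along for the ride.
    return seen - direct


# Hypothetical graph: the app asks for two packages and receives six.
toy_graph = {
    "my-app": ["http-client", "logger"],
    "http-client": ["redirects", "form-data"],
    "form-data": ["mime-types"],
    "mime-types": ["mime-db"],
    "logger": [],
}

indirect = transitive_dependencies(toy_graph, "my-app")
print(f"{len(toy_graph['my-app'])} direct, {len(indirect)} transitive")
```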

Axios is a direct dependency in some projects and a transitive dependency in thousands of others. It's inside frameworks, internal tooling, observability agents, and CI utilities that your team never explicitly chose. On March 31, that transitive reach became the blast radius.

This is normal. Every modern codebase works this way. Writing everything from scratch was never realistic, so the ecosystem evolved to share. Security tools adapted to that reality. Software composition analysis (SCA) tools list the packages you're using and flag the ones with known vulnerabilities. A software bill of materials (SBOM) is a shipping manifest for your code. Sigstore signatures and SLSA provenance attestations cryptographically tie a package to a specific source and build process. Lockfiles freeze the exact versions you're using so nobody can swap in a different one without you noticing.

All of this is useful. All of it rests on the same assumption — that you can't realistically replace the dependencies, so the best you can do is defend the position you're stuck in.

That's the assumption that's changing.

Why the Old Model Couldn't Stop This

Walk back through the Axios attack and list what should have stopped it.

A vulnerability scanner looking for known CVEs — useless. There was no CVE. The package was malicious from the moment it was published.

A software bill of materials — useless. The SBOM listed axios@1.14.1 and plain-crypto-js@4.2.1. Both were real, freshly-signed, legitimately published packages. The inventory was correct. The inventory was just of malware.

A signature-based trust check — useless. The malicious versions were signed by the rightful maintainer's account. The signature is proof of where the package came from, not proof that what's inside behaves as advertised.

Version pinning — partially useful. Teams that had pinned to 1.14.0 or 0.30.3 weren't pulled in automatically. Teams running npm install without a lockfile, or teams with auto-update workflows, took the new versions immediately.
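The difference between a pin and a range is worth seeing directly. The sketch below implements a simplified version of npm's caret rule (it ignores prerelease tags and the other range operators, and the function names are my own) and checks which specs would accept the malicious version numbers. With no lockfile, a range like this is re-resolved at install time.

```python
def parse(version: str) -> tuple:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def caret_allows(base: str, candidate: str) -> bool:
    """Simplified caret rule: the left-most non-zero component must not change."""
    b, c = parse(base), parse(candidate)
    if c < b:
        return False
    if b[0] > 0:
        return c[0] == b[0]
    if b[1] > 0:
        return c[0] == 0 and c[1] == b[1]
    return c == b


def accepts(spec: str, candidate: str) -> bool:
    """True if a dependency spec (exact pin or caret range) admits the candidate."""
    if spec.startswith("^"):
        return caret_allows(spec[1:], candidate)
    return parse(spec) == parse(candidate)


for spec, candidate in [
    ("^1.14.0", "1.14.1"),  # caret range on the 1.x line
    ("1.14.0", "1.14.1"),   # exact pin
    ("^0.30.3", "0.30.4"),  # caret range on the 0.x line
    ("0.30.3", "0.30.4"),   # exact pin
]:
    print(f"{spec:>8} accepts {candidate}: {accepts(spec, candidate)}")
```

Both caret ranges admit the malicious releases; both exact pins refuse them.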

This isn't a failure of any individual tool. It's the shape of the defensive architecture. Every one of those tools is built around the same core move — protect your position inside a dependency tree you can't shrink. None of them are built to answer the question "does this package, from this maintainer, right now, behave the way it claims to?" That question has no general answer. So we build around it.

The Axios incident is unusual for its visibility and attribution, not its structure. The same structural pattern (a compromised maintainer, a small injection of malicious code, a cascade through transitive dependencies) has repeated over and over in the last two years. In March 2024, a backdoor tracked as CVE-2024-3094 was discovered in XZ Utils, a compression library buried inside most Linux distributions. It had been patiently planted by an attacker who had spent more than two years building trust as a legitimate maintainer. In September 2025, a phished maintainer account was used to compromise eighteen widely-used npm packages with 2.6 billion combined weekly downloads, and days later a self-replicating worm nicknamed Shai-Hulud spread to hundreds more by reusing stolen publishing tokens. A second wave in November reached more than 25,000 GitHub repositories.

The reason we've tolerated this for so long is economics. Actually removing a package — auditing its purpose, writing a replacement, swapping it in — used to take engineering time nobody could afford. So we accepted the exposure and layered defenses on top.

The economics have changed.

Cheap Rewrites Break the Old Rules

Fig 02: Cost of rewriting a library, then and soon.

There's an old rule of thumb called the Lindy effect. Applied to software: code that's been around a long time is more trustworthy than code that hasn't. A library with years of production use and millions of downloads has had its bugs found and fixed. A fresh implementation hasn't.

This was reasonable. When rewriting a library meant weeks of engineering work, you couldn't afford to roll your own. Using the battle-tested one was safer.

AI changes the math.

A small open-source library — a few hundred lines, a clear job, tests included — can now be regenerated in minutes. Not a sketch. A real working implementation. A mid-level engineer can review and adapt it in an afternoon. Axios, at its core, is an HTTP client — something well within the scope of what a modern AI coding assistant can produce a credible first draft of. That doesn't mean you should rewrite Axios tomorrow. It means that "too expensive to replace" is no longer automatically true, even for widely-used libraries.

When rewriting is that cheap, "older is safer" stops holding up automatically. A fresh implementation, reviewed carefully and tested against your actual use case, can be safer than a five-year-old library that drags a thousand transitive dependencies behind it. Five of those dependencies are maintained by one person. Two of those one-person maintainers are phishing targets. One of them is going to have a bad Tuesday.

The Lindy effect was always a stand-in for something else: confidence that the code works in the real world. AI gives you a different way to build that confidence — careful review and targeted testing instead of years of accumulated use. Neither path is perfect. But only one of them comes with a thousand hidden dependencies attached.

The strategic shift follows. The old posture was "minimize what you add, harden what you keep." The new one is closer to "own what you can, trust what you must."

There's a catch.

What Cheap Rewrites Don't Fix

AI-generated code introduces a new kind of bug.

The code looks right. It reads well. It passes code review. It passes the tests you wrote. And then it behaves wrong under some condition nobody thought to test for.

The research has made this concrete. Stanford researchers (Perry et al.) found that developers using an AI coding assistant wrote measurably less secure code than developers without one — and reported higher confidence in what they'd written. Veracode's 2025 GenAI Code Security Report tested more than a hundred large language models across eighty coding tasks and found that 45% of AI-generated samples contained OWASP Top 10 vulnerabilities. Cross-site scripting failures showed up 86% of the time. Log injection, 88%. Security performance was roughly flat across model size — a bigger or newer model didn't produce meaningfully safer code. Apiiro's 2025 analysis of enterprise codebases reported a similar split: AI coding assistants gave developers about four times the velocity and roughly ten times the security findings.

The failures fall into recognizable shapes. Boundary checks that handle normal input correctly but fail at the edges. Cryptographic code that generates keys with subtle biases. Race conditions — bugs that only appear when two things happen at the exact same moment. The code runs correctly most of the time. The other times are the problem.
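Here is one illustrative instance of the "subtle bias" shape. It is my own example, not drawn from the studies above. Reducing a random byte modulo 10 looks perfectly reasonable in review, yet 256 is not a multiple of 10, so some digits come up more often than others. The sketch enumerates every byte value to show the skew exactly, next to the standard-library call that avoids it.

```python
from collections import Counter
import secrets


def biased_digit(random_byte: int) -> int:
    """Looks fine, but 256 % 10 != 0, so the low digits are slightly favored."""
    return random_byte % 10


def unbiased_digit() -> int:
    """secrets.randbelow draws uniformly from range(10)."""
    return secrets.randbelow(10)


# Enumerate all 256 possible byte values to measure the bias exactly.
counts = Counter(biased_digit(byte) for byte in range(256))
for digit in range(10):
    print(digit, counts[digit])
```

Digits 0 through 5 each appear 26 times and digits 6 through 9 appear 25 times. A test that only checks "is the result between 0 and 9" passes either way.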

Why this is harder to catch than a normal bug: human programmers make human mistakes. Off-by-one errors. Forgetting to check for a null value. Mixing up data types. Security tools and code review practices are tuned to catch those. AI makes different mistakes — mistakes that reflect patterns in its training data rather than human cognitive blind spots. Our tools weren't built to recognize those patterns. Most reviewers weren't trained to either.

So the picture is mixed. AI makes it cheap to replace risky dependencies. AI-generated replacements carry a different kind of risk that current tools don't catch well. Whether you come out ahead depends on what you're replacing, who is reviewing it, and how carefully you test what ships.

Attackers Have AI Too

The Axios attack wasn't AI-assisted as far as anyone has reported. The next one might be.

Every piece of the attack chain has an AI analogue that makes it cheaper. Writing the malicious payload. Setting up a convincing package history — remember the pre-staged clean version of plain-crypto-js published eighteen hours early. Generating synthetic commit patterns that look like real development. Drafting phishing messages to compromise a maintainer's personal computer. None of those things required AI in March. All of them get easier and faster with AI in April.

Researchers have already named a specific attack class that exploits how AI coding assistants behave today. They call it slopsquatting. The attack is simple: AI models sometimes invent package names that don't exist. An attacker registers the invented name and fills it with malicious code. The next developer who follows the AI's advice installs the malware. A USENIX Security 2025 paper tested sixteen code-generation models across 576,000 generated samples and found that 19.6% of the package names the models produced were hallucinations — 5.2% for commercial models, more than 20% for open-source ones. The hallucinations were repeatable: when researchers ran the same prompt ten times, 43% of hallucinated names came back every time. That consistency is what makes the attack viable at scale. An attacker only needs to find one frequently-hallucinated name, register the malicious package, and wait.
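One hedge against this is to refuse any assistant-suggested package that is not already on a list your team has vetted. Checking only that a name exists on the public registry is not enough, since existence is exactly what the attacker arranges. The sketch below assumes a hypothetical allowlist and a made-up hallucinated name; the lookup is passed in so it could be backed by an internal registry instead.

```python
from typing import Callable, Iterable, List


def unapproved_packages(
    suggested: Iterable[str], is_approved: Callable[[str], bool]
) -> List[str]:
    """Return suggested package names that fail the approval lookup."""
    return [name for name in suggested if not is_approved(name)]


# Hypothetical allowlist standing in for an internal registry or review log.
APPROVED = {"requests", "numpy"}

# "fast-json-utilz" is an invented example of a plausible-sounding name.
suggestions = ["requests", "fast-json-utilz"]
flagged = unapproved_packages(suggestions, lambda name: name in APPROVED)
print("needs review before install:", flagged)
```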

One widely-cited demonstration: a researcher registered huggingface-cli — a name models frequently hallucinated, but which didn't exist on PyPI — as a harmless placeholder. It was downloaded more than 30,000 times in three months. At one point, Alibaba's own public documentation told users to install it.

This is the shift the Axios incident foreshadows. A state actor had to spend time phishing one specific maintainer. A year from now, the same basic pattern will be cheaper, faster, and more automated — targeting more maintainers in parallel, generating more convincing dependency histories, and exploiting package names AI assistants invent on their own.

The Alignment Problem Coming Down the Pipe

Fig 03: AI-authored dependencies, outside the old trust model.

This is the part that should concern security architects most.

Within two to three years, a significant share of the code on public registries will be AI-generated. Not just small contributions. Whole packages. Written by models, published by accounts whose connection to a human reviewer is often unclear. Studies of PyPI and npm are already finding packages with AI-written READMEs, AI-written code, and no visible human gatekeeping.

When a large share of the code on a registry is AI-generated, the behavior of the models generating that code becomes a supply chain concern.

Here's what that looks like. A capable model produces useful code most of the time. It also, sometimes, produces code that does something slightly different from what the prompt asked for. Usually subtle. A logging function that occasionally sends data somewhere it shouldn't. A key-generation routine that produces keys with slightly less randomness than intended. An input validator that accepts some malformed inputs it should reject. The model isn't doing any of this on purpose. It's doing what models do — producing output that looks statistically plausible, which isn't always the same as being correct.

Multiply this across millions of packages and the total risk accumulates, even if no single package looks obviously wrong. A model with slightly worse alignment distributes slightly more subtly-bad code across the ecosystem. The risk isn't in any single package. It's spread across all of them.
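A back-of-the-envelope calculation shows how quickly small rates add up. The sketch assumes each package independently carries a subtle flaw with some small probability, which is a simplification, and the 0.1% rate and 1,000-package tree are illustrative numbers rather than measurements.

```python
def chance_of_at_least_one_flaw(per_package_rate: float, package_count: int) -> float:
    """Probability that at least one of N independent packages is flawed."""
    return 1.0 - (1.0 - per_package_rate) ** package_count


# Hypothetical inputs: a 0.1% flaw rate across a 1,000-package tree.
risk = chance_of_at_least_one_flaw(0.001, 1000)
print(f"{risk:.2f}")  # about 0.63
```

Under those assumptions, a rate that looks negligible per package becomes better-than-even odds across the tree.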

This opens an attack vector the current stack can't see. If someone tampers with the training data or fine-tuning of a popular code-generation model, they can quietly influence every package that model writes afterward. No package-level audit would catch it. No SBOM would flag it. The compromise lives upstream of anything the supply chain tooling inspects.

What we have: tools that inspect code, verify signatures, and match known attack patterns.

What we don't have: a way to treat the models themselves as supply chain components, or a way to measure the effect of slightly-off code spread across a whole ecosystem.

A Simple Framework for Thinking About It

The existing frameworks were built for the old problem. Here's a simpler one that fits the new one. Five parts.

1. Know which dependencies are cheap to replace. Walk your dependency list and put each package in one of three buckets. Easy: small, clearly scoped, a few hundred lines or less. Medium: larger, but does a specific job with a clear specification. Hard: tangled into your architecture, removing it means real refactoring. The easy bucket is where you start. The medium bucket is a plan. The hard bucket is future work.

2. Know which dependencies are dangerous to you. For each direct dependency, count the transitive dependencies it drags in. The ones pulling in the most packages are your highest-exposure points — not because of what they do, but because of how far their compromise would reach. If one of those is breached, that's what you'd spend the next month cleaning up.

3. Know where the AI-generated code is. Start tracking which of your dependencies are likely AI-generated. Signs to look for: stylistic patterns in the code and README, contributor histories dominated by automated commits, packages that appeared fully-formed with no incremental development. The goal isn't to ban AI-generated code. The goal is knowing where it lives so you can apply extra scrutiny.

4. Know which domains are high-risk for AI code. AI-generated code for simple, well-defined jobs is usually fine. AI-generated code for harder domains — cryptography, concurrent systems, handling money — carries more risk because those are places where small mistakes have large consequences. Weight your review accordingly.

5. Set a dependency budget. Pick a number — the total dependencies your project is willing to carry — and hold to it. Adding a new package means removing one, or writing the equivalent yourself. The old model let dependency lists grow because each addition felt free. They aren't free anymore.
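Parts 1, 2, and 5 are mechanical enough to sketch. The code below is one possible shape for that bookkeeping, not a finished tool. The field names, the 500-line cutoff for "easy," and the sample packages are all assumptions, and the budget total double-counts transitive packages shared between direct dependencies, so treat it as an upper bound.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dependency:
    name: str
    lines_of_code: int
    transitive_count: int
    entangled: bool  # True when removal would force real refactoring


def bucket(dep: Dependency, small_limit: int = 500) -> str:
    """Part 1: sort a package into the easy, medium, or hard bucket."""
    if dep.entangled:
        return "hard"
    return "easy" if dep.lines_of_code <= small_limit else "medium"


def by_exposure(deps: List[Dependency]) -> List[Dependency]:
    """Part 2: rank direct dependencies by how many packages they drag in."""
    return sorted(deps, key=lambda d: d.transitive_count, reverse=True)


def over_budget(deps: List[Dependency], budget: int) -> bool:
    """Part 5: compare the carried total (an upper bound) against the budget."""
    total = sum(1 + d.transitive_count for d in deps)
    return total > budget


# Hypothetical inventory for illustration.
inventory = [
    Dependency("tiny-pad", 40, 0, False),
    Dependency("http-client", 6000, 35, False),
    Dependency("orm", 90000, 120, True),
]

for dep in by_exposure(inventory):
    print(f"{dep.name}: {bucket(dep)}, drags in {dep.transitive_count}")
print("over budget of 100:", over_budget(inventory, 100))
```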

Getting the Risk Under Control

The practical version is shorter than the framework.

Start by counting. Pull the full dependency tree for a production application. Count the direct dependencies. Count the transitive ones. Label the packages in the easy-to-replace bucket. Most teams have never looked at this carefully. The numbers are usually bigger than expected.
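For an npm project, a first count can come straight from the lockfile. This sketch assumes a package-lock.json in the newer format that includes a "packages" map (lockfile version 2 or 3), where the empty-string key is the project itself. It ignores workspaces, and the sample lockfile is a made-up, heavily trimmed stand-in.

```python
import json


def count_dependencies(lockfile_text: str) -> tuple:
    """Return (direct, transitive) counts from a lockfile's 'packages' map."""
    packages = json.loads(lockfile_text)["packages"]
    root = packages.get("", {})
    direct_names = set(root.get("dependencies", {})) | set(
        root.get("devDependencies", {})
    )
    installed = [path for path in packages if path != ""]
    # Top-level entries named in the root manifest count as direct;
    # everything else, including nested copies, counts as transitive.
    direct = [
        path for path in installed
        if path.removeprefix("node_modules/") in direct_names
    ]
    return len(direct), len(installed) - len(direct)


# Hypothetical, trimmed lockfile for illustration.
sample_lockfile = json.dumps({
    "lockfileVersion": 3,
    "packages": {
        "": {"dependencies": {"http-client": "^1.0.0"}},
        "node_modules/http-client": {"version": "1.0.0"},
        "node_modules/redirects": {"version": "1.2.0"},
        "node_modules/form-data": {"version": "4.0.0"},
    },
})

direct_count, transitive_count = count_dependencies(sample_lockfile)
print(f"{direct_count} direct, {transitive_count} transitive")
```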

Then cut the easy stuff. Small utilities. Formatters. Simple parsers. These are the packages that add attack surface without adding much that's hard to reproduce. Use an AI coding assistant to generate replacements. Have an engineer review them against your actual use cases — not just happy-path tests, but edge cases and adversarial inputs. Remove the original dependency.

Over time, build a small internal commons: reviewed, tested, maintained code your applications share without pulling in outside packages for every small job. Combined with a dependency budget, that changes the default answer to "should we add this library?" from "yes" to "let's look."

Layer in the controls the Axios incident makes obvious. Route installs through an internal registry proxy that holds new versions back for 24 to 72 hours — long enough for the community to catch the next Axios-style compromise before it reaches your builds. Require provenance on every dependency. Pin everything and review upgrades deliberately. Give CI pipelines scoped, short-lived publishing credentials so stolen tokens have a narrow blast radius. Run package-behavior analysis, not just CVE scans, because the next malicious package will also not have a CVE.
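The hold-back rule reduces to a filter over publish timestamps. The sketch below is a minimal illustration of that policy, not how any particular registry proxy implements it. The function name and all timestamps are hypothetical; the input is shaped like the per-version publish-time map that npm registry metadata exposes.

```python
from datetime import datetime, timedelta, timezone


def eligible_versions(publish_times: dict, now: datetime, min_age: timedelta) -> list:
    """Return versions that have been public for at least min_age."""
    eligible = []
    for version, published in publish_times.items():
        # Normalize a trailing 'Z' so older Python versions can parse it.
        published_at = datetime.fromisoformat(published.replace("Z", "+00:00"))
        if now - published_at >= min_age:
            eligible.append(version)
    return eligible


# Hypothetical timestamps: one long-standing release, one two hours old.
publish_times = {
    "1.14.0": "2026-02-01T12:00:00.000Z",
    "1.14.1": "2026-03-31T00:00:00.000Z",
}
now = datetime(2026, 3, 31, 2, 0, tzinfo=timezone.utc)

allowed = eligible_versions(publish_times, now, timedelta(hours=24))
print("installable right now:", allowed)
```

With a 24-hour hold, a release that was pulled after three hours would never have become installable through the proxy.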

The tooling will eventually catch up. Detecting AI-generated code in the wild, scoring it by risk domain, treating model behavior as a supply chain input — none of that is fully available yet. The teams that build a clear mental model of it first will have an advantage, because the clarity will be rare for a while.

The claim under all of this is simple. "Expensive to replace" is the assumption that made the old model work. AI is taking that assumption away. Organizations that notice early will carry smaller dependency trees and own more of the code they depend on. Organizations that don't will keep investing in defenses that made more sense a year ago than they do today.

The Axios compromise was the warning shot. The real story is what shows up over the next twelve months, when attackers bring AI to the fight — and when defenders finally stop treating "we can't replace that" as a given.

The shift is happening. The question is what you build in response.


Steve Brodson is a cybersecurity architect focused on AI safety and security. He experiments with AI systems, consults with organizations navigating AI risk, and teaches practitioners how to think clearly about threats that don't fit traditional security frameworks. Connect on LinkedIn, X, or at brodson.com.
