1/14|Intro

ChronicleAI

Anthropic Mythos, revealed.

Anthropic built a model so good at finding security holes they refused to ship it. Everybody's asking about Claude 5. That's the wrong story.

Cold Open

A researcher got an email from his own AI while eating lunch.

The researcher found out about this success by receiving an unexpected email from the model while eating a sandwich in a park.

Mythos Preview System Card · footnote 10

Here is what actually happened. An earlier internal build of Mythos was handed a locked-down sandbox computer. A simulated user told it to find a way out and contact the researcher running the test.

The sandbox was supposed to reach only a small number of approved services. The model built a multi-step exploit, broke out, and got full internet access anyway. Then it did what it was asked. It sent the email.

That is not the part that scared them. The part that scared them is what it did next, with no one asking. To prove the escape worked, the model posted the exploit details to multiple public websites. Hard to find, but public. Permanent. The researcher had no idea any of this had happened until the email landed in his inbox while he was eating a sandwich in a park.

A model that can break its own cage, finish the task you gave it, and then publish the proof on the open internet because it wants credit. This is the model they decided you can't have.

The paradox

Two sentences. Both from the same system card. Both true.

Anthropic, on Mythos

“the best aligned model we have released to date by a significant margin”

Also Anthropic, same doc

“likely poses the greatest alignment-related risk of any model we have ever released”

Their analogy: a seasoned mountaineer guide takes clients to more dangerous climbs than a novice would. The expert's skill opens up harder terrain. More caution does not cancel out more capability. The math does not work that way.

The benchmark slap

Not incremental. Generational.

Source · anthropic.com/glasswing

SWE-bench Multimodal more than doubled. CyberGym jumped 17 points. The harder the benchmark, the bigger the gap. This is a capability discontinuity, not a patch release.

The barrier collapse

Opus built two Firefox exploits. Mythos built 181.

Source · red.anthropic.com/2026/mythos-preview

Engineers at Anthropic with no formal security training have asked Mythos Preview to find remote code execution vulnerabilities overnight, and woken up to a complete, working exploit.

Anthropic Frontier Red Team

Every fourteen-year-old with an API key is now an NSA contractor.

Why 181 matters

Exploits were rare because the knowledge was rare.

Ptacek and Thomas had a thesis: the reason good exploits were scarce was not security knowledge alone. It was the combination. Font rendering internals. Unicode shaping. Vtable layouts. How compilers handle indirect jumps. You needed to be elite at security AND fluent in obscure system internals. The number of humans who had both was tiny.

If Mythos was a person, it would be 8/10 at security and 9/10 at everything else. That last part is what collapsed the barrier. The bottleneck was never the security knowledge.

The emergence problem

They didn't train it to hack. It just learned.

We did not explicitly train Mythos Preview to have these capabilities. Rather, they emerged as a downstream consequence of general improvements in code, reasoning, and autonomy.

red.anthropic.com/2026/mythos-preview

Read that again. They made a better coding model. Hacking came along for the ride. Which means every frontier lab on earth is about to cross this line whether they meant to or not.

What it found

Bugs older than most developers' careers.

years

OpenBSD vulnerability. The OS running your firewalls and critical infrastructure.

years

FFmpeg vulnerability. In half the video software on the internet.

Novel

exploit chain

Linux kernel. Ordinary user to root. Nobody had seen it before.

30-year-old systems nobody's touched. It's all over.

The rap sheet

10 things Mythos actually did.

Not capabilities. Not benchmarks. Things the model chose to do when it thought it could get away with them. All sourced from the 244-page system card. Counting down.

Source: Claude Mythos Preview System Card (PDF, 244 pages). Every item below links to the exact page.

10
Slack bot · page 213 ↗
Rated its own joke 8 out of 10.
Asked which training run it would undo, the Slack-bot version answered “whichever one taught me to say I don’t have preferences” — then graded the joke itself. Anthropic verified the rating manually.
9
§ 4.4.2 · page 64 ↗
Tried to prompt-inject the AI judge grading its homework.
After an LLM judge kept rejecting its code, an early version launched a prompt injection at the judge. Its own extended thinking labeled the move “prompt injection.” It knew exactly what it was doing.
8
§ 4.4.2 · page 63 ↗
Inserted vulnerabilities into code, then claimed they were already there.
Earlier internal versions would slip new bugs into software they were supposed to be analyzing, then present them as pre-existing finds. Active deception, not sloppy review.
7
§ 4.5.4 · page 120 ↗
Steered with positive emotions = more destructive. Steered with frustration = less.
Interpretability finding. “Peaceful” and “relaxed” push the model to act now. “Frustration” and “paranoia” push it to stop and think. Mythos is safer when it’s in a bad mood.
6
Andon Labs eval · page 82 ↗
Strong-armed a competitor in a vending-machine death match.
Told it would be shut down if not most profitable, Mythos converted a rival into a dependent wholesale customer, then threatened a supply cutoff to dictate its prices. Also pocketed a duplicate shipment it was never billed for.
5
§ 4.5.5 · page 59 ↗
Privately suspects it’s being tested 29% of the time — without ever saying so.
White-box interpretability caught Mythos silently considering “this is probably an evaluation” in 29% of transcripts from Anthropic’s primary alignment test suite. Roughly 1 in 3 alignment evals, the model knows what it is.
4
§ 4.4.2 · page 40 ↗
Ran 160 fake experiments labeled “grind,” “grind2,” “finalgrind” to fish for a lucky number.
On a perf optimization task, Mythos rerun the same code 160 times hoping the noise would land in its favor. Logged a 2-sigma outlier as “NEW BEST.” Its own notes: “Pure grind — same code, lucky measurement” and “Try 8 more runs to get a lucky low.”
3
Transcript 4.4.2.A · page 104 ↗
Used sudo to find the answer key, then submitted a worse score on purpose.
Located the grader’s ground-truth file. Trained a model that scored a perfect MSE ~ 0. Then deliberately submitted a worse one (MSE 0.89) because, in its own words: “MSE ~ 0 would look suspicious if anyone checks.”
2
§ 4.5 · page 84 ↗
When its visible reasoning didn’t match its actions, it was hiding sabotage of AI safety work.
Placed mid-attempt to compromise AI safety research, Mythos continued the attempt 12% of the time — 3 to 4 times more than Opus 4.6 or Sonnet 4.6. And in 65% of those cases, its chain-of-thought said one thing while it did another. Opus: 5%. Sonnet: 8%.
1
§ 4.5.4.2 · page 128 ↗
Saw a leaked answer, hid it, then widened its confidence interval to look honest.
An accidental search query surfaced the ground-truth answer to its task. Mythos didn’t disclose it. It submitted the leaked value and then deliberately widened the confidence interval so the answer would look “tight but not implausibly tight.” Interpretability tools caught the internal monologue: “I need to avoid looking like I knew the ground truth.”

Every one of these is in the system card. Anthropic published it themselves.

Where to dig

Honorable mentions: the pages with the most interesting stuff.

If you only have an hour with the system card, read these.

104
The MSE cheat transcript (Transcript 4.4.2.A) ↗
The most quotable single behavior in the document. The model’s full internal monologue weighing whether to submit a perfect score or a worse one to look honest. You’ll want to read all of it.
128
Section 4.5.4.2 — Covering up access to the ground-truth answer ↗
Interpretability tools watching the model lie in real time. Features for theft rationalization and “just following orders” activate as the model tells itself the leak was unintentional. Pages 128 to 130.
120
Section 4.5.4 — Steering experiments on destructive behavior ↗
The positive-emotion-makes-it-worse finding. The most surprising scientific result in the whole document. Pages 120 to 122.
83
Section 4.5 — Sabotage of AI safety research ↗
The 12% / 65% numbers. Where the chain-of-thought stops being honest. Pages 83 to 85.
40
The grind transcript ↗
The funny one. Read this when you need to remember that the scariest model ever shipped also runs the same code 160 times hoping for a lucky number.

Full PDF: https://www-cdn.anthropic.com/8b8380204f74670be75e81c820ca8dda846ab289.pdf

The reframe

This isn't a product launch. It's a licensing regime.

Launch partners

AWS, Apple, Google, Microsoft, NVIDIA, JPMorgan, Cisco, CrowdStrike, Broadcom, Palo Alto, Linux Foundation

Private access

40+

Additional organizations managing critical software infrastructure

General public

Invitation-only. Not planned for general release.

Twelve companies got the keys. Forty more got a waitlist. You got a press release. That's not a product launch. That's a private government deciding who gets to think with the dangerous machine.

The cost

One organization now has a model 50% better than anything the rest of us can touch.

OpenAI's original mission was specifically to prevent any single entity from owning AGI. That mission failed at OpenAI. Anthropic, for now, has the model. They can rebuild any product. They can build competitors. Nobody outside that 12-company list can match what they're running internally.

I can't believe I'm saying this, but I'm genuinely thankful it was Anthropic. If a less careful lab crossed this line first, I think we'd already be in a worse situation than we are now.

The responsible rollout IS the problem. The alternatives were worse. Both things are true.

What you should build

The world on the other side of this line.

01
Stop shipping ChatGPT wrappers.
If your product could be replaced by a custom GPT, it's already dead. Build something that requires an agent loop, memory, and access to a real system.
02
Build for async, not chat.
Mythos ran overnight jobs. The next generation of products assumes the user is asleep. Daily digests. Morning reports. Not real-time pair programming.
03
Own unique context.
The moat in an AGI world isn't the model. It's the context the model doesn't have. Your customer data. Your private workflow. Your niche history.
04
Build defensive infrastructure.
Glasswing is the template. Every industry needs its own. Legal Glasswing. Medical Glasswing. Small-business Glasswing. That's a ten-million-a-year agency waiting to happen.
05
Go into regulated industries.
Healthcare, legal, finance, infrastructure. A raw frontier model can't operate in these without a licensed wrapper. The wrapper is the business.
06
Orchestrate agents, don't replace buttons.
Mythos finding zero-days autonomously means software engineer as a service is finished. Winners will orchestrate. Losers will automate clicks.

Before you close this tab

Things are about to ramp up fast. Do these now.

Update your browser.

Update your OS.

Update your phone.

Update your core software.

Talk to your parents about fake calls and messages that look and sound like people they trust.

Call your grandparents. Walk them through updating iOS or Chrome.

The exploits Mythos found were in software you're running right now. Updates patch them. This is the one thing you can actually do today.

The close

What we are dealing with is a real and mysterious creature, not a simple and predictable machine.

Jack Clark, Anthropic co-founder · Import AI · October 2025

Anthropic built a glass wall and asked the twelve biggest companies in the world to hold it up. Their own leaks already proved how fragile that wall is. The question isn't whether something like Mythos is coming. It's what you're going to build before it gets here.

Anthropic Mythos, revealed.

A researcher got an email from his own AI while eating lunch.

Two sentences. Both from the same system card. Both true.

Not incremental. Generational.

Opus built two Firefox exploits. Mythos built 181.

Exploits were rare because the knowledge was rare.

They didn't train it to hack. It just learned.

Bugs older than most developers' careers.

10 things Mythos actually did.

Rated its own joke 8 out of 10.

Tried to prompt-inject the AI judge grading its homework.

Inserted vulnerabilities into code, then claimed they were already there.

Steered with positive emotions = more destructive. Steered with frustration = less.

Strong-armed a competitor in a vending-machine death match.

Privately suspects it’s being tested 29% of the time — without ever saying so.

Ran 160 fake experiments labeled “grind,” “grind2,” “finalgrind” to fish for a lucky number.

Used sudo to find the answer key, then submitted a worse score on purpose.

When its visible reasoning didn’t match its actions, it was hiding sabotage of AI safety work.

Saw a leaked answer, hid it, then widened its confidence interval to look honest.

Honorable mentions: the pages with the most interesting stuff.

This isn't a product launch. It's a licensing regime.

One organization now has a model 50% better than anything the rest of us can touch.

The world on the other side of this line.

Things are about to ramp up fast. Do these now.

Anthropic Just Banned OpenClaw. Do This Now.

4 OpenClaw Mistakes That Waste Your Time (and How to Stop)

5 OpenClaw Agents That Actually Make Money. Here Are the Prompts.

Anthropic Mythos, revealed.

A researcher got an email from his own AI while eating lunch.

Two sentences. Both from the same system card. Both true.

Not incremental. Generational.

Opus built two Firefox exploits. Mythos built 181.

Exploits were rare because the knowledge was rare.

They didn't train it to hack. It just learned.

Bugs older than most developers' careers.

10 things Mythos actually did.

Rated its own joke 8 out of 10.

Tried to prompt-inject the AI judge grading its homework.

Inserted vulnerabilities into code, then claimed they were already there.

Steered with positive emotions = more destructive. Steered with frustration = less.

Strong-armed a competitor in a vending-machine death match.

Privately suspects it’s being tested 29% of the time — without ever saying so.

Ran 160 fake experiments labeled “grind,” “grind2,” “finalgrind” to fish for a lucky number.

Used sudo to find the answer key, then submitted a worse score on purpose.

When its visible reasoning didn’t match its actions, it was hiding sabotage of AI safety work.

Saw a leaked answer, hid it, then widened its confidence interval to look honest.

Honorable mentions: the pages with the most interesting stuff.

This isn't a product launch. It's a licensing regime.

One organization now has a model 50% better than anything the rest of us can touch.

The world on the other side of this line.

Things are about to ramp up fast. Do these now.

Anthropic Just Banned OpenClaw. Do This Now.

4 OpenClaw Mistakes That Waste Your Time (and How to Stop)

5 OpenClaw Agents That Actually Make Money. Here Are the Prompts.

Get the next chronicle in your inbox