😸 OpenAI solved an 80-year math problem by... disproving it

Written By
Grant Harvey
May 22, 2026
8 minute read

Some people just inherently understand their priorities in life, and now that they can code, are unleashing true beauty into the world:

Happy Memorial Day Weekend to everyone who celebrates! Gonna keep it light today.

Here’s what happened in AI today:

  • 🐱 OpenAI’s model solved an 80-year math problem.

  • 📰 OpenAI and Anthropic’s revenue race got weird fast.

  • 📰 Trump delayed an AI order as California prepared workers.

  • 🍪 Qwen 3.7 Max ran an agent for 35 hours.

  • 🎓 Use Codex /goal mode for long tasks.

Math has one perk for AI watchers: eventually, somebody checks the work.

That makes OpenAI’s new claim worth paying attention to. The company says an internal reasoning model disproved the Erdős unit distance conjecture, a discrete geometry problem from 1946. If that all made you go “Huh?”, keep scrolling.

Here’s the basic explanation of the problem: if you place n points on a flat plane, how many pairs can sit exactly one unit apart? For decades, many mathematicians believed square-grid style patterns were basically the best possible answer.
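To make the counting question concrete, here is a minimal Python sketch that counts unit-distance pairs in a small point set. The helper names (unit_distance_pairs, square_grid, triangular_patch) are made up for illustration, and these toy lattices are only the simplest examples of the idea. They are not the optimized grid-style constructions mathematicians studied, and they have nothing to do with the new counterexample family.

```python
from itertools import combinations
from math import dist, isclose


def unit_distance_pairs(points, tol=1e-9):
    """Count unordered pairs of points that sit one unit apart (within tol)."""
    return sum(
        1
        for p, q in combinations(points, 2)
        if isclose(dist(p, q), 1.0, abs_tol=tol)
    )


def square_grid(k):
    """k x k points on a unit-spaced square grid (illustrative helper)."""
    return [(float(x), float(y)) for x in range(k) for y in range(k)]


def triangular_patch(k):
    """k x k rhombus-shaped patch of a unit-spaced triangular lattice."""
    h = 3 ** 0.5 / 2  # row height of an equilateral triangle with side 1
    return [(x + 0.5 * y, h * y) for x in range(k) for y in range(k)]


if __name__ == "__main__":
    # Both sets have 16 points, but the arrangement changes the count.
    print(unit_distance_pairs(square_grid(4)))       # 24
    print(unit_distance_pairs(triangular_patch(4)))  # 33
```

Same number of points, different number of unit-distance pairs. The whole problem is about how high that count can go as n grows.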

And yet, OpenAI’s unreleased reasoning model apparently found a counterexample: a new infinite family of point arrangements that creates more unit-distance pairs than the old grid-based belief allowed. That means the model “solved” the problem by proving the conjecture was false.

Here’s what happened:

  • OpenAI said the original proof came from a general-purpose reasoning model, rather than a system specially trained, scaffolded, or targeted for this problem.

  • The proof shows infinitely many point sets with at least n^(1+δ) unit-distance pairs.

  • That beats Erdős’s old n^(1+o(1)) conjecture, which roughly meant “only a tiny bit better than linear.”

  • External mathematicians published companion remarks verifying and explaining the result.

  • Princeton mathematician Will Sawin sharpened it, showing more than n^1.014 unit-distance pairs for arbitrarily large point sets.
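For readers who want the bullets above in symbols, here is a rough LaTeX sketch. The notation u(n) for the maximum number of unit-distance pairs among n points in the plane is our own shorthand, not taken from OpenAI’s write-up, and the statements simply restate the claims as reported above.

```latex
% u(n): maximum number of unit-distance pairs among n points in the plane
% (notation assumed here for illustration)

% The old conjecture (upper bound):
u(n) \le n^{1 + o(1)}

% The reported result: for some fixed \delta > 0,
u(n) \ge n^{1 + \delta} \quad \text{for infinitely many } n

% Sawin's sharpening, as reported:
u(n) > n^{1.014} \quad \text{for arbitrarily large } n
```

The two cannot both hold: an o(1) exponent shrinks toward zero as n grows, so it eventually drops below any fixed δ > 0.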

Why this matters: This is a cleaner test of AI reasoning than a benchmark (a standardized model test). Benchmarks can reward lucky guesses. A proof has to survive expert review, line by line.

The proof used algebraic number theory (math about number systems), including class field towers and Golod-Shafarevich theory, to crack a geometry problem that sounds simple.

TechCrunch noted an earlier OpenAI Erdős claim fell apart after the model surfaced existing results. This time, outside mathematicians signed the companion remarks, including some critics of that previous episode. OpenAI turned their haters into benchmarks, basically.

Elliot Glazer added an interesting POV on this too: AI may surface answers humans could have found but never went after, because the search didn’t seem worth the time (or the will). Only so many experts can spend years hunting for a counterexample the field doubts exists in the first place.

Our take: Think about the loop here. The model found the weird route, humans checked the work, Codex helped clean up the write-up, and Princeton’s Will Sawin showed the construction’s edge compounds at huge scale. That is why the result matters beyond “AI found a math trick.”

Math is unusually AI-friendly because proofs can be checked. Biology, medicine, and business strategy have messier feedback loops. Greg Kamradt of ARC Prize recently shared a nice breakdown of the 7 levels of verifiability, a spectrum that ranks tasks by how long it takes to get “feedback” on whether your actions led to the outcome you wanted. Read our deep dive on the topic here.

Long agent tasks fail when the AI forgets what “done” means. Codex’s /goal mode fixes that by giving it a persistent objective it can keep checking as it works.

Use this for tasks with many steps: migrations, refactors, audits, bug sweeps, or report generation. The trick is to write the goal like a mini contract: outcome, constraints, and tests. If /goal does not appear, OpenAI says you can enable features.goals in config.toml or run codex features enable goals.

Try this:

/goal
Audit this project for newsletter draft readiness.

Definition of done:
1. Every section has the required header.
2. Every hyperlink is attached to a short, natural anchor.
3. No Treats or Around the Horn bullets use bold text.
4. Every technical term has a plain-English parenthetical on first use.
5. Return a short report with pass/fail status and exact fixes made.

Before editing, make a checklist. After editing, run the checklist again and show me what changed.
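If you write these goals often, the “mini contract” idea (outcome, constraints, tests) can be templated. Below is a minimal Python sketch, assuming you just want to assemble the prompt text and paste it into Codex yourself. The GoalContract class and its field names are hypothetical and not part of Codex or any OpenAI tooling.

```python
from dataclasses import dataclass, field


@dataclass
class GoalContract:
    """Hypothetical helper: holds the three parts of a /goal prompt."""

    outcome: str
    constraints: list[str] = field(default_factory=list)
    done_checks: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Build the prompt text: outcome, then constraints, then tests."""
        lines = ["/goal", self.outcome, ""]
        if self.constraints:
            lines.append("Constraints:")
            lines += [f"- {c}" for c in self.constraints]
            lines.append("")
        lines.append("Definition of done:")
        lines += [f"{i}. {check}" for i, check in enumerate(self.done_checks, 1)]
        return "\n".join(lines)


if __name__ == "__main__":
    goal = GoalContract(
        outcome="Audit this project for newsletter draft readiness.",
        constraints=["Before editing, make a checklist."],
        done_checks=[
            "Every section has the required header.",
            "Return a short report with pass/fail status and exact fixes made.",
        ],
    )
    print(goal.render())
```

Keeping the three parts as separate fields makes it harder to forget the “tests” piece, which is the part that tells the agent when to stop.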

ICYMI: Ben Cherry of LiveKit joined us on The Neuron’s weekly livestream to show us how to build real-time voice agents that can listen, interrupt, call tools, and run in production. And guess what? It’s so easy, even an agent can do it! You can literally grab the transcript from Google (click “… more” under the vid description, scroll down to click “Show Transcript”, then copy the transcript and give it to your Codex / Claude to set up for you).

It’s a super fun episode: Ben showed how to launch an agent via LiveKit (and which code repos to use if that kinda thing doesn’t intimidate you), edited his agent live with Claude Code, and even cloned his own voice for us live. Click here to watch.

📰 Around the Horn

  • OpenAI reportedly generated about $5.7B in Q1, nearly $1B ahead of Anthropic, while Anthropic is projected to more than double to $10.9B in Q2.

  • California signed a first-in-the-nation order to prepare workers and small businesses for AI disruption.

  • xAI’s Grok reportedly flopped with U.S. government buyers, with Reuters finding only three identified federal use cases.

  • Intuit planned to lay off 3,000+ workers, about 17% of its workforce, to simplify the company and refocus on AI products. Remember Monday’s story?

  • Samsung will reportedly distribute about $26.6B in bonuses to chip workers, averaging roughly $340K per employee. That’s how important chips are!

  • NVIDIA CEO Jensen Huang said CPUs built for AI agents could become a new $200B market for the company.

  • Taiwan sought to detain three people accused of forging documents to smuggle NVIDIA AI chips to China, Hong Kong, and Macau.

💡 Intelligent Insights

  • Data filtering: this research paper argues larger models can benefit from messy data that smaller models cannot use well (basically scaling laws for data).

  • AI oversight: the U.K. AI Security Institute warned that today’s oversight methods may degrade as models get more capable.

  • Arena’s frontier: Arena.ai found GPT-4-level model quality is now roughly 500x cheaper than it was in 2023 (and 4 other insights you might like!).

  • Accessible for AI: a sharp TechPolicy Press critique of llms.txt, MCP servers, and other machine-readable web infrastructure that may leave disabled users behind.

  • If you read anything today, read this: After Automation from Dan Shipper argued AI creates more work for humans by flooding the world with generic output and raising the value of taste, context, and judgment. I’ll end the newsletter with this insight:

Source: Dan Shipper @ Every; image linked to the article

A Cat’s Commentary

Don’t say we never highlight negative feedback!

That’s all for now.

Grant Harvey

Grant Harvey is the Lead Writer of The Neuron, where he continues to lead the publication's daily coverage of AI news, tools, and trends.

Property of TechnologyAdvice. © 2026 TechnologyAdvice. All Rights Reserved
