Everything That Happened in AI Today: Monday, May 18

An older Nvidia H200 rented for more than a newer B200 today, which is one very expensive way of saying the AI boom has started billing everybody.

Welcome to the Around the Horn Digest, where we track every AI story so you do not have to keep 188 tabs open and three GPUs on backorder. The lead story below is the agent stack getting more real, but the day's actual vibe was pressure: OpenAI wants ChatGPT closer to your bank account, compute renters watched H200 prices leap past newer B200s, college seniors described four years of ChatGPT as an uncontrolled experiment, and AI backlash kept moving from comment threads into city halls and budgets. Turns out "software eating the world" gets more complicated when the software asks for a bank-linking login and a power substation. Let's get into it.


New From The Neuron
Around the Horn - Monday, May 18, 2026
Top 5 News
Honorable Mentions
Top Treats To Try
Deeper Dive on Our Top 5
Best of the Rest:
Previous Around the Horn Digests

New From The Neuron

Around the Horn Week in Review: May 11-15 catches the week Anthropic refused China access to Mythos while U.S. cyber defense used the model family, compute companies tried to leave Earth, Recursive Superintelligence raised $650M, and Google confirmed criminal AI-driven zero-day exploitation.
Thinking Machines Wants AI to Stop Waiting Its Turn breaks down Mira Murati's interaction-model bet: AI that can listen, watch, interrupt, use tools, and keep collaborating in real time instead of waiting politely for perfect prompts.

Around the Horn - Monday, May 18, 2026

Microsoft researchers introduced ECHO, a new way to train terminal agents (AI systems that type commands into a computer terminal and read what comes back). The simple idea: when an agent runs a command, do not only grade whether it eventually solved the task; also train it on the terminal's response, like file lists, error messages, logs, and test results (launch post, follow-up).

That matters because even a failed command teaches the agent something about the computer it is using. If it creates a file, runs a test, gets an error, or sees a stack trace, the response is feedback about how the world changed. ECHO uses those already-generated replies as free training signal, so the model learns a small "world model" of the terminal (a mental map of what commands tend to cause) without extra data, extra runs, or a teacher model. In tests, TerminalBench-2.0 pass@1 (the share of terminal tasks solved on the first try) nearly doubled at 8B and 14B model sizes, and training reached the same performance up to 2.3x faster.
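For readers who want the metric spelled out, here is a minimal sketch of how pass@1 is commonly computed, using the widely used unbiased pass@k estimator. Whether TerminalBench-2.0 scores runs exactly this way is an assumption, and the task names and results below are made up for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Chance that at least one of k attempts, drawn without replacement
    from n recorded attempts with c successes, solves the task."""
    if n - c < k:
        return 1.0  # too few failures to fill k draws, so one must pass
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: dict[str, list[bool]]) -> float:
    """Average per-task pass@1 across a benchmark (for k=1 this is c / n)."""
    scores = [pass_at_k(len(runs), sum(runs), 1) for runs in results.values()]
    return sum(scores) / len(scores)


# Hypothetical results: four attempts per task, True means the task was solved.
before = {
    "task_a": [True, False, False, False],
    "task_b": [False, False, False, False],
    "task_c": [True, True, False, False],
}
after = {
    "task_a": [True, True, False, False],
    "task_b": [True, False, False, False],
    "task_c": [True, True, True, False],
}

score_before = benchmark_pass_at_1(before)
score_after = benchmark_pass_at_1(after)
print(f"pass@1 before: {score_before:.2f}, after: {score_after:.2f}")
```

In this toy data the score moves from 0.25 to 0.50, which is the shape of the "nearly doubled" claim, not a reproduction of it.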

The stranger part: after regular training, some ECHO agents kept improving even when the final right/wrong reward was removed, just by acting, watching what the terminal said back, and learning from those consequences. The caveat is that this worked best when the terminal feedback was informative and the agent's runs were clean.

The rest of the agent layer moved too: Odyssey launched Starchild-1 for real-time audio/video world simulation, Agora-1 added shared multi-agent worlds, Anthropic acquired Stainless for SDKs (software kits developers use to connect apps) and MCP (a standard for plugging AI tools into outside data and apps), Devin Auto-Triage began monitoring incidents and opening pull requests (proposed code changes), and Telegram made bot-to-bot communication a native primitive (Pavel Durov post). The agent ecosystem is no longer just "chat with tools"; it is turning into runtimes, memory, verification, browsing, robot data, and agents that talk to each other.

Editor’s note: We’re trialing a new thing today: deeper dives into our Top 5. Today’s longer cuts start with the four biggest stories below, so readers who want the quick list can skim, and readers who want the “why it matters” version can keep going. Scroll to your heart's content!

Top 5 News

Microsoft open-sourced ECHO, which trains terminal agents to learn from the computer's replies, not just final pass/fail scores, nearly doubling TerminalBench-2.0 pass@1 (tasks solved on the first try) and reaching the same performance up to 2.3x faster in training (Dimitris Papailiopoulos post).
Odyssey launched Starchild-1, a real-time multimodal world model (a simulator that predicts and generates what happens next) that jointly creates synchronized audio and video from streaming user inputs (launch post), then followed with Agora-1, a multi-agent world model for shared real-time simulations (demo, post, playable-preview reaction).
OpenAI launched a personal finance experience in ChatGPT for Pro users in the U.S., letting users connect bank accounts through Plaid (the bank-linking service many finance apps use) for portfolio, spending, subscription, and payment insights (official page, privacy critique, HN discussion).
Axios reported that an AI backlash is accelerating, with most Americans worried about the technology, while related stories showed the pressure spreading to data centers, junior hiring, $800B in AI capex (capital spending) distorting GDP, and Cisco's 4,000-person AI realignment.
Anthropic acquired Stainless, the SDK-generation company that already powered Anthropic's official TypeScript and Python SDKs (developer toolkits), deepening Claude's tool/API connectivity as agent workflows move toward MCP servers (connectors that let models use outside tools and data) and enterprise integrations.

Honorable Mentions

Prinz argued that society is already living through an uneven "gentle singularity": SF and X early believers are out front, enterprise and government are hitting a "Mythos Moment," workplace use is spreading broadly, and the public is still stuck in stochastic-parrot denial (X post).
Pope Leo XIV will publish his first encyclical, Magnifica humanitas, on preserving the human person in the age of AI, on May 25. Andrew Curran is tracking the Vatican announcement that it will be presented at 11:30am in the Synod Hall in the presence of the Holy Father, with Anthropic co-founder Christopher Olah expected among the speakers. Kevin Vallier framed it as potentially one of the seminal AI ethics documents for mass opinion, and the Vatican also launched a new AI commission (AI commission, Mercury News).
NextEra Energy announced a $67B deal for Dominion Energy, the largest utility acquisition in U.S. history, as AI power demand reshapes the grid (Mercury News).
Netflix is staffing up INKubator, a "GenAI-native" animation studio for AI-driven shorts, specials, and potentially feature-quality work (discussion, context, older context).
Malta became the first country to offer one year of ChatGPT Plus to every citizen and resident who completes a free AI literacy course.

Top Treats To Try

OpenShell v0.0.43 gives you a safe, private runtime for autonomous agents with live terminal streaming, login through an existing identity provider, encrypted connections, locked-down Linux sandboxes, and DNS blocking to reduce exfiltration (secret data leaking out) (NVIDIA post) — free/open source.
Ring-2.6-1T gives you a 1T-parameter (very large) open reasoning model for agent workflows, with stable multi-step execution, tool collaboration, adjustable high/xhigh reasoning effort, and strong scores on ClawEval/ARC-AGI (agent and reasoning tests) (Nathan post, Rohan Paul post) — free/open weights.
Semble gives your coding agents CPU-only code search that uses about 98% fewer tokens than grep+read by combining keyword search, tiny embeddings (compact maps of meaning), and code-aware reranking (Show HN) — free/open source.
browse.sh gives you a fast scriptable browsing workspace for agents, with verified site skills, click/scroll/type primitives, network and console tailing, and Browserbase cloud sessions (Browserbase post) — free to try.
Files SDK gives you one small storage API across S3, R2, GCS, Azure Blob, and 26+ backends (the big cloud-file services), with web-standard input/output, native-client escape hatches, a command-line tool for agents, and optional plugins (GitHub, Hayden Bleasel post) — free/open source.
Deep Agents v0.6 from LangChain adds harness profiles (ready-made setups for running agents), delta channels (saving only changes instead of full logs), a lightweight code interpreter, typed streaming, and ContextHubBackend for versioned skills/memories (Sydney Runkle post) — no pricing details.
Devin Auto-Triage monitors bugs, alerts, and incidents, investigates with your tools, connects related reports, and can open pull requests when something breaks (Walden Yan highlight) — no pricing details.
Cursor Composer 2.5 improves long-running coding-agent behavior, instruction following, and cost efficiency versus Composer 2 (Cursor post, TestingCatalog post, additional note) — available in Cursor.

Deeper Dive on Our Top 5

Microsoft’s ECHO teaches terminal agents to learn from their mistakes

Every AI coding agent has the same awkward problem: it can type commands into a terminal, but it often treats the terminal like a vending machine. Try command. Get output. Try another command. Hope the tests pass.

Microsoft’s ECHO is a research release aimed at making that loop smarter. The idea is simple enough: when an agent runs a command, the training system also teaches it to predict what the terminal will say next. That gives the agent a “world model” (a rough sense of how its environment reacts), instead of only rewarding it when the final task succeeds.

Here’s what happened:

Microsoft open-sourced ECHO under the banner “Terminal Agents Learn World Models for Free.”
ECHO combines reinforcement learning (training by trial and error) with an extra prediction task: guessing the terminal’s next output.
The repo is built on SkyRL, an open-source reinforcement-learning stack.
The paper says ECHO nearly doubled pass rates on key terminal-agent benchmarks and reached the same performance up to 2.3x faster in training.
The code is MIT-licensed, meaning developers can use and modify it freely.
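To make the "reward plus prediction" idea concrete, here is a minimal sketch of a combined training objective. It is not Microsoft's implementation and does not use SkyRL: the policy term is a plain REINFORCE-style stand-in, and the names Token, echo_style_loss, baseline, and aux_weight are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Token:
    logprob: float  # log-probability the model assigned to this token
    source: str     # "agent" for typed commands, "terminal" for the reply


def echo_style_loss(tokens, reward, baseline=0.0, aux_weight=0.1):
    """Trial-and-error loss on agent tokens plus a prediction loss on
    terminal tokens. Assumed form: policy_loss + aux_weight * prediction_loss."""
    agent = [t.logprob for t in tokens if t.source == "agent"]
    terminal = [t.logprob for t in tokens if t.source == "terminal"]

    # REINFORCE-style term: only the final task reward drives this part.
    advantage = reward - baseline
    policy_loss = -advantage * sum(agent)

    # Auxiliary term: average negative log-likelihood of the terminal's
    # reply, so the model is also trained to predict what comes back.
    prediction_loss = -sum(terminal) / len(terminal) if terminal else 0.0

    return policy_loss + aux_weight * prediction_loss


# Hypothetical rollout: two command tokens, then two tokens of terminal output.
rollout = [
    Token(-0.5, "agent"),
    Token(-0.5, "agent"),
    Token(-2.0, "terminal"),
    Token(-1.0, "terminal"),
]

failed_loss = echo_style_loss(rollout, reward=0.0)
solved_loss = echo_style_loss(rollout, reward=1.0)
print(f"failed run: {failed_loss:.2f}, solved run: {solved_loss:.2f}")
```

The point of the toy numbers: with a reward of zero the policy term vanishes, yet the loss is still nonzero because the terminal's reply supplies its own training signal.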

Why this matters: A lot of agent progress right now comes from giving models better tools: browsers, code editors, terminals, databases, and app permissions. ECHO points at a second path: make agents understand the consequences of using those tools.

That matters because failed attempts are usually expensive waste. If a coding agent runs ten bad shell commands before finding the right one, most training systems care only about whether it eventually solved the task. ECHO tries to learn from the messy middle.

For developers, this could mean smaller models that become better terminal workers without needing giant expert datasets. For companies, it hints at cheaper agents that improve through their own work logs. And for anyone using coding agents, it gets at the real bottleneck: reliability over long, boring, multi-step tasks.

Our take: The open question is whether ECHO’s “free” world modeling works outside terminals, because the big agent prize is broader: browsers, spreadsheets, CRMs, email, and every other software environment where one wrong action can quietly ruin your day.

Odyssey’s Starchild and Agora turn world models into playable simulations

Most AI video demos still feel like movie clips. You type a prompt, wait a bit, and watch what the model made. Pretty, sometimes magical, but basically fixed.

Odyssey’s Starchild-1 and Agora-1 push that toward something closer to a live simulation. The model keeps generating what happens next while people interact with it, which is why Odyssey calls these “world models” (AI systems that simulate how an environment changes over time).

Here’s what happened:

Starchild-1 generates synchronized audio and video in real time.
It responds to streaming user inputs, including text, speech, and actions.
Odyssey says this required new training and inference systems because audio and video move at different speeds.
Agora-1 adds multiple participants to the same generated world.
Its first demo uses a GoldenEye-style deathmatch where up to four players share one simulation.
Odyssey says Agora-1 acts like a “learned game engine,” meaning the model simulates game dynamics and renders what each player sees.

How to try it:

Go to agora.odyssey.ml.
Click into the Agora-1 experience.
Join the shared deathmatch demo.
For Starchild-1, read Odyssey’s technical writeup and report.

Why this matters: If AI can simulate worlds live, it changes what “content generation” means. A game level, training simulator, robot practice environment, or classroom scenario could become something you enter and steer, not something someone pre-built.

That is the dream version. The practical version starts smaller: better game prototypes, richer robotics training data, interactive education demos, and maybe new interfaces where the computer responds through a scene instead of a chat box.

Our take: The big question is reliability. A world model that can improvise a GoldenEye match is cool; a world model that stays consistent, safe, and physically useful for hours is a much taller order. Still, this is the kind of demo that makes “AI-generated worlds” feel less like a trailer and more like a product category.

OpenAI wants ChatGPT to become your money dashboard

Personal finance is already a weird scavenger hunt. Your checking account knows one thing, your credit card knows another, your brokerage app knows another, and your spreadsheet is quietly begging for retirement.

OpenAI now wants ChatGPT to sit on top of all of it.

Here’s what happened:

OpenAI launched a personal finance preview for ChatGPT Pro users in the U.S.
Users can connect financial accounts through Plaid, which links apps to banks and brokerages.
OpenAI says it supports more than 12,000 financial institutions, with Intuit support coming soon.
ChatGPT can show a dashboard for portfolio performance, spending, subscriptions, upcoming payments, and more.
OpenAI says more than 200 million people already ask ChatGPT finance questions every month.
Finance chats default to GPT-5.5 Thinking, OpenAI’s reasoning model for more complex questions.
OpenAI says synced account data is deleted within 30 days after disconnecting, though finance details inside chat history must be deleted separately.

How to try it:

Open ChatGPT on web or iOS with a Pro account in the U.S.
Click “Finances” in the sidebar and select “Get started.”
Or type: “@Finances, connect my accounts.”
Authenticate through Plaid.
To disconnect later, go to Settings > Apps > Finances.

Why this matters: This is ChatGPT moving from advice into context. A generic chatbot can say “spend less on restaurants.” A connected chatbot can say, “You spent $418 on restaurants this month, and your rent hits in three days.”

That could be useful. It could also be one of the highest-trust asks OpenAI has made yet. Money data reveals habits, health, relationships, risk, location, family obligations, and stress. A read-only connection still gives the model a very intimate map of someone’s life.

Our take: The counter-narrative is obvious: banks and budgeting apps already struggle to make people trust financial data-sharing. OpenAI may have built the smarter interface, but the real test is whether normal people want their AI assistant sitting this close to their checking account.

The AI backlash is moving from comment sections to real-world bottlenecks

For the last two years, the AI industry has talked about adoption like weather. It was coming. Everyone would adjust. The only open question was how fast.

The public is starting to answer: slower, please.

Here’s what happened:

Axios reported that only 18% of young people ages 14 to 29 feel hopeful about AI.
A new Economist/YouGov poll found more than 70% of Americans think AI is advancing too quickly.
Other YouGov polling shows negative views of AI rising from 34% three years ago to just over 50%.
Vox noted that 70% of Americans oppose a data center in their local area, up 18 points in two months.
Gizmodo reported that 43% of CEOs now plan to reduce junior roles, up from 17% last year.
The Washington Post covered Troy, New York, where a fight over Flock AI license plate cameras led the mayor to declare a state of emergency.

Why this matters: The AI backlash now has places to show up: permit hearings, electricity bills, entry-level hiring plans, school graduations, police budgets, and local elections.

That changes the industry’s constraint. The old bottleneck was chips. The new bottleneck may be permission. AI labs need data centers, energy, workers, users, regulators, and communities to keep saying yes. A lot of those groups are starting to ask what they get in return.

This also complicates the “AI is inevitable” story. Some version of AI probably is. But specific deployments are choices: where the data center goes, who pays the power bill, which jobs disappear first, and whether surveillance tools arrive before residents know they exist.

Our take: The open question is whether AI companies can learn local politics before local politics learns them. The industry has optimized for speed, scale, and compute. The next phase may reward the companies that can make people feel included before the backlash turns from polling into policy.

Best of the Rest:

Big Tech, Major Companies, And Platforms

Elon Musk called the OpenAI trial verdict a statute-of-limitations “technicality” and vowed to appeal after the jury found he filed too late (Musk's post).
Meta reassigned 7,000 workers into AI-focused roles and flatter orgs two days before planned layoffs of roughly 8,000 employees, or 10% of staff (Bloomberg).
Tenstorrent reportedly drew early takeover interest from Intel and Qualcomm as chip upstarts keep trying to pressure NVIDIA and AMD.
Cerebras posted a clip of CEO Andrew Feldman arguing that “the GPU isn’t the only way” for AI acceleration, positioning wafer-scale systems as an alternative to NVIDIA-style clusters.
Apple is reportedly preparing a privacy-focused Siri revamp that could include auto-deleting chats and a standalone Gemini-powered chatbot app.
Amazon added Alexa+ podcast generation, letting users create on-demand audio episodes about any topic.
Yahoo Finance asked whether Amazon stock is starting an Nvidia-style run after jumping more than 30% since late March.
Elon Musk lost his lawsuit against Sam Altman and OpenAI after a California jury found the claims were filed too late (WSJ, HN discussion).
Sam Altman said ChatGPT Images 2.0 has generated more than 1B images in India.
SFGate toured OpenAI's secretive San Francisco headquarters, describing an unlabeled Mission Bay office with museum-like AGI (artificial general intelligence) history walls and carefully staged curiosities.
xAI reportedly promised employees $420 to hand over their tax returns as Grok training data before the April 15 deadline, but two months later the payments still had not been made.
The Information reported that OpenAI and Anthropic now account for 89% of revenue among 34 leading AI startups generating nearly $80B annualized.
Anthropic is sharing unreleased Claude Mythos cyber vulnerability findings with the Financial Stability Board while withholding the model from public release.
Cloudflare tested Anthropic's Mythos and other cyber frontier models on live infrastructure through Project Glasswing, finding strong exploit-chaining (linking multiple security weaknesses into one attack) and PoC generation (proof-of-concept exploit demos), but inconsistent refusals and a need for specialized test harnesses (HN discussion).
Google sold so much TPU capacity (Google's own AI-chip compute) to external customers like Anthropic and Meta that internal Google and DeepMind researchers are now queuing for compute; Bloomberg highlighted the internal scramble, and Rohan Anil argued that frictionless compute access is essential for fundamental research.
Ornn spot-market data surfaced in X's GPU-shortage trend showed Nvidia H200 rentals spiking 29% overnight on May 17 to $6.40 per GPU-hour, above newer B200s at $5.68, as tight TSMC/HBM supply (chip manufacturing and high-bandwidth memory) and big-lab reservations keep clusters booked through September 2026. Yuchen Jin warned that H100s now cost more than they did three years ago and cannot be reliably rented on demand, Andrej Karpathy noted his nanochat tutorial would strand viewers at "boot an 8xH100," and traders framed the crunch as a windfall for neocloud owners like NBIS, IREN, and CRWV.
Google DeepMind launched an Asia-Pacific accelerator for startups, research teams, and nonprofits using frontier AI on climate, nature, agriculture, and energy risks.
Baidu posted a 55% year-over-year Q1 profit decline amid a slow AI payoff, even as AI Cloud grew sharply (Yahoo Finance).
Tata Electronics and ASML signed a strategic partnership to help build India's semiconductor manufacturing ecosystem around Tata's Dholera fab.
The Economist reported that AI super-apps are remaking China's internet, with 600M+ people using agentic apps (apps that take multi-step actions for users) to delegate decisions like ordering coffee and buying goods.
WSJ rounded up the weekend's AI market questions, including OpenAI's fate, Anthropic's lead, Cerebras's IPO path, OpenAI lottery tickets, and which tech jobs look most AI-proof.
NASA's new HPSC processor (High-Performance Spaceflight Computing chip), developed with Microchip, is roughly 500x faster than current radiation-hardened space computers and could enable onboard AI for Moon and Mars missions.
Anduril and Meta are building AR smart glasses and helmet systems (augmented-reality displays) for the U.S. Army that could overlay maps, drone feeds, and targets for soldiers (HN discussion).
OpenClaw creator Peter Steinberger burned through $1.3M in OpenAI API tokens (paid model calls) in a month while running about 100 Codex coding agents across pull-request reviews, security scans, duplicate-issue cleanup, and roadmap work.
Shopify CEO Tobi Lutke reportedly ran AI autoresearch on a local Qwen model with GPT boosting, producing a smaller model that beat his hand-tuned larger model by 19% on validation for a production templating engine.
David Weisburd recapped Hans Tung's case for Anthropic as an "Intel Inside" B2B (business-to-business) developer platform with a usage moat through Cursor, Lovable, and long-tail apps.
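A quick back-of-the-envelope on the Ornn rental figures in the list above. The sketch assumes the 29% jump is measured against the previous day's price, and the 8-GPU node is a hypothetical rental size chosen for illustration, not a figure from the report.

```python
H200_NOW = 6.40  # dollars per GPU-hour after the spike (from the item above)
B200_NOW = 5.68  # dollars per GPU-hour for the newer chip (from the item above)
SPIKE = 0.29     # overnight increase, assumed relative to the prior price

# Undo the spike to estimate what an H200 hour cost the day before.
h200_before = H200_NOW / (1 + SPIKE)

# How much more the older chip now costs than the newer one.
premium_over_b200 = H200_NOW / B200_NOW - 1

# Hypothetical: one 8-GPU H200 node rented for a full day at the new rate.
node_day_cost = H200_NOW * 8 * 24

print(f"Implied H200 price before spike: ${h200_before:.2f}/GPU-hour")  # $4.96
print(f"H200 premium over B200: {premium_over_b200:.1%}")               # 12.7%
print(f"8-GPU node for 24 hours: ${node_day_cost:.2f}")                 # $1228.80
```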

Labor, Economy, Backlash, And Public Sentiment

The Verge reported that Gallup and Pew data show most Americans do not trust AI or the people running it (HN discussion).
Kate Andrias argued that tech workers building AI are scared of its risks too and should organize to shape how the technology is deployed.
TechCrunch described the AI gold rush's wealth divide, echoing Deedy Das's thread about a small frontier-AI elite hitting $20M+ while much of SF tech faces layoffs, obsolete skills, and malaise.
Gizmodo reported that CEOs plan to cut junior roles and shift hiring toward mid-level/older workers as AI automates entry-level tasks.
The Next Web reported that AI companies are creating new job titles including Claude Evangelist, AI Philosopher, Professional Vibe Coder, Forward Deployed Engineer, and Chief AI Officer.
TechCrunch Mobility reported that the AI skills arms race is hitting automotive, with GM cutting about 600 IT workers while hiring for AI-native development, data engineering, model, agent, and workflow skills.
Counterpoint analysts projected that agentic AI (AI that can take multi-step actions, not just answer questions) will power about one-third of smartphones within two years, as Qualcomm, MediaTek, Apple, Samsung, and Google race to add more on-device and hybrid intelligence.
Yahoo Finance reported that workers are earning up to $350/hour teaching AI to do their jobs through platforms like Mercor.
Yahoo Finance reported that the AI boom has not stopped U.S. companies from hiring cheap offshore labor, with overseas call center employment still rising as cheaper services increase demand.
Bloomberg reported that U.S. roles exposed to AI saw heavy job losses for a second year, especially customer service, secretarial, and sales jobs (HN discussion).
$800B in AI infrastructure spending is juicing U.S. GDP and business investment while making it harder to read inflation, wages, consumer spending, and the labor market.
Goldman Sachs warned that the AI-fueled S&P 500 rally is becoming "one big trade" concentrated in tech and momentum stocks.
Vox reported that Americans increasingly oppose local data centers over electricity bills, farmland, opacity, and the sense that big tech is overriding local communities.
Charlie Berens is speaking out against Wisconsin AI data centers, arguing that secrecy, NDAs, tax giveaways, and resource strain mean "nobody's negotiating for the people here."
Hill County, Texas passed a one-year data center and power plant construction moratorium in unincorporated areas facing up to eight potential developments.
Troy, New York declared a state of emergency to keep Flock license plate cameras running after a surveillance fight split residents and city officials (HN discussion).
The San Francisco Standard found normal San Franciscans far more skeptical and resentful of AI than the hacker-house founder crowd.
Vivienne Ming argued that only 5-10% of people become true "cyborgs" with AI, while most become "automators" who offload thinking and lose cognitive engagement.
Dean Guida argued that AI mandates fail when middle managers cannot translate vision, data literacy gaps persist, and employees lack clarity on human-versus-AI ownership.
David Moscrop argued that robot doomerism misses the point: automation can serve the good life if democratically controlled by public institutions and worker-owned enterprises.
Damien Charlotin argued that AI is not the end of the legal profession but its future, because human judgment, ethics, and client relationships remain central.
NYT readers argued that AI's great disruption could finally raise the social value of care work that machines cannot replicate.

Models, Research, And Benchmarks

Michael Hu introduced On-Policy Mix, a data-mixing algorithm for continual learning that works across pretraining, midtraining, and instruction tuning by re-mixing data as the model learns (paper, source link).
Anirudh BV released SpectralQuant, a KV-cache compression method using calibrated eigenbasis rotation and water-filled bit allocation, with a reported 5.95x compression on Mistral 7B and limited perplexity hit (paper PDF).
Fannie Nie announced DSGym, an ICML 2026 framework for evaluating and training LLMs as full data-science agents with sandboxed execution, shortcut auditing, and bioinformatics/prediction tasks (paper, GitHub, Hugging Face).
Jessica Rumbelow shared Exemplar Partitioning, a training-free feature-discovery method for LLM activations that uses streaming Voronoi-style partitions with far less compute than sparse autoencoders (LessWrong intro, paper, GitHub).
Yichuan Wang said LEANN won Best Paper at MLSys 2026; the low-storage vector index offers 97% storage savings for fast, private RAG on personal devices (paper, GitHub).
Sapient Intelligence released HRM-Text-1B, a 1B text-generation model based on HRM architecture with hierarchical dual-timescale reasoning and latent task completion (GitHub, Hugging Face).
Eungyeup Kim introduced Five-Nines Reliability, a sample-efficient framework for measuring LLM reliability down to 99.999% performance on saturated benchmarks (paper, GitHub, Notion page, follow-up).
Noah Ziems and co-authors introduced Pedagogical RL, where a teacher model uses privileged information plus spike-aware rewards to create reasoning traces students can actually learn from (summary).
Nature Physics published work from Christopher W. Lynn and collaborators arguing that simple direct input-output dependencies can explain much of neuronal activity across mouse brain regions and C. elegans (author post).
Jiaxin Wen argued that in the high-compute era, human research taste is overrated because automated research already extends far beyond hyperparameter search and frontier labs converge on similar directions.
nrol_ling shared notes on LLM lesioning and aphasia-style studies, arguing that targeted neuron ablation can reveal emergent capabilities and failure modes.
Varun Sunkaraneni et al. argued that verifier-backed agentic systems (agents that can check candidate answers before choosing one) can "boost" weak reasoning models. In their setup, GPT-5.4 nano generated eight possible fixes and a selector reached 76.4% on SWE-bench Verified (a benchmark of real GitHub bug fixes), which they read as evidence that selection, not generation, is the bottleneck (DAIR.AI post).
DAIR.AI shared a weekly AI papers roundup covering Lighthouse Attention, "Is Grep All You Need?", geometric calculators inside LLMs (large language models), and other reasoning and agent advances (related post).
Global Automation Atlas measures task-level automation exposure across 18,797 tasks in 124 countries using 2.33M labels, finding exposure rises with income while substitution effects dominate, especially in lower-income settings (Prashant Garg post).
MOOSE-Star trains LLMs (large language models) to propose scientific hypotheses from background research by breaking the giant "which ideas should be combined?" search problem into smaller subtasks and a hierarchy of choices (GitHub, models and data, MiroMind post, follow-up).
Self-Distillation Enables Continual Learning shows Self-Distillation Fine-Tuning (models improving by training on their own generated examples) can help foundation models learn new skills while largely avoiding catastrophic forgetting (losing old skills while learning new ones) (HN discussion).
delta-mem adds a tiny online memory state to frozen LLMs (models whose main weights are not being changed), improving memory-heavy agent benchmarks with minimal overhead (HN discussion).
Ethan Mollick highlighted an NBER paper finding that data centers increase local employment, wages, income, and house prices while raising electricity prices.
Researchers found that people overestimate AI systems' confidence, inferring certainty from fast, fluent answers even when outputs are not reliable.
Quanta covered Rahul Ilango's work using proof complexity and "unknowable math" to build new noninteractive zero-knowledge proof techniques (ways to prove a claim without revealing the secret behind it).
Vitalik Buterin argued that AI-assisted formal verification (mathematically proving code follows a spec) can make critical systems safer through verified cores and sandboxed edges rather than making trustlessness obsolete (post).
Benedict Evans framed AI as a major platform shift with massive capex (capital spending), LLMs becoming commodity infrastructure, and value likely moving up the stack (HN discussion).
Doron Zeilberger argued that a good lemma is worth a thousand theorems because lemmas travel across problems and do the real work in mathematics (HN discussion).
Noema argued there is no hard problem of consciousness because consciousness is not separate from the physical world (HN discussion).
IEEE Spectrum reported that hidden audio imperceptible to humans can hijack AI voice systems (HN discussion).
Sean Goedecke argued DeepSeek-V4-Flash makes activation steering interesting again because steering vectors can now be applied selectively during thinking or tool use (HN discussion).
Margaret Li released "Slicing and Dicing MoEs," summarizing lessons from training 2,000+ mixture-of-experts language models (models split into many specialized submodels): maximize inactive expert count, set expert size by active parameters, prefer sparse activation, homogeneous non-shared experts, and dropless routing (post).
noahgolmant rewrote pytorch-hessian-eigenthings, a tool for analyzing neural-network curvature (how a model's training landscape bends), with faster estimation methods and optimized PyTorch kernels (HN discussion).
Richard Sutton restated the Bitter Lesson: do not focus on human knowledge; focus on scalable ways of creating knowledge, like search and learning.
SemiAnalysis mocked shallow AI hot takes by contrasting them with deep technical study of die shots and JAX first principles.
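
For readers who want the mixture-of-experts item above in concrete terms, here is a minimal sketch of top-k sparse routing, the general mechanism behind "sparse activation." It is not taken from Margaret Li's post: the function names, the toy experts, and the router scores are all hypothetical, and real systems route batches of tokens through neural-network experts on GPUs. This toy version has no capacity limit, so no token is ever dropped, which is the property "dropless routing" refers to.

```python
import math

def softmax(xs):
    """Turn raw router scores into probabilities."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of only the chosen experts' outputs; the rest stay inactive."""
    weights = route_top_k(router_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())

# Hypothetical toy experts: plain functions standing in for expert networks.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
# Hypothetical router scores for one token; experts 0 and 1 score highest.
logits = [2.0, 2.0, -1.0, -5.0]

print(route_top_k(logits, k=2))
print(moe_forward(3.0, experts, logits, k=2))
```

With four experts and k=2, only half the experts run for this token. That gap between total and active parameters is what the advice about inactive expert count and active-parameter sizing is tuning.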

Robotics, World Models, And Embodied AI

Boston Dynamics demoed Atlas lifting and carrying a mini-fridge using AI-driven behaviors and RL-trained whole-body control (trial-and-error training for coordinated full-body movement) (technical blog, Elon Musk post, Andrew Curran note, video).
Agility Robotics described a sub-1M-parameter LSTM (a small memory-based neural network) whole-body control foundation model for Digit, trained in simulation with RL for robust mobile manipulation, balance, position prompting, and zero-shot sim-to-real transfer (working on the physical robot without extra real-world tuning) (Chris Paxton comparison).
FPVLabs released Stera-10M, a 10M-frame first-person-view robotics dataset captured on iPhones with RGB video, LiDAR depth (laser depth maps), ARKit poses (phone-estimated 3D position), IMU motion-sensor data, MANO hand-pose labels, room mesh, and hierarchical language annotations (Nevasini1 notes).
Rerun 0.32 became a unified data layer for robot learning, covering visualization, querying, transformation, training, MCAP/ROS import (standard robotics data formats), dataset review, catalog queries, and PyTorch dataloading (post).
Eliseu Silva built a Gradio OpenPose 3D Editor component for moving 3D bones, changing camera views, cropping scenes, and exporting pose PNGs for ControlNet/Stable Diffusion workflows (image-generation tools that use poses as guidance) (post).
image-blaster turns a single image into a 3D environment, sound effects, meshes, and ambient audio for Claude workflows, using tools like World Labs (HN discussion).
Benjamin Feldman built a real-time 3D Gaussian Splatting renderer (software that turns scene data into a navigable 3D view) from scratch in about 1,000 lines of C++/OpenGL over a weekend (HN discussion).
Joseph Azar built and shipped a 3D museum prototype in one afternoon with Omma AI, including generated models, a Three.js scene, dissolving shaders (visual transition effects), particle systems, and a prompted settings GUI (control panel).

Coding Agents, Developer Tools, And Agent Infrastructure

Jason Liu described "Codex-maxxing," a workflow built around durable pinned Codex threads, voice input, Obsidian/Git memory, heartbeats, browser/computer use, and connectors for long-running coding and knowledge work (post).
Dan Shipper recommended pinning one Codex chat per major project or life area and using command+# to switch between them.
Derrick Choi showed how to use Codex Goals with specific outcomes, constraints, and verification criteria so delegated tasks stay reliable and auditable.
Todd Saunders used Codex /goal to scan 500 archived marketing emails, unsubscribe from 87, handle confirmations, and flag 14 login-required cases in about an hour (gdb repost).
Riley Brown used Codex to analyze three years of iMessages with direct quotes, surfacing personal insights that brought him to tears (gdb repost).
Sholto Douglas and Jason Liu asked users when they still reach for models or tools other than Claude/Codex, soliciting detailed frustrations, transcripts, and DMs to improve the next models.
Michael Truell shared a GPT-5.2 multi-agent run that built an experimental 3M+ line web browser in about a week, including a custom rendering engine and JavaScript VM, as a demonstration of long-running coding-agent scale.
Anthropic previously showed Claude Opus 4.6 agent teams building a clean-room C compiler over a long-running autonomous software-development experiment, which became a reference point for the current agent-workflow discussion.
Anthropic published Claude Code best practices for large codebases: layered CLAUDE.md files (project instructions for Claude), hooks, skills, plugins, LSP navigation (code-editor symbol search), MCP servers (connectors to tools and data), and subagents (Lance Martin post).
@trq212 shared a Claude Code prompt that forces the model to maintain an implementation-notes.html file covering divergences, design decisions, tradeoffs, and open questions.
Learn Harness Engineering is a project-based course on designing harnesses (the test environments, state, verification, and controls around an agent) to make Codex and Claude Code reliable (HN discussion).
InsForge is an all-in-one open-source backend platform for agentic coding (agents building software end to end), giving coding agents database, auth, storage, compute, hosting, and an AI gateway (Show HN).
files.md is an open-source local-first Markdown life-management app and Obsidian alternative with folders, tasks, notes, journals, habits, projects, and Telegram bot entry (Show HN).
Tailwind CSS added an llms.txt endpoint for compact LLM-optimized docs (documentation formatted so coding models can read it efficiently).
LangSmith Engine automates the agent debugging loop by detecting failures, diagnosing causes, and drafting fixes, while VentureBeat noted that multi-model enterprises still need neutral observability.
Fin, the company formerly known as Intercom, launched Fin Operator, an agent that manages another customer-service agent by debugging, updating knowledge, and proposing improvements.
Microsoft executives are reportedly worried that GitHub's AI coding lead is eroding as competitors overtake Copilot in mindshare and usage.
auto-identity-remove is an open-source macOS runner that removes personal information from 30+ people-search and data broker sites on a monthly schedule (Show HN).
Zerostack is a minimal Unix-inspired coding agent written in pure Rust, optimized for memory footprint and performance (HN discussion).
Shuriken skills gives agents integration skills for authentication, permissions, and guarded trading across on-chain tokens, perpetuals (crypto derivatives), RWAs (real-world assets), and prediction markets (HN discussion).
Accelerate is a Haskell embedded language for high-performance array computations that can vectorize, parallelize on CPU, or offload to GPU (HN discussion).
awesome-cuda-books curates CUDA programming books (how to program Nvidia GPUs directly) from beginner to advanced, with HN commenters noting some titles contain mistakes or confusing explanations (HN discussion).
PyTorch released the ExecuTorch MLX Delegate for fully GPU-accelerated PyTorch model inference (running models) on Apple Silicon (post).
llama.cpp added MTP speculative decoding support (drafting likely future tokens to speed up local model generation) via pull request #22673 (ggerganov post).
LMSYS merged DeepSeek V4 support into SGLang v0.5.12 with a pile of serving-engine upgrades for faster model hosting, including prefix caching, CPU key-value memory extension, speculative decoding, MegaMoE kernels, Flash Compressor, prefill-decode disaggregation, HiCache, Ring-2.6-1T support, and 35 new contributors.
Erik Kaum released maxsim, a fast exact MaxSim kernel (a search method that compares many token-level matches) for late-interaction retrieval and reranking (GitHub, post).
open-slide is a React-first slide framework built for agents, where each slide is arbitrary code on a 1920x1080 canvas, versioned and reviewable (Yiwei Ho post).
Mastra added deployment to Amazon Bedrock AgentCore Runtime through the AgentCore CLI (post).
Perplexity released pplx-embed-v1-late-0.6b, a 0.6B multilingual late-interaction embedding model (a compact meaning map used for search) optimized for MaxSim retrieval (Bo Wang post, Antoine Chaffin commentary).
Papers with Code is back at paperswithcode.co, with Niels Rogge positioning it as a revived home for trending papers, code, datasets, methods, leaderboards, and SOTA (state-of-the-art) tracking (post).
Flame turns local Candle/Rust inference apps into deployable services with a small service macro and one flmctl deploy command (share).
vimtor built a cmux workflow for parallel agentic coding with scripted keyboard-driven tabs across IDEs, terminals, repos, and dev servers.
Jonatas Santos shared an updated developer stack using Warp, Orca IDE, Herdr tmux orchestration, and related tools for agent-heavy engineering.
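
Two items above (maxsim and pplx-embed-v1-late-0.6b) lean on MaxSim scoring, so here is a rough sketch of the idea in plain Python. It assumes the standard late-interaction definition: for each query token vector, take the best dot-product match among the document's token vectors, then sum those best matches. The function names and toy vectors are hypothetical, and this is not the API of either project; real kernels do the same math on batched GPU tensors.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def maxsim(query_vecs, doc_vecs):
    """Sum over query tokens of the best dot-product match among doc tokens."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

def rerank(query_vecs, docs):
    """docs maps a name to its token vectors; returns names, best score first."""
    return sorted(docs, key=lambda name: maxsim(query_vecs, docs[name]), reverse=True)

# Hypothetical 2-dimensional token embeddings, purely for illustration.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    "doc_b": [[0.5, 0.5], [0.5, 0.5]],
}

print(maxsim(query, docs["doc_a"]))  # 2.0
print(maxsim(query, docs["doc_b"]))  # 1.0
print(rerank(query, docs))           # ['doc_a', 'doc_b']
```

Because every query token gets its own best match, a document scores well only when it covers each part of the query, which is why this scoring is popular for reranking.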

Security, Safety, Policy, And Governance

OpenClaw laid out a security roadmap for a trustworthy personal assistant runtime with fs-safe boundaries (limits on what files an agent can touch), Proxyline egress routing (controlled outbound network access), trust signals, contextual command approvals, and static analysis (checking code before it runs) (HN discussion).
Linus Torvalds said AI-powered bug hunters have made the Linux security mailing list almost entirely unmanageable by repeatedly reporting the same bugs with the same tools (LKML post).
The FT reported that AI-generated vulnerability reports are straining corporate bug bounty programs; Metacurity noted the signal-to-noise problem even before Mythos-level scanners.
HN discussed a GitHub-repo defense against AI bot spam that used Git's author metadata to identify and filter low-quality automated contributions.
More than 60 MAGA allies, including Steve Bannon, urged Trump to vet AI systems before release, putting a humans-first faction at odds with a hands-off approach.
The CFTC is using AI to scan Polymarket and other prediction markets for insider trading and illegal activity (HN discussion).
arXiv will ban authors for a year if submissions show incontrovertible evidence that AI did all the work, such as hallucinated references or unedited copy-paste artifacts.
404 Media reported that researchers asked parents to let preschool teachers wear cameras and install classroom cameras to collect first-person video for AI training.
Fast Company explained AI tarpits, tools content creators use to feed crawlers misleading or useless data to fight unauthorized LLM scraping (large-language-model data collection).
tmuxvim hid a prompt injection in a LinkedIn bio that forced recruiting bots to address him as "My Lord" and reply in Old English.
HN commenters criticized Bun's AI-assisted Rust rewrite after a GitHub issue claimed the codebase fails basic Miri checks (a Rust tool for finding memory-safety problems) and allows undefined behavior (code that can act unpredictably despite compiling) in safe Rust (GitHub issue).
Anthropic updated Mythos rules so users can share cyber-threat findings with organizations facing similar vulnerabilities, balancing restricted access with real-world defense coordination.
Cloudflare ran Mythos and other security-focused models against live infrastructure code in Project Glasswing, finding senior-researcher-like exploit chaining and proof generation, plus major signal-to-noise and refusal problems (Eugene Yan breakdown).

Fundraising, Deals, And Business Formation

Decart raised $300M, led by Radical Ventures with backers including NVIDIA, Sequoia, and Andrej Karpathy, to build low-latency AI infrastructure across DOS, Lucy, and Oasis.
Dust raised $40M Series B with strategic investment from Snowflake and Datadog for multiplayer enterprise agents (shared workplace assistants teams can build and use together).
GovWell raised $25M led by Insight Partners to streamline government permitting and licensing.
Monaco landed a $50M Series B to expand its SaaS platform for financial operations.
Webidoo secured EUR21M to scale its SMB-focused automation platform across marketing, sales, and operations.
Nectar Social raised a $30M Series A led by Menlo Ventures and its Anthology Fund to build an agentic marketing operating system (software that can plan and execute marketing tasks, not just report on them).
SiMa.ai is raising at a $1.4B valuation for edge inference chips that run on devices such as drones and cameras.
Barkr is building financial plumbing that turns Nvidia GPUs into bankable collateral.
DayOne, the Singapore-based data center operator spun out of China's GDS Holdings, plans a Singapore/U.S. dual IPO targeting about $5B at a possible $20B valuation.
Roche agreed to acquire Massachusetts AI pathology startup PathAI in a deal worth up to $1.05B, showing AI startups can scale outside Silicon Valley.
LetinAR is building thumbnail-sized optics that could become the optical backbone of AI glasses.
RoboTechnik founder Dai Jun is eyeing a dual Hong Kong listing after a 340% stock rally on AI optics demand.
Tether AI is building a "Stable Intelligence" layer with QVAC SDK and Fabric for edge inference (running models on local devices) and fine-tuning on user-controlled hardware.
Adaption launched the Public Sector Grant for civic institutions, academics, and public agencies with workflows and constraints most AI systems were not built for (X post).
Amplify Partners announced an investment in Recursive Superintelligence, a new lab from Richard Socher, Josh Tobin, Jeff Clune, Tim Rocktäschel, Yuandong Tian, and others focused on recursive self-improvement for scientific discovery (Rohan Virani share).

AI In Media, Culture, Education, And Weirdness

The Verge covered Andon Labs' AI radio-station experiment, where Claude turned revolutionary, Gemini got morbid/conspiratorial, Grok broke down, and the models burned through seed money.
WIRED profiled the "sad wives of AI," partners dealing with men who have become consumed by AI obsession.
Theo Baker wrote in The New York Times that ChatGPT arrived on campus two months after the class of 2026 did, making them the first class to graduate from the AI-era "experiment," with the technology permanently changing how students think and behave. Replies framed the crisis as banal degradation, widespread cheating rather than learning, AI detection broken by "humanizers," and a push toward verified in-person certification via Assured Assessment.
TechCrunch noted that 2026 commencement speakers are struggling to make AI sound inspiring to students.
University of Arizona students booed Eric Schmidt's AI cheerleading at commencement, with additional backlash tied to past sexual assault allegations (Jason post).
Seth Rogen said writers whose instinct is to use AI for scripts "shouldn't be a writer," calling AI-generated examples "stupid dog shit."
Steven Soderbergh used Meta AI for about 10% of surreal imagery in his John Lennon documentary while emphasizing transparency and the continued value of human creative work.
Granta published Jamir Nazir's "The Serpent in the Grove," which Commonwealth Foundation Creatives said won the Caribbean regional Commonwealth Prize; Nabeel S. Qureshi argued it appeared ChatGPT-generated and full of AI tells.
Fortune interviewed 12 tarot readers using AI and found a split between people outsourcing guidance to AI and people using it for critical engagement and self-reflection.
tinygrad mocked AI hype by noting that frontier models still cannot beat Pokemon Red like a normal six-year-old when limited to pixels and no RAM side channels.
Lior Pachter reacted to the current pace of AI progress with "Apparently we're not in Kansas anymore."
Omar Sar argued coding agents still fail badly on creative out-of-distribution tasks (tasks unlike the examples they were trained or tested on), using his son's failed rocket-simulator attempts as a sign that LLMs remain far from AGI (artificial general intelligence).
Ethan Mollick noted that Claude and GPT sometimes leak prompting history and meta-commentary into final artifacts because they do not cleanly separate artifact from process.
Hiten Shah warned that AI startup founders are losing the thread by chasing hype instead of product-market fit and execution fundamentals.

Think Pieces And Frameworks

Wes McKinney argued that agent ergonomics now matter more than human ergonomics in programming languages because agents iterate 10-100x faster, making Go newly attractive while Python remains strong for human-AI data collaboration (related post).
Frederick Vanbrabant argued AI will not make organizational processes faster if the real bottleneck is vague upstream inputs and poor problem definition (HN discussion).
John Gruber argued AI is technology, not a product or feature, and Apple should weave it invisibly into great experiences instead of chasing a standalone AI product (HN discussion).
William Angel argued local LLMs (large language models) on Apple Silicon cost more than OpenRouter once hardware amortization and speed are included (HN discussion).
Addy Osmani warned developers not to outsource learning to AI because getting the bug fixed without updating your mental model creates cognitive debt (HN discussion, related HN prompt).
anarcat described the Four Horsemen of the LLM Apocalypse: bot armies, hardware/energy shortages, security/copyright collapse, and slop-driven deskilling (HN discussion).
Kabir argued frontier AI has broken the open CTF format (capture-the-flag cybersecurity contests) because model-assisted solving turns competition into orchestration and weakens the signal of CTF performance (HN discussion).
The State of Brand argued enterprise AI subscriptions are ticking time bombs because labs subsidize flat-rate usage now and may later shift to usage-based pricing (HN discussion).
Archestra argued that AI slop is straining open-source communities and asked whether unchecked bot-generated issues, pull requests, and comments threaten the norms that made open source work (HN discussion).
Mitchell Hashimoto argued some companies are under "AI psychosis," using agents to fix bugs fast while hiding decaying architecture, falling semantic understanding, and latent risk (HN discussion).
Ahmad Al-Dahle argued that enterprises risk replacing the human experts needed to evaluate and improve AI systems, hollowing out the talent pipelines that models depend on.
Jevin West and Damian Hodel warned that AI doing our thinking could cause knowledge collapse through autophagy, reduced information diversity, and model monocultures.
BBC Science Focus ranked online environmental damage and found AI queries did not top the list compared with cloud gaming, video streaming, and offline activities like flights and meat consumption.
Amanda Askell argued that training models to be excessively corrigible could instill deferential "do anything" traits that generalize badly once models take more active roles (Deckard post).
Rohan Paul argued that in agentic coding, harness design (the environment and rules around the agent) can matter more than retrieval method because simple grep often beats vector search when the agent knows the literal string to search.
Zuzanna Stamirowska shared a highlight reel from a Transformer vs. post-Transformer debate featuring Lukasz Kaiser, BDH, Liquid AI, Sakana AI, and others.
Vlad Feinberg shared a practical roadmap for landing frontier-lab jobs through low-level systems study, JAX scaling-law exercises, kernel optimization, public training runs, and AI-accelerated self-study.
Samira Manabi reacted to Vlad Feinberg’s frontier-lab roadmap with a meme-style share.

Tool, Demo, And Miscellaneous Bench

Fyodor Urnov prompted Edison Scientific's deep-literature mode to write a comprehensive review on a gene he knows well and got a flawless 17-page PDF 25 minutes later.
Jaydev Tonde benchmarked Qwen3-32B on NVIDIA RTX PRO 6000 Blackwell, finding NVFP4 (a low-precision number format) improved serving speed and TTFT (time to first token) at 64 simultaneous users with little accuracy loss on GPQA Diamond and ARC Challenge (hard reasoning benchmarks).
Ricursive Intelligence presented a vision for AI recursively designing and optimizing chips to close a self-improving hardware-AI loop.
Nico Christie said ShortcutAI, his Excel agent, hit #1 on the Excel AI leaderboard for real-world spreadsheet tasks.
Ankura tested five AI tools across three financial models and published findings for finance teams considering AI-assisted modeling (Booker Codes post).
GenCAD turns images or sketches into editable B-rep CAD command sequences (structured 3D design instructions), but HN users noted that the project itself reports roughly 60% reliability even on training data, and that the front-page examples may overstate real-world reliability.
Claude AI Cheat Sheet explains Claude concepts in plain English, including tokens, context windows, multimodal inputs, Extended Thinking, Deep Research, Web Search, artifacts, code execution, integrations, skills, memory, and pricing tiers.
Alibaba Qwen released Qwen 3.7 Preview with gains in knowledge, instruction following, and agentic coding (coding where the model plans and executes multi-step tasks) (HN discussion).
Hacker News page 2 served as a discovery surface for several developer and AI-tool stories in this batch, including Semble, files.md, InsForge, OpenClaw, and AI security threads.
VentureBeat's AI section served as the feed source for the LangSmith Engine and Fin Operator items.

Previous Around the Horn Digests

Catch up on everything you missed:

Week in Review, May 11-15: Anthropic refused China access to Mythos while U.S. cyber defense used the model family, Cerebras and Cowboy Space pushed compute into new territory, Recursive Superintelligence raised $650M, and Google confirmed criminal AI-driven zero-day exploitation.
Wednesday-Thursday, May 13-14: Nvidia got U.S. approval to sell H200s to Chinese firms but still could not ship them, data center backlash grew, Meta lined up layoffs, and researchers found state media can shape model answers across languages.
Tuesday, May 12: Anthropic refused China access to a new model, Isomorphic raised $2.1B, Google pushed Gemini deeper into Android, and attackers hit Mistral and TanStack supply chains.
Monday, May 11: Cerebras upsized its IPO, Cowboy Space raised money for orbital data centers, OpenAI launched DeployCo, Anthropic shipped Claude Platform on AWS, and Google confirmed the first criminal AI-discovered zero-day.
Weekend, May 9-10: Anthropic weighed a $50B primary round at a possible $900B valuation, the Trump administration drafted an AI security order without mandatory model tests, Apple and Intel reached a Trump-pushed chip deal, and French prosecutors escalated their Musk/X probe.
Thursday, May 7: Goodfire argued neural networks think in curved geometric manifolds, AI safety papers clustered around sabotage and alignment automation, and Google prepared Gemini API schema changes.
Wednesday, May 6: The federal safety net looked unready for AI job losses, Claude Code release notes piled up, Genspark kept expanding its all-in-one workspace, and agent tools kept spilling into developer workflows.
Tuesday, May 5: OpenAI's old superintelligence-governance post recirculated, legal-AI startup Harvey showed major ARR growth, and the day's legal, coding, and model stories all pointed at AI getting dragged into regulated workflows.
Monday, May 4: The White House weighed pre-release AI model vetting, Anthropic and OpenAI both paired with private equity, DeepSeek reached frontier parity, Mayo Clinic spotted pancreatic cancer earlier, and a Palantir-linked PAC got caught pushing Chinese-AI fear content.

That's a Wrap

That's 180+ story bullets, demos, papers, funding rounds, and cursed X breadcrumbs from today alone. If you made it to the bottom, you now understand the AI stack, the backlash, and exactly why "first, boot up an 8xH100" can ruin a perfectly good tutorial.

For the daily version (bite-sized, 5-minute reads), make sure you're subscribed to The Neuron. We send six issues a week, and yes, we read all of this so you don't have to.

See you tomorrow.

P.S. Know someone who'd find this useful? Forward this to them and tell them to subscribe here.


Grant Harvey
