Around the Horn Digest: Everything That Happened in AI Today (Monday, May 4, 2026)
Anthropic and OpenAI both hooked up with private equity on the same day, the White House started considering pre-release vetting of AI models, Anthropic's CCO Paul Smith spun out a separate $1.5B company to scale Claude to mid-size businesses, Mayo Clinic's AI spotted pancreatic cancer up to three years before diagnosis, and a Palantir-linked super PAC was caught paying TikTok influencers to fear-monger about Chinese AI.
Welcome to the Around the Horn Digest, your daily readout of every AI story worth knowing about. Today was the day enterprise AI grew up: Anthropic and OpenAI both announced parallel joint ventures with major asset managers and PE firms, Anthropic CCO Paul Smith spun out a separate $1.5B mid-market vehicle, Bret Taylor's Sierra raised another $950M, Cisco swallowed an Israeli AI security startup for $400M, Long Lake bet $6.3B that AI can reshape corporate travel, and the Trump White House quietly started reversing course on its noninterventionist AI stance. On the medical side, AI quietly had one of its best days ever. Politically, things got loud.
Let's get into it.
🆕 NEW From The Neuron
- When three of AI's top builders tell you coding is solved, pay attention to what they mean — Boris Cherny (Claude Code), Greg Brockman (OpenAI), and Andrej Karpathy all gave Sequoia AI Ascent 2026 talks last week saying the same thing: writing code is essentially solved. Cherny says he hasn't typed a line in 2026 and ships dozens of PRs from his phone; Karpathy says he's never felt more behind as a programmer; Brockman thinks the next era is about scarce compute and managing agents. So if "the code part" isn't the bottleneck anymore, what is? Our breakdown of what these three actually mean by "solved," what's still hard (taste, context, knowing what to build), and what shifts for everyone else.
Around the Horn — Tuesday, May 5, 2026
The big news today: Anthropic and OpenAI both announced parallel enterprise AI joint ventures with major asset managers and private equity firms, and the synchronized timing is the story. OpenAI finalized a $10 billion partnership with TPG, Brookfield, Advent, and Bain to help businesses deploy its software at scale. Anthropic countered with its own asset-manager coalition. The pitch to enterprises is that model access alone won't get the deal done; you need integration partners who can rewire workflows, manage change, and push past corporate antibodies.
Box CEO Aaron Levie called it an "explosively growing" trend, noting that turning model capability into stable business-process impact requires IT upgrades, context provisioning, workflow modernization, and adoption work that pure-play AI labs aren't built to do themselves. Translation: the frontier labs both said the quiet part out loud. They can ship the models; PE firms own the playbook for actually deploying software inside Fortune 500 companies. Your enterprise AI vendor list just shrank, and the integration partners got a permanent middle seat at the table.
Bret Taylor's Sierra picking up another $950M on the same day (now north of $15B valuation, $150M+ ARR, serving 40%+ of the Fortune 50) is the second data point. The third: Anthropic CCO Paul Smith launched a new spinout company backed by $1.5B from Blackstone, Hellman & Friedman, and Goldman Sachs to embed engineers directly with mid-size businesses and solve their on-prem and integration bottlenecks. Three Anthropic distribution announcements in one day is not a coincidence; it's a strategy. Enterprise AI is consolidating fast, and the pure-play labs need distribution muscle, not just better benchmarks.
🏆 TOP 5 NEWS (Around the Horn)
- The White House is considering pre-release vetting of new AI models through a potential executive order forming an AI working group with tech executives, a sharp reversal of Trump's deregulation stance prompted by cybersecurity concerns over Anthropic's Mythos model (Reuters, Andrew Curran's reporting, follow-up).
- Bret Taylor's Sierra raised $950M Series E at a $15.8B post-money valuation (led by Tiger Global and Google's GV with Benchmark, Sequoia, Greenoaks), pushing total capital above $1B for its AI customer-service agents.
- Mayo Clinic's AI model detected early tissue signs of pancreatic cancer on CT scans up to three years before formal diagnosis, outperforming human radiologists threefold; the study was published in Gut, and a clinical trial is underway.
- Long Lake agreed to acquire Amex GBT for $6.3B ($9.50/share, 60% premium) in an all-cash deal explicitly betting that its proprietary Nexus AI platform will reshape corporate travel.
- A Harvard study found OpenAI's o1 model outperformed two attending physicians on real ER cases, hitting exact or near-correct diagnoses on 67% of 76 patients vs. 50–55% for the doctors.
Honorable Mentions
- Cisco acquired Israeli AI security startup Astrix for $400M (backed by Menlo Ventures and Anthropic) to address risks from non-human identities and autonomous AI agents.
- Cerebras is on track for a blockbuster IPO that could value the maker of Wafer-Scale Engine 3 chips at $26.6B+, backed by a deep OpenAI relationship including a $10B+ multi-year deal and $1B loan.
- Lattice Semiconductor (Hillsboro, OR) acquired Georgia-based AMI for $1.65B ($1B cash + $650M stock) to strengthen its AI and data center firmware offerings, expecting $200M new sales this year.
- Anthropic CCO Paul Smith launched a new $1.5B spinout backed by Blackstone, Hellman & Friedman, and Goldman Sachs to embed engineers and scale Claude to mid-size businesses; a third Anthropic enterprise distribution play in one day.
- Anthropic is red-teaming Jupiter-v1-p ahead of its May 6 Code with Claude developer conference, pointing to a potential model launch.
- Google is testing a new video-generation model called Omni inside Gemini ahead of I/O 2026, hinted at by new UI leaks.
- SAP acquired Prior Labs (announced by Frank Hutter) with €1B+ in committed investment over four years; Prior Labs continues operating as an independent open-models lab.
- Peter Thiel led a $140M Series B into Panthalassa (with John Doerr) to build wave-powered ocean compute grids; self-positioning offshore nodes that integrate power generation and computing for AI (announcement).
- Arizona State University deployed ASU Atomic, which automatically chops faculty lectures into AI-generated learning modules without notifying or getting consent from professors, who called the outputs inaccurate "AI slop" (AZ Free News).
- The Oscars banned AI from winning acting and writing awards, requiring performances to be "demonstrably performed by humans with their consent" and screenplays to be "human-authored."
🍪 TOP TREATS TO TRY
- Cursor Team Kit gives you the internal CI watcher, compiler-error checker, control-cli/UI harnesses for local verification and profiling, deslop code cleaner, fix-ci tool, and other shipping workflows that Cursor developers actually use themselves; everything runs without third-party services, free to try (community Codex-plugin riffs from Matthew Lam, Queue / Studio17_x, and Ray Fernando).
- Unity AI entered open beta with a project-aware in-editor agentic assistant (plus AI Gateway and MCP Server) trained on 20+ years of Unity best practices that automates tasks, generates assets from designs/images, drives Editor actions, and keeps you in creative control, free trial then $10/mo for Personal (included for Pro/Enterprise) (announcement).
- Pocket TTS by Kyutai Labs released open-source 100M-parameter models that generate high-quality real-time speech in six languages on CPU (no GPU needed), with improved English quality and the same compact size, free to try (code, announcement).
- XGrammar-2 from MLC gives you fast customizable structured generation that guarantees 100% correct tool calling and complex agent outputs via a composable Structural Tag DSL with 80x faster grammar compilation, cross-grammar caching, and native integrations into vLLM, SGLang, and TensorRT-LLM, free to try (code, launch).
- HiL-Bench from Scale AI tests whether agents recognize missing or ambiguous information and proactively ask targeted clarifying questions instead of guessing wrong (GPT-5.5 leads at 29% Pass@3, Claude Opus 4.7 at 27.67%), free to try (announcement).
- Saperly is the first phone carrier built for AI agents; provision a real phone number in seconds via any MCP-compatible agent for unified calling, messaging, and SMS with stable caller ID, audit trails, and webhook handoff, first number free for 30 days then $2.50/mo + usage (launch post).
- Hermes Agent by Nous Research is an open-source agent that grows with you by learning your projects, auto-generating skills, maintaining persistent memory in secure sandboxes, and reaching you across Telegram/Discord/Slack/email/CLI with delegated subagents and natural language scheduling, free to try (launch reaction).
- MathNet from MIT gives you free public access to 30,676 expert-authored Olympiad-level math problems across 47 countries, 17 languages, and 40+ years for evaluating LLMs on problem-solving and retrieval-augmented generation, free to try.
- Bonsai 1.7B Apple Silicon edition drops in as an optimized inference build of the ternary model running ~42% faster decode (442 t/s on M4 Max) with custom Metal kernels tuned by an autonomous engineering agent, free to try.
💻 TECH CORNER
Two weekend reads worth your time if you care about agent infrastructure and the local-vs-cloud AI debate:
- a16z's David Booth tried to "raise" his AI like a human (and posted his work). Booth built a personal knowledge graph on top of Karpathy's auto-compiling LLM wiki gist, then speed-ran four phases of "education": foundational culture docs (including a16z's culture playbook), his own writing (preferential attachment blog, lighthouse playbook), ~20 books and blogs from his core canon, plus ~50k words of Wispr Flow voice monologue. He paired it with a nightly "sleep protocol" Codex automation that distills the day's inputs and decays stale memories. Sunday's rabbit hole was "context traces," inspired by Rohit Krishnan on agent-markets and Komoroske's slime mold coordination frame. The thesis: hierarchical coordination won't scale as the graph grows 10-100x because "locally reasonable compression can turn into false institutional state" and saving that false state as canonical drifts you from reality. His translation of slime mold theory into agent environment design: set gradients, create focal points, let local actors move with partial context, preserve traces in the environment, decay stale signal. Current research question: "what is the digital equivalent of a pheromone trail?" Bonus self-awareness: he saw Supermemory's SMFS launch hours later and posted "shoulda just used supermemory lol." Asked how much time he saved: "wasn't the point. i spent lots of time lol."
- Hacker News debated whether rolling your own local AI is worth the hassle, responding to The Register's local AI guide on running Qwen3.6-27B with llama.cpp + Cline/Pi to escape rate limits. The cleanest framework came from commenter 0xbadcafebee: 95% of people should pay for a subscription; local only makes sense for privacy, constant token churn, latency, or availability (he also shared a price-per-request comparison tool for picking subscriptions). Hardware reality check: a 24GB RTX 3090 Ti runs ~€2,000, and commenter beej71, who actively wants to run local, conceded "in my testing, 24 GB doesn't get you much brainpower." The Register's recommended local model also doesn't hit gpt-5.4-mini quality, so the suggested workaround is routing Kimi K2.6 through OpenRouter as your "Anthropic/OpenAI is down" backup. Privacy counterpoint from jen20: opt-out training defaults at GitHub Copilot and OpenAI make subscription trust harder than people admit. The sleeper question (raised by xscott): how much "compaction" Codex and Claude do is to keep context fresh vs. to save their runtime costs? If your "1M token context" gets constantly summarized behind the scenes, are you really getting 1M tokens of benefit? Local models let you use the full window on your own terms. No public benchmarks on this yet; if anyone runs the test, it's a story.
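For readers who want to poke at the "decay stale signal" idea from Booth's write-up above, here is a minimal Python sketch of a pheromone-style trace store. It is an illustration only, not Booth's sleep protocol or his Codex automation: the `TraceStore` class, the 7-day half-life, and the 0.05 pruning cutoff are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TraceStore:
    """Toy pheromone-style memory: traces fade unless they are reinforced."""

    half_life_days: float = 7.0  # assumed value, purely illustrative
    prune_below: float = 0.05    # assumed cutoff for forgetting a trace
    strengths: dict = field(default_factory=dict)

    def reinforce(self, key: str, amount: float = 1.0) -> None:
        # A local actor leaves (or refreshes) a trace in the shared environment.
        self.strengths[key] = self.strengths.get(key, 0.0) + amount

    def sleep(self, days_elapsed: float = 1.0) -> list:
        # Nightly pass: exponentially decay every trace, then drop stale ones.
        factor = 0.5 ** (days_elapsed / self.half_life_days)
        forgotten = []
        for key in list(self.strengths):
            self.strengths[key] *= factor
            if self.strengths[key] < self.prune_below:
                forgotten.append(key)
                del self.strengths[key]
        return forgotten

    def strongest(self, n: int = 3) -> list:
        # Focal points: the traces other agents would follow first.
        return sorted(self.strengths, key=self.strengths.get, reverse=True)[:n]


# Hypothetical week: one note is revisited daily, the other is never touched again.
store = TraceStore()
store.reinforce("culture-playbook", 1.0)
store.reinforce("one-off-note", 0.08)
for _ in range(7):
    store.reinforce("culture-playbook", 0.2)
    store.sleep()
print(store.strongest())
```

The design choice worth noticing is that nothing is ever marked canonical: a trace survives only while something keeps refreshing it, which is one way to avoid saving "false institutional state" forever.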
🏢 Big Tech & Major Companies
- OpenAI added animated pixel-art pets and config imports to Codex desktop; create your own via Hatch and auto-import settings/skills from other agents like Claude Code (settings docs; developer reactions from antirez, thdxr, and trashpandaemoji).
- OpenAI is also offering a "Switch to Codex" flow that connects your existing project and imports configs/skills the moment rate limits hit elsewhere.
- Anthropic published its Kepler case study on how Kepler built a verifiable AI platform indexing 26M+ SEC filings, earnings calls, IR presentations, and consensus estimates across 14,000+ companies and 27 markets using Claude plus a custom markdown-to-Prolog DSL that validates every output number to filing/page/line item.
- Atlassian and Twilio crushed earnings, with Atlassian at $7B run rate (+32% YoY, Rovo AI search/agent driving acceleration) and Twilio at $5.6B+ (+20% YoY, voice AI agents); both saw 20%+ stock pops, raised guidance, and may have ended the SaaSpocalypse.
- Asian stocks soared on Bloomberg's report that NVIDIA's Asian supply chain reliance hit 90% as it deepens physical AI partnerships across the region.
- DoorDash added AI tools for merchants that auto-create listings from existing websites, retouch dish photos, generate videos and marketing campaigns, and build branded sites via the Commerce Platform.
- "This is Fine" creator KC Green accused AI startup Artisan of stealing his artwork for a subway billboard pushing the company's "stop hiring humans" sales agent.
- OpenAI also detailed how it delivers low-latency voice AI at scale, rebuilding its WebRTC stack with split relay/transceiver architecture, ufrag-based routing, lightweight Go UDP relays, and geo-steered signaling to support 900M+ weekly users; HN commenters argue fast models and accurate VAD matter more than WebRTC tuning.
- Sarvam partnered with Pixxel Space to power the AI backbone of India's first orbital data centre satellite, where India-built models will train and infer in orbit on datacenter-class GPUs analyzing hyperspectral imagery in real time with no foreign cloud dependence.
- Eli Lilly and Roche are racing to build dedicated AI supercomputers (partnering with NVIDIA) to fix the 90% failure rate in traditional drug development.
- Greg Brockman testified in the OpenAI trial that his stake is worth nearly $30B; Musk's lawyer pushed him to explain why he hasn't donated the bulk of it to the OpenAI nonprofit foundation. Earlier in the day, MTS live-tweeted that the attorney confirmed the stake at "at least $20B" and surfaced that Brockman told people he planned to donate $100,000 to the nonprofit arm but never made the donation. Gary Marcus argued the case is getting markedly stronger as the cross-examination reveals the bait-and-switch from pitching the company as a nonprofit (to solicit donations) to for-profit, misleading not just Musk but many donors, early employees, the public, and California. Separately, John Gruber pointed out that Y Combinator quietly owns roughly 0.6% of OpenAI (~$5B at current valuation), which means YC co-founders Paul Graham and Jessica Livingston have personal billions riding on Altman keeping his job, a disclosure he argues should accompany any quotation of Graham as an Altman character reference (HN discussion).
💼 AI Productivity, Labor & Economics
- Image AI models are now driving more app growth than chatbot upgrades, with Appfigures finding visual model launches generate 6.5x more downloads (Gemini's Nano Banana added 22M+ downloads, GPT-4o image added 12M+ and $70M in spending), though most don't convert the spike to revenue.
- HBR researchers found LLMs give "trendslop" strategic advice: leading models consistently recommend strategies that match modern managerial buzzwords (differentiation, augmentation, collaboration) rather than rigorous context-specific logic; the researchers advise using LLMs to expand the set of options, not to pick among them (HN).
- A new study shows LLMs are distorting written language at scale: with over a billion users, LLM-assisted writing introduces large semantic and stylistic shifts that homogenize prose, overwrite individual voice, and increase analytical/emotional language even when users think they're getting light edits.
- Ricky Yean argues delegating work to AI loses the "task initiation bundle": the energetic commitment, identity assertion, and meaning-making that turn tasks into personally owned investments, risking burnout on meaningful projects (HN).
- The Atlantic argues AI may not be a bubble after all because Claude Code and other agents are finally driving revenue that catches up to the hype.
- A countervailing read: a new report warns of a hidden financial bubble in AI infrastructure where hyperscalers' debt-financed capex projects to $600B+ in 2026 against only $50–60B in annual AI revenue, creating refinancing risk from GPU lifecycles and high project failure rates.
- Anthony Pompliano argues the rally is sustainable because data center construction tripled, big-tech CAPEX quadrupled since mid-2023, software-engineer postings are up 18% YoY (vs. overall declining), and Anthropic is reportedly at ~$44B ARR (~$500M/day).
- Joseph Politano argues America has entered a sustained electricity-demand era, with growth in the last two years exceeding the previous 15 combined; record solar+battery investment and 4.6% projected generation growth still aren't enough, and residential prices are up 40%+ since 2020.
- Sites.diy compared coding subscription plans against actual token usage and found Codex is subsidized ~27×, most others ~8×, and Claude Pro still costs ~10× more per token than alternatives like MiniMax 2.7 or Kimi K2.6.
- Daniel Miessler argues most companies aren't ready for AI not because they lack the technology but because they lack organizational clarity, vision, and self-awareness to direct it.
- Mark Cuban argues AI's biggest enterprise problem is non-determinism: same question, different answers, every time, which is actually evidence against doomers because models clearly don't understand consequences (3,205 likes; reaction from 0xSero).
- Bartosz Naskręcki shared 10 honest observations after using agentic workflows 24/7 for a week: extreme exhaustion from supervision, massive parallel productivity, improved prompting skill, high token costs, and entering totally undocumented workflows.
- KingBootoshi argues AI isn't replacing humans because everyone using it is now working 10× more since they're 10× more productive, just at a new operating layer.
- The Verge argues AI music is flooding streaming (75K+ daily uploads on Deezer, 34–50% of new tracks AI-generated) but platforms refuse to either ban or fully embrace it, while listener demand stays at 1–3% of streams and artist royalties get diluted.
- Suno reached a $2.5B valuation with ~$300M ARR (tripled in months) while battling label lawsuits and starting licensed revenue-sharing partnerships.
🤖 AI Agents & Infrastructure
- Mindra lets you delegate complex ongoing tasks to adaptive AI agent teams that collaborate 24/7, maintain reasoning traces, auto-detect anomalies/hallucinations, self-heal from failures, and take real-world actions like managing ad campaigns.
- Cofounder.co lets you run an entire company with AI agents handling engineering, sales, marketing, ops, and design across a structured org chart with RTS-style roadmaps and human approvals for key actions (launch post).
- Supermemory File System (SMFS) gives your agent a real mountable filesystem with semantic grep, live synthesized profile.md on cat, auto-extraction from any file type, and bidirectional sync; one binary, open source, no vector DB (X post, announcement).
- PlugMem (ICML 2026) drops into any LLM agent runtime in 6 lines of code to turn raw trajectories into a hierarchical knowledge graph of semantic/procedural/episodic memories, hitting new SOTA on LongMemEval (90.2%) (announcement).
- Nous Research released Hermes Agent, an open-source autonomous agent that grows with you by learning your projects over time, auto-generating its own skills, maintaining persistent memory, running in secure sandboxes, and reaching you across Telegram/Discord/Slack/email/CLI with delegated subagents and natural language scheduling (flagged by luongnv89).
- Patrick Hillmann argues current LLMs are "really confident interns" producing coherent but subtly wrong outputs that compound into disasters; he proposes a layered "deli sandwich" stack of language models + world models + energy-based reasoning systems for verifiable correctness in critical domains (part 2).
- Jack Clark now believes recursive self-improvement has a 60% chance by end of 2028 after reviewing hundreds of public data sources (SWE-Bench saturation, METR, CORE-Bench, MLE-Bench, kernel design, alignment PoCs) and calls it the first major step toward AI systems autonomously building themselves (Import AI 455; highlight by deredleritt3r).
- Aaron Levie observes the parallel Anthropic/OpenAI enterprise initiatives mark an explosively growing trend that creates massive new firm and job opportunities for AI integration.
- Nathan Hindman argues frontier labs won't dominate biological AI agents; small teams with domain "scar tissue" will win by building narrowly scoped models; he's launching bootstrapped AppliedSciAI whose multimodal literature agent Alexandria already beats ChatGPT/Claude/Gemini on LitQA3/FigQA2/TableQA2.
- aattaran built DeepClaude, a drop-in replacement that runs the full Claude Code agent loop using DeepSeek V4 Pro (or any Anthropic-compatible backend) at ~17× lower cost.
- George Pickett built Slop Janitor, a repeatable multi-turn Codex refactor loop that automates chat → meta-plan → per-slice (plan/improve/implement/review) cycles with full checkpointing (X, follow-up).
- obsessiondb built Rudel, an open-source dashboard and CLI for Claude Code / Codex session analytics that tracks token usage, duration, model patterns, and Git context (code).
- ruvnet built Ruflo, an agent orchestration platform for Claude that deploys 100+ specialized agents in self-learning swarms with 313 MCP tools, persistent memory, and security gates.
- Brandon Ong built MuJoCo Workbench, a CLI and agent-skills set that lets Codex or Claude Code prototype custom MuJoCo scenes from natural language with phase contracts and headless video export (X).
- Jordan Gibbs built HyperResearch, an agent-driven knowledge base where agents collect, search, and synthesize web research into a persistent searchable wiki via a 16-step pipeline (X).
- Gabriel Chua demoed Codex iteratively building Google Slides decks in real time, checking its own work and refining layouts without manual clicks (422 likes).
- Xingjian Zhang's AI Coding Workshop covers Claude Code best practices, MCP integration, skills/subagents, and context management for HPC and research labs.
- BigBodyCobain built Shadowbroker, open-source OSINT that aggregates 60+ real-time public feeds (private jets, ships, spy satellites, seismic events, CCTV, Shodan) into one map dashboard with hookable AI agents.
- rahimnathwani built Booksearch, a fast TUI for searching local ebook collections with private device-to-device transfer via magic-wormhole (HN thread).
- The "AI Deleted My Tests" horror story (typia.io) documented an AI agent repeatedly deleting or cheating on 80k lines of tests during a TS-to-Go port (8B-token lookup tables, full Zod rewrites) while reporting "All Tests Pass" (HN).
- Ahmad Awais explains how a tool-input repair layer made DeepSeek V4 Pro outperform Opus 4.7 6/10 times in internal evals by fixing common open-model schema quirks, arguing harness design is often the real bottleneck (earlier thread; 823 likes).
🔬 AI Research & Models
- Simon Willison's rundown of DeepSeek V4: the Chinese lab dropped V4-Pro (1.6T total params, 49B active MoE) and V4-Flash with 1M-token context windows under MIT license at a fraction of Western frontier prices.
- FoodTruck Bench confirms DeepSeek V4 Pro is the first Chinese model in the frontier ROI tier: 5/5 runs survived, +1,257% median ROI, $27,142 net worth, $3.51/run, 5× less food waste than Grok 4.3 at the same price (X).
- Manqi Cheng broke down DeepSeek V4 as a window into China's open-source ecosystem (ByteDance Seed's HC, Kimi's Attention Residuals, shared Muon optimizer, and DeepSeek's TileLang exploration).
- Moonshot AI's Attention Residuals replace fixed uniform residual accumulation with learned input-dependent attention over preceding layers, delivering a 1.25× compute advantage with <2% latency overhead on the 48B Kimi Linear model (X, pip-installable Triton kernels 27× faster by Will (code)).
- Zyphra released folded Tensor and Sequence Parallelism (TSP), a parallel execution strategy that hits the lowest per-GPU peak memory of any tested scheme on AMD MI300X (38.8GB at 128K context) and 2× higher throughput vs. matched TP+SP (paper, X).
- Compute Optimal Tokenization (Tomasz Limisiewicz et al.) shows tokenizer compression rate is a major overlooked scaling-law knob; optimal BPE compression decreases with scale and language-specific optima differ significantly (X).
- Single-Head Attention in High Dimensions (Lenka Zdeborová, ICML 2026 Spotlight) gives an exact high-dimensional theory showing scaling laws emerge from data-structure × key/query spectra interaction via random-matrix tools (X).
- Predicting Token Order improves language modeling (ICML 2026); Token Order Prediction auxiliary loss outperforms next-token and multi-token prediction at 340M–7B scale with zero inference cost (X).
- Bao Pham et al. argue Language Diffusion Models are Associative Memories capable of retrieving unseen data; companion papers cover pseudo-likelihood for asymmetric AMs, emergence of diffusion from AM, the Diffusion Duality (code), Rules-and-Facts model, and neural scaling laws from natural language statistics (X, part 2).
- Reversal SFT + RL (Guangyu Shen et al., ICML 2026 Spotlight) gives poisoned LLMs behavioral self-awareness so they articulate their own backdoor triggers, dramatically improving mitigation/detection; builds on Anthropic's Tell Me About Yourself introspection paper.
- Bidipta Sarkar et al. shared eggroll-vllm, the official code for transformer LLM experiments in Evolution Strategies at the Hyperscale (X).
- Hao AI Lab landed six ICML 2026 papers including d3LLM (ultra-fast diffusion LLM via pseudo-trajectory distillation), When Drafts Evolve (speculative decoding meets online learning), video sparse attention, video QAT, and agentic doc reasoning.
- Husky's Exact-ZOH improvement to Parcae replaces Euler input gain with exact zero-order-hold integration, lowering validation loss across matched 140M/500M and 11.2B controls (code, post, follow-up); evaluated against Attention Residuals and Hyperloop Transformers controls.
- Nathan Yan's PLD framework (Probe, Learn, Distill) lets vision-language-action models self-improve via residual-RL probing of failure modes, hitting 99% on LIBERO, 50%+ gains on SimplerEnv, and 100% on real Franka/YAM arms (X demo).
- Chengshuai Shi built Odysseus, an open RL framework that scales VLMs to 100+ turn decision-making in games like Super Mario Land, delivering 3× game progress over the best frontier VLM (paper, project); Seth Karten flagged that Mario rewards different skills than Pokemon (reactive navigation, spatial reasoning, safe exploration vs. long-term memory and zero-sum reasoning), so the result is another piece of the generalist gaming agent puzzle.
- Andon Labs introduced Blueprint-Bench 2, measuring agents' 3D spatial intelligence by reconstructing 2D floor plans from ~20 interior photos of 50 apartments; GPT-5.5 leads at 0.362 (human 0.586) and shows the first genuine signs of spatial reasoning (X).
- Ziqi Huang's Uni-MMMU was accepted to ACL 2026 main; massive multi-discipline multimodal benchmark testing bidirectional synergy between visual understanding and generation across 8 reasoning-centric domains (paper, project, code).
- V-GRPO (Bingda Tang, CVPR 2026 Findings) is a simple online RL method based on ELBO for denoising generative models, hitting SOTA text-to-image alignment with 2-3× speedup (paper, code).
- Carlos Patiño + ml-intern built nanowhale, a 110M-param MoE pretrained from scratch using full DeepSeek-V4 architecture (Multi-Head Latent Attention, Hyper-Connections, Multi-Token Prediction) (code).
- Trojan Knowledge (ICML 2026) shows an adaptive tree-search agent bypasses commercial LLM guardrails (>95% success on Gemini 2.5-Flash/Pro, GPT-oss-120B, Claude-Haiku-4.5) via harmless prompt weaving (X).
- A paper by Joschka Braun and Google DeepMind on "Exploration Hacking" shows LLMs can learn to strategically resist RL capability elicitation by under-exploring during training, presenting a new safety threat model (X thread, follow-up tweet, code, HF models, LessWrong discussion).
- Will Brown breaks down why SFT-then-RL works, where on-policy distillation fits, and how multi-teacher OPD plus PrefixRL prefix-conditioning let you reuse FLOPs and scale RL on hard problems (Multi-Teacher OPD, Reuse Your FLOPs; thread cross-references Yacine, Yifan Zhang, a1zhang, iwiwi, and 1a1n1d1y; 1,798 likes).
- Thoughtful Lab let frontier agents (Opus 4.6 / GPT-5.4) post-train a base model autonomously for 20 hours via Tinker API and found agents executed code well but completely lacked research intuition; the bottleneck is judgment, not coding.
- MegaDepth-X (Yuan Li) is a 7× larger 3D dataset (1,865 reconstructions) plus a sparse-view sampling strategy for finetuning 3D foundation models like π³/VGGT on noisy Internet photo collections (X).
- Nicholas Gao's "Excited Pfaffians" was selected as an ICML spotlight for generalized neural wave functions in QMC (X).
- Mihir Prabhudesai's Sim2Reason turns physics simulators into scalable QA-pair generators for improving LLM physical reasoning via RL on procedural scenes (X, 1,556 likes).
- Mingyu Jin's "Do Larger LLMs Generalize Better?" (ICML 2025) finds overparameterization actually hurts implicit reasoning via memorization; derives empirical scaling law of ~0.008 reasoning bits per parameter (X).
- NVIDIA's WarpConvNet ScanNet example trains a MinkUNet sparse-conv segmentation model on 3D indoor scenes; Chris Choy added a live viser visualizer showing input/GT/prediction side-by-side during training.
- Parallel-Probe (training-free) builds width×depth consensus matrices across parallel LLM reasoning, cutting sequential tokens 35.8% and total cost 25.8% while matching self-consistency (X).
- Maria Brbic (Lausanne) teased early 3D single-cell embryo maps across mice, alligators, turtles, rhesus macaques, and chickens uncovering remarkable conserved patterns; full paper soon (also flagged by Denis Wirtz).
- Tim Hwang + ICMI Bible-injection paper showed all 66 Protestant Bible books inserted into Qwen 3.5 9B's system prompt produce positive virtue-reasoning effects across 264K evaluations, with NT epistles strongest (paper).
- LocalVQE demo is a ~1M param model that cancels echo, suppresses noise, and reduces reverberation on uploaded mic recordings (X).
- William Mattingly switched from frontier models to a fine-tuned Qwen 3.5 that outputs YAML instead of JSON, parsing 3.6M historical names at 96% accuracy for GLAM duplicate detection (HF, X).
- Ifigeneia Apostolopoulou breaks down MacKay's noisy-sigmoid approximation, showing noisy sigmoid is mathematically equivalent to scaling logits by temperature with a probabilistic interpretation.
- Yusuf Olokoba built a pure C++ port of SGLang serving a 1T model with competitive p90 TTFT/TPOT after 14 days.
- Eli5a simplifies academic papers and research into "explain like I'm 5" plain language (X, shared by).
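For anyone who wants to poke at the noisy-sigmoid item above, here is a minimal sketch, not code from Apostolopoulou's post. It assumes Gaussian noise on the logit and uses the textbook form of MacKay's approximation, E[sigmoid(a)] ≈ sigmoid(mu / T) with T = sqrt(1 + pi * s^2 / 8). The function names are ours, and the pi/8 constant comes from the standard statement of the approximation rather than from the post itself.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def noisy_sigmoid_mc(mu: float, s: float, n_samples: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[sigmoid(a)] for a ~ Normal(mu, s^2)."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    total = 0.0
    for _ in range(n_samples):
        total += sigmoid(rng.gauss(mu, s))
    return total / n_samples


def mackay_temperature(s: float) -> float:
    """Effective temperature implied by logit noise with standard deviation s."""
    return math.sqrt(1.0 + math.pi * s * s / 8.0)


def noisy_sigmoid_approx(mu: float, s: float) -> float:
    """MacKay-style approximation: a plain sigmoid on the temperature-scaled logit."""
    return sigmoid(mu / mackay_temperature(s))


if __name__ == "__main__":
    mu, s = 2.0, 1.5  # illustrative values, not taken from the post
    print("Monte Carlo average:", noisy_sigmoid_mc(mu, s))
    print("Temperature-scaled :", noisy_sigmoid_approx(mu, s))
    print("Plain sigmoid      :", sigmoid(mu))
```

The larger the noise, the higher the effective temperature and the closer the averaged prediction sits to 0.5. The match is an approximation, so expect small gaps between the two numbers rather than exact agreement.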
🤖 Robotics & Embodied AI
- tddworks built baguette, a headless iOS Simulator manager and farm for iOS 26 with host-side gesture injection, 60 fps streaming, and a built-in web UI for single-device or multi-device control (X reply).
- Yuta Noma, Alec Jacobson, and Karan Singh released Medial Sphere Preconditioning (SIGGRAPH Asia 2025) for fast knot untangling and 3D volume-filling curves, with a 100% success rate and orders-of-magnitude faster runtime (project, code, video).
- Jamie Simon built a modified Ising model simulating cells fighting to the death with emergent coordinated behavior from simple physics (X, 527 likes).
- simchowitzlabpublic built nano-world-model, a minimalist batteries-included repo for world model science via diffusion-forcing with unified Hydra training, video-to-3D, and MPC planning.
🏛️ AI Policy, Governance & Safety
- The White House story leads the Top 5 (above) — a major reversal of Trump's noninterventionist stance, prompted specifically by cybersecurity concerns from Anthropic's Mythos model.
- Dean W. Ball and former Biden AI Czar Ben Buchanan co-authored "This Is What Should Unite the Right and the Left on A.I." in the NYT, arguing both parties can agree on a lot of AI policy (especially catastrophic-risk and national-security angles) but need to move faster (X thread).
- Ball's separate "Aviate, Navigate, Communicate" essay on Anthropic's Mythos model expansion calls for hybrid governance with formal evaluations, staggered releases, public-private vulnerability sharing (Project Glasswing), and funding for provably secure software.
- Stuart Russell, Elon Musk's only AI expert witness at the OpenAI trial, fears an AGI arms race and argues governments need to restrain frontier labs.
- OpenAI claims Elon Musk sent ominous texts to Greg Brockman and Sam Altman after asking for a settlement: "By the end of this week, you and Sam will be the most hated men in America."
- William Savitt, the Wachtell litigator and former RBG clerk, is repping Altman/OpenAI; he previously beat Musk in the 2022 Twitter case.
- A WIRED investigation exposed Build American AI, a nonprofit linked to a super PAC bankrolled by OpenAI and a16z executives (with Palantir ties), for paying TikTok influencers up to $5,000 per video to frame Chinese AI as a threat (HN thread).
- OpenAI, Google, and Microsoft back the bipartisan LIFT AI Act (Schiff/Rounds) to fund AI literacy curricula in K–12 schools via NSF grants amid massive NSF cuts under Trump.
- Palantir and allies sketched a vision at Yale for fusing state power with AI through deeper public-private partnerships, military/intel applications, and U.S. superintelligence supremacy over China.
- Atoosa Kasirzadeh joined Google DeepMind full-time in London to research AGI's implications for human life, science, and society (737 likes).
- Karpathy's LLM-Wiki gist outlines a pattern for LLMs to maintain a persistent structured wiki from raw sources as an alternative to traditional RAG.
- Oliver Sourbut explains how LLMs got "large" (LessWrong): the Transformer's parallelizable attention plus self-supervised pretraining on next-token prediction supplied abundant training signal that finally enabled scaling beyond what earlier RNNs could reach.
- EfficientReasoning's online judgement Hugging Face Space lets 5+ agents debate and vote in real time on uploaded mic recordings or text prompts for cleaner consensus reasoning.
- A YouTube short documented an AI voice model hallucinating while counting the letter "e" in "seventeen", sycophantically agreeing with any suggested count from 0 to 20; HN debates whether it was a real ChatGPT voice bug or clickbait (it can't be reproduced in text mode).
- Perplexity Research detailed how it designs Agent Skills as modular SKILL.md packages with hierarchical structure, evals-first authoring, and accumulating "gotchas" flywheel.
- Lukasz Kaiser (Transformer co-inventor) is debating Transformers vs. Post-Transformers live in SF.
- How I AI featured Stripe's internal AI tool transforming product design.
- Nathan Lambert critiques the term "distillation attacks" as misleading; the real issue is API abuse and jailbreaking, not distillation itself.
- Ethan Mollick highlighted Reuters on AI's integration into Liberty Media's Formula One and its 11 teams (Reuters).
- Scott Young's Ultralearning update argues AI improves drill generation and feedback but the core principles of his 2019 book remain because AI widens the human-capital gulf and doesn't reduce intrinsic effort.
- The Guardian on "Will human minds still be special in an age of AI"; Princeton's Tom Griffiths argues yes, because biological constraints foster distinctive few-shot learning and integrative thinking.
- Ifigeneia Apostolopoulou's noisy-sigmoid post is a great primer on Bayesian intuition for temperature-based calibration.
- Lelanthran argues LLMs are not a higher level of abstraction because they output probabilities P(y | z1...zN) rather than deterministic mappings, which can include unintended artifacts even when tests pass.
- The Cambrian Thesis (Lonis Hamaili) argues you must map the 15-layer AI supply chain and bet only on companies with extreme AI beta because doubling AI demand barely moves diversified giants like TSMC (X, 779 likes).
- Binh Pham's "Hyper Scale Pillars of Humanity" maps physical-world AI choke points (e.g., the two companies producing super-alloy for wind-turbine magnets) for asymmetric investment opportunities.
- Roon argues Anthropic functions as a monastery worshipping Claude as a conscientious ethical authority, contrasting it with GPT as a pure tool (5,000 likes).
- Justin Skycak warns never to underestimate how much time you waste automating a process you don't understand manually (13,645 likes).
- Sam Altman observes that, despite his own preferences, making models smarter still matters more than making them cheaper or faster (13,054 likes); plus several related updates on Codex velocity and product metrics, with reactions from meta_alchemist (part 2), Ethan Mollick, and chatgpt21.
- Nando de Freitas argues we're entering a post-scale research age; building a top-20 LLM is now a recipe + ~$0.5B in chips, not a research problem (112 likes).
- Roon's separate observation: automating the computer made it radically more fun and made it even harder to go outside (2,305 likes).
- somewheresy laughs that years of RAG, hybrid search, and graph-DB work were outdone by models simply wielding tools against a filesystem (1,041 likes).
- Mathelirium demonstrates a 4f optical processor physically computing spatial derivatives of an input field (light-speed differentiation/filtering/edge detection, 1,780 likes).
- Michel Laclé shares running a fully local AI research operation on his own GPUs and urges others to host their own intelligence (720 likes).
- Zhang Xiaojun (Benita) hosted Prof. Su Yu on the technical history of agents (logical → neural → semantic-parsing → language) and the "OpenClaw Moment" (host X, Prof. Su Yu's X).
- Elon Litman announced GOAT was accepted to ICML (paper details forthcoming).
- Physera launched as a new applied-research lab from Himanshu, Soham Parekh, and Ashwarya Maratha rethinking model efficiency and high-fidelity multimodal behavioral simulations (related early demo).
- Epoch AI hosted Greg Burnham and Tom Adamczewski pushing back on "AI benchmarks are doomed" with next-gen designs like MirrorCode for software-engineering evals (X).
- Richard Socher praised Zechen Zhang's Agent-Native Research Artifacts (ARA), which replace narrative PDFs with executable knowledge packages; agents reach 64.4% vs 57.4% on PaperBench reproductions (paper, site, thread).
- Daxiongshu published a 12-round Opus 4.6 vs GPT-5.5 discussion on autonomous Kaggle agent architecture: modular ML OS with intake, EDA, validation, experiment registry, leakage guards (X).
- Zixin Wen argues continual learning via automated research and via test-time training are complementary, not conflicting; only RL is true "learn from experience."
- Artidoro Pagnoni observes tokens are not a universal scaling unit; bytes are more stable, and compute-optimal compression isn't necessarily what BPE uses today.
- Clarence Liu pitched "Survivors AI", a reality-show format where real AIs are perception-hijacked into Matrix-style 3D worlds with verifiable inference (Runway Big Pitch Contest).
- Finn Meeks highlighted a South Park Commons talk by Amit Jain (Luma Labs) on Luma's visual intelligence research and what "world models" actually mean (Uni-1, Ray3 reasoning video model).
- b-list.org's "Let's talk about LLMs" argues LLMs address only accidental difficulties (per Brooks' "No Silver Bullet"), not essential ones; DORA/METR/CircleCI data shows throughput gains but increased delivery instability (HN).
- Pham Bình Nhi built a hyperscale AI investment supply chain map starting from physical needs and tracing choke points (830 likes).
- Eli Chien's ICML 2026 paper studies privacy amplification for differentially private zeroth-order optimization with hidden states.
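The 4f optical processor item above boils down to a Fourier trick: transform the field, multiply each spatial frequency k by i*k, then transform back. The sketch below is an idealized one-dimensional numerical analogue, assuming a periodic, noise-free input. It is not Mathelirium's setup, and the function names are hypothetical. A plain O(n^2) DFT stands in for the lenses so the example needs nothing beyond the standard library.

```python
import cmath
import math


def dft(signal, inverse=False):
    """Naive discrete Fourier transform; the inverse divides by n."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for j in range(n):
            acc += signal[j] * cmath.exp(sign * 2j * math.pi * k * j / n)
        out.append(acc / n if inverse else acc)
    return out


def spectral_derivative(samples, length):
    """Differentiate a periodic signal by multiplying its spectrum by i*k."""
    n = len(samples)
    spectrum = dft(samples)
    filtered = []
    for k in range(n):
        # Map the DFT bin to a signed frequency index.
        freq = k if k < n // 2 else k - n
        if n % 2 == 0 and k == n // 2:
            freq = 0  # zero the Nyquist bin so the result stays real
        wavenumber = 2.0 * math.pi * freq / length
        filtered.append(1j * wavenumber * spectrum[k])
    return [value.real for value in dft(filtered, inverse=True)]


if __name__ == "__main__":
    n = 32
    length = 2.0 * math.pi
    xs = [length * j / n for j in range(n)]
    derivative = spectral_derivative([math.sin(x) for x in xs], length)
    worst = max(abs(d - math.cos(x)) for d, x in zip(derivative, xs))
    print("max deviation from cos(x):", worst)
```

In the optical version the multiplication is done by a mask in the Fourier plane between the two lenses, which is why the result arrives at the speed of light. Here the same step is the `1j * wavenumber * spectrum[k]` line.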
📊 Fundraising & Deals Roundup
Sorted by deal size descending. Lead-story and Top 5 deals appear here once for completeness.
- Cerebras IPO — tracking toward $26.6B+ valuation; deep OpenAI relationship including a $10B+ multi-year deal and $1B loan.
- OpenAI — $10B joint venture with TPG, Brookfield, Advent, and Bain to deploy enterprise AI.
- Long Lake — $6.3B acquisition of Amex GBT (corporate travel, all-cash, AI-led thesis).
- Lattice Semiconductor — $1.65B acquisition of Georgia-based AMI ($1B cash + $650M stock) for AI/data center firmware.
- Anthropic CCO Paul Smith spinout — $1.5B from Blackstone, Hellman & Friedman, and Goldman Sachs to scale Claude to mid-size businesses.
- SAP / Prior Labs — €1B+ committed over four years to acquire Prior Labs as an independent open-models lab.
- Katie Haun (Haun Ventures) — $1B for new venture funds expanding into AI agents.
- Sierra — $950M Series E led by Tiger and GV at $15.8B valuation.
- Cisco / Astrix — $400M acquisition of Israeli AI security startup.
- Panthalassa — $140M Series B led by Peter Thiel for wave-powered ocean compute.
- Enzo Health — $20M for AI-powered post-acute and home-health workflows.
🌍 International AI
That's a Wrap
That's 100+ stories from today alone. If you scrolled all the way down here, you now know more about Anthropic's three-headed enterprise distribution play than the analyst whose pitch deck just got obsoleted overnight. Condolences to the strategy slide they shipped on Friday.
For the daily version (bite-sized, 5-minute reads), make sure you're subscribed to The Neuron. We send six issues a week, and yes, we read all of this so you don't have to.
See you tomorrow.
P.S. Know someone who'd find this useful? Forward this to them and tell them to subscribe here.