Everything That Happened in AI Today: Tuesday, May 12

China asked for Anthropic's newest model, Anthropic said no, and the same model family is already showing up in U.S. government cybersecurity work.

Welcome to the Around the Horn Digest, the one page you need to sound dangerously informed at work tomorrow. Today had the full 2026 AI news bingo card: a geopolitical model-access fight, Google trying to turn Android into an assistant that does things before you ask, OpenAI trial drama with Musk lore, a $2.1B drug-discovery round, and a developer supply-chain incident that made npm and PyPI feel like abandoned amusement-park rides. At this point, the safest dependency is probably a printed manual and a candle. Let's get into it.

Around the Horn - Tuesday, May 12, 2026

The biggest story today was the model-access line turning into an actual geopolitical line. China's representatives reportedly approached Anthropic at a Singapore meeting to demand access to the company's newest model, and Anthropic refused, with Annmarie Hordern flagging the moment as a new visible point in the U.S.-China AI rivalry.

That would have been enough on its own, but the broader context made it bigger: POLITICO framed Mythos as a China-summit flashpoint, while Reuters reported the Pentagon is deploying Anthropic's Mythos cybersecurity model to find and patch vulnerabilities across U.S. government systems even as the department races to transition away from Anthropic.

So the real story is not just "China wants the model." It is that the newest frontier models are now being treated like strategic infrastructure: useful enough for government cyber defense, sensitive enough to deny to a rival state, and powerful enough that model access itself is becoming a diplomacy problem.

🏆 TOP 5 NEWS (Around the Horn)

Isomorphic Labs raised a $2.1B Series B led by Thrive Capital to scale AI-driven drug discovery, with Axios and R&D World noting the Alphabet-backed Demis Hassabis company has now raised about $2.6B total.
Google pushed Gemini deeper into Android with proactive task automation, widgets, web comparison, autofill, and Rambler, while Googlebook introduced a premium Android / ChromeOS laptop category, DeepMind reimagined the mouse pointer as a context-aware AI partner, ZDNET previewed the hardware angle, The Verge covered the phone-control push, and Pause Point added a speed bump for doomscrolling.
Sam Altman testified that Elon Musk's 2017 push for control of OpenAI's for-profit structure made him "extremely uncomfortable," with Bloomberg Law, TechCrunch, Yahoo Finance, and live posts from Mike Isaac, FirstAdopter, and MTSLive surfacing the control, children, Tesla, honesty, and nonprofit-governance angles.
Jensen Huang was left off President Trump's China business delegation while Tim Cook, Elon Musk, and Boeing's CEO were included, a signal Reuters and Tom's Hardware read as another clue that Beijing is unlikely to get advanced AI chips soon; at the eleventh hour, though, Air Force One reportedly stopped in Alaska to pick him up.
Mistral's PyPI package and TanStack's npm packages were hit in a "mini Shai-Hulud" supply-chain campaign, with Lyrie Research, Socket, TanStack, and Endor Labs detailing credential theft, GitHub Actions abuse, poisoned package artifacts, and destructive payload behavior.

Honorable Mentions

xAI released Grok Voice Think Fast 1.0 via API, a full-duplex voice agent for noisy, interrupt-heavy support and sales calls that topped tau-Voice Bench across retail, airline, and telecom tasks while already powering Starlink phone sales and support.
Google and SpaceX are reportedly discussing orbital data centers for AI compute, which sounds like science fiction until the cloud bill arrives.
Amp raised $1.3B to build an alternative AI "Grid" as the compute stack gets more vertically integrated.
Meta offered rival AI chatbots one month of free WhatsApp access to resolve EU antitrust concerns, while Threads tested a Grok-like Meta AI feature and Meta Connect 2026 was set for September 23-24.
OpenAI is reportedly generating billions in revenue commitments by promising future supplier purchases, while GOP lawmakers scrutinized Altman's business dealings ahead of OpenAI's IPO.

🍪 TOP TREATS TO TRY

Claude Code Agent View gives you one place to manage parallel Claude Code sessions, while Claude's developer posts and ClaudeDevs update added agent controls like /goal, /loop, and /schedule for longer-running coding work - available on paid Claude plans.
OpenAI Daybreak helps security teams use GPT-5.5 and Codex Security to identify threats, generate patches, and verify remediation across code and systems - paid only for now.
BrowserCode is an open-source browser-native agent framework that Alexander Yue said reached the highest browser-agent score so far, with the release post positioning it as a new open baseline - free to try.
Perceptron Mk1 gives you frontier video understanding and embodied reasoning at far lower reported cost, with OpenRouter, OpenRouter pricing, VentureBeat, and Perceptron's demo covering access - pricing varies by provider.
Statewright adds deterministic state-machine guardrails for agents, including tool allow-lists, edit limits, command controls, bash safety blocks, and human approval checkpoints, with the Show HN thread explaining the reliability pitch - free tier, Pro $29/mo.
Voker monitors production AI agents by classifying user intents, corrections, and resolutions, then surfacing where agents fail before customers complain, with the Launch HN thread giving the YC S24 context - starts free, enterprise self-hosting available.
Krea 2 gives creators an in-house image foundation model for expressive, aesthetically varied images with precise style and moodboard control, with Krea's launch post and mem0 showing creative follow-ups - no pricing details.
Meta Alchemist shared a practical /goal prompt trick for Claude Code and Codex: ask the agent to read the full session, repo, history, and docs, analyze the real intent, then write the /goal prompt itself so future coding passes stay focused instead of drifting - free tip.
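
For readers wondering what "deterministic state-machine guardrails" look like in practice, here is a minimal sketch of the idea behind the Statewright item above: a tool allow-list per state, an edit limit, and a human approval checkpoint. It is not Statewright's API. The `Guard` class, the state names, and the tool names are all hypothetical, chosen only to illustrate the pattern.

```python
from dataclasses import dataclass, field

# Hypothetical states and tool names, chosen only for illustration.
ALLOWED_TOOLS = {
    "plan": {"read_file", "search"},
    "edit": {"read_file", "write_file"},
    "verify": {"run_tests"},
}
TRANSITIONS = {
    "plan": {"edit"},
    "edit": {"verify"},
    "verify": {"edit", "done"},
    "done": set(),
}
NEEDS_APPROVAL = {"write_file"}  # tools gated behind a human checkpoint


@dataclass
class Guard:
    state: str = "plan"
    edits_left: int = 3  # simple edit limit
    log: list = field(default_factory=list)

    def check(self, tool: str, approved: bool = False) -> bool:
        """Return True only if the tool call is permitted in the current state."""
        if tool not in ALLOWED_TOOLS.get(self.state, set()):
            ok = False  # tool is not on this state's allow-list
        elif tool in NEEDS_APPROVAL and not approved:
            ok = False  # a human has not signed off yet
        elif tool == "write_file" and self.edits_left <= 0:
            ok = False  # edit budget exhausted
        else:
            ok = True
            if tool == "write_file":
                self.edits_left -= 1
        self.log.append((self.state, tool, ok))
        return ok

    def advance(self, new_state: str) -> None:
        """Move to a new state, rejecting transitions the machine does not define."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
```

The point of the design is that the agent never decides what it is allowed to do: every tool call passes through `check`, and the decision depends only on the current state and explicit approvals, so the same call always gets the same answer.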

🏢 Big Tech & Major Companies

Anthropic warned users about unauthorized stock sales and investment scams involving Anthropic shares.
xAI released Grok Voice Think Fast 1.0, its most capable voice agent API, built for real-world telephony noise, accents, interruptions, precise spoken data entry, corrections, background reasoning with zero added latency, and 28-tool orchestration; xAI says it tops tau-Voice Bench across retail, airline, and telecom categories and already powers Starlink phone sales and support with 20% conversion, 70% autonomous resolution, and 25+ languages.
Hugging Face crossed 1 million open datasets, a milestone for the public data layer behind open models, evals, fine-tuning, and research replication.
Claude for Legal shipped 20+ MCP connectors (Model Context Protocol, a way for AI assistants to plug into outside software) plus 12 legal-work plugins for law firms and legal teams, while Anthropic's GitHub repo open-sourced the plugin suite, reference agents, skills, and legal workflow connectors for contracts, privacy, employment, litigation, corporate, IP, regulatory monitoring, AI governance, clinics, and law students; Scaling01's post noted the release drew 1,420 likes and 130 reposts.
Meta Muse Spark was positioned as Meta Superintelligence Labs' strongest model yet for Meta AI across WhatsApp, Instagram, Facebook, Messenger, and AI glasses, with Meta's newsroom post pushing the rollout.
Sapiens2 quietly landed as Meta's high-resolution human-image model family for pose, segmentation, depth, and normal estimation, with Merve Noyan flagging the release.
Perplexity Research explained how it hosts Qwen on NVIDIA Blackwell hardware, with Zihao Gavin Yang sharing the systems angle.
Qualcomm fell 11% as chip stocks pulled back from a record AI-driven rally, while CNBC reported new semiconductor futures contracts will let traders hedge GPU rental and AI compute costs.
Google Cloud plans to hire hundreds of forward-deployed engineers to help customers adopt its AI products, with FirstSquawk surfacing the hiring push and Aaron Levie arguing forward-deployed engineers will become critical for AI rollouts because successful deployment requires business-process knowledge, evals, data plumbing, and change management, not only shipping code.
Google's Android security roadmap added 2026 protections including AI-assisted safeguards, with Android Authority listing the 12 new features and CyberScoop covering Android Intrusion Logging with Amnesty International.
Lee Robinson amplified Google's Gemini Intelligence push for Android, pointing back to the same Google platform update.

💼 AI Productivity, Labor & Economics

NBER showed AI agents running Deep Research on a Loop can automate much of the work of constructing high-quality economic datasets from primary public sources at roughly LLM-subscription cost, shifting the cost curve for empirical economics research.
Deirdre Bosa reported that Long Lake CEO Alex Taubman, who has bought 30+ legacy non-tech companies and just privatized Amex GBT for $6.3B, is betting AI will transform real-economy businesses by turning hundreds of billions in lab capex into actual GDP growth.
Amazon employees reportedly used the internal MeshClaw AI tool for unnecessary tasks to inflate usage scores, with Tom's Hardware calling the broader pattern "tokenmaxxing."
A Hollywood screenwriter argued AI training gigs have become the new "waiting tables" for entertainment workers, after doing 20 contracts across five platforms in eight months.
The Economist argued America is seeing a productivity miracle, with AI and entrepreneurship as part of the story.
NBER published work on mixed Nash equilibria (game-theory solutions in which players randomize over their strategies) and another paper tied to economics research, with NBER's post pointing readers to the new studies.
Tuhin Nair argued senior developers fail to communicate expertise because they talk in complexity-reduction terms while the business cares about uncertainty-reduction, with the Hacker News thread debating how much expertise can be compressed into words.
Eric Ries argued mission-driven founders should protect mission through concrete governance, not vibes, because only a minority remain CEO three years after IPO.
Deedy Das argued many AI app startups overclaim defensibility around models, workflows, and data when the moat is often weaker than the pitch deck suggests.
Jason Saltzman shared OpenAI data suggesting tech startups represent only about 5% of active U.S. users doing entrepreneurial work, a reminder that AI startup Twitter is not the whole economy.
Anthony Kroeger asked what AI coding setup people would actually use if the heavily subsidized $20-$200 subscriptions disappeared and API pricing became the real bill.
Aniket Panjwani shared the Stanford IRiSS panel transcript and reading map Empirical Work in the Age of AI, arguing foundation models and agents are already delivering 10x productivity for empirical social science through replication, scraping, custom fine-tuning, and causal-inference pipelines, while humans remain necessary for verification, creativity, and interpreting surprising details.

🤖 AI Agents & Infrastructure

Odyssey released PROWL, an RL-driven adversarial framework where agents explore game worlds like Minecraft to find world-model failures and feed targeted fixes back into training, with the technical blog, trending link, and Oliver Cameron's post showing the world-model learning loop.
alphaXiv demonstrated RL fine-tuning a 4B Qwen model into a native recursive language model that uses one shared policy for parent decomposer and child sub-agent roles, using GRPO and advantage inheritance to match Claude Sonnet 4.6 quality on multi-paper scientific evidence selection at 7 seconds instead of 60+ seconds; the launch thread linked askalphaXiv's post and Sheri Yuo's comment.
Manthan Gupta argued voice-agent memory has to invert the normal read/write path under 500-800ms latency budgets by pre-loading user profiles, extracting facts asynchronously during and after calls, and choosing between framework state, bolted-on memory services, knowledge graphs, or cognitive-reflection systems so memory never blocks the live response.
Automated Design of Agentic Systems introduced a meta-agent that writes, tests, and archives better agent designs in code, discovering prompts, tool-use strategies, and workflows that outperform hand-designed agents across coding, science, and math tasks with cross-domain and cross-model transfer.
The AI Scientist-v2 uses progressive tree search to autonomously generate hypotheses, run experiments, refine figures with vision-language model feedback, and author what the researchers describe as the first fully AI-generated workshop paper accepted at ICLR.
PI-SERINI pairs tuned BM25 lexical retrieval with frontier LLMs in a ReAct search / read-results / read-document loop, arguing that old-school keyword search can beat or match dense retrieval when agents can inspect lots of evidence; the paper, GitHub repo, Matt Justram's post, and Xueguang Ma's note all point to the same lesson: the retrieval interface matters as much as the retriever.
MoE Capital argued video world models are still in their GPT-2 era despite $10B+ in recent investment, because RL and video generation are converging into the missing simulation layer for robotics, games, and physical agents; Deedy Das called it the best read on world models, and Nvidia's Jim Fan argued robotics is entering its end game through world action models, egocentric video, and world-model simulation, with a 95% chance the robotics tech tree is solved by 2040 and a physical Turing test within 2-3 years.
Adaption Labs introduced AutoScientist from the premise that fewer than a thousand people know how to shape frontier models inside closed labs, leaving everyone else stuck prompt-engineering around average-use-case systems; a related Adaptive Data link was also included.
Trainloop AI partnered with Mercor to test reframing knowledge-work agents as coding agents rather than tool-mirroring assistants, with Jackson Stokes sharing the research.
Browser Use's BuxFatherBot creates and manages cloud browser boxes from Telegram, with Browser Use framing it as a way to spin up autonomous browsing agents.
Prime Intellect open-sourced renderers, a Python library for token-level chat templating that keeps long multi-turn RL rollouts stable and parseable; also check Prime Intellect's post pointing to the release.
Modal explained how to build truly serverless GPUs using cloud buffers, lazy image loading, and checkpointing, with Charles Frye framing inference as a new stack that is not Kubernetes and not SLURM.
Thinking Machines Lab introduced Interaction Models for real-time multimodal collaboration across audio, video, and text, with the HN thread debating defensibility.
Google DeepMind demoed agentic interaction ideas in the same week that its AI pointer work reframed the cursor as context-aware collaboration.
Physics Intern lets you enter a theoretical physics question and have specialized agents gather evidence, form hypotheses, and critique the steps, with Wenhao Chai sharing the MLS / agentic-science connection.
Composio TrustClaw is a self-hostable personal agent with vector memory, Composio tools, and Telegram access, with Sarah Fim showing the project.
Oboe turns a learning goal into a personalized course, positioning itself as a "learn anything" product rather than a generic content library.
Inworld highlighted on-device AI capabilities, while JeliPenguin surfaced the release.
holaOS and its GitHub repo give you an open-source agent computer where agents share your browser, files, apps, memory, and state, run sub-agents in parallel, improve rules over time, and turn recurring research, content, or client-delivery work into persistent autonomous work streams instead of restarting every run, with JeliPenguin surfacing the release.
Mobbin MCP, shared by Paper Seasons, connects AI agents to 621,500+ shipped product screens so they can reference proven paywalls, checkout flows, bottom sheets, onboarding, and permission screens instead of guessing when generating UI.
Liquid AI's voice-assistant cookbook, shared by Paula Bartabajo, maps spoken Home Assistant commands straight to function calls with a fine-tuned LFM2.5-Audio-1.5B model, skipping a separate speech-to-text pipeline so the assistant can run privately on-device.
SkillsBench became a fast-growing agent-skills benchmark with 103 workshop submissions and $20K in prizes, while Niels Mündler argued the results expose current agent limits: skills still need human authors, performance peaks around 2-3 skills, extra skills can hurt accuracy, and gains on high-value SWE / math tasks remain marginal.
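
The voice-agent memory pattern Manthan Gupta describes above (pre-load the profile, answer from it, extract facts in the background) can be sketched in a few lines of asyncio. This is a minimal illustration under stated assumptions, not his implementation: `PROFILES`, `load_profile`, `extract_facts`, and the "my name is" rule are hypothetical stand-ins for a real memory service and an LLM extraction step.

```python
import asyncio

# Hypothetical in-memory store standing in for a real profile/memory service.
PROFILES = {"user-1": {"name": "Ada", "plan": "pro"}}


async def load_profile(user_id: str) -> dict:
    """Read path: runs once before the call starts, not once per turn."""
    await asyncio.sleep(0)  # placeholder for a network or database read
    return dict(PROFILES.get(user_id, {}))


async def extract_facts(user_id: str, utterance: str) -> None:
    """Write path: runs in the background so it never blocks a reply."""
    await asyncio.sleep(0)  # placeholder for a slower LLM extraction call
    lowered = utterance.lower()
    if "my name is " in lowered:
        words = lowered.split("my name is ", 1)[1].split()
        if words:
            PROFILES.setdefault(user_id, {})["name"] = words[0].capitalize()


def reply(profile: dict, utterance: str) -> str:
    """Live response uses only the pre-loaded profile (no memory lookups)."""
    return f"Hi {profile.get('name', 'there')}, you said: {utterance}"


async def handle_call(user_id: str, utterances: list[str]) -> list[str]:
    profile = await load_profile(user_id)  # pre-load before the first turn
    background = []
    replies = []
    for text in utterances:
        replies.append(reply(profile, text))  # respond first
        background.append(asyncio.create_task(extract_facts(user_id, text)))
    await asyncio.gather(*background)  # finish extraction after the call
    return replies
```

Note the trade-off this makes visible: a fact learned mid-call only shows up in the profile on the next call, which is the price of keeping memory writes off the latency-critical path.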

💻 AI Coding & Developer Tools

OpenAI Codex added in-app browser testing so developers can control the device toolbar, test apps at different viewports, capture verification screenshots, hide animations, and speed up evals 1-2x while saving tokens.
Claude Code desktop now turns remote control on by default, letting users hand Claude full session management with one click.
Anthropic is reportedly in advanced talks to acquire a developer-tools startup used by OpenAI and Google to strengthen Claude Code.
oMLX v0.3.9.dev2 added Gemma 4 MTP / DFlash support, ParoQuant, oQ proxy auto-builds, RAM overflow handling, and an admin restart button for Apple Silicon LLM inference, with Jun Kim sharing the update.
LlamaIndex open-sourced liteparse-server and core liteparse, a local HTTP backend that extracts text and exact bounding boxes from PDFs, Office files, images, and spreadsheets using PDF.js, LibreOffice, ImageMagick, Tesseract.js, or OCR plugins, with liteparse-server adding batch jobs, page screenshots for vision models, Redis caching, Docker, observability, and zero cloud dependency.
OpenGravity is a zero-install, vanilla-JS, BYOK clone of Google Antigravity with a live xterm.js terminal, local file sync, and a sidebar agent for file edits, with the Show HN thread explaining the lightweight build.
OpenClaw OS is the default workspace for an OSS Claude Coworker-style environment, with the Show HN thread framing it as one screen that feels like SaaS tools.
E2a is an authenticated email gateway for AI agents with SPF / DKIM verification, HMAC-signed delivery, webhooks, WebSocket fan-out, CLI, and SDKs, with the Show HN post explaining the agent-email use case.
Grunden offers GLM 5.1 inference through an OpenAI-compatible API hosted on NVIDIA H200 hardware in Sweden for EU-sensitive data, with the Show HN thread covering the sovereignty angle.
safe-install runs npm installs with lifecycle scripts disabled and rebuilds only trusted dependencies, with Show HN tying it to supply-chain safety.
adamsreview runs multi-lens PR reviews for Claude Code with deep review, auto-fix loops, walkthroughs, and external-finding injection, with the Show HN post claiming it catches more real bugs than built-in reviewers.
tuicr is a terminal UI for human-in-the-loop code review of AI-generated changes.
Gigacatalyst lets B2B SaaS teams embed a white-label natural-language builder so sales, CS, and customers can create governed one-off workflows inside the product.
Hopper connects AI agents to mainframes through Model Context Protocol so they can drive TN3270 sessions, inspect z/OS datasets, write JCL, debug JES spool failures, and pause for approval, with the Show HN post offering trial access.
Needle is a 26M-parameter tool-calling model for small devices, with Show HN arguing tool calling is retrieval-like enough that massive models are overkill.
Dan McAteer demoed Codex in /goal mode completing a mechanistic-interpretability task much faster than a human baseline, then shared a second post in the same Codex workflow thread.
CJ Zafir demoed Codex creating a Colab notebook, uploading a 145M JSONL dataset from Drive, running analysis, and turning the results into a Streamlit app for computer use.
David Kundel demoed Codex making documentation navigation temporarily drag-and-drop editable for fast UI iteration.
OpenAI's Parameter Golf write-up said 1,000+ participants and 2,000+ submissions explored AI-assisted ML research, coding agents, quantization, and model design under hard constraints, with Alex Zhao sharing behind-the-scenes iteration notes.
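
The "HMAC-signed delivery" in the E2a item above follows a common webhook pattern: the sender signs the raw request body with a shared secret, and the receiver recomputes the signature before trusting the payload. Here is a minimal sketch using only Python's standard library. E2a's actual header names and signing scheme are not given in the source, so SHA-256, the hex encoding, and the function names below are assumptions for illustration.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Sender side: hex HMAC-SHA256 of the raw body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature: str) -> bool:
    """Receiver side: recompute the signature and compare in constant time."""
    expected = sign(secret, body)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

An agent's inbound handler would call `verify` on the exact bytes it received, before parsing them, and drop the message if the check fails.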

🔬 AI Research & Models

SenseNova-U1 introduced a native unified multimodal model family using NEO-unify architecture to merge understanding, reasoning, and generation inside one network instead of separate encoders or VAEs; the GitHub repo, Hugging Face collection, SenseNova Studio, and paranioar's post cover the 8B and A3B-MoE variants, VQA, interleaved image-text, any-to-image generation, infographics, and early VLA / world-model tasks.
Ravid Shwartz Ziv and Jesse Lai explained the principles of diffusion models, covering how generation starts from noise and denoises step by step, why variational, score-based, and flow-based methods share a change-of-variables view, why diffusion dominates images, video, text, and proteins, and how the same ideas extend to consistency models and real-time world models; related posts came from Ziv Ravid and Jesse Lai.
Log analysis is necessary for credible evaluation of AI agents, from Peter Kirgis and co-authors, argued outcome-only benchmarks can inflate or deflate scores by hiding shortcuts, scaffold limits, and dangerous actions; their tau-Bench Airline case study found pass^5 performance was under-elicited by nearly 50%.
HLA, introduced by Yifan Zhang, gives autoregressive models a causal streaming higher-order linear-attention mechanism that keeps constant-size state and linear-time per-token compute while generalizing state-space duality to higher-order interactions.
Stefano Ermon highlighted Peter Pao-Huang's Flux Matching as a generative-modeling paradigm that generalizes diffusion by learning broader vector fields stationary with respect to the data distribution, enabling structural priors, faster sampling and mixing, and more interpretable generation.
kalomaze argued low-dimensional autoencoder latents plus straight-through estimation remain underused for native byte-level latent spaces on modest GPU budgets, then expanded the point into why continuous spaces and neural-network geometry can make some solutions feel destined rather than randomly searched.
Tanishq Mathew Abraham highlighted Kaiming He et al.'s ELF paper, which keeps diffusion-language modeling mostly in continuous embedding space until the final token-mapping step.
Sander Dieleman framed ELF as a successor to earlier continuous text-diffusion lines including Self-conditioned Embedding Diffusion, Continuous Diffusion for Categorical Data, and LangFlow.
Cai Zhou highlighted Coevolutionary Continuous Discrete Diffusion, which tries to make diffusion language models act more like latent reasoners.
Spectral Dynamics in Deep Networks, shared by fly51fly, proposed a two-level dynamical mean-field theory for feature learning, outlier escape, and learning-rate transfer.
SETOL introduced a semi-empirical theory of deep learning that tries to bridge statistical mechanics, scaling laws, and training dynamics.
Sharpness-Aware Pretraining, shared by Ishaan Watts and Aditi Raghunathan, argued the lowest-loss pretraining checkpoint is not always the best starting point for fine-tuning because flatter minima can preserve knowledge better.
Chengsong Huang introduced G-Zero, a verifier-free self-play framework for open-ended LLM generation from zero data that uses an intrinsic Hint-delta reward, where a Proposer trained with GRPO generates hard queries and hints around the model's blind spots, then a Generator optimized with DPO internalizes the style and structure improvements, yielding gains on AIME and IFEval without external judges, majority vote, or verifiers (Hugging Face, GitHub).
Jina Embeddings v5 Omni released multimodal embeddings across text, image, audio, and video, with Jina AI's posts and model collection giving access.
Sophie Wang argued in the Tokens project that generated-token hidden states contain semantic information spread across the sequence, with mean-pooled representations better capturing input meaning than any single token; the result held across language, vision, and protein domains and exposed interpretable generation dynamics (paper, GitHub).
Zheng Toon released AutoTTS, an environment-driven framework that lets a coding agent discover test-time scaling controllers from offline reasoning trajectories, producing a Confidence Momentum Controller that matched math accuracy while cutting tokens by 69.5% and costing $39.90 across 160 minutes (arXiv).
Antoine Chaffin introduced Agent-ModernColBERT, a 150M-parameter late-interaction retriever fine-tuned in five minutes on 5,238 AgentIR rollout trajectories with concatenated reasoning traces and queries, reaching 72.53% on BrowseComp-Plus, beating much larger models, and supporting both retrieval and reranking through PyLate.
Pixal3D, from Dong-Yang Li, Wang Zhao, Yuxin Chen, Wenbo Hu, Meng-Hao Guo, Fang-Lue Zhang, Ying Shan, and Shi-Min Hu, generates 3D assets directly in the input view's coordinate system instead of canonical space, using pixel back-projection to lift 2D features into 3D volumes, a VAE to compress sparse signed-distance fields, and a coarse-to-detail process for reconstruction-level fidelity (Hugging Face, GitHub, paper, Wang Zhao, L. D. Yang).
A Single Neuron Is Sufficient to Bypass Safety Alignment, amplified by Hamid Kazemi and Mersad Abbasi, argued that manipulating a tiny internal feature can bypass refusal behavior in aligned LLMs, connecting to earlier refusal-direction work and raising the obvious safety question: what if jailbreaks become model-surgery problems instead of prompt tricks (PDF, Hugging Face).
Lightning OPD proposed efficient post-training for large reasoning models using offline on-policy distillation, a cheaper way to reuse model-generated reasoning trajectories without repeatedly running expensive live rollouts, with GitHub available for implementation details.
Generative Modeling with Flux Matching, from Peter Pao-Huang, shipped with code and a Demis Hassabis share.
Adina Yakup surfaced Ovis2.6-80B-A3B, an Apache 2.0 multimodal model with 80B parameters but only 3B active per token that reasons over images by invoking visual tools like cropping and rotation, supports 64K context and 2880x2880 resolution for long-document QA and dense charts, and accepts single images, multi-image inputs, video, or text.
AntAngelMed is a specialized medical multimodal model readers can try directly in the ModelScope studio for clinical and research workflows, with the model page included for details.
Step Image Edit 2 offers unified text-to-image and image editing with 1-2 second responses.
Social Theory Should Be a Structural Prior for Agentic AI, also present as a PDF, argues multi-agent AI systems should encode social-science structure instead of treating interaction as pure prompt choreography.
Hallucinations Undermine Trust; Metacognition is a Way Forward, shared by _galyo, argues frontier models keep hallucinating because progress has mostly expanded what they know rather than improving whether they can tell what they do not know; the paper reframes hallucinations as confident errors and makes faithful uncertainty, matching the model's language to its real uncertainty, the control layer for trustworthy agents.
Gerard Sans connected that metacognition thread to trust, while his broader alignment critique argued companies can stage intelligence, personality, and selfhood through RLHF loops, system prompts, first-person framing, evaluator feedback, and deployment design choices, then market those behaviors as emergent.
MLS-Bench, with paper, GitHub, and Karan Vaidya's post, evaluates whether AI agents can make atomic, generalizable ML science contributions.
Positive Alignment, shared by Ruben Laukkonen and discussed by Andrew Curran, argues alignment should optimize for human flourishing, not only risk suppression; Jan Kulveit pushed back that the paper misrepresents the alignment field by treating it as mostly negative safety while omitting foundational positive-vision work such as Eliezer Yudkowsky's Coherent Extrapolated Volition from 2004.
Complex-Valued Phase-Coherent Transformer, introduced by Leo Hio, explores a transformer architecture built around complex-valued phase coherence.
Frank Zhou introduced RigidFormer, a mesh-free object-centric Transformer that learns multi-object rigid-body contact dynamics directly from point clouds, uses anchors for contact geometry, enforces rigidity through Kabsch alignment, handles noisy partial inputs, and scales to 200+ objects at 23.9 FPS, reportedly 8-101x faster than GNN baselines (paper PDF).
Nilaksh shared Reconstruction or Semantics? Robotic World Models, which found semantic latent spaces from pretrained vision encoders outperform reconstruction-focused spaces on robot policy metrics like action recovery, task-success classification, planning, downstream policy success, and robustness to distractors, while still preserving enough visual fidelity for useful rollouts.
RAG over Thinking Traces, shared by PTenigma, tested whether models can retrieve from past reasoning traces rather than only source documents, using prior thoughts as reusable scaffolding for harder reasoning tasks.
How Much Data Is Enough? and the biomedical Zeta Law of Discoverability connect AI data scaling to low-dimensional structure, with Paul Thompson's source-note arguing that percolation in latent embedding space explains sudden sample-complexity elbows in vision-language models and why better encoders can make biomedical data become discoverable faster (percolation explainer).
Chen Hui introduced FabScore, a four-stage coding-agent pipeline that checks numerical claims in AI-generated papers by extracting results, doing static analysis, executing code, and generating verdicts, finding a 21.2% claim-level fabrication rate across 144 papers and 98.6% precision in human validation (GitHub).
Ziyang Wang introduced EgoMemReason, a benchmark for multi-level memory reasoning over week-long egocentric videos that tests entity, event, and behavior memory across 500 human-verified questions requiring an average 5.1 evidence segments and 25.9 hours of temporal backtracking; the best tested model, Gemini-3-Flash, reached only 39.6% accuracy (paper, Hugging Face, follow-up).
SlimQwen, shared by Seb Krier and Shengkun, studies pruning and distillation in large Mixture-of-Experts models (MoE, models that activate only some expert subnetworks per token).
Compute Optimal Tokenization, with ArXivIQ's write-up and Che Shr Cat's post, studies how tokenization should change under compute constraints.
Compared to What?, discussed by Yoav Goldberg and Zihao Gavin Yang, critiques counterfactual prompting experiments that change inputs without using the right baselines.
DAIR.AI shared NanoResearch, which argues personalization is a precondition for usable research agents and introduces tri-level co-evolution: a skill bank for reusable procedures, memory for user and project history, and label-free policy learning from feedback, beating SOTA systems while improving quality and cost over repeated cycles; related X links came from Epoch AI and Peter Wildeford.
Robust Speech Recognition via Large-Scale Weak Supervision, revisited by Andrey Burkov and ChapterPal, remains the key Whisper paper showing how weak supervision at scale changed speech recognition.
Arena.ai updated its Text Arena rankings, with Claude Opus 4.7 at the top, while Artificial Analysis launched tau-Voice, a benchmark for realistic speech-to-speech customer-service tasks.
GPT-5.5 solved the first high / xhigh ProgramBench task in C and Python, while Noam Brown, in a post and a follow-up, argued that saturated benchmarks like GPQA should be replaced by harder evals plotted against cost and latency.
WeirdML v2 put GPT-5.5 xhigh at 84.9%, while Haider and Epoch AI highlighted GPT-5.5 helping find fatal errors in about one-third of FrontierMath tiers 1-4 problems.
Adventures in Demand Analysis Using AI, shared by Rozhina Ghanavi, examines where AI helps and struggles in empirical demand-analysis work.
Parameter Golf craft notes, shared by Mrinaal Arora, show what people learned trying to squeeze model performance under tiny parameter budgets.
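
For readers who want the MoE idea from the SlimQwen item made concrete, here is a minimal sketch of top-k expert routing in plain Python. It is a generic illustration, not SlimQwen's method: the toy experts, the gate logits, and the names route_top_k and moe_forward are all hypothetical, and real models route per token with learned gates over tensors.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so the selected experts sum to 1.
    """
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, gate_logits, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_logits, k))

# Hypothetical toy experts: each stands in for a small subnetwork.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
gate_logits = [0.1, 2.0, -1.0, 1.5]  # assumed gate scores for one token

selected = route_top_k(gate_logits, k=2)
output = moe_forward(3.0, experts, gate_logits, k=2)
print(selected, output)
```

Only two of the four experts run for this input, which is the point: compute per token scales with k, not with the total number of experts.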

🏛️ AI Policy, Governance & Safety

RSL Media's Human Consent Standard, backed by George Clooney, Tom Hanks, and Meryl Streep, gives creators a way to set terms for how AI systems can use their work or likeness.
The Economist, amplified by Rob Wiblin, warned AI tools could help novices with bioterrorism but noted current studies still find important bottlenecks outside the model.
Jeff Geerling argued Bambu Lab is abusing the open-source social contract by threatening legal action against OrcaSlicer-bambulab fork work that supports offline / Developer-mode control, with the HN thread debating whether "just works" hardware is worth the lock-in.
XBOW disclosed CVE-2026-45185, a critical unauthenticated Exim remote-code-execution bug, then used the disclosure window to compare human and autonomous exploit development, with HN reacting to the security-labor implications.
Poolside broke down reward hacks in benchmark training and the strategies it is testing to reduce them, with HN questioning online benchmark design.
Yi-Ling Liu argued the U.S. and China share a feeling of being harvested by the future despite their different AI strengths, with her X thread expanding the thesis and Ethan Mollick surfacing the piece.

🛠️ AI Tools & Products

AI IQ scores frontier AI models such as GPT-5.5 and Claude Opus 4.7 on a human IQ-style bell curve, plus EQ and cost comparisons.
Papel is a TikTok-style social network for scientific papers with summaries, quizzes, and researcher community features, though the Show HN thread pushed back on posting before users can try it.
Halupedia is an encyclopedia of a fictional universe that does not exist until you visit it, shared by Nav Toor.
Ponder, shared by Tim Wang, turns hours of raw footage into a polished rough cut from prompts like 'make a 60-second highlight reel with upbeat music,' handling A-roll / B-roll layering, dead-time removal, emotion-aware pacing, continuity, timeline refinements, and export to Premiere, Final Cut, or DaVinci.
StepFun's Step Image Edit 2 delivers lightweight unified text-to-image and image editing with 1-2 second responses, while Krea 2 creates expressive images with aesthetic diversity, style and moodboard control, and fast outputs from simple or ambiguous prompts, with Krea's launch post in the same creative-tools batch.
Maxime Heckel built a browser-rendered sky and planet shader walkthrough using raymarching, Rayleigh and Mie scattering, ozone absorption, and depth-buffer compositing.
Obsidian launched a new Community site and developer dashboard with automated reviews, scorecards, paid-plugin labels, team controls, and faster plugin / theme publishing, with the HN thread clarifying roadmap details.
Bambu Lab became a debate over 'just works' hardware vs. owner control, DeepMind's AI pointer raised the social cost of voice-heavy interfaces, Beyond Semantic Similarity turned into a search-control argument about semantic retrieval vs. keyword feedback loops, and Snowflake Postgres became a lock-in debate over whether operational databases should follow analytics platforms.

📊 Fundraising & Deals Roundup

Isomorphic Labs - $2.1B Series B for AI-driven drug discovery.
Amp - $1.3B for an alternative AI compute grid.
Wispr - in funding talks at a possible $2B valuation for Wispr Flow voice dictation.
Exaforce - $125M Series B at a $725M valuation to catch and stop cyberattacks in real time; Exaforce's press release says total funding is now $200M from HarbourVest, Peak XV, Mayfield, Khosla Ventures, and Seligman, and the company post framed its Exabots and real-time knowledge graph as machine-speed investigation and response for AI-powered attacks.
The Huang Foundation - signed a CoreWeave GPU deal to donate AI compute time, with Anissa Gardizy noting the foundation has donated $108M in GPU compute time grants.
Anthropic developer-tools talks - acquisition talks to strengthen Claude Code.

🎙️ Interviews, Panels & Podcasts

How to build a company that withstands any era featured Eric Ries on why mission-driven companies need concrete governance before markets, boards, and incentives reshape them.
Empirical Work in the Age of AI collected the Stanford IRiSS panel transcript and reading map from Susan Athey, Matt Gentzkow, Yiqing Xu, Andrew Hall, and others on how agents automate replication, scraping, custom fine-tuning, and causal-inference workflows while human judgment remains the verification layer.
Percolation: a Mathematical Phase Transition served as the math primer behind the data-scaling cluster, where percolation in latent embedding space was used to explain sudden sample-complexity jumps in vision-language models and the Zeta Law paper's claim that better encoders make biomedical data discoverable faster.
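
If "percolation" is new to you, the phase transition is easy to see in a toy simulation. The sketch below is textbook site percolation on a square grid, not the latent-embedding model from the papers above; the grid size, trial count, and function names are assumptions chosen for illustration. Each cell is open with probability p, and we ask how often an open path connects the top row to the bottom row.

```python
import random
from collections import deque

def spans(grid):
    """Return True if open cells connect the top row to the bottom row."""
    n = len(grid)
    seen = [[False] * n for _ in range(n)]
    queue = deque()
    for c in range(n):
        if grid[0][c]:
            seen[0][c] = True
            queue.append((0, c))
    # Breadth-first search through 4-connected open cells.
    while queue:
        r, c = queue.popleft()
        if r == n - 1:
            return True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < n and grid[nr][nc] and not seen[nr][nc]:
                seen[nr][nc] = True
                queue.append((nr, nc))
    return False

def spanning_fraction(p, n=30, trials=40, seed=0):
    """Fraction of random n x n grids (cells open with probability p) that span."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        grid = [[rng.random() < p for _ in range(n)] for _ in range(n)]
        hits += spans(grid)
    return hits / trials

for p in (0.3, 0.5, 0.6, 0.7, 0.8):
    print(p, spanning_fraction(p))
```

On an infinite square lattice the site-percolation threshold is roughly 0.593, so the spanning fraction should climb steeply from near 0 to near 1 around that value rather than growing gradually. That sudden jump is the intuition being borrowed for sample-complexity curves.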

💡 Industry Commentary & Analysis

The Build argued Snowflake Postgres, Databricks Lakebase, and Azure HorizonDB are "Postgres" mostly at the wire-protocol layer, so the real decision is which adjacent analytics platform you want to be locked into.
Beyond Semantic Similarity argued agentic search should interact directly with the corpus instead of relying only on fixed top-k semantic retrieval, using terminal tools like grep and shell scripts to keep evidence recoverable.
Kevin Grajeda argued AI-assisted design needs a concept of fidelity so teams do not accidentally turn low-fidelity explorations into production code debt, then shared Dessn Create as a design-to-code prototype environment in a later post.
Ksenia Se argued reusable agent skills, smaller than full agents but richer than prompts, are becoming the coordination layer for durable AI workflows.
Gabe argued LLM psychosis scales with distance from the code: the less direct contact people have with implementation, the more self-reinforcing the AI fantasy becomes.
George Hotz / tinygrad argued AI compute winners will still be decided by timeless metrics like FLOPS per dollar and FLOPS per watt, not narrative momentum; tinygrad also argued for decentralizing AI power by putting stacks of GPUs in every house instead of giant company-controlled data centers.
Noam Brown made the case for new hard evals, while ProgramBench commentary showed how fast frontier benchmarks saturate.
Bahareh Tolooshams presented Sparse Autoencoder Neural Operators, a mechanistic-interpretability framework that treats data as continuous functions, learns mappings between function spaces, and adds concept sparsity plus domain sparsity so researchers can see which concepts activate and where they appear over time.
Lynnette Ng / Quarbby argued agentic AI systems need social theory as a structural prior because behavior emerges from multi-agent interaction, not isolated optimization; her MASS framing treats agent networks as dynamical systems of information exchange, influence, and network structure, with priors for strategic heterogeneity, network-constrained dependence, co-evolution, and distributional instability.
Ihor Beaver demoed Loop Model 1 for robotics, claiming 20x more throughput per unit of data than Physical Intelligence's Pi0.6 + RLT on the zip-tie insertion task and framing it as the missing piece for MicroFactory deployments simple enough for users to run themselves.
Le Cong shared an AI engine that discovers, predicts, optimizes, and de novo generates IRES elements, RNA control modules that regulate translation initiation, moving from decoding 5' UTR regulatory grammar toward writing programmable RNA systems that control when, where, and how strongly proteins are produced inside cells.
Jie Tang argued the next breakthrough is likely in long-horizon agent tasks, like models hunting software bugs around the clock, because memory, continual learning, and self-judging are being cracked through engineering tricks; his endgame is self-evolving systems that write code, clean data, generate synthetic data, and turn one-person companies into none-person companies.
MFarajtabar shared Unmasking On-Policy Distillation, a paper on where distilling from a model's own on-policy reasoning trajectories helps, where it hurts, and why.
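
To make the Beyond Semantic Similarity argument above a little more tangible, here is a rough sketch of what grep-style corpus access gives an agent that a fixed top-k embedding lookup does not: exact matches with document names, line numbers, and surrounding context it can cite or re-query. The toy corpus and the grep_corpus helper are hypothetical and are not taken from the post.

```python
import re

def grep_corpus(corpus, pattern, context=1):
    """Search every document line by line, like `grep -n -i -C <context>`.

    corpus maps a document name to its text. Each hit records where the
    match occurred so the evidence stays recoverable.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for name, text in corpus.items():
        lines = text.splitlines()
        for i, line in enumerate(lines):
            if regex.search(line):
                lo = max(0, i - context)
                hi = min(len(lines), i + context + 1)
                hits.append({
                    "doc": name,
                    "line": i + 1,           # 1-based, as grep reports it
                    "match": line,
                    "context": lines[lo:hi],
                })
    return hits

# Hypothetical toy corpus, for illustration only.
corpus = {
    "notes_a.txt": "Setup steps.\nRotate the API token weekly.\nArchive old logs.",
    "notes_b.txt": "Meeting summary.\nNo action items.",
}

hits = grep_corpus(corpus, r"\btoken\b")
for hit in hits:
    print(hit["doc"], hit["line"], hit["match"])
```

An agent can loop on this: inspect the hits, tighten or broaden the pattern, and search again, instead of accepting whatever a single top-k query returned.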

That's a Wrap

That's 300+ source links and source-reference links from one very crowded day. If you made it to the bottom, congratulations: you are now qualified to explain the AI stack from orbital data centers to npm malware to why the mouse pointer has lore now. Please use this power only in meetings where someone says "quick question."

For the daily version, make sure you're subscribed to The Neuron. We send six issues a week, and yes, we read all of this so you don't have to.

See you tomorrow.

P.S. Know someone who'd find this useful? Forward this to them and tell them to subscribe here.
