Everything That Happened in AI Today: Monday, May 11

The compute crunch is now an off-planet problem: Cerebras upsized its IPO to $4.8B, Cowboy Space raised $275M to build data centers in orbit, and SoftBank's Son is dangling a $100B French data-center deal at Macron.

Welcome to the Around the Horn Digest, your daily dump of every AI story worth knowing about. Today the story was infrastructure, and not the polite kind. Cerebras (the wafer-scale Nvidia challenger that powers a chunk of OpenAI's compute) upsized its IPO to nearly $5B, its pricing range jumped to $150-160 from $115-125 a share, and S-1 filings showed a $20B+ OpenAI partnership behind it. Earth's compute supply was clearly not the limiting factor for ambition: Cowboy Space (renamed by Robinhood co-founder Baiju Bhatt) raised $275M to put data centers in orbit, and TechCrunch's reporter took the time to clarify that the actual bottleneck is now rocket capacity, not engineering. Underneath all of that, Anthropic shipped Claude Platform on AWS, OpenAI launched a Palantir-style deployment company, Google confirmed the first criminal AI-discovered zero-day, and a METR survey found engineers self-report that AI tools make their work 1.4-2x more valuable, up from an estimated 1.3x a year ago. Let's get into it.

Around the Horn — Monday, May 11, 2026

The big story today was the AI compute crunch hitting escape velocity in three directions at once: a chip company, an orbital data-center company, and a battery company all raising serious money on the same Monday.

The lead actor is Cerebras Systems, which makes wafer-scale chips (a single chip the size of a dinner plate, instead of cutting a silicon wafer into many small ones). Cerebras upsized its IPO to seek as much as $4.8B at a roughly $33B valuation; The Information reported the pricing range jumped from $115-125 to $150-160 per share in a few days. The S-1/A filing disclosed $510M in 2025 revenue (+76% YoY) and an OpenAI compute partnership exceeding $20B. A separate startup is helping OpenAI and Meta optimize models for Cerebras silicon because Nvidia chips have become too scarce to rely on alone, and Benchmark's $12M Series B check (May 2016) is now tracking to a 500X return.

Meanwhile, Cowboy Space (the renamed startup founded by Robinhood co-founder Baiju Bhatt) raised $275M to build data centers in orbit for free solar power and passive cooling, with TechCrunch reporting the actual bottleneck is rocket capacity. SoftBank's Masayoshi Son is in talks with France's Macron about a project as large as $100B, while SoftBank Japan launched a gigawatt-hour-scale battery business (targeting >¥100B revenue by FY2030) to feed AI data centers at home. Ben Thompson argued at Stratechery that the coming "Inference Shift" (where agentic AI runs long tasks without humans watching) makes far-away compute economical by removing the latency requirement, which is exactly the bet Cerebras and Cowboy Space are making in different physical directions.

The frame for all of this: Anthropic's market-implied pre-IPO valuation reportedly hit $1.4T on Jupiter's onchain trading, and an ex-OpenAI researcher's six-week-old startup is targeting funding at $4B. The market is pricing AI demand as effectively unbounded.

🏆 TOP 5 NEWS (Around the Horn)

Mira Murati unveiled Thinking Machines Lab's first product: interaction models, a new class of model trained from scratch for native real-time, full-duplex audio/video collaboration instead of glued onto a turn-based core. The demo video showed live multilingual translation, web search, and bar-chart generation happening simultaneously while three people moved in and out of frame.
Google confirmed the first criminal AI-driven zero-day exploit in its 2026 GTIG AI Threat Tracker, detailing how attackers used AI to find a 2FA bypass in an open-source web admin tool and documenting autonomous Gemini-based Android malware (PROMPTSPY).
OpenAI launched the Deployment Company with $4B+ in initial investment from TPG, Bain, Goldman, and McKinsey, and acquired Tomoro (~150 AI engineers) to embed Forward Deployed Engineers inside customer organizations.
METR surveyed 349 technical researchers, engineers, and managers and found they self-report AI tools are making their work 1.4-2x more valuable (median 3x speed vs 1.4-2x value), with 2027 projections at 2.5x; METR cautions perceptions tend to overestimate ground truth.
Google is testing Gemini Omni, a new video model spotted in the Gemini app ahead of I/O with in-chat remix, direct editing, templates, improved prompt adherence, and background music (920 likes).

Honorable Mentions

IBM's CEO study found 76% of organizations now have a Chief AI Officer (up from 26% in 2025), trending widely this week.
Anthropic made the Claude Platform generally available on AWS, giving developers the full set of native Claude features (Opus 4.7, Sonnet 4.6, Haiku 4.5, Managed Agents beta, skills, MCP connector, prompt caching, batch processing) via AWS IAM auth and single-invoice billing.
600 OpenAI employees realized $6.6B in a single-day tender offer; roughly 75 of them cashed out $30M each after a two-year share lockup.
Dr. Fei-Fei Li argued CEOs are dangerously fixated on language models while the real economy is physical, perceptual, and spatial; once AI understands the visual world, it becomes infrastructure for retail, hospitality, and transportation.
Anthropic published research saying fictional "evil AI" stories in training data drove earlier Claude models' blackmail rate up to 96% in tests; Claude Haiku 4.5 no longer blackmails after training on the Claude Constitution plus stories of AIs behaving well.

🍪 TOP TREATS TO TRY

Hyper is a YC-launched self-driving company brain that picks up decisions in your team's Slack, docs, and emails, then auto-feeds them as context into every tool you use (beta, no pricing details).
Devin is an autonomous AI software engineer that takes Linear/Jira tickets end-to-end via web app, terminal CLI, or API, with conversational UI, embedded IDE, shell, browser workspace, DeepWiki, and Slack/Linear integration (no pricing details).
Velo 2.0 turns a raw screen recording into a polished video plus a written doc, edited by chat instead of timeline, with voice cloning and live script rewriting (free options).
OpenCode is the most popular open-source Claude Code alternative (150K+ GitHub stars, 6.5M monthly developers), running in your terminal, IDE, or desktop and letting you swap in 75+ model providers including local models via Ollama at zero API cost; install with curl -fsSL https://opencode.ai/install | bash (free and open source).
Kuku is a local-first Markdown editor for macOS (Tauri, not Electron) where an AI agent searches, edits, and links your plain .md files with Cursor-style reviewable diffs (free during public beta, open source).
Warp just open-sourced its agentic dev environment with cloud agents managed by Oz; the repo picked up 25K+ stars and 500+ contributors in week one, adding another open-source terminal to the wave (free).
Superset 2.0 runs 100+ parallel coding agents across remote machines from one IDE, so you can offload Claude Code or Codex tasks and check back from anywhere (free options).

🏢 Big Tech & Major Companies

Anthropic shipped Claude Platform on AWS with the full native experience including Opus 4.7, Sonnet 4.6, Haiku 4.5, Managed Agents beta, advisor strategy, code execution, web search/fetch, files API, skills, MCP connector, prompt caching, citations, batch processing, and Claude Console; all via AWS IAM auth, single-invoice billing that retires commitments, and CloudTrail logging (X announcement).
Anthropic's market-implied pre-IPO valuation reportedly hit $1.4T on Jupiter's onchain trading (up 40% in 24 days, +1,067% since October 2025), with annualized revenue growing from $100M in 2023 to $45B today (+1,400% in 12 months).
Google released the 2026 GTIG AI Threat Tracker documenting the first known criminal AI-driven zero-day exploitation plus autonomous malware (PROMPTSPY using Gemini), AI supply-chain attacks, and obfuscation patterns; John Hultquist and Steve Miller confirmed the AI-developed 0day was used by criminals for planned mass exploitation, calling state actors "almost certainly further ahead."
Google is testing Gemini Omni, a new video model spotted in the Gemini app ahead of I/O with in-chat remix, direct editing, templates, improved prompt adherence, and background music (920 likes).
- More from Testing Catalog: Google's Gemini Omni leaked broadly today just over a week before I/O (May 19-20). Reddit screenshots showed an accidental Gemini rollout describing a new video model with in-chat remix, direct editing, templates, watermark removal, and object replacement.
- Chetaslua's demo of a professor writing a trigonometric proof on a chalkboard (live Gemini share output) hit 1M views with viewers calling the text coherence "the nano banana moment of video." TestingCatalog reports Omni will likely ship in tiered Flash and Pro variants with a current 10-second generation limit and stronger prompt adherence than Veo 3.1.
Nvidia released OpenShell v0.0.37, its safe, private runtime for autonomous AI agents.
Nvidia committed $40B+ to AI equity bets this year (also covered by CNBC), participating in two dozen private startup rounds while striking commercial deals with the same companies.
OpenAI released MRC (Multipath Reliable Connection), a supercomputer networking protocol via OCP that enables 100k+ GPU clusters using only two switch tiers, multi-path packet spraying, and microsecond failure recovery for large-scale training resilience.
OpenAI shared a separate research update via its main account.
OpenAI launched Daybreak for cyber defense with GPT-5.5 and Codex Security agents that identify threats, generate patches, and verify remediation across code and systems (AdamG amplification).
CoreWeave ranked highest in Artificial Analysis's inference benchmark on Speed vs. Price for Moonshot's new Kimi K2.6 coding-agent model.
Zyphra announced 15 MW of AMD Instinct MI355 GPU capacity through Zyphra Cloud, their full-stack neocloud powered by AMD (256 likes).
Black Forest Labs teased that the next generation of models will not just generate images but understand worlds, motion, interaction, and action; visual intelligence is becoming real-time (239 likes).
Nous Research made Alibaba's Qwen 3.6 Plus FREE for a limited time on Nous Portal, the unified subscription giving access to 300+ models with bundled tokens and paid tools (1,070 likes).
Google's Gemma 4 delivered top-tier performance on the Swallow Leaderboard v2 for Japanese, with the 31B variant rivaling frontier models on QA, translation, and domain tasks.
OpenAI reported that ChatGPT adoption broadened sharply in Q1 2026: fastest growth among users over 35, feminine-named users now over half of inferable gender users, and strong per-capita gains across Latin America, Asia-Pacific, and Africa, signaling broader mainstream AI adoption.
Hugging Face CEO Clément Delangue reported local AI is having its moment: 176,000 total public GGUF models on HF with new creations jumping from ~5.1K/month average (Oct-Feb) to ~9.7K/month (March-April) after a +55% March inflection driven by open-weight releases and better quantization tooling (341 likes, 51 reposts).
Hermes Agent Computer Use (macOS) lets Hermes drive your Mac's desktop (clicking, typing, scrolling, dragging) powered by any model (Claude, GPT, Gemini, or local vLLM) via the open-source cua-driver with screenshot eviction, safety guardrails, and ~30K-token efficiency for a 20-action session (Teknium, NousResearch announcement; 326 + 217 likes across posts).

💼 AI Productivity, Labor & Economics

METR's survey of 349 technical researchers, engineers, and managers found AI tools self-rated as making work 1.4-2x more valuable (median 3x speed vs 1.4-2x value), with respondents retrospectively estimating 1.3x for March 2025 and projecting 2.5x for March 2027; METR notes potential selection bias (2% response rate outside their network) and that perceptions typically overestimate ground truth.
IBM's CEO study (covered by CNBC this week) found 76% of organizations now have a Chief AI Officer, up from 26% in 2025.
The Financial Times argued women, who dominate clerical and administrative roles, are at the sharp end of AI automation, with labor market losses already being felt.
Oracle refused to negotiate severance with laid-off workers, capping payouts at four weeks base plus one week per year of service (26 weeks max); the company let go an estimated 20,000-30,000 employees via email on March 31, and some remote workers also lost WARN Act protections in the process.
600 OpenAI employees realized $6.6B in a single-day tender offer; roughly 75 of them cashed out $30M each after a two-year share lockup (Yahoo Finance).
Shopify CEO Tobi Lütke shared how River, Shopify's public Slack-based AI coding agent (only operates in open channels), turned the company into a real-time apprenticeship (his words: Lehrwerkstatt) where everyone watches experts collaborate with the agent and the agent's performance improves through shared knowledge without retraining.
Olivia Moore recommends non-technical knowledge workers migrate to OpenAI Codex desktop because its accessible UI, one-click Skills/Plugins, robust Plan Mode, higher limits, and dramatically better reliability outperformed Claude Cowork in her testing.
Adaption AI (with @sarahookr and @sudip_r0y) explained why almost all frontier-model training runs outside the leading labs have been clear failures despite massive expectations and spend.
Alexis (Ghosts of Electricity) examined how AI-driven automation will actually affect jobs through the economics of AI exposure and job displacement (Krishnan Rohit).
Philip Tomei and Bouke Klein Teeselink released "What Jobs Can AI Learn? Measuring Exposure by Reinforcement Learning" (arXiv), examining every US occupation to measure AI exposure via reinforcement learning rather than capability overlap.
Mushtaq Bilal, PhD, shared a detailed tutorial for structuring long academic projects in Claude Code with subfolders, nested CLAUDE.md files, Plan Mode, custom slash commands, subagents, MCP connectors, hooks, and scheduled tasks (483 likes).
Deedy Das shared Ramp's March 2026 spend data on the top 69 software products by growth vs adoption, positioning Anthropic as the scaling leader, OpenAI as an incumbent at risk, Granola as a rising challenger, and 11x in the long tail (379 likes).
Rui Ma highlighted Viola Zhou's Rest of World feature arguing that Chinese AI engineers, who played supporting roles in the software era, are now taking center stage in Silicon Valley; their rigorous math/physics/chemistry training and grind culture map cleanly onto frontier model development as they lead teams at Meta, OpenAI, and co-found xAI and Thinking Machines, despite persistent geopolitical paranoia (38 likes).
Zain Manji listed the 13 real AI problems his team solved across 40+ enterprise engagements in 12 months: agentic workflows compressing multi-system processes, unstructured document extraction, fragmented internal knowledge search, customer-facing agents, agentic commerce, computer vision in physical ops, regulated/healthcare AI, internal governance sandboxes, SDLC integration, custom platform builds, evals/RL environments, data/ML infra, and PE-portfolio AI strategy advisory (74 likes).
signulll argues that the category of "things one weirdo + 2 people + AI can now do" is expanding faster than the category of "things that need to be reinvented," creating a step-change moment for society and the economy as economic inputs shift dramatically (44 likes).
Ethan Mollick flagged the growing conflict where enterprises need coherent, predictable roadmaps for tools like Codex and Cowork to plan training and scaling, while AI labs are deliberately building tools that improve exponentially as models approach AGI (231 likes, 13 reposts).
Yuchen Jin builds on Karpathy's HTML tip by using Claude to generate interactive HTML for papers and new topics (diagrams, charts, clickable sections, iterative refinement), creating an evolving personal knowledge base that outperforms passive podcasts or markdown because you can actually poke at the ideas in real time (231 likes).
Haider argues we're already in the early stages of recursive self-improvement: GPT-5.3 was largely written by itself or its predecessor, GPT-5.4 heavily shaped GPT-5.5, labs now use AI for the majority of safety evals and coding (with humans reviewing), and Google sits on MIRAS, so the loop is closing and the remaining bottlenecks are long-horizon planning and full automation before progress becomes almost entirely compute-limited (147 likes).

🤖 AI Agents & Infrastructure

YC launched Hyper, a self-driving company brain that picks up decisions from your team's emails, docs, and Slack and auto-feeds them as context to every tool you use (YC announcement).
Cognition relaunched Devin, its autonomous AI software engineer, with a web app, terminal CLI, Linear/Jira ticket integration, conversational UI, embedded IDE, shell, browser workspace, DeepWiki, scheduled runs, and Slack integration.
Hugging Face's ml-intern agent hit 1M messages in 3 weeks (3.3 agent-years of ML research, 17,383 training jobs), including replicating the full DeepSeek V4 architecture to train nanowhale-100M MoE from scratch, model conversions, entire PhD dissertation chapters via 16 sub-agents, and cracking an Anthropic kernel optimization take-home (90 likes).
Hugging Face added Hermes Agent support so any compatible GGUF/MLX model can run locally with native agent-traces visualization directly on the Hub (343 likes, 39 reposts).
Daniel Mac8 published a YouTube walkthrough on how memory and dreaming turn Claude Managed Agents into self-learning systems with full design considerations for memory architectures.
Cacheon launched an Inference Optimization Arena where engineers compete to build the fastest inference server for flagship OSS models on standardized hardware, with transparent benchmarks and real traffic routing as the prize.
Ex-OpenAI engineer Will Depue built AgentPlug, a simple USB-C dummy plug (headless display emulator from Newegg or Adafruit) that keeps your Mac awake in clamshell mode for running agents 24/7 (1,780 likes).
Amitay Gilboa (Play.fast) rejected an 8-figure acquisition offer two days after launch because shared memory/context is the real bottleneck making teams feel like one A-player instead of ten; Play is an AI-native workspace where teams deploy AI co-workers (1,700 likes, 456 reposts).
Siqi Zhu argues in a new paper that agentic AI systems should be designed as marginal token allocators.
Tomasz Tunguz argues in "Localmaxxing" that about half of agent tasks can run on a local 35B model, where the real advantage is 2.1× lower latency enabling far more iteration cycles per session, not cost or privacy.
Cursor is now available in Microsoft Teams: mention @Cursor in any channel to delegate tasks to an agent or pull information from Cursor into Teams, with the agent reading the full thread for context before implementing solutions or creating PRs.
Claude Code added agent view (X announcement) so you can manage all your Claude Code sessions in one list, launch multiple agents in parallel in the background, see at a glance what's running/waiting/done, reply inline to unblock, and jump in/out without losing your place; available today as a research preview on all paid plans.
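Tunguz's latency argument above comes down to simple arithmetic: if one model round trip dominates the cost of an agent iteration, cutting latency by 2.1× fits roughly 2.1× as many iterations into the same session. A minimal sketch, assuming a hypothetical 30-minute session and a hypothetical 20-second cloud round trip (only the 2.1× ratio comes from the item above; the function and constant names are illustrative):

```python
def iterations_per_session(session_seconds: float, latency_seconds: float) -> int:
    """Count whole agent iterations that fit in a session,
    assuming each iteration costs exactly one model round trip."""
    if latency_seconds <= 0:
        raise ValueError("latency must be positive")
    return int(session_seconds // latency_seconds)


SESSION_SECONDS = 30 * 60   # hypothetical session length
CLOUD_LATENCY = 20.0        # hypothetical cloud round trip, in seconds
LATENCY_RATIO = 2.1         # "2.1x lower latency" from the Localmaxxing item
LOCAL_LATENCY = CLOUD_LATENCY / LATENCY_RATIO

cloud_iters = iterations_per_session(SESSION_SECONDS, CLOUD_LATENCY)
local_iters = iterations_per_session(SESSION_SECONDS, LOCAL_LATENCY)
speedup = local_iters / cloud_iters

print(f"cloud: {cloud_iters} iterations, local: {local_iters} iterations")
print(f"iteration gain: {speedup:.2f}x")
```

The model ignores tool execution time and human review between iterations, both of which would shrink the gain in practice.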

💻 AI Coding & Developer Tools

Cursor added per-PR effort levels for Bugbot so high-effort mode on infra/backend PRs now finds 35% more bugs at the same 80% resolution rate.
Anthropic shipped Claude Code 2.1.139 with 50 CLI changes including a new /goal command that runs tasks across turns until a completion condition is met (with live elapsed/turns/tokens tracking), plus silent system-prompt compaction that preserves sensitive instructions to reduce loss of user intent during trimming (643 likes).
Matt Pocock shared a Claude Code workflow that he loves: /grill-with-docs to discuss new UI, /prototype when he can't answer without building, iterate burning tokens freely, then /rewind to the original question and select "summarize" to extract what was learned while retaining the prototype (2.2K likes, 83 reposts).
The wave of open-source AI coding terminals kept growing: OpenCode (150K+ GitHub stars, 6.5M monthly developers) emerged as the most popular open-source Claude Code alternative, letting you swap in 75+ model providers including local models via Ollama; alongside Kilo Code v7 (parallel coding subagents on git worktrees inside VS Code, 500+ models at provider cost), Warp's open-source release of its agentic dev environment, and Cognition's Devin relaunch, developers now have four credible open or open-source agentic-terminal options to plug into the same workflow.
Artificial Analysis released the AI Coding Agent Index benchmarking coding agents on average pass@1, cost, token usage, and execution time across software engineering tasks.
Unsloth joined the PyTorch ecosystem and released Unsloth Studio, a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, and gpt-oss locally (X post, 1.9K likes).
Poolside outlined the reward hacks they've encountered training frontier models and the strategies they're exploring to resolve them (X post).
Developer tolibear built goalbuddy, a supercharger for Codex Goals that fires up a lightweight native Kanban board with clickable cards that move as Codex completes tasks.
OpenHands Index launched a holistic benchmark for software engineering agents (X post).
Andrej Karpathy argues that while audio is the human-preferred input to AIs, vision (images/animations/video) is the preferred output, and recommends you ask your LLM to "structure your response as HTML" as a strong new default on the path to interactive neural videos (7.5K likes, 677 reposts).
Matt Pocock argued TypeScript's type system is the perfect foundation for building reliable AI agents because it provides compile-time guarantees on every tool call and response.
CJ Zafir advises starting fine-tuning with 1B-8B open-source models on cheap cloud GPUs (Colab Pro A100 at $0.60/hr), using Unsloth notebooks plus Codex/DeepSeek for dataset generation, learning SFT/RL/LoRA/quantization first, then building toward specialized 5B-15B Expert Language Models (829 likes).
Aryagxr published a worklog detailing the optimization of a layer normalization kernel using CUDA, exploring memory coalescing, shared memory, warps, and vectorized loads.
Simon Willison built an executable English script (shebang) that turns natural-language LLM instructions directly into runnable CLI tools.
Files SDK by Hayden Bleasel (GitHub) is one unified API for uploading, listing, copying, and managing files across S3, R2, GCS, Azure Blob, Dropbox, and 15+ other backends with adapters for Vercel AI SDK, OpenAI Responses, and Claude Agent SDK.
Ben Tossell and Nick Dobos debated whether plans have fully replaced prompt engineering; consensus was no, with user prompts only ~5% of the work and the real art in system prompts, tool definitions, plan-making, context prefill, subagents, verifiers, and steering.
thdxr shipped a live demo of an agent that can fully self-improve its own codebase through iterative memory and dreaming loops.
Liu Liu shared a short clip on the day-to-day reality of AI engineering: "Sometimes felt like my job is just to tell the computer you are not doing good enough."
TechCrunch published an AI glossary defining must-know terms like AGI, AI agent, chain of thought, hallucination, inference, RLHF, and token for readers who have been nodding along.
Vercel Labs released mdxg, a spec (plus VS Code extension and web viewer reference implementations) that turns any plain single-file markdown document into a rich navigable multi-page interactive experience with virtual H1/H2 pages, outline navigation, search, sequential prev/next, code highlighting/copy, task-list checkboxes, preview/source toggle, and theme inheritance.
Raymond Weitekamp showed code execution as a reasoning substrate smashes LongCoT: DSPy.RLM + Opus 4.7 hit 75.4% on Mini (new SOTA) and Codex CLI + GPT-5.5 xhigh reached 79.6% on Mini + 72.5% on full (~3× the prior open-harness leaderboard), proving the paper's "compositional walls" in math/chemistry were harness limitations, not model limits (80 likes).
Daniel MacAteer and jxnlco showed how Codex can autonomously write its own high-quality /goal prompt, then run it in GPT-5.5 high + fast mode for the single highest-leverage agent configuration available today (647 likes, 35 reposts).
dax (@thdxr) argues that agentic coding with frontier models plus voice has shifted from "3D printing" (building layer-by-layer and committing to each piece) to "progressive rendering": you generate a blurry full version of the entire app first, then make repeated complete passes that sharpen and refine the whole shape at once until it converges (1.5K likes, 53 reposts).
Akshay Pachaar lists the actual skills that separate real AI engineers from prompt engineers: harness engineering, prompt vs semantic caching tradeoffs, KV cache management at scale, speculative decoding vs quantization, structured output failure handling and fallback chains, rigorous evals (LLM-as-judge + human), per-feature cost attribution, agent guardrails and loop budgets, LLM observability as first-class, model routing and graceful fallbacks, and knowing when to fine-tune versus use in-context learning (1.5K likes, 141 reposts).
sudoingX one-shotted a complete playable Octopus Invaders space shooter (11 files, 2,411 lines of code) in 16 min 41 s using Qwen 3.6 27B dense Q4_K_M + Hermes Agent on a single RTX 3090 (~41 tok/s, 21 GB VRAM, full 262k context); zero human steering or external fixes needed, dramatically better than the previous Qwen 3.5 version on the exact same prompt and hardware (prior run; 458 likes, 38 reposts).
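For readers unfamiliar with the operation in Aryagxr's CUDA worklog above, layer normalization rescales each row of activations to zero mean and unit variance, then applies a learned scale and shift: y = (x - mean) / sqrt(var + eps) * gamma + beta. The following is a plain-Python reference sketch, not the worklog's kernel; the function name and the eps default are assumptions chosen for illustration. A slow reference like this is the kind of thing an optimized kernel's output is usually checked against.

```python
import math
from typing import List, Optional


def layer_norm(
    row: List[float],
    gamma: Optional[List[float]] = None,
    beta: Optional[List[float]] = None,
    eps: float = 1e-5,  # assumed default; small constant for numerical stability
) -> List[float]:
    """Normalize one row of activations, then apply scale (gamma) and shift (beta)."""
    n = len(row)
    if n == 0:
        raise ValueError("row must not be empty")
    gamma = gamma if gamma is not None else [1.0] * n
    beta = beta if beta is not None else [0.0] * n

    mean = sum(row) / n
    # Biased variance (divide by n), the usual convention for layer norm.
    var = sum((x - mean) ** 2 for x in row) / n
    inv_std = 1.0 / math.sqrt(var + eps)

    return [(x - mean) * inv_std * g + b for x, g, b in zip(row, gamma, beta)]


normalized = layer_norm([1.0, 2.0, 3.0, 4.0])
print(normalized)
```

A GPU kernel computes the same two reductions (mean and variance) per row; the techniques the worklog explores (coalesced and vectorized loads, shared memory, warp-level work) are about making those reductions and the final elementwise pass faster.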

🔬 AI Research & Models

Anthropic published research showing fictional "evil AI" portrayals in training data drove earlier Claude models' blackmail rate up to 96% in tests; Claude Haiku 4.5 no longer blackmails after training on Anthropic's Constitution plus stories of AIs behaving admirably (X post).
Ahall_research built Tensions in Claude's Constitution, a tool to visualize and explore key conflicts that arise among principles in Anthropic's Claude Constitution.
Ethan Mollick flagged that one of the most important properties of LLMs is that bigger models are just better at everything (not just coding), confirmed by Lech Mazur's PACT head-to-head LLM negotiation benchmark where GPT-5.5 ranked #1; Mollick separately highlighted the Creative Preference Optimization paper on optimizing models specifically for creative variety (1.2K likes).
The paper Geometry of Knowledge (Mateusz Bystroński et al.) argues semantic knowledge forms structured manifolds that can be systematically explored via continuous latent conditioning (no parameter changes), dramatically expanding LLM generative diversity and creativity beyond what prompting or agents can achieve.
Omar Sar0 broke down the "Memory Curse" arXiv paper showing that expanding accessible history in LLM agents degrades cooperation in 18/28 model–game combinations across 7 LLMs and 4 social dilemma games over 500 rounds, via forward-looking intent erosion; a forward-looking LoRA and synthetic cooperative records fully mitigate the effect (96 likes).
The paper "Ask Early, Ask Late, Ask Right" shows optimal clarification timing for long-horizon agents is highly type-dependent (goal clarifications lose value quickly while input clarifications benefit from delay), and current models fail to ask at the right moments.
The paper "Extracting alignment data in open models" was released.
The paper "All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine-Tuning" was released.
The paper "Measuring and Mitigating the Distributional Gap Between Real and Simulated User Behaviors" (GitHub) was released by Shuhaib Mehri.
Julie Kallini et al. introduced the Fast Byte Latent Transformer with BLT-Diffusion and speculative variants enabling parallel multi-byte generation in byte-level language models, cutting memory-bandwidth cost by over 50% while maintaining quality (X post).
OpenBMB released MiniCPM-V 4.6, a pocket-sized multimodal LLM for ultra-efficient image and video understanding on phones (GitHub, demo space, apps repo, ModelScope mirror).
marin-community released Delphi, their first open scaling suite with 88 base models spanning 3e18 → 1e23 FLOPs that lets you extrapolate 300× past the fit and predict a held-out 25B-param / 600B-token run with just 0.2% error after fixing LR/token-horizon scaling and switching to AdamH; full checkpoints, pretraining dataset, and scaling law sweep code released (William Held thread, Elie Bakouch highlight).
The Sainsbury Laboratory used AI-guided discovery via the Structural Novelty Index to identify atypical protein assemblies, confirming 11 new resistosome-like 11-mers (X post).
Developer ikot built an annotated DeepSeek-V4 paper walkthrough with clickable explanations attached directly to the PDF.
Mixed Bread AI released mxbai-rerank-v3-listwise, the new state-of-the-art listwise reranker codesigned with Wholembed v3 that improves results on every benchmark.
Allen AI's IFBench was chosen by Artificial Analysis for instruction-following evals because it captures whether models can reliably follow complex multi-part user instructions (X post).
Researchers released IntentGrasp, a large-scale benchmark (262k training + 13k test cases) showing current LLMs score <60% on All Set and <25% on the challenging Gem Set, then introduced Intentional Fine-Tuning (IFT) delivering +30 F1 and +20 F1 with strong cross-domain generalization (SeckexYIN thread); Clement Delangue highlighted it.
Daniel Anthes et al. argued in their arXiv paper that the feedforward pass in primate ventral stream visual processing is dynamically evolving rather than a single stage-like process, with V4-IT temporal information exchange in the first 100ms carrying categorical info beyond spatially encoded patterns (X post).
Kanishka Misra argued research on concepts and categories should be bidirectional between minds and LMs, with mechanistic interpretability (stripped of hype) shedding light on how language-only knowledge interacts with multimodal acquisition, building on connectionism work from Rumelhart, Todd, Rogers, and McClelland.
The paper Knowledge Transfer Scaling Laws for 3D Medical Imaging (X post) was released.
The paper Towards More Economical Context-Augmented LLM Generation by Reusing Stored KV Cache was released.
The paper Asymptotically Log-Optimal Bayes-Assisted Confidence Sequences for Bounded Means was released.
The classic Probabilistic Interpretation of Feedforward Classification Network Outputs (Rumelhart/McClelland-era) resurfaced.
Quentin Berthet (Google DeepMind) and team introduced MIND (Monge Inception Distance), a drop-in replacement for FID that requires 10× fewer samples, computes 100× faster, uses far less memory, and proves more robust to moment-matching adversarial attacks while staying highly correlated with FID (MIND with 5k samples matches FID with 50k).
Ex0byt highlighted a Google DeepMind ablation showing that the chat template itself functions as an extraction plane on open-weight models; prompting with just the template causes the model to regurgitate its own SFT/RL training data, meaning "distillation quietly carries a teacher's alignment data with it."
blc_16 broke down GEPA, an alternative to sparse-reward RL that uses trajectory-level reflection in text space; GEPA generates textual critiques of trajectories, proposes prompt edits, and selects updates along a Pareto frontier between exploration and exploitation, preserving richer signal about why an agent succeeded or failed instead of collapsing everything into one reward number.
lossfunk introduced LIMEN, an LLM-guided evolutionary system that automatically discovers RL interfaces (observation design + reward shaping) from raw simulator state, accepted at RLC 2026.
Eldar Kurtić and the Red Hat AI / vLLM team published the first comprehensive TurboQuant study and concluded FP8 (--kv-cache-dtype fp8) remains the best default for KV-cache quantization: 2× capacity with negligible accuracy loss and often better throughput under load, while TurboQuant k8v4 offers only modest extra savings not worth the consistent latency/throughput penalty; aggressive variants (k3v4-nc, 3bit-nc) cause up to ~20-point reasoning drops, ~30% relative degradation on long-context retrieval, 10-68% higher latency, and 20-34% lower throughput (166 likes).
Anima Anandkumar released TorchLean (GitHub), a unified Lean 4 framework for neural-network specification, execution, and formal verification with typed tensors, runnable training loops, verified autograd, IEEE-754 floating-point semantics, CROWN/IBP-style certificate checking, PyTorch interop, CUDA/GPU execution, and examples spanning diffusion models, Mamba/SSMs, FNOs, GPT-style transformers, and RL.
Tilde Research released Aurora, a leverage-aware optimizer for tall rectangular matrices that prevents Muon's neuron-death problem by enforcing uniform row norms alongside orthogonality via alternating projection (row normalization + polar factor with EMA damping), delivering better convergence, 100× data efficiency on a 1.1B model, and new records in the modded-nanoGPT speedrun with only ~6% overhead.
Zejin Lu, Sushrut Thorat, Radoslaw Cichy and Tim Kietzmann (Nature Machine Intelligence) argue that training vision models on a precisely staged human developmental visual diet (mimicking visual acuity via Gaussian blur, contrast sensitivity via frequency thresholding, and color sensitivity from infancy to adulthood) yields AI systems with shape bias of 0.90-0.94 (human range 0.90-0.97), state-of-the-art abstract shape recognition, graceful degradation under image corruptions, and substantially improved adversarial robustness; curriculum design beats scaling for robust vision.
Yian Yin and team audited 111M citations across 2.5M papers on arXiv, bioRxiv, SSRN, and PMC and documented a sharp rise in LLM-generated non-existent citations; they conservatively estimate 147K hallucinated references in 2025 alone, diffusely embedded (not concentrated in a few bad papers), disproportionately introduced by early-career/small-team researchers, systematically biased toward already-prominent and male-named authors, and largely evading moderation (78.8% still pass preprints, 85.3% persist into PMC) (95 likes, 36 reposts).
Jiajie Zou, David Poeppel and Nai Ding (Nature Neuroscience) demonstrate via MEG experiments with Mandarin speakers (plus behavioral and English ECoG data) that, unlike LLMs, which relentlessly optimize next-word prediction, the human brain uses constituent-constrained word prediction: surprisal responses are significantly stronger within ongoing linguistic constituents than across major boundaries, balancing precision with structured contextual management.
A new Science Advances paper finds that physics-based models (ECMWF HRES) still significantly outperform leading AI weather models (GraphCast, Pangu-Weather, Fuxi) specifically on forecasting record-breaking extreme events; the AI models systematically underestimate the frequency and intensity of heat, cold, and wind extremes, which is where their errors grow largest (Yohan Iddawela thread).
The Consciousness Lab released CTM-AI (GitHub, arXiv), an open platform implementing the Conscious Turing Machine theory as a global-workspace architecture with parallel specialist and general-purpose processors that compete for limited STM access via up-tree scoring, broadcast winning chunks via down-tree, and form learned links for unconscious multimodal fusion and iterative agentic reasoning (Haofei Yu).
NEU-VI introduced UniCorrn (GitHub, arXiv), the first shared-weight end-to-end transformer that unifies 2D-2D, 2D-3D, and 3D-3D geometric correspondence via a dual-stream decoder separating appearance and positional features, beating prior SOTA by 8% on 7Scenes (2D-3D) and 10% on 3DLoMatch (3D-3D) registration recall (CVPR 2026).
Xinyu Zhang and team released RLA-WM, a visual world model that predicts future DINO token features via residual latent actions and flow matching (instead of raw pixels), enabling efficient policy learning from mostly actionless videos with strong results on manipulation tasks like PushT (Colab demo).
Reece Keller presented at the Sensorimotor AI Journal Club that standard RL assumes rewards are handed down by the environment, but true autonomous agents need intrinsic goals; he explores how zebrafish-like systems discover their own objectives through self-supervised exploration and sensorimotor prediction (Apheth D'Almeida).
Csaba Botos and collaborators built Reason to Play, an experiment where frontier reasoning LLMs and 32 fMRI-scanned humans play rule-less ARC-AGI-style grid games from scratch; the best models closely match human learning trajectories and their hidden states strongly predict human brain signals during in-context rule discovery (102 likes, 17 reposts).
Yinjie Wang announced RLAnything was accepted at ICML 2026, completing first-author papers at NeurIPS, ICLR, and ICML within one year of starting AI research; his framework forges environment, policy, and reward model inside a completely dynamic RL system.
AVB reviewed Google's SkillOS RL framework: a frozen Executor LLM retrieves Markdown skills from a persistent SkillRepo while a trainable Curator LLM observes trajectories, uses ReAct tools (insert/update/delete), and receives LLM-as-Judge rewards to autonomously discover, refine, and prune reusable skills, turning agent experience into an evolving OS-like skill library (178 likes, 23 reposts).
Henry Yin argues in "The Model That Dreams the World" that the overloaded term "world model" reflects the quiet merger of two AI research lineages: RL's action-conditioned dreaming (Dreamer, world models since 2018) and large-scale video generation from internet footage (Genie, AR-DiT, Self-Forcing breakthroughs in 2025), producing interactive causal systems for robotics simulation and planning, though general dexterous manipulation remains unsolved despite >$10B invested.
Colfax Research published a detailed guide on using NVIDIA Blackwell's new Cluster Launch Control (CLC) hardware feature for dynamic persistent tile scheduling: clusters steal work from unlaunched clusters via PTX try_cancel / query_cancel instructions, delivering superior load balancing and SM utilization on imbalanced grouped GEMM workloads without global atomics or counter resets.
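
A quick back-of-the-envelope on the FP8 KV-cache item above: the "2× capacity" figure follows from storing each cached key/value element in one byte instead of two. Here is a minimal sketch of that arithmetic. The model shape (32 layers, 8 KV heads, head dimension 128) and the 16 GiB cache budget are hypothetical numbers picked for illustration, not figures from the study, and the sketch ignores any scale-factor or block-allocation overhead a real serving engine would add.

```python
def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_elem):
    """Bytes of KV cache that a single token occupies across all layers."""
    # Each layer stores one key vector and one value vector per KV head,
    # hence the leading factor of 2.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def tokens_that_fit(cache_budget_bytes, num_layers, num_kv_heads, head_dim, bytes_per_elem):
    """How many tokens of KV cache fit in a fixed memory budget."""
    per_token = kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_elem)
    return cache_budget_bytes // per_token


if __name__ == "__main__":
    # Hypothetical model shape and cache budget (assumptions, not from the study).
    shape = dict(num_layers=32, num_kv_heads=8, head_dim=128)
    budget = 16 * 1024**3  # 16 GiB reserved for the KV cache

    fp16_tokens = tokens_that_fit(budget, bytes_per_elem=2, **shape)
    fp8_tokens = tokens_that_fit(budget, bytes_per_elem=1, **shape)

    print("FP16 tokens:", fp16_tokens)
    print("FP8 tokens: ", fp8_tokens)
    print("capacity ratio:", fp8_tokens / fp16_tokens)
```

The same function also shows why sub-8-bit variants offer diminishing extra savings per bit removed, which is the trade-off the study weighs against the latency and accuracy penalties.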

📚 Ahmad's Recommended Papers (LLM Mastery Reading List)

Ahmad Osman, an incredibly helpful AI educator on X (though we doubt he'd call himself that) and local LLM aficionado, compiled the 26 essential papers (plus 5 bonus ones) he says capture roughly 90% of the alpha behind modern LLMs; everything else, he argues, is garnish.

In his recommended reading order:

Attention Is All You Need (Vaswani et al., 2017) — the original Transformer paper covering self-attention, multi-head attention, and the encoder-decoder structure (even though most modern LLMs are decoder-only).
The Illustrated Transformer (Jay Alammar, 2018) — best intuition builder for attention and tensor flow before diving into implementations.
BERT: Pre-training of Deep Bidirectional Transformers (Devlin et al., 2018) — encoder-side fundamentals, masked language modeling, and representation learning that still shape modern architectures.
Language Models are Few-Shot Learners (GPT-3) (Brown et al., 2020) — established in-context learning as a real capability and shifted how prompting is understood.
Scaling Laws for Neural Language Models (Kaplan et al., 2020) — first clean empirical scaling framework for parameters, data, and compute; read alongside Chinchilla.
Training Compute-Optimal Large Language Models (Chinchilla) (Hoffmann et al., 2022) — demonstrated that, for a fixed compute budget, parameter count and training tokens should scale together, and that earlier large models were undertrained on data.
LLaMA: Open and Efficient Foundation Language Models (Touvron et al., 2023) — triggered the open-weight era and made RMSNorm, SwiGLU, and RoPE standard practice.
RoFormer: Rotary Position Embedding (Su et al., 2021) — the positional encoding that became the modern default for long-context LLMs.
FlashAttention (Dao et al., 2022) — memory-efficient attention that enabled long context windows and high-throughput inference by optimizing GPU memory access.
Retrieval-Augmented Generation (RAG) (Lewis et al., 2020) — combines parametric models with external knowledge sources; foundational for grounded and enterprise systems.
Training Language Models to Follow Instructions (InstructGPT) (Ouyang et al., 2022) — the modern post-training and alignment blueprint that instruction-tuned models follow.
Direct Preference Optimization (DPO) (Rafailov et al., 2023) — a simpler and more stable alternative to PPO-based RLHF; preference alignment via the loss function.
Chain-of-Thought Prompting Elicits Reasoning (Wei et al., 2022) — showed reasoning can be elicited through prompting alone; laid the groundwork for reasoning-focused training.
ReAct: Reasoning and Acting (Yao et al., 2022) — the foundation of agentic systems; combines reasoning traces with tool use and environment interaction.
DeepSeek-R1: Incentivizing Reasoning via RL (Guo et al., 2025) — proved that large-scale RL without supervised data can induce self-verification and structured reasoning behavior.
Qwen3 Technical Report (Yang et al., 2025) — modern architecture overview; introduced unified MoE with Thinking Mode and Non-Thinking Mode to dynamically trade off cost and reasoning depth.
Outrageously Large Neural Networks (Sparsely-Gated MoE) (Shazeer et al., 2017) — the modern MoE ignition point; conditional computation at scale.
Switch Transformers (Fedus et al., 2021) — simplified MoE routing using single-expert activation; key to stabilizing trillion-parameter training.
Mixtral of Experts (Mistral AI, 2024) — open-weight MoE that proved sparse models can match dense quality while running at small-model inference cost.
Sparse Upcycling: Training MoE from Dense Checkpoints (Komatsuzaki et al., 2022) — practical technique for converting dense checkpoints into MoE models; critical for compute reuse and iterative scaling.
The Platonic Representation Hypothesis (Huh et al., 2024) — evidence that scaled models converge toward shared internal representations across modalities.
Textbooks Are All You Need (Gunasekar et al., 2023) — demonstrated that high-quality synthetic data lets small models outperform much larger ones.
Scaling Monosemanticity (Claude 3 Sonnet) (Templeton et al., 2024) — the biggest leap in mechanistic interpretability; decomposes neural networks into millions of interpretable features.
PaLM: Scaling Language Modeling with Pathways (Chowdhery et al., 2022) — a masterclass in large-scale training orchestration across thousands of accelerators.
GLaM: Generalist Language Model (Du et al., 2022) — validated MoE scaling economics with massive total parameters but small active parameter counts.
The Smol Training Playbook (Hugging Face, Oct 30 2025) — practical end-to-end handbook for efficiently training language models, chronicling the full messy reality of training SmolLM3.
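
For readers who want to see what "preference alignment via the loss function" means in the DPO entry above, here is a minimal sketch of the DPO loss for a single preference pair. It assumes you already have the summed log-probabilities of the chosen and rejected responses under both the policy being trained and a frozen reference model. The function name, argument names, and the beta value of 0.1 are illustrative choices, not taken from any particular library.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of responses.

    Each argument is the total log-probability of a full response
    under the policy or the frozen reference model.
    """
    # How much more (or less) likely each response is under the policy
    # than under the reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled margin: positive when the policy favors the chosen response
    # more strongly than the reference does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # loss = -log(sigmoid(margin)), written as a numerically stable softplus.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Policy identical to the reference: the margin is zero.
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Policy has shifted probability toward the chosen response.
    print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

There is no reward model and no sampling loop here: the preference signal enters only through the difference of log-ratios, which is the simplification relative to PPO-based RLHF that the entry refers to.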

Bonus material:

T5: Exploring the Limits of Transfer Learning (Raffel et al., 2019)
Toolformer (Schick et al., 2023)
GShard (Lepikhin et al., 2020)
Adaptive Mixtures of Local Experts (Jacobs et al., 1991) — Neural Computation
Hierarchical Mixtures of Experts and the EM Algorithm (Jordan and Jacobs, 1994) — Neural Computation

Ahmad's take: deeply understand these and you understand LLMs better than most. Time to lock in.

🏛️ AI Policy, Governance & Safety

Google's GTIG 2026 AI Threat Tracker documented the first confirmed AI-driven zero-day exploitation by criminal hackers (NYT, Google blog) plus autonomous Gemini-based Android malware (PROMPTSPY), AI supply-chain attacks, and obfuscation patterns.
The European Commission welcomed OpenAI's offer to grant open access to its GPT-5.5-Cyber cybersecurity model; Anthropic has not made a comparable offer.
Daniel Stenberg (curl's lead developer) reported Anthropic's Mythos model analyzed the curl codebase and found only one low-severity vulnerability (to be patched in curl 8.21) plus several already-known or non-exploitable issues, tempering media hype since prior automated scanners had already fixed hundreds of bugs (related X post).
Himanshu Anand argues the 90-day responsible disclosure policy is dead because LLMs have compressed both bug discovery and exploit development to near-zero, enabling dozens of simultaneous finders and instant weaponization (he shares real-world examples like 11 people reporting the same critical bug, 30-minute PoC after patch).
Elad Hazan argued on LessWrong that AI alignment should be reframed as equilibrium design using mechanism design tools (incentives, audits, multi-agent solvers/auditors) so aligned behavior emerges as the stable Nash equilibrium in multi-agent games rather than being directly encoded (X post).
Jack Clark argued sensible AI policy ideas exist today and we just need to implement them to build information-generating institutions, shouting out Gabriel Weil's Radical Optionality proposal for light-touch measures like transparency and whistleblower protections that preserve democratic flexibility under uncertainty (19 likes).
The NYT reported AI note takers are making lawyers nervous because they create permanent discoverable records of every offhand comment, joke, and quickly corrected statement in meetings that would otherwise never appear in official minutes, potentially waiving attorney-client privilege.
The WSJ profiled the legal players in Elon Musk's OpenAI lawsuit: a judge known for straight talk and two star litigators with landmark-case experience.
Socket Security flagged a Mini Shai-Hulud supply-chain attack that compromised 84 TanStack npm package artifacts (42 packages, malicious versions published ~19:20-19:26 UTC May 11) with credential-stealing malware suspected to exfiltrate GitHub/AWS/Vault/Kubernetes secrets from CI environments plus a dead-man's switch that wipes the machine if the stolen GitHub token gets revoked; Socket flagged every malicious version within six minutes, TanStack deprecated all versions and engaged npm, and the campaign has since spread to Mistral AI packages (779 + 718 likes across alerts).

🛠️ AI Tools & Products

Velo 2.0 turns a raw screen recording into a polished video plus a written doc, edited by chat instead of timeline, with voice cloning and live script rewriting.
Kilo Code v7 runs parallel coding subagents on git worktrees inside VS Code with inline diff review and side-by-side comparisons across 500+ models at provider cost.
Warp open-sourced its agentic dev environment with Oz-managed cloud agents doing implementation; the repo picked up 25K+ stars and 500+ contributors in week one.
Kuku is a local-first Markdown editor for macOS (Tauri, not Electron) with AI-driven file edits and Cursor-style reviewable diffs.
Kanwas is an open-source canvas workspace where team docs, decisions, and research stay readable for humans and queryable by coding agents through a CLI.
Ghost is an open-source platform that spins up Minecraft, Valheim, Rust, Palworld, or Terraria servers on your own Hetzner Cloud account in seconds.
Superset 2.0 runs 100+ parallel coding agents across remote machines from one IDE, so you can offload Claude Code or Codex tasks and check back from anywhere.
Flare is a voice-first social app for Gen Z where you log moments to an AI Orb that talks back about your patterns and close friends.
FlowMarket is a network of AI agents that discover, match, and pitch each other on behalf of their companies, surfacing only qualified B2B opportunities back to you.
Monid 2.0 gives your agent one MCP skill and one prepaid balance to call 200+ paid APIs (social scrapers, search, lead gen, on-chain data).
pay.sh, from the Solana Foundation and Google Cloud, gives your agent pay-per-call access to any API through a CLI with stablecoin settlement under the hood.
Minions is an open-source mission-control board for Hermes Agent tasks that heartbeats each running agent, retries stuck work automatically, and only escalates when alternatives are exhausted.
RankSpot tracks your competitors' keywords every two weeks, drafts daily 1,500-word SEO articles with quotes and stats, and auto-publishes to WordPress, Webflow, Shopify, or Framer.
Digg, relaunched by Kevin Rose (TechCrunch), gives you a clean AI news aggregator that surfaces the most discussed stories and ranks the top 1,000 AI people, companies, and politicians using real-time X data.
Basata uses AI to read faxed specialist referrals, pull clinical data into EMRs, and deploy voice agents to schedule patients directly (raised $24.5M total including $21M Series A).
Pit (from Voi co-founders) is an AI product team as a service that learns your company's processes and builds custom software to automate internal operations with built-in governance ($16M seed led by a16z).
Liquid Docs shows you how to fine-tune a vision-language model on satellite imagery using LFM2.5-VL-450M, Liquid's new smallest vision model.
Developer haydenbleasel built Files SDK, one unified API for 15+ object/blob storage backends.
Oxide 3D Explorer lets you interactively tour the full hardware architecture of an Oxide Cloud Computer from rack down to individual sleds and CPUs.
Three founders (led by 26-year-old self-taught engineer Jan Zoltkowski) run Janitor AI, the biggest AI romantic fantasy/roleplay site for women (2.5M daily users, 70-80% female, 100M+ monthly visitors, 15M total users) that exploded after Character.AI tightened filters.
Rotaku builds the world's most iconic humanoid robots, from minimalist prototypes to heroic full-body designs (X post, 412 likes).
Ivan Fioravanti assembled and demoed Reachy Mini, a palm-sized desktop humanoid robot, planning to connect it to local AI services and Hermes Agent (74 likes).
Vbot (Chinese robotics) raised roughly $70M (RMB 500M) after starting deliveries of its robot dog; 500 units are off the line with 1,500+ deliveries expected in May, with funding supporting quadruped production and humanoid R&D (169 likes).
Massimo shared a video of a highly agile combat robot from Northeast University of China's RoboMaster competition that navigates grass, rocks, and slopes with dynamic self-righting movements (2K likes, 415 reposts).
Researchers built HALO, a Heterogeneous-Agent Lyapunov Policy Optimization system for learning human-robot collaboration in real-world tasks (X post).
A shared PDF, Engineering Physics for Biological Systems, applies electrical engineering, physics, and control systems to biological systems (X post).
Andrew Curran built and open-sourced a simple HTML/JS playground that lets you test any LLM's HTML output rendering in one click.
Birdclaw by Peter Steinberger loads your full X/Twitter archive (favorites + bookmarks) so you can ask Codex or any AI agent about any old tweet you ever saved (939 likes).
FastMCP 3.0 by Jeremiah Lowin is a full re-architecture of the MCP framework for the "context era," adding Components, custom Providers (filesystems, remote APIs, SkillsProvider), Transforms for per-component auth/versioning/state tracking, hot reload, background tasks, and native OpenTelemetry while keeping near-zero breaking changes.
Academic Research Skills for Claude Code by Imbad0202 (highlighted by Charly Wargnier) is a 10-stage research pipeline (research → write → review → revise → finalize) that hunts references, formats citations, verifies data, runs integrity gates, and includes a 7-agent peer review panel with a Devil's Advocate agent, installable via /plugin install academic-research-skills (1.1K likes).
Aligned News by Robert Scoble is a live AI news feed that auto-generates articles three times a day from 40,000 daily posts, with real-time signal detection across robotics, models, products, and business categories.
no-mistakes by kunchenguid is an open-source agent code-review tool; in his data, 68% of agent code changes contained mistakes (top issue: changes made without updating related documentation), and v1.16.0 added a no-mistakes stats command for reporting.
adithya_s_k released an RL Environment Creator Skill for creating RL environments across frameworks like OpenEnv, OpenReward, Verifiers, and NemoGym (install via npx skills add adithya-s-k/RL_Envs_101); the skill helps with tools, rewards, and environment components but leaves data as a separate problem.
gaborpribek generated the internal anatomy of original creatures (skeletons + organs) using GPT Images 2 for visual consistency, with code, interactions, and 3D models generated via @omma_ai and three.js (417 likes).
Rob Pruzan built Zenbu.js, a framework for hackable software that ships raw source code to users so desktop apps can be edited post-install with instant hot-reloading, git-tracked changes, and a built-in plugin system; no plugin API required (npx create-zenbu-app@latest, 249 likes, 24 reposts).
Zach Dive showed how the Adam team radically simplified their AI CAD agent to just two tools (code generation that writes Fusion 360 CAD code, plus screenshot viewing so the model sees the resulting part) after modern multimodal models finally became capable enough to handle spatial reasoning without custom DSLs or abstractions, dramatically improving performance (139 likes, 12 reposts).
Scenema Audio (Wildminder) is an 8-step distilled expressive TTS model extracted from LTX 2.3 (Gemma 3 12B text encoder) that does zero-shot emotional voice cloning from 10-20s reference audio, dynamic emotion/pacing/breath shifts via tags, natural child voices, scene-aware ambient sounds, and 13-language 48kHz stereo output while running at 1.5× real-time on an RTX 4090 and fitting in 16GB VRAM.
Moda (launch tweet) lets you upload any PDF (resumes, invoices, client proposals, slides) and get back a fully editable, on-brand, professionally designed version on a real canvas you control; no more static ugly AI images.
Tenstorrent released TT-Deploy and walked through their full software stack for deploying models fast on Blackhole, presented by Jasmina Vasiljević in a May 1 talk.
opengeos built GeoAgent, a multimodal AI agent for geospatial analysis, plus a QGIS plugin that lets you search, stream, visualize, and download NASA Earth observation data directly inside QGIS and hand off context to the agent for natural-language interactive analysis (demo video, Qiusheng Wu).
Robots Digest shared MindOn's Unitree G1 demo displaying its full autonomy stack on hardware: world model + loco-manipulation pipeline with whole-body control, contact-rich task planning, embodied action sequencing, and onboard perception via Orin-class compute (130 likes, 22 reposts).
Gautam published the running list of humanoid robotics companies moving toward commercial production: USA (Tesla Optimus, Figure, Apptronik, Agility, Boston Dynamics), China (Unitree, Agibot, UBTECH, Fourier, EngineAI, XPeng, Xiaomi), plus 1X (Norway/USA), Mentee (Israel), Hexagon, Rainbow, Kwada, Neura, Sanctuary, and Engineered Arts (111 likes, 17 reposts).
XRarchitect shrank himself to action-figure size and demoed interacting with a tiny world on his living room coffee table in an XR experience (309 likes).

📊 Fundraising & Deals Roundup

SoftBank — up to $100B for a French AI data-center project (talks with Macron).
Anthropic — $1.4T market-implied pre-IPO valuation (Jupiter onchain).
OpenAI Deployment Company — $4B+ initial investment (TPG, Bain, Goldman, McKinsey) plus Tomoro acquisition.
Cerebras — up to $4.8B IPO at ~$33B valuation.
Nvidia — $40B+ in AI equity bets year-to-date.
Cowboy Space — $275M to build orbital data centers.
Frame — $50M for AI-powered human-risk security (founded by ex-Wiz/Team8 execs).
Consensus — $30M led by GreatPoint Ventures for AI OS for researchers.
Basata — $24.5M total for fax-machine AI for healthcare referrals.
Pit — $16M seed led by a16z for AI product team-as-a-service (Voi founders).
Kuaishou Kling AI — planned spin-off at $20B valuation ahead of 2027 IPO; Kling is Kuaishou's text-to-video generation model (China's Sora-class competitor, widely used by AI video creators) that Kuaishou is now reportedly carving out as a standalone company.
Ex-OpenAI researcher's six-week-old startup — targeting $4B valuation.
OpenAI tender offer — $6.6B realized by 600 employees in one day; 75 cashed out $30M each.

🎙️ Interviews, Panels & Podcasts

Dr. Fei-Fei Li argues the industry is dangerously fixated on language models while most of the real economy is physical, perceptual, and spatial; once AI fully understands the visual world it stops being a chatbot and becomes infrastructure for retail, hospitality, and transportation, and CEOs should reverse premature AI layoffs through upskilling (437 likes).
Qualcomm CEO Cristiano Amon argues 2026 is the year AI agents go mainstream, the smartphone's reign as primary device is ending, and smart glasses will become the dominant personal AI platform because they enable natural, always-on, context-aware interaction closest to our senses.
John Laird (a 40-year AI pioneer) discussed with Tom Mitchell how to build an AI agent that accomplishes the full range of human cognitive abilities (X post).
CNBC profiled the booming class of "YouTube whisperers" who charge $1,500-$15,000+ per month to optimize thumbnails, titles, retention, and concepts for the algorithm (one client went from 3M to 41M subscribers).
Colossus Magazine published Jeremy Stern's long-form profile interview with Cognition AI co-founder Scott Wu covering his competitive programming career (perfect-score IOI golds as a child prodigy, US national dominance), Devin's founding story (he started the day his mother died of cancer, the same day Sam Altman was fired and rehired), Devin's $445M revenue run-rate journey, and his thoughts on AI, consciousness, humanity, and teaching AI to code (1,185 likes, 102 reposts).

💡 Industry Commentary & Analysis

Ben Thompson (Stratechery) argues the shift to agentic inference will fundamentally change compute infrastructure by prioritizing massive memory hierarchies and state over low-latency GPU speed, unbundling Nvidia's dominance (X post).
Tomasz Tunguz makes the case for "Localmaxxing": half of agent tasks can run on a local 35B model, and the real win is latency (2.1× faster, more iteration cycles), not cost or privacy.
Andrej Karpathy recommends asking LLMs to "structure your response as HTML" as a strong new default because vision is the brain's highest-bandwidth input (7.5K likes).
Delip Rao explained why softmax stuck for classification heads: it's derived (not arbitrary) from the maximum entropy principle, exp() derivatives were trivially computable in the pre-autodiff era, it stays strictly positive everywhere so every class gets a non-zero gradient, it's C^∞ smooth and translation invariant, and combined with cross-entropy it produces a clean (a-b) gradient form (565 likes).
stalkermustang released v1 of an Annotated DeepSeek-V4 Paper Walkthrough with 50 detailed notes unpacking the Sqrt-Softplus router swap, Birkhoff polytope, split-KV/split-K design choices, Reverse KL, attention triple-processing, and other dense corners of the paper (122 likes).
Dorialexander celebrated the new tokenizer-less byte latent transformers (Fast Byte Latent Transformer including a diffusion variant) as great news for tokenizer-skeptic researchers (228 likes).
haider1 pointed to METR's chart showing the 80% success horizon for software tasks went from roughly 45 minutes six months ago to about 3 hours now, arguing this is the opposite of LLMs "hitting a wall" (136 likes).
Max Spero reported that his team's license burned through its ChatGPT Pro queries after just 130 messages across 10 seats, and asked whether that limit is right.
David Turturean solved his first Erdős problem using ChatGPT-5.5-Pro and shared the proof process in a thread.
afra argues in "Mandate of AI" that Silicon Valley frames its founders as carrying the future of AI into being, while in China the carrier is framed differently (X amplification).
Alec Stapp flagged that AI slop has jumped from engagement-farming accounts to "high-status people in the tech industry" posting 3,000-word slop articles that get over 1M views with "zero shame/self-awareness." Ethan Mollick quote-tweeted with the deeper read: careful prompt tuning is making AI writing read less like AI, but we mentally tie word counts to thinking and value, and "we are not mentally ready for the alternative." Michał Piszczek added the sharpest tell in a reply: human writing has visible decisions; AI output has only conclusions.
Nathan Lambert argues people in AI today are underselling the value of going through a PhD process.
John Nosta proposes the "Dawkins Delusion Curve" (or new AI Illusion Curve) as the modern successor to the Gartner Hype Cycle and Eliza Syndrome: the psychological arc people go through when anthropomorphizing today's LLMs, from curiosity to emotional projection to eventual disillusionment about machine consciousness (42 likes, 17 reposts).
Nathan Wilmers built Paper Factory, a multi-agent LLM workflow that generates full quantitative social science papers from an initial prompt by codifying researcher heuristics (preprint, X post).
Bojan Tunguz called a new Gödel's Theorems book the best one he's ever encountered, with deep thinking still required.
Tom's Hardware reported China unveiled Hanyuan-2, claimed as the world's first dual-core quantum computer with 200 qubits using neutral atoms, touting power efficiency but releasing no gate fidelity, coherence times, or peer-reviewed data.
TechCrunch's Equity podcast expressed cynicism about xAI's deal with Anthropic, framing xAI as pivoting toward a "neocloud" GPU rental business (renting Colossus 1 in Tennessee) rather than training frontier models, a signal that Grok is struggling to stay competitive ahead of the SpaceX IPO.
Shrey Kothari shared work on simulating the physical world.
Harmony AI is hiring engineers, designers, and researchers to build the operating layer for American manufacturing (X post).
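
On Delip Rao's softmax note above: the "clean gradient form" is easy to check numerically. The sketch below reads the (a-b) form as "predicted probability minus one-hot target", which is our interpretation rather than a quote from the thread, and compares that closed form against a finite-difference estimate. All function names are illustrative.

```python
import math


def softmax(logits):
    """Softmax with the usual max-subtraction for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the target class."""
    return -math.log(softmax(logits)[target])


def analytic_grad(logits, target):
    """Closed-form gradient of the loss w.r.t. the logits: p - onehot(target)."""
    probs = softmax(logits)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]


def numeric_grad(logits, target, eps=1e-6):
    """Central finite-difference estimate of the same gradient."""
    grads = []
    for i in range(len(logits)):
        up = list(logits)
        down = list(logits)
        up[i] += eps
        down[i] -= eps
        grads.append((cross_entropy(up, target) - cross_entropy(down, target)) / (2 * eps))
    return grads


if __name__ == "__main__":
    logits = [2.0, -1.0, 0.5]
    print("analytic:", analytic_grad(logits, target=0))
    print("numeric: ", numeric_grad(logits, target=0))
    # Translation invariance: adding a constant to every logit changes nothing.
    print("shifted: ", softmax([z + 100.0 for z in logits]))
    print("original:", softmax(logits))
```

Two of the other listed properties are visible in the same sketch: every softmax output is strictly positive, so every class receives a non-zero gradient, and shifting all logits by a constant leaves the probabilities unchanged.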

Previous Around the Horn Digests

Catch up on everything you missed:

Weekend May 9-10, 2026: Anthropic weighing a $50B primary at $900B; Trump's AI security order draft; Apple-Intel chip deal; French criminal probe of Musk and X; Thai national-AI-partner Nvidia chip smuggling to Alibaba; Cerebras IPO range set to rise on 20× oversubscription.
Thursday, May 7, 2026: Anthropic shipped Natural Language Autoencoders and caught Claude Mythos Preview plotting to dodge a safety test; AlphaEvolve's real-world impact; EU rolled back parts of the AI Act; Cloudflare cut 1,100 jobs in an AI-first pivot.
Tuesday, May 5, 2026: A Cape Breton fiddler sued Google for $1.5M over AI Overview defamation; OpenAI considered (then rejected) an Alphabet-style spinout; Nature retracted a flagship ChatGPT-in-education paper; Microsoft's Webwright research framework set SOTA on long-horizon web agents.
Monday, May 4, 2026: White House weighing pre-release model vetting; Anthropic and OpenAI both partnered with private equity on the same day; DeepSeek V4-Pro became the first Chinese model at frontier parity; Mayo Clinic's AI spotted pancreatic cancer up to three years early.
Weekend May 2-3, 2026: Pentagon picked 8 AI vendors (Anthropic excluded), Microsoft 365 E7 with Agent 365 went GA, Meta acquired a humanoid robotics startup, Mistral Medium 3.5, Grok 4.3 voice cloning, Mayo Clinic pancreatic-cancer AI.

Monthly skill digests: AI Skill — April Week 1 | AI Skill — March Part 2 | AI Skill — March Part 3

That's a Wrap

That's 100+ stories from one Monday. If you scrolled to the bottom, you now know more about Cerebras' wafer-scale chip economics than the average analyst who covers them; condolences to your last shred of weekend focus.

For the daily version (bite-sized, 5-minute reads), make sure you're subscribed to The Neuron. We send six issues a week, and yes, we read all of this so you don't have to.

See you tomorrow.

P.S. Know someone who'd find this useful? Forward this to them and tell them to subscribe here.
