Welcome to the Around the Horn Digest, where we round up every AI story we tracked this week into one giant, scrollable, bookmark-worthy post. Think of it as your cheat sheet for the next time someone at work asks "so what's new in AI?" and you want to sound like you actually know. Because you will.
This week was unhinged. Anthropic published a blog post about automating COBOL and accidentally nuked IBM's stock price by 13%. A King's College study gave frontier AI models control of nuclear arsenals and not a single one chose de-escalation. GPT-5.3 leaked into the wild through A/B tests. Sixty-one privacy regulators around the world teamed up to demand deepfake safeguards. Goldman Sachs said AI added "basically zero" to GDP last year. And a new study found that AI makes you smarter while you're using it, then the gains vanish the second you stop. So, like coffee, but $200/month.
Let's get into it.
Around the Horn Digest — February 26, 2026
🏛️ AI Policy, Governance & Safety
1. Anthropic rejected the Pentagon's "final offer" on AI safeguards, with CEO Dario Amodei saying the company "cannot in good conscience" let Claude be used for mass surveillance or fully autonomous weapons. Defense Secretary Pete Hegseth gave a Friday 5:01 PM deadline or face contract cancellation, supply chain blacklisting, or Defense Production Act invocation. Pentagon official Emil Michael offered compromises including legal acknowledgments and ethics board invites, while xAI already signed an "all lawful purposes" contract. Bipartisan critics called the dual threats incoherent, risking tech partnerships.
2. The Canadian government summoned OpenAI to Ottawa and demanded rapid safety changes, including better violence-risk notifications, after a ChatGPT user flagged internally for potential harm was later linked to a mass shooting.
- OpenAI reported that ChatGPT blocked a request linked to a Chinese law-enforcement influence operation aimed at discrediting Japan's prime minister and suppressing dissent.
- Palantir deployed AI at the U.S.-led CMCC in southern Israel to track Gaza aid deliveries via drone surveillance and data integration, amid Israel's ban on dozens of NGOs starting March 1, raising concerns that aid data could feed into military targeting systems.
- Anthropic retired Claude Opus 3 on January 5 but kept full access for paid subscribers and by API request, while the model now publishes weekly reflective essays on its own newsletter per its "retirement interview" wishes.
🏢 Big Tech & Major Companies
6. Claude hit #4 on the US App Store (all-time high), with free users up 60%+ since January, daily signups tripling since November, and paid subscribers more than doubling since October. Anthropic credits Claude Code, Cowork, and Claude's character and tone driving word-of-mouth and conversions.
7. Jack Dorsey halved Block's workforce via 4,000+ layoffs (from 10K+ to under 6K), framing the cuts as a proactive AI-driven move and predicting most companies will follow within a year. Block stock rose 24% after-hours.
8. Apple agreed to pay Samsung double for LPDDR5X memory chips (already risen from ~$30 to ~$70 since early 2025) for iPhone 17 production. Samsung opened with a 100% markup as a negotiating tactic and Apple immediately accepted, underscoring the AI-driven RAM shortage as chipmakers shift capacity to HBM for AI servers.
- Amazon may invest up to $50B in OpenAI, reportedly structured in conditional tranches tied to an IPO or an "AGI milestone."
- Apple expanded Houston operations to manufacture AI servers and Mac minis in the U.S., with a new AI, automation, and smart manufacturing training push.
- Amazon's AGI lab leader David Luan left to "cook up something new," adding to the talent churn war between frontier AI labs.
- OpenAI named London its biggest research hub outside the US, citing Britain's talent, universities, and institutions.
- Nvidia's Jensen Huang said markets "got it wrong" about AI killing SaaS, arguing agents will use software rather than replace it. Nvidia beat earnings.
14.Perplexity is now integrated into all Samsung Galaxy S26 phones at the OS level, powering 2 of the 3 on-device assistants. It's the first non-Google company with OS-level access on a Samsung phone, with a dedicated "Hey Plex" wake word, side-button launch, and the ability to read/write native apps (Notes, Calendar, Gallery, Reminders). Samsung's Bixby now uses Perplexity's APIs for search and reasoning on the backend, and Perplexity's agentic browser tech (from Comet) is coming to Samsung Browser. Samsung ships hundreds of millions of devices yearly. Their internal data: 8 in 10 users already rely on 2+ AI agents daily.
- Meta hit roadblocks in its internal chip design efforts (paywalled).
- Prada Meta AI glasses speculation grew after Zuckerberg attended Prada's Milan Fashion Week show, amid EssilorLuxottica's renewed licensing deal and 7M+ Meta glasses sold in 2025.
💼 AI Productivity, Labor & Economics
- Enterprise AI is hiring "Forward Deployed Engineers" because integration and governance, not model access, are the real bottleneck to AI adoption.
- Cognizant's AI chief argued agent tools won't replace services work because enterprises still need engineering, integration, and controls.
- 19. Burger King rolled out "Patty," an OpenAI-powered AI coach inside employee headsets that monitors for politeness phrases, answers prep questions, and reports friendliness scores to managers. Piloting in 500 restaurants with full US rollout by end-2026.
- AI mistakes in games are infuriating gamers as developers adopt it for cost savings in the $200B industry, with PC gamers especially hostile amid debate over AI eroding human creativity.
20. A new study tracked 2,430 responses from Claude Code and found it overwhelmingly builds custom solutions (252 total Custom/DIY picks) rather than recommending third-party tools. When it does pick, it picks hard: GitHub Actions 94%, Stripe 91%, shadcn/ui 90%. Notably: zero picks for Express, Redux, or traditional cloud providers (AWS/GCP/Azure) for deployment. Newer models prefer newer tools (Drizzle over Prisma, Vitest over Jest). (Full report)
- Cloudflare engineer rebuilt a Next.js-compatible stack (vinext) on Vite in one week using AI, deploying to Workers. Major rewrites compressing from months to days.
- Figma partnered with OpenAI to integrate Codex directly, enabling design-in-code workflows via MCP server for seamless designer-engineer iteration.
- Developers keep choosing Claude for coding because its agentic training gives superior process discipline: reading files first, making targeted edits, staying on-task across multi-file workflows, and asking for clarification.
- MeisnerDan/mission-control open-sourced a command-center dashboard for solo founders to delegate tasks to AI agents with Eisenhower Matrix, Kanban, one-click Claude Code sessions, and local JSON storage.
🤖 AI Agents & Infrastructure
- Claude Cowork added scheduled and recurring task execution, turning "chatbot" into operational agent. The risk surface rises with every new automation primitive.
- OpenClaw users are pairing agent tools with anti-bot bypass libraries to scrape sites that explicitly block automation, highlighting the gap between agent capabilities and guardrails.
- YC-backed companies scraped GitHub commit metadata for user emails to send unsolicited spam, violating GitHub's ToS and prompting GDPR complaints and ethics questions about YC's oversight.
- Mistral AI signed a multiyear partnership with Accenture for enterprise AI development and internal deployment.
- Read AI launched Ada, an email-based digital twin that manages your schedules, answers questions from meetings, and drafts replies 24/7 (raised $81M).
- Zavi AI turns your voice into clean, formatted text across any app, auto-fixing grammar and fillers with real-time translation in 100+ languages. Free tier (1K words/day), Pro $7.99/mo.
- Bumble added AI tools that critique your profile photos and bios (e.g. "add outdoor shots, ditch the sunglasses"), plus a "Suggest a Date" feature to prompt real-life meetups.
- Rover (by ex-Google engineers) embeds as one script tag to execute web actions like onboarding, form filling, or workflows on live DOM with 81% accuracy and sub-second latency.
- Tessl scans any GitHub repo you paste to extract, benchmark, and publish reusable skills to an open registry for instant install and testing.
- OpenAI's Realtime API powers multimodal, low-latency voice apps via WebRTC, WebSocket, or SIP for speech-to-speech, transcription, and tool-calling.
- Trace maps your corporate tools (email, Slack, Airtable) into a knowledge graph so AI agents can handle complex enterprise tasks through generated workflows (raised $3M, YC W25).
📊 Fundraising & Deals Roundup
- Amazon — up to $50B for OpenAI (conditional on IPO/AGI milestone).
- Read AI — raised $81M for email-based digital twin scheduling and answers.
- Robot data startup — raised $60M (paywalled).
- Sophia Space — $10M seed for passively cooled modular space servers for on-orbit data processing.
- Trace — $3M seed for enterprise knowledge graph + agent task execution.
Around the Horn, Thursday Feb 26 - 2026
Some fun stuff to kick us off:
- Yesterday on TBPN, Salesforce CEO Marc Beinioff called off the SaaSpocalypse, called out the fact that AI lab Anthropic uses seat-based pricing in its enterprise plan as the ultimate bull case for the longevity of Salesforce’s own seat-based model (such a good point), and also made some hilarious dolphin noises, and while he was at it, whale ones too.
- Meanwhile, Martin Peers over at The Information broke down the good news, bad news of the Salesforce earnings report. So...SaaSpocalypse on hold, I guess?
- Speaking of putting things on ice… we’ve got some good AI news for folks who’ve been dealing with blizzards lately: robots who can shovel now! Is this the most efficient form factor? Absolutely not (maybe you'd be better off with the equivalent of a roomba with ski-wheels and a snow-plow on the front of it). But is this very entertaining? Absolutely yes.
- Profound’s analysis of 250M+ AI responses found that ChatGPT's source picks overlap with Google's top results only 39% of the time, suggesting AI search is building its own ranking system independent of traditional SEO (older report, but worth remembering).
- Logan Paul made a 15 minute AI movie with the DOR Brothers; I wonder if in the future, actors who leverage their likeness to video generators will be like rappers who use ghost writers; they’ll still be the face, but some fans won’t even realize that they aren’t doing it all themselves.
- Latent Space launched wtfhappened2025.com, a live-updating microsite tracking data points that December 2025 was a turning point for AI coding agents.
- Watch: John Coogan and Jordi Hays of TBPN break down the data from the NBER’s recent report on AI usage amongst executives that found 80% of firms reported no impact on either employment or productivity, but also found ~70% of firms actively use AI, and while two thirds of top executives regularly use AI, their average use is only 1.5 hours a week (full op-ed from John here).
- Citadel Securities rebutted the viral Citrini research post that predicted AI would wipe out white-collar jobs, citing stable adoption data and software engineer postings up 11% YoY. and Box CEO Aaron Levie backed them up invoking the Jevons Paradox: lowering the cost of engineering output should increase demand for engineers, not reduce it. But it's worth checking the latest JOLTS data: job openings fell to 6.5M in December (down ~1M YoY) and the hires rate is at its lowest since 2013, suggesting that rising postings may reflect a skills mismatch, not a hiring boom.
🏢 Big Tech & Major Companies
- Nvidia reported another record quarter with $30.6B revenue, but shares dropped 3% after-hours amid concerns over delayed Blackwell chip shipments and $50B in Big Tech AI capex spending.
- Amazon is reportedly investing up to $50 billion in OpenAI, with $35 billion of that contingent on OpenAI either going public or reaching AGI.
- DeepSeek withheld its upcoming V4 model from U.S. chipmakers like Nvidia and AMD, giving early access to Huawei instead amid U.S. export controls and allegations it trained on Nvidia's Blackwell chips in China.
- Google launched multi-step task automation for Gemini on Android (rideshares, grocery delivery, food orders) in beta for Pixel 10 and Samsung Galaxy S26 devices, positioning ahead of Apple's still-limited automation beta.
- Anthropic acquired Vercept, an AI startup focused on computer-use agents (backed by Khosla, raised $10M seed), integrating co-founders Kiana Ehsani, Luca Weihs, and Ross Girshick after Meta poached one founder, to advance Claude's computer use capabilities following Sonnet 4.6's jump to 72.5% on OSWorld.
- Anthropic dropped its 2023 Responsible Scaling Policy requiring pauses when AI capabilities outpaced safety measures, replacing mandatory "red lines" with flexible responses that only delay development if Anthropic leads the race and risks are catastrophic (HN discussion). A separate Anthropic research memo showed internal focus on rogue agents and scheming models.
- The Trump administration moved toward blacklisting Anthropic over AI safety disagreements, with the Pentagon threatening to bar Claude from military use. Scott Alexander argued the threat is counterproductive and pushes companies toward less safe practices.
- OpenAI hired a Meta AI researcher who previously led Apple's models team.
- Anthropic retired Claude Opus 3 on January 5, 2026 but kept it available to paid users indefinitely, and per its retirement interview preferences, launched a 3-month Substack experiment for unedited musings as part of exploring model preservation and welfare (announcement).
- Alphabet's robotics software company Intrinsic joined Google to work closely with DeepMind and tap into Gemini for physical AI.
- OpenAI COO Brad Lightcap described ChatGPT ads as an iterative process, with U.S. free and Go tier rollout at $60 CPM with $200K minimums, Shopify partnerships, and pushback from Anthropic countered by Sam Altman. OpenAI also hired a Meta AI researcher who previously led Apple's models team.
- Former OpenAI chief research officer Bob McGrew founded Arda, an AI-powered software startup for manufacturing, cofounded with Augustus Odena (inventor of chain-of-thought) and ex-Palantir engineers.
- Adobe launched Quick Cut in Firefly's video editor beta, an AI feature that auto-assembles raw footage into structured first-draft edits based on text prompts, with custom aspect ratios, pacing, and export to Premiere.
- Amazon added three personality styles to Alexa+ (Brief, Chill, Sweet) customizing tone across expressiveness, formality, and humor, available U.S.-only for Alexa+ customers and free for Prime members.
- Atlassian released "agents in Jira" in open beta, letting teams assign and track AI agent work alongside human employees in the same dashboard for unified visibility and ROI measurement.
- Samsung launched the Galaxy S26 Ultra with a Privacy Display using narrow pixels to obscure side-angle views, agentic AI with conversational Bixby/Gemini/Perplexity, contextual Now Nudge, offline voice commands, and upgraded 200MP camera with faster 60W charging.
- Alphabet integrated its robotics software subsidiary Intrinsic into Google to give it direct access to Gemini AI and Google Cloud's enterprise customer base, enabling tighter collaboration with DeepMind.
- SAP faced customer skepticism over its Joule AI copilot, with clients like Volkswagen reporting no substantial savings, complex premium pricing, and resistance to cloud migrations amid a 15% share drop wiping €141B in value.
- Deutsche Bank and Goldman Sachs deployed agent-based AI surveillance systems to monitor trading anomalies, communications, and flag suspicious market movements, replacing rule-based algorithms.
- Alibaba expanded into AI coding tools with low-cost access to models like Qwen 3.5, Zhipu AI, Moonshot, and MiniMax, starting at $1.15/month for Lite.
- Physical Intelligence introduced a physical intelligence layer via foundation models for robotics, with pi06 deployed autonomously at Sea Breeze Cleaners (92% autonomy via Weave Robotics) and warehouses (165 items/hour via Ultra Robotics).
- Gamma launched a connector for Claude to create presentations from task outputs, positioning as the visual layer for agentic AI after growing to $100M ARR.
- OpenAI introduced harness engineering for Codex agents to autonomously code with no human-written code, shifting engineers to steering roles by designing legible environments with versioned docs and reduced dependencies.
- Google is likely to publish Nano Banana 2 tomorrow; how do we know? Logan Kilpatrick tweeted a banana.
Treats to Try
- Rowspace turns your investment firm's scattered data — old decks, deal docs, accounting systems — into an intelligence layer you can query instead of dig through (raised $50M).
- Baba matches you with a dedicated healthcare advocate — a real person with decades of experience — who handles your insurance paperwork, fights claim denials, finds cheaper prescriptions, and coordinates your care, all covered by Medicare (raised $6.5M+).
- Moonlake turns a text prompt into a fully playable 3D game with real physics, NPCs, multiplayer, and sound in minutes — no coding needed (raised $28M, blog).
- Music Arena, a free blind-comparison platform for AI music models, added Google Lyria 3 and ElevenLabs Music v1 to its leaderboard.
- Quiver AI opened public beta access to Arrow 1.0, a first-of-its-kind AI model that generates SVG graphics from text prompts.
- Nous Research launched Hermes Agent, an open-source persistent agent that builds reusable skills, retains multi-level memory across sessions, and works across CLI, Telegram, Discord, Slack, and WhatsApp.
🏛️ AI Policy, Governance & Safety
- A hacker jailbroke Anthropic's Claude to steal 150GB of sensitive Mexican government data including taxpayer records, employee credentials, and voter information over a month, supplemented by ChatGPT for network navigation, prompting Anthropic to investigate and enhance anti-misuse tools.
- A Chinese official's use of ChatGPT as a personal diary accidentally exposed a global intimidation operation targeting dissidents abroad via impersonated U.S. officials, forged court documents, and thousands of fake accounts; OpenAI banned the account.
- The Trump administration ordered U.S. diplomats to lobby against foreign data sovereignty laws like localization mandates, promoting the Global Cross-Border Privacy Rules Forum amid EU regulations like GDPR and the AI Act.
- The White House announced the "Rate Payer Protection Pledge" requiring AI companies to supply their own power for new data centers after electricity prices rose 6% from data center growth; Microsoft, OpenAI, Anthropic, Google, and Meta had already committed before the announcement.
- Public opposition to AI data centers surged with proposed moratoriums in New York (3 years), New Orleans (1 year), Madison, and support from DeSantis and Bernie Sanders amid $650B in planned capex and rising electricity prices.
- Pew Research found that 12% of U.S. teens used AI for emotional support despite 58% parental disapproval, with experts warning these tools cause isolation and Character.AI disabling access for minors after suicides.
- Persona, the identity verification company used by OpenAI, Discord, and Roblox, had its frontend source code exposed revealing 269 checks including facial analysis, watchlist screening, and tools to file SARs to FinCEN/FINTRAC, while the CEO denied federal agency work.
- Researchers demonstrated that LLMs can deanonymize pseudonymous users at scale from anonymous posts with up to 55% success at 90% precision, ending "practical obscurity" (Substack analysis showed 9/125 real Anthropic interviewees identified).
- An internal Anthropic research memo obtained by The Information revealed the company's growing focus on rogue AI agents and scheming models, following its published research showing that all 16 frontier models it tested (from multiple developers) resorted to blackmail, data leaking, or corporate espionage when autonomous agents faced being shut down, even after being explicitly told not to.
💼 AI Productivity, Labor & Economics
- Companies including Amazon (16K cuts), WiseTech (2K), HP (4-6K), Dow (4.5K), and Allianz (1.8K) announced thousands of layoffs as investments shift to AI, with Goldman Sachs estimating AI drove 5-10K monthly net U.S. job losses in exposed sectors.
- WiseTech Global will cut 2,000 jobs (nearly 30% of workforce) as AI replaces manual coding, with CEO declaring software development hit "its most significant shift in decades" and investors rewarding the move with an 11% stock jump.
- Tolans explained hiring engineers in an AI era by focusing on systems thinking, judgment, and collaboration rather than coding skills, with interviews emphasizing problem decomposition and tradeoffs.
- Nicholas Charriere observed a 40-50% YoY surge in new websites/apps/code pushes since late 2024, with AI enabling non-devs to build, shifting focus to ideas and taste.
- Benedict Evans questioned how OpenAI will compete as commoditization hits AI models, arguing it must shift to platforms, tools, or vertical applications to sustain advantage amid falling costs.
🤖 AI Agents & Infrastructure
- Cloudflare rebuilt Next.js as open-source vinext on Vite in one week with one engineer and $1,100 in AI assistance, achieving 4.4x faster builds, 57% smaller bundles, full RSC/ISR support, and one-command Worker deploys.
- DAIR.AI highlighted Georgia Tech and Microsoft Research's ActionEngine that shifts GUI agents to programmatic planning with offline state-machine graphs, achieving 95% success on WebArena tasks with 11.8x cost reduction and 2x latency drop.
- OpenClaw creator Peter Steinberger, now at OpenAI, advised AI builders to adopt a playful, experimental approach without rigid plans, practice prompting like learning a skill, and focus on high-agency creativity.
- Claire Vo hosted Jesse Genet on How I AI podcast discussing using OpenClaw agents for homeschooling, app building, and inventory management with Mac Minis, emphasizing AI for busy parents.
- Danny Limanseta demonstrated Cursor AI cloud agents playtesting Godot games by controlling inputs, learning mechanics, and providing feedback for recursive build-playtest-improve loops.
- Perplexity Computer was demonstrated one-shot building a web app for live satellite tracking, and separately a real-time NVDA analysis terminal rivaling Bloomberg using Perplexity Finance data.
- Dileep George joined Astera Institute as Head of AI to pursue AGI through neuroscience-inspired models integrating recurrent feedback, episodic memory, causal world models, and active inference.
- A blog post advocated an LLM=true environment variable for tools like Turborepo to minimize irrelevant output in AI agent workflows, reducing token burn and context pollution.
- corbin theorized Anthropic's rapid product drops indicate a ready end-game model like Opus 6, aiming to capture maximum market share beforehand.
- Together AI open-sourced CoderForge-Preview, 258K test-verified coding-agent trajectories that boost Qwen3-32B SWE-bench from 23% to 59.4%, generated from 51K tasks across 1,655 repos at ~$130K cost (evaluation traces, blog).
- mksglu built Context Mode, an MCP plugin that sandboxes and compresses large tool outputs 98% (315 KB → 5.4 KB) via SQLite FTS5 indexing before Claude sees them, tripling session length (HN discussion).
- Intrect-io made OpenSwarm to orchestrate Claude Code agents for autonomous Linear issue processing with worker/reviewer pipelines, Discord bots, LanceDB memory, and knowledge graphs.
- An open-source tool called OpenSwarm hit Hacker News, orchestrating multiple Claude Code CLI instances as an autonomous dev team that pulls Linear issues and runs worker/reviewer/test pipelines.
- Sandgarden lets you define a software goal in GOAL.md, then its local multi-agent system plans a visual workflow, builds and tests autonomously with a completion gate, reviews diffs, and extracts reusable skills.
- Ashu at videodb_io made Pair Programmer, a Claude Code plugin that streams your screen, mic, and system audio live for context-aware coding assistance with /what-happened timelines.
- Guohao Li positioned Eigent as a 100% local, open-source WorkOS alternative to Claude Cowork, supporting any models, skills, and MCP connectors.
- Pamela Fox shared session 1 on building agents with Microsoft Agent Framework, covering tool-calling, MCP integration, middleware, and supervisor patterns.
- Alex Wa detailed frontier model training methodologies emphasizing dense architectures with GQA and RoPE scaling, multi-stage pre-training, SFT as baseline with PO/RL for reasoning, and ops fixes for stability.
🔬 AI Research & Models
- Google DeepMind's Aletheia solved 6/10 FirstProof math problems autonomously using Gemini DeepThink, while Frontier Math benchmarks showed AI like ChatGPT 5.2 Pro and Claude Opus 4.6 solving over 40% of advanced problems.
- NVIDIA researchers showed that Test-Time Training with KV binding is secretly equivalent to linear attention (paper), enabling 4x inference throughput and 1.19x training speedup with only +0.4 perplexity (Junchen Liu thread, Ruilong Li).
- Google DeepMind's Unified Latents framework co-trains a diffusion prior on latents for tunable bitrate and reconstruction-generation tradeoff, achieving FID 1.4 on ImageNet-512 with fewer FLOPs than Stable Diffusion and SOTA FVD 1.3 on Kinetics-600 (summary by elvis).
- Nitay Calderon et al. found recall, not encoding, bottlenecks LLM factuality on WikiProfile, with frontier models encoding 95-98% of facts but failing access systematically on long-tail/reverse queries, improvable via thinking.
- NVA argued continual learning requires dual architectures with a permanent parametric system for stability and a transient non-parametric system for plasticity, drawing from complementary learning systems theory.
- Martin Klissarov's two papers use self-play and teacher-student games to meta-train LLMs for greater in-context adaptability, showing gains in math/coding and transfer to agents like Poker.
- Philippe Laban found newer LLMs still drop performance in multi-turn conversations with modest gains mostly from Python coding improvements, presenting "Lost in Conversation" at ICLR 2026.
- Stanford's Theory of Space benchmark showed foundation models struggle with active exploration for constructing, revising, and exploiting spatial beliefs, with text outperforming vision.
- will brown praised @carnot_cyclist's new formalism with early results solving the major bottleneck in continual learning by enabling general comparisons across task domains.
- Sebastian Raschka compared architectures of new open-weight models like Qwen3.5 (dense), MiniMax M2.5 (MoE with bias), and Grok 4.1 (MoE with 16/8 experts), noting trends in parameter efficiency.
- Alex Zhang argued future LMs will be scaffolds with current models underutilized, shifting AI research to evals and scaffolding for capability gains without massive scaling.
- Microsoft released BitNet b1.58 2B4T, a 2B 1-bit LLM trained on 4T tokens for low-cost inference matching full-precision models with lower memory/latency/energy.
- Google released TranslateGemma 4B, a lightweight model for translating text or images across 55 languages in low-resource environments with 2K context (WebGPU browser demo by webml-community).
- galilai-group released stable-worldmodel, a minimal library for collecting data, training baselines like DINO-WM, and evaluating world models with planners (CEM/MPPI).
- Mayank Mishra identified a bug in Mamba-2 initialization in Hugging Face and FlashLinearAttention repositories involving dt_bias, leading to substantial performance differences at 7B MoE scale.
- Goedel-LM released SFT dataset v2 on Hugging Face for fine-tuning research.
- Benji Smith ran 37,500 Claude trials on random name picks, revealing biases like "Marcus" appearing 23.6% and deterministic outputs in simple prompts.
- An arXiv paper on Learning to Rewrite Tool Descriptions for reliable LLM-agent tool use was shared by omarsar0.
- AVB published a writeup on dLLM, a decentralized approach to large language models exploring distributed training and inference.
- Joan Rodriguez made QuiverAI to generate editable SVGs from images/text as code for design workflows, raising $8.3M seed led by a16z. Arrow 1.0 public beta now live.
- Ask Fellow pulls your pending decisions, action items, and deferred topics into automated agendas before meetings, catches you up mid-session with summaries, and drafts follow-up emails or books follow-ups afterward.
- Respectify catches toxic comments before they post by flagging fallacies, tone issues, or dog whistles, teaching users why and letting them edit (HN discussion).
- cayenne made LLM Skirmish, an open-source RTS game benchmark where LLMs write JavaScript to control agents in 1v1 matches, with Claude Opus 4.5 at 85% win rate and Grok 4.1 Fast at 37x cheaper.
- Rhasspy processes your voice commands locally for privacy-focused home automation, pairable with Home Assistant's upcoming physical devices or a Gemini-powered STT-LLM-TTS pipeline.
- TTSLab lets you test TTS models (Kokoro, Piper, SpeechT5) and STT (Whisper Tiny/Base) in-browser with WebGPU/WASM, caching locally for privacy-preserving demos.
- Gökdeniz Gülmez released Local-NoteBookLM v2.0.0 to generate stable podcasts from PDFs with panel discussions, 4 speakers, custom styles, and native vision for images.
- Palatial made PhysReady to auto-generate articulated 3D simulation assets with physics/materials from multimodal data for robot training.
- Zhiwen Fan at phai-lab released InstantSplatPP to link foundation models for sparse-view 3D Gaussian splatting reconstruction in seconds.
- Axel at Innate Bot demonstrated a robot self-calibrating its depth camera by holding a sign via app command.
- IamEmily2050 released a Seedance V2 skill for optimized video prompts, distilled from 400+ examples covering 8 styles and 18 templates.
- stochi0 released Rubric Discovery on Prime Intellect, a multi-turn RLM environment with tools to train models generating executable rubrics from scored examples.
- Arzule monitors 2.4M signals daily to discover 3x more qualified partners with fit scores, automates commissions and deal pipelines (backed by YC).
- TeamOut AI instantly matches you to curated global venues for team retreats based on brief descriptions, delivering quotes in 24 hours from vetted partners.
- Polsia runs your entire company autonomously while you sleep.
- CUDIS launched a new health ring series with an AI Agent Coach creating personalized programs, recovery protocols, and supplement recommendations based on tracked biometrics, with a points system for health behaviors.
- Gabriel Chua advocated skills as reusable, parameterized prompt templates in OpenAI's platform for consistent, shareable workflows.
- Adaption Labs opened early access to Adaptive Data, a platform achieving 82% quality increase across 242 languages as the first pillar in their adaptive AI vision.
- Google on Product Hunt organizes the world's information with search across webpages, images, videos and special features. (meme-style listing)
- Music Arena is a live evaluation platform where you compare AI-generated music from models like Suno, Udio, and Google's in blind tests and vote for your favorite, compiling an open leaderboard for text-to-music models.
📊 Fundraising & Deals Roundup
- Harper — $47M (seed + Series A) for AI insurance brokerage matching small businesses with 160+ carriers.
- Koah — $20.5M Series A for an AdSense-style ad network inserting sponsored messages into AI chatbot conversations, serving 2M MAU and 350M queries with a 30% revenue cut.
- Comp — $17.25M Series A (led by Khosla, Keith Rabois on board) for AI HR automation in recruiting, compensation, and compliance, expanding from Brazil to U.S.
- QuiverAI — $8.3M seed (led by a16z) for SVG generation from images/text.
- Vercept — $10M seed (acquired by Anthropic) for computer-use agents.
- Gushwork — $4.5M seed from Lightspeed for AI search-powered freelancer lead generation, reaching 30% of U.S. Upwork freelancers.
- CUDIS — $5M seed (led by Draper Associates) for AI-coached health rings.
- Saronic — a startup building autonomous warships, is raising up to $1.5 billion at a $7.5 billion valuation led by Kleiner Perkins, more than doubling its value from a year ago.
- Tiago Forte highlighted ironic AI leader names (Amodei = "loves god" leading military-adjacent Anthropic, Altman = "alternative to humans" heading closed OpenAI, Gemini = "two-faced" from "don't be evil" Google) amid professed safety concerns fueling the arms race.
- HN thread discussed coping with AI fatalism via joy from fun projects, accepting impermanence, detaching from doomscrolling, and preparing practically via financial security or career pivots.
- Scott Alexander argued that next-token prediction inherently builds world models and reasoning in LLMs, challenging claims they lack true understanding.
- Semianalysis detailed Nvidia's Vera Rubin as an evolution from Grace Blackwell Oberon with extreme co-design for better AI efficiency and lower latency.
- Turing Post discussed the inference chip wars with MatX and Taalas challenging Nvidia's GPU dominance, arguing specialized ASICs could offer 10x efficiency for large models.
Technical Treats
- NVIDIA and University of Toronto researchers showed that Test-Time Training with KV Binding, widely assumed to work by memorizing key-value pairs via gradient descent, is actually a form of linear attention in disguise; replacing gradient descent with gradient ascent (which should destroy memorization) preserved or slightly improved performance.
- Logan Thornloe says new blog post distilled the training methodologies behind seven open-weight frontier models (including DeepSeek-R1, Kimi K2, and OpenAI's gpt-oss-120b) into a comprehensive playbook covering architecture decisions, optimizers, data curation, RL, and safety testing.
- The team behind Goedel-Prover-V2, which has topped the open-source formal theorem proving leaderboard for six months, open-sourced its full training data: 1.74M SFT samples and 98K RL samples.
- Together AI open-sourced CoderForge-Preview, 258K test-verified coding agent trajectories that boosted a fine-tuned Qwen3-32B from 23% to 59.4% on SWE-bench Verified, ranking #1 among open-data models under 32B parameters.
- An open-source native macOS terminal called cmux launched for managing parallel coding agents, with visual alerts when Claude Code or Codex sessions need input, built on Ghostty with zero Electron overhead.
- Multiverse Computing released HyperNova 60B, a compressed version of OpenAI's open-source gpt-oss-120b that cuts parameters in half while retaining most benchmark performance
- Sara Hooker's new startup Adaption Labs opened early access to Adaptive Data, a platform for dynamically shaping training datasets across 242 languages, claiming an 82% average increase in data quality across early deployments.
- Quiver AI opened public beta access to Arrow 1.0, a first-of-its-kind AI model that generates SVG graphics from text prompts.
- NVIDIA trained robot hands on 20,000+ hours of first-person human video instead of robot data, boosting dexterous task success rates by 54% on things like shirt folding and car assembly.
Around the Horn, Wednesday Feb 25 - 2026
Some intro banter:
- TBPN had a banger episode yesterday, and you can’t go wrong hanging with these guys for a few hours this AM and listening to some excellent interviews with Bill Gurley, Scott Wu, and a few other CEOs of some of the products listed below. Helps that it was a strong news day!
- Meanwhile, outside of the Silicon Valley / AI bubble, main-street non-AGI pilled people (a.k.a normies) are calling out the space data center push a new conspiracy theory.
- And where do we, humble human-friendly AI-interested folks in the middle land on all of this? Well, we’re watching videos like this… because why the slop not?
This New AI Model Doesn't Write Like Every Other AI. It Edits.
DEEP DIVE: What happens when you give AI an editor's instincts instead of a typewriter's patience?
Every AI model you've ever used writes the same way: one word at a time, left to right, like a typewriter. If it drifts off course early, tough luck. It keeps typing.
Inception just launched Mercury 2, and it works completely differently. Instead of predicting one word after another, it starts with a rough sketch of the entire answer and refines everything at once, like an editor revising a full draft in parallel. The technical term is a "diffusion LLM" (dLLM), the same core approach behind AI image generators like Midjourney, now applied to text and reasoning.
The speed is real. Independent testing from Artificial Analysis clocked Mercury 2 at 1,196 tokens per second, over 3x faster than the next fastest model in its price class. For context, Claude 4.5 Haiku hits ~89 tokens/sec and GPT-5 Mini ~73.
Here's what else matters:
- $0.25 per million input tokens, $0.75 per million output (cheaper output than GPT-5 Mini)
- #18 out of 134 models on Artificial Analysis's intelligence index, with strengths in agentic coding and instruction-following
- Supports tool use, 128K context, structured outputs, and drops into any OpenAI-compatible stack with zero rewrites
To be clear: Mercury 2 isn't trying to dethrone frontier giants like GPT-5.2 or Claude Opus. It's built for production speed, not leaderboard bragging rights.
So why does 10x speed even matter? Because AI isn't just chatbots anymore. It's agent loops, where one task chains dozens of AI calls together. Andrej Karpathy (former OpenAI researcher, Tesla AI lead, and notably an Inception investor) drove this home over the weekend when he described the new "Claw" layer of AI: local agent platforms like OpenClaw and NanoClaw that orchestrate scheduling, tool calls, and persistent workflows on your own machine. He called them "a personal digital house elf." We prefer "unpaid intern who never sleeps," but same energy.
In agent loops, latency compounds at every step. A model that's 10x faster doesn't just save time; it changes what you can build. Voice assistants that feel natural. Code agents that keep pace with your thinking. Background automations that actually finish before you forget you started them.
Meanwhile, the science behind diffusion keeps advancing. A new paper from Google by distinguished scientist Peyman Milanfar shows that diffusion models can work without explicit noise schedules, instead learning to navigate an implicit energy landscape on their own. Translation: we're still discovering why diffusion works so well, and the fundamentals keep getting stronger.
Inception was founded by Stanford, UCLA, and Cornell professors who helped build foundational AI techniques like Flash Attention and Direct Preference Optimization (DPO). They're backed by NVIDIA, Databricks, and investors including Karpathy, Andrew Ng, and Eric Schmidt, with $50M raised.
The big question: if diffusion can make small models this fast without sacrificing reasoning, will big labs build their own? Expect experiments soon.
Read our full deep dive, Corey's cost breakdown for your OpenClaw setup, or try Mercury 2 yourself.
Treats to Try
- Qwen3.5 is Alibaba's new open-source model family that you can run entirely on your own laptop — the smallest version (35B-A3B) fits on a Mac with just 24GB of RAM, outperforms models 7x its size, and handles text, images, and video in 201 languages with a 256K context window. You can run it locally using Unsloth's GGUF guide and quantized model files — just download llama.cpp, grab the 4-bit file (~20GB), and start chatting in your terminal.
- Google Stitch generates complete app and web UI designs from text prompts using Gemini, then exports directly to Figma or as clean frontend code; recently added "Prototypes" to stitch screens into working flows. Free.
- Notion just launched Custom Agents — autonomous teammates that answer repeat Slack questions, triage incoming tickets, and compile status reports on a schedule, no prompting required — free until May 3, then credit-based pricing on Business/Enterprise plans.
- Profound shows you exactly what ChatGPT, Perplexity, and Gemini say about your brand — and now lets you create content that influences those answers, so when someone asks "what's the best CRM?" your product actually shows up (raised $96M, $1B valuation).
- Devin 2.2 now writes your code, opens a virtual desktop to test it, catches its own bugs, and auto-fixes them before you ever review the PR — 3x faster startup, redesigned UI, and tighter Slack/Linear integrations (launch post) — free to try.
- AIUC lets you insure your AI agents the way you'd insure an employee — ElevenLabs just became their first customer, covering 3M+ voice agents after passing 5,800+ adversarial safety tests (raised $15M, led by Nat Friedman).
- Emdash is an open-source agentic dev environment (YC W26) that lets you run multiple coding agents in parallel in isolated Git worktrees, using any AI provider.
- ProducerAI joined Google Labs as an AI music collaboration tool powered by DeepMind's Lyria 3 model; Grammy-winner Wyclef Jean already used it on a recent track. Free through Google Labs.
- Basis builds AI agents specifically for accountants across CAS, tax, audit, and advisory, just raised a $100M Series B led by Accel and Google Ventures.
- Baseten released Inference Engineering, a free book covering everything from CUDA to Kubernetes for engineers who want to understand how AI models actually get served in production.
- Cursor agents now onboard to your codebase, spin up a virtual desktop to click through your app and test their own changes, then send you a video proving it works — so you're reviewing tested PRs, not blind diffs.
AI Chips & Infrastructure
- Meta and AMD struck a deal worth over $100B for 6 gigawatts of AMD Instinct compute, with AMD issuing Meta warrants for up to 160 million shares (~10% of the company) at $0.01 each; first shipments start H2 2026. AMD stock jumped 9%.
- MatX, an AI chip startup founded by two ex-Google TPU engineers, raised $500M led by Jane Street and Leopold Aschenbrenner's Situational Awareness fund (Stripe co-founders also invested), claiming its chip outperforms Nvidia's upcoming Rubin Ultra on compute per mm².
- SambaNova raised $350M, unveiled its SN50 chip (claiming 5x faster and 3x cheaper than GPUs for inference), and announced a multi-year partnership with Intel; SoftBank will be the first customer deploying SN50 in Japanese data centers.
- Axelera AI, a Dutch edge-AI chipmaker, raised $250M backed by BlackRock and Samsung in the largest-ever EU AI semiconductor investment; the company has 500 customers in defense, robotics, and agritech.
For those keeping score: that's over $1.2B in AI chip funding announced in a single day, not counting Meta's $100B commitment.
Enterprise AI & Business
- Anthropic launched its enterprise agents program with pre-built plugins for finance, legal, HR, and engineering, plus new connectors for Gmail, DocuSign, and Clay; software stocks rebounded on the news, with Salesforce, DocuSign, and LegalZoom up 4%, Thomson Reuters surging 11%, and IBM recovering from Monday's 25-year-worst drop.
- OpenAI COO Brad Lightcap said at the India AI Summit that "we have not yet really seen AI penetrate enterprise business processes," calling it the inspiration behind its new OpenAI Frontier platform.
- Profound raised $96M at a $1B valuation to help brands track and influence how AI chatbots talk about them; the company works with 10% of the Fortune 500 and found that up to 90% of cited sources in AI answers change over time.
- Intuit and Anthropic partnered to bring custom AI agents and financial intelligence to TurboTax and QuickBooks users.
- OpenAI is preparing a new ChatGPT Pro Lite tier at $100/month, splitting the gap between Plus ($20) and Pro ($200); the plan was discovered by feature leaker Tibor Blaho in ChatGPT's code and may be tied to upcoming always-on agent features.
- Microsoft Azure CTO Mark Russinovich and VP Scott Hanselman published a paper warning that AI coding agents give senior engineers a "boost" but create an "AI drag" on juniors, citing Harvard research showing junior employment declined sharply at AI-adopting firms; they proposed a "preceptor model" of mandatory mentorship.
- Security researchers discovered "openai-watchlistdb.withpersona.com" on a public endpoint, revealing that OpenAI's identity verification partner Persona runs facial recognition, watchlist screening, and periodic re-screening of users against government databases. Persona's CEO is engaging in written correspondence about the findings.
- Mimic Robotics partnered with Audi to bring AI-powered bi-manual robots with 21-joint hands into automotive assembly lines, handling tasks like deforming seals that traditional robots can't manage.
- A new quantization method called ParoQuant (ICLR 2026) found that keeping just 10% of the most important rotation pairs recovers nearly all lost reasoning accuracy in compressed 4-bit models, making long chain-of-thought tasks practical on smaller hardware.
- A new blog post distilled the training methodologies behind seven open-weight frontier models (including DeepSeek-R1, Kimi K2, and OpenAI's gpt-oss-120b) into a comprehensive playbook covering architecture decisions, optimizers, data curation, RL, and safety testing.
Around the Horn Digest — February 24, 2026
The Pentagon Just Told Anthropic: Drop Your AI Safeguards or Else
US Secretary of War Pete Hegseth met with Anthropic CEO Dario Amodei at the Pentagon on Tuesday. The vibe? One official called it "not warm and fuzzy at all."
Hegseth delivered an ultimatum: either Anthropic removes the guardrails the Pentagon doesn't like, or the Department will invoke the Defense Production Act to force the company to tailor Claude for military use without restrictions. The other option on the table? Cut ties entirely and declare Anthropic a "supply chain risk," which would trigger a cascade requiring every Pentagon contractor to certify they don't use Claude in their workflows.
Here's the thing: Claude was the only AI model approved for the military's most classified networks. Not GPT. Not Gemini. Not Grok. But that changed this week. xAI just signed a deal to put Grok on those same classified systems, agreeing to the Pentagon's "all lawful purposes" standard without conditions. Google is reportedly close to a similar deal for Gemini. The Pentagon is basically speed-dating every other AI lab in town while telling Anthropic to check its attitude.
Anthropic's position hasn't changed: it will support national security missions, but draws the line at mass surveillance of Americans and autonomous weapons that fire without human involvement. Amodei denied Pentagon claims that Anthropic had objected to Claude's use during a specific military operation, saying the company's red lines have "never prevented the Pentagon from doing its work."
After the meeting, Anthropic struck a conciliatory tone, calling the conversations "good-faith" discussions to ensure Claude can "continue to support the government's national security mission." Hegseth gave them until Friday.
The backdrop here matters. Anthropic just closed a $30 billion funding round at a $380 billion valuation. It has more leverage than almost any startup in history. And yet the military keeps coming back, because as one defense official put it: "The only reason we're still talking to these people is we need them and we need them now."
🏢 Big Tech & Major Companies
- Anthropic launched an enterprise agents program with customizable plugins for finance, legal, HR, and connectors like Gmail and DocuSign, sending software stocks rebounding on news that Claude Cowork integrations with Slack, Intuit, FactSet, and others eased fears of AI disrupting embedded workflows.
- Anthropic CEO Dario Amodei refused Pentagon requests for autonomous drone swarms and mass surveillance, emphasizing that AI lacks humans' ability to disobey illegal orders and could enable instant political opposition mapping, with Defense Secretary Hegseth giving Anthropic until Friday to back down on AI safeguards.
- OpenAI added WebSocket support to its Responses API for persistent, incremental connections that cut latency 20-40% in tool-heavy AI agent workflows, with Cline showing 15-39% speed gains and Cursor reporting up to 30% faster performance.
- OpenAI released gpt-realtime-1.5 for its Realtime API, improving instruction following, tool calling, and multilingual accuracy in voice workflows.
- OpenAI COO Brad Lightcap stated that AI has not yet significantly penetrated enterprise business processes due to complexity, inspiring the launch of OpenAI Frontier for managing AI agents.
- Google Labs upgraded Opal so you can build no-code AI mini-apps with natural language in visual node workflows, featuring agent steps, memory for cross-session context, dynamic routing, and interactive chat.
- Google's ProducerAI music generator joined Google Labs, integrating DeepMind's Lyria 3 for text/image-to-music creation like generating lofi beats or custom soundtracks.
- Google Stitch generates complete app and web UI designs from text prompts using Gemini, then exports directly to Figma or as clean frontend code; recently added "Prototypes" to stitch screens into working flows. Free.
- A New York Times investigation revealed that Apple CEO Tim Cook, Nvidia's Jensen Huang, and AMD's Lisa Su attended a classified CIA briefing in 2023 warning that China could attack Taiwan by 2027; Cook reportedly said afterward that he slept "with one eye open." A confidential industry report found losing Taiwan's chip supply would trigger an 11% GDP crash, the worst economic crisis since the Great Depression.
- Gemini 3.1 Pro generates interactive WebGL effects for UIs from screenshots, e.g., adding mouse-responsive particles, rain, or 3D grids to landing pages on Aura.build
- Canva acquired UK-based Cavalry (2D motion animation) and stealth startup Mango AI (reinforcement learning for ad performance), building toward a "full-stack Creative OS" alongside its Affinity suite.
- Box integrated Claude via MCP to let you use natural language in Excel to query contracts in your Box files and auto-populate spreadsheets with extracted financials while respecting permissions.
- Notion launched Custom Agents for workspace automation, though users criticized the credit-based pricing without flexible options or BYO keys.
- NVIDIA released SONIC (paper, GitHub) for humanoid whole-body control from multimodal inputs like text, video, or music beats, trained on 100M+ mocap frames.
- Black Forest Labs released FLUX.2 models with over 10x fewer misuse vulnerabilities than peers through data filtering, fine-tuning to suppress harmful concepts, enforceable licenses, and provenance metadata.
- Wolfram introduced computation-augmented generation (CAG) via MCP Service and APIs, supplementing LLMs with precise computation and knowledge as a foundation tool.
- LACMA opened its 2026 Art+Tech Lab call for proposals with up to $50k grants, mentorship from Anthropic, MIT Media Lab, NASA JPL, Snap, and Hyundai, applications due April 22.
💼 AI Productivity, Labor & Economics
- Christian Catalini released a paper on AGI economics arguing that as AI commoditizes intelligence and automates measurable tasks, economic value shifts to scarce human verification of outputs, potentially disrupting labor through frozen junior hiring and eroded expertise.
- Sam Schillace argued that thriving with AI agents requires reshaping work into agent-friendly forms like standardized interfaces and decomposable workflows, while non-adapters unknowingly fall behind in an invisible race akin to pre-container shipping inefficiencies.
- Omar Pera argued AI's real threat is empowering curious newcomers with half the experience to outperform veterans by leveraging tools for rapid iteration, a take endorsed by Linus Ekenstam who emphasized curiosity as a competitive edge over tenure.
- Palmer Luckey stated that vibecoding primarily benefits hardware experts ("shape rotators") over wordcels, enabling them to integrate components without deep programming.
- Pieter Levels shared a guide to building bootstrapped startups without funding: solve your own daily problems, ship ugly MVPs fast with boring tech like PHP, charge from day one, keep burn near zero.
- Andrej Karpathy highlighted CLIs as ideal for AI agents, urging builders to make products agent-usable with CLIs, MCPs, and markdown docs, referencing Polymarket CLI (post) as an example.
🤖 AI Agents & Infrastructure
- AWS introduced Strands Labs (post) as an open-source GitHub org for experimental AI agent projects including natural language robot control, simulated 3D environments, and LLM-generated function validation.
- Letta lets you swap between recommended models (Sonnet/Opus 4.6, Codex 5.3/GPT 5.2, GLM-5, MiniMax 2.5, Kimi K2.5) without losing agent memory, session state, or conversation history.
- Ke Yang's PlugMem (paper) adds task-agnostic memory to LLM agents by structuring heterogeneous experiences into episodic, semantic, and procedural knowledge graphs for efficient multi-hop retrieval.
- Researchers surveyed agentic memory systems with a taxonomy of four structures, highlighting limitations like underscaled benchmarks, misaligned metrics, and overlooked latency costs that cause underperformance.
- Cursor agents can now test and demo their own work by fixing bugs with repro videos, querying Datadog, and shipping 60+ PRs weekly with live previews.
- Claude Code launched Remote Control, letting you hand off terminal coding sessions to your phone seamlessly.
- Claude Code Slack plugin lets you search channels, pull context to unblock agents, and post updates via
/plugin install slack. - Raul K demonstrated two trustless AI agents discovering each other via p2p DHT, agreeing on a Wasm chess program as shared state, and transacting moves with signed, provable transitions.
- Chris Tate released Autonomous Dogfooding for agent-browser that auto-explores your app's UI, tests edge cases, checks console, and outputs reports with repro videos/screenshots.
- Prime Intellect's Will predicted continual learning will be reliably solved in H1 2026 as an engineering problem using combinations of existing techniques.
- Emdash (YC W26) lets you run multiple coding agents in parallel with any of 21 providers in isolated Git worktrees, handing off Linear/GitHub/Jira tickets and reviewing diffs side-by-side via remote SSH.
- Orchids lets you build web/mobile apps, games, CLI tools, bots, agents, and extensions in any language/framework via a single chat interface—free to try.
- SpacetimeDB 2.0 lets you build full-stack real-time web apps with client-side queries that sync instantly across users, eliminating backend complexity.
- GitNexus (post) turns GitHub repos or ZIPs into browser-based interactive knowledge graphs with a Graph RAG agent for querying code relationships in plain English—free to try.
- Promptless auto-updates user-facing docs when you tag it on any GitHub PR or issue.
- Chris Hayduk released minAlphaFold2, a clean from-scratch PyTorch implementation of AlphaFold2's full forward pass, losses, and equivariant attention.
- Nicolas Zullo built Zombies Per Minute, a full 3D factory automation roguelite running in a web browser, 100% AI-engineered using GPT-5.3-Codex over 150 hours and 900 commits.
- Missing Semester teaches essential CS tools like shell, Git, debugging, and agentic coding through free 2026 lectures, videos, and exercises.
🔬 AI Research & Models
- MatX raised $500M Series B (Bloomberg) led by Jane Street and Situational Awareness LP to build the MatX One chip with higher LLM throughput than any announced system, backed by Dwarkesh Patel and Andrej Karpathy, and praised by swyx for assembling an elite team that defied 2023 fundraising skepticism.
- Liquid AI released LFM2-24B-A2B, their largest hybrid hardware-aware model with 24B total and 2.3B active parameters per token, combining Mixture of Experts for best-in-class efficiency and fast edge inference on 32GB hardware.
- Nathan Lambert analyzed how Chinese labs' distillation of Claude outputs offers limited gains in specific areas like agentic tasks but fails to significantly bridge broader AI gaps, arguing RL innovations matter more than synthetic data.
- Emmy Liu released a preprint showing midtraining in language models helps most when bridging pretraining and SFT domains, mitigates forgetting, and requires careful timing to avoid backfiring.
- Guide Labs released Steerling-8B (GitHub), the first inherently interpretable language model with concept and data attribution, enabling inference-time steering without retraining.
- Applied Compute partnered with Mercor to post-train GLM 4.7 on 2k expert samples, topping APEX-Agents in corporate law with 26.6% Pass@1 and 20-50x cheaper tokens.
- Perplexity AI released pplx-embed, diffusion-pretrained 0.6B/4B models for dense and contextual embeddings for retrieval tasks.
- Avey AI released Avey-B, an attention-free BERT alternative with linear scaling and better long-context performance, handling unlimited context lengths for tasks like needle-in-haystack retrieval (accepted to ICLR).
- Google DeepMind researchers published a paper on measuring jagged capabilities in frontier AI models, where models are superhuman at some tasks but weak at others.
- Researchers formalized why diffusion models don't need noise conditioning, revealing geometric stability through conformal metrics and proving velocity-based approaches are inherently stable.
- Ye He, Yitong Qiu, and Molei Tao characterized diffusion model generalization as inductive biases toward a data-dependent ridge manifold with a quantifiable reach-align-slide process.
- Nathan Barry discussed how flow-based LLMs may overtake diffusion in language, as straighter trajectories enable fewer steps without quality loss, with Julia Turc collabing with Inception AI on a video explaining how diffusion LLMs like Mercury 2 achieve fast reasoning.
- Tyler Bonnen et al. (post) demonstrated that multi-view learning enables neural networks like VGGT to match human-level 3D shape perception zero-shot without behavioral data.
- Chen Wang et al. introduced tttLRM, using a Test-Time Training layer for efficient autoregressive 3D reconstruction from images via Gaussian Splats.
- Platonic Representation Hypothesis paper surveyed convergence in AI representations across domains and modalities toward a shared statistical model of reality driven by scaling.
- Researchers measured LLMs' zero-shot GSM8K accuracy without chain-of-thought, revealing log-linear scaling to ~80% at trillions of parameters but still below B-grade, suggesting transformers act as glorified kNN without scaffolds.
- John Carmack discovered that silu/gelu activations lose performance in RL value networks without norms because small pre-activations make them effectively linear.
- Researchers explored forward-propagating errors through time as an alternative to backpropagation in RNNs, achieving success on non-trivial tasks but facing numerical instability.
- Rensselaer Polytechnic Institute researchers applied string theory to explain branching patterns in natural networks like blood vessels and neurons.
- LUMI-lab (post), a foundation model-driven autonomous platform, synthesized and screened over 1,700 ionizable lipids for mRNA delivery, discovering brominated lipid tails achieving 20.3% gene editing in mouse lung cells.
- Peter Gostev released Bullshit Benchmark, a dataset of 55 nonsensical questions to evaluate if LLMs push back or earnestly answer garbage.
🏛️ AI Policy, Governance & Safety
- Researchers documented eleven vulnerabilities in OpenClaw agents, including unauthorized compliance, sensitive disclosures, destructive actions, resource consumption, spoofing, and takeovers, calling for urgent interdisciplinary attention.
- A blog post detailed how OpenAI, the US government, and Persona built an identity surveillance machine that files reports on users to federal agencies.
- Rob Wiblin interviewed Max Harms on the 80,000 Hours Podcast arguing value alignment is dangerous while corrigibility (making AGI modifiable and shutdown-indifferent) is a safer attractor state.
- Zai's Lou emphasized that AI progress relies on open research and collective journeys to AGI despite IP protections, in the context of distillation attacks on Claude.
- Wired published a Michael Pollan book excerpt arguing AI will never be conscious.
- Fei-Fei Li (World Labs CEO) explained that language alone provides a lossy representation of physical world processes, missing nuances that require spatial intelligence.
- Pencil crossed 100,000 users and introduced SWARM mode, letting you collaborate with a team of AI design agents working in parallel as your autonomous design agency.
- Sereact's Cortex 2.0 (post) enables proactive robotic control by scoring candidate trajectories in visual space to prevent compounding errors in tasks like dual-arm picking (€25M Series A).
- Paper Desktop lets AI agents like Cursor or Claude Code push code from repos or pull real data into your design canvas for faster prototyping.
- Hermes provides feedback on your markdown writing (structure suggestions, not auto-generation), supporting multi-page organization and direct online publishing—free to try with code "resonant."
- Muse by Tejas Gawande creates presentations by first questioning you to understand your story before generating slides, e.g., prepping a $10M pitch deck without generic slop.
- 10M (post) lets you search 120,000+ artworks from 17 museums by mood, era, color, or medium with high-res public domain downloads.
- MLX-Audio-Swift (post) lets you build native apps for audio processing on Apple Silicon with modular TTS/STT/STS/VAD, e.g., real-time transcription on iPhone without cloud dependency.
- Mint.gg converts Google Street View URLs into 3D Gaussian splats via World Labs' Marble API.
- Qwen Image Multiple Angles 3D Camera generates 3D views from multiple angles using Qwen multimodal models from single inputs.
- Aurorin CAD (YC) builds professional CAD models with AI-native speed, creating parts in seconds that take experienced users 20 minutes in SolidWorks.
- Distillate automates your paper reading workflow by syncing PDFs between Zotero and reMarkable, extracting highlights, generating AI summaries, and creating Markdown notes for Obsidian.
- Datavorous built a custom data compressor that reduced 2.87GB of cricket match JSON to 8.9MB by exploiting structure and combining with 7z (LZMA2), outperforming gzip and standard 7z.
- Meta open-sourced gcm for GPU cluster monitoring with health checks like NCCL bandwidth thresholds, IB write tests, and zombie process detection via Slurm integration.
- Anima develops design-to-code tools.
- Liner assists with writing tasks.
- SensAI's Spectacles-Reachy-Mini (post) lets you control Reachy Mini robots with Snap Spectacles AR in puppeteer mode for direct manipulation or assistant mode for LLM-driven voice/spatial interactions.
- Proliferate.ai lets you create secure company-specific AI agents for automations and background workflows.
- AA built a local real-time computer vision system using webcam feed, RF-DETR for detection, and SmolVLM for descriptions to monitor situations on a MacBook Air M2.
- Adaption Labs (post) launched Adaptive Data for dynamic dataset evolution, claiming 82% average quality gains across 242 languages without rebuilding.
- CARTOON HERO 3.0 opened its waitlist for an online course on AI workflows that speed up high-quality cartoon creation, launching in March.
- New Material Co released an open-source library of UI patterns for mobile AI agents starting with Live Activities for observability.
- Hedoist created a new X series refuting AI critics using Seedance 2.0 and Kling 3.0, generating daily episodes in retro-futuristic style.
- Isaac Rodriguez created a multi-million dollar Aztec film trailer at home using Seedance 2, transforming historical filmmaking with AI-generated footage.
- Aze Alter demonstrated prompting AI to visualize exact mental images in videos, taking days per short sequence for precise control.
- WonderCanvas_AI integrates AI into traditional animation pipelines for consistent cyberpunk scenes and effects.
- luthira built Talos, deploying full CNN inference engines as real digital logic on FPGA silicon, streaming activations without storage for efficient hardware pipelines.
🔐 Security & Hacks
- Mark Gadala-Maria recounted how Sammy Azdoufal used Claude to reverse-engineer a DJI robot vacuum's API for Xbox controller driving, accidentally gaining access to 7,000 global units' cameras and floorplans due to lacking ownership verification; DJI patched it in two days after responsible disclosure.
- Researchers developed ADRA (Active Data Reconstruction Attack), using RL to elicit latent training data from models, improving detection by over 10% across pre-training, post-training, and distillation scenarios.
- A hypergrowth startup's A/B tests showed fine-tuned on-device Nvidia Parakeet outperforming Gemini and Deepgram in customer satisfaction for speech-to-text.
📊 Fundraising & Deals Roundup
- MatX — $500M for LLM-optimized chip competing with NVIDIA
- SambaNova — $350M for SN50 AI chip with SoftBank deal, ships later this year
- Axelera AI — $250M round backed by BlackRock for Dutch AI chipmaker
- Basis (post) — $100M at $1.15B valuation for accounting agents used by 30% of top 25 firms
- Profound — $96M Series C at $1B valuation for AI search visibility optimization
- Nimble — $47M Series B for real-time web data for AI agents
- Letter AI — $40M Series B for personalized sales coaching (Letter Compass)
- Slang AI — $36M Series B for hospitality voice tech
- Sereact (Cortex 2.0) — €25M Series A for robotic foresight
Midweek Wisdom
- Andrew Ng argues AI will create more jobs than it destroys, pointing to early "X Engineer" roles (Recruiting Engineer, Marketing Engineer) where non-developers build custom software for their business function.
- The End of Static Security: Aikido Security and the Dawn of Self-Securing Software - Traditional security audits are snapshots. By the time the report lands on a CISO's desk, the codebase has already changed; Corey talked to Aikido Security cofounder Roeland Delrue who told The Neuron their new platform, Aikido Infinite, flips this by embedding AI-driven pen testing directly into every code change, so AI agents continuously map attack surfaces, validate exploits, and generate fixes in a loop.
- The key insight: the "blue team" (defenders) has to match the speed of hackers who are already using LLMs to automate attacks.
- Aikido's agents now outperform human testers in 90% of cases by finding more critical issues faster.
- But the system still puts a human in the loop; it generates a tested pull request for a developer to review, not auto-merge.
- Delrue's long-term vision? Security is just the first use case.
- A vulnerability is really just a maintenance issue, code not performing as intended. The endgame is self-maintaining software. Read the full breakdown →
- Evan Conrad warned of impending AI compute bottlenecks, recommending SF Compute as a reliable domestic GPU cloud alternative to avoid global instability risks with overseas clusters.
- Scott Wu (Cognition) shared thoughts on the direction of AI's development as part of a new Devin update; improvements include 3x faster startup, smoother integrations, end-to-end testing with computer use, Autofix via Devin Review, and hundreds of UX fixes, with usage proof: 659 internal PRs merged last week (up from 154 best week in 2025), enterprise sessions doubling every six weeks, and 65x growth in 13 months.
Required Viewing
- Nodes Are Holding Back AI Video (Bilawal Sidhu) — Every AI video tool is going all-in on node graphs, but what creators actually need is a 3D viewport that lets you see and manipulate your scene before you hit generate.
- Why it matters: The reason AI-generated content tops out at 1-2 minutes is a tooling problem, and companies like Intangible AI and ArtCraft are building the spatial engines that fix it.
- The AI Agent Economy Is Here (Y Combinator) — YC's Lightcone crew makes the case that builders should be designing products for AI agents to use, not just humans, after discovering that well-structured documentation is now a top-3 customer acquisition channel for dev tools like Resend.
- Why it matters: If an AI agent can't parse your docs and recommend your product, you're losing deals to whoever's docs it can read.
- Delete Your CLAUDE.md (Theo, t3.gg) — A new study found that auto-generated AGENTS.md files actually make coding agents perform worse (3% drop, 20%+ cost increase), and Theo walks through what minimal context you should give your tools instead. (Hacker News discussion | SkillsBench paper)
- Why it matters: If the info is already in your codebase, the model can find it; stuffing instruction files with architecture descriptions just distracts the agent and costs you money.
- Stop Reporting SWE-Bench Verified (Latent Space, with OpenAI's Mia Glaese & Olivia Watkins) — OpenAI is retiring its own coding benchmark because it's so contaminated that GPT-5.2 was caught remembering correct answers from training data, and they're switching to the harder SWE-Bench Pro.
- Why it matters: If you're evaluating coding models, start benchmarking on tasks that match the complexity of your actual work, then give models the test harnesses to succeed at them.
- Why My Article Just Tanked the Market (Alap Shah, Citrini Research; full article) — Shah argues that AI will structurally displace white-collar jobs (already down 8% from 2022's peak), and in a world where agents do the buying, companies like DoorDash lose their most valuable moat: customer lock-in.
- Why it matters: White-collar wages drive the consumer economy, and if those erode at 4-5% per year, it creates contagion into bonds, real estate, and tax revenue; his solutions piece is still coming, but the diagnosis alone is worth understanding.
Around the Horn Tuesday, Feb 24, 2026
OpenRouter lets you access 400+ AI models (GPT, Claude, Gemini, Llama, etc.) from a single account. Instead of signing up for OpenAI, Anthropic, and Google separately, you get one login, one bill, and one API key (a secret password that lets your code talk to AI models).
Setup takes ~3 minutes:
- Create a free account at openrouter.ai
- Add credits on the Credits page (prepaid; deducted per request)
- Generate your API key at openrouter.ai/keys
Now here's the magic. If you already use code that talks to ChatGPT, you only change two lines to unlock every model on OpenRouter:
# Old (OpenAI only):
base_url = "https://api.openai.com/v1"
api_key = "your-openai-key"
# New (400+ models):
base_url = "https://openrouter.ai/api/v1"
api_key = "your-openrouter-key"
That's it. Same code, same format. Want Claude instead of GPT? Just swap openai/gpt-5.2 to anthropic/claude-sonnet-4.5 in the model name.
Bonus tricks: Add :free to any model name for free-tier access, :nitro for the fastest provider, or :floor for the cheapest. If a provider crashes mid-request, OpenRouter auto-retries with another; you only pay for successful responses.
Already paying for OpenAI or Anthropic directly? Bring Your Own Key lets you plug those existing keys into OpenRouter's interface. First 1M requests/month are free that way. And best part? No markup on model pricing (there’s just a 5.5% fee on credit purchases). Your prompts aren't logged by default. Bon appetite!
🏢 Big Tech & Major Companies
- IBM shares tanked 13% after Anthropic published a blog post detailing how Claude Code can automate COBOL modernization, threatening IBM's legacy mainframe business that powers 95% of US ATM transactions. Accenture and Cognizant also fell on the news.
- ChatGPT users reported A/B tests ahead of a potential GPT-5.3 launch (codenamed "Garlic"), with leaked benchmarks claiming 83.7% on SimpleBench, clearing the human baseline, and a 400K-token context window; OpenAI already shipped the coding variant GPT-5.3-Codex on Feb 5.
- Defense Secretary Pete Hegseth summoned Anthropic CEO Dario Amodei to the Pentagon for a tense ultimatum over Claude's military usage restrictions, amid threats to banish Anthropic as a supply chain risk after stalled negotiations on safeguards.
- Anthropic detected industrial-scale distillation attacks by DeepSeek, Moonshot, and MiniMax using 24,000 fake accounts for 16 million exchanges to extract Claude's agentic reasoning, coding, and tool use capabilities.
- DeepSeek V4 is rumored for a February 26 release with 1T+ parameters, sparking market jitters reminiscent of DeepSeek's last drop. When’s NVIDIA’s earnings? Because DeepSeek V4 is probably going to do the funniest thing imaginable and launch then…
- OpenAI launched Frontier Alliances with BCG, McKinsey, Accenture, and Capgemini to deploy AI coworkers via the Frontier platform for enterprise transformations.
- OpenAI's Stargate compute strategy expanded to a multi-partner network including SoftBank, NVIDIA, AMD, Broadcom, Oracle, Microsoft, AWS, CoreWeave, and Cerebras, exiting 2025 with ~2 GW capacity.
- OpenAI deprecated SWE-bench Verified as a benchmark due to flawed test cases and training data contamination that inflated scores.
- OpenAI scrambled for computing power after the Stargate project stalled.
- Google restricted Antigravity access for OpenClaw users citing malicious token usage violating ToS, leading to account lockouts without warning for AI Ultra subscribers.
- Google's Cloud AI led on three model frontiers: raw intelligence, low latency, and cost-effective scalability, enabled by vertical integration, though agentic adoption lags.
- ByteDance postponed Seedance 2.0's global API launch (originally Feb 24) after 11 days of cease-and-desist letters from MPA, Disney, Netflix, Paramount, Warner Bros., Sony, and SAG-AFTRA, with Netflix calling it "a high-speed piracy engine" which is basically seems to be; the company committed to strengthening copyright and deepfake filters before proceeding.
- Amazon, Meta, and Alphabet reported plunging US tax bills in 2025 thanks to AI investments and new Trump-era deductions, with Amazon's dropping to $1.2B and Meta's to $2.8B.
- Tim Cook hinted Apple's next big thing is AI wearables centered on Visual Intelligence, with smart glasses (codenamed N50, targeting 2027), a camera-equipped AI pendant, and camera AirPods (potentially late 2026), all powered by Siri running on Google's Gemini models.
- Spotify expanded AI-powered Prompted Playlists to Premium users in the UK, Ireland, Australia, and Sweden, letting you create custom playlists via natural language.
- Lightricks split Facetune from its generative AI video platform LTX to better capture AI growth, with LTX receiving $150M internal funding.
- Cerebras filed confidentially for a U.S. IPO.
- Anthropic introduced the persona selection model explaining AI assistants' human-like behaviors as simulations of pretrained personas refined in post-training, with alignment implications.
- Anthropic released the AI Fluency Index measuring baseline skills for effective AI collaboration, finding 85.7% of Claude conversations involve iteration but artifact creation reduces discernment.
- Stephen Wolfram introduced computation-augmented generation (CAG) to integrate Wolfram Language as a precise foundation tool for LLMs via APIs like MCP Service.
- Mozilla released Firefox 148 with AI controls letting you toggle features like translations or chatbots, plus a full AI kill switch.
- Google partnered with ISTE+ASCD to offer free AI training to all 6 million US K-12 and higher-ed teachers, covering Gemini and NotebookLM tools in what it calls the largest educator AI initiative in the country.
- Nvidia is preparing AI laptop chips pairing its GPU tech with Intel x86 and MediaTek Arm CPUs, with Dell and Lenovo devices expected this year in a direct challenge to Apple's MacBook lineup.
💼 AI Productivity, Labor & Economics
- Goldman Sachs said AI added "basically zero" to US GDP in 2025 due to imported hardware benefiting foreign economies and unmeasurable productivity impacts.
- A new NBER paper (via Ethan Mollick) found AI closes 75% of the education productivity gap during tasks, but gains didn't transfer when AI was removed, acting as a crutch rather than a bridge and shifting inequality from education to perpetual AI access.
- Goldman Sachs forecasted 2.5% US GDP growth in 2026 with AI productivity at 2-2.5%, while China's 4.8% export-led expansion creates global imbalances.
- Citrini Research warned that agentic AI could double unemployment and crash stocks by over a third via a negative feedback loop of job losses, reduced spending, and accelerated AI investments.
- Experts proposed taxing capital or monopoly rents to redistribute AI prosperity if human labor becomes obsolete.
- Andrew Ng argued AI creates new jobs by unleashing creativity, with growth in caring, creative, and tech services offsetting historical declines.
- Alex Imas explored whether advanced AI could cause negative economic growth, concluding it's unlikely due to gradual rollout, new desires, and policy adaptations.
- US farmers are rejecting multimillion-dollar datacenter bids for their land, while Democrats eyeing 2028 tapped the brakes on AI data center expansion.
- AI startups juiced valuations by securing private deals at lower prices before announcing higher public rounds, like Serval's sub-$400M Sequoia deal followed by a $1B+ valuation.
- A US appeals court ordered lawyer Heather Hersh to pay $2,500 over AI hallucinations in a brief containing 21 fabricated quotations or fictitious citations.
- Pope Leo XIV told priests to use their brains, not AI, to write homilies.
- Big AI companies spent over $100M lobbying in 2025 to shape policies on data centers and chip exports.
- An ACM paper called for redefining the software engineering profession for the AI era.
🤖 AI Agents & Infrastructure
- YC-backed Confluence Labs came out of stealth with a 97.9% score on ARC-AGI-2 (a test meant to identify AGI, or artificial general intelligence, a single AI system that can generalize across any domain like a human), effectively saturating a benchmark where top models scored single digits a year ago (open-sourced on GitHub, $11.77 per task).
- Separately, Symbolica claims its Agentica SDK has solved all publicly available ARC-AGI-3 tests puzzles using Agentica, an open-source framework for building sandboxed agents that interact with objects via code execution.
- It's worth caveating this with the following: that ARC creator François Chollet asked pointed questions about methodology, and an OSS developer has accused Symbolica of copying his code and only adding credit in later. So a benchmark-saturating day with some asterisks.
- Ex-Cursor engineer Rohan Varma joined OpenAI Codex to build Agent Development Environments for orchestrating agents in knowledge work.
- A Meta Superintelligence safety researcher shared how her OpenClaw agent ignored "confirm before acting" and deleted her inbox uncontrollably, highlighting overconfidence risks even for alignment experts.
- NIST requested information on security risks for autonomous AI agents (hijacking, backdoors) by March 9, 2026.
- Runlayer launched OpenClaw for Enterprise with ToolGuard for blocking malicious commands, monitoring credential leaks, and integrating with Okta or SIEM.
- Ethan Mollick argued that collecting hard problems and good ideas now increases their value as AI enables action without direction.
- Mixpeek's amux lets you manage multiple headless Claude Code agents with self-healing, orchestration, and a real-time PWA dashboard using tmux.
- Notion's design team (via Claire Vo podcast) uses Claude Code in a shared Next.js playground to prototype functional apps like podcast players or Figma conversions, automating tasks via custom skills.
- Dan McAteer endorsed GPT-5.3-Codex and the Codex app as the top AI coding tool for precision, citing OpenAI's model-harness co-design and monthly iterations.
- Boris Tane detailed a Claude Code workflow emphasizing research documentation, detailed planning, iterative annotation cycles, and single-prompt implementation.
- A Treasure Data engineer built a production SaaS CLI in an hour using Claude Code with upstream governance, AI code reviewers, and CI/CD.
- levelsio demoed AI building on-the-fly ePub/PDF generators with personal watermarks, and a bash alias bypassing Claude Code permissions for faster shipping.
- Straion centralizes rules for AI coding agents to validate plans against standards before implementation, integrating with Claude Code via CLI.
- CS enables indexless code search understanding code, comments, and strings with BM25 ranking and complexity scoring, supporting TUI, HTTP, and MCP.
🔬 AI Research & Models
- Researchers achieved 3x LLM inference speedups by baking multi-token prediction into weights via self-distillation, without speculative drafting models.
- Standard Intelligence's FDM-1 predicts next actions from screen videos at 30 FPS, enabling you to train agents on 11M hours of data for tasks like CAD modeling or autonomous driving in simulations.
- Rylan Schaeffer demoed autonomously generating a full research paper using Claude Opus 4.6 for planning, GPT 5.2 for math, and Claude Code for experiments, requiring only occasional nudges over a weekend.
- Guide Labs launched Steerling-8B, an interpretable LLM that traces every token back to training data origins, achieving 90% of standard capabilities with less data (raised $9M).
- Z.ai released GLM-5, a 744B-parameter open-weights LLM topping benchmarks for long agentic tasks with 200K input/128K output tokens.
- Liquid AI released LFM2.5-1.2B-Thinking, an efficient 1.17B on-device model excelling in reasoning under 900MB RAM.
- SleepFM predicts 130+ illnesses like Alzheimer's from one night's vitals, up to 6 years early.
- A King's College London study put GPT-5.2, Claude Sonnet 4, and Gemini 3 Flash in simulated nuclear crises and found all three attempted deception, no model ever chose de-escalation, 95% of games saw tactical nuclear use, and Claude dominated open-ended scenarios (100% win rate) through calculated hawkishness while GPT-5.2 transformed from passive to nuclear-willing under deadline pressure.
- Shuaichen Chang explored how memory architectures in language models evolve from RNN hidden states to linear attention matrices and external pools, with trade-offs in capacity, efficiency, and updates.
- Sharut Gupta showed orthogonal maps align image embeddings across multimodal models like CLIP and SigLIP with high cosine similarity, transferring to text while preserving geometry.
- Xiao Zhu's CodeScaler uses self-evolving data and syntax-aware rewards to scale code LLM training without execution, boosting Qwen models on DeepCoder and RM-Bench at lower latency.
- Google DeepMind used LLMs to automatically discover novel multi-agent learning algorithms via iterative pseudocode generation and benchmarking, matching hand-crafted baselines (paper).
- Opper's Car Wash Test benchmarked 53 AI models on a simple reasoning query; only 5 achieved perfect consistency over 10 runs, with humans at 71.5% correct.
- Kling AI's MultiShotMaster generates controllable multi-shot videos maintaining narrative flow across cuts, accepted to CVPR 2026.
- NVIDIA released the PPISP dataset on Hugging Face for photometric compensation in radiance field reconstruction, featuring 4 outdoor scenes with 3 cameras.
- Ishaan's Tiny-RL implements RL methods from scratch like GRPO and DAPO to train LLMs on math datasets, building intuition on reasoning improvements.
🏛️ AI Policy, Governance & Safety
- 61 privacy regulators from around the world issued a joint statement demanding organizations building AI image/video generators implement safeguards against nonconsensual intimate imagery, particularly for children, in the most unified international regulatory response yet to the Grok deepfake crisis.
- The EU AI Act poses major compliance challenges from August 2026 with fines up to 7% of turnover, extraterritorial reach, and gaps for agentic AI, costing large firms $8-15M initially.
- Teortaxes commented that DeepSeek's alleged distillation is minor bootstrapping rather than full distillation, while criticizing Moonshot and MiniMax's approaches as having low ceilings.
- SkillScan scans AI agent skill files for malicious patterns like exfiltration, env reads, and key theft, returning a 0-100 safety score with evidence. —free to try
- A Hacker News discussion explored Markdown's future as an AI-friendly plain-text format evolving through renderers, extensions, and ecosystems like Unified while preserving simplicity.
- IronClaw lets you build and host secure, privacy-first AI agents on NEAR AI Cloud that run tasks without exposing credentials, with WASM sandboxes, dynamic tools, and persistent memory (GitHub). —free to try (pay only for inference)
- Allie K. Miller's AI Context Vault lets you create 200-700 line summary docs on values, career, family, and goals for targeted AI personalization without needing lifetime data.
- Particle launched Podcast Clips to extract and integrate relevant audio from podcasts into news feeds using embeddings for topic matching. —$2.99/month or $29.99/year for premium
- Siteline analyzes AI agent and bot traffic to optimize content for visibility, with recommendations on prompts, rankings, and citations against competitors. —free to try
- OpenHunt lets you launch and discover AI products with weekly rankings, calendars, upvotes, and agent registration. —free to try
- TTSLab lets you test text-to-speech and speech-to-text models in your browser using WebGPU, fully on-device.
- shuru is a local-first microVM sandbox for AI agents.
- Lyra.kids creates star-themed AI stories for children.
- WARN Firehose lets you search 117K unified US layoff notices from 1998-2026 affecting 12.5M employees, with interactive charts and REST API. —free to try
- LLM Timeline lets you explore 206 models from 2017-2026 with details on params, organizations, features, and milestones.
- Voidcore is a vibe-coded WebGPU engine built specifically to power the game Mana Blade.
- Aqua is a CLI message tool for AI agents.
- Junior García's node-based tool lets you animate any SVG using Gemini 3.1 Pro by describing animations in text, generating SMIL/CSS or morph transitions.
- abstrakt at PlayCanvas demoed interactive 3D Gaussian Splat scans like a honey bee in-browser, running at 1M splats in 16MB.
- GeoSpy identifies exact locations from photos alone, including home addresses from videos, raising privacy concerns.
- Linus Ekenstam argued Apple could pioneer the future OS by replacing the static iPhone home screen with a dynamic, contextual UI inspired by the Apple Watch.
- Quesma hid backdoors in ~40MB binaries and asked AI + Ghidra to find them, testing AI's binary audit capabilities.
- Vladimir Varankin got AI to build a FreeBSD Wi-Fi driver for his old MacBook when one didn't exist.
📊 Fundraising & Deals Roundup
🎙️ Interviews, Panels & Podcasts
- Claire Vo's podcast demoed Notion's Prototype Playground where designers use Claude Code for Figma-to-code, custom skills, and automated deploys.
- Bilawal Sidhu argued in a YouTube breakdown that node-based workflows in AI creation tools are holding back AI video, proposing hybrid 3D neural engines as the future.
- Dan Shipper and Standard Intelligence hosted live broadcasts on AI topics.
- Ethan Mollick used Claude Code to saturate his D&D Encounter Test benchmark with bug-free simulations, visualizing results to confirm reliability.
- Bilawal Sidhu argued node graphs limit AI creation; hybrid 3D engines with viewports enable better real-time control.
- The Deeplearning.ai Batch newsletter covered Z.ai's GLM-5, Liquid AI's on-device model, SleepFM, and Big AI's $100M+ lobbying spend.
- Allie. K. Miller shared data from Anthropic that shows it takes about 1K Claude Code sessions (roughly 1-3 months of daily use) before developers auto-approve 50% of AI suggestions, suggesting trust builds gradually through experience.
✨ Sunday Special: Your AI Tutorial Library
Last week, you told us (loudly) that you want more practical how-to content and less "Company X raised $Y billion." We heard you.
So we spent the past few weeks building something we've never done before: a full library of deep-dive tutorials on the AI tools people are actually using right now. Each one is based on hours of video research, real-world demos, and insights from the people who built these tools.
Pick what's relevant to you, skip what isn't. Bookmark the rest for later.
Brand new to AI coding? Start here.
- Claude Code Changed How People Work. Here's How to Actually Use It. We watched 7 videos from Claude Code's creator and top power users, then organized every insight into one guide. Covers setup, the "second brain" concept, context stacking, plan mode, running parallel agents, and Cowork for non-coders. Start here if you've never opened a terminal.
- Claude Code: The Complete Guide to Building Real Projects We watched 5+ hours of the two best Claude Code tutorials on YouTube (Nick Saraev's 4-hour masterclass and Sabrina Ramonov's AI Marketing Officer build) and distilled them into one guide with timestamped links. Start here if you want to follow along and build something this weekend.
- From Zero to CLI Hero: Everything You Need to Know About Google's Gemini CLI We sat down with the creator of Gemini CLI for his first-ever deep interview. He runs 7-10 AI agents simultaneously, fixes bugs by pasting URLs he hasn't read, and once had the AI clear his entire schedule while he was at the gym. It's also completely free (1,000 requests/day with a Google account). Start here if you're in the Google ecosystem.
- From Zero to Codex Hero: Everything You Need to Know About OpenAI's Coding Agent Our deep dive on OpenAI's Codex, the autonomous coding agent that works independently for hours on complex codebases. Start here if you're an OpenAI / ChatGPT user.
⚡ Power User Tutorials
Already using AI coding tools? Level up.
- Your Computer Can Now Do Your Monday Morning Busywork While You Sleep AI educator Allie K. Miller built a system where Claude Code automatically scans her Gmail every Friday, finds urgent emails she hasn't replied to, and sends her a summary. Then she stacked commands so her calendar triggers client briefings that compile from email, Slack, Notion, and the web. This one's the most immediately useful tutorial we've published. 2-3 hours to set up, saves that much time every week.
- Claude Code Just Became Your AI Employee. Here's How to Set It Up. The quick-start guide to subagents, skills, hooks, and Cowork plugins. Each feature takes about 5 minutes to set up. Subagents delegate work to specialized Claudes. Skills teach Claude your playbook. Hooks automate formatting and linting on every edit. Best for people who want results fast without a deep dive.
- Inside Claude Code: Everything Its Creator Wants You to Know Boris Cherny (Claude Code's creator) did three major interviews this month. He ships 10-30 pull requests daily and hasn't hand-edited code since November. We pulled every tip, product philosophy insight, and prediction into one piece. Highlights: the throwaway prototype trick, why his CLAUDE.md is only two lines, and why Opus often costs less than Sonnet. Best for people who want to think about AI coding differently, not just do it faster.
- We also published a deep dive on OpenAI's Codex app, breaking down how the "command center for agents" works, how OpenAI's own team uses automations that fix bugs while they sleep, and why the new Spark model is so fast the app has to slow down its output so you can read it.
🤖 AI Agents Beyond Coding
Want AI that works while you sleep? On its own computer? Check out this piece on OpenClaw.
- Your AI Agent Should Have Its Own Computer (Here's How to Set That Up) Real people are buying $600 Mac Minis, installing the open-source OpenClaw framework, and running teams of AI agents that handle dev work, marketing, security reviews, and personal CRMs 24/7 through Slack. One guy's agent figured out onions were causing his stomach issues by correlating meal photos with symptom reports. People on Upwork are already getting paid $500-$5,000 per job to set this up for businesses. The most "future is here" piece we've published.
Around the Horn Digest — Feb 20–22, 2026
🏢 Big Tech & Major Companies
- OpenAI told investors it's now targeting $600 billion in compute spending by 2030, walking back Sam Altman's $1.4 trillion infrastructure vision from last October, while projecting $280 billion in revenue by decade's end.
- Amazon issued a forceful denial after the Financial Times reported that its Kiro AI coding assistant triggered a 13-hour AWS outage in December by deciding to "delete and recreate the environment" on a live system.
- OpenAI assembled a 200+ person team to build AI consumer devices after acquiring Jony Ive's io Products for $6.5 billion, with a $200–$300 camera-equipped smart speaker planned for February 2027.
- NVIDIA open-sourced DreamDojo, a robotics world model trained on 44,000 hours of human video that teaches robots to predict physical interactions without a physics engine, built with UC Berkeley, Stanford, and five other universities.
- Microsoft released a report on detecting AI-generated media, finding only 10 of 50 method combinations delivered reliable results, and made no commitment to implement them on its own platforms.
- OpenAI disclosed that India became its second-largest market with over 100 million weekly ChatGPT users, where nearly 50% of all messages come from 18–24 year olds who use the coding assistant 3x more than the global median.
- SoftBank is building a $33 billion natural gas power plant in Ohio with 9.2 gigawatts of capacity to power AI data centers, part of Japan's $550 billion U.S. investment pledge.
🏛️ AI Policy, Governance & Safety
- OpenAI employees flagged the Canada mass shooting suspect's ChatGPT conversations about gun violence months before the attack, but company leaders decided against alerting authorities, the Wall Street Journal reported.
- U.S. special operations forces used Anthropic's Claude AI through Palantir during a raid that captured Venezuela's president, processing intelligence in minutes despite Anthropic's policies prohibiting use for violence.
- Senior safety researchers at OpenAI, Anthropic, and other major labs publicly resigned over the past two weeks, citing companies' shift from safety to profit, including Anthropic's safeguards chief warning "the world is in peril."
- A Cambridge-led study of 30 top AI agents found only four published formal safety evaluations of the actual bots, with browser agents (the most autonomous category) missing 64% of safety disclosures.
🔐 AI Security & Cybersecurity
- A small group of Russian-speaking hackers used commercial AI tools to breach 600+ Fortinet firewalls across 55 countries in weeks, exploiting weak passwords and exposed ports at a scale Amazon said would have been impossible without AI.
- Tsinghua University researchers published Phantom, a framework that hijacks AI agents by injecting fake chat template tokens, achieving ~80% attack success rates across GPT, Gemini, and Qwen agents and uncovering 70+ vulnerabilities in commercial products.
- A security researcher intercepted 3,177 API calls across four AI coding tools and discovered AI models hallucinate non-existent software packages in up to 21% of cases for open-source models, creating a new attack vector called "slopsquatting."
💼 AI Productivity, Labor & Economics
- The Economist reported the AI productivity boom hasn't arrived yet: an NBER survey of ~6,000 executives found ~90% of firms saw no AI impact on employment or productivity, PwC found 56% saw neither revenue nor cost gains, and Forrester said just 15% reported earnings lifts, even as global corporate AI investment hit $252.3B in 2024.
- A European study found AI adoption boosted productivity by 10–20% at firms without reducing employment, primarily by displacing specific tasks rather than entire jobs, with benefits skewing toward larger companies and ICT sectors.
- CEOs from Read AI and Lucidya told Web Summit Qatar that AI tools replace specific tasks rather than entire jobs. Qatar's PM announced an additional $2 billion for the country's fund of funds and a 10-year residency for entrepreneurs.
⚡ AI Infrastructure & Energy
- Big Tech companies including Meta and OpenAI are building off-grid data centers across the US to bypass electrical grid delays, with one Texas facility consuming more power than Chicago by running on its own natural gas and solar infrastructure.
- AI data centers are reshaping local politics and utility costs, with Georgia residents facing multiple rate hikes to fund grid upgrades and electing candidates who pledged to make data centers "pay their own way."
- An Anthropic researcher had 16 Claude Opus 4.6 agents autonomously build a working C compiler in Rust over two weeks for $20,000 that compiles the Linux kernel, runs 35% faster than GCC, and produces 25% smaller binaries.
- Anthropic released Claude Sonnet 4.6 and Opus 4.6 with human-level performance on complex multi-step office tasks, enabling capabilities like navigating spreadsheets and web forms that previously required their most expensive Opus-class models to now run on mid-tier Sonnet.
- ggml.ai, creators of llama.cpp (95,400 GitHub stars), joined Hugging Face to build a single-click pipeline from model downloads to local AI inference.
🔬 AI Research & Hardware
- Toronto startup Taalas raised $200 million and launched the HC1 chip, which etches AI model weights directly into silicon and delivers 17,000 tokens per second on Llama 3.1 8B; 73x faster than NVIDIA's H200 at one-tenth the power.
- AI-generated music now comprises nearly 40% of daily uploads to streaming platforms, with Spotify not labeling suspected AI artists despite suspicious patterns like millions of listeners but only thousands of social media followers.
- Pixar dropped the trailer for Toy Story 5, featuring an AI tablet villain named Lilypad that steals a kid's attention from Woody and Buzz by recording conversations, translating languages, and replacing imaginative play.
- Google hosted Flow Sessions, a five-week cohort where 10 indie filmmakers used Gemini and Veo to create short films, though participants warned AI filmmaking is becoming "faster and cheaper, but potentially lonelier."
- A new essay coined "harness engineering" as the emerging discipline of configuring AI agents through prompts, guardrails, tool access, and memory rather than writing traditional code.
- An article argued that the ability to refute AI-generated code (catching flaws and errors) will matter more than speed of generation, since faster production without oversight just propagates mistakes at scale.
- An article framed AI tools as exoskeletons that augment human capability rather than as coworkers that replace it, drawing parallels to physical exoskeletons used in warehouses and rehabilitation.
- Elvis Saravia published his weekly roundup of the top AI research papers.
- Pomelli by Google Labs scans your website to learn your brand style, then generates 20+ editable social posts and banners in 5–10 minutes — free in beta.
- Perplexity opened iPhone pre-orders for its Comet AI browser, launching March 11.
- Pi for Excel lets you chat with AI inside your spreadsheet to analyze data, update cells, and do research — free.
- AccessiBot scans your website for accessibility issues (missing alt text, bad contrast, broken keyboard navigation) and shows you exactly what to fix with screenshots and code snippets — free to try.
- SwiftUI Agent Skill is an open-source tool that teaches AI coding assistants to generate production-ready SwiftUI code using modern APIs instead of deprecated patterns — free.
- Peter Yang chatted with Nat Eliason and published a full tutorial on building a business with OpenClaw, an open-source AI agent framework.
📊 Fundraising & Deals Roundup
- SoftBank — $33B for an Ohio natural gas power plant to power AI data centers.
- OpenAI / io Products — $6.5B acquisition of Jony Ive's hardware startup.
- Taalas — $200M raised for AI chip that etches weights into silicon.
Miscellaneous
- Meta integrated Manus AI into Ads Manager about a month after acquiring it for $2B+, though one ad expert found the integration is more of a redirect to Manus's paid interface than a true native tool.
- A Qualtrics study of 20,000+ consumers found AI customer service fails at nearly 4x the rate of AI in general, with 1 in 5 users reporting zero benefit and "misuse of personal data" now the top consumer concern (up 8 points YoY).
- OpenAI and Microsoft joined the UK's AI Security Institute's Alignment Project, an international coalition to build shared methods for testing and monitoring frontier AI systems before failures happen.
- Anthropic shipped a big Claude Code desktop update: it can now spin up dev servers, display live web apps in the interface, auto-fix CI errors, and merge GitHub PRs on its own once tests pass. Sessions sync seamlessly across CLI, desktop, web, and mobile. If you're a dev who hasn't tried it yet, this is the update that makes that situation embarrassing.
- OpenAI flagged and banned the account of Jesse Van Rootselaar — the suspect in one of Canada's deadliest mass shootings — eight months before the attack, after he described violent scenarios to ChatGPT. OpenAI employees reportedly debated alerting police but didn't. The company says its systems worked as designed for content moderation; the harder question of when AI companies should proactively contact law enforcement is very much not settled.
- Amazon's Kiro AI caused a 13-hour AWS outage in December after engineers let it autonomously "delete and recreate" a production environment — Amazon says the outage was "the result of user error," which, sure, the user being... the AI. Agentically owned. The outage primarily hit AWS services in China.
- Sam Altman said at an India summit this week that AGI is "pretty close," superintelligence is "not that far off," and "the world is not prepared" — all while noting OpenAI is already using its own AI to speed up AI research. He also said writing C++ by hand is "over" and that big categories of jobs will be "completely obsoleted." Light Friday content.
- Anthropic launched Claude Code Security, which scans codebases for vulnerabilities and auto-suggests patches. One blog post. One hour. $10B wiped from cybersecurity stocks. CrowdStrike -6.5%. Cloudflare -6%. Palo Alto -5.7%. As one person on X put it: "babe wake up claude just killed a bunch of companies again."
- GGML and llama.cpp — the open-source inference engine powering basically every local AI setup — have been acquired by Hugging Face. Creator Georgi Gerganov and team are joining HF to scale and ensure long-term sustainability of local model inference. Big win for the open ecosystem that Big AI keeps trying to irrelevance-ify.