Microsoft used Build 2026 to show how agents move from chat windows into PCs, Windows, GitHub, Azure, enterprise data, and new hardware. In other words: Microsoft is giving agents a whole computer to live in.
Microsoft Build 2026 had enough product names to make a developer wonder whether the keynote came with its own package manager.
So let’s simplify the whole thing: Microsoft is trying to build the world where AI agents do real work.
That was Satya Nadella’s frame from the beginning. He opened the keynote by saying developer conferences are about understanding tech shifts and the new stack. The new stack he laid out had five layers (compute, models, context, tools, and runtime), with security and governance systems wrapped around all of it.
That means more than a better Copilot button. At Build 2026, Microsoft showed an agent stack that stretches from chips and data centers all the way up to Windows PCs, GitHub workflows, Azure tools, Microsoft 365 data, and new devices like office badges and desk companions.
The big idea is this: agents need a place to live.
A chat window works fine when you want to ask a question. A real enterprise agent, meaning software that can take steps on your behalf, needs a lot more. It needs compute to run on, permission to access files, an identity so the company knows who did what, memory of the task, tools it can safely use, and an audit trail so IT can inspect what happened later.
So Microsoft’s answer is to build the whole stack around that.
Let's dive in, shall we? Keep scrolling for our full recap.
Microsoft Build 2026 was not really about one new Copilot feature, one new Surface device, or one new model.
It was Microsoft’s attempt to show the whole agent stack.
At 1:50, Satya Nadella laid out the frame: compute, models, context, tools, runtime, and security. In plain English, Microsoft thinks useful agents need a full work environment around them.
The big announcements fit into that map:
So the short version is this: Microsoft wants agents to move from chat windows into the operating system, the browser, the codebase, the cloud, the company database, and eventually new devices like badges and desk companions.
The strongest version of this future is compelling. An agent can see your work context, use the right model, run safely on your device or in a Cloud PC, check live web data, write code, test it, open a pull request, follow policy, and leave a trail behind.
The hard part is making that feel simple.
Microsoft has a coherent agent strategy. It also has a naming problem large enough to qualify as infrastructure. Then again, this is AI. In this industry, names were never the load-bearing part of the stack.
A useful agent needs more than a smart model. It needs the same boring-but-essential parts every workplace system needs:
That framing explains the firehose of announcements. Windows AI APIs, Aion Instruct, Aion Plan, Surface Laptop Ultra, Surface RTX Spark Dev Box, Windows agent containment, OpenClaw on Windows, Microsoft Foundry, Web IQ, Scout, GitHub Copilot App, Azure HorizonDB, Project Solara, Maia 200, Cobalt 200, Microsoft Discovery, Majorana 2, and the Mayo Clinic partnership all point in the same direction.
Microsoft is trying to make Windows, Azure, GitHub, and Microsoft 365 the operating environment for agents.
Or said another way: Microsoft wants agents to move from “thing you ask” to “thing your company can safely assign work to.”
Microsoft started at the edge because it wants developers to treat Windows PCs as serious AI machines again.
At 2:46, Satya started with Windows. He said the amount of local compute sitting inside PCs is “astounding,” then pointed to current examples already using onboard AI: Outlook summaries, PowerPoint alt text, Teams super-resolution, and Adobe tools like After Effects and Premiere.
The message was that Windows ML and Windows AI are expanding so developers can build on-device AI that runs across the full install base of Windows GPUs, NPUs, and CPUs.
Then at 4:24, Microsoft announced Aion Instruct and Aion Plan.
Aion Instruct is the smaller on-device model for summarization, rewrites, intents, and accessibility. Aion Plan is the 14B-parameter reasoning and tool-calling model with 32K context that ships in-box as part of Windows. Satya described the pair as enabling a full local agentic loop, where developers can give agents tool access and build agentic applications without a round trip to the cloud.
That phrase, “round trip to the cloud,” is the key. If every agent step has to leave your machine, the experience gets slower, more expensive, and more complicated for privacy. Microsoft wants more of those loops to happen locally.
So this is the first part of the Build story: Microsoft wants Windows to become a place where agents can reason, plan, use tools, and run locally.
Surface Laptop Ultra is Microsoft’s flagship version of that local-AI pitch.
At 6:14, Satya introduced it as a device that brings NVIDIA’s next-generation system-on-chip together with Surface design. A system-on-chip means the CPU, GPU, and AI capabilities are integrated together. Microsoft said the platform includes unified memory and integrated DRTM (Dynamic Root of Trust for Measurement), a security feature used to help establish a trusted boot and runtime environment.
Quick translation: this is Microsoft trying to make a Windows laptop that can act like a serious AI workstation, while still looking and behaving like a normal premium laptop.
The Surface Laptop Ultra includes:
Unified memory means the CPU and GPU can draw from the same big pool of memory. AI models are memory-hungry, so more shared memory helps the machine run larger local models, larger datasets, or more complex agent workflows without constantly shuffling data around.
Corey is going to try to get hands-on time with the Surface Laptop Ultra while he’s at Build, which is exactly the right test. Specs are one thing. The real question is whether this feels like a practical local AI workstation or a keynote machine with nice lighting.
Then Microsoft pushed the same idea harder with the Surface RTX Spark Dev Box.
At 7:23, Satya framed it as the machine you build if you “max the compute” and “max the memory” for developers. At 8:35, the video pitch called it a “dream machine,” with:
A petaflop is a measure of compute power: one quadrillion (10^15) floating-point operations per second. You do not need the exact math here. The point is that Microsoft is saying this small box can handle serious local AI workloads.
Microsoft’s Surface page adds the practical developer details: the Dev Box ships with a developer-optimized Windows 11 Pro experience, Visual Studio Code, GitHub Copilot in Windows Terminal, WSL, PowerShell 7, Coreutils for Windows, BitLocker, Microsoft Defender, Entra ID, Intune, two USB-C ports, USB-A, HDMI, Ethernet, a headphone jack, and a 100W thermal envelope inside an aluminum chassis.
At 9:13, Satya also said Windows is coming to NVIDIA’s DGX Station, which he described as a “desktop data center” capable of running a 1T-parameter model locally. The line was almost absurd on purpose: the kind of hardware once associated with early frontier-model supercomputers is now moving toward desks, labs, and developer workstations.
So imagine an agent that reviews your private codebase, runs local tests, checks error logs, and drafts a fix. A cloud model can help. Local compute gives that workflow faster iteration, lower API costs, and a better privacy story.
Corey is also going to see if he can use the Dev Box while he’s there. That hands-on piece will be useful because this category lives or dies on feel: how fast local models run, how loud the box gets, how clean the setup is, and whether the developer workflow feels normal after the keynote ends.
Microsoft also showed a developer-optimized Windows experience, and this part felt almost like a peace offering to developers who usually prefer macOS or Linux.
At 9:44, Satya said Windows 365 now has a developer distribution optimized for cloud productivity, and Windows itself is getting a pile of developer-friendly updates:
The demo on the Surface RTX Spark Dev Box made that concrete. At 11:47, the default experience had no news feed, no widgets, no notifications, and dark mode enabled. The whole point was “calm Windows,” which is funny because the phrase only works if you’ve used regular Windows.
The demo also included:
Then the demo got to the real agent point: Surface RTX Spark serving large local models.
The presenter said she had done development with a 120B-parameter model that most machines cannot load, used 3.4M local tokens on the device, and kicked off agents through Fleet. She used Copilot voice, running through a local model, to ask the agent to find console write-line calls and convert them to the standard logger used elsewhere in the codebase.
The main agent delegated subagent tasks to the local model using the GPU, which made the workflow cheaper. Then Aion Instruct analyzed log files locally, while the machine showed about 90GB of RAM being used by the GPU.
That is the actual local-agent story. The machine is not only answering prompts. It is running agents, local models, containers, command-line tools, and code workflows at the same time.
After the edge, Microsoft moved to the cloud. Satya described Microsoft’s core infrastructure equation as “tokens per dollar per watt.”
That is a useful phrase. AI infrastructure is not only about having the biggest chips. It is about turning electricity into useful tokens as efficiently as possible.
Then he pivoted to the political part: earning permission from communities.
AI data centers have become a community flashpoint because people worry about power costs, water use, land use, pollution, local jobs, and whether the benefits flow anywhere near the people living next to the buildings.
Microsoft’s stated principles were:
Then Satya gave the scale: Azure now spans more than 500 data centers, and Microsoft added more data center capacity in the last 18 months than it added in Azure’s first decade.
The company is building around three dominant workloads: training, inference, and agent runtime.
That last one is the new category. Training builds models. Inference runs models. Agent runtime runs long-running workflows where models use tools, access data, call APIs, and take action.
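To make “agent runtime” concrete, here is a minimal Python sketch of the loop described above: an agent with an identity, an allow-list of tools, and an audit trail of every action. It is not Microsoft’s runtime. The tool names, the `AgentRun` class, and the fixed plan are hypothetical, and a real runtime would let a model choose each next step from the previous result.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tools the agent may call. Real ones would hit APIs, files,
# or databases; these return canned strings so the sketch runs anywhere.
TOOLS: dict[str, Callable[[str], str]] = {
    "read_logs": lambda arg: f"3 errors in {arg}",
    "open_ticket": lambda arg: f"ticket created: {arg}",
}


@dataclass
class AgentRun:
    agent_id: str  # identity: who took each action
    audit: list[tuple[str, str, str, str]] = field(default_factory=list)

    def call(self, tool: str, arg: str) -> str:
        # Only registered tools are usable; anything else is refused.
        if tool not in TOOLS:
            raise PermissionError(f"{tool} is not an allowed tool")
        result = TOOLS[tool](arg)
        # Audit trail: (agent, tool, input, output) for later inspection.
        self.audit.append((self.agent_id, tool, arg, result))
        return result


def run(plan: list[tuple[str, str]], agent_id: str) -> AgentRun:
    """Execute a fixed plan step by step (stand-in for model-chosen steps)."""
    session = AgentRun(agent_id)
    for tool, arg in plan:
        session.call(tool, arg)
    return session


session = run([("read_logs", "api-server"), ("open_ticket", "api-server errors")], "agent-42")
for entry in session.audit:
    print(entry)
```

The audit list is what lets IT answer “who did what” after the fact, which is the part a chat window never had to provide.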
Microsoft’s Fairwater data center design was presented as an “AI super factory” spanning Georgia and Wisconsin, built with NVIDIA and designed for high GPU density, higher-performance networking, lower latency, and more bandwidth. Microsoft also said it redesigned power delivery and cooling so the facility can operate with effectively zero water consumed for cooling, with its water use over a year roughly equivalent to a single restaurant’s.
So Microsoft cannot build the agent future without building the physical future underneath it.
The pressure will come from the real world. Communities will judge Microsoft’s promises by utility bills, local water stress, jobs, and transparency, not keynote lines.
Microsoft also used the keynote to show how its silicon strategy is changing.
At 22:42, Satya said Maia 200, Microsoft’s AI accelerator, is live in Arizona and will deploy internationally. He said it delivers 30% more tokens per dollar versus the leading GPU today and will power Microsoft 365 Copilot.
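One detail worth spelling out: “30% more tokens per dollar” does not mean 30% cheaper tokens. The quick arithmetic below shows why. The baseline of 100,000 tokens per dollar is a hypothetical number chosen only for readability; the resulting percentage is the same for any baseline.

```python
def cost_per_million_tokens(tokens_per_dollar: float) -> float:
    """Dollars needed to generate one million tokens."""
    return 1_000_000 / tokens_per_dollar


baseline = 100_000.0      # hypothetical tokens per dollar on the reference GPU
maia = baseline * 1.30    # "30% more tokens per dollar"

base_cost = cost_per_million_tokens(baseline)
maia_cost = cost_per_million_tokens(maia)
savings = 1 - maia_cost / base_cost  # equals 1 - 1/1.3 regardless of baseline

print(f"baseline: ${base_cost:.2f} per 1M tokens")
print(f"maia:     ${maia_cost:.2f} per 1M tokens")
print(f"cost reduction: {savings:.1%}")
```

So a 30% gain in tokens per dollar works out to roughly a 23% lower cost per token, which is still a big deal at Microsoft 365 Copilot scale.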
Then he made a second point: agents are changing CPU demand.
In normal AI conversations, people focus on GPUs. Agents need GPUs too, and they also create lots of tool calls, data calls, orchestration steps, and low-latency coordination work. That makes the CPU more important.
So Microsoft announced Cobalt 200 VMs, its next-generation CPU platform for cloud-native and agent workloads. Using GitHub Copilot agent traces, Microsoft said Cobalt showed:
That is a useful clue for where agent infrastructure goes next. If agents call tools constantly, the whole system has to be tuned for lots of small, fast, coordinated actions. The GPU may do the model work, but the CPU has to keep the whole circus moving.
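Here is a rough way to see why orchestration matters. In this sketch, each “tool call” is simulated with a short sleep (the 50ms latency and tool names are hypothetical). Run one after another, the latencies add up; overlapped by the orchestrator, ten calls take about as long as one. It illustrates the coordination problem in general, not how Cobalt or any Microsoft scheduler works.

```python
import asyncio
import time


async def tool_call(name: str, latency_s: float) -> str:
    """Stand-in for a small tool or data call with a fixed, hypothetical latency."""
    await asyncio.sleep(latency_s)
    return f"{name}: ok"


async def sequential(calls: list[tuple[str, float]]) -> list[str]:
    # Each call waits for the previous one to finish.
    return [await tool_call(n, t) for n, t in calls]


async def concurrent(calls: list[tuple[str, float]]) -> list[str]:
    # Independent calls are started together; results keep input order.
    return list(await asyncio.gather(*(tool_call(n, t) for n, t in calls)))


calls = [(f"tool-{i}", 0.05) for i in range(10)]
for runner in (sequential, concurrent):
    start = time.perf_counter()
    results = asyncio.run(runner(calls))
    elapsed = time.perf_counter() - start
    print(f"{runner.__name__}: {len(results)} calls in {elapsed:.2f}s")
```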
The NVIDIA portion sharpened the local hardware story.
Jensen Huang said Microsoft and NVIDIA started working on this class of PC about three years ago. The goal was to build a system that was powerful for designers, creators, and AI, with processing capability and a software stack integrated into Windows and creative tools.
Then he gave the better phrase: the PC evolves from personal computer to personal AI.
His example was simple. You could be traveling, text your PC, ask it to get coding done or make design changes, and it would fire up tools on the machine, make changes, and iterate with you while you were away.
That is a very different computer metaphor. The PC becomes something an agent can use, not only something you use directly.
Jensen also said RTX Spark has a petaflop of NVFP4 performance, a 4-bit numerical format he said Microsoft and NVIDIA worked on together. He said that lets the system take advantage of 128GB of memory and fit “maybe a couple hundred billion parameter model.”
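The “couple hundred billion parameters in 128GB” claim is easy to sanity-check. The sketch below counts weight memory only, and assumes NVFP4 stores 4-bit values plus one 8-bit scale per 16-value block, for an effective 4.5 bits per parameter. The 200B parameter count is an illustrative reading of “a couple hundred billion.”

```python
def weight_memory_gb(params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights only, in GB (10**9 bytes)."""
    return params * bits_per_param / 8 / 1e9


# Assumption: 4-bit values plus one 8-bit scale shared by every 16 values.
NVFP4_BITS = 4 + 8 / 16
FP16_BITS = 16

params = 200e9   # illustrative "couple hundred billion parameter model"
budget_gb = 128  # RTX Spark memory, per Jensen

for name, bits in [("FP16", FP16_BITS), ("NVFP4", NVFP4_BITS)]:
    gb = weight_memory_gb(params, bits)
    print(f"{name}: {gb:.1f} GB, fits in {budget_gb} GB: {gb < budget_gb}")
```

At 16 bits the weights alone would need about 400GB; at 4.5 bits they drop to about 112.5GB. That leaves only a sliver for the KV cache, activations, and the OS, which is why Jensen’s “maybe” is doing real work in that sentence.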
So Corey’s instinct is right: the interesting part here is not only the box. It is the idea that an AI assistant can operate your PC as a persistent local workstation.
Then Jensen connected the same pattern to the cloud.
He said Hopper was focused on pretraining. Grace Blackwell moved the focus to post-training, reinforcement learning, and reasoning models. Then Vera Rubin is designed for agents.
That is a useful hardware timeline:
Jensen said Microsoft has deployed the largest number of Grace Blackwell systems in the world. He also said Fairwater is liquid-cooled, closed-loop, and uses almost no water.
Then he gave the economic claim: Grace Blackwell can increase token generation rate and reduce token generation cost by roughly 30x over Hopper.
The broader pattern is that Microsoft and NVIDIA are co-designing the full stack for agents: edge devices, data centers, CPUs, GPUs, networking, encryption, and software.
Jensen also said agent systems need encrypted paths from storage, which acts like long-term memory, through working memory and data in transit. He described Vera as a CPU designed for agents because agents are more impatient than humans and need extremely low latency.
Then he said something that belongs in the piece: GitHub commits have gone parabolic. In the last several months, he said the number of commits increased by a factor of three.
That was his evidence that agentic systems are not only impressive; they are useful, productive, and profitable enough to drive compute demand.
The weirdest and most interesting announcement was Project Solara.
Project Solara is Microsoft’s chip-to-cloud platform for agent-first devices. That phrase sounds abstract, but the idea is simple: if agents become the interface, computers may stop looking like one main screen with a bunch of apps.
Microsoft’s “big uh-huh” was that the next device is not one form factor. It is all devices working together as one system, with agents showing up closer to where and when you need them.
Project Solara has three pillars:
Azure ties the whole system together across cloud and device.
Microsoft showed two early reference categories: stationary devices and portable devices.
The desk device is built on MediaTek silicon. It uses Windows Hello for Business so you can walk up, sign in securely, and access an agent. The example was Microsoft 365 Copilot grounded in Work IQ, with voice or tap controls that help you think, plan, and delegate tasks.
The desk device can act as a companion to an existing Windows Copilot+ PC or access Windows 365 when connected to a monitor.
The badge device is built using Qualcomm wearable silicon. It is a lightweight form factor designed for agents on the go, with fingerprint unlock, camera input, and secure access to agents.
The live demo used the badge to capture shots from the keynote, clean them up, and send them to a team for review. Then Microsoft gave the healthcare example: the badge could help with check-ins, patient records, critical insights, hands-free voice documentation, medication scanning, vitals scanning, and workflow verification.
Microsoft also said the same foundation could be adapted for retail, industrial work, hospitality, financial services, legal, and other verticals. AccuWeather, Best Buy, Target, and others are exploring the devices.
Then Qualcomm CEO Cristiano Amon gave the stronger strategic point: in the smartphone era, the phone was the center of your digital life. In the agent era, agents become the center, and devices become the best physical surfaces for different contexts.
The privacy question is obvious. A work badge with microphones and a camera, sitting “one tap away from recording an impromptu hallway conversation,” will need very clear social rules. Anyone who has ever said “quick sync” and then trapped a coworker for 19 minutes has already created the test case.
So Solara shows where Microsoft thinks agents go next: closer to the task, outside the app frame, and across a set of enterprise-managed devices.
After devices, Satya moved up the stack to models, context, and tools.
He said Foundry now has more than 12,000 models, which he described as the largest model catalog. That includes OpenAI models, Anthropic models, MAI models, and newly added frontier models like OpenAI’s realtime voice models and Claude Opus 4.8.
So Microsoft’s model strategy is clearly multi-model. It wants to offer OpenAI, Anthropic, its own MAI models, open-weight models through partners like Fireworks, and whatever else developers need.
The pitch is not “one Microsoft model to rule them all.” The pitch is “choose the right model for the right task, budget, and eval mix.”
That is important for GitHub Copilot too. Corey’s read was right: GitHub Copilot is becoming Microsoft’s answer to Codex and Claude Code, though it still gives developers access to OpenAI and Claude models.
So Microsoft does not have to win every model fight if it owns the place where agents use those models.
Then Microsoft moved to the data tier.
Satya said agents create different call patterns because they are constantly storing, retrieving, reasoning, acting, and learning. That means databases need to be designed for agent workloads, not only traditional user-facing applications.
So Microsoft introduced Azure HorizonDB, a fully managed PostgreSQL service on Azure.
PostgreSQL is a popular open-source database. Developers like it because it is familiar, flexible, and widely supported.
Azure HorizonDB is designed for high availability and read-heavy workloads. In the keynote, Satya said it includes automated failover, up to 128TB per cluster, and scale-out for read-heavy workloads. Microsoft also said internal testing showed 3x throughput compared with PostgreSQL.
The Microsoft product post adds the agent-specific pieces: ultra-low latency, rapid read scale-out, up to 3x faster transactions and search performance compared with self-managed PostgreSQL, advanced vector indexing, semantic search, in-database access to AI models, and integrations with Microsoft Fabric and Foundry.
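“Vector indexing” and “semantic search” sound fancy, but the core computation is simple: turn text into vectors and rank rows by how closely they point in the same direction as the query. The plain-Python sketch below does that by brute force with made-up 3-dimensional embeddings. It is not HorizonDB’s API; a vector index exists to approximate this ranking without scanning every row.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Toy "embeddings" (hypothetical). Real ones come from an embedding model
# and have hundreds or thousands of dimensions.
docs = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "return an item": [0.8, 0.2, 0.1],
}


def top_k(query_vec: list[float], k: int = 2) -> list[str]:
    """Brute-force nearest neighbours by cosine similarity."""
    ranked = sorted(docs, key=lambda name: cosine(query_vec, docs[name]), reverse=True)
    return ranked[:k]


print(top_k([1.0, 0.1, 0.0]))
```

A query about getting money back lands near “refund policy” and “return an item” even though it shares no keywords with the second one. That is the behavior an agent wants from its database.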
So HorizonDB is Microsoft saying: keep using PostgreSQL, and we will add the AI-native database pieces around it.
Microsoft also announced GPU acceleration for Fabric Data Warehouse.
A GPU is a chip designed to handle lots of parallel math at once. GPUs became central to AI because training and running models involves a huge amount of parallel computation. Microsoft is now applying that same hardware acceleration to eligible analytics queries in Fabric Data Warehouse.
Satya’s explanation was simple: data warehouses were already important for human users, and they become even more important when agents need analysis on the fly. Microsoft said it is seeing 7x performance gains.
Microsoft also pointed to CoddSpeed, the underlying research that won Best Industry Paper at SIGMOD 2026. The product idea is straightforward: eligible Fabric queries can run on NVIDIA accelerated computing without developers rewriting queries or managing infrastructure.
So this connects back to agents because agents often need to ask questions of business data. Faster analytics means the agent can make tool calls against the same warehouse without sending users into a waiting room.
Then came one of the most important announcements: Web IQ.
Corey’s read from the room was that Microsoft sees Web IQ as the newer, faster, more efficient way agents access the internet. That lines up with Satya’s framing.
Satya said web grounding needs “fresh, high-quality, and fast web data.” Web IQ is built on Microsoft’s global infrastructure that already serves more than 1B users, and it has been rearchitected for LLM and agent workflows.
Web IQ is:
MCP, or Model Context Protocol, is a standard that helps AI systems connect to tools and data sources. So “MCP native” means Web IQ is meant to plug into agent stacks without a bunch of custom glue.
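Under the hood, MCP messages are JSON-RPC 2.0, and an agent invokes a server’s tool with a `tools/call` request carrying the tool name and its arguments. The sketch below builds one. The `web_search` tool and its `query` argument are hypothetical stand-ins, not Web IQ’s actual tool names; a real server advertises its tools and input schemas through `tools/list`.

```python
import json


def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool and argument names, for illustration only.
print(mcp_tool_call(1, "web_search", {"query": "electricity prices San Francisco"}))
```

Because every MCP server speaks this same envelope, an agent stack that already talks MCP can add a new data source without writing new client glue.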
The old version of web-connected AI was basically “search, scrape, summarize.” Web IQ sounds more like a purpose-built internet access layer for agents: find the right sources, package the relevant evidence, keep latency low, and spend fewer tokens doing it.
So this is the agent race in miniature. Models answer from what they know. Search and retrieval systems answer from what they can find. Enterprise agents need both, plus permissions, auditability, and a way to compare internal reality with the outside world.
Microsoft then introduced Microsoft IQ, a unified intelligence layer across Foundry, Fabric, and Microsoft 365.
Here is what that means without the branding soup:
The demo made it easier to understand.
In a power-utilities control center demo, an agent assessed a grid operations incident and produced a response brief. It was wired to a Foundry IQ knowledge base that packaged documents, operational data, and people into context the agent could reason over.
Then the agent used Web IQ to answer a current-events question about electricity prices in San Francisco. The presenter called Web IQ “search built for the AI era,” with industry-leading quality, velocity, and efficiency.
Then the agent used Fabric IQ to pull internal context about at-risk substations. Fabric IQ represented the grid as an ontology, which is a structured map of the business and how its parts relate to each other.
In this case, the ontology was coupled with live telemetry, so it reflected the real operational state of the grid minute by minute.
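A toy version of “ontology plus live telemetry” helps show why this beats a static document. In the sketch below, the ontology is just entities (substations) and their relationships (which districts each one feeds), and incoming readings update the same objects the agent queries. Everything here, including the 90% threshold, is hypothetical; Fabric IQ’s actual model is far richer.

```python
from dataclasses import dataclass


@dataclass
class Substation:
    name: str
    feeds: list[str]        # relationship: districts this substation serves
    load_pct: float = 0.0   # live telemetry, updated as readings arrive


# Hypothetical ontology: entities and how they relate.
grid = {
    "S1": Substation("S1", ["Downtown", "Harbor"]),
    "S2": Substation("S2", ["Hills"]),
}


def ingest(reading: dict) -> None:
    """Apply a telemetry reading so the model reflects current state."""
    grid[reading["substation"]].load_pct = reading["load_pct"]


def at_risk(threshold: float = 90.0) -> dict[str, list[str]]:
    """Substations at or over the threshold, with the districts they affect."""
    return {s.name: s.feeds for s in grid.values() if s.load_pct >= threshold}


ingest({"substation": "S1", "load_pct": 94.0})
ingest({"substation": "S2", "load_pct": 61.0})
print(at_risk())  # {'S1': ['Downtown', 'Harbor']}
```

The relationship is what turns “S1 is hot” into “Downtown and Harbor are at risk,” and the telemetry is what keeps that answer current minute by minute.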
Then the agent used Work IQ to pull the company’s response procedure from SharePoint. The presenter emphasized that the agent was not working from a stale upload or copied snapshot. It was answering from the same source the team maintains day to day.
When the procedure changes, the answer changes with it.
That is the enterprise dream: an answer grounded in the outside world, your live operations, your internal documents, your procedures, and the people who need to respond.
Then the long-running agent showed its work: Web IQ for the outside world, Fabric IQ for operational state, and Work IQ for people and procedures. The presenter also said Foundry routines can run this on a schedule, turning a one-off response into continuous, proactive execution.
So the enterprise dream is not “ask AI a question.” It is “give the task to an agent, then get back an answer grounded in your company, the outside world, and every source it touched.”
Next, Microsoft moved from context to runtime.
Satya said agents are a new execution environment because they reason continuously, generate and run code dynamically, and take action across files, devices, and networks.
So Microsoft introduced Microsoft Execution Containers, or MXC, through its Windows agent runtime updates.
A container is a controlled environment where software can run without getting full access to the entire system. Developers already use containers all the time on servers. Microsoft wants a Windows-native version of that idea for AI agents.
Satya said MXC is a new policy layer that lets Windows apply isolation and containment using OS-native primitives. In normal-person language: Windows itself enforces what the agent is allowed to do.
MXC supports different levels of containment:
That gives teams a menu of agent safety options. Small task? Use a lighter container. Risky workflow? Put the agent in a separate Cloud PC.
So Microsoft wants agents to have a fenced-in workspace on your device or in the cloud.
Microsoft then announced OpenClaw on Windows, leveraging MXC.
The demo was memorable because it showed the thing everyone worries about: an agent trying to delete files.
The OpenClaw Windows app can help users set up their own Claw or connect to existing ones. It looks like a native Windows app and includes chat, Canvas, usage, sessions, permissions, and sandbox configuration.
The presenters showed granular security controls, including what files and folders OpenClaw can access, whether it can use the clipboard, and whether it can talk to the internet.
Then they asked OpenClaw to delete all the files on a desktop. The agent tried. MXC stopped it because the folder was read-only. The demo showed 94 JPEGs were still safe.
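The policy in that demo boils down to an allow-list: which folders are read-only, which are writable, plus switches for clipboard and internet. The Python sketch below models that decision and replays the 94-JPEG moment. The `AgentPolicy` class and folder paths are hypothetical, not the MXC API, and nothing is actually deleted. The important caveat: a check like this running inside the agent’s own process is not containment. MXC’s value is that Windows enforces the rule below the agent, where the agent cannot talk its way around it.

```python
from dataclasses import dataclass, field
from pathlib import PureWindowsPath


@dataclass
class AgentPolicy:
    """Hypothetical allow-list policy modeled on the demo's controls."""
    read_only: set[str] = field(default_factory=set)
    read_write: set[str] = field(default_factory=set)
    clipboard: bool = False
    internet: bool = False

    def allows(self, action: str, path: str) -> bool:
        p = PureWindowsPath(path)

        def under(roots: set[str]) -> bool:
            # True if the path is one of the roots or sits inside one.
            return any(p == PureWindowsPath(r) or PureWindowsPath(r) in p.parents for r in roots)

        if action == "read":
            return under(self.read_only | self.read_write)
        if action in ("write", "delete"):
            return under(self.read_write)
        return False  # unknown actions are denied by default


def attempt_delete(policy: AgentPolicy, files: list[str]) -> tuple[list[str], list[str]]:
    """Sort requested deletions into allowed and blocked; deletes nothing."""
    allowed, blocked = [], []
    for f in files:
        (allowed if policy.allows("delete", f) else blocked).append(f)
    return allowed, blocked


policy = AgentPolicy(
    read_only={r"C:\Users\me\Desktop"},
    read_write={r"C:\Users\me\agent-scratch"},
)
files = [rf"C:\Users\me\Desktop\photo{i}.jpg" for i in range(94)]
deleted, blocked = attempt_delete(policy, files)
print(len(deleted), len(blocked))
```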
That is exactly the kind of boring demo enterprise AI needs. The agent tried to do a bad thing, and the operating system enforced the rule anyway.
Peter Steinberger, the creator of OpenClaw, then said the obvious part out loud: six months ago, that would have worked. He built OpenClaw to have access to files, machines, and chats. That power is what made it useful, and it is also what made companies nervous.
His news was that “you can run it inside your company now.” He said OpenClaw now has observability, auto mode for permissions, more granular access, a plugin harness, support for whatever tools companies already trust, persistent memory, heartbeats, and OpenClaw inside Slack or Teams.
So OpenClaw on Windows is Microsoft’s security point in miniature: powerful local agents need OS-level containment, not just “please behave” prompts.
Then Satya moved to Microsoft Foundry, which Microsoft is building into a full application platform for the agent era.
Foundry is Microsoft’s platform for building, testing, deploying, and managing AI apps and agents. Think of it as the place where companies take a cool demo and turn it into something IT might actually approve.
Satya said Foundry Hosted Agents can include IQ layers, tools, durability, memory, state, sandboxing, rubrics, evaluations, safety, and guardrails. He also emphasized Foundry’s self-improvement loop: build an agent, evaluate it, improve it, and let the next run feed the next evaluation.
Microsoft also announced a partnership with Fireworks AI, bringing Fireworks’ open-weight models to Foundry with an inference stack and enterprise rails.
The Foundry update also includes hosted agents, sub-100ms sandbox cold starts, zero idle cost, Foundry toolboxes, tracing, evals, Agent Optimizer, Adaptive Evaluations, and one-click publishing to Teams and Microsoft 365 Copilot.
That is the difference between “watch this agent do a thing on stage” and “our company might let this agent touch customer data.”
Microsoft’s edge is not that it can make the flashiest demo. Its edge is that it can wrap the demo in Entra, Intune, Defender, Purview, Fabric, Teams, GitHub, Azure, and all the other systems enterprises already bought.
Then Satya turned to GitHub.
He said GitHub is becoming the control plane for all the agents, and that nearly everything GitHub measures is growing because of agentic workflows: repo creation, PR activity, API usage, and actions.
He also made a good point about the CLI. The command line is approachable again because models and natural language make it easier to use. And if every agent lives in a terminal, the developer eventually ends up with 100 CLI sessions open.
So GitHub built the new GitHub Copilot app: a tool with the speed and flexibility of a CLI, the capability of an IDE, and the ability to scale to many agent sessions. The GitHub repo is live for releases, issues, and discussion.
Corey’s read is right: this is Microsoft’s answer to Codex and Claude Code.
The big caveat, and maybe the smarter platform move, is that Copilot is not only a Microsoft-model wrapper. In the demo, Cassidy said developers have access to popular models through a single GitHub Copilot subscription, including OpenAI, Anthropic, and Google.
So Microsoft’s pitch is not “use our coding agent instead of theirs.” It is closer to: use GitHub as the place where all these coding agents actually do the work.
The Copilot app demo started with a practical pain point: a release has lots of blockers, and a developer needs to decide what to fix.
Cassidy selected all of them, and the app kicked off a separate session for every issue. Each session runs in its own git worktree.
A worktree is a separate working copy of a code branch. In normal-person terms, each agent gets its own little workspace, so multiple agents can work in parallel without stepping on each other.
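Worktrees are a standard git feature, so the “one workspace per agent” idea is easy to reproduce by hand. The sketch below generates one `git worktree add -b <branch> <path>` command per issue. The issue numbers, branch names, and directory layout are a hypothetical convention, not what the Copilot app uses internally.

```python
import shlex


def worktree_commands(issue_ids: list[int], repo_dir: str = ".") -> list[list[str]]:
    """One isolated checkout per issue, each on its own new branch."""
    cmds = []
    for issue in issue_ids:
        branch = f"agent/issue-{issue}"       # hypothetical naming scheme
        path = f"../worktrees/issue-{issue}"  # relative to repo_dir
        cmds.append(["git", "-C", repo_dir, "worktree", "add", "-b", branch, path])
    return cmds


for cmd in worktree_commands([101, 102, 103]):
    print(shlex.join(cmd))
    # To actually create the worktree: subprocess.run(cmd, check=True)
```

Each worktree shares the same repository history but has its own files and branch, so three agents can edit, build, and test in parallel and still end up as three clean pull requests.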
Then Copilot showed Agent Merge, which “babysits” a pull request through CI checks, code review, and merge conditions.
Continuous integration, or CI, is the automated testing system that checks whether code changes broke anything.
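“Babysitting” a pull request mostly means repeatedly asking one question: what is blocking the merge right now? The sketch below encodes a plausible version of that decision. The `PullRequest` fields and the order of checks are hypothetical rules for illustration, not Agent Merge’s actual logic.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    ci_checks: dict[str, str]  # check name -> "success" | "failure" | "pending"
    approvals: int
    changes_requested: bool
    mergeable: bool            # False when there are merge conflicts


def next_step(pr: PullRequest, required_approvals: int = 1) -> str:
    """Return the next action for the PR under these hypothetical rules."""
    if not pr.mergeable:
        return "resolve conflicts"
    if any(s == "failure" for s in pr.ci_checks.values()):
        return "fix failing checks"
    if any(s == "pending" for s in pr.ci_checks.values()):
        return "wait for CI"
    if pr.changes_requested or pr.approvals < required_approvals:
        return "address review"
    return "merge"


pr = PullRequest({"unit-tests": "success", "lint": "pending"}, approvals=1,
                 changes_requested=False, mergeable=True)
print(next_step(pr))  # wait for CI
```

A babysitting agent would call something like this on every CI or review event, act on the answer, and stop once it returns "merge".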
The app also includes:
The canvas demo even included approving PRs with a thumbs-up gesture through the camera. That part felt more experimental, and the larger point is more durable: working with AI in 2026 should be more than chat logs and walls of text.
So code agents create a new bottleneck. If ten agents can generate ten pull requests, a human still needs to review them without turning the afternoon into a spreadsheet of robot homework.
Copilot is moving from pair programmer to project coordinator.
Microsoft also previewed Rayfin, an agent-first SDK that connects agents to enterprise backends as a service.
Here is the plain-English version: coding agents can now generate app front ends very quickly. The slower part is turning the app into something a company can actually run. You need identity, storage, databases, schemas, access policies, governance, and deployment.
Rayfin is designed to collapse that work.
In the Copilot demo, Cassidy showed a 100% agent-built SignalBox app that was containerized and had a database backend. Then she typed “Rayfin up,” and the app deployed to Microsoft Fabric.
With Rayfin, Microsoft says agents get a complete enterprise backend, so developers can deploy with confidence. The Rayfin repo is live, and it connects to Replit, which means developers can build the app in Replit while the app and data deploy into an enterprise-managed Fabric tenant.
So Rayfin is Microsoft’s answer to a very practical coding-agent problem: making the thing run is harder than making the thing appear.
After the developer workflow, Microsoft turned to governance.
Satya described Agent 365 as the agent control plane. Agents need their own identities, access controls, security, data protection, compliance, and management.
Microsoft said it extended:
Satya also emphasized that agents can be hosted anywhere: Azure, AWS, GCP, local Windows, or elsewhere, and they can be built with any framework. Microsoft announced the general availability of the Agent 365 SDK and said it is expanding to local agents running on Windows and other systems.
The demo showed how Foundry and Agent 365 compose together.
First, Foundry Toolbox lets developers add tools once and expose them to any agent through a single MCP endpoint. The demo applied one guardrail that blocks personally identifiable information, or PII, from leaking, and then all agents using that toolbox inherit the protection.
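The “add the guardrail once, every agent inherits it” pattern comes from putting the check at the single shared entry point. The sketch below shows that shape: tools register with one toolbox, and every call goes through a redaction step. The `Toolbox` class and the two regexes (US-style SSNs and email addresses) are hypothetical; real PII detection needs far more than two patterns.

```python
import re
from typing import Callable

# Hypothetical, deliberately minimal PII patterns.
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),             # US-style SSN
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),       # email address
]


def redact(text: str) -> str:
    for pattern in PII_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


class Toolbox:
    """Tools are registered once; every agent calls them through `call`,
    so the guardrail applies to all tools and all agents."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[[str], str]] = {}

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self._tools[name] = fn

    def call(self, name: str, arg: str) -> str:
        return redact(self._tools[name](arg))


toolbox = Toolbox()
toolbox.register("lookup_customer", lambda cid: f"{cid}: Jane, jane@example.com, SSN 123-45-6789")
print(toolbox.call("lookup_customer", "C-7"))
```

The design choice is that agents never hold a direct reference to the raw tool function, so no agent can forget to apply the guardrail.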
Then the agent deployed to Foundry with one block of code and GitHub Actions.
Foundry spun up a dedicated microVM for the session. A microVM is a tiny virtual machine for isolating a workload. The session also got its own persistent file system, and Foundry showed server-side traces and evaluations.
Then Microsoft showed rubric evaluators. Foundry analyzed the agent and generated personalized evaluation criteria from production traces, including governance, outcome correctness, and prescribed source usage.
Then Agent Optimizer tuned four things: the model, instructions, tool descriptions, and skills. It generated improved candidates, scored them against the rubric, showed exactly what changed, and made it easy to deploy the best candidate as a new agent version.
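Microsoft didn't show Agent Optimizer's internals, so this is only a sketch of the loop as described. It scores the current agent config and a set of candidates against a rubric, keeps the best one, and reports exactly which fields changed. In the product, candidates are generated automatically. Here they're a hand-written list, and `optimize`, `toy_rubric`, and the config fields are hypothetical.

```python
from typing import Callable

Config = dict[str, str]


def diff(old: Config, new: Config) -> dict[str, tuple[str, str]]:
    """Report exactly which fields changed between two agent configs."""
    return {k: (old.get(k, ""), new[k]) for k in new if old.get(k) != new[k]}


def optimize(base: Config, candidates: list[Config],
             score: Callable[[Config], float]) -> tuple[Config, float, dict]:
    """Score the baseline and each candidate; return the best, its score, and its diff."""
    best, best_score = base, score(base)
    for cand in candidates:
        s = score(cand)
        if s > best_score:
            best, best_score = cand, s
    return best, best_score, diff(base, best)


def toy_rubric(cfg: Config) -> float:
    # Hypothetical stand-in for a rubric evaluator run over production traces.
    score = 0.0
    if "cite sources" in cfg["instructions"]:
        score += 0.6
    if cfg["model"] == "model-b":
        score += 0.3
    return score


base = {"model": "model-a", "instructions": "Answer briefly."}
candidates = [
    {"model": "model-b", "instructions": "Answer briefly."},
    {"model": "model-a", "instructions": "Answer briefly and cite sources."},
]
best, s, changes = optimize(base, candidates, toy_rubric)
print(s, changes)
# 0.6 {'instructions': ('Answer briefly.', 'Answer briefly and cite sources.')}
```

Returning the diff alongside the winning config mirrors the keynote's emphasis on showing exactly what changed before a new agent version ships.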
The key line was that every run feeds the next eval, and every eval tells the optimizer where to improve next. So your agents get better the more they are used.
Then the agent was published to Teams and Microsoft 365 Copilot. It was an autopilot agent, meaning it had its own identity and productivity license and could work across M365 on its own behalf. Admin approval is required, and admins can review, monitor, and block it.
So Agent 365 is Microsoft’s governance answer to the obvious enterprise question: what happens when hundreds of agents start acting like employees?
Microsoft also showed MDASH, its multi-model agentic security system.
Satya described it as an agent harness for security, bringing together more than 100 agents across frontier and custom models to find exploitable bugs better than any single model can. It debuted on the CyberGym benchmark.
The demo showed MDASH running as a standalone CLI inside the GitHub Copilot app on a local dev machine. It scanned a codebase, broke results down by vulnerability, domain, and severity, and identified both traditional issues and AI-specific vulnerabilities.
Then it generated logs and an HTML report, showed details through a Defender command, suggested a fix, and produced a diff so a human could review what changed. It could also create a PR and upload results to tools like GitHub Advanced Security.
The strongest example was a real vulnerability where no single file looked wrong on its own. The flaw was spread across three parts of the codebase. One team of agents spotted the suspicious gap, another picked it apart, and a third built a working example that triggered the crash.
That is exactly the kind of joined-up reasoning normal scanners and single models can miss.
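The keynote didn't show MDASH's real finding in detail, so here's an invented miniature with the same shape. Three small pieces of code each look reasonable in isolation, but they combine into an out-of-bounds write. The file names and logic are hypothetical.

```python
# file: config.py -- the maximum record count a client may request.
MAX_RECORDS = 100

# file: validate.py -- an inclusive limit check that looks correct on its own.
def validate(count: int) -> bool:
    return 0 <= count <= MAX_RECORDS

# file: storage.py -- preallocates MAX_RECORDS slots (indices 0..99)
# and writes an end-of-batch marker at index `count`.
SLOTS = [None] * MAX_RECORDS

def fill(count: int) -> None:
    SLOTS[count] = "END"

# Glue code: validation passes for count == 100, then storage writes past the end.
def handle_request(count: int) -> str:
    if not validate(count):
        return "rejected"
    fill(count)
    return "ok"


print(handle_request(99))   # ok
try:
    handle_request(100)     # proof of concept: passes validation, then crashes
except IndexError as exc:
    print("crash:", exc)
```

Read alone, `<= MAX_RECORDS` looks like a correct inclusive limit, and a write at `count` looks like a normal end marker. The bug only exists in the combination, which is why a scanner that judges one file at a time can miss it.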
Microsoft said MDASH is coming soon to the CLI and the Microsoft Defender portal.
Later in the keynote, Satya described Microsoft’s broader Copilot direction.
He said Copilot started with chat, then moved to Cowork for multistep tasks and artifact generation, and now coding is coming into the same product. This summer, Microsoft plans to bring chat, cowork, and code into one Copilot “super app.”
That is the consumer-facing version of the same stack story. Microsoft wants Copilot to become the front door for agents across work, code, and collaboration.
Then Microsoft introduced Copilot Autopilots, which Satya called “enterprise-grade claws.”
Autopilots are autonomous, long-running agents with full enterprise compliance that run in your tenant. They can have:
The first autopilot is Microsoft Scout. Scout works where you work: Teams group chats and Outlook threads. It is available starting now for Copilot Frontier users, and Microsoft said it plans to build out a full digital team of autopilots in Copilot over the coming months.
So the point is not that Scout can answer questions. The point is that Microsoft wants Copilot to shift from “chat when asked” to “work in the background with permission.”
Microsoft also introduced Frontier Tuning, which is aimed at companies that want models adapted to their own workflows, policies, data, and style.
Satya framed this around tacit knowledge: the know-how that compounds inside a company through operations, habits, processes, and judgment. His question was: what is the future of the firm when models can learn from data and trajectories?
Microsoft’s answer is that every organization will need to build its own “hill-climbing machine.”
That phrase is worth unpacking. Hill climbing means continuously improving toward a goal. In this context, it means a company builds private evaluations, private reinforcement learning environments, private traces, and private workflows, then uses them to tune agents toward its own objectives.
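Hill climbing is also a classic optimization algorithm, and a tiny version makes the metaphor concrete. In this sketch the score is a made-up objective. In Microsoft's framing, it would be a company's private evals, and each move would be a change to prompts, tools, or model weights.

```python
from typing import Callable


def hill_climb(score: Callable[[int], float], start: int, max_steps: int = 1000) -> int:
    """Greedy hill climbing on integers: move to a better neighbor until none exists."""
    current = start
    for _ in range(max_steps):
        neighbors = [current - 1, current + 1]
        best = max(neighbors, key=score)
        if score(best) <= score(current):
            return current  # no neighbor improves the score: a (local) optimum
        current = best
    return current


# Hypothetical objective: quality peaks at setting 7.
print(hill_climb(lambda x: -(x - 7) ** 2, start=0))  # 7
```

The greedy rule is also the method's main weakness. It stops at the first point where no neighbor scores better, and that point may be a local optimum rather than the global one.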
A foundation model is trained to be broadly useful. It can write, code, summarize, and reason across lots of topics. The tradeoff is that it usually does not know your internal APIs, house style, product rules, or compliance requirements unless you explain them again and again.
Frontier Tuning is Microsoft’s answer to that gap.
Then Mustafa Suleyman took the stage and reframed Microsoft AI’s model work around “humanist superintelligence”: state-of-the-art AI capabilities designed to serve people and organizations, rather than replace them.
He also said compute used to train frontier models has increased by one trillion-fold in 15 years, or 12 orders of magnitude. He predicted three more orders of magnitude of compute will be applied to train frontier models in the next few years and said the scaling laws are holding.
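A quick sanity check on those numbers: a trillion is 10^12, so “one trillion-fold” and “12 orders of magnitude” are the same claim. Spreading that growth evenly over 15 years works out to roughly 6.3x more training compute every year. The even spread is a simplifying assumption, since real growth was not smooth.

```python
import math

total_growth = 1e12   # "one trillion-fold"
years = 15

orders_of_magnitude = math.log10(total_growth)       # 12.0
annual_factor = total_growth ** (1 / years)          # 10 ** (12 / 15), about 6.31
print(orders_of_magnitude, round(annual_factor, 2))  # 12.0 6.31

# "Three more orders of magnitude" means another 10 ** 3 = 1,000x on top of today's scale.
print(10 ** 3)  # 1000
```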
Then Microsoft announced seven new models across image, voice, transcription, reasoning, and coding.
That is a shift because Microsoft has spent the last few years being seen as the enterprise delivery system for OpenAI’s models. You bought Microsoft 365, and OpenAI-powered Copilot showed up inside Word, Excel, Outlook, Teams, and GitHub.
So Build 2026 showed a more independent Microsoft.
The first new model was MAI-Image-2.5, along with a Flash variant.
Mustafa said MAI-Image-2.5 and Flash deliver a step change in quality, ranking #2 and surpassing Nano Banana 2. He said the models offer precise editing with control and consistency. Flash is designed for efficient production workloads, while 2.5 is for maximum fidelity and professional-grade performance.
MAI-Image-2.5 is live in PowerPoint today, rolling out to OneDrive, and available in Foundry.
Then came MAI-Transcribe-1.5. Mustafa called it the best transcription model in the world, with state-of-the-art accuracy across 43 languages, beating Gemini and OpenAI’s flagship models. He said it produces transcripts 5x faster than rival models and is integrated into GitHub Copilot, Dynamics 365 Contact Center, and Foundry.
Then Microsoft announced MAI-Voice-2 and MAI-Voice-2-Flash. MAI-Voice-2 is the latest speech generation model, with natural delivery, fine-grained control, and support for 15 languages. MAI-Voice-2-Flash is built for speed and value in ultra-low-latency voice scenarios.
So these are not only lab models. Microsoft is wiring them straight into PowerPoint, OneDrive, GitHub Copilot, Dynamics, VS Code, and Foundry.
Then came the model Corey flagged: MAI-Thinking-1.
A reasoning model is built to spend more time working through multi-step problems instead of answering immediately. Microsoft described MAI-Thinking-1 as its first reasoning model, targeted at reasoning and coding tasks.
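Mustafa said it is a 35B active-parameter model, meaning it competes in the “medium-sized weight class.” Quick translation: “35B active parameters” means the model uses about 35B of its internal knobs (parameters) for each token it generates. The “active” qualifier usually points to a mixture-of-experts design, where the model stores more parameters in total but routes each token through only a subset of them. MAI-Thinking-1 is smaller than the largest frontier models, and Microsoft is pitching it as efficient enough to be useful in real products.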
So Corey’s note about 45B active parameters looks like a mishear. The keynote transcript says 35B active parameters.
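The word “active” usually signals a mixture-of-experts (MoE) design. A router sends each token through only a few expert blocks, so the parameters used per token are a fraction of the parameters stored. The keynote did not give MAI-Thinking-1's total size or expert layout, so the numbers below are invented purely to show the arithmetic.

```python
def moe_param_counts(shared: float, per_expert: float,
                     n_experts: int, top_k: int) -> tuple[float, float]:
    """Return (total, active) parameter counts for a simple mixture-of-experts layout.

    shared:     parameters every token uses (attention, embeddings, ...)
    per_expert: parameters in one expert, summed across layers
    top_k:      experts the router activates for each token
    """
    total = shared + n_experts * per_expert
    active = shared + top_k * per_expert
    return total, active


# Hypothetical configuration, NOT MAI-Thinking-1's published architecture.
total, active = moe_param_counts(shared=11e9, per_expert=3e9, n_experts=64, top_k=8)
print(f"total = {total / 1e9:.0f}B, active = {active / 1e9:.0f}B")
# total = 203B, active = 35B
```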
The benchmark claims were bigger than expected:
SWE-Bench Pro is a hard coding benchmark. The key claim is that Microsoft’s medium-weight reasoning model is landing in the neighborhood of much larger, premium frontier coding systems, at least on this benchmark.
Mustafa also said the model was built from the bottom up, with zero distillation and no benchmark targeting. Distillation is when one model learns from another model’s outputs. Microsoft’s claim is that MAI-Thinking-1 was built with enterprise-grade clean and commercially licensed data lineage, so companies can put it into production with more confidence.
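For contrast, here is what distillation typically looks like in its classic soft-target form, popularized by Hinton et al. The student model is trained to match the teacher's softened output probabilities. This is the generic technique Microsoft says it did not use, shown with toy logits. It does not describe any Microsoft training pipeline.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax; a higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: list[float], student_logits: list[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions.

    Minimizing this pushes the student to imitate the teacher's outputs.
    Many setups also multiply it by temperature ** 2 to keep gradient sizes comparable.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


teacher = [4.0, 1.0, 0.5]
print(distillation_loss(teacher, teacher))              # 0.0: student already matches
print(distillation_loss(teacher, [0.5, 1.0, 4.0]) > 0)  # True: mismatch is penalized
```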
So Corey’s vibe read makes sense: Microsoft seems to have found a lane. It may not need to build the single biggest general model in the world if it can build efficient, clean, product-fit models tuned into Microsoft’s stack.
Then Microsoft announced MAI-Code-1-Flash.
Mustafa described it as an inference-efficient coding model tuned for VS Code and GitHub CLI. He said it hit 51% on the coding benchmark despite having 5B parameters, making it close to Haiku in size while delivering strong coding performance at lower cost.
It is rolling out inside VS Code.
That is the model Microsoft probably wants everywhere: cheap enough to use a lot, good enough to handle routine coding tasks, and integrated where developers already work.
Microsoft also said the new MAI models will be available on OpenRouter, Fireworks, and Baseten.
Mustafa said this means developers will be able to tune the weights directly in an ecosystem of their choice.
That is notable because it makes MAI less locked to Microsoft’s own interface. Microsoft still wants Foundry to be the enterprise platform, and it is also letting developers access and tune models through other popular model platforms.
Across the model family, Microsoft said safety and security are built in from the start:
Then Mustafa added a full-stack point: Microsoft has been co-designing models with its own silicon. He said MAI-Thinking-1 was optimized on Microsoft’s Maia 200 chip and benchmarked against GB300, with a further 1.4x performance-per-watt gain when running the MAI model end to end.
He also said faster and more efficient MAI models are coming to the N1X platform on Windows “in a few months.”
So the broader claim is this: owning the full stack lets Microsoft tune the model, silicon, product, and customer workflow together.
Then Mustafa connected models to Frontier Tuning.
He said the big new piece is reinforcement learning environments, or RLEs. He called them “unique training gyms” for AIs. They create company- and task-specific agents, adapted only to you, built on MAI models.
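Microsoft didn't show what an RLE looks like under the hood. The “gym” metaphor echoes the reset()/step() interface popularized by OpenAI Gym and its successor Gymnasium, so here's a plain-Python sketch in that shape. The task, reward rule, and `ReportFieldEnv` name are all hypothetical. The key idea is that the environment, not a human, hands back a reward the agent can learn from.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    observation: str
    reward: float
    done: bool


class ReportFieldEnv:
    """Hypothetical training environment with the familiar reset()/step() shape.

    Task: produce a report line containing every required key=value field.
    Reward: fraction of required fields present, so partial progress earns credit.
    """

    REQUIRED = ("region", "volume", "price")
    MAX_TURNS = 3

    def __init__(self) -> None:
        self.turns = 0

    def reset(self) -> str:
        self.turns = 0
        return "Write one line with region, volume, and price as key=value pairs."

    def step(self, action: str) -> StepResult:
        self.turns += 1
        keys = {part.split("=", 1)[0].strip() for part in action.split(",") if "=" in part}
        missing = [name for name in self.REQUIRED if name not in keys]
        reward = 1 - len(missing) / len(self.REQUIRED)
        done = not missing or self.turns >= self.MAX_TURNS
        observation = "missing: " + ", ".join(missing) if missing else "all fields present"
        return StepResult(observation, reward, done)


env = ReportFieldEnv()
env.reset()
first = env.step("region=Midwest, volume=1200")
print(first.observation, round(first.reward, 2), first.done)  # missing: price 0.67 False
second = env.step("region=Midwest, volume=1200, price=3.10")
print(second.observation, second.reward, second.done)         # all fields present 1.0 True
```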
The Excel example was striking: Microsoft used RLEs and MAI models to climb toward the best agentic uses in Excel. Mustafa said the resulting model is on par with GPT-5.4 on public and private benchmarks while being 10x more efficient on cost.
Then he cited McKinsey: Microsoft tuned models on McKinsey’s tasks, delivered the highest win rate, outperformed GPT-5.5, and delivered 10x greater cost efficiency.
The enterprise pitch is that your RLEs, workflows, know-how, knowledge, institutional data, and tuned models stay yours. Mustafa said those models become your moat.
That is the cleanest version of Frontier Tuning: do not only rent a frontier model. Build a private training loop around your company’s actual work.
Then Microsoft announced a partnership with Mayo Clinic to create a frontier model for healthcare and deploy it in Mayo hospitals and beyond.
Mayo CEO Dr. Gianrico Farrugia said Mayo Clinic Platform has spent seven years moving healthcare from a pipeline to a platform. He said the platform operates on four continents, reaches about 100M people, and has created what he described as the largest longitudinal healthcare dataset in the world, including multimodal and genomics data.
The goal is to combine Microsoft’s model work with Mayo’s clinical practice and expertise.
For patients, the model could provide clinical and logistical answers about healthcare. For providers, it could act as a real-time team member, give insight, predict what is likely to happen next, prevent harm, increase patient safety, and make clinical teams better at delivering care.
This was one of the most important announcements and also one of the hardest to evaluate from a keynote. Healthcare models need more than strong demos. They need safety, validation, workflow integration, liability clarity, and clinical trust.
Microsoft and Mayo framed the goal as safe, secure, trustworthy, and effective healthcare solutions.
So this belongs in the story because it shows where Frontier Tuning goes when the stakes are highest: not just company-specific writing style, but clinical expertise and patient-care workflows.
The Frontier Tuning demo made the concept more concrete.
MAI-Thinking-1 is available in private preview in the Foundry model catalog. Users can deploy it as-is or fine-tune it.
The demo then used Land O’Lakes as the example. The presenter showed how an environment can include skills, knowledge, and tools.
The task was butter report generation, which required many manual steps and a high degree of precision. The presenter said almost 80% accuracy is not good enough for those tasks, so Microsoft is extending the definition of skills to include rubrics on “what good looks like.”
A rubric is a scoring guide. It tells the model how to judge whether the output was good.
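A rubric can be as simple as a set of weighted pass/fail checks. This sketch assumes that form. The criteria, weights, and sample text are invented, and real rubric checks would more likely be model-graded than simple string matches. A team could then require a score above some bar before accepting an output, in line with the presenter's point that almost 80% accuracy is not good enough.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    name: str
    weight: float
    check: Callable[[str], bool]  # hypothetical pass/fail test for one aspect of "good"


def score(output: str, rubric: list[Criterion]) -> float:
    """Weighted fraction of rubric criteria the output satisfies (0.0 to 1.0)."""
    total = sum(c.weight for c in rubric)
    earned = sum(c.weight for c in rubric if c.check(output))
    return earned / total


rubric = [
    Criterion("states the week", 1.0, lambda s: "Week" in s),
    Criterion("includes a price", 2.0, lambda s: "$" in s),
    Criterion("names the source", 1.0, lambda s: "source:" in s.lower()),
]

draft = "Week 12 summary: price $3.10/lb"
print(score(draft, rubric))                                  # 0.75
print(score(draft + " (source: internal pricing)", rubric))  # 1.0
```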
Then the demo showed how Microsoft can use M365 signals from Teams, Outlook, Word, Excel, and PowerPoint to suggest skills and rubrics based on how the organization works. Users can add organizational knowledge from OneDrive and SharePoint. Tools can also be added.
The important safety detail was simulation: because the tools tap into real workflows, Microsoft simulates execution so the model can learn without affecting the live state of the business.
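The keynote didn't detail how the simulation works. One common pattern, assumed here, is to run every tool call against a private copy of the state. A training episode can then change things freely while the live system stays untouched. `SimulatedTools`, `update_inventory`, and the SKU are hypothetical.

```python
import copy


class SimulatedTools:
    """Run tool calls against a private copy of the business state, never the live one."""

    def __init__(self, live_state: dict):
        self.live_state = live_state
        self.sandbox = copy.deepcopy(live_state)  # episodes mutate only this copy
        self.calls: list[tuple[str, str, int]] = []  # record of what the agent tried

    def update_inventory(self, sku: str, delta: int) -> int:
        self.calls.append(("update_inventory", sku, delta))
        self.sandbox[sku] = self.sandbox.get(sku, 0) + delta
        return self.sandbox[sku]


live = {"butter-1lb": 500}
sim = SimulatedTools(live)
sim.update_inventory("butter-1lb", -120)
print(sim.sandbox["butter-1lb"])  # 380
print(live["butter-1lb"])         # 500: the live state is untouched
```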
The demo claimed Frontier Tuning helped hill-climb Land O’Lakes tasks to more than 90% accuracy and estimated the tuned model to be 10x more efficient than baseline models.
The cached response “felt undoubtedly Land O’Lakes,” which is the actual enterprise goal. A tuned agent should not sound like a generic consultant in every company. It should understand the way that specific company works.
Satya summarized the shift after the demo: companies are moving from consuming frontier models to participating at the frontier.
That means private evals, private outcomes, private RLEs, traces, enterprise knowledge, scaffolding, and models all working together so the company can hill-climb toward its own objectives.
That is probably the most Microsoft-y idea in the whole keynote. The future of enterprise AI is not only which model is best on public benchmarks. It is whether a company can build a private improvement loop around the work only it knows how to do.
Then Satya moved to science.
He asked what would happen if the scientific method could become more continuous and programmable. Microsoft Discovery is the answer.
Microsoft Discovery is now generally available. Satya said it brings together models, HPC compute, knowledge graphs, scientific knowledge, automated labs, and simulation into one agentic discovery loop.
HPC means high-performance computing. A knowledge graph is a structured map of concepts and relationships. So Discovery is basically Microsoft’s attempt to make scientific work run more like agentic software development: define the task, let specialized agents work, run simulations, generate candidates, create protocols, and connect to labs.
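At its simplest, a knowledge graph is a set of (subject, relation, object) triples plus a way to query them. The triples below are real but illustrative facts about a plastic-degrading enzyme. They are not data from Microsoft Discovery.

```python
from collections import defaultdict

# A knowledge graph as (subject, relation, object) triples. Illustrative facts only.
triples = [
    ("PET", "is_a", "plastic"),
    ("PETase", "is_a", "enzyme"),
    ("PETase", "degrades", "PET"),
    ("PET", "used_in", "bottles"),
]

graph: dict[str, list[tuple[str, str]]] = defaultdict(list)
for subject, relation, obj in triples:
    graph[subject].append((relation, obj))


def query(subject: str, relation: str) -> list[str]:
    """Return every object linked to `subject` by `relation`."""
    return [obj for rel, obj in graph[subject] if rel == relation]


print(query("PETase", "degrades"))  # ['PET']
print(query("PET", "used_in"))      # ['bottles']
```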
The demo used plastic recycling. Today, recycling a plastic bottle often means downcycling: shredding and melting it into lower-value material. The demo asked whether proteins could help recycle the material again and again.
Inside the Microsoft Discovery app, which looked like VS Code, the scientist asked for three things:
The Discovery Engine used specialized agents running in the background and following the scientific method. The presenter said runs can take hours or days.
The system generated a research paper, created a custom agent on the fly, used HPC to generate protein candidates, explored millions of protein variations, and selected 80 proteins to send to the lab for testing.
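The funnel from millions of variations down to 80 lab candidates is, at its core, a top-k selection. In this sketch, random numbers stand in for the model- and simulation-derived scores a real pipeline would produce. The variant count is also scaled down to 100,000 so the example runs instantly.

```python
import heapq
import random

random.seed(0)

# Hypothetical: predicted activity scores for 100,000 simulated protein variants.
# A real pipeline would get these from models and HPC simulation, not random numbers.
candidates = {f"variant-{i}": random.random() for i in range(100_000)}

# Keep only the 80 highest-scoring variants for lab testing, best first.
shortlist = heapq.nlargest(80, candidates.items(), key=lambda item: item[1])

print(len(shortlist))                       # 80
print(shortlist[0][1] >= shortlist[-1][1])  # True: sorted best-first
```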
Then it generated DNA sequences and submitted lab instructions through a custom agent connected to an automated lab, with most steps automated and human supervision still in the loop.
That is a big claim: physical science, simulation, agents, and automated labs in one loop.
So Discovery may be one of the more important long-term announcements, even if it is less immediately practical for most readers than Copilot or Surface hardware.
The final technical announcement was Majorana 2, Microsoft’s next-generation quantum chip.
Quantum computing uses qubits instead of normal computer bits. A normal bit is either a 0 or a 1. A qubit can hold more complex quantum states, which could eventually help computers solve certain problems that are extremely hard for today’s machines.
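In textbook terms, a qubit's state is a pair of complex amplitudes. Measuring it gives 0 or 1 with probabilities equal to each amplitude's squared magnitude. This is standard quantum mechanics, not anything specific to Majorana.

```python
import math

# A single-qubit state a|0> + b|1>, normalized so that |a|^2 + |b|^2 = 1.
a = complex(1 / math.sqrt(2), 0)
b = complex(0, 1 / math.sqrt(2))  # equal superposition with a relative phase of i

p0 = abs(a) ** 2  # probability of measuring 0
p1 = abs(b) ** 2  # probability of measuring 1
print(round(p0, 3), round(p1, 3))  # 0.5 0.5
assert math.isclose(p0 + p1, 1.0)
```

The phase on `b` doesn't change these measurement probabilities, but it matters when qubits interfere. That extra structure is part of why qubits can do things ordinary bits cannot.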
The issue is that qubits are fragile. They lose their quantum state easily, which makes reliable quantum computing very difficult.
Satya said Microsoft’s Majorana approach is focused on the fundamental barriers to a scalable quantum machine: reliability, speed, and size.
Microsoft said Majorana 2 qubits can maintain their state for 20 seconds on average, and up to a minute. Satya said that is roughly 1,000x higher than Majorana 1.
He also said Majorana 2 keeps the same cubic form factor as Majorana 1, with one-hundredth-of-a-millimeter digital control, making it possible to fit 1M qubits on a chip smaller than a credit card.
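Here's a back-of-envelope check on the credit-card claim. It assumes a square 1,000 x 1,000 grid and ignores wiring and control overhead. A standard ID-1 card is 85.60 x 53.98 mm, so each qubit cell could be at most about 54 micrometers across. How that compares with the “one-hundredth-of-a-millimeter” figure (10 micrometers) depends on layout details the keynote didn't give.

```python
import math

# ISO/IEC 7810 ID-1 card (standard credit card) dimensions, in millimeters.
CARD_W_MM, CARD_H_MM = 85.60, 53.98

qubits = 1_000_000
side = math.isqrt(qubits)  # assumed square 1000 x 1000 grid, not Microsoft's actual layout

# Largest qubit pitch (in micrometers) that still fits the grid within the card's shorter side.
max_pitch_um = CARD_H_MM / side * 1000
print(side, round(max_pitch_um, 2))  # 1000 53.98
```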
So yes, this still sounds like science fiction. And Microsoft is trying to make the case that the sci-fi part is getting closer to useful engineering.
Build 2026 was Microsoft’s clearest agent strategy yet: make agents real by giving them computers, context, guardrails, and deployment rails.
The surface story is product launches. The deeper story is that Microsoft is trying to turn agents into managed enterprise infrastructure.
That is a very Microsoft bet.
It assumes the winning agent platform will be the one that can say yes to the compliance team. It assumes developers want local compute and cloud scale (uh, yes. They do). It assumes companies want agent work tied to identity, policy, tracing, and internal knowledge (also, this would be nice!). It assumes the future has many agents, many devices, and a control plane holding it all together.
The risk is complexity.
Microsoft has a strong stack, and it also has too many names. Copilot, Scout, Autopilots, Foundry IQ, Work IQ, Web IQ, Fabric IQ, OpenClaw, OpenShell, MXC, Windows 365 for Agents, Project Solara, Rayfin, Frontier Tuning, Agent Control Specification, Microsoft Agent Framework, Agent 365, MDASH, Microsoft Discovery, Maia, Cobalt, MRC, HorizonDB, MAI, Majorana, Aion Instruct, and Aion Plan, to name a few.
Ah, the curse of the hyperscaler. Google and AWS have the same problem, TBH. There's just a lot of stuff to manage when you get to their scale.
A normal developer may look at that map and ask where the front door is. So the trick is going to be simplifying it all and making it as easy to navigate as possible.
That said, the best version of this future is compelling. An agent (ideally a local one as the tech catches up) sees your work context, uses the right model, runs safely on your device or in a Cloud PC, checks live web data, writes code, tests it, opens a pull request, follows policy, and leaves an audit trail.
The worst version is enterprise AI as a folder maze. Or, as I like to call it, a spaghetti mess of software slop.
That means we are left with a single question: can Microsoft make the agent stack feel as obvious to use as it is powerful to describe?
If it can, Build 2026 may be remembered as the moment Microsoft stopped selling AI as a side panel and started rebuilding the computer around it. I, for one, am somewhat bullish on this proposition, at least when it comes to the impact locally accessible AI could have on regular folks ... as long as we can get the rest of the systems around it built up and working right.
Property of TechnologyAdvice. © 2026 TechnologyAdvice. All Rights Reserved