🙀 Google, Perceptron, and Thinking Machines Are Trying to Kill the Prompt Box

Google’s Magic Pointer, Perceptron Mk1, and Thinking Machines’ interaction models all point toward the same shift: AI interfaces are moving from “type a perfect prompt” to “show the system what you mean.”

Written By

Grant Harvey

May 13, 2026

7 minute read

So in AI, there’s this thing called the prompt box.

You probably know it well. It’s the empty rectangle where you take the messy thing in your head, translate it into computer-friendly instructions, paste in the context, add the file, clarify the goal, explain the edge cases, and then hope the model understood what “this” meant.

For the last few years, that box has been the default way most of us “use AI.”

The next wave of AI interfaces is trying to make the box optional.

First up, the TL;DR
The World As It Was: you had to translate your life into prompts
And Then One Day: the cursor learned what “this” means
Raising the Stakes: Google wants this everywhere, not inside one demo
The Moment of Change: AI is learning three new kinds of context
The World As It Is Now: the prompt box becomes the fallback
What to watch next

First up, the TL;DR

We have not been particularly impressed by anything interface-wise that Google has put out lately.

It seems much easier for Google to spin up cool Labs projects than to make serious improvements to the core tools most people actually use every day. But maybe that is because Google was building toward this.

Here’s what happened:

Google announced Gemini Intelligence for Android, which can automate app tasks, summarize and compare web pages, fill forms, turn messy dictation into polished messages, and build custom widgets.
Googlebook is Google’s new premium laptop category, built around Gemini, Android apps, Chrome, phone sync, Magic Pointer, and premium hardware.
Google DeepMind showed off Magic Pointer, a Gemini-powered cursor that understands what you are pointing at and can act on “this” or “that” without a full prompt.

Of those three, Magic Pointer, the would-be mouse killer, is the most interesting.

A normal cursor tells the computer where you clicked. Google wants the cursor to tell the computer what you mean. Point at a date in an email, and it can make a meeting. Point at a table, and it can turn the numbers into a chart. Point at a paused travel video, and it can find the restaurant on the map.

That is the real shift: the interface starts carrying part of the prompt for you.

And Google is not alone. Perceptron released Mk1, a model built to understand video as a stream of events, not a pile of screenshots. Meanwhile, Thinking Machines Lab previewed interaction models, which process audio, video, and text in tiny 200-millisecond chunks so the model can listen, watch, interrupt, and use tools in real time.

Why this matters: In the Genspark interview we’ll release later today, COO Wen Sang told us he made a public commitment to be in front of his computer 50% less by the end of this year.

That only works if AI stops making the computer the center of every interaction.

The end state of agentic intelligence should not be typing commands into Telegram forever. It should be the removal of every barrier between you and the technology, so you can interact with it however you best see fit.

Unless the servers go down, in which case RIP, welcome back to Notepad.

The World As It Was: you had to translate your life into prompts

The old computing bargain was simple: the human adapts to the machine.

Want to do something? Learn the interface. Click the right menu. Use the right keyboard shortcut. Type the right command. More recently, write the right prompt.

That’s why the prompt box became so powerful and so annoying at the same time. It gave normal people access to very capable systems, but it also made the user do a ton of invisible prep work.

You had to tell the AI:

What object you were talking about.
Which part of the screen mattered.
What app you were using.
What you wanted done next.
What counted as a good answer.

In other words, the prompt box made you act like a translator between your work and the model.

That’s the thing Google, Perceptron, and Thinking Machines are all trying to change.

And Then One Day: the cursor learned what “this” means

Google’s Magic Pointer is the cleanest consumer version of this idea.

We’ve had mouse pointers for roughly forever. You move the pointer, click something, and the computer records the coordinates. Very fancy. Very 1970s. Much wow.

Google’s question is: what if the pointer understood the thing under it?

That sounds small until you picture it in normal work:

Point at a paragraph in a PDF and say, “turn this into email bullets.”
Hover over a table and say, “make this a pie chart.”
Select two images and say, “merge these.”
Pause a travel video, point at a restaurant, and say, “find this place.”

The magic word is this.

Humans use “this” and “that” constantly because we share context. If I’m holding up a jacket and say, “do you like this?”, I don’t need to say, “do you like the navy-blue wool jacket in my right hand?”

Computers usually need the full version.

Magic Pointer is Google trying to give computers a shared context layer. The pointer becomes a way to say, “I mean the thing right here.” The model sees the pixels, understands the object, and connects the request to the right action.

The novelty is not the cursor. The novelty is that pointing becomes prompting.
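
To make "pointing becomes prompting" concrete, here is a minimal sketch of the idea, not Google's implementation. It assumes the system already has a list of recognized on-screen elements with bounding boxes; `ScreenElement`, `element_under_pointer`, and `resolve_request` are hypothetical names invented for illustration. The pointer position picks the target, and the short spoken request gets expanded with the context the user never had to type.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScreenElement:
    """A hypothetical on-screen object the system has already recognized."""
    kind: str      # e.g. "table", "paragraph", "date"
    content: str   # the text or data the element holds
    x: int         # left edge, in pixels
    y: int         # top edge, in pixels
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        """True if the pointer position falls inside this element's box."""
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def element_under_pointer(elements: List[ScreenElement],
                          px: int, py: int) -> Optional[ScreenElement]:
    """Return the smallest element containing the pointer, or None.

    Picking the smallest box is an assumption: it treats the most
    specific thing under the cursor as what the user means by "this".
    """
    hits = [e for e in elements if e.contains(px, py)]
    return min(hits, key=lambda e: e.width * e.height, default=None)


def resolve_request(utterance: str, elements: List[ScreenElement],
                    px: int, py: int) -> str:
    """Attach the pointed-at element to a short request like "chart this"."""
    target = element_under_pointer(elements, px, py)
    if target is None:
        return utterance  # nothing under the pointer: fall back to the raw prompt
    return f"{utterance}\n[target: {target.kind}] {target.content}"


if __name__ == "__main__":
    screen = [
        ScreenElement("page", "whole page", 0, 0, 1000, 1000),
        ScreenElement("table", "Q1: 10, Q2: 20", 100, 100, 200, 100),
    ]
    print(resolve_request("make this a pie chart", screen, 150, 150))
```

The real system presumably works from pixels and a vision model instead of a tidy element list, but the shape is the same: the interface supplies the noun, and you only supply the verb.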

Raising the Stakes: Google wants this everywhere, not inside one demo

This is why the Android and Googlebook announcements matter.

Gemini Intelligence is Google’s attempt to turn Android from an operating system into what it calls an intelligence system. That means your phone can do more of the annoying middle steps: automate app tasks, use screen context, fill forms, summarize pages, compare websites, and turn rambling voice notes into polished messages.

Googlebook is the same idea pushed onto the laptop. It mixes Android, ChromeOS, Gemini, phone sync, custom widgets, and Magic Pointer into a new premium device category.

And Disco, Google Labs’ web experiment, points in the same direction. Its GenTabs feature turns your open browser tabs into custom interactive apps. Its AI-enabled pointer can select page content, answer questions without leaving the page, jump to the best parts, highlight relevant information, open new tabs, and add useful bits back into your GenTab.

That’s not a random pile of features.

It’s Google trying to make the browser, phone, and laptop feel less like places where you manage files and windows, and more like places where AI already understands what you are trying to do.

The computer stops asking, “Please describe the full context.”

It starts saying, “I can see what you mean.”

The Moment of Change: AI is learning three new kinds of context

The reason this story is bigger than Google is that two other launches are attacking the same problem from different angles.

Perceptron Mk1 is about physical context.

Most vision models are good at snapshots. They can look at an image and say, “there is a ball,” “there is a forklift,” or “there is a handwritten note.”

The physical world does not happen in snapshots. It happens over time.

Perceptron’s pitch is that Mk1 can reason across video: what happened, when it happened, which object stayed the same after it disappeared behind something else, where a robot failed a grasp, or which moment in a sports clip deserves to be cut into a highlight.

The company says Mk1 can analyze video at up to 2 frames per second across a 32K-token context window, return structured timecodes, and output spatial primitives like points, boxes, polygons, tracks, and clips.

Translated out of robot-speak: it can watch messy real-world footage and hand downstream systems something they can act on.
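
For a feel of what "2 frames per second" and "structured timecodes" mean in practice, here is a small arithmetic sketch. It is not Perceptron's API: `sample_times` and `timecode` are hypothetical helpers, and the `MM:SS.mmm` format is an assumption chosen for readability.

```python
from typing import List


def sample_times(duration_s: float, fps: float = 2.0) -> List[float]:
    """Timestamps (in seconds) of the frames sampled from a clip.

    At the 2 fps ceiling quoted above, one frame is taken every 0.5 s.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    frame_count = int(duration_s * fps)
    return [i / fps for i in range(frame_count)]


def timecode(seconds: float) -> str:
    """Format a timestamp as MM:SS.mmm (an assumed, illustrative format)."""
    total_ms = round(seconds * 1000)
    minutes, remainder_ms = divmod(total_ms, 60_000)
    secs, millis = divmod(remainder_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{millis:03d}"


if __name__ == "__main__":
    times = sample_times(60)          # a one-minute clip
    print(len(times))                 # number of frames sampled at 2 fps
    print(timecode(times[-1]))        # timecode of the last sampled frame
```

A one-minute clip at 2 fps is 120 frames, which is why a bounded context window puts a practical limit on how much footage fits in a single pass.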

Then there’s Thinking Machines Lab, which is going after conversational context.

Normal chatbots are turn-based. You talk, then the model talks. If you interrupt, overlap, pause, gesture, or look at something important, the interface usually fakes its way through with extra software around the model.

Thinking Machines’ interaction models bake that interactivity into the model itself.

The model processes audio, video, and text in 200-millisecond “micro-turns.” It can listen while speaking, react to visual cues, interject when useful, keep a sense of elapsed time, and delegate harder reasoning to a background model without vanishing from the conversation.

Picture trying to solve a problem with a coworker in person versus sending a long email and waiting for the reply. That’s the difference they’re aiming at.
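
The 200-millisecond figure is easier to picture as a chunking loop. This is a rough sketch of slicing an audio stream into micro-turn-sized pieces, not Thinking Machines' actual pipeline. The 16 kHz sample rate is an assumption for illustration, and `samples_per_chunk` and `micro_turns` are hypothetical names.

```python
from typing import Iterator, Sequence

CHUNK_MS = 200  # micro-turn length quoted in the article


def samples_per_chunk(sample_rate_hz: int, chunk_ms: int = CHUNK_MS) -> int:
    """How many audio samples fit in one micro-turn."""
    return sample_rate_hz * chunk_ms // 1000


def micro_turns(samples: Sequence[float], sample_rate_hz: int,
                chunk_ms: int = CHUNK_MS) -> Iterator[Sequence[float]]:
    """Yield consecutive chunks of roughly chunk_ms each.

    The last chunk may be shorter if the stream does not divide evenly.
    """
    size = samples_per_chunk(sample_rate_hz, chunk_ms)
    if size <= 0:
        raise ValueError("sample rate too low for this chunk length")
    for start in range(0, len(samples), size):
        yield samples[start:start + size]


if __name__ == "__main__":
    one_second = [0.0] * 16_000       # one second of silence at an assumed 16 kHz
    chunks = list(micro_turns(one_second, 16_000))
    print(len(chunks), len(chunks[0]))
```

At 200 ms per chunk the model gets five chances per second to notice something and respond, instead of waiting for you to finish your turn.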

The World As It Is Now: the prompt box becomes the fallback

The useful way to understand all three launches is this:

Google is making screen context usable.
Perceptron is making physical-world context usable.
Thinking Machines is making interaction context usable.

Together, they point to a new interface pattern.

You should not have to stop what you are doing, open a separate AI window, and explain the world from scratch. The system should get more of the context from your pointer, your screen, your camera, your voice, your tabs, your timing, and your environment.

That also makes the Genspark point land harder.

When COO Wen Sang says he wants to be in front of his computer 50% less by the end of the year, that sounds unrealistic if the computer remains the place where all knowledge work must happen.

It sounds much more plausible if AI can follow work into other surfaces: AR glasses, phones, browsers, voice agents, cameras, cars, factories, and physical spaces.

The culture will take longer to move. Plenty of companies still define “work” as “human sitting in front of rectangle, performing seriousness.”

But the technical direction is clear.

The next great AI interface will not be the one with the fanciest prompt box. It will be the one that makes the prompt box feel like a last resort.

The best interface is the one that disappears.

What to watch next

The open question is whether these systems stay cute demos or become daily muscle memory.

Magic Pointer only matters if it works across enough apps without becoming another weird assistant bubble people ignore. Perceptron only matters if video understanding is reliable and cheap enough for real factories, cameras, robots, and media workflows. Thinking Machines only matters if real-time interaction can scale without latency, safety, and cost issues eating the experience alive.

Still, the direction feels right.

AI that works for humans should not require humans to contort themselves into perfect little prompt engineers. It should meet us where the work already is.

Point at it. Say what you want. Keep moving.

That’s the interface.