The Text Box is the Command Line of AI

2026-05-29 · 7 mins

I'm sitting in front of an LED screen, staring slightly lower than eye level. I can feel my neck starting to hurt. I've been in this position for hours, and I'll be in it for hours more. Because right now, I am working by interacting with the most powerful tool humanity has ever built — AI.

Sounds absurd, doesn't it?

Computational intelligence has grown exponentially, and the interfaces are still where they were in the 1980s.

Right now interacting with intelligent machines means writing essays into a text box. I understand it. I am a very frequent AI user. But we have to remember: this is where we landed. And just because we landed here with LLMs doesn't mean it's natural, or correct, or the future.

I think the opposite. This couldn't be further from the future.

There are two parts to where we go next. The near future, where AI escapes text. And the further future, where interaction itself turns physical.

A. Near future: AI escapes text

Think about what you do when you interact with AI today. You sit down, stare at a blank box, and try to explain in paragraphs something that is probably visual, physical, or conceptual.

We are constantly translating.

We spend enormous amounts of effort converting what we mean into what a machine can understand, instead of just showing it.

That's incredibly unintuitive. And the reason we ended up here is simple: LLMs were the first AI technology to mature quickly enough to become AI as we know it today, and LLMs run on text.

I like to think of this as the command line era of AI.

The command line was the first interface that worked. It was unforgiving. You had to know the language, you had to type the right thing, and the machine sat there waiting for you to do all the work.

I think we are exactly there with AI right now.

What I think is coming is something more like a presence than a destination.

The AI stops being a place you go to. It is already there, seeing what you see, hearing what you hear, understanding the context you're in without having to be told. Sometimes it answers in voice. Sometimes it draws something. Sometimes it generates a small interface, built on the fly, for the specific thing you're doing in this specific moment. The interaction model reshapes itself around the task, instead of forcing the task into a text box.

The clearest picture of this is still Jarvis. Yes, that Jarvis. The one that made a whole generation want to become engineers.

Tony Stark doesn't open an app. He doesn't type. Jarvis is just in the room, watching the work, answering when asked, sometimes acting before being asked. The interface is whatever the moment needs.

As of today, May 2026, this is not reality, but the pieces are already here. Vision is good enough. Voice is good enough. Screen-sharing with an AI is real. Generative UI is starting to show up in real products. What's missing is the glue that holds all of these together: a defined interaction model for AI.

A useful way to think about this: we are between eras. The text box is the command line of AI. Powerful, expert-feeling, the right tool for some things, but not the door most people should be walking through. What we're waiting for is the GUI moment. The moment the same intelligence becomes reachable through something that meets you where your senses already are, in the context you're already in.

That would be incredible progress. But even with all of it, the laptop is still the medium. So this progress makes way for something else…

B. Future: Interaction turns physical

In this 'Jarvisian' future, interacting is far easier than before, and it's far more intuitive to just show things. The only problem: you are still sitting in a chair, staring at an LED screen. All the interaction still lives on a flat plane.

The next move is leaving the plane.

Now, I'm not talking about strapping a screen to your face. VR and AR still put a layer between you and the world — you're looking through something. I mean computation that lives in the actual room with you, that you meet with your bare senses. Spatial.

And this is a direction. What I am going to talk about from now on is unsettled and might stay unsettled for a while. But it is an important direction.

When we talk about good design, about a design being intuitive, we always compare it against the physical world.

How does a person expect this to react in the real world, and how does it actually react in this abstraction of the real world? The smaller the difference, the better the design.

This naturally takes us to the next step for computational interfaces: carrying the interface into the physical world.

And here's the thing — we already have all the actuators we need. Our hands, our voice, our body moving through space. We've had them the whole time. We just keep building abstractions on top of them.

A mouse cursor is an abstraction of our pointed finger. A keyboard is an abstraction of our voice.

So what would happen if we removed the abstractions?

We would be carrying computation into the physical world.

First of all, it would be the healthiest option for our bodies. It would also be the most intuitive.

There are already some interesting research projects around this, like Bret Victor's Dynamicland. The project is about bringing computation into the physical world by using projectors.

Another technology that I believe will be very useful for the future of physical computation is HoloTile, a new kind of floor Disney is developing. You can walk in any direction and still stay in the same spot, because the floor moves under your feet to cancel out each step. It's pretty mechanical, and when I first saw it I was like 'damnnn… now this is the first tech I've seen that brings us a little bit closer to holodecks.'

A holodeck (from Star Trek) is a room you walk into where a computer program becomes a physical world around you.

But the Star Trek version leans on one piece I don't think we'll figure out anytime soon: conjuring solid matter out of nothing. Some of what you touch in a holodeck is real replicated stuff, generated on the fly. That part stays science fiction for a long while.

So why do holodecks matter if we can't even build them?

Because they're the cleanest version of the thing we've been walking toward this whole time. Strip away the matter replicators and what's left is still the north star: a room that is a computer. Not a computer you sit in front of. A computer you stand inside.

Think about what that actually collapses. Dynamicland gives you computation living on surfaces, on objects, on the things in the room with you. HoloTile gives you the ability to move through that computation without ever leaving the room. Put those two together and you don't have a screen anymore. You have a space. Not solid matter conjured from nothing, but light thrown onto the real objects and surfaces already in the room, and your real body moving among them. You run a program and the program is around you. You walk through it. You reach into it. You move things with your hands, because your hands are the interface now.

That's the whole point of this direction. Every step we've taken has been about shrinking the gap between what you expect to happen and what happens. The mouse was closer than the command line. Touch was closer than the mouse. And the closest possible thing — the asymptote — is just the world reacting like the world. You don't learn a holodeck. You already know how to be in a room.

Will we get there? I have no idea. This might stay unsettled for decades. But the direction is real, and I think it's the right one.