The Machine is Learning to Build Itself, And That Should Excite You

AI is no longer just answering questions. It's designing its own successors, seeing the world in three dimensions, and quietly preparing to run your company. Here's what's really happening and why it matters.


There's a peculiar kind of vertigo that comes from watching a technology outgrow itself in real time. We built the internet to share documents, and it became the nervous system of civilisation. We built smartphones to make calls, and they became the primary lens through which a billion people experience reality. Now, we've built AI systems smart enough that they're beginning to suggest how to build better AI systems, and we're just starting to grasp what that loop might mean.

The past few months have seen a quiet but remarkable acceleration across the field. Not just faster models or bigger benchmarks, but genuinely different directions: new architectures, new ways of seeing space, new classes of AI that act rather than simply respond. If you've been meaning to catch up on where things stand, and why it matters beyond the headlines, this is that piece.

The Transformer had a great run. What comes next?

Most of the AI you interact with today (ChatGPT, Claude, Gemini) runs on something called the Transformer architecture. It was a genuinely brilliant invention, and for the better part of a decade it's been the engine behind nearly every major AI breakthrough. But Sam Altman, CEO of OpenAI, recently said something quietly significant: the Transformer is probably not the final answer.

The problem is scale. Transformers process information by paying "attention" to every part of a document in relation to every other part. That's powerful, but as inputs get longer, the computational cost doesn't grow linearly; it grows roughly with the square of the input length. Feed a Transformer a short prompt and it hums along beautifully. Feed it a 500-page document and it starts to buckle under its own weight.
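The quadratic blow-up is easy to see in miniature. Here's a toy sketch (not any production implementation) that simply counts the pairwise attention scores a single attention head has to compute for a given sequence length:

```python
# Toy illustration of self-attention's quadratic cost: every token
# attends to every other token, so one head computes n * n scores.
def attention_scores(n_tokens: int) -> int:
    """Number of pairwise scores for a sequence of n_tokens."""
    return n_tokens * n_tokens

for n in (100, 1_000, 10_000):
    print(f"{n:>6} tokens -> {attention_scores(n):>13,} scores")
```

Going from 100 tokens to 10,000 tokens multiplies the input by 100 but the work by 10,000, which is exactly the "buckling" described above.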

The most interesting part of Altman's prediction isn't that Transformers will be replaced. It's that today's AI is now smart enough to help us find what replaces them.

One promising alternative is called Mamba, an architecture designed from scratch to handle long sequences of information far more efficiently. Where a Transformer's cost balloons with every additional token, Mamba's grows roughly linearly with sequence length. It's still early, and architecture shifts are messy, multi-year affairs. But the direction of travel is becoming clear: the next generation of AI will likely look quite different under the hood than what we have today.
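A rough intuition for why state-space models like Mamba scale linearly: instead of comparing every token with every other token, they carry a fixed-size state forward one step at a time. The sketch below is deliberately oversimplified (a scalar state with fixed coefficients; real Mamba layers use learned, input-dependent parameters), but it shows the shape of the computation:

```python
# Minimal linear recurrence: h_t = a * h_{t-1} + b * x_t.
# Each token touches the state exactly once, so total cost is O(n),
# not O(n^2) as with full pairwise attention.
def scan(xs: list[float], a: float = 0.9, b: float = 0.1) -> list[float]:
    h = 0.0
    out = []
    for x in xs:          # one pass, constant work per token
        h = a * h + b * x  # fold the new token into the running state
        out.append(h)
    return out

print(scan([1.0, 1.0, 1.0]))  # the state gradually accumulates the input
```

Because the per-token work is constant, doubling the input merely doubles the cost, which is what makes very long contexts tractable.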


Teaching machines to see in three dimensions

For most of AI's history, vision has meant looking at flat images. A photo is a grid of pixels; a model learns to classify or describe that grid. It works remarkably well, until you need the machine to understand that the world has depth, that objects have backs and undersides, that light behaves consistently across surfaces as you move around a scene.

That's changing. Apple's new model, Leto, can take a single 2D photograph and reconstruct a fully consistent three-dimensional object from it, complete with accurate lighting, reflections, and surface behaviour from any angle. Previous attempts at this tended to fall apart when you moved the virtual camera; the illusion would break, revealing that the model was essentially making things up. Leto holds together because it compresses thousands of possible views of an object into a single internal representation, then uses that representation to reason about angles it's never explicitly seen.

Meanwhile, a model called Inspiworld FM is approaching the same challenge from a different angle, building spatially consistent 3D environments specifically designed for robotics. What makes it unusual is that it runs in real-time on a single consumer GPU, the kind of card you'd find in a gaming PC. The ability to reason about physical space is no longer restricted to research labs with industrial computing budgets.

This matters more than it might seem. The gap between AI that understands text and AI that understands the physical world has been one of the biggest obstacles to truly useful autonomous systems. That gap is narrowing quickly.

From assistants to operators

Ask most people what AI is for, and they'll describe a conversation: you type something, the AI responds, you refine your question, it adjusts. That model is already becoming outdated.

The next wave of AI isn't designed primarily for conversation; it's designed for action. An agent called Manus "My Computer" runs directly on a user's machine, reading local files, running commands, and controlling applications. It doesn't call out to a server to process things; it uses your device's own hardware, giving it low-latency, private access to your entire computing environment.

Then there's Z.A.I. GLM5 Turbo, which has been optimised not for coherent conversation but for reliable tool-use across extended workflows. Its error rate when calling external tools is 0.67%, vanishingly small for a system expected to chain hundreds of actions together in a single automated task. That's the kind of reliability you need before you can trust an AI to run a long, complex process without supervision.
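It's worth doing the arithmetic on why per-call reliability matters so much for agents. If each tool call fails independently with probability p, a chain of n calls only succeeds with probability (1 - p)^n. A quick check using the article's 0.67% figure (treating failures as independent is an assumption):

```python
# Probability that a chain of n independent tool calls all succeed,
# given a per-call error rate p.
def chain_success(p: float, n: int) -> float:
    return (1 - p) ** n

for n in (10, 100, 500):
    print(f"{n:>3} calls -> {chain_success(0.0067, n):.1%} chance of a clean run")
```

Even a sub-1% error rate leaves a 100-call chain with only about even odds of finishing cleanly, which is why driving that number toward zero is the headline metric for agentic models.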

The shift here is subtle but important. A conversational AI is something you use; an agentic AI is something you deploy. The mental model changes from "assistant" to "operator", and the implications for how we structure work are significant.

Not just writing code, but proving it works

As AI generates more and more code, a new question emerges: how do you know it's correct? Not just "does it run?" but "does it do exactly what it's supposed to do in every possible situation?" For high-stakes systems like medical software, financial infrastructure, and aerospace controls, that distinction matters enormously.

Mistral's Leanistral tackles this head-on. Built on top of a formal language called Lean 4, it doesn't just write code; it constructs mathematical proofs that the code behaves as intended. If there's a flaw, the proof breaks. The model can also debug existing proofs, identifying precisely where the logical chain has snapped.
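To make the idea concrete, here is what machine-checked correctness looks like at its smallest scale: a toy Lean 4 theorem (written for illustration, not output from Leanistral) stating that reversing a list never changes its length:

```lean
-- If this property were false for any list, the proof would not compile.
theorem reverse_preserves_length (l : List Nat) :
    l.reverse.length = l.length := by
  simp  -- discharged by a standard library lemma about List.reverse
```

A flaw in the claim, say an off-by-one in a real program's specification, would surface not as a runtime bug but as a proof that simply refuses to check.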

This is a different kind of intelligence than what we usually celebrate in AI. It's not creative or generative; it's rigorous and logical. The ability to verify that software is correct, at scale, could be one of the most quietly transformative capabilities to emerge from this generation of models.

Two years to AGI, and an AI in the corner office

Altman has always been comfortable with bold claims, but his recent predictions deserve some genuine reflection rather than reflexive scepticism or credulity. He believes AGI, a system with broadly human-level intelligence across most domains, could arrive within two years. He expects an explosion in programming agents, AI that writes, tests, deploys, and maintains software with minimal human involvement. And he raised the possibility, gently but seriously, of an AI CEO.

One person, equipped with the right AI tools, building something that previously required a corporation. That's the world Altman is describing. The interesting question isn't whether it's possible; it's whether we're ready.

It would be easy to dismiss this as the familiar hyperbole of the tech industry, which has a long and embarrassing history of announcing things that are always "two years away." But the technical developments described above (self-improving research loops, spatial reasoning, reliable agentic workflows, formal verification) aren't hypothetical. They're shipping now, in various states of polish, to real users and developers.

What's genuinely uncertain is the pace and the shape. AI progress has surprised almost everyone, including the people building it. What feels like a gradual accumulation of improvements has a way of crossing thresholds that suddenly change what's possible, and we may be approaching one of those thresholds now.


The most useful thing you can do with all of this isn't to predict the future; it's to stay curious, stay informed, and resist the temptation to either panic or shrug. These are consequential technologies, and they deserve thoughtful attention. We'll keep trying to give them that.
