
Physical AI and world models. We explain how they differ from regular AI


To an outside observer of the world of new technologies, the turn of May and June did not differ much from other periods: more language models and new versions, each outdoing the others in benchmarks.

Those who paid closer attention, however, noticed a series of announcements made within a relatively short period of time by at least several key players in the artificial intelligence industry. All of them concerned one thing: what we now call “physical AI”, and robotics, which is expected to be the biggest beneficiary of progress in this area.

On the one hand, the livestream organized by Figure AI, during which a robot worker sorted parcels for 200 hours, has come to an end. The company is not widely known, but for about four years it has been doing truly spectacular things, and its robots are among the most “human” ones. It is not without reason that one of its models recently accompanied Melania Trump at an event in the White House, and it is also for good reason that the company obtained financing of several hundred million dollars from OpenAI, Intel, Microsoft, Nvidia and Jeff Bezos' private foundations. Moments after the broadcast, it signed a contract with another large partner, Catalyst Brands (the first was BMW), to operate logistics centers.

Shortly thereafter, news spread around the world of a similar deal in China, where China Post had deployed a large fleet of robots to handle parcels.

The presence of machines in places like these is no great surprise nowadays, but the form these machines take should attract attention: it is humanoid. While robotic arms, lifts, ride-ons and the like have already become part of the landscape, human-shaped robots are something of a novelty. Imitating human stature, dexterity and capabilities introduces a completely different dynamic in such places, because it allows machines to deal with challenges that require features that have so far been very “human”. It also allows them to function in an environment that was previously adapted only to the presence of humans.

Nvidia has also recently made a contribution by announcing a number of new developments in the field of robotics, including cooperation with the Chinese company Unitree and with Sharp, the release of the new Cosmos 3 model created with robots in mind, and the first open reference design created for the development of robotics. It combined all this in one machine, aptly named the Nvidia Isaac GR00T Reference Humanoid Robot.

Around the same time as all these announcements were being made from the stage at Taiwan's Computex, Sam Altman and other OpenAI employees launched a communications offensive encouraging people to join the company's robotics division.

OpenAI has a new plan

In a post on the X platform, Altman wrote a few key sentences that clearly indicate that OpenAI wants to invest heavily in this field in the near future and plans to create its own machines equipped with multimodal models, allowing them not only to communicate verbally, but also to see and hear.

“We are looking for people who will help us program and produce robots useful to society,” Altman wrote directly.

Explaining his motivation, he also laid out the vision behind making robotics one of the strategic foundations of the entire company: “Artificial intelligence should be able to help people in the physical world. In the short term, we are focusing on robots that will support skilled workers in building the infrastructure of the future.”

He summed up his plans for the future as follows: “In the long term, we imagine that everyone will have a personal robot that will do everything we expect it to do.”

This is a vision shared by other big tech companies, and we discussed it in more detail in a separate article devoted to when AI could replace professions such as plumbers or electricians. In light of the recent announcements and premieres, the dates indicated there, although they may have seemed “unrealistic” to some, have just gained additional validation. Physical AI is no longer just another distant vision, but a new, hot trend among companies dealing with artificial intelligence.

We are currently on a very dynamic growth course which, despite numerous doubts about its direction, social readiness and responsibility, does not seem to be slowing down in any way. Language models and so-called generative artificial intelligence took about 3-4 years to turn everything upside down. Now, with many times more money and huge investments in this industry, physical AI and world models may need even less time to trigger the next earthquake. But what exactly are these two concepts, still so new to the technological landscape?

A new paradigm of artificial intelligence

Physical AI is essentially a completely new category of specialized models designed from scratch to perform specific actions in the real world. To work properly, such a model must process gigantic amounts of data from various sources in a fraction of a second: from advanced analysis of camera images, through constant adjustment of movement trajectories, to precise control of arms and grippers in dynamically changing conditions. A machine equipped with this kind of solution no longer operates on a single plane, and its actions result not only in commands but also in reactions from the outside world.

The huge number of variables in the human environment requires a completely new approach to training such AI, which is why these models are often trained on huge sets of video data recorded from a first-person perspective. Many companies that collect such material are willing to pay a lot for it. Algorithms analyze the movements of human hands and reactions to stimuli to observe precisely how a person deals with physical challenges. They then develop their own imitation of the action, with minor corrections, which is transferred to the movements of the mechanical shell.
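The idea of copying recorded human behavior can be illustrated with a deliberately tiny sketch. It is not how any of the companies mentioned here train their robots: real systems learn neural networks from video, whereas this example simply looks up the most similar recorded situation and repeats what the human did. The types `Observation` and `Action` and the function `imitate` are hypothetical names invented for this illustration.

```python
import math
from typing import List, Tuple

# Hypothetical simplification: an observation is the (x, y) position of an
# object as seen by the camera, and an action is the (dx, dy) hand movement
# the human demonstrator made in response.
Observation = Tuple[float, float]
Action = Tuple[float, float]
Demonstration = Tuple[Observation, Action]


def imitate(demos: List[Demonstration], obs: Observation) -> Action:
    """Return the action from the demonstration closest to the observation."""
    if not demos:
        raise ValueError("at least one demonstration is required")
    # Pick the recorded situation with the smallest Euclidean distance to obs.
    _, action = min(demos, key=lambda demo: math.dist(demo[0], obs))
    return action


# Two recorded demonstrations: object on the left -> move right,
# object far up and to the right -> move down.
demos = [((0.0, 0.0), (1.0, 0.0)), ((10.0, 10.0), (0.0, -1.0))]
chosen = imitate(demos, (1.0, 1.0))
```

Here `chosen` is the movement recorded for the nearest known situation. The gap between this lookup and a trained model is exactly where the large video datasets described above come in.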

World models are a slightly different category. They are, of course, closely related to physical AI, in a sense providing it with the knowledge needed to take actions, but they can also function without a mechanical “body”. Although they partially share nomenclature with language models, they are completely different from them. The way they work actually corresponds to the way the human brain works: finding itself in a given situation, it independently interprets and analyzes it, and then makes a decision based on the expected consequences. To better illustrate this, let's use an example:

Suppose you ask an AI running on an LLM (Large Language Model) what will happen if you push a vase off the table. The language model will answer that the vase will fall, break, and need to be cleaned up. It knows this because it relies on the knowledge it absorbed during training and predicts what it should answer based on probability. If it has received the right input, it can build on it. World models do not need this input data. The first time you ask them what will happen to the vase, they do not know the answer. To obtain it, they use something else: they are trained to understand the basic principles that govern the world. Knowing the most important rules of, for example, physics, they can run their own simulation of the object's behavior and use it to assess the effects of the action. In an extreme case, if a world model has a physical shell and no software restrictions prevent it, it could even drop the vase as an experiment to gather data.

To put it very simply, a language model obtains knowledge from the data it is fed, while a world model acquires knowledge through observation and drawing conclusions.
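The vase example can be turned into a toy simulation. The sketch below is only an illustration of “predicting by simulating”: the physics is hard-coded here, whereas an actual world model would have to learn such rules from observation. The class `Vase`, the functions `simulate_fall` and `predict_outcome`, and the breaking threshold of 2 m/s are all assumptions made up for this example.

```python
from dataclasses import dataclass

G = 9.81  # gravitational acceleration in m/s^2


@dataclass
class Vase:
    height_m: float               # starting height above the floor
    break_speed_ms: float = 2.0   # hypothetical impact speed that breaks it


def simulate_fall(vase: Vase, dt: float = 0.001) -> float:
    """Step the vase forward in time until it hits the floor.

    Returns the downward speed at impact in m/s (air resistance ignored).
    """
    height = vase.height_m
    velocity = 0.0
    while height > 0.0:
        velocity += G * dt       # gravity accelerates the vase
        height -= velocity * dt  # the vase moves toward the floor
    return velocity


def predict_outcome(vase: Vase) -> str:
    """Answer the question from the article by simulating, not by recall."""
    impact_speed = simulate_fall(vase)
    return "breaks" if impact_speed >= vase.break_speed_ms else "survives"


table_result = predict_outcome(Vase(height_m=0.75))   # pushed off a table
cushion_result = predict_outcome(Vase(height_m=0.05)) # tipped from 5 cm
```

The point of the sketch is that the answer is not stored anywhere in advance: change the height or the threshold and the prediction changes with it.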

One step closer to AGI

An artificial intelligence that thinks independently and interprets its environment, capable not only of generating and reproducing descriptions of the world but also of understanding the laws governing it, sounds dangerously close to what we call AGI, i.e. strong artificial intelligence. The two concepts are not identical, but world models do constitute a key stage in the development of this technology, bringing us much closer to achieving “full” AI, which will free itself from current limitations and be able to think abstractly, in a way no different from what we have always attributed only to people.

Yann LeCun, one of the fathers of today's artificial intelligence, says outright that we will never develop AGI based solely on language models. In his view they are a dead end that will of course seem like a highway for a few years, but will reach its limits sooner rather than later. The fundamentals of the technology behind LLMs will never allow those limits to be surpassed, and if the world of big tech dreams of another breakthrough, it must work on much more “understanding” and “conscious” models.

By combining physical AI, capable of operating in the real world, with knowledge acquired on an ongoing basis thanks to integrated world models, new-generation humanoid robots will soon start to appear not only in warehouses and factories, but also in other, more public places, including, probably, our homes.

Are we ready for this? Of course not. But we were not ready for the earlier, “simple” AI either, and yet it came to dominate our reality, changing the everyday lives of not millions, but billions of people.

Ashley Davis

I’m Ashley Davis. As an editor, I’m committed to upholding the highest standards of integrity and accuracy in every piece we publish. My work is driven by curiosity, a passion for truth, and a belief that journalism plays a crucial role in shaping public discourse. I strive to tell stories that not only inform but also inspire action and conversation.
