Google’s new AI model translates vision, language into robot actions
US-based global tech firm Google on Friday introduced a new artificial intelligence (AI) model that translates vision and language into actions that robots can perform.
Robotics Transformer 2, or RT-2, is a vision-language-action model trained on text and images from the internet, which can learn general ideas and concepts and then transfer that knowledge to inform a robot’s behavior, Google said in a blog post.
As a single model, RT-2 can perform complex reasoning and output robot actions, in addition to transferring learned concepts to direct a robot’s actions, it added.
“Unlike chatbots, robots need ‘grounding’ in the real world and their abilities,” the company said, adding that RT-2 will give robots the knowledge needed to complete tasks such as picking up apples or throwing out the trash.
“In other words, with RT-2, robots are able to learn more like we do — transferring learned concepts to new situations. Not only does RT-2 show how advances in AI are cascading rapidly into robotics, it shows enormous promise for more general-purpose robots,” it added.
In June, Google introduced a self-improving AI agent for robotics called RoboCat, saying it can learn to perform a variety of tasks across different robotic arms and then self-generate new training data to improve its technique.
RoboCat can pick up a new task with as few as 100 demonstrations because it draws from a large and diverse dataset. Google said this capability will help accelerate robotics research, as it reduces the need for human-supervised training, and called it an important step towards creating a general-purpose robot.