
"Google DeepMind says its upgraded AI models enable robots to complete more complex tasks - and even tap into the web for help. During a press briefing, Google DeepMind's head of robotics, Carolina Parada, told reporters that the company's new AI models work in tandem to allow robots to "think multiple steps ahead" before taking action in the physical world."
"To do this, robots can use the upgraded Gemini Robotics-ER 1.5 model to form an understanding of their surroundings, and use digital tools like Google Search to find more information. Gemini Robotics-ER 1.5 then translates those findings into natural language instructions for Gemini Robotics 1.5, allowing the robot to use the model's vision and language understanding to carry out each step."
""The models up to now were able to do really well at doing one instruction at a time in a way that is very general," Parada said. "With this update, we're now moving from one instruction to actually genuine understanding and problem-solving for physical tasks.""
Gemini Robotics 1.5 and Gemini Robotics-ER 1.5 enable robots to plan and execute multi-step physical tasks with embodied reasoning and vision-language capabilities. Robots can form an understanding of surroundings, tap digital tools like Google Search for location-specific information, and translate findings into step-by-step natural language instructions for execution. Tasks expanded from single instructions to complex workflows such as sorting laundry by color, packing a suitcase based on current weather, and sorting trash, compost, and recyclables according to local rules. The system supports cross-robot transfer of learned behaviors, allowing tasks developed on one robot configuration to function on other platforms.
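The article describes an orchestrator/executor split: an embodied-reasoning model plans in natural language (consulting tools like Google Search along the way) and a vision-language-action model executes each step on the robot. The sketch below illustrates that control flow only; every function and name in it (plan_task, execute_step, search_web) is a hypothetical stand-in, not Google's actual API.

```python
# Illustrative sketch of the two-model pipeline described above: a planner
# (standing in for Gemini Robotics-ER 1.5) breaks a high-level goal into
# natural-language steps, optionally calling a web-search tool, and an
# executor (standing in for Gemini Robotics 1.5) carries out each step.
# All names here are hypothetical placeholders, not Google's API.

from dataclasses import dataclass


@dataclass
class Step:
    instruction: str  # natural-language instruction handed to the executor


def search_web(query: str) -> str:
    """Placeholder for a digital tool call such as Google Search."""
    # In the real system the planner might fetch, e.g., local recycling rules.
    return f"(stub) search results for: {query!r}"


def plan_task(goal: str, scene_description: str) -> list[Step]:
    """Stand-in for the embodied-reasoning planner: turn a goal plus the
    robot's understanding of its surroundings into an ordered list of steps."""
    context = search_web(f"local guidelines relevant to: {goal}")
    return [
        Step(f"Survey the workspace ({scene_description}) and identify objects."),
        Step(f"Group objects according to: {context}"),
        Step("Place each group into its designated bin."),
    ]


def execute_step(step: Step) -> None:
    """Stand-in for the vision-language-action executor driving the robot."""
    print(f"[executor] {step.instruction}")


if __name__ == "__main__":
    for step in plan_task("sort trash, compost, and recyclables",
                          "kitchen counter with mixed waste items"):
        execute_step(step)
```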
Read at The Verge