[Feb 2022] Invited Talks @ Google, FAIR, Sea AI Lab, ByteDance AI Lab on our language planner project.
[Dec 2021] Invited Talk @ Intel AI Lab on "Generalization across Objects and Morphologies in Robot Learning".
The goal of my research is to endow robots with broad generalization capabilities for open-world manipulation tasks, especially in household environments.
Towards this goal, I am interested in 1) developing abstractions that best leverage Internet-scale data or models trained on them, and 2) developing motor skills that exhibit broadly generalizable behaviors.
Large language models and vision-language models can be used to directly label affordances and constraints in the 3D perceptual space. Combined with motion planning, these labels enable robots to perform diverse everyday manipulation tasks in a zero-shot manner.
PaLM-E: An Embodied Multimodal Language Model
Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, Pete Florence
International Conference on Machine Learning (ICML), 2023.
Google AI Blog
Language models can digest real-world sensor modalities (e.g., images) to be embodied in the physical world. The largest model, with 562B parameters, is a generalist agent across language, vision, and robot planning.
Large language models can be grounded in embodied environments by using continuous probabilities to guide their token decoding, where the guidance is provided by a set of grounded models, such as affordance, safety, and preference functions.
Using hierarchical code generation, large language models can write robot policy code that exhibits spatial-geometric reasoning when given abstract natural language instructions, without any additional training.
Using various sources to provide textual embodied feedback, frozen large language models can articulate a grounded "thought process" for robots, solving many challenging long-horizon robotics tasks, even under adversarial perturbation.