Wenlong Huang

I am a first-year Ph.D. student in Computer Science at Stanford University.

I recently received my B.A. in Computer Science from UC Berkeley, where I was fortunate to be advised by Deepak Pathak, Igor Mordatch, and Pieter Abbeel. After graduation, I interned in the Robotics team at Google Brain. Before my time at Berkeley, I was also fortunate to work with Zhuowen Tu at UC San Diego.

Email  /  Google Scholar  /  Twitter  /  GitHub

  • [Nov 2022] Code as Policies is covered by Google AI Blog and TechCrunch.
  • [Oct 2022] Invited Talks @ CAIR, Google on language as generalization interface for robotics.
  • [Oct 2022] Inner Monologue is covered by Two Minute Papers.
  • [Sep 2022] Started at Stanford as a PhD student, generously supported by Stanford SoE Fellowship.
  • [Apr 2022] Started my internship at Google Brain as a student researcher, working with the amazing Robotics team.
  • [Feb 2022] Interviewed by Yannic Kilcher for our language planner project. Check it out here!
  • [Feb 2022] Invited Talks @ Google, FAIR, Sea AI Lab, ByteDance AI Lab on our language planner project.
  • [Dec 2021] Invited Talk @ Intel AI Lab on generalization across objects and morphologies in robot learning.

I'm broadly interested in robot learning. The goal of my research is to build agents that can make intelligent decisions in embodied environments and exhibit generalizable motor skills in challenging scenarios. Recently, I have been interested in leveraging large pre-trained models to improve the generalization of robot capabilities.

PaLM-E: An Embodied Multimodal Language Model
Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, Pete Florence
arXiv, 2023
Project Page / Paper / Google AI Blog

By digesting real-world sensor modalities (e.g., images), language models can be embodied in the physical world. The largest model, with 562B parameters, is a generalist agent across language, vision, and robot planning.

Grounded Decoding:
Guiding Text Generation with Grounded Models for Robot Control

Wenlong Huang, Fei Xia, Dhruv Shah, Danny Driess, Andy Zeng, Yao Lu, Pete Florence, Igor Mordatch, Sergey Levine, Karol Hausman, Brian Ichter
arXiv, 2023
Project Page / Paper / Explainer Video

We formulate a token decoding procedure that applies large language models to robotics settings. Tokens are selected based on their likelihood under both the language model and a set of grounded models, such as affordance, safety, and preference functions.

Code as Policies: Language Model Programs for Embodied Control
Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, Andy Zeng
International Conference on Robotics and Automation (ICRA), 2023
Project Page / Paper / Code / Explainer Video / Google AI Blog / TechCrunch

Using hierarchical code generation, large language models can write robot policy code that exhibits spatial-geometric reasoning when given abstract natural language instructions, without any additional training.

Inner Monologue:
Embodied Reasoning through Planning with Language Models

Wenlong Huang*, Fei Xia*, Ted Xiao*, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, Brian Ichter (*Equal Contribution)
Conference on Robot Learning (CoRL), 2022
Project Page / Paper / Explainer Video / Two Minute Papers

Using various sources to provide textual embodied feedback, frozen large language models can articulate a grounded "thought process" for robots, solving many challenging long-horizon robotics tasks, even under adversarial perturbation.

Language Models as Zero-Shot Planners:
Extracting Actionable Knowledge for Embodied Agents

Wenlong Huang, Pieter Abbeel, Deepak Pathak*, Igor Mordatch* (*Equal Advising)
International Conference on Machine Learning (ICML), 2022
Project Page / Paper / Code / Explainer Video / Interview

Large language models (e.g., GPT-3, Codex) contain rich actionable knowledge that can be used to plan actions for embodied agents, even without additional training.

Generalization in Dexterous Manipulation
via Geometry-Aware Multi-Task Learning

Wenlong Huang, Igor Mordatch, Pieter Abbeel, Deepak Pathak
arXiv, 2021
Project Page / Paper / Code

With an appropriate object representation, a multi-task reinforcement learning policy can control an anthropomorphic hand to manipulate 100+ diverse objects and achieve state-of-the-art performance on unseen ones.

One Policy to Control Them All:
Shared Modular Policies for Agent-Agnostic Control

Wenlong Huang, Igor Mordatch, Deepak Pathak
International Conference on Machine Learning (ICML), 2020
Project Page / Paper / Code / Explainer Video / Oral Talk

Expressing robots as collections of modular components that share a control policy can lead to zero-shot generalization across diverse unseen robot morphologies.

3D Volumetric Modeling with Introspective Neural Networks
Wenlong Huang*, Brian Lai*, Weijian Xu, Zhuowen Tu (*Equal Contribution)
Association for the Advancement of Artificial Intelligence (AAAI), 2019

Building upon the prior Generative via Discriminative Learning and Introspective Learning frameworks, a single neural network can simultaneously perform classification and generation of 3D volumetric shapes.

Academic Services

Reviewer for ICML, IROS, NeurIPS.

Template from Jon Barron's website