|09:00 am - 09:10 am||Organizers|
|09:10 am - 09:35 am||Dorsa Sadigh|
Interactive Learning in the Era of Large Models
In this talk, I will discuss the role of language in learning from interactions with humans. I will first talk about how language instructions along with latent actions can enable shared autonomy in robotic manipulation problems. I will then talk about creative ways of tapping into the rich context of large models to enable more aligned AI agents. Specifically, I will discuss a few vignettes about how we can leverage LLMs and VLMs to learn human preferences, allow for grounded social reasoning, or enable teaching humans using corrective feedback. I will conclude the talk by discussing how large models can be effective pattern machines: they can identify patterns in a token-invariant fashion, enable pattern transformation and extrapolation, and even show some evidence of pattern optimization for solving control problems.
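The latent-action idea mentioned above can be illustrated with a toy linear model: learn a low-dimensional subspace of demonstrated robot actions, then decode a small user input (e.g., a 2-D joystick) into a full robot action. Everything below is a made-up sketch; a linear PCA decoder stands in for the learned, typically nonlinear, models used in practice, and all dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "demonstrations": 7-DoF robot actions that actually lie on a
# 2-D manifold (e.g., reach-and-grasp motions), plus a little noise.
latent_demo = rng.normal(size=(500, 2))
true_map = rng.normal(size=(2, 7))          # hypothetical ground truth
demos = latent_demo @ true_map + 0.01 * rng.normal(size=(500, 7))

# Learn a 2-D latent action space from the demos via PCA: the decoder
# maps a low-DoF user input onto the manifold of plausible actions.
mean = demos.mean(axis=0)
_, _, vt = np.linalg.svd(demos - mean, full_matrices=False)
decoder = vt[:2]                            # (2, 7) latent-to-action map

def decode(z):
    """Map a 2-D user input (e.g., joystick) to a 7-D robot action."""
    return mean + z @ decoder

action = decode(np.array([0.5, -1.0]))
print(action.shape)  # (7,)
```

The point of the sketch is the interaction pattern: the human supplies only low-dimensional input, while the learned decoder fills in the remaining degrees of freedom, which is one way shared autonomy can arise.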
|09:35 am - 10:00 am||Jesse Thomason|
Considering The Role of Language in Embodied Systems
Pretrained language models (PTLMs) are "all the rage" right now. From the perspective of folks who have been working at the intersection of language, vision, and robotics since before it was cool, the noticeable impact is that researchers outside NLP feel like they should plug language into their work. However, these models are exclusively trained on text data, usually only for next-word prediction, and potentially for next-word prediction under a fine-tuned words-as-actions policy with thousands of underpaid human annotators in the loop (e.g., RLHF). Even when a PTLM is "multimodal," that usually means "training also involved images and their captions, which describe the literal content of the image." What meaning can we hope to extract from those kinds of models in the context of embodied, interactive systems? In this talk, I'll cover some applications our lab has worked through in the space of language and embodied systems, with a broader lens toward open questions about the limits and (in)appropriate applications of current PTLMs with those systems.
|10:00 am - 10:30 am||Coffee break + Poster Session|
|10:30 am - 10:55 am||Jonathan Grizou|
Aiming for Internal Consistency, the 4th Pillar of Interactive Learning
I will propose a 2x2 matrix to position interactive learning systems and argue that the 4th corner of that space is yet to be fully explored by our research efforts. By positioning recent work on that matrix, I hope to highlight a possible research direction and expose barriers to be overcome. In that effort, I will attempt a live demonstration of IFTT-PIN, a self-calibrating interface we developed that permits a user to control it using signals whose meanings are initially unknown.
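The self-calibration trick behind an interface like this can be illustrated with a toy hypothesis-elimination loop (this sketch is my own illustration, not the actual IFTT-PIN implementation): the interface asks "is your target in the highlighted set?", the user answers by pressing one of two buttons, and the interface does not know which button means yes. It keeps every joint hypothesis (target, button meaning) and discards those inconsistent with the observed presses.

```python
from itertools import product

targets = list(range(4))             # hypothetical items the user may want
mappings = [("A", "B"), ("B", "A")]  # (yes_button, no_button) hypotheses

def consistent(target, mapping, history):
    """Check a (target, mapping) hypothesis against all observed presses."""
    yes_btn, _ = mapping
    for highlighted, pressed in history:
        said_yes = (pressed == yes_btn)
        if said_yes != (target in highlighted):
            return False
    return True

def infer(history):
    return [(t, m) for t, m in product(targets, mappings)
            if consistent(t, m, history)]

# Simulated user: the true target is 2, and button "B" means yes.
history = [({0, 1}, "A"),   # 2 not highlighted -> user pressed "no" = A
           ({1, 2}, "B"),   # 2 highlighted     -> user pressed "yes" = B
           ({0, 2}, "B")]   # 2 highlighted     -> "yes" again
print(infer(history))       # → [(2, ('B', 'A'))]
```

Only the jointly consistent pair survives: the interface has simultaneously identified the target and calibrated the meaning of the signals, which is the "internal consistency" at stake in the talk.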
|10:55 am - 11:20 am||Daniel Brown|
Pitfalls and paths forward when learning rewards from human feedback
Human feedback is often incomplete, suboptimal, biased, and ambiguous, leading to misidentification of the human's true reward function and suboptimal agent behavior. I will discuss these pitfalls as well as some of our recent work that seeks to overcome these problems via techniques that calibrate to user biases, learn from multiple feedback types, use human feedback to align robot feature representations, and enable interpretable reward learning.
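One standard technique in this space, learning a reward function from pairwise preferences under a Bradley-Terry model, can be sketched as follows. The features, weights, and simulated human below are all made up for illustration; this is not the speaker's method, just the common baseline his work builds on and critiques.

```python
import numpy as np

rng = np.random.default_rng(1)

# Trajectories are summarized by feature vectors; under Bradley-Terry,
# the probability a human prefers trajectory i over j is
# sigmoid(r(i) - r(j)) with a linear reward r = features @ w.
true_w = np.array([1.0, -2.0, 0.5])   # hypothetical true reward weights
feats = rng.normal(size=(200, 3))     # per-trajectory feature sums

def pref_prob(w, i, j):
    return 1.0 / (1.0 + np.exp(-(feats[i] @ w - feats[j] @ w)))

# Simulated noisy human preferences over random trajectory pairs.
pairs = rng.integers(0, 200, size=(500, 2))
labels = rng.random(500) < pref_prob(true_w, pairs[:, 0], pairs[:, 1])

# Maximize the Bradley-Terry log-likelihood by gradient ascent.
w = np.zeros(3)
for _ in range(2000):
    p = pref_prob(w, pairs[:, 0], pairs[:, 1])
    grad = ((labels - p)[:, None]
            * (feats[pairs[:, 0]] - feats[pairs[:, 1]])).mean(axis=0)
    w += 0.5 * grad

# Learned and true rewards should rank trajectories almost identically.
print(np.corrcoef(feats @ w, feats @ true_w)[0, 1])  # typically near 1.0
```

The pitfalls in the abstract show up when the model's assumptions fail: if the human is biased, inconsistent, or noisier than the sigmoid assumes, this likelihood converges confidently to the wrong reward.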
|11:20 am - 12:10 pm||Panel Session 1: Sam Bowman, Jim Fan, Furong Huang, Jesse Thomason, Diyi Yang, Keerthana Gopalakrishnan|
|12:10 pm - 01:10 pm||Lunch break|
|01:10 pm - 01:35 pm||Bradley Knox|
The EMPATHIC Framework for Task Learning from Implicit Human Feedback
Reactions such as gestures, facial expressions, and vocalizations are an abundant, naturally occurring channel of information that humans provide during interactions. A robot or other agent could leverage an understanding of such implicit human feedback to improve its task performance at no cost to the human. This approach contrasts with common agent teaching methods based on demonstrations, critiques, or other guidance that need to be attentively and intentionally provided. In this talk, we first define the general problem of learning from implicit human feedback and then propose to address this problem through a novel data-driven framework, EMPATHIC. This two-stage method consists of (1) mapping implicit human feedback to relevant task statistics such as rewards, optimality, and advantage; and (2) using such a mapping to learn a task. We instantiate the first stage and three second-stage evaluations of the learned mapping. To do so, we collect a dataset of human facial reactions while participants observe an agent execute a sub-optimal policy for a prescribed training task. We train a deep neural network on this data and demonstrate its ability to (1) infer relative reward ranking of events in the training task from prerecorded human facial reactions; (2) improve the policy of an agent in the training task using live human facial reactions; and (3) transfer to a novel domain in which it evaluates robot manipulation trajectories.
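A minimal sketch of the two-stage idea, with random vectors standing in for facial-reaction features: everything here is synthetic, and a least-squares fit stands in for the deep network trained on recorded human reactions in the actual framework.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stage 1: learn a mapping from implicit human reactions (here, fake
# 5-D facial-feature vectors) to a task statistic such as reward,
# using data collected while a human watched an agent act.
n = 300
reward = rng.uniform(-1, 1, size=n)            # task statistic to predict
react_map = rng.normal(size=(1, 5))            # hypothetical reaction model
reactions = reward[:, None] @ react_map + 0.1 * rng.normal(size=(n, 5))

# Least-squares fit: predicted reward = reactions @ w
w, *_ = np.linalg.lstsq(reactions, reward, rcond=None)

# Stage 2: use the learned mapping to rank new events by the reward
# inferred from the observer's reaction alone (no explicit feedback).
new_rewards = np.array([-0.8, 0.1, 0.9])
new_reactions = new_rewards[:, None] @ react_map
ranking = np.argsort(new_reactions @ w)
print(ranking)  # → [0 1 2], worst to best
```

This mirrors the paper's first evaluation, inferring relative reward rankings of events from reactions; the later evaluations close the loop by using such predictions to improve a live policy.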
|01:35 pm - 02:00 pm||David Abel|
Three Dogmas of Reinforcement Learning
Modern reinforcement learning has been in large part shaped by three dogmas. The first is what I call the environment spotlight, which refers to our focus on environments rather than agents. The second is our implicit treatment of learning as finding a solution, rather than endless adaptation. The last is the reward hypothesis, which states that all goals and purposes can be well thought of as maximization of a reward signal. In this talk I discuss how these dogmas have shaped our views on learning. I argue that, when agents learn from human feedback, we ought to dispense entirely with the first two dogmas, while we must recognize and embrace the nuance implicit in the third.
|02:00 pm - 02:25 pm||Paul Mineiro|
Contextual Bandits without Rewards
Contextual bandits are highly practical, but the need to specify a scalar reward limits their adoption. This motivates the study of contextual bandits where a latent reward must be inferred from post-decision observables, a.k.a. Interaction-Grounded Learning (IGL). An information-theoretic argument indicates the need for additional assumptions to succeed, and I review some sufficient conditions from the recent literature. I conclude with speculation about composing IGL with active learning.
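The shape of the interaction loop can be sketched as follows. Everything here is a toy: in particular, the reward decoder is fixed by hand, whereas in IGL the decoder itself must be learned, which is exactly where the extra identifiability assumptions come in.

```python
import numpy as np

rng = np.random.default_rng(3)

# The learner never observes a reward. After acting, it sees a
# feedback signal whose distribution depends on the latent reward
# (e.g., whether the user smiled). A decoder turns feedback into a
# reward estimate that drives an ordinary bandit update.
n_actions, dim = 3, 4
true_w = rng.normal(size=(n_actions, dim))      # hidden reward model

def latent_reward(x, a):
    return float(true_w[a] @ x > 0)             # 1 if the action was "good"

def feedback(r):
    # Observable post-decision signal, noisily linked to latent reward.
    return rng.random() < (0.9 if r else 0.1)

decoder = lambda y: 1.0 if y else 0.0           # assumed known, for the sketch

# Epsilon-greedy linear learner on decoded rewards.
q = np.zeros((n_actions, dim))
correct = 0
steps = 5000
for _ in range(steps):
    x = rng.normal(size=dim)
    a = rng.integers(n_actions) if rng.random() < 0.2 else int(np.argmax(q @ x))
    r_hat = decoder(feedback(latent_reward(x, a)))
    q[a] += 0.05 * (r_hat - q[a] @ x) * x       # LMS value update
    correct += latent_reward(x, a)
print(correct / steps)  # noticeably above the 0.5 rate of a random action
```

Replacing the hand-fixed `decoder` with one learned jointly from interaction alone is the hard part, and is what the sufficient conditions in the talk are about.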
|02:25 pm - 03:00 pm||Contributed Talks|
|03:00 pm - 03:30 pm||Coffee break + Poster Session|
|03:30 pm - 04:00 pm||Taylor Kessler Faulkner|
Robots Learning from Real People
Robots deployed in the wild can improve their performance by using input from human teachers. Furthermore, both robots and humans can benefit when robots adapt to and learn from the people around them. However, real people can act in imperfect ways and are often unable to provide input in large quantities. In this talk, I will discuss some of my past research toward addressing these issues, which has focused on creating learning algorithms that can learn from imperfect teachers. I will also talk about my current work on the Robot-Assisted Feeding project in the Personal Robotics Lab at the University of Washington, which I am approaching through a similar lens of working with real teachers and possibly imperfect information.
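One simple way to cope with an imperfect teacher is to treat the teacher's accuracy as an unknown and score hypotheses about the task and the teacher jointly, so that noisy advice is discounted automatically. The sketch below is my own illustration of that general idea, not the speaker's algorithm; all quantities are synthetic.

```python
import numpy as np

rng = np.random.default_rng(4)

# A teacher labels actions "best"/"not best" but is only correct with
# some unknown probability. We score (best action, teacher accuracy)
# hypotheses by likelihood over the advice history.
n_actions = 4
best = 2                    # ground truth, hidden from the learner
teacher_acc = 0.75          # teacher is right 75% of the time

advice = []                 # pairs of (action shown, teacher's label)
for _ in range(200):
    a = rng.integers(n_actions)
    truth = (a == best)
    label = truth if rng.random() < teacher_acc else not truth
    advice.append((a, label))

acc_grid = [0.55, 0.65, 0.75, 0.85, 0.95]

def log_lik(cand, acc):
    ll = 0.0
    for a, label in advice:
        p = acc if label == (a == cand) else 1 - acc
        ll += np.log(p)
    return ll

scores = {(c, acc): log_lik(c, acc)
          for c in range(n_actions) for acc in acc_grid}
cand, acc = max(scores, key=scores.get)
print(cand)  # recovers the best action despite the noisy teacher
```

Because accuracy is inferred rather than assumed, the same machinery degrades gracefully as the teacher gets noisier or sparser, which is the regime the talk is concerned with.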
|04:00 pm - 04:50 pm||Panel Session 2: David Abel, Anca Dragan, Keerthana Gopalakrishnan, Taylor Kessler Faulkner, Bradley Knox, John Langford, Paul Mineiro|
|04:50 pm - 05:00 pm||Organizers|