Workshop on Interactive Learning with Implicit Human Feedback

About
Systems that can learn interactively from their end-users are quickly becoming widespread in real-world applications. Typically, humans provide tagged rewards or scalar feedback for such interactive learning systems. However, humans offer a wealth of implicit information (such as multimodal cues in the form of natural language, speech, eye movements, facial expressions, gestures, etc.) which interactive learning algorithms can leverage during the process of human-machine interaction to create a grounding for human intent, and thereby better assist end-users. A closed-loop sequential decision-making domain offers unique challenges when learning from humans: (1) the data distribution may be influenced by the choices of the algorithm itself, and thus interactive ML algorithms need to adaptively learn from human feedback; (2) the nature of the environment itself changes rapidly; and (3) humans may express their intent in various forms of feedback amenable to naturalistic real-world settings, going beyond tagged rewards or demonstrations. By organizing this workshop, we attempt to bring together interdisciplinary experts in interactive machine learning, reinforcement learning, human-computer interaction, cognitive science, and robotics to explore and foster discussions on such challenges. We envision that this exchange of ideas within and across disciplines can build new bridges, address some of the most valuable challenges in interactive learning with implicit human feedback, and also provide guidance to young researchers interested in growing their careers in this space.

Some potential questions we hope to discuss at this workshop are listed below:

When is it possible to go beyond reinforcement learning (with hand-crafted rewards) and leverage interaction-grounded learning from arbitrary feedback signals where grounding for such feedback could be initially unknown, contextual, rich and high-dimensional?
How can we learn from natural/implicit human feedback signals such as natural language, speech, eye movements, facial expressions, gestures, etc. during interaction? Is it possible to learn from human guidance signals whose meanings are initially unknown or ambiguous, even when there is no explicit external reward?
How should learning algorithms account for a human’s preferences or internal reward that is non-stationary and changes over time? How can we account for non-stationarity of the environment itself?
How much of the learning should be pre-training (i.e., learning for the average user), and how much should be interactive or personalized (i.e., fine-tuning to a specific user)?
How can we develop a better understanding of how humans interact with and teach other humans or machines? And how could such an understanding lead to better designs for learning systems that leverage human signals during interaction?
How can we design intrinsic reward systems that push agents to (learn to) become socially integrated/coordinated/aligned with humans?
How can well-known design methods from HCI (such as ability-based design) be imported and massively used in AI/ML? What is missing from today’s technological solution paradigms that can allow for ability-based design to be deployed at scale? How can the machine learning community assist HCI and accessibility research communities to build adaptive learning interfaces targeting a wide range of marginalized and specially-abled sections of society?
What is the minimal set of assumptions under which learning from arbitrary/implicit feedback signals is possible for the interaction-grounded learning paradigm?

All our contributed papers are non-archival and can be submitted to other venues. To ask questions during the workshop, use this Sli.do link.

Schedule

Time (GMT-10)
09:00 am - 09:10 am		Organizers Introductory Remarks
09:10 am - 09:35 am		Dorsa Sadigh: Interactive Learning in the Era of Large Models. Abstract: In this talk, I will discuss the role of language in learning from interactions with humans. I will first talk about how language instructions along with latent actions can enable shared autonomy in robotic manipulation problems. I will then talk about creative ways of tapping into the rich context of large models to enable more aligned AI agents. Specifically, I will discuss a few vignettes about how we can leverage LLMs and VLMs to learn human preferences, allow for grounded social reasoning, or enable teaching humans using corrective feedback. I will finally conclude the talk by discussing how large models can be effective pattern machines that can identify patterns in a token-invariant fashion and enable pattern transformation, extrapolation, and even show some evidence of pattern optimization for solving control problems.
09:35 am - 10:00 am		Jesse Thomason: Considering The Role of Language in Embodied Systems. Abstract: Pretrained language models (PTLM) are "all the rage" right now. From the perspective of folks who have been working at the intersection of language, vision, and robotics since before it was cool, the noticeable impact is that researchers outside NLP feel like they should plug language into their work. However, these models are exclusively trained on text data, usually only for next word prediction, and potentially for next word prediction but under a fine-tuned words-as-actions policy with thousands of underpaid human annotators in the loop (e.g., RLHF). Even when a PTLM is "multimodal" that usually means "training also involved images and their captions, which describe the literal content of the image." What meaning can we hope to extract from those kinds of models in the context of embodied, interactive systems? In this talk, I'll cover some applications our lab has worked through in the space of language and embodied systems with a broader lens towards open questions about the limits and (in)appropriate applications of current PTLMs with those systems.
10:00 am - 10:30 am		Coffee break + Poster Session
10:30 am - 10:55 am		Jonathan Grizou: Aiming for internal consistency, the 4th pillar of interactive learning. Abstract: I will propose a 2x2 matrix to position interactive learning systems and argue that the 4th corner of that space is yet to be fully explored by our research efforts. By positioning recent work on that matrix, I hope to highlight a possible research direction and expose barriers to be overcome. In that effort, I will attempt a live demonstration of IFTT-PIN, a self-calibrating interface we developed that permits a user to control an interface using signals whose meanings are initially unknown.
10:55 am - 11:20 am		Daniel Brown: Pitfalls and paths forward when learning rewards from human feedback. Abstract: Human feedback is often incomplete, suboptimal, biased, and ambiguous, leading to misidentification of the human's true reward function and suboptimal agent behavior. I will discuss these pitfalls as well as some of our recent work that seeks to overcome these problems via techniques that calibrate to user biases, learn from multiple feedback types, use human feedback to align robot feature representations, and enable interpretable reward learning.
11:20 am - 12:10 pm		Panel Session 1: Sam Bowman, Jim Fan, Furong Huang, Jesse Thomason, Diyi Yang, Keerthana Gopalakrishnan
12:10 pm - 01:10 pm		Lunch break
01:10 pm - 01:35 pm		Bradley Knox: The EMPATHIC Framework for Task Learning from Implicit Human Feedback. Abstract: Reactions such as gestures, facial expressions, and vocalizations are an abundant, naturally occurring channel of information that humans provide during interactions. A robot or other agent could leverage an understanding of such implicit human feedback to improve its task performance at no cost to the human. This approach contrasts with common agent teaching methods based on demonstrations, critiques, or other guidance that need to be attentively and intentionally provided. In this talk, we first define the general problem of learning from implicit human feedback and then propose to address this problem through a novel data-driven framework, EMPATHIC. This two-stage method consists of (1) mapping implicit human feedback to relevant task statistics such as rewards, optimality, and advantage; and (2) using such a mapping to learn a task. We instantiate the first stage and three second-stage evaluations of the learned mapping. To do so, we collect a dataset of human facial reactions while participants observe an agent execute a sub-optimal policy for a prescribed training task. We train a deep neural network on this data and demonstrate its ability to (1) infer relative reward ranking of events in the training task from prerecorded human facial reactions; (2) improve the policy of an agent in the training task using live human facial reactions; and (3) transfer to a novel domain in which it evaluates robot manipulation trajectories.
01:35 pm - 02:00 pm		David Abel: Three Dogmas of Reinforcement Learning. Abstract: Modern reinforcement learning has been in large part shaped by three dogmas. The first is what I call the environment spotlight, which refers to our focus on environments rather than agents. The second is our implicit treatment of learning as finding a solution, rather than endless adaptation. The last is the reward hypothesis, which states that all goals and purposes can be well thought of as maximization of a reward signal. In this talk I discuss how these dogmas have shaped our views on learning. I argue that, when agents learn from human feedback, we ought to dispense entirely with the first two dogmas, while we must recognize and embrace the nuance implicit in the third.
02:00 pm - 02:25 pm		Paul Mineiro: Contextual Bandits without Rewards. Abstract: Contextual bandits are highly practical, but the need to specify a scalar reward limits their adoption. This motivates the study of contextual bandits where a latent reward must be inferred from post-decision observables, also known as Interaction-Grounded Learning (IGL). An information-theoretic argument indicates the need for additional assumptions to succeed, and I review some sufficient conditions from the recent literature. I conclude with speculation about composing IGL with active learning.
02:25 pm - 03:00 pm		Contributed Talks
03:00 pm - 03:30 pm		Coffee break + Poster Session
03:30 pm - 04:00 pm		Taylor Kessler Faulkner: Robots Learning from Real People. Abstract: Robots deployed in the wild can improve their performance by using input from human teachers. Furthermore, both robots and humans can benefit when robots adapt to and learn from the people around them. However, real people can act in imperfect ways, and can often be unable to provide input in large quantities. In this talk, I will discuss some of the past research I have conducted toward addressing these issues, which has focused on creating learning algorithms that can learn from imperfect teachers. I will also talk about my current work on the Robot-Assisted Feeding project in the Personal Robotics Lab at the University of Washington, which I am approaching through a similar lens of working with real teachers and possibly imperfect information.
04:00 pm - 04:50 pm		Panel Session 2: David Abel, Anca Dragan, Keerthana Gopalakrishnan, Taylor Kessler Faulkner, Bradley Knox, John Langford, Paul Mineiro
04:50 pm - 05:00 pm		Organizers Concluding Remarks

Papers

Imitation Learning with Human Eye Gaze via Multi-Objective Prediction [link] (spotlight)
Ravi Kumar Thakur, MD Sunbeam, Vinicius G. Goecks, Ellen Novoseller, Ritwik Bera, Vernon Lawhern, Greg Gremillion, John Valasek, Nicholas R Waytowich
Learning from a Learning User for Optimal Recommendations [link] (spotlight)
Fan Yao, Chuanhao Li, Denis Nekipelov, Hongning Wang, Haifeng Xu
Survival Instinct in Offline Reinforcement Learning and Implicit Human Bias in Data [link] (spotlight)
Anqi Li, Dipendra Misra, Andrey Kolobov, Ching-An Cheng
UCB Provably Learns From Inconsistent Human Feedback [link] (spotlight)
Shuo Yang, Tongzheng Ren, Inderjit S Dhillon, Sujay Sanghavi
Visual-based Policy Learning with Latent Language Encoding [link] (spotlight)
Jielin Qiu, Mengdi Xu, William Han, Bo Li, Ding Zhao
Legible Robot Motion from Conditional Generative Models [link]
Matthew Bronars, Danfei Xu
Asymptotically Optimal Fixed-Budget Best Arm Identification with Variance-Dependent Bounds [link]
Masahiro Kato, Masaaki Imaizumi, Takuya Ishihara, Toru Kitagawa
RLHF-Blender: A Configurable Interactive Interface for Learning from Diverse Human Feedback [link]
Yannick Metz, David Lindner, Raphaël Baur, Daniel A. Keim, Mennatallah El-Assady
Bandits Meet Mechanism Design to Combat Clickbait in Online Recommendation [link]
Thomas Kleine Buening, Aadirupa Saha, Christos Dimitrakakis, Haifeng Xu
A Generative Model for Text Control in Minecraft [link]
Shalev Lifshitz, Keiran Paster, Harris Chan, Jimmy Ba, Sheila A. McIlraith
Follow-ups Also Matter: Improving Contextual Bandits via Post-serving Contexts [link]
Chaoqi Wang, Ziyu Ye, Zhe Feng, Ashwinkumar Badanidiyuru, Haifeng Xu
Bayesian Inverse Transition Learning for Offline Settings [link]
Leo Benac, Sonali Parbhoo, Finale Doshi-Velez
Principal-Driven Reward Design and Agent Policy Alignment via Bilevel-RL [link]
Souradip Chakraborty, Amrit Bedi, Alec Koppel, Furong Huang, Mengdi Wang
Temporally-Extended Prompts Optimization for SAM in Interactive Medical Image Segmentation [link]
Chuyun Shen, Wenhao Li, Ya Zhang, Xiangfeng Wang
Relative Behavioral Attributes: Filling the Gap between Symbolic Goal Specification and Reward Learning from Human Preferences [link]
Lin Guan, Karthik Valmeekam, Subbarao Kambhampati
Complementing a Policy with a Different Observation Space [link]
Gokul Swamy, Sanjiban Choudhury, Drew Bagnell, Steven Wu
Cognitive Models as Simulators: Using Cognitive Models to Tap into Implicit Human Feedback [link]
Ardavan S. Nobandegani, Thomas Shultz, Irina Rish
Unraveling the ARC Puzzle: Mimicking Human Solutions with Object-Centric Decision Transformer [link]
Jaehyun Park, Jaegyun Im, Sanha Hwang, Mintaek Lim, Sabina Ualibekova, Sejin Kim, Sundong Kim
Selective Sampling and Imitation Learning via Online Regression [link]
Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu
Learning Shared Safety Constraints from Multi-task Demonstrations [link]
Konwoo Kim, Gokul Swamy, Zuxin Liu, Ding Zhao, Sanjiban Choudhury, Steven Wu
Strategic Apple Tasting [link]
Keegan Harris, Chara Podimata, Steven Wu
Discovering User Types: Mapping User Traits by Task-Specific Behaviors in Reinforcement Learning [link]
Lars Lien Ankile, Brian Ham, Kevin Mao, Eura Shin, Siddharth Swaroop, Finale Doshi-Velez, Weiwei Pan
Provable Offline Reinforcement Learning with Human Feedback [link]
Wenhao Zhan, Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun
SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks [link]
Bill Yuchen Lin, Yicheng Fu, Karina Yang, Prithviraj Ammanabrolu, Faeze Brahman, Shiyu Huang, Chandra Bhagavatula, Yejin Choi, Xiang Ren
How to Query Human Feedback Efficiently in RL? [link]
Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee
Contextual Bandits and Imitation Learning with Preference-Based Active Queries [link]
Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu
Bayesian Active Meta-Learning under Prior Misspecification [link]
Sabina J. Sloman, Ayush Bharti, Samuel Kaski
Contextual Set Selection Under Human Feedback With Model Misspecification [link]
Shuo Yang, Rajat Sen, Sujay Sanghavi
Building Community Driven Libraries of Natural Programs [link]
Leonardo Hernandez Cano, Yewen Pu, Robert D. Hawkins, Joshua B. Tenenbaum, Armando Solar-Lezama
Modeled Cognitive Feedback to Calibrate Uncertainty for Interactive Learning [link]
Jaelle Scheuerman, Zachary Bishof, Chris J Michael
Improving Bionic Limb Control through Batch Reinforcement Learning in an Interactive Game Environment [link]
Kilian Freitag, Rita Laezza, Jan Zbinden, Max Ortiz-Catalan
Rewarded soups: towards Pareto-optimality by interpolating weights fine-tuned on diverse rewards [link]
Alexandre Rame, Guillaume Couairon, Corentin Dancette, Mustafa Shukor, Jean-Baptiste Gaya, Laure Soulier, Matthieu Cord
Inverse Preference Learning: Preference-based RL without a Reward Function [link]
Joey Hejna, Dorsa Sadigh
Interactive-Chain-Prompting: Ambiguity Resolution for Crosslingual Conditional Generation with Interaction [link]
Jonathan Pilault, Xavier Garcia, Arthur Brazinskas, Orhan Firat
Reinforcement learning with Human Feedback: Learning Dynamic Choices via Pessimism [link]
Zihao Li, Zhuoran Yang, Mengdi Wang
Accelerating exploration and representation learning with offline pre-training [link]
Bogdan Mazoure, Jake Bruce, Doina Precup, Rob Fergus, Ankit Anand
Active Learning with Crowd Sourcing Improves Information Retrieval [link]
Zhuotong Chen, Yifei Ma, Branislav Kveton, Anoop Deoras
Guided Policy Search for Parameterized Skills using Adverbs [link]
Benjamin Adin Spiegel, George Konidaris