Interactive Learning with Implicit Human Feedback

ICML 2023 Workshop - Saturday, July 29th

Location: Hawaii Convention Center, Room 315

Gather.Town (virtual poster session): Link

Check out the workshop recording!


About
Systems that can learn interactively from their end-users are quickly becoming widespread in real-world applications. Typically, humans provide tagged rewards or scalar feedback for such interactive learning systems. However, humans offer a wealth of implicit information (such as multimodal cues in the form of natural language, speech, eye movements, facial expressions, gestures, etc.) which interactive learning algorithms can leverage during the process of human-machine interaction to create a grounding for human intent, and thereby better assist end-users.

A closed-loop sequential decision-making domain offers unique challenges when learning from humans: (1) the data distribution may be influenced by the choices of the algorithm itself, so interactive ML algorithms need to adaptively learn from human feedback; (2) the nature of the environment itself can change rapidly; (3) humans may express their intent in various forms of feedback amenable to naturalistic real-world settings, going beyond tagged rewards or demonstrations.

By organizing this workshop, we aim to bring together interdisciplinary experts in interactive machine learning, reinforcement learning, human-computer interaction, cognitive science, and robotics to explore and foster discussion on these challenges. We envision that this exchange of ideas within and across disciplines can build new bridges, address some of the most valuable challenges in interactive learning with implicit human feedback, and provide guidance to young researchers interested in growing their careers in this space.
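
To make challenge (1) concrete, here is a minimal, self-contained sketch (our own toy example in Python/NumPy, not drawn from the workshop materials): a purely greedy two-armed bandit learner only gathers data on the arm it currently prefers, so one unlucky sample can cause it to abandon the better arm permanently, while a small amount of exploration keeps the data distribution rich enough to recover.

```python
import numpy as np

def run(epsilon, steps=1000, seed=0):
    """Two-armed Bernoulli bandit; returns the arm the learner ends up preferring."""
    rng = np.random.default_rng(seed)
    true_means = np.array([0.3, 0.7])      # arm 1 is actually better
    counts = np.zeros(2)
    estimates = np.full(2, 0.5)            # neutral initial value estimates
    for _ in range(steps):
        if rng.random() < epsilon:         # explore: uniform-random arm
            a = int(rng.integers(2))
        else:                              # exploit: current best estimate
            a = int(np.argmax(estimates))
        r = float(rng.random() < true_means[a])
        counts[a] += 1
        estimates[a] += (r - estimates[a]) / counts[a]   # running-mean update
    return int(np.argmax(estimates))

# Greedy (epsilon=0) shapes its own data distribution: one bad sample from the
# good arm and it may never try that arm again. Modest exploration fixes this.
for eps in (0.0, 0.1):
    failures = sum(run(eps, seed=s) != 1 for s in range(200))
    print(f"epsilon={eps}: settled on the worse arm in {failures}/200 runs")
```

The same dynamic, with rich implicit feedback in place of scalar rewards, is what makes adaptive learning from human signals delicate: the learner's own behavior determines which human responses it ever gets to observe.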

Some potential questions we hope to discuss at this workshop are listed below:

  • When is it possible to go beyond reinforcement learning (with hand-crafted rewards) and leverage interaction-grounded learning from arbitrary feedback signals, where the grounding for such feedback could be initially unknown, contextual, rich, and high-dimensional?
  • How can we learn from natural/implicit human feedback signals such as natural language, speech, eye movements, facial expressions, gestures, etc. during interaction? Is it possible to learn from human guidance signals whose meanings are initially unknown or ambiguous, even when there is no explicit external reward?
  • How should learning algorithms account for a human’s preferences or internal reward that are non-stationary and change over time? How can we account for non-stationarity of the environment itself?
  • How much of the learning should happen in pre-training (i.e., learning for the average user) versus interactively (i.e., personalizing or fine-tuning to a specific user)?
  • How can we develop a better understanding of how humans interact with or teach other humans or machines? And how could such an understanding lead to better designs for learning systems that leverage human signals during interaction?
  • How can we design intrinsic reward systems that push agents to (learn to) become socially integrated, coordinated, and aligned with humans?
  • How can well-known design methods from HCI (such as ability-based design) be imported and widely adopted in AI/ML? What is missing from today’s technological solution paradigms that would allow ability-based design to be deployed at scale? How can the machine learning community assist the HCI and accessibility research communities in building adaptive learning interfaces targeting a wide range of marginalized and specially-abled sections of society?
  • What is the minimal set of assumptions under which learning from arbitrary/implicit feedback signals is possible in the interaction-grounded learning paradigm? (A toy version of this setting is sketched below.)
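
As a concrete (and heavily simplified) illustration of the interaction-grounded learning question above, the sketch below sets up a problem where the learner never observes a reward, only a rich feedback vector whose grounding is initially unknown. This toy version sidesteps the hardest part by bootstrapping the feedback decoder from a brief, explicitly labeled warm start; the names, linear models, and toy environment are all our own illustrative assumptions, and the research question asks when even the warm-start labels can be removed.

```python
import numpy as np

rng = np.random.default_rng(0)
N_ACTIONS, CTX_DIM, FB_DIM = 3, 5, 8

# Hidden ground truth: which action is best per context, and how the latent
# binary reward is encoded into the observed high-dimensional feedback.
true_w = rng.normal(size=(N_ACTIONS, CTX_DIM))
fb_proto = rng.normal(size=(2, FB_DIM))        # feedback "style" for r = 0 vs r = 1

def step(context, action):
    """Environment: emits a feedback vector; the scalar reward stays latent."""
    r = int(action == int(np.argmax(true_w @ context)))
    y = fb_proto[r] + 0.3 * rng.normal(size=FB_DIM)
    return r, y

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Phase 1: ground the feedback. A short warm start with explicit labels fits a
# logistic decoder from feedback vectors to pseudo-rewards (interaction-grounded
# learning asks when this labeled phase can be eliminated entirely).
dec_w = np.zeros(FB_DIM)
for _ in range(300):
    x = rng.normal(size=CTX_DIM)
    r, y = step(x, int(rng.integers(N_ACTIONS)))
    dec_w += 0.5 * (r - sigmoid(dec_w @ y)) * y          # logistic-regression SGD

# Phase 2: an epsilon-greedy contextual bandit driven purely by decoded
# pseudo-rewards; the true reward is never consulted again.
pol_w = np.zeros((N_ACTIONS, CTX_DIM))                   # per-action reward models
for _ in range(5000):
    x = rng.normal(size=CTX_DIM)
    a = int(rng.integers(N_ACTIONS)) if rng.random() < 0.1 else int(np.argmax(pol_w @ x))
    _, y = step(x, a)
    pseudo_r = sigmoid(dec_w @ y)                        # decoded implicit feedback
    pol_w[a] += 0.05 * (pseudo_r - pol_w[a] @ x) * x     # SGD on squared error

# Evaluate how often the learned policy matches the latent-optimal action.
test = rng.normal(size=(2000, CTX_DIM))
hits = sum(int(np.argmax(pol_w @ x) == int(np.argmax(true_w @ x))) for x in test)
print(f"optimal-action rate: {hits / 2000:.2%}")
```

Replacing the warm start with a jointly learned decoder, e.g., by contrasting feedback gathered under the learned policy against feedback gathered under a known-bad exploration policy, is one route studied in the interaction-grounded learning literature.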
All our contributed papers are non-archival and can be submitted to other venues. To ask questions during the workshop, use this Sli.do link:


Speakers

  • David Abel (Google DeepMind)
  • Daniel Brown (University of Utah)
  • Jonathan Grizou (University of Glasgow)
  • Taylor Kessler Faulkner (University of Washington)
  • Bradley Knox (University of Texas at Austin)
  • Paul Mineiro (Microsoft Research)
  • Dorsa Sadigh (Stanford University)
  • Jesse Thomason (USC and Amazon)


Panelists

  • Sam Bowman (NYU and Anthropic)
  • Furong Huang (University of Maryland)
  • Jesse Thomason (USC and Amazon)
  • Diyi Yang (Stanford University)
  • David Abel (Google DeepMind)
  • Anca Dragan (University of California, Berkeley)
  • Taylor Kessler Faulkner (University of Washington)
  • Bradley Knox (University of Texas at Austin)
  • John Langford (Microsoft Research)
  • Paul Mineiro (Microsoft Research)


Schedule

All times are in GMT-10 (Hawaii Standard Time).
09:00 am - 09:10 am Organizers
Introductory Remarks
09:10 am - 09:35 am Dorsa Sadigh
Interactive Learning in the Era of Large Models
09:35 am - 10:00 am Jesse Thomason
Considering The Role of Language in Embodied Systems
10:00 am - 10:30 am Coffee break + Poster Session
10:30 am - 10:55 am Jonathan Grizou
Aiming for internal consistency, the 4th pillar of interactive learning
10:55 am - 11:20 am Daniel Brown
Pitfalls and paths forward when learning rewards from human feedback
11:20 am - 12:10 pm Panel Session 1: Sam Bowman, Jim Fan, Furong Huang, Jesse Thomason, Diyi Yang, Keerthana Gopalakrishnan
12:10 pm - 01:10 pm Lunch break
01:10 pm - 01:35 pm Bradley Knox
The EMPATHIC Framework for Task Learning from Implicit Human Feedback
01:35 pm - 02:00 pm David Abel
Three Dogmas of Reinforcement Learning
02:00 pm - 02:25 pm Paul Mineiro
Contextual Bandits without Rewards
02:25 pm - 03:00 pm Contributed Talks
03:00 pm - 03:30 pm Coffee break + Poster Session
03:30 pm - 04:00 pm Taylor Kessler Faulkner
Robots Learning from Real People
04:00 pm - 04:50 pm Panel Session 2: David Abel, Anca Dragan, Keerthana Gopalakrishnan, Taylor Kessler Faulkner, Bradley Knox, John Langford, Paul Mineiro
04:50 pm - 05:00 pm Organizers
Concluding Remarks

Papers
  • Imitation Learning with Human Eye Gaze via Multi-Objective Prediction [link] (spotlight)
    Ravi Kumar Thakur, MD Sunbeam, Vinicius G. Goecks, Ellen Novoseller, Ritwik Bera, Vernon Lawhern, Greg Gremillion, John Valasek, Nicholas R Waytowich
  • Learning from a Learning User for Optimal Recommendations [link] (spotlight)
    Fan Yao, Chuanhao Li, Denis Nekipelov, Hongning Wang, Haifeng Xu
  • Survival Instinct in Offline Reinforcement Learning and Implicit Human Bias in Data [link] (spotlight)
    Anqi Li, Dipendra Misra, Andrey Kolobov, Ching-An Cheng
  • UCB Provably Learns From Inconsistent Human Feedback [link] (spotlight)
    Shuo Yang, Tongzheng Ren, Inderjit S Dhillon, Sujay Sanghavi
  • Visual-based Policy Learning with Latent Language Encoding [link] (spotlight)
    Jielin Qiu, Mengdi Xu, William Han, Bo Li, Ding Zhao
  • Legible Robot Motion from Conditional Generative Models [link]
    Matthew Bronars, Danfei Xu
  • Asymptotically Optimal Fixed-Budget Best Arm Identification with Variance-Dependent Bounds [link]
    Masahiro Kato, Masaaki Imaizumi, Takuya Ishihara, Toru Kitagawa
  • RLHF-Blender: A Configurable Interactive Interface for Learning from Diverse Human Feedback [link]
    Yannick Metz, David Lindner, Raphaël Baur, Daniel A. Keim, Mennatallah El-Assady
  • Bandits Meet Mechanism Design to Combat Clickbait in Online Recommendation [link]
    Thomas Kleine Buening, Aadirupa Saha, Christos Dimitrakakis, Haifeng Xu
  • A Generative Model for Text Control in Minecraft [link]
    Shalev Lifshitz, Keiran Paster, Harris Chan, Jimmy Ba, Sheila A. McIlraith
  • Follow-ups Also Matter: Improving Contextual Bandits via Post-serving Contexts [link]
    Chaoqi Wang, Ziyu Ye, Zhe Feng, Ashwinkumar Badanidiyuru, Haifeng Xu
  • Bayesian Inverse Transition Learning for Offline Settings [link]
    Leo Benac, Sonali Parbhoo, Finale Doshi-Velez
  • Principal-Driven Reward Design and Agent Policy Alignment via Bilevel-RL [link]
    Souradip Chakraborty, Amrit Bedi, Alec Koppel, Furong Huang, Mengdi Wang
  • Temporally-Extended Prompts Optimization for SAM in Interactive Medical Image Segmentation [link]
    Chuyun Shen, Wenhao Li, Ya Zhang, Xiangfeng Wang
  • Relative Behavioral Attributes: Filling the Gap between Symbolic Goal Specification and Reward Learning from Human Preferences [link]
    Lin Guan, Karthik Valmeekam, Subbarao Kambhampati
  • Complementing a Policy with a Different Observation Space [link]
    Gokul Swamy, Sanjiban Choudhury, Drew Bagnell, Steven Wu
  • Cognitive Models as Simulators: Using Cognitive Models to Tap into Implicit Human Feedback [link]
    Ardavan S. Nobandegani, Thomas Shultz, Irina Rish
  • Unraveling the ARC Puzzle: Mimicking Human Solutions with Object-Centric Decision Transformer [link]
    Jaehyun Park, Jaegyun Im, Sanha Hwang, Mintaek Lim, Sabina Ualibekova, Sejin Kim, Sundong Kim
  • Selective Sampling and Imitation Learning via Online Regression [link]
    Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu
  • Learning Shared Safety Constraints from Multi-task Demonstrations [link]
    Konwoo Kim, Gokul Swamy, Zuxin Liu, Ding Zhao, Sanjiban Choudhury, Steven Wu
  • Strategic Apple Tasting [link]
    Keegan Harris, Chara Podimata, Steven Wu
  • Discovering User Types: Mapping User Traits by Task-Specific Behaviors in Reinforcement Learning [link]
    Lars Lien Ankile, Brian Ham, Kevin Mao, Eura Shin, Siddharth Swaroop, Finale Doshi-Velez, Weiwei Pan
  • Provable Offline Reinforcement Learning with Human Feedback [link]
    Wenhao Zhan, Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun
  • SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks [link]
    Bill Yuchen Lin, Yicheng Fu, Karina Yang, Prithviraj Ammanabrolu, Faeze Brahman, Shiyu Huang, Chandra Bhagavatula, Yejin Choi, Xiang Ren
  • How to Query Human Feedback Efficiently in RL? [link]
    Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee
  • Contextual Bandits and Imitation Learning with Preference-Based Active Queries [link]
    Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu
  • Bayesian Active Meta-Learning under Prior Misspecification [link]
    Sabina J. Sloman, Ayush Bharti, Samuel Kaski
  • Contextual Set Selection Under Human Feedback With Model Misspecification [link]
    Shuo Yang, Rajat Sen, Sujay Sanghavi
  • Building Community Driven Libraries of Natural Programs [link]
    Leonardo Hernandez Cano, Yewen Pu, Robert D. Hawkins, Joshua B. Tenenbaum, Armando Solar-Lezama
  • Modeled Cognitive Feedback to Calibrate Uncertainty for Interactive Learning [link]
    Jaelle Scheuerman, Zachary Bishof, Chris J Michael
  • Improving Bionic Limb Control through Batch Reinforcement Learning in an Interactive Game Environment [link]
    Kilian Freitag, Rita Laezza, Jan Zbinden, Max Ortiz-Catalan
  • Rewarded soups: towards Pareto-optimality by interpolating weights fine-tuned on diverse rewards [link]
    Alexandre Rame, Guillaume Couairon, Corentin Dancette, Mustafa Shukor, Jean-Baptiste Gaya, Laure Soulier, Matthieu Cord
  • Inverse Preference Learning: Preference-based RL without a Reward Function [link]
    Joey Hejna, Dorsa Sadigh
  • Interactive-Chain-Prompting: Ambiguity Resolution for Crosslingual Conditional Generation with Interaction [link]
    Jonathan Pilault, Xavier Garcia, Arthur Brazinskas, Orhan Firat
  • Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism [link]
    Zihao Li, Zhuoran Yang, Mengdi Wang
  • Accelerating exploration and representation learning with offline pre-training [link]
    Bogdan Mazoure, Jake Bruce, Doina Precup, Rob Fergus, Ankit Anand
  • Active Learning with Crowd Sourcing Improves Information Retrieval [link]
    Zhuotong Chen, Yifei Ma, Branislav Kveton, Anoop Deoras
  • Guided Policy Search for Parameterized Skills using Adverbs [link]
    Benjamin Adin Spiegel, George Konidaris

Organizers

  • Akanksha Saran (Microsoft Research)
  • Andi Peng (Massachusetts Institute of Technology)
  • Andreea Bobu (University of California, Berkeley)
  • Tengyang Xie (University of Illinois at Urbana-Champaign)
  • Anca Dragan (University of California, Berkeley)
  • John Langford (Microsoft Research)



Contact
Reach out to interactive.implicit.learning@gmail.com for any questions.

Sponsors