Advancing HRI Research and Benchmarking Through Open-Source Ecosystems

2023 ACM/IEEE International Conference on Human-Robot Interaction (HRI) Workshop

This full-day workshop was held during the Human-Robot Interaction (HRI) 2023 conference on March 13, 2023, in Stockholm, Sweden.

Abstract

Recent rapid progress in HRI research makes it more crucial than ever to have systematic development and benchmarking methodologies to assess and compare different algorithms and strategies. Indeed, the lack of such methodologies results in inefficiencies and sometimes stagnation, since new methods cannot be effectively compared to prior work and the research gaps become challenging to identify. Moreover, lacking an active and effective mechanism to disseminate and utilize the available datasets and benchmarking protocols significantly reduces their impact and utility.

A unified effort in the development, utilization, and dissemination of open-source assets amongst a governed community of users can advance these domains substantially; for HRI, this is particularly needed in the curation and generation of datasets for benchmarking. This workshop will take a step towards removing the roadblocks to the development and assessment of HRI by reviewing, discussing, and laying the groundwork for an open-source ecosystem at the intersection of HRI and robot manipulation. The workshop will play a crucial role in identifying the preconditions and requirements to develop an open-source ecosystem that provides open-source assets for HRI benchmarking and comparison, aiming to determine the needs and wants of HRI researchers. Invited speakers include those who have contributed to the development of open-source assets in HRI and robot manipulation, and discussion topics will include issues related to the usage of open-source assets and the benefits of forming an open-source ecosystem.

Key Takeaways

Based on the discussions at the workshop, a set of key takeaways has been summarized and organized by topic below:

HRI vs. other domains

Compared to other domains, HRI research has:
- Fewer: contributions to open-source, usage of open-source assets, hardware access limitations
- More: barriers faced when integrating open-source assets, researchers not benchmarking at all, lack of relevant comparable benchmarks
- Similar: lack of relevant open-source assets, learning about the availability of new open-source assets, amount of benchmarking (if performed), simulation limitations

Open-source development and maturation

The popularity of some existing open-source assets makes it challenging for new assets to gain visibility and be taken up by the community
Licensing structure must be worked out to ensure industry adoption of open-source assets; mechanisms like FOSSA (https://fossa.com/) with GitHub are an option to track open-source licenses
Need established guidelines and/or a governing body to indicate when an open-source effort is “ready” / “good enough” for integration into other systems
Open-source methods to automate data collection (to generate a lot of data) are needed for rich, data-driven solutions and algorithms
Communication of the data collection methodology is as important as understanding the structure of the data itself
Documentation of bias and limitations in datasets is very important for replication
Establish criteria for mature assets; popularity is one aspect, but maintenance and continued usage are what make an asset mature
Components of open-source robotics should be worked on in parallel, like what is done in computer vision; while a broad domain, there are large enough groups working on the components to warrant this structure

Simulations for benchmarking

Comparative benchmarking in simulation can use static simulations, dynamic simulations (i.e., with built-in variation/randomness), and/or interactive simulations (i.e., reactive to human-robot interaction)
Static simulations are the current norm for HRI and comparison; dynamic simulations and/or static with interactivity may be best for evaluating robustness due to variation being induced, but make it trickier for comparison
In the development of simulations for benchmarking, need to define the “sweet spot” of how much to simulate to ensure sim-to-real transfer (e.g., gaps in human physiology in simulation, but how important is this?)

Benchmarking and competitions

The CORSMAL effort (https://corsmal.eecs.qmul.ac.uk/) includes all four types of open-source assets (physical, digital, functional, and instructional) and is a good example of using benchmarking and competitions to develop these assets
Three modalities for conducting benchmarking: completely simulated, physical with remote participation, completely physical; ensuring fairness across modalities is tricky (e.g., simulator challenges with contact manipulation), so comparison across them has to be done sensitively
Can measurement tools for benchmarking (HRI or otherwise) be standardized if the use cases are changing? Consideration for generalization on this front is needed

Sustainability

Difficult to allocate resources to support open-source efforts (e.g., CORSMAL) beyond original funding period; additional grants or other incentives are needed

Ensuring relevance

Getting stakeholders in-the-loop as part of the ecosystem when generating datasets is good practice to develop relevant benchmarks
Benchmarks for specific applications may be less useful than those that can be more broadly applied across domains; should more benchmarks be developed for elemental capabilities rather than specific ones?

Standards

When developing standards for HRI experimentation, should academia and industry be held to the same standards, or different ones?
Industrial applications of robotics that have more of a culture of heavy testing (e.g., medical/healthcare, aviation) can help push standards forward compared to other industries that may be more conservative
ROS4HRI is an example of an accepted Open ROS specification (REP-155); ROS Enhancement Proposals (https://ros.org/reps/rep-0000.html) could be a mechanism that we use to establish more open-source software standards around benchmarking

Replicability and generalizability

Potential criterion for developing benchmarks: the benchmark must be transferable to multiple domains and environments
Methods to apply a similar level of rigor from non-HRI benchmarking spaces to HRI:
- Reporting information in the same format is one solution (in roadmap for the IEEE group)
- Need to set bounds on the scope of replication (e.g., which aspects of the system or benchmark are being replicated and which are being varied?) and/or more clearly identify which parts are replicable/replicated and which are not
- Funding mechanisms purely for replication studies and generalizability projects are needed!
Can standard hardware alleviate issues in the variety of capabilities in robot systems? This has been shown to work in the past (e.g., PR2), but only for about a decade, then the hardware fades; software assets built can extend beyond the hardware though
Rather than control the platform, let’s control the outcome: motivate everyone to optimize a solution to meet a desired outcome
Benchmarking between studies could be done absolutely (A1 vs. A2) or variably (A1+X vs. A2+Y) for more piecemeal evaluations and to ease replicability requirements
Replicability studies (same everything) vs. generalizability studies (same except ABC)
NSF has shown interest in the model of industry taking research-grade code and making it production quality; can a series of third parties exist to do replication/generalizability like this?
Decentralize the replication/generalization process: lab A wants testing, so they conduct testing for labs B and C, in turn labs B and C conduct testing for lab A, etc.
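
The distinction above between replicability studies (same everything) and generalizability studies (same except some factors) can be illustrated with a minimal sketch. It assumes each study is described by a flat set of named factors; the factor names and values below (robot, task, environment, metric) are hypothetical and were not proposed at the workshop.

```python
from typing import Any, Dict, Set

_MISSING = object()  # sentinel so a factor absent from one study counts as varied


def varied_factors(original: Dict[str, Any], followup: Dict[str, Any]) -> Set[str]:
    """Return the names of factors whose values differ between two studies."""
    keys = set(original) | set(followup)
    return {
        k for k in keys
        if original.get(k, _MISSING) != followup.get(k, _MISSING)
    }


def classify_study(original: Dict[str, Any], followup: Dict[str, Any]) -> str:
    """Label a follow-up study by how much of the original setup it keeps."""
    if varied_factors(original, followup):
        return "generalizability"  # same except the varied factors
    return "replicability"         # same everything


# Hypothetical study descriptions; all names and values are illustrative only.
original = {
    "robot": "arm_A",
    "task": "handover",
    "environment": "lab",
    "metric": "success_rate",
}
same_setup = dict(original)
new_environment = {**original, "environment": "home"}

print(classify_study(original, same_setup))
print(classify_study(original, new_environment))
print(sorted(varied_factors(original, new_environment)))
```

Reporting the varied factors explicitly, rather than only the label, is one way to bound the scope of a replication as suggested above.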

Improvement efforts

Developing a website/repo of assets is easy, but maintaining it can be tricky without regular prompts/incentives to do so
Conference track integration should start as dedicated track(s) to benchmarking and datasets, then broaden into requirements for other/all tracks
Can’t navel-gaze forever; it doesn’t have to be perfect, it just has to be useful

Overview

Presentations and guided discussion will take place across two categories:

Human Factors in Benchmarking and Dataset Generation: best practices to generate datasets and benchmarking protocols that accommodate variations in human inputs and allow for systematic comparison between different algorithms and platforms.
Open-source and Human-Robot Interaction: perspectives on the current state of open-source to support human-robot interaction research, examples of successful implementations, and lessons learned to improve the ecosystem.

After the presentations and Q+A for each category session are completed, a guided discussion will be facilitated amongst the workshop participants.

The following topics, among others, will be put forth to motivate these discussions:

Availability: What open-source assets are available? What types of assets are there too many or too few of? How is the availability of these assets promoted, or how should it be?
Composition: What formats or structures of open-source assets are used? What characteristics are they missing and which are unnecessary?
Applicability: Are the open-source assets and experimentation practices reviewed applicable to your research? What use cases would they be applicable to? Are there particular domains or applications that would benefit greatly from open-source assets?
Benefits: What are the benefits of having this open-source asset available? How do you use it for your own work? Are there missing features that would provide greater benefit to you or others?
Implementation: What are the barriers to using open-source assets for HRI experimentation? Are there existing instructions and documentation that assist in implementation, or are these features lacking? What level of support is desired to ease implementation?

The workshop will be hybrid, with a focus on in-person participation, but a virtual option for remote attendees to watch presentations and participate in discussion will be available. A Slack workspace is being established for pre, during, and post-workshop discussions and coordination, to serve as an open communication platform for the open-source ecosystem.

Speakers

Danica Kragic

Royal Institute of Technology (KTH)

Andrea Cavallaro

Idiap Research Institute

Harold Soh

National University of Singapore

Shelly Bagchi

National Institute of Standards and Technology (NIST)

Henny Admoni

Carnegie Mellon University

Tapo Bhattacharjee

Cornell University

Sonia Chernova

Georgia Institute of Technology

You!

Consider contributing to this workshop! See below

Schedule

Invited talks: 30 minutes each (20 presentation + 10 questions)

Submitted talks: 15 minutes each (10 presentation + 5 questions)

Discussion: 30 minutes each

All times given are in Central European Time (CET; UTC+01:00)

A YouTube playlist of the workshop presentations can be found here: https://www.youtube.com/playlist?list=PLfUzSIwyYwvWnBbCouga8SoIMsHlALnEj

Introduction

9:15 Introduction of workshop participants
9:25 Current State of Open-source Robot Manipulation Landscape and User Experiences for HRI, Adam Norton [presentation | youtube]

Human Factors in Benchmarking and Dataset Generation

9:45 Developing Datasets and Benchmarks for Social Navigation, Henny Admoni [presentation | youtube]
10:15 Benchmarking Human-Robot Handovers, Andrea Cavallaro [presentation | youtube]
10:45 Democratizing Robotic Caregiving through Human-Centered Platforms and Datasets, Tapo Bhattacharjee [presentation | youtube]
11:15 Coffee break
11:30 Submitted talks
- Preserving HRI Capabilities: Physical, Remote and Simulated Modalities in the SciRoc 2021 Competition, Vincenzo Suriani [paper | presentation | youtube]
- Towards an Open Source Library and Taxonomy of Benchmark Usecase Scenarios for Trust-Related HRI Research, Peta Masters and Victoria Young [paper | presentation | youtube]
12:00 Improving Research Transference to the Real World by Developing Standards & Recommended Practices for HRI, Shelly Bagchi [presentation | youtube]
12:30 Discussion on benchmarking and replication [notes]

Lunch break

1:00 - 2:30

Open-source and Human-Robot Interaction

2:30 Our Open-Source Adventures in Human-Robot Interaction, Harold Soh
3:00 Humans for Robots and Robots for Humans, Danica Kragic and Marco Moletta [presentation | youtube]
3:30 Big Data Benchmarking in HRI: Challenges and Opportunities, Sonia Chernova [presentation]
4:00 Coffee break
4:30 Submitted talks
- The Need to Simplify Open-Source Real-time Systems for Human-Robot Interaction, Christopher K. Fourie [paper | presentation | youtube]
- ROS4HRI: Standardising an Interface for Human-Robot Interaction, Raquel Ros [paper | presentation | youtube]
- Applicability of Open-Source Tools in Robot-Assisted Reinforcement Learning-based QWriter system, Zhansaule Telisheva [paper]
5:15 Discussion on next steps for an open-source ecosystem [notes]
6:00 Workshop end

Participation

Join the COMPARE project Slack workspace, channel #hri-2023-workshop, to participate in discussions before, during, and after the workshop: https://join.slack.com/t/compare-ecosystem/shared_invite/zt-1nfgdwq4z-_8_PsXVhJ6H1FAZuQizjTA

Contributions (CLOSED)

Short papers are sought that discuss issues faced, successes achieved, and/or analyses of the current landscape of robotic manipulation and HRI when developing or utilizing open-source assets. Submissions may be in the form of position papers, proposals for new efforts, or reports of new results, with the expectation that authors of accepted papers will give a presentation at the workshop (in-person or remotely) and participate in topic discussions.

Submissions should use the HRI 2023 format and be 2-4 pages in length (excluding references); anonymization is not required. Contributed papers should fit into one or more of the following topics of the workshop: human factors in benchmarking, open-source benchmarking protocols and datasets, the availability of open-source assets, their composition, applicability or lack thereof, benefits of open-source, and barriers to implementation, among others.

All submissions will be reviewed, and authors of accepted papers will be asked to give a 10-minute talk at the workshop. At least one author of each accepted submission must register for the workshop.

December 5, 2022: Call for submissions open
January 13, 2023, 23:59 Anywhere on Earth (AoE): Early submission deadline for short papers to ensure decision by HRI 2023 early registration deadline (January 20)
January 19, 2023: Notification of acceptance of early workshop submissions
February 1, 2023, 23:59 AoE: Submission deadline for short papers
February 10, 2023: Notification of acceptance for workshop submissions

Submissions should be e-mailed to adam_norton@uml.edu with the text “[HRI 2023 Workshop Submission]” in the subject line.

Organizers

Adam Norton, University of Massachusetts Lowell
Holly Yanco, University of Massachusetts Lowell
Berk Calli, Worcester Polytechnic Institute
Aaron Dollar, Yale University

Contact

Please contact Adam Norton with any questions or comments via e-mail: adam_norton@uml.edu

Funded by the National Science Foundation, Pathways to Enable Open-Source Ecosystems (POSE), Award TI-2229577