Social Mobile Manipulation Challenge

CVPR 2025 Embodied AI Workshop Challenge

Overview

The 1st Social Mobile Manipulation Challenge, organized based on the insights presented in InfiniteWorld, focuses on developing embodied AI agents capable of performing long sequences of complex tasks through social interactions.

Competition Description

The challenge includes two interaction modes: human-robot and robot-robot. The goal is to advance the community's capabilities in areas like human intention reasoning, long-term task planning, and effective social interaction between embodied agents.

Track 1: Vertical Interaction

This track focuses on vision-language navigation. The participating agent receives a partial scene graph as a prompt to perform tasks such as "pick up an object from one place and place it at another." The task descriptions are simple yet ambiguous, challenging the agent's understanding of both the instructions and the scene.

Task and Data Format

Model Input and Output

Actions and Parameters

Action        Description                            Action Value (Range)
MoveForward   Move forward by a specified distance   0 m to 1.50 m
TurnLeft      Rotate left by a specified angle       0° to 90°
TurnRight     Rotate right by a specified angle      0° to 90°
Stop          Stop current navigation target         0
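
The action table above can be expressed as a small validation helper. The following is a minimal Python sketch, not the official challenge API: the names `ACTION_RANGES`, `Action`, and `make_action` are hypothetical, and the assumption that out-of-range values should be rejected (rather than clamped) is ours. Only the action names and value ranges come from the table.

```python
from dataclasses import dataclass

# Ranges copied from the action table: metres for MoveForward,
# degrees for TurnLeft/TurnRight, and a fixed value of 0 for Stop.
ACTION_RANGES = {
    "MoveForward": (0.0, 1.5),
    "TurnLeft": (0.0, 90.0),
    "TurnRight": (0.0, 90.0),
    "Stop": (0.0, 0.0),
}


@dataclass(frozen=True)
class Action:
    """Hypothetical container for one atomic action and its parameter."""
    name: str
    value: float


def make_action(name: str, value: float = 0.0) -> Action:
    """Build an Action, rejecting unknown names and out-of-range values."""
    if name not in ACTION_RANGES:
        raise ValueError(f"unknown action: {name!r}")
    low, high = ACTION_RANGES[name]
    if not low <= value <= high:
        raise ValueError(f"{name} value {value} is outside [{low}, {high}]")
    return Action(name, float(value))


# Example: a short plan that moves, turns, and then stops.
plan = [
    make_action("MoveForward", 1.0),
    make_action("TurnLeft", 45),
    make_action("Stop"),
]
```

Rejecting invalid values early keeps malformed model output from reaching the simulator; a submission could equally choose to clamp values into range instead.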

Evaluation Metrics

Challenges

Track 2: Horizontal Interaction

This track focuses on a multi-robot cooperative scenario. Two robots operate independently in the same scene but periodically share their observed scene maps to improve task efficiency. The robots use the Stretch-3 model and perform actions via atomic commands.

Robot and Actions

Observations and Task Instructions

Scene Graph Sharing

After a certain number of steps, the two robots exchange their respective scene graphs (node graphs) to complete tasks more quickly. For example, a node table might include:

Node ID   Type      Description                     Connected Nodes
N1        Room      Starting area                   N2, N3
N2        Hallway   Corridor to other areas         N1, N4
N3        Object    Detected table                  N1
N4        Door      Exit leading to another room    N2, N5
N5        Room      Adjacent room                   N4
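
One way to picture the exchange is as a merge of two partial node tables. The sketch below is illustrative only: the `Node` class and `merge_scene_graphs` function are hypothetical names, and it assumes both robots use the same node IDs for the same places, which the challenge description does not state. The node data mirrors the example table.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical scene-graph node mirroring the columns of the node table."""
    node_id: str
    node_type: str
    description: str
    connected: set = field(default_factory=set)


def merge_scene_graphs(own: dict, other: dict) -> dict:
    """Return a new graph containing the nodes and edges known to either robot.

    Assumes node IDs are shared between robots (an assumption of this sketch).
    Inputs are left unmodified.
    """
    merged = {
        nid: Node(n.node_id, n.node_type, n.description, set(n.connected))
        for nid, n in own.items()
    }
    for nid, n in other.items():
        if nid in merged:
            # Same node seen by both robots: take the union of known edges.
            merged[nid].connected |= n.connected
        else:
            merged[nid] = Node(n.node_id, n.node_type, n.description, set(n.connected))
    return merged


# Robot A has explored the starting area; Robot B has explored past the door.
robot_a = {
    "N1": Node("N1", "Room", "Starting area", {"N2", "N3"}),
    "N2": Node("N2", "Hallway", "Corridor to other areas", {"N1"}),
    "N3": Node("N3", "Object", "Detected table", {"N1"}),
}
robot_b = {
    "N2": Node("N2", "Hallway", "Corridor to other areas", {"N4"}),
    "N4": Node("N4", "Door", "Exit leading to another room", {"N2", "N5"}),
    "N5": Node("N5", "Room", "Adjacent room", {"N4"}),
}

shared = merge_scene_graphs(robot_a, robot_b)
```

After the merge, each robot holds all five nodes, and the hallway `N2` carries the edges observed by both robots, so either robot can plan a route through areas it has not visited itself.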

Evaluation Metrics