EgoHumanoid


🤖 The first framework enabling humanoid loco-manipulation with egocentric human demonstrations.

Human demonstrations offer rich environmental diversity and scale naturally, making them an appealing alternative to robot teleoperation. We present EGOHUMANOID, the first framework to co-train a vision-language-action policy using abundant egocentric human demonstrations together with a limited amount of robot data.

To bridge the embodiment gap, we introduce a systematic alignment pipeline with two key components: view alignment reduces visual discrepancies; action alignment maps human motions into a unified action space for humanoid control.

Extensive real-world experiments demonstrate that incorporating robot-free egocentric data improves performance over robot-only baselines by 51%, particularly in unseen environments.



📖 Overview

graph LR
    A[👤 Human Demo<br/>PICO VR + ZED] -->|Collect| B[📹 Egocentric Data]
    B -->|Process| C[⚡ View Alignment]
    C -->|Train| D[🤖 VLA Model<br/>π₀.₅-based]
    D -->|Deploy| E[🚀 Unitree G1<br/>Robot Control]

EgoHumanoid consists of four main components:

1๏ธโƒฃ Data Collection

Collect synchronized multi-modal data from both humanoid robots (Unitree G1) and human demonstrators (PICO VR + ZED Mini)

2๏ธโƒฃ Data Processing

Process, align, and retarget human demonstrations to robot action space

3๏ธโƒฃ Model Training

Fine-tune vision-language-action models ฯ€โ‚€.โ‚… on the processed datasets

4๏ธโƒฃ Deployment

Deploy trained policies on real humanoid robots with real-time inference

๐Ÿ› ๏ธ Hardware Setup

๐Ÿค– Robot Data Collection Hardware

Required:

  • โœ… Unitree G1 Humanoid Robot with dex3-1 hands

  • โœ… Workstation PC

    • Ubuntu 22.04
    • NVIDIA GPU (RTX 4090+)
    • Docker + NVIDIA Container Toolkit
  • โœ… ZED Mini Camera (mounted on robot head)

  • โœ… Network Setup

    • Static IP: 192.168.123.222
    • Subnet: 255.255.255.0
  • โœ… PICO VR Headset (for teleoperation)

👤 Human Data Collection Hardware

Required:

  • ✅ PICO VR Headset (5 trackers for full-body tracking)
  • ✅ ZED Mini Camera (mounted on headset)
  • ✅ Workstation PC
    • Ubuntu 22.04/24.04
    • USB 3.0 ports
    • Network connection

Setup Requirements:

  • 📡 PICO and PC on the same network
  • 🎯 Full-body tracking activated
  • 🔌 ZED Mini connected via USB 3.0

Note

We use the ZED Mini instead of the ZED X Mini (used in the paper) because it is easier to source and set up.


💻 Environment Setup

📋 Prerequisites

🖥️ OS: Ubuntu 22.04 (tested and recommended)
🐍 Python: 3.11+
🎮 GPU:
  • ≥ 8 GB VRAM (inference)
  • ≥ 22.5 GB VRAM (fine-tuning with LoRA)
  • ≥ 70 GB VRAM (full fine-tuning)

Installation

  1. Clone the repository with submodules:
git clone --recurse-submodules https://github.com/OpenDriveLab/EgoHumanoid.git
cd EgoHumanoid
  2. Install the uv package manager:
# See https://docs.astral.sh/uv/getting-started/installation/
curl -LsSf https://astral.sh/uv/install.sh | sh
  3. Set up the Python environment:
# Create environment and install dependencies
GIT_LFS_SKIP_SMUDGE=1 uv sync
GIT_LFS_SKIP_SMUDGE=1 uv pip install -e .
  4. Set up GR00T WholeBodyControl for robot control (robot data collection only):
# Clone GR00T WholeBodyControl
cd data_collection/robot_data
git clone https://github.com/NVlabs/GR00T-WholeBodyControl.git
cd GR00T-WholeBodyControl/decoupled_wbc

# Copy teleoperation scripts
cp -r ../../teleop/ control/main/teleop

# Set up Docker environment
# Modify docker/run_docker.sh to mount the src/openpi directory
# See data_collection/robot_data/README.md for detailed setup instructions

# Install and start the Docker container
./docker/run_docker.sh --install --root
  5. Install the ZED SDK (for data collection/processing):

Follow the ZED SDK installation guide for your platform, then install the Python API:

python /usr/local/zed/get_python_api.py
  6. Install the PICO SDK (for human data collection):

Follow the XR Robotics guidelines to set up the PICO SDK and XRoboToolkit-PC-Service.


🎥 Data Collection

For detailed hardware setup and collection procedures, see the READMEs under data_collection/.

Robot Data Collection

Robot data collection uses the Unitree G1 humanoid with teleoperation control and synchronized camera recording.

1. Set up Docker environment:

cd data_collection/robot_data/GR00T-WholeBodyControl/decoupled_wbc

# Modify docker/run_docker.sh to mount src/openpi directory
# See data_collection/robot_data/README.md for details

# Install and start Docker container
./docker/run_docker.sh --install --root
./docker/run_docker.sh --root

2. Inside Docker container, run G1 control:

# For real robot (ensure network is configured)
python decoupled_wbc/control/main/teleop/run_g1_control_loop.py --interface real --control-frequency 50 --with_hands

3. In a separate terminal, run teleoperation:

python decoupled_wbc/control/main/teleop/run_teleop_policy_loop.py --body-control-device pico --hand_control_device=pico --enable_real_device

4. Start data collection:

python decoupled_wbc/control/main/teleop/zed_mini_run_g1_data_exporter.py --dataset-name <task_name> --visualize

Controller Bindings:

  • Menu + Left Trigger: Toggle lower-body policy
  • Menu + Right Trigger: Toggle upper-body policy
  • Left Stick: X/Y translation
  • Right Stick: Yaw rotation
  • L/R Triggers: Control hand grippers
  • A Button: Start collecting episode
  • B Button: Discard episode

Output:

  • data_collection/<task_name>/episode_*.hdf5 - Robot state, actions, and navigation commands
  • data_collection/<task_name>/episode_*.svo2 - ZED camera recordings

See data_collection/robot_data/README.md for detailed instructions.

Human Data Collection

Human data collection captures synchronized full-body motion and binocular camera views.
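Synchronization between the motion stream and the camera stream is timestamp-based. As a rough illustration (not the repository's actual implementation), each camera frame can be matched to the nearest pose sample by binary search:

```python
import bisect

def nearest_pose(pose_ts, frame_t):
    """Return the index of the pose timestamp closest to frame_t.

    pose_ts must be sorted in ascending order.
    """
    i = bisect.bisect_left(pose_ts, frame_t)
    if i == 0:
        return 0
    if i == len(pose_ts):
        return len(pose_ts) - 1
    # Pick whichever neighbor is closer in time.
    return i if pose_ts[i] - frame_t < frame_t - pose_ts[i - 1] else i - 1

# Pose stream at ~100 Hz, camera at ~30 Hz (timestamps in seconds).
pose_ts = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05]
frame_ts = [0.000, 0.033]
matches = [nearest_pose(pose_ts, t) for t in frame_ts]
print(matches)  # [0, 3]
```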

1. Set up data collection environment:

# Create conda environment
conda create -n humandata python=3.11
conda activate humandata

# Install dependencies
pip install -r data_collection/human_data/requirements.txt

2. Start data collection:

cd data_collection/human_data

# Basic collection
python scripts/human_data_collection.py --name <dataset_name>

# With ZED camera preview
python scripts/human_data_collection.py --name <dataset_name> --visualize-zed

# Specify save directory
python scripts/human_data_collection.py --data-dir <save_dir> --name <dataset_name>

3. Collection workflow:

  1. System initializes (PICO SDK + ZED Mini + MeshCat visualization)
  2. Open browser at http://localhost:7000/static/ to view 3D skeleton
  3. Enter episode index (e.g., 0, 1, 2...)
  4. Perform demonstration
  5. Press Space to finish episode
  6. Data is saved automatically
  7. Continue to next episode or press Ctrl+C to exit

Output:

  • <data_dir>/<dataset_name>/episode_*.hdf5 - Body pose, hand pose, controller pose, timestamps
  • <data_dir>/<dataset_name>/episode_*.svo2 - ZED Mini video with depth

See data_collection/human_data/README.md for detailed instructions.


⚡ Data Processing

For detailed pipeline documentation, see the READMEs under data_alignment/.

Human Data Pipeline

The human data processing pipeline transforms raw VR recordings into robot-compatible datasets.

Run the full pipeline:

cd data_alignment/human_data_process

./run_human_data_pipeline.sh \
  --input_dir /path/to/raw_data \
  --output_dir /path/to/intermediate \
  --final-output-dir /path/to/final \
  --file all

Pipeline stages:

  1. Reorder Episodes: Sort chronologically and renumber
  2. Navigation Pipeline: Generate velocity commands from body pose
  3. Downsample: Reduce frequency and discretize commands
  4. Merge Camera: Integrate ZED camera frames
  5. Hand Status: Compute binary hand open/close status
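The navigation stage (stage 2) can be pictured as finite differencing of pelvis poses followed by a dead-band discretization. A toy sketch, with made-up thresholds and function names rather than the pipeline's real code:

```python
import math

def velocity_commands(poses, dt, v_thresh=0.05, w_thresh=0.1):
    """poses: list of (x, y, yaw) pelvis poses sampled every dt seconds.

    Returns one discretized (vx, wz) command per transition: continuous
    velocities are snapped to {-1, 0, +1} using a dead-band threshold.
    """
    cmds = []
    for (x0, y0, yaw0), (x1, y1, yaw1) in zip(poses, poses[1:]):
        # Forward velocity in the body frame (project world motion onto heading).
        vx = ((x1 - x0) * math.cos(yaw0) + (y1 - y0) * math.sin(yaw0)) / dt
        # Yaw rate, with the angle difference wrapped to (-pi, pi].
        wz = math.atan2(math.sin(yaw1 - yaw0), math.cos(yaw1 - yaw0)) / dt
        cmds.append((
            0 if abs(vx) < v_thresh else math.copysign(1, vx),
            0 if abs(wz) < w_thresh else math.copysign(1, wz),
        ))
    return cmds

poses = [(0.0, 0.0, 0.0), (0.02, 0.0, 0.0), (0.04, 0.0, 0.1)]
print(velocity_commands(poses, dt=0.05))  # [(1.0, 0), (1.0, 1.0)]
```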

Advanced usage:

# Skip stages
./run_human_data_pipeline.sh \
  --input_dir /path/to/raw \
  --output_dir /path/to/processed \
  --final-output-dir /path/to/final \
  --skip-reorder \
  --skip-merge

# Generate validation plots
./run_human_data_pipeline.sh \
  --input_dir /path/to/raw \
  --output_dir /path/to/processed \
  --final-output-dir /path/to/final \
  --with-png

# Dry run (preview commands)
./run_human_data_pipeline.sh \
  --input_dir /path/to/raw \
  --output_dir /path/to/processed \
  --final-output-dir /path/to/final \
  --dry-run

See data_alignment/human_data_process/README.md for detailed pipeline documentation.

Robot Data Pipeline

Process robot demonstration data:

cd data_alignment/robot_data_process

python merge_data.py \
  --dataset-dir /path/to/robot/data \
  --output-dir /path/to/processed/output

View Alignment

Transform egocentric camera viewpoints to match the robot's perspective using depth-based warping and inpainting.
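At its core, depth-based warping back-projects each pixel using its depth, shifts the camera, and re-projects. A schematic single-pixel version under a pinhole model (pure translation, no rotation or inpainting; all parameter values below are illustrative):

```python
def warp_pixel(u, v, depth, fx, fy, cx, cy, t):
    """Reproject pixel (u, v) with metric depth into a camera translated by t.

    t = (tx, ty, tz) is the camera translation in the source camera frame
    (x right, y down, z forward).
    """
    # Back-project to a 3D point in the source camera frame.
    X = (u - cx) * depth / fx
    Y = (v - cy) * depth / fy
    Z = depth
    # Express the point in the translated camera's frame.
    Xn, Yn, Zn = X - t[0], Y - t[1], Z - t[2]
    # Project into the new image plane.
    return fx * Xn / Zn + cx, fy * Yn / Zn + cy

# Moving the camera down 7 cm (y points down) shifts scene content upward,
# analogous to a "down" trajectory with movement_distance 0.07.
u2, v2 = warp_pixel(320, 240, depth=1.0, fx=500, fy=500, cx=320, cy=240, t=(0, 0.07, 0))
print(u2, v2)  # 320.0 and roughly 205.0
```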

Process single HDF5 file:

cd data_alignment/view_alignment

python viewport_transform_batch_h5.py \
  --h5_file /path/to/input.h5 \
  --image_key "observation_image_left" \
  --trajectory "down" \
  --movement_distance 0.07 \
  --output_dir ./output

Process directory (multi-GPU):

python viewport_transform_batch_h5.py \
  --h5_dir /path/to/h5_directory \
  --batch_size 32 \
  --trajectory "down" \
  --movement_distance 0.07 \
  --num_gpus 4 \
  --output_dir /path/to/output

Trajectory options: left, right, up, down, forward, backward

See data_alignment/view_alignment/README.md for more details.

Convert to LeRobot Format

Convert processed HDF5 datasets to LeRobot format for training:

cd data_alignment

# Single-threaded
python convert_to_lerobot.py \
  --src-path /path/to/processed/data \
  --output-path /path/to/lerobot/data \
  --repo-id my_dataset \
  --fps 20 \
  --task "task description"

# Multi-threaded (faster)
python convert_to_lerobot.py \
  --src-path /path/to/processed/data \
  --output-path /path/to/lerobot/data \
  --repo-id my_dataset \
  --num-workers 16 \
  --fps 20 \
  --task "task description"

🤖 Model Training

Compute Normalization Statistics

Before training, compute normalization statistics for your dataset:

uv run python scripts/compute_norm_states_ultra_fast.py --config-name=norm_compute
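Conceptually, the statistics are per-dimension means and standard deviations over all state/action vectors, later used to standardize model inputs. A toy sketch (not the script's actual logic or key layout):

```python
import math

def norm_stats(samples):
    """Per-dimension mean and population std over equal-length vectors."""
    n, dim = len(samples), len(samples[0])
    mean = [sum(s[d] for s in samples) / n for d in range(dim)]
    std = [math.sqrt(sum((s[d] - mean[d]) ** 2 for s in samples) / n)
           for d in range(dim)]
    return mean, std

def normalize(state, mean, std, eps=1e-8):
    # eps guards against constant dimensions with zero variance.
    return [(x - m) / (s + eps) for x, m, s in zip(state, mean, std)]

states = [[0.0, 10.0], [2.0, 10.0], [4.0, 10.0]]
mean, std = norm_stats(states)
print(mean)  # [2.0, 10.0]
```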

Run Training

Train the model using the computed normalization statistics:

# Set XLA memory fraction for better GPU utilization
XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 uv run scripts/train.py <config_name> --exp_name=<experiment_name>

Examples:

# Train on your custom dataset
XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 uv run scripts/train.py pi05_g1_custom --exp_name=my_experiment

# Multi-GPU training with FSDP
XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 uv run scripts/train.py pi05_g1_custom --exp_name=my_experiment --fsdp-devices 4

Checkpoints are saved to checkpoints/<config_name>/<exp_name>/ during training. Training progress is logged to the console and Weights & Biases.


🚀 Deployment

Policy Server

Start a policy server for remote inference:

# Use a trained checkpoint
uv run scripts/serve_policy.py policy:checkpoint \
  --policy.config=<config_name> \
  --policy.dir=checkpoints/<config_name>/<exp_name>/<iteration>

The server will listen on port 8000 by default.

🎮 Robot Inference

The deployment client connects to the OpenPI policy server via websocket for action inference and controls the G1 robot via the GR00T WBC framework.

On the robot/client side:

# Inside the GR00T Docker container
cd /root/Projects/openpi

# Run the deployment client
python scripts/deploy.py --host <server_ip> --port 8000

๐ŸŽ›๏ธ Keyboard Controls:

Key Action
] โ–ถ๏ธ Activate WBC policy / Exit silent mode
p ๐ŸŽฏ Enter preparation phase (move to initial pose)
c ๐Ÿ–๏ธ Toggle left hand open/close (right hand stays open)
l โฏ๏ธ Start/pause inference loop
[ ๐Ÿ”‡ Enter silent mode (slowly return to initial pose)
o ๐Ÿ›‘ Deactivate policy (emergency stop)
Ctrl+C โŒ Exit program
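The bindings above form a small state machine. A hypothetical dispatcher sketch (the keys follow the table; the state fields and handler logic are invented for illustration, not taken from scripts/deploy.py):

```python
def make_dispatcher():
    # Shared controller state mutated by key handlers.
    state = {"policy_active": False, "preparing": False,
             "left_hand_closed": False, "inference_running": False}

    handlers = {
        "]": lambda s: s.update(policy_active=True),   # activate WBC policy
        "p": lambda s: s.update(preparing=True),       # move to initial pose
        "c": lambda s: s.update(left_hand_closed=not s["left_hand_closed"]),
        "l": lambda s: s.update(inference_running=not s["inference_running"]),
        "[": lambda s: s.update(policy_active=False),  # silent mode
        "o": lambda s: s.update(policy_active=False,
                                inference_running=False),  # emergency stop
    }

    def dispatch(key):
        if key in handlers:
            handlers[key](state)
        return dict(state)  # return a snapshot

    return dispatch

dispatch = make_dispatcher()
dispatch("]")             # activate
snapshot = dispatch("l")  # start inference
print(snapshot["policy_active"], snapshot["inference_running"])  # True True
```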

📋 Workflow:

graph LR
    A[Start Policy Server<br/>on GPU Host] --> B[Start G1 Robot<br/>Control Loop]
    B --> C[Run Deployment<br/>Client]
    C --> D[Use Keyboard<br/>Controls]
    D --> E[Robot Execution]

Example Python API:

from openpi.training import config as _config
from openpi.policies import policy_config

# Load policy
config = _config.get_config("pi05_g1_custom")
checkpoint_dir = "checkpoints/pi05_g1_custom/exp1/100000"
policy = policy_config.create_trained_policy(config, checkpoint_dir)

# Run inference
observation = {
    "observation/exterior_image_1_left": camera_left_image,
    "observation/wrist_image_left": wrist_image,
    "observation/state": joint_positions,
    "prompt": "pick up the object"
}
action_chunk = policy.infer(observation)["actions"]

# Execute on robot
robot.execute_action(action_chunk[0])

For detailed deployment instructions including camera setup, robot initialization, and troubleshooting, see the comments in scripts/deploy.py.


📊 Requirements Summary

  Component              GPU Memory        Example Hardware
  Inference              ≥ 8 GB VRAM       RTX 4090
  Fine-tuning (LoRA)     ≥ 22.5 GB VRAM    RTX 4090
  Fine-tuning (Full)     ≥ 70 GB VRAM      A100 80GB / H100
  Robot Control          N/A               Ubuntu 22.04 PC
  Human Data Collection  N/A               Ubuntu 22.04 + USB 3.0

๐Ÿ“ Citation

If you find EgoHumanoid useful in your research, please consider citing:

@article{shi2026egohumanoid,
    title={EgoHumanoid: Unlocking In-the-Wild Loco-Manipulation with Robot-Free Egocentric Demonstration},
    author={Shi, Modi and Peng, Shijia and Chen, Jin and Jiang, Haoran and Li, Yinghui and Huang, Di and Luo, Ping and Li, Hongyang and Chen, Li},
    journal={arXiv preprint arXiv:2602.10106},
    year={2026}
}

โญ If you find this project helpful, please consider giving it a star! โญ


📜 License

This project is licensed under the Apache 2.0 License.

The OpenPI models and code are provided by Physical Intelligence under the Apache 2.0 License.


๐Ÿ™ Acknowledgments

We sincerely thank the following projects and teams:

OpenPI
Vision-language-action models
GR00T
Humanoid control framework
XR Robotics
PICO VR integration
Stereolabs
ZED camera SDK
