The first framework enabling humanoid loco-manipulation with egocentric human demonstrations.
Human demonstrations offer rich environmental diversity and scale naturally, making them an appealing alternative to robot teleoperation. We present EGOHUMANOID, the first framework to co-train a vision-language-action policy using abundant egocentric human demonstrations together with a limited amount of robot data.
To bridge the embodiment gap, we introduce a systematic alignment pipeline with two key components: view alignment reduces visual discrepancies; action alignment maps human motions into a unified action space for humanoid control.
Extensive real-world experiments show that incorporating robot-free egocentric data improves success rates over robot-only baselines by 51%, with especially large gains in unseen environments.
- Overview
- Hardware Setup
- Environment Setup
- Data Collection
- Data Processing
- Model Training
- Deployment
- Requirements Summary
- Citation
Detailed Documentation:
- Robot Data Collection Guide
- Human Data Collection Guide
- Data Processing Pipeline
- View Alignment
```mermaid
graph LR
    A[Human Demo<br/>PICO VR + ZED] -->|Collect| B[Egocentric Data]
    B -->|Process| C[View Alignment]
    C -->|Train| D[VLA Model<br/>π0.5-based]
    D -->|Deploy| E[Unitree G1<br/>Robot Control]
```
EgoHumanoid consists of four main components:

- **Data Collection** — collect synchronized multi-modal data from both humanoid robots (Unitree G1) and human demonstrators (PICO VR + ZED Mini)
- **Data Processing** — process, align, and retarget human demonstrations to the robot action space
- **Model Training** — fine-tune vision-language-action models (π0.5) on the processed datasets
- **Deployment** — deploy trained policies on real humanoid robots with real-time inference
Setup Requirements:
- PICO and PC on the same network
- Full-body tracking activated
- ZED Mini connected via USB 3.0

> **Note:** We use the ZED Mini instead of the ZED X Mini (as in the paper) for easier accessibility and setup.
| Requirement | Details |
|---|---|
| OS | Ubuntu 22.04 (tested and recommended) |
| Python | 3.11+ |
| GPU | ≥ 8 GB VRAM (inference) • ≥ 22.5 GB VRAM (fine-tuning with LoRA) • ≥ 70 GB VRAM (full fine-tuning) |
- Clone the repository with submodules:

  ```shell
  git clone --recurse-submodules https://github.com/OpenDriveLab/EgoHumanoid.git
  cd EgoHumanoid
  ```

- Install the uv package manager:

  ```shell
  # See https://docs.astral.sh/uv/getting-started/installation/
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```

- Set up the Python environment:

  ```shell
  # Create environment and install dependencies
  GIT_LFS_SKIP_SMUDGE=1 uv sync
  GIT_LFS_SKIP_SMUDGE=1 uv pip install -e .
  ```

- Set up GR00T WholeBodyControl for robot control (for robot data collection only):

  ```shell
  # Clone GR00T WholeBodyControl
  cd data_collection/robot_data
  git clone https://github.com/NVlabs/GR00T-WholeBodyControl.git
  cd GR00T-WholeBodyControl/decoupled_wbc

  # Copy teleoperation scripts
  cp -r ../../teleop/ control/main/teleop

  # Set up the Docker environment:
  # modify docker/run_docker.sh to mount the src/openpi directory;
  # see data_collection/robot_data/README.md for detailed setup instructions.

  # Install and start the Docker container
  ./docker/run_docker.sh --install --root
  ```

- Install the ZED SDK (for data collection/processing). Follow the ZED SDK installation guide for your platform, then install the Python API:

  ```shell
  python /usr/local/zed/get_python_api.py
  ```

- Install the PICO SDK (for human data collection). Follow the XR Robotics guidelines to set up the PICO SDK and XRoboToolkit-PC-Service.
For detailed hardware setup and collection procedures, see data_collection/robot_data/README.md and data_collection/human_data/README.md.
Robot data collection uses the Unitree G1 humanoid with teleoperation control and synchronized camera recording.
1. Set up the Docker environment:

   ```shell
   cd data_collection/robot_data/GR00T-WholeBodyControl/decoupled_wbc
   # Modify docker/run_docker.sh to mount the src/openpi directory;
   # see data_collection/robot_data/README.md for details.

   # Install and start the Docker container
   ./docker/run_docker.sh --install --root
   ./docker/run_docker.sh --root
   ```

2. Inside the Docker container, run G1 control:

   ```shell
   # For the real robot (ensure the network is configured)
   python decoupled_wbc/control/main/teleop/run_g1_control_loop.py --interface real --control-frequency 50 --with_hands
   ```

3. In a separate terminal, run teleoperation:

   ```shell
   python decoupled_wbc/control/main/teleop/run_teleop_policy_loop.py --body-control-device pico --hand_control_device=pico --enable_real_device
   ```

4. Start data collection:

   ```shell
   python decoupled_wbc/control/main/teleop/zed_mini_run_g1_data_exporter.py --dataset-name <task_name> --visualize
   ```

Controller Bindings:

- `Menu + Left Trigger`: Toggle lower-body policy
- `Menu + Right Trigger`: Toggle upper-body policy
- `Left Stick`: X/Y translation
- `Right Stick`: Yaw rotation
- `L/R Triggers`: Control hand grippers
- `A` Button: Start collecting an episode
- `B` Button: Discard the episode

Output:

- `data_collection/<task_name>/episode_*.hdf5` — robot state, actions, and navigation commands
- `data_collection/<task_name>/episode_*.svo2` — ZED camera recordings
See data_collection/robot_data/README.md for detailed instructions.
Human data collection captures synchronized full-body motion and binocular camera views.
1. Set up the data collection environment:

   ```shell
   # Create a conda environment
   conda create -n humandata python=3.11
   conda activate humandata

   # Install dependencies
   pip install -r data_collection/human_data/requirements.txt
   ```

2. Start data collection:

   ```shell
   cd data_collection/human_data

   # Basic collection
   python scripts/human_data_collection.py --name <dataset_name>

   # With ZED camera preview
   python scripts/human_data_collection.py --name <dataset_name> --visualize-zed

   # Specify the save directory
   python scripts/human_data_collection.py --data-dir <save_dir> --name <dataset_name>
   ```

3. Collection workflow:

   - The system initializes (PICO SDK + ZED Mini + MeshCat visualization)
   - Open a browser at `http://localhost:7000/static/` to view the 3D skeleton
   - Enter an episode index (e.g., 0, 1, 2...)
   - Perform the demonstration
   - Press Space to finish the episode
   - Data is saved automatically
   - Continue to the next episode or press Ctrl+C to exit

Output:

- `<data_dir>/<dataset_name>/episode_*.hdf5` — body pose, hand pose, controller pose, timestamps
- `<data_dir>/<dataset_name>/episode_*.svo2` — ZED Mini video with depth
See data_collection/human_data/README.md for detailed instructions.
For detailed pipeline documentation, see data_alignment/human_data_process/README.md.
The human data processing pipeline transforms raw VR recordings into robot-compatible datasets.
Run the full pipeline:

```shell
cd data_alignment/human_data_process
./run_human_data_pipeline.sh \
    --input_dir /path/to/raw_data \
    --output_dir /path/to/intermediate \
    --final-output-dir /path/to/final \
    --file all
```

Pipeline stages:
- Reorder Episodes: Sort chronologically and renumber
- Navigation Pipeline: Generate velocity commands from body pose
- Downsample: Reduce frequency and discretize commands
- Merge Camera: Integrate ZED camera frames
- Hand Status: Compute binary hand open/close status
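As intuition for the navigation stage, velocity commands can be obtained by finite-differencing the tracked body pose. The sketch below is illustrative only; the actual pipeline's frame conventions, smoothing, and downsampling/discretization may differ, and `pose_to_commands` is a hypothetical helper, not a function from this repo:

```python
import numpy as np

def pose_to_commands(xy, yaw, dt):
    """Finite-difference planar body poses into velocity commands.

    xy:  (T, 2) root positions in the world frame (metres)
    yaw: (T,)   root headings (radians)
    Returns a (T-1, 3) array of [vx, vy, yaw_rate] expressed in the body
    frame, the form a locomotion policy typically consumes.
    """
    d_xy = np.diff(xy, axis=0) / dt
    # Rotate world-frame velocity into the body frame at each step
    c, s = np.cos(yaw[:-1]), np.sin(yaw[:-1])
    vx = c * d_xy[:, 0] + s * d_xy[:, 1]
    vy = -s * d_xy[:, 0] + c * d_xy[:, 1]
    # Wrap heading differences into [-pi, pi) before differentiating
    d_yaw = (np.diff(yaw) + np.pi) % (2 * np.pi) - np.pi
    yaw_rate = d_yaw / dt
    return np.stack([vx, vy, yaw_rate], axis=-1)
```

The downsample stage would then bin these continuous commands into the discrete command set the controller accepts.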
Advanced usage:

```shell
# Skip stages
./run_human_data_pipeline.sh \
    --input_dir /path/to/raw \
    --output_dir /path/to/processed \
    --final-output-dir /path/to/final \
    --skip-reorder \
    --skip-merge

# Generate validation plots
./run_human_data_pipeline.sh \
    --input_dir /path/to/raw \
    --output_dir /path/to/processed \
    --final-output-dir /path/to/final \
    --with-png

# Dry run (preview commands)
./run_human_data_pipeline.sh \
    --input_dir /path/to/raw \
    --output_dir /path/to/processed \
    --final-output-dir /path/to/final \
    --dry-run
```

See data_alignment/human_data_process/README.md for detailed pipeline documentation.
Process robot demonstration data:

```shell
cd data_alignment/robot_data_process
python merge_data.py \
    --dataset-dir /path/to/robot/data \
    --output-dir /path/to/processed/output
```

View alignment transforms egocentric camera viewpoints to match the robot's perspective using depth-based warping and inpainting.
Process a single HDF5 file:

```shell
cd data_alignment/view_alignment
python viewport_transform_batch_h5.py \
    --h5_file /path/to/input.h5 \
    --image_key "observation_image_left" \
    --trajectory "down" \
    --movement_distance 0.07 \
    --output_dir ./output
```

Process a directory (multi-GPU):

```shell
python viewport_transform_batch_h5.py \
    --h5_dir /path/to/h5_directory \
    --batch_size 32 \
    --trajectory "down" \
    --movement_distance 0.07 \
    --num_gpus 4 \
    --output_dir /path/to/output
```

Trajectory options: `left`, `right`, `up`, `down`, `forward`, `backward`.
See data_alignment/view_alignment/README.md for more details.
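For intuition, the depth-based warping at the heart of view alignment can be sketched in a few lines of numpy. This is an illustrative forward-warp under a pure camera translation, not the repo's actual implementation (which also handles occlusion and fills holes with inpainting); `warp_view` is a hypothetical helper:

```python
import numpy as np

def warp_view(image, depth, K, t):
    """Forward-warp an image to a virtually translated camera.

    Each pixel is back-projected to 3D using its depth, the camera is
    shifted by translation t (metres, camera frame), and the point is
    re-projected with intrinsics K. Disocclusion holes stay zero here;
    a real pipeline would inpaint them.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project pixels to camera-frame 3D points
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    pts = np.stack([x, y, depth], axis=-1) - t  # shifting the camera by +t shifts points by -t
    # Re-project into the shifted camera
    z = pts[..., 2]
    u2 = np.round(pts[..., 0] * K[0, 0] / z + K[0, 2]).astype(int)
    v2 = np.round(pts[..., 1] * K[1, 1] / z + K[1, 2]).astype(int)
    warped = np.zeros_like(image)
    valid = (z > 0) & (u2 >= 0) & (u2 < w) & (v2 >= 0) & (v2 < h)
    warped[v2[valid], u2[valid]] = image[valid]
    return warped
```

A `--trajectory "down"` with `--movement_distance 0.07` corresponds conceptually to calling such a warp with a 7 cm translation along the camera's down axis.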
Convert processed HDF5 datasets to LeRobot format for training:

```shell
cd data_alignment

# Single-threaded
python convert_to_lerobot.py \
    --src-path /path/to/processed/data \
    --output-path /path/to/lerobot/data \
    --repo-id my_dataset \
    --fps 20 \
    --task "task description"

# Multi-threaded (faster)
python convert_to_lerobot.py \
    --src-path /path/to/processed/data \
    --output-path /path/to/lerobot/data \
    --repo-id my_dataset \
    --num-workers 16 \
    --fps 20 \
    --task "task description"
```

Before training, compute normalization statistics for your dataset:

```shell
uv run python scripts/compute_norm_states_ultra_fast.py --config-name=norm_compute
```

Train the model using the computed normalization statistics:
```shell
# Set XLA memory fraction for better GPU utilization
XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 uv run scripts/train.py <config_name> --exp_name=<experiment_name>
```

Examples:

```shell
# Train on your custom dataset
XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 uv run scripts/train.py pi05_g1_custom --exp_name=my_experiment

# Multi-GPU training with FSDP
XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 uv run scripts/train.py pi05_g1_custom --exp_name=my_experiment --fsdp-devices 4
```

Checkpoints are saved to `checkpoints/<config_name>/<exp_name>/` during training. Training progress is logged to the console and to Weights & Biases.
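Conceptually, the normalization statistics computed before training are just per-dimension means and standard deviations of states and actions over the whole dataset. A minimal numpy sketch (the real script streams LeRobot episodes in parallel; `compute_norm_stats` here is illustrative, not the repo's API):

```python
import numpy as np

def compute_norm_stats(episodes):
    """Per-dimension mean/std over a list of (T_i, D) arrays.

    Accumulates running sums instead of concatenating episodes, so
    memory stays constant regardless of dataset size.
    """
    n, s, ss = 0, 0.0, 0.0
    for ep in episodes:
        n += ep.shape[0]
        s = s + ep.sum(axis=0)
        ss = ss + (ep ** 2).sum(axis=0)
    mean = s / n
    # Clamp tiny negative values caused by floating-point error
    std = np.sqrt(np.maximum(ss / n - mean ** 2, 0.0))
    return {"mean": mean, "std": std}
```

Training then normalizes each state/action dimension as `(x - mean) / std` before feeding it to the model.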
Start a policy server for remote inference:

```shell
# Use a trained checkpoint
uv run scripts/serve_policy.py policy:checkpoint \
    --policy.config=<config_name> \
    --policy.dir=checkpoints/<config_name>/<exp_name>/<iteration>
```

The server listens on port 8000 by default.
The deployment client connects to the OpenPI policy server via websocket for action inference and controls the G1 robot via the GR00T WBC framework.
On the robot/client side:

```shell
# Inside the GR00T Docker container
cd /root/Projects/openpi

# Run the deployment client
python scripts/deploy.py --host <server_ip> --port 8000
```

Keyboard Controls:

| Key | Action |
|---|---|
| `]` | |
| `p` | Enter preparation phase (move to initial pose) |
| `c` | Toggle left hand open/close (right hand stays open) |
| `l` | Start/pause inference loop |
| `[` | Enter silent mode (slowly return to initial pose) |
| `o` | Deactivate policy (emergency stop) |
| `Ctrl+C` | Exit program |
Workflow:

```mermaid
graph LR
    A[Start Policy Server<br/>on GPU Host] --> B[Start G1 Robot<br/>Control Loop]
    B --> C[Run Deployment<br/>Client]
    C --> D[Use Keyboard<br/>Controls]
    D --> E[Robot Execution]
```
Example Python API:

```python
from openpi.training import config as _config
from openpi.policies import policy_config

# Load the policy
config = _config.get_config("pi05_g1_custom")
checkpoint_dir = "checkpoints/pi05_g1_custom/exp1/100000"
policy = policy_config.create_trained_policy(config, checkpoint_dir)

# Run inference
observation = {
    "observation/exterior_image_1_left": camera_left_image,
    "observation/wrist_image_left": wrist_image,
    "observation/state": joint_positions,
    "prompt": "pick up the object",
}
action_chunk = policy.infer(observation)["actions"]

# Execute the first action of the chunk on the robot
robot.execute_action(action_chunk[0])
```

For detailed deployment instructions including camera setup, robot initialization, and troubleshooting, see the comments in scripts/deploy.py.
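The call to `robot.execute_action(action_chunk[0])` executes only the first action of each predicted chunk; deployment loops typically re-plan on a receding horizon instead. A hedged sketch of that pattern (the `infer`/`execute`/`get_obs` callables are illustrative stand-ins, not this repo's actual interfaces):

```python
def receding_horizon_loop(infer, execute, get_obs, n_steps, replan_every=8):
    """Closed-loop control that re-queries the policy every `replan_every` steps.

    infer(obs)      -> a chunk (sequence) of actions
    execute(action) -> applies one action on the robot
    get_obs()       -> returns the current observation
    Actions within a chunk run open-loop; re-planning restores feedback.
    """
    chunk, idx = [], 0
    for _ in range(n_steps):
        if idx >= min(replan_every, len(chunk)):
            # Re-plan from a fresh observation
            chunk, idx = infer(get_obs()), 0
        execute(chunk[idx])
        idx += 1
```

Choosing `replan_every` trades smoothness (longer open-loop stretches) against reactivity (more frequent re-planning).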
| Component | GPU Memory | Example Hardware |
|---|---|---|
| Inference | ≥ 8 GB VRAM | RTX 4090 |
| Fine-tuning (LoRA) | ≥ 22.5 GB VRAM | RTX 4090 |
| Fine-tuning (Full) | ≥ 70 GB VRAM | A100 80GB / H100 |
| Robot Control | N/A | Ubuntu 22.04 PC |
| Human Data Collection | N/A | Ubuntu 22.04 + USB 3.0 |
If you find EgoHumanoid useful in your research, please consider citing:

```bibtex
@article{shi2026egohumanoid,
  title={EgoHumanoid: Unlocking In-the-Wild Loco-Manipulation with Robot-Free Egocentric Demonstration},
  author={Shi, Modi and Peng, Shijia and Chen, Jin and Jiang, Haoran and Li, Yinghui and Huang, Di and Luo, Ping and Li, Hongyang and Chen, Li},
  journal={arXiv preprint arXiv:2602.10106},
  year={2026}
}
```

⭐ If you find this project helpful, please consider giving it a star! ⭐
This project is licensed under the Apache 2.0 License.
The OpenPI models and code are provided by Physical Intelligence under the Apache 2.0 License.
We sincerely thank the following projects and teams:
- OpenPI — vision-language-action models
- GR00T WholeBodyControl — humanoid control framework
- XRoboToolkit — PICO VR integration
- ZED SDK — ZED camera SDK