
docs: add data generation tutorial for synthesized data pipeline#238

Open
yvvonie wants to merge 7 commits into DexForce:main from yvvonie:GYY_Tutorial_Add

yvvonie commented on Apr 17, 2026

  • This PR introduces a new tutorial (`data_generation.rst`) documenting the internal workflow for generating synthetic expert-demonstration datasets. It gives developers instructions for configuring and running the data-generation pipeline with EmbodiChain's built-in environment launcher.
  • Documentation index: registered the new tutorial in `docs/source/tutorial/index.rst`.

"robot_meta": {
"robot_type": "CobotMagic",
"control_freq": 25,
"control_parts": ["left_arm", "left_eef", "right_arm", "right_eef"]
Contributor

We have removed `control_parts` from the dataset config. It is now placed under the children of `env`.
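To illustrate the change, here is a minimal Python sketch that migrates an older config by moving `control_parts` out of the dataset's `robot_meta` block into the `env` section. The surrounding layout (a top-level `dataset` dict holding `robot_meta`, and `env.control_parts` as the destination) and the function name `migrate_control_parts` are hypothetical; the actual EmbodiChain env schema decides where the children of `env` declare their control parts.

```python
import copy
from typing import Any, Dict


def migrate_control_parts(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config with control_parts moved out of dataset.robot_meta.

    Hypothetical layout: both the "dataset" -> "robot_meta" source path and the
    "env" -> "control_parts" destination are assumptions for illustration.
    """
    migrated = copy.deepcopy(config)  # leave the caller's config untouched
    robot_meta = migrated.get("dataset", {}).get("robot_meta", {})
    parts = robot_meta.pop("control_parts", None)
    if parts is not None:
        # Hypothetical destination; adjust to the real env schema.
        migrated.setdefault("env", {})["control_parts"] = parts
    return migrated
```

Working on a deep copy means the legacy config can still be compared against the migrated one.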

"func": "LeRobotRecorder",
"mode": "save",
"params": {
"save_path": "/root/workspace/Embodied_Challenge/lerobot_dataset/",
Contributor

Remove the hard-coded path from the docs.
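One way to keep machine-specific paths out of the docs is to resolve `save_path` at run time. The sketch below assumes a hypothetical `EMBODICHAIN_DATASET_ROOT` environment variable and a hypothetical `./outputs` fallback; neither is an existing EmbodiChain setting, and the real default save path should come from the launcher.

```python
import os
from pathlib import Path

# Hypothetical environment variable; not an existing EmbodiChain setting.
DATASET_ROOT_ENV = "EMBODICHAIN_DATASET_ROOT"
# Hypothetical fallback used when the variable is unset.
DEFAULT_ROOT = "./outputs"


def resolve_save_path(task_name: str) -> Path:
    """Build a per-task dataset directory without hard-coding a machine path."""
    root = os.environ.get(DATASET_ROOT_ENV, DEFAULT_ROOT)
    return Path(root).expanduser() / task_name
```

For example, `resolve_save_path("items_handover_place")` yields `outputs/items_handover_place` when the variable is unset.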

- **Action Configuration**: Describes how the task-specific expert trajectory should be generated.
- **Environment Launcher**: Builds the environment directly from configuration files.
- **Expert Policy**: Each task provides ``create_demo_action_list()`` to generate a scripted trajectory.
- **Dataset Manager**: Records observation-action pairs during ``env.step()``.
Contributor

Using "Environment Rollout" here would be better.
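The list above describes the dataset manager as recording observation-action pairs during `env.step()`. A minimal sketch of that pattern, using a hypothetical `EpisodeBuffer` class rather than EmbodiChain's actual recorder (`LeRobotRecorder`), might look like this:

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

Step = Tuple[Any, Any]  # (observation, action)


@dataclass
class EpisodeBuffer:
    """Hypothetical stand-in for a dataset manager (not EmbodiChain's API)."""

    steps: List[Step] = field(default_factory=list)
    episodes: List[List[Step]] = field(default_factory=list)

    def record(self, obs: Any, action: Any) -> None:
        # Called once per env.step() with the observation and the action taken.
        self.steps.append((obs, action))

    def end_episode(self, completed: bool) -> None:
        # Keep the buffered steps only if the episode completed; always reset.
        if completed and self.steps:
            self.episodes.append(self.steps)
        self.steps = []


buffer = EpisodeBuffer()
for t in range(3):
    buffer.record({"t": t}, [0.0, 0.0])
buffer.end_episode(completed=True)
print(len(buffer.episodes))  # 1
```

Dropping incomplete episodes mirrors the tutorial's statement that completed episodes are saved; the real recorder's policy may differ.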

basic_env
modular_env
rl
data_generation
Contributor

Move this above the `rl` section.

Step 2: Prepare the Action Configuration
----------------------------------------

The second input is the ``action_config.json`` file. This file defines the expert action graph used by the task and is the main configuration entry point for scripted trajectory generation. Taking ``items_handover_place`` as an example, the file is organized around ``scope``, ``node``, ``edge``, and ``sync``.
Contributor

Add a subsection for the action bank (possibly collapsible, so readers can fold it).
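Since the tutorial says `action_config.json` is organized around `scope`, `node`, `edge`, and `sync`, a quick sanity check can catch a missing section before launching a run. This sketch assumes the four names are top-level JSON keys; their inner structure is not checked, and `missing_sections` is a hypothetical helper.

```python
import json
from typing import List

# Section names listed in the tutorial for items_handover_place.
EXPECTED_SECTIONS = ("scope", "node", "edge", "sync")


def missing_sections(action_config_text: str) -> List[str]:
    """Return expected top-level sections absent from an action_config.json string."""
    cfg = json.loads(action_config_text)
    if not isinstance(cfg, dict):
        raise ValueError("action_config.json should contain a JSON object")
    return [name for name in EXPECTED_SECTIONS if name not in cfg]
```

In practice the string would come from something like `Path("action_config.json").read_text()`.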

:start-at: def cli():
:end-at: main(args, env, gym_config)

This means the runtime inputs of the whole data-generation pipeline are simply the task config files plus launcher arguments.
Contributor

Add the CLI interface for running and previewing the env (see https://dexforce.github.io/EmbodiChain/guides/cli.html).

Step 4: Generate and Execute Expert Actions
Contributor

Steps 4, 5, and 6 can be removed. Instead, we can introduce more parameters in `gym_config` that control data generation (e.g., `max_episodes`, ...).
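If data-generation controls move into `gym_config`, reading them with explicit defaults keeps older configs working. In this sketch the `data_generation` section name, the default of one episode, and the `DataGenSettings` class are all hypothetical; only `max_episodes` is named in the review comment.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class DataGenSettings:
    # Hypothetical default: generate a single episode unless configured otherwise.
    max_episodes: int = 1


def load_datagen_settings(gym_config: Dict[str, Any]) -> DataGenSettings:
    # "data_generation" is a hypothetical section name inside gym_config.
    section = gym_config.get("data_generation", {})
    max_episodes = int(section.get("max_episodes", DataGenSettings().max_episodes))
    if max_episodes < 1:
        raise ValueError(f"max_episodes must be >= 1, got {max_episodes}")
    return DataGenSettings(max_episodes=max_episodes)
```

Validating at load time surfaces a bad value before any simulation starts.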

Outputs
~~~~~~~

After successful execution, completed episodes are saved under the configured ``save_path``. A LeRobot dataset typically contains:
Contributor

Also mention the default save path.


- **Keep the config pair together**: Always version ``gym_config.json`` and ``action_config.json`` together for a task.
- **Use valid scripted policies**: Make sure ``create_demo_action_list()`` returns executable trajectories for the current scene.
- **Enable ``use_videos`` for visual tasks**: This is especially useful for downstream vision-based training.
Contributor

We don't have this parameter, do we?
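The best practice of keeping `gym_config.json` and `action_config.json` together can be checked mechanically. A minimal sketch, assuming both files sit side by side in one task directory (a hypothetical layout) and using a hypothetical `missing_config_files` helper:

```python
from pathlib import Path
from typing import List

# The config pair the tutorial recommends versioning together.
CONFIG_PAIR = ("gym_config.json", "action_config.json")


def missing_config_files(task_dir: Path) -> List[str]:
    """Return which files of the config pair are absent from task_dir."""
    return [name for name in CONFIG_PAIR if not (Path(task_dir) / name).is_file()]
```

Running this in CI for every task directory would flag a config that was committed without its partner.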


2 participants