TuojingAI/SoftVTBench


SoftVTBench

A Safety-Aware Visuo-Tactile Benchmark for Physically Constrained Robotic Manipulation of Deformable Objects

Bowen Jing1,*, Mingxin Wang1,2,*, Ruiyang Hao3, Chenchen Ge1,4, Hanwen Shen5, Junjie He6, Yang Cui7, Yiming Hou1,4, Weitao Zhou2,8,‡, Jiawei Wang8, Minglei Li8, Dandan Zhang9, Ding Zhao10, Houde Liu2, Xiaofan Li11, Si Liu12, Ping Luo13, Haibao Yu1,13,‡

1Tuojing Intelligence  ·  2Tsinghua University  ·  3King's College London  ·  4Southeast University  ·  5Stevens Institute of Technology  ·  6The Hong Kong University of Science and Technology (GZ)  ·  7The University of Manchester  ·  8Simple AI  ·  9Imperial College London  ·  10Carnegie Mellon University  ·  11Zhejiang University  ·  12Beihang University  ·  13The University of Hong Kong

* Equal contribution    ‡ Corresponding author

Paper (coming soon)  |  Website  |  Dataset (coming soon)  |  Citation

[Figure: SoftVTBench teaser]

SoftVTBench performs contact-rich visuo-tactile evaluation under a hidden safe interaction envelope: a grasp that is too loose causes slip or drop, a grasp that is too tight causes over-deformation, and only the safe window between them satisfies both task and safety objectives.

Code and training/evaluation scripts for SoftVTBench will be released in this repository. See the project website for the full introduction, method, task suites, and results.

Abstract

Deformable object manipulation poses challenges beyond task completion: successful execution must also maintain safe physical interaction, holding the object stably without slip or drop while avoiding excessive deformation. However, existing manipulation benchmarks are predominantly success-oriented and rarely evaluate whether a policy remains physically safe throughout execution. We present SoftVTBench, a safety-aware visuo-tactile benchmark for physically constrained deformable object manipulation. Built in Isaac Sim with finite-element-simulated deformable objects, SoftVTBench provides multi-view RGB observations, RGB tactile sensing with marker motion, proprioception, and language instructions, and defines four matched task suites over object type (deformable vs. rigid) and variation axis (object vs. spatial). It reports Goal Success and Safety Success separately, where Safety Success additionally requires no drop and peak object deformation below a calibrated, object-specific threshold, computed from privileged FEM simulation states that are hidden from the policy. We implement π0.5-based baselines under this protocol. Experiments show that success-only evaluation substantially overstates policy performance — a large fraction of goal-completing rollouts violate physical safety — and that adding tactile sensing improves Safety Success (e.g., from 21.4% to 35.6% on object-centric deformable tasks) and reduces object deformation during execution, while Goal Success remains comparable. SoftVTBench provides a reproducible benchmark for studying visuo-tactile deformable manipulation under physical interaction constraints.

Benchmark Design

SoftVTBench performs contact-rich visuo-tactile evaluation of robot policies under two coupled requirements: completing the manipulation goal and maintaining safe physical interaction throughout execution. A rollout is considered physically safe only if the object remains stably grasped without slip or drop, and its peak deformation stays below a calibrated, object-specific threshold. By reporting Goal Success and Safety Success separately, SoftVTBench exposes goal-complete but physically unsafe rollouts that are hidden by success-only evaluation.

SoftVTBench instantiates this evaluation protocol in Isaac Sim with deformable objects simulated by finite element method (FEM) soft-body dynamics and a Franka Panda arm whose parallel-jaw gripper carries GelSight Mini tactile sensors on both fingers. The benchmark provides third-person and wrist RGB observations, tactile RGB images, marker-motion fields, proprioception, and a standardized end-effector and gripper action interface, all synchronized at 20 Hz.

Safety Success ⊆ Goal Success
Safety Success = Goal Success AND NoDrop AND D_peak ≤ τ_o

D_peak is the peak object-size-normalized FEM-RMS deformation over the rollout, computed after removing global rigid-body motion. τ_o is an object-specific threshold calibrated from an offline compression sweep. Both are computed from privileged simulator ground truth and hidden from the policy.
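The Safety Success rule above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's evaluator: the function names are hypothetical, the object size is assumed to be the bounding-box diagonal of the rest shape, and each frame of nodal positions is assumed to have already been rigidly aligned to the rest shape (the rigid-motion removal step itself is not shown).

```python
import math
from typing import Sequence

Point = tuple[float, float, float]


def bbox_diagonal(nodes: Sequence[Point]) -> float:
    """Axis-aligned bounding-box diagonal of a node set (assumed size normalizer)."""
    spans = [max(p[i] for p in nodes) - min(p[i] for p in nodes) for i in range(3)]
    return math.sqrt(sum(s * s for s in spans))


def rms_deformation(rest: Sequence[Point], current: Sequence[Point]) -> float:
    """Size-normalized RMS nodal displacement for one time step.

    Assumes `current` is already rigidly aligned to `rest`, so the remaining
    displacement is deformation only.
    """
    sq_dists = [sum((c[i] - r[i]) ** 2 for i in range(3)) for r, c in zip(rest, current)]
    return math.sqrt(sum(sq_dists) / len(sq_dists)) / bbox_diagonal(rest)


def peak_deformation(rest: Sequence[Point], frames: Sequence[Sequence[Point]]) -> float:
    """D_peak: the largest per-step deformation over the rollout."""
    return max(rms_deformation(rest, frame) for frame in frames)


def safety_success(goal_success: bool, dropped: bool, d_peak: float, tau_o: float) -> bool:
    """Safety Success = Goal Success AND NoDrop AND D_peak <= tau_o."""
    return goal_success and not dropped and d_peak <= tau_o
```

Because the predicate includes `goal_success` as a conjunct, any rollout counted as a Safety Success is also a Goal Success, which matches the subset relation stated above.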

[Figure: SoftVTBench method overview]

SoftVTBench's four matched task suites are generated around a shared procedural pipeline and jointly probe physical safety, compliance shift, tactile feedback, and gripper control, exposing the Goal–Safety gap that success-only evaluation hides.

Task Suites

A matched 2×2 design over object type (rigid vs. deformable) and variation axis (object vs. spatial).

Rigid control suites — LIBERO-style rigid-object suites that establish baseline manipulation and spatial-variation competence before deformable-object safety is introduced.

| Suite | Description |
| --- | --- |
| Object-Rigid | Rigid LIBERO-style object tasks re-executed through the SoftVTBench tactile-equipped sensing and recording stack. Tests basic pick-and-place competence under object identity variation. |
| Spatial-Rigid | Rigid LIBERO-style spatial tasks under the same sensing stack. Tests robustness to spatial variation: localization, approach, and transfer across changing layouts. |

Deformable main suites — grasp-and-place suites where the object must reach its target region without dropping or over-deforming.

| Suite | Description |
| --- | --- |
| Object-Soft | The robot grasps a deformable object and places it into a fixed target container while avoiding drop and excessive deformation. Object identity, geometry, and material compliance vary across tasks. |
| Spatial-Soft | The same grasp-and-place objective under spatial variation: two visually identical instances appear per scene, and the language instruction specifies which one to manipulate under changing layouts. |

Demo videos for all four suites are on the project website.

Experimental Results

π0.5-Vision (VO) vs. π0.5-Visuo-Tactile (VT), evaluated under identical physical and task conditions.

Main Results

| Suite | Method | Goal Success | Safety Success |
| --- | --- | --- | --- |
| Object-Rigid | VO | 38.8% | N/A |
| Object-Rigid | VT | 32.4% | N/A |
| Spatial-Rigid | VO | 56.4% | N/A |
| Spatial-Rigid | VT | 63.4% | N/A |
| Object-Soft | VO | 70.4% | 21.4% |
| Object-Soft | VT | 71.8% | 35.6% |
| Spatial-Soft | VO | 74.2% | 32.6% |
| Spatial-Soft | VT | 84.2% | 44.6% |

VO: vision-only π0.5 policy (binary gripper command). VT: visuo-tactile π0.5 policy, additionally observing tactile RGB and marker-motion history with a continuous gripper command. Safety Success is not defined (N/A) for rigid suites, which carry no deformation constraint.

Goal Success vs. Safety Success

Goal Success and Safety Success across suites. For rigid suites, deformation-based safety constraints are not applicable (N/A). For deformable suites, the gap between Goal Success and Safety Success indicates unsafe goal completions caused by excessive deformation, dropping, or unstable contact.
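To make the Goal–Safety gap concrete, the short script below derives two quantities from the deformable-suite numbers in the Main Results table: the gap in percentage points, and the share of goal-completing rollouts that were unsafe. The second quantity relies on Safety Success being a subset of Goal Success and assumes both rates are computed over the same set of rollouts. The function name is a hypothetical helper, not part of any released code.

```python
# (Goal Success %, Safety Success %) copied from the Main Results table.
results = {
    ("Object-Soft", "VO"): (70.4, 21.4),
    ("Object-Soft", "VT"): (71.8, 35.6),
    ("Spatial-Soft", "VO"): (74.2, 32.6),
    ("Spatial-Soft", "VT"): (84.2, 44.6),
}


def goal_safety_gap(goal: float, safety: float) -> tuple[float, float]:
    """Return (gap in percentage points, % of goal completions that were unsafe)."""
    gap_points = goal - safety
    unsafe_share = 100.0 * (1.0 - safety / goal)
    return round(gap_points, 1), round(unsafe_share, 1)


summary = {key: goal_safety_gap(*rates) for key, rates in results.items()}

for (suite, method), (gap, share) in summary.items():
    print(f"{suite:12s} {method}: gap {gap:.1f} pts, {share:.1f}% of goal completions unsafe")
```

Under these assumptions, roughly 70% of the goal-completing VO rollouts on Object-Soft violate the safety criterion, compared with about 50% for VT.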

Deformation Distribution

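| Suite | Method | Mean | P5 | Median | P95 |
| --- | --- | --- | --- | --- | --- |
| Object-Soft | VO | 16.10% | 4.30% | 10.65% | 44.70% |
| Object-Soft | VT | 15.12% | 3.90% | 8.67% | 38.81% |
| Spatial-Soft | VO | 13.16% | 5.16% | 10.75% | 28.96% |
| Spatial-Soft | VT | 11.58% | 4.75% | 9.67% | 26.56% |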

FEM-RMS deformation over all deformable-object rollouts, reported as a percentage of the object bounding-box diagonal. Lower values indicate safer physical interaction.
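A distribution summary with these four columns could be computed as sketched below. This assumes one deformation value per rollout (for example its peak, expressed as a percentage of the bounding-box diagonal) and a linear-interpolation percentile convention. Neither choice is confirmed by the text above, and the helper names are hypothetical.

```python
import statistics
from typing import Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation between order statistics (q in [0, 100])."""
    if not values:
        raise ValueError("percentile of an empty sequence is undefined")
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)                      # floor, since pos is non-negative
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def summarize(deformations: Sequence[float]) -> dict[str, float]:
    """Mean, P5, median, and P95 of per-rollout deformation values."""
    return {
        "mean": statistics.fmean(deformations),
        "p5": percentile(deformations, 5),
        "median": percentile(deformations, 50),
        "p95": percentile(deformations, 95),
    }
```

Reporting P95 alongside the mean is useful here because the safety criterion is a threshold on peak deformation, so the upper tail matters more than the average.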

Analysis

Tactile information does not consistently improve performance on rigid-object tasks: VT trails VO on Object-Rigid (32.4% vs. 38.8%) but leads on Spatial-Rigid (63.4% vs. 56.4%). When deformation-related safety constraints are absent, visual observations and proprioception already provide the primary information needed for task completion, and tactile sensing yields no consistent gain.

In contrast, VT shows clear advantages on deformable-object tasks. Goal Success is comparable between VO and VT (Object-Soft: 70.4% vs. 71.8%), but VT obtains substantially higher Safety Success (Object-Soft: 21.4% → 35.6%; Spatial-Soft: 32.6% → 44.6%). Tactile sensing is not primarily helping the object reach the target — it is improving contact regulation during execution, reducing slippage, over-compression, and unstable grasping.

The deformation statistics support this: VT lowers mean, P5, median, and P95 deformation on both deformable suites, showing that tactile feedback shifts the entire interaction distribution toward safer contact, not just the fraction that clears the safety threshold. Together, these results show that success-only evaluation substantially overstates policy performance on deformable manipulation, and that tactile sensing is most valuable precisely where physical safety is at stake.

Dataset

| Statistic | Value |
| --- | --- |
| Task Suites | 4 |
| Episodes | 2,000 |
| Assets | 33 |
| Core Metrics | Goal Success / Safety Success |

Each episode contains synchronized third-person and wrist RGB, left/right tactile RGB and marker-motion fields, robot proprioception, end-effector and gripper action trajectories, suite and task identifiers, and evaluator-only privileged signals (FEM nodal deformation, contact status, drop events). All streams are recorded at 20 Hz. Deformable-object safety thresholds are obtained per asset from an offline interaction calibration protocol and held out for evaluation only.
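Since the dataset format has not been released yet, the following is only a sketch of how a loader might separate policy-visible streams from evaluator-only signals. All key names are hypothetical placeholders derived from the stream list above; only the 20 Hz rate and the split between policy inputs and privileged signals come from the description.

```python
from typing import Any

RECORD_HZ = 20  # all streams are recorded at 20 Hz

# Hypothetical key names for the evaluator-only privileged signals.
PRIVILEGED_KEYS = frozenset({"fem_nodal_deformation", "contact_status", "drop_event"})


def policy_view(step: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of one recorded step with privileged signals removed."""
    return {key: value for key, value in step.items() if key not in PRIVILEGED_KEYS}


def timestamps(num_steps: int) -> list[float]:
    """Time in seconds of each step, assuming uniform sampling at RECORD_HZ."""
    return [i / RECORD_HZ for i in range(num_steps)]
```

Filtering by an explicit deny-list keeps the evaluator's ground truth out of the policy's inputs even if new observation streams are added later.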

Dataset download link: coming soon.

Citation

If you find SoftVTBench useful, please consider citing our paper.

@article{jing2026softvtbench,
  title   = {SoftVTBench: A Safety-Aware Visuo-Tactile Benchmark for
             Physically Constrained Robotic Manipulation of Deformable Objects},
  author  = {Jing, Bowen and Wang, Mingxin and Hao, Ruiyang and Ge, Chenchen and
             Shen, Hanwen and He, Junjie and Cui, Yang and Hou, Yiming and
             Zhou, Weitao and Wang, Jiawei and Li, Minglei and Zhang, Dandan and
             Zhao, Ding and Liu, Houde and Li, Xiaofan and Liu, Si and
             Luo, Ping and Yu, Haibao},
  year    = {2026},
  note    = {Preprint}
}
