SoftVTBench

A Safety-Aware Visuo-Tactile Benchmark for Physically Constrained Robotic Manipulation of Deformable Objects

Bowen Jing¹,*, Mingxin Wang¹,²,*, Ruiyang Hao³, Chenchen Ge¹,⁴, Hanwen Shen⁵, Junjie He⁶, Yang Cui⁷, Yiming Hou¹,⁴, Weitao Zhou²,⁸,‡, Jiawei Wang⁸, Minglei Li⁸, Dandan Zhang⁹, Ding Zhao¹⁰, Houde Liu², Xiaofan Li¹¹, Si Liu¹², Ping Luo¹³, Haibao Yu¹,¹³,‡

¹Tuojing Intelligence · ²Tsinghua University · ³King's College London · ⁴Southeast University · ⁵Stevens Institute of Technology · ⁶The Hong Kong University of Science and Technology (GZ) · ⁷The University of Manchester · ⁸Simple AI · ⁹Imperial College London · ¹⁰Carnegie Mellon University · ¹¹Zhejiang University · ¹²Beihang University · ¹³The University of Hong Kong

* Equal contribution ‡ Corresponding author

Paper (coming soon) | Website | Dataset (coming soon) | Citation

SoftVTBench performs contact-rich visuo-tactile evaluation under a hidden safe interaction envelope: too loose causes slip or drop, too tight causes over-deformation, and only the safe window between them satisfies both task and safety objectives.

Code and training/evaluation scripts for SoftVTBench will be released in this repository. See the project website for the full paper introduction, method, task suites, and results.

Abstract

Deformable object manipulation poses challenges beyond task completion: successful execution must also maintain safe physical interaction, holding the object stably without slip or drop while avoiding excessive deformation. However, existing manipulation benchmarks are predominantly success-oriented and rarely evaluate whether a policy remains physically safe throughout execution. We present SoftVTBench, a safety-aware visuo-tactile benchmark for physically constrained deformable object manipulation. Built in Isaac Sim with finite-element-simulated deformable objects, SoftVTBench provides multi-view RGB observations, RGB tactile sensing with marker motion, proprioception, and language instructions, and defines four matched task suites over object type (deformable vs. rigid) and variation axis (object vs. spatial). It reports Goal Success and Safety Success separately, where Safety Success additionally requires no drop and peak object deformation below a calibrated, object-specific threshold, computed from privileged FEM simulation states that are hidden from the policy. We implement π_0.5-based baselines under this protocol. Experiments show that success-only evaluation substantially overstates policy performance — a large fraction of goal-completing rollouts violate physical safety — and that adding tactile sensing improves Safety Success (e.g., from 21.4% to 35.6% on object-centric deformable tasks) and reduces object deformation during execution, while Goal Success remains comparable. SoftVTBench provides a reproducible benchmark for studying visuo-tactile deformable manipulation under physical interaction constraints.

Benchmark Design

SoftVTBench performs contact-rich visuo-tactile evaluation of robot policies under two coupled requirements: completing the manipulation goal and maintaining safe physical interaction throughout execution. A rollout is considered physically safe only if the object remains stably grasped without slip or drop, and its peak deformation stays below a calibrated, object-specific threshold. By reporting Goal Success and Safety Success separately, SoftVTBench exposes goal-complete but physically unsafe rollouts that are hidden by success-only evaluation.

SoftVTBench instantiates this evaluation protocol in Isaac Sim, using deformable objects simulated with finite element method (FEM) soft-body dynamics and a Franka Panda arm whose parallel-jaw gripper carries a GelSight Mini tactile sensor on each finger. The benchmark provides third-person and wrist RGB observations, tactile RGB images, marker-motion fields, proprioception, and a standardized end-effector and gripper action interface, all synchronized at 20 Hz.

Safety Success ⊆ Goal Success
Safety Success = Goal Success AND NoDrop AND D_peak ≤ τ_o

D_peak is the peak object-size-normalized FEM-RMS deformation over the rollout, computed after removing global rigid-body motion. τ_o is an object-specific threshold calibrated from an offline compression sweep. Both are computed from privileged simulator ground truth and hidden from the policy.
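The definitions above can be illustrated with a minimal sketch. It assumes the FEM nodes are available as (N, 3) NumPy arrays, that rigid-body motion is removed with a Kabsch (SVD) alignment to the rest shape, and that "object size" is the bounding-box diagonal of the rest shape. All function names are hypothetical, and the benchmark's actual evaluator may differ in each of these choices.

```python
import numpy as np


def kabsch_align(rest: np.ndarray, current: np.ndarray) -> np.ndarray:
    """Rigidly align current nodes (N, 3) onto rest nodes (N, 3)."""
    rest_mean = rest.mean(axis=0)
    rest_c = rest - rest_mean
    cur_c = current - current.mean(axis=0)
    # SVD of the cross-covariance yields the least-squares rotation (Kabsch).
    u, _, vt = np.linalg.svd(cur_c.T @ rest_c)
    # Guard against reflections so the result is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return cur_c @ rot.T + rest_mean


def frame_deformation(rest: np.ndarray, current: np.ndarray) -> float:
    """RMS nodal displacement after rigid alignment, divided by object size."""
    aligned = kabsch_align(rest, current)
    rms = np.sqrt(np.mean(np.sum((aligned - rest) ** 2, axis=1)))
    # Assumption: object size = bounding-box diagonal of the rest shape.
    diag = np.linalg.norm(rest.max(axis=0) - rest.min(axis=0))
    return float(rms / diag)


def peak_deformation(rest: np.ndarray, frames) -> float:
    """D_peak: maximum per-frame deformation over the rollout."""
    return max(frame_deformation(rest, f) for f in frames)


def safety_success(goal_success: bool, dropped: bool,
                   d_peak: float, tau_o: float) -> bool:
    """Safety Success = Goal Success AND NoDrop AND D_peak <= tau_o."""
    return bool(goal_success and not dropped and d_peak <= tau_o)
```

Because the alignment step removes translation and rotation, a rollout in which the object is only carried and turned yields a deformation near zero; only shape change contributes to D_peak.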

SoftVTBench's four matched task suites are generated by a shared procedural pipeline and jointly probe physical safety, compliance shift, tactile feedback, and gripper control, exposing the Goal–Safety gap that success-only evaluation hides.

Task Suites

A matched 2×2 design over object type (rigid vs. deformable) and variation axis (object vs. spatial).

Rigid control suites — LIBERO-style rigid-object suites that establish baseline manipulation and spatial-variation competence before deformable-object safety is introduced.

Suite	Description
Object-Rigid	Rigid LIBERO-style object tasks re-executed through the SoftVTBench tactile-equipped sensing and recording stack. Tests basic pick-and-place competence under object identity variation.
Spatial-Rigid	Rigid LIBERO-style spatial tasks under the same sensing stack. Tests robustness to spatial variation — localization, approach, and transfer across changing layouts.

Deformable main suites — grasp-and-place suites where the object must reach its target region without dropping or over-deforming.

Suite	Description
Object-Soft	The robot grasps a deformable object and places it into a fixed target container while avoiding drop and excessive deformation. Object identity, geometry, and material compliance vary across tasks.
Spatial-Soft	The same grasp-and-place objective under spatial variation: two visually identical instances appear per scene, and the language instruction specifies which one to manipulate under changing layouts.
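For readers who script over the suites, the 2×2 design can be written down as a small lookup. This is a hypothetical sketch: the `Suite` class and its field names are illustrative and are not part of any released SoftVTBench API.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Suite:
    name: str
    object_type: str  # "rigid" or "deformable"
    variation: str    # "object" or "spatial"

    @property
    def has_safety_metric(self) -> bool:
        # Safety Success is reported only for the deformable suites.
        return self.object_type == "deformable"


# Enumerate the 2x2 grid: object type x variation axis.
SUITES = [
    Suite(
        name=f"{variation.capitalize()}-{'Rigid' if obj == 'rigid' else 'Soft'}",
        object_type=obj,
        variation=variation,
    )
    for obj, variation in product(("rigid", "deformable"), ("object", "spatial"))
]

for suite in SUITES:
    print(suite.name, suite.object_type, suite.variation, suite.has_safety_metric)
```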

Demo videos for all four suites are on the project website.

Experimental Results

π_0.5-Vision (VO) vs. π_0.5-Visuo-Tactile (VT), evaluated under identical physical and task conditions.

Main Results

Suite	Method	Goal Success	Safety Success
Object-Rigid	VO	38.8%	N/A
Object-Rigid	VT	32.4%	N/A
Spatial-Rigid	VO	56.4%	N/A
Spatial-Rigid	VT	63.4%	N/A
Object-Soft	VO	70.4%	21.4%
Object-Soft	VT	71.8%	35.6%
Spatial-Soft	VO	74.2%	32.6%
Spatial-Soft	VT	84.2%	44.6%

VO: vision-only π_0.5 policy (binary gripper command). VT: visuo-tactile π_0.5 policy, additionally observing tactile RGB and marker-motion history with a continuous gripper command. Safety Success is not defined (N/A) for rigid suites, which carry no deformation constraint.

Goal Success and Safety Success across suites. For rigid suites, deformation-based safety constraints are not applicable (N/A). For deformable suites, the gap between Goal Success and Safety Success indicates unsafe goal completions caused by excessive deformation, dropping, or unstable contact.
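Since Safety Success ⊆ Goal Success, the Goal–Safety gap can be read directly off the table. The sketch below reproduces that arithmetic for the deformable suites. It assumes both rates are computed over the same set of rollouts, and the helper name `goal_safety_gap` is illustrative.

```python
# (suite, method) -> (Goal Success %, Safety Success %), copied from the table.
RESULTS = {
    ("Object-Soft", "VO"): (70.4, 21.4),
    ("Object-Soft", "VT"): (71.8, 35.6),
    ("Spatial-Soft", "VO"): (74.2, 32.6),
    ("Spatial-Soft", "VT"): (84.2, 44.6),
}


def goal_safety_gap(goal: float, safety: float) -> tuple:
    """Return (gap in points, share of goal-completing rollouts that are unsafe)."""
    gap = goal - safety
    unsafe_share = 100.0 * gap / goal
    return round(gap, 1), round(unsafe_share, 1)


for (suite, method), (goal, safety) in RESULTS.items():
    gap, share = goal_safety_gap(goal, safety)
    print(f"{suite:12s} {method}: gap {gap:.1f} pts, "
          f"{share:.1f}% of goal completions unsafe")
# Object-Soft  VO: gap 49.0 pts, 69.6% of goal completions unsafe
# Object-Soft  VT: gap 36.2 pts, 50.4% of goal completions unsafe
# Spatial-Soft VO: gap 41.6 pts, 56.1% of goal completions unsafe
# Spatial-Soft VT: gap 39.6 pts, 47.0% of goal completions unsafe
```

Under this reading, roughly half or more of the goal-completing rollouts in every deformable setting fail the safety criterion, and VT lowers that share on both suites.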

Deformation Distribution

Suite	Method	Mean	P5	Median	P95
Object-Soft	VO	16.10%	4.30%	10.65%	44.70%
Object-Soft	VT	15.12%	3.90%	8.67%	38.81%
Spatial-Soft	VO	13.16%	5.16%	10.75%	28.96%
Spatial-Soft	VT	11.58%	4.75%	9.67%	26.56%

FEM-RMS deformation over all deformable-object rollouts, reported as a percentage of the object bounding-box diagonal. Lower values indicate safer physical interaction.
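A summary row of this kind could be computed from per-rollout deformation values roughly as follows. This is a sketch under assumptions: the percentile rule (linear interpolation between order statistics) is one common convention and may not match the one used for the table, and the sample values are made up for illustration.

```python
from statistics import mean


def percentile(values, q: float) -> float:
    """q-th percentile (0-100) with linear interpolation between neighbours."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of empty sequence")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def summarize(deformations_pct) -> dict:
    """Mean / P5 / Median / P95 over per-rollout deformation percentages."""
    return {
        "mean": mean(deformations_pct),
        "p5": percentile(deformations_pct, 5),
        "median": percentile(deformations_pct, 50),
        "p95": percentile(deformations_pct, 95),
    }


# Hypothetical per-rollout values in percent of the bounding-box diagonal.
example = [4.1, 6.3, 8.0, 9.5, 10.2, 12.7, 15.9, 22.4, 31.0, 45.5]
print(summarize(example))
```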

Analysis

Tactile information does not consistently improve performance on rigid-object tasks: VT trails VO on Object-Rigid (32.4% vs. 38.8%) but leads on Spatial-Rigid (63.4% vs. 56.4%). When deformation-related safety constraints are absent, visual observations and proprioception already provide the primary information needed for task completion, and tactile sensing yields no consistent gain.

In contrast, VT shows clear advantages on deformable-object tasks. On Object-Soft, Goal Success is comparable between VO and VT (70.4% vs. 71.8%), while on Spatial-Soft VT also raises Goal Success (74.2% → 84.2%). On both suites VT obtains substantially higher Safety Success (Object-Soft: 21.4% → 35.6%; Spatial-Soft: 32.6% → 44.6%). The Object-Soft result in particular indicates that tactile sensing is not primarily helping the object reach the target; it is improving contact regulation during execution, reducing slippage, over-compression, and unstable grasping.

The deformation statistics confirm this: VT lowers mean, median, and P95 deformation on both deformable suites, showing that tactile feedback shifts the entire interaction distribution toward safer contact, not just the fraction that clears the safety threshold. Together, these results show that success-only evaluation substantially overstates policy performance on deformable manipulation, and that tactile sensing is most valuable precisely where physical safety is at stake.

Dataset


Task Suites	4
Episodes	2,000
Assets	33
Core Metrics	Goal Success / Safety Success

Each episode contains synchronized third-person and wrist RGB, left/right tactile RGB and marker-motion fields, robot proprioception, end-effector and gripper action trajectories, suite and task identifiers, and evaluator-only privileged signals (FEM nodal deformation, contact status, drop events). All streams are recorded at 20 Hz. Deformable-object safety thresholds are calibrated per asset by an offline interaction calibration protocol and are held out for evaluation only.
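Until the dataset is released, the episode contents described above can be pictured with a hypothetical container like the one below. The class, stream keys, and helper functions are assumptions made for illustration and do not reflect the actual file format. The sketch mainly shows how policy-visible streams and evaluator-only signals might be kept apart.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

RATE_HZ = 20  # recording rate stated in the text

# Hypothetical stream names; the released dataset may use different keys.
POLICY_STREAMS = (
    "third_person_rgb", "wrist_rgb",
    "tactile_rgb_left", "tactile_rgb_right",
    "marker_motion_left", "marker_motion_right",
    "proprioception", "action",
)
PRIVILEGED_STREAMS = ("fem_nodal_deformation", "contact_status", "drop_event")


@dataclass
class Episode:
    suite: str
    task: str
    streams: Dict[str, List[Any]]     # recorded per step, visible to the policy
    privileged: Dict[str, List[Any]]  # recorded per step, evaluator-only


def num_steps(ep: Episode) -> int:
    """Number of steps, checking that all streams are aligned."""
    lengths = {len(v) for v in {**ep.streams, **ep.privileged}.values()}
    if len(lengths) != 1:
        raise ValueError(f"streams are not aligned: lengths {sorted(lengths)}")
    return lengths.pop()


def duration_s(ep: Episode) -> float:
    """Episode duration in seconds at the 20 Hz recording rate."""
    return num_steps(ep) / RATE_HZ


def policy_observation(ep: Episode, t: int) -> Dict[str, Any]:
    """Observation at step t: no actions and no privileged signals."""
    return {k: v[t] for k, v in ep.streams.items() if k != "action"}
```

Keeping the privileged signals in a separate mapping makes it harder to leak FEM state into a policy's inputs by accident.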

Dataset download link: coming soon.

Citation

If you find SoftVTBench useful, please consider citing our paper.

@article{jing2026softvtbench,
  title   = {SoftVTBench: A Safety-Aware Visuo-Tactile Benchmark for
             Physically Constrained Robotic Manipulation of Deformable Objects},
  author  = {Jing, Bowen and Wang, Mingxin and Hao, Ruiyang and Ge, Chenchen and
             Shen, Hanwen and He, Junjie and Cui, Yang and Hou, Yiming and
             Zhou, Weitao and Wang, Jiawei and Li, Minglei and Zhang, Dandan and
             Zhao, Ding and Liu, Houde and Li, Xiaofan and Liu, Si and
             Luo, Ping and Yu, Haibao},
  year    = {2026},
  note    = {Preprint}
}
