3D Deformable Surface Reconstruction from Visual and Tactile Input with Geometric Prior

1Analog Devices Inc., Cluj-Napoca, Romania
2Department of Automation, Technical University of Cluj-Napoca, Romania
System Architecture

Visual-tactile fusion architecture for deformable surface reconstruction using geometric priors with an Inspire humanoid hand.

Abstract

Reconstructing 3D deformable surfaces during grasping remains challenging, even though tactile input provides high-fidelity spatial information. The complementary characteristics of visual and tactile data remain largely unexplored for deformable surface reconstruction. In this paper, we study the problem of surface reconstruction from complementary visual and tactile information, based on 3D geometric priors for grasped objects.

We first recover the rough geometric shape of the object (cylinder or sphere) in a table-top scenario using a sampling approach on the depth data. Based on the prior geometric shape and the spatial distribution of the touch input, we then recover the shape of the deformable surface by fusing the visual and tactile data in 3D. Our results show that this approach enhances 3D reconstruction by exploiting the spatial distribution of visual and tactile input.

Simulation Environment

IsaacSim Environment

For contact-aware deformable object manipulation, IsaacSim includes a physics-based Contact Sensor built on the PhysX Contact Report API. In our setup, we use a rigid cylinder and a rigid sphere with compliant contacts defined in the physics material. This models soft interaction while keeping the bodies rigid, similar to TacSL, which uses penalty-based soft-contact constraints to allow controlled interpenetration.
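
The snippet below is a minimal sketch of how such a contact sensor can be attached to the grasped object in Isaac Sim. It assumes the omni.isaac.sensor ContactSensor wrapper and a stage layout with the cylinder under /World/Cylinder; module paths and argument names vary between Isaac Sim releases, and this is not taken from our actual setup code.

# Assumes a running SimulationApp and a rigid-body cylinder at /World/Cylinder.
from omni.isaac.core import World
from omni.isaac.sensor import ContactSensor

world = World(stage_units_in_meters=1.0)

# Sensor prim parented under the rigid cylinder so PhysX contact reports
# (aggregate force, number of contacts) are collected for that body.
contact_sensor = ContactSensor(
    prim_path="/World/Cylinder/contact_sensor",  # assumed stage layout
    name="cylinder_contact",
    min_threshold=0.0,   # report all contact forces (N)
    max_threshold=1e7,
    radius=-1,           # -1: cover the full collision geometry
)

world.reset()
for _ in range(100):
    world.step(render=False)
    frame = contact_sensor.get_current_frame()
    if frame.get("in_contact"):
        print(frame["force"], frame["number_of_contacts"])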

3D Scene Parsing

RGB-D scene parsing with cylinder detection (yellow) and hand segmentation (magenta).

Real Robot Setup

The real-robot experiments use a 6-DoF anthropomorphic Inspire robotic hand, equipped with embedded tactile sensors across all five fingers and the palm. The hand is controlled via ROS2 through a Modbus TCP interface and provides both motor-current-based force feedback and high-resolution tactile pressure readings from distributed sensor arrays.

The tactile data is reprojected into 3D using the forward kinematics of the hand, yielding a point cloud with five fields per point: position (x, y, z), packed RGB color, and 12-bit pressure intensity.
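
As an illustration of this reprojection, the sketch below places a small hypothetical 2x2 taxel array from a fingertip frame into the base frame using a toy two-joint forward kinematics chain. The link lengths, taxel offsets, and helper names are placeholders and not the real Inspire hand kinematics.

import numpy as np

def rot_y(theta):
    """Homogeneous rotation about the y-axis."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(4)
    T[0, 0], T[0, 2], T[2, 0], T[2, 2] = c, s, -s, c
    return T

def trans(x, y, z):
    """Homogeneous translation."""
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T

def fingertip_pose(q, link_lengths):
    """Toy planar two-joint finger FK: base -> fingertip transform."""
    T = np.eye(4)
    for theta, length in zip(q, link_lengths):
        T = T @ rot_y(theta) @ trans(length, 0.0, 0.0)
    return T

# Hypothetical 2x2 taxel positions in the fingertip frame (metres)
taxels_local = np.array([[0.005,  0.002, 0.001],
                         [0.005, -0.002, 0.001],
                         [0.009,  0.002, 0.001],
                         [0.009, -0.002, 0.001]])
pressures = np.array([310, 0, 1890, 4095])        # 12-bit taxel intensities
q = np.deg2rad([20.0, 35.0])                      # joint angles from the hand state

T_tip = fingertip_pose(q, link_lengths=[0.04, 0.03])
pts_h = np.c_[taxels_local, np.ones(len(taxels_local))]   # homogeneous points
pts_base = (T_tip @ pts_h.T).T[:, :3]

# Assemble the 5-field cloud: x, y, z, packed RGB, intensity
rgb = np.full(len(pts_base), 0xFF0000, dtype=np.uint32)   # one colour per finger
cloud = np.rec.fromarrays(
    [pts_base[:, 0], pts_base[:, 1], pts_base[:, 2], rgb, pressures],
    names="x,y,z,rgb,intensity")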

3D tactile point cloud visualization for different object types: (a) deformable bottle, (b) rigid bottle, (c) deformable ball, (d) rigid ball.

Multi-Modal Dataset

RGB views with tactile heatmaps for cylinder (left) and sphere (right) grasping. Top: deformable, bottom: rigid.

We collected a multi-modal grasping dataset of more than 6,000 synchronized samples, each containing a camera RGB image, tactile heat-maps, a 3D tactile point cloud with intensity, hand actuator states, and joint angles. Samples are labeled as deformable or rigid based on the compliance of the grasped object.
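
The sketch below shows one possible container for a synchronized sample; the field names and shapes are our own illustrative assumptions, not the released dataset schema.

from dataclasses import dataclass
import numpy as np

@dataclass
class GraspSample:
    rgb: np.ndarray               # H x W x 3 camera image
    tactile_heatmaps: np.ndarray  # per-array pressure heat-maps
    tactile_cloud: np.ndarray     # N x 5: x, y, z, rgb, 12-bit intensity
    actuator_states: np.ndarray   # motor currents / force feedback
    joint_angles: np.ndarray      # hand joint configuration (rad)
    deformable: bool              # compliance label (deformable vs. rigid)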

Methods

PointNet++ for Tactile Point Clouds

We use PointNet++ for hierarchical feature extraction from tactile point clouds. Unlike the original PointNet, which relies on a single global pooling operation, PointNet++ captures local geometric structure through set abstraction layers. Our regression approach is inspired by 3D hand pose estimation techniques. We apply data augmentation, including random z-axis rotation and ±10% scaling, to improve generalization, as sketched below.
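
A minimal sketch of this augmentation follows (our own helper, not the training code); note that if the regression target is a metric radius, the same scale factor must also be applied to the label.

import numpy as np

def augment_cloud(points, rng=None):
    """points: N x 3 tactile point cloud; returns a rotated, scaled copy plus the scale."""
    rng = np.random.default_rng(rng)
    theta = rng.uniform(0.0, 2.0 * np.pi)     # random rotation about the z-axis
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    scale = rng.uniform(0.9, 1.1)             # +/-10% isotropic scaling
    return (points @ R.T) * scale, scale      # scale is needed to rescale the radius label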

VGG19 for Visual Radii Estimation

We tested a radius estimation pipeline using VGG19 on RGB-touch image data in simulation. The model learns the mapping from RGB-touch images to object radius and produces accurate predictions. We also explored transfer learning by training the VGG19 model on left-hand data and adapting it to the right hand.
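
A sketch of the regression setup is given below, assuming a standard torchvision VGG19 backbone with its last classification layer replaced by a single-output head; this mirrors the described pipeline but is not the exact training code.

import torch
import torch.nn as nn
from torchvision import models

# ImageNet-pretrained VGG19 with a scalar regression head for the radius
model = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(4096, 1)

criterion = nn.L1Loss()                        # MAE, matching the reported metric
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

images = torch.randn(8, 3, 224, 224)           # a batch of RGB-touch images (dummy)
radii = torch.rand(8, 1) * 0.05                # ground-truth radii in metres (dummy)

optimizer.zero_grad()
loss = criterion(model(images), radii)
loss.backward()
optimizer.step()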

VGG19 radii estimation from RGB touch data compared to ground truth.

Results

Radius estimators comparison by object class.

Key Results:

  • PointNet++ achieves sub-millimeter accuracy (MAE = 0.7mm) on rigid cylinders, with errors below 0.2mm on standard cans
  • For spheres, the cylinder-trained model generalizes well to both rigid and deformable balls (MAE = 3.1mm)
  • SAC (RANSAC) yields higher errors but requires no learning (see the fitting sketch below)
  • VGG19 on simulated data achieves the lowest error (0.6mm cylinder, 0.04mm sphere)
  • Real-world transfer shows 0.02mm MAE on cylinders
  • Performance degrades on deformable objects (MAE = 6.6mm) due to compression during grasping

These results confirm that the model effectively learns the mapping from surface curvature to object radius.
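
For reference, the sketch below shows one simple way such a no-learning baseline can estimate a cylinder radius with RANSAC. It assumes the fused points are expressed in a frame where the cylinder axis is roughly aligned with the z-axis (the table-top scenario), so the fit reduces to a 2D circle; this is an illustrative simplification, not the exact SAC baseline evaluated above.

import numpy as np

def fit_circle(xy):
    """Least-squares circle through >= 3 points: x^2 + y^2 = 2ax + 2by + c."""
    A = np.c_[2.0 * xy, np.ones(len(xy))]
    rhs = (xy ** 2).sum(axis=1)
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    center = np.array([a, b])
    val = c + center @ center
    radius = np.sqrt(val) if val > 0 else np.nan
    return center, radius

def ransac_cylinder_radius(points, n_iters=500, tol=0.002, seed=None):
    """points: N x 3 fused visual/tactile points on an upright cylinder."""
    rng = np.random.default_rng(seed)
    xy = points[:, :2]                     # drop z under the upright-axis assumption
    best_inliers = np.zeros(len(xy), dtype=bool)
    for _ in range(n_iters):
        sample = xy[rng.choice(len(xy), 3, replace=False)]
        center, radius = fit_circle(sample)
        if not np.isfinite(radius):
            continue                       # degenerate (nearly collinear) sample
        residuals = np.abs(np.linalg.norm(xy - center, axis=1) - radius)
        inliers = residuals < tol          # 2 mm inlier band
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        return np.nan
    # refine the radius on all inliers of the best hypothesis
    _, radius = fit_circle(xy[best_inliers])
    return radius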

Quantitative Results

Method       Object Type    Cylinders MAE   Spheres MAE
PointNet++   Rigid          0.7mm           3.6mm
PointNet++   Deformable     6.6mm           3.1mm
RANSAC       Rigid          5.4mm           -
RANSAC       Deformable     8.1mm           -
VGG19        Simulation     0.6mm           0.04mm
VGG19        Real           0.02mm          0.05mm

BibTeX

@InProceedings{popa2026vitacdef,
  author    = {Popa, Ioan Laurentiu and Brezae, Tudor and Sucala, Paul and Konievic, Robert and Tamas, Levente},
  booktitle = {ICRA 2026 ViTac Workshop},
  title     = {{3D Deformable Surface Reconstruction from Visual and Tactile Input with Geometric Prior}},
  year      = {2026},
}