3D Deformable Surface Reconstruction from Visual and Tactile Input with Geometric Prior

1Analog Devices Inc., Cluj-Napoca, Romania
2Department of Automation, Technical University of Cluj-Napoca, Romania
System Architecture

Visual-tactile fusion architecture for deformable surface reconstruction using geometric priors with an Inspire humanoid hand.

Abstract

Reconstructing 3D deformable surfaces during grasping remains challenging, even though tactile input provides high-fidelity spatial information. The complementary characteristics of visual and tactile data remain largely unexplored for deformable surface reconstruction. In this paper, we study the problem of surface reconstruction from complementary visual and tactile information, based on 3D geometric priors for grasped objects.

We first recover the rough geometric shape of the object (cylinder or sphere) in a table-top scenario using a sampling approach on the depth data. Based on the prior geometric shape and the spatial distribution of the touch input, we then recover the shape of the deformable surface by fusing the visual and tactile data in 3D. Our results show that this approach enhances 3D reconstruction by exploiting the spatial distribution of visual and tactile input.

Simulation Environment

IsaacSim Environment

For contact-aware deformable object manipulation, IsaacSim includes a physics-based Contact Sensor built on the PhysX Contact Report API. In our setup, we use a rigid cylinder and a rigid sphere with compliant contacts defined in the physics material. This models soft interaction while keeping the bodies rigid, similar to TacSL, which uses penalty-based soft-contact constraints to allow controlled interpenetration.
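
The snippet below is a minimal sketch of how such a contact sensor can be attached to the grasped object in Isaac Sim. It assumes the omni.isaac.sensor ContactSensor wrapper and a stage layout with the cylinder under /World/Cylinder; module paths and argument names vary between Isaac Sim releases, and this is not taken from our actual setup code.

# Assumes a running SimulationApp and a rigid-body cylinder at /World/Cylinder.
from omni.isaac.core import World
from omni.isaac.sensor import ContactSensor

world = World(stage_units_in_meters=1.0)

# Sensor prim parented under the rigid cylinder so PhysX contact reports
# (aggregate force, number of contacts) are collected for that body.
contact_sensor = ContactSensor(
    prim_path="/World/Cylinder/contact_sensor",  # assumed stage layout
    name="cylinder_contact",
    min_threshold=0.0,   # report all contact forces (N)
    max_threshold=1e7,
    radius=-1,           # -1: cover the full collision geometry
)

world.reset()
for _ in range(100):
    world.step(render=False)
    frame = contact_sensor.get_current_frame()
    if frame.get("in_contact"):
        print(frame["force"], frame["number_of_contacts"])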

3D Scene Parsing

RGB-D scene parsing with cylinder detection (yellow) and hand segmentation (magenta).

Real Robot Setup

The real-robot experiments use a 6-DoF anthropomorphic Inspire robotic hand, equipped with embedded tactile sensors across all five fingers and the palm. The hand is controlled via ROS2 through a Modbus TCP interface and provides both motor-current-based force feedback and high-resolution tactile pressure readings from distributed sensor arrays.

The tactile data is reprojected into 3D using the forward kinematics of the hand, yielding a point cloud with five fields per point: position (x, y, z), packed RGB color, and 12-bit pressure intensity.
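
As an illustration of this reprojection, the sketch below places a small hypothetical 2x2 taxel array from a fingertip frame into the base frame using a toy two-joint forward kinematics chain. The link lengths, taxel offsets, and helper names are placeholders and not the real Inspire hand kinematics.

import numpy as np

def rot_y(theta):
    """Homogeneous rotation about the y-axis."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(4)
    T[0, 0], T[0, 2], T[2, 0], T[2, 2] = c, s, -s, c
    return T

def trans(x, y, z):
    """Homogeneous translation."""
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T

def fingertip_pose(q, link_lengths):
    """Toy planar two-joint finger FK: base -> fingertip transform."""
    T = np.eye(4)
    for theta, length in zip(q, link_lengths):
        T = T @ rot_y(theta) @ trans(length, 0.0, 0.0)
    return T

# Hypothetical 2x2 taxel positions in the fingertip frame (metres)
taxels_local = np.array([[0.005,  0.002, 0.001],
                         [0.005, -0.002, 0.001],
                         [0.009,  0.002, 0.001],
                         [0.009, -0.002, 0.001]])
pressures = np.array([310, 0, 1890, 4095])        # 12-bit taxel intensities
q = np.deg2rad([20.0, 35.0])                      # joint angles from the hand state

T_tip = fingertip_pose(q, link_lengths=[0.04, 0.03])
pts_h = np.c_[taxels_local, np.ones(len(taxels_local))]   # homogeneous points
pts_base = (T_tip @ pts_h.T).T[:, :3]

# Assemble the 5-field cloud: x, y, z, packed RGB, intensity
rgb = np.full(len(pts_base), 0xFF0000, dtype=np.uint32)   # one colour per finger
cloud = np.rec.fromarrays(
    [pts_base[:, 0], pts_base[:, 1], pts_base[:, 2], rgb, pressures],
    names="x,y,z,rgb,intensity")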

3D tactile point cloud visualization for different object types: (a) deformable bottle, (b) rigid bottle, (c) deformable ball, (d) rigid ball.

Multi-Modal Dataset

RGB views with tactile heatmaps for cylinder (left) and sphere (right) grasping. Top: deformable, bottom: rigid.

We collected a multi-modal grasping dataset of more than 6,000 synchronized samples, each containing a camera RGB image, tactile heat-maps, a 3D tactile point cloud with intensity, hand actuator states, and joint angles. Samples are labeled as deformable or rigid based on the compliance of the grasped object.
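
The sketch below shows one possible container for a synchronized sample; the field names and shapes are our own illustrative assumptions, not the released dataset schema.

from dataclasses import dataclass
import numpy as np

@dataclass
class GraspSample:
    rgb: np.ndarray               # H x W x 3 camera image
    tactile_heatmaps: np.ndarray  # per-array pressure heat-maps
    tactile_cloud: np.ndarray     # N x 5: x, y, z, rgb, 12-bit intensity
    actuator_states: np.ndarray   # motor currents / force feedback
    joint_angles: np.ndarray      # hand joint configuration (rad)
    deformable: bool              # compliance label (deformable vs. rigid)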

Methods

PointNet++ for Tactile Point Clouds

We use PointNet++ for hierarchical feature extraction from tactile point clouds. Unlike the original PointNet, which relies on a single global pooling operation, PointNet++ captures local geometric structure through set abstraction layers. Our regression approach is inspired by 3D hand pose estimation techniques. We apply data augmentation, including random z-axis rotation and ±10% scaling, to improve generalization, as sketched below.
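
A minimal sketch of this augmentation follows (our own helper, not the training code); note that if the regression target is a metric radius, the same scale factor must also be applied to the label.

import numpy as np

def augment_cloud(points, rng=None):
    """points: N x 3 tactile point cloud; returns a rotated, scaled copy plus the scale."""
    rng = np.random.default_rng(rng)
    theta = rng.uniform(0.0, 2.0 * np.pi)     # random rotation about the z-axis
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    scale = rng.uniform(0.9, 1.1)             # +/-10% isotropic scaling
    return (points @ R.T) * scale, scale      # scale is needed to rescale the radius label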

VGG19 for Visual Radii Estimation

We tested a radius estimation pipeline using VGG19 on RGB-touch image data in simulation. The model learns the mapping from RGB-touch images to object radius and produces accurate predictions. We also explored transfer learning by training the VGG19 model on left-hand data and adapting it to the right hand.
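
A sketch of the regression setup is given below, assuming a standard torchvision VGG19 backbone with its last classification layer replaced by a single-output head; this mirrors the described pipeline but is not the exact training code.

import torch
import torch.nn as nn
from torchvision import models

# ImageNet-pretrained VGG19 with a scalar regression head for the radius
model = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(4096, 1)

criterion = nn.L1Loss()                        # MAE, matching the reported metric
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

images = torch.randn(8, 3, 224, 224)           # a batch of RGB-touch images (dummy)
radii = torch.rand(8, 1) * 0.05                # ground-truth radii in metres (dummy)

optimizer.zero_grad()
loss = criterion(model(images), radii)
loss.backward()
optimizer.step()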

VGG19 radii estimation from RGB touch data compared to ground truth.

Results

Radius estimators comparison by object class.

Key Results:

  • PointNet++ achieves sub-millimeter accuracy (MAE = 0.7mm) on rigid cylinders, with errors below 0.2mm on standard cans
  • For spheres, the cylinder-trained model generalizes well to both rigid and deformable balls (MAE = 3.1mm)
  • SAC (RANSAC) yields higher errors but requires no learning (see the fitting sketch below)
  • VGG19 on simulated data achieves the lowest error (0.6mm cylinder, 0.04mm sphere)
  • Real-world transfer shows 0.02mm MAE on cylinders
  • Performance degrades on deformable objects (MAE = 6.6mm) due to compression during grasping

These results confirm that the model effectively learns the mapping from surface curvature to object radius.
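
For reference, the sketch below shows one simple way such a no-learning baseline can estimate a cylinder radius with RANSAC. It assumes the fused points are expressed in a frame where the cylinder axis is roughly aligned with the z-axis (the table-top scenario), so the fit reduces to a 2D circle; this is an illustrative simplification, not the exact SAC baseline evaluated above.

import numpy as np

def fit_circle(xy):
    """Least-squares circle through >= 3 points: x^2 + y^2 = 2ax + 2by + c."""
    A = np.c_[2.0 * xy, np.ones(len(xy))]
    rhs = (xy ** 2).sum(axis=1)
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    center = np.array([a, b])
    val = c + center @ center
    radius = np.sqrt(val) if val > 0 else np.nan
    return center, radius

def ransac_cylinder_radius(points, n_iters=500, tol=0.002, seed=None):
    """points: N x 3 fused visual/tactile points on an upright cylinder."""
    rng = np.random.default_rng(seed)
    xy = points[:, :2]                     # drop z under the upright-axis assumption
    best_inliers = np.zeros(len(xy), dtype=bool)
    for _ in range(n_iters):
        sample = xy[rng.choice(len(xy), 3, replace=False)]
        center, radius = fit_circle(sample)
        if not np.isfinite(radius):
            continue                       # degenerate (nearly collinear) sample
        residuals = np.abs(np.linalg.norm(xy - center, axis=1) - radius)
        inliers = residuals < tol          # 2 mm inlier band
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        return np.nan
    # refine the radius on all inliers of the best hypothesis
    _, radius = fit_circle(xy[best_inliers])
    return radius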

Quantitative Results

Method       Object Type    Cylinders MAE   Spheres MAE
PointNet++   Rigid          0.7mm           3.6mm
PointNet++   Deformable     6.6mm           3.1mm
RANSAC       Rigid          5.4mm           -
RANSAC       Deformable     8.1mm           -
VGG19        Simulation     0.6mm           0.04mm
VGG19        Real           0.02mm          0.05mm

BibTeX

@InProceedings{popa2026vitacdef,
  author    = {Popa, Ioan Laurentiu and Brezae, Tudor and Sucala, Paul and Konievic, Robert and Tamas, Levente},
  booktitle = {ICRA 2026 ViTac Workshop},
  title     = {{3D Deformable Surface Reconstruction from Visual and Tactile Input with Geometric Prior}},
  year      = {2026},
}