🤖 Grasp Stability Prediction

Predicting whether a robotic grasp is Stable or Unstable from tactile sensor images using deep learning.

Role

End-to-end ML engineer (solo project)

Timeline

2024 · Personal / course project

Tech

PyTorch, ResNet18, MLP, Gradio, Hugging Face Spaces


Figure: Grasp Stability Prediction demo interface.

TL;DR

Built a deep learning model that predicts whether a robotic grasp is stable based on tactile sensor images.
Used ResNet18 as a feature extractor and a custom MLP head to classify grasps as Stable or Unstable.
Wrapped the model in a Gradio interface and deployed it on Hugging Face Spaces for interactive experimentation.
Focused on making the model's behavior inspectable through simple inputs and outputs, rather than leaving it as a black box hidden in a notebook.

Problem & Context

In robotic manipulation, a central question is whether a planned grasp on an object will be stable once executed. Failed grasps mean drops, retries, and potentially damaged objects. Visual information alone is often not enough; tactile feedback provides rich information about contact, pressure distribution, and local geometry.

This project explores how to treat tactile sensor images as input to a deep learning model that predicts grasp stability. The goal is to provide a fast, data-driven decision about stability that could be integrated into a grasp planning pipeline, while still being simple enough to deploy as a web demo.

Data & Inputs

Tactile sensor readings represented as 2D images (pressure/intensity maps at the contact surface).
Each sample labeled as either stable or unstable based on the outcome of the grasp.
Preprocessing steps to normalize intensities and resize images to the input resolution expected by ResNet18.
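A minimal sketch of the normalize-and-resize step described above. It assumes each tactile reading arrives as a single-channel 2D tensor, that the target resolution is 224x224, and that ImageNet channel statistics are used because the backbone is pretrained; the function name `preprocess_tactile` is hypothetical, and the project's actual choices may differ.

```python
import torch
import torch.nn.functional as F

# ImageNet channel statistics, commonly paired with pretrained ResNet18 (assumed here).
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)


def preprocess_tactile(raw: torch.Tensor, size: int = 224) -> torch.Tensor:
    """Turn one single-channel tactile map (H, W) into a (3, size, size) tensor."""
    x = raw.float()
    # Min-max scale to [0, 1]; the clamp guards against a constant image (zero range).
    lo, hi = x.min(), x.max()
    x = (x - lo) / (hi - lo).clamp(min=1e-8)
    # interpolate expects (N, C, H, W), so add batch and channel axes, then drop the batch axis.
    x = F.interpolate(x[None, None], size=(size, size), mode="bilinear", align_corners=False)[0]
    # Repeat the single channel three times to match ResNet18's RGB input.
    x = x.expand(3, -1, -1)
    return (x - IMAGENET_MEAN) / IMAGENET_STD


# Example: a fake 32x48 pressure map with values in [0, 255].
example = preprocess_tactile(torch.rand(32, 48) * 255)
```

Scaling per image makes the input robust to sensor gain but discards absolute pressure; a dataset-wide scale would be the alternative if absolute magnitude matters.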

The dataset was cleaned to remove ambiguous or corrupted samples and balanced so the classifier is not biased toward the majority class (for example, when stable grasps outnumber unstable ones).
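The text does not say how the balancing was done. One common option is random undersampling of the majority class, sketched below; the helper name `undersample` and the `(item, label)` pair format are assumptions made for illustration.

```python
import random
from collections import defaultdict


def undersample(samples, seed=0):
    """Return a shuffled, class-balanced subset of a list of (item, label) pairs."""
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append((item, label))
    # Every class is cut down to the size of the smallest class.
    n = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n))
    rng.shuffle(balanced)
    return balanced


# Toy data: 10 stable and 4 unstable grasps become 4 of each.
toy = [(i, "stable") for i in range(10)] + [(i, "unstable") for i in range(4)]
balanced = undersample(toy)
```

Undersampling discards data; a weighted loss or a weighted sampler would keep every sample at the cost of a slightly more involved training setup.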

Approach & Architecture

The model design is based on transfer learning with a convolutional network backbone and a lightweight classifier head. The main steps are:

Start from a pretrained ResNet18 backbone to extract features from tactile images.
Replace the final classification layer with a custom MLP head tailored to the binary stability task.
Fine-tune the last layers of ResNet18 and the MLP head jointly on the grasp stability dataset.
Use cross-entropy loss and standard regularization (weight decay, dropout) to reduce overfitting.

Results & Evaluation

On the held-out test set, the model separates stable from unstable grasps clearly. In the demo, it flags many grasps whose contact pattern indicates instability even when the grasp looks reasonable at first glance.

Evaluated performance on a test split with standard classification metrics (accuracy, precision, recall).
Per-class inspection showed that the model is particularly good at detecting obviously unstable contact patterns, with some confusion in borderline cases.
The interactive demo helps visually inspect how small changes in tactile patterns influence the predicted stability.
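For reference, the three metrics mentioned above can be computed from prediction counts as in the sketch below. It treats "unstable" (encoded as 1) as the positive class, which is an assumption, and the labels are made up for illustration; they are not results from this project.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for binary labels, with `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        # Fall back to 0.0 when a denominator is empty (no predicted or no actual positives).
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy labels: 1 = unstable, 0 = stable.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
# → {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75}
```

In a grasping context, recall on the unstable class is often the number to watch, since a missed unstable grasp means a dropped object.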

For a production setting, the next step would be to benchmark this model against alternative baselines (e.g. hand-crafted features, simpler CNNs) and test its robustness on data collected from different objects and sensor setups.

Implementation

Implemented in Python using PyTorch for model definition and training.
Training and experimentation done in Jupyter notebooks, with reproducible scripts for preprocessing and training.
Deployed as a Gradio interface and hosted on Hugging Face Spaces so others can try the model directly in the browser.

The codebase is structured to separate data loading, model definition, training loops, and the Gradio UI, making it easier to iterate on the model architecture without touching the deployment layer.
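One way this separation might look in practice: the prediction function knows nothing about Gradio, and the UI is built in a separate function that takes the model and a preprocessing callable as arguments. The names `predict`, `build_demo`, and `LABELS`, and the class order, are hypothetical and not taken from the repository.

```python
import torch

LABELS = ["Stable", "Unstable"]  # assumed class order of the model's two outputs


def predict(model, image_tensor):
    """Return {label: probability} for one preprocessed (3, H, W) tensor."""
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(image_tensor.unsqueeze(0)), dim=1)[0]
    return {label: float(p) for label, p in zip(LABELS, probs)}


def build_demo(model, preprocess):
    """Wrap predict() in a Gradio interface; `preprocess` maps a PIL image to a tensor."""
    import gradio as gr  # imported lazily so predict() can be tested without the UI dependency

    def infer(pil_image):
        return predict(model, preprocess(pil_image))

    return gr.Interface(
        fn=infer,
        inputs=gr.Image(type="pil"),
        outputs=gr.Label(num_top_classes=2),
        title="Grasp Stability Prediction",
    )
```

With a trained model and a preprocessing function in hand, `build_demo(model, preprocess).launch()` would start the interface locally; swapping the architecture then only requires passing a different `model`.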

Challenges & Lessons Learned

Tactile images are less standardized than RGB images, so preprocessing choices (normalization, resizing, augmentation) had a noticeable impact on performance.
Balancing the dataset between stable and unstable grasps was important to avoid a biased classifier that defaults to the majority class.
Building a small but clean Gradio UI forced me to think about how to expose the model in a way that is understandable and debuggable for users.

Overall, this project reinforced the value of combining a solid baseline architecture (ResNet18) with careful problem framing and deployment, rather than over-complicating the model from the start.

Links

Live Demo on Hugging Face Spaces · GitHub Repository