🪙 Euro Coin Classifier (ViT)

Fine-tuning a Vision Transformer to recognize Euro coin denominations from images and exposing it via a simple web interface.

Role

Model design, training, and deployment (solo project)

Timeline

2024 · Personal project

Tech

PyTorch, Vision Transformer (ViT), Hugging Face, Gradio/FastAPI


TL;DR

Built an image classifier for Euro coins using a fine-tuned Vision Transformer (ViT).
Trained on a curated dataset of coin images labeled by denomination (e.g., 1c, 2c, 5c, 10c, 20c, 50c, 1€, 2€).
Achieved near-perfect accuracy on the held-out test set, with a near-diagonal confusion matrix and few errors outside poorly lit or partially occluded images.
Deployed the model as a web app where users can upload an image of a coin and see the predicted denomination.

Problem & Context

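All Euro coins are round and come in only three color groups (copper-colored 1c, 2c and 5c; gold-colored 10c, 20c and 50c; bimetallic 1€ and 2€). Within a group, coins differ mainly in size, edge pattern, and engraving details, and absolute size is hard to judge from a photo without a reference object. Recognizing the denomination from an image is therefore a good testbed for applying modern vision models and evaluating how well they handle fine-grained visual differences.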

This project explores using a Vision Transformer rather than a classic CNN for this task, and wrapping the model in a small, usable app that could be integrated into a larger system (for example, a coin-sorting device or an educational tool).

Data & Inputs

A custom dataset of Euro coin images, each labeled with its correct denomination.
Images include variations in lighting, background, and orientation to test robustness.
Standard preprocessing: resizing to the ViT input size, normalization with ImageNet statistics, and basic data augmentation.

The dataset was split into training, validation, and test partitions to allow proper hyperparameter tuning and evaluation without leaking test information.
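One way to produce such a split is a per-class (stratified) shuffle, so that every denomination keeps roughly the same share in each partition. The sketch assumes the dataset is a list of (image_path, label) pairs; the function name, the 70/15/15 proportions, and the seed are illustrative.

```python
import random
from collections import defaultdict


def stratified_split(samples, val_frac=0.15, test_frac=0.15, seed=42):
    """Split (path, label) pairs into train/val/test, shuffling within each class."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append((path, label))

    rng = random.Random(seed)  # fixed seed gives a reproducible split
    train, val, test = [], [], []
    for label in sorted(by_label):  # sorted for a deterministic iteration order
        items = by_label[label]
        rng.shuffle(items)
        n_test = round(len(items) * test_frac)
        n_val = round(len(items) * val_frac)
        test.extend(items[:n_test])
        val.extend(items[n_test:n_test + n_val])
        train.extend(items[n_test + n_val:])
    return train, val, test
```

If several photos show the same physical coin, grouping them so they all land in the same partition would be a stricter guard against leakage than splitting individual images.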

Approach & Architecture

The core idea is to use a pretrained Vision Transformer and fine-tune it on the Euro coin dataset. This leverages strong representations learned from large-scale image pretraining while adapting to the specific denominations.

Start from a pretrained ViT model (e.g. ViT-Base) available in the Hugging Face ecosystem.
Replace the final classification head with a new linear layer sized to the number of coin classes.
Fine-tune the model on the coin dataset using cross-entropy loss and a moderate learning rate.
Use standard regularization (weight decay, slight augmentation) to avoid overfitting to a relatively small dataset.
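A minimal sketch of these steps with Hugging Face Transformers, assuming `google/vit-base-patch16-224-in21k` as the ViT-Base starting point and the eight labels listed in the TL;DR. Passing `num_labels` to `from_pretrained` attaches a freshly initialized linear head of that size (`ignore_mismatched_sizes=True` also allows starting from a checkpoint whose head has a different number of classes). The names `build_model`, `make_optimizer` and `train_one_epoch` are hypothetical, and the learning rate and weight decay are placeholders rather than the project's actual hyperparameters.

```python
import torch
import torch.nn.functional as F
from transformers import ViTForImageClassification

# Label set from the denominations listed above; the ordering is an assumption.
LABELS = ["1c", "2c", "5c", "10c", "20c", "50c", "1€", "2€"]
ID2LABEL = dict(enumerate(LABELS))
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def build_model(checkpoint: str = "google/vit-base-patch16-224-in21k"):
    """Load a pretrained ViT backbone with a new len(LABELS)-way linear head."""
    return ViTForImageClassification.from_pretrained(
        checkpoint,
        num_labels=len(LABELS),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
        ignore_mismatched_sizes=True,  # replace any existing head of another size
    )


def make_optimizer(model, lr=5e-5, weight_decay=0.01):
    # AdamW applies decoupled weight decay; both values are placeholders.
    return torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)


def train_one_epoch(model, loader, optimizer, device="cpu"):
    """One pass over (pixel_values, labels) batches; returns mean cross-entropy."""
    model.train()
    total_loss, total = 0.0, 0
    for pixel_values, labels in loader:
        pixel_values, labels = pixel_values.to(device), labels.to(device)
        logits = model(pixel_values=pixel_values).logits
        loss = F.cross_entropy(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * labels.size(0)
        total += labels.size(0)
    return total_loss / total
```

For simplicity this applies weight decay to every parameter; a common refinement is to exclude biases and LayerNorm weights from decay.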

Results & Evaluation

The ViT-based classifier performs extremely well on the test split, with almost no misclassifications between denominations. The confusion matrix is close to diagonal, indicating clear separation between classes.
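The confusion matrix itself is straightforward to compute from predicted and true class indices. A small PyTorch sketch (function names are illustrative), with rows indexing the true denomination and columns the predicted one:

```python
import torch


def confusion_matrix(preds: torch.Tensor, targets: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Count matrix where entry [t, p] = number of samples of class t predicted as p."""
    # Encode each (true, predicted) pair as a single index, then count occurrences.
    flat = targets * num_classes + preds
    counts = torch.bincount(flat, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def accuracy(cm: torch.Tensor) -> float:
    # Correct predictions sit on the diagonal.
    return (cm.diagonal().sum() / cm.sum()).item()


def per_class_recall(cm: torch.Tensor) -> torch.Tensor:
    # Fraction of each true class predicted correctly; clamp avoids division by zero.
    return cm.diagonal().float() / cm.sum(dim=1).clamp(min=1)
```

Per-class recall is a useful companion to overall accuracy here: it shows whether the few remaining errors concentrate in particular denominations.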

High overall accuracy (close to 100% on the test set, depending on exact split).
Misclassifications mostly occur in borderline cases with poor lighting or partially occluded coins.
The model generalizes well to images not seen during training, including those uploaded by users in the demo.

These results suggest that a fine-tuned ViT is a strong choice for fine-grained visual categorization, even on a relatively small but well-structured dataset like Euro coins.

Implementation

Implemented in Python using PyTorch and Hugging Face Transformers.
Training loop handles dataset loading, augmentation, checkpointing, and metric logging.
Deployed with a simple Gradio or FastAPI-style interface on Hugging Face Spaces, allowing users to upload a coin image and get a prediction.
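If the Gradio option is used, the inference path can be as small as the sketch below. It assumes a fine-tuned `ViTForImageClassification` saved to a local directory (the `./euro-coin-vit` path in the usage note is hypothetical) and an evaluation transform like the one under Data & Inputs. `predict` returns a `{label: probability}` dictionary, which Gradio's `Label` component can display directly, and works with any model that returns either raw logits or an object with a `.logits` attribute.

```python
import torch


@torch.no_grad()
def predict(image, model, transform, id2label):
    """Classify one PIL image and return {label: probability}."""
    model.eval()
    # Force 3 channels (uploads may be RGBA or grayscale), then add a batch dim.
    pixel_values = transform(image.convert("RGB")).unsqueeze(0)
    output = model(pixel_values)
    # Hugging Face models return an output object; plain nn.Modules return logits.
    logits = getattr(output, "logits", output)
    probs = logits.softmax(dim=-1)[0]
    return {id2label[i]: float(p) for i, p in enumerate(probs)}


def launch_demo(checkpoint_dir):
    """Build and launch a Gradio upload-and-classify UI (hypothetical helper)."""
    import gradio as gr
    from torchvision import transforms
    from transformers import ViTForImageClassification

    model = ViTForImageClassification.from_pretrained(checkpoint_dir)
    eval_transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
    ])
    demo = gr.Interface(
        fn=lambda img: predict(img, model, eval_transform, model.config.id2label),
        inputs=gr.Image(type="pil"),
        outputs=gr.Label(num_top_classes=3),  # show the three most likely coins
        title="Euro Coin Classifier",
    )
    demo.launch()
```

Usage: `launch_demo("./euro-coin-vit")`. Keeping `predict` free of any UI code means the same function could sit behind a FastAPI endpoint instead.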

The code is organized so that changing the backbone (e.g. from ViT to a CNN) or adding new denominations is straightforward and mainly requires updating configuration and labels.

Challenges & Lessons Learned

The dataset is small compared to typical ViT pretraining regimes, so careful regularization and monitoring of validation metrics were important.
Minor changes in image preprocessing (cropping vs. padding, normalization choices) can noticeably affect performance.
A clean, simple UI helps reveal corner cases and failure modes that are not obvious from metrics alone.

Overall, this project was a good exercise in applying transformer-based vision models to a concrete, visual classification task and thinking about deployment beyond just a notebook.

Links

Live Demo on Hugging Face Spaces · Hugging Face Repository