Project Analysis
CIFAR-10 Image Classification with CNN & Transfer Learning
About Project
This project builds a robust image classification system on the CIFAR-10 dataset using a custom CNN and an ImageNet-pretrained ResNet50, and deploys the best model in a Django web application with real-time webcam prediction.
Problem Statement
🚨 Real-World Challenges in Image Classification
- Small input size: CIFAR-10 images are only 32×32, limiting detail and context
- Overfitting: Models often memorize training data rather than generalize
- Poor robustness: Models struggle with varied lighting, backgrounds, blur, or webcam noise
- Deployment hurdles: Real-time webcam predictions suffer from latency and performance issues, especially on low-resource hardware
Objective
🎯 Project Objective
To build a robust and deployable image classification system capable of performing well in real-world scenarios.
🛠️ Key Components
- Convolutional Neural Networks (CNNs): Custom or pre-trained models for baseline performance
- Data Augmentation: Improve generalization with transformations (e.g., flip, zoom, shift, blur)
- Transfer Learning: Utilize ResNet50 with fine-tuning for efficient feature extraction
- Real-Time Prediction: Capture and classify images from a webcam stream
- Deployment: Integrated into a web application using Django (optionally with FastAPI for backend inference)
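As a concrete starting point for the components above, here is a minimal sketch of loading and normalizing CIFAR-10 with tf.keras.datasets; the project's actual preprocessing may differ.

```python
import tensorflow as tf

# CIFAR-10: 50,000 training and 10,000 test images, 32x32 RGB, 10 classes
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Scale pixel values to [0, 1] for stable training
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

class_names = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]
```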
Proposed Solution
Model Training & Experiments
- Trained a custom CNN on the CIFAR-10 dataset
- Experimented with:
- Optimizers: SGD, RMSProp, Adam
- Batch Sizes: 16, 32, 64
- Dropout Rates: 0.1, 0.2, 0.3, 0.5
- Model Variants: Deeper vs Wider architectures
- Applied data augmentation (rotation, zoom, contrast, horizontal flipping)
- Implemented Transfer Learning with ResNet50:
  - Adjusted for CIFAR-10's 32×32 input size
  - Froze initial layers to retain pre-trained weights
Deployment
- Deployed the best model using a Django-based web interface
- Tested real-time webcam input to perform live image classification
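A rough sketch of the real-time prediction step: grab a frame with OpenCV, shrink it to the 32×32 input the model expects, and classify it. The saved-model path is hypothetical, and the Django request/response wiring around this is omitted.

```python
import cv2
import numpy as np
from tensorflow import keras

CLASS_NAMES = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]

model = keras.models.load_model("best_cnn.keras")  # hypothetical saved-model path

cap = cv2.VideoCapture(0)      # open the default webcam
ok, frame_bgr = cap.read()     # grab a single frame
cap.release()

if ok:
    # OpenCV returns BGR; convert to RGB and shrink to CIFAR-10's 32x32 size
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    small = cv2.resize(frame_rgb, (32, 32)).astype("float32") / 255.0
    probs = model.predict(small[np.newaxis, ...])[0]
    print("Prediction:", CLASS_NAMES[int(np.argmax(probs))])
```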
Technologies Used
- Python, TensorFlow/Keras: CNN model development and training
- Pandas, NumPy, Matplotlib: Data preprocessing and visualization
- ResNet50 (Transfer Learning): Used as an ImageNet-pretrained backbone
- Django: Web interface and deployment
- OpenCV: Real-time webcam image capture and handling
- Google Colab (T4 GPU): Model training and experimentation environment
Challenges Faced
- Slow Training: Each training run took over 589 seconds, especially with deeper CNN architectures
- Overfitting: Baseline CNN showed signs of overfitting despite using regularization and dropout
- Transfer Learning Issues: ResNet50 underperformed due to input size mismatch (32×32 vs. 224×224)
- Heavy Deployment: Django web app crashed under memory load during real-time webcam predictions
- Real-time Noise: Webcam inputs had noise, blur, and resolution issues, leading to poor classification results
- Compute Requirements: Fine-tuning ResNet50 required image resizing and high compute resources (see the resizing sketch below)
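To make the last point concrete: matching ResNet50's native 224×224 resolution multiplies the pixel count of every 32×32 CIFAR-10 image by roughly 49×. A hedged sketch of what that resizing looks like (the batch size is illustrative):

```python
import tensorflow as tf

(x_train, _), _ = tf.keras.datasets.cifar10.load_data()
batch = tf.convert_to_tensor(x_train[:64], dtype=tf.float32)   # (64, 32, 32, 3)

resized = tf.image.resize(batch, (224, 224))                   # (64, 224, 224, 3)

# 224*224 / (32*32) = 49x more pixels per image, so upscaling the full
# 50,000-image training set in memory is usually infeasible; resizing inside
# a tf.data pipeline or a Resizing layer is the common workaround.
print(batch.shape, "->", resized.shape)
```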
Methodology
🧪 CNN Model Testing (24 Combinations)
- Batch Sizes: 16, 32, 64
- Optimizers: SGD, RMSProp, Adam
- Dropout Rates: 0.1, 0.2, 0.3, 0.5
- Model Variants:
- Deeper: 3 convolutional layers
- Wider: 2 layers × 64 filters
- Best Configuration: RMSProp + Deeper model + Dropout 0.1
- Best Validation Accuracy: 56.86%
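The 24 runs are not listed individually; one reading that yields 24 is optimizers × dropout rates × model variants with the batch size held fixed, and the sketch below scripts that sweep under that assumption. build_model is a hypothetical helper (for example, the deeper-CNN builder sketched earlier plus a two-layer, 64-filter "wider" variant), and x_train / y_train are the arrays from the loading sketch.

```python
import itertools

optimizers    = ["sgd", "rmsprop", "adam"]
dropout_rates = [0.1, 0.2, 0.3, 0.5]
variants      = ["deeper", "wider"]   # 3 x 4 x 2 = 24 runs (batch size fixed here)

results = {}
for opt, rate, variant in itertools.product(optimizers, dropout_rates, variants):
    # build_model is a hypothetical factory returning a compiled Keras model
    model = build_model(variant=variant, dropout_rate=rate, optimizer=opt)
    history = model.fit(x_train, y_train, batch_size=32, epochs=15,
                        validation_split=0.1, verbose=0)
    results[(opt, rate, variant)] = max(history.history["val_accuracy"])

best_config = max(results, key=results.get)
print("Best:", best_config, "val_acc = %.4f" % results[best_config])
```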
Data Augmentation
- Applied: Horizontal flip, random rotation, zoom, contrast enhancement
- Outcome: Marginal overfitting reduction, but short-term accuracy dropped
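A hedged sketch of that augmentation pipeline with Keras preprocessing layers; the rotation, zoom, and contrast factors are illustrative, not the project's actual values.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Augmentation layers are active only when training=True; factors are illustrative
data_augmentation = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),    # up to roughly +/-36 degrees
    layers.RandomZoom(0.1),
    layers.RandomContrast(0.2),
])

# Typically placed right after the Input layer of the CNN, so it runs on the GPU
# and is switched off automatically at inference time
augmented_batch = data_augmentation(x_train[:8], training=True)  # x_train from the loading sketch
```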
Transfer Learning (ResNet50)
- Used include_top=False and added custom dense layers with GlobalAveragePooling
- Froze pretrained layers to preserve learned features
- Validation Accuracy: 23.48% (poor due to CIFAR's small input size)
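A minimal sketch consistent with that setup: ImageNet-pretrained ResNet50 with include_top=False on the native 32×32 inputs, a frozen base, GlobalAveragePooling, and a small dense head (the head size and optimizer are assumptions).

```python
from tensorflow import keras
from tensorflow.keras import layers

base = keras.applications.ResNet50(
    include_top=False,            # drop the ImageNet classification head
    weights="imagenet",
    input_shape=(32, 32, 3),      # CIFAR-10's native resolution (ResNet50's minimum)
)
base.trainable = False            # freeze pretrained layers to preserve learned features

inputs = keras.Input(shape=(32, 32, 3))
x = keras.applications.resnet50.preprocess_input(inputs)  # expects raw 0-255 pixel values
x = base(x, training=False)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(128, activation="relu")(x)               # head size is an assumption
outputs = layers.Dense(10, activation="softmax")(x)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```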
Validation Insights
- Performance order: Baseline CNN > Augmentation > Transfer Learning
- Deeper CNNs outperformed wider ones
- RMSProp was the most stable optimizer across runs
- Clear overfitting observed via divergence in training and validation curves
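The overfitting call above is read off the training-vs-validation curves; a minimal Matplotlib sketch of that plot, assuming history is the object returned by model.fit in one of the runs:

```python
import matplotlib.pyplot as plt

# 'history' is the keras.callbacks.History returned by model.fit(...)
plt.plot(history.history["accuracy"], label="train accuracy")
plt.plot(history.history["val_accuracy"], label="validation accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.title("Training vs. validation accuracy (divergence indicates overfitting)")
plt.legend()
plt.show()
```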
Result / Outcome
Configuration Comparison
| Configuration | Accuracy (Train / Validation) | Remarks |
|---|---|---|
| Best CNN (RMSProp, 3-layer) | 78.7% / 73.5% | Best overall performance |
| CNN + Augmentation | 65.0% / 54.8% | Slight overfitting, improved robustness |
| Transfer Learning (ResNet50) | 16.6% / 23.4% | Poor performance due to input size mismatch |
Model Insights
- Confusion observed between visually similar classes (e.g., cats vs dogs, trucks vs cars)
- Frogs and airplanes were classified with the least error
- Webcam input: Lower accuracy due to blur, lighting, and low resolution
- Deployment: Initial version built on Django; planned upgrade to Django + FastAPI
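The class-level confusions above are typically read from a confusion matrix on the test set; a hedged sketch using tf.math.confusion_matrix, where x_test, y_test, and model refer to the earlier sketches:

```python
import numpy as np
import tensorflow as tf

# Predict class indices for the held-out test set
probs = model.predict(x_test, verbose=0)
y_pred = np.argmax(probs, axis=1)

# Rows = true class, columns = predicted class; large off-diagonal entries
# (e.g. cat/dog, automobile/truck) mark the visually similar pairs noted above
cm = tf.math.confusion_matrix(y_test.flatten(), y_pred, num_classes=10)
print(cm.numpy())
```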