Project Analysis

CIFAR-10 Image Classification with CNN & Transfer Learning

About Project

This project builds a robust, deployable image classification system for the CIFAR-10 dataset: a custom CNN baseline, data augmentation, transfer learning with a fine-tuned ResNet50, and real-time webcam inference served through a Django web app.

Problem Statement

🚨 Real-World Challenges in Image Classification
  • Small input size: CIFAR-10 images are only 32×32, limiting detail and context
  • Overfitting: Models often memorize training data rather than generalize
  • Poor robustness: Struggle with varied lighting, backgrounds, blur, or webcam noise
  • Deployment hurdles: Real-time webcam predictions suffer from latency and performance issues, especially on low-resource hardware

Objective

🎯 Project Objective

To build a robust and deployable image classification system capable of performing well in real-world scenarios.

πŸ› οΈ Key Components
  • Convolutional Neural Networks (CNNs): Custom or pre-trained models for baseline performance
  • Data Augmentation: Improve generalization with transformations (e.g., flip, zoom, shift, blur)
  • Transfer Learning: Utilize ResNet50 with fine-tuning for efficient feature extraction
  • Real-Time Prediction: Capture and classify images from webcam stream
  • Deployment: Integrated into a web application using Django (optionally with FastAPI for backend inference)

Proposed Solution

🧠 Model Training & Experiments
  • Trained a custom CNN on the CIFAR-10 dataset
  • Experimented with:
    • Optimizers: SGD, RMSProp, Adam
    • Batch Sizes: 16, 32, 64
    • Dropout Rates: 0.1, 0.2, 0.3, 0.5
    • Model Variants: Deeper vs Wider architectures
  • Applied data augmentation (rotation, zoom, contrast, horizontal flipping)
  • Implemented Transfer Learning with ResNet50:
    • Adjusted for CIFAR-10's 32×32 input size
    • Froze initial layers to retain pre-trained weights
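
The custom CNN itself is not listed in this write-up; a minimal Keras sketch of the "deeper" (3-convolutional-layer) variant is shown below. The filter counts, dense width, and dropout placement are assumptions, since only the layer count and dropout rates are given above.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_deeper_cnn(dropout_rate=0.1, num_classes=10):
    """Deeper CNN variant: 3 conv blocks on 32x32 CIFAR-10 inputs.

    Filter counts (32/64/128) and the 128-unit dense head are illustrative.
    """
    model = models.Sequential([
        layers.Input(shape=(32, 32, 3)),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Dense(num_classes, activation="softmax"),
    ])
    # RMSProp was the best-performing optimizer in the experiments above
    model.compile(optimizer="rmsprop",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

The same builder can be reused for the "wider" variant by swapping the three blocks for two blocks of 64 filters each.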
Deployment
  • Deployed the best model using a Django-based web interface
  • Tested real-time webcam input to perform live image classification

Technologies Used

  • Python, TensorFlow/Keras: CNN model development and training
  • Pandas, NumPy, Matplotlib: Data preprocessing and visualization
  • ResNet50 (Transfer Learning): Used as an ImageNet-pretrained backbone
  • Django: Web interface and deployment
  • OpenCV: Real-time webcam image capture and handling
  • Google Colab (T4 GPU): Model training and experimentation environment

Challenges Faced

  • Slow Training: Each training run took over 589 seconds, especially with deeper CNN architectures
  • Overfitting: Baseline CNN showed signs of overfitting despite using regularization and dropout
  • Transfer Learning Issues: ResNet50 underperformed due to input size mismatch (32×32 vs. 224×224)
  • Heavy Deployment: Django web app crashed under memory load during real-time webcam predictions
  • Real-time Noise: Webcam inputs had noise, blur, and resolution issues, leading to poor classification results
  • Compute Requirements: Fine-tuning ResNet50 required image resizing and high compute resources
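
The resizing mentioned in the last bullet can be done on the fly; a minimal TensorFlow sketch is shown below, with bilinear upsampling as an assumption (the interpolation method is not stated above).

```python
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import preprocess_input

def prepare_for_resnet(images):
    """Upsample a CIFAR-10 batch from 32x32 to ResNet50's native 224x224,
    then apply ResNet50's own ImageNet preprocessing."""
    resized = tf.image.resize(images, (224, 224), method="bilinear")
    return preprocess_input(resized)
```

Holding a batch at 224×224 takes ~49× the memory of the 32×32 original, which is one reason fine-tuning at full resolution demanded heavy compute.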

Methodology

πŸ§ͺ CNN Model Testing (24 Combinations)
  • Batch Sizes: 16, 32, 64
  • Optimizers: SGD, RMSProp, Adam
  • Dropout Rates: 0.1, 0.2, 0.3, 0.5
  • Model Variants:
    • Deeper: 3 convolutional layers
    • Wider: 2 layers × 64 filters
  • Best Configuration: RMSProp + Deeper model + Dropout 0.1
  • Best Validation Accuracy: 56.86%
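
The full product of all four factors above exceeds 24, so presumably only a subset was run. One reading that yields exactly 24 runs is optimizers × dropout rates × model variants (3 × 4 × 2); a sketch of that grid, with the loop body left as a placeholder:

```python
from itertools import product

optimizers = ["sgd", "rmsprop", "adam"]
dropout_rates = [0.1, 0.2, 0.3, 0.5]
variants = ["deeper", "wider"]  # 3 conv layers vs. 2 layers x 64 filters

# One reading of the "24 combinations": optimizer x dropout x variant
grid = list(product(optimizers, dropout_rates, variants))

for opt, rate, variant in grid:
    pass  # build the variant, compile with opt, train, log validation accuracy
```

The best configuration reported above, (RMSProp, dropout 0.1, deeper), is one cell of this grid.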
πŸ”„ Data Augmentation
  • Applied: Horizontal flip, random rotation, zoom, contrast enhancement
  • Outcome: Marginal overfitting reduction, but short-term accuracy dropped
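
The augmentations listed above map directly onto Keras preprocessing layers; a minimal sketch follows, where the transformation strengths (0.1, 0.2) are assumptions not stated in the write-up.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Augmentation pipeline: active during training, identity at inference
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),   # factor values are illustrative
    layers.RandomZoom(0.1),
    layers.RandomContrast(0.2),
])
```

Placed as the first layers of a model (or applied to the `tf.data` pipeline), these transforms perturb each batch differently every epoch, which explains the short-term accuracy drop noted above.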
πŸ” Transfer Learning (ResNet50)
  • include_top=False, added custom dense layers with GlobalAveragePooling
  • Froze pretrained layers to preserve learned features
  • Validation Accuracy: 23.48% (poor due to CIFAR’s small input size)
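
The three bullets above can be sketched as follows; the head sizes (128-unit dense layer, 0.3 dropout) and the Adam optimizer are assumptions, while `include_top=False`, `GlobalAveragePooling`, the frozen backbone, and the 32×32 input come from the write-up.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

def build_transfer_model(weights="imagenet", num_classes=10):
    """ResNet50 backbone (include_top=False) with a small classification head."""
    base = ResNet50(include_top=False, weights=weights, input_shape=(32, 32, 3))
    base.trainable = False  # freeze pretrained layers to preserve learned features
    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(128, activation="relu"),  # head width is illustrative
        layers.Dropout(0.3),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Note that 32×32 is the minimum input size Keras accepts for ResNet50; after five stride-2 stages the feature map is only 1×1, which helps explain the poor 23.48% validation accuracy reported above.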
Validation Insights
  • Performance order: Baseline CNN > Augmentation > Transfer Learning
  • Deeper CNNs outperformed wider ones
  • RMSProp was the most stable optimizer across runs
  • Clear overfitting observed via divergence in training and validation curves
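
The divergence mentioned in the last bullet is easiest to see by plotting both curves from the Keras `History` dict; a minimal sketch (the function name and output path are illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, suitable for Colab/servers
import matplotlib.pyplot as plt

def plot_learning_curves(history, out_path="curves.png"):
    """Plot training vs. validation accuracy; a widening gap signals overfitting."""
    fig, ax = plt.subplots()
    ax.plot(history["accuracy"], label="train")
    ax.plot(history["val_accuracy"], label="validation")
    ax.set_xlabel("epoch")
    ax.set_ylabel("accuracy")
    ax.legend()
    fig.savefig(out_path)
    plt.close(fig)
```

Pass `model.fit(...).history` directly; when the train curve keeps climbing while the validation curve plateaus or falls, the model is memorizing rather than generalizing.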

Result / Outcome

πŸ“Š Configuration Comparison
| Configuration | Accuracy (Train / Validation) | Remarks |
|---|---|---|
| Best CNN (RMSProp, 3-layer) | 78.7% / 73.5% | Best overall performance |
| CNN + Augmentation | 65.0% / 54.8% | Slight overfitting, improved robustness |
| Transfer Learning (ResNet50) | 16.6% / 23.4% | Poor performance due to input size mismatch |
Model Insights
  • Confusion observed between visually similar classes (e.g., cats vs dogs, trucks vs cars)
  • Frogs and airplanes were classified with the least error
  • Webcam input: Lower accuracy due to blur, lighting, and low resolution
  • Deployment: Initial version built on Django; planned upgrade to Django + FastAPI
