Projects

My Portfolio

Spam Detection Application

FIFA 19 Player Value Prediction & Recruitment Strategy

Spotify

HR Salary Prediction

Predict Iris

AI-Enhanced Technical Indicator-Based Stock Trading Strategy

YOLOv8-Based Object Detection & Segmentation for Trash

GAN Model for MNIST Digit Generation

Stable Diffusion-Based Image Generation and Prompt Engineering

CIFAR-10 Image Classification with CNN & Transfer Learning

Recommender System for Amazon Beauty Products

Personal Assistant Chatbot – Rule-Based + NLP Hybrid

Description

Generating realistic handwritten digit images requires a model capable of learning complex data distributions from limited training samples. Traditional image generation methods often lack diversity and tend to overfit the dataset.

Project Challenge

The key challenge is to train a GAN (Generative Adversarial Network) that can:

Generate sharp and visually diverse handwritten digits
Maintain training stability throughout epochs
Avoid common pitfalls such as mode collapse and vanishing gradients

Objective

To implement and train a Generative Adversarial Network (GAN) on the MNIST dataset that learns to generate synthetic 28×28 grayscale handwritten digit images.

Project Goals

Monitor training progression using generated image samples and loss plots
Perform hyperparameter tuning to identify optimal training configurations
Justify all architectural and design decisions for both generator and discriminator
Explore advanced GAN variants such as:
- DCGAN – Deep Convolutional GAN
- WGAN – Wasserstein GAN for better stability
- StyleGAN – For high-quality and stylized image synthesis

Solution

Project Components

Generator: Upsamples random noise into 28×28 grayscale images using Dense and Conv2DTranspose layers for feature expansion and image shaping.
Discriminator: Classifies real vs. fake images using Conv2D layers with Dropout for regularization and overfitting control.
Custom GAN Class: Encapsulates the full training loop with a custom train_step() method for adversarial updates of both generator and discriminator.
LossPlotCallback: Custom Keras callback to visualize and save generated image samples and loss trends after each epoch, aiding in training diagnostics.

Technologies Used

Tools & Technologies

Frameworks & Libraries

TensorFlow / Keras – For building and training GAN models
Matplotlib / PIL – For saving generated images and plotting loss curves

Dataset

MNIST – Grayscale handwritten digit dataset (28×28)

Training Configuration

Adam Optimizer – With learning rate and β₁ tuning for stability
Binary Cross-Entropy – Used as the loss function for both generator and discriminator
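
To make the role of β₁ concrete, here is a framework-free sketch of a single Adam update for one scalar parameter. The defaults `lr=0.0002` and `beta1=0.5` are values this project tested; `beta2`, `eps`, and the helper names `adam_step` and `momentum_after` are assumptions made for illustration, not the project's code.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.0002, beta1=0.5, beta2=0.999, eps=1e-7):
    """One Adam update for a single scalar parameter; returns (param, m, v)."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment (momentum) estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias correction for step t (1-based)
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

def momentum_after(grads, beta1):
    """Run Adam over a gradient sequence and return the final momentum term."""
    param, m, v = 0.0, 0.0, 0.0
    for t, g in enumerate(grads, start=1):
        param, m, v = adam_step(param, g, m, v, t, beta1=beta1)
    return m

# Gradient flips sign on the last step, as often happens in adversarial training.
grads = [1.0, 1.0, 1.0, -1.0]
low_beta = momentum_after(grads, beta1=0.5)   # negative: follows the flip immediately
high_beta = momentum_after(grads, beta1=0.9)  # still positive: old gradients dominate
```

A lower β₁ lets the update direction react faster when the opposing network changes the gradient, which is one common explanation for why GAN training tends to use values well below the usual 0.9.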

Development Environment

Jupyter Notebook / Python – For implementation and experimentation

Challenges Faced

Training Challenges & Observations

Generator instability observed at higher epochs, requiring monitoring and early stopping
Mode collapse risk was mitigated using dropout in the discriminator and label smoothing (0.9 for real labels)
Spiking generator loss indicated increasing difficulty in fooling a strong discriminator
Small image resolution (28×28) limited the expressive power of the generator
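
The loss terms behind these observations can be sketched in plain Python. This is an illustrative version, assuming the discriminator loss is the sum of its real and fake terms and that the generator uses the usual "label fakes as real" objective; the function names are hypothetical, and only the 0.9 real-label value comes from the project.

```python
import math

def bce(targets, probs, eps=1e-7):
    """Mean binary cross-entropy between target labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(targets)

def discriminator_loss(real_probs, fake_probs, real_label=0.9):
    """Real images target the smoothed label 0.9; generated images target 0."""
    real_loss = bce([real_label] * len(real_probs), real_probs)
    fake_loss = bce([0.0] * len(fake_probs), fake_probs)
    return real_loss + fake_loss  # summing the two terms is an assumption

def generator_loss(fake_probs):
    """The generator is rewarded when the discriminator scores its fakes as real."""
    return bce([1.0] * len(fake_probs), fake_probs)

# With a smoothed target, an overconfident "real" score costs more than a calibrated one.
calibrated = bce([0.9], [0.9])        # about 0.33
overconfident = bce([0.9], [0.999])   # about 0.69

# A stronger discriminator (lower score on fakes) drives the generator loss up.
weak_d = generator_loss([0.5])
strong_d = generator_loss([0.1])
```

The last two values show why a spiking generator loss is read as a sign of a strong discriminator: the lower the score it assigns to generated images, the larger the generator's loss.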

Methodology

Generator Architecture

Input: 100-dimensional latent vector
Dense layer: Output shape 7×7×256
Conv2DTranspose layers:
- 128 filters → 7×7
- 64 filters → 14×14
- 1 filter → 28×28 (final output)
Activations: LeakyReLU for intermediate layers, Tanh for output
Output: 28×28 grayscale image
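
The spatial sizes listed above can be checked with a short shape trace. This sketch assumes `same` padding, where a transposed convolution multiplies the spatial size by its stride, and assumes strides of 1, 2, and 2; the strides are inferred from the listed output sizes, not stated in the write-up.

```python
def conv_transpose_same(size, stride):
    """Spatial output size of a transposed convolution with 'same' padding."""
    return size * stride

def generator_shapes(latent_dim=100):
    """Return (layer, output shape) pairs for the generator described above."""
    shapes = [("input", (latent_dim,))]
    h = w = 7
    shapes.append(("dense+reshape", (h, w, 256)))
    # (filters, stride) per Conv2DTranspose layer; the strides are assumed.
    for filters, stride in [(128, 1), (64, 2), (1, 2)]:
        h = conv_transpose_same(h, stride)
        w = conv_transpose_same(w, stride)
        shapes.append((f"conv2d_transpose({filters})", (h, w, filters)))
    return shapes

SHAPES = generator_shapes()
DENSE_UNITS = 7 * 7 * 256  # 12544 units produced by the Dense layer before reshaping

for layer, shape in SHAPES:
    print(f"{layer:24s} {shape}")
```

Starting from 7×7 and doubling twice is what makes the output land exactly on the 28×28 MNIST resolution.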

Discriminator Architecture

Input: 28×28×1 image
Conv2D layers:
- 64 filters + Dropout
- 128 filters + Dropout
Output: Flatten → Dense → Sigmoid (probability real/fake)
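
As a rough check on the discriminator, the sketch below traces the feature-map size through the two Conv2D layers and applies the final sigmoid. It assumes `same` padding and a stride of 2 in both layers, neither of which is stated in the write-up; the helper names are hypothetical.

```python
import math

def conv_same(size, stride):
    """Spatial output size of a convolution with 'same' padding."""
    return math.ceil(size / stride)

def discriminator_sizes(size=28, strides=(2, 2)):
    """Spatial size at the input and after each Conv2D layer (strides are assumed)."""
    sizes = [size]
    for stride in strides:
        sizes.append(conv_same(sizes[-1], stride))
    return sizes

def sigmoid(logit):
    """Map the final Dense output to a probability that the image is real."""
    return 1.0 / (1.0 + math.exp(-logit))

sizes = discriminator_sizes()          # [28, 14, 7]
flatten_units = sizes[-1] ** 2 * 128   # 7 * 7 * 128 = 6272 inputs to the Dense layer
```

Under these assumptions the discriminator mirrors the generator: 28×28 is halved twice to 7×7 before being flattened and scored.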

GAN Training Loop

For each batch:

Description

🧩 Problem Statement

Generating high-quality, realistic, and stylistically accurate images using text prompts is a complex task. Models like Stable Diffusion can produce widely varied outputs depending on multiple generation parameters.

Key Factors Influencing Output

Prompt wording: Subtle changes can drastically affect composition and detail
Negative prompts: Suppress unwanted elements in the output
Scheduler selection: Determines how noise is removed at each denoising step (e.g., DDIM, Euler)
CFG (classifier-free guidance) scale: Controls the strength of prompt adherence (higher = more literal)
Inference steps: Affect image quality, detail, and residual noise

Understanding how each parameter impacts the output is critical for controlling quality, style, and realism in AI art and computer vision applications.

Objective

To explore and evaluate how prompt design, negative prompts, diffusion schedulers, CFG scale, and inference steps impact the output quality of a Stable Diffusion model.

🔍 Goal

Identify optimal parameter combinations that strike a balance between:

Realism – Photographic quality and coherence of the output
Creativity – Diversity and uniqueness of generated content
Style Control – Consistent aesthetic output aligned with artistic intent

Solution

🧪 Experimental Breakdown

This project was executed in five experimental parts (A–E), each focusing on a specific parameter of the Stable Diffusion model.

✅ Part A: Negative Prompt Experiment

Main Prompt: "A cozy cottage in the forest at sunset, highly detailed"
Negative Prompt: "daylight, bright, sunny, day time"
Insight: Negative prompt effectively transformed the output into a night-time setting by suppressing bright/daytime features.

✅ Part B: Scheduler Comparison

Schedulers Tested:
- DPMSolverMultistepScheduler – Clearer, sharper, photorealistic
- EulerAncestralDiscreteScheduler – Softer, more artistic but less sharp

✅ Part C: CFG Scale Tuning

CFG Values Tested: 1.0, 3.0, 7.5, 12.0, 15.0, 19.0, 25.0
Observations:
- Low CFG → More artistic, vague
- Mid-range CFG (7.5–12.0) → Best realism and fidelity
- High CFG → Overly literal, rigid output with unwanted artifacts
Conclusion: Optimal CFG range: 7.5–12.0
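
The behavior across CFG values follows from how classifier-free guidance combines two noise predictions at each denoising step. The sketch below shows that blend on plain Python lists; the function name `apply_cfg` and the toy vectors are illustrative assumptions, while the formula itself is the standard guidance rule.

```python
def apply_cfg(noise_uncond, noise_text, guidance_scale):
    """Blend unconditional and text-conditioned noise predictions.

    guidance_scale = 1.0 returns the text-conditioned prediction unchanged;
    larger values push further along the direction the prompt suggests.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_text)]

uncond = [0.0, 1.0]   # toy prediction without the prompt
text = [1.0, 1.0]     # toy prediction with the prompt

plain = apply_cfg(uncond, text, 1.0)    # [1.0, 1.0]
guided = apply_cfg(uncond, text, 7.5)   # [7.5, 1.0]
```

Only the components where the two predictions disagree are amplified, which is why very high scales exaggerate prompt-related features and eventually introduce artifacts.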

✅ Part D: Inference Step Variation

Steps Tested: 10, 40, 80
Findings:
- 10 steps → Fast but blurry with many artifacts
- 40 steps → Best balance between quality and speed
- 80 steps → Slight improvement, but not time-efficient
Conclusion: 40 steps is the sweet spot

✅ Part E: Prompt Engineering + Optimization

Prompt: “A futuristic glass building glowing at night, cinematic lighting, ultra-realistic”
Parameter Combos Tested:
- CFG=7.5, Steps=30 → Acceptable but low clarity
- CFG=12.0, Steps=40 → Best result: realistic, clean, glowing effects
- CFG=15.0, Steps=60 → Over-sharpened with lighting artifacts

Technologies Used

Model & Libraries

Stable Diffusion v1.5 – Pretrained text-to-image generation model
Diffusers Library (Hugging Face) – Simplified pipeline for loading models and schedulers

Schedulers Tested

DPMSolverMultistepScheduler
EulerAncestralDiscreteScheduler

Hardware

CUDA GPU – Used for efficient, fast image generation

Environment

Python – Core programming language
Jupyter Notebook – Interactive development and visualization

Challenges Faced

CFG Balance: Too low leads to vague, unfocused results; too high produces overly literal output with rigid textures
Inference Time: Significantly longer at higher steps (e.g., 80+), affecting real-time usability
Scheduler Tradeoff: DPMSolver offers realism and sharpness; EulerA provides softness with artistic style — choice depends on use case
Manual Prompt Tuning: Required for each new prompt to find optimal configuration (no one-size-fits-all)
GPU Memory Management: Generating multiple images concurrently can exhaust GPU resources; requires careful batching or scaling

Methodology

🧪 Pipeline Setup

Used StableDiffusionPipeline from Hugging Face diffusers
Deployed on GPU for fast inference
Schedulers were set dynamically based on experiment
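
A minimal sketch of this setup with the `diffusers` library might look like the following. The helper names, the float16 setting, and the default values are assumptions; `model_id` is left as a required argument because the write-up does not name the exact checkpoint repository. The heavy imports sit inside `load_pipeline` so the prompt helper can be used without a GPU stack installed.

```python
SCHEDULER_NAMES = ("DPMSolverMultistepScheduler", "EulerAncestralDiscreteScheduler")

def load_pipeline(model_id, scheduler_name, device="cuda"):
    """Load a Stable Diffusion pipeline and swap in the requested scheduler."""
    import torch
    import diffusers
    from diffusers import StableDiffusionPipeline

    if scheduler_name not in SCHEDULER_NAMES:
        raise ValueError(f"unknown scheduler: {scheduler_name}")
    # float16 weights are an assumption, chosen to reduce GPU memory use.
    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    scheduler_cls = getattr(diffusers, scheduler_name)
    # Reuse the existing scheduler config so only the sampling algorithm changes.
    pipe.scheduler = scheduler_cls.from_config(pipe.scheduler.config)
    return pipe.to(device)

def generation_kwargs(prompt, negative_prompt=None, guidance_scale=7.5, steps=40):
    """Collect the call arguments for one image generation."""
    kwargs = {
        "prompt": prompt,
        "guidance_scale": guidance_scale,
        "num_inference_steps": steps,
    }
    if negative_prompt:
        kwargs["negative_prompt"] = negative_prompt
    return kwargs

def generate(pipe, **kwargs):
    """Run the pipeline once and return the first generated PIL image."""
    return pipe(**kwargs).images[0]

# Part A settings expressed as call arguments.
part_a = generation_kwargs(
    "A cozy cottage in the forest at sunset, highly detailed",
    negative_prompt="daylight, bright, sunny, day time",
)
```

With a GPU available, `generate(load_pipeline(model_id, "DPMSolverMultistepScheduler"), **part_a)` would produce one image for the Part A configuration.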

Prompt Control

Tested base prompts, negative prompts, and parameter combinations

Parameter Sweeps

Generated and saved images by varying one key parameter per experiment
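
One way to organize such a sweep is to start from a fixed baseline and change a single parameter per run. In this sketch the swept values are the ones tested in Parts C and D; the baseline values, the helper names, and the output file naming are assumptions.

```python
BASELINE = {"guidance_scale": 7.5, "num_inference_steps": 40}  # assumed baseline

SWEEPS = {
    "guidance_scale": [1.0, 3.0, 7.5, 12.0, 15.0, 19.0, 25.0],
    "num_inference_steps": [10, 40, 80],
}

def one_at_a_time(baseline, sweeps):
    """Yield (parameter, config) pairs that change one parameter from the baseline."""
    for name, values in sweeps.items():
        for value in values:
            config = dict(baseline)  # copy so the baseline is never mutated
            config[name] = value
            yield name, config

def output_name(name, config):
    """Hypothetical file name that records which parameter and value were used."""
    return f"{name}_{config[name]}.png"

runs = list(one_at_a_time(BASELINE, SWEEPS))
for name, config in runs:
    print(output_name(name, config), config)
```

Keeping every other setting at the baseline makes it possible to attribute a visual difference to the one parameter that changed.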

Analysis Criteria

Visual sharpness
Prompt accuracy
Artifact presence
Realism vs artistic expression

Experiment Results

Experiment	Best Configuration	Insight/Output
Negative Prompting	Added: “daylight, sunny…”	Shifted sunset to night effectively
Scheduler Comparison	DPMSolver	Clearer and more realistic than EulerA
CFG Scale	7.5 – 12.0	Best balance between accuracy and style
Inference Steps	40	Fast + high-quality rendering
Prompt Optimization	CFG=12.0, Steps=40	Best result for futuristic architecture scene

Conclusion

Prompt design and tuning of CFG scale & inference steps are critical to realistic generation
Best Configuration:

Scheduler: DPMSolverMultistepScheduler
CFG Scale: 12.0
Inference Steps: 40

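Negative prompts and scheduler selection further shape the result: negative prompts suppress unwanted elements, and the scheduler trades sharpness against a softer, more artistic look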

Result

📷 Visual Output

Epoch 1: Random noise
Epoch 10: Shapes of digits begin to appear
Epoch 30: Recognizable, sharp digits generated

📉 Loss Behavior

Epoch	d_loss (↓)	g_loss (↑)
1	0.45	0.44
10	0.65	0.91
30	0.64	0.95

Discriminator loss remained stable
Generator loss increased as expected (indicates adversarial progress)

🔁 Hyperparameter Tuning

Learning rates tested: 0.0002, 0.0001
Adam β₁ values: 0.5, 0.4
Latent dimensions: 100, 128

Best Configuration:

lr = 0.0002
β₁ = 0.4
latent_dim = 100
Generator loss: 0.80
Discriminator loss: 0.63