Sunil M Anirudhan

Project Analysis

Stable Diffusion-Based Image Generation and Prompt Engineering

About Project

🧩 Problem Statement

Generating high-quality, realistic, and stylistically accurate images using text prompts is a complex task. Models like Stable Diffusion can produce widely varied outputs depending on multiple generation parameters.

Key Factors Influencing Output

Prompt wording: Subtle changes can drastically affect composition and detail
Negative prompts: Used to suppress unwanted elements
Scheduler selection: Determines how noise is removed at each denoising step (e.g., DDIM, Euler), which affects sharpness and style
CFG Scale: Classifier-free guidance scale; controls the strength of prompt adherence (higher = more literal)
Inference steps: Affect image quality, detail, and noise reduction

Understanding how each parameter impacts the output is critical for controlling quality, style, and realism in AI art and computer vision applications.
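As a rough sketch of how these factors map onto a text-to-image call, the snippet below collects them in one place. The keyword names (prompt, negative_prompt, guidance_scale, num_inference_steps) follow the call signature of StableDiffusionPipeline in Hugging Face diffusers. The GenerationSettings class and its default values are illustrative assumptions, not taken from the project code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical container for the generation parameters listed above."""
    prompt: str
    negative_prompt: Optional[str] = None   # elements to suppress
    guidance_scale: float = 7.5             # CFG scale
    num_inference_steps: int = 40           # number of denoising steps

    def to_pipeline_kwargs(self) -> dict:
        """Build keyword arguments for a diffusers pipeline call."""
        kwargs = {
            "prompt": self.prompt,
            "guidance_scale": self.guidance_scale,
            "num_inference_steps": self.num_inference_steps,
        }
        # Only pass a negative prompt when one was actually provided.
        if self.negative_prompt:
            kwargs["negative_prompt"] = self.negative_prompt
        return kwargs


settings = GenerationSettings(
    prompt="A cozy cottage in the forest at sunset, highly detailed",
    negative_prompt="daylight, bright, sunny, day time",
)
print(settings.to_pipeline_kwargs())
```

The scheduler is not part of these keyword arguments because in diffusers it is an attribute of the pipeline object rather than a per-call option.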

Objective

To explore and evaluate how prompt design, negative prompts, diffusion schedulers, CFG scale, and inference steps impact the output quality of a Stable Diffusion model.

🔍 Goal

Identify optimal parameter combinations that strike a balance between:

Realism – Photographic quality and coherence of the output
Creativity – Diversity and uniqueness of generated content
Style Control – Consistent aesthetic output aligned with artistic intent

Proposed Solution

🧪 Experimental Breakdown

This project was executed in five experimental parts (A to E), each evaluating a specific parameter or parameter combination of the Stable Diffusion model.

✅ Part A: Negative Prompt Experiment

Main Prompt: "A cozy cottage in the forest at sunset, highly detailed"
Negative Prompt: "daylight, bright, sunny, day time"
Insight: The negative prompt shifted the scene from sunset toward a night-time setting by suppressing bright, daytime features.
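A comparison like this one can be sketched as a small helper that renders the same prompt with and without the negative prompt. This is a minimal sketch, not the project's code: compare_negative_prompt and make_generator are hypothetical names, and pipe is assumed to behave like a diffusers StableDiffusionPipeline (callable with prompt and negative_prompt, returning an object with an images list).

```python
def compare_negative_prompt(pipe, prompt, negative_prompt, make_generator=None):
    """Generate one image without and one with the negative prompt.

    `pipe` is any callable with the diffusers pipeline interface.
    `make_generator` (optional) returns a fresh seeded generator for each
    call, so both images start from the same initial noise.
    """
    def run(negative):
        kwargs = {"prompt": prompt, "negative_prompt": negative}
        if make_generator is not None:
            kwargs["generator"] = make_generator()
        return pipe(**kwargs).images[0]

    baseline = run(None)                # no suppression
    suppressed = run(negative_prompt)   # unwanted elements suppressed
    return baseline, suppressed
```

With a real pipeline on GPU, passing make_generator=lambda: torch.Generator("cuda").manual_seed(0) would keep the starting noise identical, so any difference between the two images comes from the negative prompt alone.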

✅ Part B: Scheduler Comparison

Schedulers Tested:
- DPMSolverMultistepScheduler – Clearer, sharper, photorealistic
- EulerAncestralDiscreteScheduler – Softer, more artistic but less sharp
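Switching between the two schedulers can be sketched as follows. The from_config classmethod and the pipe.scheduler.config attribute are part of the diffusers scheduler API; the swap_scheduler helper itself is a hypothetical convenience wrapper.

```python
def swap_scheduler(pipe, scheduler_cls):
    """Replace pipe.scheduler with scheduler_cls, reusing the current config."""
    pipe.scheduler = scheduler_cls.from_config(pipe.scheduler.config)
    return pipe


# Usage with a loaded pipeline (requires diffusers; not executed here):
# from diffusers import (
#     DPMSolverMultistepScheduler,
#     EulerAncestralDiscreteScheduler,
# )
# swap_scheduler(pipe, DPMSolverMultistepScheduler)
# swap_scheduler(pipe, EulerAncestralDiscreteScheduler)
```

Reusing the existing config keeps the noise schedule settings of the checkpoint, so only the sampling algorithm changes between runs.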

✅ Part C: CFG Scale Tuning

CFG Values Tested: 1.0, 3.0, 7.5, 12.0, 15.0, 19.0, 25.0
Observations:
- Low CFG → More artistic, vague
- Mid-range CFG (7.5–12.0) → Best realism and fidelity
- High CFG → Over-constrained, rigid output with unwanted artifacts
Conclusion: Optimal CFG range: 7.5–12.0
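To make the effect of the scale concrete, the standard classifier-free guidance combination can be written out on toy numbers. The function name apply_cfg and the two small vectors are made up for illustration; in a real model the inputs would be full noise-prediction tensors.

```python
def apply_cfg(noise_uncond, noise_cond, guidance_scale):
    """Combine unconditional and prompt-conditioned noise predictions.

    guided = uncond + guidance_scale * (cond - uncond)
    """
    return [
        u + guidance_scale * (c - u)
        for u, c in zip(noise_uncond, noise_cond)
    ]


uncond = [0.0, 1.0]   # toy prediction without the prompt
cond = [1.0, 3.0]     # toy prediction with the prompt
for scale in (1.0, 7.5, 25.0):
    print(scale, apply_cfg(uncond, cond, scale))
```

At a scale of 1.0 the result equals the prompt-conditioned prediction. Larger scales extrapolate further along the difference between the two predictions, which is one plausible reading of why very high values produce rigid, artifact-prone images. diffusers appears to apply guidance only when guidance_scale is above 1, so the 1.0 setting likely also leaves the negative prompt without effect.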

✅ Part D: Inference Step Variation

Steps Tested: 10, 40, 80
Findings:
- 10 steps → Fast but blurry with many artifacts
- 40 steps → Best balance between quality and speed
- 80 steps → Slight improvement, but not time-efficient
Conclusion: 40 steps is the sweet spot
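The quality versus speed trade-off can be measured with a simple timing loop. This is a sketch under assumptions: time_step_sweep is a hypothetical helper, and pipe is assumed to follow the diffusers pipeline interface.

```python
import time


def time_step_sweep(pipe, prompt, step_counts=(10, 40, 80)):
    """Run the pipeline once per step count and record wall-clock time."""
    results = []
    for steps in step_counts:
        start = time.perf_counter()
        image = pipe(prompt=prompt, num_inference_steps=steps).images[0]
        elapsed = time.perf_counter() - start
        results.append({"steps": steps, "seconds": elapsed, "image": image})
    return results
```

The first call on a GPU often includes one-off warm-up cost, so running a short throwaway generation before the sweep gives fairer timings.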

✅ Part E: Prompt Engineering + Optimization

Prompt: “A futuristic glass building glowing at night, cinematic lighting, ultra-realistic”
Parameter Combos Tested:
- CFG=7.5, Steps=30 → Acceptable but low clarity
- CFG=12.0, Steps=40 → Best result: realistic, clean, glowing effects
- CFG=15.0, Steps=60 → Over-sharpened with lighting artifacts

Technologies Used

Model & Libraries

Stable Diffusion v1.5 – Pretrained text-to-image generation model
Diffusers Library (Hugging Face) – Simplified pipeline for loading models and schedulers

Schedulers Tested

DPMSolverMultistepScheduler
EulerAncestralDiscreteScheduler

Hardware

CUDA GPU – Used for efficient, fast image generation

Environment

Python – Core programming language
Jupyter Notebook – Interactive development and visualization

Challenges Faced

CFG Balance: Too low leads to vague, unfocused results; too high produces over-constrained, rigid textures
Inference Time: Significantly longer at higher steps (e.g., 80+), affecting real-time usability
Scheduler Tradeoff: DPMSolver offers realism and sharpness; EulerA provides softness with artistic style — choice depends on use case
Manual Prompt Tuning: Required for each new prompt to find optimal configuration (no one-size-fits-all)
GPU Memory Management: Generating multiple images concurrently can exhaust GPU resources; requires careful batching or scaling
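One way to handle the memory issue is to split the prompts into small batches. The sketch below assumes that pipe accepts a list of prompts and returns one image per prompt, as diffusers pipelines do; chunked and generate_in_batches are hypothetical helper names.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def generate_in_batches(pipe, prompts, batch_size=2, **pipe_kwargs):
    """Generate images for all prompts without sending them in one call."""
    images = []
    for batch in chunked(prompts, batch_size):
        # Each call holds only `batch_size` images in GPU memory at a time.
        images.extend(pipe(prompt=batch, **pipe_kwargs).images)
    return images
```

diffusers pipelines also expose enable_attention_slicing(), which trades some speed for lower peak memory and may allow a larger batch size.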

Methodology

🧪 Pipeline Setup

Used StableDiffusionPipeline from Hugging Face diffusers
Deployed on GPU for fast inference
Schedulers were set dynamically based on experiment
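A minimal setup along these lines might look like the sketch below. StableDiffusionPipeline.from_pretrained and .to(device) are real diffusers calls; the helper names, the half-precision choice on CUDA, and the placeholder model id are assumptions rather than details from the project.

```python
def dtype_name_for(device):
    """Half precision on CUDA, full precision elsewhere (a common choice)."""
    return "float16" if str(device).startswith("cuda") else "float32"


def load_pipeline(model_id, device="cuda"):
    """Load a Stable Diffusion pipeline and move it to `device`.

    Imports are local so the helpers above can be used without
    torch or diffusers installed.
    """
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        model_id,
        torch_dtype=getattr(torch, dtype_name_for(device)),
    )
    return pipe.to(device)


# Example call (placeholder id; substitute the v1.5 checkpoint actually used):
# pipe = load_pipeline("<stable-diffusion-v1.5-model-id>")
```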

Prompt Control

Tested base prompts, negative prompts, and parameter combinations

Parameter Sweeps

Generated and saved images by varying one key parameter per experiment
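A one-parameter-at-a-time sweep can be sketched as below. The helper name sweep_one_parameter, the file naming scheme, and the output directory are hypothetical; pipe is assumed to return PIL images, which provide a save method.

```python
from pathlib import Path


def sweep_one_parameter(pipe, base_kwargs, name, values, out_dir):
    """Vary a single pipeline argument and save one image per value."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for value in values:
        kwargs = {**base_kwargs, name: value}   # everything else stays fixed
        image = pipe(**kwargs).images[0]
        path = out_dir / f"{name}_{value}.png"
        image.save(path)
        saved.append(path)
    return saved
```

For example, sweep_one_parameter(pipe, {"prompt": "...", "num_inference_steps": 40}, "guidance_scale", [1.0, 7.5, 12.0], "outputs/cfg") would write one file per CFG value. To hold the starting noise constant as well, a freshly seeded generator would need to be created for each call, since a torch generator advances its state when used.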

Analysis Criteria

Visual sharpness
Prompt accuracy
Artifact presence
Realism vs artistic expression

Experiment Results

Experiment	Best Configuration	Insight/Output
Negative Prompting	Added: “daylight, sunny…”	Shifted sunset to night effectively
Scheduler Comparison	DPMSolver	Clearer and more realistic than EulerA
CFG Scale	7.5 – 12.0	Best balance between accuracy and style
Inference Steps	40	Fast + high-quality rendering
Prompt Optimization	CFG=12.0, Steps=40	Best result for futuristic architecture scene

Conclusion

Prompt design and tuning of CFG scale & inference steps are critical to realistic generation
Best Configuration:

Scheduler: DPMSolverMultistepScheduler
CFG Scale: 12.0
Inference Steps: 40

Negative prompts and scheduler selection give further control over scene content and visual style

Result / Outcome

📷 Visual Output

- Epoch 1: Random noise
- Epoch 10: Shapes of digits begin to appear
- Epoch 30: Recognizable, sharp digits generated

📉 Loss Behavior

Epoch	d_loss (↓)	g_loss (↑)
1	0.45	0.44
10	0.65	0.91
30	0.64	0.95

- Discriminator loss remained stable
- Generator loss increased as expected (indicates adversarial progress)

🔁 Hyperparameter Tuning

- Learning rates tested: 0.0002, 0.0001
- Adam β₁ values: 0.5, 0.4
- Latent dimensions: 100, 128

Best Configuration:
- lr = 0.0002
- β₁ = 0.4
- latent_dim = 100
- Generator loss: 0.80
- Discriminator loss: 0.63
