Project Analysis

Stable Diffusion-Based Image Generation and Prompt Engineering

About Project

🧩 Problem Statement

Generating high-quality, realistic, and stylistically accurate images using text prompts is a complex task. Models like Stable Diffusion can produce widely varied outputs depending on multiple generation parameters.

Key Factors Influencing Output
  • Prompt wording: Subtle changes can drastically affect composition and detail
  • Negative prompts: Used to suppress unwanted elements
  • Scheduler selection: Influences the image generation process (e.g., DDIM, Euler)
  • CFG Scale: Controls the strength of prompt adherence (higher = more literal)
  • Inference steps: Affects image quality, detail, and noise reduction

Understanding how each parameter impacts the output is critical for controlling quality, style, and realism in AI art and computer vision applications.
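
The factors above map directly onto keyword arguments of the diffusers StableDiffusionPipeline. A minimal sketch of a single generation call, assuming the `runwayml/stable-diffusion-v1-5` checkpoint id and that `diffusers` and `torch` are installed (the heavy imports are kept inside the function so the helper can be defined without a GPU):

```python
def generate(prompt, negative_prompt=None, guidance_scale=7.5, steps=40, seed=0):
    """Generate one image with Stable Diffusion v1.5.

    Downloads model weights on first run; `seed` fixes the latent noise so
    runs with different parameters stay comparable.
    """
    import torch
    from diffusers import StableDiffusionPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5"   # assumed checkpoint id for SD v1.5
    ).to(device)
    image = pipe(
        prompt,
        negative_prompt=negative_prompt,        # suppress unwanted elements
        guidance_scale=guidance_scale,          # CFG scale: prompt adherence
        num_inference_steps=steps,              # denoising steps
        generator=torch.Generator(device).manual_seed(seed),  # reproducibility
    ).images[0]
    return image
```

The function name and defaults are illustrative, not taken from the project code; only the parameter names (`negative_prompt`, `guidance_scale`, `num_inference_steps`) are the actual diffusers API.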

Objective

To explore and evaluate how prompt design, negative prompts, diffusion schedulers, CFG scale, and inference steps impact the output quality of a Stable Diffusion model.

πŸ” Goal

Identify optimal parameter combinations that strike a balance between:

  • Realism – Photographic quality and coherence of the output
  • Creativity – Diversity and uniqueness of generated content
  • Style Control – Consistent aesthetic output aligned with artistic intent

Proposed Solution

🧪 Experimental Breakdown

This project was executed in five experimental parts (A–E), each evaluating a specific parameter of the Stable Diffusion model.

✅ Part A: Negative Prompt Experiment
• Main Prompt: "A cozy cottage in the forest at sunset, highly detailed"
• Negative Prompt: "daylight, bright, sunny, day time"
• Insight: The negative prompt effectively shifted the output to a night-time setting by suppressing bright/daytime features.
✅ Part B: Scheduler Comparison
• Schedulers Tested:
  • DPMSolverMultistepScheduler – Clearer, sharper, photorealistic
  • EulerAncestralDiscreteScheduler – Softer, more artistic but less sharp
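
In diffusers, switching schedulers reuses the current scheduler's config, so the rest of the pipeline is untouched. A sketch of the swap used in this comparison (the function name and the short keys are illustrative; the scheduler classes and `from_config` are the real API):

```python
def set_scheduler(pipe, name):
    """Replace the pipeline's scheduler in place, keeping its existing config."""
    from diffusers import (
        DPMSolverMultistepScheduler,
        EulerAncestralDiscreteScheduler,
    )
    choices = {
        "dpm": DPMSolverMultistepScheduler,          # sharper, photorealistic
        "euler_a": EulerAncestralDiscreteScheduler,  # softer, more artistic
    }
    pipe.scheduler = choices[name].from_config(pipe.scheduler.config)
    return pipe
```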
✅ Part C: CFG Scale Tuning
• CFG Values Tested: 1.0, 3.0, 7.5, 12.0, 15.0, 19.0, 25.0
• Observations:
  • Low CFG → More artistic, vague
  • Mid-range CFG (7.5–12.0) → Best realism and fidelity
  • High CFG → Over-literal, oversaturated outputs with unwanted artifacts
• Conclusion: Optimal CFG range: 7.5–12.0
✅ Part D: Inference Step Variation
• Steps Tested: 10, 40, 80
• Findings:
  • 10 steps → Fast but blurry with many artifacts
  • 40 steps → Best balance between quality and speed
  • 80 steps → Slight improvement, but not time-efficient
• Conclusion: 40 steps is the sweet spot
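
The sweeps in Parts C and D can be organised as a simple grid so every output image gets a self-describing filename. A sketch under assumed naming conventions (the helper and filename pattern are illustrative, not from the project code):

```python
import itertools

CFG_VALUES = [1.0, 3.0, 7.5, 12.0, 15.0, 19.0, 25.0]  # Part C sweep
STEP_VALUES = [10, 40, 80]                             # Part D sweep

def sweep_grid(cfg_values=CFG_VALUES, step_values=STEP_VALUES):
    """Enumerate (cfg, steps, filename) triples for a parameter sweep."""
    return [
        (cfg, steps, f"cfg{cfg}_steps{steps}.png")
        for cfg, steps in itertools.product(cfg_values, step_values)
    ]
```

Each triple can then be passed to the generation call and the image saved under its filename, which keeps the per-parameter comparison reproducible.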
✅ Part E: Prompt Engineering + Optimization
• Prompt: “A futuristic glass building glowing at night, cinematic lighting, ultra-realistic”
• Parameter Combos Tested:
  • CFG=7.5, Steps=30 → Acceptable but low clarity
  • CFG=12.0, Steps=40 → Best result: realistic, clean, glowing effects
  • CFG=15.0, Steps=60 → Over-sharpened with lighting artifacts

Technologies Used

Model & Libraries
• Stable Diffusion v1.5 – Pretrained text-to-image generation model
• Diffusers Library (Hugging Face) – Simplified pipeline for loading models and schedulers
Schedulers Tested
• DPMSolverMultistepScheduler
• EulerAncestralDiscreteScheduler
Hardware
• CUDA GPU – Used for efficient, fast image generation
Environment
• Python – Core programming language
• Jupyter Notebook – Interactive development and visualization

Challenges Faced

• CFG Balance: Too low leads to vague, unfocused results; too high produces rigid, over-literal textures
• Inference Time: Significantly longer at higher steps (e.g., 80+), affecting real-time usability
• Scheduler Tradeoff: DPMSolver offers realism and sharpness; EulerA provides softness with artistic style, so the choice depends on the use case
• Manual Prompt Tuning: Required for each new prompt to find the optimal configuration (no one-size-fits-all)
• GPU Memory Management: Generating multiple images concurrently can exhaust GPU resources; requires careful batching or scaling
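
For the GPU-memory issue, diffusers exposes slicing toggles that trade a little speed for a much smaller peak VRAM footprint; generating images one at a time rather than in large batches helps further. A minimal sketch (the helper name is an assumption; the two `enable_*` methods are the real pipeline API):

```python
def reduce_vram(pipe):
    """Enable diffusers' built-in memory savers on a loaded pipeline."""
    pipe.enable_attention_slicing()  # compute attention in slices, not all at once
    pipe.enable_vae_slicing()        # decode latents one image at a time
    return pipe
```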

Methodology

🧪 Pipeline Setup
• Used StableDiffusionPipeline from Hugging Face diffusers
• Deployed on GPU for fast inference
• Schedulers were set dynamically based on experiment
Prompt Control
• Tested base prompts, negative prompts, and parameter combinations
Parameter Sweeps
• Generated and saved images by varying one key parameter per experiment
Analysis Criteria
• Visual sharpness
• Prompt accuracy
• Artifact presence
• Realism vs artistic expression

Experiment Results

Experiment            Best Configuration            Insight/Output
Negative Prompting    Added “daylight, sunny…”      Shifted sunset to night effectively
Scheduler Comparison  DPMSolver                     Clearer and more realistic than EulerA
CFG Scale             7.5–12.0                      Best balance between accuracy and style
Inference Steps       40                            Fast + high-quality rendering
Prompt Optimization   CFG=12.0, Steps=40            Best result for futuristic architecture scene
Conclusion
• Prompt design and tuning of CFG scale & inference steps are critical for realistic generation
• Best Configuration:
  • Scheduler: DPMSolverMultistepScheduler
  • CFG Scale: 12.0
  • Inference Steps: 40
• Negative prompts and scheduler selection provide additional control over content and style

Result / Outcome

📷 Visual Output
• Epoch 1: Random noise
• Epoch 10: Shapes of digits begin to appear
• Epoch 30: Recognizable, sharp digits generated
📉 Loss Behavior

Epoch   d_loss (↓)   g_loss (↑)
1       0.45         0.44
10      0.65         0.91
30      0.64         0.95

• Discriminator loss remained stable
• Generator loss increased as expected (indicates adversarial progress)
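
For reference, this pattern matches the standard binary cross-entropy GAN objectives (assuming DCGAN-style training, which the epochs and losses above suggest but do not state explicitly): a stable discriminator loss alongside a rising generator loss means the discriminator still separates real from fake while the generator keeps adapting.

```latex
\mathcal{L}_D = -\tfrac{1}{2}\Big[\log D(x) + \log\big(1 - D(G(z))\big)\Big],
\qquad
\mathcal{L}_G = -\log D\big(G(z)\big)
```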
      πŸ” Hyperparameter Tuning
      • Learning rates tested: 0.0002, 0.0001
      • Adam β₁ values: 0.5, 0.4
      • Latent dimensions: 100, 128

      Best Configuration:

      • lr = 0.0002
      • β₁ = 0.4
      • latent_dim = 100
      • Generator loss: 0.80
      • Discriminator loss: 0.63
