Stable Diffusion Advanced: Model Fine-Tuning and LoRA
Train custom LoRA models to encode your art style, create consistent characters, or specialise for specific aesthetics without needing a degree in ML.
AI Snapshot
- ✓ Train LoRA models on your images to teach Stable Diffusion your unique style or consistent character without expensive GPU time or ML expertise
- ✓ Use trained LoRA models with minimal VRAM overhead; combine multiple LoRAs to create complex, highly specific results
- ✓ Share your trained models on Civitai for community use or keep them private; build repeatable generation pipelines with your visual identity
Why This Matters
This is transformative for creators with established visual identities. A designer with a recognisable aesthetic can generate unlimited on-brand content; a character designer can train a model on their designs and generate endless variations in their signature style. Comic book artists, concept artists, illustrators: all benefit from style consistency at scale.
For creative entrepreneurs scaling production, LoRA training is a competitive advantage. Teams shipping multiple projects a month can maintain visual consistency by training one LoRA per style. This systematisation of creative output isn't replacing creativity; it's amplifying it by removing tedious consistency work.
How to Do It
1. Gather and prepare training images
2. Install LoRA training software
3. Configure LoRA training parameters
4. Train your LoRA model
5. Use your trained LoRA in generation
6. Combine multiple LoRAs for complex results
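Configuring training parameters largely comes down to how many optimiser steps your dataset yields. A minimal sketch of that arithmetic, assuming a kohya-style images × repeats × epochs scheme (all numbers here are illustrative, not recommended settings):

```python
# Rough sizing of a LoRA training run. The repeats/epochs/batch scheme
# mirrors common trainers; every value below is an illustrative
# assumption, not a tuned recommendation.

def total_training_steps(num_images: int, repeats: int,
                         epochs: int, batch_size: int) -> int:
    """Optimiser steps = (images x repeats x epochs) / batch size."""
    return (num_images * repeats * epochs) // batch_size

# A 50-image style dataset, 10 repeats, 4 epochs, batch size 2:
steps = total_training_steps(50, 10, 4, 2)
print(steps)  # 1000
```

If the total lands far outside roughly 1,000-3,000 steps, adjust repeats or epochs rather than the dataset: too few steps underfits the style, too many overfits to individual images.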
Prompts to Try
Use a trained style LoRA: '{subject/scene} <lora:trained_style:0.8> {additional descriptors}'. Adjust the strength (0.5-1.0) until you find the sweet spot between style influence and prompt adherence.
Use a character LoRA: '{character_name} <lora:character_lora:0.9> {action/pose/setting}'. Vary the actions and settings: the LoRA keeps the character consistent whilst the context changes.
Combine LoRAs: '{subject} <lora:style_lora:0.7> <lora:character_lora:0.6> <lora:lighting_lora:0.5>'. Balance the strength values, since too many strong LoRAs conflict; adjust iteratively until the combined influence looks right.
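The '<lora:name:strength>' tag syntax can be generated and sanity-checked with a small helper. `build_prompt` is a hypothetical utility, not part of any UI; its thresholds simply mirror the strength and stacking guidance in this article:

```python
# Build LoRA prompt tags like '<lora:style_lora:0.7>' and flag risky
# combinations (strength above 0.9, more than three LoRAs). The helper
# and its thresholds are illustrative, not part of any tool's API.

def lora_tag(name, strength):
    return f"<lora:{name}:{strength}>"

def build_prompt(subject, loras):
    """loras: mapping of LoRA name -> strength. Returns (prompt, warnings)."""
    warnings = []
    if len(loras) > 3:
        warnings.append("more than 3 LoRAs may conflict")
    for name, strength in loras.items():
        if strength > 0.9:
            warnings.append(f"{name}: strength {strength} may overpower the prompt")
    tags = " ".join(lora_tag(n, s) for n, s in loras.items())
    return f"{subject} {tags}", warnings

prompt, warnings = build_prompt("portrait of a knight",
                                {"style_lora": 0.7, "character_lora": 0.6})
print(prompt)  # portrait of a knight <lora:style_lora:0.7> <lora:character_lora:0.6>
```

Generating prompts this way makes strength sweeps repeatable: loop over a range of values, build one prompt per value, and compare the outputs side by side.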
Common Mistakes
Training a LoRA on too few images (fewer than 30) or on poor-quality images
How to avoid: Use a minimum of 30 images, ideally 50-100. Ensure every image is good quality and representative of the style; remove outliers and corrupted files.
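A quick pre-training audit catches both problems at once: count the usable images and flag files whose headers do not look like PNG or JPEG. This is a hypothetical stdlib-only helper (a header check is a cheap stand-in for fully decoding each file):

```python
# Sanity-check a LoRA training folder: count usable images and flag
# files whose magic bytes do not match PNG or JPEG. Illustrative
# helper, not part of any training tool.
from pathlib import Path

MIN_IMAGES = 30  # minimum recommended above

def looks_like_image(path):
    header = path.read_bytes()[:8]
    return (header.startswith(b"\x89PNG\r\n\x1a\n")  # PNG signature
            or header.startswith(b"\xff\xd8\xff"))   # JPEG SOI marker

def audit_dataset(folder):
    files = [f for f in sorted(Path(folder).iterdir()) if f.is_file()]
    good = [f for f in files if looks_like_image(f)]
    bad = [f for f in files if not looks_like_image(f)]
    if len(good) < MIN_IMAGES:
        print(f"warning: only {len(good)} usable images (want >= {MIN_IMAGES})")
    return good, bad
```

Run it before every training session; a single corrupted file can crash a multi-hour run partway through.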
Setting LoRA strength too high (above 0.9) or stacking too many LoRAs
How to avoid: Start at 0.7-0.8 strength and adjust from there based on results. Combine at most three LoRAs, and test how they interact at different strengths.
Poor image captions during LoRA training
How to avoid: Use CLIP Interrogator for initial captions, then manually refine. Captions should describe what's distinctive about each image.
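Most LoRA trainers read captions from a `.txt` sidecar file next to each image. A minimal sketch of writing refined captions into that layout; the `captions` mapping is an illustrative stand-in for CLIP Interrogator output you have already edited by hand:

```python
# Write one caption .txt beside each training image, the sidecar layout
# most LoRA trainers (including kohya-style scripts) read. The captions
# dict stands in for manually refined CLIP Interrogator output.
from pathlib import Path

def write_captions(folder, captions):
    """captions: image filename -> refined caption. Returns written names."""
    written = []
    for image_name, caption in captions.items():
        txt = Path(folder, image_name).with_suffix(".txt")
        txt.write_text(caption.strip() + "\n", encoding="utf-8")
        written.append(txt.name)
    return written

# e.g. write_captions("dataset/", {"knight_01.png": "a knight in ornate armour, ink linework"})
# would create dataset/knight_01.txt alongside dataset/knight_01.png
```

Keeping captions in sidecar files rather than one big list also makes it easy to spot an image whose caption you forgot to refine.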
Tools That Work for This
Most user-friendly LoRA training interface. Handles all technical details; requires only data preparation.
CLIP Interrogator: auto-generates captions for training images, saving time on manual captioning.
Civitai: community platform to share, discover, and download trained LoRAs.