[Stable Diffusion] LoRA Training ('炼丹') Explained in Depth: The Only Model-Training Guide You Need

24 Aug 2023 · 41:14

TL;DR: AI小王子's tutorial offers an in-depth guide on training Lora models for personalized AI avatars or IP images. He explains the difference between Lora and checkpoint models, the factors influencing image rendering, and the importance of selecting diverse training images. The video covers the technical aspects of training, including software requirements, optimal image quality and quantity, and the significance of training steps and epochs. It also provides practical advice on troubleshooting and optimizing model training for best results.


  • 🧙‍♂️ The tutorial focuses on training a custom Lora model for AI-generated images, aiming to improve the quality and specificity of the outputs.
  • 🎨 Understanding the difference between Lora and checkpoint is crucial; Lora is likened to a designer's draft, while the checkpoint is the overall style designer.
  • 🖌️ The quality of rendered images depends on factors like the checkpoint, Lora, keywords, parameters, and the AI's interpretation of these elements.
  • 🌐 Lora models can be applied across different base models, allowing for versatility in style application.
  • 📸 Selecting a diverse range of high-quality images for training is essential to capture various expressions, angles, and features.
  • 🛠️ The number of images needed for training varies based on complexity; simple subjects may need as few as 15 images, while complex scenes might require over 100.
  • 🔧 Training involves adjusting parameters like step count and Epoch, which influence how thoroughly the AI learns from each image.
  • 💻 Hardware requirements for training include a preference for Nvidia GPUs and specific Python versions and software installations.
  • 🔄 The training process involves using tools like Kohya_ss and Stable Diffusion WebUI for image preprocessing and model training.
  • 📈 Monitoring training progress and loss rates is important to gauge the effectiveness of the training and make adjustments as needed.
  • 🎉 AI小王子 encourages sharing trained Lora models within communities for feedback and further improvement.

Q & A

  • What is the main focus of AI小王子's tutorial?

    -The main focus of AI小王子's tutorial is to teach viewers how to train their own Lora models for creating personalized outputs, such as AI avatars or IP (character) images, with an emphasis on improving the quality of the generated images.

  • What are the key factors that determine the output of rendered images?

    -The key factors that determine the output of rendered images are checkpoint, Lora, keywords, and parameters. The checkpoint influences the overall style the most, Lora can be understood as a design draft, keywords represent the client's requirements, and parameters are akin to the designer's experience and skills.

  • What is the difference between training a checkpoint and training a Lora?

    -Training a checkpoint means working with a large base model: it requires more disk space and time, since a checkpoint is typically at least 2GB, and it can be refined by merging (fusing) it with other similar models. A Lora, by contrast, is small (around 100MB), can be applied on top of different base models, and is more flexible, acting as a design draft or assistant to the base model.

  • How does one select the right images for training their Lora?

    -For effective Lora training, one should select images with varied facial expressions, different compositions (close-ups, full-body shots, and various angles), distinct character features (such as differently colored and styled clothes), diverse backgrounds, and complex lighting. High-quality images with higher pixel counts are preferred, but avoid images at excessive resolutions, as they slow down preprocessing and training.
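As a rough illustration of that resolution trade-off, a simple filter can screen candidate images before training. The thresholds below are illustrative assumptions, not figures from the video:

```python
def usable_for_training(width: int, height: int,
                        min_side: int = 512, max_side: int = 2048) -> bool:
    """Rough pre-filter for candidate training images.

    Rejects images whose shorter side falls below the training
    resolution (too little detail to learn from) or whose longer side
    is far above it (slows preprocessing with no quality gain).
    Thresholds are illustrative defaults.
    """
    short, long_ = min(width, height), max(width, height)
    return short >= min_side and long_ <= max_side

print(usable_for_training(768, 1024))   # True  - a typical photo
print(usable_for_training(8000, 6000))  # False - oversized scan
print(usable_for_training(300, 400))    # False - too small
```

In practice you would read the actual dimensions with an image library (e.g. Pillow) and move rejected files aside before tagging.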

  • What is the recommended number of images for training a simple and complex subject in Lora?

    -For training a simple subject like a character Lora, at least 15 images are recommended. For more complex subjects like buildings or scenes, at least 100 photos are suggested.

  • How does the number of training steps per image affect the Lora training?

    -The number of training steps per image determines how many times the AI trains on each image. More steps let the AI learn more detail from each image, but too many steps can lead to overtraining (overfitting), while too few result in undertraining. The optimal count varies and should be determined through trial and error.
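The relationship between images, per-image repeats, epochs, and batch size can be sketched with the step formula kohya_ss-style trainers commonly report. This is a simplified sketch; exact per-epoch rounding may differ between tools:

```python
def total_training_steps(num_images: int, repeats: int,
                         epochs: int, batch_size: int = 1) -> int:
    """Total optimizer steps for a training run.

    Each epoch shows every image `repeats` times, and a batch of size B
    consumes B images per step, so the totals divide out accordingly.
    """
    return (num_images * repeats * epochs) // batch_size

# 20 images x 10 repeats x 10 epochs at batch size 2 -> 1000 steps
print(total_training_steps(20, 10, 10, batch_size=2))  # 1000
```

This is why raising the batch size shortens a run: the model still sees every image the same number of times, but in fewer optimizer steps.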

  • What are the system requirements for training Lora models?

    -The system should have an Nvidia or AMD GPU, with Nvidia GPUs being preferred due to better compatibility and performance. The amount of VRAM available will determine the training resolution, with common resolutions being 512x512 or 768x768. Additionally, the system should have Python 3.10 installed, Git for version control, and Visual Studio for development.

  • What software and plugins are needed for Lora training?

    -The necessary software and plugins for Lora training include Kohya_ss for training Lora models, an additional networks plugin, cudnn training accelerator for Nvidia RTX 30 series and above GPUs, and a preset file for ease of use.

  • How does the training process work in Kohya_ss?

    -The training process in Kohya_ss involves installing the required dependencies, setting up the Kohya_ss environment, creating a new folder for the training, and using the command line to clone the repository and set up the training environment. The training is then conducted through the Kohya_ss GUI, where the user configures the training parameters, selects the images for training, and initiates the training process.

  • What is the significance of the 'enable buckets' option in the training parameters?

    -The 'enable buckets' option sorts training images into resolution 'buckets' by aspect ratio, resizing each image to the closest bucket rather than forcing everything into a single square crop. This lets images of different shapes be trained on together and helps preserve the parts of each image that matter for learning, improving the quality of the trained Lora model.
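A minimal sketch of the idea behind bucketing, assuming a hand-picked bucket list (this illustrates the concept, not kohya_ss's actual implementation): each image is assigned to the bucket whose aspect ratio it most resembles.

```python
def nearest_bucket(width, height, buckets):
    """Assign an image to the bucket whose aspect ratio it best matches.

    This is the core idea behind aspect-ratio bucketing: images of a
    similar shape are batched together, so a portrait or landscape
    photo needs only mild resizing instead of a destructive square crop.
    """
    ratio = width / height
    return min(buckets, key=lambda wh: abs(wh[0] / wh[1] - ratio))

# A hand-picked, illustrative bucket list around 512x512.
BUCKETS = [(512, 512), (576, 448), (448, 576), (640, 384), (384, 640)]

print(nearest_bucket(1024, 1024, BUCKETS))  # (512, 512) - square stays square
print(nearest_bucket(800, 1200, BUCKETS))   # (384, 640) - closest portrait bucket
```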

  • How can one evaluate the quality of the trained Lora models?

    -The quality of trained Lora models can be judged partly by the loss rate, a number that reflects how closely the model's outputs match its training data. A lower loss is generally better, but it is not the only factor; the final decision should rest on whether the model meets your aesthetic requirements and whether the generated images are satisfactory.



🧙 Introduction to AI Model Training

The paragraph introduces the concept of AI model training, specifically focusing on the creation of a personalized AI model through a process referred to as '炼丹'. The speaker, AI小王子, offers guidance for those seeking to improve their AI models, suggesting that the tutorial will delve into simplifying complex parameters and overcoming common issues in model training. The discussion includes the differences between Lora and checkpoint training, the factors that determine the quality of rendered images, and the importance of selecting appropriate images for training.


📚 Detailed Training Steps and Considerations

This section provides a detailed explanation of the training process, emphasizing the importance of selecting the right images for training a Lora model. It discusses the significance of various facial expressions, composition, character features, lighting, and image quality. The speaker also addresses the number of images needed for training, the concept of training steps, and the balance between image resolution and rendering time. Additionally, it touches on where to find images for training and the recommended number of steps for different types of models.


💻 Software Requirements and Installation

The speaker outlines the software and hardware requirements for training Lora models, recommending Nvidia graphics cards and specific Python versions. The paragraph details the installation process of necessary programs like Kohya_ss, additional networks plugin, and cudnn training accelerator. It provides step-by-step instructions for setting up the environment, including commands to be executed in PowerShell, and addresses potential issues that may arise during installation.


🖼️ Preparing for Model Training

This segment focuses on the preparation phase before starting the Lora model training. It explains the creation of specific folders for organizing training images and the process of assigning tags to images using the Stable Diffusion web interface. The speaker discusses the importance of creating a sufficient number of folders with detailed tags for different elements of the image, such as body parts and clothing, and the calculation of training steps per image based on the total number of images and desired training steps.
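kohya_ss encodes the per-image repeat count in the training folder's name (e.g. '15_mycharacter' means each image is shown 15 times per epoch). The calculation the section describes can be sketched as a small helper; the folder name here is a made-up example:

```python
def repeats_for_folder(target_steps_per_epoch: int, num_images: int) -> int:
    """Back out the repeat count to put in a kohya_ss folder name.

    kohya_ss reads the number before the underscore in a folder name
    like '15_mycharacter' as how many times each image is shown per
    epoch, so steps per epoch ~= num_images * repeats (at batch size 1).
    """
    return max(1, round(target_steps_per_epoch / num_images))

# 20 images with a budget of ~300 steps per epoch -> 15 repeats,
# so the training folder would be named '15_mycharacter'.
repeats = repeats_for_folder(300, 20)
print(f"{repeats}_mycharacter")  # 15_mycharacter
```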


🛠️ Configuring Training Parameters

The paragraph delves into the configuration of training parameters within the Kohya_ss interface. It covers the selection of base models, the importance of choosing the correct model version and resolution, and the setup of folders for images, logs, and models. The speaker advises on the use of buckets for image cropping and provides a comprehensive guide on adjusting various parameters such as training batch size, epochs, learning rates, and network dimensions to optimize the training process.
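As a reference point, a starting configuration for a character Lora might look like the sketch below. These are common community defaults, not the video's exact settings, and the dict keys are illustrative rather than a literal kohya_ss config file:

```python
# Illustrative starting point for a character Lora in kohya_ss.
# Values are common community defaults, NOT the video's exact settings.
train_config = {
    "pretrained_model": "path/to/base_checkpoint.safetensors",  # base model
    "resolution": (512, 512),   # raise to (768, 768) only with ample VRAM
    "train_batch_size": 2,      # limited mainly by VRAM
    "epoch": 10,                # full passes over the dataset
    "learning_rate": 1e-4,      # UNet learning rate
    "text_encoder_lr": 5e-5,    # usually lower than the UNet rate
    "network_dim": 32,          # Lora rank: capacity vs. file size
    "network_alpha": 16,        # commonly dim/2 or equal to dim
    "enable_bucket": True,      # keep non-square images usable
}

for key, value in train_config.items():
    print(f"{key} = {value}")
```

Whichever values you choose, change one parameter at a time between runs so you can tell which adjustment actually helped.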


📈 Analyzing Training Results and Model Selection

This section discusses the analysis of training results using TensorBoard and the selection of the best Lora model. It explains how to interpret the loss rate and the process of comparing different models using additional networks. The speaker provides insights into evaluating the models based on facial and clothing details, as well as the importance of personal judgment in selecting the most suitable model. The paragraph concludes with guidance on copying the trained Lora models to the appropriate directories.
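TensorBoard's smoothing slider applies an exponential moving average to the noisy per-step loss; the same calculation can be reproduced by hand to see why the trend, not individual spikes, is what matters. The loss values below are invented purely for illustration:

```python
def smooth(losses, weight=0.9):
    """Exponential moving average, like TensorBoard's smoothing slider.

    Per-step Lora training loss is noisy; the smoothed curve shows the
    trend that actually matters when judging a run.
    """
    out, last = [], losses[0]
    for x in losses:
        last = weight * last + (1 - weight) * x
        out.append(last)
    return out

# Invented loss values purely for illustration.
raw = [0.20, 0.15, 0.18, 0.12, 0.11, 0.13, 0.10]
smoothed = smooth(raw)
print([round(v, 3) for v in smoothed])
```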


🎉 Conclusion and Encouragement for Further Learning

The speaker concludes the tutorial by reflecting on the effort put into creating the content and encourages viewers to practice the techniques learned. The paragraph highlights the importance of community engagement through comments and server groups for exchanging ideas and improvements. The speaker also mentions the availability of common troubleshooting solutions on a Discord server for premium members and invites viewers to unlock this resource for further assistance.




Key Terms

AI小王子

The term 'AI小王子' refers to the speaker or creator of the video content, who presents himself as an expert or guide in the field of AI, particularly in the context of training AI models for image generation. This persona is used to establish authority and approachability in the tutorial setting.


炼丹

In the context of this video, '炼丹' is a metaphor for the process of training AI models, specifically referring to the creation and fine-tuning of models for generating images. It is borrowed from ancient Chinese alchemy, where '炼丹' literally means 'smelting the elixir', symbolizing the transformation and creation of something valuable.


Lora

Lora, in this context, refers to a type of AI model used for image generation. It is described as a 'small model' or 'extension' that can be applied to different base models, allowing for the customization of image outputs with specific styles or features.


Checkpoint

A checkpoint in AI training refers to a point during the training process where the model's state is saved. This saved state, often including the model's weights, can be used to resume training later or to generate outputs from the model at that particular stage of learning.


Parameters

Parameters in the context of AI model training are the settings and values that define how the model learns from data. These can include learning rates, batch sizes, and other hyperparameters that influence the efficiency and outcome of the training process.


Rendering

Rendering images in AI refers to the process of generating visual outputs based on input data, typically using a trained model. This process involves the AI interpreting the input and creating a new image that matches the desired style, content, or features.


Model Fusion

Model fusion in AI training is a technique where multiple models or model components are combined to create a new model that leverages the strengths of its constituent parts. This can result in improved performance or the ability to generate images with a wider range of styles or features.


Keywords

Keywords in the context of AI image generation are words or phrases that guide the AI in producing specific features or elements in the generated images. They act as instructions to the model, helping it understand what kind of image should be rendered.


Epoch

An epoch in machine learning refers to a complete pass of the entire dataset during the training process. Multiple epochs allow the model to learn from the data multiple times, which can improve its performance and ability to generalize from the training data.


步数 (Steps)

In the context of AI training, '步数' (steps) refers to the number of iterations or updates the model undergoes during the training process. Each step typically involves the model learning from a single batch of data, and the total number of steps determines the extent of the model's exposure to the training data.


Graphics Card (GPU)

A graphics card, or GPU (Graphics Processing Unit), is a critical hardware component in AI training. It performs the complex mathematical calculations required for deep learning algorithms, and its capabilities can significantly affect the speed and efficiency of the training process.


Highlights

AI小王子 introduces a comprehensive guide on training Lora models for personalized AI avatars or IP images.

The course discusses the common issues faced when training models and offers solutions to simplify complex parameters.

Explains the difference between Lora and checkpoint, and when to use each for model training.

The impact of rendering factors such as checkpoint, Lora, keywords, and parameters on the final image output is detailed.

Provides insights on selecting the right images for training, emphasizing the importance of diverse expressions and compositions.

Discusses the role of Lora as a 'design draft' that can be applied across different base models.

Explains how to use multiple Lora in keywords for complex image generation, such as mixing expressions, clothing, and character Lora.

The tutorial covers the importance of image quality for training and the balance between high resolution and rendering speed.

Offers practical advice on where to find or create images for training, including using personal photos and anime screenshots (动漫截图).

Provides a guideline on the number of images needed for training simple and complex subjects, ranging from 15 to 100 photos.

Explains the concept of training 'steps' and how they relate to the AI's learning process, comparing it to human attention span.

Gives recommendations on appropriate per-image training step counts for different types of Lora, from 10-16 steps for anime-style (二次元) characters to 50-100 steps for complex scenes.

Discusses the technical requirements for training, including the preferred Nvidia graphics cards and the impact of AMD cards on training speed and accuracy.

Provides a step-by-step guide on installing necessary software and plugins for Lora training, including Kohya_ss and additional networks plugin.

Explains the process of image preprocessing and tagging using Stable Diffusion web UI for better AI understanding of image content.

Shares tips on creating a balanced training setup by adjusting critical parameters such as learning rate, batch size, and epochs for optimal Lora output.

Advises on post-training evaluation using TensorBoard and the additional networks plugin to compare and select the best Lora models.