[Stable Diffusion Tutorial 2024] Body Pose Imitation | Tips for Generating Actions, Expressions, and Gestures

28 Feb 2024 | 07:04

TLDR: In this video, the presenter, Xiao Lin, introduces OpenPose, an AI model that enhances Stable Diffusion's ability to generate images of characters performing specific actions and expressions. OpenPose, a model within the ControlNet plugin, allows users to upload a reference image and have Stable Diffusion mimic its pose and expression without replicating the original's appearance, clothing, or skin tone. Xiao Lin demonstrates several OpenPose preprocessors, such as OpenPose face and OpenPose hand, and explains how to fine-tune the detected pose with an editing tool. The video concludes with instructions for installing the ControlNet plugin and the OpenPose models.


  • 🎨 AI-generated character motion imitation can be highly random, making it difficult to control specific actions or expressions with just prompt words.
  • 🚀 OpenPose is an AI model within the Stable Diffusion plugin ControlNet that allows for precise control over character actions and expressions.
  • 📸 OpenPose works by taking a reference image and describing the human body's movements and facial expressions with points and lines.
  • 🖌️ Stable Diffusion then uses this point-line model, combined with user-provided prompt words, to generate new images.
  • 🌟 OpenPose offers six different preprocessors to read actions from reference images, ranging from basic to full-body and facial expression analysis.
  • 💃 The demonstration involved using OpenPose to replicate a popular dance move, showcasing its ability to generate images of characters performing specific actions.
  • 🖼️ Testing OpenPose with a Mona Lisa reference image showed its capability in mimicking facial expressions, though the mysterious smile was challenging to replicate.
  • 👐 The OpenPose hand preprocessor was tested with an image showing clear hand gestures and reproduced those gestures in the generated images.
  • 🤹‍♀️ The OpenPose full preprocessor and its enhanced version, DW_Openpose_full, significantly improved the accuracy of finger and facial expression details.
  • 🛠️ OpenPose provides a manual adjustment tool for the point-line model, allowing users to fine-tune the depiction of actions and expressions.
  • 🔧 Installation of ControlNet and OpenPose models is necessary for using OpenPose, with detailed steps provided in the script.
  • 📢 The presenter, Xiao Lin (小林), encourages viewers to subscribe, like, and share for more content on Stable Diffusion and related topics.

Q & A

  • What is the main challenge with using AI to generate specific character actions?

    -The main challenge is the significant randomness involved when using AI models like Stable Diffusion or Midjourney to generate character actions. It's difficult to control a character to perform a specific action, such as dancing or mimicking a smile, solely through prompt words.

  • What tool does the speaker introduce to overcome the limitations of AI-generated character actions?

    -The speaker introduces OpenPose, an AI model within the Stable Diffusion plugin ControlNet, designed to control character actions and expressions precisely.

  • How does OpenPose work to mimic character actions?

    -OpenPose works by taking a reference image and using a preprocessor to describe the character's body movements and facial expressions with points and lines. Stable Diffusion then generates new images based on this point-line model and the user's input prompts.

  • What are the different preprocessors available in OpenPose?

    -OpenPose offers six preprocessors: OpenPose (basic body pose), OpenPose face (for facial expressions), OpenPose hand (for hand and finger movements), OpenPose faceonly (facial expressions only), OpenPose full (a combination of all actions and expressions), and DW_Openpose_full (an enhanced version of OpenPose full).

  • How can users adjust the accuracy of the point-line model generated by OpenPose?

    -Users can manually adjust the point-line model using the Edit button in the OpenPose interface. This allows for fine-tuning of details, such as finger positions, to correct any inaccuracies in the model.

  • What is the process for installing the ControlNet plugin and OpenPose models?

    -To install the ControlNet plugin, users go to the Extensions page in Stable Diffusion WebUI, select 'Install from URL', and enter the ControlNet plugin's git repository address. After installation, users download the OpenPose model files from Hugging Face and save them in the 'extensions/sd-webui-controlnet/models' directory. Finally, they restart the WebUI to access the ControlNet interface.

  • What was the first experiment conducted using OpenPose?

    -The first experiment involved using the OpenPose preprocessor to mimic a popular street dance move from a reference image. The user uploaded the image to ControlNet Unit 0, selected the OpenPose model, and input prompts to generate a new image of a Chinese girl dancing outside a restaurant.

  • How was the Mona Lisa's smile replicated in the experiment?

    -The Mona Lisa's smile was replicated using the OpenPose face preprocessor. The reference image of Mona Lisa was uploaded, and the model generated a new image of a Chinese woman with a similar facial expression, although the mysterious quality of the Mona Lisa smile was noted as difficult to replicate exactly.

  • What issue was encountered when using the OpenPose_full preprocessor?

    -The OpenPose_full preprocessor sometimes inaccurately described the actions in the reference image, such as transforming fingers that were spread out into a state of being entwined. This issue could be resolved by manually adjusting the point-line model.

  • How can users receive the prompt words used in the experiments?

    -Users can receive the prompt words by subscribing to the sharing service mentioned in the video description. Once subscribed, the prompt words for all experiments are sent to the user's email.

  • What is the significance of the OpenPose tool for AI-generated character actions?

    -OpenPose is significant because it allows for precise control over character actions and expressions in AI-generated images. It enables users to create images where characters perform specific actions and express emotions as directed, without copying the original character's appearance, clothing, or skin color.



🎭 Introduction to AI Motion Imitation with OpenPose

The paragraph introduces the concept of AI-generated character motion imitation, highlighting the challenges in controlling specific actions using prompt words with AI drawing tools like Stable Diffusion and Midjourney. The speaker, Xiao Lin, presents OpenPose, an AI model within the Stable Diffusion plugin ControlNet, which allows for precise control over character actions and expressions without replicating appearance, clothing, or skin color. The explanation includes the basic principles of OpenPose, its various preprocessors for different levels of action and expression capture, and a brief overview of the upcoming experimental segment.


🧠 Experimenting with OpenPose Preprocessors and Editing

This paragraph covers the experimental phase, in which OpenPose's preprocessors capture actions and expressions from reference images for imitation. It describes how ControlNet is used with Stable Diffusion to generate images based on OpenPose's point-line model. The paragraph also discusses how accurately OpenPose captures finger movements and facial expressions, and introduces a method for manually adjusting the point-line model to correct inaccuracies. Finally, it outlines the installation steps for the ControlNet plugin and OpenPose model files.



💡AI-generated motion imitation

AI-generated motion imitation refers to the process where artificial intelligence algorithms are used to create visual representations of human movements or gestures based on a reference image or a set of instructions. In the context of the video, this technology allows users to control the actions and expressions of characters in generated images, such as making them dance or smile like a specific person, without replicating their appearance, clothing, or skin color.

💡Stable Diffusion

Stable Diffusion is an AI model used for generating images from textual descriptions. It is known for its ability to create realistic and diverse visual outputs based on the prompts provided by users. In the video, Stable Diffusion is the primary tool, and the OpenPose model in the ControlNet plugin extends it to achieve more controlled and accurate motion imitation in the generated images.


💡ControlNet

ControlNet is a plugin for Stable Diffusion that allows users to have more control over the generated images by specifying certain attributes or actions. It works in conjunction with AI models like OpenPose to interpret reference images and translate them into detailed instructions for Stable Diffusion, resulting in images where the characters' actions and expressions match the user's desired output.
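
As a rough illustration of how a prompt, a reference image, a preprocessor, and a model come together in one ControlNet unit, the sketch below builds a request body in Python without sending it. The video works only in the WebUI, so everything here is an assumption: the `alwayson_scripts` / `controlnet` / `args` layout and the field names `enabled`, `image`, `module`, and `model` reflect one version of the extension's API and should be checked against the installed version, and `build_txt2img_payload` and the default model name are hypothetical choices made for this example.

```python
import base64


def build_txt2img_payload(prompt, reference_png,
                          module="openpose",
                          model="control_v11p_sd15_openpose"):
    """Bundle a prompt and one ControlNet unit into a request body (not sent).

    Field names are assumptions; verify them against the installed extension.
    """
    # The reference image travels as base64 text inside the JSON body.
    encoded = base64.b64encode(reference_png).decode("ascii")
    unit = {
        "enabled": True,
        "image": encoded,   # reference image whose pose is read
        "module": module,   # preprocessor that extracts the point-line model
        "model": model,     # ControlNet model that conditions the generation
    }
    return {
        "prompt": prompt,
        "alwayson_scripts": {"controlnet": {"args": [unit]}},
    }


payload = build_txt2img_payload("a girl dancing outside a restaurant", b"\x89PNG")
print(sorted(payload["alwayson_scripts"]["controlnet"]["args"][0]))
```

The point of the sketch is only the pairing: the preprocessor reads the pose from the reference image, and the prompt decides appearance, clothing, and setting.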


💡OpenPose

OpenPose is an AI model integrated into the ControlNet plugin that specializes in interpreting and replicating human poses and facial expressions from reference images. It breaks down the reference image into a point and line model, which is then used by Stable Diffusion to generate new images with the desired motion and expressions.


💡Preprocessor

In the context of the video, a preprocessor is a component of the OpenPose model that processes the reference image to extract specific information about human poses and expressions. This information is then used to guide the Stable Diffusion model in generating images with the desired movements and expressions.
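
One way to keep the six preprocessors straight is a small lookup table. The sketch below is illustrative only: the lowercase identifiers and the body/face/hands flags are an assumed reading of the descriptions in this summary (in particular, that the face and hand variants also capture the body pose), and `pick_preprocessor` is a hypothetical helper, not part of ControlNet.

```python
# Assumed capability table; the flags are an interpretation, not an official spec.
PREPROCESSORS = {
    "openpose":          {"body": True,  "face": False, "hands": False},
    "openpose_face":     {"body": True,  "face": True,  "hands": False},
    "openpose_hand":     {"body": True,  "face": False, "hands": True},
    "openpose_faceonly": {"body": False, "face": True,  "hands": False},
    "openpose_full":     {"body": True,  "face": True,  "hands": True},
    "dw_openpose_full":  {"body": True,  "face": True,  "hands": True},
}


def pick_preprocessor(body, face, hands, enhanced=False):
    """Return a preprocessor whose captured features match the request exactly."""
    wanted = {"body": body, "face": face, "hands": hands}
    matches = [name for name, caps in PREPROCESSORS.items() if caps == wanted]
    if not matches:
        raise ValueError(f"no preprocessor captures exactly {wanted}")
    # Prefer the enhanced variant only when the caller asks for it.
    if enhanced and "dw_openpose_full" in matches:
        return "dw_openpose_full"
    return matches[0]


print(pick_preprocessor(body=True, face=False, hands=True))
print(pick_preprocessor(body=True, face=True, hands=True, enhanced=True))
```

For a dance pose the basic preprocessor is enough; for a gesture or an expression, the hand, face, or full variants are the ones the video turns to.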

💡Point and line model

The point and line model is a representational technique used by OpenPose to describe the human body's posture and movements. It uses points to represent joints and lines to connect them, creating a simplified, stick-figure-like outline of a person's pose. This model is then used as a guide for Stable Diffusion to generate images with the same pose without copying the original's appearance.
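
The point and line idea can be sketched as a tiny data structure: named points for joints and pairs of names for the lines between them. This is a minimal illustration, not OpenPose's actual format; the `Pose` class, the joint names, and the coordinates are all hypothetical. The `move` method mirrors the kind of manual correction the video performs with the Edit button, where a misplaced point is dragged to the right position.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class Pose:
    keypoints: Dict[str, Point]    # joint name -> (x, y) in image pixels
    limbs: List[Tuple[str, str]]   # pairs of joint names joined by a line

    def move(self, name: str, dx: float, dy: float) -> None:
        """Shift one keypoint, as a manual correction would."""
        x, y = self.keypoints[name]
        self.keypoints[name] = (x + dx, y + dy)

    def segments(self) -> List[Tuple[Point, Point]]:
        """Return the line segments to draw, skipping limbs with a missing joint."""
        return [
            (self.keypoints[a], self.keypoints[b])
            for a, b in self.limbs
            if a in self.keypoints and b in self.keypoints
        ]


arm = Pose(
    keypoints={"shoulder": (100.0, 100.0), "elbow": (140.0, 150.0), "wrist": (180.0, 120.0)},
    limbs=[("shoulder", "elbow"), ("elbow", "wrist")],
)
arm.move("wrist", 10.0, -20.0)  # nudge the wrist right and up
print(arm.segments())
```

Because the structure stores only positions and connections, nothing about the original person's face, clothing, or skin tone is carried over.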

💡Mona Lisa

The Mona Lisa is a famous painting by Leonardo da Vinci, known for the subject's enigmatic smile. In the video, it is used as a reference image for the OpenPose face preprocessor to demonstrate how the AI can capture and imitate the Mona Lisa's facial expression, although the presenter notes that the painting's mysterious quality is hard to replicate.

💡Gesture recognition

Gesture recognition refers to the ability of a system to interpret and understand human gestures, often used in AI for controlling devices or, as in this video, for generating images with specific hand movements. In the context of the video, gesture recognition is a key feature of the OpenPose hand preprocessor, which captures the details of hand and finger positions from a reference image.

💡Image generation

Image generation is the process of creating new images from scratch using AI models. It involves inputting data, such as textual descriptions or reference images, into an AI system, which then produces visual content based on that input. In the video, image generation is the main outcome of using Stable Diffusion with the OpenPose model in the ControlNet plugin, allowing users to create custom images with specific motions and expressions.


💡WebUI

WebUI stands for Web User Interface, a graphical interface that lets users operate software through a web browser. In the context of the video, the WebUI is the interface through which users interact with the Stable Diffusion and ControlNet models to generate images.


💡Installation

Installation in this context refers to the process of setting up and configuring software, such as the ControlNet plugin and OpenPose models, on a user's system to enable them to use the AI-generated motion imitation features. The video provides a step-by-step guide on how to install these components to utilize the full capabilities of the system.
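
After following the installation steps, it can help to confirm that the model files ended up where the plugin looks for them. The sketch below is a hypothetical helper, not part of the WebUI: it assumes the models live under `extensions/sd-webui-controlnet/models` inside the WebUI folder, that OpenPose model files have "openpose" in their name, and that they use a `.pth` or `.safetensors` extension. The folder name `stable-diffusion-webui` in the usage line is likewise a placeholder.

```python
from pathlib import Path


def find_openpose_models(webui_root):
    """List OpenPose model files found in the ControlNet models directory."""
    models_dir = Path(webui_root) / "extensions" / "sd-webui-controlnet" / "models"
    if not models_dir.is_dir():
        return []  # plugin not installed, or a different layout
    return sorted(
        p.name
        for p in models_dir.iterdir()
        if p.is_file()
        and "openpose" in p.name.lower()
        and p.suffix in {".pth", ".safetensors"}
    )


# Placeholder path: replace with the actual WebUI folder.
print(find_openpose_models("stable-diffusion-webui"))
```

An empty list suggests either that the plugin is not installed at that location or that the model files were saved somewhere else; the WebUI needs a restart after the files are in place.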



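Generating character actions with Stable Diffusion and Midjourney suffers from randomness.

The video introduces OpenPose, an AI model within the Stable Diffusion plugin ControlNet, used to control character actions.

Stable Diffusion combines the point-line model generated by OpenPose with the user's prompt words to create new images.

The OpenPose face and OpenPose hand preprocessors read facial expressions and hand movements, respectively.

The OpenPose full preprocessor combines the reading of all actions and expressions.

DW_Openpose_full is an enhanced version of OpenPose full that reads actions and expressions more accurately.

The OpenPose hand preprocessor can mimic hand gestures, although it has some accuracy issues.

OpenPose full and DW_Openpose_full have a clear advantage in reading the details of finger movements and facial expressions.

Xiao Lin's channel continues to cover Stable Diffusion and related technology, and encourages viewers to subscribe, like, and share.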