InvokeAI - AI Image Prompting

4 Dec 2023 · 25:35

TLDR: The video script delves into the intricacies of image prompting, demonstrating how to combine text and image prompts with control nets and IP adapters to create innovative concepts. The process involves experimenting with various weights and parameters to iterate and refine the output, ultimately achieving a desired artistic style and concept fusion. The video showcases a journey from concept to creation, emphasizing the importance of iteration and the use of available tools to refine and achieve the final visual outcome.


  • 🎨 Image prompting can be combined with text prompts to create new ideas that merge elements from multiple images.
  • 🚗 The example given involves transforming a car with an unexpected material concept, like ice, to push image prompting to its potential.
  • 🔍 Control nets are used to extract and emphasize specific features from images, like the structure of a car.
  • 🔄 IP (Image Prompt) adapters are utilized to introduce concepts or styles from one image into another.
  • 📊 Weights and thresholds play a crucial role in determining the influence of the IP adapter and control net on the final output.
  • 🤖 Experimentation with different settings is key to achieving desired results, as the process requires iteration and refinement.
  • 🎯 The goal is to find a balance between maintaining the core structure and introducing new concepts or styles.
  • 🌐 The process can be used to transform artistic styles, as demonstrated by turning a mystical woman illustration into a more realistic image.
  • 🔄 Iterative adjustments can help refine the output, bringing it closer to the desired concept each time.
  • 🛠️ Tools like control nets, image to image, and canvas manipulation provide various ways to control and adjust the final image.
  • 📈 The workflow is adaptable and can be automated for those using IP adapters in their creative processes.

Q & A

  • What is the main focus of the video?

    -The main focus of the video is to explore the depth of image prompting and how to use image prompts in combination with text prompts and control nets to create new ideas and push the potential of image prompting.

  • What is the initial object the speaker tries to work with?

    -The initial object the speaker tries to work with is a car structure.

  • How does the speaker intend to modify the car structure?

    -The speaker intends to modify the car structure by imbuing it with an unexpected material concept, specifically an ice sculpture.

  • What role does the control net play in the process?

    -The control net is used to extract the structure of the car. The speaker keeps the default processing settings and leaves the thresholds unchanged.

  • What is an IP adapter and how is it used in the process?

    -An IP adapter is a tool used to bring in the concept or style from another image into the current working image. The speaker uses it to introduce the ice concept into the car structure.

  • What is the significance of the weight setting in the IP adapter?

    -The weight setting in the IP adapter determines the influence of the concept on the final output. Lower weights (about 0.3 to 0.45) slightly influence the output, while higher weights (about 0.6 to 0.7) have a major impact on the content and can even drive the generation without the need for a text prompt.

  • How does the speaker address the issue of pixelation and jagged edges in the image?

    -The speaker addresses the issue by adjusting the denoising process, giving the model more freedom so that it avoids artifacts caused by the control net capturing details too faithfully.

  • What is the final concept the speaker achieves with the car structure?

    -The final concept the speaker achieves is a car made of ice with an aerodynamic sports car look, maintaining the icy vibe and translucency while incorporating the structure and style of a car.

  • What other concepts does the speaker explore after the car structure?

    -After the car structure, the speaker explores the concept of a mystical woman, aiming to imbue the core concept into the final generation and make it more photorealistic.

  • How does the speaker use the combination of IP adapters, control nets, and text prompts to refine the image?

    -The speaker uses a combination of IP adapters to bring in concepts or styles, control nets to maintain the structure, and text prompts to specify the desired elements. By adjusting the weights and using iterative adjustments, the speaker refines the image towards the desired concept.

  • What is the key takeaway from the video?

    -The key takeaway from the video is the process of iterating and refining image prompts using a combination of tools like IP adapters, control nets, and text prompts to achieve a desired concept or style in the final output.



🚗 Exploring Image and Text Prompts

The paragraph discusses the process of using image and text prompts together to create new concepts and ideas. The speaker uses the example of transforming a car into an ice sculpture by combining a control net with an IP adapter. The importance of adjusting weights and thresholds to achieve the desired output is emphasized, and the process of iteration is highlighted as the way to refine the concept and reach the full potential of image prompting.


🏎️ Iterating Towards the Ice Car Concept

The speaker continues to work on the ice car concept by experimenting with different settings and adjustments. The goal is to maintain the car's structure while introducing the ice material concept. Various techniques are discussed, such as using different IP adapters and adjusting the denoising strength. The speaker also considers the impact of weights on the final output and how they can influence the level of detail and concept infusion.


🎨 Balancing Style and Content

In this section, the speaker explores the balance between style and content in image generation. The focus is on using IP adapters and control nets to bring in the desired artistic style while keeping the core content. The speaker experiments with different weights and settings to achieve a more photorealistic output, discussing the trade-offs between structure and detail. The process involves fine-tuning the image to create a final piece that combines the desired elements effectively.


🖌️ Enhancing Realism with Denoising and Control Nets

The speaker delves into the techniques of enhancing realism in image generation by manipulating denoising strength and control nets. The goal is to introduce variability in details while maintaining the original image structure. The paragraph describes the process of using control nets to control structure and denoising to bring in pixel data. The speaker also discusses the use of different inputs, such as IP adapters and control nets, to achieve the desired look and feel in the final image.


👁️‍🗨️ Refining the Mystic Sea Concept

The speaker focuses on refining the 'Mystic Sea' concept by using various image prompts and control mechanisms. The process involves adjusting the image, IP adapter, and control net to achieve a more realistic and stylized output. The speaker experiments with different weights and denoising strengths to find the right balance between artistic style and content representation. The goal is to create a final image that captures the essence of the 'Mystic Sea' concept while incorporating the desired structural elements and artistic style.


🌐 Sharing Creative Workflows

The paragraph concludes with the speaker encouraging the audience to explore and create using the discussed techniques. The emphasis is on the iterative nature of the creative process and the various tools available for refining image prompts and concepts. The speaker also mentions the importance of sharing and collaborating through communities like Discord. The goal is to inspire the audience to experiment with the tools and workflows presented, and to engage with others in the creative community.



💡Image Prompting

Image prompting is a technique used in the video to guide image generation with reference images, either alone or alongside text prompts. It is a core concept in the video, as the speaker discusses combining text prompts with image prompts to create new and unique visual concepts. The process involves using tools like control nets and IP adapters to manipulate the structure and content of the images.

💡Control Net

A control net in the context of the video is a tool used to maintain the structural integrity of an image while allowing for modifications and iterations. It helps in controlling the level of detail and the overall composition of the image, ensuring that the core structure is preserved even when introducing new elements or concepts.
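
To make the idea of "extracting structure" concrete, here is a minimal sketch of an edge-map preprocessor of the kind a Canny control net consumes. It is a simplified stand-in, not the real Canny algorithm and not InvokeAI's code: it uses a plain gradient and a double threshold, and the function name `edge_map` and the default thresholds of 100 and 200 are assumptions chosen for illustration.

```python
def edge_map(gray, low=100, high=200):
    """Return a binary edge map from a 2D list of grayscale values (0-255).

    Simplified stand-in for a Canny preprocessor: central-difference gradient
    plus a double threshold. Real Canny also blurs the image, thins edges,
    and tracks weak edges across connected components.
    """
    height, width = len(gray), len(gray[0])

    # Gradient magnitude for interior pixels; the border stays at zero.
    mag = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]
            gy = gray[y + 1][x] - gray[y - 1][x]
            mag[y][x] = (gx * gx + gy * gy) ** 0.5

    edges = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            m = mag[y][x]
            if m >= high:
                edges[y][x] = 255  # strong edge
            elif m >= low:
                # Keep a weak edge only if a strong edge touches it.
                around = [mag[y + dy][x + dx]
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
                if max(around) >= high:
                    edges[y][x] = 255
    return edges


# A dark-to-bright vertical boundary produces a vertical line of edge pixels.
image = [[0, 0, 255, 255, 255] for _ in range(5)]
for row in edge_map(image):
    print(row)
```

Raising the thresholds keeps only the strongest outlines, while lowering them passes more fine detail to the control net, which is one reason threshold choice affects how tightly the output follows the source image.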

💡IP Adapter

An IP adapter in the video refers to a tool that infuses specific concepts or styles into an image. It works by extracting elements from a source image and applying them to the final output, allowing for the combination of different visual elements to create a new image that merges the characteristics of multiple inputs.
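
The role of the weight can be sketched with a toy version of the decoupled cross-attention idea described for IP-Adapter, in which the image prompt gets its own attention branch and its result is added to the text branch, scaled by the weight. This is an illustrative sketch under that assumption, not InvokeAI's implementation; `attention`, `blend_prompts`, and the tiny vectors are hypothetical.

```python
import math


def attention(query, keys, values):
    """Single-query scaled dot-product attention over small Python lists."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    dim = len(values[0])
    return [sum(p * v[i] for p, v in zip(probs, values)) for i in range(dim)]


def blend_prompts(query, text_kv, image_kv, ip_weight):
    """Add the image-prompt branch to the text-prompt branch, scaled by ip_weight."""
    text_out = attention(query, *text_kv)
    image_out = attention(query, *image_kv)
    return [t + ip_weight * i for t, i in zip(text_out, image_out)]


# Hypothetical one-token "prompts": (keys, values) for each branch.
text_kv = ([[1.0, 0.0]], [[1.0, 0.0]])
image_kv = ([[0.0, 1.0]], [[0.0, 2.0]])
query = [1.0, 1.0]

print(blend_prompts(query, text_kv, image_kv, 0.0))  # text branch only
print(blend_prompts(query, text_kv, image_kv, 0.5))  # image concept mixed in
```

With a weight of zero the image branch contributes nothing, and as the weight grows the image features take up a larger share of the result, which matches the behavior described in the video where high weights can drive the output on their own.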


💡Denoising

Denoising, in the context of the video, is the step in which the model progressively removes noise to produce the final image. The denoising strength controls how far the result may depart from the starting image: lower strengths keep the output close to the original pixels, while higher strengths give the model more freedom to reinterpret details, which can also smooth out artifacts such as jagged edges.
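
One way to picture denoising strength is as the fraction of the sampling schedule that is re-run on top of the starting image. The sketch below assumes that common image-to-image convention; the function name `steps_to_run` and the step counts are hypothetical and are not taken from the video or from InvokeAI.

```python
def steps_to_run(total_steps, strength):
    """Return (first_step, steps_run) for an image-to-image pass.

    Assumes strength is the fraction of the schedule that is re-run:
    0.0 leaves the starting image untouched, 1.0 regenerates from pure noise.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    steps_run = min(int(total_steps * strength), total_steps)
    first_step = total_steps - steps_run
    return first_step, steps_run


for strength in (0.0, 0.25, 0.5, 1.0):
    first, run = steps_to_run(40, strength)
    print(f"strength={strength}: skip {first} steps, denoise for {run}")
```

Under this convention a higher strength means more of the schedule is re-run, so the model has more room to change details of the starting image.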

💡Concept Iteration

Concept iteration is the process of refining and improving upon an initial idea or concept through successive stages of development. In the video, this involves using a combination of tools and techniques to gradually align the final image with the desired outcome, making adjustments based on the results of each iteration.
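
The iteration described here can be organized as a small sweep over the settings being tuned. The following is only a sketch of that bookkeeping: the setting names and the values are illustrative assumptions, and actually generating an image for each combination is left to whatever tool is in use.

```python
from itertools import product


def settings_grid(ip_weights, denoise_strengths):
    """List every combination of IP adapter weight and denoising strength."""
    return [
        {"ip_adapter_weight": weight, "denoising_strength": strength}
        for weight, strength in product(ip_weights, denoise_strengths)
    ]


# Illustrative values only; compare the results and narrow the range each round.
for settings in settings_grid([0.3, 0.6], [0.5, 0.75]):
    print(settings)
```

Changing one parameter at a time between rounds makes it easier to see which setting caused a given change in the output.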

💡Image to Image

Image to image is a process where the raw pixel data from one image is used as the basis for creating a new image. This technique allows for significant alterations to the original content while retaining some structural or stylistic elements, providing a flexible way to generate new visual concepts.

💡Aesthetic Styles

Aesthetic styles refer to the visual characteristics and artistic elements that define the look and feel of an image. In the video, the speaker discusses pulling specific styles from images and applying them to new content, such as infusing a retro wave detective style into a new image.

💡Text Prompt

A text prompt is a textual description used to guide the generation of an image. It contains specific terms and concepts that the user wants to see in the final output. In the video, text prompts are used in conjunction with image prompts to create detailed and complex visual concepts.


💡Photorealism

Photorealism is a visual quality where an image appears extremely realistic, resembling a high-quality photograph. In the video, the speaker aims to achieve photorealism in the final output by adjusting various parameters and using a combination of tools to refine the image's details and lighting.

💡Creative Inspiration

Creative inspiration refers to the process of generating new ideas or concepts based on existing content or external influences. In the video, the speaker uses creative inspiration to drive the iterative process of image generation, combining various prompts and tools to create unique and visually appealing images.

💡Unified Canvas

The unified canvas is a term used in the video to describe a workspace where the final image is composed and refined. It is a place where all the elements and adjustments come together, allowing the user to make detailed edits and touch-ups to achieve the desired look and feel.


Exploring the combination of image and text prompts to create new concepts and ideas.

Using an SDXL Canny control net to extract the structure of a car.

Integrating the concept of an ice sculpture with the car structure.

Adjusting the IP adapter weight to influence the final output without overtaking the text prompt.

Experimenting with different levels of denoising to avoid pixelation and artifacting.

Merging multiple concepts, such as the car, ice, and sports car, to create a unique design.

Iterating towards a core concept by adjusting weights and inputs in the creative process.

Transforming an illustration of a mystical woman into a more photorealistic image.

Utilizing IP adapter and control net to bring in concepts or styles into the final output.

Exploring the difference between using IP adapter and image to image translation.

Adjusting denoising strength and control net to maintain structure while introducing variability.

Enhancing the image by focusing on specific elements like the face with a control net.

Creating a retro wave detective concept by pulling style from an artistic image.

Balancing the influence of text prompt and image prompt to achieve the desired concept.

Iterating and refining the concept through multiple generations and adjustments.

Achieving a final output that combines the style and structure of various inputs.