How to Create Talking AVATAR to Explode Your Views!

The Zinny Studio
28 Jan 2024 · 25:17

TLDR: This tutorial walks viewers through creating a talking AI avatar for YouTube channels using Midjourney and Canva. It covers generating consistent facial expressions, editing the image in Canva, and adding elements such as a teacup. The guide also covers creating a voiceover with ElevenLabs, animating the avatar with D-ID, and upscaling the final video with CapCut for higher quality. The goal is to create engaging content for faceless YouTube channels or kids' storybooks.


  • 😀 AI faceless YouTube channels have gained popularity with the use of talking AI Avatars as their visual representation.
  • 🖼️ Creating an AI Avatar starts with generating consistent facial features in Midjourney, which is chosen for its aesthetics.
  • 🎨 The script describes using specific prompts in Midjourney to generate images of a character with varying facial expressions.
  • 🔄 It's important to upscale the generated images and adjust them to ensure they meet the desired aesthetic standards.
  • 👤 The script emphasizes the need for the character to have a neutral or slight smile for natural animation in the final Avatar.
  • 📸 Techniques for editing and refining the Avatar image are discussed, including removing unwanted elements and adjusting details like clothing.
  • 🎙️ The tutorial covers generating a voiceover for the Avatar, either using one's own voice or an AI voice tool like ElevenLabs.
  • 🎬 Animation and lip-syncing of the Avatar are done with D-ID, which requires the uploaded audio and the Avatar image.
  • 🖌️ Additional editing may be performed in Canva for final adjustments to the Avatar image before animation.
  • 🔍 The importance of selecting images where the character is looking straight into the camera for a natural appearance in animations is highlighted.
  • 📚 The process concludes with the suggestion to upscale the final video to 4K using tools like CapCut for higher video quality.

Q & A

  • What is the main purpose of the video script?

    -The main purpose of the video script is to guide viewers through the process of creating a talking AI Avatar for YouTube channels, which can help to increase viewership.

  • What tool does the script recommend for generating the initial images for the AI Avatar?

    -The script recommends using 'Midjourney' for generating the initial images because it provides the desired aesthetics compared to other generative AI image tools.

  • How many facial expressions are suggested to generate for the main character?

    -The script suggests generating at least five different facial expressions for the main character.

  • What is the purpose of upscaling the images in the script?

    -Upscaling the images is done to enhance the quality and resolution of the images, making them suitable for use in videos or other high-definition formats.

  • What aspect ratio should be used for the final Avatar image according to the script?

    -The script suggests using an aspect ratio of 16:9 for the final Avatar image, which is standard for YouTube videos.

  • How can one add elements like a teacup to the Avatar's scene in the script?

    -The script describes using the 'Vary (Region)' option in 'Midjourney' to add elements like a teacup to the scene by drawing a selection and typing in the desired element.

  • What editing tool is used to clean up the image before animation?

    -The script mentions using 'Canva' to edit the image, remove unwanted elements, and prepare it for animation.

  • How does the script handle the voiceover for the AI Avatar?

    -The script suggests recording a natural voiceover or using 'ElevenLabs' to generate a realistic AI voiceover for the Avatar.

  • What animation tool does the script prefer for lip-syncing the AI Avatar?

    -The script prefers using 'D-ID' for animating the AI Avatar and lip-syncing the voiceover.

  • How can the final video quality be improved after animation?

    -The script recommends using 'CapCut' and its video upscaler feature to improve the video quality from 1080p to 4K after animation.



🎨 Creating AI Avatars with Midjourney

This paragraph outlines the process of generating images for an AI avatar using Midjourney, a generative AI image tool. The creator emphasizes the importance of generating consistent facial characteristics for the avatar and demonstrates how to use prompts to guide the AI in creating specific expressions. The creator also discusses the challenges of getting varied expressions and the need for upscaling the images to ensure quality and consistency.
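
As a rough illustration of prompting for several expressions of one character, the sketch below builds one prompt per expression from a shared description. The character description, the expression list, and the helper name `build_prompts` are hypothetical and not taken from the video; `--ar` is Midjourney's aspect-ratio parameter.

```python
# Hypothetical character description; replace it with your own.
BASE_CHARACTER = "portrait of a young woman with short curly hair, looking straight at the camera"

# The tutorial suggests at least five expressions for the main character.
EXPRESSIONS = ["neutral", "slight smile", "happy", "sad", "surprised"]


def build_prompts(base: str, expressions: list[str], aspect_ratio: str = "16:9") -> list[str]:
    """Return one prompt per expression, all sharing the same character description."""
    return [f"{base}, {expr} expression --ar {aspect_ratio}" for expr in expressions]


for prompt in build_prompts(BASE_CHARACTER, EXPRESSIONS):
    print(prompt)
```

Keeping the description identical across prompts and changing only the expression is one simple way to reduce drift between images; it does not by itself guarantee a consistent face.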


🖼️ Refining Avatar Images and Setting Preferences

The second paragraph focuses on refining the generated images by zooming in and using a screenshot utility such as the Snipping Tool to capture specific facial expressions. It details the steps to upload these expressions as reference images and create a preference set in Midjourney for consistent character creation. The creator also explains how to test the preference set and make adjustments to achieve the desired avatar look, highlighting common mistakes to avoid during this phase.


🔄 Generating Avatars with Reference Images and Inpainting

In this paragraph, the process of generating avatars using existing reference images is explored. The creator shows how to upload a previous avatar and use it as a base for generating new images with specific features. The paragraph also introduces the concept of inpainting within Midjourney to add or remove elements from the background, such as a teacup or altering the dress, to customize the avatar further.


✂️ Editing and Preparing the Avatar Image

The fourth paragraph describes the steps taken to edit the generated avatar image using Canva. This includes removing unwanted elements like a paper or a mug, adding a logo, and adjusting the image to fit the YouTube video dimensions. The creator also discusses the importance of image quality and the process of upscaling the image for better resolution before moving on to the next steps.


🎙️ Creating Voiceovers and Animating the Avatar

This paragraph explains how to generate voiceovers for the avatar, either by using one's natural voice or an AI voice from ElevenLabs. The creator then moves on to the animation process, detailing the use of a tool called D-ID for lip-syncing the avatar to the voiceover. The paragraph concludes with the steps to create and download the animated video, emphasizing the importance of quality and the option to upscale the video to 4K using CapCut.


📚 Conclusion and Additional Resources

The final paragraph wraps up the tutorial by summarizing the tools used and the overall process of creating talking avatars for YouTube videos or other media. The creator also mentions the potential for creating characters for children's storybooks and invites viewers to check out additional resources or other videos for more information on the topic.



💡Talking AI Avatar

A 'Talking AI Avatar' is a digital character that represents the face of a YouTube channel or other digital media, capable of mimicking human speech. In the context of the video, the avatar serves as a visual focal point for the channel, enhancing viewer engagement. The script describes the process of creating such an avatar, which is central to the video's theme of boosting channel views.

💡Midjourney

Midjourney is a generative AI image tool mentioned in the script. It is used to generate aesthetic images for the avatar. The tool's ability to create consistent facial features is highlighted as crucial for developing the character's identity, which is a key step in the avatar creation process.

💡Facial Expressions

Facial expressions are the various looks or emotions that a character can display, such as smiling, sad, or surprised. The script emphasizes the importance of generating multiple facial expressions for the same character to create a dynamic and engaging avatar that can convey different emotions.


💡Upscaling

Upscaling in the context of the video refers to the process of increasing the resolution of an image to make it clearer and more detailed. The script describes upscaling as a necessary step after generating the avatar's images to ensure they are of high enough quality for use in videos.

💡Aspect Ratio

Aspect ratio is the proportional relationship between the width and height of an image or screen, described in the script as 16 by 9, which is a common ratio for YouTube videos. It is important for ensuring that the avatar's image fits correctly within the video frame and maintains its clarity.
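
To make the 16:9 relationship concrete, here is a minimal sketch that reduces pixel dimensions to their simplest ratio and derives a matching height for a given width. The function names are illustrative, not part of any tool used in the video.

```python
from math import gcd


def aspect_ratio(width: int, height: int) -> tuple[int, int]:
    """Reduce pixel dimensions to their simplest width:height ratio."""
    divisor = gcd(width, height)
    return width // divisor, height // divisor


def height_for_16_9(width: int) -> int:
    """Height in pixels that gives a 16:9 frame for the given width."""
    return width * 9 // 16


print(aspect_ratio(1920, 1080))  # a Full HD frame reduces to (16, 9)
print(height_for_16_9(1280))     # 720
```

A check like this can help confirm that an exported avatar image will fill a YouTube frame without letterboxing before it is sent to the animation step.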


💡In-painting

In-painting is a technique used in image editing to fill in or remove parts of an image. The script mentions using in-painting within Midjourney to add elements like a teacup to the avatar's scene, enhancing the avatar's environment and overall visual appeal.


💡Canva

Canva is an online design platform used in the script for editing the avatar's image. It allows for the removal of unwanted elements and the addition of logos or other design elements. Canva is integral to customizing the avatar's appearance and preparing it for animation.

💡Voice Over

A voice over is the spoken word component of a video, which can be either a natural voice recording or generated using AI, as mentioned in the script. The voice over is essential for giving the avatar a voice and making it 'talk', which is a central aspect of creating a talking AI avatar.

💡ElevenLabs

ElevenLabs is an AI tool for generating realistic voice overs, as referenced in the script. It allows users to select from various voices or create custom ones, which can then be used to give the avatar a speaking role in videos, contributing to the avatar's interactive and engaging nature.

💡Lip Syncing

Lip syncing is the process of matching the movements of the lips in a video with the corresponding speech sounds. The script discusses using a tool like D-ID to animate the avatar by syncing its mouth movements with the voice over, creating a more lifelike and engaging video experience.

💡Video Upscaling

Video upscaling is the process of increasing the resolution of a video for better quality. The script mentions using CapCut for video upscaling to enhance the final video quality to 4K, ensuring that the final output meets high-quality standards for viewers.
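
The jump from 1080p to 4K can be quantified with simple arithmetic. The sketch below only computes the scale factors between the two standard frame sizes; it says nothing about how CapCut's upscaler works internally, and the function names are illustrative.

```python
FULL_HD = (1920, 1080)  # 1080p frame: width, height in pixels
UHD_4K = (3840, 2160)   # 4K UHD frame: width, height in pixels


def scale_factor(src: tuple[int, int], dst: tuple[int, int]) -> float:
    """Per-axis scale factor; raises if the two frames differ in aspect ratio."""
    src_w, src_h = src
    dst_w, dst_h = dst
    if src_w * dst_h != src_h * dst_w:
        raise ValueError("source and target have different aspect ratios")
    return dst_w / src_w


def pixel_multiplier(src: tuple[int, int], dst: tuple[int, int]) -> float:
    """How many times more pixels the target frame has than the source."""
    return (dst[0] * dst[1]) / (src[0] * src[1])


print(scale_factor(FULL_HD, UHD_4K))      # 2.0
print(pixel_multiplier(FULL_HD, UHD_4K))  # 4.0
```

Each axis doubles, so the upscaler has to produce four times as many pixels per frame as the 1080p source contains.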


AI faceless YouTube channels have gained popularity with the use of talking AI Avatars as their face.

Midjourney is recommended for generating aesthetically pleasing images for the Avatar.

Creating a consistent facial character involves generating various facial expressions for the same character.

Upscaling images in Midjourney is essential to refine the facial expressions of the Avatar.

Using the 'Vary (Subtle)' option in Midjourney helps to generate different facial expressions.

A screenshot utility such as the Snipping Tool can be used to zoom in and save specific facial expressions for the Avatar.

Creating a character set in Midjourney allows for consistent Avatar creation across images.

Testing the character set in Midjourney helps to visualize the Avatar before finalizing.

Selecting an image where the character looks straight into the camera is crucial for Avatar animation.

Using a reference image can simplify the Avatar creation process in Midjourney.

In-painting in Midjourney allows for adding or removing elements from the Avatar's background.

Canva is used for editing the Avatar image, including removing unwanted elements and adding logos.

ElevenLabs is a preferred tool for generating realistic AI voices for the Avatar.

D-ID is the preferred lip-syncing tool for animating the Avatar with the voice-over.

Upscaling the final video to 4K using CapCut enhances the video quality for YouTube.

The process of creating a talking Avatar involves multiple steps including image generation, editing, voice-over creation, and animation.

Consistent character creation is not only for YouTube channels but also applicable for creating children's storybooks.