Create AI Avatar Talking Videos

SmartShift AI
5 Jun 2023 · 04:51

TLDR: This video tutorial outlines a step-by-step process for creating AI avatar talking videos, a format that has become popular on platforms like Instagram. One Instagram page cited in the video gained 700,000 followers in two months by posting such videos. The process begins with character generation using a text-to-image generator like Blue Willow, followed by script creation with ChatGPT. The script, which focuses on life advice, is then turned into audio on a platform like 11labs, which offers a variety of AI voices. Next, a video is generated from the chosen avatar image and the audio file on a dedicated website. The final step is adding subtitles in a video editor like CapCut to improve accessibility and viewer engagement. The video concludes with a motivational message, encouraging viewers to apply what they have learned to their own content and to stay consistent.


  • 📈 Start with character generation using a text-to-image generator like Blue Willow to create an AI avatar.
  • 💡 Use AI like chat GPT to generate a script; in this case, asking for life's golden rules.
  • 🎙️ Generate audio by pasting the script into a service like 11labs, choosing an AI voice, and downloading the audio.
  • 🔊 Consider historical context, such as the creation of the first model of the human vocal tract in 1779 for speech synthesis.
  • 📷 Upload the pre-generated avatar and audio to a video creation website to generate the video content.
  • 🎞️ Customize the video with subtitles using a video editor like CapCut, enhancing readability and viewer engagement.
  • 🌟 Choose bold fonts and adjust colors to make certain words stand out in the subtitles for added emphasis.
  • 📝 Ensure the final video resolution is high, such as 1080p, for professional quality.
  • 👍 Encourage embracing growth and self-improvement as continuous life journeys.
  • 🛡️ Stress the importance of cultivating resilience to navigate life's ups and downs.
  • 🤝 Highlight the value of fostering meaningful connections with positive and supportive individuals.

Q & A

  • What is the first step in creating an AI talking video?

    -The first step is generating a character. You can use a text-to-image generator like Blue Willow and choose a prompt to create the avatar.

  • Which account gained 700,000 followers in two months by posting AI talking videos?

    -An unspecified Instagram page mentioned in the transcript gained 700,000 followers in just two months by posting AI talking videos.

  • How many videos did the Instagram page post to achieve such a rapid growth in followers?

    -The Instagram page posted a total of 40 videos to achieve the rapid growth in followers.

  • What is used to generate the script for the AI talking video?

    -Chat GPT is used to generate the script. You can ask it for various topics or themes, such as 'three golden rules to always remember in life'.

  • How can one generate audio for the AI talking video?

    -You can generate audio by using a service like 11labs, where you paste the script and choose from a variety of AI voices to create the audio file.

  • What is the name of the website where you can generate a video from an image?

    -The website is not explicitly named in the transcript, but it is implied that it is a platform where you can create a free account, upload an avatar, and generate a video with the uploaded audio.

  • How long does it typically take to generate a video after uploading the avatar and audio?

    -The video can typically be generated within a few seconds after uploading the avatar and audio.

  • What is the first golden rule to always remember in life as mentioned in the video?

    -The first golden rule mentioned is to 'Embrace growth' as life is a continuous journey of growth and self-improvement.

  • How can subtitles be generated for the AI talking video?

    -Subtitles can be generated using a video editor like CapCut. You import the video, select the aspect ratio, enable auto-captions, and then resize and customize the text as needed.

  • What is the recommended aspect ratio for adding subtitles in the video editor?

    -The recommended aspect ratio for adding subtitles in the video editor is 9 by 16.

  • What is the final step in the process of creating an AI talking video?

    -The final step is to export the video in 1080p resolution, ensuring it includes the generated subtitles and is ready for sharing or publishing.

  • What is the advice given for staying motivated and consistent in creating content?

    -The advice is to stay consistent in your journey to success, and to seek motivation and inspiration from engaging with the community, such as by liking and subscribing to content that provides value.
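
The 9 by 16 aspect ratio and the 1080 export setting from the answers above can be turned into concrete pixel dimensions. The following is a minimal sketch, assuming "1080" refers to the shorter side of the vertical frame; the function names are hypothetical and are not part of CapCut or any other tool mentioned in the video.

```python
def portrait_frame(short_side=1080, ratio=(9, 16)):
    """Return (width, height) of a vertical frame whose width is `short_side`."""
    ratio_w, ratio_h = ratio
    return short_side, short_side * ratio_h // ratio_w


def fit_inside(src_w, src_h, frame_w, frame_h):
    """Scale a source image to fit inside the frame without distorting it."""
    scale = min(frame_w / src_w, frame_h / src_h)
    return round(src_w * scale), round(src_h * scale)


frame = portrait_frame()
print(frame)                            # (1080, 1920)
print(fit_inside(1024, 1024, *frame))   # (1080, 1080)
```

A square avatar therefore fills the width of the vertical frame and leaves room above and below, which is where subtitles are usually placed.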



📈 Viral Video Creation: From Character to Script

This paragraph introduces the concept of creating AI talking videos that have gained significant popularity on social media platforms like Instagram. The speaker shares a success story of an Instagram page that quickly amassed 700,000 followers by posting such videos. The paragraph outlines the first step in the video creation process, which involves generating a character using a text-to-image generator like Blue Willow. The speaker provides a prompt example and explains the process of selecting and downloading the avatar image to proceed to the next stage.

📝 Script Generation with Chat GPT

The second step in the video creation process is generating the script. The speaker uses Chat GPT to obtain three golden rules to remember in life. The script is then copied and used for the next stage of the process. This paragraph emphasizes the importance of a compelling script in creating engaging and informative content.
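
Before moving on to audio, it can help to check that the generated script fits a short-form video. The sketch below estimates narration time from the word count. The pace of 150 words per minute is an assumption chosen for illustration, not a figure from the video, and real AI voices may read faster or slower.

```python
import re

WORDS_PER_MINUTE = 150  # assumed narration pace, not taken from the video


def estimate_narration_seconds(script, wpm=WORDS_PER_MINUTE):
    """Estimate how long a voice reading at `wpm` would take to read the script."""
    words = re.findall(r"[A-Za-z0-9']+", script)
    return len(words) / wpm * 60


script = "Embrace growth. Cultivate resilience. Foster meaningful connections."
print(f"{estimate_narration_seconds(script):.1f} s")
```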

🎙️ Producing Audio with AI Voices

The third step is generating audio for the video. The speaker suggests pasting the script into the 11labs website and choosing from a variety of AI voices to narrate the content. The paragraph also gives some historical context, mentioning the first model of the human vocal tract, created in 1779, and points to an alternative platform for those who do not like the voices available on 11labs. The speaker then shows how to download the audio file, choosing a voice that fits the video's theme.
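
The video does this step by hand in the web interface. For readers who would rather script it, the sketch below builds (but does not send) an HTTP request for a text-to-speech call. It assumes that "11labs" refers to ElevenLabs and that its public REST API accepts a POST to `/v1/text-to-speech/{voice_id}` with an `xi-api-key` header; check both against the service's current API documentation before relying on them. `YOUR_VOICE_ID` and `YOUR_API_KEY` are placeholders, and `save_narration` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumed endpoint shape; verify against the service's current API reference.
API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_tts_request(script, voice_id, api_key):
    """Build a POST request asking the service to narrate `script`."""
    payload = json.dumps({"text": script}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/{voice_id}",
        data=payload,
        method="POST",
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
    )


def save_narration(request, out_path="narration.mp3"):
    """Send the request and write the returned audio bytes to disk."""
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())


request = build_tts_request("Embrace growth.", "YOUR_VOICE_ID", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
# save_narration(request)  # uncomment once real credentials are filled in
```

Separating request construction from sending keeps the example runnable without credentials or network access.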

🎬 Creating the Video from the Image and Audio

The fourth step is about generating the video from the previously created image and audio. The speaker instructs the audience to open a specific website, create a free account, and use it to upload the avatar and audio file. The paragraph explains the process of adding the audio to the video and generating the final video product, which can then be downloaded and shared.

🖌️ Adding Subtitles for Accessibility

The fifth and final step is generating subtitles for the video. The speaker recommends the CapCut video editor: create a new project, import the video, and generate subtitles automatically. The instructions cover resizing the text, selecting a bold font for emphasis, and changing the color of keywords to capture the viewer's attention. The speaker concludes with a motivational message, encouraging the audience to create content consistently and to like the video and subscribe for more.
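
The video generates and styles subtitles inside CapCut. As an illustration of what such subtitles look like as data, the sketch below writes caption cues in the SubRip (SRT) text format and wraps chosen keywords in `<b>` tags. This is an assumption-laden stand-in, not CapCut's method: the cue timings are made up, and support for inline tags such as `<b>` varies between players and editors.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the start of the video
    end: float
    text: str


def srt_timestamp(seconds):
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT files."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def emphasize(text, keywords):
    """Wrap each word whose lowercase form is in `keywords` in bold tags."""
    out = []
    for word in text.split():
        bare = word.strip(".,!?;:").lower()
        out.append(f"<b>{word}</b>" if bare in keywords else word)
    return " ".join(out)


def to_srt(cues, keywords=frozenset()):
    """Render cues as numbered SRT blocks separated by blank lines."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}\n"
            f"{emphasize(cue.text, keywords)}\n"
        )
    return "\n".join(blocks)


cues = [
    Cue(0.0, 1.5, "Embrace growth."),
    Cue(1.5, 3.25, "Cultivate resilience."),
]
print(to_srt(cues, keywords={"growth"}))
```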



💡AI talking videos

AI talking videos are a form of digital media where artificial intelligence is used to create a character that appears to talk and interact with the audience. These videos often use text-to-speech technology and animated characters to deliver content. In the script, the creation of such videos is the central theme, with the speaker detailing the process of making an AI talking video from character generation to final video output.

💡Instagram page

An Instagram page refers to a public profile on the social media platform Instagram where users can share photos and videos with their followers. The script mentions an Instagram page that gained significant popularity by posting AI talking videos, highlighting the impact of these videos on social media engagement.

💡Text-to-image generator

A text-to-image generator is a software tool that converts textual descriptions into visual images. In the context of the video script, Blue Willow is used as an example of such a generator to create an avatar for the AI talking video, demonstrating the technology's role in character design.

💡Chat GPT

Chat GPT (ChatGPT, a conversational AI language model) is used in the video to generate the script for the AI talking video. It is an example of AI technology assisting content creation, specifically in developing the narrative or dialogue for the video.

💡11Labs

11Labs is mentioned in the script as a website where one can generate audio for the AI talking video using a script. The platform offers various AI voices, allowing creators to choose a voice that fits the character of their video, which is a crucial step in bringing the AI character to life.

💡Human vocal tract model

The human vocal tract model refers to an early attempt at speech synthesis, created in 1779, which could produce vowel sounds. This historical reference in the script illustrates the evolution of speech synthesis technology, leading up to the advanced AI voices available today for creating talking videos.

💡Video editor

A video editor is software used to edit video content, such as adding subtitles, adjusting visuals, and fine-tuning the final output. In the script, CapCut is used as an example of a video editor for generating subtitles, which is an important step in making the video accessible and engaging for a wider audience.

💡Auto captions

Auto captions are a feature in video editing software that automatically generates subtitles based on the audio of the video. The script describes using auto captions to add subtitles to the AI talking video, which not only makes the content more accessible but also enhances the viewer's understanding of the spoken words.
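
Short-form videos usually show only a few words on screen at a time. The sketch below shows one way the grouping could work, assuming a speech recognizer has already produced a start and end time for every word. The `Word` structure, the timings, and the three-word limit are all hypothetical; the video does not describe how CapCut groups words internally.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float


def group_words(words, max_words=3):
    """Group timed words into (start, end, text) captions of at most `max_words`."""
    captions = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(word.text for word in chunk)
        captions.append((chunk[0].start, chunk[-1].end, text))
    return captions


words = [
    Word("Embrace", 0.0, 0.4),
    Word("growth", 0.4, 0.9),
    Word("every", 0.9, 1.2),
    Word("day", 1.2, 1.5),
]
for caption in group_words(words):
    print(caption)
```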


💡Font

A font refers to a specific size, weight, and style of typeface used in printed or digital media. In the context of the video script, the speaker discusses changing the font size and style for the subtitles to make them more readable and visually appealing, which is an important aspect of video design.

💡Consistency

Consistency in this context refers to the continuous and regular creation and sharing of content, particularly in the journey to success on social media platforms. The script encourages viewers to stay consistent in creating and posting AI talking videos to achieve their goals, emphasizing the importance of regular engagement with the audience.

💡Resilience

Resilience is the ability to recover quickly from difficulties or adapt to change. In the script, it is one of the 'golden rules' provided by Chat GPT, suggesting that building resilience is crucial for navigating life's ups and downs, which ties into the overarching theme of personal growth and improvement.

💡Meaningful connections

Meaningful connections refer to the positive and supportive relationships one builds with others. The script highlights the importance of surrounding oneself with people who inspire and uplift, which is a key aspect of fostering a positive environment for personal and professional development.


An Instagram page gained 700,000 followers in two months by posting AI talking videos.

A total of 40 videos were posted, each receiving millions of views.

Some videos reached up to 50 million views.

The tutorial explains how to create AI talking videos.

Step 1 involves generating a character, using Blue Willow for instance.

A text-to-image generator is used to create an avatar.

Step 2 is about generating a script with the help of chat GPT.

Three golden rules for life are requested from chat GPT.

Step 3 involves generating audio from the script using the 11labs website.

AI voices can be chosen and the audio file can be downloaded.

11 Labs offers a variety of AI voices.

An alternative for realistic natural-sounding voices is mentioned.

Step 4 is to generate a video from the image using a specific website.

The website also has an AI audio generator.

Users can upload their own audio file and generate a video.

Step 5 is generating subtitles using a video editor like CapCut.

Auto-captions can be generated and customized for size and style.

The final video is exported in 1080p resolution.

Three golden rules shared: Embrace growth, cultivate resilience, and foster meaningful connections.

Consistency is key in the journey to success.