Easy AI Voice Cloning with KITS AI - Online Platform and API Usage

Jarods Journey
1 Mar 2024 · 32:14

TLDR: The video provides a comprehensive guide on using Kits AI, an online platform for voice cloning and conversion. The host demonstrates how to create an account, navigate the website, and utilize the platform's features, including training a voice with RVC, converting audio files, and using the API for programming integration. The video also compares the voice conversion quality of Kits AI with RVC, discusses the platform's pricing plans, and mentions the limitations of the free plan. The host concludes by thanking Kits AI for the collaboration and encourages viewers to try the service.


  • 🌐 KITS AI is an online platform offering services like RVC (Retrieval-based Voice Conversion) and UVR (Ultimate Vocal Remover).
  • 📱 Users can log in using Google, Discord, or other platforms to access the service.
  • 📚 The platform includes two main components: the website interface and the API for developers.
  • 🎓 The 'Train' tab is used for training an RVC voice, which is essential for voice cloning.
  • ⏱️ Training a voice model can be time-consuming; the example shown in the video took 11 hours.
  • 🔄 The 'Conversion' tab allows users to convert one voice to another, with options to adjust pitch and other settings.
  • 🎵 Users can utilize YouTube video links for voice conversion, which is a convenient feature.
  • 📈 The platform uses a pre-trained model for voice conversion, which is claimed to be better in pitch.
  • 💬 Text-to-speech functionality is available, offering a good quality output.
  • 🤖 The 'Blender' tool allows users to merge two different voices, creating a unique voice model.
  • 📊 The platform provides a vocal remover tool similar to UVR, which can separate vocals from instrumentals.
  • 💡 AI Mastering is a newer feature that, while not yet perfect, shows promise for future improvements.

Q & A

  • What is KITS AI and what does it offer?

    -KITS AI is an online service that consolidates voice-related technologies like RVC (Retrieval-based Voice Conversion) and UVR (Ultimate Vocal Remover) into a single platform. It allows users to train an RVC voice for voice cloning, convert one voice to another, and use its API for voice conversion without needing RVC or UVR on their local machine.

  • How can one get started with KITS AI?

    -To get started with KITS AI, one needs to visit their website, log in, and create an account using Google, Discord, or another preferred method. The service requires an internet connection and follows a subscription model with different plans and pricing.

  • What are the main features available on the KITS AI website?

    -The main features on the KITS AI website include a conversion tool, a training section for RVC voices, a library to manage trained models, and the ability to use YouTube videos for voice conversion. Users can also adjust pitch, semitones, and volume blend during the conversion process.

  • How does the KITS AI API work for voice conversion in a programming context?

    -The KITS AI API allows developers to integrate voice conversion into their programs. It involves making HTTP requests to the KITS AI servers with the appropriate API key, headers, and parameters to fetch voice models and perform conversions. The API supports methods to upload voice models, initiate conversions, and download the converted audio.

  • What are the costs associated with using KITS AI?

    -KITS AI operates on a subscription model with different plans. The basic plan at $9.99 per month offers 30 download minutes. The Creator plan provides unlimited download minutes, suitable for more extensive use. The free plan allows for conversions but does not permit voice cloning or file downloads.

  • How does KITS AI handle the storage of voice data?

    -All voice data and models are stored on KITS AI's servers. Users can train their voice models on the platform, and these models are then available for use in future conversions without needing to re-upload or store them locally.

  • What is the process for training an RVC voice on KITS AI?

    -To train an RVC voice, users upload voice files to the 'Train' tab on KITS AI, clean up vocals using the harmony and de-reverb settings if needed, name the model, and initiate the training process. Training can take a significant amount of time, depending on the complexity of the voice.

  • How can users utilize YouTube videos for voice conversion on KITS AI?

    -Users can input the URL of a YouTube video into the conversion page on KITS AI to use the video's audio for voice conversion. The platform extracts the vocals and allows users to apply various conversion settings before processing the conversion.

  • What are the advanced settings available during voice conversion on KITS AI?

    -Advanced settings during voice conversion include the ability to adjust the pitch, semitones, conversion strength, and volume blend. Users can also choose to remove instrumentals, reverb and delay from vocals, and backing vocals if necessary.

  • How does KITS AI's text-to-speech feature work?

    -KITS AI's text-to-speech feature allows users to input text, which is then converted into spoken audio using the trained voice models. The quality of the speech synthesis is claimed to be good, with pitch matching capabilities based on the pre-trained model used.

  • What is the process for changing voice models using KITS AI's blender?

    -The blender on KITS AI allows users to merge two different voice models, adjusting the blend ratio to create a new voice. This can be done with pre-trained models or with user-trained models, offering a way to customize the voice output.



🌐 Introduction to Kits AI and Its Features

The video introduces Kits AI, an online platform that consolidates tools like RVC (Retrieval-based Voice Conversion) and UVR (Ultimate Vocal Remover) into one service. The host demonstrates how to access the website, sign up using various accounts, and navigate the interface. Two main topics are covered: the website itself and the API, which allows for voice conversion without needing RVC or UVR installed locally. The service requires an account and has associated plans and pricing. The left-hand side of the website provides access to conversion, training, and tools, as well as a library of trained voices. The training tab is where users can train an RVC voice for voice cloning; the host walks through the process, including how long training takes and the options to clean up vocals and remove instruments. The host also discusses uploading and using pre-trained models, conversion capabilities, and the option to use YouTube videos for conversion.


📞 Exploring Download Minutes and Text-to-Speech

The host explains the concept of download minutes, which are consumed when downloading converted songs but not during conversions. The ability to switch between different voice models is highlighted, and a text-to-speech demonstration is provided using a YouTube prompt. The host also discusses the quality of the text-to-speech feature, comparing it to other services, and mentions the blender tool that allows users to merge two voices and adjust the blend ratio. Additionally, the host covers the vocal remover tool, which is similar to UVR, and briefly critiques the AI mastering feature, noting that it is relatively new and has room for improvement.


🔗 Accessing Voice Models via API and Setting Up a Python Environment

The video script details how to access voice models using the Kits AI API. It guides the viewer through setting up a Python environment, installing necessary packages, and creating a new file for the project. The process includes obtaining an API key, setting up a request with the correct URL, headers, and parameters, and handling the response. The script also covers how to reference the API documentation for both POST and GET requests, how to build the header with the API key, and how to construct a parameters dictionary for fetching voice models.
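
The request setup described above might look roughly like the following sketch. It uses only the Python standard library, and the video's own script may use a different HTTP library. The base URL, the `/voice-models` path, the Bearer-token scheme, and the `page`/`perPage` parameter names are placeholders assumed for illustration; check them against the Kits AI API documentation before use.

```python
import urllib.parse
import urllib.request

# Placeholder values: replace with the base URL and path from the API docs.
BASE_URL = "https://api.example.invalid/v1"  # hypothetical
VOICE_MODELS_PATH = "/voice-models"          # hypothetical


def build_voice_models_request(api_key, page=1, per_page=10):
    """Build (but do not send) a GET request that lists voice models."""
    # Header carrying the API key (Bearer scheme is an assumption).
    headers = {"Authorization": f"Bearer {api_key}"}
    # Query parameters; the names here are assumptions.
    params = {"page": page, "perPage": per_page}
    url = f"{BASE_URL}{VOICE_MODELS_PATH}?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(url, headers=headers, method="GET")


request = build_voice_models_request("YOUR_API_KEY")
print(request.full_url)
```

Sending the request would then be a call such as `urllib.request.urlopen(request)`, with the response body read and decoded as JSON.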


📚 Parsing Voice Model Data and Preparing for Conversion

The host demonstrates how to parse the JSON response to extract voice model names and IDs, storing them in a dictionary for later use. The script outlines creating a POST request for voice conversion, setting up the necessary URL, headers, and data fields, including the voice model ID and conversion parameters. The process of sending a sound file as raw bytes and handling the server's response is explained. The host also shows how to check the response status code and retrieve the job ID for the conversion.
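
As a rough illustration of the parsing step, the sketch below maps model names to IDs and prepares the form fields for a conversion request. The response shape (`data`, `id`, `title`) and the field names (`voiceModelId`, `pitchShift`, `conversionStrength`) are assumptions made for this example, not confirmed details of the Kits AI API.

```python
import json

# Hypothetical response body; the real field names may differ.
SAMPLE_RESPONSE = """
{"data": [{"id": 101, "title": "Voice A"},
          {"id": 202, "title": "Voice B"}]}
"""


def models_by_name(response_text):
    """Map each voice model's name to its ID."""
    payload = json.loads(response_text)
    return {model["title"]: model["id"] for model in payload["data"]}


def conversion_fields(model_id, pitch_shift=0, strength=0.5):
    """Form fields to send alongside the audio file in the POST request."""
    return {
        "voiceModelId": str(model_id),        # assumed field name
        "pitchShift": str(pitch_shift),       # assumed field name
        "conversionStrength": str(strength),  # assumed field name
    }


models = models_by_name(SAMPLE_RESPONSE)
fields = conversion_fields(models["Voice A"], pitch_shift=2)
print(models)
print(fields)
```

Keeping the name-to-ID mapping in a dictionary means the rest of a script can refer to a voice by its readable name and look up the ID only when building the request.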


🔄 Executing the Conversion and Retrieving the Result

The script explains how to perform a voice conversion using the Kits API. It details setting up a new response variable for the POST request and checking the status code to ensure the conversion was successful. The host shows how to extract the job ID from the response and use it to make a GET request to retrieve the conversion result. The process includes handling the conversion data and output file URL, and finally, downloading the converted file into the current directory from where the script is run.
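
The retrieval step can be sketched as a small polling loop. The version below takes the GET call as a parameter so it can be run without network access; the `status` and `outputFileUrl` field names and the `"success"`/`"error"` status values are assumptions for illustration, and the video's script may simply issue a single GET request instead of polling.

```python
import os
import time
import urllib.parse


def wait_for_output_url(fetch_job, job_id, poll_seconds=5.0, max_attempts=60):
    """Poll a conversion job until it finishes; return its output file URL.

    fetch_job is any callable that takes a job ID and returns the parsed
    JSON dict for that job (for example, a wrapper around a GET request).
    """
    for _ in range(max_attempts):
        job = fetch_job(job_id)
        status = job.get("status")        # assumed field name
        if status == "success":           # assumed status value
            return job["outputFileUrl"]   # assumed field name
        if status == "error":             # assumed status value
            raise RuntimeError(f"conversion {job_id} failed")
        time.sleep(poll_seconds)
    raise TimeoutError(f"conversion {job_id} did not finish in time")


def local_filename(file_url, default="converted.wav"):
    """Derive a filename for the current directory from the download URL."""
    name = os.path.basename(urllib.parse.urlparse(file_url).path)
    return name or default


# Stand-in for the real GET request: reports "running" once, then "success".
_replies = iter([
    {"status": "running"},
    {"status": "success",
     "outputFileUrl": "https://files.example.invalid/out/song.wav"},
])
url = wait_for_output_url(lambda job_id: next(_replies), job_id=42,
                          poll_seconds=0)
print(url, "->", local_filename(url))
```

With a real URL in hand, a call such as `urllib.request.urlretrieve(url, local_filename(url))` would save the file into the directory the script is run from.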


🎶 Comparing RVC Trained Voice with Kits AI Voice Conversion

The host conducts a comparison between an RVC trained voice and a Kits AI trained voice. The process involves running inference on an audio file with both systems and adjusting parameters to match equivalent settings. The host plays the converted audio from Kits AI first, followed by the RVC conversion, allowing the viewer to judge which one sounds better. The script also includes a demonstration of handling audio outside the range of the original voice model by adjusting the transpose and semitones.
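
For a sense of what a semitone adjustment means numerically, the sketch below uses the standard equal-temperament relation, where each semitone multiplies frequency by 2^(1/12). Whether Kits AI or RVC applies the transpose setting in exactly this way is not stated in the video, so treat this as background arithmetic rather than a description of either tool.

```python
def semitone_ratio(semitones):
    """Frequency multiplier for a shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)


def shifted_frequency(frequency_hz, semitones):
    """Frequency after transposing by the given number of semitones."""
    return frequency_hz * semitone_ratio(semitones)


print(semitone_ratio(12))             # 2.0 (one octave up)
print(shifted_frequency(440.0, -12))  # 220.0 (one octave down)
```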


💰 Kits AI Pricing Plans and Subscription Considerations

The video concludes with an overview of Kits AI's pricing plans. The host outlines the features and limitations of each plan, including download minutes, the ability to clone voices, and the number of audio slots available for composers. The free plan allows for conversions but not downloads, requiring an upgrade for that feature. The Creator plan offers unlimited download minutes, which is a significant difference from the Converter plan. The host also provides affiliate and non-affiliate links for signing up for Kits AI and thanks the viewers for their support.




💡KITS AI

KITS AI is an online platform that offers voice cloning and voice conversion services. It is the central focus of the video, where the host demonstrates how to use the platform for various voice-related tasks without the need for local installations of software like RVC or UVR. The platform is used to train and convert voices, making it a key tool for the video's demonstrations.

💡Voice Cloning

Voice cloning refers to the process of replicating a person's voice using AI technology. In the video, the host explains how to use KITS AI to train an RVC voice for voice cloning, which can then be used to convert one voice to another. This is showcased as a significant feature of the platform, allowing users to create synthetic versions of voices for various applications.


💡API

API, or Application Programming Interface, is a set of rules and protocols that allows different software applications to communicate with each other. The video discusses how users can utilize KITS AI's API for voice conversion within their own coding projects, making it a crucial aspect for developers looking to integrate voice conversion capabilities into their programs.


💡UVR

UVR, or Ultimate Vocal Remover, is a tool mentioned in the video that is used for separating vocals from instrumentals in songs. The host compares the capabilities of KITS AI's vocal removal features to those of UVR, noting that KITS AI provides similar functionality for users looking to separate vocals from instrumentals in audio tracks.


💡RVC

RVC, or Retrieval-based Voice Conversion, is a technology for converting one voice into another using a trained voice model. The video discusses using RVC to train a voice model and then compares the results of RVC with those from KITS AI, which uses its own pre-trained models, highlighting the differences in pitch and quality.

💡Voice Conversion

Voice conversion is the process of changing one voice to another using AI. The video demonstrates how to perform voice conversion using both the KITS AI website and its API. It is a core feature of the platform, allowing users to change the pitch, tone, and other characteristics of a voice to match a different voice model.

💡Pre-trained Model

A pre-trained model in the context of the video refers to a machine learning model that has already been trained on a large dataset and can be used for specific tasks, such as voice conversion. KITS AI uses different pre-trained models for training voices, which is mentioned as a reason for the quality of the conversions it provides.


💡Text-to-Speech

Text-to-speech (TTS) is the technology that converts written text into spoken words. The host of the video uses KITS AI to perform text-to-speech conversion, showcasing the platform's capability to generate speech from text inputs, which is an important feature for creating voiceovers and similar applications.

💡Vocal Remover

The vocal remover tool, as mentioned in the video, is a feature within KITS AI that allows users to remove vocals from songs, similar to UVR. It is used to demonstrate the platform's ability to isolate and remove certain elements from audio tracks, which can be useful for a variety of music production tasks.

💡AI Mastering

AI mastering is a process that uses artificial intelligence to enhance the quality of audio recordings. In the video, the host experiments with KITS AI's AI mastering feature, noting that while it may not be as advanced as some other tools, it still offers a valuable service for improving the overall sound of audio tracks.

💡Python Scripting

Python scripting is showcased in the video as a means to automate voice conversion tasks using KITS AI's API within a programming environment. The host walks through the process of writing a Python script to interact with the KITS AI API, demonstrating how developers can leverage the platform's capabilities for custom applications.


KITS AI is an online service that combines RVC (Retrieval-based Voice Conversion) and UVR (Ultimate Vocal Remover) into one platform.

Users can create an account using Google, Discord, or other platforms to access the service.

The platform offers both a website interface and an API for users who need voice conversion capabilities in their projects.

All data is stored on KITS AI, and there are various plans and pricing options available for using the service.

The 'Train' tab is used for RVC, allowing users to train a voice model for voice cloning purposes.

Training a voice model can take a significant amount of time, with one example taking 11 hours.

The 'Library' section allows users to store and manage their trained voice models.

KITS AI supports uploading RVC .pth files for direct use in the conversion process.

YouTube videos can be used as input for voice conversion, showcasing the platform's versatility.

The platform provides pitch adjustment and other settings to fine-tune the conversion process.

Users are charged download minutes when they download converted files, not for the conversion process itself.

The 'Blender' tool allows users to merge two voices and adjust the blend ratio to create a unique voice model.

The 'Vocal Remover' feature works similarly to UVR, separating vocals from instrumentals in a track.

AI Mastering is a newer feature that aims to improve the quality of audio tracks, though it may require further enhancements.

The platform's web UI offers a comprehensive set of features for voice conversion without the need for local installations.

Developers can integrate voice conversion into their Python scripts using KITS AI's API and the provided documentation.

The video demonstrates how to use the API for fetching voice models and performing audio conversions programmatically.

KITS AI offers different subscription plans, including a free plan with limitations on downloading converted files.

The video concludes with a comparison of voice conversion quality between RVC-trained and KITS AI-trained voices.

The presenter provides an affiliate link for signing up for KITS AI, allowing viewers to support the channel.