How To Transcribe Audio To Text (UPDATED Video Transcription Tutorial!)

Primal Video
3 Oct 2022 · 13:49

TLDR: The video tutorial provides a comprehensive guide to transcribing audio to text. It covers both free and paid tools, including built-in features on Windows and Mac, mobile voice typing, and software like Google Docs and Microsoft Word. The video also introduces a free web-based dictation tool for real-time transcription and Otter for meeting transcriptions. For post-recording transcriptions, Temi and Descript are recommended for fast AI-based services, while Rev offers high-accuracy human transcription. The tutorial concludes with a mention of video editing tools like Adobe Premiere Pro that may include transcription features, suggesting viewers check if their preferred software has this capability.


  • 💬 Use the Windows key + H to open voice typing on Windows and transcribe speech to text in any application.
  • 🍎 Enable Apple Dictation on Mac through System Preferences under Keyboard, then press the Control key twice to start dictation.
  • 📱 On iOS and Android, use the microphone icon on the keyboard to transcribe speech while typing in any text field.
  • ✅ Both Microsoft Word and Google Docs have built-in dictation features that are accurate and support punctuation and paragraphs.
  • 🌐 For a web-based solution, a free online dictation tool uses Google's speech recognition technology for transcriptions.
  • 🐦 Otter is a comprehensive tool for real-time transcriptions, meeting management, and booking systems, with high accuracy.
  • 📈 Temi offers fast AI-based transcription services at 25 cents per minute with bulk processing capabilities.
  • 🎥 Descript is an all-in-one editing system for audio and video that allows you to edit content as if you were working with a text document.
  • 🤖 While most services use AI for transcription with about 85-90% accuracy, Rev provides human transcriptions with up to 99% accuracy.
  • 📊 Rev also integrates with YouTube for automatic captioning and subtitles, and supports translation to other languages.
  • 🎬 Some video editing tools like Adobe Premiere Pro are now including transcription features, which can be beneficial if they align with your workflow.

Q & A

  • What are the transcription tools available on Windows computers?

    -On Windows, you can use the built-in voice typing feature by pressing the Windows key and the letter H. This allows you to dictate into any text box, document, or writing app, and it supports punctuation and paragraph control.

  • How do you enable Apple Dictation on a Mac?

    -To enable Apple Dictation on a Mac, go to System Preferences, click on Keyboard, then Dictation, and enable it. The default keyboard shortcut to activate dictation is pressing the control key twice, but this can be customized.

  • How does voice typing work on mobile devices for both iOS and Android?

    -On mobile devices, you can enable voice typing by opening any document or text field where you can type. There will be a microphone icon on the keyboard, which when pressed, allows you to dictate and have it transcribed automatically.

  • What is the process of using the web-based dictation tool for transcriptions?

    -To use the web-based tool, you go to its website, select your language, allow access to your microphone, and press start to begin dictating. The tool uses Google speech recognition technology to transcribe your speech. You can copy and paste the transcribed text once finished.

  • What are the features of Otter beyond speech-to-text?

    -Otter is a meeting management and booking system that also offers real-time transcription of speech from multiple people. It can automatically detect different speakers and transcribe their speech accurately and quickly.

  • How does Temi differ from other transcription services?

    -Temi is an AI-based transcription service that offers fast transcription at a rate of 25 cents per minute. It allows bulk transcription and provides a visual representation of the text with the video, highlighting uncertain areas in orange for easy review.

  • What is unique about Descript as a transcription tool?

    -Descript is an end-to-end editing system for podcasts, videos, and screen recordings that also offers transcription. It allows users to edit videos as if they were text documents, making video editing accessible to anyone.

  • What is the accuracy level of AI transcribing services?

    -AI transcribing services typically have a maximum accuracy of around 85 to 90%, depending on the platform.

  • How does Rev ensure a higher level of transcription accuracy?

    -Rev offers transcription services done by real humans, which ensures a higher level of accuracy, up to 99%. This service is more expensive, costing $1.50 per minute.

  • What additional services does Rev provide for video content creators?

    -Rev integrates directly with YouTube, allowing users to create accurate captions and subtitles for their videos. It also supports translating videos and audio to other languages and offers live audio transcription for Zoom calls.

  • Are there any video editing tools that have built-in transcription features?

    -Yes, some video editing tools like Adobe Premiere Pro have started to include transcription tools. It's recommended to search for your specific editing tool combined with 'transcribe' to see if it offers this feature.

  • What is the recommended way to transcribe audio or video files if you need high accuracy?

    -For high accuracy transcriptions, using a service like Rev, which employs human transcribers, is recommended. This option, while more expensive, ensures an accuracy rate of up to 99%.



😀 Free Speech-to-Text Transcription Tools

The paragraph introduces various free transcription tools available for speech-to-text conversion. It covers built-in features on Windows and Mac computers, as well as on iOS and Android phones. Additionally, it mentions the transcription capabilities in Google Docs and Microsoft Word. The paragraph also highlights a web-based tool that uses Google's speech recognition technology for accurate transcriptions without the need for software installation.


🚀 Advanced Real-Time Transcription with Otter

This paragraph discusses Otter, a service that offers real-time transcription with advanced features. It is not only a speech-to-text tool but also a meeting management system. Otter can transcribe speech from multiple speakers simultaneously and differentiate between them. The paragraph explains how to use Otter for live transcription during video creation and how it helps in identifying mistakes and continuing recording from a specific point. It also touches on Otter's free plan and its paid plans for additional features.


📑 Transcribing Pre-recorded Media with Temi and Descript

The paragraph presents Temi and Descript as top choices for transcribing pre-recorded video or audio files. Temi is an AI-based service that offers fast transcription at a rate of 25 cents per minute with a quick turnaround time. It allows bulk transcription and provides a visual representation of the text alongside the video. Descript is a comprehensive editing system for various media types, offering transcription along with robust editing capabilities. It allows editing videos by manipulating text as if it were a document. The paragraph outlines the process of using these services, their pricing models, and the features they offer.

📝 High-Accuracy Transcription with Rev

This paragraph focuses on Rev, a transcription service that provides extremely high accuracy, up to 99%, by using human transcribers. It contrasts Rev's service with AI transcription services, which typically have a maximum accuracy of 85 to 90%. Rev offers both human and AI transcription options, with the human service costing $1.50 per minute. The paragraph details how to place an order on Rev, the options available for transcription, and additional features like time stamping and direct integration with YouTube for caption creation. It also mentions Rev's AI transcription service and its competitive pricing.
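The per-minute rates quoted in this summary make the cost gap between AI and human transcription easy to work out. The following is a minimal sketch rather than an official pricing calculator: it assumes billing is prorated by the second, which may not match how either service rounds, and the name `transcription_cost` is made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-minute rates in USD as quoted in the video summary.
RATES_PER_MINUTE = {
    "Temi (AI)": Decimal("0.25"),
    "Rev (human)": Decimal("1.50"),
}


def transcription_cost(minutes: int, seconds: int, rate_per_minute: Decimal) -> Decimal:
    """Return the cost of transcribing one recording, rounded to cents.

    Assumption: the cost is prorated by the second. A real service may
    instead round the duration up to whole minutes.
    """
    total_minutes = Decimal(minutes) + Decimal(seconds) / Decimal(60)
    cost = total_minutes * rate_per_minute
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # Compare both services for a one-hour recording.
    for service, rate in RATES_PER_MINUTE.items():
        print(f"{service}: ${transcription_cost(60, 0, rate)}")
```

Under these assumptions, an hour of audio costs $15.00 at the AI rate and $90.00 at the human rate, which is the trade-off to weigh against the accuracy difference.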

🎥 Video Editing Tools with Built-in Transcription

The final paragraph briefly mentions the integration of transcription tools within video editing applications, using Adobe Premiere Pro as an example. It suggests that depending on one's workflow, their preferred video editing software might already include transcription capabilities. The paragraph encourages users to search for their specific editing tool in combination with 'transcribe' to discover if such a feature is available.




💡Transcribe

Transcribe refers to the process of converting spoken language into written form. In the context of the video, it is the primary action being discussed, with various tools and methods introduced to achieve transcription from audio to text. An example from the script is 'Whether you want to convert audio, video, or speech-to-text, there's transcription tools and software, that can automatically do it for you.'


💡Speech-to-text

Speech-to-text is a technology that converts spoken words into written text. It is a key concept in the video, as it discusses different software and tools that facilitate this conversion. The script mentions 'speech-to-text' in the context of built-in features on computers and phones, such as Windows' voice typing and Apple Dictation.

💡Google Docs

Google Docs is a web-based document editing platform that allows users to create, edit, and store documents online. The video highlights Google Docs as one of the tools that integrate voice typing for transcription purposes. The script states, 'And in Google Docs, you just wanna go up the top, to the tools menu down to voice typing, and you have a little microphone, pop up somewhere on your screen.'


💡Otter

Otter is a meeting management and booking tool that also provides real-time transcription. It is mentioned in the video as a service that can transcribe speech from multiple people simultaneously. The script refers to it as 'a great choice' for businesses or individuals looking to transcribe meetings in real-time.


💡Temi

Temi is an AI-based transcription service that offers fast transcription at a rate of 25 cents per minute. The video describes it as a bulk transcription tool suitable for users with many files to transcribe quickly. The script illustrates its use with 'you can actually do things in bulk' and 'the turnaround time is usually less than five minutes'.


💡Descript

Descript is an end-to-end editing system for various types of media, including podcasts and videos, which also offers transcription services. The video emphasizes Descript's ability to edit videos as if they were text documents, making it a powerful tool for video editing. The script describes it as 'one of my favorite tools right now' and 'the future of video editing'.
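To make the idea of editing video "as if it were a text document" concrete, here is a small sketch of how deleting words from a timestamped transcript could translate into the time ranges of media to keep. This is not Descript's implementation or API. The `Word` type, the `keep_ranges` function, and the sample timestamps are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One transcribed word with hypothetical timestamps in seconds."""
    text: str
    start: float
    end: float


def keep_ranges(words: list[Word], deleted: set[int]) -> list[tuple[float, float]]:
    """Return merged (start, end) ranges covering the words that were not deleted."""
    ranges: list[tuple[float, float]] = []
    previous_kept = False
    for index, word in enumerate(words):
        if index in deleted:
            previous_kept = False
            continue
        if previous_kept:
            # Extend the current range through this word, pause included.
            ranges[-1] = (ranges[-1][0], word.end)
        else:
            # Start a new range after a deletion or at the beginning.
            ranges.append((word.start, word.end))
        previous_kept = True
    return ranges


if __name__ == "__main__":
    transcript = [
        Word("So", 0.0, 0.4),
        Word("um", 0.5, 0.9),
        Word("welcome", 1.0, 1.6),
        Word("back", 1.7, 2.0),
    ]
    # Deleting the filler word at index 1 leaves two ranges to keep.
    print(keep_ranges(transcript, deleted={1}))
```

Deleting a word in the text view removes its time span from the output, so the editor only ever has to think in terms of words while the tool handles the cuts.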


💡Rev

Rev is a transcription service that stands out for its high accuracy, achieved through the use of human transcribers rather than AI. The video discusses Rev as an option for those seeking a 99% accuracy rate in transcription. The script mentions 'Rev because of that accuracy, because it's not using an AI algorithm, it's actually using a real human, to do the transcribing for you.'


💡Accuracy

Accuracy in the context of the video refers to the precision with which transcription services convert spoken language into written text. It is an important factor when choosing a transcription tool, with different services offering varying levels of accuracy. The script contrasts AI transcription with human transcription, noting that 'AI transcribing... have a maximum accuracy, of around 85 to 90%' while 'Rev' offers '99% accuracy'.
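A rough way to see what these percentages mean in practice is to estimate how many words would need correcting in a transcript of a given length. The sketch below assumes that accuracy means the share of words transcribed correctly, which the video does not define precisely. The function name `expected_errors` is illustrative.

```python
def expected_errors(word_count: int, accuracy_percent: int) -> int:
    """Estimate the number of words to correct at a given accuracy.

    Assumption: accuracy is the percentage of words transcribed correctly.
    """
    if not 0 <= accuracy_percent <= 100:
        raise ValueError("accuracy_percent must be between 0 and 100")
    return round(word_count * (100 - accuracy_percent) / 100)


if __name__ == "__main__":
    # Compare the accuracy levels quoted in the video for a 2,000-word transcript.
    for accuracy in (85, 90, 99):
        print(f"{accuracy}% accurate: about {expected_errors(2000, accuracy)} words to fix")
```

Under that assumption, a 2,000-word transcript would leave roughly 300 words to fix at 85%, 200 at 90%, and 20 at 99%.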

💡Web-based Dictation Tool

This web-based tool uses Google's speech recognition technology to transcribe spoken words into text. The video presents it as a simple, free option for transcription. The script describes how to use the tool: 'you wanna go to launch dictation, you can specify your language up the top here, and then all we need to get started, is just press start, down the bottom here.'

💡Voice Commands

Voice commands are spoken instructions that control software functions. In the context of the video, voice commands are used within transcription tools to control features such as inserting new paragraphs or deleting text. The script explains that 'we can see all the different, speech recognition commands that you've got access to, when you're using this tool.'
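As a rough illustration of how spoken commands can turn into punctuation and paragraph breaks, the sketch below post-processes a dictated string. The command phrases are a hypothetical set chosen for the example and are not the actual command list of any tool in the video. The function name `apply_spoken_commands` is likewise made up.

```python
import re

# Hypothetical command phrases for illustration only.
SPOKEN_COMMANDS = {
    "new paragraph": "\n\n",
    "question mark": "?",
    "period": ".",
    "comma": ",",
}


def apply_spoken_commands(dictated: str) -> str:
    """Replace spoken command phrases with the symbols they stand for.

    Limitation: a command word used as an ordinary word (for example
    "period" as a noun) is also replaced in this simple sketch.
    """
    text = dictated
    # Handle longer phrases first so multi-word commands match as a unit.
    for phrase in sorted(SPOKEN_COMMANDS, key=len, reverse=True):
        symbol = SPOKEN_COMMANDS[phrase]
        # Swallow whitespace before the phrase so the symbol attaches
        # to the preceding word.
        pattern = re.compile(r"\s*\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        text = pattern.sub(lambda _match: symbol, text)
    # Remove spaces left over at the start of a new paragraph.
    return re.sub(r"\n[ \t]+", "\n", text)


if __name__ == "__main__":
    spoken = "hello comma world period new paragraph how are you question mark"
    print(apply_spoken_commands(spoken))
```

Real dictation tools apply this kind of handling while you speak and usually capitalize new sentences too, which this sketch leaves out.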

💡Real-time Transcription

Real-time transcription is the process of converting spoken language into written text instantaneously as the speech is occurring. The video discusses the benefits of this feature for creating videos and managing meetings. The script mentions 'real-time transcribing' in the context of Otter, stating 'it has the ability in there, to automatically transcribe speech, from multiple people in real-time'.


Transcription tools and software can automatically convert audio to text.

Free transcription options are built into computers and phones.

Windows voice typing can be activated with the Windows key and H.

Apple Dictation on Mac requires enabling in system preferences and uses a double press of the control key.

Phone voice typing can be accessed through the microphone icon on the keyboard.

Google Docs and Microsoft Word have built-in dictation features.

A free web-based dictation tool uses Google speech recognition technology.

Otter is a meeting management system that transcribes speech from multiple people in real-time.

Temi is a fast AI-based transcribing service costing 25 cents per minute.

Descript is an end-to-end editing system for audio and video with high accuracy transcription.

Rev offers human transcription services with up to 99% accuracy at $1.50 per minute.

Rev also provides AI transcription and has direct integration with YouTube for captions.

Adobe Premiere Pro and other video editing tools are incorporating transcription features.

Transcription services have improved in accuracy and are now more usable than before.

Voice typing supports punctuation and paragraph control on both Windows and Apple devices.

Otter's free plan allows live recording and transcribing with no monthly fees or contracts.

Descript's software allows editing videos by manipulating text as if it were a document.

Rev's human transcription service is favored for its high accuracy and additional features.

Many transcription tools offer real-time transcription, making them ideal for meetings and interviews.

Transcription services are becoming more integrated into various applications, enhancing workflow efficiency.