The GPT-4o Voice App is Mind-blowing! Is Siri AI Coming?!

Better Creating
17 May 2024 · 10:59

TLDR: The video discusses the impressive voice conversation feature in the ChatGPT iOS app, which offers a natural and intuitive AI assistant experience. OpenAI's new GPT-4o model is highlighted for combining vision, text, and audio, enhancing conversational abilities. The video also speculates on Apple's potential advancements in AI with Siri, suggesting that Siri might integrate with apps, recognize user intent, and become more interactive with the help of generative AI models like ChatGPT. The host also promotes the sponsor, Brilliant, as a way to learn about AI and other technologies and stay ahead in the AI race.


  • 😲 The GPT-4o Voice App has a voice conversation option that is incredibly natural and intuitive, providing a Star Trek-like AI assistant experience.
  • 🚀 OpenAI has released GPT-4o, which combines vision, text, and audio in a single model for the first time, significantly improving conversational AI.
  • 🗣️ With GPT-4o, voice mode is handled natively, offering faster response times and support for 50 languages.
  • 📈 The GPT-4o update brings features like Vision, memory for conversation continuity, and a browse function for searching from within conversations.
  • 🎙️ The GPT-4o voice model can now pick up on emotions in your voice and respond accordingly, making the interaction more human-like.
  • 📈 Major players in the multimodal language model space include OpenAI, Google, and Facebook, all working to improve AI's understanding and response capabilities.
  • 📱 The AI race might already be won on smartphones, as dedicated personal assistant devices like the Rabbit R1 and the Humane AI Pin have received mixed reviews.
  • 🔍 ChatGPT can perform quick research and provide recommendations, such as suggesting places to visit for breathtaking views.
  • 📊 ChatGPT can analyze data, such as a plot of temperatures, and provide insights based on the visual information presented.
  • 🍎 Apple's AI department is rumored to be working on a Siri generative AI assistant for 2024, potentially transforming Siri's capabilities.
  • 📚 Brilliant is highlighted as a way to invest in one's intelligence and stay ahead in the AI race, offering interactive lessons on subjects like AI and computer science.

Q & A

  • What is the main topic discussed in the video?

    -The main topic discussed in the video is the advancements in voice AI technology, specifically the GPT-4o Voice App, and the potential for a new Siri AI assistant in 2024.

  • What is the significance of the GPT-4o Voice App?

    -The GPT-4o Voice App is significant because it offers a natural, intuitive voice conversation option that combines vision, text, and audio, making it a highly efficient and user-friendly AI tool.

  • How does the GPT-4o Voice App improve upon previous AI systems?

    -The GPT-4o Voice App improves upon previous AI systems by integrating vision, text, and audio together for the first time, allowing for more natural and human-like conversations with faster response times and the ability to understand and respond to emotions in the user's voice.

  • What are some of the cool use cases for the GPT-4o Voice App mentioned in the video?

    -Some cool use cases mentioned in the video include using the app for travel recommendations, such as visiting Windermere or hiking up to Scafell Pike, and for quick research on various topics.

  • Who are some major players in the multimodal language model space in AI?

    -Some major players in the multimodal language model space include OpenAI, with GPT as well as models like CLIP and DALL-E, along with Google and Facebook.

  • What is the potential impact of Siri being powered by a generative AI model in 2024?

    -The potential impact of Siri being powered by a generative AI model in 2024 could be a significant transformation in how users interact with their devices, with improved app integration, intent recognition, and the ability to perform more complex tasks.

  • What is the 'Ferret-UI' system developed by Apple?

    -Ferret-UI is a generative AI system described in an Apple research paper. It is designed to make sense of app screens, potentially leading to a more interactive and user-friendly assistant experience.

  • What is the role of the sponsor 'Brilliant' in the video?

    -Brilliant is the sponsor of the video, offering an app that provides interactive learning experiences in various fields, including AI and computer science, to help users stay ahead in the AI race.

  • How can users try out the GPT-4o Voice App?

    -Users can try out the GPT-4o Voice App by downloading the ChatGPT app and using its voice conversation feature to experience natural and intuitive AI interactions.

  • What is the significance of WWDC in the context of the video?

    -WWDC (Apple's Worldwide Developers Conference) is significant in the context of the video as it is where Apple is expected to unveil the future of their platforms and share details about the upcoming Siri AI assistant.

  • What is the importance of managing one's goals and projects in relation to AI assistants?

    -Managing one's goals and projects is important because even with advanced AI assistants, personal control and organization are crucial for effectively achieving objectives and staying on top of tasks.

  • How does the video suggest one can safeguard their career against the rise of AI?

    -The video suggests that investing in one's own intelligence through learning and education, such as using the Brilliant app, is one of the best ways to safeguard a career against the rise of AI.



🤖 Advancements in AI Voice Conversations

The video begins with the host thanking Brilliant for sponsoring the video and introducing the topic of AI advancements, particularly the voice conversation feature in the ChatGPT iOS app. The host shares his enthusiasm about how natural and intuitive this AI-driven voice interaction is, which he believes is not widely recognized. He also hints at upcoming AI developments from Apple and shares his expectations for a new Siri generative AI assistant rumored for 2024. The section touches on recent AI gadget releases, the 'AI Wild West' era, and a personal update from the host. It also discusses OpenAI's new flagship model, GPT-4o, which combines vision, text, and audio, makes the voice conversation system significantly more efficient, and is available for free to all users.


🚀 Features and Future of AI from Open AI and Apple

This section covers the features of OpenAI's GPT-4o, highlighting its ability to analyze data, understand images, and maintain continuity across conversations. The host demonstrates that the system can understand and respond to voice commands without needing a tap to interrupt it. The section also discusses the potential for AI to transform Siri in 2024 with iOS 18, suggesting that Apple's AI department might be working on a generative AI model to enhance Siri's functionality. The host speculates on the possible integration of Siri with other apps, improved intent recognition, and the potential for Siri to perform more complex tasks. The section concludes with a teaser about Apple possibly unveiling these advancements at WWDC 2024.


🛠️ Tools for Personal Productivity and AI's Role

The final section shifts focus to personal productivity tools, arguing that AI assistants can be rendered useless without effective goal and project management. The host encourages viewers to watch the next video for systems and tools that can help manage tasks with less effort. He also mentions an Apple research paper revealing the development of 'Ferret-UI', a generative AI system designed to understand app screens, hinting at a more interactive future for AI assistants. The host invites viewers to try the ChatGPT voice app and share their thoughts, expressing his excitement for the upcoming Apple event where these developments might be showcased.



💡GPT-4o Voice App

The GPT-4o Voice App is a reference to the latest version of OpenAI's language model integrated with voice capabilities. It represents a significant advancement in AI technology, allowing for more natural and intuitive interactions with an AI assistant. In the video's context, it is presented as a 'mind-blowing' feature that sets a new standard for AI personal assistants, with the ability to have a conversation in a very human-like manner.

💡Siri AI

Siri AI refers to Apple's voice-activated personal assistant, which is expected to be enhanced with more advanced AI capabilities. The video discusses rumors and speculations about Siri's potential transformation into a more powerful and versatile AI assistant, possibly incorporating generative AI models similar to those of GPT-4o. This would allow Siri to perform more complex tasks and understand user intent more accurately.

💡AI Wild West

The term 'AI Wild West' is used metaphorically to describe the current state of the AI industry, where there is rapid development and innovation with few regulations or standards. It suggests a landscape of opportunity and competition, where companies are racing to develop the most advanced AI technologies. In the video, it sets the stage for discussing the latest AI gadgets and the potential for Siri to become a leading player in this competitive environment.

💡Multimodal Language Models

Multimodal Language Models are AI systems that can process and understand multiple types of data, such as text, images, and audio. These models are designed to enhance understanding and generate more nuanced responses. In the video, the script mentions major players like OpenAI, Google, and Facebook that are working on such models, which combine different modalities to improve AI capabilities.

💡ChatGPT

ChatGPT is OpenAI's AI assistant, featured in the video. It is capable of voice conversations and is described as 'scary good' due to its natural and intuitive interaction style. The video highlights ChatGPT's ability to understand user intent and provide relevant responses, showcasing the potential of modern AI assistants.

💡GPT-4o

GPT-4o is the latest iteration of OpenAI's language model, combining vision, text, and audio capabilities in a single model. The video mentions that this new model makes the conversation system even better, allowing for faster and more natural interactions with the AI.

💡Brilliant

Brilliant is an educational platform that sponsors the video. It offers interactive lessons on various subjects, including AI and computer science. The video promotes Brilliant as a way to invest in one's intelligence and stay ahead in the AI-driven world by learning about the technologies shaping the future.

💡WWDC

WWDC stands for Worldwide Developers Conference, an annual event hosted by Apple where they announce new products and technologies. The video suggests that Apple might reveal updates to Siri or other AI advancements at the upcoming WWDC, indicating the significance of the event for AI developments.

💡Neural Networks

Neural networks are a class of machine learning models, loosely inspired by the human brain, that are used to create AI models capable of learning from data. In the video, the host mentions taking a course on neural networks to understand the technology behind AI, emphasizing the importance of continuous learning in the field.

💡CRM

CRM stands for Customer Relationship Management, which refers to the practices, strategies, and technologies that companies use to manage and analyze customer interactions and data. In the context of the video, CRM is mentioned when comparing social media management platforms like Hootsuite and Salesforce, highlighting the broader functionality of Salesforce beyond just social media.


Introduction to the voice conversation feature in the ChatGPT iOS app and its impressive capabilities.

The mind-blowing experience of using the voice AI, which feels natural and intuitive.

Simon's personal hot take on the potential developments in Apple's AI department and the rumored Siri generative AI assistant for 2024.

Major AI gadget releases in the personal assistant space and the AI race likely being won on smartphones.

OpenAI's announcement of its new flagship model, GPT-4o, which combines vision, text, and audio.

The standard ChatGPT app's hidden feature allowing for natural and intuitive voice conversations.

Use cases for the voice AI, such as getting travel recommendations for visiting the Lake District.

ChatGPT's ability to provide information on major players in the multimodal language model space.

The natural conversation flow with the AI, which feels genuinely human.

Comparison of social media management platforms Hootsuite and Salesforce.

OpenAI's GPT-4o update, which improves voice responses and adds emotional understanding.

The new system's ability to handle voice mode natively with GPT-4o, increasing efficiency and speed.

Introduction of new features like Vision, system memory for conversation continuity, and a browse function.

The ability to interrupt the AI without tapping and having it analyze data from images or screenshots.

Sponsorship by Brilliant, an app for learning interactively with lessons on AI, computer science, and more.

Evidence suggesting that Siri will be transformed in 2024 with iOS 18, potentially changing the AI landscape.

Expectations for the new Siri, including integration with other apps, intent recognition, and interactive capabilities.

Apple's research into 'Ferret-UI', a generative AI system designed to understand app screens, hinting at a more interactive future.

A call to action for viewers to try the ChatGPT voice app and share their thoughts ahead of Apple's WWDC 2024 event.

The importance of having control over personal goals and projects, with a teaser for the next video on productivity tools.