Sakana Evolutionary Model Merge - and other AI News

Olivio Sarikas
23 Mar 2024 · 10:02

TLDR: In this video, the host shares exciting AI developments, ranging from creative workflows for Patreon supporters, such as avatar generators and unique visual effects, to groundbreaking projects like Google's VLOGGER, which creates complete videos from audio inputs. The discussion also covers Sakana AI's evolutionary model merging and Meta's spatial understanding initiative, as well as Stable Video 3D's advancements, anime-style video enhancement, and the integration of AI into real life with Neuralink's brain-computer interface. The rapid evolution of AI and its merging with reality raises questions about our ability to differentiate AI-generated content from real-world content, highlighting the need for AI to help us manage the overwhelming pace of technological advancement.


  • 💭 The presenter shares stunning AI developments, focusing on creative and practical uses, and introduces exclusive workflows for Patreon supporters.
  • 💡 Introduces a unique avatar generator that maintains character consistency across different facial expressions using 'face detailer' for emotional variation.
  • 🌠 Showcases a workflow that applies a glitch effect to full-resolution photos using Stable Diffusion, breaking conventional size limitations.
  • 👨‍💻 Demonstrates an 'image-to-image' conversion tool for creating anime-style pets, enhancing images without relying on specific AI models.
  • 📹 Presents 'VLOGGER' by Google, a project that generates complete videos from audio inputs and images, including body and facial movements, envisioning future AI applications in media.
  • 🔄 Highlights Sakana AI's 'Evolutionary Model Merge', an AI-driven method that merges different models to optimize performance, reflecting the AI space's expansive growth.
  • 🎮 Explores 'Stable Video 3D', a technology that produces high-quality rotational videos from images, potentially leading to 3D printing applications.
  • 📡 Discusses 'AnimateDiff Lightning', a tool for generating video content at high speed, emphasizing rapid testing over quality.
  • 🛠️ Covers Meta's project using AI and language models to interpret physical spaces, aiming for applications in navigation and environmental interaction.
  • 📈 Shares the breakthrough of a person using Neuralink to control a computer with thought, showcasing direct brain-to-AI communication.
  • 🧐 Reflects on the accelerated pace of AI development and its integration into reality, raising questions about our ability to discern AI-generated content from human-made.

Q & A

  • What is the purpose of the avatar generator workflow mentioned in the video?

    -The avatar generator workflow is designed to create avatars with the same details but different facial expressions, maintaining character consistency by using a tool called face detailer to change the emotion of the face. It also includes a feature for generating an endless amount of randomized prompts to produce completely different characters each time.

  • How does the glitch effect workflow differ from standard image processing?

    -The glitch effect workflow allows for the use of full-size, full-resolution photos in Stable Diffusion, which is typically not possible, by applying a glitch effect over the image. This represents a novel experiment in image processing.

  • What is the concept behind Sakana AI's Evolutionary Model Merge project?

    -Sakana AI's project proposes using AI to merge different AI models and then testing them against each other in an evolutionary manner to determine which model performs the best. This process is automated and guided by AI, aiming to improve the models available in the vast AI space.

  • What advancements does Stable Video 3D offer according to the tutorial?

    -Stable Video 3D provides improved quality for creating rotational videos around objects, which are superior to previous methods. From these rotational images, a 3D mesh can be created, demonstrating a significant advancement in 3D modeling and animation.

  • How does the AnimateDiff Lightning tool function, and what are its limitations?

    -AnimateDiff Lightning is designed to create fast video outputs at the cost of quality. It is useful for quickly testing different prompts and concepts, but the lower quality means it may not be suitable for all applications.

  • What innovative approach does Meta's project use to understand space?

    -Meta's project uses AI and language models to interpret the space around them, relying on logic rather than raw visual data to overcome the challenges of poor-quality or ambiguous visual information. This approach enables understanding and interacting with environments in novel ways.

  • How did Meta train their AI for spatial understanding without real-world video data?

    -Lacking sufficient real-world video data, Meta created over 100,000 virtual environments and allowed their AI to navigate these spaces for training. This method provided the AI with the necessary experience to understand spatial arrangements and contexts.

  • What groundbreaking achievement was made with the Neuralink chip?

    -The first person with a Neuralink chip implanted in their brain was able to move a mouse on a screen and play a chess game using only their thoughts, marking a significant advancement in brain-computer interface technology.

  • How does the rapid advancement of AI impact human ability to keep up?

    -The script suggests that AI is evolving and creating information at such a rapid rate that humans cannot keep pace without AI's help. This includes AI's role in creating, merging models, and even making selections for humans, indicating a shift towards a more AI-dependent approach in managing and understanding AI advancements.

  • What effect does the quality of AI-generated images have on the appreciation of hand-drawn art?

    -The script mentions that the high quality of AI-generated images has made hand-drawn art less impressive to some, as flaws become more apparent compared to the flawless, high-quality outputs of AI, indicating a shift in perception and appreciation of art.



🤖 AI Innovations and Patreon Projects

The speaker introduces recent advancements in AI, showcasing unique workflows created for Patreon supporters. These include an avatar generator capable of producing various facial expressions with consistent character details, a workflow for applying glitch effects to high-resolution photos in Stable Diffusion, and an anime-style pet creator using image-to-image techniques. The segment also introduces 'VLOGGER', a Google project that generates complete videos from audio and image inputs, emphasizing the comprehensive rendering of body and facial movements. Furthermore, the speaker discusses the evolving landscape of AI content creation, the rapid iteration in personal media production, and the potential for more relatable AI-generated figures in media, highlighting the continuous and accelerating pace of innovation in AI technology.


🌍 AI's Evolving Role in Understanding and Merging Realities

The narrative shifts to the multifaceted impacts of AI in understanding and interacting with the physical world. Highlights include a Meta project that leverages AI and language models to interpret spatial environments, improving on traditional visual data analysis. This approach enables innovative applications, such as guiding people through spaces or assessing object properties. Additionally, the speaker discusses 'Evolutionary Model Merge', a concept for merging AI models to enhance performance, and advancements in creating immersive 3D experiences through Stable Video 3D technology. The segment concludes with the groundbreaking development of Neuralink, demonstrating direct brain-interface capabilities. The speaker reflects on the rapid assimilation of AI across domains, noting a shift in the perception of handcrafted art due to the quality of AI-generated content, and calls for a discourse on the implications of these advancements.



💡Avatar Generator

An 'Avatar Generator' is a tool or application that creates digital representations or characters, often used in online platforms for user profiles or gaming. In the context of the video, the avatar generator mentioned uses AI to produce avatars with consistent character details but different facial expressions. This highlights the advancement in AI technology, enabling personalized and dynamic content creation. The ability to change emotions of the face while maintaining character consistency showcases a novel application of AI in enhancing user engagement and experience in digital environments.

💡Face Detailer

The 'Face Detailer' concept, as used in the video, refers to an AI-driven tool or feature that modifies the facial expressions of an avatar without altering its fundamental characteristics. This tool exemplifies the intersection of AI and graphic design, allowing creators to imbue static images with a range of emotions, thus making digital content more relatable and dynamic. Its application in generating different facial expressions while preserving avatar identity illustrates the potential of AI in personalized and expressive content generation.


💡Randomization

Randomization, within the context of the video, refers to the automated process of generating diverse and unique outputs by an AI system. The mention of using randomization to create an endless amount of randomized prompts indicates the capability of AI to produce a vast array of character designs, enhancing creativity and variety in content creation. This feature underlines the AI's ability to support artists and creators by providing them with limitless inspiration and possibilities for character and concept development.
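The endless-randomized-prompt idea can be pictured as sampling from wordlists and composing the picks into a prompt string. This is a minimal toy sketch with invented wordlists, not the actual Patreon workflow:

```python
import random

# Hypothetical building blocks for a character prompt; the real
# workflow's wordlists are not shown in the video.
STYLES = ["oil painting", "watercolor", "3D render", "anime"]
SUBJECTS = ["elf warrior", "cyberpunk detective", "steampunk inventor"]
MOODS = ["serene", "dramatic", "mysterious"]

def random_prompt(rng: random.Random) -> str:
    """Compose one randomized character prompt from the wordlists."""
    return f"{rng.choice(MOODS)} {rng.choice(SUBJECTS)}, {rng.choice(STYLES)}"

rng = random.Random(42)  # fixed seed so the run is reproducible
prompts = [random_prompt(rng) for _ in range(3)]
for p in prompts:
    print(p)
```

Because the picks are independent, even three short wordlists already yield dozens of distinct character prompts.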

💡Stable Diffusion

Stable Diffusion is an AI-driven technology that generates high-quality images from textual descriptions. In the video, it is mentioned in relation to a workflow that overcomes the usual limit on processing full-resolution photos by rendering a glitch effect over the image. This application showcases AI's evolving capacity to handle complex image-processing tasks, pushing the boundaries of artistic expression and digital content creation through unique visual effects.
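As a loose illustration of what a 'glitch' pass over an image can mean (the video's actual workflow runs inside Stable Diffusion and is not shown), the toy sketch below shifts random rows of a pixel grid sideways, a classic glitch look that works at any resolution:

```python
import random

def glitch_rows(image, max_shift=8, p=0.3, seed=0):
    """Horizontally wrap-shift a random subset of rows.

    `image` is a list of rows, each row a list of pixel values; the
    effect is applied to the full-resolution grid directly, so no
    downscaling is needed.
    """
    rng = random.Random(seed)
    out = []
    for row in image:
        if rng.random() < p:
            s = rng.randint(1, min(max_shift, len(row) - 1))
            row = row[-s:] + row[:-s]  # wrap-around horizontal shift
        out.append(list(row))
    return out

# Tiny 4x8 "image" of increasing pixel values
img = [[y * 8 + x for x in range(8)] for y in range(4)]
glitched = glitch_rows(img)
```

Each shifted row keeps exactly its original pixels, only displaced, which is why the effect reads as corruption rather than blur.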

💡VLOGGER Project

The 'VLOGGER' project by Google, as described in the video, illustrates an innovative use of AI in video creation. It combines audio inputs with static images to generate complete videos, including body and head movements and facial expressions aligned with the spoken content. This project represents a significant leap towards automating content creation, suggesting a future where AI can produce highly realistic video content, potentially changing the landscape of digital media production and consumption.

💡Evolutionary Model Merging

Evolutionary Model Merging, as discussed in the context of Sakana AI's project, is a groundbreaking AI concept that involves combining different AI models and testing them against each other in an evolutionary manner. This method aims to determine which combinations of models perform best, automating the improvement of AI technologies. This approach reflects the video's theme of AI's self-evolutionary capabilities, highlighting a future where AI can autonomously refine and advance its functionalities.
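Sakana AI's actual method evolves merge recipes over model layers and data-flow paths; purely to illustrate the evolutionary loop itself, here is a toy sketch that treats "models" as plain parameter vectors, merges them by weighted averaging, and evolves the mixing ratios against an invented fitness target (every name and number here is hypothetical):

```python
import random

def merge(models, weights):
    """Weighted average of parameter vectors (one simple merge recipe)."""
    n = len(models[0])
    return [sum(w * m[i] for w, m in zip(weights, models)) for i in range(n)]

def fitness(params, target):
    """Toy objective: negative squared error against a 'task-ideal' vector."""
    return -sum((p - t) ** 2 for p, t in zip(params, target))

def evolve_merge(models, target, generations=30, pop=20, seed=0):
    """Keep the best mixing ratios found, mutating them each generation."""
    rng = random.Random(seed)

    def random_weights():
        raw = [rng.random() for _ in models]
        s = sum(raw)
        return [r / s for r in raw]

    best_w = random_weights()
    best_f = fitness(merge(models, best_w), target)
    for _ in range(generations):
        for _ in range(pop):
            # mutate the current best ratios, then renormalize to sum to 1
            cand = [max(0.0, w + rng.gauss(0, 0.1)) for w in best_w]
            s = sum(cand) or 1.0
            cand = [c / s for c in cand]
            f = fitness(merge(models, cand), target)
            if f > best_f:
                best_w, best_f = cand, f
    return best_w, best_f

model_a = [1.0, 0.0, 0.0]
model_b = [0.0, 1.0, 0.0]
target = [0.5, 0.5, 0.0]
w, f = evolve_merge([model_a, model_b], target)
```

The loop never trains any model; it only searches over how to combine existing ones, which is what makes the evolutionary approach cheap relative to training from scratch.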

💡Stable Video 3D

Stable Video 3D refers to a technology or tool that creates high-quality, three-dimensional video content from images. Mentioned in the video as producing stunning results better than previous versions, it demonstrates AI's capability in enhancing visual media through creating rotational videos around objects. This indicates a significant advancement in AI's application in 3D modeling and animation, offering new possibilities for visual storytelling and digital art.


💡Neuralink

Neuralink, a project mentioned in the video, is a brain-computer interface developed by Elon Musk's company. It represents cutting-edge technology that allows for direct communication between the human brain and computers. The video highlights a landmark achievement where an individual, using a Neuralink chip, can control a computer mouse and play chess with their mind. This breakthrough underscores the potential of AI and neural technology in transforming human interaction with technology, particularly for assistive purposes and enhancing human capabilities.

💡AI Influencers

AI Influencers, as touched upon in the video, refer to virtual characters or entities created using AI technologies, designed to simulate human influencers in digital and social media spaces. The video explores the idea of AI influencers and their potential to reshape public perception and content consumption by offering more relatable, customizable, and diverse digital personas. This concept challenges traditional notions of influence and representation in media, highlighting AI's role in creating new forms of social interaction and engagement.

💡Language Models

Language Models, as mentioned in relation to a project by Meta, utilize AI to understand and interpret human language, enabling machines to generate text that mimics human-like speech. The video discusses an innovative application of language models in understanding physical spaces, using the logic inherent in language processing to make inferences about the environment. This application signifies a novel integration of AI in enhancing spatial awareness and navigation, demonstrating the versatility and potential of language models beyond text generation.
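One way to picture "describing space in language rather than pixels" is a scene expressed as short, parseable text commands that a language model could emit or consume. The command vocabulary below is invented for this sketch and is not Meta's actual format:

```python
# A toy scene described as language-like commands instead of raw pixels.
SCENE = """
wall x0=0 y0=0 x1=6 y1=0
wall x0=0 y0=0 x1=0 y1=4
door wall=0 offset=2 width=1
object kind=sofa x=3 y=1 width=2
""".strip()

def parse_scene(text):
    """Parse each line into a (command, {key: value}) pair.

    Numeric values become floats; anything else stays a string.
    """
    entries = []
    for line in text.splitlines():
        cmd, *kvs = line.split()
        args = {}
        for kv in kvs:
            k, v = kv.split("=")
            try:
                args[k] = float(v)
            except ValueError:
                args[k] = v
        entries.append((cmd, args))
    return entries

scene = parse_scene(SCENE)
doors = [args for cmd, args in scene if cmd == "door"]
```

Because the scene is symbolic, questions like "where is the door?" reduce to simple logic over the parsed entries, sidestepping noisy or ambiguous visual data.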


Introduction to stunning AI news ranging from the strange to the beautiful.

Showcasing workflows for Patreon supporters including an avatar generator with consistent character details but different expressions.

Introducing a glitch effect workflow for full-resolution photos, a new concept beyond Stable Diffusion's usual size limitations.

Discussion on a pet creator project using image-to-image techniques to stylize pets in anime style.

Google's VLOGGER project uses audio and an image to create a full video with body movements and facial expressions matching the audio.

Exploring the future of AI in personalization and representation, impacting visual media.

Sakana AI's project on merging AI models in an evolutionary manner to improve performance.

Showcase of Stable Video 3D results and potential future applications, including 3D printing.

Introduction to AnimateDiff Lightning for creating fast video content, with considerations on quality.

Meta's project using AI and language models to understand physical spaces beyond visual data.

Training AI on virtual environments to navigate and understand real-world spaces.

Neuralink's achievement with the first person using their mind to control a computer interface.

AI's role in accelerating information creation and model improvement beyond human capacity.

The blending of AI creations with reality, and the challenge in distinguishing them.

Observation on the impact of AI-generated images on the appreciation of hand-drawn artwork.