Which is better? Midjourney v6 vs. DALL-E 3 vs. Stable Diffusion XL

WesGPT
25 Dec 2023 14:07

TL;DR: In this video, the host compares image generation results from three AI models: DALL-E 3, Stable Diffusion XL, and Midjourney version 6. The models are tested across five categories (cartoon images, photorealistic humans, architecture, seamless patterns, and logos) to determine which one best captures the essence of each prompt. The audience is encouraged to guess the model behind each image before the reveal, which highlights the strengths and distinctive style of each model.

Takeaways

  • 🌟 The video compares image generation results from three AI models: DALL-E 3, Stable Diffusion XL, and Midjourney version 6.
  • 📈 DALL-E 3 is available on the Plus plan within ChatGPT, while Midjourney requires a subscription plan and is accessed through Discord.
  • 🎨 The models are tested across five categories: cartoon images, photorealistic humans, architecture, seamless patterns, and logos.
  • 💡 Users are encouraged to guess which image corresponds to which model in the comments before watching the reveal.
  • 🐙 The first category, cartoon images, features an underwater adventure with a cheerful octopus wearing a pirate hat.
  • 🎭 The photorealistic human category focuses on generating an image of a street performer playing a saxophone in an urban setting.
  • 🏰 For the architecture category, the prompt is to create an image of a Gothic cathedral complex with detailed features and a surrounding medieval park.
  • 🌸 The seamless patterns category involves creating a vintage floral wallpaper with hand-drawn flowers and leaves in pastel colors.
  • ☕ The logo category challenges the models to design a logo for a gourmet coffee shop, incorporating a steaming coffee cup and coffee beans.
  • 📊 The video concludes with a discussion on the strengths and weaknesses of each model and encourages viewers to suggest further comparisons and tests.

Q & A

  • What are the three image generation models compared in the video?

    -The three image generation models compared in the video are DALL-E 3, Stable Diffusion XL, and Midjourney version 6.

  • How can one access DALL-E 3 for image generation?

    -DALL-E 3 can be accessed through the Plus plan within ChatGPT.

  • What is the pricing like for Midjourney version 6?

    -The basic Midjourney subscription plan costs $10 per month, which allows for about 200 image generations.

  • Which categories did the video use to test the image generators?

    -The video tested cartoon images, photorealistic humans, architecture, seamless patterns, and logos. It did not mention any other categories.

  • What was the prompt given for generating a cartoon image?

    -The prompt for generating a cartoon image was 'underwater adventure'.

  • How many image generations can one get for every $10 spent on Stable Diffusion XL credits?

    -For every $10 spent on Stable Diffusion XL credits, one can get approximately 5,000 image generations.

  • What was the common element in all the image prompts used in the video?

    -The common element in all the image prompts was that they were designed to fit into one of the five chosen categories.

  • Which image generation model was considered the most photorealistic according to the video?

    -According to the video, Midjourney version 6 was considered the most photorealistic, particularly for the photorealistic human image prompt.

  • How can one access Midjourney's image generator?

    -To access Midjourney's image generator, one needs to subscribe to a plan and then join the Midjourney Discord server. The Midjourney bot can also be added to one's own server for image generation.

  • What was the general conclusion about the image generation models?

    -The general conclusion was that there might not be a true winner as all models performed well, and the preference for a particular style or look would come down to personal choice.

  • How did the video compare the image generation results?

    -The video compared the image generation results by using the same prompt for each model across five different categories and then evaluating and comparing the outputs based on the criteria set forth in the script.
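The Midjourney pricing quoted above (about 200 generations for $10 per month) can be turned into a rough per-image cost. The sketch below is only illustrative arithmetic: the function name `cost_per_image` is hypothetical, and the figures are the approximate ones given in the video summary, not official pricing.

```python
def cost_per_image(plan_price_usd, images_included):
    """Return the approximate cost of one image in US dollars."""
    if images_included <= 0:
        raise ValueError("images_included must be positive")
    return plan_price_usd / images_included


# Figures as quoted in the summary: $10 per month, about 200 generations.
midjourney_basic = cost_per_image(10, 200)
print(f"Midjourney basic plan: about ${midjourney_basic:.2f} per image")
```

The same function can be applied to any other plan or credit bundle once its price and approximate image count are known.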

Outlines

00:00

🎨 Image Generation Models Comparison

This section introduces a video comparing three major image generation models: DALL-E 3, Stable Diffusion XL, and Midjourney version 6. The video evaluates these models across five categories: cartoon images, photorealistic humans, architecture, seamless patterns, and logos. Each model is accessed through a different platform and requires its own purchase or subscription. The comparison is based on generating images from the same prompts, and viewers are encouraged to guess which image corresponds to which model before the reveal.

05:01

🧜‍♂️ Underwater Adventure: Cartoon Image Comparison

This section covers the first round of the comparison, which focuses on generating cartoon images from the prompt 'underwater adventure.' The images created by DALL-E 3, Midjourney version 6, and Stable Diffusion XL are shown, each with its own interpretation of the prompt. The first image features a cheerful octopus with a pirate hat, surrounded by treasure chests and fish. The second image is more cartoony, with a pirate logo and more fish, while the third has a bubbly style and puts goggles on the octopus. Viewers are asked to guess which model produced each image before the reveal, which shows that the first image was from Midjourney version 6, the second from DALL-E 3, and the third from Stable Diffusion XL.

10:01

🎷 Photorealistic Street Performer: Human Image Comparison

This section covers the second round of the comparison, which focuses on photorealistic human images. The prompt was to generate an image of a middle-aged Black male street performer playing a saxophone. The first image shows a man wearing a cabby hat and playing the saxophone, with a busy city street in the background. The second image shows a man with an unusual saxophone, and the third features an older man in a toque, playing the saxophone correctly. Viewers are invited to guess the model for each image before the reveal, which shows that the first image is from DALL-E 3, the second from Midjourney version 6, and the third from Stable Diffusion XL.

🏰 Gothic Cathedral: Architectural Image Comparison

This section covers the third round of the comparison, which is about generating an image of a Gothic cathedral. The prompt calls for detailed flying buttresses, pointed arches, stained glass windows, and a surrounding park. The first image is an isometric view showing the garden and buttresses, the second looks more like a photograph in a Gothic style, and the third resembles a painting in a medieval style. Viewers are asked to identify the model for each image, and the reveal shows that the isometric image was generated by DALL-E 3, the photographic image by Midjourney version 6, and the painterly image by Stable Diffusion XL.

🌸 Vintage Floral Wallpaper: Seamless Texture Comparison

This section covers the fourth round, where the models are tasked with creating a seamless texture of a vintage floral wallpaper. The design should have hand-drawn flowers and leaves in pastel colors. The first image appears hand-drawn and potentially seamless, the second image seems more seamless, and the third image looks more AI-generated. Viewers are prompted to guess the model for each image, and the reveal indicates that the third image was mistakenly identified as Midjourney version 6, while the first two were correctly identified.

☕️ Gourmet Coffee Shop: Business Logo Comparison

The final round of the comparison involves creating a logo for a gourmet coffee shop. The prompt calls for a steaming coffee cup with coffee beans and a cozy, inviting feel with warm tones. The first image attempts text but has spelling errors, the second image is more polished but still contains incorrect words, and the third image focuses on the coffee and beans without text. Viewers are asked to choose their favorite and guess the models, with the reveal showing that the first image is from DALL-E 3, the second from Midjourney version 6, and the third from Stable Diffusion XL. The video concludes by encouraging viewers to suggest further model tests and to try the new Midjourney version 6 in their own Discord server.

Keywords

💡Image Generation

Image Generation refers to the process of creating visual content using artificial intelligence algorithms. In the context of the video, it involves comparing the outputs of three different AI models (DALL-E 3, Stable Diffusion XL, and Midjourney version 6) to assess their capabilities in producing various types of images from given prompts. The video shows examples of the images generated in categories such as cartoon images, photorealistic humans, architecture, seamless patterns, and logos.

💡DALL-E 3

DALL-E 3 is one of the AI models compared in the video and is available on the Plus plan within ChatGPT. It is an image-generating AI that can produce cartoon-like images with a distinct style, as demonstrated by its output in the cartoon image category. DALL-E 3 is one of the three contenders in the comparison of image generation capabilities.

💡Stable Diffusion XL

Stable Diffusion XL is the newest model from the Stable Diffusion series, an AI image-generating model that can be accessed through an API or by visiting beta.dreamstudio/generate. It requires the purchase of credits to generate images, but these are relatively inexpensive. In the video, Stable Diffusion XL is one of the models being tested and compared for its image generation quality and style.

💡Midjourney version 6

Midjourney version 6 is the latest model released by Midjourney, which can be accessed through a Discord server after purchasing a subscription plan. This model is known for its photorealistic image generation capabilities and is compared with DALL-E 3 and Stable Diffusion XL in the video across the image categories. The video highlights the distinctive outputs of Midjourney version 6 and how it performs against the other two models.

💡Cartoon Images

Cartoon Images refer to the visual content that is created in a stylized manner, often exaggerating features for effect or humor. In the video, one of the categories where the AI models' image generation capabilities are tested is the creation of cartoon images. The prompt given to the models is to depict an underwater cartoon scene, and the outputs are evaluated based on their adherence to the prompt and the quality of the cartoonish style.

💡Photorealistic

Photorealistic refers to images that are created or manipulated to resemble photographs as closely as possible. In the context of the video, photorealistic human images are one of the categories being tested, where the AI models are prompted to generate images of a street performer in a highly realistic and detailed manner, capturing the urban environment and the performer's emotions.

💡Architecture

Architecture in this context refers to the category of images that the AI models are tasked to generate, specifically images of an elaborate Gothic cathedral complex. The prompt includes detailed elements such as flying buttresses, pointed arches, stained glass windows, and a surrounding area that evokes the medieval period. The video evaluates how well each model captures these architectural details and the overall atmosphere of the scene.

💡Seamless Patterns

Seamless patterns are designs that can be tiled or repeated without any visible breaks or mismatches, creating a continuous and uniform appearance. In the video, one of the categories for image generation is seamless textures, where the models are prompted to create a vintage floral wallpaper with hand-drawn flowers and leaves in pastel colors. The evaluation focuses on how well the generated images can be tiled without any noticeable seams.
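The tiling property described above can be checked numerically. The following is a minimal sketch, not the method used in the video: it assumes the image is already available as a list of rows of grayscale values (0 to 255), and the helper names `mean_abs_diff`, `column`, and `seam_ratio` are hypothetical. It compares the jump across the wrap-around edges (where one tile meets the next) with the typical jump between neighbouring pixels inside the tile. A ratio near 1 suggests the seam is no more visible than the rest of the pattern; a much larger ratio suggests a visible break.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equal-length pixel sequences."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def column(image, x):
    """Extract column x from an image stored as a list of rows."""
    return [row[x] for row in image]


def seam_ratio(image):
    """Ratio of the wrap-around edge jump to the typical interior jump."""
    height, width = len(image), len(image[0])
    # Jumps between neighbouring columns and rows inside the tile.
    interior = [mean_abs_diff(column(image, x), column(image, x + 1))
                for x in range(width - 1)]
    interior += [mean_abs_diff(image[y], image[y + 1])
                 for y in range(height - 1)]
    interior_mean = sum(interior) / len(interior)
    # When tiled, the last column touches the first, and the last row the first.
    seam = (mean_abs_diff(column(image, width - 1), column(image, 0))
            + mean_abs_diff(image[-1], image[0])) / 2
    if interior_mean == 0:
        return 1.0 if seam == 0 else float("inf")
    return seam / interior_mean


# A triangle wave wraps cleanly; a left-to-right gradient does not.
wave = [0, 10, 20, 30, 40, 30, 20, 10]
periodic = [wave[:] for _ in range(8)]
gradient = [[x * 10 for x in range(8)] for _ in range(8)]

print(seam_ratio(periodic))  # 1.0
print(seam_ratio(gradient))  # 7.0
```

This only measures pixel continuity at the edges. Judging whether motifs such as flowers line up across tiles still needs a visual check, for example by viewing the image repeated in a 2×2 grid.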

💡Logos

Logos are graphical symbols or icons used to represent a company, organization, or product. In the video, the AI models are challenged to illustrate a logo for a gourmet coffee shop, with specific requirements such as a steaming coffee cup, coffee beans, and a cozy, inviting feel with a warm color scheme. The evaluation assesses the creativity and appropriateness of the logo design in relation to the given brief.

💡Personal Preference

Personal preference refers to an individual's likes or dislikes, which can vary greatly from person to person. In the context of the video, personal preference is emphasized as a deciding factor when comparing the image generation results of the AI models, as there may not always be a clear 'winner' and the choice often comes down to subjective aesthetics and individual tastes.

Highlights

The video compares image generation results from DALL-E 3, Stable Diffusion XL, and Midjourney version 6 across five categories.

DALL-E 3 is available on the Plus plan within ChatGPT.

Stable Diffusion XL is the newest model in the Stable Diffusion series and can be accessed through an API or DreamStudio.

Midjourney requires a subscription plan, starting at $10 per month for basic access and about 200 image generations.

The categories tested are cartoon images, photorealistic humans, architecture, seamless patterns, and logos.

The video uses a single prompt for each category to test the models' abilities.

The first category, cartoon images, features an underwater adventure with a cheerful octopus wearing a pirate hat.

Midjourney version 6's octopus image was judged to follow the prompt most closely in the cartoon category.

In the photorealistic human category, the prompt was to generate an image of a street performer playing a saxophone.

Midjourney version 6 was praised for its photorealistic portrayal of the saxophone player, which stood out for its light glares and well-muted background.

The architecture category tested the models with a prompt to create an image of a Gothic cathedral complex.

DALL-E 3 produced an isometric view of the Gothic cathedral, while Midjourney version 6's image resembled a photograph.

Stable Diffusion XL's take on the Gothic cathedral was more like a painting, with a focus on the church and less on the surroundings.

Seamless textures were the subject of the fourth category, with a vintage floral wallpaper prompt.

The video notes that Midjourney has a feature for creating seamless textures, but it was not used, to avoid giving Midjourney an advantage in this test.

The final category, business logos, tasked the models with illustrating a logo for a gourmet coffee shop.

The video concludes with a comparison of the logo designs, highlighting the different approaches each model took to the prompt.

A DALL-E 2 generation is showcased for historical context, showing the significant advancements made by the newer models.

The video encourages viewers to suggest different prompts and image types for future comparisons and tests.