Midjourney vs DALL E 3 Prompt Battle Best AI Image Generator

Master AI Fast
3 Jan 2024, 04:20

TLDR: In a rematch between Midjourney version 6 and DALL-E 3, the two AI image generators are compared across four categories: Minecraft, The Roman Empire, Photography, and F1 Racing. The video walks through one prompt test per category and shows how the generated images differ. DALL-E 3 wins the Minecraft test for accurately recreating the prompt, wins the Roman Empire test for capturing the fun of the centurions' selfie, and wins the F1 Racing test for capturing more of the prompt's details. Midjourney is given a slight edge in the Photography category for its more realistic image. Because DALL-E 3 captures the majority of prompt details across these varied prompts, it is named the overall winner. The video encourages viewers to subscribe for more content and to watch another video comparing the two AIs with a consistent prompt throughout.


  • 🏙️ Midjourney and DALL-E 3 were compared in an image generation rematch across four categories: Minecraft, The Roman Empire, Photography, and F1 Racing.
  • 🎨 The first prompt involved creating a futuristic city in the style of Minecraft, with DALL-E 3 winning for better adherence to the Minecraft style.
  • 📸 In the Roman Empire category, DALL-E 3 captured the fun and happy nature of the centurions, despite inaccuracies in the Colosseum depiction.
  • 📷 Midjourney won the photography category for a more realistic and photo-like image of a blonde woman on a London rooftop.
  • 🏎️ DALL-E 3 triumphed in the F1 Racing category for capturing more of the prompt's details, despite the empty racetrack.
  • 🤔 Both AIs struggled with the prompt's instructions on certain aspects, such as the Colosseum's accuracy and the interpretation of a 'clean' racetrack scene.
  • 🌟 DALL-E 3 was noted for its ability to recreate the prompt properly and for capturing the majority of the prompt requirements in most categories.
  • 📹 The video script suggests that the visual output from Midjourney tends to look more like a real photograph, while DALL-E 3's images can appear more computer-generated.
  • 🏆 DALL-E 3 is declared the overall winner for handling a variety of image prompts.
  • 🔄 The transcript mentions a previous video comparing Midjourney and DALL-E 3 with consistent prompts throughout, with surprising results.
  • 📢 The narrator encourages viewers to subscribe to the channel for updates on new video releases.
  • 📚 The video aims to provide insights into the performance of two leading AI image generators, offering viewers a comparative analysis.

Q & A

  • What is the main purpose of the video?

    -The main purpose of the video is to compare the performance of two AI image generators, Midjourney version 6 and DALL-E 3, across different categories and prompts to determine which one performs better.

  • What are the four categories used for comparison in the video?

    -The four categories used for comparison are Minecraft, The Roman Empire, Photography, and F1 Racing.

  • Which AI image generator won the first prompt battle?

    -DALL-E 3 won the first prompt battle because it recreated the prompt properly, adhering to the iconic blocky style of Minecraft.

  • What was the issue with the image generated by DALL-E 3 in the Roman Empire category?

    -DALL-E 3's image did not depict the Colosseum accurately, and it looked drawn rather than having the 8K realism the prompt asked for.

  • Which AI image generator won the second prompt battle?

    -DALL-E 3 won the second prompt battle as it was able to capture most of the prompt requirements, despite the realism of Midjourney's interpretation.

  • What was the deciding factor for the winner in the Photography category?

    -The deciding factor was that Midjourney's image looked more like a real photo, which aligned with the prompt's request for a cinematic photo with ultra-realistic details.

  • What was the issue with the F1 Racing images generated by both AIs?

    -The issue with both images was that they lacked the impression of an actual race, with empty racetracks and no rubber marks on the road, which did not fulfill the prompt's request for a hyper-realistic F1 race scene.

  • Which AI image generator won the overall comparison?

    -DALL-E 3 was declared the overall winner because it handled the variety of image prompts more successfully.

  • What did the narrator recommend at the end of the video?

    -The narrator recommended subscribing to the channel to help the algorithm and to stay updated on new video posts.

  • What is the significance of the phrase '8K Resolution' in the prompts?

    -The phrase '8K Resolution' signifies the level of detail and quality expected in the generated images, aiming for high-definition and ultra-realistic visuals.

  • How did the video script assess the performance of the AI image generators?

    -The video script assessed the performance by comparing the generated images against the specific requirements of each prompt and evaluating how well each AI captured the essence and details requested.

  • What is the potential misunderstanding in the F1 Racing prompt that the AIs might have had?

    -The potential misunderstanding was the interpretation of 'uncluttered' in the prompt, which might have led the AIs to generate images with empty racetracks, missing the action and crowd expected in a racing scene.



🎨 AI Image Generators Compared: Midjourney vs. DALL-E 3

This video script outlines a rematch between Midjourney version 6 and DALL-E 3, two AI image generators, across four categories: Minecraft, The Roman Empire, Photography, and F1 Racing. The script details a series of prompt tests to determine which model performs best in each category. The first prompt involves creating a futuristic city in the style of Minecraft, where DALL-E 3 is declared the winner for accurately recreating the prompt. The second prompt, featuring Roman centurions taking a selfie, results in DALL-E 3 winning again for capturing most of the prompt's requirements. The third prompt, for a cinematic photo of a happy blonde woman in London, is won by Midjourney for its realistic look. The final prompt, for a hyper-realistic F1 race, sees DALL-E 3 capturing more of the prompt's details despite both images lacking some elements. The video concludes with a call to action for viewers to subscribe for more content and a teaser for another video comparing the two AIs.




💡Midjourney

Midjourney refers to an AI image generator that is being compared against another AI, DALL-E 3, in this video. It is a key participant in the 'Prompt Battle' where different categories are used to test and compare the capabilities of each AI in generating images based on given prompts. The video aims to determine which AI performs better in adhering to the instructions and creating images that match the desired themes.


💡DALL-E 3

DALL-E 3 is another AI image generator featured in the video, competing against Midjourney. It is assessed based on its ability to interpret and render images that match the specific themes and details requested in various prompts. DALL-E 3 is shown to have strengths in capturing the essence of the prompts, making it a contender in the comparison.

💡Prompt Battle

A 'Prompt Battle' is the main event of the video where two AI image generators are tested against each other. The battle involves providing each AI with a series of prompts and evaluating the resulting images based on how well they meet the criteria outlined in the prompts. It serves as a method to compare the performance and accuracy of the AIs in generating images.
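The battle format described above can be sketched as a simple per-category tally. This is only an illustrative sketch, not the video's actual scoring method: the category winners are taken from this summary, while the `tally` helper and the "most category wins" rule are assumptions introduced here.

```python
from collections import Counter

# Category winners as reported in this summary of the video.
category_winners = {
    "Minecraft": "DALL-E 3",
    "The Roman Empire": "DALL-E 3",
    "Photography": "Midjourney",
    "F1 Racing": "DALL-E 3",
}


def tally(winners):
    """Count category wins per generator and return (counts, leader).

    Hypothetical helper: the leader is simply the generator with the
    most category wins. Ties are not handled specially.
    """
    counts = Counter(winners.values())
    leader, _ = counts.most_common(1)[0]
    return counts, leader


counts, overall = tally(category_winners)
print(dict(counts))
print(overall)
```

With the four results listed in this summary, the tally gives DALL-E 3 three categories and Midjourney one, which matches the overall verdict reported for the video.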


💡Minecraft

Minecraft is a popular sandbox video game known for its blocky, pixelated style. In the context of the video, it is one of the categories used to test the AIs. The AIs are tasked with generating images that not only depict futuristic cities but also do so in the distinctive style reminiscent of Minecraft's graphics.

💡Roman Empire

The Roman Empire represents a historical category used in the video to challenge the AIs. The AIs are prompted to create images that capture the essence of Roman centurions in a modern, fun context, such as taking a selfie. This tests the AIs' ability to blend historical elements with contemporary scenarios.


💡Photography

Photography is one of the categories used to evaluate the AIs' image-generating capabilities. The AIs are given prompts that specify certain photographic techniques and qualities, such as 'wide angle directional light' and 'soft lighting,' to assess their ability to render images with a realistic and artistic touch.

💡F1 Racing

F1 Racing is a category that focuses on the AIs' ability to create hyper-realistic and dynamic images of a fast-paced sporting event. The prompts given to the AIs request images that capture the action and teamwork inherent in Formula 1 races, testing their capacity to generate detailed and lively scenes.

💡8K Resolution

8K Resolution refers to an image or display roughly 8,000 pixels wide (the 8K UHD standard is 7680 × 4320 pixels). In the video, it is used in the prompts to test the AIs' ability to generate highly detailed images. The script mentions '8K, extremely detailed' as a requirement for the images, indicating a need for exceptional clarity and precision in the visuals.
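To put the '8K' keyword in perspective, the short sketch below computes pixel counts for standard UHD frame sizes. The dimensions are the usual 4K UHD and 8K UHD values, assumed here for illustration; the video does not state the pixel dimensions of the generated images, and the `megapixels` helper is hypothetical.

```python
# Standard UHD frame sizes (assumed for illustration; the video does not
# state the pixel dimensions of the images either generator produced).
RESOLUTIONS = {
    "4K UHD": (3840, 2160),
    "8K UHD": (7680, 4320),
}


def megapixels(width, height):
    """Return the pixel count of a frame in millions of pixels."""
    return width * height / 1_000_000


for name, (w, h) in RESOLUTIONS.items():
    print(f"{name}: {w}x{h} = {megapixels(w, h):.1f} MP")
```

An 8K UHD frame holds four times as many pixels as a 4K UHD frame, which is why the term is used as shorthand for extreme detail.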


💡Cinematic

Cinematic is a term used in the video to describe the desired quality and style of the generated images. It implies that the images should have a visual storytelling aspect, with a focus on lighting, composition, and realism that is reminiscent of film production. The AIs are expected to create images with a 'cinematic' feel, suggesting a high level of artistry and emotional engagement.


💡Realism

Realism, in the context of the video, is a quality that the AIs are expected to achieve in their generated images. It refers to the accurate representation of subjects in a way that closely resembles real-life appearances. The prompts often ask for 'ultra realistic' or 'realistic' images, indicating that the AIs should aim for a high degree of verisimilitude in their outputs.

💡Hyper Realistic

Hyper Realistic is a term used to describe images that go beyond regular realism to an almost exaggerated level of detail and clarity. In the video, this term is associated with the prompts given to the AIs, suggesting that the generated images should not only be realistic but surpass ordinary levels of detail to create a heightened sense of authenticity.


Introduction to the rematch between Midjourney version 6 and DALL-E 3.

Categories compared: Minecraft, The Roman Empire, Photography, and F1 Racing.

Detailed comparison of AI-generated images in the style of Minecraft.

Explanation of how DALL-E 3 successfully mimics the Minecraft style.

The prompt featuring Roman centurions in Rome taking a selfie, showcasing the requirements of happy, fun imagery with cinematic quality.

Analysis of how each image generator interpreted the Roman centurions prompt.

DALL-E 3's success in capturing the essence of the Roman Empire prompt.

Description of a cinematic photo of a blonde woman in London, detailing the expectations of ultra realism and high resolution.

Comparison of how realistic the images from Midjourney and DALL-E 3 appear.

Midjourney's edge in producing a photo that appears more realistic.

Overview of the F1 racing prompt, emphasizing the need for a hyper-realistic, uncluttered drone shot capturing the essence of racing.

Discussion on how both AI models interpreted the F1 racing scene, with a focus on the lack of racing dynamics in the images.

Final assessment of which AI performed better in capturing the detailed prompt of the F1 racing scene.

Conclusion that DALL-E 3 generally captures more of the prompt details across different categories.

Invitation to subscribe and tune into more detailed comparisons and results on the channel.