Understanding the Midjourney algorithm for better prompts and a more efficient AI design workflow.
Prompt Engineering
Introduction to Prompt Engineering
“Prompt engineering” has been a major buzzword this year for futurists explaining what new jobs an AI-driven world might create. Put simply, though, prompt engineering is the systematic research and analysis of what makes a good prompt for a given AI task. If you have a concept in mind but struggle, despite countless attempts, to get Midjourney or a similar AI image generator to produce the result you’re looking for, a little prompt engineering might be exactly what you need. In this article, we will dive into how prompting tends to work in Midjourney and how to fine-tune your prompting workflow to reach the results you’re looking for efficiently.
Today’s Prompting Challenge
In this post, we will attempt to produce a sensible Modern Art Nouveau Revival-style home as our example, but the concepts presented can be applied to any style (see Midlibrary.com for style ideas).
As demonstrated in previous posts [Post 1: Art Nouveau] [Post 3: The Best AI Image Generators], Midjourney excels at producing Art Nouveau-style imagery. However, its results vary wildly, and it can be hard to control how Midjourney interprets the style. Seemingly at random, results range from normal but ornate late 19th-century architecture to downright alien biological forms.
This is because “Art Nouveau” refers to such a bold style that most AI image algorithms (but not all) weight it heavily compared to the other words in the prompt. Midjourney achieves this weighting by tokenizing its prompts. To grossly simplify, Midjourney converts each prompt into a fixed number of tokens, where each token corresponds to a specific set of concepts, colors, or pixel patterns that the model has learned to associate with it. Each token activates a specific part of the AI model, which then guides the image generation process; the “Art Nouveau” tokens, for example, steer the output toward that style. Possibly because of Art Nouveau’s complexity, it monopolizes many tokens unless an equally powerful phrase, or many other words, are added to counteract it.
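The fixed token budget can be pictured with a toy tokenizer. This is an illustration only: Midjourney has not published its text encoder, real encoders split text into subword tokens rather than whole words, and the `budget` parameter here is hypothetical. The sketch lowercases the prompt, splits it into words, truncates it to a fixed length, and pads short prompts so every prompt ends up the same size.

```python
from __future__ import annotations

PAD = "<pad>"  # hypothetical padding marker


def to_fixed_tokens(prompt: str, budget: int) -> list[str]:
    """Toy tokenizer: one token per word, truncated or padded to exactly `budget` tokens.

    Assumption: whole words stand in for the subword tokens a real encoder would use.
    """
    words = prompt.lower().replace(",", " ").split()
    tokens = words[:budget]                          # words past the budget are dropped
    return tokens + [PAD] * (budget - len(tokens))   # short prompts are padded


print(to_fixed_tokens("Modern Art Nouveau home", 6))
print(to_fixed_tokens("stunning award-winning modern art nouveau revival home", 4))
```

In the second call, everything after the fourth word, including “nouveau,” falls outside the toy budget, which is one way to picture why very long prompts lose words.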
With that background, we can start to engineer our prompts to produce the exact style we are looking for and uncover other insights into Midjourney’s models.
Starting Simple
For our first example, we will see how Midjourney Version 5.2 (Default style) interprets “art nouveau revival architecture” to find out whether we can get a viable design concept on the first attempt. For all examples, the prompts used are listed below each image.
OK, that’s too ridiculous. We need to help Midjourney tone it down a notch. Moving forward, let’s start with the more subdued “Modern Art Nouveau” benchmark prompt developed in Post 3 – The Best AI Image Generators for Architecture. As noted above, for more on benchmarks, or to follow along with a style other than Art Nouveau, check out Midlibrary’s excellent showcase of architectural styles in Midjourney.
The benchmark prompt and image we will be starting from is as follows:
Much better, but how can we improve it further?
Prompt Engineering Methods
Suppose these still aren’t exactly the results we’re looking for. To improve the output, we will use the same seed for each result to minimize random variation and explore the following prompt engineering methods and concepts:
- Shorten
- Prompt Dilution
- Word Order
- Negative Prompts
- Linked Token Reintroduction
- Specifying Materials
- Version and Style
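Holding the seed constant is easiest when every variant is generated from one helper. The sketch below assumes prompts are typed into Midjourney as plain text and appends its `--seed` parameter; the seed value `1234` and the example prompts are hypothetical placeholders, not the benchmark prompt used in this article.

```python
SEED = 1234  # hypothetical; any fixed integer in Midjourney's accepted seed range works


def with_seed(prompt: str, seed: int = SEED) -> str:
    """Append Midjourney's --seed parameter so only the wording changes between runs."""
    return f"{prompt.strip()} --seed {seed}"


# Hypothetical variants for illustration only.
variants = [
    "modern art nouveau home",
    "modern art nouveau home, brick and glass",
]
for prompt in variants:
    print(with_seed(prompt))
```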
Shorten
The first and easiest way to understand Midjourney’s algorithm is the /shorten command. It identifies which words the algorithm focuses on versus ignores, shows the relative importance of each word in your prompt, and suggests shorter prompts that should achieve similar results, as shown below:
This shows that Midjourney heavily weights “Art Nouveau,” “home,” and “revival,” which means we need to reduce their relative strength if we find the results too biological or fantastical for our “Modern Art Nouveau” goal.
Taking Midjourney’s advice and using shortened prompt #1 results in:
Wait a minute… according to /shorten, this was supposed to produce results similar to the original, but it just made the façade ridiculously unconstructible (albeit much more exciting). So what happened? This demonstrates the important concept of prompt dilution.
Prompt Dilution
As you add words to a prompt, they steal the algorithm’s attention from the other words and dilute their strength. When prompts get really long (a practice derisively known as “splatter-prompting”), many of the words don’t even meet the threshold to be a token and get ignored completely. Here is an example of what happens when you add or remove excess words. Notice how much the outlandishness of the Art Nouveau styling changes when we simply remove the words “stunning ornamentation”.
Since the only change was removing “stunning ornamentation,” you might expect a more subdued image, but in reality the reverse happens. “Art Nouveau” now makes up a larger share of the prompt, so the curves and sinuousness actually ratchet up a notch.
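The dilution effect becomes simple arithmetic if you treat each term’s influence as its share of a fixed total. The strengths below are invented for illustration (Midjourney does not expose real values); the sketch only shows why deleting words raises the share of the words that remain.

```python
from __future__ import annotations


def shares(strengths: dict[str, float]) -> dict[str, float]:
    """Normalize hypothetical term strengths so they sum to 1."""
    total = sum(strengths.values())
    return {term: s / total for term, s in strengths.items()}


# Hypothetical strengths: "art nouveau" is assumed to be three times as strong as the other terms.
full = {"art nouveau": 3.0, "modern": 1.0, "home": 1.0, "stunning": 1.0, "ornamentation": 1.0}
trimmed = {t: s for t, s in full.items() if t not in {"stunning", "ornamentation"}}

print(f"{shares(full)['art nouveau']:.0%}")     # 43%
print(f"{shares(trimmed)['art nouveau']:.0%}")  # 60%
```

Nothing about “art nouveau” itself changed, yet its toy share rises from 3/7 to 3/5 once the two extra words are gone.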
Word Order
Word order also matters, as the /shorten analysis shows. Words placed toward the front of a prompt receive higher weighting, while words at the end may be ignored completely. Here is an example of moving key words up in the prompt to emphasize their importance:
In the first variation, we delete the low-weighted word “award-winning,” which shouldn’t have much of an impact. In reality, deleting it both reduces dilution and moves “Art Nouveau” up to the first spot, supercharging the style.
In the second variation, “sensibilities” is deleted and “modern” moves up to the top spot. The biological sinew is still there, but the massing and façade take on a more avant-garde composition.
In this and all following comparisons, the third image is the Modern Art Nouveau benchmark for reference.
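Both effects in the first variation, less dilution and a better position, can be combined in one toy model. Here each term’s raw weight decays geometrically with its position, using a hypothetical `decay` factor of 0.8, and the weights are then normalized. Midjourney’s real positional weighting is not published, so treat this as a sketch of the intuition only.

```python
from __future__ import annotations


def positional_shares(terms: list[str], decay: float = 0.8) -> dict[str, float]:
    """Toy model: the term at index i gets raw weight decay**i, then weights are normalized.

    Assumes terms are unique; a repeated term keeps only the weight of its last occurrence.
    """
    raw = {term: decay ** i for i, term in enumerate(terms)}
    total = sum(raw.values())
    return {term: w / total for term, w in raw.items()}


before = positional_shares(["award-winning", "art nouveau", "home"])
after = positional_shares(["art nouveau", "home"])
print(f"{before['art nouveau']:.0%} -> {after['art nouveau']:.0%}")  # 33% -> 56%
```

Dropping the leading word removes a competitor and promotes “art nouveau” to the heaviest slot at the same time, which is why a seemingly minor deletion can supercharge the style.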
Negative Prompts
Negative prompts remove features you don’t want to see. They also help strip away layers of Midjourney’s concept of a style, like the biological filigree look the platform tends to prefer in V5.2. Let’s look at our images and use negative prompts to remove some of the less constructible features we see:
It turns out Midjourney sees Art Nouveau as essentially a blend of Craftsman and Victorian styles plus a lot of biological motifs (a fair assessment).
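In Midjourney, negative prompts are written with the `--no` parameter, which accepts a comma-separated list. A small helper keeps that list tidy; the base prompt below is a hypothetical placeholder, while the negated terms are the ones named in the next section.

```python
from __future__ import annotations


def with_negatives(prompt: str, negatives: list[str]) -> str:
    """Append Midjourney's --no parameter with a cleaned, comma-separated list of terms."""
    cleaned = [n.strip() for n in negatives if n.strip()]
    if not cleaned:
        return prompt.strip()
    return f"{prompt.strip()} --no {', '.join(cleaned)}"


print(with_negatives(
    "modern art nouveau home",  # hypothetical base prompt
    ["sinuous", "tendrils", "filigree", "lotus pod", "trypophobia"],
))
```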
Linked Token Reintroduction
As you can see, Midjourney seems to treat terms like “sinuous,” “tendrils,” “filigree,” “lotus pod,” and its close relative “trypophobia” as so closely linked with its concept of Art Nouveau that negating them strips our Art Nouveau styling away entirely, revealing the underlying Craftsman-style tokens. To fix this overcorrection, you can reintroduce the linked tokens you truly want with more specificity. In this case, let’s use “biophilia” as a more targeted alternative to the algorithm’s usual overreliance on tendrils and curves. Alternatively, we can prompt for specific organisms we want it to draw inspiration from, like trees, butterflies, and peacocks, which are also common in Art Nouveau design.
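Reintroducing linked tokens means adding targeted terms back to the positive prompt while keeping the broad ones negated. The hypothetical helper below does both and rejects any term that appears in both lists, since requesting and negating the same word in one prompt gives contradictory instructions. The base prompt is again a placeholder.

```python
from __future__ import annotations


def reintroduce(prompt: str, negatives: list[str], targeted: list[str]) -> str:
    """Add targeted terms to the prompt and keep broad linked terms under --no.

    Raises ValueError if a term is both requested and negated (case-insensitive).
    """
    negated = {n.lower() for n in negatives}
    clash = [t for t in targeted if t.lower() in negated]
    if clash:
        raise ValueError(f"terms both added and negated: {clash}")
    positive = ", ".join([prompt, *targeted])
    return f"{positive} --no {', '.join(negatives)}" if negatives else positive


print(reintroduce(
    "modern art nouveau home",  # hypothetical base prompt
    negatives=["sinuous", "tendrils", "filigree"],
    targeted=["biophilia", "trees", "butterflies", "peacocks"],
))
```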
Specifying Materials
Many of these results have used plaster as their cladding, historically a sensible choice for curved walls. However, we can further steer the results away from Midjourney’s material biases and toward more realism by specifying materials like wood, glass, and brick. This helps constrain the results, freeing us from some of the more outlandish curves and pastel colors associated with Art Nouveau plaster buildings (though your carpenter won’t be too happy about these).
Version and Style
Finally, if you are ever struggling to achieve the results you want with words alone, the problem might be in the version or style you are using. Each Midjourney version interprets prompts differently, makes different artistic choices, and has a different aesthetic worth exploring. As of this writing, V5.2 and V6 Alpha are the most recent versions.
Furthermore, Midjourney has two basic styles: Default (a more opinionated, artistic style) and Raw (a more photographic style that interprets prompts more literally). It is up to the designer to determine which version and style work best for their current project. Here are some examples:
Notably, Raw and V6 tend to produce more grounded, realistic results, but they may not be as artistic or picturesque as V5.2 and Default. Often, switching to an alternate version or style is the best solution when you hit a dead end: it can deliver a completely different aesthetic that the prompt engineering methods above couldn’t achieve.
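When a prompt stalls, it can help to render the same wording across every version and style combination at once. The sketch below builds those prompt strings with Midjourney’s `--v` and `--style raw` parameters, leaving the style flag off for Default; the version list reflects the versions named above, and the prompt and seed are hypothetical placeholders.

```python
from __future__ import annotations

from itertools import product

VERSIONS = ["5.2", "6"]   # the versions discussed above
STYLES = [None, "raw"]    # None means Default style, so no --style flag is added


def version_style_grid(prompt: str, seed: int) -> list[str]:
    """Return one prompt string per version/style combination, all sharing one seed."""
    grid = []
    for version, style in product(VERSIONS, STYLES):
        flags = f"--v {version}"
        if style is not None:
            flags += f" --style {style}"
        grid.append(f"{prompt} {flags} --seed {seed}")
    return grid


for line in version_style_grid("modern art nouveau home", seed=1234):  # hypothetical prompt and seed
    print(line)
```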
Conclusion
Which styles do you struggle to generate consistently with AI? What tips do you have for designers overcoming those same challenges? Leave a comment below to share your experiences and suggestions.
If you found these prompt engineering tips and tricks helpful, subscribe below to stay up to date on design workflows and the future of AI in architecture here on Pixels to Plans.
About the Author
AcanthusAlchemist
Designer and engineer exploring the intersection of AI, architecture, and urbanism.
email: acanthus@pixelstoplans.com