This Stable Diffusion XL (SDXL) prompt guide aims to provide a comprehensive understanding of the SDXL model, from prompt anatomy to styles.
As we all know, StabilityAI claims that the model is optimized for generating images from concise prompts.
That’s why I will begin with prompt anatomy, and check whether a short prompt suffices or if a longer one is required.
Then, we will delve into how prompt parameters like styles and aspect ratios might influence your desired outcome.
Stable Diffusion XL or SDXL Prompt Guide:
Before delving into the guide, I’d like to draw a parallel between prompt creation and cooking. Just as in cooking, patience is key.
So, take your time to read the entire article attentively, as rushing can result in your dish being less flavorful.
And much like cooking, we begin by organizing the elements to craft prompts, followed by a discussion on parameters that will enhance image quality.
Decoding the Anatomy of SDXL Prompt:
Now, let’s delve into the crucial elements to bear in mind when formulating a prompt:
Subject: The Pinnacle of Focus
I would like to begin with the subject because it forms the nucleus of your image, capturing attention and conveying the primary message.
Alongside the subject, describe the environment or surroundings where the image should take place.
This could include details like whether it’s indoors or outdoors, whether it’s in a city or a natural setting, and any special conditions such as daytime or nighttime.
Detailed Imagery: Painting with Precision
Now that you’ve chosen your subject, it’s time to add layers of richness and depth, immersing your audience in the world you’re creating.
So, you need to dive into specifics like clothing, expressions, textures, proportions, reflections, and interactions.
Emotions and Atmosphere:
Now indicate the feelings or mood you want the image to convey.
For example, you might want the image to feel happy, calm, mysterious, or dramatic. This helps set the emotional tone of the image.
For instance, “a joyful family picnic” or “a mysterious forest at dusk.”
Composition and Perspective:
Composition and perspective instructions are not suitable for every type of image.
If they apply to your specific case, specify how you would like the elements in the image to be arranged.
You can specify if you want a close-up view of something, a wide-angle shot, or any other specific angle or framing.
For example, “a close-up of a bee on a flower” or “a panoramic view of a city skyline at night.”
Color Palette:
If colors are important for your image, let the SDXL model know.
You can specify certain colors or general tones (like warm colors, cool colors, etc.) in your prompt that you’d like to see in the image.
Action or Activity:
Now, if you want to see any movement or interaction in the image, try to incorporate it in your prompt.
This could include people engaged in an activity, animals in motion, or any other dynamic elements.
As we understand, we should craft the prompt with a clear description, taking into account all the elements.
So, let’s see the prompt template curated from the above discussion.
Template: [Subject], [clothing and expressions], [Emotions and Atmosphere], [Composition and Perspective], [Color Palette], [Action or Activity]
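To make the template concrete, here is a minimal Python sketch that joins the template fields into one prompt string. The function name `build_prompt` and its parameter names are hypothetical, chosen only for illustration; they are not part of any SDXL tool.

```python
def build_prompt(subject, details="", mood="", composition="", palette="", action=""):
    """Join the non-empty template fields into one comma-separated prompt."""
    parts = [subject, details, mood, composition, palette, action]
    # Skip fields that were left empty so the prompt has no stray commas.
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_prompt(
    subject="a mystical unicorn in a magical forest",
    details="pearl-white coat",
    mood="enchanting, dreamlike atmosphere",
    composition="wide-angle shot",
    palette="cool blue and green tones",
    action="grazing by a glowing pool",
)
print(prompt)
```

Leaving a field empty simply drops it, which is handy when, for example, composition does not apply to your image.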
If you require additional prompt templates for various types of prompts, you can refer to our article on the best stable diffusion prompt templates.
Now, I am going to create a prompt using all the elements mentioned above and generate an image using the SDXL 1.0 model.
Long Prompt: “In a magical forest, a mystical unicorn with a pearl-white coat grazes by a glowing pool. The air is filled with the melody of nightingales. Bioluminescent plants create an enchanting radiance, while ancient trees stand tall. A scene of wondrous magic, where dreams and reality blend.”
Now, I hope you’ve noticed that the prompt is quite long.
However, StabilityAI claims that the model is proficient enough for shorter prompts.
So let’s discuss it.
Fix SDXL Prompt Length:
Without a practical example, it is hard to say whether a short prompt is enough or not for the SDXL model.
So, let’s first make the above prompt short and generate the image again.
Short Prompt: In a magical forest, a mystical unicorn with a pearl-white coat grazes by a glowing pool in the moonlight.
Now, if you closely compare the two images, you may notice that the image generated with the long prompt is better than the one generated with the short prompt in terms of lighting, smoothness, details, and quality.
However, the image generated by a short prompt is also good enough if you do not require a highly detailed image.
So, we can conclude that providing clear and detailed instructions ensures you get the best results.
And it’s important to note that StabilityAI’s claims hold true at a certain level, but may not apply to all types of images.
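If you want a rough feel for how long a prompt is before generating, the sketch below counts words and punctuation marks. It assumes the 77-token context of the CLIP text encoders as the limit, and the count is only an approximation: a real CLIP tokenizer splits rare words into several tokens, and some interfaces split long prompts into chunks instead of cutting them off. The helper name `rough_token_count` is hypothetical.

```python
import re

# Assumed limit: context length of the CLIP text encoders, including start/end tokens.
CLIP_TOKEN_LIMIT = 77


def rough_token_count(prompt):
    """Count words and punctuation marks as a rough stand-in for CLIP tokens."""
    return len(re.findall(r"\w+|[^\w\s]", prompt))


short_prompt = (
    "In a magical forest, a mystical unicorn with a pearl-white coat "
    "grazes by a glowing pool in the moonlight."
)
count = rough_token_count(short_prompt)
# Reserve two slots for the start and end tokens.
verdict = "fits" if count + 2 <= CLIP_TOKEN_LIMIT else "may be truncated"
print(count, "estimated tokens:", verdict)
```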
Choose Your Image Style:
If you have a specific artistic style in mind, you have two options at this point:
You can either mention the style name in your prompt or utilize the pre-made style in your SDXL model.
Don’t worry, let me clarify this.
To illustrate this, I will first test a photorealistic prompt, where the prompt is specifically designed for photorealistic images.
Prompt: A stunning young model, 21 years beautiful girl, High Detailed RAW color Photo, Summer Dress Collection, stage lighting, photography, photorealism, volumetric lighting, smiling, Ultra HD, hdr, 8k, DSLR
Now, I’ll simply remove the words tailored for realistic images in the prompt and apply the Photographic style.
So, the prompt will be:
A stunning young model, 21 years beautiful girl, Wearing a Summer Dress, smiling face
So, it is evident from the above two images that if you desire the image to resemble a realistic photograph, or something entirely different, you should employ SDXL Photorealistic Prompts.
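Pre-made styles are often nothing more than extra keywords wrapped around your prompt. The sketch below shows that idea under stated assumptions: the preset names and their wording are hypothetical examples, not the actual style definitions shipped with any SDXL interface.

```python
# Hypothetical style presets: names and wording are illustrative, not an official list.
STYLE_PRESETS = {
    "photographic": "{prompt}, photograph, 35mm film, sharp focus, natural lighting",
    "anime": "{prompt}, anime artwork, vibrant colors, clean line art",
}


def apply_style(prompt, style):
    """Wrap the prompt in the chosen preset; unknown styles leave it unchanged."""
    template = STYLE_PRESETS.get(style, "{prompt}")
    return template.format(prompt=prompt)


base = "A stunning young model, wearing a summer dress, smiling face"
print(apply_style(base, "photographic"))
```

Seen this way, writing the style keywords yourself and picking a preset are two routes to a similar final prompt, which is why it is worth testing both.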
Fine Tuning with SDXL Prompt Parameters:
If you stick with the default parameter settings of the SDXL model, it’s generally good to go.
However, there are instances where the default settings might not be precisely aligned, which is why it’s crucial to understand the appropriate parameter levels for optimal results.
Aspect Ratio:
The aspect ratio is simply determined by your desired output. Upon observing the above image, you can discern which aspect ratio is suitable for different types of images.
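As a rough aid for turning an aspect ratio into pixel dimensions, the sketch below keeps the total pixel count near 1024 x 1024 and snaps each side to a multiple of 64. Both numbers are assumptions based on common SDXL practice, and the function name `sdxl_dimensions` is hypothetical; your interface may offer its own fixed list of resolutions.

```python
import math


def sdxl_dimensions(ratio_w, ratio_h, target_pixels=1024 * 1024, multiple=64):
    """Return (width, height) close to target_pixels for the given aspect ratio."""
    height = math.sqrt(target_pixels * ratio_h / ratio_w)
    width = height * ratio_w / ratio_h

    def snap(value):
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


for ratio in [(1, 1), (16, 9), (9, 16)]:
    print(ratio, sdxl_dimensions(*ratio))
```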
LoRA Scale:
LoRA is short for Low-Rank Adaptation of Large Language Models. The LoRA scale lets you specify how much weight the LoRA update receives on top of the base model.
Remember, it is relevant only to models with trained LoRA weights. The LoRA scale ranges from a minimum of 0 to a maximum of 1, and based on our experience, the optimal value is 0.6.
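Conceptually, the scale is a multiplier on the change that the LoRA adds to the base weights. The toy sketch below illustrates this with tiny nested lists instead of real tensors; the function name `apply_lora` is hypothetical, and real implementations also apply rank and alpha factors that are left out here.

```python
def apply_lora(base_weight, lora_delta, scale):
    """Blend a LoRA update into base weights: W' = W + scale * delta."""
    if not 0.0 <= scale <= 1.0:
        raise ValueError("scale must be between 0 and 1")
    return [
        [w + scale * d for w, d in zip(base_row, delta_row)]
        for base_row, delta_row in zip(base_weight, lora_delta)
    ]


base = [[1.0, 2.0]]
delta = [[10.0, -10.0]]
print(apply_lora(base, delta, 0.0))  # scale 0 leaves the base model untouched
print(apply_lora(base, delta, 0.5))  # half of the LoRA update is applied
```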
Prompt Strength:
When working with img2img or inpaint, you have the option to use Prompt Strength. A value of 1.0 means the information in the input image is completely destroyed, so the result depends on the prompt alone.
In Stable Diffusion XL, this parameter governs how strongly your prompt, rather than the input image, influences the generated result. It ranges from 0 to 1, and the recommended value is 0.8.
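One way to picture prompt strength is as the share of denoising steps that actually run on the input image. The sketch below follows the simple rule used by some img2img implementations (for example, the Hugging Face diffusers pipelines compute a similar value); treat it as an assumption about your tool rather than a guarantee. The function name `effective_steps` is hypothetical.

```python
def effective_steps(num_inference_steps, strength):
    """Estimate how many denoising steps run for a given prompt strength."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # Higher strength adds more noise to the input, so more steps are needed.
    return min(int(num_inference_steps * strength), num_inference_steps)


for strength in (0.0, 0.8, 1.0):
    print(strength, effective_steps(50, strength))
```

With a strength of 0 no steps run and the input image is returned essentially unchanged, while a strength of 1.0 runs every step and ignores the input image.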
Scheduler:
Schedulers, also called samplers, are like secret conductors working with the UNet in Stable Diffusion, guiding the denoising process.
Choosing the right scheduler is like picking the perfect song for a special occasion. It can turn a good result into an amazing one.
It’s like giving directions to the AI to create exactly what you want. And if you’re aiming for super realistic images, you can use K_EULER.
Steps:
If you are using any AI image generator, you should know that every little tweak gets you closer to your perfect artwork, and in AI, ‘steps’ are those tiny tweaks.
So, in general, the more steps you use, the more the result is refined.
But, remember, more steps also mean more time and computer power.
So, if you’re looking for a thorough discussion, you may read the Reddit post on the optimal number of steps for the Stable Diffusion model.
Guidance Scale:
In Stable Diffusion XL, the guidance scale (CFG scale) governs how faithfully the image generation process adheres to the text prompt. A higher value means the image aligns more closely with the provided text input.
So, if you believe your prompt isn’t very effective, you can utilize a value below 6.
However, if you have a strong prompt, a range of 8-12 is recommended for achieving excellent quality while still paying reasonable attention to the prompt.
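Under the hood, classifier-free guidance combines two noise predictions at every step: one made without the prompt and one made with it. The toy sketch below applies the standard formula to short lists of numbers in place of real tensors; the function name `apply_guidance` is hypothetical.

```python
def apply_guidance(uncond, cond, guidance_scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), element-wise."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0]  # prediction without the prompt
cond = [1.0, 1.0]    # prediction with the prompt
print(apply_guidance(uncond, cond, 1.0))  # scale 1 gives the prompted prediction as-is
print(apply_guidance(uncond, cond, 7.5))  # larger scales push further toward the prompt
```

Where the two predictions agree, the scale changes nothing; where they differ, a higher scale exaggerates the difference, which is why very high values can look overcooked.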
Seed:
A seed is the initial number that sets the image creation in motion. It determines the random noise pattern the image starts from.
Using the same seed with the same prompt and settings means you get the same image every time.
This helps with making consistent images while trying out different settings or changing prompts.
So, try different seeds and settle on one that works for you, because altering even one digit of a seed gives you a totally different starting pattern.
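The effect of a seed can be illustrated with Python's standard random number generator standing in for the noise generator of a real image model. This is only an analogy, and the function name `initial_noise` is hypothetical.

```python
import random


def initial_noise(seed, size=4):
    """Generate a small list of Gaussian noise values from a seed."""
    rng = random.Random(seed)  # a private generator, so other code is unaffected
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


same_a = initial_noise(42)
same_b = initial_noise(42)
other = initial_noise(43)

print(same_a == same_b)  # identical seed, identical starting noise
print(same_a == other)   # neighbouring seed, unrelated starting noise
```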
Use Proper Negative Prompt:
Negative prompts are practically a must-have for Stable Diffusion v1 and v2 models, because in my experience, getting an optimal result without them is very hard.
But for the SDXL model, negative prompts have much less effect. For reference, see the comparison below, which I generated using the same prompt.
I hope you understand.
But still, if you need a negative prompt, you may use the below universal negative prompt which I often employ.
Ugly, cartoon, two heads, extra limbs, fad color, disfigured, deformed, out of frame, easy negative, low contrast, underexposed, distorted face, blurry, draft, poorly drawn hands, poorly drawn feet, poorly drawn face, bad artist, long neck, long body, extra legs, extra arms, jpeg artifacts, signature, malformed limbs, watermarks
Now that we have covered the techniques and elements for writing a good prompt, let’s look at some basic tips that summarize the above discussion.
Stable Diffusion XL Prompt Tips:
When creating a prompt, it’s important to bear in mind that the prompt should be clear, specific, and offer ample context for the SDXL model to grasp your intent.
So, try to avoid using vague or ambiguous language. Clearly state what you want to see in the image.
For example, instead of saying “a beautiful landscape,” specify “a sunset over a calm lake with mountains in the background.”
And if you want a detailed image, you need a well-described prompt.
In this case, describe the scene in as much detail as possible using natural language, although the keyword-based approach still works well for SDXL.
For negative prompts, you should avoid using excessively long ones. Try to include only the things you want to avoid in your image.
For example, use “Cartoon” and “Cropped” in the negative prompt when generating a realistic image, or name a specific object you’d like to remove.
Hi there! I’m Zaro, the passionate mind behind aienthusiastic.com. With a background in Electronics Science, I’ve had the privilege of delving deep into AI and ML. And this blog is my platform to share my enthusiasm with you.