Some looks are easier to show than to describe. When words run out, reference images take over. This tutorial covers the three main ways to feed an image into your prompt — image-as-subject, image-as-style, and structural reference — across Midjourney, DALL·E 3, and Stable Diffusion.
You have probably tried to write a prompt that captures "that exact moody, washed-out aesthetic from the film I watched last night" and watched the model produce something only vaguely close. Reference images solve this. Instead of describing a look in 50 words, you hand the model an example and say: "make something with this feeling". The accuracy jump is often dramatic.
Reference images can be used in three distinct ways, and confusing them is the single biggest source of frustration when starting out.
You want the model to keep a specific person, character, or object across multiple generations. In Midjourney this is the --cref (character reference) feature: /imagine prompt: a knight standing on a cliff --cref https://image-url. In DALL·E 3 you can upload an image inside ChatGPT and say "use this character in a new scene". In Stable Diffusion you use IP-Adapter or trained LoRAs of the subject.
You want the model to copy the aesthetic of an image — its palette, lighting, brushwork, or grain — but invent new content. In Midjourney this is --sref https://image-url. In DALL·E 3 you upload the image and say "create a new scene with the same colour palette, lighting, and mood as this reference, but showing a different subject". In Stable Diffusion this is IP-Adapter (style mode) or img2img with a low denoising strength.
You want the model to match a specific composition, pose, or layout, but invent everything else. This is the domain of Stable Diffusion's ControlNet models (Canny, Depth, OpenPose, Scribble). Midjourney's --cw (character weight) parameter is sometimes mentioned alongside these, but it adjusts how much of a --cref character is carried over rather than controlling composition. ControlNet is the most precise: you can feed in a rough sketch of a pose, and the model will render a fully styled image while preserving that exact pose.
Text-only — vague style description
illustration of a fox running through a forest, in the
style of an obscure 1970s Eastern European children's
book illustrator with soft watercolour washes and ink
linework
This is a heroic attempt to describe a very specific look in words — but the model has no way of knowing exactly which illustrator you mean. Without a reference, you get a generic "vintage watercolour" output that approaches but never quite hits the niche aesthetic you had in mind.
Reference + text — precision unlocked
/imagine prompt:
A small red fox running through a misty pine forest at
dawn, mid-stride, glancing over its shoulder.
--sref https://i.imgur.com/your-style-reference.png
--ar 4:5 --v 6 --sw 200
(--sref points to a saved screenshot from the 1970s
illustrator's actual book; --sw 200 dials up the style
strength so the model leans heavily into the reference.)
Now you have a fully on-brief illustration: the soft watercolour wash, the ink linework, the warm-cool palette — all carried straight from the reference image. The fox is original, but the look is unmistakeably faithful to the artist you wanted to channel.
Midjourney: --cref for character, --sref for style. Stable Diffusion: IP-Adapter for subject/style, ControlNet for structure. DALL·E 3: upload an image and describe the role you want it to play.
--sw (style weight) ranges from 0 to 1000 and --cw (character weight) from 0 to 100. Start at the default, then raise or lower it based on how strictly you want the reference followed.
Tip: Save a "style swatch" folder of 20–30 reference images that capture different aesthetic directions: film stills, magazine spreads, paintings. When a brief lands, you can pull a relevant --sref in seconds instead of describing the look from scratch.
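If you assemble many prompts from a swatch folder, a small helper can keep the reference parameters and their weights consistent. The sketch below is only an illustration: the function name midjourney_prompt and its arguments are hypothetical, it builds a plain string to paste after /imagine prompt:, and it assumes --sw accepts 0 to 1000 and --cw accepts 0 to 100.

```python
def midjourney_prompt(text, sref=None, sw=None, cref=None, cw=None, ar=None):
    """Build a Midjourney prompt string with optional reference parameters.

    Hypothetical helper: it only formats text and does not call Midjourney.
    """
    # Assumed ranges: --sw 0..1000, --cw 0..100.
    if sw is not None and not 0 <= sw <= 1000:
        raise ValueError("--sw must be between 0 and 1000")
    if cw is not None and not 0 <= cw <= 100:
        raise ValueError("--cw must be between 0 and 100")
    # A weight without its reference image has nothing to act on.
    if sw is not None and sref is None:
        raise ValueError("--sw needs a --sref image URL")
    if cw is not None and cref is None:
        raise ValueError("--cw needs a --cref image URL")

    parts = [text.strip()]
    if sref is not None:
        parts.append(f"--sref {sref}")
    if sw is not None:
        parts.append(f"--sw {sw}")
    if cref is not None:
        parts.append(f"--cref {cref}")
    if cw is not None:
        parts.append(f"--cw {cw}")
    if ar is not None:
        parts.append(f"--ar {ar}")
    return " ".join(parts)


fox = midjourney_prompt(
    "A small red fox running through a misty pine forest at dawn",
    sref="https://i.imgur.com/your-style-reference.png",
    sw=200,
    ar="4:5",
)
print(fox)
```

Validating the weights before pasting catches typos such as --sw 2000 early, and keeping the style URL in one argument makes it easy to reuse the same swatch across different subjects.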
Find an image whose aesthetic you love (a film still, an album cover, a magazine page). Use it as a --sref in Midjourney with three completely different subjects — a portrait, a landscape, a product. Notice how the look carries while the content changes.
If you use Stable Diffusion, install ControlNet (OpenPose) and feed in a stick-figure pose. Generate the same pose in three styles: photorealistic portrait, anime illustration, oil painting. See how structural reference outranks text in pose accuracy.
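If you prefer scripting to a web UI, the same exercise can be sketched with the Hugging Face diffusers library instead of a ControlNet extension. Treat this as a rough sketch under assumptions rather than the tutorial's own workflow: the helper names, the model IDs passed as defaults, the step count, and the presence of a CUDA GPU are all assumptions, and the pose image is expected to already be an OpenPose-style stick figure.

```python
STYLES = ["photorealistic portrait", "anime illustration", "oil painting"]


def style_prompts(subject, styles=STYLES):
    """Return one text prompt per style for the same subject."""
    return [f"{subject}, {style}" for style in styles]


def render_pose_in_styles(
    pose_image_path,
    subject,
    base_model="stable-diffusion-v1-5/stable-diffusion-v1-5",  # assumed model ID
    controlnet_model="lllyasviel/sd-controlnet-openpose",      # assumed model ID
):
    """Render the same pose once per style (assumes diffusers, torch, and a GPU)."""
    # Heavy imports stay inside the function so the prompt helper works without them.
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
    from diffusers.utils import load_image

    pose = load_image(pose_image_path)  # stick-figure pose image
    controlnet = ControlNetModel.from_pretrained(
        controlnet_model, torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        base_model, controlnet=controlnet, torch_dtype=torch.float16
    ).to("cuda")

    # The pose image fixes the structure; only the text prompt changes per style.
    return [
        pipe(prompt, image=pose, num_inference_steps=30).images[0]
        for prompt in style_prompts(subject)
    ]


prompts = style_prompts("a dancer mid-leap")
```

Calling render_pose_in_styles("pose.png", "a dancer mid-leap") with a hypothetical pose.png should return three images that share the pose and differ only in style, which is the comparison the exercise asks for.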
Generate a character with a strong, specific look. Save the image. Use it as a --cref for three new scenes (in a forest, in a cafe, on a spaceship). Evaluate how well the character's identity carries — this is foundational for the consistent-character work in Topic 15.
Midjourney uses --cref and --sref; Stable Diffusion uses IP-Adapter and ControlNet; DALL·E 3 accepts uploads inside ChatGPT.
Use the weight parameters (--sw, --cw) to control how strictly the model follows the reference.