Using Reference Images and Style Transfer in Prompts

Some looks are easier to show than to describe. When words run out, reference images take over. This tutorial covers the three main ways to feed an image into your prompt — image-as-subject, image-as-style, and structural reference — across Midjourney, DALL·E 3, and Stable Diffusion.

1. Introduction

You have probably tried to write a prompt that captures "that exact moody, washed-out aesthetic from the film I watched last night" and watched the model produce something only vaguely close. Reference images solve this. Instead of describing a look in 50 words, you hand the model an example and say: "make something with this feeling". The accuracy jump is often dramatic.

2. The Concept Explained

Reference images can be used in three distinct ways, and confusing them is the single biggest source of frustration when starting out.

The three reference modes. Choose deliberately — they produce very different outputs.

1. Subject reference

You want the model to keep a specific person, character, or object across multiple generations. In Midjourney this is the --cref (character reference) feature: /imagine prompt: a knight standing on a cliff --cref https://image-url. In DALL·E 3 you can upload an image inside ChatGPT and say "use this character in a new scene". In Stable Diffusion you use IP-Adapter or a LoRA trained on the subject.

2. Style reference

You want the model to copy the aesthetic of an image (its palette, lighting, brushwork, or grain) but invent new content. In Midjourney this is --sref https://image-url. In DALL·E 3 you upload the image and say "create a new scene with the same colour palette, lighting, and mood as this reference, but showing a different subject". In Stable Diffusion this is IP-Adapter (style mode) or, more crudely, img2img at a moderate denoising strength; very low values mostly reproduce the reference itself rather than inventing new content.

3. Structural reference

You want the model to match a specific composition, pose, or layout, but invent everything else. This is the domain of Stable Diffusion's ControlNet extensions (Canny, Depth, OpenPose, Scribble). ControlNet gives the most precise control: you can feed in a rough sketch of a pose, and the model will render a fully styled image while preserving that pose. Midjourney has no equally precise control. Its --cw (character weight) parameter belongs with --cref and sets how much of the referenced character is carried over; it does not lock composition or pose.
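The three modes and the tools named above can be collected into a small lookup, which makes the "choose deliberately" step explicit. This is only an illustrative sketch: REFERENCE_TOOLS and pick_tool are hypothetical names, the platform keys are arbitrary labels, and the values simply restate the tools mentioned in this section.

```python
# Tools named in this tutorial, keyed by reference mode and then by platform.
# Platform keys ("midjourney", "dalle3", "stable_diffusion") are arbitrary labels.
REFERENCE_TOOLS = {
    "subject": {
        "midjourney": "--cref",
        "dalle3": "upload in ChatGPT and ask to reuse the character",
        "stable_diffusion": "IP-Adapter or a LoRA trained on the subject",
    },
    "style": {
        "midjourney": "--sref",
        "dalle3": "upload in ChatGPT and ask for the same palette, lighting, and mood",
        "stable_diffusion": "IP-Adapter (style mode) or img2img",
    },
    "structure": {
        "stable_diffusion": "ControlNet (Canny, Depth, OpenPose, Scribble)",
    },
}


def pick_tool(mode: str, platform: str) -> str:
    """Return the tool for a mode/platform pair, or a note if none is listed."""
    if mode not in REFERENCE_TOOLS:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(REFERENCE_TOOLS)}"
        )
    return REFERENCE_TOOLS[mode].get(platform, "no dedicated tool listed")


print(pick_tool("style", "midjourney"))
print(pick_tool("structure", "stable_diffusion"))
```

Asking for a mode first and a platform second mirrors the order of the decision: work out what role the reference plays, then look up how your tool expresses it.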

3. The Problem Without Reference Images

Text-only — vague style description

illustration of a fox running through a forest, in the
style of an obscure 1970s Eastern European children's
book illustrator with soft watercolour washes and ink
linework

This is a heroic attempt to describe a very specific look in words — but the model has no way of knowing exactly which illustrator you mean. Without a reference, you get a generic "vintage watercolour" output that approaches but never quite hits the niche aesthetic you had in mind.

4. The Solution

Reference + text — precision unlocked

/imagine prompt:
A small red fox running through a misty pine forest at
dawn, mid-stride, glancing over its shoulder.

--sref https://i.imgur.com/your-style-reference.png
--ar 4:5 --v 6 --sw 200

(--sref points to a saved screenshot from the 1970s
illustrator's actual book; --sw 200 dials up the style
strength so the model leans heavily into the reference.)

Now you have a fully on-brief illustration: the soft watercolour wash, the ink linework, the warm-cool palette — all carried straight from the reference image. The fox is original, but the look is unmistakeably faithful to the artist you wanted to channel.

5. Step-by-Step Breakdown

Decide which mode you need. Is the reference for the subject (keep the character), the style (copy the look), or the structure (match the pose/composition)?
Pick the right tool and parameter. Midjourney: --cref for character, --sref for style. Stable Diffusion: IP-Adapter for subject/style, ControlNet for structure. DALL·E 3: upload an image and describe the role you want it to play.
Use clean reference images. Crop tightly, remove watermarks, prefer a single dominant subject. Noisy or low-resolution references produce noisy outputs.
Combine reference with strong text. The reference handles "look and feel" — your text still drives the new subject, the scene, and the mood. Both halves matter.
Tune the strength dial. Midjourney's --sw (style weight) ranges from 0 to 1000 and --cw (character weight) from 0 to 100; both default to 100. Start at the default, then raise or lower based on how strictly you want the reference followed.
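The steps above can be sketched as a small helper that assembles a Midjourney prompt line and rejects out-of-range weights before you paste it into Discord. This is a minimal sketch, not an official client: build_midjourney_prompt and _check_range are hypothetical names, the helper only builds a string, and it assumes the documented ranges of 0 to 1000 for --sw and 0 to 100 for --cw.

```python
from typing import Optional

# Assumed ranges for Midjourney's weight flags: --sw runs 0-1000, --cw runs 0-100.
SW_RANGE = (0, 1000)
CW_RANGE = (0, 100)


def _check_range(flag: str, value: int, bounds: tuple) -> None:
    """Raise ValueError if a weight falls outside its allowed range."""
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{flag} must be between {low} and {high}, got {value}")


def build_midjourney_prompt(
    text: str,
    sref: Optional[str] = None,
    cref: Optional[str] = None,
    sw: Optional[int] = None,
    cw: Optional[int] = None,
    ar: Optional[str] = None,
) -> str:
    """Join prompt text and reference flags into one line (hypothetical helper)."""
    if sw is not None and sref is None:
        raise ValueError("--sw needs a --sref image to act on")
    if cw is not None and cref is None:
        raise ValueError("--cw needs a --cref image to act on")

    parts = [" ".join(text.split())]  # collapse line breaks and extra spaces
    if sref is not None:
        parts.append(f"--sref {sref}")
    if sw is not None:
        _check_range("--sw", sw, SW_RANGE)
        parts.append(f"--sw {sw}")
    if cref is not None:
        parts.append(f"--cref {cref}")
    if cw is not None:
        _check_range("--cw", cw, CW_RANGE)
        parts.append(f"--cw {cw}")
    if ar is not None:
        parts.append(f"--ar {ar}")
    return " ".join(parts)


print(build_midjourney_prompt(
    "A small red fox running through a misty pine forest at dawn",
    sref="https://i.imgur.com/your-style-reference.png",
    sw=200,
    ar="4:5",
))
```

Leaving sw and cw unset keeps Midjourney's defaults, which matches the advice to start at the default and adjust only after seeing a first result.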

Tip: Save a "style swatch" folder of 20–30 reference images that capture different aesthetic directions — film stills, magazine spreads, paintings. When a brief lands, you can pull a relevant --sref in seconds instead of describing the look from scratch.

6. Practice Exercises

Exercise 1

Find an image whose aesthetic you love (a film still, an album cover, a magazine page). Use it as a --sref in Midjourney with three completely different subjects — a portrait, a landscape, a product. Notice how the look carries while the content changes.
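One way to set this exercise up is to generate the three prompt lines in one go, so the only thing that changes between runs is the subject. The sketch below is illustrative: style_study is a hypothetical helper, the URL is a placeholder to replace with your own reference, and the three subjects are arbitrary examples.

```python
# Placeholder URL: replace with the address of your own style reference image.
STYLE_REF = "https://example.com/your-style-reference.png"

# Arbitrary example subjects covering a portrait, a landscape, and a product.
SUBJECTS = [
    "close-up portrait of an elderly fisherman",
    "wide landscape of a river valley at dusk",
    "product shot of a ceramic teapot on a plain table",
]


def style_study(subjects, sref, sw=100):
    """Return one Midjourney prompt line per subject, all sharing one --sref."""
    return [f"{subject} --sref {sref} --sw {sw}" for subject in subjects]


for line in style_study(SUBJECTS, STYLE_REF):
    print(line)
```

Holding --sref and --sw fixed across all three lines keeps the comparison fair: any difference in look between the outputs comes from the subject, not from the settings.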

Exercise 2

If you use Stable Diffusion, install ControlNet (OpenPose) and feed in a stick-figure pose. Generate the same pose in three styles: photorealistic portrait, anime illustration, oil painting. See how structural reference outranks text in pose accuracy.

Exercise 3

Generate a character with a strong, specific look. Save the image. Use it as a --cref for three new scenes (in a forest, in a cafe, on a spaceship). Evaluate how well the character's identity carries — this is foundational for the consistent-character work in Topic 15.

7. Key Takeaways

Reference images solve precision problems that words alone cannot — they teleport the model into exactly the visual region you want.
There are three reference modes: subject (keep this character), style (copy this look), and structural (match this composition).
Midjourney uses --cref and --sref; Stable Diffusion uses IP-Adapter and ControlNet; DALL·E 3 accepts uploads inside ChatGPT.
Combine reference images with strong text — references handle look, text drives content.
Tune strength parameters (--sw, --cw) to control how strictly the model follows the reference.
