Prompt Engineering for AI Video Generation (Runway, Sora)

AI video generation is the natural next frontier after still images. Tools like Runway Gen-3, OpenAI's Sora, Pika, and Luma Dream Machine can now turn a single text prompt into a short film clip. The grammar of video prompting is similar to image prompting — but with one critical new layer: time.

1. Introduction

If you have followed this section so far, you already know most of what you need to prompt AI video. Subject, style, lighting, and mood all still apply. The new dimensions are camera motion, scene action, and pacing. This tutorial covers all three, using the major current tools as examples. Models in this space evolve fast, but the prompt principles below should remain useful.

2. The Concept Explained

An AI video prompt is essentially an image prompt with a time axis added. You are describing what the camera does, what the subject does, and how long the clip lasts. Think of it like writing a one-shot shooting script: one sentence for the scene, one for the camera move, one for the action, optionally one for the lighting transition.

Video prompts add three new layers — camera motion, scene action, and pacing — on top of the image prompt foundations.
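The one-shot shooting script described above can be sketched as a small data structure. This is only an illustration: the `VideoPrompt` class, its field names, and the cat example are assumptions made for this sketch, not part of any video tool's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VideoPrompt:
    """One-shot shooting script: scene, camera move, action, optional lighting."""
    scene: str
    camera: str
    action: str
    lighting: Optional[str] = None
    duration_s: int = 5  # assumed default clip length

    def render(self) -> str:
        # One sentence per layer: duration, scene, camera move, action, lighting.
        parts = [f"{self.duration_s}-second clip", self.scene, self.camera, self.action]
        if self.lighting:
            parts.append(self.lighting)
        # Normalise punctuation so every part ends with exactly one period.
        return " ".join(p.strip().rstrip(".") + "." for p in parts)


prompt = VideoPrompt(
    scene="A cat sits on a windowsill at dusk",
    camera="Slow dolly in toward the cat",
    action="The cat turns its head toward the camera",
)
print(prompt.render())
```

Keeping each layer in its own field makes it easy to change the camera move or the action without touching the rest of the prompt.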

Camera motion vocabulary

Static: locked-off camera, tripod shot, no camera movement.
Slow movements: slow dolly in, slow push toward subject, gentle tracking left, slow tilt up.
Cinematic moves: crane down from above, drone pull-back wide reveal, slow arcing shot around the subject, handheld documentary shake.
Lens effects: rack focus from foreground to background, slow zoom in, anamorphic flare across the lens.

Scene action vocabulary

State what changes during the clip, for example: "The leaves begin to fall around her", "steam slowly rises from the cup", "the cat turns its head toward the camera", or "raindrops accelerate as the storm intensifies". Keep action small and singular: most current video models handle one or two motion events well but break down with complex multi-action prompts.

Pacing and duration

Most current tools cap individual clips at 5–10 seconds. Plan your action accordingly. For longer narratives, chain multiple clips, each with its own prompt, then edit them together in DaVinci Resolve or CapCut. State the clip length and tempo in your prompt: "5-second clip", "slow-paced, contemplative cinematic tempo".
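As a rough sketch of the planning arithmetic, the helper below splits a target running time into clips that each fit under a per-clip cap. The function name `plan_clips` and the 5-second default are assumptions for illustration; check the actual limit of the tool you use.

```python
def plan_clips(total_seconds: int, max_clip_seconds: int = 5) -> list[int]:
    """Split a narrative into clip lengths no longer than max_clip_seconds."""
    if total_seconds <= 0 or max_clip_seconds <= 0:
        raise ValueError("durations must be positive")
    full, remainder = divmod(total_seconds, max_clip_seconds)
    lengths = [max_clip_seconds] * full
    if remainder:
        # A shorter final clip covers whatever time is left over.
        lengths.append(remainder)
    return lengths


for index, seconds in enumerate(plan_clips(20), start=1):
    print(f"Clip {index}: {seconds}-second clip, slow-paced, contemplative cinematic tempo")
```

Each planned clip then gets its own prompt, and the results are joined in the editor.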

Image-to-video workflow

The most reliable AI video workflow is image-to-video: generate a still image you love in Midjourney or DALL·E 3, then feed it as the starting frame in Runway, Luma, or Pika, with a prompt describing only what should move and how the camera should behave. This separates the "what" (image quality) from the "how" (motion quality), and the results are usually far more controllable than text-to-video alone.

3. The Problem Without Motion Vocabulary

Static image-style prompt sent to a video model

a woman in a red dress in a Paris cafe at golden hour

The video model has no idea what to animate. It might invent a wobbly handheld zoom, jitter the subject's face, or simply produce a near-still clip with slight breathing movement. Without explicit camera and action instructions, the output is unpredictable and usually disappointing.
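A crude way to catch this before generating is to check a prompt for motion vocabulary. The sketch below is a simple keyword heuristic, not how any video model parses prompts; the term lists and the `missing_layers` name are illustrative assumptions drawn from the vocabulary in section 2.

```python
# Illustrative keyword lists based on the vocabulary in section 2; not exhaustive.
CAMERA_TERMS = ("dolly", "tracking", "tilt", "crane", "drone", "zoom",
                "handheld", "locked-off", "tripod", "rack focus", "push")
ACTION_TERMS = ("begin", "rises", "turns", "lifts", "falls", "walks", "accelerate")


def missing_layers(prompt: str) -> list[str]:
    """Return which motion layers have no matching keyword in the prompt."""
    text = prompt.lower()
    missing = []
    if not any(term in text for term in CAMERA_TERMS):
        missing.append("camera motion")
    if not any(term in text for term in ACTION_TERMS):
        missing.append("scene action")
    return missing


print(missing_layers("a woman in a red dress in a Paris cafe at golden hour"))
```

Substring matching is deliberately simple and will miss synonyms, so treat an empty result as a hint rather than a guarantee.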

4. The Solution

Full video prompt — image + motion + action + pacing

5-second cinematic clip.

START FRAME: A woman in her early thirties wearing a
deep red linen dress, sitting at a small marble cafe
table in Paris at golden hour. Steaming espresso in a
white cup in front of her. Soft warm side-light, slight
breeze stirring her hair.

CAMERA MOTION: Slow steady dolly-in from a wide
establishing shot to a tight medium close-up over the
five seconds. Subtle anamorphic lens flare crosses the
frame as the camera moves.

SCENE ACTION: She lifts the espresso cup to her lips
and takes a slow, contemplative sip, her gaze drifting
to the right as if catching someone interesting on the
street. Steam rises gently from the cup throughout.

PACING: Slow, unhurried, contemplative cinematic tempo,
24fps film feel.

STYLE: Photorealistic, A24 indie film aesthetic, shot
on Arri Alexa with 50mm anamorphic lens.

(Recommended workflow: generate the start frame in
Midjourney, upload it to Runway Gen-3, then paste the
camera and scene action above as the motion prompt.)

The output reads like a real film clip: a deliberate dolly-in, a single elegant action, gentle steam motion, and a coherent emotional arc. The careful breakdown gives the video model everything it needs.
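For the image-to-video workflow, only the camera and action sections are pasted as the motion prompt. A minimal sketch of pulling those sections out of a labelled prompt follows; it assumes the "LABEL: text" layout used above with blank lines between sections, uses an abridged copy of the example prompt, and the function names are hypothetical.

```python
import re

# Abridged version of the full prompt above, kept short for illustration.
FULL_PROMPT = """\
5-second cinematic clip.

START FRAME: A woman in a deep red linen dress at a cafe table.

CAMERA MOTION: Slow steady dolly-in from a wide
establishing shot to a tight medium close-up.

SCENE ACTION: She lifts the espresso cup to her lips
and takes a slow sip.

PACING: Slow, unhurried tempo.
"""


def split_sections(prompt: str) -> dict[str, str]:
    """Map each 'LABEL: text' paragraph to its text, joining wrapped lines."""
    sections = {}
    for block in prompt.split("\n\n"):
        match = re.match(r"([A-Z ]+):\s*(.*)", block.strip(), flags=re.DOTALL)
        if match:
            sections[match.group(1)] = " ".join(match.group(2).split())
    return sections


def motion_prompt(prompt: str) -> str:
    """Keep only the parts that describe movement, for image-to-video use."""
    sections = split_sections(prompt)
    return " ".join(sections[key] for key in ("CAMERA MOTION", "SCENE ACTION"))


print(motion_prompt(FULL_PROMPT))
```

The START FRAME text goes to the image generator, while the string returned by `motion_prompt` goes to the video tool alongside the uploaded still.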

5. Step-by-Step Breakdown

Decide the starting frame first. Generate it as a still in Midjourney or DALL·E 3 — using everything from Topics 3–10 of this section.
Pick one camera move. Slow dolly-in, gentle tracking, locked-off, crane down. One clear move per clip. Combining several confuses the model.
Choose one scene action. A single motion event: a sip, a turn, a step, a leaf falling. Keep it small and singular.
State the pacing and duration. "5-second clip, slow contemplative tempo." This anchors expectations and helps the model allocate motion across the timeline.
Use image-to-video when possible. A great still + a precise motion prompt outperforms text-to-video almost every time. This is the dominant professional workflow.
Plan longer pieces as chained clips. Stitch five-second beats together in an editor, prompting each clip individually for maximum quality.

Tip: Save your favourite camera-move phrases as a reusable library — "slow dolly-in", "drone pull-back wide reveal", "crane down from above". These compress huge cinematic intent into a few words the video model understands.
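One possible shape for such a library is a plain dictionary of short keys mapped to full phrases. The keys, the `with_camera_move` helper, and the lighthouse example are hypothetical; the phrases themselves come from the vocabulary in section 2.

```python
# Hypothetical personal library: short keys mapped to phrases from this tutorial.
CAMERA_MOVES = {
    "static": "locked-off camera, tripod shot, no camera movement",
    "dolly_in": "slow dolly-in toward the subject",
    "pull_back": "drone pull-back wide reveal",
    "crane_down": "crane down from above",
    "arc": "slow arcing shot around the subject",
}


def with_camera_move(start_frame: str, move: str) -> str:
    """Append one saved camera-move phrase to a start-frame description."""
    if move not in CAMERA_MOVES:
        raise KeyError(f"unknown move {move!r}; choose from {sorted(CAMERA_MOVES)}")
    return f"{start_frame.rstrip('.')}. Camera: {CAMERA_MOVES[move]}."


frame = "A lighthouse on a cliff at dawn"
for move in ("static", "dolly_in", "pull_back"):
    print(with_camera_move(frame, move))
```

Because the helper accepts exactly one key, it also nudges you toward one camera move per clip.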

6. Practice Exercises

Exercise 1

Take a still image you love (one of your own generations or a real photo). Use the image-to-video workflow in Runway or Luma to add a slow dolly-in motion. Notice how much a single deliberate camera move changes the image.

Exercise 2

Generate the same starting frame with three different camera moves (static, slow dolly-in, drone pull-back). Compare the emotional impact. Camera move alone shifts the meaning of the same image dramatically.

Exercise 3

Plan a 20-second story as four 5-second clips. Write a prompt for each. Generate them in the same style, then stitch them together in any video editor. This is the foundation of a real AI-video workflow.
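If it helps to keep the style consistent, the four prompts can be drafted from one shared style line. The sketch below is only a planning aid under assumed names: the `STYLE` text, the four beats, and the `storyboard` function are made up for illustration, and you would still paste each prompt into your video tool by hand.

```python
# Assumed shared style line, repeated in every clip so the look stays consistent.
STYLE = "Photorealistic, slow contemplative tempo, warm golden-hour light"

# Hypothetical four-beat story: one camera move and one motion event per clip.
BEATS = [
    ("Slow dolly-in", "a woman opens a letter at a cafe table"),
    ("Locked-off camera", "she reads and begins to smile"),
    ("Gentle tracking left", "she stands and steps away from the table"),
    ("Drone pull-back wide reveal", "she walks down the street"),
]


def storyboard(beats, style, clip_seconds=5):
    """Build one prompt per beat, repeating the same style line in each."""
    return [
        f"{clip_seconds}-second clip. {camera}: {action}. {style}."
        for camera, action in beats
    ]


prompts = storyboard(BEATS, STYLE)
print(f"{len(prompts)} clips, {len(prompts) * 5} seconds total")
for text in prompts:
    print(text)
```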

7. Key Takeaways

Video prompts are image prompts plus three new layers: camera motion, scene action, and pacing.
Keep camera motion and scene action small and singular — one move, one event per clip.
The dominant professional workflow is image-to-video: generate a still in Midjourney or DALL·E 3, then animate it in Runway, Luma, or Pika.
Plan longer pieces as chained 5-second clips, stitched in DaVinci Resolve or CapCut.
Save reusable camera-move phrases — they compress complex cinematic intent into a few words the video model understands.
