AI video generation is the natural next frontier after still images. Tools like Runway Gen-3, OpenAI's Sora, Pika, and Luma Dream Machine can now turn a single text prompt into a short film clip. The grammar of video prompting is similar to image prompting — but with one critical new layer: time.
If you have followed this section so far, you already know most of what you need to prompt AI video. Subject, style, lighting, and mood all still apply. The new dimensions are camera motion, scene action, and pacing. This tutorial covers all three, using the major current tools as examples. Models in this space evolve fast, but the prompt principles below should remain useful.
An AI video prompt is essentially an image prompt with a time axis added. You are describing what the camera does, what the subject does, and how long the clip lasts. Think of it like writing a one-shot shooting script: one sentence for the scene, one for the camera move, one for the action, optionally one for the lighting transition.
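The one-shot shooting script described above can be sketched as a small Python helper. This is only an illustration, not any video tool's API: the `ShotPrompt` class and its field names are hypothetical, and the result is plain text you would paste into the tool of your choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShotPrompt:
    """Hypothetical container for the parts of a one-shot video prompt."""
    scene: str                                  # what the frame shows
    camera: str                                 # what the camera does
    action: str                                 # what the subject does
    duration_seconds: int = 5                   # how long the clip lasts
    lighting_transition: Optional[str] = None   # optional lighting change

    def render(self) -> str:
        """Join the parts into one prompt string, one sentence per part."""
        parts = [
            f"{self.duration_seconds}-second clip.",
            self.scene,
            self.camera,
            self.action,
        ]
        if self.lighting_transition:
            parts.append(self.lighting_transition)
        return " ".join(part.strip() for part in parts)


shot = ShotPrompt(
    scene="A cat sits on a windowsill at dusk.",
    camera="Slow dolly-in toward the window.",
    action="The cat turns its head toward the camera.",
)
print(shot.render())
```

Keeping each part in its own field makes it easy to change one element, such as the camera move, while holding the rest of the shot constant.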
State what changes during the clip, for example: "The leaves begin to fall around her", "steam slowly rises from the cup", "the cat turns its head toward the camera", or "raindrops accelerate as the storm intensifies". Keep the action small and singular: most current video models handle one or two motion events well but break down on complex multi-action prompts.
Most current tools cap individual clips at 5–10 seconds, so plan your action accordingly. For longer narratives, chain multiple clips, each with its own prompt, then edit them together in DaVinci Resolve or CapCut. State the shot length and tempo in your prompt: "5-second clip", "slow-paced, contemplative cinematic tempo".
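Planning a longer narrative as a chain of short clips is simple arithmetic, sketched below. The `plan_clips` function is a hypothetical helper, and the 5-second default cap is an assumption taken from the low end of the range quoted above; check the limit of the tool you actually use.

```python
import math
from typing import List


def plan_clips(total_seconds: int, max_clip_seconds: int = 5) -> List[int]:
    """Split a story length into clip durations that never exceed the cap."""
    if total_seconds <= 0 or max_clip_seconds <= 0:
        raise ValueError("durations must be positive")
    clip_count = math.ceil(total_seconds / max_clip_seconds)
    base, extra = divmod(total_seconds, clip_count)
    # Spread the remainder so clip lengths differ by at most one second.
    return [base + 1 if i < extra else base for i in range(clip_count)]


# One short action per clip, as recommended above (example beats are illustrative).
beats = [
    "She sits down at the cafe table.",
    "She lifts the espresso cup and sips.",
    "She glances toward the street.",
    "She smiles and sets the cup down.",
]
for seconds, beat in zip(plan_clips(20), beats):
    print(f"{seconds}-second clip. {beat}")
```

Each printed line is a starting point for one clip's prompt; the clips are still generated separately and stitched together in a video editor.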
The most reliable AI video workflow is image-to-video: generate a still image you love in Midjourney or DALL·E 3, then feed it as the starting frame in Runway, Luma, or Pika, with a prompt describing only what should move and how the camera should behave. This separates the "what" (image quality) from the "how" (motion quality), and the results are usually far more consistent than text-to-video alone.
Static image-style prompt sent to a video model
a woman in a red dress in a Paris cafe at golden hour
The video model has no idea what to animate. It might invent a wobbly handheld zoom, jitter the subject's face, or simply produce a near-still clip with slight breathing movement. Without explicit camera and action instructions, the output is unpredictable and usually disappointing.
Full video prompt — image + motion + action + pacing
5-second cinematic clip.
START FRAME: A woman in her early thirties wearing a
deep red linen dress, sitting at a small marble cafe
table in Paris at golden hour. Steaming espresso in a
white cup in front of her. Soft warm side-light, slight
breeze stirring her hair.
CAMERA MOTION: Slow steady dolly-in from a wide
establishing shot to a tight medium close-up over the
five seconds. Subtle anamorphic lens flare crosses the
frame as the camera moves.
SCENE ACTION: She lifts the espresso cup to her lips
and takes a slow, contemplative sip, her gaze drifting
to the right as if catching someone interesting on the
street. Steam rises gently from the cup throughout.
PACING: Slow, unhurried, contemplative cinematic tempo,
24fps film feel.
STYLE: Photorealistic, A24 indie film aesthetic, shot
on Arri Alexa with 50mm anamorphic lens.
(Recommended workflow: generate the start frame in
Midjourney, upload it to Runway Gen-3, then paste the
camera and scene action above as the motion prompt.)
The output reads like a real film clip: a deliberate dolly-in, a single elegant action, gentle steam motion, and a coherent emotional arc. The careful breakdown gives the video model everything it needs.
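The recommended workflow in the prompt above splits one script into two inputs: a still-image prompt and a motion prompt. A minimal sketch of that split follows. The section labels mirror this tutorial's example and are a convention, not something any tool requires; `split_for_image_to_video` is a hypothetical helper, and sending STYLE to the image prompt is an assumption.

```python
from typing import Dict, Tuple

# Labels follow the example prompt in this tutorial (a convention, not a tool feature).
IMAGE_SECTIONS = ("START FRAME", "STYLE")            # the "what": still-image quality
MOTION_SECTIONS = ("CAMERA MOTION", "SCENE ACTION")  # the "how": what moves


def split_for_image_to_video(sections: Dict[str, str]) -> Tuple[str, str]:
    """Return (image_prompt, motion_prompt) from labelled script sections."""
    def join(labels: Tuple[str, ...]) -> str:
        return " ".join(sections[label].strip() for label in labels if label in sections)

    return join(IMAGE_SECTIONS), join(MOTION_SECTIONS)


script = {
    "START FRAME": "A woman in a deep red linen dress at a marble cafe table in Paris at golden hour.",
    "CAMERA MOTION": "Slow steady dolly-in from a wide shot to a medium close-up.",
    "SCENE ACTION": "She lifts the espresso cup and takes a slow sip.",
    "STYLE": "Photorealistic, shot on a 50mm anamorphic lens.",
}
image_prompt, motion_prompt = split_for_image_to_video(script)
print("Image prompt: ", image_prompt)
print("Motion prompt:", motion_prompt)
```

The image prompt goes to the still-image generator; the motion prompt is what you paste alongside the uploaded start frame.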
Tip: Save your favourite camera-move phrases as a reusable library — "slow dolly-in", "drone pull-back wide reveal", "crane down from above". These compress huge cinematic intent into a few words the video model understands.
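A reusable phrase library can be as simple as a dictionary. The sketch below uses the three phrases from the tip; the keys, the `with_camera_move` helper, and the "Camera:" label are hypothetical choices for illustration, not syntax any video model requires.

```python
# Camera-move phrases taken from the tip above; the short keys are our own.
CAMERA_MOVES = {
    "dolly_in": "slow dolly-in",
    "pull_back": "drone pull-back wide reveal",
    "crane_down": "crane down from above",
}


def with_camera_move(base_prompt: str, move_key: str) -> str:
    """Append one library phrase to a base prompt as its camera instruction."""
    phrase = CAMERA_MOVES[move_key]  # raises KeyError for an unknown key
    return f"{base_prompt.rstrip('. ')}. Camera: {phrase}."


base = "Steam slowly rises from the cup."
for key in CAMERA_MOVES:
    print(with_camera_move(base, key))
```

Looping over the library like this yields one prompt variant per camera move for the same scene, which makes side-by-side comparison straightforward.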
Take a still image you love (one of your own generations or a real photo). Use the image-to-video workflow in Runway or Luma to add a slow dolly-in motion. Notice how much a single deliberate camera move changes the image.
Generate the same starting frame with three different camera moves (static, slow dolly-in, drone pull-back). Compare the emotional impact. The camera move alone can shift the meaning of the same image dramatically.
Plan a 20-second story as four 5-second clips. Write a prompt for each. Generate them in the same style, then stitch them together in any video editor. This is the foundation of a real AI-video workflow.