Seedance2

Guide

The Universal Prompt Formula — Six Dimensions of AI Video Direction

Most unstable AI video output stems from prompts that lack internal logic. The Universal Prompt Formula establishes logical connections between prompt elements across six dimensions — subject, action, scene boundaries, camera, lighting, and timing — so the model knows exactly what to prioritize. This guide covers the core framework, a complete street-scene walkthrough, and advanced applications for multi-person scenes, precise camera control, timed interactions, and close-up shots.

Source basis and reading boundary

These guides are written as third-party reference summaries, not official product documentation or support content.

Why most AI video prompts produce unstable output

When a prompt has no clear priority structure, the model gets no signal about which element matters most, so it tends to treat people, objects, background, and camera as equally important and animate them all at once. The result is hallucinated movements, drifting cameras, deformed characters, and scenes where nothing looks intentional. The Universal Prompt Formula addresses this by building a hierarchy of logical connections: one absolute subject, one core action, clearly bounded secondary elements, precise camera-following logic, mood-driven lighting, and second-by-second timing. Instead of gambling on random generations, you direct the AI like a film director who has blocked every beat of the scene.

Dimension 1 — Determine the Absolute Subject

The most important step is establishing who or what the AI should focus on. Design your prompt around a single subject so the model does not spread its attention evenly across every element, and describe the subject's actions before you describe the background. For a scene of a woman on a street, write 'A woman walking left on the street.' Leading with the woman signals that she is the priority, so the model tends to treat background pedestrians and buildings as supporting detail. Once the protagonist is clear, the model is more likely to infer natural background behavior instead of rendering everything with the same emphasis.

Dimension 2 — Anchor the Core Action

A video's soul comes from a single core action rather than a pile of random movements. Establish a hierarchy where the core action takes priority and auxiliary actions support it without conflict. If the core action is 'walking left,' auxiliary micro-movements could be 'gently touching hair' or 'occasionally glancing around.' Avoid adding unrelated core actions like 'bending over to pick something up' simultaneously — this forces the character to perform two competing primary motions, which deforms the body. The rule is one core action per shot, supported by 1–3 non-conflicting micro-movements that add naturalism.
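The one-core-action rule can be checked mechanically before a prompt is sent. The following is a minimal Python sketch, not part of any official tooling; `ActionPlan` and its methods are hypothetical names introduced here for illustration. It only counts actions, so judging whether two actions physically conflict still needs a human eye.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ActionPlan:
    """One core action plus its supporting micro-movements (hypothetical helper)."""
    core: str
    micro: List[str] = field(default_factory=list)

    def problems(self) -> List[str]:
        """Return rule violations; an empty list means the count check passes."""
        issues = []
        if not self.core.strip():
            issues.append("missing core action")
        if not 1 <= len(self.micro) <= 3:
            issues.append(f"expected 1-3 micro-movements, got {len(self.micro)}")
        return issues

    def render(self) -> str:
        """Put the core action first so it reads as the priority."""
        return ", ".join([self.core] + self.micro)


plan = ActionPlan(
    "A woman walks left",
    ["gently touching her hair", "occasionally glancing around"],
)
print(plan.problems())
print(plan.render())
```

A second core action such as 'bending over to pick something up' belongs in a separate shot, not in the `micro` list.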

Dimension 3 — Define Scene Boundaries (Subject-Object Logic)

Every secondary element in the frame must have a defined role: background object or interactive object. Background objects should be weakened — 'static buildings,' 'slow-moving distant pedestrians' — so they never compete for the model's attention. Interactive objects should coordinate with the subject — 'the flower vendor looks up at the subject as she passes, with minimal movement amplitude.' This subject-object logic prevents the AI from treating a background pedestrian as equally important as the protagonist, which is the root cause of crowd scenes where everyone makes the same gesture.
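The background/interactive split can be made explicit in a small data structure. This is a sketch under stated assumptions: `SceneElement` and `describe` are hypothetical names, and the check that an interactive element mentions the word "subject" is a rough heuristic chosen for illustration, not a rule from any model's documentation.

```python
from dataclasses import dataclass

ROLES = ("background", "interactive")


@dataclass
class SceneElement:
    name: str       # e.g. "the flower vendor"
    role: str       # "background" or "interactive"
    behavior: str   # what the element does in the shot


def describe(element: SceneElement) -> str:
    """Render one secondary element with wording that matches its role."""
    if element.role not in ROLES:
        raise ValueError(f"role must be one of {ROLES}, got {element.role!r}")
    if element.role == "background":
        # Background objects are deliberately weakened.
        return f"{element.name} {element.behavior}, never competing for focus"
    # Heuristic: an interactive object should be tied to the subject.
    if "subject" not in element.behavior.lower():
        raise ValueError("interactive behavior should reference the subject")
    return f"{element.name} {element.behavior}, with minimal movement amplitude"


elements = [
    SceneElement("buildings", "background", "remain static"),
    SceneElement("the flower vendor", "interactive", "looks up at the subject as she passes"),
]
print(". ".join(describe(e) for e in elements))
```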

Dimension 4 — Plan Camera Movement

Vague instructions like 'camera moves closer' or 'cinematic shot' cause the virtual camera to drift aimlessly. Instead, establish a following logic between the camera and the subject with precise terms. Follow shot: 'camera moves left at the same speed as the subject, keeping her centered in frame, fixed focal length.' Push-in: 'camera moves from a full-body framing to a face close-up while the subject maintains her walking motion.' These instructions lock the visual focus so the model knows exactly where the lens should point and how fast it should move. The camera exists to serve the subject — never let it wander independently.
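The two camera patterns lend themselves to fill-in templates, which makes it harder to forget the subject reference. A minimal sketch follows; the function names and parameters are hypothetical, and the wording simply mirrors the phrasing used in this guide.

```python
def follow_shot(subject: str, direction: str) -> str:
    """Follow shot: camera speed, framing, and focal length are all tied to the subject."""
    return (
        f"Follow shot: camera moves {direction} at the same speed as {subject}, "
        f"keeping {subject} centered in frame, fixed focal length."
    )


def push_in(subject: str, start_framing: str, end_framing: str, ongoing_action: str) -> str:
    """Push-in: explicit start and end framing while the subject keeps acting."""
    return (
        f"Push-in: camera moves from a {start_framing} framing to a {end_framing} "
        f"while {subject} maintains {ongoing_action}."
    )


print(follow_shot("the subject", "left"))
print(push_in("the subject", "full-body", "face close-up", "her walking motion"))
```

Because every parameter is required, neither template can produce a camera instruction that omits the subject.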

Dimension 5 — Set Lighting and Color

Lighting and color should evoke the emotional logic of the scene and highlight the subject — not just make the image look 'pretty.' All visual settings should surround the subject to make them the visual center. For a lonely woman at night: 'Warm yellow streetlights hitting the subject from the back-left, creating long shadows on the ground; low-saturation blue tones for the overall scene.' This isolates the subject visually from the background. When the lighting creates a clear contrast between subject and environment, the model concentrates rendering quality on the lit subject and treats the darker background as secondary.

Dimension 6 — Control Time and Rhythm

AI video instability often stems from a lack of timing instructions. Divide the video into clear temporal stages with second-by-second planning. For a 10-second video: 1–3s — subject enters from the right and walks left, arms swinging naturally; 4–6s — subject reaches the center and touches her hair, head tilts slightly; 7–10s — subject continues walking to the left edge while the camera follows at matching speed until the frame ends. This second-level state planning tells the model what to render at each moment, preventing it from inventing random actions or rushing through the entire sequence in the first two seconds.
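A staged timeline is easy to get subtly wrong, for example by leaving a second unplanned or stopping short of the clip length. The sketch below checks a plan for gaps, overlaps, and total coverage, then renders it in the `1–3s:` style used here. `Stage`, `check_timeline`, and `render_timeline` are hypothetical names; seconds are counted from 1 and ranges are inclusive, matching the example above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    start: int   # first second of the stage, counted from 1
    end: int     # last second of the stage, inclusive
    action: str


def check_timeline(stages: List[Stage], duration: int) -> List[str]:
    """Return problems found in the plan; an empty list means it is contiguous."""
    issues = []
    next_free = 1  # the next second that still needs a stage
    for s in sorted(stages, key=lambda st: st.start):
        if s.end < s.start:
            issues.append(f"stage '{s.action}' ends before it starts")
        if s.start > next_free:
            issues.append(f"gap: nothing planned for seconds {next_free}-{s.start - 1}")
        elif s.start < next_free:
            issues.append(f"overlap at second {s.start}")
        next_free = max(next_free, s.end + 1)
    if next_free - 1 != duration:
        issues.append(f"plan ends at second {next_free - 1}, expected {duration}")
    return issues


def render_timeline(stages: List[Stage]) -> str:
    """One line per stage, ordered by start time."""
    ordered = sorted(stages, key=lambda st: st.start)
    return "\n".join(f"{s.start}–{s.end}s: {s.action}" for s in ordered)


plan = [
    Stage(1, 3, "subject enters from the right and walks left"),
    Stage(4, 6, "subject reaches the center and touches her hair"),
    Stage(7, 10, "subject continues to the left edge, camera follows"),
]
print(check_timeline(plan, duration=10))
print(render_timeline(plan))
```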

Complete Example — The Night Street Walker

Combining all six dimensions into one prompt:

Absolute Subject & Core Action: A woman walks left on a city street at a steady pace.
Auxiliary Actions: She gently brushes her hair with one hand, occasionally tilts her head to observe her surroundings, and maintains an even walking rhythm.
Scene Boundaries: Background buildings remain static; a few distant pedestrians move slowly without drawing attention; a roadside flower vendor glances up at the subject as she passes, with minimal gesture.
Camera: Follow shot moving left at the same speed as the subject, fixed focal length, subject centered in frame throughout.
Lighting & Color: Warm yellow streetlights illuminate from the subject's back-left, casting a long shadow; the overall scene is rendered in low-saturation cool blue tones, creating a quiet, melancholic atmosphere.
Timing:
  1–3s: subject enters from the right edge, begins walking left with natural arm swing.
  4–6s: subject reaches frame center, performs the hair-brush gesture, slight head tilt.
  7–10s: subject continues left toward the frame edge, camera follows smoothly until the end.
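If you assemble prompts like this often, a small container keeps the six dimensions in a fixed order and makes a missing one obvious. This is an illustrative sketch only: `SixDimensionPrompt` is a hypothetical name, and the labeled-line output is a formatting choice of this guide, not a syntax the model requires.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SixDimensionPrompt:
    subject: str                     # absolute subject and core action
    auxiliary: str                   # supporting micro-movements
    scene: str                       # background and interactive objects
    camera: str                      # camera logic tied to the subject
    lighting: str                    # lighting and color
    timing: List[Tuple[str, str]]    # (time window, what happens in it)

    def render(self) -> str:
        """Emit one labeled line per dimension, then one indented line per time window."""
        lines = [
            f"Subject: {self.subject}",
            f"Auxiliary: {self.auxiliary}",
            f"Scene: {self.scene}",
            f"Camera: {self.camera}",
            f"Lighting: {self.lighting}",
            "Timing:",
        ]
        lines += [f"  {window}: {beat}" for window, beat in self.timing]
        return "\n".join(lines)


prompt = SixDimensionPrompt(
    subject="A woman walks left on a city street at a steady pace.",
    auxiliary="She gently brushes her hair and occasionally tilts her head.",
    scene="Background buildings are static. Distant pedestrians move slowly.",
    camera="Follow shot, camera moves left at the subject's speed, subject centered.",
    lighting="Warm yellow streetlights from back-left. Low-saturation cool blue tones.",
    timing=[
        ("1–3s", "Subject enters from right, walks left."),
        ("4–6s", "Subject reaches center, brushes hair."),
        ("7–10s", "Subject continues to left edge, camera follows."),
    ],
)
print(prompt.render())
```

Since every field is required, constructing the object without one of the six dimensions raises a `TypeError` before any prompt text is produced.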

Why the complete example works

Four mechanisms make this prompt effective. First, interference avoidance: by declaring the woman as the absolute subject, the prompt keeps the AI from splitting attention equally among background pedestrians. Second, deformation prevention: 'walking left' is the core action and 'touching hair' is auxiliary, so the model knows which motion takes priority, which helps prevent limb distortion from competing actions. Third, camera stability: the precise follow-shot instruction with 'same speed' and 'keep centered' replaces vague phrasing that would cause the camera to drift or lose the subject. Fourth, randomness reduction: second-by-second timing lets you act as the director and tell the model what to render at every stage, leaving far less room for it to improvise conflicting actions.

Advanced — Multi-Person Scenes

In crowd scenes, the AI tends to spread attention evenly across every visible person, so everyone ends up performing the same eerie gesture. The fix is 'strong subject, weak background, clear hierarchy.' First, redefine the absolute subject by picking one focus from the group: 'A woman in a red dress (subject) walks through a crowded street.' Second, give other groups their own parallel actions to differentiate them from the subject: the subject walks left while other pedestrians browse flowers at a roadside shop. Third, set background objects to 'slow-moving, not competing for camera focus' and interactive objects to small, subject-serving actions like 'the vendor looks up at the subject.' Fourth, lock the camera on the subject with a follow shot that keeps her centered regardless of crowd movement. Fifth, use second-level timing so different people act at different moments: at 1–3s the subject enters while the crowd flows, at 4–6s the subject performs a micro-action while one interactive character responds, and at 7–10s the subject exits while the background keeps moving at the same slow pace.

Advanced — Preventing Camera Drift

Camera drift happens when the model receives vague motion descriptions. Two precise instruction patterns solve this. Follow shot: state that the camera moves at the same speed as the subject, focal length is fixed, and the subject is always centered — this locks the spatial relationship between lens and subject. Push-in: state the start framing (full body) and end framing (face close-up) explicitly, while requiring the subject to maintain her ongoing action — this gives the model a clear trajectory from A to B. Both patterns work because they define the camera's relationship to the subject rather than describing the camera in isolation. A camera instruction without a subject reference is the single most common cause of aimless drifting.
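A quick lint pass can catch the two failure modes described here before a prompt is submitted: vague phrasing and a camera line with no subject anchor. The sketch below is a simple substring check and nothing more; `camera_issues` and `VAGUE_PHRASES` are hypothetical names, and the phrase list contains only the examples quoted in this guide.

```python
from typing import List, Sequence

# Vague phrasings quoted in this guide; extend the tuple with your own.
VAGUE_PHRASES = ("camera moves closer", "cinematic shot")


def camera_issues(instruction: str, subject_terms: Sequence[str] = ("subject",)) -> List[str]:
    """Flag vague phrases and camera instructions that never mention the subject."""
    text = instruction.lower()
    issues = [f"vague phrase: '{p}'" for p in VAGUE_PHRASES if p in text]
    if not any(term.lower() in text for term in subject_terms):
        issues.append("no subject reference")
    return issues


print(camera_issues("Cinematic shot, camera moves closer"))
print(camera_issues(
    "Camera moves left at the same speed as the subject, "
    "fixed focal length, subject always centered"
))
```

Pass your own `subject_terms` (for example `("she", "the woman")`) if the prompt refers to the subject by another name.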

Advanced — Timed Character Interactions

To make characters interact at a specific timestamp, combine second-level state planning with interactive-object logic. Define when the interaction happens within a specific time window: 1–3s the subject approaches the flower vendor; 4–6s (interaction point) the subject stops in front of the vendor, the vendor (interactive object) looks up and gently lifts a bouquet; 7–10s the subject takes the flowers and continues walking, the vendor returns to resting position. Within the interaction window, describe both the subject's action and the interactive object's response as a coordinated pair. Set background objects to static or slow-moving during the interaction window so the model's full attention is on the exchange.
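The coordinated pair can be modeled directly: each time window carries the subject's action and, for the interaction window, the interactive object's response. This is a hedged sketch with hypothetical names (`Beat`, `render_beats`); the "(interaction point)" label and the trailing background clause follow the wording of the paragraph above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Beat:
    window: str                      # e.g. "4–6s"
    subject_action: str
    response: Optional[str] = None   # interactive object's reaction, if any


def render_beats(beats: List[Beat]) -> str:
    """Render each window; interaction windows pair action and response."""
    lines = []
    for b in beats:
        if b.response is None:
            lines.append(f"{b.window}: {b.subject_action}.")
        else:
            # Hold the background still so attention stays on the exchange.
            lines.append(
                f"{b.window} (interaction point): {b.subject_action}; "
                f"{b.response}; background static or slow-moving."
            )
    return "\n".join(lines)


beats = [
    Beat("1–3s", "the subject approaches the flower vendor"),
    Beat("4–6s", "the subject stops in front of the vendor",
         "the vendor looks up and gently lifts a bouquet"),
    Beat("7–10s", "the subject takes the flowers and continues walking"),
]
print(render_beats(beats))
```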

Advanced — Close-Up Shot Adaptation

The formula scales down to close-ups by shifting from macro scene direction to micro detail sculpting. The absolute subject becomes a body part: 'close-up of the woman's face.' The core action becomes micro-expressions: 'slowly blinks, corners of the mouth lift into a faint smile' instead of full-body movements. Scene boundaries require extreme weakening — 'background is bokeh blur, single-color tone, completely static' — to force all rendering resources onto the subject's facial details. Camera uses either a static close-up (fixed focal length on the face) or a very slow push-in from mid-close to extreme close-up. Lighting describes how light falls on specific facial features: 'soft side light illuminating half the face, the other half in shadow, creating mystery.' Timing controls the expression arc: 1–3s calm expression with eyes cast down; 4–6s slow head lift, eyes meet the camera; 7–10s the faintest smile appears as the camera subtly pushes in.

Examples & sources

Night Street Walker — Full Six-Dimension Prompt

A complete prompt applying all six dimensions to a moody urban scene. Each line maps to one dimension of the formula.

Subject: A woman walks left on a city street at a steady pace.
Auxiliary: She gently brushes her hair, occasionally tilts her head to glance around, feet maintaining even rhythm.
Scene: Background buildings are static. Distant pedestrians move slowly, not competing for focus. A roadside flower vendor glances up at the subject as she passes (minimal gesture).
Camera: Follow shot — camera moves left at the subject's speed, fixed focal length, subject centered.
Lighting: Warm yellow streetlights from back-left casting long shadows. Overall scene in low-saturation cool blue tones. Quiet, melancholic mood.
Timing:
  1–3s: Subject enters from right, walks left, natural arm swing.
  4–6s: Subject reaches center, brushes hair, slight head tilt.
  7–10s: Subject continues to left edge, camera follows to end.

Crowded Street — Multi-Person Hierarchy Prompt

Demonstrates how to handle multiple characters by establishing a clear subject hierarchy and differentiating actions.

Subject: A woman in a red dress (absolute subject) walks left through a crowded street.
Parallel actions: Other pedestrians browse flowers at a roadside stall.
Scene: Distant crowd moves slowly, not stealing camera focus. A flower vendor (interactive) looks up at the subject as she passes — minimal movement.
Camera: Follow shot locked on subject, same speed, fixed focal length, subject always centered.
Lighting: Subject lit from side-back, background lighting muted.
Timing:
  1–3s: Subject enters from right, crowd begins flowing.
  4–6s: Subject does a micro-gesture, vendor reacts with a glance.
  7–10s: Subject exits left, background maintains steady drift.

Close-Up Expression Arc — Micro-Movement Prompt

Adapts the formula for an extreme close-up, replacing body actions with facial micro-expressions.

Subject: Close-up of a woman's face.
Core micro-expression: Slowly blinks, then corners of the mouth lift into a faint smile.
Auxiliary: Slight eye redness, subtle eyelash tremor.
Scene: Background is full bokeh blur, single warm tone, completely static.
Camera: Static close-up, fixed focal length on face. Slight push-in during final segment.
Lighting: Soft side light illuminating the left half of the face; right half in gentle shadow. Mysterious, introspective mood.
Timing:
  1–3s: Calm expression, eyes cast downward.
  4–6s: Slow head lift, eyes meet the camera.
  7–10s: Faintest smile emerges, camera subtly pushes in.

Frequently asked questions

How do I prevent all characters from doing the same action in a multi-person scene?

Declare one absolute subject with a specific action, then assign different parallel actions to other groups. Instead of 'a crowd walking,' write 'a woman in a red dress walks left (subject); other pedestrians browse flowers at a roadside shop.' The model uses the subject as its primary rendering target and distributes remaining attention to background actors, naturally creating variation. Add second-level timing so different characters act in different time windows.

What is the difference between this formula and the Shot Design workflow?

The Shot Design workflow is a five-step production process (requirement analysis → visual diagnosis → six-element assembly → validation → delivery) optimized for cinema-grade professional output. The Universal Prompt Formula is the conceptual framework underlying the prompt itself — the six dimensions of subject, action, boundaries, camera, lighting, and timing. Think of the formula as the 'what goes into the prompt' and Shot Design as 'the workflow around building and validating that prompt.' They are complementary: the formula provides the structural logic, and Shot Design provides the production discipline.

How do I stop camera drift in my AI videos?

Replace vague descriptions ('camera moves closer,' 'cinematic shot') with precise following logic. A follow shot must specify: the camera matches the subject's speed, focal length is fixed, and the subject stays centered. A push-in must specify start framing (e.g., full body) and end framing (e.g., face close-up) while the subject maintains their ongoing action. Every camera instruction must reference the subject — a camera described without a subject anchor is the primary cause of drift.

Can I use second-by-second timing for videos longer than 10 seconds?

Yes. The same staged timing works for clips up to 15 seconds. For longer videos, combine second-level timing with multi-segment storyboarding: split the total duration into segments of up to 15 seconds each, apply the six-dimension formula independently to each segment, and keep handoffs continuous by ending each segment on a composable resting state (freeze, slow fade, or maintained action). Paired with segment boundaries, the timing dimension scales to longer durations.
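The segment arithmetic is simple enough to sketch. The helper below splits a total duration into inclusive second ranges no longer than 15 seconds; `split_segments` is a hypothetical name, and filling segments greedily from the start (so only the last one may be shorter) is an assumption made here, not something the guide prescribes.

```python
from typing import List, Tuple


def split_segments(total_seconds: int, max_len: int = 15) -> List[Tuple[int, int]]:
    """Split a duration into inclusive (start, end) ranges of at most max_len seconds."""
    if total_seconds <= 0 or max_len <= 0:
        raise ValueError("total_seconds and max_len must be positive")
    segments = []
    start = 1  # seconds are counted from 1, as in the timing examples
    while start <= total_seconds:
        end = min(start + max_len - 1, total_seconds)
        segments.append((start, end))
        start = end + 1
    return segments


print(split_segments(40))  # [(1, 15), (16, 30), (31, 40)]
```

Each returned range then gets its own six-dimension prompt, with the last stage of one segment written to end on the resting state that the next segment starts from.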

Does the formula work for close-up and macro shots?

Yes. Scale every dimension inward: the absolute subject becomes a body part (face, hands, eyes); the core action becomes micro-expressions (slow blink, faint smile); scene boundaries require extreme background weakening (full bokeh, single-color, static); camera uses static close-up or very slow push-in; lighting describes how light falls on specific features; timing controls the expression arc second by second. The same logical hierarchy applies — just at a smaller physical scale.
