Seedance 2.0 Shot Design Workflow

This guide walks through a five-step shot design workflow for writing Seedance 2.0 video prompts: requirement analysis, visual diagnosis, six-element precision assembly, validation, and delivery. Along the way it covers three-layer lighting, director style presets, timestamp storyboarding, and smart multi-segment splitting. Adapted from the open-source woodfantasy/Seedance2.0-ShotDesign-Skills project (MIT-0 license).

Source basis and reading boundary

These guides are written as third-party reference summaries, not official product documentation or support content.

The 5-step shot design workflow overview

The shot design workflow follows five sequential steps: (1) Requirement Analysis — confirm duration, aspect ratio, generation mode, style direction, and any reference material; (2) Visual Diagnosis — select a director style preset, plan the storyboard structure, and decide whether a single segment or multi-segment approach is needed; (3) Six-Element Precision Assembly — build the complete prompt using the six structural elements (subject, action, scene, style/lighting, camera/focal length, sound); (4) Validation — review the assembled prompt against a quality checklist, checking for filler words, missing elements, and potential content filter triggers; (5) Delivery — output the final prompt (or multi-segment prompt set) in the correct format for the target platform. Each step has specific decision points and can loop back to earlier steps when the validation check reveals issues.

Step 1 — Requirement analysis: duration, aspect ratio, mode, and references

Before writing any prompt, lock down the production parameters. Duration determines whether you need a single segment (≤15 seconds) or multi-segment storyboard (>15 seconds). Aspect ratio (16:9, 9:16, 1:1, or 4:3) affects framing decisions and camera movement options. Generation mode — text-to-video, image-to-video, or video-to-video — changes which elements the prompt must describe versus which ones the reference files handle. Gather any reference material early: character sheets, style frames, camera motion clips, or audio tracks. Write down the creative brief in one sentence before proceeding: 'A 30-second Xianxia fantasy clip in 16:9 with sword combat and ink-wash transitions.' This anchors every subsequent decision.
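The parameters above can be captured in a small record before any prompt text is written. The following is a minimal sketch, not part of Seedance 2.0 or the source project: the `ShotBrief` class, its field names, and the rule that non-text modes need at least one reference file are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Values listed in this guide; a real platform may accept others.
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3"}
MODES = {"text-to-video", "image-to-video", "video-to-video"}


@dataclass
class ShotBrief:
    """Hypothetical container for the Step 1 production parameters."""
    summary: str                      # the one-sentence creative brief
    duration_s: int
    aspect_ratio: str
    mode: str
    references: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means the brief is locked."""
        issues = []
        if self.duration_s <= 0:
            issues.append("duration must be positive")
        if self.aspect_ratio not in ASPECT_RATIOS:
            issues.append(f"unsupported aspect ratio: {self.aspect_ratio}")
        if self.mode not in MODES:
            issues.append(f"unknown generation mode: {self.mode}")
        # Assumption: image- and video-driven modes need a reference file.
        if self.mode in MODES - {"text-to-video"} and not self.references:
            issues.append(f"{self.mode} needs at least one reference file")
        return issues
```

Calling `problems()` on a brief before moving to Step 2 gives an early, cheap check that nothing was left undecided.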

Step 2 — Visual diagnosis: director style and storyboard planning

Choose a director style preset that matches the creative brief. The preset acts as a visual anchor that keeps all downstream decisions consistent. For epic sci-fi, you might select Villeneuve (vast desolation, geometric framing, desaturated palette). For an emotional character piece, Wong Kar-wai (step-printed slow motion, saturated color, handheld intimacy). For Eastern fantasy, the Xianxia preset (aerial temples, ink-wash transitions, ethereal audio). You can combine presets — 'Villeneuve composition with Wong Kar-wai color' — for hybrid identities. Next, sketch the storyboard: how many shots, what happens in each, where the emotional peak lands, and how segments hand off to each other. For clips under 13 seconds, a single shot description suffices. For 13–15 seconds, use timestamped storyboard format. For anything longer, plan multi-segment splits.
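The duration thresholds in this step reduce to a three-way choice. A minimal sketch follows; the function name and the returned labels are hypothetical, and only the 13-second and 15-second boundaries come from the guide.

```python
def choose_format(duration_s: float) -> str:
    """Pick a prompt structure from the clip duration in seconds."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    if duration_s < 13:
        return "single_shot"             # one shot description is enough
    if duration_s <= 15:
        return "timestamped_storyboard"  # 13 to 15 seconds
    return "multi_segment"               # anything longer than 15 seconds
```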

Step 3 — Six-element precision assembly: the formula

Every professional prompt should cover six elements in order: (1) Subject and Appearance Details — define the character or object, including physical build, costume, key visual identifiers, and emotional state; (2) Action and Physics Continuity — describe what happens with physically plausible motion and continuity constraints; (3) Scene Environment — set the location, time of day, weather, spatial depth, and atmospheric conditions; (4) Visual Style and Physical Lighting — apply the director style preset and build the three-layer lighting structure (source, behavior, tone); (5) Physical Focal Length and Camera Movement — specify the lens (24mm wide, 85mm portrait, 200mm telephoto) and the camera motion (dolly tracking shot, crane shot, slow orbit, handheld); (6) Native Sound Effects — describe material-specific sounds and spatial acoustic modifiers. This formula extends the SCELA framework by promoting sound from an optional afterthought to a first-class structural element.
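One way to enforce the six-element order is to assemble the prompt from named parts and refuse to continue when one is missing. This is an illustrative sketch only: the key names in `ELEMENT_ORDER` and the sentence-joining convention are assumptions, not a format required by Seedance 2.0.

```python
# Hypothetical key names for the six elements, in the order the guide lists them.
ELEMENT_ORDER = ["subject", "action", "scene", "style_lighting", "camera", "sound"]


def assemble_prompt(elements: dict[str, str]) -> str:
    """Join the six elements in order, failing loudly if any is empty or absent."""
    missing = [key for key in ELEMENT_ORDER if not elements.get(key, "").strip()]
    if missing:
        raise ValueError("missing elements: " + ", ".join(missing))
    # Normalise trailing periods so each element reads as one sentence.
    parts = [elements[key].strip().rstrip(".") for key in ELEMENT_ORDER]
    return ". ".join(parts) + "."
```

Raising on a missing element mirrors the validation step, where an incomplete prompt loops back to assembly instead of being delivered.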

Three-layer lighting structure in practice

Lighting is one of the most impactful elements in video generation, but it is frequently described too vaguely. The three-layer structure forces precision. Source Layer: name the actual light sources ('two warm practicals flanking the doorway plus cold moonlight from the window'). Behavior Layer: describe how light interacts with materials and atmosphere ('volumetric fog catching the moonlight beams, soft specular highlights on wet floor tiles, shadow edge diffusion on fabric'). Color Tone Layer: define the palette ('cool blue dominant with warm amber accents in the foreground, teal shadow tones'). A concrete example for a Xianxia scene: 'Source — golden sunset backlighting through temple columns plus floating ethereal orbs; Behavior — dust motes diffusing the backlight, silk robe catching soft specular highlights; Tone — warm amber base transitioning to cool jade at the edges.' This level of specificity produces dramatically more consistent lighting than 'cinematic lighting' or 'dramatic atmosphere.'
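Keeping the three layers as separate fields makes it hard to fall back to a vague one-liner. The sketch below is an assumption-laden illustration: the `Lighting` type and `lighting_line` helper are hypothetical, and the output simply copies the "(source) ... (behavior) ... (tone)" pattern used in this guide's example prompts.

```python
from typing import NamedTuple


class Lighting(NamedTuple):
    """Hypothetical holder for the three lighting layers."""
    source: str    # which lights exist
    behavior: str  # how light interacts with materials and atmosphere
    tone: str      # the color palette


def lighting_line(light: Lighting) -> str:
    """Render the layers as one 'Lighting:' sentence; reject empty layers."""
    for name, value in light._asdict().items():
        if not value.strip():
            raise ValueError(f"lighting layer '{name}' is empty")
    return (f"Lighting: {light.source} (source), "
            f"{light.behavior} (behavior), {light.tone} (tone).")
```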

Timestamp storyboarding technique

For clips between 13 and 15 seconds, timestamp storyboarding gives the model explicit timing cues for each beat. The format opens with a style overview, then assigns time ranges to visual and camera instructions: '[Style overview]. 0–3s: [establishing shot, wide angle, camera pulls back to reveal environment]. 3–8s: [subject enters frame, medium shot, slow dolly forward]. 8–12s: [close-up on key action, shallow depth of field, subtle camera shake]. 12–15s: [resolution beat, wide shot, camera rises slowly]. Lighting: [three-layer description]. SFX: [material-specific sound description]. Negative: any text, subtitles, logos or watermarks.' Each time block should contain one clear action and one camera setup. Avoid packing multiple actions into a single 3-second window — the model handles one deliberate movement per beat more reliably than rapid multi-action sequences.
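The timestamp format can be generated from a list of beats, which also makes it easy to check that the time ranges are contiguous and fit inside one segment. This is a sketch under stated assumptions: `build_storyboard` and its parameters are hypothetical names, and the 15-second cap is the single-segment limit described in this guide.

```python
DEFAULT_NEGATIVE = "any text, subtitles, logos or watermarks"


def build_storyboard(style, beats, lighting, sfx,
                     negative=DEFAULT_NEGATIVE, max_total=15):
    """Render a timestamped prompt.

    beats is a list of (start_s, end_s, description) tuples that must
    start at 0 and follow each other without gaps or overlaps.
    """
    parts = [f"{style.rstrip('.')}."]
    cursor = 0
    for start, end, description in beats:
        if start != cursor or end <= start:
            raise ValueError(
                f"beat {start}-{end}s must start at {cursor}s and have positive length")
        parts.append(f"{start}–{end}s: {description.rstrip('.')}.")
        cursor = end
    if cursor > max_total:
        raise ValueError(f"beats run to {cursor}s, over the {max_total}s limit")
    parts.append(f"Lighting: {lighting.rstrip('.')}.")
    parts.append(f"SFX: {sfx.rstrip('.')}.")
    parts.append(f"Negative: {negative.rstrip('.')}.")
    return " ".join(parts)
```

Each beat description should still be written by hand with one action and one camera setup; the helper only handles ordering and formatting.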

Smart multi-segment storyboard for videos longer than 15 seconds

When the target duration exceeds 15 seconds, split the video into self-contained segments using the formula: number of segments = ⌈total_duration / 15⌉, with the final segment at least 8 seconds long. A 40-second video becomes three segments (15s + 15s + 10s). If a straight 15-second split would leave a final segment shorter than 8 seconds (32 seconds would give 15s + 15s + 2s), rebalance the durations so the final segment reaches 8 seconds. Each segment is a complete, independent prompt with timestamps starting from 0, but all segments share four things: a unified style preamble (director preset and visual identity); consistent three-layer lighting (which may evolve narratively but stays within the established palette); stable handoff frames at segment boundaries (the last 1–2 seconds should land on a composable resting state such as a freeze, slow push, or fade); and a shared forbidden-items declaration. Distribute the narrative arc across segments: Segment 1 is the opening (establish world, introduce subject), middle segments develop the story (action, conflict, escalation), and the final segment delivers the climax or resolution.
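The splitting rule can be worked through in a few lines. The sketch below assumes whole-second durations. The guide gives the segment count and the 8-second minimum but not a rebalancing method, so borrowing seconds from the second-to-last segment is one possible choice made here for illustration, and `split_segments` is a hypothetical name.

```python
import math


def split_segments(total_seconds: int, max_len: int = 15, min_final: int = 8) -> list[int]:
    """Return per-segment durations for a clip of total_seconds."""
    if total_seconds <= 0:
        raise ValueError("total duration must be positive")
    count = math.ceil(total_seconds / max_len)
    if count == 1:
        return [total_seconds]          # fits in a single segment
    segments = [max_len] * (count - 1)  # full-length segments first
    final = total_seconds - max_len * (count - 1)
    if final < min_final:
        # Assumption: take the shortfall from the previous segment.
        segments[-1] -= min_final - final
        final = min_final
    segments.append(final)
    return segments
```

For example, `split_segments(40)` gives `[15, 15, 10]`, matching the 40-second case above, while `split_segments(32)` gives `[15, 9, 8]` instead of ending on a 2-second tail.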

Quality anchors — what to avoid and what to use instead

The most common prompt weakness is relying on abstract filler words that sound impressive but give the model no visual target. Words to eliminate: 'masterpiece,' '4K,' '8K,' 'ultra HD,' 'ultra-clear,' 'best quality,' 'amazing,' 'stunning.' These are quality claims, not quality descriptions. Replace them with physical material anchors: instead of 'ultra-clear,' write 'Kodak 5219 film stock warmth with fine grain visible on skin tones'; instead of 'high quality textures,' write 'brushed aluminum with micro-scratches catching rim light, wet concrete reflecting neon, aged leather with cracked patina.' Include organic imperfections that signal realism: lens dust, subtle focus breathing, micro camera shake, film gate weave. A prompt filled with specific material descriptions consistently outperforms one padded with superlatives.
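The filler-word list lends itself to an automated check during validation. A minimal sketch, assuming a simple case-insensitive whole-word match is good enough; `find_filler` is a hypothetical helper and the word list is the one given in this guide.

```python
import re

# Filler terms named in this guide.
FILLER_WORDS = ["masterpiece", "4K", "8K", "ultra HD", "ultra-clear",
                "best quality", "amazing", "stunning"]


def find_filler(prompt: str) -> list[str]:
    """Return the filler terms present in the prompt, in list order."""
    return [
        word for word in FILLER_WORDS
        if re.search(rf"\b{re.escape(word)}\b", prompt, re.IGNORECASE)
    ]
```

An empty result does not prove the prompt is specific enough; it only confirms that none of the listed terms slipped in.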

Copyright-safe IP fallback strategy

When a creative brief references recognizable intellectual property, use the three-tier fallback to avoid content filter rejections. Level 1 — Name Replacement: swap the IP name for an original descriptive nickname that captures the archetype ('Iron Man' becomes 'Alloy Sentinel,' 'Spider-Man' becomes 'Web-Strand Acrobat'). Level 2 — Feature Modification: replace iconic visual traits — signature color schemes, costume silhouettes, weapon designs, logo elements — with original alternatives that preserve the narrative function. Level 3 — Category Abstraction: remove all visual and nominal connections to the original IP, keeping only the role ('a powered-armor hero' instead of any Iron Man derivative). Always add explicit forbidden items for words that might trigger the filter. Start at Level 1 for each character; if the platform still flags the prompt, escalate to Level 2, then Level 3. Document which level worked for each IP concept so the team can reuse the mapping.
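The escalation order and the "document which level worked" advice can be sketched as a loop over candidate rewrites. Everything here is illustrative: `first_passing_level` is a hypothetical helper, the filter is a stub because this sketch cannot call a real platform check, and the Level 2 wording is invented for the example (only the Level 1 and Level 3 phrases come from the guide).

```python
from typing import Callable


def first_passing_level(candidates: list[str],
                        is_flagged: Callable[[str], bool]) -> tuple[int, str]:
    """Return (level, text) for the first candidate the filter accepts."""
    for level, text in enumerate(candidates, start=1):
        if not is_flagged(text):
            return level, text
    raise RuntimeError("every fallback level was flagged; rewrite the concept")


# Ordered Level 1, Level 2, Level 3. The Level 2 text is made up for illustration.
iron_man_fallbacks = [
    "Alloy Sentinel",
    "armored pilot in matte green plating with a hexagonal chest lamp",
    "a powered-armor hero",
]


def stub_filter(text: str) -> bool:
    """Stand-in for a platform content filter; flags one arbitrary word."""
    return "sentinel" in text.lower()


# Reusable mapping of IP concept to the level and wording that passed.
level_log = {"Iron Man": first_passing_level(iron_man_fallbacks, stub_filter)}
```

With the stub filter above, Level 1 is rejected and the log records Level 2 for this concept; the original IP terms would still go into the forbidden-items list.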

Sound design vocabulary for video prompts

Seedance 2.0's native audio generation responds to physics-based sound descriptions far better than generic music terms. Structure sound descriptions by category: ambient sounds set the environment (wind whistling through temple corridors, rain pattering on a metal roof, distant thunder rolling across mountains), action sounds punctuate events (blade cutting through air, boots crunching on gravel, glass shattering in slow motion), vocal elements add human presence (whispered dialogue with breath condensation, crowd murmur fading to silence), and material-based onomatopoeia adds texture (silk rustling against stone, metal scraping on metal, wood creaking under weight, water dripping onto slate). Add spatial acoustic modifiers to describe the sound environment: cathedral reverb for large enclosed spaces, tight room dampening for intimate scenes, outdoor open-air for nature sequences, underground echo for cave or tunnel settings. The combination of material-specific sounds and spatial modifiers produces immersive audio that reinforces the visual scene.

Examples & sources

Eastern Xianxia fantasy clip — six-element assembly

A complete prompt built using the six-element formula and three-layer lighting for an Eastern fantasy scene. Demonstrates director preset selection (Xianxia), timestamp storyboarding, and physics-based sound design.

Xianxia ink-wash epic style. A young female sword cultivator in flowing white silk hanfu with jade hair ornaments, standing atop a mist-shrouded mountain peak at dawn. She unsheathes a glowing jade sword in a slow, deliberate arc, silk sleeves trailing with physically accurate fabric draping. Ancient temple ruins visible through layered clouds, pine trees on distant ridges, floating spirit particles. Lighting: golden sunrise backlighting through mist (source), volumetric fog diffusing light into soft halos around the figure, jade sword emitting cool inner glow with specular bloom (behavior), warm amber sunrise base with cool jade accent tones (tone). 50mm lens, slow crane shot rising from ground level to eye height, gentle parallax on background mountains. SFX: wind rushing across mountain peak, silk fabric fluttering, metallic ring of sword being drawn, distant temple bells. Negative: any text, subtitles, logos or watermarks.

Wasteland Mecha Awakening — multi-segment storyboard

A two-segment prompt set for a 25-second wasteland mecha sequence. Shows multi-segment splitting, handoff frames, unified style preamble, and three-layer lighting that evolves across segments.

Segment 1 of 2 (0–15s) — Villeneuve-style sci-fi. Style: vast desolation, geometric framing, desaturated palette with selective warm accents. 0–4s: Wide establishing shot of an endless rust-colored desert, half-buried mecha torso visible in mid-ground, dust storm approaching from horizon, camera static. 4–10s: Medium shot, the mecha's chest core begins pulsing with ice-blue energy, sand vibrating and lifting around the joints, camera slow dolly forward. 10–15s: Close-up on the mecha's face plate, one eye igniting with blue light, camera holds steady as cracks of energy spread across the corroded surface. Lighting: setting sun as backlight through dust haze (source), sand particles catching orange rim light, blue core energy casting specular highlights on corroded metal (behavior), desaturated warm amber base with ice-blue accent pushing into shadows (tone). SFX: howling desert wind, deep metallic groaning of awakening machinery, low-frequency hum building from the core. Negative: any text, subtitles, logos or watermarks.

Segment 2 of 2 (0–10s) — Continuation. Style: same as Segment 1. 0–3s: Medium-wide shot, mecha torso rising from sand, camera tracking upward, debris cascading off shoulders. 3–7s: Full-body reveal as mecha stands to full height, camera crane shot pulling back to show scale against desert, dust cloud mushrooming outward. 7–10s: Low-angle hero shot, mecha fully upright silhouetted against dust-filtered sunset, core pulsing steadily, camera slow push-in to chest. Lighting: sunset now lower on horizon, casting longer shadows (source), volumetric dust cloud diffusing all light into soft orange wash, blue core now dominant light source (behavior), warm-to-cool gradient from ground to sky (tone). SFX: thunderous impact of mecha feet on sand, hydraulic servo whine, dust cloud rushing outward, wind buffeting microphone. Negative: any text, subtitles, logos or watermarks.

Game PV trailer — cel-shaded action with director hybrid

A game PV trailer prompt combining anime cel-shaded style with cinematic camera language. Demonstrates quality anchors (no filler words), camera term disambiguation, and sound design vocabulary.

Cel-shaded CG anime style with Kurosawa compositional staging. A battle-scarred mecha pilot in a torn flight suit sprints across a collapsing steel bridge, explosions erupting behind her. Hard-edge shadows with toon shading, speed lines on fast motion, anime-style impact frames on explosions. The bridge fragments fall in slow motion with debris physics. Environment: industrial dystopia, massive gear towers in background, orange sky choked with smoke. Lighting: multiple explosion sources casting harsh directional orange light (source), hard shadow edges with anime cel-shading, sparks creating brief specular flashes on metal surfaces (behavior), saturated orange and black dominant palette with cool steel-blue on the pilot's armor (tone). 24mm wide-angle lens, dolly tracking shot following the pilot laterally, slight Dutch angle tilt increasing as the bridge collapses. SFX: steel groaning and snapping, explosion concussions with debris scatter, boots hammering on metal grating, wind whipping past at speed. Negative: any text, subtitles, logos or watermarks.

Frequently asked questions

What is the difference between SCELA and six-element assembly?

SCELA covers five dimensions: Subject, Context, Effect, Lighting, and Action. Six-element assembly extends this by adding Native Sound Effects as a sixth structural element rather than treating audio as optional. In practice, SCELA is a good starting framework for simpler prompts, while six-element assembly is the production-grade version used when you need full control over every generation dimension including synchronized audio.

How do I choose a director style preset?

Match the preset to the emotional tone and visual language of your project. For epic sci-fi with vast environments, start with Villeneuve. For emotional character-driven pieces with saturated color, try Wong Kar-wai. For Eastern fantasy, use the Xianxia preset. For compositional power with weather as a narrative element, choose Kurosawa. You can also combine presets (for example, 'Nolan temporal structure with Deakins natural lighting') to create hybrid identities. The woodfantasy/Seedance2.0-ShotDesign-Skills project catalogs 28+ presets across Hollywood, Asian cinema, genre, and commercial categories.

What are quality anchors and why should I avoid filler words?

Quality anchors are specific physical material descriptions — 'Kodak 5219 film stock warmth,' 'brushed aluminum with micro-scratches,' 'wet concrete reflecting neon' — that give the model concrete visual targets. Filler words like 'masterpiece,' '4K,' 'ultra HD,' and 'best quality' are abstract claims the model cannot render. They consume prompt space without improving output. Replace every filler word with a material description, film stock reference, or organic imperfection (lens dust, focus breathing, micro camera shake) for consistently better results.

How does multi-segment storyboarding work for videos longer than 15 seconds?

Split the total duration using ⌈total_duration / 15⌉ segments, ensuring the final segment is at least 8 seconds. Each segment gets an independent prompt with timestamps starting from 0, but all segments share a unified style preamble, consistent three-layer lighting, stable handoff frames at boundaries, and a common forbidden-items list. Distribute narrative arc across segments: opening → development → climax → resolution. The model generates each segment independently, so handoff frame design (ending on a composable resting state like a freeze or slow fade) is critical for smooth editing.

How do I handle copyrighted characters in prompts?

Use the three-tier copyright-safe IP fallback. Level 1: replace the name with a descriptive nickname (Spider-Man → Web-Strand Acrobat). Level 2: modify iconic visual traits like signature colors and costume silhouettes. Level 3: fully abstract the concept to just the narrative role. Start at Level 1 and escalate only if the platform content filter still flags the prompt. Always add the original IP terms to your forbidden-items list to avoid accidental inclusion.