Seedance 2.0

Audio-driven video generation with voice and sound direction

Seedance 2.0 Sound Design & Dialogue Prompts

Seedance 2.0 supports audio-aware generation where @audio references and sound-design prompts influence visual output. This page covers dialogue lip-sync techniques, voice tone direction, ambient sound design, and how to write prompts that synchronize visual action with audio cues for more cohesive results.

Daily update status: Coming soon

Last updated:

Current status

Templates, examples, and future media tests should live here, not inside general tutorial guides.

Media tests

Image test: coming soon
Video test: coming soon

Dialogue lip-sync fundamentals

To generate a speaking character with accurate lip movement, attach an @audio reference containing the dialogue track and describe the character's speaking manner in the prompt. Specify mouth movement intensity, emotional tone, and head gestures. The model uses the audio timing to drive lip synchronization and facial micro-expressions.

Directing voice tone through prompts

Even without an @audio reference, you can influence the implied voice character by describing vocal qualities in the prompt: 'whispering softly,' 'shouting with urgency,' or 'speaking calmly with measured pauses.' These descriptions affect facial expressions, body language, and mouth movement patterns in the generated video.
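As an illustrative sketch (not an official template), a text-only voice-direction prompt might read:

Elderly storyteller seated by a fireplace, speaking calmly with measured pauses, low gravelly voice implied through slow deliberate mouth movements, long thoughtful blinks between phrases, hands folded loosely in the lap, warm flickering firelight, intimate medium close-up, quiet contemplative atmosphere.

Even with no audio file attached, the vocal descriptions steer facial expression, pacing, and body language toward an unhurried, intimate delivery.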

Ambient sound design cues

Describe the sound environment in your prompt to create visuals that feel acoustically coherent. 'Quiet library with occasional page turns' produces a different visual atmosphere than 'bustling market with shouting vendors.' Sound-design cues guide the model toward appropriate crowd density, environmental motion, and atmospheric effects.

Synchronizing visual action with audio beats

For music videos or rhythmic content, use @audio references to drive visual timing. Describe which visual events should align with audio beats: 'character turns on the drum hit, camera cuts on the bass drop.' This creates tight audio-visual synchronization that feels intentionally choreographed.
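As a hedged example (the @audio filename below is a placeholder, not a real asset), a beat-driven prompt might look like:

@audio[backing-track.wav] Dancer in an empty warehouse, sharp head turn on each drum hit, camera cuts to a wider angle on the bass drop, practical lights pulsing in time with the rhythm, dust kicked up on heavy beats, high-contrast moody lighting, handheld energy between cuts.

Naming the specific audio events ('drum hit,' 'bass drop') and the visual action tied to each gives the model explicit synchronization targets.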

Multi-character dialogue scenes

For conversations between two or more characters, structure your prompt as a sequence of speaking turns. Identify which character speaks when, their emotional state during each line, and the listening character's reactions. Attach separate @image references for each character to maintain identity, and one @audio reference for the full dialogue track.

Input / output examples

Character monologue with audio reference

Generates a close-up speaking shot synchronized to a supplied dialogue audio track.

@audio[monologue-track.wav] @image[character-anchor.png] Close-up of the character from reference, speaking directly to camera, emotional monologue delivery matching audio timing, subtle brow movements and eye glistening on emotional beats, warm studio lighting from above-left, shallow depth of field, natural lip synchronization with reference audio, gentle head tilts between phrases.
A close-up monologue shot with accurate lip-sync to the audio track, natural facial micro-expressions, and consistent character identity from the reference image.

Ambient sound-driven scene

Creates a cafe scene where visual activity matches the implied sound environment.

Busy Parisian sidewalk cafe at golden hour, ambient sound environment of clinking cups, muted French conversation, and occasional distant accordion music, patrons gesturing animatedly at small tables, waiter weaving between tables carrying a tray, steam rising from espresso cups, gentle handheld camera movement, warm cinematic color palette, natural crowd density matching a lively cafe atmosphere.
A lively cafe scene where character animations, crowd density, and atmospheric details feel acoustically coherent with the described sound environment.

Two-character dialogue exchange

A shot-reverse-shot conversation between two characters with distinct speaking styles.

@audio[dialogue-exchange.wav] @image[character-a.png] @image[character-b.png] Two characters seated across a table, Character A speaks first with confident gestures and forward lean, Character B listens intently then responds with a gentle smile and slower cadence, alternate focus between speakers matching audio dialogue turns, consistent warm interior lighting, medium shot framing, natural reaction shots of the listener during each speaking turn.
A natural dialogue sequence with accurate lip-sync for both characters, appropriate reaction shots, and speaking styles matching the audio track's rhythm and tone.

Frequently asked questions

Does Seedance 2.0 generate audio output or only sync to audio input?

The primary workflow is syncing visual output to audio input. Attach your audio track as an @audio reference and the model generates visuals that align with it. For projects needing generated audio, use a dedicated audio AI tool and then feed its output into Seedance as a reference.

How accurate is lip-sync with @audio references?

Lip-sync accuracy depends on audio clarity and prompt specificity. Clear, single-speaker audio with moderate pace produces the best results. Add 'precise lip synchronization' and describe mouth movement intensity to improve accuracy. Fast-paced or overlapping speech is harder to sync reliably.

Can I use sound design cues without an actual audio file?

Yes. Describing the sound environment in text alone influences visual output: crowd noise leads to busier scenes, silence leads to stillness. This text-only approach works well for establishing atmosphere even when you plan to add audio in post-production.

Related guides

Related prompt templates

Explore more prompt templates