Fish-eye horse BGM (multi-video)
Fixed shot, central fisheye through circular aperture looking down, reference @video1 fisheye, horse in @video2 looks at fisheye, reference @video1 speaking motion, BGM reference @video3 audio.
More accurate voice and realistic sound output.
If a video still needs BGM, ambience, or lip-synced dialogue, the model can generate picture and sound together so those audio choices can be reviewed in the same pass.
These pages are written as third-party reference summaries rather than official product documentation.
Capability descriptions summarize public Seedance 2.0 launch materials, public project pages, and other publicly accessible explanatory write-ups.
This site does not represent Seedance, official product support, or any authorized partnership unless a page explicitly states that with documented basis.
Platform access, supported features, pricing, UI, and availability can change. Use official or primary sources for current information.

Generate voice, ambience, and music together with the video output.

How it works: instead of generating silent video and adding audio in post, the model produces picture and sound in the same pass. It reads the visual context — character lip movements, environment type, action intensity — and generates matching voice, ambience, sound effects, or background music. Text prompts can guide the audio style ('upbeat electronic BGM', 'soft ambient forest sounds', 'female voiceover in English').

When to use this: ad production where every variant needs localized voiceover; social-media shorts where BGM and timing matter but manual syncing is too slow; prototyping scenes where you want to evaluate picture-plus-sound together before investing in professional audio; multilingual content where the same video needs voiceovers in different languages.

Tips and practical notes: for best lip-sync results, keep character faces clearly visible and unobstructed. Specify the language and tone of voice in your prompt — 'calm male narrator in Japanese' gives better results than just 'add voice.' When combining native audio with music sync, the model can handle BGM beat alignment and dialogue simultaneously. Review audio in the first pass to catch timing issues early rather than generating many variants before checking.
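The prompt pattern used throughout this page (a scene description, @ asset references, then an audio directive) can be sketched as a small string-assembly helper. This is purely illustrative: `build_prompt` and its parameters are hypothetical and not part of any official Seedance API; only the @video/@image tag convention comes from the examples on this page.

```python
# Illustrative sketch only: composing a Seedance-style prompt string from a
# scene description, @ asset references, and an audio-style directive.
# The helper name and signature are hypothetical, not an official API.
from typing import Sequence


def build_prompt(scene: str, *, refs: Sequence[str] = (), audio: str = "") -> str:
    """Join the scene, each @-tag reference, and the audio directive with commas."""
    parts = [scene, *refs]
    if audio:
        parts.append(audio)
    return ", ".join(parts)


prompt = build_prompt(
    "Fixed shot, central fisheye through circular aperture looking down",
    refs=["fisheye reference @video1", "horse in @video2 looks at fisheye"],
    audio="BGM reference @video3 audio",
)
print(prompt)
```

Keeping the audio directive as the final clause mirrors the examples on this page, where BGM or voice references close the prompt.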
Needed to produce 1000+ customized ads for different regional markets, each requiring background music and voiceover; traditional production cycle was 7 days per ad
Used native audio generation to automatically match suitable background music and voiceovers, supporting rapid generation of multi-language versions
Public campaign recaps cite production time dropping from 7 days to 30 minutes, per-ad cost moving from about CNY 50,000 to CNY 200, and Double 11 sales growing 40% year over year.
Reading note: Picture and sound were generated together, which helped the team review multilingual ad variants more quickly.
Illustrative cases on this site are compiled from public campaign recaps and secondary reporting available at the time of writing.
Metrics reflect the reported campaign period and should not be treated as current performance benchmarks.
Brand names and figures are cited for explanatory use only, not as endorsements, guarantees, or independently audited results.

Voice, sound effects, music generation, voice reference.
From provided office building photos, generate 15s cinematic documentary, 2.35:1 widescreen, 24fps, refined visuals, voice-over tone reference @video1...
Reference images 1–3: Office building documentary VO
Reference video 1: Office building documentary VO
Cat and dog talk show segment, emotionally rich, stand-up comedy style...
Yu opera 'Executing Chen Shimei' accompaniment, black-robed Bao Zheng points at red-robed Chen, sings fiercely. Chen's eyes dart, dan role: Wait!
Generate 15s MV. Steady composition, light push-pull, low-angle hero shot, ultra-wide establishing, cliff road and vintage camper, sea horizon, sunset backlight volumetric, cinematic framing.
Girl in hat at center sings gently: "I'm so proud of my family!" then turns to hug the Black girl. Latin music, skirts sway, colorful street dances.
Fixed shot. Captain in Spanish: Raid in three minutes! Blonde checks weapons, green-haired holds tactical light. Black teammate: Flanking? Captain: Same as always, keep one for interrogation.
0-3s: Fixed shot, girl from @image1 asleep in bed. 3-10s: Quick pan to man's face close-up (@image2), man helplessly wakes her, tone and voice reference @video1.
Monkey from @image1 walks to bubble tea counter, @image2 Bichon server wipes tools, monkey orders in Sichuan dialect: Hey, got Farewell My Concubine?
Educational style and tone, enact content from @image1: Monkey King crosses Flame Mountain to borrow fan from Princess Iron Fan, she seeks revenge for Red Boy, he pleads in vain, they quarrel.
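One of the examples above breaks the clip into timed segments ('0-3s: …', '3-10s: …'). A minimal, hypothetical helper for composing that segment format, again purely illustrative and not an official API:

```python
# Illustrative sketch only: joining (start, end, description) tuples into the
# timed-segment prompt format used in the examples above ("0-3s: ... 3-10s: ...").
# The function name is hypothetical.


def timed_prompt(segments: list[tuple[int, int, str]]) -> str:
    """Render each segment as '<start>-<end>s: <description>' and join with spaces."""
    return " ".join(f"{start}-{end}s: {desc}" for start, end, desc in segments)


p = timed_prompt([
    (0, 3, "Fixed shot, girl from @image1 asleep in bed."),
    (3, 10, "Quick pan to man's face close-up (@image2), tone and voice reference @video1."),
])
print(p)
```

Explicit time ranges like these give the model per-segment pacing cues, which matters when dialogue or audio references must land inside a specific window.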
Yes. Seedance 2.0 can generate voice, ambience, and music that match the video, with lip-sync and timing handled in the same pass, which can reduce separate audio post work.
Yes. Native audio generation supports multi-language voiceovers, which can help teams prepare localized versions for different regional markets.
Yes. Use text prompts to specify audio style — for example 'upbeat electronic BGM,' 'soft ambient forest sounds,' or 'female voiceover in English.' The model reads both your text guidance and the visual context to generate matching audio.
The model analyzes character lip movements visible in the generated video and matches the generated voice timing accordingly. For best results, keep character faces clearly visible and specify the language and tone in your prompt.
Related guides
These guides add workflow, prompt, and use-case context around this capability so the page connects into the broader Seedance topic cluster.
Guide
Current public overview of Seedance 2.0 by ByteDance: official website, February 12 2026 release date, Dreamina access, Doubao/豆包 connection, hardware requirements, multimodal inputs, 2K / 15-second outputs, global availability, and what still depends on platform.
Open guide
Guide
Seedance 2.0 Omni-Reference multimodal input: up to 9 images, 3 videos, 3 audio + text. @ tag system for referencing assets. Native audio-video joint generation.
Open guide
Guide
Seedance 2.0 use cases: e-commerce ads, TVC, product demos, film previz, MV, education, real estate, and short narrative. Based on official blog and third-party case studies.
Open guide
Guide
Honest workflow notes when a longer promo is built from several Seedance 2.0 generations: unified references, the per-clip duration cap, audio continuity, and dialogue pacing.
Open guide
Guide
Master the 5-step shot design workflow for Seedance 2.0: from requirement analysis through visual diagnosis, six-element assembly, validation, to professional delivery. Includes 28+ director presets, three-layer lighting, and multi-segment storyboarding.
Open guide
Guide
Vertical aspect ratios, hook-first prompting, and audio loudness considerations for algorithmic feeds — third-party workflow notes.
Open guide