RECENT UPDATE RUINED BILINGUAL DIALOGUE/STORY VIDEOS!
If you plan to use Mootion's bilingual video templates (such as bilingual stories and bilingual dialogues), look elsewhere. As another reviewer mentioned, originally, image-to-video provided the option to convert static images into videos WITH AUDIO when used in bilingual dialogues. However, it took two video generation attempts to allow the audio to be heard, which appeared to be a bug. In response to this issue, the developer of the platform decided to permanently remove the audio option for image-to-video generation without any way to include it in bilingual dialogues or bilingual stories. This was done to prevent having a separate overlapping bilingual voice-over track (which is not in sync with the video) that is automatically added by the workflow later in Step 3. However, that has made the end result far worse, as bilingual dialogues now look like silent videos with a separate voiceover, which makes it confusing to watch.
This leads to:
1. Broken Dialogue & Lip-Syncing
For bilingual stories and dialogue formats, stripping the audio from the image-to-video generation breaks character interaction. Because the dialogue subtitle text is processed separately in Step 3 with separate voiceover rather than being integrated into the character generation, it fails to sync with mouth movements. It also sounds like a detached voiceover that often does not match the expression of the characters. Furthermore, when there are two characters appearing together in a scene with both characters moving their mouths in the video, it becomes confusing to tell who is actually speaking when the separate Step 3 voiceover is heard (since it's not in sync), which ultimately distracts from the bilingual learning process.
2. Subtitle Syncing is Compromised
Because voice-over audio is generated separately in Step 3, the subtitles only appear with that audio, but are usually not in sync with the mouths of the actual characters in the video clip. For this reason, subtitles need an option to stay visible for the entire duration of the video or automatically be generated as part of the video to match the visual pacing of the scene when characters begin to communicate.
3. Native Audio is Needed Beyond Dialogue
Forcing the audio off in the original video generation also damages non-dialogue aspects of these bilingual templates. These are still needed for:
• Human Sounds: Singing, crying, laughing, coughing, sneezing, crowd noises, etc.
• Ambient Sounds (SFX): Nature and action scenes (rain, thunder, wind, traffic noise, etc.)
• Non-Human Sounds: Animal sounds, alarms, mechanical sounds, etc.
Without this type of audio generated as part of bilingual videos, the story or dialogue has to then rely more on narration to make it clear what is happening, since it can't be heard. This limits the quality of the videos and makes the whole bilingual learning experience very unnatural.
4. Terrible Scene Consistency
As other reviewers have already mentioned, characters and settings often do not appear consistent from scene to scene, which makes no sense in a dialogue or storytelling video. A character from one scene should not look like a completely different person in the next scene with different clothing and in a different location. This forces you to waste credits fixing it on your own.
___________________________________
Request to address these issues:
• Add an option to toggle on audio for image-to-video generation.
• Add a Step 3 Toggle: Give users a manual option to turn off the audio narration/voice-over track in Step 3 to prevent overlap with similar dialogue audio in the video.
• Add option for Subtitles: Ensure that when the Step 3 voiceover is disabled, bilingual subtitles can still be set to appear for the entire duration of the clip (for the target language, the native language, or both on screen together on multiple lines, if possible).
• Improve character and setting consistency: Clothing, facial features, settings, and background objects should be consistent from scene to scene when generated. Add a dedicated character and asset locking mechanism to preserve the visual identity of actors, outfits, and backgrounds from shot to shot.
Until these adjustments are made to grant creators granular control over the audio workflow, these dialogue and storytelling templates remain practically unusable for high-quality content production for bilingual use. For this reason, I have requested a refund.
-------------------------------------
UPDATE:
In their reply to this review, the developer suggested that I switch to their General Creation video option. This does not solve the problem, as I would be left with no bilingual subtitles.
Telling me to use a different template means losing the exact bilingual caption structure needed for bilingual dialogues. General Creation only offers subtitles in one language (instead of two), and they are also not in sync with the native audio generated with videos.
Lemon_Mootion
Jun 17, 2026
Hi there, thank you for the feedback. We've already sent you a detailed explanation via email today. Thank you again for pinpointing the bugs related to the audio toggles.
Just to clarify the design logic behind the Bilingual Dialogue and Bilingual Story templates:
These templates are designed for a specific flow where the character speaks Language A first, followed by Language B. When the audio option is enabled, our current video models are not able to properly support this sequential bilingual speech structure within a single generation. Because of this limitation, these two templates are intentionally set to audio OFF by default to ensure the output matches the intended bilingual storytelling experience.
If your goal is to create dialogue-style videos with audio enabled using our current video models, we recommend using other templates or General Creation. These are more flexible and do support the audio feature.
Feel free to send your specific use cases or more questions via email at [email protected].