Audiovisual Translation: Voiceover vs. Subtitling

Updated: October 09, 2018

Audiovisual translation (AVT) is the translation of any piece of media that contains both visual and auditory components. It’s a broad definition that encompasses a comparably broad set of techniques. Over the years, translators have developed a vast array of methods in response to different technologies and media types, all aimed at carrying complex messages from one language to another in multimodal environments.

These techniques vary depending on the nature of the content and the needs and resources of the owner of the message. Here is a short description of some of the most common techniques in AVT.

Voiceover

Voiceover, or VO, comes in many forms. The most basic is to remove the source audio track and replace it with a recording of the translated script, read by a native speaker of the target language. This technique also employs what’s called phrase-syncing: the translated script is timed so that it is clear where one phrase, or distinct “chunk” of speech, ends and the next begins. This allows the voice actor to match each translated phrase to the speaker on screen.

Phrase-syncing differs from a similar method called lip-syncing, in which the translation and timing are adapted at a very granular level to match the movement of the speaker’s lips on screen almost exactly. Lip-syncing can be very impressive, but it is also far more resource-intensive than phrase-syncing, which can be equally effective.

Instructional voiceover

This method is used for videos in which the speaker does not appear on screen but instead describes the action taking place on screen. It works well for all kinds of instructional videos, including e-learning courses and screen-capture recordings. The source-language audio track is removed entirely and replaced by a recording of the translated script, read by a native speaker of the target language.

United Nations voiceover

UN-style VO is not used exclusively by the UN; in fact, it is very popular in news reporting, and similar techniques are even used in audio-only contexts. In this method, the first couple of seconds of the source audio track play at full volume, and then the source track is turned down while a phrase-synced target-language track begins to play. This lets the audience hear the speaker’s real voice while also making the translation available in a way that is not distracting or discordant. It simulates the experience of listening to a simultaneous interpreter, like the ones who serve at the UN.
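The "play, then turn down" step described above (often called ducking) can be sketched in a few lines of Python. This is a minimal illustration, not a production audio pipeline: it assumes both tracks are mono lists of float samples at the same sample rate, and the function names and default values (a 2-second lead-in, a 0.5-second fade, a -15 dB duck level) are hypothetical choices for the example rather than any broadcast standard.

```python
def db_to_linear(db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)


def source_gain(t: float, lead_in: float, fade: float, duck_db: float) -> float:
    """Gain for the source track at time t (seconds).

    Full volume during the lead-in, then a fade (linear in dB) down to
    duck_db, which is held for the rest of the clip.
    """
    if t < lead_in:
        return 1.0
    if t < lead_in + fade:
        progress = (t - lead_in) / fade  # 0.0 at the start of the fade, approaching 1.0 at its end
        return db_to_linear(duck_db * progress)
    return db_to_linear(duck_db)


def mix_un_style(source, translation, sample_rate, lead_in=2.0, fade=0.5, duck_db=-15.0):
    """Mix two mono tracks UN-style (hypothetical helper; default values are illustrative).

    The translation starts once the lead-in ends and plays at full volume
    over the ducked source.
    """
    offset = int(lead_in * sample_rate)  # sample index where the translation begins
    length = max(len(source), offset + len(translation))
    mixed = []
    for i in range(length):
        t = i / sample_rate
        src = source[i] if i < len(source) else 0.0
        j = i - offset
        trans = translation[j] if 0 <= j < len(translation) else 0.0
        mixed.append(src * source_gain(t, lead_in, fade, duck_db) + trans)
    return mixed
```

Fading in decibels rather than in raw amplitude is a common choice because loudness is perceived roughly logarithmically; in practice, an audio engineer would tune the lead-in, fade, and duck level by ear for each project.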

Subtitling

Unlike VO, subtitling doesn’t involve any addition, removal, or alteration of the audio track; it takes place entirely in the visual field. In subtitling, a time-coded translation of the audio track is displayed on screen for the viewer to read. Because subtitling doesn’t require an audio engineer or voice actor, it is usually much more cost-effective. That said, it has its own set of drawbacks. For a more detailed account of the elements and process of subtitling, check out our full post on the topic.
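"Time-coded" means that each piece of translated text carries a start time and an end time. One widely used file format for this is SubRip (.srt), in which each cue consists of a sequence number, a `start --> end` timestamp line in `HH:MM:SS,mmm` form, the subtitle text, and a blank line. The Python sketch below renders a list of cues in that format; the function names and the sample Spanish cues are hypothetical, and it is meant only to show the structure, not to replace dedicated subtitling software.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Render (start, end, text) cues as the body of an .srt file (hypothetical helper)."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        if end <= start:
            raise ValueError(f"cue {index} must end after it starts")
        # Sequence number, timing line, then the translated text.
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


print(to_srt([(1.0, 3.5, "Hola."), (4.0, 6.25, "Empecemos.")]))
```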

Since subtitles share the screen with all the other visual elements of the source video, it is very important that they stand out clearly from the background. In some cases, a semi-opaque box is placed behind the subtitles to make them easier to read. In others, a subtler method, such as applying a shadow to the text, can be effective.