Video editors can add and delete words, or even completely rearrange sentences by dragging and dropping them as needed, resulting in a video that is almost indistinguishable from an unedited one.
Researchers from Stanford University, Princeton University, and the Max Planck Institute for Informatics teamed up with Adobe to create video editing software that can modify talking-head videos.
Given an edited transcript, the app extracts the matching speech motions from various pieces of the video and, using AI, converts them into a final lip-synced video that appears natural to viewers.
If an actor or performer makes a mistake, the editor can simply modify the transcript, and the app will find the right word among the words or portions of words spoken elsewhere in the video and assemble them. The algorithm requires only 40 minutes of original recordings.
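The idea of assembling a new word from portions of words spoken elsewhere can be illustrated with a small sketch. This is not the researchers' method, only a toy greedy search under stated assumptions: each recorded word is represented as a list of sound units, and the names `find_longest_match` and `assemble` as well as the unit labels are hypothetical, chosen for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical representation (an assumption, not from the research):
# a segment is (index of recorded word, start unit, end unit exclusive).
Segment = Tuple[int, int, int]


def find_longest_match(
    target: List[str], pos: int, recording: List[List[str]]
) -> Optional[Segment]:
    """Return the longest run of units starting at target[pos] that
    appears contiguously inside any recorded word, or None."""
    best: Optional[Segment] = None
    best_len = 0
    for word_index, units in enumerate(recording):
        for start in range(len(units)):
            length = 0
            while (
                pos + length < len(target)
                and start + length < len(units)
                and units[start + length] == target[pos + length]
            ):
                length += 1
            # Strictly greater, so the earliest longest match wins ties.
            if length > best_len:
                best_len = length
                best = (word_index, start, start + length)
    return best


def assemble(target: List[str], recording: List[List[str]]) -> List[Segment]:
    """Greedily cover the target word with pieces of recorded words."""
    segments: List[Segment] = []
    pos = 0
    while pos < len(target):
        match = find_longest_match(target, pos, recording)
        if match is None:
            raise ValueError(f"unit {target[pos]!r} was never spoken")
        segments.append(match)
        pos += match[2] - match[1]
    return segments


# Toy recording: three spoken words, split into made-up sound units.
recording = [["S", "P", "AY"], ["D", "ER"], ["K", "AA", "R"]]
# A new word built from the first word plus the second word.
pieces = assemble(["S", "P", "AY", "D", "ER"], recording)
```

In this toy example, the new word is covered by all of the first recorded word followed by all of the second. A real system would also have to blend the corresponding video frames so that the seams are not visible, which this sketch does not attempt.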
“Visually, it’s seamless. There’s no need to rerecord anything,” said Ohad Fried, a postdoctoral scholar at Stanford and first author of a paper about the research published on the preprint server arXiv. The paper will also appear in the journal ACM Transactions on Graphics.
The software could benefit video editors and producers, but it also raises concerns about the authenticity of images and videos on the web, according to the authors. Their proposed solution is to alert viewers and performers whenever a video has been manipulated.
“Unfortunately, technologies like this will always attract bad actors, but the struggle is worth it given the many creative video editing and content creation applications this enables,” said Fried.
According to Fried, the risks are worth taking. Photo-editing software was also criticized when it first came out, yet today most people would rather live in a world where it is available.
It’s up to us to learn to be more skeptical and cautious about what we read, hear or see.