Computer scientists have shown for the first time that systems designed to detect deepfakes, videos that use artificial intelligence to manipulate real-life footage, can be deceived. They presented the work at the WACV 2021 conference, which was held online from January 5 to 9, 2021.
The researchers showed that detectors can be defeated by inserting inputs called adversarial examples into every video frame. Adversarial examples are slightly manipulated inputs that cause artificial intelligence systems such as machine learning models to make a mistake. In addition, the team showed that the attack still works after videos are compressed.
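To make the idea of a subtly altered input (an adversarial example) concrete, here is a minimal sketch in Python. It is not the researchers' code: the "detector" is a hypothetical four-weight logistic model, and the weights, the four-pixel "face crop", and the perturbation budget `epsilon` are all made-up values chosen for illustration. Real detectors are deep neural networks, but the principle is the same: nudge each pixel slightly in the direction that lowers the "fake" score.

```python
import math

def fake_probability(weights, bias, pixels):
    """Toy stand-in for a detector: logistic regression over pixel values."""
    score = sum(w * x for w, x in zip(weights, pixels)) + bias
    return 1.0 / (1.0 + math.exp(-score))

def sign(value):
    return (value > 0) - (value < 0)

def adversarial_example(weights, pixels, epsilon):
    """Shift every pixel by at most epsilon so the 'fake' score drops.

    For a linear model the gradient of the score with respect to the input
    is the weight vector itself, so no autodiff is needed in this sketch.
    """
    shifted = [x - epsilon * sign(w) for w, x in zip(weights, pixels)]
    return [min(1.0, max(0.0, x)) for x in shifted]  # keep pixels in [0, 1]

# Hypothetical detector parameters and a hypothetical 4-pixel fake face crop.
weights = [2.0, -1.5, 3.0, 0.5]
bias = -2.0
frame = [0.6, 0.2, 0.5, 0.4]

adv = adversarial_example(weights, frame, epsilon=0.1)
p_before = fake_probability(weights, bias, frame)
p_after = fake_probability(weights, bias, adv)
```

With these toy numbers, the detector's "fake" probability falls from roughly 0.65 to roughly 0.48, which crosses the 0.5 decision threshold even though no pixel moved by more than 0.1.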
“Our work shows that attacks on deepfake detectors could be a real-world threat,” said Shehzeen Hussain, a UC San Diego computer engineering Ph.D. student and first co-author on the WACV paper. “More alarmingly, we demonstrate that it’s possible to craft robust adversarial deepfakes even when an adversary may not be aware of the inner workings of the machine learning model used by the detector.”
In deepfakes, a subject’s face is modified to create convincingly realistic footage of events that never actually happened. As a result, typical deepfake detectors focus on the face in videos: they first track it and then pass the cropped face data to a neural network that determines whether it is real or fake.
For example, eye blinking is not reproduced well in deepfakes, so detectors focus on eye movements as one way to make that determination. State-of-the-art deepfake detectors rely on machine learning models to distinguish real videos from fake ones.
The researchers note that the widespread circulation of fake videos on social media platforms has raised serious concerns in many countries, particularly because it undermines trust in digital media.
“If the attackers have some knowledge of the detection system, they can design inputs to target the blind spots of the detector and bypass it,” said Paarth Neekhara, the paper’s other first co-author and a UC San Diego computer science student.
How the attack works
The researchers created an adversarial example for every face in a video frame. Standard operations such as compressing and resizing video usually strip adversarial perturbations from an image, but these examples are built to withstand such processing. The attack algorithm does this by estimating, over a set of input transformations, how the model scores images as real or fake. It then uses that estimate to modify the image in such a way that the adversarial image remains effective even after compression and decompression.
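The idea of estimating the detector's response over a set of input transformations can be sketched as follows. This is an illustration under stated assumptions, not the team's algorithm: the detector is again a hypothetical logistic model, the three transformations (`identity`, `darken`, `blur`) are simple stand-ins for what compression and resizing do to pixels, and the gradient is approximated by finite differences rather than computed with a deep learning framework. The attack lowers the "fake" probability averaged over all transformations, so the perturbation has to work on every transformed copy at once.

```python
import math

WEIGHTS = [3.0, -0.5, 3.0, 1.0]   # hypothetical toy detector
BIAS = -3.4

def fake_probability(pixels):
    score = sum(w * x for w, x in zip(WEIGHTS, pixels)) + BIAS
    return 1.0 / (1.0 + math.exp(-score))

def identity(pixels):
    return list(pixels)

def darken(pixels):
    return [0.9 * x for x in pixels]

def blur(pixels):
    # 3-tap box blur with replicated edges, a crude stand-in for compression.
    n = len(pixels)
    return [(pixels[max(i - 1, 0)] + pixels[i] + pixels[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]

TRANSFORMS = [identity, darken, blur]

def expected_fake_probability(pixels):
    """Average 'fake' probability over all transformed copies of the input."""
    return sum(fake_probability(t(pixels)) for t in TRANSFORMS) / len(TRANSFORMS)

def numeric_gradient(fn, pixels, h=1e-4):
    """Central finite differences; a real attack would use autodiff instead."""
    grad = []
    for i in range(len(pixels)):
        up = list(pixels)
        up[i] += h
        down = list(pixels)
        down[i] -= h
        grad.append((fn(up) - fn(down)) / (2 * h))
    return grad

def robust_attack(pixels, epsilon=0.12, step=0.02, iterations=10):
    """Signed gradient steps on the averaged loss, clipped to an epsilon box."""
    adv = list(pixels)
    for _ in range(iterations):
        grad = numeric_gradient(expected_fake_probability, adv)
        for i, g in enumerate(grad):
            moved = adv[i] - step * ((g > 0) - (g < 0))
            low = max(0.0, pixels[i] - epsilon)
            high = min(1.0, pixels[i] + epsilon)
            adv[i] = min(high, max(low, moved))
    return adv

face = [0.6, 0.5, 0.7, 0.4]       # hypothetical 4-pixel fake face crop
adv_face = robust_attack(face)
```

With these toy values, every transformed copy of the original crop scores above 0.5 and every transformed copy of the adversarial crop scores below it. Note that the second pixel has a negative weight, so an attack on the untransformed image alone would brighten it. Averaging over the transformations darkens it instead, because blurring mixes that pixel into its positively weighted neighbours.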
The modified version of the face is then inserted back into its video frame, and the process is repeated for every frame to produce the adversarial deepfake video. The attack can also be applied to detectors that operate on entire video frames rather than just face crops.
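The per-frame procedure described above can be sketched in a few lines. Everything here is a hypothetical placeholder: frames are small grids of grayscale values, `fixed_face_box` pretends to be a face tracker, and `darken_face` stands in for whatever routine actually computes the adversarial perturbation.

```python
def crop(frame, box):
    top, left, height, width = box
    return [row[left:left + width] for row in frame[top:top + height]]

def paste(frame, box, patch):
    """Return a copy of the frame with the patch written into the box."""
    top, left, height, width = box
    out = [list(row) for row in frame]
    for r in range(height):
        out[top + r][left:left + width] = patch[r]
    return out

def attack_video(frames, find_face_box, perturb_face):
    """Crop the face, perturb it, and put it back, frame by frame."""
    adversarial_frames = []
    for frame in frames:
        box = find_face_box(frame)
        adv_face = perturb_face(crop(frame, box))
        adversarial_frames.append(paste(frame, box, adv_face))
    return adversarial_frames

# Hypothetical stand-ins for a face tracker and an adversarial perturbation.
def fixed_face_box(frame):
    return (1, 1, 2, 2)   # top, left, height, width

def darken_face(face, epsilon=0.1):
    return [[max(0.0, p - epsilon) for p in row] for row in face]

video = [[[0.5] * 4 for _ in range(4)] for _ in range(3)]   # three 4x4 frames
adv_video = attack_video(video, fixed_face_box, darken_face)
```

Only the pixels inside the face box change. The rest of each frame, and the original video, are left untouched.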
The team declined to release their code so it wouldn’t be used by hostile parties.
High success rate
The researchers tested their attacks in two scenarios. In the first, the attackers have complete access to the detector model, including the face extraction pipeline and the architecture and parameters of the classification model. In the second, the attackers can only query the machine learning model to obtain the probability that a frame is real or fake.
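The difference between the two scenarios is easiest to see in code. In the sketch below, the attacker never reads the detector's weights and can only call `query`, which returns a probability. The gradient is then estimated from those answers alone. This uses plain per-pixel finite differences as an assumption for illustration; the article does not say which estimator the team used, and on real images sampling-based estimators are typically used instead because two queries per pixel would be far too expensive. The detector, its weights, and the input values are hypothetical.

```python
import math

def make_black_box_detector():
    """Hypothetical remote detector: callers get probabilities, not weights."""
    weights, bias = [2.0, -1.5, 3.0, 0.5], -2.0
    stats = {"queries": 0}

    def query(pixels):
        stats["queries"] += 1
        score = sum(w * x for w, x in zip(weights, pixels)) + bias
        return 1.0 / (1.0 + math.exp(-score))

    return query, stats

def estimate_gradient(query, pixels, h=1e-3):
    """Approximate the gradient of the 'fake' probability using queries only."""
    grad = []
    for i in range(len(pixels)):
        up = list(pixels)
        up[i] += h
        down = list(pixels)
        down[i] -= h
        grad.append((query(up) - query(down)) / (2 * h))
    return grad

def black_box_attack(query, pixels, epsilon):
    grad = estimate_gradient(query, pixels)
    adv = []
    for x, g in zip(pixels, grad):
        direction = (g > 0) - (g < 0)
        adv.append(min(1.0, max(0.0, x - epsilon * direction)))
    return adv

query, stats = make_black_box_detector()
frame = [0.6, 0.2, 0.5, 0.4]              # hypothetical 4-pixel fake face crop
adv = black_box_attack(query, frame, epsilon=0.1)
queries_used = stats["queries"]           # two queries per pixel
```

Here the attack needs eight queries for four pixels and still pushes the toy detector's output below 0.5.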
In the first scenario, the attack’s success rate was above 99 percent for uncompressed videos and 84.96 percent for compressed videos. In the second scenario, the success rate was 86.43 percent for uncompressed videos and 78.33 percent for compressed videos. This is the first work to demonstrate successful attacks on state-of-the-art deepfake detectors.
“To use these deepfake detectors in practice, we argue that it is essential to evaluate them against an adaptive adversary who is aware of these defenses and is intentionally trying to foil these defenses,” the researchers write. “We show that the current state of the art methods for deepfake detection can be easily bypassed if the adversary has complete or even partial knowledge of the detector.”
To improve detectors, the researchers recommend an approach similar to what is known as adversarial training. During training, an adaptive adversary keeps generating new deepfakes that can bypass the current state-of-the-art detector, and the detector keeps improving so that it can catch them.
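A toy version of that back-and-forth loop might look like the following. It is a sketch under loud assumptions, not the researchers' training recipe: the detector is a two-feature logistic model, the dataset contains one hypothetical "fake" and one hypothetical "real" sample, and the adversary uses a simple signed perturbation of size `epsilon`. In every epoch the adversary crafts new examples against the current detector, and the detector is then updated on both the clean and the attacked samples.

```python
import math

def predict(weights, bias, x):
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-score))

def craft_adversarial(weights, x, label, epsilon):
    """Move each feature by epsilon so the score drifts away from the true
    label (1 = fake, 0 = real)."""
    away = -1 if label == 1 else 1
    return [v + away * epsilon * ((w > 0) - (w < 0)) for w, v in zip(weights, x)]

def adversarial_training(data, epsilon=0.2, lr=0.5, epochs=100):
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        # The adversary adapts to the current detector ...
        attacked = [(craft_adversarial(weights, x, y, epsilon), y) for x, y in data]
        batch = data + attacked
        # ... and the detector takes one gradient step on clean + attacked data.
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in batch:
            error = predict(weights, bias, x) - y
            grad_w = [g + error * v for g, v in zip(grad_w, x)]
            grad_b += error
        weights = [w - lr * g / len(batch) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(batch)
    return weights, bias

def robust_accuracy(weights, bias, data, epsilon=0.2):
    """Fraction of samples still classified correctly after a fresh attack."""
    hits = 0
    for x, y in data:
        adv = craft_adversarial(weights, x, y, epsilon)
        hits += int((predict(weights, bias, adv) > 0.5) == (y == 1))
    return hits / len(data)

# Hypothetical data: feature 0 is a strong cue, feature 1 is a weak one
# that an adversary with epsilon = 0.2 can flip.
data = [([1.0, 0.1], 1), ([-1.0, -0.1], 0)]
weights, bias = adversarial_training(data)
```

In this toy setting the trained detector ends up relying mainly on the feature the adversary cannot flip, and it still classifies both samples correctly after a fresh attack.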