MAD '22: Proceedings of the 1st International Workshop on Multimedia AI against Disinformation

SESSION: Keynote Talks

Towards Generalization in Deepfake Detection

Luisa Verdoliva

In recent years there have been astonishing advances in AI-based synthetic media generation. Thanks to deep learning-based approaches, it is now possible to generate data with a high level of realism. While this opens up new opportunities for the entertainment industry, it also undermines the reliability of multimedia content and supports the spread of false or manipulated information on the Internet. This is especially true for human faces: it has become easy to create new identities, or to change only some specific attributes of a real face in a video, producing so-called deepfakes. In this context, it is important to develop automated tools that detect manipulated media reliably and promptly. This talk will describe the most reliable deep learning-based approaches for detecting deepfakes, with a focus on those that enable domain generalization [1]. Results will be presented on challenging datasets [2,3] with reference to realistic scenarios, such as the dissemination of manipulated images and videos on social networks. Finally, possible new research directions will be outlined.

Let the Chatbot Speak! Freedom of Expression and Synthetic Media.

Katja de Vries

ML-generated media (speech, image, etc.) can be used for many purposes, such as robojournalism, therapeutic chatbots or the synthetic resuscitation of a dead actor or a deceased loved one. Can ML be a source of speech that is protected by the right to freedom of expression in Article 10 of the European Convention on Human Rights (ECHR)? I first discuss whether ML-generated (or "synthetic") media fall within the protective scope of freedom of expression (Article 10(1) ECHR). After concluding that they do, I look at the specific complexities raised by ML-generated content in terms of limitations to freedom of expression (Article 10(2) ECHR). The first set of potential limitations that I explore are those following from copyright, data protection, privacy and confidentiality law. Some types of synthetic media could potentially circumvent these limitations. Second, I study how new types of ML-generated content can create normative grey areas where the boundary between constitutionally protected and unprotected speech is not always easy to draw. In this context, I discuss two types of ML-generated content: virtual child pornography and fake news/disinformation. Third, I argue that the nuances of Article 10 ECHR are not easily captured by an automated filter, and I discuss the potential implications of the arms race between automated filters and ML-generated content. In this context, I discuss the newly adopted Digital Services Act (DSA) [1] and the recent judgement [2] (26 April 2022) of the ECJ (European Court of Justice) on automatic filtering of copyright-infringing content under Article 17 of the Directive on Copyright in the Digital Single Market (DSM Directive).

SESSION: Session 1: AI for Audio Analysis

On the Generalizability of Two-dimensional Convolutional Neural Networks for Fake Speech Detection

Christoforos Papastergiopoulos
Anastasios Vafeiadis
Ioannis Papadimitriou
Konstantinos Votis
Dimitrios Tzovaras

The ability of modern text-to-speech methods to produce synthetic, computer-generated voices can make it difficult to distinguish real from fake audio. In the present work, different pipelines were tested, and the best one in terms of inference time and audio quality was selected to extend the real audio of the TIMIT dataset with synthetic speech. This led to the creation of a new fake audio detection dataset based on the TIMIT corpus. A range of audio representations (magnitude spectrogram and energy representations) was studied in terms of performance on both datasets, with the two-dimensional convolutional neural networks trained only on the Fake or Real (FoR) dataset. While no single representation performed best on both datasets, the Mel spectrogram and Mel energies representations were found to be more robust overall. Although no difference in recognition accuracy was evident during validation, the two-dimensional convolutional neural network tended to underperform on the test set of the FoR dataset and on the synthesized TIMIT-based dataset, regardless of the representation used. This was corroborated by the data distribution analysis presented in this work.
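To make the difference between a magnitude spectrogram and "Mel energies" concrete, the sketch below computes log-compressed energies of a triangular mel filterbank applied to the power spectrum. It is a minimal illustration under stated assumptions: the HTK-style mel formula, the 16 kHz sample rate, the 512-point FFT, the 256-sample hop and the 40 mel bands are all hypothetical choices, not parameters reported in the abstract.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping n_fft // 2 + 1 linear bins to n_mels mel bands."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))  # triangle peaking at 1
    return fb

def magnitude_spectrogram(x, n_fft=512, hop=256):
    """|STFT| with a Hann window; shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def log_mel_energies(x, sr=16000, n_fft=512, hop=256, n_mels=40):
    """Log energies of mel bands; shape (n_mels, n_frames)."""
    power = magnitude_spectrogram(x, n_fft, hop) ** 2
    return np.log(mel_filterbank(sr, n_fft, n_mels) @ power + 1e-10)

# Usage on one second of hypothetical 16 kHz audio (random noise as a stand-in)
x = np.random.default_rng(0).standard_normal(16000)
features = log_mel_energies(x)  # 2D "image" that a 2D CNN could consume
```

Either representation is a 2D time-frequency array, which is why the same two-dimensional CNN can be trained on both; the mel version simply compresses the 257 linear bins into 40 perceptually spaced bands.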

Spectral Denoising for Microphone Classification

Luca Cuccovillo
Antonio Giganti
Paolo Bestagini
Patrick Aichroth
Stefano Tubaro

In this paper, we propose the use of denoising for microphone classification, to enable its use in several key application domains that involve noisy conditions. We describe the proposed analysis pipeline and the baseline algorithm for microphone classification, and discuss various denoising approaches that can be applied to it in the time or spectral domain. Finally, we determine the best-performing denoising procedure and evaluate the performance of the overall, integrated approach at several SNR levels of additive input noise. As a result, the proposed method achieves an average accuracy increase of about 25% on denoised content over the reference baseline.
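The abstract does not say which denoising procedure performed best, so the following is only an illustration of the general idea: a helper that adds white Gaussian noise at a chosen SNR (mirroring an evaluation over several SNR levels), and one classic spectral-domain method, magnitude spectral subtraction. The 512-point frames, the spectral floor of 0.01 and the assumption that a separate noise-only segment is available for estimating the noise spectrum are hypothetical choices, not the authors' settings.

```python
import numpy as np

def add_noise_at_snr(clean, snr_db, rng):
    """Return (noisy, noise), with white Gaussian noise scaled so that
    10 * log10(P_clean / P_noise) equals snr_db."""
    noise = rng.standard_normal(len(clean))
    scale = np.sqrt(np.mean(clean ** 2) / (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
    noise = scale * noise
    return clean + noise, noise

def _frames(x, n_fft):
    """Split x into Hann-windowed frames with 50% overlap (hop = n_fft // 2)."""
    hop = n_fft // 2
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann: overlap-adds to exactly 1 at this hop
    starts = list(range(0, len(x) - n_fft + 1, hop))
    return starts, np.stack([x[s:s + n_fft] * window for s in starts])

def estimate_noise_psd(noise_only, n_fft=512):
    """Average power spectrum of a segment assumed (hypothetically) to contain noise only."""
    _, frames = _frames(noise_only, n_fft)
    return np.mean(np.abs(np.fft.rfft(frames, axis=1)) ** 2, axis=0)

def spectral_subtraction(noisy, noise_psd, n_fft=512, floor=0.01):
    """Subtract the noise power spectrum in every frame, keep the noisy phase,
    and resynthesise the signal by overlap-add."""
    starts, frames = _frames(noisy, n_fft)
    spec = np.fft.rfft(frames, axis=1)
    power = np.abs(spec) ** 2
    # the spectral floor avoids negative power and limits "musical noise"
    clean_power = np.maximum(power - noise_psd, floor * power)
    gain = np.sqrt(clean_power / np.maximum(power, 1e-12))
    out = np.zeros(len(noisy))
    for s, frame in zip(starts, np.fft.irfft(gain * spec, n=n_fft, axis=1)):
        out[s:s + n_fft] += frame
    return out
```

A denoised signal would then be passed to the unchanged microphone classifier; keeping the noisy phase and only modifying magnitudes is the usual simplification in spectral subtraction.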

SESSION: Session 2: AI for Text Analysis

Automatic and Manual Detection of Generated News: Case Study, Limitations and Challenges

Jérémie Bogaert
Marie-Catherine de Marneffe
Antonin Descampe
Francois-Xavier Standaert

In this paper, we study the exploitation of language generation models for disinformation purposes from two viewpoints. Quantitatively, we argue that language models struggle with domain adaptation (i.e., the ability to generate text on topics that are not part of a training database, as is typically required for news). To support this, we show that both simple machine learning models and manual detection can spot machine-generated news in this practically relevant context. Qualitatively, we highlight the differences between these automatic and manual detection processes, and their potential for constructive interaction in order to limit the impact of automatic disinformation campaigns. We also discuss the consequences of these findings for the constructive use of natural language generation to produce news items.

Extractive-Boolean Question Answering for Scientific Fact Checking

Loïc Rakotoson
Charles Letaillieur
Sylvain Massip
Fréjus A. A. Laleye

With the explosive growth of scientific publications, synthesizing scientific knowledge and fact checking are becoming increasingly complex tasks.

In this paper, we propose a multi-task approach for verifying scientific questions based on joint reasoning over facts and evidence in research articles. We propose a combination of (1) automatic information summarization and (2) Boolean Question Answering, which makes it possible to generate an answer to a scientific question using only the extracts obtained after summarization.

Thus, for a given topic, our approach performs structured content modeling based on paper abstracts to answer a scientific question while highlighting passages from the papers that discuss the topic. Our final system is based on end-to-end Extractive Question Answering (EQA) combined with a three-output classification model that performs in-depth semantic understanding of a question and aggregates multiple responses. With this light and fast architecture, we achieved an average error rate of 4% and an F1-score of 95.6%. These results are supported by experiments with two QA models (BERT, RoBERTa) over 3 million Open Access (OA) articles in the medical and health domains from Europe PMC.

How Did Europe's Press Cover Covid-19 Vaccination News? A Five-Country Analysis

David Alonso del Barrio
Daniel Gatica-Pérez

Understanding how high-quality newspapers present and discuss major news plays a role in tackling disinformation, as it contributes to the characterization of the full ecosystem in which information circulates. In this paper, we present an analysis of how the European press treated the Covid-19 vaccination issue in 2020-2021. We first collected a dataset of over 50,000 online articles published by 19 newspapers from five European countries over 22 months. Then, we analyzed headlines and full articles with natural language processing tools, including named entity recognition, topic modeling, and sentiment analysis, to identify the main actors, subtopics, and tone, and to compare trends across countries. The results show several consistencies across countries and subtopics (e.g., a prevalence of neutral tone and relatively more negative sentiment among non-neutral articles, with few exceptions such as the case of vaccine brands), but also differences (e.g., distinctly high negative-to-positive ratios for the no-vax subtopic). Overall, our work provides a point of comparison with other news sources on a topic where disinformation and misinformation have resulted in increased risks and negative outcomes for people's health.

Automatic Detection of Bot-generated Tweets

Julien Tourille
Babacar Sow
Adrian Popescu

Deep neural networks have the capacity to generate textual content that is increasingly difficult to distinguish from that produced by humans. Such content can be used in disinformation campaigns, and its detrimental effects are amplified if it spreads on social networks. Here, we study the automatic detection of bot-generated Twitter messages. This task is difficult due to the combination of the strong performance of recent deep language models and the limited length of tweets. In this study, we propose a challenging definition of the problem by making no assumptions regarding the bot account, its network or the method used to generate the text. We devise two approaches for bot detection based on pretrained language models and create a new dataset of generated tweets to improve the performance of our classifier on recent text generation algorithms. The results show that the generalization capabilities of the proposed classifier depend heavily on the dataset used to train the model. Interestingly, the two automatic dataset augmentations proposed here show promising results: their introduction leads to consistent performance gains compared to using the original dataset alone.

SESSION: Session 3: AI for Visual and Multimodal Analysis

Cross-Forgery Analysis of Vision Transformers and CNNs for Deepfake Image Detection

Davide Alessandro Coccomini
Roberto Caldelli
Fabrizio Falchi
Claudio Gennaro
Giuseppe Amato

Deepfake generation techniques are evolving at a rapid pace, making it possible to create realistic manipulated images and videos and endangering the serenity of modern society. The continual emergence of new and varied techniques brings with it a further problem: deepfake detection models must be updated promptly in order to identify manipulations carried out with even the most recent methods. This is an extremely complex problem to solve, as training a model requires large amounts of data, which are difficult to obtain if the deepfake generation method is very recent. Moreover, continuously retraining a network would be infeasible. In this paper, we ask whether, among the various deep learning techniques, there is one that generalizes the concept of deepfake to such an extent that it does not remain tied to the specific deepfake generation methods used in the training set. We compared a Vision Transformer with an EfficientNetV2 in a cross-forgery setting based on the ForgeryNet dataset. Our experiments show that EfficientNetV2 has a greater tendency to specialize, often obtaining better results on the methods seen during training, while Vision Transformers exhibit a superior generalization ability that makes them more effective even on images generated with new methods.
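A cross-forgery evaluation trains a detector on fakes from some generation methods and tests it on fakes from methods it has never seen. The sketch below shows one hypothetical way to build such a split and to report per-method accuracy; the `Sample` record, the method names and the rule that every `real_test_every`-th real image goes to the test set are illustrative assumptions, not the ForgeryNet protocol used in the paper.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple

class Sample(NamedTuple):
    path: str
    method: str  # hypothetical: name of the generation method, or "real" for pristine images

def cross_forgery_split(samples: List[Sample], unseen: Set[str],
                        real_test_every: int = 5) -> Tuple[List[Sample], List[Sample]]:
    """Fakes from `unseen` methods go only to the test set; other fakes go to training.
    Real images are shared deterministically so both sets contain some."""
    train, test = [], []
    n_real = 0
    for s in samples:
        if s.method == "real":
            (test if n_real % real_test_every == 0 else train).append(s)
            n_real += 1
        elif s.method in unseen:
            test.append(s)
        else:
            train.append(s)
    return train, test

def per_method_accuracy(samples: List[Sample], predictions: List[int]) -> Dict[str, float]:
    """Accuracy per generation method; predictions use 1 = fake, 0 = real."""
    hits, totals = defaultdict(int), defaultdict(int)
    for s, p in zip(samples, predictions):
        label = 0 if s.method == "real" else 1
        totals[s.method] += 1
        hits[s.method] += int(p == label)
    return {m: hits[m] / totals[m] for m in totals}
```

Reporting accuracy per unseen method, rather than a single pooled number, is what makes a tendency to specialize on training methods visible.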

The MeVer DeepFake Detection Service: Lessons Learnt from Developing and Deploying in the Wild

Spiros Baxevanakis
Giorgos Kordopatis-Zilos
Panagiotis Galopoulos
Lazaros Apostolidis
Killian Levacher
Ipek Baris Schlicht
Denis Teyssou
Ioannis Kompatsiaris
Symeon Papadopoulos

Enabled by recent improvements in generation methodologies, DeepFakes have become mainstream due to their increasingly better visual quality, the growing availability of easy-to-use generation tools and their rapid dissemination through social media. This poses a severe threat to our societies, with the potential to erode social cohesion and influence our democracies. To mitigate the threat, numerous DeepFake detection schemes have been introduced in the literature, but very few provide a web service that can be used in the wild. In this paper, we introduce the MeVer DeepFake detection service, a web service that detects deep learning manipulations in images and videos. We present the design and implementation of the proposed processing pipeline, which involves a model ensemble scheme, and we endow the service with a model card for transparency. Experimental results show that our service performs robustly on three benchmark datasets while remaining vulnerable to adversarial attacks. Finally, we outline our experience and the lessons learned when deploying a research system into production, in the hope that they will be useful to other academic and industry teams.
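The abstract mentions a model ensemble scheme without detailing it. As a hedged illustration of how an ensemble can turn several detectors' outputs into one video-level score, the sketch below takes a weighted average of per-frame fake probabilities across models and then averages over frames. The weighting and the frame-averaging rule are hypothetical; the MeVer service may aggregate differently.

```python
import numpy as np

def ensemble_video_score(frame_probs, weights=None) -> float:
    """frame_probs: array of shape (n_models, n_frames) holding each model's
    per-frame fake probability. Returns one fake score for the whole video."""
    probs = np.asarray(frame_probs, dtype=float)
    if weights is None:
        weights = np.full(probs.shape[0], 1.0 / probs.shape[0])  # equal weights by default
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()     # normalise so the result stays a probability
    per_frame = weights @ probs            # weighted average across models, shape (n_frames,)
    return float(per_frame.mean())        # average across frames

# Usage: two hypothetical models scoring two frames
score = ensemble_video_score([[0.9, 0.8], [0.7, 0.6]])
```

Averaging probabilities is the simplest ensemble rule; weights could, for example, reflect each model's validation performance.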

Uncovering the Strength of Capsule Networks in Deepfake Detection

Dan-Cristian Stanciu
Bogdan Ionescu

Information is everywhere, and sometimes we have no idea whether what we read, watch or listen to is accurate, real or authentic. This paper focuses on detecting videos generated by deep learning, or deepfakes, a phenomenon that is increasingly present in today's society. While there are some very good methods for detecting deepfakes, two key facts should always be considered: no method is perfect, and deepfake generation techniques continue to evolve, sometimes even faster than detection methods. In our proposed architectures, we focus on a family of deep learning methods that is new, has several advantages over traditional Convolutional Neural Networks and has been underutilized in the fight against fake information, namely Capsule Networks. We show that: (i) state-of-the-art Capsule Network architectures can be improved in the context of deepfake detection, (ii) they can obtain accurate results using a very small number of parameters, and (iii) Capsule Networks are a viable alternative to deep convolutional models. Experimental validation is carried out on two publicly available datasets, namely FaceForensics++ and CelebDF, showing very promising results.
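For readers unfamiliar with capsules, the sketch below shows two generic building blocks from the original capsule network formulation (Sabour et al., 2017): the "squash" non-linearity and routing by agreement. It is not the improved architecture proposed in the paper; the shapes, the three routing iterations and the idea of reading a real/fake score from output-capsule lengths are assumptions for illustration.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Capsule non-linearity: keeps the vector's direction and maps its length into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, n_iters=3):
    """Routing by agreement.
    u_hat: predictions of shape (n_in, n_out, dim_out) made by each input capsule
    for each output capsule. Returns output capsules of shape (n_out, dim_out)."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                          # routing logits
    for _ in range(n_iters):
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)                # coupling coefficients: softmax over outputs
        s = np.einsum("io,iod->od", c, u_hat)            # weighted sum of predictions
        v = squash(s)                                    # output capsules
        b = b + np.einsum("iod,od->io", u_hat, v)        # raise logits where predictions agree with v
    return v

# Usage: 8 hypothetical input capsules voting for 2 output capsules (e.g. real / fake)
u_hat = np.random.default_rng(0).standard_normal((8, 2, 16))
lengths = np.linalg.norm(dynamic_routing(u_hat), axis=1)  # capsule length acts as a class score
```

Because each capsule's length is bounded by the squash function, it can be read directly as the strength of evidence for its class.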

Fake News Detection Based on Multi-Modal Classifier Ensemble

Yi Shao
Jiande Sun
Tianlin Zhang
Ye Jiang
Jianhua Ma
Jing Li

With the advent of the era of big data, the ubiquity of multi-modal fake news has increasingly affected information dissemination and consumption. Measures should be taken to identify multi-modal fake news in order to improve the credibility of news. However, existing single-modal fake news detection models fail to detect fake news based on complete multi-modal information, while multi-modal models often fail to fully exploit the original information of each single modality to achieve the best accuracy. To tackle these problems, we propose a novel multi-modal fake news detection method, called fake news detection based on multi-modal classifier ensemble, which takes into account the advantages of both single-modal and multi-modal models. Specifically, we design two single-modal classifiers for text and image inputs, respectively. We then establish a similarity classifier to calculate the feature similarity across the modalities. We also build an integrity classifier that utilizes integral multi-modal information. Finally, all classifier outputs are combined through ensemble learning to increase the classification accuracy. Furthermore, we introduce the center loss to reduce intra-class variance, which helps achieve higher detection accuracy. The cross-entropy loss is used to maximize the inter-class variations, while the center loss is used to minimize the intra-class variations, so that the discriminative ability of the learned news features is enhanced. Experimental results on both Chinese and English datasets demonstrate that the proposed method outperforms the baseline fake news detection approaches.
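The combination of cross-entropy and center loss described above can be sketched in NumPy as follows. This follows the standard center-loss formulation (Wen et al., 2016), in which each feature is pulled toward a learned center of its class and the centers are moved toward the batch features; the batch averaging, the weight `lam=0.01` and the center learning rate `alpha=0.5` are hypothetical values, not ones reported in the abstract.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy; pushes classes apart (inter-class separation)."""
    z = logits - logits.max(axis=1, keepdims=True)           # numerically stable log-softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def center_loss(features, labels, centers):
    """0.5 * squared distance of each feature to its class center, averaged over the batch
    (pulls features of the same class together, i.e. reduces intra-class variance)."""
    diffs = features - centers[labels]
    return 0.5 * np.sum(diffs ** 2) / len(labels)

def update_centers(features, labels, centers, alpha=0.5):
    """Move each class center toward the features of that class seen in the batch."""
    new = centers.copy()
    for k in np.unique(labels):
        mask = labels == k
        delta = np.sum(centers[k] - features[mask], axis=0) / (1 + mask.sum())
        new[k] = centers[k] - alpha * delta
    return new

def total_loss(logits, features, labels, centers, lam=0.01):
    """Joint objective: cross-entropy + lam * center loss (lam is a hypothetical weight)."""
    return softmax_cross_entropy(logits, labels) + lam * center_loss(features, labels, centers)
```

In a real model the gradients of `total_loss` would flow into the feature extractor, while `update_centers` is applied once per mini-batch; labels here would be 0 for real and 1 for fake news.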
