Leveraging AI to Combat Misinformation by Empowering Crowds and Evaluating Detectors

Author(s)
He, Bing
Advisor(s)
Kumar, Srijan
Abstract
Online misinformation poses a global risk with serious real-world consequences. To combat misinformation, existing research either leverages the expertise of professionals, such as journalists and fact-checkers, to annotate and debunk misinformation, or develops automatic machine learning (ML) methods to detect misinformation and its spreaders. However, the efficacy of professionals is limited because their manual processes do not scale with the volume of misinformation, and while ML methods rely on deep sequence-embedding-based classifiers to detect misinformation spreaders, the vulnerabilities of these classifiers are rarely examined. To complement professionals, non-expert ordinary users (a.k.a. crowds) can act as eyes on the ground who proactively question and counter misinformation, showing promise in overcoming the limitations of relying solely on professionals. However, little is known about how these crowds organically combat misinformation. Concurrently, AI has progressed dramatically, demonstrating the potential to help combat misinformation. In this thesis, we use AI to investigate these challenges and provide insights and solutions that empower crowds to counter misinformation more effectively. We first characterize the crowds who counter misinformation on social media platforms and how users respond to their counter-misinformation messages, and then assist crowds by generating more effective counter-misinformation replies. We apply advanced AI techniques to characterize the spread and textual properties of crowd-generated counter-misinformation during the COVID-19 pandemic, as well as the characteristics of the crowds themselves. Notably, we find that 96% of counter-misinformation posts are made by crowds, which confirms their prominent role in combating misinformation. We also analyze user responses to crowd-generated counter-misinformation replies within conversations to investigate the impact of these replies.
As expected, we find that counter-misinformation replies that are polite, positive, and supported by evidence are more likely to have a corrective effect on users. This analysis provides insights into how online misinformation is organically countered by crowds and how users respond to such counter-misinformation. Alarmingly, we also observe that two out of three crowd messages are rude and lack evidence, and that impolite, evidence-free replies can backfire. Generating effective counter-misinformation responses is thus crucial, but challenging due to the absence of high-quality datasets and of models grounded in communication theory. To address these challenges, we first create two novel datasets of misinformation and counter-misinformation response pairs, one collected in the wild from social media and one built in the lab via crowdsourcing, and then propose a reinforcement learning-based AI algorithm, called MisinfoCorrect, that learns to generate high-quality counter-misinformation responses to an input misinformation post. Our work illustrates the promise of AI for empowering crowds in combating misinformation. Separately, deep sequence-embedding-based classification methods, which use a sequence of user posts to generate user embeddings and detect malicious users, are also employed to identify misinformation spreaders on social media platforms. Although deep learning models have been shown to be vulnerable to adversarial attacks in the computer vision and natural language processing domains, the vulnerability of deep sequence-embedding-based detectors remains unknown. We therefore evaluate existing detectors by proposing a novel end-to-end AI algorithm, called PETGEN (PErsonalized Text GENerator), that simultaneously reduces the efficacy of the detection model and generates high-quality personalized posts. Next, to improve the robustness of these detection models against such next-post attacks, we propose a novel transformer-based detection model.
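The core idea behind reward-driven response generation can be illustrated with a toy REINFORCE loop: a policy is rewarded for producing polite, evidence-backed responses and learns to prefer them. This is a minimal sketch of the general technique only; the response "styles" and reward values below are hypothetical, not MisinfoCorrect's actual reward model or text generator.

```python
import math
import random

# Toy REINFORCE sketch: the policy chooses among response styles and is
# rewarded for the polite, evidence-backed one. Styles and rewards are
# hypothetical illustrations, not the thesis's actual formulation.
STYLES = ["rude", "polite", "polite+evidence"]
REWARD = [-1.0, 0.5, 1.0]  # assumed rewards: rude backfires, evidence helps

logits = [0.0, 0.0, 0.0]   # policy parameters, one per style
rng = random.Random(0)

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

lr = 0.1
for _ in range(2000):
    p = softmax(logits)
    a = rng.choices(range(len(STYLES)), weights=p)[0]  # sample an action
    r = REWARD[a]
    # REINFORCE update: logits += lr * r * (one_hot(a) - p)
    for i in range(len(STYLES)):
        logits[i] += lr * r * ((1.0 if i == a else 0.0) - p[i])

best = STYLES[max(range(len(STYLES)), key=lambda i: logits[i])]
```

Under these assumed rewards, the policy concentrates probability mass on the highest-reward style, mirroring how an RL-trained generator is steered toward responses with desirable properties.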
The model first encodes local and global information (i.e., the post and the post sequence) with transformer encoder and decoder blocks, and then applies a contrastive learning-enhanced classification loss that accounts for the adversarial attack scenario during training. Building on these efforts, we pave the path toward the next generation of adversary-aware, deep sequence-embedding-based classification models that robustly identify misinformation spreaders. Our AI-based approaches yield solutions that empower crowds and strengthen automated detectors for efficiently and effectively combating misinformation.
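The general shape of such a loss can be sketched as cross-entropy plus an InfoNCE-style contrastive term that pulls a clean sequence embedding toward its adversarially perturbed view and away from other users' embeddings. This is an illustrative sketch under assumed toy embeddings, not the thesis's exact loss or architecture.

```python
import math

# Illustrative sketch (not the thesis's exact formulation): classification
# loss plus a contrastive term that keeps a clean post-sequence embedding
# close to its adversarial view, reducing sensitivity to an attacker's
# appended "next post".

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    nu = math.sqrt(dot(u, u)) or 1.0
    nv = math.sqrt(dot(v, v)) or 1.0
    return dot(u, v) / (nu * nv)

def cross_entropy(p_true):
    # negative log-likelihood of the correct class
    return -math.log(max(p_true, 1e-12))

def combined_loss(p_true, clean_emb, adv_emb, neg_emb, lam=0.5, tau=0.1):
    """Cross-entropy + InfoNCE-style contrastive term: the clean embedding
    should be closer to its adversarial view than to a negative example."""
    pos = math.exp(cosine(clean_emb, adv_emb) / tau)
    neg = math.exp(cosine(clean_emb, neg_emb) / tau)
    contrastive = -math.log(pos / (pos + neg))
    return cross_entropy(p_true) + lam * contrastive

# Toy 2-D embeddings: an aligned adversarial view is penalized less than
# one that drifted away from the clean embedding.
aligned = combined_loss(0.9, [1.0, 0.0], [0.9, 0.1], [-1.0, 0.2])
drifted = combined_loss(0.9, [1.0, 0.0], [-0.8, 0.3], [-1.0, 0.2])
```

The contrastive term thus rewards representations that remain stable under perturbation, which is the intuition behind training the detector against the attack scenario.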
Date
2024-05-20
Resource Type
Text
Resource Subtype
Dissertation