Interpreting Neural Networks for and with Natural Language

Wiegreffe, Sarah Augusta
Riedl, Mark O.
In the past decade, natural language processing (NLP) systems have come to be built almost exclusively on a backbone of large neural models. As the landscape of feasible tasks has widened due to the capabilities of these models, the space of applications has also widened to include subfields with real-world consequences, such as fact-checking, fake news detection, and medical decision support. The increasing size and nonlinearity of these models result in an opacity that hinders efforts by machine learning practitioners and lay users alike to understand their internals and derive meaning or trust from their predictions. The fields of explainable artificial intelligence (XAI) and, more specifically, explainable NLP (ExNLP) have emerged as active areas for remedying this opacity and for ensuring models' reliability and trustworthiness in high-stakes scenarios, by providing textual explanations meaningful to human users. Models that produce justifications for their individual predictions can be inspected for the purposes of debugging, quantifying bias and fairness, understanding model behavior, and ascertaining robustness and privacy. Textual explanation is a predominant form of explanation in machine learning datasets regardless of task modality. As such, this dissertation covers both explaining tasks with natural language and explaining natural language tasks. In this dissertation, I propose test suites for evaluating the quality of model explanations under two definitions of meaning: faithfulness and human acceptability. I use these evaluation methods to investigate the utility of two explanation forms and three model architectures. Finally, I propose two methods to improve explanation quality: one that increases the likelihood of faithful highlight explanations and one that improves the human acceptability of free-text explanations. This work strives to increase the likelihood of positive use and outcomes when AI systems are deployed in practice.