Title:
MODELING THE LEADERSHIP OF LANGUAGE CHANGE FROM DIACHRONIC TEXT

dc.contributor.advisor Eisenstein, Jacob
dc.contributor.author Soni, Sandeep Brijlal
dc.contributor.committeeMember De Choudhury, Munmun
dc.contributor.committeeMember Zhang, Chao
dc.contributor.committeeMember Mihalcea, Rada
dc.contributor.committeeMember Bamman, David
dc.contributor.department Computer Science
dc.date.accessioned 2022-08-25T13:28:57Z
dc.date.available 2022-08-25T13:28:57Z
dc.date.created 2021-08
dc.date.issued 2021-07-21
dc.date.submitted August 2021
dc.date.updated 2022-08-25T13:28:57Z
dc.description.abstract Natural languages change constantly over time. These changes are modulated by social factors such as influence, which are not always directly observable. However, large-scale computational modeling of language change using timestamped text can uncover latent organization and social structure. In turn, the social dynamics of language change can illuminate our understanding of innovation, influence, and identity: Who leads? Who follows? Who diverges? This thesis contributes to the growing body of research on computational methods for modeling language change, with a focus on quantifying linguistic leadership of change. A series of studies highlights the unique contributions of this thesis: methods that scale to huge volumes of data; measures that quantify leadership at the level of individuals or in aggregate; and analysis that links linguistic leadership to other forms of influence. First, temporal and predictive models of event cascades on a network of millions of Twitter users are used to show that lexical change spreads in the form of a contagion, and that influence from densely embedded ties is crucial for the adoption of non-standard terms. A Granger-causal test for detecting social influence in event cascades on a network is then presented; it is robust to confounds such as homophily and can be applied to model both linguistic and non-linguistic change in a network. Next, a novel scheme to score and identify documents that lead semantic change in progress is introduced. For both US court opinions and scientific articles, this linguistic measure of a document's influence is strongly predictive of the number of citations the document receives. Subsequently, a measure of lead on any semantic change between a pair of document sources (e.g. newspapers) and a method to aggregate multiple lead-lag relationships into a network are presented.
Analysis of a network of nineteenth-century abolitionist newspapers, induced with the proposed method, reveals the important yet understated role of women and Black editors in shaping the discourse on abolitionism. Finally, a method to induce an aggregate semantic leadership network using contextual word representations is proposed to investigate the link between semantic leadership and influence in the form of citations among publication venues that are part of the Association for Computational Linguistics. Taken together, these studies illustrate the utility of finding leaders of language change, both for gaining insights into sociolinguistics and for applications in social science and digital humanities.
dc.description.degree Ph.D.
dc.format.mimetype application/pdf
dc.identifier.uri http://hdl.handle.net/1853/67138
dc.language.iso en_US
dc.publisher Georgia Institute of Technology
dc.subject Natural language processing
dc.subject computational social science
dc.subject digital humanities
dc.subject language change
dc.title MODELING THE LEADERSHIP OF LANGUAGE CHANGE FROM DIACHRONIC TEXT
dc.type Text
dc.type.genre Dissertation
dspace.entity.type Publication
local.contributor.advisor Eisenstein, Jacob
local.contributor.corporatename College of Computing
relation.isAdvisorOfPublication d2334908-9b54-40ce-9a5b-26987819dd65
relation.isOrgUnitOfPublication c8892b3c-8db6-4b7b-a33a-1b67f7db2021
thesis.degree.level Doctoral
Files
Original bundle
Name: SONI-DISSERTATION-2021.pdf
Size: 1.26 MB
Format: Adobe Portable Document Format
License bundle
Name: LICENSE.txt
Size: 3.87 KB
Format: Plain Text