The Data-Driven Analysis of Literature

Bamman, David

dc.contributor.author	Bamman, David
dc.contributor.corporatename	Georgia Institute of Technology. Machine Learning	en_US
dc.contributor.corporatename	University of California, Berkeley. School of Information	en_US
dc.date.accessioned	2019-12-02T15:04:22Z
dc.date.available	2019-12-02T15:04:22Z
dc.date.issued	2019-11-15
dc.description	Presented on November 15, 2019 at 12:45 p.m. in the Klaus Advanced Computing Building, Room 2443.	en_US
dc.description	David Bamman is an assistant professor in the School of Information at UC Berkeley, where he works on applying natural language processing and machine learning to empirical questions in the humanities and social sciences. His research often involves adding linguistic structure (e.g., syntax, semantics, coreference) to statistical models of text, and focuses on improving NLP for a variety of languages and domains (such as literary text and social media).	en_US
dc.description	Runtime: 61:01 minutes	en_US
dc.description.abstract	Literary novels push the limits of natural language processing. While much work in NLP has been heavily optimized toward the narrow domains of news and Wikipedia, literary novels are an entirely different animal--the long, complex sentences in novels strain the limits of syntactic parsers with super-linear computational complexity, their use of figurative language challenges representations of meaning based on neo-Davidsonian semantics, and their long length (ca. 100,000 words on average) rules out existing solutions for problems like coreference resolution that expect a small set of candidate antecedents. At the same time, fiction drives computational research questions that are uniquely interesting to that domain. In this talk, I'll outline some of the opportunities that NLP presents for research in the quantitative analysis of culture--including measuring the disparity in attention given to characters as a function of their gender over two hundred years of literary history (Underwood et al. 2018)--and describe our progress to date on two problems essential to a more complex representation of plot: recognizing the entities in literary texts, such as the characters, locations, and spaces of interest (Bamman et al. 2019) and identifying the events that are depicted as having transpired (Sims et al. 2019). Both efforts involve the creation of a new dataset of 200,000 words evenly drawn from 100 different English-language literary texts and building computational models to automatically identify each phenomenon. This is joint work with Matt Sims, Ted Underwood, Sabrina Lee, Jerry Park, Sejal Popat and Sheng Shen.	en_US
dc.format.extent	61:01 minutes
dc.identifier.uri	http://hdl.handle.net/1853/62069
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	Machine Learning @ Georgia Tech (ML@GT) Seminar Series
dc.subject	Literary novels	en_US
dc.subject	Natural language processing (NLP)	en_US
dc.title	The Data-Driven Analysis of Literature	en_US
dc.type	Moving Image
dc.type.genre	Lecture
dspace.entity.type	Publication
local.contributor.corporatename	Machine Learning Center
local.contributor.corporatename	College of Computing
local.relation.ispartofseries	ML@GT Seminar Series
relation.isOrgUnitOfPublication	46450b94-7ae8-4849-a910-5ae38611c691
relation.isOrgUnitOfPublication	c8892b3c-8db6-4b7b-a33a-1b67f7db2021
relation.isSeriesOfPublication	9fb2e77c-08ff-46d7-b903-747cf7406244

Files

Original bundle

Name: bamman.mp4
Size: 491.21 MB
Format: MP4 Video file
Description: Download video

Name: bamman_videostream.html
Size: 1.32 KB
Format: Hypertext Markup Language
Description: Streaming video

Name: transcript.txt
Size: 59.77 KB
Format: Plain Text
Description: Transcription

Name: thumbnail.jpg
Size: 73.72 KB
Format: Joint Photographic Experts Group/JPEG File Interchange Format (JFIF)
Description: Thumbnail

License bundle

Name: license.txt
Size: 3.13 KB
Format: Item-specific license agreed upon to submission

Collections

Scholarly Events
