natural language processing for the ... - digital humanities

Digital Humanities at Berkeley
a project of the Division of Arts & Humanities

digitalhumanities.berkeley.edu/summer-institute

Open to the public | Food and drink will be served
Hosted in conjunction with the Digital Humanities at Berkeley Summer Institute

Over the past few years, natural language processing (NLP) has become an increasingly important element in computational research in the humanities and social sciences, enabling sophisticated analyses that can go far beyond simple word counting. However, there is a substantial gap between the quality of the NLP used by researchers in the humanities and the state of the art. NLP research has overwhelmingly focused not only on one language (English) but also on one domain (newswire), leaving many other languages, dialects and domains (such as literary text) underserved.

In this talk, I'll advocate for two things that I think are necessary to drive the next generation of textual work in the computational humanities. First, I'll argue for the importance of structured linguistic representations in computational models of text, surveying several recent projects that have leveraged that structure to good effect. Second, I'll advocate for the development of high-quality NLP for the long tail of languages, dialects and domains that humanists study, an area where humanists are in the best position to take the reins and make progress. By combining standard machine learning techniques with disciplinary expertise that only humanists can provide, we can both dramatically expand the scope of NLP to cover a much wider variety of texts in our cultural record and use the linguistic structure we infer to help define new tasks altogether.

Natural Language Processing for the Long Tail
David Bamman, Assistant Professor
School of Information
August 21, 4 PM | Social Science Matrix, Barrows Hall, 8th Floor

