Oct 2017

SACHI Seminar – Jonathan Armosa – How to Closely Read a Topic Model: Visualizing the Poetry of Emily Dickinson

Event details

  • When: 9th October 2017 15:00 - 16:00
  • Where: Cole 1.33a

Title: How to Closely Read a Topic Model: Visualizing the Poetry of Emily Dickinson

Biography: Jonathan Armosa is a Doctoral Fellow at New York University (NYU). Jonathan’s research is in the area of Digital Humanities and focuses on Computational Modelling of Literature and Information Visualization.

Jonathan has an interdisciplinary background in Computer Science and English Literature. He received his M.A. in English Literature and Digital Humanities from McGill University, Montreal, Canada, where he worked at the McGill Centre for Digital Humanities led by Stéfan Sinclair.

Abstract: When digital humanists use topic models to explore large corpora of texts, they do so at an inherent disadvantage. Typically presented with flat files listing topics and topic weights, they are left to interpret these lists and figures separately from the texts that have just been modeled. Several significant tools have been developed to help scholars visually navigate the textual relationships in topic models. Over the past few years, however, I have been working on a practical, critical methodology for understanding topic models and the relations between their outputs and the predominant working method of the humanities: human-guided, focused, and contextualized reading.

For this talk, I will take attendees on a visual exploration of a topic model using a highly interactive and playful data visualization called “Topic Words in Context” – or “TWiC”, for short. TWiC is a multi-paneled environment for web browsers that allows users to explore and juxtapose multiple scales of data in topic models. It uses shapes, colors, and cross-panel highlighting to get viewers from “big” data to “small” and back. Importantly, it also provides an alternate “publication” view that resituates modeled texts in their original publication contexts (i.e., texts split for modeling purposes or texts within a collection). TWiC brings our focus simultaneously to the many textual and statistical relationships at play within a topic model. From corpus-wide topic distributions to texts to the topics themselves, each scale of the model, when set against the others, can reveal hierarchical qualities that enrich and move beyond the linguistic relationships frequently associated with the word lists of topic models.

Of the many analytical techniques TWiC makes possible, I will demonstrate how we can produce expressive, critical comparisons between our readings of texts and the smallest of quantitative scales in a topic model: individual texts and individual topics. We will look at different weighting schemes for topic and topic word distributions, how to quantitatively characterize and visualize them, and then how to compare them to traditional, focused reading. As it turns out, the expressiveness of a topic model functions differently depending on the context in which we depict its data. To show this, I will turn our attention to the poetry of Emily Dickinson and how a topic model of it may be situated within the context of her once-lost manuscripts called “fascicles”.