Workshop on Distant Reading in Literary Texts
by Thora Hagen, Michael Huber and Daniel Tepavac
On 23 and 24 November, the DARIAH-EU Working Group “Text and Data Analysis” and the Department of Literary Computing at the University of Würzburg hosted the workshop “Distant Reading in Literary Texts”. The aim of the event was to bring together leading experts in the computational analysis of literary texts to stimulate discussion about ongoing research projects and current problems at the frontier of this field. Speakers were encouraged to present as yet unpublished work in progress and thereby gain feedback from colleagues early on.
Thanks to a staff exchange program with the University of Osaka in Japan, funded by the DAAD, and a grant from the DARIAH-EU Funding Scheme for Working Group Activities, it was possible to assemble some of the most reputable international representatives of the field. An audience of around 40 students, research assistants and professors from the humanities and computer science followed the presentations and took a lively part in the long discussions.
Jan Rybicki (Pedagogical University of Kraków, Poland) opened the program. In his presentation, titled “Is Russian Literature Distant? Reading Russians with Networks”, he presented a stylometric network analysis of classical Russian literature, with a focus on original Russian corpora and especially on how they differ from translations of foreign corpora. One reason for choosing Russian as the main language for the comparative analyses is that the texts are readily available and easy to standardize. The first comparison focused on original Russian literature. The networks showed a cultural and historical change in the language over time. On the basis of the most frequent words in a corpus, the Russian corpora can be divided into different groups. When translated corpora are included, the diagram changes, and it becomes apparent that foreign corpora do not mix with original Russian corpora. Translation into Russian changes the dynamics of a text; hence, the translated texts form their own group.
In the second presentation, titled “Fifty Shades of Style”, Karina van Dalen-Oskam (University of Amsterdam, Netherlands) started with the question of which genre the book Fifty Shades of Grey belongs to. One would expect this novel to be classified as a romance, but the question is not so easy to answer once the novel is compared with other romantic novels. The study is based on a survey in which people evaluated the quality and literariness of the most read and most borrowed books in the Netherlands between 2010 and 2012. The fifty most literary and the fifty least literary books were chosen for further investigation. A comparison of the two lists shows that the least literary books were mostly written by female authors known for writing in the ‘chick-lit’ and romantic genres, whereas the most literary books were mostly written by male authors. For Fifty Shades of Grey, the classification proves problematic. Those who evaluated the books agree that this novel has a low level of quality and literariness, yet it still stands out when compared with other novels rated low on both. Hence, it can be seen as a romantic novel, but it is more similar to Dutch regional novels. (https://www.huygens.knaw.nl/the-riddle-of-literary-quality/?lang=en)
In his presentation “Distant Reading and the Problem of Perspective”, Ted Underwood (University of Illinois, USA) addressed the way perspectives on concepts change over time, and his goal was to illustrate this phenomenon through digital means. Specifically, the meaning of the word genre has undergone changes throughout history; genre adapts to its time. He therefore tried to measure the distances between different perspectives on genre, approaching the problem with supervised learning algorithms. The idea is to train a model on a first data set, apply it to a second data set (the two data sets covering different time periods), and use the loss in accuracy to define the distance between the data sets. A distance matrix could then be used to visualize all results. However, genre is always difficult to define, which makes the preparation of suitable data sets especially hard. Lastly, the question remains whether there is any other way to measure the distance after applying the model. (https://tedunderwood.com/)
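The idea of using accuracy loss as a distance can be illustrated with a minimal sketch. It does not reproduce the models or data from the talk: the nearest-centroid classifier, the two-dimensional feature vectors and the genre labels are all invented here for illustration.

```python
from statistics import mean

def train_centroids(samples):
    """Compute one mean feature vector (centroid) per genre label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in by_label.items()
    }

def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def accuracy(centroids, samples):
    """Share of samples whose predicted label matches the given label."""
    hits = sum(predict(centroids, f) == y for f, y in samples)
    return hits / len(samples)

def accuracy_loss(train, test_same_period, test_other_period):
    """Accuracy within the training period minus accuracy on another period."""
    model = train_centroids(train)
    return accuracy(model, test_same_period) - accuracy(model, test_other_period)

# Hypothetical toy data: (feature vector, genre label) pairs.
period_a_train = [((0.9, 0.1), "gothic"), ((0.8, 0.2), "gothic"),
                  ((0.1, 0.9), "other"), ((0.2, 0.8), "other")]
period_a_test = [((0.7, 0.3), "gothic"), ((0.3, 0.7), "other")]
# In the later period the "gothic" texts have drifted towards "other".
period_b_test = [((0.4, 0.6), "gothic"), ((0.6, 0.4), "gothic"),
                 ((0.2, 0.8), "other"), ((0.1, 0.9), "other")]

loss = accuracy_loss(period_a_train, period_a_test, period_b_test)
print(f"accuracy loss from period A to period B: {loss:.2f}")
```

Repeating this for every pair of periods would fill a distance matrix of the kind mentioned above. Note that the loss is not necessarily symmetric, since training on period B and testing on period A may give a different value.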
The presentation of the project “Scalable Reading – novella collections of the 19th century” by Thomas Weitin (University of Darmstadt, Germany) is, like the project itself, based on the “Novellenschatz”, a corpus of novellas from 1870 collected by Paul Heyse and Hermann Kurz. The project title “Reading at scale” reflects a research approach that combines various techniques and methods from literary studies and digital humanities by “zooming” from close reading to distant reading. The “Novellenschatz” fits the requirements of this approach because it is neither too large for humans to read nor too small for quantitative analyses. Using the “similian backbone” algorithm, Thomas Weitin compares novellas by their Delta distances to find out which texts are “centres” of the corpus and how the texts relate to each other. (http://www.digitalhumanitiescooperation.de/projects/reading-at-scale/)
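For readers unfamiliar with Delta distances, the following sketch shows one common formulation of Burrows' Delta: the mean absolute difference between the z-scored relative frequencies of the most frequent words. It is not the project's own pipeline; the three toy texts, the choice of the population standard deviation and the selection of most frequent words from pooled counts are assumptions made for this illustration.

```python
from collections import Counter
from statistics import mean, pstdev

def relative_frequencies(tokens):
    """Map each word to its share of all tokens in one text."""
    counts = Counter(tokens)
    return {word: count / len(tokens) for word, count in counts.items()}

def burrows_delta_matrix(texts, n_mfw=3):
    """Return {(title_a, title_b): delta} for a dict of title -> token list."""
    # Pick the most frequent words (MFW) from the pooled corpus counts.
    pooled = Counter()
    for tokens in texts.values():
        pooled.update(tokens)
    mfw = [word for word, _ in pooled.most_common(n_mfw)]

    freqs = {title: relative_frequencies(tokens) for title, tokens in texts.items()}
    zscores = {title: [] for title in texts}
    for word in mfw:
        column = [freqs[title].get(word, 0.0) for title in texts]
        mu, sigma = mean(column), pstdev(column)
        if sigma == 0:
            continue  # a word used equally often everywhere carries no contrast
        for title, value in zip(texts, column):
            zscores[title].append((value - mu) / sigma)

    return {
        (a, b): mean(abs(x - y) for x, y in zip(zscores[a], zscores[b]))
        for a in texts for b in texts
    }

# Hypothetical toy corpus; real analyses use whole novellas and many more words.
toy_texts = {
    "A": "the cat and the dog and the bird".split(),
    "B": "the dog and the cat and the fish".split(),
    "C": "of love of war of fate and of time".split(),
}
delta = burrows_delta_matrix(toy_texts)
for pair in (("A", "B"), ("A", "C"), ("B", "C")):
    print(pair, round(delta[pair], 3))
```

A matrix like this can serve as the input for a network in which texts are nodes and small distances become edges, which is where questions about the “centres” of a corpus arise.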
In the presentation “Through different routes to the same landscape: What do text-clusterings tell us about style?”, Tomoji Tabata (University of Osaka, Japan) presented a stylistic study of the British author Charles Dickens. The corpus of this study consists mainly of texts by Dickens, which are compared with texts by other authors from the 18th and 19th centuries. Using several methods, including random forest classification and PCA, Tabata was able to show that Dickens has an exceptional style and distinctive linguistic features. For example, Dickens uses the word “gentleman” more frequently than other authors of the same period.
Christof Schöch’s (University of Trier, Germany) presentation “Burrows’ Zeta: Reimplementation and Extension” centred on the possibilities, strengths and weaknesses of Zeta. In particular, he addressed how to mitigate certain weaknesses of Zeta, for example how to deal with relative frequencies. Christof Schöch and his working group apply the different Zeta implementations to sets of French comedies and tragedies to highlight which modifications work, which do not, and how these implementations influence a possible interpretation. (https://github.com/cligs/pyzeta)
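As a reminder of what Zeta measures, here is a minimal sketch of one common formulation: each text group is cut into segments, and a word's score is the proportion of segments containing it in the target group minus the same proportion in the comparison group. This is not the pyzeta code and does not cover the extensions discussed in the talk; the function names, the segment size and the toy token lists are hypothetical.

```python
def segment(tokens, size):
    """Split tokens into consecutive full-size segments, each as a set of word types."""
    return [set(tokens[i:i + size]) for i in range(0, len(tokens) - size + 1, size)]

def segment_proportion(word, segments):
    """Share of segments in which the word occurs at least once."""
    return sum(word in seg for seg in segments) / len(segments)

def zeta_scores(target_segments, comparison_segments):
    """Score each word between -1 (comparison only) and +1 (target only)."""
    vocabulary = set().union(*target_segments, *comparison_segments)
    return {
        word: segment_proportion(word, target_segments)
              - segment_proportion(word, comparison_segments)
        for word in vocabulary
    }

# Hypothetical toy data standing in for comedies (target) and tragedies (comparison).
comedy_tokens = "love laugh door love servant door".split()
tragedy_tokens = "death blood door death love door".split()

scores = zeta_scores(segment(comedy_tokens, 3), segment(tragedy_tokens, 3))
for word, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{word:8s} {score:+.2f}")
```

Because only presence or absence per segment is counted, a word that appears once in a segment weighs as much as one that appears ten times. This is one reason why variants based on relative frequencies are of interest.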
Finally, Fotis Jannidis (University of Würzburg) talked about various layers and forms of complexity in literature and how to approach this topic. Complexity is the subject of a recently started cooperation between the Kallimachos project of the University of Würzburg and the University of Osaka, and is thus the focus of current research activities in the context of both the Kallimachos project and the Osaka-Würzburg staff exchange program. Within the context of this workshop, such layers could be complexity based on style, aesthetics, the depicted world, symbolic elements, or aspects of the fictional world or of intertextuality. A first approach is to look at lexical diversity as one aspect of text complexity. Interestingly, first results from comparing classical literature with light fiction did not show significant differences in lexical richness. One problem with many measures of lexical richness is their dependence on text length, a problem that has dominated the discourse among researchers in the past and led to the development of various mathematical procedures. Fotis Jannidis argued that rather than inventing ever more intricate measures to overcome text length dependence, it is now time for digital humanists to concentrate on understanding what such measures tell us about the text.
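The text length problem can be made concrete with a small sketch. It uses the type-token ratio (TTR) as the simplest measure of lexical richness and a mean TTR over fixed-size windows as one simple length-controlled variant; neither is claimed to be a measure used in the talk. The “text” is an artificial token stream sampled from a 500-word vocabulary with Zipf-like weights, which is an assumption made only for illustration.

```python
import random

def type_token_ratio(tokens):
    """Number of distinct word types divided by the number of tokens."""
    return len(set(tokens)) / len(tokens)

def mean_segmental_ttr(tokens, window=100):
    """Average the TTR over consecutive full windows of equal size."""
    segments = [tokens[i:i + window]
                for i in range(0, len(tokens) - window + 1, window)]
    return sum(type_token_ratio(seg) for seg in segments) / len(segments)

# Artificial "text": word w1 is the most frequent, w500 the rarest.
rng = random.Random(0)
vocabulary = [f"w{rank}" for rank in range(1, 501)]
weights = [1 / rank for rank in range(1, 501)]
tokens = rng.choices(vocabulary, weights=weights, k=5000)

# The same source yields very different TTR values at different lengths.
for length in (100, 1000, 5000):
    sample = tokens[:length]
    print(length,
          round(type_token_ratio(sample), 3),
          round(mean_segmental_ttr(sample), 3))
```

The plain TTR falls as the sample grows, simply because frequent words keep recurring while new types become rarer. The windowed variant avoids comparing texts of unequal length directly, but it leaves open the question raised in the talk, namely what such a number actually says about a text.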
The workshop “Distant Reading in Literary Texts” turned out to be an instructive exchange of methods and knowledge. In particular, the discussions following each talk served as a platform for exchanging ideas and gathering new input. Being able to communicate face to face rather than via blogs or articles made getting new ideas across efficient and easy. For listeners and speakers alike, the workshop was very informative, entertaining and stimulating. We hope for similar events in the future.