Tag Archives: digital humanities

Evolution of literature as seen by a classicist, a biologist and a computer scientist

Studying intertextuality shows how books are related in various ways and are reorganized and recombined over time. Image courtesy of Elena Poiata.

I find the image more instructive when I read it from the bottom up. For those who prefer to read from the top down, there’s this April 5, 2017 University of Texas at Austin news release (also on EurekAlert),

A classicist, biologist and computer scientist all walk into a room — what comes next isn’t the punchline but a new method to analyze relationships among ancient Latin and Greek texts, developed in part by researchers from The University of Texas at Austin.

Their work, referred to as quantitative criticism, is highlighted in a study published in the Proceedings of the National Academy of Sciences. The paper identifies subtle literary patterns in order to map relationships between texts and more broadly to trace the cultural evolution of literature.

“As scholars of the humanities well know, literature is a system within which texts bear a multitude of relationships to one another. Understanding what is distinctive about one text entails knowing how it fits within that system,” said Pramit Chaudhuri, associate professor in the Department of Classics at UT Austin. “Our work seeks to harness the power of quantification and computation to describe those relationships at macro and micro levels not easily achieved by conventional reading alone.”

In the study, the researchers create literary profiles based on stylometric features, such as word usage, punctuation and sentence structure, and use techniques from machine learning to understand these complex datasets. Taking a computational approach enables the discovery of small but important characteristics that distinguish one work from another — a process that could require years using manual counting methods.

“One aspect of the technical novelty of our work lies in the unusual types of literary features studied,” Chaudhuri said. “Much computational text analysis focuses on words, but there are many other important hallmarks of style, such as sound, rhythm and syntax.”

Another component of their work builds on Matthew Jockers’ literary “macroanalysis,” which uses machine learning to identify stylistic signatures of particular genres within a large body of English literature. Implementing related approaches, Chaudhuri and his colleagues have begun to trace the evolution of Latin prose style, providing new, quantitative evidence for the sweeping impact of writers such as Caesar and Livy on the subsequent development of Roman prose literature.

“There is a growing appreciation that culture evolves and that language can be studied as a cultural artifact, but there has been less research focused specifically on the cultural evolution of literature,” said the study’s lead author Joseph Dexter, a Ph.D. candidate in systems biology at Harvard University. “Working in the area of classics offers two advantages: the literary tradition is a long and influential one well served by digital resources, and classical scholarship maintains a strong interest in close linguistic study of literature.”

Unusually for a publication in a science journal, the paper contains several examples of the types of more speculative literary reading enabled by the quantitative methods introduced. The authors discuss the poetic use of rhyming sounds for emphasis and of particular vocabulary to evoke mood, among other literary features.

“Computation has long been employed for attribution and dating of literary works, problems that are unambiguous in scope and invite binary or numerical answers,” Dexter said. “The recent explosion of interest in the digital humanities, however, has led to the key insight that similar computational methods can be repurposed to address questions of literary significance and style, which are often more ambiguous and open ended. For our group, this humanist work of criticism is just as important as quantitative methods and data.”

The paper is the work of the Quantitative Criticism Lab (www.qcrit.org), co-directed by Chaudhuri and Dexter in collaboration with researchers from several other institutions. It is funded in part by a 2016 National Endowment for the Humanities grant and the Andrew W. Mellon Foundation New Directions Fellowship, awarded in 2016 to Chaudhuri to further his education in statistics and biology. Chaudhuri was one of 12 scholars selected for the award, which provides humanities researchers the opportunity to train outside of their own area of special interest with a larger goal of bridging the humanities and social sciences.
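The release describes the method only in general terms: stylometric features such as word usage, punctuation and sentence structure, analyzed with machine learning. As a rough, purely illustrative sketch of what that can look like (my own example, not the Quantitative Criticism Lab’s pipeline), the short Python script below computes a few simplified features for each text and clusters the resulting profiles; the feature choices, the FUNCTION_WORDS list and the sample_texts placeholder are all assumptions made for the sake of illustration.

# Minimal stylometric-profile sketch (illustrative only; not the paper's actual pipeline).
# The features are simplified stand-ins for the word-usage, punctuation and
# sentence-structure measures described in the press release.
import re
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering

FUNCTION_WORDS = ["et", "in", "non", "est", "ad", "cum", "sed", "ut"]  # hypothetical Latin function words

def profile(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-zA-Z]+", text.lower())
    n_words = max(len(words), 1)
    features = [
        len(words) / max(len(sentences), 1),           # mean sentence length
        sum(text.count(p) for p in ",;:") / n_words,   # punctuation rate
    ]
    features += [words.count(w) / n_words for w in FUNCTION_WORDS]  # function-word rates
    return features

sample_texts = {"text_a": "...", "text_b": "...", "text_c": "..."}  # placeholder corpus
X = StandardScaler().fit_transform(np.array([profile(t) for t in sample_texts.values()]))
print(dict(zip(sample_texts, AgglomerativeClustering(n_clusters=2).fit_predict(X))))

On a real corpus, most of the work lies in choosing and validating features like these; the clustering step is just one of many ways to compare the resulting profiles.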

Here’s another link to the paper along with a citation,

Quantitative criticism of literary relationships by Joseph P. Dexter, Theodore Katz, Nilesh Tripuraneni, Tathagata Dasgupta, Ajay Kannan, James A. Brofos, Jorge A. Bonilla Lopez, Lea A. Schroeder, Adriana Casarez, Maxim Rabinovich, Ayelet Haimson Lushkov, and Pramit Chaudhuri. PNAS, published online before print April 3, 2017; doi: 10.1073/pnas.1611910114

This paper appears to be open access.

Digitizing and visualizing the humanities

It’s a bit of a stretch for this blog, but since I sometimes write about ‘big’ data in the context of science, I’ve decided to include this piece on big data and the humanities. First, I looked up a definition of the humanities, and it’s far broader than I expected. From the Wikipedia essay on the Humanities (Note: I have removed links),

The humanities are academic disciplines that study the human condition, using methods that are primarily analytical, critical, or speculative, as distinguished from the mainly empirical approaches of the natural sciences.

The humanities include ancient and modern languages, literature, philosophy, religion, and visual and performing arts such as music and theatre. The humanities that are also sometimes regarded as social sciences include history, anthropology, area studies, communication studies, cultural studies, law and linguistics.

As for the digital humanities, here’s a brief description from a July 30, 2012 Fast Company story by Adam Bluestein about big data, the humanities, and Stéfan Sinclair,

In the burgeoning academic discipline of digital humanities, creating software tools is as important as getting published in a journal. To better understand what this means, take a peek at the pedagogical playbook of Stefan Sinclair, associate professor of digital humanities in McGill University’s Department of Languages, Literatures and Cultures [Montréal, Québec, Canada]. …

At the same time, he’s equipping a new generation of humanities students with the eclectic skill set and entrepreneurial spirit to take on a 21st century job market. …

FAST COMPANY: What is “digital humanities,” exactly?

STEFAN SINCLAIR: There’s a natural tendency to assume it’s a new field, but it’s actually been around for quite a long time. The first research combining computers and the humanities was in the 1940s, and a journal called Computers and Humanities started publishing in the 1960s. But there has been a lot of attention and momentum in the past 3 or 4 years that hasn’t been there before. The core of digital humanities is the critical exploration of how computers and technology can enhance but also influence our modes of research in traditional humanities.

My use of the word ‘visualizing’ in the title of this posting differs from my general use of the term, i.e., making pictures/images from data (from the Bluestein article),

How does this kind of approach help us see things that we couldn’t before?

One thing that’s compelling about digital humanities is being able to ask questions at a scale you can’t ask without computers. Really, most humanities is very exclusionary–we don’t have time as humans to read a lot of text. So all English studies are a matter of excluding, choosing texts we’re interested in and leaving aside others. With computers, we can now ask questions of, say, all novels in the 19th century. Sometimes that’s called “distant reading”–as opposed to the more traditional literary practice of close reading. You can also combine close and distant reading, when you want to look at a few novels, but offer a comparison to a larger context of novels.

Digital humanities also encompasses a lot more than text. There is a lot of interest in game studies, for instance, and geospatial analysis that’s not what people in geography would do. An example of that is a project on the Republic of Letters–a long-distance intellectual community in the late 17th and 18th century in Europe and America–that maps the transferring of thoughts across geographical space, allowing you to visualize that and see things in generative ways.
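To make ‘distant reading’ a little more concrete, here is a toy Python sketch (my own illustration, not Sinclair’s tooling) that tracks how often a single word appears per thousand words across a dated corpus; the corpus list and the choice of the word “railway” are placeholders for what would, in practice, be thousands of digitized novels.

# Toy "distant reading" sketch: trace one word's relative frequency across a dated corpus.
# The corpus below is a placeholder; in practice it might hold thousands of digitized
# nineteenth-century novels.
import re
from collections import defaultdict

corpus = [
    {"year": 1815, "text": "..."},
    {"year": 1847, "text": "..."},
    {"year": 1871, "text": "..."},
]

def rate_per_thousand(text, word):
    tokens = re.findall(r"[a-z']+", text.lower())
    return 1000 * tokens.count(word) / max(len(tokens), 1)

by_decade = defaultdict(list)                    # decade -> list of per-text rates
for doc in corpus:
    by_decade[10 * (doc["year"] // 10)].append(rate_per_thousand(doc["text"], "railway"))

for decade in sorted(by_decade):                 # coarse trend across the century
    print(decade, sum(by_decade[decade]) / len(by_decade[decade]))

Pointed at a large enough collection, a loop like this asks the kind of whole-corpus question Sinclair describes; close reading can then take over for the individual texts that the trend makes interesting.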

Clearly, algorithms are influencing spheres of study and thought in ways that, until recently, would have been unthinkable for most of us, if not for the pioneers of the 1940s. I’m glad to see Sinclair, towards the end of the article, discuss one of the dangers of digitizing the humanities, i.e., turning the humanities into a hypothesis-proving endeavour (the scientific method). From the Bluestein article,

I am particularly passionate about tools and methodologies that allow for the proliferation of perspectives–not to prove a hypothesis I have, but to see a text differently and ask different questions.

I was once asked to define my writing practice as part of a presentation. My answer (I’m sparing you 10 mins. of presentation) was this: asking questions.

You can find out more about Stéfan Sinclair and his work here.