The dreams of digital humanities

A brief history of the wonders of binary encoding of culture. 

by Andrea Bolioli


Father Busa SJ’s dream is the obvious place to start this brief journey through the history of Digital Humanities. In 1949, the young Jesuit Father Busa dreamt of analyzing the complete works of friar Thomas Aquinas (118 texts) to create the Index Thomisticus. He had graduated in Philosophy with a thesis entitled “The Thomistic Terminology of Interiority.” According to Father Busa, the concept of “interiority” was conveyed in Thomas’s texts not only by the word “praesentia” (as the philosophical literature assumed), but also by the preposition “in” (where it means “inside” and not “negation”).
Philosophers of language are fascinated by problems like these: the occurrences of a prefix or a preposition linked to a noun or a verb, and the meaning associated with their uses. Father Busa began to write the concordances by hand, realized it was an impossible task, presented the project to Thomas J. Watson, the founder and CEO of IBM, and persuaded him to back it with IBM’s calculators (punch card machines).

It took him almost 30 years to complete the analysis and annotation of a corpus of about 11 million words. The Index Thomisticus Sancti Thomae Aquinatis Operum Omnium Indices et Concordantiae was first published in 1980 as 56 volumes of about 1000 pages each, then put on CD-ROM, then became a hypertext (also on CD-ROM), and lastly became a website in 2005 (the Corpus Thomisticum online). The work on the occurrences of “in” grew into something more extensive, namely the word index of all the works of Thomas Aquinas (plus some related books), containing the words, lemmas, morphosyntactic features, and concordances.
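To make the idea of a concordance concrete, here is a minimal sketch in Python of a keyword-in-context listing: every occurrence of a word is shown with a few words of context on either side. It is only an illustration, not Father Busa’s method. The function names (tokenize, concordance), the width parameter, and the short Latin sample are assumptions made for this example, and the sketch does no lemmatization or morphosyntactic annotation.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def concordance(text, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence."""
    tokens = tokenize(text)
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

# Short illustrative sample (an assumption for this sketch, not a corpus)
sample = "Deus est in omnibus rebus et anima est in corpore"
for left, key, right in concordance(sample, "in", width=2):
    print(f"{left:>20} [{key}] {right}")
```

On the sample sentence, the listing contains two entries for “in,” one per occurrence, each with two words of context on the left and up to two on the right.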
The ERC-funded European research project “LiLa: Linking Latin. Building a Knowledge Base of Linguistic Resources for Latin” was launched in 2018. Marco Passarotti, a former student of Father Busa, is the project’s principal investigator. Stepping beyond Thomas Aquinas, it deals with all the digital linguistic resources for Latin. Instead of the term “digital humanities,” Marco prefers to use “computational humanities.” He says he is a computational linguist, but it is hard not to consider him a “digital humanist” as well.

Between 1456 and 1468, the copyist Andrea Vitturi, a member of a noble family, dreamed of a “great” library and manually transcribed about ten texts, working at home in Venice or in a stronghold in Novigrad (where he was a keeper). Copyists and illuminators of the fifteenth century did not work only in the writing rooms of convents. Vitturi’s library contains poems of chivalry, laudes in vernacular, works of moral wisdom (such as the fourteenth-century Fior di virtù), and other works little known to us today.
Between 2005 and 2015, Google scanned more than 20 million books and published many of them on the web.
Previously, in 1998, Stanford University students Sergey Brin and Lawrence Page had published a paper entitled The Anatomy of a Large-Scale Hypertextual Web Search Engine, in which they presented the prototype of what was to become Google Search. Brin and Page’s dream was to index the whole web: to seamlessly read all the pages of the web (some billions of them), to extract words from documents, create indexes, and enable people (some billions of them) to use these indexes through a single search engine. “We chose our system name, Google, because it is a common spelling of googol, or 10^100 (10 to the 100th power) and fits well with our goal of building very large-scale search engines.”
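The steps named above (extract words from documents, create indexes, let people query them) can be sketched as a tiny inverted index in Python. This is a toy illustration under stated assumptions, not Google’s implementation: the function names (build_index, search) and the three sample “pages” are invented for the example, and the ranking of results, which the 1998 paper also describes, is left out entirely.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the sorted ids of documents containing every query word."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)

# Hypothetical sample pages, invented for this sketch
pages = {
    "page1": "Index Thomisticus and the history of concordances",
    "page2": "A history of web search engines",
    "page3": "Search engines build an index of words",
}
index = build_index(pages)
print(search(index, "history"))
print(search(index, "search engines"))
```

The index is built once, and each query then only intersects small sets of document ids instead of rereading every page, which is what makes the approach workable at a much larger scale.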


Being able to analyze vast amounts of digital content is Lev Manovich’s dream, as he discusses in his 2020 book Cultural Analytics. Not only texts, which seem to be a solved problem: “How can we see a billion images? What analytical methods can we bring to bear on the astonishing scale of digital culture—the terabytes of photographs shared on social media every day, the hundreds of millions of songs created by twenty million musicians on SoundCloud, the content of four billion Pinterest boards?”
His answer is to apply quantitative and qualitative analysis to all of this content in order to see and understand digital culture: “Before we can theorize digital culture, we need to see it, and, because of its scale, to see it we need computers.”

Of course, not only social media content. Great museums like the Rijksmuseum in Amsterdam create digital images of their works and make them available to everyone “to connect the treasures in its collection—as well as its knowledge about them—with as large and diverse an audience as possible.” With no restrictions on their reuse: “Our collection is for everyone. That’s why the Rijksmuseum makes its digitized collections and metadata available at the highest quality level. And we don’t ask for anything in return.” The dream of being able to let everyone learn about the works of art.

If Pico della Mirandola were to come back and visit Dino Buzzetti (Emeritus Professor of Medieval Philosophy at the University of Bologna), they could work together on an automatic analysis of a digital version of Pico’s works. Giovanni Pico could see copies of his manuscripts, transcribed using movable type (letters that move around on a bright white sheet), and indexes of all the words contained in his works. Pico had a prodigious memory; he knew some works of literature and philosophy by heart; he could quote the Divine Comedy backward and could do multi-digit arithmetic in his head without writing down a single number.

Dino’s computer can do these things too, and much more. It can connect to the Internet and access the web, the biggest “library” or “bookshop” in history (to use two words that Pico could understand): a library that grows constantly, collects the speeches and drawings of everybody in the world, and contains many other libraries and archives inside it. Pico speaks many languages (Latin, Greek, Hebrew, Aramaic, and French), but more than half of the documents on the web are written in a language that Dino must translate for his humanist friend. At some point, Dino would introduce Pico to “data science” (a new, wondrous kind of mathematics) applied to the humanities (philosophy, poetry, classical literature, painting, …). Can we use the methods of data science to answer the complex queries arising in the humanities? What possibilities and limitations do we have, and how can we overcome them? They would then start discussing the “Conclusiones nongentae in omni genere scientiarum” (Dino can create a new copy of the work in a split second), the 900 concepts inherent to the whole of human knowledge, and the semantic change of words over time. And at the end, Pico would exclaim: “Dino, you too are a humanist and a magician!”