Making Modernism Big
This semester, with the Modernist Versions Project (MVP) and the Maker Lab in the Humanities, I have been creating a repository of modernist texts for the purposes of text analysis and machine learning. The scope of this project requires a powerful infrastructure, including hardware, software, and technical support, provided in part by Compute Canada, a high-performance computing platform for universities and institutions across Canada. Last semester was spent aggregating a significant number of modernist texts (in TXT format) and learning the affordances of machine learning. The goal is to mobilize machine learning techniques to infer as yet unseen patterns across modernism.
But building a repository from web-based materials is tricky. As part of his work for the Routledge Encyclopedia of Modernism, Stephen Ross has created a thorough list of modernist authors that we will use to amass modernist texts housed across the web. These texts are not always “clean,” and they don’t always have sufficient metadata. And even when repositories like Project Gutenberg Australia have relatively clean text files, their selection is limited due to copyright (among other reasons). As a result, the version of modernism most people currently access through popular online repositories like Gutenberg Australia often doesn’t contain important works by notable women writers and people of colour. In Project Gutenberg Australia, there is no Nella Larsen, Zora Neale Hurston, or Langston Hughes. No Dorothy Richardson or Djuna Barnes. There is also no poetry. So we don’t get Ezra Pound or T. S. Eliot, either. Put differently, Gutenberg Australia’s version of modernism appears to be very different from the version most North American students will encounter in, say, a university course. For this reason, we are not relying on just one repository for this work, and we hope that scripts written in collaboration with Compute Canada will allow us to be comprehensive and equitable in our articulation of modernism, especially where difference is concerned. We also hope to fill in gaps where possible, either by adding our own texts to existing repositories or by conducting more research on modernist writers who are not (yet) discoverable on the web. In this regard, we are especially inspired by digital humanities practitioners Amy Earhart and Susan Brown.
To undertake this work, I’ve been meeting with Jentery Sayers, Stephen Ross, and Belaid Moa, who is one of Compute Canada’s HPC Specialists from the WestGrid sector. Belaid has been extremely helpful, guiding me through the WestGrid system and showing me how to develop a Python script that will grab modernist texts from an array of online repositories. Our script needs to locate the required texts within the HTML tree structure of the repository sites, download them, and store them in the Compute Canada database. To develop this script, I’ve been learning the syntax and semantics of arrays, functions, strings, and regular expressions. Code is a challenging language, and although I feel that I am now better versed in Python, there is much left to learn.
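To give a sense of what such a script involves, here is a minimal sketch of the "locate, then download" steps described above. It is not the script we wrote with Belaid. It assumes that a repository page links directly to plain-text files whose names end in .txt, and every name in it (TextLinkParser, find_text_links, download, the example.org page) is invented for illustration. It uses only Python's standard library.

```python
import re
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

# Assumption: any link whose path ends in .txt points at a text we want.
TXT_LINK = re.compile(r"\.txt$", re.IGNORECASE)


class TextLinkParser(HTMLParser):
    """Walk the HTML and collect the href of every <a> tag ending in .txt."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and TXT_LINK.search(urlparse(href).path):
            # Resolve relative links against the page they came from.
            self.links.append(urljoin(self.base_url, href))


def find_text_links(html, base_url):
    """Return absolute URLs for all .txt links found in the page."""
    parser = TextLinkParser(base_url)
    parser.feed(html)
    return parser.links


def download(url, dest_dir):
    """Fetch one file and save it under dest_dir (makes a network request)."""
    dest = Path(dest_dir) / Path(urlparse(url).path).name
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(url) as response:
        dest.write_bytes(response.read())
    return dest


# Invented sample page, standing in for a repository's author index.
SAMPLE_PAGE = """
<ul>
  <li><a href="/texts/novel-one.txt">Novel One</a></li>
  <li><a href="novel-two.TXT">Novel Two</a></li>
  <li><a href="/about.html">About</a></li>
</ul>
"""

links = find_text_links(SAMPLE_PAGE, "http://example.org/authors/index.html")
print(links)
```

In practice, each repository structures its pages differently, so the pattern and the parsing logic would need adjusting per site. Calling download on each link would then save the files to a local folder. How the files reach Compute Canada's storage is not shown here.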
How will this computational approach challenge the ongoing assumptions of literary scholars? The idea has been to start small (i.e., with twenty novels) and see how well the analysis scales up when more modernist texts are included. I chose the twenty novels based on their availability, their presence on university syllabi, and the MVP’s familiarity with them. This semester we have been running basic machine learning methods on this sample of texts in order to determine commonalities, differences, and tendencies across them. A year later, I still wonder whether the computer is as confused about modernism as I am.
I often find myself questioning why the computer thinks certain passages are important or interesting, why it comes to the conclusions that it does. The approach is already challenging my own assumptions about modernism, and I’m looking forward to sharing our preliminary results with larger communities of literary scholars.
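For readers curious about what "determining commonalities and differences" can look like in code, here is one very simple illustration: comparing texts by the words they share. This is not the method we are running on Compute Canada. It is a sketch of a common starting point (word counts compared with cosine similarity), and the function names and sample sentences are invented for the example.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count how often each word appears."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Compare two word-count tables: 0.0 means no shared words,
    values near 1.0 mean the same words in the same proportions."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented snippets standing in for full novels.
texts = {
    "a": "the sea and the sky",
    "b": "the sky and the sea",
    "c": "a train left the station",
}
counts = {name: word_counts(text) for name, text in texts.items()}

for x, y in [("a", "b"), ("a", "c")]:
    score = cosine_similarity(counts[x], counts[y])
    print(x, y, round(score, 3))
```

Texts "a" and "b" use exactly the same words, so they score as near-identical even though their word order differs, while "a" and "c" share only "the". That blindness to word order is one reason a computer's sense of what is similar can differ so sharply from a reader's.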
Image for this post care of Jana Millar Usiskin and Google Images.
This post originally appeared at http://maker.uvic.ca/big/. It was edited by Jentery Sayers, Karly Wilson, and the Maker Lab in the Humanities.
Jana Millar Usiskin
Graduate student, studying modernisms and the digital humanities at UVic.