Visualizing Communities in Mrs. Dalloway
“It was precisely twelve o’clock; twelve by Big Ben; whose stroke was wafted over the northern part of London; blent with that of other clocks, mixed in a thin ethereal way with the clouds and wisps of smoke, and died up there among the seagulls–twelve o’clock struck as Clarissa Dalloway laid her green dress on her bed, and the Warren Smiths walked down Harley Street. Twelve was the hour of their appointment. Probably, Rezia thought, that was Sir William Bradshaw’s house with the grey motor car in front of it. The leaden circles dissolved in the air.”
According to our network analysis, paragraph 349 is the most central paragraph in Mrs. Dalloway; that is, of all the paragraphs in the novel, it connects the greatest number of significant character nodes. That this paragraph falls at noon, in the middle of the novel’s day, suggests something of Woolf’s narrative skill, even if it was exercised unconsciously.
I have once again been collaborating with Compute Canada’s Belaid Moa, this time using network visualization methods to analyze Mrs. Dalloway, a novel deeply interested in the connections between people and the ways these connections are created, mediated, and sustained. Following the work of Franco Moretti in “Network Theory, Plot Analysis,” we have been developing a network model that might help us “to see the underlying structures of a complex object.” The messy day in June on which the events of Mrs. Dalloway take place makes for a particularly complex object to play with. Our method of play takes up the spirit of deformance as articulated by Jerome McGann and Lisa Samuels, in which we hope that “disordering [our] senses of the work would make us dwellers in possibility” (“Deformance and Interpretation”). Network analysis, then, requires a kind of disordering or “deformance” of the text, one that privileges characters and places at the expense of style and form. Despite these limitations, the model this analysis produces allows us to dwell on the possibility of further insight into the intricate social world of Mrs. Dalloway.
To produce our network model of Mrs. Dalloway, we used a plain text file of the novel, an HTML file with paragraph tags, a natural language processor, and network visualization software. We downloaded the plain text and HTML files from eBooks at Adelaide, and we used the natural language processor from Stanford University. With these, we built a script that first extracted all the characters, locations, and organizations from the plain text file. Then, using the paragraph tags in the HTML file, a second script inferred connections from the occurrence of those characters, locations, and organizations in each paragraph. Together, the two scripts produced a .gdf file listing the characters, places, and organizations along with a set of edges connecting them. At this point we had to bring in further human intervention. While the Stanford processor makes it possible to identify proper names automatically, it does make mistakes. In the case of Mrs. Dalloway, for example, it labeled “power” as a character. I suspect this was because of the sentence “Power was hers, position, income,” where “Power” is capitalized at the start of the sentence and occupies the subject position. The parser also treated “bill” as a character, likely because the word is capitalized when Sir William talks to Richard about getting the Bill passed in the Commons (“Commons” was not parsed as a character). Using Nano, a UNIX text editor, we removed these words and then removed every generated edge that contained them as a node.
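The paragraph-linking step can be sketched in Python. This is a minimal sketch, not our actual script: it assumes the named-entity step has already produced a list of names (the Stanford tagger itself is not called here), uses the standard library’s html.parser to split the HTML on its p tags, and links each paragraph to every listed name it contains. The function names, the p1, p2, … paragraph identifiers, and the “kind” column are hypothetical, and names are assumed to contain no commas, since GDF separates fields with commas.

```python
import re
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of every <p> element in an HTML document."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # None while the parser is outside a <p> element

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            self.paragraphs.append("".join(self._buffer).strip())
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def paragraph_entity_edges(html, entities, false_positives=()):
    """Link each paragraph (p1, p2, ...) to every listed entity it names.

    `entities` stands in for the named-entity recognizer's output; anything
    in `false_positives` is discarded first, so it never becomes a node.
    """
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    rejected = set(false_positives)
    names = [e for e in entities if e not in rejected]
    edges = []
    for number, text in enumerate(collector.paragraphs, start=1):
        for name in names:
            # Whole-word match, so that "Peter" would not match "Petersham".
            if re.search(r"\b" + re.escape(name) + r"\b", text):
                edges.append((f"p{number}", name))
    return edges


def to_gdf(edges):
    """Write paragraph nodes, entity nodes, and their edges in GDF format."""
    paragraphs = sorted({p for p, _ in edges}, key=lambda p: int(p[1:]))
    names = sorted({n for _, n in edges})
    lines = ["nodedef>name VARCHAR,kind VARCHAR"]
    lines += [f"{p},paragraph" for p in paragraphs]
    lines += [f"{n},entity" for n in names]
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR")
    lines += [f"{p},{n}" for p, n in edges]
    return "\n".join(lines)


if __name__ == "__main__":
    html = (
        "<p>Clarissa Dalloway went out to buy flowers.</p>"
        "<p>Power was hers, position, income.</p>"
        "<p>Peter Walsh thought of Clarissa Dalloway.</p>"
    )
    edges = paragraph_entity_edges(
        html, ["Clarissa Dalloway", "Peter Walsh", "Power"], false_positives=["Power"]
    )
    print(to_gdf(edges))
```

Filtering false positives such as “Power” and “Bill” before any edges are written has the same effect as the manual cleanup in Nano: those words never become nodes, so no edge can contain them.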
After preprocessing, we loaded the file into Gephi, network visualization software first developed at the University of Technology of Compiègne (UTC). The result looked something like this, a jumbled mess of nodes and edges:
Fortunately, Gephi has a number of tools for arranging and visualizing the nodes and edges in such a tangle.
First, we had to merge nodes that duplicated characters. We combined “Clarissa,” “Clarissa Dalloway,” and “Mrs. Dalloway” into a single node, “Clarissa Dalloway.” Some nodes were harder to combine. For example, we decided not to combine “Richard Dalloway” and “Richard Wickham,” because the second name links Clarissa, Peter, Sally, and Richard to each other and to a literary tradition that informs their relationships. We combined the “Charles” node with “Charles Morris” rather than with “Charles Darwin,” because Darwin is never referred to in the novel by his first name. Sally Seton becomes Lady Rosseter (and then becomes Sally Seton again), so the two names share a single node. “Dalloway” on its own can refer to more than one member of the family, so we could not merge it with Clarissa alone, but we did merge it with “Dalloways.” Merging nodes gave us a more coherent visualization, but perhaps at the cost of some of the nuance that the different names lend to character relationships in the novel.
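Merges like these can be expressed as an alias table applied to the edges before any projection. In this minimal sketch, edges are assumed to be (paragraph, name) pairs, and ALIASES and merge_nodes are hypothetical names; the table simply restates the merges described above, and anything not listed keeps its own node.

```python
# Hypothetical alias table restating the merges described above; names not
# listed (e.g. "Richard Wickham", "Charles Darwin") keep their own nodes.
ALIASES = {
    "Clarissa": "Clarissa Dalloway",
    "Mrs. Dalloway": "Clarissa Dalloway",
    "Charles": "Charles Morris",
    "Lady Rosseter": "Sally Seton",
    "Dalloways": "Dalloway",
}


def merge_nodes(edges, aliases=ALIASES):
    """Map every (paragraph, name) edge onto its canonical node.

    Duplicate edges produced by the merge are dropped, so a paragraph that
    names both "Clarissa" and "Mrs. Dalloway" links to her only once.
    """
    merged = []
    seen = set()
    for paragraph, name in edges:
        edge = (paragraph, aliases.get(name, name))
        if edge not in seen:
            seen.add(edge)
            merged.append(edge)
    return merged


if __name__ == "__main__":
    raw = [
        ("p1", "Clarissa"),
        ("p1", "Mrs. Dalloway"),
        ("p2", "Lady Rosseter"),
        ("p2", "Richard Wickham"),
        ("p3", "Charles Darwin"),
    ]
    print(merge_nodes(raw))
```

Collapsing aliases at this stage, rather than after projection, means two names for the same person in one paragraph yield a single link instead of a spurious connection between Clarissa and herself.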
For us, however, the most important Gephi tool is the Multimode Networks Projection, which lets us work with our data as a bipartite network. The tool lets us declare different types of nodes in the network (in ours, paragraphs and characters) and then automatically connects characters in the novel wherever they occur within the same paragraph. (To find the most “important” paragraph, we simply found the paragraph most associated with the algorithmically determined central characters.) Using the tool this way, though, requires certain assumptions about how relationships are structured in the novel. Here, our methodological premises are in dialogue with the Bloomsday project undertaken by Amanda Visconti, Rhonda Armstrong, Regina Higgins, Steven Hoelscher, and Pamela Andrews, who used crowdsourcing to gather data about character interactions in Ulysses. That project defined interactions as communications between characters, so while it did include a category for omniscient, narrated interaction, its focus was on more explicit communication between characters.
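The projection itself is simple to state: two characters are linked once for every paragraph they share. The sketch below assumes (paragraph, name) edge pairs and is not the Gephi plugin’s code. The second function is only one hypothetical reading of how to find the paragraph “most associated with the algorithmically determined central characters”: it sums a supplied centrality score over the characters each paragraph names, and the scores in the example are invented.

```python
from collections import defaultdict
from itertools import combinations


def project(edges):
    """Project (paragraph, character) edges onto a weighted character graph.

    Each pair of characters gains one unit of weight for every paragraph in
    which both appear; pairs are stored in alphabetical order.
    """
    by_paragraph = defaultdict(set)
    for paragraph, name in edges:
        by_paragraph[paragraph].add(name)
    weights = defaultdict(int)
    for names in by_paragraph.values():
        for a, b in combinations(sorted(names), 2):
            weights[(a, b)] += 1
    return dict(weights)


def most_central_paragraph(edges, centrality):
    """Hypothetical scoring: the paragraph whose characters' centrality sums highest."""
    scores = defaultdict(float)
    for paragraph, name in edges:
        scores[paragraph] += centrality.get(name, 0.0)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    edges = [
        ("p1", "Clarissa Dalloway"), ("p1", "Peter Walsh"),
        ("p2", "Clarissa Dalloway"), ("p2", "Peter Walsh"), ("p2", "Sally Seton"),
        ("p3", "Septimus Warren Smith"),
    ]
    print(project(edges))
    # Toy centrality scores, invented for illustration only.
    toy_scores = {"Clarissa Dalloway": 0.4, "Peter Walsh": 0.3, "Sally Seton": 0.1}
    print(most_central_paragraph(edges, toy_scores))
```

A paragraph naming only one character contributes no character-to-character edge at all, which is one reason solitary passages drop out of the projected network.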
For our purposes, though, we might also think of paragraphs as packets of information, usually grouped together by the author to communicate a certain overall message or theme, or to serve a certain purpose. When looking at our graphs, it is important to consider why characters might appear in the same paragraph. They might appear together because they are speaking with or of each other, sharing information through language: consider how Clarissa and Peter update each other on recent life events, each giving the other new information in the exchange. They might appear together because they share experiences, whether immediate sensory experience of a place, of people, or of events in the novel; think of the people in the street who watch the passing motor car with its mysterious passenger. But perhaps the most challenging kind of paragraph for thinking about character connection is the kind Gephi identified as the most central, where an omniscient narrator links characters through themes, symbols, or motifs in the novel.
At present, our method does not distinguish among the different kinds of connections between characters. Characters speaking to each other, speaking of each other, observing each other, or acknowledging each other: all of these interactions, listed by the Bloomsday project, would have to be differentiated by a human critic and appended to our data. Instead, our visualization traces the more tenuous connections created by characters’ proximity within the text. Pronoun references also do not appear in the dataset, so we can only see where characters interact when they are named. This limitation draws attention to the difference between a character referred to by name and one referred to by pronoun or by abstract qualities. What does it mean when one character refers to another with a pronoun, or by attributing qualities to them? Perhaps one of the most significant examples in Mrs. Dalloway comes when the Bradshaws speak of Septimus at the party. When Clarissa reflects on his death, she does not know his name, but “she felt somehow very like him—the young man who had killed himself.” Here, Clarissa knows Septimus only as a representative of a certain group, defined by youth and gender. For Clarissa, at this moment, Septimus might be less a character than an idea. Our visualization does not capture this kind of connection.
Still, the Multimode Networks Projection tool lets us see the connections among the character names that do appear in the same paragraphs:
Gephi also lets us apply algorithms that measure the weight of particular nodes in the network. For example, using Google’s PageRank algorithm, we can estimate the long-run probability that a traveller clicking at random along the network’s edges will land on a given node. Unsurprisingly, a traveller in the Mrs. Dalloway network will often click back to “Clarissa.”
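For readers curious about what PageRank computes, here is a minimal power-iteration sketch over a weighted character graph. It assumes edges are given as a dictionary mapping character pairs to the number of paragraphs they share, with no self-loops; the damping factor of 0.85 is the conventional value, and Gephi’s own implementation handles convergence and edge weights in its own way, so its scores may differ from this sketch.

```python
from collections import defaultdict


def pagerank(weights, damping=0.85, iterations=100):
    """PageRank by power iteration on an undirected, weighted graph.

    `weights` maps (a, b) character pairs to positive edge weights. At each
    step the random traveller follows an edge with probability `damping`,
    choosing heavier edges proportionally more often, and otherwise jumps to
    a node chosen uniformly at random.
    """
    neighbours = defaultdict(dict)
    for (a, b), w in weights.items():
        neighbours[a][b] = neighbours[a].get(b, 0) + w
        neighbours[b][a] = neighbours[b].get(a, 0) + w
    nodes = sorted(neighbours)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            strength = sum(neighbours[v].values())  # weighted degree of v
            for u, w in neighbours[v].items():
                new_rank[u] += damping * rank[v] * w / strength
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Toy co-occurrence counts, invented for illustration only.
    toy = {
        ("Clarissa Dalloway", "Peter Walsh"): 3,
        ("Clarissa Dalloway", "Sally Seton"): 1,
        ("Clarissa Dalloway", "Richard Dalloway"): 2,
        ("Peter Walsh", "Sally Seton"): 1,
    }
    for name, score in sorted(pagerank(toy).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.3f}")
```

Because every node in a projected graph has at least one neighbour, the scores always sum to 1, which is what lets them be read as the probability of finding the traveller at each character.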
But consider also the relative centrality of Peter in this visualization, and the surprising importance of God. Consider, too, the importance of the Bradshaws in the network. William Bradshaw’s profession allows him to bridge the middle-class world and the working class, as represented by the Warren Smiths. When we break the network down by modularity, William Bradshaw falls into the Warren Smith community (highlighted in green). That community also includes Shakespeare, George Bernard Shaw, Antony and Cleopatra, and Keats. God, on the other hand, belongs to Miss Kilman:
Of course, we can adjust the resolution of the modularity algorithm to redraw the boundaries between communities, but it would take considerable readjustment to move the Bradshaws and the historical and cultural figures out of the Warren Smith community, since they appear together in a number of paragraphs.
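To make the idea of resolution concrete, the sketch below computes the modularity Q of a given partition with a resolution parameter γ, using a toy graph of two triangles joined by one bridge edge. In this formulation (the one NetworkX uses for its resolution argument), values of γ below 1 favour larger communities and values above 1 favour smaller ones; Gephi’s own Resolution setting is defined differently, so its values should not be read as γ here. The sketch only scores a partition; it does not search for the best one, which is the job of the community-detection algorithm Gephi runs.

```python
from collections import defaultdict


def modularity(weights, community, resolution=1.0):
    """Modularity Q of a partition of an undirected, weighted graph.

    Q = sum over communities c of  L_c / m - resolution * (d_c / (2 * m)) ** 2
    where m is the total edge weight, L_c the weight of edges with both ends
    in c, and d_c the summed weighted degree of c's members.
    """
    m = sum(weights.values())
    inside = defaultdict(float)
    degree = defaultdict(float)
    for (a, b), w in weights.items():
        degree[community[a]] += w
        degree[community[b]] += w
        if community[a] == community[b]:
            inside[community[a]] += w
    return sum(inside[c] / m - resolution * (degree[c] / (2 * m)) ** 2 for c in degree)


if __name__ == "__main__":
    # Toy graph: two triangles (A-B-C and D-E-F) joined by a single edge C-D.
    toy = {("A", "B"): 1, ("A", "C"): 1, ("B", "C"): 1, ("C", "D"): 1,
           ("D", "E"): 1, ("D", "F"): 1, ("E", "F"): 1}
    split = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
    whole = {v: 0 for v in "ABCDEF"}
    for gamma in (1.0, 0.2):
        print(gamma, round(modularity(toy, split, gamma), 3),
              round(modularity(toy, whole, gamma), 3))
```

At γ = 1 the two-triangle split scores higher than lumping every node together, but at γ = 0.2 the single community wins. Tightly linked clusters, like the Bradshaws and the Warren Smiths sharing paragraphs, tend to stay together across a wide range of settings.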
Clearly there are limitations to this kind of analysis, provocatively outlined by Kathryn Schulz in her New York Times response to Moretti. Considering how important a role time plays in Woolf’s work, our visualizations seem curiously to flatten out the temporality of the novel’s relationships. Network visualizations have also been critiqued for attempting to quantify the qualitative experience of reading. And, as I mentioned above, our current tools cannot capture and identify pronouns. Yet there remains something oddly intriguing about the networks Gephi produces, and the visualization seems to get a lot of things right. As Lisa Rhody remarks in her response to Schulz and Moretti, results that seem obvious “act as a type of control,” showing that the model you are working with is functioning properly. And if network analysis can draw attention to the significance of God in modernist networks, or show us something new about the cultural segregation of literature in Mrs. Dalloway and about Septimus as a cultural producer, perhaps that is reason enough to continue modelling and analyzing modernist novels as networks.
By the way, when I tried to remove the “Clarissa” node, I broke Gephi …
McGann, Jerome and Lisa Samuels. “Deformance and Interpretation.” New Literary History 30.1 (1999): 25-56. JSTOR. Web. 14 July 2014.
Moretti, Franco. “Network Theory, Plot Analysis.” New Left Review 68 (March-April 2011). New Left Review. Web. 14 July 2014.
Rhody, Lisa. “A Method to the Model: Responding to Franco Moretti’s Network Theory, Plot Analysis.” Magazine Modernisms: Dedicated to Modern Periodical Studies. 22 August 2011. Web. 14 July 2014.
Schulz, Kathryn. “What Is Distant Reading?” The New York Times. 24 June 2011. Web. 14 July 2014.
Jana Millar Usiskin
Graduate student, studying modernisms and the digital humanities at UVic.