How Computers Read Computer-Generated Novels

2:00 PM, Tuesday, August 14, 2018 (1 hour 15 minutes)
National Novel Generation Month (NaNoGenMo) is a recurring annual event patterned after NaNoWriMo (National Novel Writing Month). In the “Gen” version, participants attempt to write computer code that generates a 50,000-word novel. What started as a joke by Darius Kazemi on Twitter in 2013 has since drawn between 80 and 190 participants each year, resulting in well over 300 completed novels. But how readable are these novels, and what does it mean to read the prose of a generated text? Are there specific genres, or appropriated authors, whose work recurs frequently? Do other trends, patterns, or themes emerge when taking a wide view of these works? Taken together, is there stylistic evidence of computational generation in the output of these programs?

In this paper, I propose a course of study into the textuality of computer-generated novels, specifically the corpus of work generated for NaNoGenMo. Given the scope of this corpus, my intention is to use text analysis techniques such as topic modeling, frequency analysis, stylometrics, and other varieties of machine reading to explore these questions about the textual characteristics of computer-generated fiction.
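To make the stylometric side of this concrete, here is a minimal, pure-Python sketch of one common machine-reading measure, Burrows's Delta, which scores how far a target text sits from each text in a corpus using z-scored relative frequencies of the corpus's most frequent words. The toy strings below are invented stand-ins for actual NaNoGenMo novels; a real study would use full texts and a much larger most-frequent-word list.

```python
import re
import statistics
from collections import Counter

def rel_freqs(text, vocab):
    """Relative frequency of each vocabulary word in one text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return {w: counts[w] / len(tokens) for w in vocab}

def burrows_delta(corpus, target, n_words=10):
    """Burrows's Delta: mean absolute difference of z-scored relative
    frequencies of the corpus's most frequent words, yielding one
    distance from the target text to each corpus text."""
    pooled = Counter(re.findall(r"[a-z']+", " ".join(corpus).lower()))
    vocab = [w for w, _ in pooled.most_common(n_words)]
    profiles = [rel_freqs(t, vocab) for t in corpus]
    # standard deviation of each word's frequency across the corpus
    sds = {w: statistics.stdev(p[w] for p in profiles) for w in vocab}
    usable = [w for w in vocab if sds[w] > 0]
    tgt = rel_freqs(target, vocab)
    # (tgt_z - p_z) simplifies to (tgt[w] - p[w]) / sd for each word
    return [
        statistics.mean(abs((tgt[w] - p[w]) / sds[w]) for w in usable)
        for p in profiles
    ]

# Toy corpus of three invented "novels"; the target is the first text,
# so its distance to itself should be zero.
corpus = [
    "the novel reads the novel and the novel reads on",
    "a machine writes a machine and a machine writes on",
    "the machine reads the novel and the machine writes on",
]
print(burrows_delta(corpus, corpus[0]))
```

A measure like this could cluster NaNoGenMo novels by the source texts they remix, or flag generated novels whose function-word profiles diverge sharply from human-written prose.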

These data-driven techniques are useful for providing certain kinds of insights into literary texts. For example, Allen Riddell and N. Katherine Hayles used statistical analysis and chi-squared comparisons against the Brown linguistic corpus to find words that were unusually frequent or infrequent in Mark Z. Danielewski’s Only Revolutions. Their analysis revealed previously unnoticed lipogrammatic constraints in Danielewski’s work: it contains neither the word “in” nor (appropriately) the word “or.” This insight is the foundation of their reading of Only Revolutions as the converse of his previous work, House of Leaves: since House of Leaves is a book of “obsessive interiority,” it makes sense that its converse would categorically exclude the possibility of being inside.
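The comparison Riddell and Hayles made can be sketched in pure Python: for a given word, build a 2×2 contingency table of its count versus all other tokens in a target text and a reference corpus, and compute the chi-squared statistic; a large value marks the word as unusually frequent or infrequent. The two short strings below are toy stand-ins for Only Revolutions and the Brown corpus, not their actual text.

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def chi_squared(word, target_counts, ref_counts):
    """Chi-squared statistic for a 2x2 contingency table comparing one
    word's frequency in a target text against a reference corpus."""
    a = target_counts[word]                 # word occurrences in target
    b = sum(target_counts.values()) - a     # all other tokens in target
    c = ref_counts[word]                    # word occurrences in reference
    d = sum(ref_counts.values()) - c        # all other tokens in reference
    n = a + b + c + d
    chi2 = 0.0
    # each observed cell with its row total and column total
    for obs, row, col in ((a, a + b, a + c), (b, a + b, b + d),
                          (c, c + d, a + c), (d, c + d, b + d)):
        expected = row * col / n
        if expected:
            chi2 += (obs - expected) ** 2 / expected
    return chi2

# Toy example: "in" never appears in the target but is common in the reference.
target = Counter(tokenize("the house turns outward and the road runs on"))
reference = Counter(tokenize("the leaves fall in the house in the dark in the hall"))
print(chi_squared("in", target, reference))
```

With real corpora one would also apply a significance threshold (and a correction for testing many words at once) before calling any single word's absence a deliberate constraint.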

Voyant Tools, a suite of web-based interfaces for text analysis and visualization, carries the tagline “see through your text,” and the button to submit a corpus for analysis bears the word “Reveal.” In this way, text analysis is positioned as a method for looking through the ways in which a text beguiles its readers with meaning and finding the author or other textual factors “behind the curtain” of fiction.

However, for works generated in the spirit of NaNoGenMo, authorship and intentionality necessarily mean something different. Randomness is a common compositional element, for one thing, and even for those works that involve some form of algorithmic decision-making, the subjective position of the algorithm is rigorously and transparently determined by the book’s source code. Because the ideas that machine-assisted text analysis seeks to discover about a work are already made perfectly clear for a NaNoGenMo work by way of its source code, what can such analysis reveal by comparing several texts together? Furthermore, many NaNoGenMo novels work by appropriating and remixing earlier texts, such as Jane Austen’s novels; will machine-assisted textual analysis of these derivative works reproduce, obfuscate, or amplify the data from the original texts? Whichever the case, this process should reveal insights into the strengths and limitations of computer-assisted text analysis and, in doing so, help “mind the gap” between digital humanities and electronic literature.
University of Mary Washington
Associate Professor