“Ill shapen sounds, and false orthography”: A Computational Approach to Early English Orthographic Variation

Anupam Basu

While certain broad patterns of “standardization” in early modern printing practices have been studied by linguists, the general assumption has been that early English orthography moves from confusion and chaos to the emergence of a standard around the mid-seventeenth century, presumably influenced by the writings of a core set of linguists, grammarians, and teachers. In this paper I study the gradual emergence of this standard and argue that it is the product of a long, deeply contested, and yet fundamentally structured set of transformations that owes more to the mechanical, reiterative processes of setting and redistributing type than to the prescriptive guidelines of the orthoepists. Using a database of nearly twenty-five million n-grams extracted from the EEBO-TCP corpus, I demonstrate that the underlying changes that constitute this long process of standardization resolve into very specific shifts in the use of individual graphemes. English spelling doesn’t emerge fully formed from chaos into order. Rather, distinctive changes happen within given periods, and standardization is the layered aggregation of these multiple conventions. I also introduce an algorithm for the large-scale extraction and analysis of orthographic patterns. I hope to draw the attention of literary scholars and print historians to an area they have largely disregarded as the domain of linguistic arcana. The history of orthography, I argue, is inextricably intertwined with the material history of print in England, and in identifying some of the pressure points that initiate paradigmatic shifts, I hope to enable a dialogue between the linguistic model of orthographic change and a literary-historical model that asks what factors – intellectual debates, authorial preferences, material practices – might drive such change.

About the author(s)

Anupam Basu is an Assistant Professor of English at Washington University in Saint Louis, where he is currently developing a web portal that seeks to make the EEBO-TCP corpus tractable for large-scale computational analysis. He is also working on a monograph on the representation of crime and social change in early modern England.

