Corpus linguistics and the digital humanities: Looking at fictional and real speech

Professor Michaela Mahlberg and Viola Wiegand from the CLiC Project will be giving a talk as part of the seminar series of Aston University's Centre for Critical Inquiry into Society and Culture (CCISC) on 30 April 2019. Please register via Eventbrite.

Michaela Mahlberg & Viola Wiegand

30 April, 16.00 - 17.30, Room MB708C

While corpus research has traditionally focused on non-literary texts, there has been increasing interest in the study of fiction, which is often covered under the umbrella term ‘corpus stylistics’ (Semino and Short 2004). To account as fully as possible for features of literary texts, we need to create new tools and develop methodologies that are tailored to the task at hand. There are numerous digital humanities tools for the study of fiction, but similarities and overlap with corpus linguistic concerns are rarely brought to the fore. In this talk, we illustrate key functionalities of the web application CLiC and its latest release, CLiC 2.0 (March 2019). CLiC has been specifically designed for the corpus linguistic study of narrative fiction. The CLiC corpora of 19th-century fiction comprise over 140 books and 16 million words. For all CLiC texts, direct speech and specific places around speech have been marked up (Mahlberg et al. 2016). Hence, CLiC can run searches within and across defined textual subsets and support the analysis of features of narration and fictional speech. An important question is how a range of features and patterns in fiction can be brought together in a coherent theoretical framework. The search for such a framework also highlights where corpus linguistics and the digital humanities can come more closely together. Our suggestion will focus on a lexically driven approach that can account for fictional worlds while at the same time highlighting the fuzzy boundaries between fiction and the real world. We explore these boundaries with a focus on speech patterns.


Mahlberg, M., Stockwell, P., Joode, J. de, Smith, C., & O’Donnell, M. B. (2016). CLiC Dickens: Novel uses of concordances for the integration of corpus stylistics and cognitive poetics. Corpora, 11(3), 433–463.

Semino, E., & Short, M. (2004). Corpus Stylistics: Speech, Writing and Thought Presentation in a Corpus of English Writing. London: Routledge.