From Samuel Pickwick and Oliver Twist to David Copperfield and Ebenezer Scrooge, Charles Dickens created some of the best-known characters in fiction. He is heralded as a literary genius whose works have been adapted and retold on stage and screen for generations and are woven into the fabric of British culture and identity.

How can we shed new light on Dickens 150 years after his death?

When a charity noted a rise in poverty in 2018, it warned of a return to ‘Dickensian poverty’, summoning Dickens’ evocative descriptions of the hardships his characters faced in Victorian Britain.

“That is a common example. I would always hear people refer to certain words or descriptions as being ‘Dickensian’ and thought, as a corpus linguist, why not look for the evidence? We can systematically study language patterns and see if they really are Dickensian,” explains Professor Michaela Mahlberg, Director of the Centre for Corpus Research at the University of Birmingham.

And so, the CLiC Dickens project was born. At the heart of the project was the free-to-use concordance tool, CLiC, a web app that was designed to tackle such questions.

“We use language to understand and shape the world we live in,” says Professor Mahlberg. “Dickens was a master at doing just that. His works are much studied, analysed and debated but, through this tool, we’ve been able to find out what more the language can tell us about the everyday human interactions, how people talked and even their body language.”

"Perhaps there's more meaning in them words than you suspect." "Perhaps there is," said the strange man, gruffly - Barnaby Rudge

The CLiC idea started with a prototype created jointly with colleagues at the University of Liverpool. In 2013, funding from the Arts and Humanities Research Council (AHRC) gave life to the CLiC Dickens project at the University of Nottingham, with Professor Peter Stockwell as Co-Investigator. CLiC moved to Birmingham when Professor Mahlberg joined the University in 2015.

“We know plenty about the more exaggerated characters in Dickens’ work,” adds Professor Mahlberg. “He left, for instance, number plans in which he would explicitly outline noteworthy elements of his plots and characters. While the CLiC tool allows us to review texts to verify our understanding of those standout characteristics, there is much more to be found by mining the whole canon of work and looking closer at the ‘everyday’ interactions between characters. There are no number plans for the ‘normal’.”


CLiC started with Dickens’ fifteen novels. Uniquely among concordance tools, CLiC lets users search direct speech and non-speech (narration) separately, allowing more detailed analysis of what the characters say.

“That is what is at the heart of corpus linguistics. So much of his skill happens without you noticing explicitly, just as much of language happens unconsciously. We are not aware of how often we say ‘I don’t know’, or even ‘like’. A concordance tool can do that, it can quantify language use and with CLiC we wanted to really put ‘speech’ in the spotlight.”

“People reference Dickens’ great ear for mimicking the spoken language of the era, but what more is there? There is a natural tendency to look at what is unique to the characters in his work, but there are equally interesting observations to be made on what Dickens’ characters have in common with ‘normal’ people. To be able to do that you need the mass of data, every word, and an ability to analyse that in an effective manner.”
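For readers curious about what a concordance tool does under the hood, here is a minimal Python sketch. It is not CLiC’s code, and CLiC’s own handling of speech is not described in this article: the sketch simply assumes, for illustration, that direct speech is enclosed in straight double quotes, and the names `concordance`, `speech_spans` and `Hit` are hypothetical. It lists every occurrence of a word with its surrounding context (a ‘key word in context’ view) and flags whether each hit falls inside speech or narration.

```python
import re
from typing import NamedTuple


class Hit(NamedTuple):
    left: str        # context to the left of the search word
    node: str        # the matched word itself
    right: str       # context to the right of the search word
    in_speech: bool  # True if the match falls inside quoted speech


def speech_spans(text: str) -> list[tuple[int, int]]:
    # Assumption for this sketch: direct speech is enclosed in straight
    # double quotes. Real novels need far more careful handling.
    return [m.span() for m in re.finditer(r'"[^"]*"', text)]


def concordance(text: str, word: str, width: int = 30) -> list[Hit]:
    """Return every whole-word, case-insensitive match of `word` with context."""
    spans = speech_spans(text)
    hits = []
    for m in re.finditer(rf"\b{re.escape(word)}\b", text, flags=re.IGNORECASE):
        start, end = m.span()
        inside = any(s <= start < e for s, e in spans)
        hits.append(Hit(text[max(0, start - width):start], m.group(0),
                        text[end:end + width], inside))
    return hits


if __name__ == "__main__":
    passage = ('"Perhaps there\'s more meaning in them words than you suspect." '
               '"Perhaps there is," said the strange man, gruffly.')
    for hit in concordance(passage, "perhaps") + concordance(passage, "said"):
        label = "speech" if hit.in_speech else "narration"
        print(f"{hit.left:>30} [{hit.node}] {hit.right:<30} ({label})")
```

Run on the Barnaby Rudge line quoted above, both occurrences of ‘perhaps’ are flagged as speech and ‘said’ as narration. Counting hits per category, for example `sum(h.in_speech for h in hits)`, is the kind of simple quantification a concordance makes possible across a whole corpus.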

Active preparations were made for the day on which some of its treasures were to be publicly displayed - Little Dorrit

Since its inception, the CLiC tool has grown to include other works of fiction, mostly from the nineteenth century. At the time of writing, CLiC contains over 150 books and 16 million words, available across five corpora: the corpus of Dickens’s Novels, a 19th Century Reference Corpus, a corpus of 19th Century Children’s Literature, the African American Writers corpus, and Additional Requested Texts. Users can scour the works of HG Wells, Anne Brontë, Arthur Conan Doyle and more.

Books available in CLiC:

  • Charles Dickens: 15 books, 3,833,544 words
  • 19th Century Reference Corpus: 29 books, 4,512,568 words
  • 19th Century Children’s Literature: 71 books, 4,441,808 words
  • African American Writers 1892-1912: 8 books, 520,268 words
  • Additional Requested Texts: 31 books, 3,424,164 words

As its scope has grown, so has its influence. CLiC has become a vital tool for corpus linguists, literary scholars and students of Victorian culture and history, and is used in about 100 countries.

Professor Mahlberg sees the tool as an example of how the gap between linguists and literary scholars can be bridged through digital developments.

“Look at how most English departments are set up. More often than not you have got English Language and English Literature as separate sections. This sort of tool combines the two disciplines.

“It is more computational than traditional ‘close reading’, but less distant than some of the digital humanities tools, where you can feed in data and not have to get that hands-on with the text itself. By using computational tools but requiring a closeness to the text itself it helps to bridge that gap, and that promises to benefit the experts in both fields. That is why the initial feedback from our peers has been so fantastic.”

Crucially for Professor Mahlberg, the tool is user-friendly. The update to CLiC 2.0 has further improved the user experience and has helped CLiC reach audiences beyond the academic sphere.

“One of the most pleasing things of all is that we have been able to take it into schools,” says Professor Mahlberg. “A lot of digital tools are designed for academic research, and that often excludes schools and children by its very nature. Seeing CLiC being used by teachers and students, seeing them explore the world of Dickens and other Victorian literature in new ways, has been incredibly rewarding.”

Tom triumphed very much in this discovery, and rubbed his hands with great satisfaction - Martin Chuzzlewit

Sharing the CLiC tool with schools was always the intention. Once the contact was established, the team were delighted to see teachers’ enthusiasm and creativity in integrating the tool into their teaching.

One of the project’s outputs has been a regular series of blog posts by academics, students and teachers.

In one such post, Birmingham-based teacher Claire Stoneman explored Robert Louis Stevenson’s classic The Strange Case of Dr Jekyll and Mr Hyde. Using CLiC to analyse the 51 occurrences of the word ‘door’, she examined the importance of doors, and of the people who interact with them, in the novella.

"My initial examination of the noun door yielded fascinating results. As I thought, most of the occurrences of door at the very start of the novella were initially linked to the building (so, it could be argued, Hyde), and then to Utterson, and partially Enfield. But certainly from Chapter 5 onwards, it is the servants who open doors, close doors, refuse entry (some under their master’s orders, some not). Even in Chapter 8, when Utterson has decided he will break down Jekyll’s laboratory door, he only does so after the encouragement of the butler, Poole."


In another blog post, English teacher Lorraine Adriano describes how she used CLiC in the classroom.

"Sharing findings and encouraging others to take investigations a step further increased a sense of connection with the text and the skill of Dickens in presenting his settings, characters and themes. Colleagues who observed parts of the lessons noted the enthusiasm for the project and pupils reported that they had watched a production of A Christmas Carol by a visiting theatre company with a greater interest in whether the representation was faithful to what they viewed as Dickens’s intentions."

“These are really good examples of what can be done in a school setting,” says Professor Mahlberg. “The stories are still so prevalent in our culture. We talk about having a mind like Sherlock, being miserly like Scrooge, being a bit of a ‘Jekyll and Hyde’ character. It speaks to people, and it allows passionate readers to go further into the literature.”

For Professor Mahlberg, being able to explore Charles Dickens in new ways has enhanced her own enjoyment and reading of the text.

“For example, I looked at words related to fire in A Christmas Carol. The patterns are different at the start and the end. At the outset, Scrooge does not want Bob to put more coal on the ‘small’ fire because he is so miserly. At different points in the story we see characters interacting with one another, with fire and fireplaces crucial to the situation, illustrating community and family. So just by tracing that one word through the story you can see so much about the structure of the narrative, and also reflect on the contexts in which our social interactions take place.”

Across 2019 and 2020, the CLiC team has so far produced five publications that shed new light on digital approaches to studying fiction, and on the analysis of body language and conversational speech in particular.

The group’s research has delved into ‘normal’ descriptions of body language, which can only be fully understood by looking at how frequently they occur. Eyes in particular are often considered a mirror of a character’s soul. The patterns of eye language that the group identified relate to the direction and rigidity of gaze: an example such as ‘his eyes fixed on the ground’ can suggest an isolated mental state.

Fictional speech has been another important focus of the group’s research. The CLiC web app makes it possible to study the language of direct speech separately from narrated text. By comparing this data with authentic spoken language from the 20th century, the team was able to identify phrases that are more typical of 19th-century speech (‘very much obliged to you’, ‘I beg your pardon sir’) as well as phrases in the novels that are still prevalent in modern, authentic spoken language (such as ‘it seems to me that’ and ‘what do you think of’).
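The comparison described above can be approximated, very roughly, by asking how often each short phrase (an n-gram, here three words long) occurs in fictional speech relative to a spoken reference corpus. The sketch below is hypothetical: the function names, the crude tokeniser, the add-0.5 smoothing and the simple ratio of relative frequencies are illustrative choices, not the statistic the CLiC team used, which this article does not specify.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Deliberately crude: lowercase words, keeping internal apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    # Note: in this sketch n-grams may run across sentence boundaries.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def compare_phrases(fiction: str, spoken: str, n: int = 3,
                    smoothing: float = 0.5) -> list[tuple[tuple[str, ...], float]]:
    """Rank n-grams by relative frequency in fiction vs a spoken reference.

    A ratio above 1 means the phrase is proportionally more common in the
    fictional speech; below 1, more common in the spoken corpus. Adding
    `smoothing` to every count avoids division by zero for unseen phrases.
    """
    fic_tokens, spk_tokens = tokenize(fiction), tokenize(spoken)
    fic, spk = Counter(ngrams(fic_tokens, n)), Counter(ngrams(spk_tokens, n))
    fic_total, spk_total = max(len(fic_tokens), 1), max(len(spk_tokens), 1)
    ratios = {
        g: ((fic[g] + smoothing) / fic_total) / ((spk[g] + smoothing) / spk_total)
        for g in fic.keys() | spk.keys()
    }
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    fiction = "I beg your pardon sir. I beg your pardon."
    spoken = "It seems to me that it seems to me."
    for phrase, ratio in compare_phrases(fiction, spoken)[:3]:
        print(" ".join(phrase), round(ratio, 2))
```

With real corpora of millions of words, a researcher would normally replace the raw ratio with a significance or effect-size measure and filter out rare phrases; the toy strings here only show the mechanics.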

For Professor Mahlberg, the plan is to evolve the CLiC platform further, and continue bridging the gap between all parties with a mutual interest in preserving and exploring the works of Dickens and other authors of the nineteenth century.

“These stories are enshrined in our culture, Dickens seems like he belongs to people,” says Professor Mahlberg. “How better than to create something that helps anyone to access even more of his language and the literary context of his time?”

