Exploring frequency and textual units

Metallurgy and Materials Building (G6 on the university campus map)
Monday 3 October 2016 (12:00-13:00)
  • Centre for Corpus Research seminar

Speaker: Michael Barlow, University of Auckland

Venue: Metallurgy and Materials Building, Room GC19

Highlighting lexicogrammatical patterns in a KWIC format, while beneficial for many studies, inevitably backgrounds the broader connections with text or discourse structure. Based on an investigation in Barlow (2004), some approaches in Hoey (2005), and the use of a customised database in Römer & O’Donnell (2010), the program WordSkew was developed to link data on the frequency of occurrence of words and phrases with specified aspects of text structure. WordSkew thus modifies the typical corpus search routines to provide frequency information for specific positions or segments of sentences, paragraphs, and other units.

While we know that words or phrases are not uniformly distributed within a text, we have very little information on how the clustering of words relates to positions in particular units of text structure. The software allows us to pose at least two kinds of research questions. The first is a distribution question: how is a particular word or phrase (or other abstract category such as part-of-speech class) distributed with respect to different positions in the units of discourse? The skew part of WordSkew refers to the assumption that the more interesting patterns of distribution across sentences, paragraphs, or other text units will not be uniform but biased towards the beginning, middle, or end of the text unit. The second question leads to a more nuanced analysis of usage, broadly construed: we can investigate how the meaning or function of a specific word or phrase varies as we look at different parts of a discourse. The latter question can be expanded to examine the distribution of words or phrases in a literary text such as a novel.
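
The distribution question can be made concrete with a small sketch. The following Python fragment is not WordSkew's implementation; it is a minimal illustration that assumes a naive regular-expression tokeniser, sentences already split into a list, and a hypothetical helper named positional_counts. It divides each sentence into equal segments and counts how often a target word falls in each one.

```python
import re
from collections import Counter


def positional_counts(sentences, target, n_segments=3):
    """Count occurrences of `target` in each of `n_segments` equal slices
    of every sentence (hypothetical helper, not part of WordSkew)."""
    counts = Counter()
    for sentence in sentences:
        # Naive tokenisation: lowercase alphabetic strings and apostrophes.
        tokens = re.findall(r"[a-z']+", sentence.lower())
        for i, token in enumerate(tokens):
            if token == target:
                # Map token index i to a segment index in 0..n_segments-1.
                segment = i * n_segments // len(tokens)
                counts[segment] += 1
    return [counts[s] for s in range(n_segments)]


# Invented example sentences, for illustration only.
sentences = [
    "However, the results were mixed.",
    "The results, however, were mixed.",
    "However, we found no effect.",
    "We found no effect, however.",
]

print(positional_counts(sentences, "however"))  # [2, 1, 1]
```

In this toy data the counts for "however" are skewed towards the first third of the sentence. The same idea extends to paragraphs or whole texts by changing the unit that is segmented.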

To illustrate how WordSkew works, I present some exploratory studies showing the kinds of patterns that emerge when textual units are taken into account in a corpus-based study.

About the speaker

Michael Barlow received his PhD in Linguistics from Stanford University. He is currently Associate Professor in the Applied Language Studies and Linguistics Department at the University of Auckland in New Zealand and divides his time between Auckland and Houston in the United States. Dr. Barlow has written books and articles on corpus linguistics and regularly gives presentations and workshops at institutions and conferences around the world. He has created several text analysis programs, including the concordancers MonoConc and ParaConc and a collocation extraction program, Collocate. A recently developed program, WordSkew, is designed to apply corpus analysis techniques while taking the structure of texts into account.


  • Barlow, M. (2016). WordSkew: Linking corpus data and discourse structure. International Journal of Corpus Linguistics 21(1), 104-114.
  • Barlow, M. (2004). Software for corpus access and analysis. In J. Sinclair (ed.), How to use corpora in language teaching. Amsterdam: John Benjamins.
  • Hoey, M. (2005). Lexical Priming. A New Theory of Words and Language. London: Routledge.
  • Hoey, M. & O'Donnell, M. B. (2015). Examining associations between lexis and textual position in hard news stories, or according to a study by... . In Groom, N., Charles, M. & John, S. (eds) Corpora, Grammar and Discourse. In honour of Susan Hunston. Amsterdam: John Benjamins.
  • Mahlberg, M. (2009). Local textual functions of move in newspaper story patterns. In Exploring the Lexis-Grammar Interface. U. Römer and R. Schulze (eds), 265-287. Amsterdam: John Benjamins.
  • Römer, U., & O’Donnell, M. B. (2010, May). Positional variation of n-grams and phrase-frames in a new corpus of proficient student writing. Paper presented at the ICAME 31 Conference, Giessen, Germany.