Arguments for and against DIY corpus tools creation: A debate about programming

Speaker

Laurence Anthony


  • Faculty of Science and Engineering, Waseda University, Japan.
  • Honorary Research Fellow, Lancaster University, UK.


In much corpus linguistics work, the researcher does not observe the raw corpus data directly. Instead, they view it indirectly through the lens of a corpus tool, such as a concordancer, keyword list generator, or network graph visualizer. To date, corpus tools have largely been developed by a few individual software developers, and as a result, we now have a very useful set of powerful, user-friendly tools. However, these tools are ultimately limited in scope and features, because they were designed for particular research questions and data sets or reflect the particular interests of their developers. One solution to this problem is for corpus linguists to learn a programming language and start developing tailor-made DIY tools for their own research questions. However, learning to program introduces a new set of problems, such as deciding which language to use and how much time to devote to the task.

In this presentation, I will first review common tools used in corpus linguistics research, highlighting their strengths and weaknesses and showing how researchers can make the most of them. Next, I will consider the arguments for and against corpus linguists developing DIY corpus tools that complement or replace existing tools. For those interested in learning to program, I will suggest a few good places to start and some traps to avoid during the learning process. For those who feel that programming is unnecessary or beyond them, I will suggest ways to find and work successfully with software engineers to develop tailor-made tools. Finally, I will offer some thoughts on the important roles that programming and tool development are likely to play in the future of corpus linguistics research.


Laurence Anthony is Professor of Applied Linguistics at the Faculty of Science and Engineering, Waseda University, Japan. He has a BSc degree (Mathematical Physics) from the University of Manchester, UK, and MA (TESL/TEFL) and PhD (Applied Linguistics) degrees from the University of Birmingham, UK. He is a former director and current program coordinator at the Center for English Language Education (CELESE), Waseda University. His main interests are in corpus linguistics, educational technology, and English for Specific Purposes (ESP) program design and teaching methodologies. He received the National Prize of the Japan Association for English Corpus Studies (JAECS) in 2012 for his work in corpus software tool design. He is the developer of various corpus tools, including AntConc, AntWordProfiler, EncodeAnt, ProtAnt, and TagAnt.