Launching the Corpus Statistics Group

Arts and Law, Research
Thursday 11th February 2016 (14:00)

Please register your attendance with Lucia Puricelli

The corpus statistics group is a collaboration between the University of Birmingham and the University of Nottingham. It brings together researchers from corpus linguistics and statistics who are interested in investigating linguistic patterns across large electronic data sets.

The launch event of the group will present work-in-progress research from this collaboration. The event will also be an opportunity to discuss issues around the availability of data sets, infrastructural needs and challenges in the development of appropriate tools. A key focus of the group is exploring how far its methods reach across a range of humanities, social sciences and science disciplines, so researchers from any area with an interest in textual data and/or statistical and computational methods are very welcome to attend the event and explore new opportunities for collaborative work.

The half-day event will conclude with an invited talk by Professor Laurence Anthony from Waseda University, Japan. Professor Anthony is the creator of the corpus analysis software AntConc.

Venue: The Digital Humanities Hub, ERI Building

The event is free, but for catering purposes please register your attendance with Lucia Puricelli by Friday 5 February.


Please note that there may be further updates to the schedule, which will be emailed to registered participants.

  • 14:00 – 14:20 The need for Corpus Statistics: Corpus analysis and the identification of linguistically relevant patterns – Michaela Mahlberg
  • 14:20 – 14:40 Getting to know your corpus: applying Topic Modelling to a corpus of research articles – Paul Thompson, Akira Murakami and Susan Hunston
  • 14:40 – 15:00 Identifying surveillance discourses – Viola Wiegand
  • 15:00 – 15:20 Corpus Analysis from a mathematical perspective – Simon Preston
  • 15:20 – 15:40 Coffee Break
  • 15:40 – 16:00 Preliminary results on modelling time dependence in the Times Digital Archive – Anthony Hennessey
  • 16:00 – 16:20 Graphical representations of a corpus, and clustering on graphs – Yves van Gennip
  • 16:20 – 16:40 The right to read is the right to mine: library resources for cross-disciplinary work – Sarah Bull and Neil Smyth
  • 16:45 – 17:30 Key talk: Arguments for and against DIY corpus tools creation: A debate about programming – Laurence Anthony
  • 17:30 Discussion and round-up, followed by drinks reception

The event is supported by the EPSRC ISF.