Professor Paul Baker (Lancaster) delivered the 2019 Sinclair Lecture at the University of Birmingham on 24 June 2019.

Paul Baker is Professor of English Language in the Department of Linguistics and English Language at Lancaster University, where he is a member of the ESRC Centre for Corpus Approaches to Social Science. He specialises in corpus linguistics, particularly in using and developing corpus methods to carry out discourse analysis, and he is also involved in research on health, media language, language variation and change, and social identities. He has written 16 books, including Using Corpora to Analyse Discourse (2006), Sexed Texts: Language, Gender and Sexuality (2008) and Discourse Analysis and Media Attitudes (2013). He is commissioning editor of the journal Corpora (EUP) and a fellow of the Royal Society of Arts. 

Human beings currently create around 2.5 quintillion bytes of data every day. The ability to identify trends and linguistic patterns quickly and accurately across massive, continuously growing datasets is viewed as important, although automated techniques are not yet able to outperform human analysts in many respects.

In this talk I focus on what a corpus linguistics approach can offer in an increasingly crowded field of computational analysis. I do this by describing an ESRC-funded project that explored how corpus methods could be used to improve NHS services. The project involved analysing 200,000 pieces of feedback left by members of the public on the NHS Choices website. It raised several challenges relating to aspects of the data and to the research questions given to us by the Patients and Information Directorate at NHS England.

A key feature of the project was a learning curve. We realised that our initial understanding of language use in this context, and of the right analytical techniques, was not always as precise as it could have been. As we refined our methods, we found that some of the most interesting parts of the analysis answered questions we had not asked. However, engaging with this unfamiliar form of data and being set questions we would not have chosen ourselves ultimately brought benefits. We gained insight not only into the nature of the data but also into how corpus approaches can be used more effectively for close analysis of large datasets.

The talk concludes with a discussion of the extent to which insight alone is the key to making impact.

Great turnout tonight at 14th Sinclair lecture #Bham. Interesting to hear about impact in CL & practical applications. The field has a lot to offer & is being underutilised! @_paulbaker_ @CCR_UoB @unibirmingham @CorpusSocialSci #corpuslinguistics #trustmeimalinguist #healthcare

Great Sinclair lecture by @_paulbaker_ today, followed by a lovely mixer with all the corp ling peeps from the summer school organized by @FlorentPerek. On days like these one really feels that Birmingham is quite a scholarship hub!