CL2017 Pre-conference workshop 1

CLARIN-UK: Promoting Cross-disciplinary Corpus Linguistics

Workshop convenor

Martin Wynne
University of Oxford
martin.wynne@bodleian.ox.ac.uk

Workshop programme

Please visit the separate workshop website created by Martin Wynne for the up-to-date programme.

Workshop summary

Language is at the core of all academic disciplines, sometimes directly as the sole object of study, more often as one component in more complex objects and processes, and almost ubiquitously as the medium of communication. While linguists take language as their object of study, all human and social scientists study social and cultural phenomena which are expressed and shaped through language. Furthermore, many scientists study physical, cognitive and medical aspects of language. For many years researchers have compiled and used language data to understand, for example, how children learn languages, or how politicians argue during political discourse. These rich resources have given rise to language technologies allowing the exploration of digital collections of language materials.

In the UK, collaboration between experts in language technologies and other disciplines has been a recurrent theme for several decades. Many of the linguists and language technologists involved in this work have recently come together to form the CLARIN-UK consortium, so that their expertise and resources can contribute to the construction of the Europe-wide CLARIN research infrastructure. CLARIN aims to establish persistent and effective services so that language resources and tools can be used effectively across the social sciences and humanities. While in many European countries this involves starting anew to build capacity in language technologies and to seek out new links with other disciplines, the UK is remarkable for having carried out such work for a long time.

This workshop will offer insights into successful projects, as well as demonstrations and tutorials for key resources and tools, which we hope will generate ideas for new ventures exploring new ways of using the resources across disciplines.

CLARIN is a Europe-wide initiative to promote the use and re-use of data, tools and methods in research across the humanities, social sciences and beyond. CLARIN is building and operating an infrastructure that makes corpora and other forms of language data easier to find, use and combine, and is integrating key software applications into that infrastructure so that they are easier to deploy with the relevant datasets. An overview of the current set of services can be found on the CLARIN website. Major CLARIN operations are funded in nineteen European countries, with more joining all the time, supporting repositories, training and research programmes. Although universities in the UK are major players in corpus and computational linguistics, the UK has not made a formal commitment to join CLARIN. Instead, a 'bottom-up' initiative led to the formation of the CLARIN-UK Consortium in 2015. The consortium brings together a number of important centres, all of which are involved in various ways in promoting digital language resources and applications, and in collaborating on research topics and themes beyond corpus linguistics. Also in 2015, thanks to the initiative of this consortium, the UK joined CLARIN as an Observer country.

The main aims of the workshop are:

1. to offer a contemporary and historical overview of the ongoing success of collaborations between linguistics and other disciplines in digital research;
2. to lower the barriers to using language resources and tools by highlighting available software and corpora;
3. to identify new areas of collaboration through discussion and the presentation of exemplary research.

Audience

The workshop will be of general interest to all conference participants, and especially:

Researchers, primarily in the social sciences and humanities but also from other disciplines;
Linguists, computational linguists and computer scientists engaged in collaborations with other disciplines;
Linguists looking for ideas and resources for participating in multi-disciplinary research;
Strategists, policy makers and support staff seeking to build research programmes and research infrastructure involving digital research.

Programme

The programme includes a series of presentations from the CLARIN-UK Centres about the areas in which they have experience of working in collaboration with disciplines outside of linguistics, and in which they have promoted the use of digital language resources and tools to facilitate new and innovative forms of research. There will also be presentations about the Europe-wide activities of CLARIN, and from another important CLARIN country.

The presentations from the CLARIN-UK Centres will include all or almost all of the following, with an indication of some of the academic domains with which each centre has collaborated:

ESRC Centre for Corpus Approaches to the Social Sciences and UCREL, Lancaster University (Social Sciences, Literary studies, History, Business Studies, Economics, Healthcare Studies)
Centre for Corpus Research, University of Birmingham (Literary studies, Lexicography, Language Teaching and Learning)
School of Critical Studies, University of Glasgow (History, Political Science, Literary Studies)
Centre for Translation Studies, University of Leeds (Translation Studies, Language Teaching and Learning)
National Centre for Text Mining, University of Manchester (Biology, Medicine, Healthcare, Biodiversity, History, Social Sciences)
Oxford Text Archive, Bodleian Libraries, The University of Oxford (Literary Studies, History)
Natural Language Processing Group, University of Sheffield (Political Science, Environmental Science, History, Classics, Archive and Information Science, Criminology, Business Studies, Healthcare Studies, Cultural Heritage)
Endangered Languages Archive, SOAS World Languages Institute, SOAS University of London (Linguistics, Anthropology)
Research Group in Computational Linguistics, University of Wolverhampton (Translation Studies, Healthcare Studies, Language Teaching and Learning)
Research Group in Corpus Linguistics, School of Humanities, Coventry University (Business Communication, Cultural Archives, English-Medium Instruction, Higher Education)

Organizing Committee

Marc Alexander, University of Glasgow
Sophia Ananiadou, University of Manchester
Eric Atwell, University of Leeds
Sheena Gardner, Coventry University
Andrew Hardie, Lancaster University
Michaela Mahlberg, University of Birmingham
Tony McEnery, Lancaster University
John McNaught, University of Manchester
Hilary Nesi, Coventry University
Constantin Orasan, University of Wolverhampton
Wim Peters, University of Sheffield
Paul Rayson, Lancaster University
Nick Riches, University of Newcastle
Mandana Seyfeddinipur, SOAS University of London
Serge Sharoff, University of Leeds
Martin Wynne, University of Oxford