Web-as-Corpus approaches and tools for translators

52 Pritchatts Road SG07 (Computer Cluster)
Arts and Law
Thursday 3rd November 2016 (14:00-15:30)
Speaker: Maristella Gatto, Università di Bari (Italy)

Corpus resources have undergone significant changes over the past two decades, taking on such characteristics as dynamic content, distributed architecture, virtuality, and connection with web search, which Wynne (2002) identified at the beginning of the new millennium as distinctive of all linguistic resources in the 21st century. These changes are nowhere more evident than in a specific trend at the confluence of Corpus Linguistics and Computational Linguistics, where the enormous potential of the web as a linguistic resource has been addressed under the umbrella term “Web as Corpus” (Kilgarriff & Grefenstette 2003; Baroni & Bernardini 2006). From the widespread – albeit controversial – practice of accessing the web through ordinary search engines for immediate evidence of attested usage, to the development of web concordancers, to specific tools for the semi-automated compilation and exploration of disposable monolingual and comparable corpora, the web is now a fundamental resource in corpus linguistics.

Such approaches and methods have, however, not only changed the practice of corpus compilation and exploration, but have also – more crucially – affected the way in which we conceive of corpora today. This can be represented as a shift in the basic metaphor underlying corpus resources, whereby the reassuring notion of a corpus as a ‘body’ of texts (i.e. a well-proportioned corpus of authentic texts sampled so as to be representative of the whole) is complemented by a less reassuring, but possibly more functional, image of a corpus as a ‘web’ of texts. While the notion of a linguistic corpus as a body of texts rests on related issues such as finite size, balance, representativeness, and permanence, the very idea of a web of texts brings with it notions of non-finiteness, flexibility, and provisionality, all of which need to be addressed if the web is to be used as a corpus on a sound methodological basis (Gatto 2009; 2014).

It is against this background that Web as Corpus approaches and tools will be surveyed in the present seminar, with specific reference to their growing impact on Translation Studies (Ferraresi 2009; Bernardini and Ferraresi 2013; López-Rodríguez 2016; Stewart 2016). First, the web’s controversial status as a virtual multilingual corpus on demand is investigated, and the most common methods and approaches to the web as corpus are compared and contrasted with reference to the accomplishment of specific translation tasks. Second, specific tools and resources, such as BootCaT (Baroni & Bernardini 2004) and Sketch Engine (Kilgarriff et al. 2004), are introduced and discussed, with an eye to their specific potential as part of the technological competence required of translators. All the tools and resources will be evaluated through ad hoc hands-on practice in terms of their usability for professional purposes.

Maristella Gatto is Assistant Professor of English at the University of Bari, Italy, where she teaches English Linguistics and Translation Studies. Her main research interests are corpus linguistics, computer-mediated communication, translation studies, and tourism discourse. She is the author of the monograph The Web as Corpus. Theory and Practice (Bloomsbury 2014) and of several other essays and book chapters on corpus linguistics and the web. She edited the volume Translation. The State of the Art/La Traduzione. Lo Stato dell'Arte (Longo 2007) and is the author of a handbook for students of English for tourism, Dublin. From paralysis to international tourism (Aracne 2007).