Conference e-journal

This is a collection of the papers presented at the Corpus Linguistics 2005 conference, which was held in Birmingham on July 14-17, 2005. The papers are available either as Word documents or as PDF files.

The proceedings have been divided into 11 subcategories: 

  • Compiling a corpus
  • Contrastive corpus linguistics
  • Discourse
  • Evaluation and stance
  • Grammar
  • Language learning and error analysis through corpora
  • Language processing and corpus tools
  • The lexicon
  • Phraseology and patterns in language
  • The web as a corpus
  • Spoken discourse

Compiling a corpus

Rachel Aires, Diana Santos & Sandra Aluisio: "Yes, user!": compiling a corpus according to what the user wants

Latifa Al-Sulaiti and Eric Atwell: Extending the Corpus of Contemporary Arabic  

Wendy Anderson & Dave Beavan: Internet delivery of time-synchronised multimedia: the SCOTS Project

Caroline Barriere & Akakpo Agbago: Corpus Construction for Terminology

Sara Piccioni: The Lorca corpus at the crossroads of philology and corpus linguistics  

Gong Wengao: English in computer-mediated environments: a neglected dimension in large English corpus compilation

Hilary Nesi, Sheena Gardner, Richard Forsyth, Dawn Hindle, Paul Wickens, Signe Ebeling, Maria Leedham, Paul Thompson and Alois Heuboeck: Towards the compilation of a corpus of assessed student writing 

Contrastive Corpus Linguistics

Gisle Andersen: Assessing algorithms for automatic extraction of anglicisms in Norwegian texts

Jozsef Andor: A Lexical Semantic-Pragmatic Analysis of the Meaning Potentials of Amplifying Prefixes in English and Hungarian: A Corpus-based Case Study of Near Synonymy

Annalisa Sandrelli & Claudio Bendazzoli: Lexical patterns in simultaneous interpreting: a preliminary investigation of EPIC (European Parliament Interpreting Corpus)

Marianna Apidianaki: Translation prediction using word co-occurrence graphs

Tatjana Balažic Bulc: Connectors in students' academic writing in two closely related languages

Silvia Bernardini & Marco Baroni: Spotting translationese: A corpus-driven approach using support vector machines

Gabriela Castelo Branco Ribeiro & Maria Carmelita Padua Dias: Two corpus-based studies about the translation of adjectives in English and Brazilian Portuguese

Wallace Chen: Patterns of Connectors in the English-Chinese Parallel Corpus of Popular Science Texts

Debbie Elliott: Using corpora to automatically detect untranslated and 'outrageous' words in machine translation output

Ana Frankenberg-Garcia: A corpus-based study of loan words in original and translated texts

Randall L. Jones: Analysis of lexical correspondence in an English-German parallel corpus

Zhenglin Jin & Caroline Barriere: Exploring sentence variations in bilingual corpora

Tony McEnery and Richard Xiao: Passive constructions in English and Chinese: A contrastive and translation study

Stella Neumann and Silvia Hansen-Schirra: The CroCo Project: Crosslinguistic corpora for the investigation of explicitation in translations

Pablo Romero Fresco: The translation of phraseology in a parallel (English-Spanish) audiovisual corpus.

Doaa A. Samy: Named Entities: Structure and Translation. A Study Based on a Parallel Corpus (Arabic-Spanish-English)

Tamas Varadi: Taking stock of the Bilingual Lexicon

Discourse

Nadine Aldinger: Corpus-driven genitive disambiguation

Minhee Bang: Representation of foreign countries in two US newspapers: premodifications of keywords, countries, country, nations and nation

Michael Barlow: Input grammars and output grammars: Investigating the language of individual speakers

Christian Chiarcos & Olga Krasavina: Rhetorical Distance Revisited: A pilot study

Huaqing Hong: SCORE: A Multimodal Corpus Database of Education Discourse in Singapore Schools

Henk Louw: Really Too Very Much: Adverbial Intensifiers in Black South African English

Ling Yin & Richard Power: Investigation of the structure of topic expressions: a corpus-based approach

Massimo Poesio & Ron Artstein: Annotating (anaphoric) ambiguity

Evaluation and Stance

Monika A. Bednarek: "He's nice but Tim" -- contrastive evaluation in the British press

Sara Radighieri: Arts in the news: Evaluative language use in the 'arts review'

Grammar

Solveig Granath & Michael Wherrity: Prepositions with that-clause complements in tagged corpora, with a special focus on in that

Vladimir Petkevic & Frantisek Cermak: Linguistically motivated tagging as the base for a corpus-based grammar

Simone Sarmento: Distribution of Modal Verbs in an Aviation Corpus

Chris Shei: Analysing Chinese Sentence-final Particles Using Academia Sinica Balanced Corpus of Modern Chinese

Seo-in Shin: Automatic Pattern Extraction for Korean Sentence Parsing

Language Learning & Error Analysis through Corpora

Mariko Abe and Yukio Tono: Variations in L2 spoken and written English: investigating patterns of grammatical errors across proficiency levels

María Belén Díez Bedmar: Struggling with English at University level: error patterns and problematic areas of first-year students' interlanguage

Xiaotian Guo: Modal Auxiliaries in Phraseology: A Contrastive Study of learner English and NS English

Anke Ludeling, Peter Adolphs, Emil Kroymann & Maik Walter: Multi-level error annotation in learner corpora

Zhang Yang: College English Course Corpus

Language processing and corpus tools

Sabine Bartsch, Elke Teich, Monica Holtz & Richard Eckart: Corpus-based register profiling of texts from mechanical engineering

Anja Belz: Corpus-driven Generation of Weather Forecasts

Pernilla Danielsson & Andrew Sayers: Enhancing Concordance Method: Introducing the CHAB

Stefan Evert & Manuela Schonenberger: Separating the sheep from the goats: Clarifying corpus content using XML

David Hardcastle: Using the distributional hypothesis to derive co-occurrence scores from the British National Corpus

Laura Lofberg, Scott Piao, Asko Nykanen, Krista Varantola, Paul Rayson and Jukka-Pekka Juntunen: A semantic tagger for the Finnish language

Yuji Matsumoto, Masayuki Asahara, Kou Kawabe, Yurika Takashi, Yukio Tono, Akira Ohtani and Toshio Morita: ChaKi: An Annotated Corpora Management and Search System

Débora Oliveira, Diana Santos, Luis Sarmento & Belinda Maia: Corpus analysis for indexing: when corpus-based terminology makes a difference

Shih-Ping Wang: Integrating corpora and word-focused tasks into a linguistics project for word growth

Maria Zimina: Bi-text topography and quantitative approaches of parallel text processing

Eros Zanchetta and Marco Baroni: Morph-it! A free corpus-based morphological resource for the Italian language

The lexicon

Antti Arppe: The role of morphological features in distinguishing semantically similar words

Jorg Asmussen: Automatic determination of new words within domain-specific vocabularies using document classification and frequency profiling

Marco Baroni & Stefan Evert: Testing the extrapolation quality of word frequency models

Dr Paul Doyle: Replicating Corpus-Based Linguistics: Investigating Lexical Networks in Text

Cvetana Krstev & Dusko Vitas: Corpus and Lexicon: Mutual Incompleteness

Jennifer Pedler: Using semantic associations for the detection of real-word spelling errors

Scott S.L. Piao, Dawn Archer, Olga Mudraya, Paul Rayson, Roger Garside, Tony McEnery, Andrew Wilson: A Large Semantic Lexicon for Corpus Annotation

Elisabete Marques Ranchhod: Using Corpora to Increase Portuguese MWE Dictionaries. Tagging MWE in a Portuguese Corpus.

Sofie Van Gijsel, Dirk Speelman & Dirk Geeraerts: A Variationist, Corpus Linguistic Analysis of Lexical Richness

Phraseology and patterns in language

Frantisek Cermak & Michal Kren: Large Corpora, Lexical Frequencies and Coverage of Texts

Christopher Gledhill & Pierre Frath: A Reference-based Theory of Phraseological Units: the Evidence of Fossils

Eva Hajicova, Jiri Havelka & Katerina Vesela: Corpus Evidence of Contextual Boundness and Focus

Csaba Oravecz, Karoly Varasdi & Viktor Nagy: Lexical idiosyncrasy in MWE extraction

Bertus van Rooy: Expressions of modality in Black South African English  

Petra Storjohann: Corpus-driven vs. corpus-based approach to the study of relational patterns  

Christiane Wanzeck: The Determination of Phraseological Units in Historical Corpora: An Analysis System for Early New High German  

The web as a corpus

Abdulrahman Almuhareb & Massimo Poesio: Finding Attributes in the Web

Ilias Koutsis, George Kouklakis, George Mikros & George Markopoulos: MINOTAVROS: A tool for the semiautomated creation of large corpora from the Web

Alexander Mehler & Rudiger Gleim: Polymorphism in Generic Web Units - A Corpus Linguistic Study

Antoinette Renouf: The WebCorp Search Engine: a holistic approach to web text search

Jesus Tomas, Francisco Casacuberta & Jaime Lloret: WebMining: Non-supervised system to obtain parallel corpus from the Web  

Motoko Ueyama & Marco Baroni: Automated construction and evaluation of a Japanese web-based reference corpus


Spoken discourse

Adriano Allora: A Tentative Typology of Net-mediated Communication 

Knut Hofland & Annette Myre Jorgensen: COLA: A Spanish spoken corpus of youth language

Kikuo Maekawa: Quantitative Analysis of Word-form Variation Using a Spontaneous Speech Corpus

Antonio Moreno-Sandoval & Ana Gonzales-Ledesma: Pragmatic analysis of man-machine interactions in a spontaneous speech corpus