This is a collection of the papers presented at the Corpus Linguistics 2005 conference, which was held in Birmingham on July 14-17, 2005. The papers are provided either as Word documents or as PDF files.
The proceedings have been divided into 11 subcategories:
- Compiling a corpus
- Contrastive corpus linguistics
- Discourse
- Evaluation and stance
- Grammar
- Language learning and error analysis through corpora
- Language processing and corpus tool
- The lexicon
- Phraseology and patterns in language
- The web as a corpus
- Spoken discourse
Compiling a corpus
Rachel Aires, Diana Santos & Sandra Aluisio: "Yes, user!": compiling a corpus according to what the user wants
Latifa Al-Sulaiti and Eric Atwell: Extending the Corpus of Contemporary Arabic
Wendy Anderson & Dave Beavan: Internet delivery of time-synchronised multimedia: the SCOTS Projects
Caroline Barriere & Akakpo Agbago: Corpus Construction for Terminology
Sara Piccioni: The Lorca corpus at the crossroads of philology and corpus linguistics
Gong Wengao: English in computer-mediated environments: a neglected dimension in large English corpus compilation
Hilary Nesi, Sheena Gardner, Richard Forsyth, Dawn Hindle, Paul Wickens, Signe Ebeling, Maria Leedham, Paul Thompson and Alois Heuboeck: Towards the compilation of a corpus of assessed student writing
Contrastive corpus linguistics
Gisle Andersen: Assessing algorithms for automatic extraction of anglicisms in Norwegian texts
Jozsef Andor: A Lexical Semantic-Pragmatic Analysis of the Meaning Potentials of Amplifying Prefixes in English and Hungarian: A Corpus-based Case Study of Near Synonymy
Annalisa Sandrelli & Claudio Bendazzoli: Lexical patterns in simultaneous interpreting: a preliminary investigation of EPIC (European Parliament Interpreting Corpus)
Marianna Apidianaki: Translation prediction using word co-occurrence graphs
Tatjana Balažic Bulc: Connectors in students' academic writing in two closely related languages
Silvia Bernardini & Marco Baroni: Spotting translationese: A corpus-driven approach using support vector machines
Gabriela Castelo Branco Ribeiro & Maria Carmelita Padua Dias: Two corpus-based studies about the translation of adjectives in English and Brazilian Portuguese
Wallace Chen: Patterns of Connectors in the English-Chinese Parallel Corpus of Popular Science Texts
Debbie Elliott: Using corpora to automatically detect untranslated and "outrageous" words in machine translation output
Ana Frankenberg-Garcia: A corpus-based study of loan words in original and translated texts
Randall L. Jones: Analysis of lexical correspondence in an English-German parallel corpus
Zhenglin Jin & Caroline Barriere: Exploring sentence variations in bilingual corpora
Tony McEnery and Richard Xiao: Passive constructions in English and Chinese: A contrastive and translation study
Stella Neumann and Silvia Hansen-Schirra: The CroCo Project: Crosslinguistic corpora for the investigation of explicitation in translations
Pablo Romero Fresco: The translation of phraseology in a parallel (English-Spanish) audiovisual corpus.
Doaa A. Samy: Named Entities: Structure and Translation. A Study Based on a Parallel Corpus (Arabic-Spanish-English)
Tamas Varadi: Taking stock of the Bilingual Lexicon
Discourse
Nadine Aldinger: Corpus-driven genitive disambiguation
Minhee Bang: Representation of foreign countries in two US newspapers: premodifications of keywords, countries, country, nations and nation
Michael Barlow: Input grammars and output grammars: Investigating the language of individual speakers
Christian Chiarcos & Olga Krasavina: Rhetorical Distance Revisited: A pilot study
Huaqing Hong: SCORE: A Multimodal Corpus Database of Education Discourse in Singapore Schools
Henk Louw: Really Too Very Much: Adverbial Intensifiers in Black South African English
Ling Yin & Richard Power: Investigation of the structure of topic expressions: a corpus-based approach
Massimo Poesio & Ron Artstein: Annotating (anaphoric) ambiguity
Evaluation and stance
Monika A. Bednarek: "He's nice but Tim" -- contrastive evaluation in the British press
Sara Radighieri: Arts in the news: Evaluative language use in the 'arts review'
Grammar
Solveig Granath & Michael Wherrity: Prepositions with that-clause complements in tagged corpora, with a special focus on in that
Vladimir Petkevic & Frantisek Cermak: Linguistically motivated tagging as the base for a corpus-based grammar
Simone Sarmento: Distribution of Modal Verbs in an Aviation Corpus
Chris Shei: Analysing Chinese Sentence-final Particles Using Academia Sinica Balanced Corpus of Modern Chinese
Seo-in Shin: Automatic Pattern Extraction for Korean Sentence Parsing
Language learning and error analysis through corpora
Mariko Abe and Yukio Tono: Variations in L2 spoken and written English: investigating patterns of grammatical errors across proficiency levels
María Belén Díez Bedmar: Struggling with English at University level: error patterns and problematic areas of first-year students' interlanguage
Xiaotian Guo: Modal Auxiliaries in Phraseology: A Contrastive Study of learner English and NS English
Anke Ludeling, Peter Adolphs, Emil Kroymann & Maik Walter: Multi-level error annotation in learner corpora
Zhang Yang: College English Course Corpus
Language processing and corpus tool
Sabine Bartsch, Elke Teich, Monica Holtz & Richard Eckart: Corpus-based register profiling of texts from mechanical engineering
Anja Belz: Corpus-driven Generation of Weather Forecasts
Pernilla Danielsson & Andrew Sayers: Enhancing Concordance Method: Introducing the CHAB
Stefan Evert & Manuela Schonenberger: Separating the sheep from the goats: Clarifying corpus content using XML
David Hardcastle: Using the distributional hypothesis to derive co-occurrence scores from the British National Corpus
Laura Lofberg, Scott Piao, Asko Nykanen, Krista Varantola, Paul Rayson and Jukka-Pekka Juntunen: A semantic tagger for the Finnish language
Yuji Matsumoto, Masayuki Asahara, Kou Kawabe, Yurika Takashi, Yukio Tono, Akira Ohtani and Toshio Morita: ChaKi: An Annotated Corpora Management and Search System
Débora Oliveira, Diana Santos, Luis Sarmento & Belinda Maia: Corpus analysis for indexing: when corpus-based terminology makes a difference
Shih-Ping Wang: Integrating corpora and word-focused tasks into a linguistics project for word growth
Maria Zimina: Bi-text topography and quantitative approaches of parallel text processing
Eros Zanchetta and Marco Baroni: Morph-it! A free corpus-based morphological resource for the Italian language
The lexicon
Antti Arppe: The role of morphological features in distinguishing semantically similar words
Jorg Asmussen: Automatic determination of new words within domain-specific vocabularies using document classification and frequency profiling
Marco Baroni & Stefan Evert: Testing the extrapolation quality of word frequency models
Dr Paul Doyle: Replicating Corpus-Based Linguistics: Investigating Lexical Networks in Text
Cvetana Krstev & Dusko Vitas: Corpus and Lexicon Mutual In-completeness
Jennifer Pedler: Using semantic associations for the detection of real-word spelling errors
Scott S.L. Piao, Dawn Archer, Olga Mudraya, Paul Rayson, Roger Garside, Tony McEnery, Andrew Wilson: A Large Semantic Lexicon for Corpus Annotation
Elisabete Marques Ranchhod: Using Corpora to Increase Portuguese MWE Dictionaries. Tagging MWE in a Portuguese Corpus.
Sofie Van Gijsel, Dirk Speelman & Dirk Geeraerts: A Variationist, Corpus Linguistic Analysis of Lexical Richness
Phraseology and patterns in language
Frantisek Cermak & Michal Kren: Large Corpora, Lexical Frequencies and Coverage of Texts
Christopher Gledhill & Pierre Frath: A Reference-based Theory of Phraseological Units: the Evidence of Fossils
Eva Hajicova, Jiri Havelka & Katerina Vesela: Corpus Evidence of Contextual Boundness and Focus
Csaba Oravecz, Karoly Varasdi & Viktor Nagy: Lexical idiosyncrasy in MWE extraction
Bertus van Rooy: Expressions of modality in Black South African English
Petra Storjohann: Corpus-driven vs. corpus-based approach to the study of relational patterns
Christiane Wanzeck: The Determination of Phraseological Units in Historical Corpora: An Analysis System for Early New High German
The web as a corpus
Abdulrahman Almuhareb & Massimo Poesio: Finding Attributes in the Web
Ilias Koutsis, George Kouklakis, George Mikros & George Markopoulos: MINOTAVROS: A tool for the semiautomated creation of large corpora from the Web
Alexander Mehler & Rudiger Gleim: Polymorphism in Generic Web Units - A Corpus Linguistic Study
Antoinette Renouf: The WebCorp Search Engine: a holistic approach to web text search
Jesus Tomas, Francisco Casacuberta & Jaime Lloret: WebMining: Non-supervised system to obtain parallel corpus from the Web
Motoko Ueyama & Marco Baroni: Automated construction and evaluation of a Japanese web-based reference corpus
Spoken discourse
Adriano Allora: A Tentative Typology of Net-mediated Communication
Knut Hofland & Annette Myre Jorgensen: COLA: A Spanish spoken corpus of youth language
Kikuo Maekawa: Quantitative Analysis of Word-form Variation Using a Spontaneous Speech Corpus
Antonio Moreno-Sandoval & Ana Gonzales-Ledesma: Pragmatic analysis of man-machine interactions in a spontaneous speech corpus