Pattern Grammar and academic writing tools

WriteAhead

WriteAhead was developed at the Natural Language Processing Lab of National Tsing Hua University, led by Professor Jason S. Chang.

WriteAhead is an interactive writing environment that displays the grammar patterns of the last word entered by learners. These prompts, accompanied by examples, help learners to write fluently and accurately. They are provided at the point of writing, so there is no need for dictionary lookup, which might interrupt the writing process. The suggested grammar patterns are in the form of Pattern Grammar (Hunston & Francis 2000): a “theoretical and pedagogically sound language representation” (Yen et al. 2015: 141). WriteAhead has been online since 2015; together with related tools, it achieves roughly 3,000 daily visits (p.c. Prof Jason S. Chang).
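The interaction described above can be illustrated with a minimal Python sketch. It is not WriteAhead's implementation: the pattern table, the example phrases and the function names (`last_word`, `suggest`) are hypothetical, and a real system would draw its patterns from corpora rather than from a hand-written dictionary.

```python
import re

# Toy pattern table: illustrative entries only, not WriteAhead data.
# Each headword maps to (grammar pattern, example phrase) pairs.
PATTERNS = {
    "discuss": [("V n", "discuss the results")],
    "role": [("N in n", "a role in the process")],
    "difficulty": [("N in -ing", "difficulty in finding data")],
}


def last_word(text):
    """Return the last alphabetic word typed so far, lowercased, or None."""
    words = re.findall(r"[A-Za-z]+", text)
    return words[-1].lower() if words else None


def suggest(text):
    """Return the (pattern, example) pairs for the last word entered."""
    return PATTERNS.get(last_word(text), [])


if __name__ == "__main__":
    for pattern, example in suggest("We had difficulty"):
        print(f"{pattern}: {example}")
```

The sketch matches surface forms only, so an inflected form such as "discussed" would return no suggestions; handling that would need a lemmatisation step, which is left out here.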

The incorporation of grammar patterns marks WriteAhead as a uniquely useful autocompletion tool. Much previous work focuses on lexical autocompletion, with tools that provide a suggested next word or string of words. Typically, these suggestions involve assumptions about the writer’s purpose and intention, which are not always appropriate. Since grammar patterns are at a higher level of generality than words, WriteAhead supports the writer in making their own lexical choices to produce well-formed sentences. Consequently, the tool is effective for users in all kinds of writing situations. WriteAhead is not a fully automatic grammatical error correction tool, but provides “context-sensitive writing suggestions” (Yen et al. 2015: 139) so that writing becomes a continuous learning process. WriteAhead combines “problem solving and information seeking together to create a satisfying user experience” (Yen et al. 2015: 143). Teaching experiments have found that the tool fosters learner independence and encourages self-editing, pointing to improved writing skills in the long term.

The grammar patterns that learners see are retrieved automatically from an academic, general or learner corpus, as selected by the user. WriteAhead can also flag patterns that are commonly overused or dubious (e.g. discuss about something), based on information from the learner corpus. In developing WriteAhead, the meta-patterns of Pattern Grammar (Hunston & Francis 2000) were used to discover patterns for each headword in the corpora. WriteAhead’s developers also experimented with deriving combined patterns (e.g. play: V n in -ing) from the patterns of verbs (e.g. play) and nouns (e.g. role), in order to capture the finer-grained “semantic sequences” (Hunston 2008; 2009) that are associated with academic writing (e.g. play a role in something). This work has shown that it is feasible to extract grammar patterns for nouns, verbs and adjectives on a large scale, using corpora of hundreds of millions of words. The method differs from the one used in Mason and Hunston’s (2004) small-scale pilot study, which outlined the possibility of automatic recognition of grammar patterns in a corpus.
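As a rough illustration of the ideas in this paragraph, the Python sketch below selects patterns by corpus, flags a dubious learner pattern, and joins a verb pattern to a noun pattern. Everything in it is an assumption made for illustration: the corpus tables, the `DUBIOUS` set and the slot-filling rule in `combine` are hypothetical and are not taken from the WriteAhead papers.

```python
# Toy pattern inventories per corpus: illustrative only, not real corpus output.
CORPUS_PATTERNS = {
    "academic": {"play": ["V n"], "role": ["N in n", "N in -ing"]},
    "general": {"play": ["V n", "V with n"]},
    "learner": {"discuss": ["V n", "V about n"]},
}

# (headword, pattern) pairs treated as dubious, e.g. "discuss about something".
DUBIOUS = {("discuss", "V about n")}


def patterns_for(headword, corpus):
    """Return (pattern, is_dubious) pairs for a headword in the chosen corpus."""
    patterns = CORPUS_PATTERNS.get(corpus, {}).get(headword, [])
    return [(p, (headword, p) in DUBIOUS) for p in patterns]


def combine(verb_pattern, noun_pattern):
    """Join a verb pattern ending in 'n' with a noun pattern starting with 'N'.

    Assumed rule: the verb's final noun slot stands for the noun headword,
    so the noun's own complement is appended after it.
    Returns None when the two patterns do not fit together.
    """
    verb_parts = verb_pattern.split()
    noun_parts = noun_pattern.split()
    if not verb_parts or not noun_parts:
        return None
    if verb_parts[-1] != "n" or noun_parts[0] != "N":
        return None
    return " ".join(verb_parts + noun_parts[1:])


if __name__ == "__main__":
    print(patterns_for("discuss", "learner"))
    print(combine("V n", "N in -ing"))
```

Under this assumed rule, the verb pattern V n (play) and the noun pattern N in -ing (role) join to give V n in -ing, the combined form cited above for play a role in something.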

WriteAhead is therefore a proof-of-concept prototype that is innovative in several respects, and it has featured in Microsoft research that identifies new and better technologies for supporting writing – ones that “accommodate the variability and complexity of the writing process” (Greer, Teevan, and Iqbal 2016). Many avenues exist for the future of WriteAhead. The techniques developed could be used to customize Pattern Grammar for different domains (e.g. genre or topic) and for learners at different levels of proficiency (p.c. Prof Jason S. Chang). The developers have also been experimenting with automatic extraction of “synchronous grammar patterns” (SGPs) in parallel corpora (Wu et al. 2017). This involved aligning the English grammar patterns of Pattern Grammar with their counterparts in Mandarin, and developing a prototype writing assistant system for Chinese learners of English as a second language: WriteAhead/bilingual. Wu et al. (2017: 56) found that the translation of English grammar patterns leads to “correct and useful SGPs for learner-writers”. “Synchronous Pattern Grammar” is now being used in the development of statistical machine translation systems.

Greer, Nick, Jaime Teevan and Shamsi Iqbal. (2016) ‘An introduction to technical support for writing’ (accessed May 2020)

Hunston, Susan. (2008) ‘Starting with the small words: Patterns, lexis and semantic sequences’. International Journal of Corpus Linguistics 13(3): 271–295.

Hunston, Susan. (2009) ‘The usefulness of corpus-based descriptions of English for learners: the case of relative frequency’. In Aijmer, Karin. (ed.) Corpora and Language Teaching. Amsterdam: John Benjamins. 141–156.

Hunston, Susan, and Gill Francis. (2000) Pattern grammar: A corpus-driven approach to the lexical grammar of English. Amsterdam: John Benjamins.

Mason, Oliver, and Susan Hunston. (2004) ‘The automatic recognition of verb patterns: A feasibility study’. International Journal of Corpus Linguistics 9(2): 253–270.

Wu, Chi-En, Jhih-Jie Chen, Jim Chang, and Jason S. Chang. (2017) ‘Learning Synchronous Grammar Patterns for Assisted Writing for Second Language Learners’. The Companion Volume of the IJCNLP 2017 Proceedings: System Demonstrations, 53–56, Taipei, Taiwan.

Yen, Tzu-Hsi, Jian-Cheng Wu, Joanne Boisson, Jim Chang, and Jason Chang. (2015) ‘WriteAhead: Mining Grammar Patterns in Corpora for Assisted Writing’. Proceedings of ACL-IJCNLP 2015: System Demonstrations, 139–144. Beijing, China.

SciE-Lex

SciE-Lex was developed by the Lexicology and Corpus Linguistics Research Group (GreLiC) at the University of Barcelona.  

SciE-Lex is a lexical database of the grammatical and collocational patterns of non-technical words frequently used in biomedical English. It was designed to help scientists produce phraseologically competent texts in English, prompted by a shortage of reference tools offering such information. Dictionaries of scientific terms usually provide information only about the meaning of specialised words. Instead, SciE-Lex is rooted in Pattern Grammar’s observation that authentic language is made up of prototypical patterns in which lexical and grammatical information is interrelated (p.c. Dr Natalia Judith Laso).

The development of SciE-Lex involved applying the Pattern Grammar framework (Hunston & Francis 2000) to health science discourse. An analysis of the patterns associated with general terms in biomedical discourse (Verdaguer et al. 2013) highlighted that, in addition to resources describing the lexicogrammatical and discourse features of general English, further tools are required to suit the needs of specialised discourse communities (see also Hunston 2008; 2009). The SciE-Lex tool shows non-native English-speaking writers the prototypical lexicogrammatical patterns of non-technical words, as well as the conventionalised phraseological characteristics of their discourse community. This is important in the scientific register, where the writer must adhere to conventional style norms and use appropriate collocations so that the reader is not distracted by inappropriate expressions and can read fluently and focus on the content.

SciE-Lex has been tested with targeted users to examine the pedagogical benefits of the tool for the biomedical community writing English for Research Publication Purposes (ERPP) (Laso et al. 2019). “Writing for Publication” workshops showed that SciE-Lex can be used to improve draft biomedical research articles from a lexicogrammatical point of view, and provided a strong indication that non-native English-speaking writers benefit from being able to recognise the formulaic patterning of biomedical discourse.

Hunston, Susan. (2008) ‘Starting with the small words: Patterns, lexis and semantic sequences’. International Journal of Corpus Linguistics 13(3): 271–295.

Hunston, Susan. (2009) ‘The usefulness of corpus-based descriptions of English for learners: the case of relative frequency’. In Aijmer, Karin. (ed.) Corpora and Language Teaching. Amsterdam: John Benjamins. 141–156.

Hunston, Susan, and Gill Francis. (2000) Pattern grammar: A corpus-driven approach to the lexical grammar of English. Amsterdam: John Benjamins.

Laso, N. J., Comelles, E. & Verdaguer, I. (2019) ‘Research report on the adequacy of SciE-Lex as a lexicographic tool for the writing of biomedical papers in English’. Digital Scholarship in the Humanities 35(1): 32–47.

Verdaguer, I., Laso, N. J. & Salazar, D. (2013) Biomedical English: A corpus-based approach. Amsterdam: John Benjamins.