Co-creating an OER with student teachers to bridge the corpus linguistics research-practice gap

Elen Le Foll

The practice of ELT (English Language Teaching) to date, at least, seems to be largely unaffected by the advances of corpus research, and comparatively few teachers and learners know about the availability of useful resources and get their hands on corpus computers or concordancers themselves. (Römer 2010: 18)

The development of this resource was spurred on by an observation that many corpus linguists have made, namely that the much-awaited “corpus revolution” has yet to reach language teachers and learners on a wide scale outside of higher education contexts (e.g., McCarthy 2008; Chambers 2019). In this introductory chapter, I briefly outline the rationale and genesis of the project that led to the creation of this Open Educational Resource (OER). It was co-written with student teachers from Osnabrück University (Germany) as part of three iterations of a project-based seminar taught by the author between September 2019 and March 2021.

1. Corpus linguistics and language teaching

From the outset, applications of corpus linguistics to language teaching have been shown to be highly congruent with contemporary, evidence-based insights into second language acquisition (SLA). Corpus linguistics has shown that, rather than being made up of individual words strung together according to strict grammatical rules, language consists of vast networks of more or less fixed, frequently occurring patterns. In line with communicative approaches to language teaching, corpus linguistic methods help reveal the actual usage patterns and frequencies of real language as spoken or written in natural, real-world contexts.
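
To make the idea of "frequently occurring patterns" concrete, the following is a minimal sketch of how recurring word sequences (n-grams) can be counted. It is an illustration only and not a tool used in the project: the function name `ngram_counts`, the naive tokenisation and the invented sample text are all assumptions.

```python
import re
from collections import Counter

def ngram_counts(text, n=2):
    """Count every sequence of n consecutive word tokens in text."""
    # Naive tokenisation (an assumption for this sketch): lower-case the text
    # and keep runs of letters and apostrophes. Sentence boundaries are ignored,
    # so some n-grams will span two sentences.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Zip n staggered copies of the token list to obtain overlapping n-grams.
    ngrams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in ngrams)

# Invented toy text standing in for a real corpus.
sample = (
    "Would you like to come? I would like to, but I have to work. "
    "Would you like some tea?"
)

# Show the three most frequent three-word sequences.
for gram, freq in ngram_counts(sample, 3).most_common(3):
    print(freq, gram)
```

Even in this toy text, a sequence such as "would you like" recurs, which is the kind of pattern that corpus tools surface at scale in authentic data.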

Language learners can be explicitly taught these lexico-grammatical patterns or, as in Data-Driven Learning approaches (DDL; e.g., Boulton & Tyne 2013; Leńko-Szymańska & Boulton 2015), be encouraged to discover them for themselves. In a recent meta-analysis, Boulton & Cobb (2017) summarised the results of 64 experimental and quasi-experimental studies on the effectiveness of using corpus linguistics for second language (L2) learning or use, concluding that such approaches, on average, yield promisingly large positive effects (d = 0.95 for control/experimental group comparisons and d = 1.50 for pre/post-test designs).
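
For readers less familiar with effect sizes, the figures reported above are standardised mean differences. As a rough guide, and assuming the common textbook definition of Cohen's d with a pooled standard deviation (the meta-analysis may use a variant, particularly for pre/post-test designs), the between-groups value can be sketched as follows, where the subscripts E and C denote the experimental and control groups:

```latex
d = \frac{\bar{x}_{E} - \bar{x}_{C}}{s_{p}},
\qquad
s_{p} = \sqrt{\frac{(n_{E} - 1)\, s_{E}^{2} + (n_{C} - 1)\, s_{C}^{2}}{n_{E} + n_{C} - 2}}
```

On this reading, a value of 0.95 would mean that the experimental groups scored, on average, almost one pooled standard deviation higher than the control groups.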

Furthermore, allowing students to interact hands-on with corpora fosters interdisciplinary skills, such as digital skills and data literacy, and boosts learner autonomy. Not to be neglected either are the positive effects of successful teacher-corpus interactions. Even basic corpus literacy enables teachers to check their language intuition in an empirical manner, on the basis of authentic data. Thus, this project’s foremost aim was to empower (trainee) teachers to create their own corpus-informed teaching materials, many of which also encourage data-driven learning and student-corpus interactions.

All four components of corpus literacy summarised by Callies (2019: 247, see also Mukherjee 2006: 14) are targeted:

  1. Understanding basic concepts in corpus linguistics: What is a corpus and what types of corpora are available and how? What can you do – and cannot do – with a corpus?
  2. Searching corpora and analysing corpus data by means of corpus software tools, e.g. concordancers: What is corpus software and how can it be used to search a corpus? How can corpus output be analysed?
  3. Interpreting corpus data: How may general trends in language use/change be extrapolated from corpus data?
  4. Using corpus output to generate teaching material and activities: How can you make use of corpus material for teaching purposes?
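
To illustrate what the concordancers mentioned in the second component do, here is a minimal keyword-in-context (KWIC) sketch. It is a simplified illustration under stated assumptions, not the implementation of any of the tools named in this chapter: the function name `kwic`, the context width and the invented sample sentences are all hypothetical.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of keyword."""
    lines = []
    # Whole-word, case-insensitive search; re.escape guards against
    # regex metacharacters in the search term.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so that the keyword forms a central column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines

# Invented toy sentences standing in for a real corpus.
sample = (
    "I usually have toast for breakfast. "
    "We had breakfast at the hotel. "
    "Breakfast is served from seven."
)

for line in kwic(sample, "breakfast"):
    print(line)
```

Aligning the search term in a central column is what allows learners and teachers to scan the words to its left and right for recurring patterns, which is the starting point for the third and fourth components.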

In the spirit of this OER, and of “open pedagogy” more generally (Wiley & Hilton 2018), particular emphasis is placed on the use of freely accessible resources.

2. The research-practice gap

Whilst the value of corpora in foreign language teaching has long been understood, researched and documented (e.g., O’Keeffe, McCarthy & Carter 2007; Boulton & Cobb 2017; Pérez-Paredes 2019a), the “large gap between the wealth of applied corpus-linguistic research and the teaching practice” (Mukherjee 2004: 247; see also Chambers 2019; Pérez-Paredes 2019b) is still a reality. In school English as a Foreign Language (EFL) contexts, in particular, the “need to convince practising teachers to use corpora and concordances in the classroom” (Römer 2006: 129) has yet to be met. To close this gap, a number of researchers have pointed to the centrality of teacher training (e.g., Mukherjee 2004; Hüttner, Smit & Mehlmauer-Larcher 2009; Breyer 2009). However, as Leńko-Szymańska (2014: 261) notes, “[u]nfortunately, there are only a few books which serve as manuals for teachers in this area”. The situation does seem to be changing, with some notable recent publications tackling the issue head-on: Eric Friginal’s “Corpus linguistics for English teachers: new tools, online resources, and classroom activities” (Routledge, 2019), Robert Poole’s “A guide to using corpora for English language learners” (Edinburgh University Press, 2018), and Dana Gablasova’s “Corpus for Schools” project (launched in 2017). A more comprehensive list of books and resources with full references can be found as an appendix to this e-book.

Aside from the resources from Gablasova’s Corpus for Schools project, which focus on spoken British English, the majority of these few existing resources are commercial book publications. This format has two clear disadvantages for teacher training purposes. First, if such books include practical information on how to use specific corpus tools, they very quickly become outdated (for instance, my students noticed that the host of changes made to english-corpora.org in 2020 made the step-by-step instructions of even very recent publications difficult, if not impossible, for corpus novices to follow). Second, and crucially, commercial book publications are only accessible to a small minority of privileged student teachers, teachers and teacher trainers.

With these aspects in mind, I decided to co-create, together with my students, an Open Educational Resource to guide (trainee) teachers through the practical aspects of using corpora for language teaching using accessible, online resources.

3. The development of a project-based seminar

[T]here is, at present, a large gap between the wealth of applied corpus-linguistic research and the teaching practice in Germany which so far has only been affected to a very limited extent by this research. Closing this gap is a challenge to applied corpus linguists and, perhaps more importantly, to those who are involved in teacher training (both for trainee and qualified teachers). (Mukherjee 2004: 247)

Venn diagram with Corpus Linguistics, English Pedagogy, and Teacher Training
Fig. 1: The three core areas of the project-based seminar

Following Römer’s (2010) call to focus more on language teachers’ needs and inspired by previous such endeavours (Breyer 2009; Hüttner, Smit & Mehlmauer-Larcher 2009; Leńko-Szymańska 2014), I developed and subsequently evaluated a project-based seminar for M.Ed. students training to become English teachers (at primary, secondary and vocational school level) in Lower Saxony, Germany (Le Foll 2020). The institutional constraints were: 13 weekly sessions of 90 minutes with ca. 30 students. The seminar was designed to convince pre-service student teachers of the value of corpus linguistics approaches to language teaching and learning and of their potential to boost learner autonomy. At the same time, the seminar aimed to empower the aspiring teachers to create corpus-informed materials autonomously using a range of tools and methods. The final project task consisted in co-writing and peer-reviewing a chapter for the present OER.


Seminar Learning Objectives

By the end of the semester, students should:

▪ Understand how Second Language Acquisition (SLA) research can inform materials design;

▪ Be able to evaluate existing English Language Teaching (ELT) materials;

▪ Be able to design corpus-informed ELT materials, incl. effective differentiated tasks, activities, instructions;

▪ Be confident users of basic corpus linguistic tools and methods;

▪ Be able to teach others how to design corpus-informed ELT materials;

▪ Be skilled at giving and learning from critical (peer) feedback.

The first few seminar sessions consisted of a hands-on introduction to the basic principles of corpus linguistics, including debunking some normative linguistic myths (cf. “surprise-the-teacher” modules suggested by Mukherjee 2004: 245) using online corpus tools (english-corpora.org, SketchEngine, BNClab, CQPweb, etc.) and a range of freely available corpora (COCA, BNC1994, Spoken BNC2014, Cambridge Learner Corpus, GloWbE, etc.). Following a Flipped Classroom (e.g., Reidsema et al. 2017) and Just-in-Time Teaching (e.g., Simkins & Maier 2010) approach, students “consumed” theoretical input (in the form of journal articles, book chapters and videos) in their own time and answered questions testing their understanding and asking them to reflect on what they had learnt. These answers informed the content of the synchronous class sessions.

German school EFL teachers are highly reliant on commercial textbooks and often reluctant to move away from them as they are perceived to embody the “one and only way to teach the curriculum” and certainly represent a “safe option”. To tackle this issue, the first phase in the seminar was to explore what makes good ELT materials and which SLA principles can be applied when evaluating textbook materials. Students overwhelmingly agreed that authentic (spoken) language usage and learner autonomy are two crucial aspects of language learning and that both are largely neglected in existing school EFL textbooks.

Based on this conclusion, I introduced the notion of corpus-based DDL in a hands-on approach: students tested their own (overwhelmingly non-native) language intuitions using corpora and, for greater ecological validity, completed DDL tasks designed to help them improve their own academic writing (among others, with tasks adapted from Poole 2018). The advantages and limitations of DDL were discussed and students explored the alignment of the DDL approach with current SLA principles.


Diagram outlining the cyclical nature of the seminar. Four key areas: corpus linguistics basics, analysing EFL teachers' needs, problem-solving with corpus linguistics, evaluating and improving chapters.
Fig. 2: Students’ process for developing their co-authored OER chapters

Following a problem-solving approach, students were encouraged to think about the difficulties that EFL teachers typically face and gathered ideas to develop appropriate corpus-based materials and tasks to help solve some of these issues. In lieu of an end-of-term examination or seminar paper, course assessment consisted in the contribution of a co-authored chapter to a “Practical Guide to Using Corpora for English as a Foreign Language Teachers”, to be published online (the attentive reader will note that the title of the OER has since changed!). The topics of the chapters submitted as coursework ranged from talking about breakfast (primary EFL), giving, accepting and refusing invitations in conversation (secondary EFL) and teaching fractions (content and language integrated learning; CLIL) to talking to patients and next-of-kin about organ donation in hospital interactions (vocational education and training; VET).

Although the creation of the present OER was always at the heart of the seminar project, submitting co-authored coursework for publication was entirely optional and required the consent of all group members. In addition, I made clear that I would make a selection of the most suitable chapters. No additional credits could be obtained for taking part in the publication process. The contributing authors worked on improving their chapters on the basis of my and their peers’ feedback, sometimes over several revision rounds, in their own time, often long after the seminar had ended. Given these conditions, it was very heartening to see that many of the authors of the selected chapters were very keen to contribute to the present publication.

4. Challenges and lessons learnt

Whilst very rewarding for all involved, designing and implementing such a project-based seminar is certainly not without its challenges. The present OER is the result of three iterations of the seminar, with three different groups of Master of Education students. Considerable adjustments to the seminar content and its implementation were made over the three semesters. In the following, I’d like to share some of the aspects that initially did not go to plan, as well as the reasoning behind the adjustments made as the project unfolded and lessons were learnt.

The first iteration in the winter semester 2019/2020 was entitled “Corpus Linguistics and Language Teaching”. As the title suggests, the primary focus of the taught input was on corpus linguistics. Though student feedback was largely positive and the quality of the coursework overall very satisfactory, it was clear that some aspects required substantial changes. First, the primary focus of the seminar needed to be shifted away from the theory and practice of using corpora, towards more theoretical and practical grounding in ELT materials design. Indeed, a number of students misinterpreted the seminar’s learning objectives as “learning all about the intricacies of as many corpora and corpus tools as possible”, thus largely relegating the pedagogical content of the seminar to the background. This resulted in student coursework that was more akin to (often well-designed) step-by-step guides to conducting corpus queries than to guidance on how to actually make pedagogically meaningful use of corpora in the classroom. As a result, I chose to entitle the second iteration of the seminar: “Designing and Evaluating Materials for Language Teaching”. Whilst student feedback from this second iteration was overall more positive than from the first, two students expressed the wish to have known before signing up for the course that it involved designing ELT materials on the basis of corpus data. In order to avoid this misunderstanding and to make clear that corpus linguistics is a core part of the seminar, the third iteration was entitled: “Designing and evaluating corpus-informed language teaching materials”. Interestingly, a number of students later reported choosing this third seminar specifically because they wanted to learn more about how to use corpora.

A number of students from the first iteration of the “Corpus Linguistics and Language Teaching” seminar also reported that the wealth of available corpus tools and functions was overwhelming. Consequently, the second major change, which came hand in hand with the first, was to drastically reduce the course’s teacher-led input on corpus software, platforms and query functions. The first seminar covered english-corpora.org, SketchEngine, LancsBox, AntConc and CQPweb. The latter was only used to access the Spoken BNC2014, which was subsequently also uploaded onto SketchEngine so that future iterations did not need to include it. LancsBox was introduced together with a collaborative session in which the class created a higher education EFL/ESL learner corpus to which students contributed their own anonymised coursework from previous semesters. In practice, however, few students were willing to contribute any coursework to this toy learner corpus, which made it difficult to draw any meaningful conclusion from its analysis. A second activity required students to use AntCorGen to create their own discipline-specific academic corpus. These two activities proved difficult for many students. First, there were a number of technical difficulties. Although both programs are freely accessible and work well cross-platform, a handful of students had seemingly never installed any piece of software before and/or only had access to computers for which they did not have installation rights. All in all, the number of hours I invested in one-to-one technical support proved to be unsustainable. In order to make the best use of the limited time available and ensure equal participation for all, I therefore decided that future iterations of the course would only include browser-based online tools. Another strong motivator for this decision was my wish for the seminar objectives to be as long-term and sustainable as possible.
As many practising teachers will know, installing software on school computers/tablets is, at best, a frustratingly laborious task and, often, quite simply mission impossible! Using (freely accessible) browser-based tools removes this first, practical barrier to using corpora in school contexts. Moreover, a number of students later reported querying online corpus tools on their mobile phones to find quick solutions to their own language questions, which, again, highlights the long-term sustainability of such an approach. In any case, I strongly believe that, having acquired a sound knowledge of corpus skills in any of the commonly used online or offline corpus tools, motivated teachers will have no trouble finding their way around other corpus tools. Indeed, this was shown to be the case, with some of my students successfully choosing to use CQPweb, SKELL and the TED Corpus Search Engine for their final projects, in spite of these not having been introduced in any of the compulsory or optional course tasks. Equally, I always referred interested students to AntConc, LancsBox, and other free, offline software, whenever these were likely to be better suited for their project ideas.

In addition, the first iteration of the seminar included a few technology-averse students who reportedly did not enjoy the hands-on computer-based work and did not respond to DDL as positively as the rest of the participants. In subsequent iterations, I made sure to stress that corpus-informed materials can just as well be paper-based and need not necessarily rely on hands-on learner-corpus interactions (cf. Boulton 2010 on paper-based DDL activities). It is difficult to judge whether any technology-averse students were later warned by their peers not to join this course, or whether the subsequent versions of the seminar better addressed their concerns. This is because, as a result of the COVID-19 pandemic, the second and third iterations of the seminar were taught online, so that all students were, at any rate, obliged to study online and work with their own devices. The online iterations of the seminar involved a combination of an elaborate asynchronous learning portfolio with individual and collaborative tasks and bi-weekly synchronous webinar sessions. Whilst this new seminar format required a lot of adaptations on my part, a number of students highlighted the advantages of the online course format for this project-based seminar:

Gerade hinsichtlich des behandelten Themas war es sehr praktisch, dass jeder an seinem eigen Computer saß. Die Möglichkeit, Arbeitsprozesse mit einem Corpus erst über einen Sceencast vermittelt zu bekommen und es quasi simultan ausprobieren zu können, empfand ich als vorteilhaft (gegenüber herrkömmlichen Seminarstrukturen) [Given the nature of the topics covered in the seminar, it was actually very handy that everyone was sitting in front of their own computers. Compared to the usual seminar structures, I found it advantageous to be taught how to use a corpus via screencast and be able to try it out for myself almost instantaneously.] (anonymous feedback from student evaluation, summer 2020)

In spite of all the challenges, the majority of students reported a high degree of satisfaction with the course and especially with the outcome of the final project task:

Over the course of the last weeks, I started to really appreciate the power of corpora and I am now excited to do a corpus-based teaching unit with a class. (student self-reflection, winter 2019/2020)

Since I would like to become a teacher for secondary education, I can imagine using exactly the chapter my fellow students and I have written in school once. (student self-reflection, winter 2020/2021)

Generally, I missed the connection to my future teaching job in a lot of my other courses, so I really liked that this course’s final task was designing something that could actually be used in school one day. (student self-reflection, winter 2020/2021)

5. Learner autonomy

The seminars aimed to persuade student teachers of the value of corpus linguistics approaches to language teaching and their potential to boost learner autonomy. They empowered the student teachers to create corpus-informed materials autonomously and, via the creation and publication of the present OER, enabled them to become future multipliers of corpus-informed materials design and data-driven learning (DDL).

In the seminars, learner autonomy was tackled at two levels. First, all the participants were future English teachers and, although they were M.Ed. students, the majority of their university courses consisted of frontal teaching and featured relatively few elements of self-regulated learning. Thus, this seminar also aimed to develop future teachers’ own learning autonomy by allowing them to explore, apply, reflect on and exchange ideas about various ELT and SLA principles. Second, they learnt about creating and evaluating ELT materials that support learner autonomy through DDL. DDL was new to all participating student teachers and was, initially, met with some resistance. Given their own relative lack of experience with such approaches, this was to be expected. In the pre-intervention questionnaire, two-thirds of the students agreed or strongly agreed with the statement: “I have no idea what corpus linguistics is about”. Only a handful of participants reported already using corpora. All students responded negatively to the statement: “I know how to use corpora to prepare classroom materials”. In light of these results, the following chapters, which were developed by the contributing student authors in just one semester, are even more impressive!

6. Conclusion and outlook

I am delighted that a small but highly motivated team of thirty students from the three iterations of this project-based seminar has remained on board to see the project through to the publication stage. Although most students co-wrote their chapters in groups of two or three, five chapters were developed by students who had decided to work individually. I would like to thank all contributing authors, as well as all the students who participated in the seminars and who therefore also directly (e.g., through their peer review of the present chapters) and indirectly (e.g., through their course feedback which helped me improve future seminars) contributed to the successful completion of this project. Heartfelt thanks are also due to Tatjana Winter who, in her capacity as a student research assistant at the Institute of English and American Studies (IfAA) at Osnabrück University, formatted the chapters for publication on pressbooks.com, and to my PhD supervisor Dirk Siepmann, without whose support this (very time-consuming!) side project would not have been possible.

All that remains for me to write is that I hope that this OER will prove to be a useful, practical resource from which EFL/ESL teachers across the world can draw inspiration and learn to design their own corpus-informed materials. It is also highly suitable for use as a textbook or complementary resource in both pre- and in-service teacher training programmes on corpus linguistics in language education.

All the chapters are licensed under the Creative Commons Attribution-NonCommercial 4.0 International License, which means that they can be freely adapted, copied and distributed for non-commercial purposes as long as the original source and authors are cited. Editable XML and HTML versions of this e-book can be downloaded from this project’s Zenodo repository. A PDF version with hyperlinks is also available from the same repository for readers with unreliable internet connections – though we recommend using the web-book version available on pressbooks.com.

The full OER may be cited as follows:

Le Foll, Elen (Ed.). (2021). Creating Corpus-Informed Materials for the English as a Foreign Language Classroom. A step-by-step guide for (trainee) teachers using online resources (Third Edition). Open Educational Resource. https://pressbooks.pub/elenlefoll. CC-BY-NC 4.0. DOI: 10.5281/zenodo.4992504.

Individual chapters should be referenced with the authors’ names and the full link to the chapter, e.g.:

Nottmeier, Marie, Alina Sophie Peters & Lara Warnecke (2021). “Commas in argumentative writing”. In Le Foll, Elen (Ed.), Creating Corpus-Informed Materials for the English as a Foreign Language Classroom. https://pressbooks.pub/elenlefoll/chapter/nottmeier_peters_warnecke. CC-BY-NC 4.0.


We welcome your feedback!

If you spot any errors or dead links, or have any other kind of feedback, please do get in touch via e-mail (elefoll@uos.de) or on Twitter (@ElenLeFoll). The contributing authors and I very much look forward to hearing how you are using and/or adapting this e-book for your teaching and learning context. We also very much welcome your suggestions to improve this evolving and hopefully dynamic resource.

