Lesson ideas: Upper secondary school

6 How to do data-driven learning (DDL) with secondary school pupils to improve their writing

Katharina Busmann

1 Introduction and rationale

According to Anne Wichmann et al. (1997), the main rationale for using corpora in teaching is their immediate availability for students’ use. To take advantage of this, students need to acquire the essential know-how to explore corpora as part of their self-regulated learning (cf. Wichmann et al. 1997: 7). Likewise, the curriculum for secondary schools in Lower Saxony requires students to acquire essential learning strategies and study techniques that enable them to solve problems effectively on their own using a range of resources (cf. Niedersächsisches Kultusministerium 2015: 27). Accordingly, this chapter focuses on how teachers can introduce data-driven learning (DDL) to their students so that they can use corpora autonomously to improve their writing once a writing assignment has been marked by the teacher and returned to them. Additionally, it offers further ideas on how to use the Corpus of Contemporary American English (COCA) for writing in general, since it allows learners to see quickly and easily how native speakers use language in a wide variety of texts.

Outline and objectives

The proposed lesson…

  • is suitable for secondary school pupils aged 12 or older.
  • includes a pre-task which introduces the COCA and explains the basic functions of the web interface.
    • Preparation time: ca. 15-20 min.
    • Class time: ca. 10-15 min.
  • features a main task guided by a worksheet, so that students learn to correct some of their own mistakes independently thanks to the information gleaned from the COCA.
    • Preparation time: none
    • Class time: ca. 20-25 min.
  • ends with three review questions that encourage students to reflect on the use of the concordance function.
    • Preparation time: none
    • Class time: ca. 10 min.

In light of the above, the following learning objectives emerge:
After the lesson, students …

  • can search for concordance lines in the COCA.
  • can communicate how specific words or phrases are used in writing by consulting the COCA.
  • can correct their mistakes independently by inferring information given by the COCA and thus improve their writing skills.
  • can reflect on the utility of the concordance function.

If you are familiar with using the COCA, hardly any preparation is needed for this lesson. Otherwise, please make sure that you work through the introductory steps below beforehand, so that you can explain the tools to your students and answer their questions in class.

2 Corpus, tools and methods

The resource used for this lesson is the COCA (Corpus of Contemporary American English), which contains more than one billion words and is divided evenly among spoken English, fiction, popular magazines, newspapers and academic texts. With the latest update in March 2020, TV and movie subtitles, blogs, and other webpages were added to the corpus as well (cf. Davies 2020b: 1). According to english-corpora.org, the COCA is the most widely used corpus of general English. Since its most recent texts date from 2019, it is also the most up to date (cf. ibid.). Once you have learnt its basic functions, the web interface is very easy to handle, and it is free for all to use. For these reasons, the COCA is a good choice of resource for this lesson.

3 Step-by-step guide

Lesson plan

The lesson plan (Table 1) gives an overview of how the lesson may be structured.

Table 1: Lesson plan.

Pre-task
  • Activity: a) Practice searching in the COCA by working through the fewer/less example with the whole class, or b) choose any other suitable warm-up activity (e.g. Spot the mistake: write lexically or grammatically unidiomatic sentences on the blackboard and ask your students to find and correct the mistakes (cf. Boerger 2020: 2)).
  • Materials: computers, copies of the instructions

Main task
  • Activity: Students complete the worksheet by consulting the COCA.
  • Materials: computers, worksheet (see appendix)

Review
  • Activity: Students answer the review questions.
  • Materials: review questions


Pre-task

The following instructions explain the general functions of the web interface by working through an example: the difference in usage between two commonly confused words, fewer and less. Both words have the same basic meaning, but when should you use less and when fewer? This confusion may well be a mistake that several of your students made in the writing assignment you have just marked and plan to hand back in this lesson.


  1. Go to www.english-corpora.org.
  2. Click on “Corpus of Contemporary American English (COCA)” to open the corpus (Fig. 1).
Fig. 1: Selecting a corpus.
  3. You should now be able to see the following display (Fig. 2):
Fig. 2: Conducting a search in the COCA.
  4. Click on “Sections” (Fig. 3).
Fig. 3: Conducting a search in the COCA.
  5. Choose the register(s) you want to focus on. If you want to find out how less and fewer are used in typical written text types, select “Fiction”, “Magazine”, “Newspaper” and “Academic”, for instance (Fig. 4).
Fig. 4: Limiting the search to specific registers.
  6. Make sure you have selected the “List” option. Then type in fewer and press “Find matching strings”.
  7. Click on the word to see in which contexts it has been used (Fig. 5).
Fig. 5: Result page from the corpus query.
  8. In the concordance lines, you can now look for regularities in the usage of the word fewer (Fig. 6).
Fig. 6: Concordances of fewer.
  9. Now repeat the steps but search for the word less. Compare its usage to that of fewer and try to formulate “rules” for the use of these words on the basis of their respective concordance lines.

Solution: ‘Less’ is used with uncountable nouns and with adjectives or adverbs, while ‘fewer’ is used with countable nouns.

  10. Additionally, you can ask students to pay special attention to register differences. For this, you might want to go back to step 5 and include more registers to allow a better comparison. Including spoken and spoken-like text types (e.g. TV scripts or web language) could be especially interesting, as this allows students to explore potential differences between conversational spoken and formal written English: are there any instances of less/fewer that deviate from the usage “rules” above? If so, which nouns are most often affected? In which contexts and text types is this most common? See “Options and further ideas” below for a detailed explanation of how to compare the distribution of a particular construction in different registers using the “Chart” function of the web interface.



If your students are not yet familiar with using the concordance function of english-corpora.org, it is advisable to explain the procedure while working through the fewer/less example with the whole class. Alternatively, you can choose any warm-up activity suitable for a lesson that asks students to remedy errors in their writing assignments.
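
For teachers who would like to show what a concordance line is before opening the web interface, the idea can be sketched in a few lines of Python. This is only an illustration, not how the COCA itself works: the function name kwic, the width parameter and the sample sentences are all invented for this example, and the sample text is not taken from the corpus.

```python
import re

def kwic(text, keyword, width=30):
    """Return keyword-in-context lines for every whole-word match of keyword."""
    lines = []
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so that the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines

# Invented sample text (an assumption for illustration, not COCA data).
sample = (
    "There were fewer cars on the road today. "
    "She has less time than she expected. "
    "Fewer students chose the second topic. "
    "He was less confident about the exam."
)

for line in kwic(sample, "fewer"):
    print(line)
```

Running the same function with "less" lets students compare the words that follow each keyword, which mirrors what they do with the real concordance lines in steps 8 and 9.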

Main task

In the main task of the lesson, ask your students to fill in the worksheet (see appendix) by querying the COCA. The worksheet consists of the following instructions:

  1. Write down three language issues that came up in your writing assignment.
  2. For each issue, find a sentence in the corpus which uses the word/phrase/construction in a suitable way and write it down.
  3. Describe how each word/phrase is used.
  4. Use the information that you gathered from the corpus to rewrite your sentences.

Review questions

After having completed the main task, ask your students to answer the following review questions (cf. Friginal 2018: 229):

  • Was using the concordance function helpful for learning more about word/phrase/grammar usage in English?
  • In your opinion, is it worth spending the time learning how to use the concordance function?
  • Are you likely to use the concordance function on your own in the future? Why/why not?

4 Options and further ideas


  • Your students can also use their mobile phones instead of computers.
  • As all of the research interfaces on english-corpora.org are similar, your students can apply their knowledge to explore other corpora, such as the British National Corpus (BNC), for instance.

Additional ideas

This section suggests further ways in which your students can use corpus data to improve their writing skills. If your students enjoy working with the COCA, it is highly recommended that you work through the following functions and present them to your class one by one. Should some of your students finish the tasks of the lesson earlier than expected, you may also use these ideas as gap fillers.

Exploring collocations

The term ‘collocation’ describes words that tend to appear together, as in to make mistakes: the noun mistake is most frequently associated with, or collocates with, the verb make. By contrast, to do mistakes is considered unidiomatic because mistake does not collocate with the verb do. Exploring typical collocations involving different word classes, such as adjective + noun or verb + noun, is another way of using the COCA in writing classes. Using the collocation function, you can see which words frequently co-occur with other words, which provides great insights into collocation usage and meaning. Here is an overview of how to use this function:

  1. Open the COCA on www.english-corpora.org/coca, click on the ‘+’-icon and choose “Collocates” (Fig. 7).
Fig. 7: Using the “Collocates” function.

  2. You should now be able to view the following display (Fig. 8):

Fig. 8: Using the “Collocates” function.

Imagine the following scenario: you want to write a text about finances and are unsure about verbs that are commonly used with the noun money. Figure 9 shows an example search for typical verbs occurring one or two positions before the noun money in newspaper writing. Make sure to set the search options to “Group by lemmas” (not words) so that inflected forms such as making, made or makes will be classified as belonging to the same verb, in this case make.

Fig. 9: Searching for typical verb collocates of the word money.

Thus, you can conclude that in newspaper writing, the five verbs most frequently used before money are make, raise, spend, save and get.
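
The logic of such a collocate search (a window of two positions before the node word, with inflected forms grouped by lemma) can be sketched as follows. This is a toy illustration under stated assumptions: the sentences, the small LEMMAS table and the VERBS set are invented stand-ins for the lemmatisation and part-of-speech tagging that a real corpus provides, and the counts say nothing about the COCA.

```python
from collections import Counter

# Tiny hand-made lemma table (assumption); a real corpus is fully lemmatised.
LEMMAS = {
    "makes": "make", "made": "make", "making": "make",
    "spent": "spend", "spends": "spend",
    "raised": "raise", "saved": "save",
}

# Hypothetical list of verb lemmas, standing in for part-of-speech tags.
VERBS = {"make", "spend", "raise", "save", "get"}

def verb_collocates(sentences, node="money", window=2):
    """Count verb lemmas found up to `window` positions before the node word."""
    counts = Counter()
    for sentence in sentences:
        tokens = [t.strip(".,;:!?").lower() for t in sentence.split()]
        for i, token in enumerate(tokens):
            if token != node:
                continue
            for candidate in tokens[max(0, i - window):i]:
                lemma = LEMMAS.get(candidate, candidate)
                if lemma in VERBS:
                    counts[lemma] += 1
    return counts

# Invented example sentences, not corpus data.
sentences = [
    "They made money quickly.",
    "She makes good money.",
    "The school raised money for charity.",
    "We spent the money on books.",
    "He is making more money now.",
]

print(verb_collocates(sentences).most_common())
```

In this toy data, made, makes and making are all counted under the single lemma make, which is the effect the “Group by lemmas” setting has in the web interface.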

Searching for synonyms

Searching for synonyms is another possible way of using the COCA to increase the variety of word choice when writing texts. Open the COCA on www.english-corpora.org and choose “List”. Type in an equals sign (=), followed by the search string of interest. Figure 10 shows an example search for synonyms of the adjective beautiful:

Fig. 10: Synonyms of beautiful.

Exploring different registers

Searching in the COCA provides students with information about the frequency of words or phrases in different styles of English, such as spoken, fiction or academic. Imagine the following situation: you need to write an assignment and you would like to know if it is appropriate to use the expression freak out in formal contexts. To answer this question, follow three easy steps:

  1. Open the COCA on english-corpora.org and choose the “Chart” option, which shows the frequency of a search string in each section (Fig. 11):
Fig. 11: Using the “Chart” function.
  2. Type freak out in the search field and press “See frequency by section”. You should get the following results (Fig. 12):
Fig. 12: The frequency of freak out in different registers.
  3. Finally, you need to interpret the chart: Fig. 12 shows that the expression freak out is used only 0.08 times per million words in academic texts, which is the lowest value of all displayed sections. Thus, it is advisable to use a different expression in the assignment, such as the synonymously used term to lose control, for instance (see above: “Searching for synonyms”).
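
Students sometimes ask why the chart reports frequencies per million words rather than raw counts. The reason is that the sections differ in size, so raw counts are not directly comparable. The normalisation can be sketched as below; the function name per_million is hypothetical, and the hit counts and section sizes are invented for illustration and are not COCA figures.

```python
def per_million(raw_count, section_size):
    """Normalise a raw hit count to occurrences per million words."""
    if section_size <= 0:
        raise ValueError("section_size must be positive")
    return raw_count / section_size * 1_000_000

# Invented counts and section sizes (assumptions, not taken from the COCA).
sections = {
    "spoken": (250, 100_000_000),
    "academic": (12, 120_000_000),
}

for name, (hits, size) in sections.items():
    print(f"{name}: {per_million(hits, size):.2f} per million words")
```

With these made-up numbers, the academic section has more words but far fewer hits, so its normalised frequency is much lower than that of the spoken section.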

The latest option of the COCA

In March 2020, the COCA was expanded in scope, size and features to make it even more useful for researchers, teachers, and learners, who can now browse through a list of the 60,000 most frequent words in the corpus and see a wide range of useful information on each of these words (cf. Davies 2020b: 1). To make use of this, you can either click on “Word” and search for a specific word or click on “Browse” (see Fig. 13) to search by word form (e.g. *ism), rank order (e.g. words among the 6,000th, 26,000th or 46,000th most frequent words), pronunciation (e.g. words rhyming with light, or three-syllable words accented on the last syllable), or any combination of these (cf. Davies 2020b: 11).

Fig. 13: Using the “Browse” function.
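
To show students what a word-form pattern such as *ism means, the asterisk can be demonstrated as a wildcard that stands for any sequence of characters. The sketch below uses Python's standard fnmatch module on an invented word list; it only illustrates the wildcard idea and makes no claim about how the “Browse” search is implemented.

```python
from fnmatch import fnmatchcase

# Invented word list for illustration (not drawn from the COCA).
words = ["realism", "tourism", "optimist", "criticism", "prism", "mist"]

# "*ism" matches any word that ends in the letters "ism".
matches = [word for word in words if fnmatchcase(word, "*ism")]
print(matches)
```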

To explore this new feature, click on “Browse” and “Find random words” (Fig. 14).

Fig. 14: Using the “Browse” function.
Fig. 15: Using the “Browse” function.

For each word on the list, you can hear the word pronounced, watch videos in which the word is used, see related images from Google Images, and get a translation into your preferred target language (Fig. 15). When you see a word of interest, click on it. You will be redirected to its homepage, which contains frequency information, definitions, translations, links to audio, images, and videos, as well as synonyms, topics (words that co-occur anywhere in the ca. 500,000 texts), collocates, clusters, and concordance lines (Fig. 18) (cf. Davies 2020b: 6). In the top right corner of each word’s homepage, you will find direct links to the respective pages of some of these features (cf. Davies 2020b: 8). For a more detailed account of the word’s collocates, clusters, concordance lines (KWIC) etc., simply click on the feature in question (Fig. 16). Figure 17 shows the collocation page of the word trail.

Fig. 16: Navigation tabs at the top of the word’s homepage.
Fig. 17: Collocates of trail.


Fig. 18: Homepage of the word trail (Davies 2020b: 7).

It is hoped that this short introduction to the new “Browse” option of the COCA gives valuable insights into its great potential for classroom use. The summary statistics page for each word presents a wealth of information at a glance (collocates, clusters, synonyms, concordance lines and the distribution across genres), which saves a lot of time and effort and makes this new function very convenient. Note, however, that words that are not among the 60,000 most frequent words in the corpus do not have such a summary statistics page, so you may well need the original, basic functions of the interface to carry out your search. Starting with the basic functions also gives students a better understanding of how corpora work and equips them to search all the other corpora featured on english-corpora.org, since the COCA is currently the only corpus on the web interface that supports this new function.

5 Caveats and limitations

After approximately 10-15 queries, the website will ask you to register. Registration is free for everyone and takes only one or two minutes. Afterwards, you can carry on using the website as normal.

Fig. 19: Registering on the website.

Click on the yellow icon in the top right corner of the website (Fig. 19) to register. You should then see the following display (Fig. 20, Fig. 21):

Fig. 20: Registering on the website.
Fig. 21: Registering on the website.

You will be asked to select your user status. Note that you must provide a web address that verifies your status in order to set up a Researcher or Semi-Researcher account. Even if you are not approved as a Researcher or Semi-Researcher, you can still use the corpora at a lower level of access (cf. Davies 2020a), which is entirely sufficient for the purpose of this chapter. Within one or two minutes of submitting your information, you will receive an email. Click on the link in that email and you will be able to continue using the web interface.

Occasionally, the following message (Fig. 22) will pop up, asking you to consider upgrading to a premium account. However, there is no need to upgrade your account. Please be patient and wait for a few seconds, after which you will be able to continue with your search.

Fig. 22: Upgrade notification.

6 Conclusion

The main advantage of this lesson is that students are given the means to correct their mistakes independently with the aid of the COCA. Thus, students acquire learning strategies that enable them to solve problems in writing effectively by themselves, which meets one of the requirements of the curriculum. In addition, a recent meta-analysis of data-driven language learning interventions revealed that students are more likely to remember the language patterns they infer from their own corpus queries (cf. Boulton & Cobb 2017). Furthermore, the use of digital media in the classroom can contribute to higher motivation and engagement on the part of the students. On the teachers’ side, not much preparation is required for the lesson, which is also beneficial. Moreover, the additional ideas mentioned above present many more interesting features of the web interface, which provide inspiration for many more COCA-based lessons.

7 Resources and references

Boerger, Claudia. 2020. Examples warm-ups. http://www.jochenenglish.de/misc/warm_up_beispiele.pdf (23 March, 2020).

Boulton, Alex & Tom Cobb. 2017. Corpus use in language learning: A meta-analysis. Language Learning 67(2). 348-393. https://doi.org/10.1111/lang.12224.

Davies, Mark. 2020a. English Corpora. https://www.english-corpora.org/.

Davies, Mark. 2020b. The COCA corpus. https://www.english-corpora.org/coca/help/coca2020_overview.pdf.

Davies, Mark. 2008-2019. The Corpus of Contemporary American English (COCA): 600 million words, 1990-present. https://www.english-corpora.org/coca/

Friginal, Eric. 2018. Corpus linguistics for English teachers: New tools, online resources, and classroom activities. New York; London: Routledge.

Niedersächsisches Kultusministerium. 2015. Kerncurriculum für das Gymnasium, Schuljahrgänge 5-10: Englisch. https://cuvo.nibis.de/cuvo.php?p=download&upload=139 (12 October, 2020).

Wichmann, Anne, Steven Fligelstone, Tony McEnery & Gerry Knowles (eds.). 1997. Teaching and language corpora. London: Pearson Education.

8 Appendix


