Thank you to all those who attended my presentation Online English Learning: Resources, Activities, and Evidence at JALT Fukuoka. Thank you also to the CALL SIG for giving me the opportunity to attend the event. The slides shown during the presentation are now available to download as a PDF.
- Do it without looking. Tell the students to look down at the line, then look up and say it.
- Do it with the book closed (students can open it briefly to check if they forget the line).
- Substitute words and phrases for the students’ own ideas, change names, places, or any other words.
- Do it with emotion – happy, sad, angry, confused, etc. Get the students to try a variety of combinations.
- Do it with an accent – American, British, robot, zombie – get the students to use their imaginations!
- Do it with gestures only and no sound, overemphasizing the gestures to convey the meaning of the text.
- Tell the students to stand up and act it out. Get them to use props and costumes if available.
- Have the students write another five or ten lines for the dialogue, and then repeat steps 1 to 7.
- Repeat steps 1 to 7 with a different partner.
- Have the students translate the dialogue into their first language(s), and then back to English again without looking at the original.
The list is available under a Creative Commons license, and can be viewed and downloaded here.
The list of real-sounding “fake” words used for the new Apps 4 EFL activity “Fight the Fakes” is now available for download.
The list was generated by looping through each word on the SIL list and splitting it into three-letter chunks. A Markov chain process was then used to determine which three-letter chunks were most likely to precede or follow each other. The chunks were then recombined according to these likelihoods in order to create realistic-sounding neologisms of various lengths, e.g.
The words were double-checked against the SIL list to ensure no real words were accidentally generated.
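The process described above can be sketched roughly in Python. This is only an illustration under stated assumptions, not the code used to build the list: the function names, the start and end markers, and the `max_chunks` limit are all hypothetical, and the sketch only models which chunk follows which.

```python
import random
from collections import defaultdict

START, END = "^", "$"  # hypothetical markers for word boundaries


def chunk(word, size=3):
    """Split a word into consecutive chunks of `size` letters (the last may be shorter)."""
    return [word[i:i + size] for i in range(0, len(word), size)]


def build_chain(words, size=3):
    """Map each chunk to the list of chunks seen directly after it.

    Followers are stored with repeats, so sampling from the list
    reproduces the observed likelihoods.
    """
    chain = defaultdict(list)
    for word in words:
        parts = [START] + chunk(word, size) + [END]
        for current, following in zip(parts, parts[1:]):
            chain[current].append(following)
    return chain


def generate(chain, real_words, rng, max_chunks=5, attempts=1000):
    """Walk the chain to build a candidate, rejecting anything on the real word list."""
    for _ in range(attempts):
        parts, current = [], START
        while len(parts) < max_chunks:
            current = rng.choice(chain[current])
            if current == END:
                break
            parts.append(current)
        candidate = "".join(parts)
        if candidate and candidate not in real_words:
            return candidate
    return None  # no new word found within the attempt limit


if __name__ == "__main__":
    # Tiny stand-in for the real word list
    words = ["panelist", "hispanic", "rattle", "snatcher", "mandible", "bilious"]
    chain = build_chain(words)
    print(generate(chain, set(words), random.Random()))
```

The final membership check in `generate` plays the role of the double-check against the source list: any candidate that happens to be a real word is discarded and another is tried.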
Fun ways to teach with the words
- Try the new Apps 4 EFL activity Fight the Fakes, which uses the words as distractors against low-frequency items from the BNC
- Ask your students to try and invent “definitions” for the fake words based on what they sound like, e.g. “hispanelist (n.), chat show panelist from Latin America”, “mandibilious (adj.), used to describe an animal with extraordinarily strong jaws”, “rattlesnatcher (n.), a person who goes around stealing toys from small children”
- Use them in Yes/No vocabulary knowledge tests to ensure students don’t cheat by clicking “Yes, I know this word” for every item
- Word: the word (lemma) as it appears on the original list
- POS: the most common part-of-speech for the word according to the Moby Part-of-Speech database
- BNC Rank: the frequency ranking of the word according to the British National Corpus (lower number equals higher frequency)
- Google Rank: the frequency ranking of the word according to the Google Corpus (lower number equals higher frequency)
- IPA: the International Phonetic Alphabet transcription of the word, using data derived from the CMU Pronouncing Dictionary
- Conjugations: variations of the form of the word according to tense, person, etc.*
- Synonyms: a list of words with similar or related meanings*
- Multilingual definitions: Arabic, Chinese, German, Greek, English, French, Italian, Japanese, Korean, Dutch, Portuguese, Russian, Spanish, Swedish, Thai, and Turkish*
*Data provided by public domain dictionary/thesaurus sources, where available.
Download the data:
- New General Service List (NGSL)
- New Academic Word List (NAWL)
- TOEIC Service List (TSL)
- Business Service List (BSL)
(Click the name of the list you require to open a read-only Google Spreadsheet. From the Google Spreadsheet, click “File” => “Download as” then choose your required format)
This supplementary data is available under the same license as the original lists: Creative Commons Attribution-ShareAlike 4.0 International License.
The final list consists of 3,773 high frequency TOEFL words, and can be downloaded here.
Step 1: Assemble a corpus of TOEFL materials
For my corpus, I used material from both the older CBT (Computer Based Test) and the current iBT (Internet Based Test). I found most of the materials online for free. Some were already in plain text format, but most were PDFs and required Optical Character Recognition (OCR) to convert to plain text. I used ABBYY’s FineReader Pro for Mac, but there are plenty of other options out there too. Some files were in Microsoft Word format (.doc/.docx), and MacOS X’s batch conversion utility came in handy for these. I included model answers, listening transcripts, reading passages and multiple-choice questions (prompts, distractors and answers). I tried to exclude explanations, advice and instructions from the authors and/or publishers.
Ultimately, I ended up with a corpus just shy of a million words (959,124 to be precise). In general, bigger is better when it comes to corpus research. The TOEIC Service List (TSL) utilizes a corpus of about 1.5 million words, so my TOEFL corpus seems roughly comparable to this.
Step 2: Count the number of occurrences of each word
I used some custom PHP code to process my corpus data (although Python is probably more suited to corpus analysis). I lemmatized each token where possible using Yasumasa Someya’s list of lemmas. I then cross-referenced each lemma occurrence with the NGSL, NAWL and TSL. Finally, I exported the results to a CSV file, and ended up with 13,287 rows of data.
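For anyone who would rather try this step in Python, here is a minimal sketch of the same idea. It is not a port of my PHP code: the tokenizing regex, the function names, and the CSV column names are assumptions made for illustration, and `lemma_map` stands in for a dictionary built from a lemma list (inflected form to lemma).

```python
import csv
import io
import re
from collections import Counter


def count_lemmas(text, lemma_map):
    """Tokenize lowercased text and count lemmas, falling back to the token itself."""
    tokens = re.findall(r"[a-z]+(?:['’][a-z]+)?", text.lower())
    return Counter(lemma_map.get(token, token) for token in tokens)


def tag_with_lists(counts, word_lists):
    """Build one row per lemma, noting which reference lists contain it."""
    rows = []
    for lemma, count in counts.most_common():
        found_in = [name for name, words in word_lists.items() if lemma in words]
        rows.append({"lemma": lemma, "count": count, "lists": ";".join(found_in)})
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["lemma", "count", "lists"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Toy data standing in for the corpus, the lemma list and the word lists
    lemma_map = {"students": "student", "studied": "study", "studies": "study"}
    word_lists = {"NGSL": {"student", "study"}, "NAWL": set(), "TSL": set()}
    counts = count_lemmas("The students studied. A student studies.", lemma_map)
    print(to_csv(tag_with_lists(counts, word_lists)))
```

Tokens with no entry in `lemma_map` are counted as they are, which matches the "where possible" caveat above.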
Step 3: Curate the final list
For my final list I removed any words which also appear on the NGSL, any contractions (e.g. “don’t”, “I’m”, “that’s”), any numbers written in word form (e.g. “two”, “million”), any vocalizations (e.g. “uh”, “oh”), any ordinals (e.g. “first”, “second”, “third”), any proper nouns (e.g. “James”, “Elizabeth”, “America”, “San Francisco”, “New York”), and any words with fewer than 5 occurrences in the corpus. Next, I ran the list through a spell checker, and excluded any unrecognized words. I also excluded any non-lexical words, to leave a list consisting only of nouns, verbs, adjectives and adverbs.
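The filters above could be expressed along these lines in Python. Again, this is a sketch under assumptions rather than the actual curation code: the row fields (`lemma`, `count`, `pos`), the `stoplist` of numbers, ordinals, vocalizations and proper nouns, and the `dictionary` set used as a stand-in for a spell checker are all hypothetical.

```python
LEXICAL_POS = {"noun", "verb", "adjective", "adverb"}  # assumed POS labels


def curate(rows, ngsl, stoplist, dictionary, min_count=5):
    """Return the lemmas that survive every curation filter, in input order."""
    kept = []
    for row in rows:
        lemma = row["lemma"]
        if row["count"] < min_count:
            continue  # too rare in the corpus
        if lemma in ngsl:
            continue  # already covered by the NGSL
        if "'" in lemma or "’" in lemma:
            continue  # contraction
        if lemma in stoplist:
            continue  # number word, ordinal, vocalization or proper noun
        if lemma not in dictionary:
            continue  # would fail the spell check (dictionary is a stand-in)
        if row.get("pos") not in LEXICAL_POS:
            continue  # non-lexical word
        kept.append(lemma)
    return kept


if __name__ == "__main__":
    # Toy rows; real rows would come from the CSV produced in Step 2
    rows = [
        {"lemma": "hypothesis", "count": 42, "pos": "noun"},
        {"lemma": "study", "count": 300, "pos": "verb"},
        {"lemma": "don't", "count": 120, "pos": "verb"},
        {"lemma": "million", "count": 60, "pos": "noun"},
        {"lemma": "sediment", "count": 3, "pos": "noun"},
    ]
    print(curate(rows, ngsl={"study"}, stoplist={"million"},
                 dictionary={"hypothesis", "study", "million", "sediment"}))
```

Running the cheap checks (frequency, list membership) before the dictionary lookup is just a convenience; the order of the filters does not change which words are kept.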