230,000 real sounding “fake” words

The list is available under a Creative Commons license, and can be viewed and downloaded here.


The list of real sounding “fake” words used for the new Apps 4 EFL activity “Fight the Fakes” is now available for download.

The list was generated by looping through each of the words from the SIL list and splitting them into three-letter chunks. A Markov chain process was then used to determine which of the three-letter chunks were most likely to precede or follow each other. The chunks were then recombined according to these likelihoods in order to create realistic sounding neologisms of various lengths, e.g.

  • generotizing
  • liminativate
  • coronably
  • solarians
  • troscorifyingly

The words were double-checked against the SIL list to ensure no real words were accidentally generated.
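
The process described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code actually used to build the list: the function names, the `^` and `$` boundary markers, the `max_chunks` limit and the tiny word list in the usage note are all assumptions made for the example.

```python
import random
from collections import defaultdict

START, END = "^", "$"  # assumed markers for word boundaries


def chunk(word, size=3):
    """Split a word into consecutive chunks of `size` letters (the last may be shorter)."""
    return [word[i:i + size] for i in range(0, len(word), size)]


def build_chain(words, size=3):
    """For each chunk, record every chunk that follows it somewhere in the word list."""
    chain = defaultdict(list)
    for word in words:
        parts = [START] + chunk(word, size) + [END]
        for current, following in zip(parts, parts[1:]):
            # Repeated entries make frequent transitions more likely to be picked.
            chain[current].append(following)
    return chain


def generate(chain, rng, max_chunks=6):
    """Walk the chain from START until END is drawn or max_chunks is reached."""
    parts = []
    current = START
    while len(parts) < max_chunks:
        current = rng.choice(chain[current])
        if current == END:
            break
        parts.append(current)
    return "".join(parts)


def fake_words(words, count, seed=0, max_attempts=10_000):
    """Generate up to `count` candidates, discarding any that are real words."""
    real = set(words)
    chain = build_chain(words)
    rng = random.Random(seed)
    fakes = set()
    for _ in range(max_attempts):
        if len(fakes) >= count:
            break
        candidate = generate(chain, rng)
        if candidate and candidate not in real:  # the "double check" step
            fakes.add(candidate)
    return sorted(fakes)
```

Calling `fake_words(["personable", "reasonably", "generalizing", "generation"], count=5)` will only produce a handful of recombinations, because so few chunks are shared between the words. With a full word list there are far more transitions to choose from, so the output should be much more varied.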

Fun ways to teach with the words

  • Try the new Apps 4 EFL activity Fight the Fakes, which uses the words as distractors against low frequency items from the BNC
  • Ask your students to try and invent “definitions” for the fake words based on what they sound like, e.g. “hispanelist (n.), chat show panelist from Latin America”, “mandibilious (adj.), used to describe an animal with extraordinarily strong jaws”, “rattlesnatcher (n.), a person who goes around stealing toys from small children”
  • Use them in Yes/No vocabulary knowledge tests to ensure students don’t cheat by clicking “Yes, I know this word” for every item

Rankings, definitions, pronunciations and additional data for NGSL, NAWL, TSL, BSL and TfSL

Download the data:


Each spreadsheet contains 23 columns:

  1. Word: the word (lemma) as it appears on the original list
  2. POS: the most common part-of-speech for the word according to the Moby Part-of-Speech database
  3. BNC Rank: the frequency ranking of the word according to the British National Corpus (lower number equals higher frequency)
  4. Google Rank: the frequency ranking of the word according to the Google Corpus (lower number equals higher frequency)
  5. IPA: the International Phonetic Alphabet transcription of the word, using data derived from the CMU Pronouncing Dictionary
  6. Conjugations: variations of the form of the word according to tense, person, etc.*
  7. Synonyms: a list of words with similar or related meanings*
  8. – 23. Multilingual definitions: Arabic, Chinese, German, Greek, English, French, Italian, Japanese, Korean, Dutch, Portuguese, Russian, Spanish, Swedish, Thai, and Turkish*

*Data provided by public domain dictionary/thesaurus sources, where available

Introducing the TfSL (TOEFL Service List)

The final list consists of 3,773 high frequency TOEFL words, and can be downloaded here.


Step 1: Assemble a corpus of TOEFL materials

For my corpus, I used material from both the older CBT (Computer Based Test) and the current iBT (Internet Based Test). I found most of the materials online for free. Some were already in plain text format, but most were PDFs and required Optical Character Recognition (OCR) to convert to plain text. I used ABBYY’s FineReader Pro for Mac, but there are plenty of other options out there too. Some files were in Microsoft Word format (.doc/.docx), and Mac OS X’s batch conversion utility came in handy for these. I included model answers, listening transcripts, reading passages and multiple choice questions (prompts, distractors and answers). I tried to exclude explanations, advice and instructions from the authors and/or publishers.

Ultimately, I ended up with a corpus just shy of a million words (959,124 to be precise). In general, bigger is better when it comes to corpus research. The TOEIC Service List (TSL) utilizes a corpus of about 1.5 million words, so my TOEFL corpus seems roughly comparable to this.

Step 2: Count the number of occurrences of each word

I used some custom PHP code to process my corpus data (although Python is probably more suited for corpus analysis). I lemmatized each token where possible using Yasumasa Someya’s list of lemmas. I then cross-referenced each lemma occurrence with the NGSL, NAWL and TSL. Finally, I exported to a CSV, and ended up with 13,287 rows of data.
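
For anyone who would rather try this step in Python, here is a rough sketch of the same idea. It is not a port of my PHP code: the `LEMMAS` dictionary is a tiny hypothetical stand-in for a real lemma list, and the function names, the tokenizing pattern and the CSV column names are all assumptions made for the example.

```python
import csv
import re
from collections import Counter

# Hypothetical stand-in for a lemma list: inflected form -> lemma.
LEMMAS = {"studies": "study", "studied": "study", "lectures": "lecture",
          "is": "be", "was": "be"}


def tokenize(text):
    """Lowercase the text and pull out words, keeping contractions in one piece."""
    text = text.lower().replace("’", "'")
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text)


def count_lemmas(text, lemmas):
    """Count each token under its lemma, falling back to the token itself."""
    return Counter(lemmas.get(token, token) for token in tokenize(text))


def cross_reference(counts, reference_lists):
    """Build one row per lemma, noting which reference lists contain it."""
    rows = []
    for lemma, n in counts.most_common():
        found_in = [name for name, words in reference_lists.items() if lemma in words]
        rows.append({"lemma": lemma, "count": n, "lists": ";".join(found_in)})
    return rows


def write_csv(rows, handle):
    """Write the rows to an open file (or any file-like object) as CSV."""
    writer = csv.DictWriter(handle, fieldnames=["lemma", "count", "lists"])
    writer.writeheader()
    writer.writerows(rows)
```

In use, `reference_lists` would be something like `{"NGSL": set_of_ngsl_words, "NAWL": set_of_nawl_words, "TSL": set_of_tsl_words}`, with each set loaded from the corresponding list.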

Step 3: Curate the final list

For my final list I removed any words which also appear on the NGSL, any contractions (e.g. “Don’t”, “I’m”, “that’s”), any numbers written in word form (e.g. “two”, “million”), any vocalizations (e.g. “uh”, “oh”), any ordinals (e.g. “first”, “second”, “third”), any proper nouns (e.g. “James”, “Elizabeth”, “America”, “San Francisco”, “New York”), and any words with fewer than 5 occurrences in the corpus. Next, I ran the list through a spell checker, and excluded any unrecognized words. I also excluded any non-lexical words, to leave a list consisting only of nouns, verbs, adjectives and adverbs.
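
The filtering rules above could be expressed along these lines. Again, this is only a sketch under some loud assumptions: the `curate` function and its parameters are made up for the example, the `dictionary` set stands in for a spell checker, capitalization is used as a crude proxy for proper nouns (so the counts must not have been lowercased), and the part-of-speech filter is left out entirely.

```python
def curate(counts, ngsl, stoplist, dictionary, min_count=5):
    """Return only the words that survive every exclusion rule.

    counts     -- mapping of word -> number of occurrences in the corpus
    ngsl       -- set of words already on the NGSL
    stoplist   -- set of number words, ordinals and vocalizations to drop
    dictionary -- set of correctly spelled words (stand-in for a spell checker)
    """
    kept = {}
    for word, n in counts.items():
        if n < min_count:
            continue  # too rare in the corpus
        if word.lower() in ngsl:
            continue  # already covered by the NGSL
        if "'" in word or "’" in word:
            continue  # contraction
        if word[:1].isupper():
            continue  # crude proxy for a proper noun
        if word in stoplist:
            continue  # number word, ordinal or vocalization
        if word not in dictionary:
            continue  # not recognized by the "spell checker"
        kept[word] = n
    return kept
```

For example, given counts for “photosynthesis” (12), “James” (8), “two” (40) and “sediment” (4), with “two” on the stoplist and all four in the dictionary, only “photosynthesis” would be kept.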

The Trouble with Bring Your Own Device (BYOD) in ELT

Bring Your Own Device (BYOD) can allow teachers to use technology where it would otherwise be unavailable
BYOD (Bring Your Own Device) is the only solution for educators who wish to use technology in the classroom when access to a CALL lab or institutional set of devices is not available.

Almost all university freshmen in Japan now possess a smartphone of some description. These are generally either iPhones running iOS or OEM handsets running Android. iOS seems to be somewhat more popular in Japan, but there are still a fair number of students with Android handsets, and a few with rarer hardware/software combinations.

If you are relying on BYOD for your tech-powered teaching, the fact that not all your students will have the exact same device is where your problems begin, but not, unfortunately, where they end.

Fragmentation

OS fragmentation is “a barrier to a consistent user experience, a security risk, and a challenge for app developers.” It is caused by mobile device owners’ unwillingness or inability to update to the latest version of their device operating system whenever an update is released. This problem is particularly pronounced for Android handsets, but also exists in relation to iOS.

This might not be a problem for individual users, but it becomes a major issue when leading a group of students in lock-step through a structured learning process. The fact that the “user experience” is inconsistent means that there is no single set of instructions that all students will be able to follow. The fact that developing for every possible OS/handset combination is a challenge means that many apps only run on the latest OS versions of the most popular handsets.

So, although every student may possess a smartphone, not every smartphone will be able to run the cool CALL app you have in mind. Even if they can, you will either have to give individual support to every student in helping them set up the activity, or create multiple iterations of the instructions to cover every OS/device eventuality.

Mobile devices are cornucopias of personal and private data
Privacy

Unlike institutionally owned devices, which can be easily wiped after the user logs out or finishes the class, student owned devices contain a trove of personal data: photos, messages, appointments, contact information, and more.

Most students would probably feel uncomfortable sharing at least some of this information with their teachers. So when we walk around the room monitoring students to make sure they are on-task, or helping them set up the mobile-based CALL activities, we have to be careful not to inadvertently peek into the personal lives behind the tiny glowing screens in their hands.

Distractions

Ever since Apple overhauled the iOS notification system, it seems that every app and its dog wants to send me updates, offers, news and status reports. While I endeavor to disable notifications for any app that doesn’t absolutely need them, my students tend to be less discerning. There’s nothing worse than setting up a class activity on mobile devices, only to have students navigate away from the app or site the moment a giant emoji-laden message drops down from the top of the screen. Even the students who diligently dismiss annoying messages from friends must find them a distraction from the learning process.

And I haven’t even begun to mention the students who will double click the home button and go back to Candy Crush the minute you’re not hovering over their shoulders and spying on their screens.

The millennial version of Maslow’s hierarchy of needs
Battery

The modified version of Maslow’s hierarchy of needs now puts battery life right at the bottom of the pyramid, directly below “Wi-Fi”. Yes, this is a sarcastic dig at millennials’ seeming inability to pull themselves away from their devices and do something healthy like... climb a tree. However, in the CALL-based EFL classroom, it is a very pertinent observation.

Battery life hasn’t really improved as much as we’d like in recent years, and certainly not as much as storage capacity or processor speeds. It seems that battery life isn’t subject to Moore’s law, which describes transistor density on chips, as battery capacity is limited by electrochemistry rather than by how small we can make circuits.

This means that students, who are already heavy mobile users, may simply not have enough juice to utilize their devices during study time as well as break time. Where this is the case, you’d better hope that you have enough power outlets and charging cables to get them hooked back up to the mainline.

Data

Capped data plans on mobile are generally the norm these days. There may be actual technological reasons behind this, but the cynical side of me suspects it’s just the carriers trying to milk heavy users for more money.

In any event, if you don’t have an easily accessible Wi-Fi network in your classroom (which isn’t restricted to just teachers) and you’re asking students to use their own data connections to engage with your chosen app or website, you have to be careful not to inadvertently incur additional charges for your students. Usually they will be quick to let you know when this is the case, but it can be yet another barrier to the successful exploitation of BYOD.

Summary

If you can overcome the difficulties presented by various models of various handsets running various versions of various operating systems, and all students have a fully juiced up device with plenty of bandwidth, and they are able to pull themselves away from Candy Crush, and ignore messages from their friends in other classes, then BYOD can be a good way to gain access to mobile technology in the classroom.

However, we must be careful not to appropriate students’ personal (and often private) devices as our own teaching tools, however cool that new ELT app may be.

25 Tech Tips from JALT 2016

  1. The Free Music Archive is a great place to find music for apps, games, and other projects
  2. LiveInk allows you to easily render any text in a “brain friendly” way
  3. FreeHostia allows you to host your own website, blog, or bulletin board for free
  4. Roll20 is a suite of easy-to-use digital tools that expand pen-and-paper game play
  5. Pics4Learning provides free clip art for educational resources..
  6. ..while the Library of Congress can be used to find public domain images of historical significance..
  7. ..and ELT Pics is a Flickr photo stream containing over 25,000 pictures for teachers of ESL
  8. LibSyn provides podcast hosting for only $5 a month
  9. Hopscotch helps you learn to code through creative play..
  10. ..while Javascript Obfuscator helps you protect your ideas when you eventually create that “killer app”..
  11. ..and the Learn How YouTube channel contains video tutorials for beginner programmers
  12. Imiwa is a Japanese dictionary for iOS
  13. Smart Smart produces many different apps for English study
  14. TEDict is an iOS app which allows you to use TED videos for listening dictation practice
  15. News in English provides English news in three different levels of difficulty
  16. Herstory is an innovative and interactive detective game
  17. EFL Technologies provide a number of free apps for learning the NGSL, GSL, NAWL and AWL
  18. English Test Prep Review provides unofficial guides and review materials for TOEFL, TOEIC, and other standardized tests
  19. Leander’s Lexicon Extractor is a free online tool that allows students and teachers of English to quickly extract and list important vocabulary from an inputted text
  20. CleverBot allows students to practice English conversation with an artificially intelligent chat bot
  21. Bloomin’ Apps lists apps and websites for every level of Bloom’s Revised Taxonomy
  22. The Rule of 6 (eBook) propounds a simple framework for how to teach with an iPad
  23. The WikiTude SDK allows you to build your own augmented reality app..
  24. ..while Aurasma enables anyone to easily create, manage, and track augmented reality experiences
  25. Documents 5 allows you to read, listen, view and annotate many kinds of documents on your iPad or iPhone