The Rise of Machine Translation

I was somewhat surprised by several comments on social media in response to my last post, The War Against Machine Translation. Many of the comments spoke out in defense of machine translation (MT). In retrospect, some of the claims I made in my first post were a little far-reaching. I’d like to address some of the points made in response to that post, and also clarify and moderate some of the initial claims I made.

I also want to preface this follow-up by stating that I am an avid proponent of Computer Assisted Language Learning (CALL). I have spent the last few years developing a website full of activities and tools for teachers and learners of English as a Foreign Language. However, I believe that anything that can be accurately classed as CALL must by definition assist the learning of a language.

Some students attempt to pass off the results of MT as their own work, which can cause issues for the teacher trying to fairly grade written English assignments

Learning technology should never completely replace the learner. Unfortunately, many students view and use the output of MT as a complete replacement for their own work. In some cases, entire reports are written in L1, pasted into an MT tool, and the output is submitted as the student’s “own work”. It would be very difficult to say that the students in these cases have learned anything about English. Many students fail even to read the MT output before submitting it, despite the many basic grammatical errors it contains (especially in Japanese-to-English translation).

Having hopefully clarified my position somewhat, I’ll move on to respond to some of the comments made in relation to my initial post.

Machine Translation is more accurate for language pairs other than English/Japanese

One of my main arguments against the use of MT is that it is simply inaccurate, and is more likely to produce word salad than grammatically correct sentences. Some commenters pointed out that Google Translate, and other MT tools, do much better for other language pairs, particularly the more syntactically and lexically related European languages.

One of the sentences I used in my initial post was “How many close friends do you have?”. After feeding the natural Japanese translation for this sentence (親友は何人いる?) into Google Translate, it output “Best friend How many people”, which is a somewhat unsatisfactory result. However, repeating the same experiment with German (Wie viele enge Freunde haben Sie?) and Italian (Quante amiche intime hai?), Google managed to get the translation bang on, both times outputting “How many close friends do you have?”. However, it failed again for Arabic (How you have close friends?) and Russian (How do you close friends?).

“How many close friends do you have?” is a relatively short and simple sentence. How does MT perform with longer, more complex sentences or paragraphs?

I fed natural Japanese, French, and Italian translations of the following English paragraph into Google Translate:

One of the reasons Twitter is popular in Japan is a characteristic of Japanese itself: Japanese uses ideograms which enable it to convey more information in just 140 characters than other languages, not counting Chinese. Incidentally, the Japanese version of this sentence is written with exactly 140 characters. How many characters does it take in other languages?

For the Japanese translation, the tool output:

One of Twitter popularity of reason in Japan is on the characteristics of the language of Japanese. Japanese use ideographic characters, although not I the enemy is in Chinese, it is possible to incorporate a lot of content to 140 characters compared to many other languages. By the way, I wonder, but he has been written in Japanese version is just 140 characters of this sentence, become what characters in other languages?

For French:

One of the reasons that make the success of Twitter in Japan due to a peculiarity of the Japanese language. Compared to many other languages, Japanese, with its ideograms, can convey a lot of information in 140 characters, without competing with the Chinese. Incidentally, the Japanese version of the sentence is written in exactly 140 characters. But how many characters does it do so in other languages?

For Italian:

One of the reasons why Twitter is so popular in Japan is a very characteristic of the Japanese: Japanese uses ideograms that allow you to convey more information in 140 characters than other languages, without considering the Chinese. Inter-alia, the Japanese version of this sentence is written in exactly 140 characters. How many characters it wants in other languages?

Further research would be required to determine exactly how accurate MT is for any given language pair, but from these preliminary tests, it would seem that the less related the languages, the less accurate the translations. MT seems to do much better with more closely related language pairs, regardless of length or syntactical complexity.

The best approach to MT is not to ban it, but to highlight its (potential) inaccuracies. This is the correct approach regardless of the motivation or level of the students

The only people who benefit from corrective feedback on MT generated English are Google Engineers

In my initial post, I argued that it would be difficult to ban MT altogether (although we could reduce the opportunity to use it by eliminating coursework, for example). If we ban smartphones, on which students can covertly use MT, we completely discard the other, more positive technological affordances they provide. Instead, I suggested that we could highlight its inaccuracies to more highly motivated students.

The reason why I restricted this approach to more “highly motivated” students is because they have a desire to improve their English accuracy and idiomaticity, whereas students with low motivation often simply want to meet the course requirements and receive a passing grade in the easiest possible way. Some unmotivated students see MT as a quick and easy way to produce the required written assignments by writing them entirely in L1 and letting MT do the rest.

If you allow or even endorse the use of MT, when it comes to grading submissions, what are you actually grading? When MT produces good results, the student may unjustly receive a good grade. When MT produces bad results, the teacher may waste their time giving corrections on English mistakes the student hasn’t even made! Although I’m sure the Google engineers would be grateful for the feedback.

Pop-up translation, such as that provided by Rikai.com, is substantively different to MT provided by the likes of Google Translate

Fully featured MT is not the same as pop-up translation

Some commenters highlighted the usefulness of websites such as Rikai.com, which provides automatic pop-up translations of words when a user hovers their mouse over them. There are many other tools offering similar functionality, including PopJisho, ReadLang, Rikai-chan for Firefox, Rikai-kun for Chrome, and my own Pop Translation tool. However, there is a substantive difference between these tools and fully featured MT such as Google Translate.

Pop-up translation tools provide definitions on a word-by-word basis, rather than attempting to translate whole sentences. Allowing students to use pop-up translation to read and understand a passage in English is different to allowing them to translate the whole passage into their L1, and perhaps not even read the English version. Pop-up translation cannot be used to produce a complete English passage wholesale from the student’s L1, or an equivalent passage in the student’s L1 from English. When using pop-up translation to read an English passage, students still have to read the English itself to decipher its meaning. Pop-up translation simply provides a more convenient and powerful alternative to a traditional dictionary.
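To make the distinction concrete, here is a toy sketch of word-by-word glossing in the spirit of pop-up translation (the glossary and function here are hypothetical, purely for illustration): each word is looked up individually, and no sentence-level translation ever takes place.

```python
# Toy word-by-word glossing, in the spirit of pop-up translation.
# The glossary is a tiny hypothetical stand-in for a real dictionary.
GLOSSARY = {
    "close": "intimate; near",
    "friend": "a person one knows and likes",
}

def gloss(word: str):
    """Return a definition for a single word, or None if unknown.
    The reader still has to parse the English sentence themselves."""
    return GLOSSARY.get(word.lower().strip("?,.!"))

print(gloss("close"))  # intimate; near
print(gloss("many"))   # None -- not every word needs (or gets) a gloss
```

Crucially, nothing here ever sees more than one word at a time, which is why such tools cannot be abused to translate a whole assignment.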

Concluding remarks

In the preliminary tests I conducted, MT performed much better when translating between closely related languages, such as English and French or English and Italian. It did much less well with English and Russian or English and Arabic, and quite poorly with English and Japanese.

Fully featured MT, such as that provided by Google Translate, may not be helpful for language learning where students view the output as a replacement for their own work. In the case where a student writes an assignment in L1, pastes it into Google Translate, and submits the output without even reading it, it would be difficult to imagine that any language learning has taken place. The tool is not being used to assist learning, but rather to avoid learning.

Teachers who permit or endorse the use of MT for English written assignments run the risk of unfairly rewarding students where the MT produces good results, and wasting time giving feedback to students where MT produces bad results.

Finally, fully featured MT, such as Google Translate, must be distinguished from pop-up translation tools such as those provided by Rikai.com. Pop-up translation tools do not attempt to translate sentences or paragraphs, but merely provide a more powerful and convenient alternative to traditional dictionaries. It is hoped that they assist the learning of vocabulary in the sense that students will read the English passage, encounter a word or phrase they do not understand, see the pop-up translation, and apply the meaning to the English word in that particular context.

The War Against Machine Translation

LINE’s machine translation function can easily be confused for idle chat, but in fact it is potentially much more harmful

The problem of machine translation

As language teachers, it seems that every day we have to battle the pernicious force of machine translation (MT). In 1997, AltaVista launched Babel Fish, one of the first web-based interfaces for MT. Nearly twenty years later, it seems like every web portal, social network, and search engine offers some kind of automatic translation tool. Even LINE, the kawaii messaging service ubiquitous in Japan, offers an instant translation function, which behaves just like regular chat.

But despite its apparent popularity, and arguable usefulness as an assistive tool to human translators, MT is not a helpful technology for language teachers or learners. It is at best a nuisance, and at worst strongly detrimental to students’ second language acquisition.

The main problems with MT with regard to language pedagogy are that:

  1. It is inaccurate, especially for idiomatic expressions; and
  2. It negates students’ opportunities for language learning.

The first of these problems can be easily observed when typing any reasonably idiomatic expression into Google Translate, perhaps the best free web-based MT available right now. Unfortunately, as we shall see, that’s not saying very much.

Exhibit 1

[Screenshot: Google Translate’s attempt at the Japanese for “I went to drink a beer with friends”]

In this example we see the translator mess up the word order, and also render the verb “drink” as the noun “drink”. “I went to drink a beer with friends” is the more natural human-produced translation for this sentence.

Exhibit 2

[Screenshot: Google Translate’s attempt at the Japanese for “How many close friends do you have?”]

In this example, again, the word order is completely jumbled, and the singular “best friend” doesn’t make sense when the question requires a plural response. Once again, the human generated translation is far superior: “How many close friends do you have?”.

I won’t labor the point here, but you can do your own experiments with any of the currently available MT tools, and you will inevitably come to the same conclusion: MT is still quite bad. Although it can usually convey the gist of the input sentence, it clearly lacks eloquence, idiomaticity and accuracy.

What to do about it

Having concluded that MT is not a good pedagogical tool, the question arises as to how we can eliminate its use both inside and outside the language classroom.

Banning smartphones/laptops seems like overkill, especially considering the more positive technological affordances they offer

Ban smartphones in the classroom?

Within the classroom, you could prevent the influence of MT by banning smartphones entirely. But if you do this, you are indiscriminately blocking off more fruitful avenues to autonomous learning, along with many other positive affordances offered by mobile devices.

Automatic MT detection

Outside the classroom, your power over students is limited, especially over those more inclined to take the “easy” option of MT in the first place. In addition, although we may strongly suspect a student of using MT outside class, it is often difficult to prove. Although progress is being made in developing MT detection tools, it is still nascent technology. Most of the solutions available at the moment require both the source and translation text in order to attempt to detect MT.

Manual MT detection

It can be possible, however, to manually detect and prove machine translation if you have a working knowledge of your students’ L1.

In a recent low-level speaking class, I asked students to record and transcribe their answers to a 1-minute speaking task. One student’s answer seemed suspiciously like “translationese”. One sentence in particular stood out: “Mother of rice is very delicious”. I guessed that the student had tried to translate the Japanese sentence “お母さんのご飯はとても美味しい” which would be more naturally rendered as “My mother’s rice is very tasty” or more idiomatically as “My mother makes very good rice”.

After inputting my hypothesis into Google Translate, I was presented with the exact same broken English as the student had used in his report. He was well and truly “busted”!

Sometimes it is possible to recreate the exact same bad machine translation through guesswork and a knowledge of your students’ L1
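For teachers who want to semi-automate the comparison step, a rough sketch using Python’s standard library is below. The MT output would still have to be obtained by hand (by pasting the guessed L1 sentence into a tool such as Google Translate); the function simply scores how closely it matches the student’s submission.

```python
import difflib

def mt_similarity(student_text: str, mt_output: str) -> float:
    """Return a 0-1 similarity ratio between a student's submission
    and the output of a machine translation tool."""
    return difflib.SequenceMatcher(
        None, student_text.lower(), mt_output.lower()
    ).ratio()

# The MT output here is the result reported in the anecdote above.
student = "Mother of rice is very delicious"
mt_output = "Mother of rice is very delicious"
print(mt_similarity(student, mt_output))  # 1.0 -> exact match
```

A ratio of 1.0 is strong evidence; in practice a threshold somewhat below that would catch light post-editing as well.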

Eliminate coursework

Of course, detecting and subsequently proving the use of MT for a pile of 20 or 30 written reports is a huge waste of time. However, because the temptation to use MT is so high, especially for low-level, low-motivation students, simply instructing them not to do so can be ineffective.

The use of MT became so prevalent with one of my lower level writing classes, that I decided to eliminate coursework altogether, and administer every written assessment in exam conditions. This was the only way I found that I could guarantee that students were not using MT in their written assignments.

Highlight the inadequacy of MT

An alternative solution for more highly motivated classes (those that actually care about developing their English accuracy and idiomaticity) is to highlight how bad MT can be, and in the process hopefully dissuade them from using it altogether. One way to do this is to input some English phrases into an MT tool, and translate them into your students’ L1. Students will then understand in a more direct way how bad some of the translations can be.

Translating from English to your students’ L1 with MT can be a useful consciousness raising activity. The Japanese translation on the right is very unnatural.

Conclusion

One day, machine translation may be accurate enough to make language teachers redundant, along with translators, interpreters, subtitlers, and a host of other language-related professions. It may cause an industry shake-up as far-reaching as self-driving cars. But that day is unlikely to be any time in the near future, despite how far we’ve come in recent years. The current generation of MT tools often produce inaccurate and unidiomatic translations. MT is unhelpful for English language pedagogy, and steps should be taken to detect and prevent students’ use of MT.

30 Links for English Language Data Geeks

A typical corpus linguist. Although I personally prefer blue braces.
  1. The Moby Lexicon Project
  2. BNC Baby
  3. Full BNC
  4. Project Gutenberg (Download full database)
  5. CMU Pronouncing Dictionary
  6. GNU Collaborative International Dictionary of English
  7. The Internet Dictionary Project
  8. English Wiktionary Dump
  9. Simple English Wiktionary Dump
  10. JACET 8000
  11. Minimal pairs in English RP
  12. List of homographs
  13. Homophones in English RP
  14. Google’s Official List of Bad Words
  15. Yasumasa Someya’s Lemmas List
  16. MRC Psycholinguistic Database
  17. Million Song Dataset
  18. Penn Treebank P.O.S. Tags
  19. Princeton University’s WordNet
  20. The Sentence Corpus of Remedial English
  21. Summer Institute of Linguistics (SIL) Word List
  22. The Tanaka Corpus
  23. The General Service List
  24. The New General Service List
  25. The Academic Word List
  26. The New Academic Word List
  27. The TOEIC Word List
  28. The Business Service List
  29. Apache Open Office MyThes
  30. Global WordNet

Generating over 2000 flashcards from a DIY corpus of TOEFL material


Download the CSV of all 2,313 terms (inc. Japanese definitions) or access the full list on Quizlet.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.


Step 1: Assemble a corpus of TOEFL past papers

For my corpus, I used material from both the older CBT (Computer Based Test) and the current iBT (Internet Based Test). I found most of the materials online for free. Some were already in plain text format, but most were PDFs and required Optical Character Recognition (OCR) to convert to plain text. I used ABBYY’s FineReader Pro for Mac, but there are plenty of other options out there too. Some files were in Microsoft Word format (.doc/.docx), and Mac OS X’s batch conversion utility came in handy for these. I included model answers, listening transcripts, reading passages and multiple choice questions (prompts, distractors and answers). I tried to exclude explanations, advice and instructions from the authors and/or publishers.

Ultimately, I ended up with a corpus just shy of a million words (959,124 to be precise). In general, bigger is better when it comes to corpus research. The TOEIC Service List (TSL) utilizes a corpus of about 1.5 million words, so my TOEFL corpus seems roughly comparable to this.

Step 2: Count the number of occurrences of each word

I used some custom PHP code to process my corpus data (although Python is probably more suited for corpus analysis). I lemmatized each token where possible using Yasumasa Someya’s list of lemmas. I then cross referenced each lemma occurrence with the NGSL, NAWL and TSL. Finally, I exported to a CSV, and ended up with 13,287 rows of data.
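I used PHP at the time, but as noted above, Python is probably better suited to this kind of work. A minimal Python sketch of the counting step might look like the following (the toy lemma mapping stands in for Someya’s full list, which covers thousands of inflected forms):

```python
import re
from collections import Counter

# Toy lemma mapping; the real list maps thousands of inflected
# forms to their base forms.
LEMMAS = {"went": "go", "goes": "go", "studied": "study"}

def count_lemmas(text: str) -> Counter:
    """Tokenize, lemmatize where possible, and count occurrences."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(LEMMAS.get(tok, tok) for tok in tokens)

counts = count_lemmas("He went home. She goes home. They studied hard.")
print(counts["go"])  # 2 -- "went" and "goes" both lemmatize to "go"
```

Cross-referencing against the NGSL, NAWL and TSL is then a simple set-membership check per lemma.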

Step 3: Curate the final list

For my final list I removed any words which also appear on the NGSL, any contractions (e.g. “Don’t”, “I’m”, “that’s”), any numbers written in word form (e.g. “two”, “million”), any vocalizations (e.g. “uh”, “oh”), any ordinals (e.g. “first”, “second”, “third”), any proper nouns (“James”, “Elizabeth”, “America”, “San Francisco”, “New York”), and any words with fewer than 5 occurrences in the corpus. Next, I ran the list through a spell checker, and excluded any unrecognized words. I also excluded any non-lexical words, to leave a list consisting only of nouns, verbs, adjectives and adverbs.
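A sketch of some of these exclusion rules (this covers the NGSL check, contractions, a crude capitalization-based proper-noun filter, and the frequency cut-off; the remaining rules follow the same pattern, and the function name is illustrative, not the actual code used):

```python
def curate(rows, ngsl, min_count=5):
    """Apply a subset of the exclusion rules to (word, count) pairs."""
    keep = []
    for word, count in rows:
        if word.lower() in ngsl:    # already on the NGSL
            continue
        if "'" in word:             # contractions like don't, I'm
            continue
        if word[:1].isupper():      # crude proper-noun filter
            continue
        if count < min_count:       # too rare in the corpus
            continue
        keep.append(word)
    return keep

rows = [("don't", 40), ("the", 900), ("hypothesis", 12),
        ("James", 30), ("quark", 3)]
print(curate(rows, ngsl={"the"}))  # ['hypothesis']
```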

Step 4. Generate flashcards

I now had a list of 2,313 terms, made up of 523 adjectives, 123 adverbs, 1,366 nouns, and 301 verbs. I used Text to Flash to generate Japanese definitions for each word, then uploaded the words to Quizlet, separated into part-of-speech and ordered alphabetically.

Multilingual, part-of-speech categorized, difficulty sorted Quizlet flashcards for NGSL, NAWL and TSL

Feb. 2017 update

Unfortunately, after uploading all the flashcard sets to Quizlet, my account started to run so slowly that it became unusable. I had to remove the majority of the data from Quizlet, but I am now offering the data to download in CSV format. Users can upload the flashcards to their own Quizlet accounts if required by using the import function.

Links to the CSVs are as follows:

Each download (.zip) includes translations for: Arabic, Chinese, Dutch, English, French, German, Greek, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Swedish, Thai, and Turkish. Translations were automatically generated from public domain dictionary sources.


I’ve generated multilingual, part-of-speech categorized, difficulty sorted sets of flashcards for the latest New General Service List (NGSL), New Academic Word List (NAWL) and TOEIC Service List (TSL), and added them to Quizlet.

The sets are organized in classes according to the definition language. Each class contains sets of flashcards for the four lexical parts of speech (adverbs, verbs, adjectives and nouns). Each set contains a maximum of 20 flashcards, and the sets are ordered by difficulty (i.e. frequency), with Part 1 of each list containing the easiest (most common) words.
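The splitting into difficulty-ordered sets can be sketched as follows (function and data names are illustrative, not the actual code used):

```python
def make_sets(words, set_size=20):
    """Order (word, frequency) pairs from most to least frequent and
    split into fixed-size flashcard sets: Part 1 = most common words."""
    ranked = [w for w, _ in sorted(words, key=lambda pair: -pair[1])]
    return [ranked[i:i + set_size] for i in range(0, len(ranked), set_size)]

words = [("the", 1000), ("quark", 2), ("go", 800)]
print(make_sets(words, set_size=2))  # [['the', 'go'], ['quark']]
```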

As no information was given about part-of-speech in the word lists themselves, I tagged the words using Moby, and selected only the most common part-of-speech for words which can be used as multiple parts-of-speech. The word “register”, for example, is listed as a noun by Moby before it is listed as a verb, so only the noun definition of “register” was included in the flashcards.
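The “first-listed part of speech wins” rule can be sketched like this (the pre-parsed dictionary here is hypothetical; the real Moby data uses its own compact tag format and would need parsing first):

```python
# Hypothetical pre-parsed Moby data: tags in Moby's listing order.
MOBY = {"register": ["noun", "verb"], "quickly": ["adverb"]}

def primary_pos(word):
    """Pick the first-listed (taken as the most common) part of speech."""
    tags = MOBY.get(word.lower())
    return tags[0] if tags else None

print(primary_pos("register"))  # 'noun' -> only the noun card is kept
```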