Learning Languages with AI-powered Chatbots

March 19, 2021 / November 9, 2021 | Paul Raine | TEFL

Advances in both Natural Language Processing (NLP) and Automatic Speech Recognition (ASR) have raised the question of whether AI-powered chatbots could be an alternative or supplement to flesh-and-blood human teachers in some situations. Can these tools really help learners acquire foreign languages?

From the malevolent Hal 9000 in Stanley Kubrick’s 2001: A Space Odyssey to the charming Samantha in Spike Jonze’s Her, computers that talk have shocked and seduced us in popular culture for many decades.

Spike Jonze’s *Samantha* (pictured) developed an intimate relationship with its (her?) owner

When Apple officially incorporated their voice assistant Siri into iOS in 2011, the reality of having an intelligent assistant that understood and obeyed our every word seemed one step closer for everyone.

The Amazon Echo “smart speaker” hit the market in 2014, and has been the dominant device in that field ever since. Other products in the same space include Google’s Nest, and Apple’s HomePod. Social networks also started to jump on the AI assistant bandwagon, with Facebook incorporating chatbots into its Messenger platform in 2016, and LINE releasing the Clova assistant in 2017.

Smart Speakers such as Amazon’s Echo (pictured) have been gaining popularity since 2014


General purpose AI assistants for language learning


Even though smartphone- and smart-speaker-based AI assistants haven’t usually been designed specifically with language learning in mind, some innovative teachers and researchers have used them for this purpose. One of the major issues to overcome here is the fact that these assistants aren’t optimized for non-native speech, and may struggle to correctly transcribe or understand it.

Research has shown that the tech behind these devices can recognize non-native speech to some extent (with Google Assistant recognizing 87% of learner utterances and Apple’s Siri recognizing 67% in one Japan-based study).

So if a language learner is able to speak clearly enough for an AI assistant to understand them, what kinds of activities can be done to bring about further gains in language ability?

The applicable theory of language learning to bear in mind here is interactionism – the idea that languages are acquired by interacting with other speakers of those languages.

Interacting with another person through language involves taking turns, negotiating meaning (figuring out what the other person is trying to say), and bridging an information gap (transferring information from one speaker to another). There is no reason in theory why the tenets of interactionism cannot apply to human-computer interaction as well as human-human interaction.

General purpose AI assistants can stand in for human interlocutors in interview or quiz type activities, especially where the learner is asking the questions. However, the interaction tends to become one-sided, because AI assistants don’t ask questions unless programmed to do so.

And while learners may receive implicit feedback on pronunciation or grammatical form where the AI doesn’t understand the question that has been uttered, they won’t receive explicit feedback unless they are using an app that has been specifically designed for language learners.

AI assistants specifically designed for language learning

There have been several attempts to develop AI assistants and other interactive speech apps specifically for language learners. Here we will take a look at some of these products and services, and evaluate their usefulness and effectiveness.

Duolingo Chatbots

Duolingo launched chatbots for its iOS app back in 2016, promising to help users “come up with things to say in real-life situations”. Although the feature seemed to be well received by its users, it quietly slipped away and there is no sign of it returning yet.

Duolingo’s chatbots included a “Help me reply” feature, which would suggest words and phrases for the learner to use in their responses. The interactions with the chatbots would become more advanced as the user’s level progressed. There were some limitations to Duolingo’s chatbots, though. For example, they only offered structured dialogues, as opposed to open-ended speech.

Duolingo’s chatbots (iOS only) promised to help users “come up with things to say in real-life situations”, but the feature quietly slipped away and shows no sign of returning

We hope to see a new version of these chatbots from Duolingo, as they have shown promising results for language learning outcomes.

Elai (ETS)

In December 2020, ETS released Elai (iOS/Android), an app that allows users to practice speaking about a range of topics and receive feedback on their speech.

Elai includes model answers from other learners and native speakers, and also provides tips for learners who want to repeat the same exercise.

Unlike Duolingo chatbots, Elai’s focus is on open speech. Users must respond to a prompt and record their answers within a 30 second time limit.

Elai offers a variety of feedback on learner speech, including the extent to which the learner repeated the same words; how often the learner paused during their speech; and whether or not the learner used a lot of “filler” words, e.g. “uh”, “erm”, “ah”.

Elai attempts to improve the speaker’s vocabulary knowledge by providing a list of higher level words at the end of the exercise, which could also be used to respond to the prompt.

Elai is still in Beta status, and the extent to which it will be embraced by learners, teachers, and researchers is still an open question, but being developed by one of the world’s largest English language testing companies (ETS is behind the TOEFL and the TOEIC) certainly puts it in a strong position from the outset.

Buddy.ai

Buddy.ai (iOS/Android) focuses on the young English language learner market, and promises to “[provide] unlimited practice of spoken English ... to millions of students”.

The app offers a variety of language games and activities, including listen and repeat, question and answer, and interactive videos.

One of the drawbacks of the app, however, is that it only supports users with Russian, Spanish, Turkish and Polish as a first language. The app has a bilingual interface, and if the user has a first language other than one of these four, they will struggle to understand the instructions.

ELSA

ELSA (iOS/Android) is a mobile app that focuses specifically on improving the user’s pronunciation to help them “speak like an American” (although proponents of TEFL Equity might have something to say about this – should “American” be the target for all English learners?).

Through listen-and-repeat and interactive dialogue type exercises, ELSA teaches the user how words are blended together in casual speech, which in turn helps to improve the user’s fluency.

Summary

The principles of interactionism suggest that language learners can improve their skills simply by conversing with another speaker of the target language. However, there are issues to be overcome when using AI-powered virtual assistants for language learners, including lack of optimization for non-native speech, and lack of true discourse-level interaction.

Apps that specifically target language learners can do better when it comes to recognizing non-native speech, and offering more life-like interactions.

English Central, for example, is one of the leaders in the recognition of non-native speech, and gives users instant feedback on their pronunciation and fluency while speaking lines from a library of thousands of videos.

However, many of the other apps discussed here focus on either niche segments of learners (e.g. Russian and Polish speaking children) or niche language skills (such as fine-grained pronunciation problems).

An artificially intelligent chatbot that can be used by learners of all levels and ages, and that offers truly human-like interaction and feedback, has yet to emerge.

In addition, student reactions to the recent COVID pandemic have shown that many students value face-to-face learning over online methods. Although chatbots and smart speakers could be a useful supplement to face-to-face or online learning with a human teacher, it seems unlikely that they will be a complete replacement for human teachers any time soon.

Introducing Learn-English.Org!

February 10, 2021 / June 6, 2021 | Paul Raine | Tech Tips, TEFL

What is Learn-English.Org?

Learn-English.Org is a free website for learners of English to practice listening, speaking, reading, and writing online, anywhere, anytime!

How do I use Learn-English.Org?

Find an activity you would like to study by using the navigation panel on the left. There are three ways to navigate the activities on this site: by category, by skill, and by level.

Can I track my progress?

Yes, you can track your progress on this site by creating an account and then checking your progress report.

Who is behind Learn-English.Org?

This site is produced and developed by English language and Ed-Tech experts, and powered by TeacherTools.Digital, an innovative digital assignment creation platform for language teachers.

More Meditations on Machine Translation

December 7, 2020 | Paul Raine | TEFL

At this year’s CEGLOC virtual conference, I watched a couple of presentations about the role of Machine Translation (MT) in language teaching and learning.

They got me thinking again about a subject I’ve written about a few times before and also edited an article on.

Here are a few key assumptions about the intersection of MT and language teaching/learning:

#1 The accuracy/naturalness of MT is continuing to improve. The output produced by MT is approaching the point where it is virtually indistinguishable from the output produced by human translators.

#2 It is unacceptable for a student to rely solely on MT when submitting work for an assessed course of language learning. An example of this would be if a Japanese student wrote a report entirely in Japanese, pasted the report into an MT tool, copied the resulting English, and handed the work in as their own (without even looking at the resulting English text).

#3 Notwithstanding #2, MT could have powerful pedagogical applications if used in the right way.

#4 It is difficult, if not impossible, to completely prevent the use of MT without reverting to hand-written essays in exam conditions. Unlike plagiarism, MT cannot be easily detected by software. Although there are still some tell-tale idiosyncrasies of translations produced by MT (such as inappropriate grammatical subjects when translating Japanese to English), such traits/mistakes are becoming less obvious as MT continues to improve.

#5 The kind of behavior exemplified in #2 is clearly not a “CALL” technique. Computer Assisted Language Learning entails the use of computers to assist the learning of a language. Using MT to generate a piece of assessed work and handing it in sight-unseen is basically indistinguishable from plagiarism. But instead of passing off the work produced by another human intelligence as their own, the perpetrator is passing off the work generated by artificial intelligence as their own.

The above assumptions (if correct) raise some interesting questions, and force us to re-evaluate the reasons or motivations for learning a language.

The motivation for learning a language is often categorized into three main strands: integrative, intrinsic, and instrumental.

Integrative motivation drives language learners who wish to live in and integrate with a target language community. This kind of motivation might drive an American who wants to emigrate to and settle down in Japan, for example.

Would such an individual be able to rely solely on MT tools and devices to achieve this goal? Could they whisper sweet nothings into their iPhone, and then place the iPhone on their pillow and allow it to translate and convey those sentiments to their significant other? Perhaps not.

Intrinsic motivation comes from inside the individual and often arises from a deep interest in the target language itself. Intrinsically motivated students are interested not only in the syntactic structure of the target language, but also in how speaking the language will change the way they perceive and interact with the world around them.

Could an intrinsically motivated individual leverage the power of MT to further increase their knowledge of the target language? I think so. Would they be happy to completely delegate to MT the task of translating their thoughts from L1 to L2? Would they want to miss out on the philosophical or cultural insights that learning another language can bring about? I think not.

Instrumentally motivated individuals simply treat the target language as a means to an end. They want to get a promotion or avoid being demoted. They want (or have) to do business with speakers of the target language. They want to quickly translate an email or subtitles for a video (e.g., rev.com). They want to pass an exam or entrance test for a particular business organization or academic institution.

Could such an individual rely extensively on MT to achieve their aims? I think so. Would it be fair to allow them to do so, especially with regard to assumptions #2 and #5 above? Perhaps not.

That question would need to be decided by the organizations and institutions involved, who are best placed to judge the skills and competencies they require from candidates.

Given all of the above, I tend to believe that the use of MT tools and devices will continue to increase, especially in situations where instrumental motivation is paramount, or time and money costs are significant.

But in my role as a language teacher who has to assess the written and spoken output of language learners, there are difficult questions to answer with regard to the role that MT can or should play in the language learning process.

MT surely has many powerful pedagogical applications, but the temptation for time-pressed and sleep-starved students to rely solely on MT to produce the required output is high.

And then we’re into the familiar territory of plagiarism – passing off another’s work as your own. Something most academic institutions seriously frown upon.

So, those are my current thoughts on MT.

Would love to hear others.

20 Tech Tips from Joe Dale

June 14, 2020 | Paul Raine | Tech Tips, TEFL

Joe Dale is a wealth of ed-tech tips and information

For anyone unfamiliar with Joe Dale, I highly recommend you check out his YouTube channel and follow him on Twitter. The man is an absolute wealth of tech tips for language teachers. Here are a few gems I picked up from him in a single Zoom session:

Make any video your lesson with EdPuzzle
Visualize your ideas in a new and collaborative way using JamBoard
Easily add transcribed voice comments and feedback to shared documents using the Mote Google Chrome extension
Allow students to create digital learning portfolios with Seesaw
Quickly and easily record your voice with Vocaroo or OnlineVoiceRecorder
Immersive Reader, included in OneNote Learning Tools, is a full screen reading experience to increase readability of content in OneNote documents
Ferrite Recording Studio makes it fast and easy to record and edit audio, and includes powerful features such as effects and automation
Voice Record Pro 7 is a professional voice recorder for iOS
Textivate generates a wide range of interactive activities based on your own text and / or matching items. It works with texts of up to 500 words and / or up to 200 matching items
Teach any language with YouTube + TeachVid
LearningApps.org is a Web 2.0 application to support learning and teaching processes with small interactive modules
You can easily allow anyone to create a copy of a Google doc you have created by changing the end of the URL from /edit?usp=sharing to /copy: https://docs.google.com/document/d/1lQdVTkuiT6oi-CZ9A9y6rrCXOyoX8VeSgBw-sH94WHA/edit?usp=sharing -> https://docs.google.com/document/d/1lQdVTkuiT6oi-CZ9A9y6rrCXOyoX8VeSgBw-sH94WHA/copy
Easily create any kind of Google Drive doc with the following URL shortcuts: doc.new, form.new, slides.new
Use Ilini to learn French with the best videos on the web
Create presentations, infographics, and more with Genially
Create your own personal Emoji with Bitmoji
Get popup translations for any website using Lingro
Get easy-to-understand multilingual definitions with WordReference.com
Exam.net is a robust, easy-to-use and secure exam platform
Draftback is a Chrome extension that lets you play back any Google Doc’s revision history
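
The Google Docs copy-link tip above can also be automated if you have many documents to share. The following is only a minimal Python sketch: the function name to_copy_link is made up for illustration, and it assumes the sharing URL ends in /edit, as in the example from the tip.

```python
from urllib.parse import urlsplit, urlunsplit


def to_copy_link(share_url: str) -> str:
    """Turn a Google Docs '/edit' sharing URL into a '/copy' URL (hypothetical helper)."""
    parts = urlsplit(share_url)
    # Split the path into everything before the last segment and the last segment itself.
    head, _, last = parts.path.rpartition("/")
    if last != "edit":
        raise ValueError("expected a URL whose path ends in /edit")
    # Rebuild the URL with '/copy' and without the query string (?usp=sharing).
    return urlunsplit((parts.scheme, parts.netloc, head + "/copy", "", ""))


print(to_copy_link(
    "https://docs.google.com/document/d/"
    "1lQdVTkuiT6oi-CZ9A9y6rrCXOyoX8VeSgBw-sH94WHA/edit?usp=sharing"
))
```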

How does Speech Recognition work, and how can it help us teach English? (Part 1)

May 12, 2020 | Paul Raine | TEFL

Automatic Speech Recognition (ASR) seems to be everywhere these days, from your smart fridge, to your smart phone, and every device in between. But how does it actually work, and how can it be utilized by teachers of English?

In the first part of this blog post, we learn how speech is transformed from vibrations in the air to text on your screen. In the second part (coming soon!), we take a look at some of the ways speech recognition can be used as a teaching and testing tool in English language pedagogy.

Step 1. Analog to digital

Humans live in an analog world. When we speak to each other, we don’t transmit streams of numbers to each other; we vibrate our vocal cords, which create sound waves that vibrate other people’s eardrums, which send electrical signals into the brain, which the brain interprets as words. Unfortunately, computers can’t process sound waves without first converting them into a digital form, i.e. a stream of numbers.

This is the job of a microphone and the analog-to-digital converter (ADC) connected to it. The microphone changes vibrations in the air into an electrical signal, and the ADC measures that signal many thousands of times per second so that it can be represented by numbers. However, this is all they can do. They can convert an analog audio wave into a digital stream of numbers, but they have no idea what words (or other sounds) those numbers represent.
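
To make the idea of a “stream of numbers” more concrete, here is a minimal Python sketch of sampling and quantization. It is an illustration only: a pure 440 Hz tone stands in for a voice, and the 16,000 samples-per-second rate and 16-bit integer range are assumptions chosen for the example, not values taken from any particular device.

```python
import math

SAMPLE_RATE = 16_000   # samples per second (assumed for this example)
TONE_HZ = 440.0        # a pure tone standing in for a voice (assumed)


def digitize(duration_s: float) -> list[int]:
    """Sample a sine wave and store each sample as a signed 16-bit integer."""
    n_samples = round(SAMPLE_RATE * duration_s)
    samples = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE                               # time of this sample in seconds
        amplitude = math.sin(2 * math.pi * TONE_HZ * t)   # analog value between -1 and 1
        samples.append(round(amplitude * 32767))          # scale to the 16-bit range
    return samples


pcm = digitize(0.025)      # 25 ms of "sound"
print(len(pcm), pcm[:5])   # number of samples, then the first few values
```

Each number is just the height of the sound wave at one instant. Nothing in the list says which word, if any, was spoken.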

In order to recognize words, we need a computer program that can break the recorded sound down into its individual phonemes, and then connect those phonemes into the most likely combinations to form words.

Step 2. Identifying phonemes

A phoneme is the smallest unit of sound that can distinguish one word from another. The word “cat”, for example, consists of three phonemes, transcribed in ARPABET as:

K AE T

What rule can we specify to allow our computer to determine whether a certain segment of a sound recording is the phoneme “AE” in “cat”? It is not an exact science. Different speakers pronounce the “AE” phoneme differently depending on their accent, their tone of voice, their vocal timbre, their age, gender, and even emotional state.

Instead of trying to come up with a rule for what the “AE” phoneme sounds like, we can feed a Machine Learning (ML) algorithm thousands of hours of English speech, and allow it to figure out for itself what the “AE” phoneme is supposed to sound like. Then we can ask the algorithm:

Given that these sounds are all “AE”, is this sound also “AE”?

An important point to note here is that the algorithm is not trying to figure out which phonemes individual words are made up of. This process has already been completed by language experts, who have released dictionaries of word-phoneme mappings that can be used to train speech recognition engines.
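
A word-phoneme dictionary of this kind is essentially a lookup table. The sketch below shows the idea in Python with three hand-written ARPABET entries (stress markers omitted). The variable and function names are made up for illustration, and a real pronunciation dictionary contains many thousands of entries.

```python
# Toy pronunciation dictionary: word -> ARPABET phonemes (illustrative entries only).
PRONUNCIATIONS = {
    "cat": ["K", "AE", "T"],
    "cap": ["K", "AE", "P"],
    "bat": ["B", "AE", "T"],
}


def phonemes_for(word: str) -> list[str]:
    """Look up the phoneme sequence for a word."""
    return PRONUNCIATIONS[word.lower()]


def words_for(phonemes: list[str]) -> list[str]:
    """Find every word in the dictionary with exactly this phoneme sequence."""
    return [word for word, seq in PRONUNCIATIONS.items() if seq == phonemes]


print(phonemes_for("cat"))
print(words_for(["K", "AE", "P"]))
```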

What the ML algorithm is trying to do is map sounds to phonemes, and then connect those phonemes into the most likely combinations to form words.

It does this by chopping up phonetically annotated sound clips into very short (25ms) frames. Each frame is converted to a set of numbers which represent the different sound frequencies in the frame. The ML algorithm then learns to associate certain frames or combinations of frames with the corresponding parts of the phonetic transcription.
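
The framing step can be sketched in a few lines of Python. This is a simplified illustration, not how any particular speech recognizer is implemented: the 8,000 Hz sample rate and the 10 ms step between frames are assumptions, the test signal is a pure 1,000 Hz tone rather than speech, and the frequency analysis uses a slow, naive Fourier transform to stay within the standard library. Real systems typically apply further processing to each frame.

```python
import cmath
import math

SAMPLE_RATE = 8_000   # Hz (assumed; kept low so the naive transform stays fast)
FRAME_MS = 25         # frame length mentioned in the text
HOP_MS = 10           # step between the starts of consecutive frames (assumed)


def split_into_frames(samples, frame_len, hop_len):
    """Return overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop_len)]


def magnitude_spectrum(frame):
    """Naive discrete Fourier transform: one magnitude per frequency bin."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        spectrum.append(abs(total))
    return spectrum


frame_len = SAMPLE_RATE * FRAME_MS // 1000   # 200 samples per frame
hop_len = SAMPLE_RATE * HOP_MS // 1000       # 80 samples between frame starts

# 0.1 seconds of a 1,000 Hz tone standing in for recorded speech.
signal = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(800)]

frames = split_into_frames(signal, frame_len, hop_len)
features = [magnitude_spectrum(frame) for frame in frames]

# The strongest frequency bin in the first frame, converted back to Hz.
peak_bin = max(range(len(features[0])), key=features[0].__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / frame_len
print(len(frames), "frames; strongest frequency in frame 0:", peak_hz, "Hz")
```

Each entry in features is the “set of numbers” for one frame: how much energy the frame contains at each frequency.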

Every time the training program encounters the “AE” phoneme, it accommodates the new example in its Acoustic Model (AM) of the sound, thereby building up a comprehensive representation of what the “AE” phoneme should sound like.
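
As a very rough illustration of “accommodating each new example”, the Python sketch below keeps a running average of the feature numbers seen for each phoneme, and then labels a new sound with whichever phoneme’s average is closest. This nearest-average approach is a deliberately simple stand-in, not the statistical model a real Acoustic Model uses, and the class name and two-number feature vectors are invented for the example.

```python
import math
from collections import defaultdict


class ToyAcousticModel:
    """Keeps a running mean feature vector per phoneme (a nearest-average stand-in)."""

    def __init__(self):
        self.means = {}
        self.counts = defaultdict(int)

    def add_example(self, phoneme, features):
        """Fold one labelled example into the running mean for its phoneme."""
        self.counts[phoneme] += 1
        n = self.counts[phoneme]
        if phoneme not in self.means:
            self.means[phoneme] = [float(x) for x in features]
            return
        mean = self.means[phoneme]
        for i, x in enumerate(features):
            mean[i] += (x - mean[i]) / n   # incremental mean update

    def classify(self, features):
        """Return the phoneme whose mean is closest to the given features."""
        return min(self.means, key=lambda p: math.dist(self.means[p], features))


model = ToyAcousticModel()
# Invented training examples: (phoneme label, two made-up feature numbers).
training_data = [
    ("AE", [700, 1700]), ("AE", [660, 1750]), ("AE", [680, 1650]),
    ("IY", [280, 2300]), ("IY", [300, 2250]), ("IY", [260, 2350]),
]
for label, feats in training_data:
    model.add_example(label, feats)

# "Given that these sounds are all AE, is this sound also AE?"
print(model.classify([690, 1720]))
```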

Step 3. Connecting phonemes

Once the algorithm has processed all of the training data, we can then ask it to identify an audio recording of the word “cat”. It will break the recording down and analyze it, as described above, in an attempt to identify its constituent phonemes.

However, because some phonemes (and consequently some words) have incredibly similar pronunciations, sometimes the computer’s best guess at the recording’s constituent phonemes isn’t accurate enough for reliable speech recognition. Fortunately, there is a way to improve the computer’s accuracy.

We can narrow down the possible phoneme choices by employing a statistical model called a Hidden Markov Model (HMM). An HMM uses statistical probability to determine the likelihood of a future state (the next phoneme in the sound) given a current state (the current phoneme in the sound).

When it comes to phonemes in the English language, certain combinations are much more likely than other combinations. For example, “Z” in “zebra” rarely follows the phoneme “K” in “cat”, but “AE” in “cat” often follows “K”.
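
Here is a small Python sketch of how transition probabilities can rescue an ambiguous recording, using the ARPABET symbols from Step 2. It uses the Viterbi algorithm, a standard way of finding the most likely sequence of states in an HMM. All of the numbers are invented for illustration: frame_scores plays the role of the acoustic model’s guesses for three sound segments, and in the middle segment it wrongly prefers “Z” over “AE”. Real recognizers are considerably more elaborate than this.

```python
def viterbi(frame_scores, start_p, trans_p):
    """Return (probability, phoneme sequence) of the most likely path."""
    states = list(start_p)
    # best[s] = (probability, path) of the best path so far that ends in state s
    best = {s: (start_p[s] * frame_scores[0][s], [s]) for s in states}
    for scores in frame_scores[1:]:
        new_best = {}
        for s in states:
            # Pick the previous phoneme that makes arriving at s most likely.
            prev = max(states, key=lambda p: best[p][0] * trans_p[p][s])
            prob = best[prev][0] * trans_p[prev][s] * scores[s]
            new_best[s] = (prob, best[prev][1] + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])


# Invented probabilities of each phoneme starting an utterance.
start_p = {"K": 0.4, "AE": 0.2, "Z": 0.1, "T": 0.3}

# Invented transition probabilities: trans_p[current][next]. K -> AE is likely, K -> Z is not.
trans_p = {
    "K":  {"K": 0.05, "AE": 0.80, "Z": 0.01, "T": 0.14},
    "AE": {"K": 0.20, "AE": 0.05, "Z": 0.15, "T": 0.60},
    "Z":  {"K": 0.10, "AE": 0.50, "Z": 0.05, "T": 0.35},
    "T":  {"K": 0.20, "AE": 0.40, "Z": 0.10, "T": 0.30},
}

# Invented acoustic scores for three sound segments of a recording of "cat".
frame_scores = [
    {"K": 0.70, "AE": 0.10, "Z": 0.10, "T": 0.10},
    {"K": 0.05, "AE": 0.40, "Z": 0.50, "T": 0.05},   # noisy: "Z" looks slightly better
    {"K": 0.10, "AE": 0.10, "Z": 0.10, "T": 0.70},
]

acoustic_only = [max(scores, key=scores.get) for scores in frame_scores]
probability, with_transitions = viterbi(frame_scores, start_p, trans_p)

print("Acoustic scores alone:", acoustic_only)
print("With transition probabilities:", with_transitions)
```

With these numbers, the acoustic scores alone give K Z T, while taking the transition probabilities into account gives K AE T.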

When a speech recognizer is attempting to map a sound to its constituent words and phonemes, it will give precedence to likely combinations of words and phonemes over unlikely or impossible combinations. It knows which word combinations are likely by referring to a Language Model (LM), a statistical model of word sequences that is typically built from a large collection of text.

For example, the sentence “Dolphins swim” is much more likely to occur in the English language than “Doll fins swim”, even though “dolphins” and “doll fins” consist of virtually the same sequence of phonemes.
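
The dolphin example can be reproduced with a toy model of word-pair likelihood. The Python sketch below assumes a very simple approach: it counts pairs of neighbouring words (bigrams) in a tiny made-up corpus, and uses add-one smoothing so that unseen pairs get a small probability rather than zero. The corpus and function names are invented for the example, and production systems use far larger and more sophisticated models.

```python
import math
from collections import Counter

# A tiny made-up corpus standing in for a large collection of English text.
CORPUS = [
    "dolphins swim in the sea",
    "dolphins swim fast",
    "fish have fins",
    "the doll sat on the shelf",
]


def train(sentences):
    """Count single words and pairs of neighbouring words."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()   # <s> marks the start of a sentence
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Log-probability of a sentence under an add-one smoothed bigram model."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.split()
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


unigrams, bigrams = train(CORPUS)
candidates = ["dolphins swim", "doll fins swim"]
scores = {c: log_prob(c, unigrams, bigrams) for c in candidates}
best = max(scores, key=scores.get)

for candidate, score in scores.items():
    print(f"{candidate!r}: {score:.2f}")
print("Most likely:", best)
```

Because “dolphins swim” appears in the corpus and “doll fins” does not, the first candidate receives the higher score, so the recognizer would prefer it.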

Step 4. Hello computer!

We now have a computer program that can analyze recorded sound and convert it into the most likely sequence of words.

But how does all of this help English learners to improve their speaking skills? Read Part 2 to find out! (Coming soon!)

paulsensei.com

Japan-based EFL teacher, presenter, author, and developer since 2006.