Bachelor's Thesis (Treball de Fi de Grau)

Duolingo – optimization of the Spaced Repetition System to improve long-term memorization

Fernando Muley Vilamu

Supervisor: Maria del Carme Colominas Ventura

Seminar 104: 21589

Academic year 2020-2021

Abstract

While Duolingo is the most popular online language learning platform and mobile application, the popularity of the Anki flashcard application has grown thanks in part to its algorithm. Both applications use a Spaced Repetition System (SRS), but only Anki offers customizable options and features to modify it in order to memorize vocabulary better (Seibert, 2019). Today, Anki is the third most used flashcard application.

The objective of the study is to illustrate how the SuperMemo 2 algorithm (Wozniak, 1998) has worked better than Duolingo’s in the short and mid-term. In a personal learning experiment with Japanese and Chinese, using Duolingo and Anki respectively, applying the algorithm and adjusting the interval modifier to achieve the Eighty Five Percent Rule (Wilson et al, 2019) proved beneficial for short and mid-term retention and predicted my forgetting curve (Ebbinghaus, 1885) more accurately. Further research with larger groups would be needed to study the long-term benefits of the algorithm.

Glossary

Flashcards, Spaced Repetition System, lag effect, ease

Index

1. Introduction
2. An insight into Autonomous Language Learning
3. Analysis
3.1 Why Anki and Duolingo?
3.2 Preliminary assumptions
3.3 Motivation of the study
3.4 Anki’s configuration and ideal algorithm
3.4.1 Introduction to Anki
3.4.2 Configuration
3.4.3 Ideal algorithm: SuperMemo 2
3.5 Duolingo’s configuration and algorithm
3.5.1 Introduction to Duolingo
3.5.2 Duolingo’s algorithm
3.6 Chinese and Japanese Experiment
3.7 Pros and cons of Duolingo VS Anki
4. Conclusions
5. Webography

Figures Index

Figure 1 Students’ number of Anki repetitions (Bailey, 2011)
Figure 2 Top-rated decks from the AnkiWeb Chinese shared decks collection
Figure 3 Anki database: flashcards with an interval of more than 6 months
Figure 4 Anki database: flashcards with longest interval
Figure 5 Speaking exercise of Duolingo that uses speech-to-text technology
Figure 6 Duolingo's settings. In green, the option to enable speaking exercises
Figure 7 Duolingo uses SRS technology to improve long-term memory
Figure 8 Russian experiment, list of 32 words learnt on Day 1. All words at full “strength”
Figure 9 Russian experiment, Day 2: words list the day after studying
Figure 10 Russian experiment, Day 3: words list 2 days after the study
Figure 11 Russian experiment, words list after 1 month
Figure 12 Shared decks section in Anki's website
Figure 13 Anki's flashcard example
Figure 14 Anki's options: New Cards
Figure 15 Anki's algorithm
Figure 16 Anki's options: Reviews
Figure 17 Anki's Ease: Diagram
Figure 18 Anki's options: Lapses
Figure 19 Supermemo prediction on Spaced repetition users
Figure 20 Retention interval in Mathematics students: Massers and Spacers
Figure 21 True Retention by Card Maturity add-on
Figure 22 Anki's New Cards settings with SuperMemo algorithm
Figure 23 Duolingo exercise example with an image
Figure 24 Duolingo exercise: Type the translation
Figure 25 Duolingo exercise: Choose from A, B or C
Figure 26 Duolingo exercise: Choose words from boxes
Figure 27 Duolingo's skills progression tree
Figure 28 Leitner SRS with exponential boxes
Figure 29 Forgetting curve (Ebbinghaus, 1885)
Figure 30 Duolingo's final loss function algorithm
Figure 31 Duolingo accepts other correct solutions
Figure 32 Duolingo exercise: pairing
Figure 33 Duolingo exercise: choose 1, 2 or 3

1. Introduction

To master a language and retain it, a Spaced Repetition System (henceforth SRS) is key to making the time spent revising vocabulary more efficient in the long term. This is because of the lag effect, whereby people learn material better when the gaps between study sessions are gradually increased (Melton, 1970).

For example, once we have learnt the word Apfel (“apple” in German), we save time if we test ourselves the next day and, if we answer correctly, again after 4 days, then after 10 days, then after 23 days, and so on. This is called an exponential spaced repetition system. Several applications use it, and the most popular ones today were developed 10-15 years ago.

An SRS gradually lengthens the gaps between reviews so that the time spent reviewing is used efficiently. Why efficiently? Because we want to spend the minimum amount of time reviewing without forgetting the words previously learnt. Therefore, we should try to review each flashcard right before we forget it, which leaves more time to learn new words. For this reason, it is essential to have an application with a very good SRS algorithm that can accurately estimate our memory spans. Several mobile applications and programs use an SRS, but with different algorithms and configurations.

To illustrate the advantages of a good algorithm in an SRS application, I carried out an experiment contrasting the learning process of 2 different languages, Chinese and Japanese. Before the experiment, I had no knowledge of either, apart from 3 basic expressions such as “Nǐ hǎo” in Chinese or “Arigatō” in Japanese.

The experiment consisted of studying each language for 7.5 hours (15 hours in total) over 3 months, with a maximum daily routine of 15 minutes per language, which amounts to 30 study days (30 days × 15 min = 7.5 h). On the days I studied, I spent exactly the same amount of time on both languages, and on the days I did not study, I studied neither, thus avoiding differences and variables that could interfere with the memory spans and algorithms. I studied Japanese with Duolingo and Chinese with Anki. Both Duolingo and Anki use an SRS, but in different ways. This difference is the reason for this study, as I try to find out which one works better.

The results of the Chinese and Japanese experiment show that Anki works better in short-term and mid-term study (82% review accuracy, compared with 19% for Duolingo). The SuperMemo 2 algorithm was more useful for short-term retention than Duolingo’s algorithm, which learns from the user’s mistakes but in this case did not correctly predict my retention of words.

On the whole, the objective of the study is to show that Anki, when properly customized with a good algorithm (in this case SuperMemo 2, which is explained further on), can be more time-effective than Duolingo’s algorithm for memorizing vocabulary cards (Seibert, 2019). Further investigation should be carried out to determine whether Anki and flashcards are also useful for learning grammar.

In conclusion, Duolingo seems to have a good SRS and a sound algorithm, according to several scientific articles published by its research team. Nevertheless, it did not prove effective in the short and mid-term in this study. More research should be carried out to assess its long-term benefits, as Duolingo’s SRS works better the more time the user spends using the application. Anki’s SRS with the SuperMemo 2 algorithm and the Eighty Five Percent Rule (Wilson et al, 2019) worked well for short-term retention intervals in the Chinese experiment. The early retrieval of words from the very beginning allows the user to retain words properly. It ensures that practically no new words are learned before the previously learned ones have been reviewed, which did not happen with Duolingo. All in all, this proved to be key for the user’s motivation, engagement and memorization. Therefore, some features and settings of Anki could usefully be incorporated into Duolingo’s algorithm and configuration in order to obtain similar results.

2. An insight into Autonomous Language Learning

Since the beginning of the century, online language learning has gained a lot of popularity. According to the Aurora Institute, an American organization that works on new education systems for high-quality learning for secondary students, the number of K-12 students (pupils up to about 18 years old) who took online courses grew from 45,000 to more than 3 million between 2000 and 2009 (Horn et al, 2011). With this increasing demand for good online resources, we have seen the rise of many language learning platforms.

Consequently, this creates new opportunities, available to everyone with internet access, and allows learners to take online courses, most of the time for free. In addition, the spread of high-speed internet and the current COVID-19 pandemic have shown that many things can be done at home. Learner autonomy is a consequence of this.

Learner autonomy is defined as “the skills that learners develop in order to learn effectively on their own” (Toffoli et al, 2019). According to the same authors, Candice Stefanou et al grouped these autonomy skills into 3 categories in a 2004 study: organizational, procedural and cognitive autonomy.

Organizational autonomy is the learner’s own responsibility for choosing dates and deadlines. Procedural autonomy includes methodological skills and the preferred media and tools. Finally, cognitive autonomy concerns how learners analyze problems, check results and think about the learning process.

Organizational and cognitive autonomy are available to, or can be acquired by, everyone who wants to follow an online learning method (procedural autonomy). But there is one fundamental point to take into consideration regarding the process: digital literacy. According to Toffoli, following an online method requires students to have the technical skills and knowledge of online technology needed to access it. Digital literacy will affect online language learning and future proficiency. The student’s motivation will also play a key role.

Let me give a real example related to the use of Anki in a university class in Japan. The teacher, Richard C. Bailey (2011), wanted to promote autonomy in his students and introduced Anki to them for 1 year. He then sent them a deck of flashcards to memorize some English words. In the first semester, usage of the application was very low. He identified 2 problems:

1. Students did not know how to use Anki and therefore could not use it independently.

2. Students did not understand how Anki could help them learn because they had not

experienced correct use of the program.

In the 2nd semester, he decided to give the students extra training, so that they would understand how to make the most of Anki and use it independently. He also explained how Anki could help their learning. After this additional training, he saw a dramatic increase in the use of Anki, and some students started using the program outside the classroom as well.

In Figure 1, the numbers of repetitions shown in blue correspond to the first semester and those in red to the second semester. This suggests that digital literacy is a key aspect of procedural autonomy and therefore of autonomous language learning.

Figure 1 Students’ number of Anki repetitions (Bailey, 2011)

3. Analysis

After 2 months, I have studied Chinese with the top-rated card deck on Anki’s website. I found it in the Shared Decks section, whose decks are made and uploaded by users. The deck I chose is called “Most Common 3000 Chinese Hanzi Characters” and has 80 votes (75 upvotes VS 5 downvotes), as can be seen in Figure 2. There are 3 decks with more votes, but they are aimed at Chinese speakers who are learning English. The aforementioned deck is more suitable because it is specifically made for English speakers, like the Japanese Duolingo course.

Figure 2 In red, the top-rated decks from the AnkiWeb Chinese shared decks collection. In yellow, the top-rated Chinese deck aimed at English speakers

In the case of Japanese, I studied the standard Duolingo Japanese course for English speakers. There is no Japanese course for Spanish speakers, but even if there were one, I would have chosen the same course. The reason is that it is better to avoid variables that could affect the objective of the study and its analysis: the Chinese deck is made for English speakers, and I have a C1 level of English according to the CEFR but a native level of Spanish.

3.1 Why Anki and Duolingo?

Applications that do not have an SRS, such as Quizlet (the 2nd most popular flashcard application), risk making review time tedious and boring for their users, because without exponentially growing intervals (such as 1-4-10-23… days) users have trouble differentiating between easy and difficult cards. As users do not know which words they should spend time reviewing, one possible consequence is that they only use the application to learn but not to review.

Such an application may end up as an application for “casual learners”, because the review aspect is not properly developed and people use it above all to learn new words, like a dictionary with a better interface. It thus loses potential users due to the lack of engagement in the long term.

So why have I chosen Duolingo and Anki? On the one hand, Duolingo is the most popular language learning application so far according to the number of downloads on Google Play, with 100M+. On the other hand, Anki is the 3rd most used flashcard application with 5M+ downloads, after Memrise (1st with 20M+) and Quizlet (2nd with 10M+). Besides, Anki is the 2nd most used one with an SRS, after Memrise.

Why Anki and not Memrise, the most used flashcard application and one with an SRS? According to the FAQ on Memrise’s website, the review schedule (if the answers are correct) has 8 steps, and each incorrect answer sends the card back to the start (4 hours > 12 hours > 24 hours, etc.):

4 hours > 12 hours > 24 hours > 6 days > 12 days > 48 days > 96 days > 6 months > 6 months > 6 months > 6 months…

Although these spaced repetition timings are not as efficient as Anki’s, as I explain further in the analysis, the first 8 steps have a good and clear progression. The problem lies in the last one, not because of its length (6 months), but because it is the last one. That is to say, after reaching 6 months, every subsequent review will come after another 6 months, over and over again, as long as the answer is correct.
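The schedule described above can be sketched in Python as a fixed ladder with a cap. This is only an illustration of the timings quoted from the FAQ, not Memrise's actual implementation: the names `MEMRISE_STEPS` and `schedule` are hypothetical, and "6 months" is approximated as 180 days.

```python
from datetime import timedelta

# Review ladder as quoted in the text; "6 months" is approximated
# here as 180 days (an assumption made for illustration only).
MEMRISE_STEPS = [
    timedelta(hours=4),
    timedelta(hours=12),
    timedelta(hours=24),
    timedelta(days=6),
    timedelta(days=12),
    timedelta(days=48),
    timedelta(days=96),
    timedelta(days=180),
]


def schedule(answers: list[bool]) -> list[timedelta]:
    """Return the waiting time scheduled after each answer.

    A correct answer moves the card one step up the ladder (capped at
    the last step); an incorrect answer sends it back to the first step.
    """
    position = -1  # the card has not been answered yet
    waits = []
    for correct in answers:
        if correct:
            position = min(position + 1, len(MEMRISE_STEPS) - 1)
        else:
            position = 0
        waits.append(MEMRISE_STEPS[position])
    return waits


# Eleven correct answers in a row: the last four waits are all 180 days.
for wait in schedule([True] * 11):
    print(wait)
```

The cap in `min(...)` is the point being criticized: once the last step is reached, the interval stops growing however many times the card is answered correctly.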

In my opinion, this is not efficient in the long term. Let me give an example. I started studying French with Anki 4 years ago. So far I have studied 12,000 flashcards, and 7,000 of them (more than 55%) have an interval of 6.1 months or more, as can be seen in Figure 3. That means that, if I had studied them with Memrise, all those 7,000 cards would have an interval of just 6 months.

Figure 3 Anki database: flashcards with an interval of more than 6 months make up more than half of the total

For example, the 2 cards with the longest interval have intervals of more than 10 years. My last review of both was in March, but the previous one was in 2019, 2 years earlier, as can be seen in Figure 4.

Figure 4 Anki database: flashcards with longest interval

In these 2 cases, the interval grew to 10 years because:

- The previous time I had studied the cards was 2 years earlier.

- They had a high Ease %.

- I answered Easy.

All these options, and how they affect the algorithm, will be explained in detail throughout the analysis. The point is: why should I review, after 180 days, words that I studied 2 years ago and still remember? Our brain retains memories over exponentially growing intervals, with no time limit. To give an oversimplified example, we will not forget our mother’s name if we go 180 days without contact with her, nor our Spanish phone number if we have used only an American one for a year after moving to California. Consequently, this schedule is not time-efficient and is thus a “waste of time”, considering that we could use this time to learn new words.

This is very important. Massed practice (such as studying the day before an exam), also known as cramming, can of course be useful as well. In contrast to spaced repetition, studying for many hours, repeating the same words many times and reviewing them more than necessary will indeed increase our knowledge and memory of them. However, Ebbinghaus (1885), in the first study of spaced repetition, showed in an experiment on the memorization of verbal utterances that spaced repetition required less total time than cramming. That is to say, Ebbinghaus showed more than a century ago that studying with an exponential progression requires less time for long-term memorization.

Finally, we could say that Quizlet is focused on short-term learners, given that in 2020 it stopped using an SRS, while Memrise targets mid-term learners, as its spaced repetition progression stops growing exponentially once it reaches 6 months. For this reason, Anki is better for long-term memorization, as I will show in the analysis.

3.2 Preliminary assumptions

This investigation tries to shed light on the advantages of an SRS in the process of memorizing a language. To do so, Anki, a flashcard application with a customizable algorithm, is compared with Duolingo. To better delimit the scope and focus of this study, I state some preliminary assumptions that this study does not intend to examine:

- Duolingo can be MORE useful for acquiring a degree of active oral competence, thanks to its integrated speech-to-text technology, that is, a technology capable of transcribing the user’s voice into text, which Anki lacks. Thanks to this technology, there is one type of exercise in which users must pronounce a written text and Duolingo detects whether they pronounce it correctly. An example can be seen in Figure 5. This option can be activated or deactivated in the settings, under “Speaking exercises”, as can be seen in Figure 6.

Figure 5 Speaking exercise of Duolingo that uses speech-to-text technology

Figure 6 Duolingo's settings. In green, the option to enable speaking exercises

For the purpose of the study, I deactivated this type of exercise for Japanese, along with the listening exercises, motivational messages and animations. There are 2 reasons for this:

o Study time is only 7.5 hours per language, so I did not want to lose precious Japanese study time.

o Final results will be measured with a written (not oral) vocabulary exam that tests 2 things:

▪ English to Japanese / Chinese character

▪ Japanese / Chinese character to English

Consequently, neither active oral competence nor listening competence is tested. Neither are Pinyin and Hepburn, the romanization systems for Chinese and Japanese respectively (as in Nǐ hǎo for Chinese or Arigatō for Japanese).

- Duolingo has more developed and peer-reviewed content than Anki’s decks, in general. The reasons are simple:

o Duolingo’s 100M+ downloads VS Anki’s 5M+ on Google Play → higher popularity = higher budget = higher quality content.

o Duolingo has multiple collaborators and developers working on the same single, linear course VS (normally) 1 collaborator per deck, with no checking of the flashcards by Anki’s developers.

o The quality criteria for Anki’s shared decks are subjective, as they are popularly based on:

▪ Number and ratio of thumbs up VS thumbs down.

▪ Date of publication / modification.

This can be seen in Figure 2. Evidently, there are many exceptions: some highly rated decks have mistakes, and there are high-quality decks with few votes.

- The corpus quality of the Chinese deck is not intended to be compared with that of Duolingo’s standard Japanese course. The motivation for choosing the Chinese deck is explained later.

3.3 Motivation of the study

Nowadays it is unpopular to encourage memorization habits in language learning. What is more, academies and online courses constantly promise 2 things:

- Fast

- Effortless

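There are many examples: “te enseño cómo aprender chino en 8 meses” (“I will teach you how to learn Chinese in 8 months”; 8Belts, 2020), “¡Tú puedes aprender chino en 6 meses!” (“You can learn Chinese in 6 months!”; Babel centro de idiomas, 2017), “Aprende alemán en 7 días” (“Learn German in 7 days”; Campayo, 2014), etc.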

In addition, nowadays it is less and less important to memorize things. These are some examples

of things that we can currently do without thinking:

- Have the calendar remind us of events, synchronize them with a smartwatch and configure it to vibrate 5 minutes before a meeting.

- Retrieve our tracked routes through running apps, or the cities and restaurants we have visited through Google Timeline.

- Get the fastest route to anywhere, with or without a bike and with or without public transport.

- Resume a series at the last episode we watched, because Netflix or Amazon Prime remembers it down to the exact minute and second.

So why do we need to memorize? We still know many things by heart, like our phone number or our relatives’ birthdays. Even if we are getting worse at it with the passage of time, memorization is worthwhile in language learning because it lets us make the most of the time spent studying. And this time is reduced if we use an SRS, as we only review the words that we are about to forget.

What can Anki offer that Duolingo does not? Its customizable algorithm. Users can configure it in the settings so that Anki schedules reviews of the flashcards they are learning at longer or shorter intervals, adapting it to their memory capacity. As every person has different intelligence and memory faculties, this helps make the time spent more efficient for everybody.

Duolingo’s algorithm uses an SRS as well, as stated in the “Words” section of its website (Figure 7). But this system does not work as well as Anki’s, above all regarding short-term memory, and probably in the long term as well. I have not been able to demonstrate the long-term differences empirically in my Japanese-Chinese experiment, because it lasted 3 months and more time would be needed. Nevertheless, before the Japanese-Chinese experiment I ran a first, short experiment to observe the short-term effects of Duolingo and compare them with Anki.

Figure 7 Duolingo uses SRS technology to improve long-term memory

The statement “Duolingo’s algorithms figure out when you should practice words to get them into your long-term memory” turned out to be somewhat optimistic in an experiment I did while learning Russian, which I will call “the Russian experiment”. I started from scratch, completed 5 lessons (50 Duolingo points) and studied 32 words in total. Figures 8, 9 and 10 show:

1- Study day

2- 1 day after

3- 2 days after

Figure 8 Russian experiment, list of 32 words learnt on Day 1. All words at full “strength”

Figure 9 Russian experiment, Day 2: words list the day after studying

Figure 10 Russian experiment, Day 3: words list 2 days after the study

The words appear in order of “strength”. According to Duolingo, there are 4 levels of strength, each represented by a bar, with 4 bars for the maximum level.

The figures show that 2 days after studying, all the words are at half to full “strength” (2, 3 or 4 out of 4). However, the day after the study I had forgotten 30 of them (93%), probably because the Cyrillic alphabet is very different from the Latin one. A better algorithm would have marked the words with less strength so that they would be reviewed sooner. This is better because it is easier and more encouraging to learn new words and concepts once the easiest ones are well understood.

Duolingo’s algorithm encourages the user to review when enough words on the same subject are at strength level 2/4 or below. After 1 day, just 2 of the 30 forgotten words were at 2/4 strength, as can be seen in Figure 9. Therefore, according to Duolingo’s algorithm, on day 2 of the Russian experiment I had forgotten, or needed to review, only 6% of the words. In fact, I should have been encouraged to review 93% of them.
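The gap between what Duolingo flagged and what had actually been forgotten can be checked with a few lines of Python. The figures come from the Russian experiment described above; the variable names are illustrative only, and the percentages quoted in the text are these values with the decimals dropped.

```python
TOTAL_WORDS = 32  # words studied on day 1
FORGOTTEN = 30    # words I could not recall on day 2
FLAGGED = 2       # forgotten words shown at 2/4 strength on day 2

flagged_share = FLAGGED / TOTAL_WORDS
forgotten_share = FORGOTTEN / TOTAL_WORDS
missed = FORGOTTEN - FLAGGED  # forgotten words still shown as strong

print(f"flagged for review: {flagged_share:.2%}")
print(f"actually forgotten: {forgotten_share:.2%}")
print(f"forgotten but not flagged: {missed} words")
```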

One month later, there was still one word at full strength, as shown in Figure 11. The problem is that I did not remember this word even the day after studying it. It is true that the rest of the words were already at minimum strength, but this exception shows how badly mistaken the algorithm can be.

Figure 11 Russian experiment, words list after 1 month

Given that Duolingo’s algorithm did not seem to work very well, I decided to study it in more depth to understand exactly how it works and why it does what it does. As the results were not satisfactory, I preferred Anki because it is highly customizable, and I used the SuperMemo 2 algorithm in it. SuperMemo 2 is based on exponential steps as well, and all cards are asked following this pattern if answered correctly:

1 day > 6 days > 15 days > …

The purpose of these steps will be explained in the section on Anki’s configuration, but the key point is that all words that are answered wrong are asked again immediately and cannot move on to the 6-day interval until they are answered correctly after 1 day. Returning to the Russian experiment, it would have made more sense to review almost all the Russian words the next day, the SuperMemo 2 way. But in order to know whether Anki or Duolingo is better in the end, it is necessary to investigate the algorithm and configuration of both applications.
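As a minimal sketch, the 1 > 6 > 15 pattern can be reproduced with the publicly described SuperMemo 2 rules: the first two intervals are fixed at 1 and 6 days, and each later interval is the previous one multiplied by an easiness factor that starts at 2.5. The `Card` class and `review` function are hypothetical names chosen for illustration. The sketch assumes a 0-5 quality grade, models "asked again immediately" as an interval of 0 days, and rounds fractional intervals up. It is not the code that Anki itself runs.

```python
import math
from dataclasses import dataclass


@dataclass
class Card:
    repetitions: int = 0    # consecutive successful reviews
    interval_days: int = 0  # current interval in days
    easiness: float = 2.5   # SM-2 starting easiness factor (EF)


def review(card: Card, quality: int) -> Card:
    """Apply one review with a quality grade from 0 (blackout) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")

    if quality < 3:
        # Failed recall: restart the sequence, keep EF unchanged, and ask
        # the card again in the same session (modelled as 0 days).
        return Card(repetitions=0, interval_days=0, easiness=card.easiness)

    # Update the easiness factor; SM-2 keeps it at 1.3 or above.
    delta = 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)
    easiness = max(1.3, card.easiness + delta)

    if card.repetitions == 0:
        interval = 1
    elif card.repetitions == 1:
        interval = 6
    else:
        interval = math.ceil(card.interval_days * easiness)

    return Card(card.repetitions + 1, interval, easiness)


# Three correct answers graded 4 leave EF at 2.5 and give 1, 6 and 15 days.
card = Card()
for _ in range(3):
    card = review(card, quality=4)
    print(card.interval_days)
```

A grade of 4 leaves the easiness factor unchanged, which is why the third interval is 6 × 2.5 = 15 days. Higher or lower grades would stretch or shrink the later intervals.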

3.4 Anki’s configuration and ideal algorithm

Anki was released as spaced repetition software in 2006, before Quizlet and Memrise and 5 years before Duolingo. The application, available for Windows, iOS, Linux and Android, contains no study material once installed, unlike Duolingo, which offers multiple language courses to choose from right away. Anki uses the flashcard system for studying. To start studying a deck of flashcards, one can download a Shared Deck from the website, as can be seen in Figure 12.

Figure 12 Shared decks section in Anki's website

Decks are available for 10 different languages and 10 other scientific disciplines. These decks are made by users and are free. For example, if we want to learn Chinese, the list of decks will look like the one in Figure 2. From all the decks, we can choose the one that best suits our needs, based on users’ reviews. We can also take into consideration the 3 sample cards that are visible before the download. These samples show how the flashcards are made and may give an insight into the quality of their content. They are extremely useful if, for instance, we want to learn Chinese characters, are looking for flashcards at a specific CEFR level, or want images and sounds included.

Once we have chosen a deck, we download and import it into the application. Of course, we can also edit the deck after importing it from Shared Decks, or create a new deck with our own cards.

3.4.1 Introduction to Anki

Anki’s flashcards look like the left part of Figure 13. First, we see the question, which in this case has an image, a word and an audio clip. When we are ready to answer, we click on Show answer and the answer appears just below, as we can see in the right part of Figure 13.

Figure 13 Anki's flashcard example

Once we see the answer, we must choose between Again, Good and Easy: Again if we failed, Good if we answered correctly, and Easy if the card was easy. This seems obvious, but the boundary between Good and Easy is important to understand, as it affects the algorithm, and understanding the algorithm helps in the decision. The application will schedule each flashcard sooner or later depending on whether we chose Again, Hard, Good or Easy (Hard is only available once we have mastered an individual flashcard, as will be explained shortly).

Moreover, there is the option to study flashcards in reverse. In this case we would see just “manzana”, and the answer would be the image, the word “apple” and the sound. This is useful if we want to study vocabulary or, for example, verb conjugations.

3.4.2 Configuration

Anki differs from Duolingo above all in its customization. In Duolingo all courses are linear, and the user can only modify the type of exercises; that is to say, he can disable the listening or speaking exercises, as in Figure 6, normally for practical reasons, as he may not have a microphone or headphones. Anki, in contrast, has many options that the user can modify, but some knowledge of them is required to make the most of the algorithm.

I will explain only the options related to the SRS, to keep it as short as possible. The first section is “New Cards”, as we can see in Figure 14:

Figure 10 Anki's options: New Cards

1. Steps (in minutes): The default “1 10” means that, when we answer a new flashcard, if we fail, we will see the same flashcard again in 1 minute, over and over, until we get it correct. When we answer correctly, it will reappear in 10 minutes. If we answer correctly after 10 minutes (the last step), the card will become a “graduated card”, or learned. Before that, all the cards that have not yet graduated are “learning cards”. 1 minute and 10 minutes are just examples, as we can change these numbers or add more steps, like “1 10 100”.

2. Graduating interval: It is the time (in days) that has to pass before we see the card again once all the aforementioned steps are completed. Graduated cards are considered learned cards and therefore follow a different SRS pattern, which is shown in Figure 15:

Figure 11 Anki's algorithm

The interval modifier is 100% by default, so it makes no difference to the final result. This option can be seen in Figure 16 and it is adjustable. This is very important, as there is research that points to an ideal 85% correct answer ratio (Wilson et al., 2019), and according to it, we should adjust this modifier in order to achieve that ratio, as I will explain when I talk about the final ideal algorithm.

Figure 12 Anki's options: Reviews

All in all, by default, if we answer Good, the next interval of the flashcard will be:

Good button algorithm: current interval * Ease level * interval modifier

As the default interval modifier is 100% (if not modified), only the ease level and the current interval (in days, months or years) matter. In the default options, the ease is 250%. It is important to note that learning cards, the ones that have not graduated yet, DO NOT have an Ease yet. In fact, they get a 250% Ease once they graduate, no matter the number of times we have failed them before graduation.

As an example, with the default options (100% interval modifier and 250% ease), if we answer correctly a card that has just graduated with an interval of 1 day, the next interval will be 2.5 days; if we keep answering correctly, the following intervals will be 6.25 days, 15.6 days, 39 days, etc. The Ease factor follows the procedure of the diagram in Figure 17.
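
The progression above can be reproduced with a minimal Python sketch. It only applies the Good button formula as described in this section; the function name and parameters are my own, and details of the real application such as rounding to whole days are deliberately left out.

```python
def good_intervals(first_interval_days=1.0, ease=2.5, interval_modifier=1.0, reviews=4):
    """Intervals (in days) produced by pressing Good `reviews` times in a row."""
    intervals = [first_interval_days]
    for _ in range(reviews):
        # Good button: current interval * Ease * interval modifier
        intervals.append(intervals[-1] * ease * interval_modifier)
    return intervals


default_progression = good_intervals()
print(default_progression)  # 1 day, then 2.5, 6.25, 15.625 and 39.0625 days
```

Lowering `interval_modifier` below 1.0 shortens every interval by the same proportion, which is what makes it useful for tuning retention later on.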

Figure 13 Anki's Ease : Diagram

When we fail a flashcard, the ease is reduced by 20 percentage points. If it is hard, by 15. If we answer Good, the ease is not modified. If it is very easy, the ease is increased by 15 percentage points.

There is no maximum Ease Factor, but there is a minimum Ease Factor of 130%. This is because the intervals must still grow exponentially if we answer Good after many fails (Wozniak, 1990). For example, with a 130% Ease, the intervals after answering correctly at 1 day would be 1.3 days, then 1.7 days, 2.2 days, 2.9 days, etc. The important thing is to keep creating longer and longer forgetting curves. The forgetting curves will be further explained in the ideal algorithm.
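
The ease rules just described can be summarised in a few lines of Python. This is only an illustrative sketch of the rules as stated in this section (changes in percentage points, with a floor of 130%); the names `EASE_CHANGE` and `update_ease` are hypothetical and do not come from Anki.

```python
MIN_EASE = 130  # minimum Ease, in percent

# Change in Ease (percentage points) for each answer on a graduated card
EASE_CHANGE = {"again": -20, "hard": -15, "good": 0, "easy": +15}


def update_ease(ease_percent, answer):
    """Return the new Ease after an answer, never going below the minimum."""
    return max(MIN_EASE, ease_percent + EASE_CHANGE[answer])


print(update_ease(250, "again"))  # 230
print(update_ease(140, "again"))  # 130, because of the floor
```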

3. Easy interval: For new cards, it is a different interval from the Graduating interval. If we answer Good on the last learning step, the card will get the interval set in the Graduating interval, but if we answer Easy, it will get the Easy interval (by default 4 days, four times the default Graduating interval).

4. Starting ease: It is the aforementioned ease of 250%. According to Wozniak (1990), it is the best exponential progression to start from.

5. Maximum interval (Reviews section): It is the maximum amount of time that can elapse between studying a flashcard and seeing it again. The default value is 36500 days (100 years), but it can be reduced or modified. This is the main difference with Memrise. As aforementioned, the limit in Memrise was half a year, which in the long run can be inefficient and time-consuming.

Now that I have explained what happens if we click on the Good button, I will proceed to explain what happens if we click Again, Hard or Easy. As can be seen in Figure 15, the formula is different.

Again button algorithm: current interval * new interval %. Ease -20%

The current interval of days that we have for each card is multiplied by the New interval set in the Lapses section, in Figure 18. The default option is 0%, which means that if we answer a “learnt card” incorrectly, no matter its current interval (10 days or even 10 years), we will next see the card after the “Steps (in minutes)” of the Lapses section (not to be confused with the Steps in the New Cards section). Then, once we have answered correctly again, the current interval will be the New interval percentage of the previous one; as it is set to 0% by default, the card falls back to the Minimum interval. Of course, both the Minimum interval and the New interval can be changed.

For example, in my personal configuration I have the New interval at 50%. If I failed a flashcard with a current interval of 1.5 years, I would next see it after the Steps (in minutes), which I have set to 20 minutes, and the next time, if answered correctly, in 0.75 years, after which it would continue with the Anki algorithm in Figure 15. This modification has helped me prevent concentration mistakes from costing me long streaks. For example, with a streak of 1.5 years, I should have already internalized the word quite well. It could happen that I was tired while studying and pressed “Show answer” too early, and then when I saw the answer I thought “Of course, I knew it!” This happens frequently when using Anki a lot. Thus, it may be worthwhile to increase the default New interval of 0% to a higher percentage.

Finally, the Ease of the card is reduced by 20 percentage points (reminder: the default starting ease is 250% and the minimum is 130%), but this happens ONLY if the card had already graduated.
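
As a rough illustration of the Again button on a graduated card, the following Python sketch computes the interval that the card receives once it has passed the lapse steps again. The function name and its parameters are hypothetical, and a Minimum interval of 1 day is an assumption made for the example.

```python
def interval_after_lapse(current_interval_days, new_interval_fraction=0.0,
                         minimum_interval_days=1):
    """Interval (in days) after a lapse: current interval * New interval %,
    but never shorter than the Minimum interval."""
    return max(minimum_interval_days, current_interval_days * new_interval_fraction)


# Default New interval of 0%: even a 10-year card falls back to the minimum
print(interval_after_lapse(3650))

# New interval of 50%: a card at 1.5 years (547.5 days) returns in 0.75 years
print(interval_after_lapse(547.5, new_interval_fraction=0.5))
```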

Figure 14 Anki's options: Lapses

Hard button algorithm: current interval * 1.2. Ease -15%

The Hard option is only available for Graduated cards. It always reduces the current Ease by 15 percentage points, so that we see the card slightly more often in the future, but at the same time the current interval is multiplied by 1.2. As this multiplier is greater than 1, the Hard button will always increase the current interval, no matter the Ease of the card. For example, pressing Hard repeatedly from an interval of 1 day gives 1, 1.2, 1.44, 1.73, 2.07 days, etc.

Easy button algorithm: Good * Easy Bonus. Ease +15%

The Easy button works differently for New, Learning and Graduated cards. If the card has not graduated and we press Easy, the new interval will be the Easy interval previously mentioned in the New Cards section. In addition, it will automatically become a Graduated card. Once it has graduated (reminder: this is when the Hard button becomes available as well), if we press Easy, the next interval will be the same as for Good multiplied by the Easy Bonus in the Reviews section. The default value is 130%, but it can be modified as well. In addition, the Ease will be increased by 15 percentage points.
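
To compare the Hard and Easy buttons on a graduated card, here is a small Python sketch of the two formulas as described above. The function names are hypothetical and the sketch ignores rounding to whole days.

```python
def hard_interval(current_interval_days, hard_factor=1.2):
    """Hard button: the current interval is multiplied by 1.2."""
    return current_interval_days * hard_factor


def easy_interval(current_interval_days, ease=2.5, interval_modifier=1.0,
                  easy_bonus=1.3):
    """Easy button: the Good interval multiplied by the Easy Bonus."""
    good = current_interval_days * ease * interval_modifier
    return good * easy_bonus


# For a card with a current interval of 10 days and the default settings
print(round(hard_interval(10), 2))
print(round(easy_interval(10), 2))
```

With the default values, Hard moves a 10-day card to about 12 days, while Easy moves it to about 32.5 days.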

3.4.3 Ideal algorithm: SuperMemo 2

Since Ebbinghaus’s first study of spaced repetition in 1885, many researchers have tried to find the ideal algorithm for spaced repetition, due to its benefits and the increasing number of students using it. According to SuperMemo (henceforth SM), the number of SRS users has grown exponentially, as can be seen in Figure 19.

Figure 15 Supermemo prediction on Spaced repetition users

SM is a software program created by the Polish researcher Piotr Wozniak in 1987 (Godwin-Jones, 2010). According to Godwin-Jones, this program was created to help with learning vocabulary by following a specific pattern of how people learn and forget, the forgetting curve, as Hermann Ebbinghaus first called it. This pattern “dictates a particular rhythm for reviewing items to be learned until they are committed to long-term memory”. So, instead of studying a large number of words all at once, it would be better to study them one day, and then review them maybe after 4 days, 8 days, 15 days, etc.

This tries to answer the first basic question that every student poses: when should I study? Should I do “cramming” the day before an exam, as Ebbinghaus described it, or should I space the study? In an experiment on learning Swahili as a Foreign Language (FL) by native English speakers, carried out by researchers from the University of California and the University of South Florida, Harold et al. (2007) suggested that having “too little spacing is worse than having much”.

In another experiment, Mathematics students were given 10 practice problems. The first group worked through them in 1 class, the second group in 2 spaced classes. After 4 weeks, the retention was as shown in Figure 20. These experiments could have powerful implications for mathematics and language teachers, encouraging them to implement strategies that make students constantly review what they have learned, at spaced intervals.

Figure 16 Retention interval in Mathematics students: Massers and Spacers

As a consequence of the apparent benefits of spacing the study, the forgetting curve has to be taken into account. Returning to Godwin-Jones, SuperMemo does so by “calculating when it is necessary to review an item just before it is likely to be forgotten”. It is very important to ensure that our study intervals are long enough to strengthen our long-term memory, as Harold et al. suggested, because too little spacing is bad for it. And as we want to review items before forgetting them, the best moment to review is right before forgetting. Consequently, Wozniak created the SM algorithm, later improved with SM 2. The formula is as follows:

SuperMemo 2 algorithm

I(1) = 1

I(2) = 6

for n>2 → I(n) = I(n-1) * EF

“I” refers to the intervals, expressed in days. “EF” is the Easiness Factor (E-Factor), the same as Anki’s Ease. “n” is the repetition number. With this formula we can see that if the 1st interval of 1 day is achieved (we remember the word the day after studying it for the first time), we move on to the 2nd interval, of 6 days. If we still remember the word after 6 days, the card switches to a different formula, multiplying the previous interval by the E-Factor, or EF.
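
The formula can be written as a short Python function. This is a simplified sketch that keeps the E-Factor constant and does not round the intervals, so it only illustrates the interval rule given above, not the complete SM 2 procedure; the function name is my own.

```python
def sm2_intervals(ease_factor=2.5, repetitions=5):
    """First `repetitions` SM 2 intervals (in days) with a constant E-Factor."""
    intervals = []
    for n in range(1, repetitions + 1):
        if n == 1:
            interval = 1                             # I(1) = 1
        elif n == 2:
            interval = 6                             # I(2) = 6
        else:
            interval = intervals[-1] * ease_factor   # I(n) = I(n-1) * EF
        intervals.append(interval)
    return intervals


print(sm2_intervals())  # [1, 6, 15.0, 37.5, 93.75]
```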

According to Wozniak (1990), the E-Factor reflects “the easiness of memorizing and retaining a given item in memory”, and all items should start with an EF of 2.5, which decreases depending on our recall problems. However, the EF should not fall below 1.3, because such flashcards would be repeated too often and end up being annoying, and they probably have flaws in their formulation.

All in all, we can see some similarities between SuperMemo 2 and Anki. In fact, Anki’s default Ease is 250% once a flashcard is Graduated and cannot decrease below 130%. This is exactly the same as in SM 2 because Anki’s algorithm is based on SM 2, as explained in the Anki Manual. In addition, Anki has several steps (intervals) before the Graduating interval, and once the cards are graduated, if the answer is Good, Anki follows the same formula as SM 2, with the difference of the interval modifier.

Anki answer Good → current interval * Ease level * interval modifier

The interval modifier is a great feature of Anki that SM 2 does not have. Thanks to this option, we can attain what is called the desirable difficulty (Gaspelin, 2013). According to Gaspelin, “introducing difficulties during practice often improves memory retention”. This is related to the forgetting curve, according to which we should always try to review as late as possible. As Gaspelin explains, even if the error rates are higher due to the increased difficulty, it will benefit later recall, and this strengthens long-term retention.

We should always try to find a balance: not so difficult that it feels like torture and not so easy that it feels like a chore. It must be challenging in order to keep us motivated and focused, and to make our brain work as hard as possible.

The solution to this is a “sweet spot in which training is neither too easy nor too hard, and where learning progresses most quickly”, according to Wilson et al. (2019). In an article published in Nature Communications, these researchers derived the Eighty Five Percent Rule for neural networks and learning. They explain that we should always try to achieve an optimal accuracy of around 85% when we study or, conversely, an error rate of around 15.87%.

Thanks to the interval modifier of Anki, it is possible to apply the Eighty Five Percent Rule to our algorithm. Anki has an add-on called True Retention by Card Maturity with which we can see our current retention statistics.

For example, in my German flashcards deck, in Figure 21, I would like to have a True retention (accuracy rate) of 85% on Mature cards. Graduated cards can be young or mature, depending on whether their current interval is 21 days or more (mature) or not (young). In this case, I have an accuracy rate of 59.2% in the 380 mature reviews of this month. In order to get an 85% accuracy, I should change the interval modifier from the current 100% (the default value) to 31%.

Figure 17 True Retention by Card Maturity add-on

The 31% figure is calculated with the formula that follows, which can be found in the Anki Manual. We get the new interval modifier by dividing the logarithm of the desired retention (85%) by the logarithm of our current retention. Personally, I adjust it every month, given that big modifications should entail a significant decrease or increase in True retention (in my case, going from 100 to 31 is a huge difference).

New interval modifier → log(desired retention%) / log(current retention%)
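
The calculation for my German deck can be checked with a few lines of Python. The function name is hypothetical; retentions are passed as fractions (0.85 for 85%), and the sketch assumes that the current interval modifier is still the default 100%.

```python
import math


def new_interval_modifier(desired_retention, current_retention):
    """log(desired retention) / log(current retention), both given as fractions."""
    return math.log(desired_retention) / math.log(current_retention)


modifier = new_interval_modifier(0.85, 0.592)
print(f"{modifier:.0%}")  # 31%
```

When the current retention is already above the desired one, the result is greater than 100%, which means that the intervals can be lengthened instead.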

The next step, after changing the default interval modifier according to the Eighty Five Percent Rule, is applying the SuperMemo 2 algorithm to the Anki settings. As I have explained, the default Starting ease is already 250% and we should not change it. But we can change some settings in the New Cards section, in order to avoid what is popularly called Ease Hell. My proposal is the one in Figure 22.

Figure 18 Anki's New Cards settings with SuperMemo algorithm

I have changed the Steps (in minutes) from “1 10” to “15 1440 8640”. 1440 minutes equal 1 day and 8640 minutes equal 6 days. As the first 2 intervals of the SuperMemo 2 algorithm are 1 and 6 days, we can enter them in the Steps (in minutes) this way. Then we must change the Graduating interval from 1 day to 15 days, in order to follow the aforementioned formula.

I(n) = I(n-1) * EF

In this case, the 3rd interval is 15 days because the previous interval, I(2), was 6 days, and if we multiply it by 2.5 (250%), we get 15. Now we have the SuperMemo 2 algorithm fully incorporated in our Anki settings.

I(3) = I(2) * 2.5 = 6 * 2.5 = 15

To conclude, why do we put the first 3 intervals (15m 1d 6d) in the Steps (in minutes), and why are they so long in comparison with the default settings (1m 10m)? The reason is Ease Hell: we say that a card is in Ease Hell when we have hit Again or Hard many times and its Ease is 130% or similar. This happens frequently when studying new words that we have never seen before and that are not cognates. Let me give an example:

Ex. 1 (IT) mangiare → (FR) manger

Ex. 2 (DE) Anrufbeantworter → (EN) voicemail

It is evident that for a French speaker, it will be quite easy to learn the word mangiare in Italian because it is a cognate of the French word manger. But for an English speaker, learning Anrufbeantworter would require more time, and probably more repetitions, depending as well on the daily load of cards that we have scheduled, which can help us or confuse us.

For example, if we see the flashcard in Ex. 2 right after the flashcard anrufen, which is the verb “to call”, the first one would help us with the next. But other flashcards could interfere negatively. For example, yesterday I answered too quickly with Kontakt, because I thought the card said “Contact”. However, the flashcard was actually “Contract”, which in German is Vertrag. There are strategies to avoid this kind of mistake when studying flashcards with Anki. In this case, I changed the English name to “conTAct”, in order not to make the same mistake again. This is frequently a problem in flashcards, because it interferes with the algorithm: our answers may reflect not our memory but other random interferences.

Going back to Ease Hell: with the default settings, a card is considered learned (graduated) if we have had the memory (or luck) to remember it after 10 minutes (Steps are 1 10). After that, there is a Graduating interval of 1 day and the Ease can start to decrease or increase.

With these settings, it is probable that we will fail a card like the one in the German example many times, as the jump in elapsed time is very big. Consider this: in the first 2 steps, 10 minutes (2nd) is 10 times longer than 1 minute (1st). But 1 day (graduation) is 144 times longer than 10 minutes. The progression is too steep, and we will hardly remember a totally unfamiliar word when we have to recall it.

Therefore, if for example we fail the card 5 times after graduation, the Ease will be reduced from 250% to 150% (20 percentage points each time). And if we answer Good from there, the next intervals will be, in days:

Ease 1.5 → 1 - 1.5 - 2.25 - 3.4 - 5.1 - 7.6 - 11.4 - 17.1 - 25.6

Compare this with a card that kept an Ease of 2.5:

Ease 2.5 → 1 - 2.5 - 6.25 - 15.6 - 39 - 98 - 244 - 610 - 1526

In the first case, we could say that the card is in Ease Hell, because its ease was severely reduced before it was well learned. If we compare both cases, the 9th review of the 1st card would come after less than a month, while for the 2nd card it would come after more than 4 years. If a flashcard is in Ease Hell, we will have to see it many times, and it will be a bit annoying. For this reason, it is important to graduate cards only once they are well learned. With the SuperMemo 2 settings, if we remember a card after 6 days of not studying it, we can consider it graduated because it is already fixed in our memory.
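
The comparison between the two cards can be verified with a brief Python sketch. The helper names are hypothetical, and the sketch assumes that every review from a 1-day interval onwards is answered Good with a 100% interval modifier.

```python
def ease_after_failures(starting_ease=250, failures=5, penalty=20, minimum=130):
    """Ease (in percent) after failing a graduated card `failures` times."""
    return max(minimum, starting_ease - failures * penalty)


def nth_interval(ease_percent, n, first_interval_days=1):
    """Interval (in days) at the n-th review when every answer is Good."""
    return first_interval_days * (ease_percent / 100) ** (n - 1)


low_ease = ease_after_failures()        # 150
slow_card = nth_interval(low_ease, 9)   # about 25.6 days
fast_card = nth_interval(250, 9)        # about 1526 days

print(low_ease, round(slow_card, 1), round(fast_card / 365, 1))
```

The 9th interval of the card in Ease Hell is roughly 60 times shorter than that of the card that kept its starting Ease.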

In conclusion, with all these settings we could achieve our best performance, according to previous peer-reviewed research by different researchers who have tried to shed light on the best possible SRS flashcard implementation. Of course, more research will be carried out in the future and some things may change, but for now this would be a good implementation.

3.5 Duolingo’s configuration and algorithm

3.5.1 Introduction to Duolingo

First of all, Duolingo’s method is different from Anki’s in its basic conception. It does not use the flashcard method, but a translation one. In order to acquire knowledge of the language, the user must, above all, translate phrases. There are also activities where the user has to give the meaning of an image, as in Figure 23, or transcribe an audio clip.

Figure 19 Duolingo exercise example with an image

However, even if the method is different, the approach is similar. Both Anki and Duolingo use short exercises where the user must translate small pieces of information. With Anki, flashcards must be short in order to be efficient. With Duolingo, phrases must be short as well, so that the user can focus on small units of learning in each exercise. And this is important for the algorithm.

In Anki, the user answers with clicks: Again, Hard, Good or Easy. In Duolingo there are many types of exercises: in some the user types the translation (Figure 24), in others he chooses an option from A, B or C (Figure 25), puts in order some shuffled words from different boxes (Figure 26), transcribes an audio clip, pronounces a list of words, etc. I have not included this last type of exercise in my research because it uses a speech-to-text technology that Anki does not have, although it can be very beneficial for active oral competence and for improving the accent.

Figure 20 Duolingo exercise: Type the translation

Figure 21 Duolingo exercise: Choose from A, B or C

Figure 22 Duolingo exercise: Choose words from boxes

Duolingo’s web platform is different from the mobile version. In the web version, the user is allowed to do as many units as he wants on the same day. In the mobile version, he has lives, and if he loses them all, he has to wait a certain number of hours to be able to study again. In addition, the mobile version has more exercises with boxes than the web version, because typing on a phone (2 fingers) is more time-consuming and less comfortable than typing on a keyboard (10 fingers). For this reason, I have done my experiment with the web version. I did not want a limit on study time and I wanted to make the most of each 15-minute study session across the 15 hours in total of the experiment (7.5 per language).

The range of exercises that Duolingo offers gives dynamism and variety to the learning process, in comparison to flashcards. What is more, if we also consider Duolingo’s appealing interface, images and motivational messages, Duolingo is apparently more engaging than Anki.

There is another very important aspect of Duolingo: gamification. Anki can incorporate many customizable add-ons that implement this component as well, but Duolingo has a special competitive aspect that Anki lacks. This method captivates the user even more because he sees progress in his learning. Duolingo is described as a “playfully illustrated, gamified design that combines point-reward incentives with implicit instruction, mastery learning, explanations, and other best practices” (Settles, 2016).

For instance, the first thing that a new user does when installing the app or registering on the website is to choose a daily goal. Once he has chosen a goal, he will always see a badge with his current streak, which motivates him to keep the streak growing and to stick with his routine of, for example, 10 XP a day.

Another gamification example is the fact that the user competes with other users: if he studies more, he is promoted to the Silver division, or the Gold one. Moreover, he can compete with his friends by connecting Duolingo with Facebook and inviting them to the app, to see who gets the longest streak.

3.5.2 Duolingo’s algorithm

In the article A Trainable Spaced Repetition Model for Language Learning, Settles explains that Duolingo has a “gamelike implementation of mastery learning” where students have to acquire some knowledge before accessing new material. Therefore, they must complete the first basic skills in order to unlock more advanced ones, as if the course were a tree with many branches that grow over time, as in Figure 27.

Figure 23 Duolingo's skills progression tree

During the process, language courses incorporate many different words, and each one is labeled with a lexeme tag. The tags are produced by a lexeme tagger, which Settles defines as a “statistical NLP [Natural Language Processing] pipeline”, and they are useful for tagging, indexing and classifying the corpus. This will be very important for the algorithm, given that each lexeme tag (word) will appear depending on our previous study and results, as Anki does with each flashcard.

These lexeme tags encode the minimal units of meaning. For example, the French verb form étant has the lexeme tag être.V.GER, for the verb être in its gerund form. This way, all verb forms, with their conjugation, number, gender, etc., have different lexeme tags, and this allows the algorithm to be more precise when scheduling words for review. For example, the gerund form of the verb être, étant, could be difficult, and if we fail it many times, it would have a shorter interval than the infinitive.

Duolingo, like Anki, also takes into account the lag effect, first studied by Melton (1970). Melton observed that people learn better if the spacing between reviews gradually increases as time goes on. Thanks in part to this system, Duolingo claims that 34 hours of study with its model are equivalent to a full university semester of Spanish as a FL (Vesselinov & Grego, 2012).

With the lag effect progression, they incorporate all the lexeme tags into a student model. This model captures what the student has learned and how well he can recall it at any given time.

And here comes the biggest difference between Anki’s and Duolingo’s algorithms: Duolingo uses the vast amount of data from all its users. As Settles explains in the article, “just two weeks of data is plenty given the number of users, number of tests, number of languages to train our models.” In How we learn how you learn (2016) he says: “At the core of Duolingo is a student model that tracks statistics about every word we've ever taught you: for example, how often you've seen a word, remembered it correctly, and so on. (This is a huge database with billions of entries that get updated 3,000 times per second!)”

All this can be used to “empirically train richer statistical models”. Whereas Anki uses only information from each individual user, Duolingo uses all the data that it has from its more than 100M students in order to improve its predictions. This method is called Half-Life Regression, or HLR.

HLR is a trainable SRS algorithm that draws on “modern machine learning techniques”. With it, Duolingo collects all of its students’ data, which helps in the future personalization of the learning system. This system analyzes the error patterns of all the learners to predict the “half-life” of every specific word in each learner’s long-term memory (VentureBeat, 2019). That is to say, HLR figures out what you are struggling with and what material you should target. Since the implementation of HLR, Duolingo has experienced a 12% boost in user engagement, according to Settles.

However, HLR was not the first method that Duolingo used. As Settles explains, in the beginning Duolingo used a variant of the Leitner System, from 1972. It was a spaced repetition algorithm for flashcards, in which the intervals increased or decreased depending on student performance, as can be seen in Figure 28.

Figure 24 Leitner SRS with exponential boxes

The Leitner System was not like HLR, which uses big data from millions of users. It was more like Anki’s algorithm, where each flashcard’s interval increases or decreases depending on the accuracy of the individual learner. All the flashcards were placed in different boxes, depending on their current interval, as can be seen in the previous figure.

Half-Life Regression is a totally new approach. It is inspired by the Leitner System and the Pimsleur method from 1967, and it intends to improve on them. According to Settles, they carried out 2 user experiments with 12 million Duolingo students, and they analyzed which of the 3 algorithms worked better. The results showed that the HLR algorithm improved users’ performance and retention in comparison with the other methods. It was “more accurate at predicting student recall rates”.

HLR is also inspired by the forgetting curve of Ebbinghaus (1885). Ebbinghaus’s formula is shown in Figure 29, where p is the probability of correctly recalling an item, ∆ is the lag time since the last time we studied the card, and h is the half-life, which measures the strength of the item in the learner’s long-term memory.

Figure 25 Forgetting curve (Ebbinghaus, 1885)

With this formula, p = 1 (100% chance of recalling) means that we have just studied the word, as the lag time would be ∆ = 0. If p = 0.5, we would be on the verge of forgetting it (50% chance of recalling), and as p approaches 0 we would probably have forgotten the item.
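
These three cases can be illustrated with a short Python sketch. It assumes that the curve in the figure has the half-life form p = 2^(-∆/h), which is the form used in Settles’s article; the function name and the 5-day half-life are only examples.

```python
def recall_probability(lag_days, half_life_days):
    """Probability of recall after `lag_days`, assuming p = 2 ** (-lag / half-life)."""
    return 2 ** (-lag_days / half_life_days)


half_life = 5  # hypothetical half-life of a word, in days

print(recall_probability(0, half_life))    # 1.0: the word has just been studied
print(recall_probability(5, half_life))    # 0.5: the lag equals the half-life
print(recall_probability(50, half_life))   # below 0.001: probably forgotten
```

Note that p only approaches 0 for very long lags; it never reaches it exactly.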

With the HLR implementation in 2016, Duolingo wanted to improve the learning experience of the user, and that required an update of the forgetting curve, the Leitner System and the Pimsleur method. Thanks to the incorporation of the aforementioned machine learning and user data, they arrived at the formula in Figure 30.

Figure 26 Duolingo's final loss function algorithm

In general terms, this algorithm works like Ebbinghaus’s, but it also fits the forgetting curve to millions of student answers. Thanks to this, it can detect which lexeme tags (about 20k in total) are more difficult to remember and which words will be more difficult to learn. Following the German example from the Anki section, Duolingo’s algorithm would schedule Anrufbeantworter (voicemail) for English-speaking learners more often than mangiare (manger, “to eat”) for French-speaking learners of Italian.

Therefore, for every language, each individual word has a unique lexeme tag with an inherent difficulty. This degree of difficulty forms part of the formula, along with:

- The number of times a student has seen the word

- The number of times it was correctly and incorrectly recalled

- Whether the last answer was correct

All this helps the model to make more personalized predictions.
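
To make the idea more concrete, the following Python sketch shows how such features could be combined into a predicted half-life. It assumes, following my reading of Settles’s article, that the half-life is 2 raised to a weighted sum of the features and that recall follows p = 2^(-∆/h). The weights and feature values are invented for illustration; in Duolingo they are learned from the data of millions of users.

```python
def predicted_half_life(weights, features):
    """Half-life in days: 2 raised to the weighted sum of the features."""
    exponent = sum(weights[name] * value for name, value in features.items())
    return 2 ** exponent


def predicted_recall(lag_days, half_life_days):
    """Probability of recall after `lag_days`: 2 ** (-lag / half-life)."""
    return 2 ** (-lag_days / half_life_days)


# Hypothetical weights: correct answers lengthen the half-life, while
# mistakes and an inherently difficult lexeme tag shorten it.
weights = {"bias": 1.0, "times_correct": 0.5, "times_incorrect": -0.5,
           "lexeme:être.V.GER": -1.0}

# One learner's history with the word étant (hypothetical values)
features = {"bias": 1.0, "times_correct": 4, "times_incorrect": 2,
            "lexeme:être.V.GER": 1.0}

half_life = predicted_half_life(weights, features)
print(half_life, predicted_recall(2, half_life))  # 2.0 days, 0.5 after 2 days
```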

As Burr Settles explains: “the difficulty of the words, the grammar, and the way we present it

to you in the test, all play a role to pick the exact configuration so that in less than five minutes

we have a really good sense of where you’re going to start the course (…). We can inject what

you need to keep practicing, exactly when you need it.”

3.6 Chinese and Japanese Experiment

In order to find out whether Anki with the SuperMemo 2 algorithm or Duolingo works better, I have carried out an experiment to measure the performance of both in the short and the mid-term. To do so, I have spent 7.5h in Duolingo learning Japanese and 7.5h in Anki learning Chinese, with the SM 2 settings. Given that the time available for the experiment (3 months) was not very long, I decided to study a maximum of 15 minutes of each language on each study day. On those days I have done 15 minutes of both Japanese and Chinese, in order to have the same spaced study for both and avoid different elapsed time variables.

The deck that I have chosen to study Chinese comes from Shared Decks and is named “Most Common 3000 Chinese Hanzi Characters”. There are several reasons why I have chosen this deck:

- It is the one with the most upvotes among those aimed at English speakers.

- It has characters and GIFs that show how the characters should be written.

- It has sounds for each character.

- All the characters are divided into difficulty levels (HSK 1-6), so I could focus on the easy ones.

In the case of Japanese with Duolingo, I have followed the standard course and completed the following units, always trying to reach the maximum level of the first units and to review them when needed (a golden unit appears broken in that case) before advancing to new content.

The purpose of the study was to test just 2 things:

1. Characters (JAP + CH) → English

2. English → Characters (JAP + CH)

So, no alphabetic transcription has been tested (neither pinyin for Chinese nor Hepburn for Japanese), nor the sounds or pronunciation. Therefore, I have deactivated the listening and speaking exercises in Duolingo’s settings, in order not to take time away from the study of Japanese characters. The objective was to test myself at the end to find out with which application I had fully learned more characters.

I will give the results as the total number of cards (front + reverse), that is to say, the number of characters plus their English translations:

- Anki → Chinese total number of cards studied: 80

o Correct answers: 66

▪ Character to EN → 36

▪ EN to Character → 30

o Incorrect answers: 14

- Duolingo → Japanese total number of cards studied: 140

o Correct answers: 27

▪ Character to EN → 18

▪ EN to Character → 9

o Incorrect answers: 113

In both cases the Character to EN answers were easier, as expected. Generally, it is easier to remember when the answer is in your native language, above all if the other language uses a different writing system.

The results show that I correctly remembered about 82% of the cards in Chinese (66 of 80) and about 19% of the cards in Japanese (27 of 140). This indicates that, in this experiment, Anki's SuperMemo 2 configuration worked better than Duolingo's algorithm for short-term memorization, although I studied a greater number of Japanese cards (140 JAP compared with 80 CH).
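The percentages above can be recomputed from the raw counts with a short Python sketch. It assumes, as the “front + reverse” counting suggests, that half of each total corresponds to each direction (40 + 40 cards for Chinese and 70 + 70 for Japanese); the per-direction rates depend on that assumption.

```python
# Raw counts reported in the results list of this section.
results = {
    "Anki (Chinese)": {"cards": 80, "char_to_en": 36, "en_to_char": 30},
    "Duolingo (Japanese)": {"cards": 140, "char_to_en": 18, "en_to_char": 9},
}


def retention(row):
    """Return overall and per-direction shares of correct answers."""
    per_direction = row["cards"] / 2  # assumption: equal split of directions
    correct = row["char_to_en"] + row["en_to_char"]
    return {
        "total": correct / row["cards"],
        "char_to_en": row["char_to_en"] / per_direction,
        "en_to_char": row["en_to_char"] / per_direction,
    }


for name, row in results.items():
    r = retention(row)
    print(
        f"{name}: total {r['total']:.1%}, "
        f"Character to EN {r['char_to_en']:.1%}, "
        f"EN to Character {r['en_to_char']:.1%}"
    )
```

Under that assumption, the Character to EN and EN to Character rates are 90% and 75% for Anki, and roughly 26% and 13% for Duolingo.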

Nevertheless, since Duolingo's algorithm improves over time as it learns more and more from each individual user, I believe that the results would be different if I continued this study for longer. It would then make me review more often, given that during the experiment I had to review little. With Anki it was the opposite, hence the results.

In addition, because I carried out the experiment on myself, I had to use 2 different languages. An ideal experiment would include more people divided into 2 groups, where each group would learn the SAME Chinese or Japanese flashcards from the ground up, one group with Duolingo and the other with Anki, and both would be tested at the end, after identical study time, to see who had fully remembered the most. For example, one group could follow the Japanese course with Duolingo while the other group studied the same Duolingo characters in the same order, but as flashcards in Anki with the SM 2 algorithm. This would be very revealing and interesting for further research and SRS experimentation.

3.7 Pros and cons of Duolingo VS Anki

Duolingo has a big advantage over Anki: its huge database, refined day by day by its own AI and multiple contributors. However, this AI still has to be improved: “It’s not always clear to tease out from the signal we get back what the cause was. There’s a lot more AI to do”, says Settles (Wired, 2018). As an example of this statement, in Figure 27 we can see the Japanese translation for “drink”, but the translation of “alcohol” is also accepted as a correct solution. Therefore, if I had answered “alcohol” in Japanese instead of “drink”, the answer would have been marked correct as well. Thus, the algorithm would increase the interval even though I did not remember the exact answer.

Figure 27 Duolingo accepts other correct solutions

On the one hand, an important drawback is that box-selection answers may be too easy, because the solution is already on screen, as we can also see in Figure 27. Consequently, we may get the answer right for reasons other than true knowledge of it. Following this example, 3 possible reasons come to mind for which I could answer correctly without being able to write the answer on paper:

1. Chance (33.3% with a random choice among 3 options).

2. Discarding the other options because we already know them or because they are unfamiliar (compared with the word that we are currently learning, if that one is actually familiar).

3. Remembering just a part of the solution (for example, 1 of the 3 characters in Figure 27 that does not appear in the other 2 options).

As a result, Duolingo's algorithm could assume that we know a word because we answered correctly, and therefore increase its interval, when in fact we may have got it right by chance and the interval should not increase. We can see another example in Figure 28.
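The effect of chance alone can be quantified with the classical correction for guessing, sketched below in Python. The sketch assumes that a learner either knows an item or picks uniformly at random among the options; `knowledge_estimate` is a hypothetical helper, and reasons 2 and 3 above, which make a lucky answer even more likely, are not modelled.

```python
def knowledge_estimate(observed_accuracy: float, options: int = 3) -> float:
    """Estimate the share of items truly known from multiple-choice accuracy.

    Model (an assumption): observed = known + (1 - known) / options,
    so known = (observed - 1/options) / (1 - 1/options).
    """
    if options < 2:
        raise ValueError("a choice exercise needs at least 2 options")
    if not 0.0 <= observed_accuracy <= 1.0:
        raise ValueError("observed_accuracy must be between 0 and 1")
    guess_rate = 1.0 / options
    # Accuracy below the guess rate is treated as no knowledge at all.
    return max(0.0, (observed_accuracy - guess_rate) / (1.0 - guess_rate))
```

For instance, 60% correct answers in a 3-option exercise would correspond to only about 40% of the items being truly known, which is why counting such answers at face value can lengthen intervals too early.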

Figure 28 Duolingo exercise: pairing

In this exercise, the user has to pair the Japanese characters with their Hepburn phonetic transcription. However, the results of this type of exercise are highly random, all the more so because we can hear the audio if we click on the characters. As a consequence, users may first listen to the Japanese characters and then choose the phonetic transcription, which makes the exercise too easy. Therefore, this type of exercise could be useless depending on how the user chooses to answer: listening to the solutions first or attempting an answer first.

Besides, it would be good to know whether this type of exercise increases or decreases the interval to the same degree as another exercise, such as typing. It would make more sense to give typing exercises more weight than an exercise like the one in Figure 29, where the solution is obvious because you can hear the sound.

Figure 29 Duolingo exercise: choose 1, 2 or 3

On the other hand, the gamification of Duolingo is a great aspect of the application, and it increases users' engagement, commitment and sense of competitiveness if they learn with friends. Anki does have some add-ons with which users can choose sounds for correct and incorrect answers, set a time limit for each answer, etc. These are good features as well.

Furthermore, in my Chinese and Japanese experiment Anki's algorithm worked better than Duolingo's for item retention in the short and mid-term. In addition, the Russian experiment also showed a problem with Duolingo's perception of my forgetting curve. However, it is difficult to reach a conclusion about which one is better, because a longer study would have to be carried out, with more students and with the same language.

The results would probably differ, as Duolingo's Half-Life Regression algorithm works better as time goes on, adapting itself to each student's results and answers. In addition, Japanese has 10M learners, compared with the 26M English speakers who learn Spanish. Therefore, the algorithm would improve if more people studied the Japanese course, because there would be more data.

There are many things that Duolingo could learn from Anki's customization and the transparency of its algorithm. It could attract many committed and thorough students if some options were implemented, for example a minimum starting interval, such as 1 day. I believe that in the Japanese experiment this would have been a great improvement and would have dramatically changed my retention results. Apparently, Duolingo expected my forgetting curve to be longer than it actually is.
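To illustrate what an overestimated forgetting curve implies, here is a minimal Python sketch of the recall model described by Settles and Meeder (2016), in which the predicted probability of recall halves every half-life. The function name and the half-lives of 8 and 2 days are invented for illustration; they are not measured values from the experiment.

```python
def recall_probability(days_since_review: float, half_life_days: float) -> float:
    """Predicted recall p = 2 ** (-elapsed / half_life)."""
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    if days_since_review < 0:
        raise ValueError("days_since_review cannot be negative")
    return 2.0 ** (-days_since_review / half_life_days)


# Hypothetical case: the model assumes a half-life of 8 days,
# but the learner actually forgets with a half-life of 2 days.
days = 8
expected = recall_probability(days, half_life_days=8)
actual = recall_probability(days, half_life_days=2)
print(f"after {days} days: expected {expected:.2%}, actual {actual:.2%}")
```

If the scheduler waits until it expects recall to drop to one half, a learner who forgets four times faster will by then recall only a small fraction of the items, which would be consistent with the low retention observed.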

Finally, another interesting feature would be a database like Anki's in which users could see the current interval of their words and units. Something as simple as the current interval or the next expected interval would be a good addition: it would let users know how many of their cards are already in their long-term memory, and whether they will have to review many units the next day or, on the contrary, only a few.
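A minimal sketch of such an overview is shown below in Python. The `Item` records and their values are invented for illustration, and the 21-day threshold is the one Anki uses to call a card mature; Duolingo does not expose this data, so nothing here reflects its actual internals.

```python
from dataclasses import dataclass

MATURE_DAYS = 21  # Anki calls a card "mature" from a 21-day interval onwards


@dataclass
class Item:
    word: str
    interval_days: int  # current spacing between reviews
    due_in_days: int    # days until the next scheduled review


def summarize(items):
    """Count mature items, young items and items due tomorrow."""
    return {
        "mature": sum(i.interval_days >= MATURE_DAYS for i in items),
        "young": sum(i.interval_days < MATURE_DAYS for i in items),
        "due_tomorrow": sum(i.due_in_days == 1 for i in items),
    }


# Hypothetical learner data.
items = [Item("water", 30, 12), Item("drink", 3, 1), Item("tea", 1, 1)]
print(summarize(items))
```

Even this small summary would tell the user how much of the vocabulary is consolidated and how heavy the next day's review session is likely to be.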

4 Conclusions

On the one hand, before starting this final degree project (TFG), I was convinced that Duolingo had no SRS at all, and now that I have gained an insight into how it works, I have realized that it has a lot of potential as well. The interface is appealing, the gamification is entertaining, the settings are clear and the algorithm is backed by thorough research. Duolingo works better than I expected, and I now seriously consider using it for some particular purposes.

For example, I would like to keep using it to continue learning Chinese, but I would not use it to learn German: as I already have many flashcards in Anki, I would not like to review some words twice. Maybe if Duolingo incorporated into its Words list a feature to delete the words that we have already learned very well, I would use it. I think that it is an awesome platform full of potential, and additions like this one would attract people who want to make the most of their learning time.

On the other hand, Anki still seems to have the more trustworthy algorithm in terms of short-term and mid-term performance. In my short experiments, Duolingo's algorithm did not work ideally, and I missed statistics showing my current intervals and how Duolingo expects my forgetting curve to be, which would have helped me understand why my retention was so low. It would be interesting to add some features from Anki, in order to better track our pace and progression in language learning.

Another good addition would be the Eighty Five Percent Rule formula, so that reviews always aim for an accuracy rate of about 85%. This has been shown to be motivating for students because it is the “sweet spot” for retention in language learning, which encourages students and improves their memory performance and faculties in the long run. What is more, the ideal would be to have a customizable interval modifier like Anki's, so that we could aim for a 90-100% accuracy rate if we wanted to increase our performance before an exam, for example.
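As a hedged illustration of how such a modifier can be chosen, the Anki manual suggests the guideline log(desired retention) / log(current retention). The Python sketch below applies it; the function name is hypothetical, and the 82.5% input is simply the Chinese result of the experiment reused as an example value.

```python
import math


def interval_modifier(current_retention: float, desired_retention: float) -> float:
    """Guideline from the Anki manual: log(desired) / log(current)."""
    for value in (current_retention, desired_retention):
        if not 0.0 < value < 1.0:
            raise ValueError("retention rates must be strictly between 0 and 1")
    return math.log(desired_retention) / math.log(current_retention)


# Moving from 82.5% observed retention towards the 85% target.
to_85 = interval_modifier(0.825, 0.85)
# Raising an 85% retention to 95% before an exam.
to_95 = interval_modifier(0.85, 0.95)
print(f"modifier for 85%: {to_85:.2f}, modifier for 95%: {to_95:.2f}")
```

A modifier below 1 shortens every interval: reaching 85% from 82.5% would need intervals of roughly 84% of their current length, while aiming for 95% before an exam would cut them to about a third, at the cost of many more reviews.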

On reflection, it is interesting to see how little of this has been implemented in education, even though the technology is available to everyone. Pashler et al. (2007) mention how compressed learning programs with short time spans are flourishing, such as immersion learning periods or summer crash courses. Of course, cramming can be a good strategy at some point, and in practical terms it can be more economical, but teachers could then provide these resources to ensure that the information learned can be retrieved in the long run.

Personally, I have used Anki since 2017 to keep all the words, idioms and conjugations I have learned since then. This has been very useful for all the crash courses and language classes I have taken: as of now, 95% of the 13000 flashcards in my French deck are mature. None of my university colleagues knew about this technology, except for 1, who used Memrise.

In my 4 years of studying Applied Languages at Pompeu Fabra University, with 2 Erasmus exchange programs at the University of Liverpool and the ISIT institute in Paris, and with 4 years of attending language academies, no language teacher provided students with spacing tools or apparently knew of them. In Liverpool, some Italian teachers used Quizlet as a learning method, but it had no SRS technology, nor did they know about this powerful feature.

I find three reasons for this lack of awareness. First, unfamiliarity: spaced repetition and Anki are still not popular because they are new, and few people know that Duolingo incorporates spaced repetition technology. Second, the amount of effort required for:

- Mastering flashcard creation

- Building powerful, innovative and well-designed decks with images, colors, add-ons, etc.

- Taking the time to educate students in the correct use of the program and its implementation in their routine.

Third, and the one that I find the most important, the educational system. Exams with deadlines force short-term and mid-term memorization, as crash courses last maybe 1 month and semesters 5 months at most. Therefore, an SRS would only be used in the short and mid-term, which could be worthwhile as well but is not ideal for an SRS. For example, if we only have to take 1 semester of Physics in the first year of a 3-year Computer Engineering degree, most students will stop reviewing the formulas and problem solutions in their flashcards once the semester ends, because they will not be useful to them in the future.

Let me give one last example. A teacher in Texas wants to introduce Anki SRS flashcards with the SuperMemo 2 configuration to his first-year Middle School Spanish students (12 years old), who will be studying Spanish for the first time in their lives. These students would study Spanish until they finish High School (18 years old), and they all have a PC and a smartphone (unlikely, but ideal in this case). In order to implement Anki decks successfully with the students, it would be important to motivate them about their use and their long-term benefits. If this teacher teaches his students for 1 year, he may have enough time to prove the benefits to the students who use it.

However, this teacher should ideally remain their teacher in the following years, or introduce Anki's SRS and decks together with all the other Spanish teachers of the same school, in order to keep the students engaged with the program, so that they keep receiving new Anki cards as homework and have to review them often. They should keep using it to make the most of the long-term benefits of retrieving flashcards.

For this reason, the educational system and the cooperation of other teachers, the organization and even the parents are key. A lot of effort must be made to change a paradigm and a universal method. New technologies like SRS can be powerful tools, and with time I think that we will incorporate them into our daily routines.

A lot of research still has to be carried out, above all on the long-term benefits of an SRS, the study of grammar with flashcards and the incorporation of AI and machine learning into SRS. However, a lot of research has already shown that SRS is beneficial for long-term memory, that the Eighty Five Percent Rule stimulates users' engagement and motivation, and that the forgetting curve has to be taken into account in order to increase retention. Hence, I think that it is worthwhile to implement them as soon as possible in our educational systems and day-to-day lives.

5 Webography

8Belts. (2021). Te enseño cómo aprender chino en 8 meses. 08.06.2021, from 8Belts website: https://w.8belts.com/aprender-chino/

A.W. Melton. 1970. The situation with respect to the spacing of repetitions and memory. Journal of Verbal Learning and Verbal Behavior, 9:596–606.

Anki. (2021). About Anki. 21.04.2021, from Anki website: https://apps.ankiweb.net/

Anki. (2021). Introduction. Deck Options. Reviews. 28.05.2021, from Anki website: https://docs.ankiweb.net/#/

AnkiDroid Open Source Team. (2021). Tarjetas AnkiDroid. 08.04.2021, from Google Play website: https://play.google.com/store/apps/details?id=com.ichi2.anki&hl=es&gl=US

AnkiWeb. (2021). Shared Decks. 08.06.2021, from Anki website: https://ankiweb.net/shared/decks/

Anonymous. (2015). Most Common 3000 Chinese Hanzi Characters. 16.03.2021, from AnkiWeb website: https://ankiweb.net/shared/info/39888802

Aroline E. Seibert Hanson, Christina M. Brown. (2019). Enhancing L2 learning through a mobile assisted spaced-repetition tool: an effective but bitter pill? 09.06.2021, from Taylor & Francis Online website: https://www.tandfonline.com/doi/abs/10.1080/09588221.2018.1552975

Babel. (2021). Tú puedes aprender chino en 6 meses …¡y lo sabes! 08.06.2021, from Babel website: https://www.babelidiomas.es/tu-puedes-aprender-chino-en-6-meses-y-lo-sabes/

Burr Settles, Brendan Meeder. (2016). A Trainable Spaced Repetition Model for Language Learning. 20.04.2021, from Association for Computational Linguistics website: https://www.aclweb.org/anthology/P16-1174.pdf

Burr Settles. (2016). How we learn how you learn. 09.06.2021, from Duolingo blog website: https://blog.duolingo.com/how-we-learn-how-you-learn/

Cynthya Peranandam. (2018). AI Helps Duolingo Personalize Language Learning. 09.06.2021, from Wired website: https://www.wired.com/brandlab/2018/12/ai-helps-duolingo-personalize-language-learning/

Denyze Toffoli, Laurent Perrot. (2019). Autonomy, the Online Informal Learning of English (OILE) and Learning Resource Centers (LRCs): The Relationships Between Learner Autonomy, L2 Proficiency, L2 Autonomy and Digital Literacy. 26.05.2021, from HAL website: https://hal.archives-ouvertes.fr/hal-02332599/document

Duolingo. (2021). Duolingo - Aprende inglés y otros idiomas gratis. 08.04.2021, from Google Play website: https://play.google.com/store/apps/details?id=com.duolingo&hl=es&gl=US

H. Ebbinghaus. 1885. Memory: A Contribution to Experimental Psychology. Teachers College, Columbia University, New York, NY, USA.

Harold Pashler, Doug Rohrer et al. (2007). Enhancing learning and retarding forgetting: Choices and consequences. 27.05.2021, from Psychonomic Society, Inc. website: http://thesciencenetwork.org/docs/BrainsRUs/Enhancing%20Learning_Pashler.pdf

Memrise. (2021). How does the spaced repetition system work? 21.03.2021, from Memrise website: https://memrise.zendesk.com/hc/en-us/articles/360015889057-How-does-the-spaced-repetition-system-work-

Memrise. (2021). Memrise: Fun & Fast Language Learning App. 08.04.2021, from Google Play website: https://play.google.com/store/apps/details?id=com.memrise.android.memrisecompanion&hl=es&gl=US

Michael B. Horn, Heather Staker. (2011). The Rise of K–12 Blended Learning. 26.05.2021, from Aurora Institute website: https://aurora-institute.org/wp-content/uploads/The-Rise-of-K-12-Blended-Learning.pdf

Paul Sawers. (2019). How Duolingo is using AI to humanize virtual language lessons. 09.06.2021, from VentureBeat website: https://venturebeat.com/2019/07/05/how-duolingo-is-using-ai-to-humanize-virtual-language-lessons/

Piotr Wozniak. (1998). Application of a computer to improve the results obtained in working with the SuperMemo method. 27.05.2021, from SuperMemo website: https://www.supermemo.com/en/archives1990-2015/english/ol/sm2

Piotr Wozniak. (2018). Exponential adoption of spaced repetition. 27.05.2021, from SuperMemo website: https://supermemo.guru/wiki/Exponential_adoption_of_spaced_repetition

Quizlet Inc. (2021). Quizlet: Aprende con fichas educativas. 08.04.2021, from Google Play website: https://play.google.com/store/apps/details?id=com.quizlet.quizletandroid&hl=es&gl=US

Ramón Campayo. (2021). Aprende alemán en 7 días (Autoayuda y superación). 08.06.2021, from Amazon website: https://www.amazon.es/Aprende-alem%C3%A1n-en-7-d%C3%ADas/dp/8408131672

Richard C. Bailey. (2011). Internet-Based Spaced Repetition Learning In and Out of the Classroom: Increasing Independent Student Use. 26.05.2021, from Asia University website: https://core.ac.uk/download/pdf/72791536.pdf

Robert C. Wilson, Amitai Shenhav et al. (2019). The Eighty Five Percent Rule for optimal learning. 27.05.2021, from Nature website: https://www.nature.com/articles/s41467-019-12552-4

Robert Godwin-Jones. (2010). Emerging technologies from memory palaces to spacing algorithms: approaches to second-language vocabulary learning. 27.05.2021, from Virginia Commonwealth University website: https://scholarspace.manoa.hawaii.edu/bitstream/10125/44208/14_02_emerging.pdf

Roumen Vesselinov. (2012). Duolingo Effectiveness Study. 08.06.2021, from Duolingo website: http://static.duolingo.com/s3/DuolingoReport_Final.pdf
