Bachelor's Thesis (Treball de Fi de Grau)
Duolingo – optimization of the
Spaced Repetition System to improve
long-term memorization
Fernando Muley Vilamu
Supervisor: Maria del Carme Colominas Ventura
Seminar 104: 21589
Academic year 2020-2021
Abstract
While Duolingo is the most popular online language learning platform and mobile application, the Anki flashcard application has grown in popularity, thanks in part to its algorithm. Both applications use a Spaced Repetition System (SRS), but only Anki offers options and features to customize it in order to memorize vocabulary better (Seibert, 2019). Today, Anki is the third most used flashcard application.
The objective of the study is to illustrate how the SuperMemo 2 algorithm (Wozniak, 1998) has worked better than Duolingo’s in the short and mid-term. In a personal learning experiment with Japanese and Chinese, using Duolingo and Anki, the results show that applying this algorithm and adjusting the interval modifier to meet the Eighty Five Percent Rule (Wilson et al., 2019) can be beneficial for short- and mid-term retention, as it predicted my forgetting curve (Ebbinghaus, 1885) more accurately. Further research with larger groups would be needed to study the long-term benefits of the algorithm.
Keywords
Flashcards, Spaced Repetition System, lag effect, ease
Index
1. Introduction
2. An insight into Autonomous Language Learning
3. Analysis
3.1 Why Anki and Duolingo?
3.2 Preliminary assumptions
3.3 Motivation of the study
3.4 Anki’s configuration and ideal algorithm
3.4.1 Introduction to Anki
3.4.2 Configuration
3.4.3 Ideal algorithm: SuperMemo 2
3.5 Duolingo’s configuration and algorithm
3.5.1 Introduction to Duolingo
3.5.2 Duolingo’s algorithm
3.6 Chinese and Japanese Experiment
3.7 Pros and cons of Duolingo VS Anki
4. Conclusions
5. Webography
Figures Index
Figure 1 Speaking exercise of Duolingo that uses speech-to-text technology
Figure 2 Duolingo's settings. In green, the option to enable speaking exercises
Figure 3 Duolingo uses SRS technology to improve long-term memory
Figure 4 Russian experiment, list of 32 words learnt on Day 1. All words at full “strength”
Figure 5 Russian experiment, Day 2: words list the day after studying
Figure 6 Russian experiment, Day 3: words list 2 days after the study
Figure 7 Russian experiment, words list after 1 month
Figure 8 Shared decks section in Anki's website
Figure 9 Anki's flashcard example
Figure 10 Anki's options: New Cards
Figure 11 Anki's algorithm
Figure 12 Anki's options: Reviews
Figure 13 Anki's Ease: Diagram
Figure 14 Anki's options: Lapses
Figure 15 Supermemo prediction on Spaced repetition users
Figure 16 Retention interval in Mathematics students: Massers and Spacers
Figure 17 True Retention by Card Maturity add-on
Figure 18 Anki's New Cards settings with SuperMemo algorithm
Figure 19 Duolingo exercise example with an image
Figure 20 Duolingo exercise: Type the translation
Figure 21 Duolingo exercise: Choose from A, B or C
Figure 22 Duolingo exercise: Choose words from boxes
Figure 23 Duolingo's skills progression tree
Figure 24 Leitner SRS with exponential boxes
Figure 25 Forgetting curve (Ebbinghaus, 1885)
Figure 26 Duolingo's final loss function algorithm
Figure 27 Duolingo accepts other correct solutions
Figure 28 Duolingo exercise: pairing
Figure 29 Duolingo exercise: choose 1, 2 or 3
1. Introduction
To master, memorize and retain a language, a Spaced Repetition System (henceforth SRS) is key to making the time spent revising vocabulary more efficient in the long term. This is because of the lag effect, whereby people learn better when the intervals between study sessions gradually increase over time (Melton, 1970).
For example, once we have learnt the word Apfel (“apple” in German), we save time if we test ourselves the next day and, if we answer correctly, again after 4 days, then after 10 days, then after 23 days, and so on. This is called an exponential spaced repetition system. Several applications use it, and the most popular ones today were developed 10-15 years ago. This system is key in language learning.
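The example schedule can be sketched in a few lines of Python. This is only an illustration under assumed values: a first interval of 1 day, a second of 4 days and a growth factor of 2.5 afterwards, chosen to approximate the 1-4-10-23 example (they give 10 and then 25 days rather than 23). The function `review_days` is hypothetical and lists the days on which a word would be reviewed if it were always answered correctly.

```python
def review_days(horizon_days: float, first: float = 1.0, second: float = 4.0,
                factor: float = 2.5) -> list[float]:
    """Days (counted from the day a word is learnt) on which it is reviewed,
    assuming every answer is correct. The intervals are `first`, `second`,
    then the previous interval times `factor` (all illustrative values)."""
    days: list[float] = []
    day, interval = 0.0, first
    while day + interval <= horizon_days:
        day += interval
        days.append(day)
        # After the first review use the fixed second step; afterwards grow exponentially.
        interval = second if len(days) == 1 else interval * factor
    return days


if __name__ == "__main__":
    schedule = review_days(365)
    print("Review days:", schedule)
    print("Reviews in one year:", len(schedule), "vs. 365 with daily review")
```

Under these assumptions, a word is reviewed only 6 times in its first year (around days 1, 5, 15, 40, 103 and 259), compared with 365 times if it were reviewed every day.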
An SRS increases the intervals between reviews over time, so that the time spent reviewing is used efficiently. Why efficiently? Because we want to spend the minimum amount of time reviewing while still not forgetting the words we have already learnt. Therefore, we should try to review each flashcard right before we forget it, which leaves more time to learn new words. For this reason, it is key to have an application or program with a very good SRS algorithm that can accurately detect our memory spans. Several mobile applications and programs use SRS, but with different algorithms and configurations.
To illustrate the advantages of a good algorithm in an SRS application, I carried out an experiment contrasting the learning process of 2 different languages, Chinese and Japanese. Before the experiment, I had no knowledge of either, apart from 3 typical basic expressions such as “Nǐ hǎo” in Chinese or “Arigatō” in Japanese.
The experiment consisted of studying each language for 7.5 hours (15 hours in total) over 3 months, with a daily routine of at most 15 minutes per language, that is, 30 study days in total (30 days × 15 minutes = 7.5 hours). On the days I studied, I spent exactly the same amount of time on both languages, and on the days I did not study, I studied neither, to avoid differences and variables that could interfere with the memory spans and the algorithms. I studied Japanese with Duolingo and Chinese with Anki. Both Duolingo and Anki use SRS, but in different ways. This difference is the reason for this study, which tries to find out which one works better.
The results of the Chinese and Japanese experiment show that Anki works better for short- and mid-term study (82% review accuracy compared with 19% on Duolingo). The SuperMemo 2 algorithm has been more useful for short-term retention than Duolingo’s algorithm, which learns from the user’s mistakes but, in this case, did not correctly predict my retention of words.
On the whole, the objective of the study is to prove that the Anki system, when properly customized with a good algorithm (in this case SuperMemo 2, which will be explained further on), can be more time-effective than the Duolingo algorithm in memorizing vocabulary cards (Seibert, 2019). In addition, further research should be carried out to determine whether Anki and flashcards could also be useful for learning grammar.
In conclusion, according to several scientific articles by its research team, Duolingo seems to have a good SRS and a sound algorithm. Nevertheless, it has not proven its short- and mid-term efficacy here, and more research should be carried out to assess its long-term benefits, as Duolingo’s SRS works better the more time the user spends using the application. Anki’s SRS with the SuperMemo 2 algorithm and the Eighty Five Percent Rule (Wilson et al., 2019) worked well for short-term retention intervals in the Chinese experiment. Retrieving words early, from the very beginning, allows the user to retain them properly. This ensures that practically no new words are learned before the previously learned ones have been reviewed, which did not happen with Duolingo. All in all, this is key for the user’s motivation, engagement and memorization. Therefore, some of Anki’s features and settings could well be incorporated into Duolingo’s algorithm and configuration to obtain similar results.
2. An insight into Autonomous Language Learning
Since the beginning of the century, online language learning has gained a lot of popularity. According to the Aurora Institute, an American organization that works on new education systems for high-quality learning in schools, the number of K-12 students (from kindergarten to 12th grade, that is, up to about 18 years old) who took online courses grew from 45,000 to more than 3 million between 2000 and 2009 (Horn et al., 2011). With this increasing demand for good online resources, we have seen the rise of many language learning platforms.
This creates new opportunities for everyone with internet access and allows learners to take online courses, most of the time for free. In addition, the spread of high-speed internet and the current COVID-19 pandemic have shown that many things can be done at home. Learner autonomy is one consequence of this.
Learner autonomy is defined as “the skills that learners develop in order to learn effectively on their own” (Toffoli et al., 2019). According to the same source, Candice Stefanou et al. grouped these autonomy skills into 3 categories in a 2004 study: organizational, procedural and cognitive autonomy.
Organizational autonomy is the learner’s own responsibility for choosing dates and deadlines. Procedural autonomy includes methodological skills and the choice of preferred media and tools. Finally, cognitive autonomy concerns how learners analyze problems, check results and reflect on the learning process.
Organizational and cognitive autonomy are available, or can be acquired, by anyone who wants to follow an online learning method (procedural autonomy). However, one fundamental aspect of the process must be taken into consideration: digital literacy. According to Toffoli, following an online method requires students to have the technical skills and knowledge of online technology needed to access it. Digital literacy will therefore affect online language learning and future proficiency. The student’s motivation will also play a key role.
Consider a real example involving the use of Anki in a university class in Japan. The teacher Richard C. Bailey (2011) wanted to promote autonomy among his students and introduced Anki to them for 1 year. He sent them a deck of flashcards to memorize some English words. In the first semester, use of the application was very low. He identified 2 problems:
1. Students did not know how to use Anki and therefore could not use it independently.
2. Students did not understand how Anki could help them learn because they had not
experienced correct use of the program.
In the second semester, he decided to give the students additional training, so that they had all the knowledge needed to make the most of Anki and use it independently. He also told them how Anki could help their learning. After this additional training, he saw a dramatic increase in the use of Anki, and some students started using the program outside the classroom as well.
As can be seen in Figure 1, the number of repetitions in blue corresponds to the first semester and the one in red to the second semester. This shows that digital literacy is a key aspect of procedural autonomy and therefore of autonomous language learning.
Figure 1 Bailey (2011): students’ number of Anki repetitions
3. Analysis
After 2 months, I have studied Chinese with the top-rated card deck from Anki’s website. I found it in the Shared Decks section, whose decks are made and uploaded by users. The deck I chose is called “Most Common 3000 Chinese Hanzi Characters” and has 80 votes (75 upvotes vs. 5 downvotes), as can be seen in Figure 2. Three decks have more votes, but they are aimed at Chinese speakers learning English. The chosen deck is more suitable because it is made specifically for English speakers, like the Japanese Duolingo course.
Figure 2 In red, the top-rated decks from AnkiWeb Chinese shared decks collection. In yellow, the top-rated Chinese deck
focused on English speakers
For Japanese, I studied Duolingo’s standard Japanese course for English speakers. There is no Japanese course for Spanish speakers, but even if there were one, I would have chosen the English one. Although my Spanish is native and my English is at C1 level according to the CEFR, studying both languages through English, as the Chinese deck also does, avoids a variable that could affect the objective of the study and its subsequent analysis.
3.1 Why Anki and Duolingo?
Applications without an SRS, like Quizlet (the second most popular flashcard application), risk making review time tedious and boring for their users, because they cannot differentiate between easy and difficult cards: the review intervals do not grow exponentially (as they would with 1-4-10-23… days). Therefore, as users do not know which words they should spend time reviewing, one possible consequence is that they use the application only to learn, not to review.
Such an application may end up as an application for “casual learners”: because the review aspect is not properly developed, people use it mainly to learn new words, like a dictionary with a better interface. As a result, it loses potential users through a lack of long-term engagement.
So why have I chosen Duolingo and Anki? On the one hand, Duolingo is the most popular language learning platform so far, with more than 100 million (100M+) downloads on Google Play. On the other hand, Anki is the third most used flashcard application, with 5M+ downloads, after Memrise (first, with 20M+) and Quizlet (second, with 10M+). Moreover, among these flashcard applications, Anki is the second that uses an SRS, after Memrise.
Why Anki and not Memrise, the most popular flashcard application with SRS? According to the FAQ on Memrise’s website, reviews of correctly answered cards follow 8 steps, and each incorrect answer sends the card back to the start (4 hours > 12 hours > 24 hours, etc.):
4 hours > 12 hours > 24 hours > 6 days > 12 days > 48 days > 96 days > 6 months > 6
months > 6 months > 6 months…
Although these timings are not as efficient as Anki’s, as I will explain further in the analysis, the 8 steps follow a good, clear progression. The problem lies in the last step, not because of its length (6 months), but because it is the last one: as long as the answer is correct, every following review comes 6 months later, over and over again.
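To make the cost of a capped schedule concrete, the sketch below counts how many reviews a card that is always answered correctly would need within a given period. It assumes the 8 steps quoted from the Memrise FAQ, approximates “6 months” as 180 days, and compares them with a hypothetical uncapped variant that keeps multiplying the interval by 2.5 after the last step; the 2.5 factor and the function names are illustrative assumptions, not Memrise’s code.

```python
from itertools import accumulate, count, takewhile
from typing import Iterator, Optional

HOUR = 1 / 24  # one hour expressed in days
# The 8 Memrise steps quoted above, in days; "6 months" is approximated as 180 days.
MEMRISE_STEPS = [4 * HOUR, 12 * HOUR, 24 * HOUR, 6, 12, 48, 96, 180]


def intervals(growth_after_last_step: Optional[float] = None) -> Iterator[float]:
    """Endless sequence of review intervals (days) for a card always answered correctly.

    None repeats the last step forever (the capped behaviour described above);
    a number multiplies the previous interval by it instead (hypothetical uncapped variant).
    """
    current = 0.0
    for i in count():
        if i < len(MEMRISE_STEPS):
            current = MEMRISE_STEPS[i]
        elif growth_after_last_step is not None:
            current *= growth_after_last_step
        yield current


def reviews_within(horizon_days: float, growth_after_last_step: Optional[float] = None) -> int:
    """Number of reviews that fall within the first `horizon_days` days."""
    due_days = accumulate(intervals(growth_after_last_step))
    return sum(1 for _ in takewhile(lambda day: day <= horizon_days, due_days))


if __name__ == "__main__":
    ten_years = 10 * 365
    print("Capped at 6 months:", reviews_within(ten_years))
    print("Uncapped (x2.5):", reviews_within(ten_years, 2.5))
```

Under these assumptions, a word answered correctly every time would need 26 reviews over 10 years with the capped schedule, but only 10 if the intervals kept growing.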
In my opinion, this is not efficient in the long term. For example, I started studying French with Anki 4 years ago. So far, I have studied 12,000 flashcards, and 7,000 of them (more than 55% of the total) have an interval of more than 6.1 months, as can be seen in Figure 3. This means that, had I studied them with Memrise, all those 7,000 words would have an interval of just 6 months.
Figure 3 Anki database: flashcards with an interval of more than 6 months make up more than half of the total
For example, the two cards with the longest interval have intervals of more than 10 years. I last reviewed both in March, but the previous review was in 2019, 2 years earlier, as can be seen in Figure 4.
Figure 4 Anki database: flashcards with longest interval
In these two cases, the interval grew to 10 years because:
- The previous time I studied the cards was 2 years ago.
- They had a high Ease %.
- I answered Easy.
All these options, and how they affect the algorithm, will be explained in detail throughout the analysis. The point is: why should I review, every 180 days, words that I studied 2 years ago and still remember? Each successful review makes a memory last longer, with no fixed time limit. To give an oversimplified example, we will not forget our mother’s name after 180 days without contact with her, nor our Spanish phone number after using an American one for a year because we moved to California. Consequently, this algorithm is not time-efficient; it is a “waste of time”, considering that we could use this time to learn new words.
This point is very important, because massed practice (like studying the day before an exam), also known as cramming, can of course be useful as well. Unlike spaced repetition, studying for many hours, repeating the same words many times and reviewing them more than necessary will indeed increase our knowledge and memory of them. However, Ebbinghaus (1885), in the first study of spaced repetition, showed in an experiment on the memorization of verbal material that spaced repetition required less total time than cramming. That is to say, more than a century ago Ebbinghaus showed that spacing out study sessions requires less time for long-term memorization.
Finally, we could say that Quizlet is focused on short-term learners, given that it stopped using SRS in 2020, whereas Memrise targets mid-term learners, as its spaced repetition progression stops growing once it reaches 6 months. For this reason, Anki is better for long-term memorization, as I will show in the analysis.
3.2 Preliminary assumptions
This investigation tries to shed light on the advantages of SRS in the memorization process of a language. To do so, Anki, a flashcard application with a customizable algorithm, is compared with Duolingo. To better illustrate the scope and focus of this study, I will mention some preliminary assumptions, to make clear which aspects this study does not intend to tackle:
- Duolingo can be MORE useful for developing active oral competence, thanks to its integrated speech-to-text technology, that is, technology capable of writing text from the user’s voice, which Anki does not have. Thanks to this technology, there is a type of exercise in which users must pronounce a written text and Duolingo detects whether they pronounce it correctly. An example of this can be seen in Figure 5. This option can be enabled or disabled in the settings, under the “Speaking exercises” option, as can be seen in Figure 6.
Figure 5 Speaking exercise of Duolingo that uses speech-to-text technology
Figure 6 Duolingo's settings. In green, the option to enable speaking exercises
For the purpose of the study, I deactivated this type of exercise for Japanese, along with the listening exercises, motivational messages and animations. There are 2 reasons for this:
o Study time is only 7.5 hours per language, so I did not want to lose precious Japanese study time.
o Final results will be measured with a written vocabulary exam (not oral) that will test 2 things:
▪ English to Japanese / Chinese character
▪ Japanese / Chinese character to English
Consequently, neither active oral nor listening competence is tested. Nor are Pinyin and Hepburn, the romanization systems for Chinese and Japanese respectively (as in Nǐ hǎo for Chinese or Arigatō for Japanese).
- Duolingo has more developed and peer-reviewed content than Anki’s decks, in general. The reasons are simple:
o Duolingo’s 100M+ downloads VS 5M+ of Anki on Google Play → higher popularity = higher budget = higher quality content.
o Duolingo has multiple collaborators and developers working on the same single, linear course, whereas an Anki deck normally has 1 collaborator and its flashcards are not checked by Anki’s developers.
o The quality criteria for Anki’s shared decks are subjective, as they are based on:
▪ Thumbs up VS thumbs down number and ratio.
▪ Date of publication / modification.
This can be seen in Figure 2. Of course, there are many exceptions: some highly rated decks have mistakes, and there are high-quality decks with few votes.
- The quality of the Chinese deck’s corpus is not intended to be compared with Duolingo’s standard Japanese course. The reasons for choosing the Chinese deck will be explained further on.
3.3 Motivation of the study
Nowadays it is unpopular to encourage memorization habits in language learning. What is more, many academies and online courses constantly talk about 2 things:
- Fast
- Effortless
There are many examples: “te enseño cómo aprender chino en 8 meses” (“I’ll teach you how to learn Chinese in 8 months”) (8Belts, 2020), “¡Tú puedes aprender chino en 6 meses!” (“You can learn Chinese in 6 months!”) (Babel centro de idiomas, 2017), “Aprende alemán en 7 días” (“Learn German in 7 days”) (Campayo, 2014), etc.
In addition, memorizing things is becoming less and less important. These are some examples of things that we can currently do without having to remember anything:
- Have our calendar remind us of events, synchronize them with a smartwatch and set it to vibrate 5 minutes before a meeting.
- Retrieve our tracked routes through running apps, or the cities and restaurants we have visited through Google Timeline.
- Get the fastest route to anywhere, by bike or not, with or without public transport.
- Resume the last episode we watched of a series, because Netflix or Amazon Prime remember it down to the exact minute and second.
So why do we need to memorize? We do know many things by heart, like our phone number or our relatives’ birthdays. Even if we are getting worse at it over time, memorization is worthwhile in language learning because it lets us make the most of the time spent studying. And this time is reduced if we implement an SRS, as we only review the words that we are about to forget.
What can Anki offer that Duolingo does not? Its customizable algorithm. Users can configure it in the settings so that Anki schedules the flashcards being learned at longer or shorter intervals, adapting it to their own memory capacities. As every person has different intelligence and memory faculties, this helps make the time spent more efficient for everybody.
Duolingo’s algorithm also uses SRS, as stated in the website’s “Words” section (Figure 7). But this system does not work as well as Anki’s, above all regarding short-term memory, and probably in the long term as well. I have not been able to demonstrate the long-term differences empirically in my Japanese-Chinese experiment because it lasted 3 months and more time would be needed. Nevertheless, before the Japanese-Chinese experiment, I did a first, short experiment to learn the short-term effects of Duolingo and compare them with Anki.
Figure 7 Duolingo uses SRS technology to improve long-term memory
The statement “Duolingo’s algorithms figure out when you should practice words to get them into your long-term memory” turns out to be somewhat optimistic in an experiment I did while learning Russian, which I will call “the Russian experiment”. Starting from scratch, I did 5 lessons (50 Duolingo points) and studied 32 words in total. Figures 8, 9 and 10 show the word list at three moments:
1- Study day
2- 1 day after
3- 2 days after
Figure 8 Russian experiment, list of 32 words learnt on Day 1. All words at full “strength”
Figure 9 Russian experiment, Day 2: words list the day after studying
Figure 10 Russian experiment, Day 3: words list 2 days after the study
The words appear in order of “Strength”. According to Duolingo, there are 4 levels of strength, each represented by a bar, so the maximum level shows 4 bars.
The figures show that 2 days after studying, all the words were at half to full “strength” (2, 3 or 4 out of 4). However, by the day after the study I had already forgotten 30 of them (about 94%), probably because the Cyrillic alphabet is very different from the Latin one. All in all, a better algorithm would have given these words a lower strength so that they were reviewed sooner. This matters because it is easier and more encouraging to learn new words and concepts once the easier ones are well understood.
Duolingo’s algorithm encourages the user to review when enough words on the same topic are at strength 2/4 or below. After 1 day, just 2 of the 30 forgotten words were at strength 2/4, as can be seen in Figure 9. Therefore, according to Duolingo’s algorithm, on day 2 of the Russian experiment I had forgotten, and needed to review, only 6% of the words (2 of 32). In reality, I should have been encouraged to review about 94% of them.
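The percentages above come from simple counts. The short sketch below (the variable names are mine; the counts are those reported for the Russian experiment) makes the arithmetic explicit. It also shows that 28 of the 30 forgotten words were never flagged for review, assuming, as stated above, that the 2 flagged words were among the forgotten ones.

```python
TOTAL_WORDS = 32      # words studied on day 1
FORGOTTEN_DAY_2 = 30  # words I could not recall the next day
FLAGGED_DAY_2 = 2     # forgotten words that Duolingo showed at strength 2/4 on day 2


def percent(part: int, whole: int = TOTAL_WORDS) -> float:
    """Share of the studied words, as a percentage."""
    return 100 * part / whole


forgotten_pct = percent(FORGOTTEN_DAY_2)       # share I actually needed to review
flagged_pct = percent(FLAGGED_DAY_2)           # share Duolingo suggested reviewing
not_flagged = FORGOTTEN_DAY_2 - FLAGGED_DAY_2  # forgotten words that were not flagged

# prints: Forgotten: 93.75%  Flagged: 6.25%  Missed: 28 words
print(f"Forgotten: {forgotten_pct:.2f}%  Flagged: {flagged_pct:.2f}%  Missed: {not_flagged} words")
```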
One month later, one word still had full strength, as shown in Figure 11. The problem is that I had not remembered this word even on the day after the study. The rest of the words were indeed at minimum strength by then, but this exception shows how badly the algorithm can be mistaken.
Figure 11 Russian experiment, words list after 1 month
Given that Duolingo’s algorithm did not seem to work very well, I decided to study it more closely to understand exactly how it works and why it does what it does. As its results were not good enough, I preferred Anki, which is highly customizable, and used the SuperMemo 2 algorithm in it. SuperMemo 2 is also based on exponential steps, but every card answered correctly follows this pattern:
1 day > 6 days > 15 days > …
The purpose of these steps will be explained in the section on Anki’s configuration, but the key point is that every word answered incorrectly is asked again immediately and cannot move on to the 6-day interval until it has been answered correctly after 1 day. Going back to the Russian experiment, it would have made more sense to review almost all the Russian words the next day, the SuperMemo 2 way. But to know whether Anki or Duolingo is better in the end, it is necessary to investigate the algorithm and configuration of both applications.
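The SuperMemo 2 rules behind the 1 > 6 > 15 days pattern can be sketched as follows. This is a minimal sketch based on the published SM-2 description (Wozniak): answers are graded from 0 to 5, the intervals are 1 day, then 6 days, then the previous interval multiplied by the item’s easiness factor (E-Factor, starting at 2.5 and never below 1.3), and an answer graded below 3 restarts the repetitions without changing the E-Factor. Implementations differ in details; this sketch rounds fractional days up, treats a failed item as memorized anew (1 day, then 6 days), and does not model the same-day repetition of poorly answered items. The first call on a new card stands for the day the word is first learnt, and the names `SM2Card` and `sm2_review` are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SM2Card:
    scheduled: int = 0     # intervals scheduled since the item was (re)learnt: I(1), I(2), ...
    interval: int = 0      # days until the next review
    easiness: float = 2.5  # E-Factor; SM-2 starts every item at 2.5


def sm2_review(card: SM2Card, quality: int) -> SM2Card:
    """Return the card's new state after an answer graded 0 (blackout) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality < 3:
        # Wrong answer: treat the item as memorized anew (next review after I(1) = 1 day)
        # and keep the E-Factor unchanged.
        return SM2Card(scheduled=1, interval=1, easiness=card.easiness)
    if card.scheduled == 0:
        interval = 1  # I(1)
    elif card.scheduled == 1:
        interval = 6  # I(2)
    else:
        # I(n) = I(n-1) * EF; this sketch rounds fractional days up.
        interval = math.ceil(card.interval * card.easiness)
    # E-Factor update: EF' = EF + (0.1 - (5 - q) * (0.08 + (5 - q) * 0.02)), never below 1.3.
    penalty = 5 - quality
    easiness = max(1.3, card.easiness + (0.1 - penalty * (0.08 + penalty * 0.02)))
    return SM2Card(card.scheduled + 1, interval, easiness)


if __name__ == "__main__":
    card = SM2Card()
    for _ in range(4):
        card = sm2_review(card, 4)  # grade 4: correct answer after some hesitation
        print(card.interval, end=" ")
    print()
```

Grading every answer 4 leaves the E-Factor at 2.5 and yields intervals of 1, 6, 15 and 38 days, which matches the pattern above; a failed answer brings the card back to 1 day and then 6 days.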
3.4 Anki’s configuration and ideal algorithm
Anki was created in 2006 as spaced repetition software, before Quizlet and Memrise and 5 years before Duolingo. The application, available on Windows, iOS, Linux and Android, comes with no study material once installed, unlike Duolingo, which offers multiple language courses to choose from right away. Anki uses flashcards for studying. To start studying a deck of flashcards, one can download a Shared Deck from the website, as can be seen in Figure 12.
Figure 12 Shared decks section in Anki's website
Decks are available for 10 different languages and 10 other scientific disciplines. They are made by users and are free. For example, if we want to learn Chinese, the list of decks will look like Figure 2. Among all the decks, we can choose the one that best suits our needs, based on users’ reviews. We can also take into account the 3 sample cards visible before downloading. These samples show how the flashcards are made and may give an insight into the quality of their content. They are extremely useful if, for example, we want to learn Chinese characters, are looking for flashcards of a specific CEFR level, or want images and sounds included.
Once we have chosen a deck, we download it and import it into the application. Of course, we can also edit the deck after importing it from Shared Decks. In addition, we can create a new deck with our own cards.
3.4.1 Introduction to Anki
An Anki flashcard looks like the left part of Figure 13. First, we see the question, which in this case has an image, a word and an audio clip. When we are ready to answer, we click Show answer and the answer appears just below, as can be seen in the right part of Figure 13.
Figure 13 Anki's flashcard example
Once we see the answer, we must choose between Again, Good and Easy: Again if we failed, Good if we answered correctly, and Easy if it was easy. This seems obvious, but the boundary between Good and Easy is important to understand, as it affects the algorithm, and understanding the algorithm helps in the decision. The application schedules each flashcard sooner or later depending on whether we choose Again, Hard, Good or Easy (Hard is only available once a flashcard has been learned, which will be explained shortly).
There is also the option to study flashcards in reverse. In this case, we would see just “manzana”, and the answer would be the image, “apple” and the sound. This is useful if we want to study vocabulary or, for example, verb conjugations.
3.4.2 Configuration
Anki differs from Duolingo above all in its customization. In Duolingo, all courses are linear, and users can only modify the type of exercises, that is, disable the listening or speaking exercises, as in Figure 6, normally for practical reasons such as not having a microphone or headphones. Anki, by contrast, has many options that users can modify, although some knowledge of them is required to make the most of the algorithm.
I will explain only the options that have relation with the SRS, to make it as short as possible.
The first section is “New Cards”, as we can see in the Figure 14:
Figure 10 Anki's options: New Cards
1. Steps (in minutes): The default “1 10” means that, once we answer a new flashcard,
if we fail it, we will see it again after 1 minute, over and over again, until we get it
correct. When we answer correctly, it will appear again in 10 minutes. If we answer correctly after
those 10 minutes (the last step), the card becomes a “graduated card”, or learned card. Before
that, all the cards that have not yet graduated are “learning cards”. 1 minute and
10 minutes are just examples, as we can change these numbers or add more steps, like “1
10 100”, for example.
2. Graduating interval: It is the time (in days) that has to pass before the next review once all the steps
have been completed. Graduated cards are considered learned cards and
therefore follow a different SRS pattern, which is shown in Figure 15:
Figure 11 Anki's algorithm
The interval modifier is 100% by default, so it makes no difference to the final result. This
option can be seen in Figure 16 and it is adjustable. This is very important, as research
points to an ideal correct-answer ratio of 85% (Wilson et al., 2019), and according
to it, we should adjust the interval modifier in order to achieve that ratio, as I will explain when I
discuss the ideal algorithm.
Figure 12 Anki's options: Reviews
In short, by default, if we answer Good, the next interval will be:
Good button algorithm: current interval * Ease level * interval modifier
As the default interval modifier is 100% (if not modified), we only need to take into account
the ease level and the current interval, in days, months or years. In the default options, the ease
is 250%. It is important to note that learning cards, the ones that have not graduated yet, DO
NOT have an Ease yet. In fact, they get a 250% Ease once they graduate, no matter
how many times we have failed them before graduation.
As an example, if we answer Good once a card has graduated (after 1 day with the default options,
that is, a 100% interval modifier and 250% ease), the next review will be in 2.5 days; if we answer
correctly again, in 6.25 days, and then in 15.6 days, 39 days, etc. The Ease factor follows the
procedure of the diagram in Figure 17.
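The Good rule can be sketched in a few lines of Python. This is a simplified model under stated assumptions: it applies only the formula given in the text (current interval * Ease * interval modifier) and ignores Anki's random fuzz, bonuses for late reviews and rounding to whole days. The function name `good_intervals` is hypothetical, not part of Anki.

```python
def good_intervals(first_interval: float, ease: float, modifier: float,
                   reviews: int) -> list[float]:
    """Return the intervals (in days) obtained by answering Good `reviews` times in a row.

    Simplified sketch: each new interval = previous interval * ease * interval modifier.
    Anki's fuzz, late-review bonus and day rounding are deliberately ignored.
    """
    intervals = [first_interval]
    for _ in range(reviews - 1):
        intervals.append(intervals[-1] * ease * modifier)
    return intervals


# Default options: 1-day graduating interval, 250% Ease, 100% interval modifier.
print(good_intervals(1.0, 2.5, 1.0, 5))  # [1.0, 2.5, 6.25, 15.625, 39.0625]
```

These are the 2.5, 6.25, 15.6 and 39 days of the example; passing a `modifier` below 1.0 shortens every interval proportionally.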
Figure 13 Anki's Ease : Diagram
When we fail a flashcard, the ease is reduced by 20 percentage points. If we answer Hard, by 15 points. If we answer Good,
the ease is not modified. If we answer Easy, the ease is increased by 15 points.
There is no maximum Ease Factor, but there is a minimum Ease Factor of 130%. This is
because there has to be a minimum exponential growth when we answer Good after many failures
(Wozniak, 1990). For example, with a 130% Ease, the intervals after answering correctly,
starting from 1 day, would be 1.3 days, then about 1.7 days, 2.2 days, 2.9 days, etc. The important thing is to keep
producing longer and longer intervals along the forgetting curve. The forgetting curves will be further explained in the ideal
algorithm section.
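The Ease adjustments and the 130% floor can be written as a small Python sketch. It assumes the changes are applied in percentage points to a graduated card, exactly as listed in the text; `update_ease`, `EASE_CHANGE` and `min_ease_intervals` are illustrative names, not Anki identifiers.

```python
MIN_EASE = 1.30  # floor for the Ease factor (130%)

# Change in Ease per answer button for a graduated card (20/15 percentage points).
EASE_CHANGE = {"again": -0.20, "hard": -0.15, "good": 0.0, "easy": +0.15}


def update_ease(ease: float, button: str) -> float:
    """Return the new Ease after pressing `button`, never going below 130%."""
    return max(MIN_EASE, ease + EASE_CHANGE[button])


def min_ease_intervals(reviews: int) -> list[float]:
    """Intervals (days) after graduation when the Ease is stuck at 130% and we press Good."""
    intervals = [1.0]
    for _ in range(reviews - 1):
        intervals.append(intervals[-1] * MIN_EASE)
    return intervals


ease = 2.5
for _ in range(7):             # seven failures in a row
    ease = update_ease(ease, "again")
print(round(ease, 2))          # 1.3: the floor stops the decline
print([round(i, 2) for i in min_ease_intervals(5)])
```

Even at the floor, each interval is 30% longer than the previous one (1.3, 1.69, 2.197, 2.856 days), so the card keeps being spaced further apart.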
3. Easy interval: For new cards, this is a different interval from the Graduating interval. If we
answer Good when learning a card before graduation, it will receive the interval set in the
Graduating interval, but if we answer Easy, it will receive the Easy interval (by default, 4 days,
four times the Graduating interval).
4. Starting ease: It is the aforementioned ease of 250%. Some studies (Wozniak, 1990)
suggest that it is the best exponential progression to implement.
5. Maximum interval (Reviews section): It is the maximum amount of time that can
elapse between studying a flashcard and seeing it again. The default amount of time
is 10 years, but it can be increased indefinitely or reduced. This is the main
difference with Memrise. As mentioned before, the limit in Memrise was half a year, which
in the long run can be inefficient and time-consuming.
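The New Cards options described in items 1 to 3 behave like a small state machine, sketched below. This is a simplified model under assumptions: Again always sends the card back to the first step, Good moves it one step forward (graduating after the last one), and Easy graduates it immediately with the Easy interval. The default values (1-day Graduating interval, 4-day Easy interval) come from the text; the function names are hypothetical.

```python
def parse_steps(steps: str) -> list[int]:
    """Turn a 'Steps (in minutes)' string such as '1 10' into a list of integers."""
    return [int(s) for s in steps.split()]


def answer_learning_card(step_index: int, button: str, steps: list[int],
                         graduating_days: int = 1, easy_days: int = 4):
    """Simplified model of a learning card.

    Returns ("learning", next_step_index, wait_minutes) while the card is still in
    its learning steps, or ("graduated", None, interval_days) once it leaves them.
    """
    if button == "again":
        return ("learning", 0, steps[0])          # back to the first step
    if button == "easy":
        return ("graduated", None, easy_days)     # skip the remaining steps
    if button == "good":
        nxt = step_index + 1
        if nxt < len(steps):
            return ("learning", nxt, steps[nxt])  # next step
        return ("graduated", None, graduating_days)
    raise ValueError(f"unsupported button for a learning card: {button}")


steps = parse_steps("1 10")
print(answer_learning_card(0, "good", steps))   # ('learning', 1, 10)
print(answer_learning_card(1, "good", steps))   # ('graduated', None, 1)
```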
Now that I have explained what happens if we press the Good button, I will proceed
to explain what happens if we press Again, Hard or Easy. As can be seen in Figure 15, the
formula is different.
Again button algorithm: current interval * new interval %. Ease -20%
The current interval (in days) of each card is multiplied by the New interval set in
the Lapses section, shown in Figure 18. The default option is 0%, which means that if we have
answered a “learnt card” incorrectly, no matter the current interval (10 days or even 10
years), the next time we will see the card will be after the “Steps (in minutes)” of the Lapses
section (not to be confused with the Steps in the New Cards section). Then, once we
have answered correctly again, the new interval will be the New interval, which by default
is 0% of the previous one; in practice, this means the Minimum interval. Of course, both the
Minimum interval and the New interval can be changed.
For example, in my personal configuration I have the New interval at 50%. If I failed a flashcard
with a current interval of 1.5 years, the next time I would see it would be after the Steps (in minutes),
which I have configured to 20 minutes, and then, if answered correctly, after 0.75 years,
and from there it would continue with the Anki algorithm in Figure 15. This modification has
helped me avoid losing long streaks because of concentration mistakes. For example,
with a streak of 1.5 years, I should have already internalized the word quite well. It could happen
that I was tired when studying and pressed “Show answer” too early, and then when
I saw the answer I thought “Of course, I knew it!” This happens frequently when using Anki
a lot. Thus, it may be worthwhile increasing the default New interval of 0% to a higher percentage.
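The Again rule for a graduated card can be sketched as follows. This is a simplified model, assuming the Minimum interval is 1 day (Anki's usual default) and ignoring the relearning steps themselves, which only delay the moment the new interval starts; `lapse` is a hypothetical name.

```python
def lapse(interval_days: float, ease: float, new_interval_pct: float,
          minimum_interval_days: float = 1.0) -> tuple[float, float]:
    """Again on a graduated card (simplified).

    After the relearning Steps (in minutes) are passed, the card returns with
    interval * New interval %, but never less than the Minimum interval.
    The Ease drops by 20 percentage points, with a floor of 130%.
    """
    new_interval = max(minimum_interval_days, interval_days * new_interval_pct)
    new_ease = max(1.30, ease - 0.20)
    return new_interval, new_ease


# Default New interval (0%): a 10-year card falls back to the Minimum interval.
print(lapse(3650, 2.5, 0.0))     # (1.0, 2.3)
# A 50% New interval: 1.5 years (547.5 days) becomes 0.75 years (273.75 days).
print(lapse(547.5, 2.5, 0.5))    # (273.75, 2.3)
```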
Finally, the Ease of the card is reduced by 20 percentage points (recall: default starting ease = 250% and
the minimum is 130%), but this happens ONLY if the card had already graduated.
Figure 14 Anki's options: Lapses
Hard button algorithm: current interval * 1.2. Ease -15%
The Hard option is only available for Graduated cards. It always reduces the
current Ease by 15 percentage points, so that we see the card slightly more often, but at the same time the current
interval is multiplied by 1.2. As the minimum Ease is 130%, this means that no matter how
much Ease we have, the Hard button will always increase the current interval, as the
minimum progression possible is 1*1.3*1.2 (day, ease and 1.2 (Hard) respectively) = 1, 1.56,
2.43, 3.8, 5.9, etc.
Easy button algorithm: Good * Easy Bonus. Ease +15%
The Easy button works differently for New, Learning and Graduated cards. If the card has not graduated
and we press Easy, the new interval will be the Easy interval previously mentioned in the New
Cards section. In addition, it will automatically become a Graduated card. Once it has graduated
(recall: this is when the Hard button becomes available as well), if we press Easy, the next interval
will be the same as Good multiplied by the Easy Bonus in the Reviews section. The default
value is 130%, but it can be modified as well. In addition, the Ease will be increased by 15 percentage points.
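For a graduated card, the Easy rule is the Good formula multiplied by the Easy Bonus. The sketch below assumes the defaults named in the text (130% Easy Bonus, 100% interval modifier) and ignores fuzz and late-review adjustments; `easy_review` is a hypothetical name.

```python
def easy_review(interval_days: float, ease: float, modifier: float = 1.0,
                easy_bonus: float = 1.30) -> tuple[float, float]:
    """Easy on a graduated card (simplified): Good interval * Easy Bonus, Ease +15 points."""
    good_interval = interval_days * ease * modifier
    return good_interval * easy_bonus, ease + 0.15


# A 10-day card at the default 250% Ease: Good would give 25 days, Easy gives 32.5.
new_interval, new_ease = easy_review(10, 2.5)
print(round(new_interval, 2), round(new_ease, 2))  # 32.5 2.65
```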
3.4.3 Ideal algorithm: SuperMemo 2
Since the first study of spaced repetition by Ebbinghaus in 1885, many researchers have tried
to find the ideal algorithm for spaced repetition, due to its benefits and the increasing number
of students using it. According to SuperMemo (henceforth SM), the number of SRS users
has grown exponentially, as can be seen in Figure 19.
Figure 15 Supermemo prediction on Spaced repetition users
SM is a software program created by the Polish researcher Piotr Wozniak in 1987 (Godwin-Jones,
2010). According to Godwin-Jones, this program was created to help learn
vocabulary following a specific pattern of how people learn and forget, the forgetting curve,
as Hermann Ebbinghaus first called it. This pattern “dictates a particular rhythm for reviewing
items to be learned until they are committed to long-term memory”. So, instead of learning a
large number of words in a single session, it would be better to study them on day 1, and then review them after, say, 4 days, 8
days, 15 days, etc.
This tries to answer the first basic question that every student asks: when should I study? Should
I do “cramming” the day before an exam, as Ebbinghaus it, or should I space the study? In an
experiment carried out by researchers from the University of California and the University of South Florida
on the learning of Swahili as a Foreign Language (FL) by native English speakers, Harold et al.
(2007) suggested that having “too little spacing is worse than having much”.
In another experiment, mathematics students were taught through 10 practice problems. The
first group did them in 1 class, the second group in 2 spaced classes. After 4 weeks, the
retention was as shown in Figure 20. These experiments could have powerful implications
for mathematics and language teaching, encouraging teachers to implement strategies that
regularly make students review what they have learned at spaced intervals.
Figure 16 Retention interval in Mathematics students: Massers and Spacers
Given the apparent benefits of spacing study, the forgetting curve has to be
implemented. Returning to Godwin-Jones, SuperMemo does this by “calculating when it is
necessary to review an item just before it is likely to be forgotten”. It is very important to ensure
that our study intervals are long enough to strengthen our long-term memory, as
Harold suggested, because too little spacing is bad for it. And as we want to review items before forgetting
them, the best moment to review is right before forgetting. Consequently, Wozniak created
the SM algorithm, later improved as SM 2. The formula is as follows:
SuperMemo 2 algorithm
I(1) = 1
I(2) = 6
for n>2 → I(n) = I(n-1) * EF
“I” refers to the intervals, measured in days. “EF” is the Easiness Factor (E-Factor),
the same as Anki’s Ease. “n” is the repetition number. With this formula we can see that
if the 1st interval of 1 day is achieved (we remember the word the day after the first
time), we go on to the 2nd interval, of 6 days. If after 6 days we remember the word, the card
follows a different formula, multiplying the previous interval by the E-Factor, or EF.
According to Wozniak (1990), the E-Factor reflects “the easiness of memorizing and retaining
a given item in memory”; all items should start with an EF of 2.5, which decreases depending
on our recall problems. However, the EF should not fall below 1.3, because the flashcards
would then be repeated too often and end up being annoying, and they would probably have flaws in their
formulation.
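The SM 2 interval schedule above can be checked with a short Python sketch. It keeps the E-Factor fixed and only clamps it at 1.3; the full SM 2 algorithm also updates the EF after each review from a 0 to 5 quality grade, which is omitted here because the text does not describe it. `sm2_interval` is a hypothetical name.

```python
def sm2_interval(n: int, ef: float = 2.5) -> float:
    """Interval in days for repetition n: I(1)=1, I(2)=6, I(n)=I(n-1)*EF for n>2.

    The E-Factor is clamped at the 1.3 minimum mentioned by Wozniak.
    """
    if n < 1:
        raise ValueError("n starts at 1")
    ef = max(1.3, ef)
    interval = 1.0
    for k in range(2, n + 1):
        interval = 6.0 if k == 2 else interval * ef
    return interval


print([sm2_interval(n) for n in range(1, 6)])  # [1.0, 6.0, 15.0, 37.5, 93.75]
```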
All in all, we can see some similarities between SuperMemo 2 and Anki. In fact, Anki’s default
Ease is 250% once a flashcard has graduated and cannot decrease below 130%. This is exactly
the same as SM 2, because Anki’s algorithm is inspired by SM 2, as explained in the Anki
Manual. In addition, Anki has several steps (intervals) before the Graduating interval, and once
the cards have graduated, if the answer is Good, Anki follows the same formula as SM 2, with the
exception of the interval modifier.
Anki answer Good → current interval * Ease level * interval modifier
The interval modifier is a great feature of Anki that SM 2 does not have. Thanks to
this option, we can attain what is called desirable difficulty (Gaspelin, 2013). According
to Gaspelin, “introducing difficulties during practice often improves memory retention”. This
relates to the forgetting curve, according to which we should always try to review as
late as possible. As Gaspelin explains, even if error rates are higher due to the
increased difficulty, later recall benefits, and this strengthens long-term retention.
We should always try to find a balance between being so difficult that it feels like torture and
so easy that it feels like a chore. It must be challenging in order to keep us motivated and
focused and to make our brain work as hard as possible.
The solution to this is a “sweet spot in which training is neither too easy nor too hard, and
where learning progresses most quickly”, according to Wilson et al. (2019). In an article in
the journal Nature Communications, these researchers formulated the Eighty Five Percent Rule for learning in neural networks
and other learners. They explain that we should always try to achieve an optimal accuracy of about 85%
when we study or, equivalently, an error rate of around 15.87%.
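The 15.87% figure can be reproduced numerically. As we understand Wilson et al.'s derivation for a binary task with Gaussian noise, the optimal error rate equals the probability of falling more than one standard deviation below the mean of a standard normal distribution, Φ(−1) = ½·erfc(1/√2). The sketch below only evaluates that expression with the standard library.

```python
import math

# Phi(-1): standard normal probability of lying more than one standard deviation
# below the mean, written with the complementary error function.
optimal_error_rate = 0.5 * math.erfc(1 / math.sqrt(2))
optimal_accuracy = 1 - optimal_error_rate

print(f"error rate about {optimal_error_rate:.2%}")  # error rate about 15.87%
print(f"accuracy about {optimal_accuracy:.2%}")      # accuracy about 84.13%
```

This is why the rule is described as roughly 85% accuracy: the exact optimum is closer to 84%.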
Thanks to Anki’s interval modifier, it is possible to apply the Eighty Five Percent Rule to
our algorithm. Anki has an add-on called True Retention by Card Maturity with which we can
see our current retention statistics.
For example, in my German flashcard deck, shown in Figure 21, I would like to have a True
retention (accuracy rate) of 85% on Mature cards. Graduated cards can be young or mature,
depending on whether their current interval is 21 days or longer (mature) or not (young). In
this case, I have an accuracy rate of 59.2% over the 380 mature reviews of this month. In order to
get an 85% accuracy, I should change the interval modifier from the current 100% (the
default value) to 31%.
Figure 17 True Retention by Card Maturity add-on
The 31% figure is calculated with the following formula, which can be found in the
Anki Manual: we divide the logarithm of the desired retention (85%) by the logarithm
of our current retention. Personally, I adjust it every month, given that big modifications should
entail a significant decrease or increase in True retention (in my case, going from 100% to 31% is a huge
difference).
Interval modifier formula → log(desired retention%) / log(current retention%)
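The interval modifier formula can be checked with the numbers from the German deck. In this sketch retentions are written as fractions (0.85 rather than 85%); the base of the logarithm does not matter because it cancels in the ratio. `interval_modifier` is a hypothetical helper, not an Anki function.

```python
import math


def interval_modifier(desired_retention: float, current_retention: float) -> float:
    """Interval modifier = log(desired retention) / log(current retention).

    Both retentions are fractions strictly between 0 and 1 (e.g. 0.85 for 85%).
    """
    for r in (desired_retention, current_retention):
        if not 0 < r < 1:
            raise ValueError("retention must be strictly between 0 and 1")
    return math.log(desired_retention) / math.log(current_retention)


print(f"{interval_modifier(0.85, 0.592):.0%}")  # 31%
```

If the current retention were above the target (for example 90%), the result would exceed 100%, lengthening the intervals instead of shortening them.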
After changing the default interval modifier to take the Eighty Five Percent Rule into
consideration, the next step would be to apply the SuperMemo 2 algorithm to the Anki settings. As I
have explained, the default Starting ease is already 250% and we should not change it. But we
could change some settings in the New cards section in order to avoid what is popularly
called Ease Hell. My proposal is the one in Figure 22.
Figure 18 Anki's New Cards settings with SuperMemo algorithm
I have changed the Steps (in minutes) from “1 10” to “15 1440 8640”. 1440 minutes equal 1 day
and 8640 minutes equal 6 days. As the first 2 intervals of the SuperMemo 2 algorithm are 1 and 6 days,
we can put them in the Steps (in minutes) this way. Then we must change the Graduating
interval from 1 day to 15 days, in order to follow the aforementioned formula.
I(n) = I(n-1) * EF
In this case, the 3rd interval is 15 days because the previous interval, I(2), was 6 days,
and if we multiply it by 2.5 (250%), we get 15:
I(3) = I(2) * 2.5 = 6 * 2.5 = 15
Now we have the SuperMemo 2 algorithm fully incorporated into our Anki settings.
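The mapping from SM 2 intervals to Anki's New Cards options is simple arithmetic, sketched below. The 15-minute first step is the author's own choice and is kept as a parameter; `sm2_anki_settings` is a hypothetical name.

```python
MINUTES_PER_DAY = 1440


def sm2_anki_settings(ef: float = 2.5, first_step_minutes: int = 15) -> tuple[str, float]:
    """Map SM 2's first intervals onto Anki's New Cards options.

    I(1) = 1 day and I(2) = 6 days become learning steps (in minutes);
    I(3) = I(2) * EF becomes the Graduating interval (in days).
    """
    steps = [first_step_minutes, 1 * MINUTES_PER_DAY, 6 * MINUTES_PER_DAY]
    graduating_interval = 6 * ef
    return " ".join(str(s) for s in steps), graduating_interval


print(sm2_anki_settings())  # ('15 1440 8640', 15.0)
```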
To conclude, why do we put the first 3 intervals (15m 1d 6d) in the Steps (in minutes),
and why are they so long compared with the default settings (1m 10m)? We would say that we
are in Ease Hell when we have hit Again or Hard many times and the Ease is 130% or close to it.
This happens frequently when studying new words that we have never seen before and that are
not cognates. Let me give an example:
Ex. 1 (IT) mangiare → (FR) manger
Ex. 2 (DE) Anrufbeantworter → (EN) voicemail
It is evident that for a French speaker, it will be quite easy to learn the Italian word mangiare,
because its French cognate is manger. But for an English speaker, learning
Anrufbeantworter would require more time, and probably more repetitions, depending as well
on the daily load of cards that we have scheduled, which can help us or confuse us.
For example, if we have the flashcard in Ex. 2 right after the flashcard anrufen, which is the
verb “to call”, this would help us with the next flashcard. But other flashcards could interfere
negatively. For example, yesterday I answered Kontakt too quickly, thinking the card said “Contact”.
However, the flashcard was actually “Contract”, which in German is Vertrag. There
are strategies to avoid this kind of mistake when studying flashcards with Anki. In this case, I
changed the English word to “conTAct”, so as not to make the same mistake again. This is
a frequent problem with flashcards, because it interferes with the algorithm: our answers may
reflect random interference rather than our actual memory.
Going back to Ease Hell: with the default settings, a card is considered learned
(graduated) if we have managed, through memory or chance, to remember it after 10 minutes (the Steps
are 1 10). After that, there is a Graduating interval of 1 day, and the Ease can start to decrease
or increase.
With these settings, we will probably fail a card like the one in the German example many
times, as the elapsed time is very long. Consider this: in the first 2 steps,
10 minutes (2nd) is 10 times 1 minute (1st). But 1 day (graduation) is
144 times 10 minutes. The progression is too steep, and our memory will hardly be able to recall
a totally unfamiliar word.
Therefore, if, for example, we fail the card 5 times after graduation, the Ease will be reduced
from 250% to 150% (20 percentage points each time). And if we answer Good from then on, the next intervals will be,
in days:
Ease 1.5 → 1 - 1.5 - 2.25 - 3.3 – 5 - 7.5 - 11.3 – 17 - 25
If we compare it with the same card at an Ease of 2.5:
Ease 2.5 → 1 – 2.5 – 6.2 – 15.6 – 39 – 97 – 244 – 610 – 1525
In the first case, we could say that the card is in Ease Hell, because its ease has been severely
reduced before it was well learned. If we compare both cases, the 9th interval of
the 1st card would be less than a month, while that of the 2nd card would be more than
4 years. If we are in Ease Hell with a flashcard, we will have to see it many times, and it will
be a bit annoying. For this reason, it is important to graduate cards only once they are well learned.
With the SuperMemo 2 settings, if we remember a card after 6 days of not studying it, we can
consider it graduated because it is already built into our memory.
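The Ease Hell comparison can be reproduced with a short sketch. It assumes, as in the comparison above, that the five failures happen first and that the intervals then restart from 1 day with Good answers only; the function names are hypothetical.

```python
def ease_after_lapses(lapses: int, start: float = 2.5) -> float:
    """Ease after `lapses` Again answers: -20 points each, never below 130%."""
    return max(1.3, start - 0.20 * lapses)


def nth_interval(ease: float, n: int, first: float = 1.0) -> float:
    """n-th interval (days) when answering Good every time: first * ease ** (n - 1)."""
    return first * ease ** (n - 1)


hell = ease_after_lapses(5)  # 250% -> 150%
for ease in (hell, 2.5):
    days = nth_interval(ease, 9)
    print(f"Ease {ease:.0%}: 9th interval = {days:.1f} days ({days / 365:.1f} years)")
# Ease 150%: 9th interval = 25.6 days (0.1 years)
# Ease 250%: 9th interval = 1525.9 days (4.2 years)
```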
In conclusion, with all these settings we could achieve our best performance, according to the
previous peer-reviewed research carried out by different researchers who have tried to shed
light on the best SRS flashcard implementation possible. Of course, more research will be
carried out in the future and some things may change, but for now this would be a good
implementation.
3.5 Duolingo’s configuration and algorithm
3.5.1 Introduction to Duolingo
First of all, Duolingo’s method is different from Anki’s in its basic conception. It does not use
the flashcard method, but a translation one. In order to acquire knowledge of the language, the user
must above all translate sentences. There are also activities where the user has to say what
an image shows, as in Figure 23, or transcribe an audio clip.
Figure 19 Duolingo exercise example with an image
However, even if the method is different, the approach is similar. Both Anki and Duolingo use
short exercises where the user must translate small, short pieces of information. With Anki, flashcards
must be short in order to be efficient. With Duolingo, sentences must be short as well, so that
the user can focus on small units of learning in each unit. And this is important for the
algorithm.
In Anki, the user answers by clicking: Again, Hard, Good or Easy. In Duolingo there
are many types of exercises: in some exercises users type the translation (Figure 24), in others
they choose an option from A, B or C (Figure 25), put shuffled words in order using
boxes (Figure 26), transcribe an audio clip, pronounce a list of words, etc. I have not included
this last type of exercise in my research because it uses speech-to-text technology that Anki
lacks, but it can be very beneficial for active oral competence and for improving one’s accent.
Figure 20 Duolingo exercise: Type the translation
Figure 21 Duolingo exercise: Choose from A, B or C
Figure 22 Duolingo exercise: Choose words from boxes
Duolingo’s web platform is different from the mobile version. In the online version, the user is
allowed to do as many units as he wants on the same day. In the phone version, he has lives, and if
he loses all his lives, he has to wait a certain number of hours to be able to learn again. Moreover,
the mobile version has more exercises with boxes than the web version, because
typing on a phone (2 fingers) is more time-consuming and less comfortable than typing on a
keyboard (10 fingers). For this reason, I have done my experiment with the computer version:
I did not want any limit on study time, and I wanted to make the most of each 15-minute
session over the 15 hours of the experiment in total (7.5 per language).
This range of exercises gives dynamism and variety to the learning
process, compared with flashcards. What’s more, if we also consider Duolingo’s appealing
interface, images and motivational messages, Duolingo is apparently more engaging
than Anki.
There is another very important aspect of Duolingo: gamification. Anki allows users
to incorporate many customizable add-ons that implement this component as well, but Duolingo
has a special competitive aspect that Anki lacks. This captivates the user even more,
because he can see his progress. Duolingo is described as a “playfully illustrated,
gamified design that combines point-reward incentives with implicit instruction, mastery
learning, explanations, and other best practices” (Settles, 2016).
For instance, the first thing a new user does when installing the app or registering on the
website is choose a daily goal. Once he has chosen a goal, he will always see a badge with his
current streak, which motivates him to keep the streak going and to stick with his 10 XP
routine, for example.
Another example of gamification is that users compete with each other: a user who studies
more can enter the Silver division, or the Gold one. Moreover, he can compete with his
friends by synchronizing Duolingo with Facebook and inviting them to the app, and thus
see who gets the longest streak.
3.5.2 Duolingo’s algorithm
In the article A Trainable Spaced Repetition Model for Language Learning, Settles explains
that Duolingo has a “gamelike implementation of mastery learning” where students have to acquire
some knowledge before accessing new material. Therefore, they unlock the first basic skills
in order to move on to more advanced skills, as if it were a tree with many branches that grow
over time, as in Figure 27.
Figure 23 Duolingo's skills progression tree
During the process, language courses incorporate many different words, and each one is
assigned a lexeme tag by a lexeme tagger. Settles defines the tagger as a “statistical NLP [Natural Language Processing] pipeline”,
and it is useful for tagging, indexing and classifying the corpus. This is very important for
the algorithm, given that each lexeme tag (word) will appear depending on our previous
studies and results, as Anki does with each flashcard.
These lexeme tags include the minimal units of meaning. For example, the French verb form étant
would have the lexeme tag être.V.GER, for the verb être in its gerund form. This way,
all verb forms, including conjugations, number, gender, etc. have different lexeme tags, and
this allows the algorithm to be more precise when reviewing words. For example,
the gerund form of the verb être, étant, could be difficult, and if we fail it many times, it would
have a shorter interval than the infinitive.
Like Anki, Duolingo also takes into account the lag effect, first studied by Melton (1970).
Melton observed that people learn better if the spacing increases gradually as time goes on. Thanks in
part to this system, Duolingo claims that 34 hours of study with their model is equivalent to a
full university semester of Spanish as a FL (Vesselinov & Grego, 2012).
With the lag effect progression, they incorporate all the lexeme tags into a student model.
This model captures what the student has learned and how well he can recall all the information at any
given time.
Here lies the biggest difference between Anki’s and Duolingo’s algorithms: Duolingo
uses the vast amount of data from all its users. As Settles explains in the article, “just
two weeks of data is plenty given the number of users, number of tests, number of languages to
train our models.” In How we learn how you learn (2016) he says: “At the core of Duolingo is
a student model that tracks statistics about every word we've ever taught you: for example, how
often you've seen a word, remembered it correctly, and so on. (This is a huge database with
billions of entries that get updated 3,000 times per second!)”
All this can be used to “empirically train richer statistical models”. On the one hand, Anki uses
only information from each individual user. On the other hand, Duolingo uses all the data that
it has from more than 100M students in order to perfect its predictions. They
call this method Half-Life Regression, or HLR.
HLR is a trainable SRS algorithm that relies on “modern machine learning
techniques”. With it, they collect all of their students’ data, which helps in the future
personalization of the learning system. This system analyzes the error patterns of all the learners
to predict the “half-life” of every specific word in each learner’s long-term memory (VentureBeat,
2019). That is to say, HLR figures out what you are struggling with and what material you
should target. After implementing HLR, Duolingo experienced a 12% boost in user
engagement, according to Settles.
However, HLR was not the first method that Duolingo used. As Settles explains, in the
beginning Duolingo used a variant of the Leitner System (1972). It was a spaced
repetition algorithm for flashcards, and thus the intervals increased or decreased depending on
student performance, as can be seen in Figure 28.
Figure 24 Leitner SRS with exponential boxes
The Leitner System was not like HLR, which uses big data from millions of users. It was
more like Anki’s algorithm, where each flashcard’s interval increased or decreased
depending on the accuracy of the individual’s study. All the flashcards were in different boxes,
depending on their current interval, as can be seen in the previous figure.
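A Leitner system with exponential boxes can be sketched as follows. The details are assumptions for illustration, not Duolingo's documented variant: 5 boxes, box k reviewed every 2^(k−1) days (1, 2, 4, 8, 16 days), a correct answer promoting the card one box and an incorrect answer demoting it one box (the classic Leitner system sends failed cards back to the first box instead). `leitner_review` and `MAX_BOX` are hypothetical names.

```python
MAX_BOX = 5  # hypothetical number of boxes


def leitner_review(box: int, correct: bool) -> tuple[int, int]:
    """Move a card between Leitner boxes and return (new_box, interval_in_days).

    Assumptions: box k is reviewed every 2 ** (k - 1) days; correct answers
    promote the card one box, incorrect answers demote it one box.
    """
    if correct:
        box = min(MAX_BOX, box + 1)
    else:
        box = max(1, box - 1)
    return box, 2 ** (box - 1)


print(leitner_review(1, True))   # (2, 2)
print(leitner_review(3, False))  # (2, 2)
```

Unlike HLR, the next interval depends only on the card's own box, not on data from other learners.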
Half-Life Regression is a totally new approach. It is inspired by the Leitner System and the
Pimsleur method (1967), and it aims to improve on them. According to Settles, they
carried out 2 user experiments with 12 million Duolingo students, and they analyzed which
of the 3 algorithms worked better. The results showed that with the HLR algorithm they
improved users’ performance and retention, in comparison with the other methods. It was
“more accurate at predicting student recall rates”.
HLR is also inspired by Ebbinghaus’s forgetting curve (1885). Ebbinghaus’s formula is
shown in Figure 29, where p is the probability of correctly recalling an item, ∆ is the
lag time since we last studied the item, and h is the strength of the item in the learner’s long-term
memory.
Figure 25 Forgetting curve (Ebbinghaus, 1885)
With this formula, p = 1 (a 100% chance of recall) when ∆ = 0, that is, when we have just
learned the word. If p = 0.5, we would be on the verge of forgetting it
(50% chance of recall), and if p = 0 we would probably have forgotten the item.
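The behaviour just described matches the half-life form of the forgetting curve, p = 2^(−∆/h), which is the version used in Settles and Meeder's HLR work. We assume here that this is the formula in Figure 29, which may be written differently there. The sketch uses a hypothetical half-life of 5 days.

```python
def recall_probability(delta_days: float, half_life_days: float) -> float:
    """p = 2 ** (-delta / h): probability of recall delta days after the last review."""
    if half_life_days <= 0:
        raise ValueError("the half-life h must be positive")
    return 2 ** (-delta_days / half_life_days)


h = 5.0  # hypothetical memory strength (half-life) of 5 days
for delta in (0, 5, 15):
    print(delta, recall_probability(delta, h))
```

With this form, p = 1 at ∆ = 0, p = 0.5 exactly when ∆ equals h, and p approaches 0 as ∆ grows, so h can be read as the time it takes to reach the verge of forgetting.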
With the HLR implementation in 2016, Duolingo wanted to improve the learning experience
of the user, and that required an update of the forgetting curve, the Leitner System and the
Pimsleur method. Thanks to the incorporation of the aforementioned machine learning and
user data, they obtained the formula in Figure 30.
Figure 26 Duolingo's final loss function algorithm
In general terms, this algorithm works like Ebbinghaus’s, but it also learns the forgetting curve from
millions of student answers. Thanks to this, they can detect which lexeme tags (about
20k in total) are more difficult to remember and which words will be more difficult to learn.
Following the German example from the Anki section, Duolingo’s algorithm would schedule
Anrufbeantworter (voicemail) more often for English learners than mangiare
(manger, “to eat”) for French learners of Italian.
Therefore, for every language, each individual word has a unique lexeme tag with an inherent
difficulty. This degree of difficulty forms part of the formula, along with:
- Number of times a student has seen the word
- Number of times it was correctly and incorrectly recalled
- Whether the last answer was correct.
All this helps the model to make more personalized predictions.
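How these features could feed the half-life can be sketched with the HLR form described by Settles and Meeder, h = 2^(θ·x), where x is a feature vector and θ the learned weights, combined with p = 2^(−∆/h). The feature names follow the list above, but every count and weight below is made up for illustration; in Duolingo's system the weights would be learned from millions of answers.

```python
def half_life(features: dict[str, float], weights: dict[str, float]) -> float:
    """h = 2 ** (theta . x); features without a weight contribute nothing."""
    return 2 ** sum(weights.get(name, 0.0) * value for name, value in features.items())


def recall_probability(delta_days: float, h: float) -> float:
    """p = 2 ** (-delta / h)."""
    return 2 ** (-delta_days / h)


# Hypothetical features for one learner and one lexeme tag (counts are made up).
features = {"bias": 1.0, "seen": 6.0, "correct": 4.0, "incorrect": 2.0,
            "last_correct": 1.0, "lexeme:être.V.GER": 1.0}
# Hypothetical weights: a negative weight on the lexeme tag marks it as hard.
weights = {"bias": 1.0, "correct": 0.5, "incorrect": -0.4,
           "last_correct": 0.3, "lexeme:être.V.GER": -0.6}

h = half_life(features, weights)
print(round(h, 2), round(recall_probability(3, h), 2))
```

A difficult lexeme tag lowers h, so its recall probability decays faster and the word is scheduled sooner.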
As Burr Settles explains: “the difficulty of the words, the grammar, and the way we present it
to you in the test, all play a role to pick the exact configuration so that in less than five minutes
we have a really good sense of where you’re going to start the course (…). We can inject what
you need to keep practicing, exactly when you need it.”
3.6 Chinese and Japanese Experiment
In order to find out whether Anki with the SuperMemo 2 algorithm or Duolingo works better, I have done
an experiment comparing the performance of both in the short and the mid-term. To do so, I
spent 7.5h on Duolingo learning Japanese and 7.5h on Anki learning Chinese, with the SM 2
settings. Given that the time I had for the experiment (3 months) was limited, I
decided to study a maximum of 15 minutes of each language on each
day that I studied. On those days I did 15 minutes of both Japanese and Chinese, in
order to have the same spacing of study for both, with different elapsed-time variables.
For Chinese, I chose a deck from Shared Decks named
“Most Common 3000 Chinese Hanzi Characters”. There are several reasons why I
chose this deck:
- It is the one with the most upvotes among those aimed at English speakers.
- It has characters and GIFs that show how the characters should be written.
- It has sounds for each character.
- All the characters are divided into difficulty levels (HSK 1-6), so I could focus on the easy
ones.
In the case of Japanese with Duolingo, I followed the standard course and completed the
following units, always trying to reach the maximum level of the first units in order to review
them if needed (the golden unit would appear broken if so) before advancing to new
content.
The purpose of the study was to test just 2 things:
1. Characters (JAP + CH) → English
2. English → Characters (JAP + CH)
So, no alphabetic transcription was tested (neither pinyin for Chinese nor Hepburn for Japanese),
nor were sounds or pronunciation. Therefore, I deactivated the Sound and Speaking
exercises in Duolingo’s settings, so as not to lose time in the study of the Japanese characters. The
objective was to test myself at the end to find out with which application I had fully learned
more characters.
I report the results as the total number of cards (front + reverse), that is to say, the number
of characters plus the number of their English translations:
- Anki → Chinese total number of cards studied: 80
o Correct answers: 66
▪ Character to EN → 36
▪ EN to Character → 30
o Incorrect answers: 14
- Duolingo → Japanese total number of cards studied: 140
o Correct answers: 27
▪ Character to EN → 18
▪ EN to Character → 9
o Incorrect answers: 113
In both cases, the Character to EN answers were easier, as expected. Generally, it is easier to
remember when the answer is in your native language, above all
if the other side is in a foreign language with a different writing system.
The results show that I correctly remembered 82% of the Chinese cards and 19% of the Japanese cards overall. This indicates that, in this experiment, Anki's SuperMemo 2 algorithm configuration worked better than Duolingo's algorithm for short-term memorization, even though I studied a greater number of Japanese characters (140 Japanese cards compared with 80 Chinese cards).
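The overall percentages can be recomputed directly from the card counts reported above. The short Python sketch below does this; the counts come from this experiment, while the `Results` class and the `recall_rate` helper are only illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Results:
    """Card counts from the experiment (front + reverse cards)."""
    studied: int
    char_to_en: int  # correct Character -> English answers
    en_to_char: int  # correct English -> Character answers

    @property
    def correct(self) -> int:
        return self.char_to_en + self.en_to_char

    @property
    def incorrect(self) -> int:
        return self.studied - self.correct


def recall_rate(correct: int, studied: int) -> float:
    """Percentage of cards answered correctly."""
    return 100 * correct / studied


anki_chinese = Results(studied=80, char_to_en=36, en_to_char=30)
duolingo_japanese = Results(studied=140, char_to_en=18, en_to_char=9)

for name, r in [("Anki / Chinese", anki_chinese), ("Duolingo / Japanese", duolingo_japanese)]:
    print(f"{name}: {r.correct}/{r.studied} correct = {recall_rate(r.correct, r.studied):.1f}%")
```

It prints 82.5% for Anki / Chinese and 19.3% for Duolingo / Japanese, the figures rounded to 82% and 19% in the text.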
Nevertheless, because Duolingo's algorithm works better as time goes on, learning more and more from each individual user, I believe the results would be different if I continued this study for longer. The algorithm would then probably make me review more often, given that I had to review very little during the experiment. With Anki, the opposite happened: I reviewed often, hence the results.
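The frequent early reviews in Anki are consistent with how the SuperMemo 2 algorithm (Wozniak, 1998) schedules a card: a new or failed item is reviewed after 1 day and then after 6 days, and only later intervals grow by multiplying by an easiness factor (E-Factor). The Python sketch below follows Wozniak's original SM-2 description (grades 0 to 5, an E-Factor starting at 2.5 with a floor of 1.3, fractional intervals rounded up). Anki's scheduler is a modified SM-2 with its own ease and interval-modifier settings, so this is only an approximation of it, and the `Card` and `sm2_review` names are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class Card:
    repetitions: int = 0   # successful repetitions in a row (n)
    interval: int = 0      # days until the next review, I(n)
    easiness: float = 2.5  # E-Factor; every new item starts at 2.5


def sm2_review(card: Card, quality: int) -> Card:
    """Return the card's new state after a review graded 0 (blackout) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")

    if quality < 3:
        # Failed recall: restart the interval sequence without changing the E-Factor.
        return Card(repetitions=0, interval=1, easiness=card.easiness)

    n = card.repetitions + 1
    if n == 1:
        interval = 1  # I(1) = 1 day
    elif n == 2:
        interval = 6  # I(2) = 6 days
    else:
        # I(n) = I(n-1) * EF, rounded up to whole days.
        interval = math.ceil(card.interval * card.easiness)

    # EF' = EF + (0.1 - (5 - q) * (0.08 + (5 - q) * 0.02)), never below 1.3.
    delta = 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)
    easiness = max(1.3, card.easiness + delta)
    return Card(repetitions=n, interval=interval, easiness=easiness)


card = Card()
for grade in [5, 4, 4, 3]:
    card = sm2_review(card, grade)
    print(f"grade {grade}: next review in {card.interval} days (EF {card.easiness:.2f})")
```

For a card graded 5, 4, 4 and 3 in successive reviews, the intervals are 1, 6, 16 and 42 days, while a single grade below 3 sends the card back to a 1-day interval.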
In addition, because I was the only participant, I had to run the experiment with 2 different languages. An ideal experiment would include more people divided into 2 groups, both learning the SAME Chinese or Japanese flashcards from scratch, one with Duolingo and the other with Anki; after identical study time, both groups would be tested to see which one had fully remembered the most. For example, one group could follow Duolingo's Japanese course while the other studied the same Duolingo characters, in the same order, as Anki flashcards scheduled with the SM-2 algorithm. This would be very revealing and interesting for further research and SRS experimentation.
3.7 Pros and cons of Duolingo VS Anki
Duolingo has a big advantage over Anki: its huge database, refined day by day by its own AI and by multiple contributors. However, this AI still needs improvement: “It’s not always clear to tease out from the signal we get back what the cause was. There’s a lot more AI to do”, says Settles (Wired, 2018). As an example, Figure 31 shows the Japanese translation of “drink”, but the translation of “alcohol” is also accepted as a correct solution. Therefore, if I had answered “alcohol” in Japanese instead of “drink”, the answer would still have been marked correct, and the algorithm would have increased the interval even though I did not remember the exact answer.
Figure 27 Duolingo accepts other correct solutions
On the one hand, an important drawback is that multiple-choice (box) answers may be too easy, because the solution is already on screen, as Figure 31 also shows. Consequently, we may answer correctly for reasons other than genuine knowledge. Following this example, I can think of 3 reasons why I could answer correctly and yet be unable to write the word on paper:
1. Chance (a 33.3% probability when choosing at random).
2. Elimination: we discard the other options because we already know them or because they are unfamiliar (compared with the word we are currently learning, if that word actually was familiar).
3. Partial recall: we remember just part of the solution (for example, 1 of the 3 characters in Figure 31) that does not appear in the other 2 options.
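The first of these reasons, chance, can be quantified with a simple model. Assuming, as a simplification, that a learner either truly knows a word or picks uniformly at random among the k options shown, the expected accuracy on such items is known + (1 - known)/k, so observed accuracy overstates real knowledge. The Python sketch below computes this and inverts it, as in the classical correction for guessing; the function names are hypothetical and this is not part of Duolingo's algorithm.

```python
def expected_accuracy(known: float, options: int) -> float:
    """Share of correct answers when a fraction `known` of items is truly known
    and every other item is guessed uniformly among `options` choices."""
    if not 0.0 <= known <= 1.0 or options < 2:
        raise ValueError("known must be in [0, 1] and options must be >= 2")
    return known + (1.0 - known) / options


def estimated_knowledge(observed: float, options: int) -> float:
    """Invert expected_accuracy: estimate the truly known share from observed accuracy."""
    if options < 2:
        raise ValueError("options must be >= 2")
    chance = 1.0 / options
    return max(0.0, (observed - chance) / (1.0 - chance))


# With 3 boxes, a learner who truly knows half of the words still scores about 67%.
print(round(expected_accuracy(0.5, 3), 3))
# Conversely, 60% observed accuracy on 3-option items corresponds to 40% real knowledge.
print(round(estimated_knowledge(0.6, 3), 3))
```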
As a result, Duolingo's algorithm could conclude that we know a word because we answered correctly, and therefore increase its interval, when we may simply have got it right by chance and the interval should not have increased. Figure 32 shows another example.
Figure 28 Duolingo exercise: pairing
In this exercise, the user has to pair each Japanese character with its Hepburn romanization. However, results in this type of exercise involve a high degree of chance, which is increased by the fact that clicking on a character plays its audio. Users may therefore listen to the characters first and then choose the transcription, which makes the exercise too easy. Depending on whether the user listens to the solutions first or tries to answer unaided, this type of exercise could be useless.
Besides, it would be useful to know whether this type of exercise increases or decreases the interval to the same degree as other exercises, such as typing. It would make more sense to give typing exercises more weight than an exercise like the one in Figure 33, where the solution is obvious because you can hear the sound.
Figure 29 Duolingo exercise: choose 1, 2 or 3
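The suggestion to give typing exercises more weight than exercises where the answer can be heard or picked from boxes can be sketched as follows. This is a hypothetical scheme, not Duolingo's actual scheduler: the exercise categories, their weights and the base growth factor are all assumptions chosen for illustration. A correct answer grows the interval by a factor that shrinks for low-evidence exercise types, while a mistake still resets it.

```python
from enum import Enum


class Exercise(Enum):
    # Hypothetical weights: how much evidence of real recall each exercise type gives.
    TYPING = 1.0              # answer produced from memory
    MULTIPLE_CHOICE = 0.4     # answer can be recognised or guessed among boxes
    PAIRING_WITH_AUDIO = 0.2  # answer can be heard before choosing


BASE_GROWTH = 2.5  # hypothetical interval multiplier for a fully trusted correct answer


def next_interval(interval_days: float, exercise: Exercise, correct: bool) -> float:
    """Next review interval in days under the hypothetical weighting scheme."""
    if not correct:
        return 1.0  # restart with a short interval after a mistake
    growth = 1.0 + (BASE_GROWTH - 1.0) * exercise.value
    return interval_days * growth


for ex in Exercise:
    print(f"{ex.name}: 4 days -> {next_interval(4.0, ex, correct=True):.1f} days")
```

Under these made-up weights, a correct typed answer multiplies a 4-day interval by 2.5, whereas a correct pairing answer with audio barely extends it.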
On the other hand, gamification is one of Duolingo's great strengths: it increases users' engagement, commitment and, when they learn with friends, their sense of competitiveness. Anki does have some add-ons that let users choose sounds for correct and incorrect answers, set a time limit for each answer, etc. These are good features as well.
Furthermore, in my Chinese and Japanese experiment, Anki's algorithm gave better item retention than Duolingo's in the short and mid term. The Russian experiment also showed that there was a problem with Duolingo's perception of my forgetting curve. However, it is difficult to conclude which one is better, because that would require a longer study with more students learning the same language. The results would probably differ, as Duolingo's Half-Life Regression algorithm works better as time goes on, adapting itself to each student's answers. Moreover, the Japanese course has 10M learners, compared with the 26M English speakers who learn Spanish. The algorithm would therefore probably perform better if more people studied the Japanese course, because it would have more data.
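Settles and Meeder (2016) describe Half-Life Regression as predicting the probability of recalling a word as p = 2^(-Δ/h), where Δ is the time since the word was last practised and h is an estimated half-life, itself computed as h = 2^(Θ·x) from a weight vector Θ and features x of the learner's history with that word. The Python sketch below implements these two formulas; the features (a bias term plus counts of correct and incorrect answers) are a simplification of the paper's feature set, and the weights are hypothetical, not Duolingo's trained values.

```python
def predicted_half_life(weights: dict[str, float], features: dict[str, float]) -> float:
    """h = 2 ** (theta . x): estimated half-life of the word in days."""
    dot = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 2.0 ** dot


def recall_probability(lag_days: float, half_life_days: float) -> float:
    """p = 2 ** (-lag / h): predicted chance of recalling the word after `lag_days`."""
    return 2.0 ** (-lag_days / half_life_days)


# Hypothetical weights (NOT Duolingo's trained values) for a bias term and practice counts.
weights = {"bias": 1.0, "correct": 0.5, "incorrect": -0.8}
word_history = {"bias": 1.0, "correct": 4, "incorrect": 1}

h = predicted_half_life(weights, word_history)
for lag in (1, h, 2 * h):
    print(f"after {lag:.1f} days: p = {recall_probability(lag, h):.2f}")
```

Whatever the weights, p is exactly 0.5 when the lag equals the half-life and 0.25 at twice the half-life. Because each correct answer raises h, an answer accepted as correct without being truly remembered (as in the “alcohol” example) would lengthen the time before the word is practised again.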
Duolingo could learn a great deal from the customizability and transparency of Anki's algorithm. It could attract many committed and thorough students by implementing some options, for example a minimum starting interval such as 1 day. I believe that this option would have been very useful in the Japanese experiment and would have changed my retention results dramatically. Apparently, Duolingo expected my forgetting curve to be longer than it actually is.
Finally, another interesting feature would be a database like Anki's, where users could see the current interval of their words and units. Something as simple as showing the current or next expected interval would let users know how many of their cards are already in their long-term memory and whether they will have many or few units to review the next day.
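The statistics proposed above are cheap to compute once each word has a current interval and a due date. The Python sketch below assumes a hypothetical list of words with those two fields and counts how many are "mature" (Anki's term for cards whose interval is 21 days or more) and how many must be reviewed by tomorrow; the data, field names and function name are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

MATURE_INTERVAL_DAYS = 21  # Anki calls a card "mature" once its interval reaches 21 days


@dataclass
class WordStatus:
    word: str
    interval_days: int  # current review interval
    due: date           # date of the next scheduled review


def review_summary(words: list[WordStatus], today: date) -> dict[str, float]:
    """Share of mature words and number of words to review by tomorrow (overdue ones included)."""
    tomorrow = today + timedelta(days=1)
    mature = sum(w.interval_days >= MATURE_INTERVAL_DAYS for w in words)
    due_by_tomorrow = sum(w.due <= tomorrow for w in words)
    return {
        "mature_share": mature / len(words) if words else 0.0,
        "due_by_tomorrow": due_by_tomorrow,
    }


# Invented example data for four Japanese words.
today = date(2021, 6, 1)
words = [
    WordStatus("水", 30, date(2021, 6, 20)),
    WordStatus("飲む", 3, date(2021, 6, 2)),
    WordStatus("酒", 1, date(2021, 6, 1)),
    WordStatus("猫", 21, date(2021, 6, 15)),
]
print(review_summary(words, today))
```

For this invented data it reports that half of the words are mature and that 2 words must be reviewed by tomorrow.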
4 Conclusions
On the one hand, before starting this TFG, I was convinced that Duolingo had no SRS at all; once I gained insight into how its algorithm works, I realized that it has a lot of potential as well. The interface is appealing, the gamification is entertaining, the settings are clear and the algorithm is backed by extensive research. Duolingo works better than I expected, and I would now seriously consider using it for some particular purposes.
For example, I would like to keep using it to learn Chinese, but I would not use it for German: I already have many German flashcards in Anki and would not like to review the same words twice. If Duolingo added a feature to its Words list for deleting words that we have already learned very well, I might use it. I think it is an excellent platform, full of potential, and implementations like this one would attract people who want to make the most of their learning time.
On the other hand, Anki's algorithm still seems more trustworthy in terms of short- and mid-term performance. In my short experiments, Duolingo's algorithm did not work ideally, and I missed statistics showing my current intervals and the forgetting curve Duolingo expects me to have, which would have helped me understand why my retention was so low. It would be interesting to add some features from Anki so that learners can better track their pace and progress in language learning.
Another good implementation would be to add the Eighty Five Percent Rule formula, so that reviews always aim for an 85% accuracy rate. This rate is described as the “sweet spot” for optimal learning (Wilson, 2019); it encourages students and improves their memory performance and faculties in the long run. Ideally, there would also be a customizable interval modifier like Anki's, so that we could aim for a 90-100% accuracy rate if we wanted to boost our performance before an exam, for example.
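Anki's manual gives a rule of thumb for choosing the interval modifier from one's measured review accuracy: modifier = log(desired retention) / log(current retention). It assumes forgetting is roughly exponential, so that if recall after an interval I is R, recall after m·I is R^m, and solving R^m = D gives the formula. The Python sketch below applies it to the two targets discussed here, the 85% rule and a 90% pre-exam target; the 80% starting accuracy is a made-up example.

```python
import math


def interval_modifier(current_retention: float, desired_retention: float) -> float:
    """Interval multiplier that should move review accuracy from current to desired.

    Assumes exponential forgetting: if recall after an interval I is R, then
    after m * I it is R ** m, so solving R ** m = D gives m = log(D) / log(R).
    """
    for r in (current_retention, desired_retention):
        if not 0.0 < r < 1.0:
            raise ValueError("retention rates must be strictly between 0 and 1")
    return math.log(desired_retention) / math.log(current_retention)


current = 0.80  # hypothetical measured accuracy on reviews
for target in (0.85, 0.90):
    print(f"target {target:.0%}: set the interval modifier to {interval_modifier(current, target):.0%}")
```

A modifier below 100% shortens intervals and raises accuracy; above 100% it lengthens them. For instance, moving from 85% to 90% accuracy gives log(0.90) / log(0.85), roughly 0.65.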
On reflection, it is striking how little of this technology has been adopted in education, even though it is available to everyone. Pashler et al. (2007) mention how compressed learning programs with short time spans, such as immersion periods or summer crash courses, are flourishing. Of course, cramming can be a good strategy at times, and in practical terms it can be more economical, but teachers could then provide these resources to ensure that the information learned is retained in the long run.
Personally, I have used Anki since 2017 to keep all the words, idioms and conjugations I have learned. It has been useful for all the crash courses I have taken and for all my language classes: currently, 95% of the 13,000 flashcards in my French deck are mature. None of my university classmates knew about this technology, except for one, who used Memrise.
In my 4 years studying Applied Languages at Pompeu Fabra University, including 2 Erasmus exchanges at the University of Liverpool and ISIT in Paris, and in 4 years attending language academies, no language teacher provided students with spacing tools or apparently knew about them. In Liverpool, some Italian teachers used Quizlet as a learning method, but it had no SRS technology, and they did not know about this powerful feature either.
I find three reasons for this lack of awareness. First, ignorance: spaced repetition and Anki are still not widely known because they are relatively new, and few people know that Duolingo incorporates spaced repetition technology. Second, the amount of effort required to:
- Master flashcard creation
- Build powerful, innovative and well-designed decks with images, colors, add-ons, etc.
- Take the time to train students in the correct use of the program and in integrating it into their routine.
Third, and in my view the most important, the educational system. Exams with fixed deadlines force short- and mid-term memorization, since crash courses last about 1 month and semesters at most 5 months. Therefore, an SRS would only be used in the short and mid term, which could still be worthwhile but is not ideal for an SRS. For example, if a 3-year Computer Engineering degree includes just 1 semester of Physics in the first year, most students will stop reviewing their formula and problem-solution flashcards once the course ends, because they will not need them in the future.
Let me give one last example. Suppose a teacher in Texas wants to introduce Anki SRS flashcards with the SuperMemo 2 configuration to first-year Middle School Spanish students (12 years old) who are studying Spanish for the first time in their lives. These students would study Spanish until they finish High School (18 years old), and they all have a PC and a smartphone (unlikely, but an ideal scenario in this case). To implement Anki decks successfully, it would be important to motivate the students to use them and to show them the long-term benefits. If this teacher teaches these students for 1 year, he may have enough time to prove the benefits to the students who use it.
However, ideally this teacher should remain their teacher in the following years, or introduce Anki's SRS and decks together with all the other Spanish teachers at the same school, in order to keep the students engaged with the program, so that they keep receiving new Anki cards as homework and have to review them often. They should keep using it to make the most of the long-term benefits of retrieval practice with flashcards.
For this reason, the educational system and the cooperation of other teachers, the institution and even the parents are key. Changing an established paradigm and a universal method requires a great deal of effort. New technologies like SRS can be powerful tools, and I think that, with time, we will incorporate them into our daily routines.
Much research remains to be carried out, above all on the long-term benefits of SRS, the study of grammar with flashcards and the incorporation of AI and machine learning into SRS. However, a great deal of research has already shown that SRS is beneficial for long-term memory, that the Eighty Five Percent Rule stimulates user engagement and motivation, and that the forgetting curve must be taken into account to increase retention. Hence, I think it is worthwhile to implement these tools in our educational systems and day-to-day lives as soon as possible.
58
5 Webography
8Belts. (2021). Te enseño cómo aprender chino en 8 meses. Retrieved 08.06.2021, from 8Belts website: https://w.8belts.com/aprender-chino/
A. W. Melton. (1970). The situation with respect to the spacing of repetitions and memory. Journal of Verbal Learning and Verbal Behavior, 9, 596–606.
Anki. (2021). About Anki. Retrieved 21.04.2021, from Anki website: https://apps.ankiweb.net/
Anki. (2021). Introduction. Deck Options. Reviews. Retrieved 28.05.2021, from Anki website: https://docs.ankiweb.net/#/
AnkiDroid Open Source Team. (2021). Tarjetas AnkiDroid. Retrieved 08.04.2021, from Google Play website: https://play.google.com/store/apps/details?id=com.ichi2.anki&hl=es&gl=US
AnkiWeb. (2021). Shared Decks. Retrieved 08.06.2021, from Anki website: https://ankiweb.net/shared/decks/
Anonymous. (2015). Most Common 3000 Chinese Hanzi Characters. Retrieved 16.03.2021, from AnkiWeb website: https://ankiweb.net/shared/info/39888802
Caroline E. Seibert Hanson, Christina M. Brown. (2019). Enhancing L2 learning through a mobile assisted spaced-repetition tool: an effective but bitter pill? Retrieved 09.06.2021, from Taylor & Francis Online website: https://www.tandfonline.com/doi/abs/10.1080/09588221.2018.1552975
Babel. (2021). Tú puedes aprender chino en 6 meses …¡y lo sabes! Retrieved 08.06.2021, from Babel website: https://www.babelidiomas.es/tu-puedes-aprender-chino-en-6-meses-y-lo-sabes/
Burr Settles, Brendan Meeder. (2016). A Trainable Spaced Repetition Model for Language Learning. Retrieved 20.04.2021, from Association for Computational Linguistics website: https://www.aclweb.org/anthology/P16-1174.pdf
Burr Settles. (2016). How we learn how you learn. Retrieved 09.06.2021, from Duolingo blog: https://blog.duolingo.com/how-we-learn-how-you-learn/
Cynthya Peranandam. (2018). AI Helps Duolingo Personalize Language Learning. Retrieved 09.06.2021, from Wired website: https://www.wired.com/brandlab/2018/12/ai-helps-duolingo-personalize-language-learning/
Denyze Toffoli, Laurent Perrot. (2019). Autonomy, the Online Informal Learning of English (OILE) and Learning Resource Centers (LRCs): The Relationships Between Learner Autonomy, L2 Proficiency, L2 Autonomy and Digital Literacy. Retrieved 26.05.2021, from HAL website: https://hal.archives-ouvertes.fr/hal-02332599/document
Duolingo. (2021). Duolingo - Aprende inglés y otros idiomas gratis. Retrieved 08.04.2021, from Google Play website: https://play.google.com/store/apps/details?id=com.duolingo&hl=es&gl=US
H. Ebbinghaus. (1885). Memory: A Contribution to Experimental Psychology. Teachers College, Columbia University, New York, NY, USA.
Harold Pashler, Doug Rohrer et al. (2007). Enhancing learning and retarding forgetting: Choices and consequences. Retrieved 27.05.2021, from Psychonomic Society, Inc. website: http://thesciencenetwork.org/docs/BrainsRUs/Enhancing%20Learning_Pashler.pdf
Memrise. (2021). How does the spaced repetition system work? Retrieved 21.03.2021, from Memrise website: https://memrise.zendesk.com/hc/en-us/articles/360015889057-How-does-the-spaced-repetition-system-work-
Memrise. (2021). Memrise: Fun & Fast Language Learning App. Retrieved 08.04.2021, from Google Play website: https://play.google.com/store/apps/details?id=com.memrise.android.memrisecompanion&hl=es&gl=US
Michael B. Horn, Heather Staker. (2011). The Rise of K–12 Blended Learning. Retrieved 26.05.2021, from Aurora Institute website: https://aurora-institute.org/wp-content/uploads/The-Rise-of-K-12-Blended-Learning.pdf
Paul Sawers. (2019). How Duolingo is using AI to humanize virtual language lessons. Retrieved 09.06.2021, from VentureBeat website: https://venturebeat.com/2019/07/05/how-duolingo-is-using-ai-to-humanize-virtual-language-lessons/
Piotr Wozniak. (1998). Application of a computer to improve the results obtained in working with the SuperMemo method. Retrieved 27.05.2021, from SuperMemo website: https://www.supermemo.com/en/archives1990-2015/english/ol/sm2
Piotr Wozniak. (2018). Exponential adoption of spaced repetition. Retrieved 27.05.2021, from SuperMemo website: https://supermemo.guru/wiki/Exponential_adoption_of_spaced_repetition
Quizlet Inc. (2021). Quizlet: Aprende con fichas educativas. Retrieved 08.04.2021, from Google Play website: https://play.google.com/store/apps/details?id=com.quizlet.quizletandroid&hl=es&gl=US
Ramón Campayo. (2021). Aprende alemán en 7 días (Autoayuda y superación). Retrieved 08.06.2021, from Amazon website: https://www.amazon.es/Aprende-alem%C3%A1n-en-7-d%C3%ADas/dp/8408131672
Richard C. Bailey. (2011). Internet-Based Spaced Repetition Learning In and Out of the Classroom: Increasing Independent Student Use. Retrieved 26.05.2021, from Asia University website: https://core.ac.uk/download/pdf/72791536.pdf
Robert C. Wilson, Amitai Shenhav et al. (2019). The Eighty Five Percent Rule for optimal learning. Retrieved 27.05.2021, from Nature website: https://www.nature.com/articles/s41467-019-12552-4
Robert Godwin-Jones. (2010). Emerging technologies from memory palaces to spacing algorithms: approaches to second-language vocabulary learning. Retrieved 27.05.2021, from Virginia Commonwealth University website: https://scholarspace.manoa.hawaii.edu/bitstream/10125/44208/14_02_emerging.pdf
Roumen Vesselinov. (2012). Duolingo Effectiveness Study. Retrieved 08.06.2021, from Duolingo website: http://static.duolingo.com/s3/DuolingoReport_Final.pdf