Kotonoha: An Example Sentence Based Spaced Repetition System

Kotonoha: An Example Sentence Based Spaced Repetition System. Arseny Tolmachev, Sadao Kurohashi. Kyoto University, Graduate School of Informatics. D1. 2017-03-15

Upload: eiennohito

Post on 21-Mar-2017


Page 1: Kotonoha: An Example Sentence Based  Spaced Repetition System

Kotonoha: An Example Sentence Based Spaced Repetition System

Arseny Tolmachev, Sadao Kurohashi

Kyoto University, Graduate School of Informatics

D1

2017-03-15

Page 2: Kotonoha: An Example Sentence Based  Spaced Repetition System

Background: Learning words with flashcards
• Lexical knowledge is crucial for language learning
• Mostly self-learning
• E.g. Japanese Language Proficiency Test level N1 requires knowing about 10,000 words

Flashcards: a method of organizing information that can be formulated in question-answer form, for learning

[Figure: a flashcard with a Question side and an Answer side]

Page 3: Kotonoha: An Example Sentence Based  Spaced Repetition System

Spaced Repetition

[Figure: spaced repetition graph, from the supermemo.com website]
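
The slide only shows the graph, so as an illustration of how a spaced repetition scheduler spaces out reviews, here is a minimal SM-2-style sketch in Python. It is not Kotonoha's scheduler, which the slides do not describe; the names `CardState` and `review` are hypothetical, and the constants follow the commonly published SM-2 description.

```python
from dataclasses import dataclass


@dataclass
class CardState:
    repetitions: int = 0     # successful reviews in a row
    interval_days: int = 0   # gap before the next review
    easiness: float = 2.5    # SM-2 "E-Factor", kept at 1.3 or above


def review(state: CardState, quality: int) -> CardState:
    """Return the next state after a review graded 0 (blackout) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be in 0..5")
    if quality < 3:
        # Forgotten: restart the sequence, keep the easiness unchanged.
        return CardState(0, 1, state.easiness)
    miss = 5 - quality
    easiness = max(1.3, state.easiness + (0.1 - miss * (0.08 + miss * 0.02)))
    if state.repetitions == 0:
        interval = 1
    elif state.repetitions == 1:
        interval = 6
    else:
        # Later intervals grow multiplicatively with the easiness factor.
        interval = round(state.interval_days * easiness)
    return CardState(state.repetitions + 1, interval, easiness)
```

Calling `review` repeatedly with good grades yields gaps of 1 day, 6 days, and then multiplicatively growing intervals; a failed review resets the card to a one-day interval.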

Page 4: Kotonoha: An Example Sentence Based  Spaced Repetition System

Spaced Repetition: Software
• One of the first implementations: https://www.supermemo.com
• The most popular SRS: Anki, http://ankisrs.net/
• And many more

Page 5: Kotonoha: An Example Sentence Based  Spaced Repetition System

Japanese (Word) Learning Tools
• Most of them are for elementary/beginner learners
  • Hiragana/katakana
  • Fixed word lists/lessons
• Tools for advanced learners are scarce
  • Anki
  • Several more

Page 6: Kotonoha: An Example Sentence Based  Spaced Repetition System

Motivation: Context
• We use words in context: with other words
• Contextual word usage differs from language to language

Real-life example:
• バスは乗客を拾った ("the bus picked up the passengers")
  Non-canonical usage in Japanese, OK in Russian
• バスは乗客を乗せた ("the bus took the passengers on board")
  Canonical usage in Japanese

Page 7: Kotonoha: An Example Sentence Based  Spaced Repetition System

Flashcard problems
• Creating flashcards from scratch is time-consuming
  • Need to fill in all information
  • Possibly find example sentences somewhere
• Premade decks do not work as well as manually created ones
  • Matter of UI and system implementation
• Lack of context
  • Especially in questions
  • Card content is usually fixed (e.g. only one context)

Page 8: Kotonoha: An Example Sentence Based  Spaced Repetition System

Kotonoha SRS
• Web (responsive)
  • + mobile apps (planned)
• Flashcards
• Spaced Repetition
• Intermediate+

Features
• Example sentences
  • In question cards
• Batch operation
• Japanese-oriented

https://kotonoha.ws

Page 9: Kotonoha: An Example Sentence Based  Spaced Repetition System

Kotonoha: Usage Pattern
• Find new words
  • Reading books, classes, assignments
• Add words into the system
  • Kotonoha makes it easy to add new words
• Repeat flashcards
  • E.g. 100 cards every day
  • Learn word usage too: Kotonoha shows a new example each repetition
• Have a rich vocabulary (in the long term)
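
To make "a new example each repetition" concrete, the sketch below cycles through a card's example sentences on successive reviews. This is only an illustration under assumed names (`Flashcard`, `next_question`); the slides do not say how Kotonoha actually picks the next example.

```python
from dataclasses import dataclass


@dataclass
class Flashcard:
    word: str
    examples: list[str]
    shown: int = 0  # how many times the card has been presented so far

    def next_question(self) -> str:
        """Return the example sentence for this repetition and advance the counter."""
        if not self.examples:
            return self.word  # no examples assigned: fall back to the bare word
        sentence = self.examples[self.shown % len(self.examples)]
        self.shown += 1
        return sentence


card = Flashcard("走る", ["走っている子供を見た", "遊びに走る若者"])
first = card.next_question()   # first repetition
second = card.next_question()  # second repetition shows a different sentence
```

A round-robin order is the simplest choice; a real system could equally pick the least recently shown sentence or a random unseen one.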

Page 10: Kotonoha: An Example Sentence Based  Spaced Repetition System

Kotonoha: Adding words

Batch operation
• Words are added in lists
• Kotonoha fills in the reading and glosses from a dictionary (JMDict, Warodai)
• Kotonoha assigns example sentences
• Easy to learn the words you want

Screenshot callouts:
• Word was already added
• Word was not added before
• Report that you forgot the word
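
The batch flow described above (a list of words goes in, readings and glosses are filled from a dictionary, and words already in the system are flagged) could be sketched roughly as follows. Everything here is hypothetical: `TOY_DICTIONARY` is a two-entry stand-in for a real dictionary such as JMDict, and `add_words` is not Kotonoha's API.

```python
# Toy stand-in for a real dictionary: word -> (reading, glosses).
TOY_DICTIONARY = {
    "走る": ("はしる", ["to run"]),
    "拾う": ("ひろう", ["to pick up"]),
}


def add_words(batch, deck, dictionary=TOY_DICTIONARY):
    """Add a list of words to `deck` and report what happened to each one."""
    report = {"added": [], "already_added": [], "not_in_dictionary": []}
    for word in batch:
        if word in deck:
            # Corresponds to the "word was already added" callout.
            report["already_added"].append(word)
        elif word in dictionary:
            reading, glosses = dictionary[word]
            # Example sentences would be assigned in a later step.
            deck[word] = {"reading": reading, "glosses": glosses, "examples": []}
            report["added"].append(word)
        else:
            report["not_in_dictionary"].append(word)
    return report
```

Returning a per-word report lets the interface mark each entry of the list as new or already known, similar to what the screenshot callouts describe.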

Page 11: Kotonoha: An Example Sentence Based  Spaced Repetition System

Kotonoha: Adding words (2)
• Check what gets into flashcards
• Recommendations: words using the same characters

Page 12: Kotonoha: An Example Sentence Based  Spaced Repetition System

Kotonoha: Repetition

[Screenshots: question card (reading) and answer card]

Page 13: Kotonoha: An Example Sentence Based  Spaced Repetition System

Kotonoha: Writing Printouts

Print out and practice writing difficult words

Page 14: Kotonoha: An Example Sentence Based  Spaced Repetition System

Example sentences
• Automatically extracted from a web corpus
  • The Tatoeba corpus is small and not very diverse
• Consider a set of sentences for a target word
• Three aspects: Value, Diversity, Coverage
• Intrinsic Value (for a single sentence)
  • Not a garbage sentence, such as a fragment of something
  • Representative usage of the target
  • Understandable by a learner
  • Grammatical
• Diversity (for a sentence set)
  • Different usages of the target, distinct words
• Coverage: acquire usages of rare words and rare senses

Page 15: Kotonoha: An Example Sentence Based  Spaced Repetition System

Example sentence extraction overview

[Diagram: Raw Corpus → Preprocessing (analyze and index) → Search Engine, queried with 走る → Example Candidates (~10k sentences) → Selection → High-quality sentences (~10-15)]

Search: solving the coverage problem
• Distributed
• Handles huge corpora
• Uses lexical dependency information
• Prefers sentences with rich syntactic structure near the target

Selection: dealing with value and diversity

Example sentences for the query 走る:
• 私は走るのすき
• 走っている子供を見た
• …
• 遊びに走る若者
• 酒に走りたい気持ち
• …
• 悪事千里を走る

Details are out of the scope of this presentation
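
The presentation leaves the selection step unspecified. Purely as an illustration of trading off value against diversity when reducing many candidates to a handful, here is a greedy sketch in the style of maximal marginal relevance. It is not the authors' method: the value scores, the `diversity_weight` parameter, and the use of Jaccard overlap between context words are all assumptions.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two word sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_examples(candidates, k, diversity_weight=0.5):
    """Greedily pick up to k sentences.

    candidates: list of (sentence, value, context_words) tuples, where value is
    a precomputed quality score and context_words is a set of words around the target.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def gain(candidate):
            _, value, words = candidate
            # Penalize similarity to the closest sentence already chosen.
            overlap = max((jaccard(words, s[2]) for s in selected), default=0.0)
            return value - diversity_weight * overlap
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return [sentence for sentence, _, _ in selected]
```

With `diversity_weight=0` this degenerates to taking the top-k sentences by value; raising the weight pushes the selection toward sentences whose contexts differ from those already chosen.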

Page 16: Kotonoha: An Example Sentence Based  Spaced Repetition System


Flashcards: Daily Repetition

Page 17: Kotonoha: An Example Sentence Based  Spaced Repetition System


Flashcards: Daily Repetition

Page 18: Kotonoha: An Example Sentence Based  Spaced Repetition System


Flashcards: Daily Repetition

Page 19: Kotonoha: An Example Sentence Based  Spaced Repetition System


Flashcards: Daily Repetition

Page 20: Kotonoha: An Example Sentence Based  Spaced Repetition System

Example sentence evaluation

This is an idea; there are no results yet

Show different sentences to learners of a similar level

Assumption 1: Good example sentences help learners remember words.

Assumption 2: We can use confidence to judge the educational quality of a sentence
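
One way to picture this idea: if each review records which sentence was shown and a numeric confidence mark, sentences can be ranked by their average mark. The sketch below assumes exactly that; the slides do not define how confidence is measured, and `rank_sentences` and `min_reviews` are hypothetical names.

```python
from collections import defaultdict
from statistics import mean


def rank_sentences(reviews, min_reviews=2):
    """Rank sentences by mean confidence, best first.

    reviews: iterable of (sentence_id, confidence) pairs, one per repetition.
    Sentences with fewer than min_reviews marks are skipped as too noisy.
    """
    by_sentence = defaultdict(list)
    for sentence_id, confidence in reviews:
        by_sentence[sentence_id].append(confidence)
    scores = {
        sid: mean(marks)
        for sid, marks in by_sentence.items()
        if len(marks) >= min_reviews
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A fair comparison would also need to control for word difficulty and learner level, which is why the slide proposes showing different sentences to learners of a similar level.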

Page 21: Kotonoha: An Example Sentence Based  Spaced Repetition System

Collecting NLP training data

Kotonoha can be a useful source of NLP training data for:
• Reading estimation
• Word sense disambiguation

Learners are interested in getting this information right

Presently, only reporting is implemented

Page 22: Kotonoha: An Example Sentence Based  Spaced Repetition System

Implementation problems
• Three segmentation standards in one package
• Flashcards are mostly JMDict-based
  • And the words there are rather inconsistent
  • On the other hand, it is not a segmentation dictionary
• Example sentence extraction uses the JUMAN/KNP pipeline
• Reading estimation is done using KyTea/UniDic
  • And resources for reading-annotated Japanese are extremely scarce :(
• Because of this
  • Some example sentence coverage problems
  • Reading estimation errors

Page 23: Kotonoha: An Example Sentence Based  Spaced Repetition System

Kotonoha: Present
• Available: https://kotonoha.ws
• Open source (core SRS)
  • https://github.com/kotonoha/server
• (Very low volume) open beta test
  • Will try to increase the user base in the following months
  • Potential users (Japanese learners) are very welcome!
• Side note:
  • https://github.com/kotonoha/akane
  • JUMAN/KNP/KyTea + other Scala library