The Controlled Natural Language of Randall Munroe’s Thing Explainer Tobias Kuhn http://www.tkuhn.org @txkuhn Department of Computer Science, VU University Amsterdam Fifth Workshop on Controlled Natural Language (CNL 2016) Aberdeen, Scotland 26 July 2016


The Controlled Natural Language of Randall Munroe’s Thing Explainer

Tobias Kuhn

http://www.tkuhn.org

@txkuhn

Department of Computer Science, VU University Amsterdam

Fifth Workshop on Controlled Natural Language (CNL 2016), Aberdeen, Scotland

26 July 2016

Does CNL have a Visibility Problem?

Controlled Natural Languages have been successfully applied in a wide range of domains...

... but how many people know what CNLs are, outside of our small community?

But maybe things are about to change...

Tobias Kuhn, VU Amsterdam The Controlled Natural Language of Randall Munroe’s Thing Explainer 2 / 24

Bill Gates on Controlled Natural Language:

“It is a brilliant concept.”

https://www.gatesnotes.com/Books/Thing-Explainer


Well, OK, Bill Gates wasn’t referring to CNL in general but to a very specific one:


The author of Thing Explainer, Randall Munroe, is also the creator of the xkcd webcomics:

https://xkcd.com/1443/


Thing Explainer: Complicated Stuff in Simple Words

Intriguingly simple idea: Only use the 1000 most common words of English

“This is a book of pictures and simple words [...] using only the ten hundred words in our language that people use the most.” [Thing Explainer]


Excerpt from the Book

from: R. Munroe. Thing Explainer: Complicated Stuff in Simple Words. Houghton Mifflin Harcourt, 2015.




Language Properties

PENS class (precision, expressiveness, naturalness, simplicity): P1 E5 N5 S1, the same as unrestricted English

Comparison to similar languages:

• Basic English: 850 manually selected words (only 18 verbs!) with specified categories (nouns, verbs, etc.), and strict usage rules including grammar restrictions

• Special English: 1500 manually selected words (evolve over time) with specified categories

• Thing Explainer language: 1000 words chosen by frequency and without specified categories, no restrictions on word usage or grammar


Similar Language: Special English

from: VOA Special English word book, Voice of America, 2009


Word List

Randall Munroe: “I spent several months going back over a bunch of different lists and generating some of my own based on the Google Books corpus and even my own email inbox. Then I combined the lists and where they disagreed I just let my sense of consistency be the tie-breaker.” [Heaven 2015]

“In this set, I count different word forms—like ‘talk,’ ‘talking,’ and ‘talked’—as one word. I also allowed most ‘thing’ forms of ‘doing’ words, like ‘talker’—especially if, like ‘goer,’ it wasn’t a real word but it sounded funny.” [Thing Explainer]

D. Heaven. It’s not a rocket, it’s an up goer. New Scientist, 228(3049):32-33, 2015.


Implicit Rules

1. The word forms on the list of the 1000 most often used words.

2. All conjugation forms of verbs on the list. This includes third person singular present (-s), past (-ed), and present participle (-ing), including irregular forms.

3. Noun forms built from verbs on the list by -er, for example carrier.

4. The plural forms of nouns on the list (-s), for example things, including irregular forms like teeth. This rule can also be applied to the word other to produce others, even though it is not a noun.

5. Comparative (-er) and superlative forms (-est) built from adjectives on the list, for example smaller or fastest, and including irregular forms like worse.


Implicit Rules (Continued)

6. Adjective forms built from nouns on the list by -y or -ful, for example pointy or colorful.

7. Adverb forms built from adjectives on the list by -ly, for example normally.

8. Noun forms built from adjectives on the list by -ness, for example thickness.

9. Different case and possessive forms of pronouns on the list: they for them, us/ours from we/our, and his from he.


I don’t want to go all Language Nerd, but ...

https://xkcd.com/1443/


Implicit Rules (Continued)

10. Verb forms of nouns on the list and noun forms of verbs on the list when the two forms are similar but not equal, such as thought from think, and live from life, including deduced forms like thoughts and living. (only these two cases)

11. More basic word forms for words on the list, such as nouns from which adjectives on the list were built (person from personal) and verbs from which nouns on the list were built (build from building). (only these two cases)

12. Verb forms built from adjectives on the list, such as lower from low, including conjugated forms like lowering and lowered. (only this one case)

13. Common acronyms for words on the list, such as TV for television. (only this one case)
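The regular suffix rules above (rules 2 to 8) can be sketched as a simple list expansion. This is only an illustration under stated assumptions, not the procedure used for the analysis in this talk: BASE_WORDS is a tiny hypothetical sample rather than the book's word list, the function names are made up, and suffixes are attached without checking word categories. As a result the sketch accepts many forms the rules would not allow, and it misses irregular forms such as teeth or worse.

```python
# Naive expansion of a base word list by the regular suffixes of rules 2-8.
# BASE_WORDS is a hypothetical sample, NOT the actual Thing Explainer list.
BASE_WORDS = {"carry", "thing", "small", "fast", "point",
              "color", "normal", "thick", "talk"}

# Suffixes named in the rules: -s, -ed, -ing, -er, -est, -y, -ful, -ly, -ness.
SUFFIXES = ["s", "ed", "ing", "er", "est", "y", "ful", "ly", "ness"]


def regular_forms(word: str) -> set[str]:
    """Return the word plus its naively suffixed forms.

    No part-of-speech check is made, so this overgenerates
    (for example it also produces "thingly").
    """
    forms = {word}
    for suffix in SUFFIXES:
        forms.add(word + suffix)
        # Spelling rule: consonant + y becomes i before suffixes that do
        # not start with i (carry -> carrier, carried, carries).
        if (len(word) > 1 and word.endswith("y")
                and word[-2] not in "aeiou"
                and not suffix.startswith("i")):
            forms.add(word[:-1] + "i" + ("es" if suffix == "s" else suffix))
    return forms


def allowed_forms(base_words: set[str]) -> set[str]:
    """Union of the regular forms of every word on the base list."""
    allowed: set[str] = set()
    for word in base_words:
        allowed |= regular_forms(word)
    return allowed


ALLOWED = allowed_forms(BASE_WORDS)
print("carrier" in ALLOWED, "teeth" in ALLOWED)
```

Irregular forms (rules 2, 4, 5) and the one-off cases of rules 9 to 13 would need explicit lookup tables on top of this; they are left out here on purpose.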


Word Form Counts

Count  Word form type
  780  listed form (rule 1)
  361  * + s (rules 2/4)
  260  noun-verb + s (rules 2/4)
  168  * + er (rules 3/5)
  167  verb + ing (rule 2)
  119  verb + er (rule 3)
  114  verb + ed (rule 2)
   84  irr. verb form (rule 2)
   68  noun + s (rule 4)
   32  verb + s (rule 2)
   28  adj. + er (rule 5)
   21  adj. + est (rule 5)
   21  verb-adj. + er (rules 3/5)
    7  noun + y (rule 6)
    6  extra word
    6  basic form (rule 11)
    4  pronoun form (rule 9)
    4  adj. + -ly (rule 7)
    3  noun to verb (rule 10)
    3  adj. + ness (rule 8)
    3  irr. noun form (rule 4)
    2  adj. to verb (rule 12)
    1  verb to noun (rule 10)
    1  other + s (rule 4)
    1  adj. + ful (rule 6)
    1  acronym (rule 13)

Word Occurrence Counts

 Count  Word form type
40’494  listed form (rule 1)
 4’688  * + s (rules 2/4)
 3’232  noun-verb + s (rules 2/4)
 1’795  irr. verb form (rule 2)
 1’274  verb + ing (rule 2)
 1’140  * + er (rules 3/5)
 1’050  noun + s (rule 4)
   790  verb + er (rule 3)
   628  verb + ed (rule 2)
   529  pronoun form (rule 9)
   398  verb + s (rule 2)
   225  adj. + er (rule 5)
   205  extra word
   125  verb-adj. + er (rules 3/5)
    99  basic form (rule 11)
    77  adj. + est (rule 5)
    68  noun to verb (rule 10)
    36  noun + y (rule 6)
    31  irr. noun form (rule 4)
    10  adj. + -ly (rule 7)
     8  other + s (rule 4)
     3  adj. + ness (rule 8)
     3  adj. + ful (rule 6)
     2  verb to noun (rule 10)
     2  adj. to verb (rule 12)
     2  acronym (rule 13)
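A quick arithmetic check on the two sets of counts above: the “* + s” and “* + er” entries appear to be aggregates of the more specific “+ s” and “+ er” entries. The slides do not say so explicitly; this reading is an inference from the numbers, and the variable and function names below are made up for the sketch.

```python
# Each entry is (word form count, word occurrence count), copied from the
# two charts. The grouping into "parts" is an assumption being tested here.
S_PARTS = {
    "noun-verb + s": (260, 3232),
    "noun + s": (68, 1050),
    "verb + s": (32, 398),
    "other + s": (1, 8),
}
ER_PARTS = {
    "verb + er": (119, 790),
    "adj. + er": (28, 225),
    "verb-adj. + er": (21, 125),
}
S_TOTAL = (361, 4688)    # "* + s"
ER_TOTAL = (168, 1140)   # "* + er"


def column_sums(parts):
    """Sum the form counts and the occurrence counts separately."""
    forms = sum(f for f, _ in parts.values())
    occurrences = sum(o for _, o in parts.values())
    return (forms, occurrences)


print(column_sums(S_PARTS) == S_TOTAL)    # True
print(column_sums(ER_PARTS) == ER_TOTAL)  # True
```

Both comparisons hold, which is consistent with the aggregate reading. It also means the “* + …” entries should not be added again when totalling the charts.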

Six Extra Words and the Importance of Tool Support

These six words are used in Thing Explainer, but are not on the list (nor can they be produced by any obvious rule from the words on the list): some, mad, hat, apart, rid, and worth

Caused by the use of a faulty spell-checker?

Randall Munroe: “As I wrote, I had tools that would warn me if I used a word that was not on the list, like a spell-checker.” [Heaven 2015]

D. Heaven. It’s not a rocket, it’s an up goer. New Scientist, 228(3049):32-33, 2015.
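A tool of the kind Munroe describes could look roughly like the following. This is a minimal sketch, not his actual tool: the allowed set, the sample sentence, and the function name find_unlisted are all hypothetical, and a real checker would load the full list together with its derived forms.

```python
import re

# Hypothetical allowed set for illustration; not the book's word list.
ALLOWED_WORDS = {"the", "up", "goer", "is", "a", "that", "goes", "to", "space"}


def find_unlisted(text: str, allowed: set[str]) -> list[str]:
    """Return the words in `text` that are not in `allowed`.

    Words are lowercased and reported once, in order of first use.
    """
    unlisted: list[str] = []
    # A word is a run of letters, optionally with one internal apostrophe.
    for token in re.findall(r"[a-z]+(?:['’][a-z]+)?", text.lower()):
        if token not in allowed and token not in unlisted:
            unlisted.append(token)
    return unlisted


warnings = find_unlisted("The up goer is a rocket that goes to space.",
                         ALLOWED_WORDS)
print(warnings)  # ['rocket']
```

A checker like this is only as reliable as the set it is given: any word that ends up in the allowed set by mistake is never reported, which is one way words outside the list could pass unnoticed.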


Inconsistency with Creation/Use of Word List

Some words (comparative and superlative forms of adjectives; adjectives built by -ly; and different case and possessive forms of pronouns) count as different when the list is defined, but count as the same word when the list is used.


Word Distribution

[Figure: two rank-frequency plots, count (y-axis) against rank (x-axis): “Frequency distribution of non-lemmatized terms” and “Frequency distribution of lemmatized terms”]
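The data behind rank-frequency plots of this kind can be computed as sketched below. The sample text and the lemma map are hypothetical stand-ins; the slides do not say which tokenizer or lemmatizer was used, so this only shows the general technique.

```python
from collections import Counter
import re


def rank_frequency(tokens):
    """Return (rank, count) pairs, with rank 1 for the most frequent term."""
    counts = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(counts, start=1))


# Hypothetical sample text and lemma map, for illustration only.
TEXT = "the thing goes up and the things go up and up"
LEMMAS = {"things": "thing", "goes": "go"}

tokens = re.findall(r"[a-z]+", TEXT.lower())
lemmatized = [LEMMAS.get(token, token) for token in tokens]

print(rank_frequency(tokens))      # non-lemmatized terms
print(rank_frequency(lemmatized))  # lemmatized terms
```

Lemmatizing merges forms such as things and thing into one term, so the lemmatized distribution has fewer ranks and higher counts per term than the non-lemmatized one.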


Conclusion

So, what makes the Thing Explainer language special?

• Very popular (I don’t know of any other book written in a CNL that is as popular as Thing Explainer...)

• Intriguingly simple restriction of English (even though not as simple as it looks at first)

• Demonstration that virtually everything can be expressed in it in an easy-to-grasp manner (I don’t know of a similarly convincing demonstration for Basic English or Special English...)

• Some things could be improved to make the language even more powerful (and maybe a bit less funny)


Thank you for your attention!

Questions?
