insight demo 2 16-15
TRANSCRIPT
more matildas
Katherine Yoshida
Central Characters in Children’s Books
Gender in 20th century children’s books (McCabe et al., 2011)
User Query
Term Ranking
Gender Ranking
Overall Ranking
Term Ranking
Tokenize & remove irrelevant information
Lowercase, lemmatize, stopword removal
Terms database
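The term-ranking preprocessing listed above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the deck's actual code: the stopword list is a tiny hand-picked sample, and `crude_lemma` is a hypothetical stand-in for a real lemmatizer (in the stack described later, NLTK would be the natural source for both).

```python
import re

# Tiny illustrative stopword list (an assumption, not the list used in the talk).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "with", "who"}

def crude_lemma(token):
    """Rough stand-in for lemmatization: only strips common plural endings."""
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token

def preprocess(text):
    """Lowercase, tokenize on letters only, lemmatize, then drop stopwords."""
    # Keeping only runs of letters discards digits and punctuation, which is
    # one possible reading of "remove irrelevant information".
    tokens = re.findall(r"[a-z]+", text.lower())
    lemmas = [crude_lemma(t) for t in tokens]
    return [t for t in lemmas if t not in STOPWORDS]

print(preprocess("The girls and the Dragons of the city"))
```

The same function would presumably be applied both to book descriptions (to build the terms database) and to the user query, so that the two are compared in the same normalized form.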
Gender Ranking
Genderize proper names
Count pronouns, e.g., her, his, she, he
Social Security DB of names
Naive Bayes classifier
[F (PNs + pronouns) - M (PNs + pronouns)] / length of description
(F = female, M = male, PNs = proper names)
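As a worked illustration of the score above, the sketch below computes it from already-counted mentions. The function name `gender_score` is hypothetical, and measuring description length in tokens is an assumption; the slide does not say which unit was used.

```python
def gender_score(female_mentions, male_mentions, description_length):
    """(female mentions - male mentions) / description length.

    Mentions are proper names plus pronouns. Positive values lean female,
    negative values lean male. The length unit (tokens here) is an assumption.
    """
    if description_length <= 0:
        raise ValueError("description_length must be positive")
    return (female_mentions - male_mentions) / description_length

# A 100-token description with 7 female and 3 male mentions.
print(gender_score(7, 3, 100))
```

Dividing by length keeps long descriptions from dominating simply because they contain more names and pronouns overall.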
Katherine Yoshida
▷ PhD in cognitive science (UBC)
▷ UX research & consulting
thank you!
Stack
MySQL
Python
• NLTK
AWS
Gendered pronouns
female +1: she, her, hers
male +1: he, his, him
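The pronoun tally above could look something like this. The two pronoun sets come straight from the slide; the function name and the letters-only tokenization are assumptions made for the sketch.

```python
import re

FEMALE_PRONOUNS = {"she", "her", "hers"}
MALE_PRONOUNS = {"he", "his", "him"}

def count_gendered_pronouns(text):
    """Return (female_count, male_count), adding 1 per matching token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    female = sum(1 for t in tokens if t in FEMALE_PRONOUNS)
    male = sum(1 for t in tokens if t in MALE_PRONOUNS)
    return female, male

print(count_gendered_pronouns("She gave him her book, and he thanked her."))
```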
Naive Bayes classifier
Proper names
NLTK entities bigram parser (‘people’)
Social Security database of names
Naive Bayes classifier for other names
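The lookup-then-fallback order described above might be wired together as below. Everything here is a sketch under assumptions: `KNOWN_NAMES` is a hypothetical four-entry stand-in for the Social Security names table, and `fallback_guess` is a deliberately trivial placeholder where the trained classifier would go.

```python
# Hypothetical stand-in for the Social Security names table (name -> gender).
KNOWN_NAMES = {
    "matilda": "female",
    "katherine": "female",
    "harry": "male",
    "max": "male",
}

def fallback_guess(name):
    """Placeholder for the classifier; NOT a Naive Bayes model."""
    return "female" if name[-1] in "aeiouy" else "male"

def genderize(name):
    """Return (gender, source): table lookup first, classifier otherwise."""
    key = name.strip().lower()
    if not key:
        raise ValueError("empty name")
    if key in KNOWN_NAMES:
        return KNOWN_NAMES[key], "lookup"
    return fallback_guess(key), "classifier"

print(genderize("Matilda"))
print(genderize("Eragon"))
```

Returning the source alongside the label makes it easy to check how often the fallback is actually used.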
Social security database: Vast majority of proper names
Three features:
● Last letter of name
● Last two letters of name
● Is last letter a vowel (aeiouy)
Trained on 80% of names from Social Security names database
Validated on 20% holdout sample
Accuracy = 80%
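To make the three features and the 80/20 split concrete, here is a small pure-Python sketch of a categorical Naive Bayes classifier with add-one smoothing. It is not the talk's implementation (which, given the stack, likely used NLTK's classifier), and `TOY_NAMES` is a hand-written list for illustration only, so the accuracy it prints says nothing about the 80% figure above.

```python
import math
import random
from collections import Counter, defaultdict

def name_features(name):
    """The three features from the slide."""
    n = name.strip().lower()
    return {"last_letter": n[-1], "last_two": n[-2:], "ends_in_vowel": n[-1] in "aeiouy"}

class NaiveBayes:
    """Categorical Naive Bayes with additive smoothing (illustrative only)."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing
        self.label_counts = Counter()
        self.value_counts = defaultdict(Counter)  # (label, feature) -> value counts
        self.vocab = defaultdict(set)             # feature -> values seen in training

    def train(self, examples):
        for feats, label in examples:
            self.label_counts[label] += 1
            for f, v in feats.items():
                self.value_counts[(label, f)][v] += 1
                self.vocab[f].add(v)

    def classify(self, feats):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            for f, v in feats.items():
                seen = self.value_counts[(label, f)][v]
                # +1 reserves probability mass for values never seen in training.
                denom = count + self.smoothing * (len(self.vocab[f]) + 1)
                score += math.log((seen + self.smoothing) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def train_and_evaluate(labeled_names, seed=0):
    """Shuffle, train on 80%, and report accuracy on the 20% holdout."""
    items = list(labeled_names)
    random.Random(seed).shuffle(items)
    cut = len(items) * 4 // 5
    train, holdout = items[:cut], items[cut:]
    if not holdout:
        raise ValueError("need at least one holdout example")
    model = NaiveBayes()
    model.train((name_features(n), g) for n, g in train)
    correct = sum(model.classify(name_features(n)) == g for n, g in holdout)
    return model, correct / len(holdout)

# Hand-written toy data, for illustration only.
TOY_NAMES = [("Anna", "female"), ("Ella", "female"), ("Sophia", "female"),
             ("Emily", "female"), ("Matilda", "female"), ("John", "male"),
             ("Mark", "male"), ("Peter", "male"), ("David", "male"), ("Max", "male")]

model, accuracy = train_and_evaluate(TOY_NAMES)
print(f"holdout accuracy on toy data: {accuracy:.2f}")
```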
Naive Bayes classifier
Average Gender Counts
                Female   Male
Pronouns        4.7      5.1
Proper Names    2.8      4.6
Total           7.5      9.7
Frequency x Female proportion