
WHAT AND HOW CHILDREN SEARCH ON THE WEB

Sergio Duarte Torres, Ingmar Weber


WHAT IS LOVE?


Motivation


Goals of this work

• Identify and quantify search struggle of young users

• Retrace stages of child development through their web searches


What data was used?
• US Yahoo! search logs from May to August of 2010
• Cleaning steps:
  • User-wise:
    • Logs from users without Yahoo! accounts were removed
  • Query-wise:
    • Queries issued by a single user were removed
    • Queries with personally identifiable information
    • Non-alphanumeric single-token queries

Why the cleaning? What could be advantages/disadvantages?
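The cleaning steps listed on this slide can be sketched as a small filter. This is only an illustration under stated assumptions, not the authors' pipeline: the log is assumed to be a list of (user, query) pairs, `looks_like_pii` is a hypothetical and deliberately crude stand-in (it only spots e-mail addresses), and "non-alphanumeric single-token query" is read here as a one-token query containing no letter or digit.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately crude PII check: e-mail addresses only.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_pii(query):
    """Return True if the query contains something shaped like an e-mail address."""
    return EMAIL_RE.search(query) is not None


def is_non_alnum_single_token(query):
    """Assumed reading: a one-token query with no letter or digit in it."""
    tokens = query.split()
    return len(tokens) == 1 and not any(ch.isalnum() for ch in tokens[0])


def clean_log(records, registered_users):
    """records: list of (user, query) pairs; registered_users: set of user ids."""
    # User-wise step: drop users without an account.
    kept = [(user, query) for user, query in records if user in registered_users]

    # Query-wise step: count how many distinct users issued each query.
    users_per_query = defaultdict(set)
    for user, query in kept:
        users_per_query[query].add(user)

    return [
        (user, query)
        for user, query in kept
        if len(users_per_query[query]) > 1        # issued by more than one user
        and not looks_like_pii(query)             # no (crudely detected) PII
        and not is_non_alnum_single_token(query)  # not a symbol-only token
    ]
```

Counting users per query after the account filter is a design choice made for this sketch; the slide does not state the order of the steps.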


An aside about the data
• Users under 13 years old required the consent of a responsible adult to register at Yahoo! (costs $.50)
• Some people may lie about their age…
  • General trends are expected to be robust to noise
  • People may lie about their age, but they usually make themselves appear older

Where do you think millions of children lie about their age?
http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/3850/3075


Data segmentation
• Users grouped based on their reported birth year
• Age estimated as: 2010 – birth year
• The following age buckets were created:
  • 6-7: early elementary
  • 8-9: readers
  • 10-12: advanced readers
  • 13-15: teenagers
  • 16-18: mature teenagers
  • >18: grown-ups
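The segmentation rule above (age = 2010 minus birth year, then a bucket lookup) can be written in a few lines. This is a minimal sketch; the function name `age_bucket` and the choice to return `None` for estimated ages below 6 are assumptions, since the slide does not say how such users were handled.

```python
# Age buckets from the slide as (lowest age, highest age, label).
BUCKETS = [
    (6, 7, "6-7"),
    (8, 9, "8-9"),
    (10, 12, "10-12"),
    (13, 15, "13-15"),
    (16, 18, "16-18"),
]


def age_bucket(birth_year, reference_year=2010):
    """Estimate age as reference_year - birth_year and map it to a bucket label."""
    age = reference_year - birth_year
    if age < 6:
        return None  # assumption: users younger than 6 fall outside the study
    for low, high, label in BUCKETS:
        if low <= age <= high:
            return label
    return ">18"
```

For example, a reported birth year of 1999 gives an estimated age of 11 and lands in the 10-12 bucket.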


Data characteristics

• Data set size

                     Below 10 years old    Above 10 years old
Volume of queries    >100K                 >1M
Number of users      >10K                  >100K


Methodology: Micro- vs. Macro-Averages

• User A:
  • 100x cooking
  • 10x science
• User B:
  • 1x cooking
  • 5x science
• User C:
  • 2x cooking
  • 10x science

• Micro avg.: cooking = (100+1+2)/(100+10+1+5+2+10) = 0.80
• Macro avg.: cooking = (100/110 + 1/6 + 2/12) / 3 = 0.41

People search mostly for cooking. True? False?
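The two averages on this slide can be reproduced with the slide's own counts. The sketch below is illustrative; the dictionary layout and the function names are assumptions made for the example.

```python
# Per-user query counts taken from the slide.
counts = {
    "A": {"cooking": 100, "science": 10},
    "B": {"cooking": 1, "science": 5},
    "C": {"cooking": 2, "science": 10},
}


def micro_average(counts, topic):
    """Pool all queries first, then take the topic's share of the pool."""
    topic_total = sum(user.get(topic, 0) for user in counts.values())
    grand_total = sum(sum(user.values()) for user in counts.values())
    return topic_total / grand_total


def macro_average(counts, topic):
    """Compute the topic's share per user, then average the shares."""
    shares = [user.get(topic, 0) / sum(user.values()) for user in counts.values()]
    return sum(shares) / len(shares)


print(round(micro_average(counts, "cooking"), 2))  # 0.8
print(round(macro_average(counts, "cooking"), 2))  # 0.41
```

The micro average is dominated by user A's 100 cooking queries, while the macro average gives each user equal weight, which is why the two numbers disagree on whether cooking is the majority interest.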


Methodology: Detecting Navigational Queries
facebook, yahoo mail, google, ...

How would you do it?

• Editorial judgments
  • Ask human judges to mark queries as navigational
  • Drawbacks?
• Click entropy
  • Look at the diversity of the results clicked in response
  • Drawbacks?
• String similarity heuristics
  • Try to find the query as a substring in the clicked domain
  • Drawbacks?
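Two of the options above lend themselves to a short sketch: click entropy over the URLs clicked for one query, and a substring test of the query against the clicked domain. Both functions are illustrative assumptions rather than the method used in the talk; in particular, removing spaces from the query before the substring test is a choice made here.

```python
import math
from collections import Counter
from urllib.parse import urlparse


def click_entropy(clicked_urls):
    """Shannon entropy (in bits) of the clicks observed for one query.

    Low entropy means most users click the same result, which hints
    at a navigational query.
    """
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def query_in_domain(query, clicked_url):
    """Check whether the query, with spaces removed, occurs in the clicked host name."""
    host = urlparse(clicked_url).netloc.lower()
    compact_query = "".join(query.lower().split())
    return compact_query in host


print(click_entropy(["fb", "fb", "fb"]))                            # 0.0
print(query_in_domain("facebook", "https://www.facebook.com/"))     # True
print(query_in_domain("yahoo mail", "https://mail.yahoo.com/"))     # False
```

The last call hints at one drawback of the substring heuristic: "yahoo mail" is clearly navigational, yet the word order in the host name does not match the query.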


Search Difficulty Outline

1. Query length

2. Natural language usage

3. Click position bias

4. Other signs of click position bias

5. Children exposed to adult content

6. Time spent on web results

7. Sessions characteristics


Query length
• Query length increases through the age groups
• Slightly bigger gap for non-navigational queries
• Greater ambiguity in children's queries

[Figure: average query length in tokens (y-axis, 2.5 to 3.2) by age group (6-7, 10-12, 13-15, Adults); series: All]


Natural language usage (I)
• Questions instead of queries
  • what is the only immortal animal?
• Modal queries
  • I don’t want to go to school
• Factual queries
  • describe the parts of a cell
• Superlative queries
  • the fastest dog
• Targeted queries for kids
  • car photos for kids
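Two of the categories above, question queries and queries targeted at kids, can be approximated with simple surface rules. The sketch below is a rough heuristic and not the classifier used in the study: the word lists `QUESTION_WORDS` and `TARGET_SUFFIXES` are assumptions chosen only to cover the slide's examples.

```python
# Assumed word lists, chosen for illustration only.
QUESTION_WORDS = {"what", "who", "when", "where", "why", "how", "which"}
TARGET_SUFFIXES = {"for kids", "for children"}


def query_features(query):
    """Flag a query as a question and/or as targeted at kids using surface cues."""
    text = query.strip().lower()
    tokens = text.split()
    is_question = bool(tokens) and (tokens[0] in QUESTION_WORDS or text.endswith("?"))
    is_targeted = " ".join(tokens[-2:]) in TARGET_SUFFIXES
    return {"question": is_question, "targeted": is_targeted}


print(query_features("what is the only immortal animal?"))
print(query_features("car photos for kids"))
```

Modal, factual and superlative queries would need richer cues (for example part-of-speech tags), so they are left out of this sketch.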


Natural language usage (II)
• Greater NL usage at younger ages
• Teenagers' behavior is closer to children's than to adults' behavior

[Figure: fraction of queries (y-axis, 0.0% to 8.0%) by age group (6-7, 10-12, 13-15, Adults); series: NL, Question, Targeted]


Click position bias

[Figure: ratio relative to adults (y-axis, 0.5 to 2.5) against x-axis values 0 to 4; series: 6-7, 10-12, 13-15, Adults]

Other explanations?


Clicks on ads
• Children aged 6-9 more likely to click on ads!
• Evidence of disorientation during the search process

[Figure: ratio relative to adults (y-axis, 0.7 to 1.4) by age group (6-7, 10-12, 13-15, Adults)]


How to evaluate search success using click data?

• How would you do it?


Time spent on web results
• Click duration as a signal of search success (Hassan et al., 2010, WSDM ’10)
  • Short click (0-10 secs): unsuccessful click
  • Long click (≥ 100 secs): successful click

[Figure: ratio relative to adults (y-axis, 0.0 to 3.0) by age group (6-7, 10-12, 13-15, 19-25, Adults); series: Short, Long]
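The short-click and long-click thresholds on this slide translate directly into a small classifier. The sketch below uses the slide's thresholds (0-10 seconds and at least 100 seconds); the label "neither" for everything in between and the helper `click_shares` are assumptions added for the example.

```python
def classify_click(dwell_seconds):
    """Label a click by dwell time using the thresholds quoted on the slide."""
    if dwell_seconds <= 10:
        return "short"    # treated as an unsuccessful click
    if dwell_seconds >= 100:
        return "long"     # treated as a successful click
    return "neither"      # assumption: clicks in between are left unlabeled


def click_shares(dwell_times):
    """Fraction of short and long clicks in a list of dwell times (seconds)."""
    labels = [classify_click(t) for t in dwell_times]
    total = len(labels)
    return {
        "short": labels.count("short") / total,
        "long": labels.count("long") / total,
    }


print(click_shares([5, 150, 50, 3]))  # {'short': 0.5, 'long': 0.25}
```

Dividing a group's share by the adult share would give a "ratio relative to adults" of the kind plotted on the slide, though the exact normalization used there is not stated.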


Children exposed to adult content
• Likelihood of an accidental click on adult content:
  • The click on adult content is short, and the action is immediately reverted by a click on non-adult content

[Figure: ratio relative to adults (y-axis, 1 to 1.9) by age group (6 to 7, 8 to 9, 10 to 12, Adults)]
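The definition of an accidental click given on this slide (a short click on adult content, immediately followed by a click on non-adult content) can be sketched as a scan over consecutive clicks in a session. The `Click` record and the 10-second threshold for "short" are assumptions for this illustration; the slide does not give the threshold used here.

```python
from typing import NamedTuple


class Click(NamedTuple):
    is_adult: bool        # whether the clicked page is adult content
    dwell_seconds: float  # time spent on the clicked page


def count_accidental_adult_clicks(clicks, short_threshold=10.0):
    """Count short adult-content clicks that are directly followed by a non-adult click."""
    count = 0
    for current, following in zip(clicks, clicks[1:]):
        if (
            current.is_adult
            and current.dwell_seconds <= short_threshold
            and not following.is_adult
        ):
            count += 1
    return count


session = [
    Click(True, 3), Click(False, 40),   # short adult click, then reverted
    Click(True, 120), Click(False, 5),  # long adult click: not counted
    Click(True, 2),                     # no following click: not counted
]
print(count_accidental_adult_clicks(session))  # 1
```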


Sessions characteristics (I)
• Shorter sessions in young users
• Jump to adulthood also occurs in the group of users from 19 to 25

[Figure: average number of actions (y-axis, 3.5 to 5.5) by age group (6-7, 10-12, 13-15, 19-25, adults)]


Sessions characteristics (II)
• Query refinding

[Diagram with nodes c, q, q’, q]

What do refinding queries indicate?


Sessions characteristics (III)
• Click refinding

[Diagram with nodes q, c, c’, c]
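One way to read the two refinding diagrams is that a user returns to an earlier query (q, q’, q) or an earlier clicked result (c, c’, c) after trying something else in between. Under that reading, which is an assumption and not a definition taken from the slides, refinding events in a session can be counted as follows; the same function works for a sequence of queries or a sequence of clicked URLs.

```python
def count_refinding(items):
    """Count items that reappear in a session after a different item intervened.

    items: the queries (or clicked URLs) of one session, in order.
    An immediate repeat (q, q) is not counted; a return (q, q', q) is.
    """
    seen = set()
    previous = None
    count = 0
    for item in items:
        if item in seen and item != previous:
            count += 1
        seen.add(item)
        previous = item
    return count


print(count_refinding(["q", "q2", "q"]))               # 1
print(count_refinding(["c1", "c2", "c1", "c3", "c2"]))  # 2
```

Averaging this count over the sessions of an age group would give a per-session refinding rate for that group.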


Sessions characteristics (IV)

[Figure: average refinding per session (y-axis, 0.1 to 0.35) by age group (6-7, 10-12, 13-15, 19-25, Adults); series: Query ref., Click ref.]

Shorter sessions?


Tracing child development on the web: Outline

1. What do children search for?

2. What entities are children interested in?

3. Does the reading level of the clicks vary across ages and education?


Classifying queries into topics


“sigir 2011”?

[Figure: six entries, each labeled computers_and_internet/programming_and_development]


What do children search for?

[Figure: fraction of queries (y-axis, 0.0 to 0.2) by age group (6 to 7, 10 to 12, 13 to 15, 19 to 25, adults); series: Games, Entertainment/music, Adult content, Computers & Internet, News, Entertainment/tv, Education]

• Children and teenager groups have few dominant topics
• Adults have more diverse query topics
• Also due to smaller vocabulary


Gender differences (I)
• Topic distribution for each age group and gender
• 1-norm to quantify gender differences
• Example for age group 10-12
• ||

Which topic is most responsible for gender differences?
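The 1-norm mentioned on this slide is the sum of absolute differences between two topic distributions. The sketch below computes it and also reports which topic contributes most. The two distributions are made-up numbers for illustration only and are not the values from the study; the function names are likewise assumptions.

```python
def l1_distance(p, q):
    """1-norm of the difference between two topic distributions (dicts topic -> share)."""
    topics = set(p) | set(q)
    return sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in topics)


def top_contributor(p, q):
    """Topic with the largest absolute difference between the two distributions."""
    topics = set(p) | set(q)
    return max(topics, key=lambda t: abs(p.get(t, 0.0) - q.get(t, 0.0)))


# Hypothetical topic shares for two groups (NOT data from the slides).
group_a = {"games": 0.5, "music": 0.2, "education": 0.3}
group_b = {"games": 0.2, "music": 0.4, "education": 0.4}

print(round(l1_distance(group_a, group_b), 2))  # 0.6
print(top_contributor(group_a, group_b))        # games
```

With distributions that each sum to 1, the 1-norm ranges from 0 (identical topic mix) to 2 (no overlap at all).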


Gender differences (II)

[Figure: average gender difference (y-axis, 0.2 to 0.45) by age group (6-7, 10-12, 13-15, Adults); series: Avg gender difference, Avg gender difference (without adult content)]


What entities are children interested in?

• Queries mapped to Wikipedia entities using site search on wikipedia.org/wiki

Query                                            Entity
facebook, facebook login                         en.wikipedia.org/wiki/Facebook
back to school clothes, london schol uniforms    en.wikipedia.org/wiki/School_uniform
Hummus recipe, ideal protein                     en.wikipedia.org/wiki/Hummus

How to map web queries to Wikipedia pages?
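One possible answer, in the spirit of the site-search approach mentioned on this slide, is to run the query through a search restricted to Wikipedia and take the first article among the results. The sketch below covers only the last step and assumes the ranked result URLs are already available from some search backend, which is not shown; taking the first hit is also an assumption.

```python
from urllib.parse import urlparse


def first_wikipedia_entity(result_urls):
    """Return the article name of the first Wikipedia result, or None if there is none.

    result_urls: ranked result URLs for one query, obtained elsewhere
    (the search backend itself is outside this sketch).
    """
    for url in result_urls:
        # Accept URLs written without a scheme, as on the slide.
        parsed = urlparse(url if "://" in url else "https://" + url)
        if parsed.netloc.endswith("wikipedia.org") and parsed.path.startswith("/wiki/"):
            return parsed.path[len("/wiki/"):]
    return None


results = ["https://www.example.com/hummus", "https://en.wikipedia.org/wiki/Hummus"]
print(first_wikipedia_entity(results))  # Hummus
```

Mapping several queries to the same article name is what lets "facebook" and "facebook login" be counted as one entity.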


What entities are children interested in? (10-12)


What entities are adults interested in? (40+)


What entities are children interested in?

• Greater use of child-oriented entities at young ages

[Figure: fraction of child-oriented entities (y-axis, 0.0% to 7.0%) by age group (6-7, 8-9, 10-12, 19-25, Adults)]


Does the reading level of the clicks vary across ages?
• Based on Google reading level classification

• 70% (kids) vs 50% (adults) of clicks classified as basic


Does the reading level of the clicks vary across ages? (II)
• Reading level also varies according to education level

• Education level of adults according to US census

[Figure: fraction of clicks classified as basic (y-axis, 0% to 80%) by age group (8-9, 10-12, 13-15, Adults); series: Basic (low-edu), Basic (high-edu)]

CIKM 2011. Glasgow, 26 of October


Getting demographics from US census
• User profile: Gender: Male; Birth year: 1978; ZIP code: 95054
• Example query: cheap holidays
• US Census data (factfinder.census.gov) for the ZIP code:
  • Expected income: $31k
  • Expected education: 45% BA
  • Race distribution: 38% w, 47% A
• Label (Q,D) with $31k, 45% BA, ...
• Diagram nodes: Q, D
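The labeling idea on this slide, attaching ZIP-level census figures to each (Q, D) pair, can be sketched with a lookup table. Everything structural here is an assumption: the table `CENSUS_BY_ZIP`, the event layout (query, document, ZIP code), and the per-query averaging step. Only the example values for ZIP code 95054 come from the slide.

```python
from collections import defaultdict

# Hypothetical lookup table; the single entry uses the slide's example values.
CENSUS_BY_ZIP = {
    "95054": {"income_usd": 31000, "ba_share": 0.45},
}


def label_events(events, census_by_zip):
    """Attach ZIP-level census figures to (query, document, zip_code) events.

    Events whose ZIP code is missing from the table are skipped.
    """
    labeled = []
    for query, document, zip_code in events:
        stats = census_by_zip.get(zip_code)
        if stats is not None:
            labeled.append({"query": query, "document": document, **stats})
    return labeled


def mean_income_per_query(labeled):
    """Average the expected income over all labeled events of each query."""
    incomes = defaultdict(list)
    for row in labeled:
        incomes[row["query"]].append(row["income_usd"])
    return {query: sum(values) / len(values) for query, values in incomes.items()}


events = [("cheap holidays", "d1", "95054"), ("cheap holidays", "d2", "00000")]
labeled = label_events(events, CENSUS_BY_ZIP)
print(mean_income_per_query(labeled))  # {'cheap holidays': 31000.0}
```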


Conclusions
• Clear behavioral differences between children and adults
• Although not clear-cut between teenagers and children
• Sudden jump to adulthood from 19 to 25 years old
• Stronger click position bias for children, including ads
• Assistance for question queries
• Understanding concerns expressed in their queries


THANK YOU FOR YOUR ATTENTION