Practical Relevance Measurement

Confidential Material – Chegg Inc. © 2005 - 2015. All Rights Reserved.

Upload: enterprisesearchmeetup

Post on 18-Aug-2015


Page 1: Practical Relevance Measurement

Practical Relevance Measurement

Walter Underwood
Principal Software Engineer

Page 2: Practical Relevance Measurement

Data-Driven Improvement

• Measure
• Explain
• Diagnose
• Fix
• Repeat

Page 3: Practical Relevance Measurement

Measure: What is a good metric?

• Actionable
• Common interpretation
• Accessible, credible data
• Transparent, simple calculation

Juice Analytics: http://www.juiceanalytics.com/writing/choosing-right-metric

Page 4: Practical Relevance Measurement

Measure: What is Success?

• Click-through rate (CTR)
  • Great for navigational search (one correct answer)
  • Great for simple search UI
• Success rate
  • With multiple clicks, identify the result that satisfied
  • Dwell time, conversion, …
  • Needed for informational search
  • Useful for complex UI: facets, filters, …

Page 5: Practical Relevance Measurement

Explain: Overall Success

• Use these measures to explain search success to others
• 0.50 CTR or success rate is pretty good
• A one percentage point improvement is very good
• Do not over-promise
• Report CTR daily and graph it
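A daily CTR series for reporting could be computed along the following lines. This is only a sketch: the function name, the input shape (one ISO date string per attempt and per counted click), and the sample dates are illustrative assumptions, not taken from the slides.

```python
from collections import Counter


def daily_ctr(attempt_days, click_days):
    """Return {day: CTR} given one date string per attempt and per counted click."""
    attempts = Counter(attempt_days)
    clicks = Counter(click_days)
    # Only days with at least one attempt appear, so there is no division by zero.
    return {day: clicks[day] / total for day, total in sorted(attempts.items())}


# Illustrative data only: 4 searches and 2 clicks on one day, 2 and 1 on the next.
sample = daily_ctr(
    ["2015-08-17"] * 4 + ["2015-08-18"] * 2,
    ["2015-08-17"] * 2 + ["2015-08-18"],
)
for day, value in sample.items():
    print(day, f"{value:.2f}")
```

The resulting mapping can be handed to whatever graphing tool is already in use.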

Page 6: Practical Relevance Measurement

Diagnose

• What are the problem queries?
• Per-query CTR reports

Query      CTR   Frequency
gorilla    0.45  10,000
orangutan  0.10  100

Page 7: Practical Relevance Measurement

Diagnose: Click Residual

• What should the queries’ CTR be?
• How many clicks would that be?
• How many times were people less happy than expected?

Query      CTR   Frequency  Actual  Expected  Residual
TOTAL      0.50  100,000    50,000
gorilla    0.45  10,000     4,500   5,000     -500
orangutan  0.10  100        10      50        -40
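The numbers in the table are consistent with a simple calculation: expected clicks are the overall CTR times the query frequency, and the residual is actual clicks minus expected clicks. A minimal Python sketch of that arithmetic follows; the function and variable names are illustrative, and the assumption that "expected" uses the overall CTR is inferred from the table rather than stated on the slide.

```python
def click_residual(ctr, frequency, overall_ctr):
    """Return (actual, expected, residual) click counts for one query."""
    actual = round(ctr * frequency)
    expected = round(overall_ctr * frequency)
    return actual, expected, actual - expected


OVERALL_CTR = 0.50  # TOTAL row of the table

# query -> (per-query CTR, frequency), copied from the table
rows = {"gorilla": (0.45, 10_000), "orangutan": (0.10, 100)}

report = {
    query: click_residual(ctr, freq, OVERALL_CTR)
    for query, (ctr, freq) in rows.items()
}

# List the most negative residual first: those queries cost the most clicks.
for query, (actual, expected, residual) in sorted(
    report.items(), key=lambda item: item[1][2]
):
    print(f"{query:10} {actual:6} {expected:6} {residual:6}")
```

Sorting by residual puts "gorilla" ahead of "orangutan" even though its CTR looks healthier, because its much higher frequency means more missed clicks in total.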

Page 8: Practical Relevance Measurement

Fix: What is the cause?

• Customer uses different word than content
  • The Vocabulary Problem (Furnas et al., 1987)
  • Add synonyms: coat, jacket, parka, anorak, …
• Content doesn’t exist
  • We don’t sell that: add a “no hits” page with recommendations
  • Add to site/inventory
• Misspellings (about 10% of queries)
  • Fuzzy search
  • Query suggestions
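Two of these fixes, synonyms and query suggestions, can be illustrated with a small standalone sketch. In practice they would normally be configured in the search engine itself; the code below is only a stand-in using Python's standard library, and the synonym group, vocabulary, and similarity cutoff are illustrative assumptions.

```python
import difflib

# Hypothetical synonym groups; the first group is the example from the slide.
SYNONYM_GROUPS = [{"coat", "jacket", "parka", "anorak"}]


def expand_query(term):
    """Return the term plus any synonyms from its group, sorted for stable output."""
    for group in SYNONYM_GROUPS:
        if term in group:
            return sorted(group)
    return [term]


def suggest(term, vocabulary, limit=3):
    """Suggest known terms that are close to a possibly misspelled query term."""
    # cutoff=0.8 is an arbitrary assumption; tune it against real query logs.
    return difflib.get_close_matches(term, vocabulary, n=limit, cutoff=0.8)


vocabulary = ["gorilla", "orangutan", "chimpanzee"]
print(expand_query("parka"))
print(suggest("orangutang", vocabulary))
```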

Page 9: Practical Relevance Measurement

Implementing Metrics

• Count attempts (queries)
• Count clicks or successes
• Associate clicks with queries
• Handle anomalies
• Hash size

Page 10: Practical Relevance Measurement

Implementing: Count Attempts

• Log the query
• Log a hash or unique ID for this attempt
  • Will be used to link clicks to attempts
  • Random bits generated in server (session log)
  • Random bits generated in browser (URL param)
  • Hash calculated from user ID, query, and time
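As a sketch of the last option, an attempt ID can be derived by hashing the user ID, query, and time, then logged next to the query. Everything specific here is an assumption for illustration: SHA-256 as the hash, truncation to six bytes, the JSON log format, and the field names.

```python
import base64
import hashlib
import json
import time


def hashed_attempt_id(user_id, query, timestamp):
    """Derive a short attempt ID from user ID, query, and time."""
    material = f"{user_id}\x00{query}\x00{timestamp!r}".encode("utf-8")
    digest = hashlib.sha256(material).digest()
    # Six bytes (48 bits) encode to exactly eight URL-safe Base64 characters.
    return base64.urlsafe_b64encode(digest[:6]).decode("ascii")


def attempt_log_line(user_id, query, timestamp=None):
    """Build one JSON log line recording the query and its attempt ID."""
    ts = time.time() if timestamp is None else timestamp
    record = {"ts": ts, "query": query, "tk": hashed_attempt_id(user_id, query, ts)}
    return json.dumps(record)


print(attempt_log_line("user-42", "gorilla"))
```

The same `tk` value is what the result links would carry so that clicks can later be joined to this log line.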

Page 11: Practical Relevance Measurement

Implementing: Count Clicks

• Search results page will have decorated URLs on results
  • Tracking hash
  • Rank (1-based)
  • Query is optional; you can join with the attempts log
• Result links look like:
    /product-1234?tk=ABCD&r=1
    /product-5678?tk=ABCD&r=2
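Decorating result links and reading the parameters back from a click can be sketched as below. The parameter names `tk` and `r` come from the example links; the helper names are hypothetical, and the sketch assumes the result path has no query string of its own.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def decorate(path, token, rank):
    """Append the tracking token and 1-based rank to a result link."""
    return f"{path}?{urlencode({'tk': token, 'r': rank})}"


def parse_click(url):
    """Return (token, rank) from a clicked URL, or None if it is not decorated."""
    params = parse_qs(urlsplit(url).query)
    if "tk" not in params or "r" not in params:
        return None
    try:
        return params["tk"][0], int(params["r"][0])
    except ValueError:
        return None  # rank was not an integer; treat the click as undecorated


results = ["/product-1234", "/product-5678"]
links = [decorate(path, "ABCD", rank) for rank, path in enumerate(results, start=1)]
print(links)
print(parse_click(links[1]))
```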

Page 12: Practical Relevance Measurement

Implementing: Anomalies

• Multiple clicks from one search results page (SRP)
  • Opening results in new tabs
  • Cached SRP
  • Solution: Only count one, maybe the last one
• Clicks that don’t match a query attempt
  • Bookmarked results
  • Clicks from a page loaded a previous day
  • Solution: Ignore them
• Bots hitting SRP
  • Remove from stats
  • JavaScript-generated token helps filter bots
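The first two rules can be sketched as a join between the attempts log and the click log. The data shapes are assumptions for illustration: attempts as a mapping from token to query, clicks as (token, rank) pairs in time order. Bot traffic is assumed to have been removed before this step.

```python
def counted_clicks(attempts, clicks):
    """Keep at most one click per attempt: the last one seen.

    attempts: dict mapping tracking token -> query
    clicks: iterable of (token, rank) pairs in time order
    """
    counted = {}
    for token, rank in clicks:
        if token not in attempts:
            continue  # bookmarked result or stale page: ignore it
        counted[token] = rank  # a later click replaces an earlier one
    return counted


def click_through_rate(attempts, clicks):
    """Fraction of attempts that received at least one counted click."""
    if not attempts:
        return 0.0
    return len(counted_clicks(attempts, clicks)) / len(attempts)


attempts = {"A": "gorilla", "B": "orangutan"}
clicks = [("A", 1), ("A", 3), ("Z", 1)]  # "Z" matches no logged attempt
print(counted_clicks(attempts, clicks))
print(click_through_rate(attempts, clicks))
```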

Page 13: Practical Relevance Measurement

Implementing: Hash Size

• Birthday Paradox: you’ll need more bits than you think
• Chegg uses:
  • 48 random bits
  • Encoded in URL-safe Base64
  • Eight characters
  • Looks like: tk=NmAwoq5e
• 1% chance of collision with 2.4 million searches
• See “Birthday Problem” on Wikipedia
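The token format described on this slide (48 random bits, URL-safe Base64, eight characters) and the collision figure can both be checked with a short sketch. The function names are illustrative, and the collision estimate uses the standard birthday approximation rather than any method stated in the talk.

```python
import base64
import math
import secrets

TOKEN_BITS = 48


def new_token():
    """Generate 48 random bits encoded as eight URL-safe Base64 characters."""
    raw = secrets.token_bytes(TOKEN_BITS // 8)
    return base64.urlsafe_b64encode(raw).decode("ascii")


def collision_probability(n, bits=TOKEN_BITS):
    """Birthday approximation: p is about 1 - exp(-n(n-1) / (2 * 2**bits))."""
    return -math.expm1(-n * (n - 1) / (2 * 2 ** bits))


print(new_token())
print(f"{collision_probability(2_400_000):.4f}")
```

With 2.4 million searches the estimate comes out at roughly 0.01, which agrees with the 1% figure above. Halving the token to 24 bits would make a collision practically certain at the same volume.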

Page 14: Practical Relevance Measurement

Questions?

Page 15: Practical Relevance Measurement

Search Analytics for Your Site, by Louis Rosenfeld

http://rosenfeldmedia.com/books/search-analytics-for-your-site/

Page 16: Practical Relevance Measurement

Thank you

[email protected]