Amy Langville, Associate Professor of Mathematics, the College of Charleston in South Carolina at...
DESCRIPTION
My talk will cover four ranking and clustering projects that I consulted on this past year. The projects range from ranking Olympic athletes, mixed martial arts fighters, and cell phone carriers to clustering sentences to rank individuals by how much humility they evidence in their written language. For each project, I will address the particular data challenges and the solutions and techniques we proposed.
TRANSCRIPT
1
4 Consulting Projects from this past year
September 19, 2014
Machine Learning 2014
Amy Langville
Mathematics Department
College of Charleston
[email protected]
2
Tyler Perini
Mathematics Department
College of Charleston
[email protected]
4
Outline
2 Books generate questions
US Olympic Projects
CageRank
Ranking Cell Phone Carriers
The Humility Project
5
2 Books generate questions
1232-1315
Chapter 7 talks about . . . but I need to . . . Any advice?
I really enjoyed your book, but my problem is . . ., which you don’t mention. How do I solve it?
8
Project 1: from U.S. Olympic Committee
Problem 1: Your book talks a lot about ranking in head-to-head contests (and that was helpful), but we need to rank multi-competitor sports like downhill skiing and gymnastics.
Solution 1: TRUESKILL
μ = average skill
σ = uncertainty
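The μ/σ idea can be illustrated with a minimal sketch. This is not the full TrueSkill algorithm, which passes messages over a factor graph. It only applies the standard two-player, no-draw TrueSkill update to each adjacent pair in a finishing order, once, from the top down. The names `Rating`, `update_pair`, and `rate_race`, the athlete labels, and the defaults (μ = 25, σ = 25/3, β = 25/6) are assumptions for illustration and do not come from the talk.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for its pdf and cdf


@dataclass
class Rating:
    mu: float = 25.0            # average skill (assumed default)
    sigma: float = 25.0 / 3.0   # uncertainty (assumed default)


def update_pair(winner, loser, beta=25.0 / 6.0):
    """Two-player, no-draw TrueSkill update; returns new (winner, loser)."""
    c = sqrt(2 * beta**2 + winner.sigma**2 + loser.sigma**2)
    t = (winner.mu - loser.mu) / c
    v = STD_NORMAL.pdf(t) / STD_NORMAL.cdf(t)   # mean correction
    w = v * (v + t)                             # variance correction, in (0, 1)
    new_winner = Rating(
        winner.mu + winner.sigma**2 / c * v,
        winner.sigma * sqrt(1 - winner.sigma**2 / c**2 * w),
    )
    new_loser = Rating(
        loser.mu - loser.sigma**2 / c * v,
        loser.sigma * sqrt(1 - loser.sigma**2 / c**2 * w),
    )
    return new_winner, new_loser


def rate_race(finish_order, ratings):
    """Simplification: update each adjacent pair of finishers once, top down."""
    ratings = dict(ratings)
    for better, worse in zip(finish_order, finish_order[1:]):
        ratings[better], ratings[worse] = update_pair(ratings[better], ratings[worse])
    return ratings


# Hypothetical three-athlete race, listed in finishing order.
athletes = {name: Rating() for name in ("first", "second", "third")}
for name, r in rate_race(["first", "second", "third"], athletes).items():
    print(f"{name}: mu={r.mu:.2f}, sigma={r.sigma:.2f}")
```

Every update shrinks σ for both athletes involved, which reflects the idea that each observed result reduces uncertainty about skill.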
11
12
Project 1: from U.S. Olympic Committee
1st
3rd
2nd
14
Project 1: from U.S. Olympic Committee
2nd
3rd
1st
15
Project 1: from U.S. Olympic Committee
Problem 2: Your book talks a lot about ranking in head-to-head contests where there are multiple matches between competitors, but our data is sparse. Any advice?
16
17
Project 2: CageRank
Problem: You talk a lot about ranking head-to-head contests, like ours [MMA fights], but our data is really sparse. How do we deal with that?
Solution: FIND SIMILAR FIGHTERS to densify the graph
UFC 163: Phil Davis and Lyoto Machida had never fought each other
College football vs. UFC
UFC 163
Rank  Phil Davis                    Lyoto Machida
1     Rashad Evans                  Ricardo Arona
2     Ryan Bader                    Jason Brilz
3     Alexander Gustafsson          Ryan Bader
4     Antonio Rogerio Nogueira      Stephan Bonnar
5     Quinton “Rampage” Jackson     Randy Couture
6     Chael Sonnen                  Trevor Prangley
7     Matt Hamill                   Tito Ortiz
8     James Te-Huna                 Mark Coleman
9     Dan Henderson                 Ovince St. Preux
10    Vladimir Matyushenko          Chael Sonnen
Find 10 most similar fighters to each
Similar by? Fightmetric stats
SVD SIGNS
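One way to read "find similar fighters to densify the graph" is sketched below. It is a guess at the idea, not the project's method: similarity here is plain cosine similarity between per-fighter stat vectors, swapped in for the SVD sign technique the slide names, and a matchup with no history is backed by real bouts between the two fighters' neighborhoods. The functions `cosine`, `most_similar`, and `proxy_bouts`, the fighter labels, and the stat values are all hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length stat vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(stats, fighter, k=10):
    """Return the k fighters whose stat vectors are closest to `fighter`."""
    others = [f for f in stats if f != fighter]
    others.sort(key=lambda f: cosine(stats[fighter], stats[f]), reverse=True)
    return others[:k]


def proxy_bouts(a, b, bouts, stats, k=10):
    """Real (winner, loser) bouts between a's neighborhood and b's neighborhood."""
    group_a = {a, *most_similar(stats, a, k)}
    group_b = {b, *most_similar(stats, b, k)}
    return [
        (w, l) for w, l in bouts
        if (w in group_a and l in group_b) or (w in group_b and l in group_a)
    ]


# Made-up two-feature stat vectors and bout history, for illustration only.
stats = {"A": [1.0, 0.0], "A2": [0.9, 0.1], "B": [0.0, 1.0], "B2": [0.1, 0.9]}
bouts = [("A2", "B2"), ("A", "A2")]
print(proxy_bouts("A", "B", bouts, stats, k=1))
```

The proxy bouts could then be added, possibly with reduced weight, as extra edges in the win/loss graph before a ranking method is applied.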
Question: is the goal to predict the winner or generate buzz?
24
Project 3: Ranking Cell Phone Carriers
Problem: Rather than individual games between carriers, we have a distribution of game scores for each carrier. How do we use this data to rank carriers?
Solution: SIMULATE HEAD-TO-HEAD GAMES BY RANDOM DRAWS FROM DATA, then rank aggregate by BORDA COUNT (number of carriers each carrier outranks).
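The simulate-then-Borda idea can be sketched as follows, under stated assumptions: each round draws one observed score per carrier at random, a carrier earns one Borda point for every carrier it outscores in that round, and points are summed over rounds. Treating a higher score as better and giving half a point for a tie are assumptions, as are the function name `borda_from_draws`, the carrier labels, and the sample scores.

```python
import random


def borda_from_draws(scores, rounds=1000, seed=0):
    """scores maps carrier -> list of observed scores (higher assumed better)."""
    rng = random.Random(seed)
    carriers = list(scores)
    points = {c: 0.0 for c in carriers}
    for _ in range(rounds):
        # One simulated "game day": draw a single score for every carrier.
        draw = {c: rng.choice(scores[c]) for c in carriers}
        for a in carriers:
            for b in carriers:
                if a == b:
                    continue
                if draw[a] > draw[b]:
                    points[a] += 1.0   # a outranks b in this round
                elif draw[a] == draw[b]:
                    points[a] += 0.5   # assumed tie rule: split the point
    ranking = sorted(carriers, key=points.get, reverse=True)
    return ranking, points


# Hypothetical score samples for three unnamed carriers.
samples = {"A": [9, 10, 11], "B": [5, 6, 7], "C": [1, 2, 3]}
ranking, points = borda_from_draws(samples, rounds=100)
print(ranking, points)
```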
New Problem: data is loaded with ties!
27
28
Project 3: Ranking Cell Phone Carriers
MARKOV CHAIN
Question: what makes a model good?
Stability in the face of small data changes
Explainability to public
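The slide names only "MARKOV CHAIN", so the construction below is an assumption about what is meant: a common Markov ranking scheme in which each loser "votes" for the carriers that beat it, and the stationary distribution of the resulting chain gives the rating. The function `markov_rank`, the damping value, and the win counts are hypothetical.

```python
def markov_rank(wins, damping=0.85, iters=200):
    """wins[(a, b)] = number of (simulated) games in which a beat b."""
    teams = sorted({t for pair in wins for t in pair})
    n = len(teams)
    rating = {t: 1.0 / n for t in teams}
    for _ in range(iters):
        # Damping keeps the chain irreducible, as in PageRank.
        new = {t: (1.0 - damping) / n for t in teams}
        for loser in teams:
            votes = {w: wins.get((w, loser), 0) for w in teams if w != loser}
            total = sum(votes.values())
            if total == 0:
                # Undefeated: spread this mass uniformly over all teams.
                for t in teams:
                    new[t] += damping * rating[loser] / n
            else:
                for winner, count in votes.items():
                    new[winner] += damping * rating[loser] * count / total
        rating = new
    return sorted(teams, key=rating.get, reverse=True), rating


# Made-up pairwise win counts among three unnamed carriers.
wins = {("A", "B"): 8, ("B", "A"): 2, ("A", "C"): 9,
        ("C", "A"): 1, ("B", "C"): 7, ("C", "B"): 3}
order, rating = markov_rank(wins)
print(order, rating)
```

One way to probe the stability question on the slide would be to perturb a few win counts and check whether the order changes.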
29
Project 4: Humility Project
Problem: We’re trying to analyze a person’s writing to predict his/her humility, but we lost our data guy. Can you help us?
Solution: NON-NEGATIVE MATRIX FACTORIZATION (NMF) to find hidden clusters in text.
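A minimal sketch of NMF for clustering text, assuming a sentence-by-term count matrix V that is factored as V ≈ WH using the Lee and Seung multiplicative updates. Each sentence is then assigned to the column of W with the largest weight. The talk does not say which NMF variant or features were used, so the update rule, the helper names (`matmul`, `transpose`, `nmf`, `frobenius_error`), and the toy matrix are all assumptions for illustration.

```python
import random


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor nonnegative v (m x n) into w (m x k) and h (k x n)."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return w, h


def frobenius_error(v, w, h):
    """Squared reconstruction error of v against the product w h."""
    wh = matmul(w, h)
    return sum((a - b) ** 2 for ra, rb in zip(v, wh) for a, b in zip(ra, rb))


# Toy matrix: rows are sentences, columns are term counts (made-up values).
v = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 1, 2], [0, 0, 2, 1]]
w, h = nmf(v, k=2)
clusters = [max(range(2), key=lambda j: row[j]) for row in w]
print(clusters, frobenius_error(v, w, h))
```

Because every factor stays nonnegative, each row of H can be read as a weighted list of terms, which makes the clusters easier to explain than factors with mixed signs.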
37
Conclusions
We need you. You open our eyes to problems we never would have thought about.
Iterative Collaboration
Many GREAT ALGORITHMS exist. Some just need tweaking.
Future Work. . . (you tell me)