![Page 1: David Karger Sewoong Oh Devavrat Shah MIT + UIUC](https://reader030.vdocuments.us/reader030/viewer/2022020111/56649c755503460f94928404/html5/thumbnails/1.jpg)
David Karger Sewoong Oh Devavrat Shah
MIT + UIUC
Efficient crowd-sourcing
![Page 2: David Karger Sewoong Oh Devavrat Shah MIT + UIUC](https://reader030.vdocuments.us/reader030/viewer/2022020111/56649c755503460f94928404/html5/thumbnails/2.jpg)
A classical example
- A patient is asked: rate your pain on a scale of 1-10
  - Medical student gets answer: 5
  - Intern gets answer: 8
  - Fellow gets answer: 4.5
  - Doctor gets answer: 6
- So what is the “right” amount of pain?
- Crowd-sourcing
  - Pain of patient = task
  - Answer of patient = completion of task by a worker
![Page 3: David Karger Sewoong Oh Devavrat Shah MIT + UIUC](https://reader030.vdocuments.us/reader030/viewer/2022020111/56649c755503460f94928404/html5/thumbnails/3.jpg)
Contemporary example
![Page 4: David Karger Sewoong Oh Devavrat Shah MIT + UIUC](https://reader030.vdocuments.us/reader030/viewer/2022020111/56649c755503460f94928404/html5/thumbnails/4.jpg)
Contemporary example
![Page 5: David Karger Sewoong Oh Devavrat Shah MIT + UIUC](https://reader030.vdocuments.us/reader030/viewer/2022020111/56649c755503460f94928404/html5/thumbnails/5.jpg)
Contemporary example
![Page 6: David Karger Sewoong Oh Devavrat Shah MIT + UIUC](https://reader030.vdocuments.us/reader030/viewer/2022020111/56649c755503460f94928404/html5/thumbnails/6.jpg)
Contemporary example
- Goal: reliably estimate the tasks at minimal cost
- Key operational questions:
  - Task assignment
  - Inferring the “answers”
![Page 7: David Karger Sewoong Oh Devavrat Shah MIT + UIUC](https://reader030.vdocuments.us/reader030/viewer/2022020111/56649c755503460f94928404/html5/thumbnails/7.jpg)
Model a la Dawid and Skene ‘79
- $N$ tasks
  - Denote by $t_1, t_2, \ldots, t_N$ the “true” values, each in $\{1, \ldots, K\}$
- $M$ workers
  - Denote by $w_1, w_2, \ldots, w_M$ the workers, each with a “confusion” matrix
  - Worker $j$: confusion matrix $P^j = [P^j_{kl}]$
  - Worker $j$'s answer is $l$ for a task with value $k$ with prob. $P^j_{kl}$
- Binary symmetric case
  - $K = 2$: each task takes value $+1$ or $-1$
  - Worker $j$ gives the correct answer w.p. $p_j$
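The binary symmetric case can be simulated in a few lines. This is a minimal sketch, not the authors' code: the function name `simulate_answers`, the dictionary layout keyed by `(task, worker)` pairs, and the example reliabilities are all assumptions made for illustration.

```python
import random

def simulate_answers(true_labels, reliabilities, edges, seed=0):
    """Draw one answer per (task i, worker j) edge under the binary symmetric model.

    Worker j reports the true label with probability reliabilities[j]
    and the flipped label otherwise.
    """
    rng = random.Random(seed)
    answers = {}
    for i, j in edges:
        correct = rng.random() < reliabilities[j]
        answers[(i, j)] = true_labels[i] if correct else -true_labels[i]
    return answers

# Hypothetical example: worker 0 is always right, worker 1 is always wrong.
true_labels = [+1, -1, +1]
reliabilities = [1.0, 0.0]
edges = [(i, j) for i in range(3) for j in range(2)]
answers = simulate_answers(true_labels, reliabilities, edges)
```

With these extreme reliabilities the output is deterministic: worker 0 reproduces every true label and worker 1 flips every one.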
![Page 8: David Karger Sewoong Oh Devavrat Shah MIT + UIUC](https://reader030.vdocuments.us/reader030/viewer/2022020111/56649c755503460f94928404/html5/thumbnails/8.jpg)
Model a la Dawid and Skene ‘79
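[Figure: bipartite graph linking tasks $t_1, \ldots, t_N$ to workers $w_1, \ldots, w_M$; the edge between task $i$ and worker $j$ carries the answer $A_{ij}$]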
- Binary tasks: $t_i \in \{+1, -1\}$
- Worker reliability: $p_j = \Pr(A_{ij} = t_i)$
- Necessary assumption: we know
![Page 9: David Karger Sewoong Oh Devavrat Shah MIT + UIUC](https://reader030.vdocuments.us/reader030/viewer/2022020111/56649c755503460f94928404/html5/thumbnails/9.jpg)
Question
- Goal: given $N$ tasks
  - To obtain the answers correctly w.p. at least $1-\varepsilon$
  - What is the minimal number of questions (edges) needed?
  - How to assign them, and how to infer the task values?
[Figure: bipartite assignment graph between tasks $t_1, \ldots, t_N$ and workers $w_1, \ldots, w_M$, with answers $A_{ij}$ on the edges]
![Page 10: David Karger Sewoong Oh Devavrat Shah MIT + UIUC](https://reader030.vdocuments.us/reader030/viewer/2022020111/56649c755503460f94928404/html5/thumbnails/10.jpg)
Task assignment
- Task assignment graph
  - Random regular graph
  - Or, regular graph with large girth
[Figure: bipartite assignment graph between tasks $t_1, \ldots, t_N$ and workers $w_1, \ldots, w_M$, with answers $A_{ij}$ on the edges]
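One common way to draw a random regular bipartite assignment is the configuration model, sketched below. This is an illustration under assumptions: the slides do not say how the graph is generated, and the function name, the worker degree `r`, and the seed are hypothetical. The sketch can produce repeated (task, worker) pairs, which a practical version would need to resample or merge.

```python
import random
from collections import Counter

def random_regular_bipartite(n_tasks, n_workers, l, r, seed=0):
    """Configuration-model sketch: every task gets l edges, every worker r edges.

    Half-edges ("stubs") on the worker side are shuffled and paired with the
    task-side stubs in order. Repeated pairs are possible and not removed here.
    """
    if n_tasks * l != n_workers * r:
        raise ValueError("need n_tasks * l == n_workers * r")
    rng = random.Random(seed)
    task_stubs = [i for i in range(n_tasks) for _ in range(l)]
    worker_stubs = [j for j in range(n_workers) for _ in range(r)]
    rng.shuffle(worker_stubs)
    return list(zip(task_stubs, worker_stubs))

# Hypothetical sizes: 6 tasks answered 2 times each, 4 workers doing 3 tasks each.
edges = random_regular_bipartite(6, 4, l=2, r=3, seed=1)
task_degrees = Counter(i for i, _ in edges)
worker_degrees = Counter(j for _, j in edges)
```

Whatever the shuffle produces, the degree sequence is fixed by construction: every task appears `l` times and every worker `r` times.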
![Page 11: David Karger Sewoong Oh Devavrat Shah MIT + UIUC](https://reader030.vdocuments.us/reader030/viewer/2022020111/56649c755503460f94928404/html5/thumbnails/11.jpg)
Inferring answers
- Majority:
- Oracle:
[Figure: bipartite assignment graph between tasks $t_1, \ldots, t_N$ and workers $w_1, \ldots, w_M$, with answers $A_{ij}$ on the edges]
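The two baselines can be contrasted with a small sketch. The formulas are not in the slide text, so this rests on assumptions: "majority" is taken as the sign of the unweighted sum of answers, and "oracle" as an estimator that knows every $p_j$ and weights each answer by its log-odds $\log(p_j / (1 - p_j))$, which is the maximum-likelihood rule in the binary symmetric model. Function names and the example values are hypothetical; ties are broken toward $+1$.

```python
import math

def majority_vote(answers, n_tasks):
    """Sign of the unweighted sum of answers per task (ties go to +1)."""
    totals = [0.0] * n_tasks
    for (i, _j), a in answers.items():
        totals[i] += a
    return [1 if s >= 0 else -1 for s in totals]

def oracle_vote(answers, n_tasks, reliabilities):
    """Log-odds weighted vote; assumes every reliability lies strictly in (0, 1)."""
    totals = [0.0] * n_tasks
    for (i, j), a in answers.items():
        p = reliabilities[j]
        totals[i] += math.log(p / (1.0 - p)) * a
    return [1 if s >= 0 else -1 for s in totals]

# One task with true label +1: a reliable worker says +1,
# two worse-than-random workers (p = 0.4) say -1.
answers = {(0, 0): +1, (0, 1): -1, (0, 2): -1}
reliabilities = [0.9, 0.4, 0.4]
majority = majority_vote(answers, 1)
oracle = oracle_vote(answers, 1, reliabilities)
```

Here the majority is outvoted by the two unreliable workers, while the oracle gives them negative weight and recovers $+1$.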
![Page 12: David Karger Sewoong Oh Devavrat Shah MIT + UIUC](https://reader030.vdocuments.us/reader030/viewer/2022020111/56649c755503460f94928404/html5/thumbnails/12.jpg)
Inferring answers
- Majority:
- Oracle:
- Our Approach:
[Figure: bipartite assignment graph between tasks $t_1, \ldots, t_N$ and workers $w_1, \ldots, w_M$, with answers $A_{ij}$ on the edges]
![Page 13: David Karger Sewoong Oh Devavrat Shah MIT + UIUC](https://reader030.vdocuments.us/reader030/viewer/2022020111/56649c755503460f94928404/html5/thumbnails/13.jpg)
Inferring answers
- Iteratively learn
- Message-passing
- O(# edges) operations
- Approximation of Maximum Likelihood
[Figure: bipartite assignment graph between tasks $t_1, \ldots, t_N$ and workers $w_1, \ldots, w_M$, with answers $A_{ij}$ on the edges]
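A message-passing scheme of this kind can be sketched as follows. The update rules are not spelled out in the slide text, so the ones below are an assumption: task-to-worker messages $x_{i \to j} = \sum_{j' \ne j} A_{ij'} y_{j' \to i}$, worker-to-task messages $y_{j \to i} = \sum_{i' \ne i} A_{i'j} x_{i' \to j}$, and a final estimate $\hat{t}_i = \mathrm{sign}(\sum_j A_{ij} y_{j \to i})$. The function name and the `init_std` parameter are hypothetical. Each sweep touches every edge a constant number of times, in line with the O(# edges) cost.

```python
import random
from collections import defaultdict

def iterative_inference(answers, n_iters=10, init_std=1.0, seed=0):
    """Message-passing sketch on a dict {(task i, worker j): answer in {+1, -1}}."""
    rng = random.Random(seed)
    # Worker-to-task messages, initialised around 1 (assumed initialisation).
    y = {edge: rng.gauss(1.0, init_std) for edge in answers}
    for _ in range(n_iters):
        # Task update: total over all workers on the task, minus the receiving edge.
        task_total = defaultdict(float)
        for (i, j), a in answers.items():
            task_total[i] += a * y[(i, j)]
        x = {(i, j): task_total[i] - a * y[(i, j)] for (i, j), a in answers.items()}
        # Worker update: total over all tasks of the worker, minus the receiving edge.
        worker_total = defaultdict(float)
        for (i, j), a in answers.items():
            worker_total[j] += a * x[(i, j)]
        y = {(i, j): worker_total[j] - a * x[(i, j)] for (i, j), a in answers.items()}
    scores = defaultdict(float)
    for (i, j), a in answers.items():
        scores[i] += a * y[(i, j)]
    return {i: (1 if s >= 0 else -1) for i, s in scores.items()}

# Hypothetical example: workers 0 and 1 always right, worker 2 always wrong.
truth = [+1, -1, +1, -1]
worker_sign = [+1, +1, -1]
answers = {(i, j): truth[i] * worker_sign[j] for i in range(4) for j in range(3)}
estimates = iterative_inference(answers, init_std=0.0)
```

Setting `init_std=0.0` makes the run deterministic; in this example the messages for the always-wrong worker turn negative, so its answers end up counted with flipped sign.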
![Page 14: David Karger Sewoong Oh Devavrat Shah MIT + UIUC](https://reader030.vdocuments.us/reader030/viewer/2022020111/56649c755503460f94928404/html5/thumbnails/14.jpg)
Inferring answers
[Figure: bipartite assignment graph between tasks $t_1, \ldots, t_N$ and workers $w_1, \ldots, w_M$, with answers $A_{ij}$ on the edges]
- Theorem (Karger-Oh-Shah).
  - Let $n$ tasks be assigned to $n$ workers as per an $(l, l)$ random regular graph
  - Let $ql > \sqrt{2}$, where $q$ is the crowd quality
  - Then, for all $n$ large enough (i.e. $n = \Omega(l^{O(\log(1/q))} e^{lq})$), after $O(\log(1/q))$ iterations of the algorithm
![Page 15: David Karger Sewoong Oh Devavrat Shah MIT + UIUC](https://reader030.vdocuments.us/reader030/viewer/2022020111/56649c755503460f94928404/html5/thumbnails/15.jpg)
How good?
- To achieve target $P_{\text{error}} \le \varepsilon$, we need
  - Per-task budget $l = \Theta\left(\frac{1}{q} \log\frac{1}{\varepsilon}\right)$
- And this is minimax optimal
- Under majority voting (with any graph choice)
  - Per-task budget required is $l = \Omega\left(\frac{1}{q^2} \log\frac{1}{\varepsilon}\right)$
- No significant gain from knowing side-information (golden questions, reputation, …)
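To get a feel for the gap between the two scalings, the sketch below evaluates both expressions with every hidden constant set to 1. That choice is an assumption made purely for illustration: $\Theta$ and $\Omega$ hide constants, so the absolute numbers mean little, and only the ratio of roughly $1/q$ is informative. The function names are hypothetical.

```python
import math

def budget_iterative(q, eps, c=1.0):
    """(c / q) * log(1 / eps): scaling of the per-task budget for the iterative approach."""
    return c * (1.0 / q) * math.log(1.0 / eps)

def budget_majority(q, eps, c=1.0):
    """(c / q^2) * log(1 / eps): scaling of the lower bound for majority voting."""
    return c * (1.0 / q ** 2) * math.log(1.0 / eps)

q, eps = 0.1, 0.01
ratio = budget_majority(q, eps) / budget_iterative(q, eps)
```

With equal constants the ratio is exactly $1/q$, so a crowd with quality $q = 0.1$ would need on the order of ten times more answers per task under majority voting.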
![Page 16: David Karger Sewoong Oh Devavrat Shah MIT + UIUC](https://reader030.vdocuments.us/reader030/viewer/2022020111/56649c755503460f94928404/html5/thumbnails/16.jpg)
Adaptive solution
Theorem (Karger-Oh-Shah). Given any adaptive algorithm,
let $\Delta$ be the average number of workers required per task
to achieve the desired $P_{\text{error}} \le \varepsilon$.
Then there exists {pj} with quality q so that
gain through adaptivity is limited
![Page 17: David Karger Sewoong Oh Devavrat Shah MIT + UIUC](https://reader030.vdocuments.us/reader030/viewer/2022020111/56649c755503460f94928404/html5/thumbnails/17.jpg)
Model from Dawid-Skene ’79
Theorem (Karger-Oh-Shah).
To achieve reliability $1-\varepsilon$, per-task redundancy scales as
$\frac{K}{q}\left(\log\frac{1}{\varepsilon} + \log K\right)$
Through reducing the $K$-ary problem to $K$ binary problems
(and dealing with a few asymmetries)
![Page 18: David Karger Sewoong Oh Devavrat Shah MIT + UIUC](https://reader030.vdocuments.us/reader030/viewer/2022020111/56649c755503460f94928404/html5/thumbnails/18.jpg)
Experiments: Amazon MTurk
- Learning similarities
  - Recommendations
  - Searching, …
![Page 19: David Karger Sewoong Oh Devavrat Shah MIT + UIUC](https://reader030.vdocuments.us/reader030/viewer/2022020111/56649c755503460f94928404/html5/thumbnails/19.jpg)
Experiments: Amazon MTurk
- Learning similarities
  - Recommendations
  - Searching, …
![Page 20: David Karger Sewoong Oh Devavrat Shah MIT + UIUC](https://reader030.vdocuments.us/reader030/viewer/2022020111/56649c755503460f94928404/html5/thumbnails/20.jpg)
Experiments: Amazon MTurk
![Page 21: David Karger Sewoong Oh Devavrat Shah MIT + UIUC](https://reader030.vdocuments.us/reader030/viewer/2022020111/56649c755503460f94928404/html5/thumbnails/21.jpg)
Task Assignment: Why Random Graph
![Page 22: David Karger Sewoong Oh Devavrat Shah MIT + UIUC](https://reader030.vdocuments.us/reader030/viewer/2022020111/56649c755503460f94928404/html5/thumbnails/22.jpg)
Remarks
- Crowd-sourcing
  - Regular graph + message passing
  - Useful for designing surveys/taking polls
- Algorithmically
  - Iterative algorithm is like power-iteration
- Beyond stand-alone tasks
  - Learning global structure, e.g. ranking
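The power-iteration analogy can be made concrete with a small sketch. This is plain power iteration for the leading singular vector pair of the answer matrix, not the authors' message-passing algorithm. It assumes the answers are stored as a dense $N \times M$ list of lists with entries $\pm 1$ (and $0$ for unassigned pairs), and that starting from all-ones worker weights fixes the overall sign. The function name is hypothetical.

```python
import math

def power_iteration_labels(A, n_iters=50):
    """Alternate u = A v and v = A^T u, then label each task by the sign of u."""
    n, m = len(A), len(A[0])
    v = [1.0] * m  # worker weights; all-ones start, so the first u is the majority score
    for _ in range(n_iters):
        u = [sum(A[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [sum(A[i][j] * u[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        v = [x / norm for x in v]
    u = [sum(A[i][j] * v[j] for j in range(m)) for i in range(n)]
    return [1 if s >= 0 else -1 for s in u], v

# Hypothetical example: workers 0 and 1 always right, worker 2 always wrong.
truth = [+1, -1, +1, -1]
worker_sign = [+1, +1, -1]
A = [[t * s for s in worker_sign] for t in truth]
labels, weights = power_iteration_labels(A)
```

In this rank-one example the task vector recovers the true labels, and the worker vector assigns a negative weight to the always-wrong worker.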