ALFRED: Crowd Assisted Data Extraction
Valter Crescenzi, Paolo Merialdo, Disheng Qiu
Dipartimento di Ingegneria, Università degli Studi Roma Tre, Via della Vasca Navale, 79, Rome
Extracting data
2M pages from IMDB, and we want to extract titles, directors, etc.
[Diagram labels: Inference algorithm, Wrapper, DB]
Scaling Wrapper Inference
Scaling the number of workers with crowdsourcing platforms opens new challenges.
Issues and contributions:
Non-expert workers
• Simple interactions to reduce the worker error rate
• Membership queries (yes/no answers)
Costs
• Active learning to carefully select queries
Quality
• Bayesian model to evaluate the expected wrapper quality
• Sampling algorithms
• Tolerant to inaccurate workers
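The interplay between yes/no membership queries and a Bayesian quality model can be illustrated with a minimal sketch. This is not ALFRED's actual model: it assumes a known, fixed worker error rate, a prior over candidate rules, and hypothetical names (`update_posterior`, `rule_a`, `rule_b`, `rule_c`) chosen only for illustration.

```python
def update_posterior(prior, predictions, answer, error_rate):
    """Bayes update of rule probabilities after one yes/no worker answer.

    prior:       dict mapping rule name -> prior probability
    predictions: dict mapping rule name -> bool, True if the rule agrees
                 that the queried value is correct
    answer:      bool, the worker's reply to the membership query
    error_rate:  assumed probability that the worker answers wrongly
    """
    unnormalized = {}
    for rule, p in prior.items():
        # A rule consistent with the answer is weighted by the chance the
        # worker was right; an inconsistent rule by the chance of an error.
        consistent = predictions[rule] == answer
        likelihood = (1.0 - error_rate) if consistent else error_rate
        unnormalized[rule] = p * likelihood

    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("answer is impossible under the given prior and error rate")
    return {rule: w / total for rule, w in unnormalized.items()}


# Hypothetical example: three candidate rules, uniform prior.
prior = {"rule_a": 1 / 3, "rule_b": 1 / 3, "rule_c": 1 / 3}
predictions = {"rule_a": True, "rule_b": False, "rule_c": True}
posterior = update_posterior(prior, predictions, answer=True, error_rate=0.1)
```

Because the error rate is non-zero, a rule contradicted by an answer is down-weighted rather than discarded, which is one simple way to stay tolerant to inaccurate workers.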
Architecture
ALFRED is a wrapper inference system supervised by workers from a crowdsourcing platform.
*Research Track: A Framework for Learning Web Wrappers from the Crowd, WWW 2013
Input and Rules Generation
Sample Set and Extracted Values
rule  page0      page1        page2
r1    Inception  City of God  Oblivion
r2    Inception  City of God  null
r3    Inception  null         Oblivion
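The extracted values above show where the candidate rules agree and where they differ, which is what makes some membership queries more useful than others. The sketch below is an assumption-laden illustration, not the selection strategy used by ALFRED: it assumes a uniform probability over r1, r2, r3 and picks the (page, value) question whose "yes" probability is closest to 0.5. The names `yes_mass` and `pick_query` are hypothetical.

```python
# Values extracted by each rule on page0, page1, page2 (None stands for null).
EXTRACTED = {
    "r1": ["Inception", "City of God", "Oblivion"],
    "r2": ["Inception", "City of God", None],
    "r3": ["Inception", None, "Oblivion"],
}
# Assumed uniform probability that each rule is the correct one.
PROBS = {"r1": 1 / 3, "r2": 1 / 3, "r3": 1 / 3}


def yes_mass(query, extracted, probs):
    """Total probability of the rules that extract `value` on `page`."""
    page, value = query
    return sum(p for rule, p in probs.items() if extracted[rule][page] == value)


def pick_query(extracted, probs):
    """Return the (page, value) query whose answer is most uncertain."""
    n_pages = len(next(iter(extracted.values())))
    candidates = []
    for page in range(n_pages):
        for rule in extracted:
            value = extracted[rule][page]
            if value is not None and (page, value) not in candidates:
                candidates.append((page, value))
    # A "yes" probability near 0.5 splits the rules most evenly; ties are
    # resolved in favor of the earliest candidate.
    return min(candidates, key=lambda q: abs(yes_mass(q, extracted, probs) - 0.5))


best = pick_query(EXTRACTED, PROBS)
print(best)  # (1, 'City of God')
```

Asking about "Inception" on page0 would be wasted effort under these assumptions, since all three rules extract it; the queries on page1 and page2 are the ones that can separate r2 from r3.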
Probability and Noise