Walmart Big Data Expo
TRANSCRIPT
About Me
• Former member of the Search team at @WalmartLabs
• Former head of the Metrics & Measurements team; I also led the Human Evaluation team
• About the Metrics & Measurements team:
  • A team of engineers, analysts, and scientists in charge of providing accurate and exhaustive measurements
  • We also had an auditing role towards adjacent teams
• What do we measure?
  • Engineering metrics related to model and data quality
  • Business metrics (revenue, etc.)
  • More exotic customer-centric metrics (customer value, customer satisfaction, model impact, etc.)
• Currently Head of Data Science at Atlassian, in charge of the Search & Smarts team
Outline
• Humans & Big Data
  • The role of human beings in the era of Big Data
  • Why do we need to tag data?
  • How do we get tagged data?
• The Era of Crowdsourcing
  • What is crowdsourcing?
  • Use cases and details about crowdsourcing
  • Traditional crowds vs. curated crowds
• The Human-in-the-Loop Paradigm
  • Definition and details about Human-in-the-Loop ML
  • Introduction to Active Learning
The Era of Very Big Data
• VOLUME
  • More data was created from 2013 to 2015 than in the entire previous history of the human race
  • By 2020, accumulated data will reach 44 trillion gigabytes
• VELOCITY
  • By 2020, ~1.7 MB of new data / second / human being
  • 1.2 trillion search queries on Google per year
• VARIETY
  • 31 million messages and 2.8 million videos per minute on Facebook
  • Up to 300 hours of video uploaded to YouTube per minute
  • In 2015, 1 trillion photos were taken; billions were shared online
[Image: a Google data center]
Supervised vs. Unsupervised Machine Learning
Supervised ML requires tagged data.
• Classification: problems where the output variable is a category (examples: SVM, random forest, Bayesian classifiers)
• Regression: problems where the output variable is a real value (examples: linear regression, random forest)
• Applications: image recognition, speech recognition
Unsupervised ML doesn't require tagged data.
• Clustering: discovery of inherent groupings in the data (examples: k-means, k-nearest neighbors)
• Association rules: discovery of rules describing the data (example: the Apriori algorithm)
• Applications: feature learning, autoencoders
A minimal code sketch contrasting the two follows below.
The Case of Deep Learning
• Deep Learning has both supervised and unsupervised applications
• NB: Deep Learning algorithms are data-greedy…
Tagged Data
• Gathering quality tagged training data is a common bottleneck in ML:
  • Expensive
  • Quality control is hard and requires a second human pass
  • Hardly scalable → heavy use of sampling strategies
• How do companies doing Machine Learning get tagged data?
  • Implicit tagging: customer engagement (see the sketch after this list)
  • Explicit tagging: manual labor
• A few strategies to get tagged data for cheap/free:
  • Games (Google Quick Draw)
  • Incentivization (extra lives or bonuses in games)
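As an illustration of implicit tagging, here is a hedged sketch that derives graded relevance labels from engagement logs; the log schema and label scheme are hypothetical, not an actual production pipeline.

```python
# Implicit tagging sketch: customer engagement (clicks, purchases) becomes
# training labels without any manual labeling pass.
import pandas as pd

# Hypothetical engagement log.
logs = pd.DataFrame({
    "query":     ["red t-shirt", "red t-shirt", "laptop bag"],
    "item_id":   [101, 102, 300],
    "clicked":   [1, 0, 1],
    "purchased": [1, 0, 0],
})

# Graded label: purchase (2) > click (1) > no engagement (0).
logs["label"] = logs["purchased"] + logs["clicked"]
print(logs[["query", "item_id", "label"]])
```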
https://quickdraw.withgoogle.com/
The Wisdom from the Crowd
Why human input matters: the use case of image colorization
[Figure: an image recognition model is trained on a tagged training data set of fruit (watermelon, grapes, bananas, pineapple, orange), acquiring facts such as "Bananas are generally …" that a colorization model can then exploit]
• This 'general' knowledge is obvious for human beings but fastidious for machines
→ Colorization is straightforward for humans because they can 'tap' into their general knowledge
What is Crowdsourcing?
Crowdsourcing: the process of getting labor or funding, usually online, from a crowd of people
➢ Crowdsourcing = 'crowd' + 'outsourcing'
➢ The act of taking a function once performed by employees and outsourcing it to an undefined (generally large) network of people in the form of an open call
History of Crowdsourcing
• The term was first used in 2005 by the editors at Wired
• The official definition was published in the Wired article "The Rise of Crowdsourcing", June 2006
• It describes how businesses were using the Internet to "outsource work to the crowd"
What Crowdsourcing helps with:
• Scale → peer-production (for jobs performed collaboratively)
• Reach → connecting with a large network of potential laborers (for tasks undertaken by sole individuals)
The Nature of Crowdsourcing
Microtasks
• Data generation: user-generated content such as reviews, pictures, translations, etc.
• Data validation: validation of translations, etc.
• Data tagging: image tagging, product categorization, etc.
• Data curation: curation of news feeds, etc.
Macrotasks
• Solution development: algorithm improvement, etc.
• Crowd contests: design competitions, algorithmic competitions, etc.
Funding
Some Cool Crowdsourcing Applications
Mapping
• Photo Sphere
• Google Maps crowdsources info for wheelchair-accessible places
Traffic
• Google Traffic
• Waze: a traffic reporting app
Translation
• Google Translate
Epidemiology
• Flu tracking applications
Companies Based on Crowdsourcing
• Quora is a question-and-answer site where questions are asked, answered, edited, and organized by its community of users.
• Waze is a community-based traffic and navigation app where drivers share real-time traffic and road info.
• Kaggle is a platform for predictive modelling competitions in which companies post data and data miners compete to produce the best models.
• Stack Overflow is a platform for users to ask and answer questions, and to vote questions and answers up or down and edit them.
• Flickr is an image and video hosting website that is widely used by bloggers to host images that they embed in social media.
The Challenges of Crowdsourcing
• Reliability
  • Retail: absence of emotional involvement (judges are not actually spending money on items)
  • Waze: locals were sending fake information to limit traffic in their area
• Relevance of knowledge
  • Retail: judges might not have appropriate knowledge of the items they are evaluating
• Subjectivity
  • Search: relevance scores vary depending on profile and personal preferences
• Speed & cost
  • Human evaluations take time and can only be performed sporadically and on samples
  • Not practical for measurement purposes
Crowdsourcing vs. Curated Crowds
Traditional Crowdsourcing Model
+ Speed: many hands make light work
+ Lower cost: typically a few pennies per task
- No quality control
- Lack of control: little to no incentive to deliver on time
- High maintenance: clear instructions and automated understanding checks are needed
- Lower reliability: high overlap is required (see the aggregation sketch after this list)
- Lack of confidentiality: anyone can see your tasks
Curated Crowd
+ Quality control: judges are subject to quality metrics and removed if they don't deliver the required quality
+ Better quality: very little overlap is needed
+ Expertise: judges become experts at the required task
+ Constraints on the crowd: judges are less likely to drop out
- More expensive: typically the primary source of income for judges
- Consistency required: frequent tasks are needed to keep skills sharp
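The overlap point is worth making concrete: with an unreliable crowd, each task is sent to several judges and the redundant judgments are aggregated. Here is a minimal sketch (the task data is hypothetical, not tied to any particular crowdsourcing platform):

```python
# Majority-vote aggregation over redundant crowd judgments (3x overlap).
from collections import Counter

judgments = {
    "task-1": ["relevant", "relevant", "not relevant"],
    "task-2": ["not relevant", "not relevant", "not relevant"],
}

for task, labels in judgments.items():
    label, votes = Counter(labels).most_common(1)[0]
    print(f"{task}: {label} (agreement {votes / len(labels):.0%})")
```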
Crowdsourcing Applications in e-Commerce: the example of Product Tagging
• Catalog curation
  • Product description curation
  • Product tagging & categorization
  • Product deduplication
  • Taxonomy testing
• Search relevance evaluation
  • Relevance scores (query-item pair scores)
  • Engine comparison (ranking-to-ranking)
• Review moderation
  • Removal/flagging of obscene reviews
• Mystery shopping
  • Analysis and discovery of new trends
  • Evaluation of new products
  • Competitive analysis
Use Case: Evaluation of Search Engine Relevance
→ Human evaluation makes it possible to measure the intangible with little risk
Side-by-Side Engine Comparison
[Figure: the results of Ranking A and Ranking B shown side by side]
• Judge 1: prefers Ranking A
• Judge 2: prefers Ranking A
• Judge 3: prefers Ranking B
Query-Item Relevance Scoring for Measurement of Ranking Quality
Discounted cumulative gain:

$$DCG_p = \sum_{i=1}^{p} \frac{rel_i}{\log_2(i+1)}$$

$$nDCG_p = \frac{DCG_p}{IDCG_p}, \qquad IDCG_p = \sum_{i=1}^{|REL_p|} \frac{2^{rel_i} - 1}{\log_2(i+1)}$$

where $rel_i$ is the graded relevance of the item at position $i$ and $REL_p$ is the list of relevant items, ordered by relevance, up to position $p$.
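A minimal sketch of this computation (note that the slide's two sums use different gain functions; the sketch applies one gain function consistently, with the exponential variant available as a parameter):

```python
# nDCG from graded relevance judgments (self-contained sketch).
import math

def dcg(relevances, gain=lambda rel: rel):
    # DCG_p = sum_{i=1..p} gain(rel_i) / log2(i + 1)
    return sum(gain(rel) / math.log2(i + 1)
               for i, rel in enumerate(relevances, start=1))

def ndcg(relevances, gain=lambda rel: rel):
    # Normalize by the DCG of the ideal (descending) ordering.
    ideal = sorted(relevances, reverse=True)
    return dcg(relevances, gain) / dcg(ideal, gain)

# Graded relevance (0-5) of ranked results, e.g. from human judges:
scores = [5, 5, 4, 3, 2, 5]
print(round(ndcg(scores), 3))                           # linear gain
print(round(ndcg(scores, gain=lambda r: 2**r - 1), 3))  # exponential gain
```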
Human-in-the-Loop: When Human Beings Still Outperform the Machine
Fact: the brain has an estimated 38 petaflops (thousand trillion operations per second) of processing power…
The Dream of Automation
Automation: the use of various control systems for operating equipment, such as machinery and processes, with minimal or reduced human intervention
The 4 Industrial Revolutions
• FIRST REVOLUTION – 1784: mechanical production, railroads, steam power
• SECOND REVOLUTION – 1870: mass production, electrical power, assembly lines
• THIRD REVOLUTION – 1969: automated production, electronics, computers
• FOURTH REVOLUTION – ongoing: artificial intelligence, big data
→ Automation is not a new idea
Why automate?
• Automate boring/repetitive tasks
• Perform tasks at scale
• Perform tasks with enhanced precision
• Deliver consistent products
• Use machines where they outperform humans
When Full Automation Can't Be Achieved… Human-in-the-Loop
Human-in-the-loop (HITL) is defined as a model or a system that requires human interaction.
The idea of using human beings to enhance the machine is not new; we have been doing Human-in-the-Loop all along (example: autopilot technology for planes).
Human intervention/presence is useful:
• To handle corner cases (outlier management)
• To "keep an eye" on the system (sanity checks)
• To correct unwanted behavior (refinement)
• To validate appropriate behavior (validation)
Human-in-the-Loop Paradigm
The Pareto Principle (aka the 80/20 rule, the law of the vital few, or the principle of factor sparsity) states that, for many events, roughly 80% of the effects come from 20% of the causes.
ML version of the Pareto Principle:
• Evidence suggests that some of the most accurate ML systems to date are:
  • 80% computer/AI-driven
  • 19% human input
  • 1% unknown randomness, to balance things out
• The combination of machine and human intervention achieves maximum machine accuracy
How can human knowledge be incorporated into ML models?
A. By helping label the original dataset that will be fed into an ML model
B. By helping correct inaccurate predictions that arise as the system goes live
Human-in-the-Loop Use Case #1: Face Recognition
[Figure: uploaded photos auto-tagged with names — Mary, Roberto, Victoria, Laura, Sebastian, Cecelia]
Accuracy
• Facebook's DeepFace software reaches 97.25% accuracy
HITL as a feedback loop
• When the confidence is below a certain threshold, the system:
  • suggests a label
  • asks the uploader to validate/approve or correct the suggestion
• The new data is used to improve the accuracy of the algorithm
Human-in-the-Loop Use Case #2: Autonomous Vehicles
Teaching the machine
• Driving systems were trained with a human overseeing the process
Accuracy considerations
• The Autopilot system is now over 99% accurate
• However, 99% accuracy means that people can die 1% of the time (!!)
• Though we have seen huge advances in the accuracy of pure machine-driven systems, they tend to fall short of acceptable accuracy rates
Corner cases
• Fun fact: Volvo's self-driving cars fail in Australia because of kangaroos ("Volvo's driverless cars 'confused' by kangaroos")
• Reaching 100% accuracy is hard because of corner cases
• A HITL approach helps get the accuracy to ~100%
The Success of Human-in-the-Loop: the Example of Chess
The Human vs. the Machine
• In 1997, chess master Garry Kasparov was beaten by the IBM supercomputer Deep Blue
Freestyle or "Advanced" Chess
• Advanced: a human chess master works with a computer to find the best possible move
• Freestyle: a team can be made of any combination of human beings and computers
• In 2005, Steven Cramton, Zackary Stephen, and their 3 computers won a freestyle chess tournament
Why it works
• Computers are great at reading tough tactical situations
• But humans are better at understanding long-term strategy
• Humans use computers to limit "blunders" while using their intuition to force the opponent into board states that confuse the computer(s)
Active Learning
Active Learning: a special case of semi-supervised ML in which a learning algorithm can interactively query the user (oracle) to obtain the desired outputs at new data points, maximizing validity and relevance
General Strategy (a code sketch follows below)
If D is the entire data set, at each iteration i, D is broken up into three subsets:
1. D_K,i: data points where the label is known
2. D_U,i: data points where the label is unknown
3. D_Q,i: data points for which the label is queried (sometimes even when the label is known)
Benefits
• Query labels only when necessary (lower cost)
Next-Generation Algorithms
• Proactive learning:
  • relaxes the assumption that the oracle is always right
  • casts the problem as an optimization problem with a budget constraint
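Here is a minimal sketch of that strategy, assuming scikit-learn and using uncertainty sampling (defined on the next slide) to pick D_Q; the dataset's true labels stand in for the human oracle.

```python
# Active learning loop: grow D_K by querying the oracle one point at a time.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y_true = make_classification(n_samples=500, random_state=0)

# D_K starts with 5 seed labels per class; everything else is D_U.
seed = np.concatenate([np.flatnonzero(y_true == 0)[:5],
                       np.flatnonzero(y_true == 1)[:5]])
known = np.zeros(len(X), dtype=bool)
known[seed] = True

model = LogisticRegression()
for iteration in range(20):
    model.fit(X[known], y_true[known])          # train on D_K
    unknown_idx = np.flatnonzero(~known)        # D_U
    proba = model.predict_proba(X[unknown_idx])
    # D_Q: the single unknown point the model is least confident about.
    query = unknown_idx[(1 - proba.max(axis=1)).argmax()]
    known[query] = True                         # the oracle supplies the label

print("labels used:", known.sum(), "| accuracy:", round(model.score(X, y_true), 3))
```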
Active Learning: How Does it Work?
Machine Learning needs:
• Logic (algorithm)
• Data
• Optimization
• Feedback ← Human-in-the-Loop
Active Learning = a Machine Learning algorithm using an "oracle" to reduce mistakes/uncertainty
Query Strategies – labels are queried for:
• Data points for which model uncertainty is high (uncertainty sampling)
• Data points on which the different models of an ensemble method disagree the most (query by committee; see the sketch after this list)
• Data points causing the most change to the model (expected model change)
• Data points causing overall variance to be high (variance reduction)
[Diagram: the Active Learning algorithm selects a single example from the unlabeled data; the human oracle provides the correct label; the labeled example is added to the labeled data; the classifier is updated]
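As a companion to the loop above, a hedged sketch of query by committee: a small committee of models votes on each unlabeled point, and the point with the highest disagreement (measured here by vote entropy) is queried. The bootstrap-committee setup is illustrative, not prescribed by the talk.

```python
# Query by committee: query where the ensemble disagrees the most.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=1)
labeled = np.arange(40)          # indices with known labels
pool = np.arange(40, 200)        # unlabeled candidates

# Committee of 5 trees, each trained on a bootstrap of the labeled set.
rng = np.random.default_rng(1)
committee = []
for _ in range(5):
    boot = rng.choice(labeled, size=len(labeled), replace=True)
    committee.append(DecisionTreeClassifier(random_state=0).fit(X[boot], y[boot]))

votes = np.stack([m.predict(X[pool]) for m in committee])  # shape (5, 160)
p1 = np.clip(votes.mean(axis=0), 1e-9, 1 - 1e-9)  # fraction voting class 1
vote_entropy = -(p1 * np.log2(p1) + (1 - p1) * np.log2(1 - p1))
print("query index:", pool[vote_entropy.argmax()])
```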
Active Learning: How Does it Work?
[Flowchart: input → Machine Learning classifier → "Confidence level high?" → YES: output; NO: annotation by a human oracle, which feeds back into the classifier (Human-in-the-Loop / Active Learning)]
By adding a human feedback loop, we allow the system to:
• actively learn
• correct itself where it got things wrong
• improve the algorithm over iterations
A minimal sketch of the confidence gate follows below.
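A minimal sketch of that confidence gate, assuming scikit-learn; the `ask_human` helper is a hypothetical stand-in for a real annotation interface.

```python
# Confidence gate: trust the model above a threshold, else route to a human.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, random_state=3)
model = LogisticRegression().fit(X[:80], y[:80])   # the deployed classifier

THRESHOLD = 0.9
review_queue = []                                  # items for the human oracle

def ask_human(x):
    """Hypothetical hook into an annotation UI; returns the human's label."""
    return None  # placeholder

for x, proba in zip(X[80:], model.predict_proba(X[80:])):
    if proba.max() >= THRESHOLD:
        label = proba.argmax()        # YES branch: emit the model's output
    else:
        review_queue.append(x)        # NO branch: human annotates; the new
        label = ask_human(x)          # label feeds the next retraining pass

print(f"{len(review_queue)} of 20 predictions routed to the human oracle")
```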
Active Learning at Walmart e-Commerce
3 use cases of Active Learning in the context of Search/Retail:
• Machine Learning Lifecycle Management (programming by feedback)
  • Automatic monitoring of input and output values for an ML algorithm
  • An algorithm detects failures and outliers in real time and suggests an action
  • A human validates the action, creating tagged data for full automation
• Diagnosis of Catalog Data Issues (reinforcement learning)
  • An algorithm uncovers demoted items and suggests the most likely reason for the demotion
  • An engineer manually confirms/corrects the suggestion, generating training data for full automation
• Refinement of the Query Tagging Algorithm (optimization)
  • A human evaluation team manually measures the accuracy of the query tagging model
  • Mistagged queries are used to discover patterns specific to problematic queries, which are reported to engineers
  • The sample is enriched with problematic queries (so the evaluation team can diagnose problems with the algorithms)
  • Example: "red t-shirt Size M" → color: red, product type: t-shirt, size: M (a toy sketch follows below)
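To illustrate what query tagging produces, here is a toy dictionary-based tagger; it is purely illustrative (the lexicon is invented, and a production model would be a learned sequence tagger, not a lookup).

```python
# Toy query tagger: map query tokens to attribute slots via a lexicon.
LEXICON = {
    "color": {"red", "blue", "black"},
    "size": {"s", "m", "l", "xl"},
    "product type": {"t-shirt", "jeans", "sneakers"},
}

def tag_query(query):
    tags = {}
    tokens = query.lower().replace("size ", "").split()
    for token in tokens:
        for slot, values in LEXICON.items():
            if token in values:
                tags[slot] = token
    return tags

print(tag_query("red t-shirt Size M"))
# -> {'color': 'red', 'product type': 't-shirt', 'size': 'm'}
```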
Conclusion and Takeaways
• Why do humans and machines complement each other?
  • Human beings are memory-constrained
  • Computers are knowledge-constrained
• Tagged data is more important than ever
  • But getting quality data is challenging given the volume of data
  • Crowdsourcing offers more flexibility to tag data at scale
• The Human-in-the-Loop paradigm
  • Improves the accuracy of machine learning algorithms (classifiers)
  • Many examples of successful endeavors using "Augmented Intelligence"
  • Active Learning is a booming area of ML/AI