© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Barbara Pogorzelska, TPM Machine Learning
June 30, 2016
Machine Learning at Amazon
Agenda
• Introduction to Amazon Machine Learning
• Machine Learning at Amazon
• Customer Case Studies
Introduction to Amazon Machine Learning
Machine Learning at Amazon
Machine Learning Opportunities @ Amazon
Retail
• Demand Forecasting
• Vendor Lead Time Prediction
• Pricing
• Packaging
• Substitute Prediction
Customers
• Product Recommendation
• Product Search
• Visual Search
• Product Ads
• Shopping Advice
• Customer Problem Detection
Seller
• Fraud Detection
• Predictive Help
• Seller Search & Crawling
Catalog
• Browse-Node Classification
• Meta-data validation
• Review Analysis
Digital
• Named-Entity Extraction
• XRay
• Plagiarism Detection
• Echo Speech Recognition
Locations
ML Seattle
ML Bangalore
S9
A9
A2Z
Ivona
ML Berlin
Evi
Machine Learning in Berlin
ML @ Amazon
• Forecasting (Retail)
• Content Linkage (Digital)
• Scalable Algorithms & Services (AWS)
• Visual Services (Retail & Digital)
Forecasting
Setting
• Given past sales of a product in every region, predict regional demand up to one year into the future
Challenges
• New Products: No past demand!
• Regionalized: 150+ fulfillment centers worldwide
• Sparsity: Huge skew – many products sell very few items
• Seasonal: Huge variation due to external, seasonal events
• Distributions: Future is uncertain → predictions must be distributions
• Scale: 20M+ products fulfilled by Amazon alone!
• Orders: Customers demand bundles of products
• Censored: Past sales ≠ past demand (inventory constraint)
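The "Distributions" challenge above (predictions must be distributions rather than single numbers) can be illustrated with a minimal sketch. It assumes a fixed season length and uses only empirical quantiles of past sales at the same point in the season. This is a simple seasonal baseline for illustration, not the forecasting method used at Amazon. The function names and the toy sales history are hypothetical. The pinball loss shown is the standard score for a single predicted quantile.

```python
import math


def empirical_quantile(samples, q):
    """Nearest-rank q-quantile (0 < q <= 1) of a non-empty list of numbers."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))  # 1-based nearest rank
    return ordered[rank - 1]


def seasonal_quantile_forecast(sales, season_length, levels):
    """For each position in the season, return {level: quantile} built from
    all past observations at that same position (a seasonal baseline)."""
    forecast = []
    for position in range(season_length):
        same_position = sales[position::season_length]
        forecast.append({q: empirical_quantile(same_position, q) for q in levels})
    return forecast


def pinball_loss(actual, predicted, q):
    """Quantile (pinball) loss: under-prediction costs q per unit,
    over-prediction costs (1 - q) per unit."""
    diff = actual - predicted
    return q * diff if diff >= 0 else (q - 1) * diff


# Hypothetical toy history: three seasons of four periods each.
history = [0, 1, 0, 5, 0, 2, 1, 7, 1, 1, 0, 6]
forecast = seasonal_quantile_forecast(history, season_length=4, levels=(0.5, 0.9))
for position, quantile_map in enumerate(forecast):
    print(position, quantile_map)
```

Note that this sketch treats past sales as past demand. The "Censored" challenge says exactly that this is not true when inventory ran out, so a realistic model would have to correct for stock-outs before computing quantiles.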
Forecasting Seasonality
Content Linkage
Setting
• Enrich Every Piece of Digital Content Continuously by Linking it to Relevant Content on Amazon and the Web
Challenges
• Scale: Millions of books, with thousands added each day!
• Languages: Over 20 different languages (Machine Translation!)
• Media: Link books, movies, products and maps together
• Web: Web grows by 1B+ pages per day
• Representation: Language and media-independent (Wiki?)
XRay
ASIN Machine Translation
[Figure: chart with labels ASINs, Contribution Profit, Human Translation, Machine Translation, Selection Gap]
Scalable Algorithms & Services
Setting
• No limitations on model size and data size!
Challenges
• Distributed: Parameters need to be distributed
• Fault Tolerance: Data and model chunks might fail
• Simplicity: Zero-parameter algorithms for engineers
• Any-Time: Any-time convergence of algorithms
• Resource Constraints: Learning algorithms that optimize under resource & budget constraints
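The "Distributed" challenge above (parameters need to be distributed) can be sketched in a single process. The sketch assumes hash partitioning of parameter keys and plain gradient-descent updates. The class `ShardedParameters` and its `pull` and `push` methods are hypothetical names for illustration, not an AWS or Amazon API. A real system would place each shard on a separate machine and would also have to handle the "Fault Tolerance" challenge, which this sketch does not.

```python
import zlib


class ShardedParameters:
    """Toy in-process stand-in for a model whose parameters are split
    across several shards by a stable hash of the parameter key."""

    def __init__(self, num_shards):
        if num_shards < 1:
            raise ValueError("num_shards must be at least 1")
        self.shards = [dict() for _ in range(num_shards)]

    def shard_of(self, key):
        # crc32 is stable across processes, unlike Python's built-in hash().
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def pull(self, keys):
        """Fetch current values; parameters never written default to 0.0."""
        return {k: self.shards[self.shard_of(k)].get(k, 0.0) for k in keys}

    def push(self, gradients, learning_rate):
        """Apply one gradient-descent step, touching only the owning shards."""
        for key, grad in gradients.items():
            shard = self.shards[self.shard_of(key)]
            shard[key] = shard.get(key, 0.0) - learning_rate * grad


params = ShardedParameters(num_shards=4)
params.push({"w:a": 1.0, "w:b": -2.0}, learning_rate=0.5)
print(params.pull(["w:a", "w:b", "w:c"]))
```

A stable hash is used so that every worker maps a given key to the same shard; Python's built-in string hash is randomized per process and would not give that guarantee.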
Three types of data-driven development
• Retrospective analysis and reporting
• Here-and-now real-time processing and dashboards
• Predictions to enable smart applications
Amazon Kinesis, Amazon EC2, AWS Lambda
Amazon Redshift, Amazon RDS, Amazon S3, Amazon EMR
Amazon Machine Learning
Automated Produce Inspection: The Goal
New Automated Inspection
Current Inspection
Computer Vision
Customer Case Studies
AdiMap Case Study
Company
• Data science company that combines the disciplines of computer science, statistics, and business
AdiMap Jobs
• Employee & Employer: what are the salaries for jobs at US companies?
AdiMap Apps
• App Business: what are the financials of apps and developers?
AdiMap Spend
• Advertiser & Publisher: what is the ad spend and revenue worldwide?
AdiMap Elections
• Voter & Candidate: what is the ad spend of US presidential candidates?
BuildFax Case Study
Company
• Aggregates dispersed building permit data from across the United States
• Provides the processed data to other businesses, such as insurance companies, building inspectors, and economic analysts
• Old predictive models were based on ZIP codes and other general data, using the Python and R languages
• New models, based on data sets from public sources and from customers, estimate job costs with 80% accuracy
Fraud.net Case Study
Company
• Aggregates and analyzes large amounts of fraud data from thousands of online merchants in real time
• Protects more than 2 percent of all U.S. e-commerce
• Fraud.net saves its customers about $1 million a week by helping them detect and prevent fraud
• “On any given day, we might see 100 different fraud schemes, each one with 100 different variations.”
• “As new fraud schemes pop up, we have to identify and create models around those specialized situations.”
47Lining
Company
• 47Lining is an AWS Advanced Consulting Partner with Big Data Competency designation
• Develops big data solutions built using AWS building blocks: Redshift, Kinesis, S3, DynamoDB, Machine Learning and Elastic MapReduce
Churn Prediction with 71% Accuracy
Consumer Credit Behavior
Propensity to Purchase Real Estate