Using Hadoop for Big Data
TRANSCRIPT
Hadoop for (Young) Data Scientist
Komes Chandavimol and Team
Data Science Lab, Thailand
Agenda
• Big Data, Analytics and Data Science
• Hadoop + Spark Workshops
• Sharing Experience: Hadoop (Real) Use Cases
• Hadoop + Spark Trends
Big Data
http://www.adweek.com/prnewser/how-many-times-do-the-worlds-social-media-users-click-every-minute/117427
https://www.domo.com/learn/data-never-sleeps-3-0
The Growth of Data
http://www.adweek.com/prnewser/how-many-times-do-the-worlds-social-media-users-click-every-minute/117427
https://www.domo.com/learn/data-never-sleeps-3-0
What is Big Data?
http://blogs.forrester.com/category/hadoop
http://solutions.forrester.com/Global/FileLib/webinars/Big_Data_-_Gold_Rush_or_Illusion.pdf
The Big Data Tools
http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/
Traditional Data Management Architecture
http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/
New Data Management Architecture
How Does the Data Lake Work?
http://www.clearpeaks.com/blog/category/tableau
Traditional Enterprise Data Warehouse
What Do You Consume from the Data Lake?
https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now
Volume? Variety? Velocity?
https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now
Big Data + Analytics = Value
https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now
Big Data Analytics
http://hortonworks.com/blog/big-data-refinery-fuels-next-generation-data-architecture/
Big Data Analytics
http://dataofthings.blogspot.com/2014/04/the-bbbt-sessions-hortonworks-big-data.html
How to do Big Data Analytics?
https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now
Data Science Experience Sharing, Big Data Challenge #2, Bangkok, Thailand
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
What is Data Science?
The Rise of Data Scientist
http://flowingdata.com/2009/06/04/rise-of-the-data-scientist/
2009
https://hbr.org/
http://www.anlytcs.com/2014/01/data-science-venn-diagram-v20.html
2014
The Data Science
Doing Data Science by O'Neil et al (2013)
Data Science Team
Analyzing the Analyzers, Harris (2013)
Data Science Team
Data Scientist & Data Engineer
http://www.kdnuggets.com/2015/11/different-data-science-roles-industry.html
Data Science Team
Data Scientist & Data Engineer
http://www.kdnuggets.com/2015/11/different-data-science-roles-industry.html
https://www.facebook.com/DataScienceTh/posts/931828353527079:0
Data Science Professionals
http://www.kdnuggets.com/2015/11/different-data-science-roles-industry.html
How to Build a DS Team?
∗ Build an in-house team
• Train existing employees
• Train existing employees and hire experts
• Hire experts
∗ Outsource requirements to private DS consultants
• Outsource comprehensive DS strategy development
• Outsource DS solutions to specific problems
∗ Leverage cloud-based platform solutions
Data Science for Dummies, Pierson (2015)
Machine Learning
"Improving performance in some task with experience." Tom Mitchell (1998)
"The field of study that gives computers the ability to learn without being explicitly programmed." Arthur Samuel (1959)
Machine Learning deals with systems that can learn from data.
Wikipedia, Data Visualization for Dummies (2014)
Data Points: Visualization That Means Something (2013)
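The definitions above can be made concrete with a toy learner. The following is a minimal sketch, not taken from the slides: the task is to classify numbers in [0, 1) against a hidden threshold, the experience is a list of labelled examples, and the performance is accuracy on a fixed test grid. The threshold value, the function names, and the midpoint rule are all assumptions chosen for illustration.

```python
import random
from typing import List, Tuple

# Hidden rule the learner must discover (an assumption for this sketch).
TRUE_THRESHOLD = 0.5

Example = Tuple[float, bool]  # (value, label: is the value at or above the threshold?)


def make_experience(n: int, seed: int = 0) -> List[Example]:
    """Generate n labelled examples: the 'experience' in Mitchell's sense."""
    rng = random.Random(seed)
    return [(x, x >= TRUE_THRESHOLD) for x in (rng.random() for _ in range(n))]


def learn_threshold(examples: List[Example]) -> float:
    """Estimate the threshold as the midpoint between the largest negative
    example and the smallest positive example seen so far."""
    lo = max((x for x, label in examples if not label), default=0.0)
    hi = min((x for x, label in examples if label), default=1.0)
    return (lo + hi) / 2.0


def accuracy(threshold: float) -> float:
    """Measure performance: fraction of a fixed test grid classified correctly."""
    grid = [i / 100 for i in range(100)]
    correct = sum((x >= threshold) == (x >= TRUE_THRESHOLD) for x in grid)
    return correct / len(grid)


if __name__ == "__main__":
    experience = make_experience(1000)
    # Feed the learner growing prefixes of the same experience.
    for n in (1, 10, 100, 1000):
        estimate = learn_threshold(experience[:n])
        print(f"examples={n:4d}  threshold={estimate:.3f}  accuracy={accuracy(estimate):.2f}")
```

Nothing in the program states the rule explicitly. The learner only sees labelled examples, and the gap between its two bounding examples narrows as more of them arrive, so the estimate closes in on the hidden threshold.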
Machine Learning Discovery
• Class Discovery
• Correlation Discovery
• Novelty (Surprise) Discovery
• Association (or Link) Discovery
KirkBorne-workshop-ODSC2016.pdf
The XYZ of Data Science
Smart X:
• Smart Cities
• Smart Highways
• Smart Supply Chain
Precision Y:
• Precision Medicine
• Precision Farming
• Precision Pricing
Personalized Z:
• Personalized Health
• Personalized Learning
• Personalized Shopping Experience
KirkBorne-Workshop-ODSC2016.pdf
Intelligence at the edge of the network… at the point of data collection
Workshop #10
10.1 Create Hive Tables
10.2 Create External Hive Tables
10.3 Create External Hive Tables
10.4 Partition
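The workshop steps can be sketched as HiveQL statements. The sketch below is illustrative only and is not the workshop's actual material: the table names, columns, and HDFS path are hypothetical. The statements are built as plain Python strings so the example runs without a cluster; in practice they would be submitted through a Hive client such as Beeline.

```python
from typing import Optional, Sequence, Tuple

Column = Tuple[str, str]  # (column name, Hive type)


def hive_create_table(
    name: str,
    columns: Sequence[Column],
    external: bool = False,
    partitioned_by: Sequence[Column] = (),
    location: Optional[str] = None,
) -> str:
    """Build a HiveQL CREATE TABLE statement for a comma-delimited text table."""
    cols = ", ".join(f"{col} {typ}" for col, typ in columns)
    keyword = "EXTERNAL TABLE" if external else "TABLE"
    parts = [f"CREATE {keyword} IF NOT EXISTS {name} ({cols})"]
    if partitioned_by:
        # Partition columns are declared separately from the data columns.
        pcols = ", ".join(f"{col} {typ}" for col, typ in partitioned_by)
        parts.append(f"PARTITIONED BY ({pcols})")
    parts.append("ROW FORMAT DELIMITED FIELDS TERMINATED BY ','")
    parts.append("STORED AS TEXTFILE")
    if location:
        parts.append(f"LOCATION '{location}'")
    return "\n".join(parts) + ";"


# Hypothetical schema used only for illustration.
TRIP_COLUMNS = [("trip_id", "INT"), ("distance_km", "DOUBLE")]

# Managed table: Hive owns both the metadata and the data files.
managed = hive_create_table("trips_managed", TRIP_COLUMNS)

# External table: Hive tracks metadata only; the files stay at the given path.
external = hive_create_table(
    "trips_external", TRIP_COLUMNS, external=True, location="/user/demo/trips"
)

# Partitioned table: rows are stored in one directory per trip_date value.
partitioned = hive_create_table(
    "trips_by_date", TRIP_COLUMNS, partitioned_by=[("trip_date", "STRING")]
)

if __name__ == "__main__":
    for statement in (managed, external, partitioned):
        print(statement, end="\n\n")
```

The practical difference between the first two statements shows up on DROP TABLE: dropping a managed table deletes its data, while dropping an external table leaves the underlying files in place.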
Source: Analytics: The New Path to Value, a joint MIT Sloan Management Review and IBM Institute for Business Value study. Copyright © Massachusetts Institute of Technology 2010.
Top Performers Use Analytics 5 Times More Than Lower Performers
Monitoring and Maintenance
Data sources: IoT sensors in factory
Data products: predictive maintenance models
http://www.electrex.it/en/news/600-automated-energy-management-system-a-enms-for-cement-production-plants.html
Customer Engagement + Location
Data sources: Mobile app, loyalty program, GIS
Data products: Buying behavior analysis, coupon-response model, location visualization
http://www.fastcompany.com/3020859/most-creative-people/how-chinas-one-child-policy-forced-starbucks-to-rethink-its-beijing-sto
Fuel Saving
Data sources: Telematics (sensor), GPS
Data products: Prescriptive analytics – route optimization, predictive maintenance (parts/malfunction)
http://www.cnet.com/news/ups-turns-data-analysis-into-big-savings/
Fraud Detection
Data sources: historical pattern of transaction data
Data products: predictive models – fraud/non-fraud
https://bluefishway.com/2013/09/13/panic-oh-no-not-again/
HR Analytics – Google Hiring
Data sources: Historical hiring attributes
Data products: Predictive model – recruiting high performers
Behavioral Test
Situational Test
GPA
Brain Teaser
Good School