Data Science: The Art of Foul Play by Serhiy Shelpuk
DESCRIPTION
Serhiy Shelpuk, Lead Data Scientist and Competence Manager at SoftServe, Inc., delivered an insightful presentation on data science and SoftServe's Data Science Group Knowledge Model at the 2013 IT Weekend Ukraine conference, held on September 14, 2013, in Kyiv, Ukraine. Here's his presentation.
TRANSCRIPT
![Page 2: Data Science: The Art of Foul Play by Serhiy Shelpuk](https://reader033.vdocuments.us/reader033/viewer/2022052822/554f9fa0b4c90586258b48b0/html5/thumbnails/2.jpg)
![Page 3: Data Science: The Art of Foul Play by Serhiy Shelpuk](https://reader033.vdocuments.us/reader033/viewer/2022052822/554f9fa0b4c90586258b48b0/html5/thumbnails/3.jpg)
“Your goal should not be to buy players, it should be to buy wins. In order to buy wins you should buy runs” (c)
![Page 4: Data Science: The Art of Foul Play by Serhiy Shelpuk](https://reader033.vdocuments.us/reader033/viewer/2022052822/554f9fa0b4c90586258b48b0/html5/thumbnails/4.jpg)
![Page 5: Data Science: The Art of Foul Play by Serhiy Shelpuk](https://reader033.vdocuments.us/reader033/viewer/2022052822/554f9fa0b4c90586258b48b0/html5/thumbnails/5.jpg)
More data is available to companies
Storage technologies make it possible to store and process that data
Advanced analytics can be applied to this new data to gain a competitive advantage
![Page 6: Data Science: The Art of Foul Play by Serhiy Shelpuk](https://reader033.vdocuments.us/reader033/viewer/2022052822/554f9fa0b4c90586258b48b0/html5/thumbnails/6.jpg)
Data Scientist: The Sexiest Job of the 21st Century
For Today’s Graduate, Just One Word: Statistics
“I keep saying that the sexy job in the next 10 years will be statisticians, and I’m not kidding.”
Hal Varian, chief economist at Google
Data Scientist: The Hottest Job You Haven't Heard Of
![Page 7: Data Science: The Art of Foul Play by Serhiy Shelpuk](https://reader033.vdocuments.us/reader033/viewer/2022052822/554f9fa0b4c90586258b48b0/html5/thumbnails/7.jpg)
Top 10 Strategic Technology Trends for 2013
• Mobile Device Battles
• Mobile Applications and HTML5
• Personal Cloud
• Enterprise App Stores
• The Internet of Things
• Hybrid IT and Cloud Computing
• Strategic Big Data
• Actionable Analytics
• In Memory Computing
• Integrated Ecosystems
![Page 8: Data Science: The Art of Foul Play by Serhiy Shelpuk](https://reader033.vdocuments.us/reader033/viewer/2022052822/554f9fa0b4c90586258b48b0/html5/thumbnails/8.jpg)
McKinsey Global Institute projects approximately 140,000 to 190,000 unfilled positions of data analytics experts in the U.S. by 2018 and a shortage of 1.5 million managers and analysts who have the ability to understand and make decisions using big data.
![Page 9: Data Science: The Art of Foul Play by Serhiy Shelpuk](https://reader033.vdocuments.us/reader033/viewer/2022052822/554f9fa0b4c90586258b48b0/html5/thumbnails/9.jpg)
![Page 10: Data Science: The Art of Foul Play by Serhiy Shelpuk](https://reader033.vdocuments.us/reader033/viewer/2022052822/554f9fa0b4c90586258b48b0/html5/thumbnails/10.jpg)
Model Family: Classification
Business Tasks
• Define prospective customers
• Define traffic jams in the city
• Recommend restaurants and menus
• Adjust UI to the particular user
• Classify body part on X-Ray image
Algorithms
• Naïve Bayes
• Logistic regression
• Support Vector Machines
• Neural Networks
Model Family: Clustering
Business Tasks
• Define market niche
• Define influencers in the social networks
• Define similar customers or projects in portfolio
• Define informal groups in the organization
Algorithms
• K-Means
• K nearest neighbor
• Self-organizing maps
• Mixture of Gaussians
Model Family: Anomaly Detection
Business Tasks
• Define fraudulent bank transactions
• Define network intrusion attempts
• Provide automatic aircraft engine testing
• Provide automatic IT infrastructure monitoring
• Provide clinical test analysis
Algorithms
• Mixture of Gaussians
• Self-learning anomaly detection
Model Family: Optimization
Business Tasks
• Define the best price for the goods or services to maximize profits
• Define best working schedule for the store
• Define best amount of production
• Define best business rules
Algorithms
• Gradient descent
• Simplex method
• Newton’s method
• Normal equations
• Genetic algorithms
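The anomaly detection family in the mapping above, applied to a task such as spotting a fraudulent bank transaction, can be illustrated with a minimal sketch. It is not the method from the talk: it fits a single Gaussian instead of the mixture of Gaussians named on the slide, and the transaction amounts, the `epsilon` threshold, and all function names are hypothetical.

```python
import math
import statistics


def fit_gaussian(values):
    """Estimate the mean and (population) standard deviation of normal data."""
    return statistics.mean(values), statistics.pstdev(values)


def density(x, mean, std):
    """Gaussian probability density at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def is_anomaly(x, mean, std, epsilon=1e-3):
    """Flag x when its density under the fitted Gaussian falls below epsilon.

    epsilon is an assumed threshold; in practice it would be tuned on
    labeled examples.
    """
    return density(x, mean, std) < epsilon


# Hypothetical amounts of ordinary transactions (illustration only).
normal_amounts = [20.0, 25.0, 22.0, 19.0, 24.0, 21.0, 23.0, 26.0]
mean, std = fit_gaussian(normal_amounts)

for amount in (22.0, 500.0):
    label = "anomaly" if is_anomaly(amount, mean, std) else "normal"
    print(amount, label)
```

The model is fitted on examples assumed to be normal only, so no labeled fraud cases are needed to train it; they are needed only to choose the threshold.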
![Page 11: Data Science: The Art of Foul Play by Serhiy Shelpuk](https://reader033.vdocuments.us/reader033/viewer/2022052822/554f9fa0b4c90586258b48b0/html5/thumbnails/11.jpg)
Cross Industry Standard Process for Data Mining
![Page 12: Data Science: The Art of Foul Play by Serhiy Shelpuk](https://reader033.vdocuments.us/reader033/viewer/2022052822/554f9fa0b4c90586258b48b0/html5/thumbnails/12.jpg)
SoftServe Data Science Group Knowledge Model
Business Level
• Basics of Business Analysis
• Basics of Economics
• Basics of Product Management
• Basics of Organizational Behavior
Logic Level
• Statistics/Probability
• Machine Learning
• Data Mining
• Artificial Intelligence
Technology Level
• Matlab/Octave
• R
• SQL
• Parallel Computing
![Page 13: Data Science: The Art of Foul Play by Serhiy Shelpuk](https://reader033.vdocuments.us/reader033/viewer/2022052822/554f9fa0b4c90586258b48b0/html5/thumbnails/13.jpg)
Deep Learning Neural Networks
![Page 14: Data Science: The Art of Foul Play by Serhiy Shelpuk](https://reader033.vdocuments.us/reader033/viewer/2022052822/554f9fa0b4c90586258b48b0/html5/thumbnails/14.jpg)
Task: recognize a motorcycle
Feature extractor
Learning algorithm
![Page 15: Data Science: The Art of Foul Play by Serhiy Shelpuk](https://reader033.vdocuments.us/reader033/viewer/2022052822/554f9fa0b4c90586258b48b0/html5/thumbnails/15.jpg)
The concept of Autoencoder
![Page 16: Data Science: The Art of Foul Play by Serhiy Shelpuk](https://reader033.vdocuments.us/reader033/viewer/2022052822/554f9fa0b4c90586258b48b0/html5/thumbnails/16.jpg)
The concept of Autoencoder
© Andrew Y. Ng
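The autoencoder idea (a network trained to reproduce its own input through a narrower hidden layer) can be sketched in a few lines. This is an illustrative toy, not the network shown on the slides: it is linear, has two inputs and a single hidden unit, is trained with plain batch gradient descent, and the data, initial weights, learning rate, and function names are all assumptions.

```python
def reconstruction_error(data, w, v):
    """Mean squared reconstruction error of the autoencoder over the data."""
    total = 0.0
    for x in data:
        h = w[0] * x[0] + w[1] * x[1]            # encode: 2 inputs -> 1 hidden unit
        total += sum((v[i] * h - x[i]) ** 2 for i in range(2))  # decode and compare
    return total / len(data)


def train_autoencoder(data, steps=2000, lr=0.05):
    """Train a linear 2-1-2 autoencoder with batch gradient descent."""
    w = [0.5, 0.5]  # encoder weights (assumed fixed starting point)
    v = [0.5, 0.5]  # decoder weights (assumed fixed starting point)
    n = len(data)
    for _ in range(steps):
        grad_w = [0.0, 0.0]
        grad_v = [0.0, 0.0]
        for x in data:
            h = w[0] * x[0] + w[1] * x[1]
            err = [v[i] * h - x[i] for i in range(2)]   # reconstruction minus input
            back = err[0] * v[0] + err[1] * v[1]        # error propagated to the hidden unit
            for i in range(2):
                grad_v[i] += 2.0 * err[i] * h / n
                grad_w[i] += 2.0 * back * x[i] / n
        for i in range(2):
            w[i] -= lr * grad_w[i]
            v[i] -= lr * grad_v[i]
    return w, v


# Hypothetical 2-D points that all lie on the line y = 2x, so one hidden
# unit is enough to describe them.
data = [(t, 2.0 * t) for t in (-1.0, -0.5, 0.5, 1.0)]

error_before = reconstruction_error(data, [0.5, 0.5], [0.5, 0.5])
w, v = train_autoencoder(data)
error_after = reconstruction_error(data, w, v)
print(error_before, error_after)
```

No labels are used anywhere: the input itself is the training target, and the hidden unit ends up as a compressed feature of the data.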
![Page 17: Data Science: The Art of Foul Play by Serhiy Shelpuk](https://reader033.vdocuments.us/reader033/viewer/2022052822/554f9fa0b4c90586258b48b0/html5/thumbnails/17.jpg)
Large scale deep learning networks
See more: Building high-level features using large scale unsupervised learning
![Page 18: Data Science: The Art of Foul Play by Serhiy Shelpuk](https://reader033.vdocuments.us/reader033/viewer/2022052822/554f9fa0b4c90586258b48b0/html5/thumbnails/18.jpg)
![Page 19: Data Science: The Art of Foul Play by Serhiy Shelpuk](https://reader033.vdocuments.us/reader033/viewer/2022052822/554f9fa0b4c90586258b48b0/html5/thumbnails/19.jpg)
Deep learning neural networks
Pre-trained as Autoencoder
Typical classification neural network
![Page 20: Data Science: The Art of Foul Play by Serhiy Shelpuk](https://reader033.vdocuments.us/reader033/viewer/2022052822/554f9fa0b4c90586258b48b0/html5/thumbnails/20.jpg)
A few results
Video
Text/NLP
Images
© Andrew Y. Ng
![Page 21: Data Science: The Art of Foul Play by Serhiy Shelpuk](https://reader033.vdocuments.us/reader033/viewer/2022052822/554f9fa0b4c90586258b48b0/html5/thumbnails/21.jpg)
Deep Learning at SoftServe
Phase 1 results (old-fashioned anomaly detection)
Phase 2 prototype (deep learning approach)
![Page 22: Data Science: The Art of Foul Play by Serhiy Shelpuk](https://reader033.vdocuments.us/reader033/viewer/2022052822/554f9fa0b4c90586258b48b0/html5/thumbnails/22.jpg)
Useful Resources
• Introduction to Statistics
• Introduction to Artificial Intelligence
• Machine Learning
• Probabilistic Graphical Models
• Statistics One
![Page 23: Data Science: The Art of Foul Play by Serhiy Shelpuk](https://reader033.vdocuments.us/reader033/viewer/2022052822/554f9fa0b4c90586258b48b0/html5/thumbnails/23.jpg)
Thank you!