kaggle - global data science community
Slides from the Lviv IT Arena talk
Kaggle – the global community of Data Science professionals
Anastasiia Kornilova
Who am I?
- MS in Applied Mathematics
- 3 years as a Data Scientist
What is Data Science?
Scientific Method
Math
Statistics
Data Engineering
Domain Expertise
Advanced Computing
Visualization
Hacker Mindset
What matters?
What is Kaggle?
2010 - founded in Melbourne, Australia by Anthony Goldbloom
What problem do they solve?
Data problems
Data solvers
In fact, a McKinsey Global Institute report estimates that by 2018, “the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.”
Between 2010 and 2020, employment on the data scientist career path is projected to grow by 18.7 percent, beaten only by video game designers. Big data is expected to be a $53.4 billion industry by 2016.
Anyone with "data science" in his or her job title on a LinkedIn page is going to get "100 recruiter emails a day," said Josh Sullivan, who leads a 500-person data-science group at the consulting firm Booz Allen Hamilton Holding
Are you good enough?
First Competition: Forecast Eurovision Song Contest Voting
- 1,000 dollar prize
- 22 teams
Outperformed prediction markets: the winning entry predicted 7 of the Top 10 countries, while prediction markets got only 5.
Short story of success
- 2011 - relocated to San Francisco
- November 2011 - raised 11M dollars in funding
- July 2013 - 100,000 data scientists involved
- February 2014 - more than 140,000 data scientists
How can you use Kaggle?
Types of rewards
- Knowledge
- Money
- Job interview
Competitions for knowledge (always open)
- Digit Recognizer, CIFAR-10, First steps with Julia
- Titanic: Machine Learning from Disaster
- Bike Sharing Demand
- Learning Social Circles in Networks
Competitions with prize:
Open:
- American Epilepsy Society Seizure Prediction Challenge: 25,000 prize
- Africa Soil Property Prediction Challenge: 8,000 prize
- Tradeshift Text Classification: 5,000 prize
Completed competitions (170+):
- Heritage Health Prize: 500,000
- GE Flight Quest: 250,000
- GE Hospital Quest: 100,000
- Higgs Boson ML Challenge: 13,000 + invitation to CERN
- Galaxy Zoo: 16,000
- KDD Author Paper Identification Challenge
- Job Recommendation Challenge
Job competitions (completed):
Facebook:
- recommend missing links in social graph (who to follow)
- optimal graph path
- predict text tags
Yelp:
- estimate the number of useful votes a review will receive
Walmart:
- predict store sales
+ Job Board
How to win?
Dig into the data
Stay on track
Kaggle competition == Data science?
1. Understand
2. Collect
3. Data exploration
4. Clean and transform
5. Model
6. Validate
7. Communicate results
Deploy
?
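The workflow steps listed above can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the data is synthetic, the model is a plain least-squares line, and every function name (collect, explore, clean, split, fit_line, rmse) is hypothetical. A real competition would start from the organizer's dataset and would typically use dedicated libraries.

```python
import math
import random
import statistics


def collect(n=200, seed=0):
    """Collect: synthetic (x, y) rows standing in for a real dataset (assumption)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        rows.append((x, 3.0 * x + 2.0 + rng.gauss(0, 1)))
    # Inject two missing values so the cleaning step has something to do.
    rows[5] = (None, rows[5][1])
    rows[17] = (rows[17][0], None)
    return rows


def explore(rows):
    """Data exploration: row count, rows with a missing field, mean of known x."""
    missing = sum(1 for x, y in rows if x is None or y is None)
    known_x = [x for x, _ in rows if x is not None]
    return {"rows": len(rows), "missing": missing, "mean_x": statistics.mean(known_x)}


def clean(rows):
    """Clean and transform: drop rows with a missing field."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def split(rows, test_fraction=0.25, seed=1):
    """Hold out part of the data so validation uses unseen rows."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_line(rows):
    """Model: ordinary least squares for y = slope * x + intercept."""
    mean_x = statistics.mean(x for x, _ in rows)
    mean_y = statistics.mean(y for _, y in rows)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    variance = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def rmse(model, rows):
    """Validate: root mean squared error on the given rows."""
    slope, intercept = model
    return math.sqrt(statistics.mean((slope * x + intercept - y) ** 2 for x, y in rows))


if __name__ == "__main__":
    raw = collect()
    print("exploration:", explore(raw))
    train, test = split(clean(raw))
    model = fit_line(train)
    # Communicate results: report the fitted line and the hold-out error.
    print("slope=%.2f intercept=%.2f" % model)
    print("hold-out RMSE=%.2f" % rmse(model, test))
```

The sketch stops at reporting a score on held-out data, which is roughly where a competition leaderboard stops too. Understanding the problem and deploying the model are left out.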