Data Science Popup Austin: Data Do's and Dont's: Lessons From the Front Line
![Page 1: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line](https://reader031.vdocuments.us/reader031/viewer/2022030317/586e73131a28ab99598b5331/html5/thumbnails/1.jpg)
DATA SCIENCE POP UP
AUSTIN
Data Do's and Dont's: Lessons From the Front Line
Ryan Orban
VP of Product and Strategy,
Data Scientist, Galvanize
ryanorban
![Page 2: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line](https://reader031.vdocuments.us/reader031/viewer/2022030317/586e73131a28ab99598b5331/html5/thumbnails/2.jpg)
#datapopupaustin
April 13, 2016
Galvanize, Austin Campus
![Page 4: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line](https://reader031.vdocuments.us/reader031/viewer/2022030317/586e73131a28ab99598b5331/html5/thumbnails/4.jpg)
Data Do’s and Dont’s: Lessons from the Frontline
![Page 5: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line](https://reader031.vdocuments.us/reader031/viewer/2022030317/586e73131a28ab99598b5331/html5/thumbnails/5.jpg)
Ryan Orban (@ryanorban)
Co-Founder & CEO, Zipfian Academy
EVP of Product and Strategy, Galvanize
![Page 6: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line](https://reader031.vdocuments.us/reader031/viewer/2022030317/586e73131a28ab99598b5331/html5/thumbnails/6.jpg)
We believe an opportunity belongs to anyone with aptitude and ambition.
![Page 7: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line](https://reader031.vdocuments.us/reader031/viewer/2022030317/586e73131a28ab99598b5331/html5/thumbnails/7.jpg)
Galvanize 2015
NODES ON THE NETWORK
COLORADO (BOULDER, DENVER, FORT COLLINS)
SEATTLE, WA
SAN FRANCISCO, CA
AUSTIN, TX (OPENING Q1 2016)
Programs: Full Stack Immersive, Data Science Immersive, Entrepreneurship
Programs: Full Stack Immersive, Data Science Immersive, Entrepreneurship
Programs: Full Stack Immersive, Data Science Immersive, Data Engineering Immersive, Masters of Science in Data Science, Entrepreneurship
Programs: Full Stack Immersive, Data Science Immersive, Entrepreneurship
![Page 8: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line](https://reader031.vdocuments.us/reader031/viewer/2022030317/586e73131a28ab99598b5331/html5/thumbnails/8.jpg)
5 PROGRAMS
• Full Stack Immersive
• Data Science Immersive
• Data Engineering Immersive
• Master of Science in Data Science (University of New Haven)
• Startup Membership
Projected: over 500 student member graduates in 2015
Currently over 1,500 members
![Page 9: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line](https://reader031.vdocuments.us/reader031/viewer/2022030317/586e73131a28ab99598b5331/html5/thumbnails/9.jpg)
PLACEMENT STATS
FULL STACK IMMERSIVE
• Pre-program Salary: $43K
• Average Starting Salary: $77K
• Placement Rate*: 97%
DATA SCIENCE IMMERSIVE
• Pre-program Salary: $72K
• Average Starting Salary: $114K
• Placement Rate*: 94%
*Galvanize is a founding member of NESTA (New Economy Skills Training Association), a trade organization founded to regulate the new “bootcamp” market. This placement rate is more rigorous than that requested by state licensure agencies. The placement rate is calculated 6 months after graduation.
![Page 10: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line](https://reader031.vdocuments.us/reader031/viewer/2022030317/586e73131a28ab99598b5331/html5/thumbnails/10.jpg)
How These Things Relate to Each Other
[Figure: skills diagram. Labels: Software Engineering, Data Science, Data Analysis, Data Engineering, Machine Learning, Java, Linux/UNIX, Mobile Development, Objective C, C, C++, C#, Web Development, Ruby on Rails, JavaScript, Front-end, PHP, Full-Stack, Excel, Python, SQL, NLP, Hadoop, Databases, Network Analysis, Assembly, Statistics, R]
The orange words are the most important things we teach.
Full-Stack Web Development and Data Science are in gray circles.
![Page 11: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line](https://reader031.vdocuments.us/reader031/viewer/2022030317/586e73131a28ab99598b5331/html5/thumbnails/11.jpg)
DATA SCIENCE IMMERSIVE
Week 1 - Exploratory Data Analysis and Software Engineering Best Practices
Week 2 - Statistical Inference, Bayesian Methods, A/B Testing, Multi-Armed Bandit
Week 3 - Regression, Regularization, Gradient Descent
Week 4 - Supervised Machine Learning: Classification, Validation, Ensemble Methods
Week 5 - Clustering, Topic Modeling (NMF, LDA), NLP
Week 6 - Network Analysis, Matrix Factorization, and Time Series
Week 7 - Hadoop, Hive, and MapReduce
Week 8 - Data Visualization with D3.js, Data Products, and Fraud Detection Case Study
Weeks 9-10 - Capstone Projects
Week 12 - Onsite Interviews
![Page 12: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line](https://reader031.vdocuments.us/reader031/viewer/2022030317/586e73131a28ab99598b5331/html5/thumbnails/12.jpg)
![Page 13: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line](https://reader031.vdocuments.us/reader031/viewer/2022030317/586e73131a28ab99598b5331/html5/thumbnails/13.jpg)
Data Manipulation, Model Creation, Prediction
![Page 14: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line](https://reader031.vdocuments.us/reader031/viewer/2022030317/586e73131a28ab99598b5331/html5/thumbnails/14.jpg)
Data Manipulation
![Page 15: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line](https://reader031.vdocuments.us/reader031/viewer/2022030317/586e73131a28ab99598b5331/html5/thumbnails/15.jpg)
Don’t
• Assume your data is friendly
• ETL and feature engineering are largely opaque to others (and to yourself after enough time away)
Do
• Automate cleaning and transformation pipelines
• Jupyter and RStudio are great for EDA, but have issues with collaboration and version control
• Build functional code to be reused; export into plain code files, track with Git
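A minimal sketch of what an automated, reusable cleaning pipeline might look like once it lives in a plain code file under Git rather than in a notebook. It is not from the talk: the `age` field, the 0 to 120 range check, and every function name here are hypothetical, chosen only to show small single-purpose steps that do not assume the data is friendly.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]
# A step returns a cleaned row, or None to drop the row.
Step = Callable[[Row], Optional[Row]]


def strip_whitespace(row: Row) -> Optional[Row]:
    """Trim stray whitespace from every string value."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}


def drop_missing_age(row: Row) -> Optional[Row]:
    """Drop rows whose (hypothetical) 'age' field is absent or empty."""
    return row if row.get("age") else None


def parse_age(row: Row) -> Optional[Row]:
    """Convert 'age' to int; drop rows where it is not a plausible number."""
    try:
        age = int(row["age"])
    except (TypeError, ValueError):
        return None
    if not 0 <= age <= 120:  # assumed plausibility range
        return None
    return {**row, "age": age}


def run_pipeline(rows: List[Row], steps: List[Step]) -> List[Row]:
    """Apply each step in order; a row is kept only if no step drops it."""
    cleaned = []
    for row in rows:
        for step in steps:
            row = step(row)
            if row is None:
                break
        else:
            cleaned.append(row)
    return cleaned


# The step order is explicit and reviewable in version control.
PIPELINE: List[Step] = [strip_whitespace, drop_missing_age, parse_age]

raw_rows = [
    {"name": " Ada ", "age": " 36 "},
    {"name": "Bob", "age": ""},
    {"name": "Cy", "age": "abc"},
    {"name": "Di", "age": "999"},
    {"name": "Eve", "age": None},
]
clean_rows = run_pipeline(raw_rows, PIPELINE)
```

Because each step is an ordinary function, it can be unit tested and reused across projects, which is harder to do with cells in an exploratory notebook.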
![Page 16: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line](https://reader031.vdocuments.us/reader031/viewer/2022030317/586e73131a28ab99598b5331/html5/thumbnails/16.jpg)
Model Creation
![Page 17: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line](https://reader031.vdocuments.us/reader031/viewer/2022030317/586e73131a28ab99598b5331/html5/thumbnails/17.jpg)
Don’t
• Use accuracy as your main metric
• You can have 99% accuracy but 0% predictive power
• Unbalanced classes; sampling
Do
• Use metrics like precision and recall
• Aggregate metrics like F1-score and AUC/AIC/BIC are also good
• Remember that the models with the highest scores are not always the ones you need; choose permissive vs. conservative based on the use case
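The "99% accuracy but 0% predictive power" point can be reproduced with a few lines of plain Python. The data below is synthetic and invented for illustration (10 positives in 1,000 samples), and precision is defined as 0 when the model makes no positive predictions, which is a convention rather than something stated in the talk.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_report(y_true, y_pred):
    """Return accuracy, precision, recall and F1; undefined ratios become 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Synthetic, heavily unbalanced labels: 10 positives, 990 negatives.
y_true = [1] * 10 + [0] * 990

# Model A always predicts the majority class.
always_negative = [0] * 1000

# Model B finds 8 of the 10 positives at the cost of 20 false alarms.
useful_model = [1] * 8 + [0] * 2 + [1] * 20 + [0] * 970

report_a = classification_report(y_true, always_negative)
report_b = classification_report(y_true, useful_model)
```

Model A scores higher on accuracy (0.99 against 0.978) while its recall and F1 are both zero. Whether Model B's extra false alarms are acceptable is the permissive-versus-conservative question, and it depends on the use case.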
![Page 18: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line](https://reader031.vdocuments.us/reader031/viewer/2022030317/586e73131a28ab99598b5331/html5/thumbnails/18.jpg)
Don’t
• Start with the most complicated models (deep learning, gradient boosting, SVMs, etc.)
• Focus on the algorithm
• “More data always beats better algorithms”
• But better features usually beat better algorithms*
Do
• Start with a baseline model, then continuously “close the loop”
• Create a base case to optimize against
• Does a 1% greater F1-score outweigh a 10x training time in production? Not usually, unless you’re Google-scale.
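One way to make "create a base case to optimize against" concrete is sketched below. Everything in it is an assumption for illustration: the majority-class baseline, the one-feature threshold rule standing in for a "better" model, the tiny dataset, and the `min_f1_gain` and `max_slowdown` thresholds, which simply encode the slide's 1% versus 10x trade-off as parameters.

```python
import time
from collections import Counter


class MajorityBaseline:
    """Baseline: always predict the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class ThresholdRule:
    """Toy candidate: one numeric feature, cut halfway between class means.

    Assumes both classes appear in the training data.
    """

    def fit(self, X, y):
        pos = [x for x, label in zip(X, y) if label == 1]
        neg = [x for x, label in zip(X, y) if label == 0]
        self.threshold_ = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self

    def predict(self, X):
        return [1 if x > self.threshold_ else 0 for x in X]


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1); 0.0 when it is undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def evaluate(model, X_train, y_train, X_test, y_test):
    """Fit the model, then report held-out F1 and wall-clock training time."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    fit_seconds = time.perf_counter() - start
    return {"f1": f1_score(y_test, model.predict(X_test)),
            "fit_seconds": fit_seconds}


def worth_switching(baseline, candidate, min_f1_gain=0.01, max_slowdown=10.0):
    """Accept the candidate only if the gain justifies the extra training cost."""
    gain = candidate["f1"] - baseline["f1"]
    slowdown = candidate["fit_seconds"] / max(baseline["fit_seconds"], 1e-9)
    return gain >= min_f1_gain and slowdown <= max_slowdown


# Tiny invented dataset: a single numeric feature and a binary label.
X_train, y_train = [1.0, 2.0, 3.0, 8.0, 9.0], [0, 0, 0, 1, 1]
X_test, y_test = [1.5, 8.5, 7.0, 2.5], [0, 1, 1, 0]

baseline_result = evaluate(MajorityBaseline(), X_train, y_train, X_test, y_test)
candidate_result = evaluate(ThresholdRule(), X_train, y_train, X_test, y_test)
```

Every later model goes through the same `evaluate` call, so each iteration is compared against the same base case instead of being judged in isolation.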
![Page 19: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line](https://reader031.vdocuments.us/reader031/viewer/2022030317/586e73131a28ab99598b5331/html5/thumbnails/19.jpg)
Don’t
• Assume your cross-validation metrics will hold up against real-life data
Do
• Separate your application and prediction code
• Fast iteration cycles are key. Create a “scoring service” that is decoupled from application code.
• APIs and service-oriented architectures typically work best
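A rough sketch of a standalone scoring service, using only Python's standard library. The `/score` endpoint, the payload shape, the `WEIGHTS`, `BIAS` and `MODEL_VERSION` values, and the linear scoring rule are all hypothetical stand-ins. The point is only that the application talks to the model over HTTP and never imports prediction code.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical model parameters; in practice these would be loaded from a
# trained, versioned model artifact.
MODEL_VERSION = "2016-04-13-demo"
WEIGHTS = {"visits": 0.5, "purchases": 2.0}
BIAS = -1.0


def score_payload(payload):
    """Score one JSON-like record. All model logic lives in this function."""
    features = payload.get("features", {})
    raw = BIAS + sum(weight * float(features.get(name, 0.0))
                     for name, weight in WEIGHTS.items())
    return {"score": raw, "model_version": MODEL_VERSION}


class ScoringHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper: parse JSON, call score_payload, return JSON."""

    def do_POST(self):
        if self.path != "/score":
            self.send_error(404, "unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
            body = json.dumps(score_payload(payload)).encode("utf-8")
        except (ValueError, TypeError, AttributeError):
            self.send_error(400, "invalid request body")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8000):
    """Build (but do not start) the scoring server on localhost."""
    return HTTPServer(("127.0.0.1", port), ScoringHandler)


example = score_payload({"features": {"visits": 4, "purchases": 1}})
```

Calling `make_server().serve_forever()` starts the service, and the application then sends a POST request with a JSON body to `/score`. With this split, the model can be retrained and redeployed on its own cycle, and the `model_version` field in each response makes it possible to compare live results against the cross-validation numbers for that specific model.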
![Page 20: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line](https://reader031.vdocuments.us/reader031/viewer/2022030317/586e73131a28ab99598b5331/html5/thumbnails/20.jpg)
Communication
![Page 21: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line](https://reader031.vdocuments.us/reader031/viewer/2022030317/586e73131a28ab99598b5331/html5/thumbnails/21.jpg)
Don’t
• Focus on the “how”, i.e. cover every trial and tribulation
Do
• Cut to the chase
• After a presentation, I always ask the class two questions:
• What is one sentence that describes what the speaker learned?
• Why do I care?
![Page 22: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line](https://reader031.vdocuments.us/reader031/viewer/2022030317/586e73131a28ab99598b5331/html5/thumbnails/22.jpg)
TALENT
• Early Access to Students
• Candidate Matching
• Curriculum Development
• Corporate Student Sponsorship
• Diversity
![Page 23: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line](https://reader031.vdocuments.us/reader031/viewer/2022030317/586e73131a28ab99598b5331/html5/thumbnails/23.jpg)
ACCESS
• Membership
• Organic Relationships
• Course Content
• Mentorship
• Community
• Events
![Page 24: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line](https://reader031.vdocuments.us/reader031/viewer/2022030317/586e73131a28ab99598b5331/html5/thumbnails/24.jpg)
EXPERTISE
• Galvanize Experts
• Capstone Projects
• Internship
• Corporate Training
![Page 26: Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line](https://reader031.vdocuments.us/reader031/viewer/2022030317/586e73131a28ab99598b5331/html5/thumbnails/26.jpg)
@datapopup #datapopupaustin