On Building a Data Science Curriculum
DESCRIPTION
Data Science is a comparatively new field, and as such it is constantly changing as new techniques, tools, and problems emerge every day. Traditionally, education has taken a top-down approach in which courses are developed on the scale of years and committees approve curricula based on what might be the most theoretically complete approach. This is at odds, however, with an evolving industry that needs data scientists faster than they can be (traditionally) trained. If we are to sustainably push the field of Data Science forward, we must collectively figure out how best to scale this type of education. At Zipfian I have seen (and felt) firsthand what works (and what doesn't) when tools and theory are combined in a classroom environment. This talk will be a narrative about the lessons learned trying to integrate high-level theory with practical application, how leveraging the Python ecosystem (NumPy, SciPy, pandas, scikit-learn, etc.) has made this possible, and what happens when you treat curriculum like product (and the classroom like a team).
TRANSCRIPT
![Page 1: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/1.jpg)
On Building a Data Science Curriculum
November 23rd, 2014
![Page 2: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/2.jpg)
Jonathan Dinu
Director of Education, Galvanize
[email protected]
@clearspandex
Questions? tweet @galvanize
![Page 3: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/3.jpg)
Formerly
![Page 4: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/4.jpg)
Formerly
![Page 5: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/5.jpg)
+
Currently
![Page 6: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/6.jpg)
Challenge
The Challenge
![Page 7: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/7.jpg)
Challenge
![Page 8: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/8.jpg)
Tools
Framework/Library
Big Data (scalability)
Small Data
Bespoke Code
Cloudera ML
Mahout
MLlib (AMPLab)
H2O (0xdata)
C/C++
MapReduce (Streaming)
MapReduce (Java)
Cascading/Crunch
Pig/Hive
Vowpal Wabbit
Giraph
GraphLab
Spark
Storm
CRAN
R
Python
Java
scikit-learn
pandas
mlpack
Weka
NumPy
JavaScript
![Page 9: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/9.jpg)
Obligatory Name Drop
Acquisition
Parse
Storage
Transform/Explore
Vectorization
Train
Model
Expose
Presentation
Locally
requests
BeautifulSoup4
pandas
pymongo
scikit-learn/NLTK
Flask
At Scale
scrapy
Hadoop Streaming (w/ BeautifulSoup4)
mrjob or Mortar (w/ Python UDF)
Snakebite (HDFS)
MLlib (PySpark)
Flask
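To make the stages on this slide concrete, here is a minimal sketch of the Parse, Vectorization, Train, and Expose steps run locally. It is an illustration under stated assumptions, not the pipeline used in the course: the sample pages, labels, and the helper names `html_to_text` and `predict` are hypothetical, Acquisition and Storage are skipped, the standard-library `html.parser` stands in for BeautifulSoup4, and a plain function stands in for a Flask endpoint.

```python
from html.parser import HTMLParser

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


class _TextExtractor(HTMLParser):
    """Parse step: collect the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Strip markup and collapse whitespace (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Join text nodes with a space so adjacent elements do not fuse.
    return " ".join(" ".join(parser.chunks).split())


# Acquisition is skipped: these made-up pages stand in for fetched HTML.
RAW_PAGES = [
    ("<h1>Match report</h1><p>The team won the match with a late goal.</p>", "sports"),
    ("<p>The striker scored twice and the coach praised the team.</p>", "sports"),
    ("<h1>Release notes</h1><p>The new compiler speeds up the code.</p>", "tech"),
    ("<p>Developers shipped a software update for the database.</p>", "tech"),
]

texts = [html_to_text(html) for html, _ in RAW_PAGES]
labels = [label for _, label in RAW_PAGES]

# Vectorization + Train: TF-IDF features feeding a naive Bayes classifier.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)


def predict(html):
    """Expose step: the function a web endpoint could wrap."""
    return model.predict([html_to_text(html)])[0]


print(predict("<p>The coach said the team played a great match.</p>"))
```

Bundling the vectorizer and classifier in one scikit-learn pipeline keeps the same text preprocessing at training and prediction time, which is the main thing to preserve if the pieces are swapped for the tools named on the slide.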
![Page 10: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/10.jpg)
Challenge
![Page 11: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/11.jpg)
Challenge
Now do that in 8 weeks
![Page 12: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/12.jpg)
Challenge
![Page 13: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/13.jpg)
Intuition
Iteration 0: Intuition
![Page 14: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/14.jpg)
Content
Source: Metacademy
![Page 15: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/15.jpg)
Bottom Up Approach
Content
![Page 16: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/16.jpg)
Content
Source: Coursera
![Page 17: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/17.jpg)
Content
Source: UC Berkeley Masters
![Page 18: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/18.jpg)
Not Everybody Learns This Way
Issues
![Page 19: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/19.jpg)
Issues
• Not Enough Context
• Not Enough Concept Overlap
• Takes too much Time
• Nothing Happens in a Vacuum
![Page 20: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/20.jpg)
Digression
Not Just for Data Science
(relevant to learning any complex subject)
![Page 21: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/21.jpg)
Experience
Iteration 1: Experience
![Page 22: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/22.jpg)
Theory
Mathematics
Statistical Analysis
Mathematics & Statistics
Distributions (Binomial, Poisson, etc.)
Summary Statistics (Mean, Variance, etc.)
Hypothesis Testing
Bayesian Analysis
Linear Algebra (Matrix Factorization)
Calculus (Integrals, Derivatives, etc.)
Graph Theory
Probability/Combinatorics
![Page 23: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/23.jpg)
Worth the Upfront Investment
Theory
![Page 24: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/24.jpg)
Technique
Distributed Computing
Supervised (SVM, Random Forest)
NLP / Information Retrieval
Algorithms & Data Structures
Data Visualization
Data Munging
Machine Learning & Software Engineering
Machine Learning
Software Engineering
Validation, Model Comparison
Unsupervised (K-means, LDA)
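As a small illustration of how "Supervised (SVM, Random Forest)" and "Validation, Model Comparison" fit together, the sketch below compares the two model families with cross-validation on the Iris dataset that appears later in the project list. The hyperparameters, the 5-fold split, and the `random_state` are assumptions chosen for the example, not settings taken from the talk.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Small built-in dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Two candidate supervised models (settings are illustrative assumptions).
candidates = {
    "SVM": SVC(kernel="rbf", C=1.0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Validation: score every candidate with the same 5-fold cross-validation.
scores = {}
for name, estimator in candidates.items():
    fold_scores = cross_val_score(estimator, X, y, cv=5)
    scores[name] = fold_scores.mean()
    print(f"{name}: mean accuracy {scores[name]:.3f}")

# Model comparison: pick the candidate with the best mean accuracy.
best = max(scores, key=scores.get)
print(f"Best by mean CV accuracy: {best}")
```

Scoring both candidates with the same folds is what makes the comparison fair; on a dataset this small the gap between models is usually within fold-to-fold noise, so the choice should not be over-interpreted.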
![Page 25: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/25.jpg)
Just ask them!
Network
(the students)
![Page 26: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/26.jpg)
Context is King
![Page 27: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/27.jpg)
Network
![Page 28: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/28.jpg)
Network
Iris Dataset Classification
![Page 29: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/29.jpg)
Network
Iris Dataset Classification
NYT Topic Modeling
![Page 30: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/30.jpg)
Network
Iris Dataset Classification
NYT Topic Modeling
Real-time Fraud scoring service
![Page 31: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/31.jpg)
Network
Iris Dataset Classification
NYT Topic Modeling
Real-time Fraud scoring service
Personal Capstone Project
![Page 32: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/32.jpg)
Network
Iris Dataset Classification
NYT Topic Modeling
Real-time Fraud scoring service
Personal Capstone
“Domesticated Data” Learn the tools/theory
![Page 33: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/33.jpg)
Network
Iris Dataset Classification
NYT Topic Modeling
Real-time Fraud scoring service
Personal Capstone
“Domesticated Data” Learn the tools/theory
“Wild Data” Learn the application
![Page 34: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/34.jpg)
Network
Iris Dataset Classification
NYT Topic Modeling
Real-time Fraud scoring service
Personal Capstone
“Domesticated Data” Learn the tools/theory
“Wild Data” Learn the application
Simulated Case Study Learn the process
![Page 35: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/35.jpg)
Network
Iris Dataset Classification
NYT Topic Modeling
Real-time Fraud scoring service
Personal Capstone
“Domesticated Data” Learn the tools/theory
“Wild Data” Learn the application
Simulated Case Study Learn the process
Greenfield Project Learn the practice/art
![Page 36: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/36.jpg)
Theory
Theory
Application
Synthesis
$$$ PROFIT!!
![Page 37: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/37.jpg)
Just ask them!
Network
![Page 38: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/38.jpg)
Network
![Page 39: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/39.jpg)
Just ask them! (and be flexible)
Network
![Page 40: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/40.jpg)
Treat them like customers (because they are)
Network
![Page 41: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/41.jpg)
Always Validate!
Network
![Page 42: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/42.jpg)
Metrics
Iteration 2: Data!
![Page 43: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/43.jpg)
Experience
Iteration 2: Data!
METRICS
METRICS EVERYWHERE
Saturday, April 9, 2011
![Page 44: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/44.jpg)
Metrics
![Page 45: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/45.jpg)
• Commits
• Pull Requests
• Passing Tests
• Etc.
Metrics
![Page 46: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/46.jpg)
Curriculum as Product
![Page 47: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/47.jpg)
Learning Techniques
![Page 48: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/48.jpg)
Industry Techniques
Source: http://en.wikipedia.org/wiki/Extreme_programming
![Page 49: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/49.jpg)
Industry Techniques
Source: http://lostechies.com/scottreynolds/2009/10/07/how-we-do-things-tdd-bdd/
![Page 50: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/50.jpg)
Industry Techniques
Code Reviews
Source: http://agile.dzone.com/articles/re-pair-programming
![Page 51: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/51.jpg)
Our House
@Zipfian (now Galvanize)
![Page 52: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/52.jpg)
Source: http://www.sebastienmillon.com/Rainbow-Immersion-Therapy-Art-Print-15
![Page 53: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/53.jpg)
Methodology
Community
Education
Industry
Meetup
Student Groups
Corporate Training
![Page 54: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/54.jpg)
Methodology
• Outcomes focused
• Project-based curriculum using real datasets
• Guest lectures from leaders in the field
• Mock interviews and hiring preparation
• Full instructional staff + personal mentorship
![Page 55: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/55.jpg)
Employment
Source: http://www.nerdwallet.com/nerdscholar/grad_surveys/highest-employment-rates
Highest Employment Rates (2012)
University of Massachusetts-Amherst School of Nursing 98%
Georgetown University McDonough School of Business 94%
Michigan State University College of Nursing 92%
Syracuse University School of Architecture 90%
University of Massachusetts-Amherst Isenberg School of Management 90%
Michigan State University School of Hospitality Business 89%
New York University 88%
Boston College Connell School of Nursing 88%
Boston College Carroll School of Management 87%
Case Western Reserve University Frances Payne Bolton School of Nursing 86%
U.S. News and World Report Ranking
1. Princeton University
2. Harvard University
3. Yale University
4. Columbia University
5. Stanford University
6. University of Chicago
7. Duke University
8. MIT
9. University of Pennsylvania
10. California Institute of Technology
![Page 56: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/56.jpg)
Timeline
STRUCTURED CURRICULUM
HIRING DAY
CAPSTONE PROJECT
GRADUATION
08 10.5 12
INTERVIEWS
Data Science Immersive
![Page 57: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/57.jpg)
Industry Student Projects
![Page 58: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/58.jpg)
Our Students
What We Look For
• Working knowledge of programming
• Background in a quantitative discipline
• Comfortable with mathematics and statistics
• Child-like curiosity
![Page 59: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/59.jpg)
Our Students
Educational Background
[Bar chart: BS, MS, PhD]
![Page 60: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/60.jpg)
Our Students
Disciplines
Software Engineering
Analysts
Finance/Economics
Engineering
Physics
Physical Sciences
Mathematics
Statistics
Astronomy
Linguistics
Professional Poker
![Page 61: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/61.jpg)
Data Science Immersive
Masters in Data Science
Data Engineering Immersive
Weekend Workshops
+
![Page 62: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/62.jpg)
Immersive
Masters
![Page 63: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/63.jpg)
Immersive
Masters
(not to scale)
![Page 64: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/64.jpg)
Masters of Science - 1 year (Starts in Spring)
http://www.galvanizeu.com/request-info
![Page 65: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/65.jpg)
Goals
Get Involved
• Present a guest lecture or share a data story
• Donate datasets and propose projects
• Sponsor a scholarship
• Attend our Hiring Day
![Page 66: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/66.jpg)
Goals
We’re Hiring!
• Full-time Instructors
• TAs
• Mentors (volunteer)
![Page 67: On Building a Data Science Curriculum](https://reader033.vdocuments.us/reader033/viewer/2022052907/55943eb71a28abe95b8b46df/html5/thumbnails/67.jpg)
Questions?
Thank You!
Jonathan Dinu
Director of Education, Galvanize
[email protected]
@clearspandex