"what is data science?" high school version
TRANSCRIPT
{
What is Data Science?
Renée Teate, March 2016
Harrisonburg High School
Let’s start with: “What is Data?”
http://upload.wikimedia.org/wikipedia/commons/f/f0/DARPA_Big_Data.jpg
Bits
https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcS9dKu3_Tzi-sWW-yAqee5y0EhuvoIZNSya_rAKnuBBd0JYxPX7pw
Numbers
http://fc01.deviantart.net/fs71/i/2012/326/3/4/cute_dog_by_thomasmeadows345-d5lsah9.jpg
Images
http://www.freefoto.com/images/1351/06/1351_06_2---Books--Shakespeare-and-Company-Bookstore--The-Latin-Quarter--Paris_web.jpg
Text
Created & Collected
https://c2.staticflickr.com/4/3273/3017878633_65beb1c7d6.jpg
http://upload.wikimedia.org/wikipedia/commons/e/e4/Green_Bank_100m_diameter_Radio_Telescope.jpg
https://c1.staticflickr.com/1/2/1349370_0703fce74c.jpg
http://upload.wikimedia.org/wikipedia/commons/9/96/Bill_Nye,_Barack_Obama_and_Neil_deGrasse_Tyson_selfie_2014.jpg
Analyzed and Visualized
http://upload.wikimedia.org/wikipedia/commons/1/1c/CMS_Higgs-event.jpg
http://upload.wikimedia.org/wikipedia/commons/9/90/Kencf0618FacebookNetwork.jpg
http://upload.wikimedia.org/wikipedia/commons/b/bf/USDA_Hardiness_zone_map.jpg
https://c1.staticflickr.com/3/2300/2596366618_2d6cb01735.jpg
“Big Data”
https://web-assets.domo.com/blog/wp-content/uploads/2014/04/DataNeverSleeps_2.0_v2.jpg
Stored in Databases on Servers in Data Centers
http://pixabay.com/static/uploads/photo/2014/03/13/01/12/datacenter-286386_640.jpg
https://c2.staticflickr.com/2/1296/533233247_b6baa30fdb_z.jpg?zz=1
“The Cloud”
What is a database?
Database [dey-tuh-beys], noun
A comprehensive collection of related data organized for convenient access, generally in a computer.
-dictionary.com
I used a database to look up this definition!
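To make "organized for convenient access" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and sample rows are made up for illustration; this is not how dictionary.com actually stores its definitions.

```python
import sqlite3

# Hypothetical example: a tiny dictionary stored in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE definitions (word TEXT PRIMARY KEY, meaning TEXT)")
conn.executemany(
    "INSERT INTO definitions VALUES (?, ?)",
    [
        ("database", "A collection of related data organized for convenient access."),
        ("server", "A computer that provides data to other computers."),
    ],
)


def look_up(word):
    """Return the stored meaning for a word, or None if the word is missing."""
    row = conn.execute(
        "SELECT meaning FROM definitions WHERE word = ?", (word,)
    ).fetchone()
    return row[0] if row else None


print(look_up("database"))
```

The SELECT line is a SQL query: it asks the database for one specific row instead of reading through every entry.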
Types of Databases
http://www.oaddo.org
Relational DBMS
Graph Database
Databases You Use
https://www.google.com/maps/@38.8905569,-77.1721577,13z/data=!5m1!1e1
http://upload.wikimedia.org/wikipedia/commons/6/69/Netflix_logo.svg
https://c2.staticflickr.com/4/3324/3507973704_563846fe14_z.jpg?zz=1
How is data collected about you used to help you?
Who builds these systems?
Data Scientist
Computer Scientist
• Gathering data
• Writing Code
• Designing Interfaces
• Design / Manage / Query Databases
• Data Mining
Mathematician
• Statistics
• Predictive Analytics
• Data Visualizations
• Evaluating Results
Business Person
• Domain Expertise
• Knowing what questions to ask
• Interpreting results for business decisions
• Presenting outcomes
No one person needs to have all of these skills. More organizations are now building data science teams.
Becoming a Data Scientist Podcast
How have I learned data science?
Some other names for “Data Scientist”
• Statistician
• Data Mining Specialist
• Biostatistician
• Social Science Researcher
• Big Data Analyst
• Spatial/GIS Analyst
• Natural Language Processing Researcher
• Computational Physicist
• Pythonista
• Financial Analyst
• Recommendation System Engineer
• Information Architect
• Artificial Intelligence Researcher
• Neuroscientist
• Data Visualization Designer
Data Science jobs pay an average of $118,000 per year.
It is estimated that by 2018, the US could have a shortage of 140,000+ people with advanced analytical skills and need 1.5M managers/analysts who can make decisions based on data analysis.
Examples
Galaxy Classification from Images
http://benanne.github.io/2014/04/05/galaxy-zoo.html
Choosing Audience for Content Promotion on Facebook
http://citizennet.com/blog/2012/11/10/random-forests-ensembles-and-performance-metrics/
Predicting Seizures
https://www.kaggle.com/c/seizure-detection
March Madness Picks
https://www.kaggle.com/c/march-machine-learning-mania-2015
Facial Recognition (auto-tagging)
http://www.mirrordaily.com/facebook-and-google-develop-amazing-facial-recognition-algorithms/22402/
What other things can facial recognition be used for?
What are the ethical questions about this?
http://xkcd.com/1425/
It’s actually really hard for computers to do things humans consider simple! We train them using “machine learning”.
https://www.linkedin.com/pulse/machine-learning-image-detectioncats-vs-dogs-amrith-kumar
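As a rough illustration of what "training" means, here is a minimal sketch of one of the simplest machine learning ideas, nearest neighbor: the program labels a new animal by finding the most similar example it has already seen. The measurements below are invented for illustration. Real cat-vs-dog image detection works from pixel data and far larger training sets.

```python
import math

# Made-up training examples: (weight in kg, ear length in cm) -> label.
training_data = [
    ((4.0, 6.5), "cat"),
    ((5.0, 7.0), "cat"),
    ((3.5, 6.0), "cat"),
    ((20.0, 11.0), "dog"),
    ((30.0, 12.5), "dog"),
    ((12.0, 10.0), "dog"),
]


def predict(features):
    """Label a new animal with the label of its closest training example."""
    closest = min(training_data, key=lambda example: math.dist(example[0], features))
    return closest[1]


print(predict((4.5, 6.8)))    # a small animal, closest to the cat examples
print(predict((25.0, 12.0)))  # a large animal, closest to the dog examples
```

Nobody wrote a rule such as "dogs weigh more than 10 kg". The program only compares new measurements to labeled examples, so giving it more and better examples is how it improves.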
"Once you start working in robotics, you realize that things that kids learn to do up to age 10 ... are actually the hardest things to get a robot to do.“
-Pieter Abbeel, AI Researcher and Professor at UC Berkeley
https://www.youtube.com/watch?v=gy5g33S0Gzo
AI and Robotics
Different places you can start if you’re interested in data science
Programming
• Any language is good to start with! Just start coding!
• Most common: Python or R
• Database design, SQL
Math
• Statistics
• Linear Algebra
Research and Analysis
• Science involving data collection and interpretation
• Working with “messy” real life data
• Business Analytics
• Data Mining
Others
• Business / Communication
• Graphic Design
Learning Resources
• Doing Data Science by Cathy O’Neil* & Rachel Schutt
• Data Smart by John Foreman* (uses Excel)
• Blogs & News Feeds (FlowingData.com is a good one to start with)
• Podcasts
• Twitter – look for curated lists of people to follow
https://twitter.com/BecomingDataSci/lists/women-in-data-science/members
• Online courses like those on DataCamp, Codecademy, Coursera
• TED talks on Data http://www.ted.com/search?q=data
• Practice w/public data sets on sites like data.gov
• Volunteer opportunities via DataKind
• Ask me... I have plenty more!
Find me online:
@becomingdatasci “Data Science Renee” on Twitter
BecomingADataScientist.com
DataSciGuide.com
Becoming a Data Scientist Podcast & Learning Club
Questions? Or want a copy of these slides and links?
Renée Teate
renee@becomingadatascientist.com
@becomingdatasci on Twitter