San Francisco Hacker News - Machine Learning for Hackers
DESCRIPTION
This was for the San Francisco Hacker News meetup in February at Engine Yard. It was intended as a basic intro to machine learning for people who wanted to step into the field. Video coming shortly.
TRANSCRIPT
Machine Learning for Hackers
Machine learning is how we make sense of big data.
Adam Gibson
2-27-2014 SFHN
BIG DATA & STATISTICS
• Statistics – group by, aggregate, count, average, mean, p-values, mode, correlations, exploring; < 100 variables
• Machine Learning – label this image, predict the next event, pick out the anomalies (aka learn from data, not count it), group data by similarities; > 100 variables
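The statistics bullet is about describing data: group it, aggregate it, count it. A minimal sketch of that workflow in Python (the language scikit-learn, listed under Tools, is built for), using only the standard library; the city/order data is invented for illustration:

```python
from collections import defaultdict
from statistics import mean, mode

# Hypothetical sample data: (city, order_amount) pairs, invented for illustration.
orders = [
    ("SF", 20.0), ("SF", 35.0), ("SF", 20.0),
    ("NYC", 50.0), ("NYC", 10.0), ("NYC", 50.0),
]

def summarize(rows):
    """Group rows by key, then aggregate each group: count, mean, mode."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {
        key: {"count": len(values), "mean": mean(values), "mode": mode(values)}
        for key, values in groups.items()
    }

summary = summarize(orders)
print(summary["SF"])  # {'count': 3, 'mean': 25.0, 'mode': 20.0}
```

Nothing here is learned: every number is a direct summary of rows already seen, which is the contrast the slide draws with machine learning.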
What is data?!
Unstructured
• Text
• Video
• Images
• Time Series
Structured
• SQL
• XML
• JSON
• CSV
Many kinds of data. Wow.
Data scientists: "We know this, and just process it."
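"Just process it" in practice often means parsing different formats into one common shape. A small sketch using Python's standard `csv` and `json` modules, assuming two hypothetical records stored once as CSV and once as JSON:

```python
import csv
import io
import json

# The same two hypothetical records, once as CSV and once as JSON.
CSV_TEXT = "name,age\nada,36\ngrace,45\n"
JSON_TEXT = '[{"name": "ada", "age": 36}, {"name": "grace", "age": 45}]'

def from_csv(text):
    """Parse CSV into dicts; CSV values are all strings, so convert 'age' explicitly."""
    rows = csv.DictReader(io.StringIO(text))
    return [{"name": r["name"], "age": int(r["age"])} for r in rows]

def from_json(text):
    """JSON already distinguishes numbers from strings, so json.loads gives typed values."""
    return json.loads(text)

records_csv = from_csv(CSV_TEXT)
records_json = from_json(JSON_TEXT)
print(records_csv == records_json)  # True: both formats yield identical records
```

Once both sources are in the same list-of-dicts form, downstream analysis no longer cares which format the data arrived in.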
WHAT DO MACHINES LEARN?
• Machine learning is a general tool that can work with various data types.
• Images = machine vision
• Text = natural-language processing
• Time series = prediction
• Facial recognition => security
• Text => customer profiles / recommendation engines
• Time series => stock-market trading platforms
• NLP => customer service
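For "time series = prediction", real forecasting uses much richer models. As a minimal sketch, assume the series follows a straight-line trend: fit it by ordinary least squares and extrapolate one step ahead. The series values are invented:

```python
def fit_line(ys):
    """Ordinary least-squares fit of y = a + b*t, with t = 0, 1, ..., n-1.

    Needs at least two points, otherwise the variance of t is zero.
    """
    n = len(ys)
    ts = range(n)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    var = sum((t - t_mean) ** 2 for t in ts)
    b = cov / var          # slope: average change per time step
    a = y_mean - b * t_mean  # intercept: value at t = 0
    return a, b

def predict_next(ys):
    """Extrapolate the fitted trend one step past the last observation."""
    a, b = fit_line(ys)
    return a + b * len(ys)

# Hypothetical daily values, invented for illustration.
series = [10.0, 12.0, 14.0, 16.0]
print(predict_next(series))  # 18.0
```

A trading platform would not stop at a straight line, but the shape of the task is the same: learn parameters from past points, then predict the next one.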
WHAT IS A DATA SCIENTIST?
Analyst
• Exploratory analysis of data, typically on smaller data sets.
• Understands the algorithms and interprets data.
Distributed Systems Engineer
• Implements production data crunching; also known as the "NoSQL person". Handles distributed systems and workloads, APIs, and perhaps even data collection and storage.
What kinds of machine learning are there?
Unsupervised – Clustering (group things that are similar)
Supervised – Label all the things! Predict the future! (classification and regression; correlation != causation, ring a bell?)
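The two families differ in whether the training data carries labels. A minimal pure-Python sketch of one textbook algorithm from each: k-means clustering (unsupervised) and a 1-nearest-neighbor classifier (supervised). The points and labels are invented; scikit-learn, listed under Tools, provides production versions of both (`KMeans`, `KNeighborsClassifier`):

```python
import math

def kmeans(points, k, iterations=10):
    """Unsupervised: split unlabeled points into k groups (Lloyd's algorithm).

    Centroids start at the first k points, a simple deterministic choice;
    libraries commonly use smarter initialization such as k-means++.
    """
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its closest centroid.
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

def nearest_neighbor_label(labeled, query):
    """Supervised: predict the label of the closest labeled example (1-NN)."""
    _point, label = min(labeled, key=lambda pair: math.dist(pair[0], query))
    return label

# Hypothetical 2-D data, invented for illustration.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (8.5, 9.0), (0.5, 1.5), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)  # [(1.0, 1.5), (8.5, 8.333333333333334)]

labeled = [((1.0, 1.0), "small"), ((9.0, 9.0), "large")]
print(nearest_neighbor_label(labeled, (2.0, 1.0)))  # small
```

k-means never sees a label and only discovers that the points fall into two groups; the classifier needs labeled examples up front and can then label new points.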
How does this affect me?
Ad Targeting
Recommends movies to you
Brings you search results
Recognizes your face in the camera
Drives your car
Automatically disables your credit card when you leave the country
I will leave who does this to your imagination
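The credit-card example is a case of "pick out the anomalies". The sketch below is a hand-written rule, not a trained model, and the history format and the 3-standard-deviation threshold are assumptions for illustration; a learned model would infer such boundaries from data instead of hard-coding them:

```python
from statistics import mean, stdev

def is_suspicious(history, amount, country, z_threshold=3.0):
    """Flag a transaction as anomalous relative to the cardholder's history.

    history: list of (amount, country) pairs from past transactions (hypothetical format).
    Flags if the country never appeared before, or if the amount is more than
    z_threshold standard deviations away from the historical mean.
    """
    past_amounts = [a for a, _ in history]
    past_countries = {c for _, c in history}
    if country not in past_countries:
        return True
    spread = stdev(past_amounts)  # sample standard deviation; needs at least two past transactions
    if spread == 0:
        return amount != past_amounts[0]
    z = abs(amount - mean(past_amounts)) / spread
    return z > z_threshold

# Hypothetical transaction history, invented for illustration.
history = [(20.0, "US"), (25.0, "US"), (22.0, "US"), (30.0, "US"), (18.0, "US")]
print(is_suspicious(history, 24.0, "US"))   # False: typical amount, familiar country
print(is_suspicious(history, 24.0, "FR"))   # True: first purchase abroad
print(is_suspicious(history, 500.0, "US"))  # True: amount far outside the usual range
```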
Can I do this?
The shortcut here is to start with the basics, for example Google Analytics and understanding churn rate.
Pick up a more advanced understanding after that if it still seems interesting.
If you are into backends, start with distributed systems, and get your math basics up enough to understand what the guy on the other side of the table, who is asking you to put the algorithm into production, is saying.
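Churn rate, one of the basics suggested above, has a simple common definition: customers lost during a period divided by customers at the start of that period. A minimal sketch with invented numbers (other variants, such as averaging over the period, also exist):

```python
def churn_rate(customers_at_start, customers_lost):
    """Fraction of the customers present at the start of a period who left during it.

    New customers acquired during the period are deliberately excluded.
    """
    if customers_at_start <= 0:
        raise ValueError("need at least one customer at the start of the period")
    return customers_lost / customers_at_start

# Hypothetical month: 200 subscribers on day one, 15 cancelled by month end.
print(churn_rate(200, 15))  # 0.075
```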
Resources
Coursera Machine Learning
Reddit Machine Learning
DataTau (Hacker News for data scientists)
More mathy: Stanford Machine Learning
Tools
Analysts
http://scikit-learn.org/stable/
Julia Lang
R Lang
Data Engineers
Spark
Hadoop
Storm
Hadoop QuickStart VM
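Hadoop's core programming model is MapReduce. The pure-Python sketch below runs the map, shuffle, and reduce phases of the classic word-count job in a single process; it does not use Hadoop or any of its APIs, and on a real cluster each phase would run in parallel across machines:

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())

# Hypothetical input documents, invented for illustration.
docs = ["big data big models", "big hadoop"]
print(word_count(docs))  # {'big': 3, 'data': 1, 'models': 1, 'hadoop': 1}
```

Because map works on one document at a time and reduce on one key at a time, both phases can be split across machines; only the shuffle needs to move data between them.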