San Francisco Hacker News - Machine Learning for Hackers

Posted on 06-May-2015


DESCRIPTION

This was for the San Francisco Hacker News meetup in February at Engine Yard. It was intended as a basic intro to machine learning for people who want to step into the field. Video coming shortly.

TRANSCRIPT

Machine Learning for Hackers

Machine learning is how we make sense of big data.

Adam Gibson

2-27-2014 SFHN

BIG DATA & STATISTICS

• Statistics: group by, aggregate, count, average/mean, p-values, mode, correlations, exploration; < 100 variables.

• Machine learning: label this image, predict the next event, pick out the anomalies. In other words, learn from the data instead of just counting it, and group data by similarity; > 100 variables.
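The statistics side of that split is mostly aggregation. The following is a minimal sketch using only the Python standard library, not anything from the talk. The `events` records (country, purchase amount) are made-up data, and `group_by` and `summarize` are hypothetical helper names. Together they cover group by, count, mean and mode:

```python
from collections import defaultdict
from statistics import mean, mode

# Hypothetical event records: (country, purchase amount).
events = [
    ("US", 20.0),
    ("US", 35.0),
    ("DE", 6.0),
    ("US", 20.0),
    ("DE", 3.0),
    ("DE", 3.0),
]

def group_by(rows):
    """The 'group by' step: collect purchase amounts per country."""
    groups = defaultdict(list)
    for country, amount in rows:
        groups[country].append(amount)
    return dict(groups)

def summarize(groups):
    """The 'aggregate' step: count, mean and mode of each group's amounts."""
    return {
        key: {"count": len(vals), "mean": mean(vals), "mode": mode(vals)}
        for key, vals in groups.items()
    }

summary = summarize(group_by(events))
print(summary)
```

Nothing here is learned. Every number is a direct count or average of the rows, which is exactly the contrast with the machine learning bullet.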

What is data?!

Unstructured

Text

Video

Images

Time Series

Structured

SQL

XML

JSON

CSV

Many kinds of data. Wow.

Data scientists: "We know this, and just process it."
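"Just process it" often starts with getting different structured formats into one shape. The sketch below parses the same two records from CSV and from JSON into identical Python dicts. It assumes the standard-library `csv` and `json` modules. The field names `user` and `age` and the helpers `from_csv` and `from_json` are hypothetical:

```python
import csv
import io
import json

# Hypothetical inputs: the same two records, once as CSV and once as JSON.
CSV_TEXT = "user,age\nalice,34\nbob,27\n"
JSON_TEXT = '[{"user": "alice", "age": 34}, {"user": "bob", "age": 27}]'

def from_csv(text):
    """Parse CSV text into a list of dicts; CSV fields arrive as strings, so cast age."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"user": row["user"], "age": int(row["age"])} for row in reader]

def from_json(text):
    """Parse a JSON array of objects into the same list-of-dicts shape."""
    return [{"user": obj["user"], "age": int(obj["age"])} for obj in json.loads(text)]

records_csv = from_csv(CSV_TEXT)
records_json = from_json(JSON_TEXT)
print(records_csv == records_json)  # True
```

Once both sources share one record shape, the downstream analysis no longer cares where the data came from.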

WHAT DO MACHINES LEARN?

• Machine learning is a general tool that can work with various data types.

• Images = machine vision

• Text = natural-language processing

• Time series = prediction

• Facial recognition => security

• Text => customer profiles/recommendation engines

• Time series => stock-market trading platforms

• NLP => customer service
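As a toy illustration of "time series = prediction", the sketch below fits a least-squares straight line over the time index and extrapolates it one step ahead. The talk does not specify this technique. It was picked here only because it is the simplest predictor to read. The series values and the function names are hypothetical:

```python
def fit_line(ys):
    """Ordinary least squares for y = slope * t + intercept, with t = 0, 1, 2, ...

    Needs at least two observations, otherwise the variance of t is zero.
    """
    n = len(ys)
    ts = range(n)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    var = sum((t - t_mean) ** 2 for t in ts)
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return slope, intercept

def predict_next(ys):
    """Extrapolate the fitted line to the time step right after the last observation."""
    slope, intercept = fit_line(ys)
    return slope * len(ys) + intercept

# Hypothetical daily values with a steady upward trend.
series = [10.0, 12.0, 14.0, 16.0]
print(predict_next(series))  # 18.0
```

Real forecasting models handle seasonality and noise. The shape of the task is still the same: fit parameters to past points, then evaluate the fitted model at a future time.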

WHAT IS A DATA SCIENTIST?

Analyst

Exploratory analysis of data, typically on smaller data sets. Understands the algorithms and interprets the data.

Distributed Systems Engineer

Implements production data crunching; also known as the NoSQL person. Handles distributed systems and workloads, APIs, and perhaps even data collection and storage.

What kinds of machine learning are there?

Unsupervised: clustering (group things that are similar).

Supervised: label all the things! Predict the future! (Classification and regression; and remember, correlation != causation. Ring a bell?)
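The clustering idea ("group things that are similar") can be sketched with k-means on one-dimensional points. This is a minimal pure-Python sketch, not a production implementation. k-means is one common clustering algorithm chosen here for illustration. The data, the naive initialization (the first k points become the starting centroids) and the fixed iteration count are all assumptions. No labels are given: the groups come out of the data alone.

```python
def kmeans_1d(points, k, iterations=10):
    """Cluster 1-D points into k groups with Lloyd's k-means algorithm."""
    # Naive initialization: use the first k points as the starting centroids.
    centroids = list(points[:k])
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster,
        # keeping the old centroid if the cluster ended up empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical data with two obvious groups.
data = [1.0, 9.0, 1.5, 2.0, 8.0, 10.0]
centroids, clusters = kmeans_1d(data, k=2)
print(centroids)  # [1.5, 9.0]
print(clusters)   # [[1.0, 1.5, 2.0], [9.0, 8.0, 10.0]]
```

Libraries such as scikit-learn provide tuned, multi-dimensional versions with better initialization. The two alternating steps shown here are the core of the method.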

How does this affect me?

Targets ads at you

Recommends movies to you

Brings you search results

Recognizes your face in the camera

Drives your car

Automatically disables your credit card when you leave the country

I will leave who does this to your imagination
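The credit-card example is a form of anomaly detection ("pick out the anomalies"). Real fraud systems are far more involved and would look at signals such as location, not just amount. As a hedged toy sketch, the function below flags a purchase amount that lies far from a card's history using a z-score. The 3-standard-deviation threshold, the data and the function name `is_anomaly` are assumptions:

```python
from statistics import mean, stdev

def is_anomaly(history, amount, threshold=3.0):
    """Flag `amount` if it is more than `threshold` standard deviations from the history's mean."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs at least 2 values
    if sigma == 0:
        # Every past amount was identical, so any different amount stands out.
        return amount != mu
    return abs(amount - mu) / sigma > threshold

# Hypothetical past purchase amounts on one card.
history = [20.0, 25.0, 22.0, 18.0, 24.0, 21.0]
print(is_anomaly(history, 23.0))   # False
print(is_anomaly(history, 500.0))  # True
```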

Can I do this?

The shortcut is to start with the basics: for example, Google Analytics and understanding churn rate.
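Churn rate is a good first metric to compute by hand. Definitions vary between companies. The sketch below assumes one common version: the share of customers active at the start of a period who are gone by the end, ignoring sign-ups during the period. The customer IDs and the function name are hypothetical:

```python
def churn_rate(start_ids, end_ids):
    """Share of customers active at the start of the period who are gone by the end."""
    start = set(start_ids)
    if not start:
        raise ValueError("need at least one customer at the start of the period")
    churned = start - set(end_ids)  # new sign-ups during the period are ignored
    return len(churned) / len(start)

start_of_month = ["a", "b", "c", "d", "e"]
end_of_month = ["a", "c", "d", "f"]  # "b" and "e" left; "f" is new
print(churn_rate(start_of_month, end_of_month))  # 0.4
```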

Pick up a more advanced understanding after that if it still seems interesting.

If you are into backends, start with distributed systems, and get your math basics up far enough to understand what the guy on the other side of the table, the one asking you to put the algorithm into production, is saying.

Resources

Coursera Machine Learning

Reddit Machine Learning

DataTau (hacker news for data scientists)

Stanford Machine Learning (more mathy)

Analysts

http://scikit-learn.org/stable/

Julia Lang

R Lang

Data Engineers

Spark

Hadoop

Storm

Hadoop QuickStart VM
