everyone can do data science — import.io webinar

Post on 28-Nov-2014

Category: Data & Analytics

DESCRIPTION

Everyone can do data science with the help of tools such as import.io for visually scraping data from the web, Pandas for wrangling data in Python, and BigML for applying machine learning to data. In this presentation, I introduce what machine learning is before moving on to a case study where I show how to build a real estate pricing model. Check out import.io's webinar for the whole thing: http://blog.import.io/post/become-a-data-scientist-in-an-hour

TRANSCRIPT

Everyone can do data science

import.io webinar 23/9/14

Louis Dorard (@louisdorard)

US real estate portals:
- Realtor
- Zillow
- Trulia
- …

Bedrooms  Bathrooms  Surface (ft²)  Year built  Type       Price ($)
3         1          860            1950        house      565,000
3         1          1012           1951        house
2         1.5        968            1976        townhouse  447,000
4                    1315           1950        house      648,000
3         2          1599           1964        house
3         2          987            1951        townhouse  790,000
1         1          530            2007        condo      122,000
4         2          1574           1964        house      835,000
4                                   2001        house      855,000
3         2.5        1472           2005        house
4         3.5        1714           2005        townhouse
2         2          1113           1999        condo
1                    769            1999        condo      315,000

Let’s create a real estate pricing model

Fabien Durand (@thefabiendurand)

www.louisdorard.com/guest/everyone-can-do-data-science-importio

Data Science:
- domain knowledge
- hacking abilities
- machine learning

What the @#?~% is ML?

“Which type of email is this? - Spam/Ham” -> Classification

“How much is this house worth? - X $” -> Regression
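To make the distinction concrete, here is a minimal sketch using the "Type" and "Price ($)" columns of the real estate table as targets. The two functions are deliberately naive baselines (most common category, average number) rather than the models built in the webinar, and the names `classify` and `regress` are illustrative only. The point is the shape of the answer: a classifier returns a category, a regressor returns a number.

```python
from collections import Counter
from statistics import mean

# Targets copied from the real estate table: the "Type" column (a category)
# and the "Price ($)" column for the rows that list a price (a number).
types = ["house", "house", "townhouse", "house", "house", "townhouse", "condo",
         "house", "house", "house", "townhouse", "condo", "condo"]
prices = [565_000, 447_000, 648_000, 790_000, 122_000, 835_000, 855_000, 315_000]


def classify(labels):
    """Classification baseline: always answer with the most common category."""
    return Counter(labels).most_common(1)[0][0]


def regress(values):
    """Regression baseline: always answer with the average number."""
    return mean(values)


print(classify(types))   # a label drawn from a fixed set of categories
print(regress(prices))   # a numeric value
```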

ML is a set of AI techniques where “intelligence” is built by referring to examples
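One way to picture "referring to examples" is a nearest-neighbour lookup: to price a new property, find the most similar known properties and reuse their prices. The sketch below assumes surface is the only feature and uses the table rows that list both a surface and a price. The function name `predict_price` and the parameter `k` are illustrative, and this is not necessarily the technique BigML applies.

```python
# (surface in ft², price in $) pairs from the table rows that list both values.
EXAMPLES = [
    (860, 565_000), (968, 447_000), (1315, 648_000), (987, 790_000),
    (530, 122_000), (1574, 835_000), (769, 315_000),
]


def predict_price(surface, examples=EXAMPLES, k=1):
    """Average the prices of the k examples whose surface is closest."""
    nearest = sorted(examples, key=lambda ex: abs(ex[0] - surface))[:k]
    return sum(price for _, price in nearest) / len(nearest)


# The second table row (1012 ft²) has no listed price.
print(predict_price(1012))        # reuses the price of the closest example (987 ft²)
print(predict_price(1012, k=3))   # averages the three closest examples
```

No rule about prices is written by hand here: changing the examples changes the answers.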

(McKinsey & Co.)

“A significant constraint on realizing value from big data will be a shortage of talent, particularly of people with deep expertise in statistics and machine learning.”

(Bret Victor)

Making ML effortless

HTML / CSS / JavaScript

squarespace.com

The two phases of machine learning:

• TRAIN a model

• PREDICT with a model
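The two phases can be sketched in plain Python. This example assumes a straight-line model (price as a linear function of surface) fitted by ordinary least squares on the table rows that list both values. The function names `train` and `predict` and the dictionary layout of the model are illustrative choices, not part of any prediction API.

```python
def train(examples):
    """TRAIN: fit price = slope * surface + intercept by ordinary least squares."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    sxx = sum((x - mean_x) ** 2 for x, _ in examples)
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def predict(model, surface):
    """PREDICT: apply the fitted line to a new input."""
    return model["slope"] * surface + model["intercept"]


# (surface in ft², price in $) pairs from the table rows that list both values.
examples = [
    (860, 565_000), (968, 447_000), (1315, 648_000), (987, 790_000),
    (530, 122_000), (1574, 835_000), (769, 315_000),
]

model = train(examples)        # phase 1: done once, on known examples
print(predict(model, 1012))    # phase 2: done for each new input
```

Training happens once on examples with known prices. The resulting model can then be reused for any number of predictions.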

The two methods of prediction APIs:

• model = create_model(dataset)

• predicted_output = create_prediction(model, new_input)

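    from bigml.api import BigML

    # create a model
    # BigML() reads credentials from the BIGML_USERNAME and BIGML_API_KEY
    # environment variables
    api = BigML()
    source = api.create_source('training_data.csv')
    dataset = api.create_dataset(source)
    model = api.create_model(dataset)

    # make a prediction
    # new_input is not defined on the slide: a dict mapping field names to
    # values for the item to predict
    prediction = api.create_prediction(model, new_input)
    print("Predicted output value:", prediction['object']['output'])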

http://bit.ly/bigml_wakari

Recap

• Classification and regression

• 2 phases in ML: train and predict

• Prediction APIs make it easy to build models

• Let’s use them on real estate data to predict price from house characteristics

• Encoding domain knowledge

• Making our life easier: restricting data to only 1 city

BigML

• Look at data

• Split into training and test

• Build model from training

• Evaluate model on test

• Errors: mean absolute error (or percentage?)
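The workflow above can be sketched without any ML service. The example below assumes a toy model (a single average price per square foot) and a seeded random split. The names `split`, `build_model` and `evaluate` are illustrative, and with only seven usable table rows the resulting error figures are not meaningful. It reports both error measures mentioned in the last bullet: mean absolute error in dollars and mean absolute percentage error.

```python
import random

# (surface in ft², price in $) pairs from the table rows that list both values.
rows = [
    (860, 565_000), (968, 447_000), (1315, 648_000), (987, 790_000),
    (530, 122_000), (1574, 835_000), (769, 315_000),
]


def split(rows, test_fraction=0.3, seed=0):
    """Shuffle reproducibly, then hold out a fraction of the rows for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def build_model(training):
    """Toy model: one average price per square foot, learned from training rows."""
    return sum(p for _, p in training) / sum(s for s, _ in training)


def evaluate(rate, test):
    """Return (mean absolute error in $, mean absolute percentage error in %)."""
    errors = [abs(rate * s - p) for s, p in test]
    mae = sum(errors) / len(test)
    mape = 100 * sum(e / p for e, (_, p) in zip(errors, test)) / len(test)
    return mae, mape


training, test = split(rows)
rate = build_model(training)
mae, mape = evaluate(rate, test)
print(f"MAE: ${mae:,.0f}  MAPE: {mape:.1f}%")
```

The percentage error is often easier to interpret across price ranges: being off by $50,000 matters more on a $122,000 condo than on an $835,000 house.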

Other import.io + BigML use cases:
- Predict ebook rating from description
- Predict sales of Etsy stores

Talk at #APIconUK tomorrow in London

ML Algorithm API

Automated Pred. API

Text Classification API

Vertical Pred. API

Fixed-model Pred. API

ABSTRACTION

www.louisdorard.com/machine-learning-book

50% off for 24 hours with code “importio”

@louisdorard
