Post on 22-May-2020


Page 1: Bike Sharing Demand - IA · -The kaggle forum, lot of code sharing about solutions. -Datascience.net, the « french kaggle ».-Paris Machine Learning Meetup (one every month) -Pandas

Bike Sharing Demand

www.dataiku.com

Goal: forecast use of a city bikeshare system.

Get the Data!

http://www.kaggle.com/c/bike-sharing-demand/data

Load it into DSS (Dataiku Data Science Studio)

TRAIN - 10,886 rows

INPUT COLUMNS:
- Datetime
- Season
- Holiday
- Workingday
- Weather
- Temperature
- Atemp ("feels like" temperature)
- Humidity
- Windspeed

ADDITIONAL COLUMNS (not in test):
- Casual - number of non-registered user rentals initiated
- Registered - number of registered user rentals initiated

TARGET:
- Count - number of total rentals

Make your first preparation script.

- Parse the date to extract its components (year, month, day, ...).
- Remove the columns registered and casual.
- Create new variables…
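Outside DSS, the same preparation can be sketched in plain Python. This is only an illustration, not the DSS script itself: the helper name `prepare_row`, the lowercase column names, and the timestamp format `%Y-%m-%d %H:%M:%S` are assumptions about the Kaggle CSV, and the sample record is made up.

```python
from datetime import datetime

# Columns that exist only in the training file (assumed lowercase names).
TRAIN_ONLY_COLUMNS = ("casual", "registered")


def prepare_row(row):
    """Return a copy of one raw record with date parts added and
    the train-only columns removed."""
    # Assumed timestamp format, e.g. "2011-01-01 05:00:00".
    ts = datetime.strptime(row["datetime"], "%Y-%m-%d %H:%M:%S")
    out = {k: v for k, v in row.items() if k not in TRAIN_ONLY_COLUMNS}
    out.update(
        year=ts.year,
        month=ts.month,
        day=ts.day,
        hour=ts.hour,
        dayofweek=ts.weekday(),  # Monday is 0, Sunday is 6
    )
    return out


# Made-up record, for illustration only.
sample = {
    "datetime": "2011-01-01 05:00:00",
    "season": "1",
    "casual": "0",
    "registered": "1",
    "count": "1",
}
prepared = prepare_row(sample)
```

Casual and registered are dropped because they are missing from the test set, so a model trained on them could not be applied there.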

Create a recipe

Use this to « industrialize » the rebuild of this output.

So now, you are ready to make your first model

Our philosophy at Dataiku: go fast on data cleaning and boring tasks to leave plenty of time for the modeling part! :)

Wait… let’s analyse our data a little first!

Let’s plot the average count of bikes taken per hour, on working days and on weekends.

Add sliders on weather conditions.
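The numbers behind such a chart are a simple group-by. Here is a minimal sketch in plain Python, assuming each prepared record carries `workingday`, `hour` and `count` fields; the function name and the sample values are invented for illustration. Note that a `workingday` value of 0 would cover holidays as well as weekends.

```python
from collections import defaultdict


def mean_count_by_hour(rows):
    """Map (workingday, hour) to the average rental count."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of counts, number of rows]
    for r in rows:
        key = (r["workingday"], r["hour"])
        totals[key][0] += r["count"]
        totals[key][1] += 1
    return {key: total / n for key, (total, n) in totals.items()}


# Invented values, for illustration only.
rows = [
    {"workingday": 1, "hour": 8, "count": 300},
    {"workingday": 1, "hour": 8, "count": 500},
    {"workingday": 0, "hour": 8, "count": 60},
    {"workingday": 0, "hour": 14, "count": 400},
]
averages = mean_count_by_hour(rows)
```

Plotting one line per `workingday` value against `hour` then gives the two curves to compare.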

Go create a new model and score the test dataset.

- Test some algorithms.
- Understand the different evaluation metrics. (The leaderboard is scored with RMSLE.)
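RMSLE is the root mean squared logarithmic error: the square root of the mean of (log(p + 1) - log(a + 1))² over all rows, where p is the predicted count and a the actual count. A small sketch to compute it locally (the function name `rmsle` is just a choice made here):

```python
import math


def rmsle(actual, predicted):
    """Root mean squared logarithmic error between two equal-length
    sequences of non-negative numbers."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2
        for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))
```

Because of the logarithm, the metric penalizes relative errors: predicting 20 instead of 10 costs about as much as predicting 200 instead of 100. Predictions must not be negative.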

Run the model on the test dataset

Make your first submission.

- Reshape it as required by Kaggle and download it.
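As a rough sketch of the reshaping step, assuming the expected file has two columns named `datetime` and `count` (check the sample submission on the competition page to confirm); the helper name and the example values are made up. Negative predictions are clipped to zero here, since RMSLE is not defined for them.

```python
import csv
import io


def write_submission(datetimes, predictions, fh):
    """Write one 'datetime,count' line per prediction to an open text file."""
    writer = csv.writer(fh, lineterminator="\n")
    writer.writerow(["datetime", "count"])  # assumed header
    for ts, pred in zip(datetimes, predictions):
        writer.writerow([ts, max(0.0, pred)])  # clip negative predictions


# Made-up predictions, written to an in-memory buffer for illustration.
buf = io.StringIO()
write_submission(
    ["2011-01-20 00:00:00", "2011-01-20 01:00:00"],
    [12.5, -3.0],
    buf,
)
submission_text = buf.getvalue()
```

To produce a real file, pass `open("submission.csv", "w", newline="")` instead of the buffer.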

See your performance on the leaderboard!

Next: improve your models

RE-BUILD

RE-RE-BUILD…

- Run more models.
- Modify the model code in the IPython notebook.
- Explore the scikit-learn documentation…

Some resources you should check:

- The Kaggle forum: lots of code sharing about solutions.
- Datascience.net, the « French Kaggle ».
- Paris Machine Learning Meetup (one every month).
- Pandas (Python for data analysis).
- Look for nbviewer on Google or Twitter.
- Check the news on DataTau.
- Check the awesome-machine-learning list.
- Some cool blogs: dataiku, yhat, datarobot, fastML, fulmicoton.com
- About dev & viz: D3js, AngularJS, Flask