python for data science

PYTHON FOR DATA SCIENCE Gabriel Moreira Machine Learning Engineer @gspmoreira

Post on 28-Jul-2015

Page 1: Python for Data Science

PYTHON FOR DATA SCIENCE

Gabriel Moreira
Machine Learning Engineer

@gspmoreira

Page 2: Python for Data Science

https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century

Page 3: Python for Data Science

Why so much buzz?

Page 4: Python for Data Science

Big Data

Page 5: Python for Data Science

WHERE IS DATA SCIENCE BEING USED?

http://www.kdnuggets.com/2014/12/where-analytics-data-mining-data-science-applied.html

Page 6: Python for Data Science

RECOMMENDATIONS EVERYWHERE

Page 7: Python for Data Science

WHAT IS DATA SCIENCE

http://drewconway.com

Page 8: Python for Data Science

WHAT IS A DATA SCIENTIST?

http://www.datasciencecentral.com/profiles/blogs/are-you-a-data-scientist

A Data Scientist is someone with deliberate dual personality who can first build a curious business case defined with a telescopic vision and can then dive deep with microscopic lens to sift through DATA to reach the goal while defining and executing all the intermittent tasks.

Page 9: Python for Data Science

WHAT IS A DATA SCIENTIST?

Data scientists explore and transform data in novel ways to create and publish new features and combine data from diverse sources to create new value. Data scientists make visualizations with researchers, engineers, web developers, and designers to expose raw, intermediate, and refined data early and often.

Applied researchers solve the heavy problems that data scientists uncover and that stand in the way of delivering value. These problems take intense effort and require novel methods from statistics and machine learning.

[Agile Data Science, O’Reilly, 2014]

Page 10: Python for Data Science

Data Science MetroMap Curriculum

http://nirvacana.com/thoughts/becoming-a-data-scientist/

Page 11: Python for Data Science
Page 12: Python for Data Science

IS DATA SCIENTIST THE NEW WEBMASTER?

Page 13: Python for Data Science

[Doing Data Science, O’Reilly, 2014]

Page 14: Python for Data Science

DATA SCIENCE IS IOSEMN

Inquire, Obtain, Scrub, Explore, Model, iNterpret

[Hilary Mason, Data Scientist]

Page 15: Python for Data Science

What about Python?

Page 16: Python for Data Science

PYTHON IS IOSEMN

Inquire, Obtain, Scrub, Explore, Model, iNterpret

Page 17: Python for Data Science

ANALYSIS CASE: CORPORATE SOCIAL NETWORKS

Page 18: Python for Data Science

Inquire, Obtain, Scrub, Explore, Model, iNterpret

Page 19: Python for Data Science

INQUIRE

1. Which communities are most popular?

2. Is the engagement of users in corporate communities increasing?

3. What is the distribution of post publishing times during the day?

4. What percentage of posts receive interactions (likes and comments)?

5. How are likes distributed across users?

6. Is there a relationship between publishing hour and number of interactions?

7. Which communities are most engaging (highest average interactions per post)?

8. What are the most relevant words in the posts?

9. How can posts about similar subjects be grouped?
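Several of these questions reduce to simple aggregations. The sketch below is a minimal illustration using only the standard library. It assumes a hypothetical list of post records: the field names (`community`, `hour`, `likes`, `comments`) and the sample values are invented here and are not the talk's dataset. It also assumes that post count is an acceptable proxy for popularity.

```python
from collections import Counter, defaultdict

# Hypothetical sample posts; field names and values are illustrative only.
posts = [
    {"community": "python", "hour": 9, "likes": 4, "comments": 2},
    {"community": "python", "hour": 14, "likes": 1, "comments": 0},
    {"community": "agile", "hour": 9, "likes": 0, "comments": 1},
    {"community": "agile", "hour": 18, "likes": 2, "comments": 2},
    {"community": "python", "hour": 9, "likes": 3, "comments": 3},
]


def posts_per_community(posts):
    """Question 1: count posts per community (assumed popularity proxy)."""
    return Counter(p["community"] for p in posts)


def posts_per_hour(posts):
    """Question 3: distribution of posts over the publishing hour."""
    return Counter(p["hour"] for p in posts)


def avg_interactions_per_community(posts):
    """Question 7: mean (likes + comments) per post, by community."""
    totals = defaultdict(int)
    counts = Counter()
    for p in posts:
        totals[p["community"]] += p["likes"] + p["comments"]
        counts[p["community"]] += 1
    return {c: totals[c] / counts[c] for c in counts}


if __name__ == "__main__":
    print(posts_per_community(posts).most_common())
    print(sorted(posts_per_hour(posts).items()))
    print(avg_interactions_per_community(posts))
```

For a real dataset the same aggregations are usually done with a data-frame library, but the logic stays the same: group, count, and average.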

Page 20: Python for Data Science

Inquire, Obtain, Scrub, Explore, Model, iNterpret

Page 21: Python for Data Science

OBTAIN

• Download data from another location (e.g., a web page or server)

• Query data from a database (e.g., MySQL or Oracle)

• Extract data from an API (e.g., Twitter, Facebook)

• Extract data from another file (e.g., an HTML file or spreadsheet)

• Generate data yourself (e.g., reading sensors or taking surveys)
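Two of the options above, querying a database and extracting data from a file, can be sketched with the standard library alone. This is an illustration under stated assumptions: SQLite (in memory) stands in for MySQL or Oracle, and the `posts` table, its columns, and the sample rows are hypothetical.

```python
import csv
import io
import sqlite3


def make_demo_db():
    """Build an in-memory database with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (author TEXT, likes INTEGER)")
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?)",
        [("ana", 5), ("bruno", 1), ("carla", 3)],
    )
    return conn


def query_posts(conn, min_likes):
    """Query rows from the hypothetical `posts` table, most liked first."""
    cur = conn.execute(
        "SELECT author, likes FROM posts WHERE likes >= ? ORDER BY likes DESC",
        (min_likes,),  # parameter binding avoids building SQL by hand
    )
    return cur.fetchall()


def read_csv_rows(text):
    """Extract records from CSV text, e.g. an exported spreadsheet."""
    return list(csv.DictReader(io.StringIO(text)))


if __name__ == "__main__":
    conn = make_demo_db()
    print(query_posts(conn, 3))
    print(read_csv_rows("author,likes\nana,5\nbruno,1\n"))
```

Note that `csv.DictReader` returns every value as a string, so numeric columns still need converting in the Scrub step.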

Page 22: Python for Data Science

TWITTER PUBLIC STREAM API

Page 23: Python for Data Science

Inquire, Obtain, Scrub, Explore, Model, iNterpret

Page 24: Python for Data Science

Show me the code!

Page 25: Python for Data Science

Data Analysis with IPython Notebook Demo bit.ly/python4ds_nb

Page 26: Python for Data Science

Inquire, Obtain, Scrub, Explore, Model, iNterpret

Page 27: Python for Data Science

INTERPRET

•Drawing conclusions from your data

•Evaluating what your results mean

•Communicating your results

Page 28: Python for Data Science

DATA PRODUCTS

"If information has context and the context is interactive, insights are not predictable."

[Agile Data Science, O’Reilly, 2014]

Page 29: Python for Data Science

SENTIMENT ANALYSIS

bit.ly/eleicoes2014debatesbt

Analytical Dashboard

Page 30: Python for Data Science

SENTIMENT ANALYSIS

Analytical Dashboard

bit.ly/eleicoes2014debatesbt

Page 31: Python for Data Science

SENTIMENT ANALYSIS

Dashboard Online - JavaScript

Page 32: Python for Data Science

NETWORK ANALYSIS

https://linkedjazz.org

Page 33: Python for Data Science

What about Python for Big Data?

Page 34: Python for Data Science

PYTHON IN HADOOP

• Hadoop Streaming - Allows MapReduce jobs from any executable script - including Python!

Example using AWS Elastic MapReduce: http://workingsweng.com.br/2014/04/clusterizando-raios-com-hadoop-e-k-means-em-map-reduce/

• Other supporting options for Python in Hadoop

Hadoopy

Pig UDFs in Jython
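The Hadoop Streaming contract is simple: a mapper reads lines and emits tab-separated key/value records, the framework sorts them by key, and a reducer aggregates consecutive records that share a key. The sketch below is a word count that simulates this flow in a single process. It is not the example from the linked post, and the function names (`mapper`, `reducer`, `run_local`) are chosen here for illustration.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts of each word; records must arrive sorted by key."""
    parsed = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_local(lines):
    """Simulate map -> sort (shuffle) -> reduce without a cluster."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    for record in run_local(["big data", "Big Python data"]):
        print(record)
```

In an actual streaming job, each function would live in its own executable script that reads `sys.stdin` and prints records to standard output; Hadoop performs the sort between the two.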

Page 35: Python for Data Science

THE NEXT-GEN DATA SCIENTIST

The best minds of my generation are thinking about how to make people click ads... That sucks. [Jeff Hammerbacher]

Next-gen data scientists don’t try to impress with complicated algorithms and models that don’t work. They spend a lot more time trying to get data into shape than anyone cares to admit - maybe up to 90% of their time. Finally, they don’t find religion in tools, methods, or academic departments. They are versatile and interdisciplinary.

[Doing Data Science, O’Reilly, 2014]

Page 36: Python for Data Science

DATA SCIENCE COURSES

• Introduction to Data Science (Univ. of Washington)

• Data Science specialization (Johns Hopkins)

• Intro to Hadoop and MapReduce (Cloudera)

• Machine Learning (Stanford)

• Statistical Learning (Stanford)

http://workingsweng.com.br/2014/04/cursos-mooc-e-especializacoes-em-data-science/

Page 37: Python for Data Science

BOOKS

Page 38: Python for Data Science

The road can be challenging

Page 39: Python for Data Science

But it may be fun!

Page 40: Python for Data Science

Gabriel Moreira
@gspmoreira
https://about.me/gspmoreira

Thank you!