data science 101

Post on 24-Feb-2016



Data Science 101

A Love Story

Agenda

• Introduction to Data Science
• Who’s who in Data Science?
• That Data Science Life.
• [Case Study] How Spotify manages their data.
• [VM] The Data Science life at VaynerMedia.
• Conclusions.

“If you can measure it, you can hack it.”

E -> A -> E

We’re generating (and tracking) exponentially more data online than ever before.

Big Data is big.

5,000,000,000 GB/2 Days

We’re always playing catch-up.

“Innovative Solutions” > “Industry Standards”

Data Scientists are “Innovative Problem Solvers”

I get it. “Big Data” is real, and Data Scientists are awesome.

But what is a Data Scientist? Who are they, and how do they work with “Big Data”?

VM

DJ Patil is a huge influencer in this space.

Why is DJ Patil so popular?

LinkedIn and People You May Know

Angel has 2 mutual friends with Vikash. Tim has 20 mutual friends with Vikash.

If John is friends with Vikash, he might know Tim and his mutual friends.
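The mutual-friends idea behind People You May Know can be sketched in a few lines of Python. This is only an illustration, not LinkedIn's actual method: the `suggest_connections` function and the toy network below are assumptions made up for the example, and the counts do not match the numbers on the slide.

```python
from collections import Counter


def suggest_connections(friends, person):
    """Rank people `person` is not yet connected to by mutual-friend count.

    `friends` maps each name to the set of names they are connected to.
    This is a hypothetical helper for illustration only.
    """
    direct = friends.get(person, set())
    mutual_counts = Counter()
    for friend in direct:
        for candidate in friends.get(friend, set()):
            # Skip the person themselves and anyone they already know.
            if candidate != person and candidate not in direct:
                mutual_counts[candidate] += 1
    # most_common() returns (name, count) pairs, highest count first.
    return mutual_counts.most_common()


# Toy network, invented for this example.
friends = {
    "John": {"Vikash", "Ana"},
    "Vikash": {"John", "Tim", "Angel"},
    "Ana": {"John", "Tim"},
    "Tim": {"Vikash", "Ana"},
    "Angel": {"Vikash"},
}

suggestions = suggest_connections(friends, "John")
print(suggestions)
```

In this toy network Tim shares two of John's friends and Angel shares one, so Tim is suggested first. The more mutual friends, the stronger the suggestion.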

This increased platform usage, making the experience on LinkedIn more valuable.

Active Users = selling point for LinkedIn when pitching to Brands.

Leg up to users looking for employment in the informal job market.

Big Data. Real Business objective.

Simple Analysis. Valuable Data-driven Product.

“Patil Effect”

VM analysts do the same thing; we just don’t use the same tools.

10^100

Google started downloading the entire internet in the late 90s-early 00s.

“It’s not you, it’s me.” - Google

Google created a better way to process Big Data. They created MapReduce.
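To make the MapReduce idea concrete, here is a minimal single-machine sketch of its three steps (map, shuffle, reduce) using the classic word-count example. It is a teaching sketch under simplifying assumptions, not Google's implementation: the function names are made up for this example, and a real system would spread each phase across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document could be mapped on a different machine; here we
    # simply chain the phases together in one process.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "data science is fun"])
print(counts)
```

Because every map call and every reduce call only looks at its own slice of the data, the work can be split across as many machines as the data requires.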

Yahoo! wanted to download the internet too.

They liked MapReduce so much that they created Hadoop.

Hadoop is an open-source framework that pairs a distributed file system (HDFS) with an implementation of MapReduce.

Hive: developed by the folks over at Facebook.

Hive is a data “warehouse” tool built to query data stored in Hadoop systems.

Querying this data also allows us to work on our data retrieval skills.

Less time cleaning data. Less time “fishing”. Fewer spreadsheets.

BOOM.

Amazon Web Services makes computing data in the cloud easy and cheap.

No need for huge data centers on site.

Pay for what you use.

Makes it easy to move data around in the cloud.

How does a company actually use all of these cool tools?

Spotify Client

AWS EMR (Hadoop)

PostgreSQL

Hive (data warehouse infrastructure; SQL-like syntax)

Ad hoc MapReduce jobs

How does all of this fit into VaynerMedia?

VM

Where do analysts fall under the VM umbrella?

Optimizing Content. Optimizing Ad Spends.

Understanding Overall Trends.

We could also develop data-driven products.

Business Objective(s):

- How are we doing against our competitors/ourselves?

- How is our content performing this week?

Math Skills: How do we calculate engagements appropriately? What are my KPIs?

Hacking Skills: How do I get a hold of all of the public data needed for the analysis?
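As one possible answer to the Math Skills question, the sketch below computes an engagement rate per post and ranks posts by it. The formula (likes + comments + shares, divided by impressions) is an assumption chosen for illustration, not a definition from the deck, and the post numbers are invented sample data.

```python
def engagement_rate(likes, comments, shares, impressions):
    """Engagements per impression (one assumed definition of the KPI)."""
    if impressions <= 0:
        # Avoid dividing by zero for posts that were never shown.
        return 0.0
    return (likes + comments + shares) / impressions


# Invented sample data, for illustration only.
posts = [
    {"id": "post_a", "likes": 120, "comments": 30, "shares": 50, "impressions": 10000},
    {"id": "post_b", "likes": 40, "comments": 5, "shares": 5, "impressions": 1000},
]

# Rank posts from highest to lowest engagement rate.
ranked = sorted(
    posts,
    key=lambda p: engagement_rate(
        p["likes"], p["comments"], p["shares"], p["impressions"]
    ),
    reverse=True,
)

for post in ranked:
    rate = engagement_rate(
        post["likes"], post["comments"], post["shares"], post["impressions"]
    )
    print(f"{post['id']}: {rate:.2%}")
```

Note that post_a has four times the raw engagements of post_b but ranks lower once impressions are taken into account, which is why the choice of KPI matters.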

We can also apply a similar methodology to ads.

Trending topics in real time.

Big Picture

Top Phrases available in API in real time.

Demographic information is also available.

Other data points attached to stories.

Using the Bit.ly API, we can pull all of this data. Using R, we can analyze the data.

We can adjust our targeting buckets in real time.

It doesn’t matter what we do, as long as we develop our core skills.

We don’t need all of the cool tools that large companies use to be called “Data Scientists”.

A carpenter isn’t judged by the tools he uses, but by the things he builds.

Data Science is a method of problem solving.

We are Data Scientists.

Questions/Comments?
