An Introduction to Predictive Analytics with Big Data and Open Source tools Joe Heary CTO & VP of Technical Operations Zimmerman Associates, Inc. (ZAI) November 5, 2015


Page 1

An Introduction to Predictive Analytics with Big Data and Open Source tools

Joe Heary
CTO & VP of Technical Operations
Zimmerman Associates, Inc. (ZAI)

November 5, 2015

Page 2

What Is Predictive Analytics?

“A variety of statistical techniques from modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future, or otherwise unknown, events.” - Wikipedia

11/5/2015 Leveraging Data to Lead 2

Page 3

Predicting the Future

• Not really about “predicting the future”
• About using Data, Statistical Models, and Machine Learning to identify the likelihood of future outcomes, on which we base decisions
• Produces new insights that lead to better actions
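One way to turn a likelihood into a decision is an expected-value rule: act only when the probability of an outcome times its payoff exceeds the cost of acting. This is a minimal sketch under stated assumptions, not a method taken from these slides; it borrows the $220-per-sale and $2-per-mailing figures from the marketing campaign example later in this deck, and `should_mail` is a hypothetical helper.

```python
def should_mail(p_response: float, revenue_per_sale: float, cost_per_contact: float) -> bool:
    """Hypothetical decision rule: contact a prospect only if expected revenue exceeds cost."""
    return p_response * revenue_per_sale > cost_per_contact


# Illustrative figures (assumed, borrowed from the marketing example): $220 per sale, $2 per mailing.
REVENUE, COST = 220.0, 2.0

# The lowest predicted likelihood at which mailing pays for itself.
break_even = COST / REVENUE
print(f"Break-even response probability: {break_even:.2%}")

# Two hypothetical predicted likelihoods, one below and one above break-even.
for p in (0.005, 0.03):
    print(f"P(response) = {p:.1%} -> mail? {should_mail(p, REVENUE, COST)}")
```

The model supplies the likelihood; the decision comes from weighing that likelihood against costs and payoffs.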


Page 4

Machine Learning

• Evolved from pattern recognition and computational learning theory in artificial intelligence
• Construction of algorithms that can learn from data
• Algorithms build models from example inputs to make data-driven predictions, rather than following static program instructions


Siegel, E. (2013). Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die. Hoboken: Wiley
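To make "algorithms build models from example inputs" concrete, here is a minimal sketch of one such algorithm: logistic regression with a single feature, fitted by batch gradient descent in plain Python. The training data (hours of engagement vs. whether a prospect bought) is invented for illustration and the function names are hypothetical. In practice you would use an established package such as R or a machine learning library rather than hand-rolled code.

```python
import math


def sigmoid(z: float) -> float:
    """Squash a real-valued score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weight w and bias b by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # predicted probability minus actual label
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Hypothetical examples: hours of engagement (input) and whether the prospect bought (label).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 5.0]
bought = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(hours, bought)

# The learned model now predicts for inputs it never saw during training.
for h in (1.0, 4.5):
    print(f"{h} hours -> P(buy) = {sigmoid(w * h + b):.2f}")
```

No rule such as "buy if hours > 2.5" is written anywhere. The boundary comes from the examples.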

Page 5

What is Big Data?

“Big data is a collection of data from traditional and digital sources inside and outside your company that represents a source for ongoing discovery and analysis.”

-- Lisa Arthur, Forbes / CMO Network


Big Data is commonly described in terms of five Vs:
• VOLUME: the amount of data being generated
• VARIETY: the types of data (pictures, videos, text, audio, etc.)
• VELOCITY: the speed at which data is created or changes
• VERACITY: the trustworthiness and accuracy of the data
• VALUE: the relative value of the data to an organization

Page 6

Big Data Due to a Convergence of…

• Moore’s Law
• Mobile Computing
• Social Networking
• Cloud Computing


Page 7

Data Growth


Atlantic Ocean = (est.) 100 billion billion (10^20) gallons of water

As of 2010, we create 2.5 quintillion (2.5 × 10^18) bytes of data daily

If 1 gallon = 1 byte…

- Ken Gabriel, Director of DARPA, March 2012

The Atlantic Ocean could hold only a fraction of the data created in 2010

- Eric Schmidt, CEO of Google, 2010

Approx. 80% of all data is “unstructured”
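The ocean analogy can be checked with a back-of-the-envelope calculation that uses only the slide's own figures, plus the assumption of a 365-day year. At 2.5 × 10^18 bytes per day, about 40 days of 2010's data would fill a 10^20-gallon Atlantic, so the full year's output would fill it roughly nine times.

```python
# Figures from the slide; the analogy treats 1 gallon of water as 1 byte.
BYTES_PER_DAY_2010 = 2.5e18       # 2.5 quintillion bytes created per day
ATLANTIC_GALLONS = 100e9 * 1e9    # "100 billion billion" gallons = 1e20
DAYS_IN_2010 = 365                # assumption: a non-leap year

bytes_in_2010 = BYTES_PER_DAY_2010 * DAYS_IN_2010
days_to_fill_atlantic = ATLANTIC_GALLONS / BYTES_PER_DAY_2010
oceans_filled_in_2010 = bytes_in_2010 / ATLANTIC_GALLONS

print(f"Bytes created in 2010:         {bytes_in_2010:.2e}")
print(f"Days of data to fill Atlantic: {days_to_fill_atlantic:.0f}")
print(f"Atlantic Oceans filled:        {oceans_filled_in_2010:.1f}")
```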

Page 8

Social Media’s Impact on Data Growth


2010: Eric Schmidt, then CEO of Google, estimates that we now create as much data every 2 days as we did from the dawn of time through 2003

Source: Skloog Blog


Page 9

Data Processing before Big Data


Page 10

NoSQL and Hadoop


Hadoop: a Big Data software framework for storing data and running applications on clusters of commodity hardware. It can handle a very large number of concurrent tasks or jobs, scaling out as nodes are added.

NoSQL: a non-relational database in which data is stored and accessed using a model other than the tabular relations typical of Relational Database Management Systems (RDBMS)

Page 11

SQL vs. NoSQL


Vaes, Karim. "Database Variants Explained: SQL or NoSQL? Is That Really the Question?" Random Thoughts on Various Topics by an Information Technology Architect. Karim Vaes, 21 Jan. 2015. Web. 3 Nov. 2015.

Page 12

NoSQL DBs Classified by Data Model

• Column: Accumulo, Cassandra, Druid, HBase, Vertica
• Document: Clusterpoint, Apache CouchDB, Couchbase, MarkLogic, MongoDB, OrientDB
• Key-value: Dynamo, FoundationDB, MemcacheDB, Redis, Riak, FairCom c-treeACE, Aerospike, OrientDB
• Graph: Allegro, Neo4J, InfiniteGraph, OrientDB, Virtuoso, Stardog
• Multi-model: OrientDB, FoundationDB, ArangoDB, Alchemy Database, CortexDB
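The four data models can be made concrete by holding one hypothetical customer order in each shape, using plain Python dictionaries and lists. No database client is involved. The record contents and variable names are invented, and the comments name products from the list above only to indicate which style each structure imitates.

```python
import json

# Key-value: an opaque value looked up by a single key (Redis/Riak style).
kv_store = {"order:1001": json.dumps({"customer": "c42", "total": 220})}

# Document: a self-contained, nested record queried by its fields (MongoDB/CouchDB style).
doc_store = {
    "orders": [
        {"_id": 1001,
         "customer": {"id": "c42", "name": "Ada"},
         "items": [{"sku": "A-1", "qty": 1, "price": 220}]},
    ]
}

# Column family: rows keyed by id, each holding a sparse set of named columns (Cassandra/HBase style).
column_store = {
    "orders": {
        "1001": {"customer_id": "c42", "total": 220},
        "1002": {"customer_id": "c7", "coupon": "FALL15"},  # different columns per row
    }
}

# Graph: nodes plus explicit, named relationships between them (Neo4J style).
nodes = {"c42": {"label": "Customer"}, "1001": {"label": "Order"}, "A-1": {"label": "Product"}}
edges = [("c42", "PLACED", "1001"), ("1001", "CONTAINS", "A-1")]

# Typical access pattern for each model.
print(json.loads(kv_store["order:1001"])["total"])                             # get by key
print([o["_id"] for o in doc_store["orders"] if o["customer"]["id"] == "c42"])  # query inside documents
print(column_store["orders"]["1002"].get("coupon"))                            # sparse column read
print([dst for src, rel, dst in edges if src == "c42" and rel == "PLACED"])    # follow a relationship
```

None of these shapes needs a fixed table schema. That is the practical difference from the RDBMS rows on the previous slide.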


Page 13

Hadoop Distributed Filesystem (HDFS)


Brings compute resources to the data

Hadoop runs MapReduce over the data stored in HDFS to aggregate it into usable summary data

Page 14

Hadoop Distributed Filesystem (HDFS)


[Figure: a Client and the Name Node communicate with Data Nodes A-D over a TCP/IP network. Blocks 1-5 are each stored on three different Data Nodes. Name Node metadata: Data X -> blocks 1, 2, 3; Data Y -> blocks 4, 5.]

The Name Node holds the metadata: which blocks make up each file and which Data Nodes store each block
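The Name Node's role can be sketched as a toy metadata service. It splits each file into numbered blocks and records which Data Nodes hold each replica, but never stores file contents itself. This is a hypothetical, single-process illustration: the class name `ToyNameNode` is invented, and the simple round-robin placement stands in for HDFS's actual rack-aware replica placement policy. The replication factor of 3 matches the diagram, where each block appears on three nodes, and HDFS's default.

```python
REPLICATION = 3                   # each block is stored on three Data Nodes, as in the diagram
DATA_NODES = ["A", "B", "C", "D"]


class ToyNameNode:
    """Hypothetical, simplified Name Node: tracks metadata only, never file contents."""

    def __init__(self, nodes, replication=REPLICATION):
        if replication > len(nodes):
            raise ValueError("need at least as many Data Nodes as replicas")
        self.nodes = nodes
        self.replication = replication
        self.files = {}       # file name -> list of block ids
        self.locations = {}   # block id -> Data Nodes holding a replica
        self._next_block = 1
        self._start = 0

    def add_file(self, name, num_blocks):
        blocks = []
        for _ in range(num_blocks):
            block_id = self._next_block
            self._next_block += 1
            # Round-robin stand-in: replicas go on `replication` distinct, consecutive nodes.
            replicas = [self.nodes[(self._start + i) % len(self.nodes)]
                        for i in range(self.replication)]
            self._start = (self._start + 1) % len(self.nodes)
            self.locations[block_id] = replicas
            blocks.append(block_id)
        self.files[name] = blocks

    def locate(self, name):
        """What a client asks the Name Node: where does each block of this file live?"""
        return {block: self.locations[block] for block in self.files[name]}


nn = ToyNameNode(DATA_NODES)
nn.add_file("Data X", 3)
nn.add_file("Data Y", 2)
print(nn.files)  # {'Data X': [1, 2, 3], 'Data Y': [4, 5]}
print(nn.locate("Data Y"))
```

With three replicas, any single Data Node can fail without losing a block. The client then reads directly from a surviving node.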

Page 15

MapReduce in Hadoop

[Figure: four Input Data splits each feed a Map task; Shuffle/Sort routes the map output to two Reduce tasks, whose results are aggregated into the Output.]

Unlike an RDBMS, there are no rows of data, only key-value pairs
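A word count is the customary first MapReduce example. The sketch below runs the three phases from the diagram (Map, Shuffle/Sort, Reduce) one after another in a single Python process, over four small, invented input splits. In Hadoop the map and reduce tasks would run in parallel on the nodes holding the data, and keys would be partitioned across several reducers. The function names are hypothetical, not Hadoop's API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Map: emit a (word, 1) key-value pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_sort(pairs):
    """Shuffle/Sort: group all values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reduce: aggregate all values for one key into a single result."""
    return key, sum(values)


# Four hypothetical input splits, one per Map task in the diagram.
splits = [
    "big data big insights",
    "data to lead",
    "predictive data",
    "lead with data",
]

mapped = chain.from_iterable(map_phase(s) for s in splits)
grouped = shuffle_sort(mapped)
output = dict(reduce_phase(k, v) for k, v in grouped.items())
print(output)
```

Every stage passes key-value pairs to the next. No stage ever sees a table row.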

Page 16


Page 17

Marketing Campaign

• 1,000,000 prospects
• $2 each to mail ($2M)
• 1% (1 out of 100) will buy (10,000)
• $220 revenue per sale


Revenue: $220 x 10,000 = $2,200,000
Cost: $2 x 1,000,000 = $2,000,000

Profit = $2,200,000 - $2,000,000 = $200,000

Page 18

Assigning a Predictive Score


Siegel, E. (2013). Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die. Hoboken: Wiley
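Assigning a predictive score amounts to ranking prospects by their estimated likelihood of responding and then acting on the top of the list. In this minimal sketch the scores are made up. In practice they would come from a trained model. They were chosen so that the top 25% average three times the overall response rate, mirroring the targeted-marketing scenario in this deck. `top_fraction` is a hypothetical helper.

```python
import math

# Hypothetical prospects with invented model scores (estimated probability of responding).
prospects = {
    "P1": 0.05, "P2": 0.01, "P3": 0.002, "P4": 0.04,
    "P5": 0.008, "P6": 0.003, "P7": 0.006, "P8": 0.001,
}


def top_fraction(scores, fraction):
    """Rank prospects by predictive score and keep the highest-scoring `fraction` of them."""
    k = math.ceil(len(scores) * fraction)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


selected = top_fraction(prospects, 0.25)

# Lift: how much more likely the selected group is to respond than the list as a whole.
selected_rate = sum(prospects[p] for p in selected) / len(selected)
overall_rate = sum(prospects.values()) / len(prospects)
lift = selected_rate / overall_rate

print(selected)  # ['P1', 'P4']
print(f"Lift of top 25%: {lift:.1f}x")
```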

Page 19

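Targeted Marketing with Predictive Analytics (PA)

• PA results tell us which prospects are likely to respond
• PA identifies the 25% of prospects on the list who are 3 times more likely to respond
• The list of 1,000,000 is reduced to 250,000, with a 3% response rate (7,500 sales)
• $220 revenue per sale

$1,150,000 in profit, a 475% increase over the $200,000 earned by mailing everyone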


Revenue: $220 x 7,500 = $1,650,000
Cost: $2 x 250,000 = $500,000

Profit = $1,650,000 - $500,000 = $1,150,000
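The profit arithmetic for the mass mailing and the targeted mailing can be checked with a few lines of integer math. The helper `campaign_profit` is hypothetical. Response rates are passed as whole percentages so that no floating-point rounding creeps into the counts. The increase works out to 475%: the targeted campaign earns 5.75 times the original $200,000.

```python
def campaign_profit(prospects: int, cost_per_piece: int, response_pct: int,
                    revenue_per_sale: int) -> int:
    """Profit in dollars: revenue from responders minus the cost of mailing everyone on the list."""
    sales = prospects * response_pct // 100
    revenue = revenue_per_sale * sales
    cost = cost_per_piece * prospects
    return revenue - cost


mass = campaign_profit(1_000_000, 2, 1, 220)    # mail everyone; 1% respond
targeted = campaign_profit(250_000, 2, 3, 220)  # mail the top-scored 25%; 3% respond
increase_pct = (targeted - mass) / mass * 100

print(f"Mass mailing profit:     ${mass:,}")
print(f"Targeted mailing profit: ${targeted:,}")
print(f"Increase:                {increase_pct:.1f}%")
```

Most of the gain comes from the cost side. Mailing cost falls by $1,500,000, while revenue falls by only $550,000.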

Page 20

Recommendations: Similar to Others


Page 21

Recommendations: Closer to Home


Page 22

Top 20 Open Source PA Software


http://www.predictiveanalyticstoday.com/top-predictive-analytics-freeware-software/

• There are several Open Source and Freeware products available to perform Predictive Analytics

• “R” is one of the most popular, but the link above provides plenty to choose from

Page 23

Wrap-up and Bring It Home

• Convergence of technology leads to Big Data
• Your best bet is to listen to what the data tells you, rather than ask for an answer to a question you already know the answer to
• The real benefit of Predictive Analytics is the ability to find patterns in data that you were not aware of before
• Creating new markets and new opportunities based on data analysis

Using Predictive Analytics with Big Data is truly using data to lead!


Page 24

Question & Answer
