
Post on 21-Jan-2016


An Introduction to Predictive Analytics with Big Data and Open Source tools

Joe Heary

CTO & VP of Technical Operations

Zimmerman Associates, Inc. (ZAI)

November 5, 2015

What is Predictive Analytics?

“A variety of statistical techniques from modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future, or otherwise unknown, events.” - Wikipedia

11/5/2015 Leveraging Data to Lead 2

Predicting the Future

• Not really about “predicting the future”

• About using data, statistical models, and machine learning to identify the likelihood of future outcomes, from which we make decisions

• Produce new insights that lead to better actions


Machine Learning

• Evolved from pattern recognition and computational learning theory in artificial intelligence

• Construction of algorithms that can learn from data

• Algorithms build models from example inputs to make data-driven predictions rather than following static program instructions


Siegel, E. (2013). Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die. Hoboken: Wiley

What is Big Data?

“Big data is a collection of data from traditional and digital sources inside and outside your company that represents a source for ongoing discovery and analysis.”

-- Lisa Arthur, Forbes / CMO Network


Refers to the AMOUNT of data in terms of:

• VOLUME: the amount of data being generated

• VARIETY: the type of data (pictures, videos, text, audio, etc.)

• VELOCITY: the speed at which data is created or changes

• VERACITY: the truthfulness or adherence to the truth

• VALUE: the relative value of data to an organization

Big Data due to convergence of…

• Moore’s Law

• Mobile Computing

• Social Networking

• Cloud Computing


Data Growth

• Atlantic Ocean = (est.) 100 billion billion gallons of water

• As of 2010, we create 2.5 quintillion (10^18) bytes of data daily

• If 1 gallon = 1 byte… the Atlantic Ocean could only contain the data created in 2010

• Approx. 80% of all data is “unstructured”

Sources: Ken Gabriel, Director of DARPA, March 2012; Eric Schmidt, CEO of Google, 2010

Social Media’s Impact on Data Growth

2010: Eric Schmidt, then CEO of Google, estimates that we now create as much data every 2 days as we did from the dawn of time through 2003

Source: Skloog Blog

Data Processing before Big Data


NoSQL and Hadoop

• Hadoop: Big Data software framework for storing data and running applications on clusters of commodity hardware. Has the ability to handle virtually limitless concurrent tasks or jobs.

• NoSQL: Non-relational database in which data is stored and accessed using a model other than the tabular relationships typical of Relational Database Management Systems (RDBMS)

SQL vs. NoSQL


Vaes, Karim. "Database Variants Explained : SQL or NoSQL? Is That Really the Question?" Random Thoughts on Various Topics by an Information Technology Architect. Karim Vaes, 21 Jan. 2015. Web. 3 Nov. 2015.

NoSQL DBs Classified by Data Model

• Column: Accumulo, Cassandra, Druid, HBase, Vertica

• Document: Clusterpoint, Apache CouchDB, Couchbase, MarkLogic, MongoDB, OrientDB

• Key-value: Dynamo, FoundationDB, MemcacheDB, Redis, Riak, FairCom c-treeACE, Aerospike, OrientDB

• Graph: Allegro, Neo4J, InfiniteGraph, OrientDB, Virtuoso, Stardog

• Multi-model: OrientDB, FoundationDB, ArangoDB, Alchemy Database, CortexDB


Hadoop Distributed Filesystem (HDFS)


Brings compute resources to the data

Implements MapReduce to aggregate into useable summary data

Hadoop Distributed Filesystem (HDFS)

[Diagram: a Client and a Name Node connect over a TCP/IP network to Data Nodes A, B, C, and D, which hold copies of blocks 1 through 5. Name Node metadata: Data X -> 1,2,3; Data Y -> 4,5]

Name Node contains metadata and location of the data
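
As a rough illustration of the Name Node's role, the sketch below models the metadata lookup in plain Python. The file-to-block mapping (Data X -> 1,2,3; Data Y -> 4,5) comes from the slide; the block-to-node placement, the three-copies-per-block layout, and the function name `read_plan` are assumptions made for the example and are not the HDFS API.

```python
# Name Node metadata taken from the slide: each dataset maps to its block IDs.
FILE_TO_BLOCKS = {"Data X": [1, 2, 3], "Data Y": [4, 5]}

# Hypothetical placement of block copies on Data Nodes A-D
# (the exact layout in the original diagram is not recoverable).
BLOCK_TO_NODES = {
    1: ["A", "B", "D"],
    2: ["B", "C", "D"],
    3: ["A", "C", "D"],
    4: ["A", "B", "C"],
    5: ["A", "B", "C"],
}


def read_plan(name, down=frozenset()):
    """Return (block, node) pairs a client could read, skipping failed nodes."""
    plan = []
    for block in FILE_TO_BLOCKS[name]:
        live = [node for node in BLOCK_TO_NODES[block] if node not in down]
        if not live:
            raise RuntimeError(f"block {block} has no live copy")
        plan.append((block, live[0]))  # read from the first available copy
    return plan


print(read_plan("Data X"))
print(read_plan("Data X", down={"A"}))  # still readable with node A offline
```

The point of the sketch is only that the client asks the Name Node where blocks live and then reads from whichever Data Node still holds a copy.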

MapReduce in Hadoop Filesystem

[Diagram: MapReduce data flow. Stages shown: Big Data, Input Data (x4), Map (x4), Shuffle/Sort, Reduce (x2), Aggregate, Output]

No rows of data like RDBMS, only key-value pairs
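
The stages named on this slide can be sketched in a few lines of single-process Python. This is a toy word count, not Hadoop code: the function names and the four sample input splits are assumptions chosen for illustration, and a real job would run the map and reduce tasks in parallel across the cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Map: emit a (key, value) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_sort(pairs):
    """Shuffle/Sort: group all values by key and order the groups by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single summary value."""
    return key, sum(values)


def map_reduce(splits):
    """Run map over every split, then shuffle/sort, then reduce each group."""
    mapped = chain.from_iterable(map_phase(split) for split in splits)
    return dict(reduce_phase(key, values) for key, values in shuffle_sort(mapped))


# Four hypothetical input splits, mirroring the four Input Data boxes.
splits = ["big data", "open source tools", "big data tools", "data"]
counts = map_reduce(splits)
print(counts)  # {'big': 2, 'data': 3, 'open': 1, 'source': 1, 'tools': 2}
```

Every stage passes key-value pairs to the next, which is the contrast with RDBMS rows that the slide draws.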


Marketing Campaign

• 1,000,000 prospects

• $2 each to mail ($2M)

• 1% (1 out of 100) will buy (10,000)

• $220 revenue per sale

Revenue: $220 x 10,000 = $2,200,000

Cost: $2 x 1,000,000 = $2,000,000

Profit = $200,000
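
The arithmetic on this slide can be checked with a short Python sketch. The function name `campaign_profit` and its parameter names are hypothetical; the input figures are the ones given on the slide.

```python
def campaign_profit(prospects, cost_per_mail, response_rate, revenue_per_sale):
    """Profit of a mailing: revenue from buyers minus the cost of every mailing."""
    buyers = round(prospects * response_rate)
    revenue = buyers * revenue_per_sale
    cost = prospects * cost_per_mail
    return revenue - cost


# Mass mailing: 1,000,000 prospects, $2 per mailing, 1% response, $220 per sale.
mass_profit = campaign_profit(1_000_000, 2, 0.01, 220)
print(mass_profit)  # 200000
```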

Assigning a Predictive Score


Siegel, E. (2013). Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die. Hoboken: Wiley

Targeted Marketing with PA

• PA results tell us which prospects are likely to respond

• Identify the 25% of prospects on the list who are 3x more likely to respond

• 1M reduced to 250,000 with a 3% response rate (7,500)

• $220 revenue per sale

Revenue: $220 x 7,500 = $1,650,000

Cost: $2 x 250,000 = $500,000

Profit = $1,150,000 (a 475% increase over the $200,000 from mailing the full list)
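
To compare the targeted mailing with mailing the full list, the following sketch recomputes both profits and the relative change between them. The names `profit`, `baseline_rate`, and `lift` are assumptions for illustration; the inputs (1% baseline response, 3x lift on the top 25%, $2 per mailing, $220 per sale) are the slide's figures.

```python
def profit(mailed, response_rate, cost_per_mail=2, revenue_per_sale=220):
    """Profit for mailing `mailed` prospects at the given response rate."""
    buyers = round(mailed * response_rate)
    return buyers * revenue_per_sale - mailed * cost_per_mail


baseline_rate = 0.01  # 1% of the full list responds
lift = 3              # top-scored 25% respond 3x as often

mass = profit(1_000_000, baseline_rate)
targeted = profit(250_000, baseline_rate * lift)
increase_pct = (targeted - mass) / mass * 100

print(mass)      # 200000
print(targeted)  # 1150000
print(increase_pct)
```

The targeted campaign mails a quarter of the list, so cost falls by $1.5M while revenue falls by only $550,000; that gap is where the extra profit comes from.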

Recommendations: Similar to Others


Recommendations: Closer to Home


Top 20 Open Source PA Software

• There are several Open Source and Freeware products available to perform Predictive Analytics

• “R” is one of the most popular, but the link below will provide plenty to choose from

http://www.predictiveanalyticstoday.com/top-predictive-analytics-freeware-software/

Wrap-up and bring it home

• Convergence of technology leads to Big Data

• Your best bet is listening to what the data tells you rather than asking for an answer to a question that you already know the answer to

• The real benefit of Predictive Analytics is the ability to find patterns in data that you were not aware of before

• Creating new markets and new opportunities based on data analysis

Using Predictive Analytics with Big Data is truly using data to lead!


Question & Answer
