![Page 1: An Introduction to Predictive Analytics with Big Data and Open Source tools Joe Heary CTO & VP of Technical Operations Zimmerman Associates, Inc. (ZAI)](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697c0061a28abf838cc545c/html5/thumbnails/1.jpg)
An Introduction to Predictive Analytics with Big Data and Open Source tools
Joe Heary, CTO & VP of Technical Operations, Zimmerman Associates, Inc. (ZAI)
November 5, 2015
![Page 2: An Introduction to Predictive Analytics with Big Data and Open Source tools Joe Heary CTO & VP of Technical Operations Zimmerman Associates, Inc. (ZAI)](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697c0061a28abf838cc545c/html5/thumbnails/2.jpg)
What is Predictive Analytics?
“A variety of statistical techniques from modeling, machine learning, and data
mining that analyze current and historical facts to make predictions about future, or otherwise
unknown, events.” - Wikipedia
11/5/2015 Leveraging Data to Lead 2
![Page 3: An Introduction to Predictive Analytics with Big Data and Open Source tools Joe Heary CTO & VP of Technical Operations Zimmerman Associates, Inc. (ZAI)](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697c0061a28abf838cc545c/html5/thumbnails/3.jpg)
Predicting the Future
• Not really about “predicting the future”
• About using Data, Statistical Models, and Machine Learning to identify the likelihood of future outcomes from which we make decisions
• Produce new insights that lead to better actions
![Page 4: An Introduction to Predictive Analytics with Big Data and Open Source tools Joe Heary CTO & VP of Technical Operations Zimmerman Associates, Inc. (ZAI)](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697c0061a28abf838cc545c/html5/thumbnails/4.jpg)
Machine Learning
• Evolved from pattern recognition and computational learning theory in artificial intelligence
• Construction of algorithms that can learn from data
• Algorithms build models from example inputs to make data-driven predictions rather than following static program instructions
Siegel, E. (2013). Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die. Hoboken: Wiley
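To make "building a model from example inputs" concrete, here is a minimal sketch in Python. It is not from the talk: the training data, `fit_threshold`, and `predict` are hypothetical, and the "model" is just a single cutoff learned from labeled examples instead of being hard-coded.

```python
def fit_threshold(examples):
    """Learn the cutoff that misclassifies the fewest (value, label) examples.

    The resulting model predicts True when value >= cutoff.
    """
    values = sorted({value for value, _ in examples})
    # Candidate cutoffs: every observed value, plus one above the maximum
    candidates = values + [values[-1] + 1]

    def errors(cutoff):
        # Count examples where the prediction disagrees with the label
        return sum((value >= cutoff) != label for value, label in examples)

    return min(candidates, key=errors)


def predict(cutoff, value):
    """Apply the learned model to a new input."""
    return value >= cutoff


# Hypothetical examples: (number of past purchases, responded to last mailing)
training = [(0, False), (1, False), (2, False), (3, True), (5, True), (8, True)]
cutoff = fit_threshold(training)
```

Changing the training examples changes the cutoff without touching the code, which is the contrast with static program instructions.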
![Page 5: An Introduction to Predictive Analytics with Big Data and Open Source tools Joe Heary CTO & VP of Technical Operations Zimmerman Associates, Inc. (ZAI)](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697c0061a28abf838cc545c/html5/thumbnails/5.jpg)
What is Big Data?
“Big data is a collection of data from traditional and digital sources inside and outside your company that
represents a source for ongoing discovery and analysis.”
-- Lisa Arthur, Forbes / CMO Network
Refers to the AMOUNT of data in terms of:
• VOLUME: the amount of data being generated
• VARIETY: the type of data (pictures, videos, text, audio, etc.)
• VELOCITY: the speed at which data is created or changes
• VERACITY: the truthfulness or adherence to the truth
• VALUE: the relative value of data to an organization
![Page 6: An Introduction to Predictive Analytics with Big Data and Open Source tools Joe Heary CTO & VP of Technical Operations Zimmerman Associates, Inc. (ZAI)](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697c0061a28abf838cc545c/html5/thumbnails/6.jpg)
Big Data due to convergence of…
• Moore’s Law
• Mobile Computing
• Social Networking
• Cloud Computing
![Page 7: An Introduction to Predictive Analytics with Big Data and Open Source tools Joe Heary CTO & VP of Technical Operations Zimmerman Associates, Inc. (ZAI)](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697c0061a28abf838cc545c/html5/thumbnails/7.jpg)
Data Growth
• As of 2010, we create 2.5 quintillion (2.5 x 10^18) bytes of data daily - Eric Schmidt, CEO of Google, 2010
• Atlantic Ocean = (est.) 100 billion billion gallons of water. If 1 gallon = 1 byte, the Atlantic Ocean could hold only the data created in 2010 - Ken Gabriel, Director of DARPA, March 2012
• Approx. 80% of all data is “unstructured”
![Page 8: An Introduction to Predictive Analytics with Big Data and Open Source tools Joe Heary CTO & VP of Technical Operations Zimmerman Associates, Inc. (ZAI)](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697c0061a28abf838cc545c/html5/thumbnails/8.jpg)
Social Media’s Impact on Data Growth
2010: Eric Schmidt, then CEO of Google, estimates we now create as much data every 2 days as we did from the dawn of time through 2003
Source: Skloog Blog
![Page 9: An Introduction to Predictive Analytics with Big Data and Open Source tools Joe Heary CTO & VP of Technical Operations Zimmerman Associates, Inc. (ZAI)](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697c0061a28abf838cc545c/html5/thumbnails/9.jpg)
Data Processing before Big Data
![Page 10: An Introduction to Predictive Analytics with Big Data and Open Source tools Joe Heary CTO & VP of Technical Operations Zimmerman Associates, Inc. (ZAI)](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697c0061a28abf838cc545c/html5/thumbnails/10.jpg)
NoSQL and Hadoop
Hadoop: Big Data software framework for storing data and running applications on clusters of commodity hardware. Can handle a virtually limitless number of concurrent tasks or jobs.
NoSQL: Non-relational database in which data is stored and accessed using a model other than the tabular relationships typical of Relational Database Management Systems (RDBMS)
![Page 11: An Introduction to Predictive Analytics with Big Data and Open Source tools Joe Heary CTO & VP of Technical Operations Zimmerman Associates, Inc. (ZAI)](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697c0061a28abf838cc545c/html5/thumbnails/11.jpg)
SQL vs. NoSQL
Vaes, Karim. "Database Variants Explained: SQL or NoSQL? Is That Really the Question?" Random Thoughts on Various Topics by an Information Technology Architect. Karim Vaes, 21 Jan. 2015. Web. 3 Nov. 2015.
![Page 12: An Introduction to Predictive Analytics with Big Data and Open Source tools Joe Heary CTO & VP of Technical Operations Zimmerman Associates, Inc. (ZAI)](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697c0061a28abf838cc545c/html5/thumbnails/12.jpg)
NoSQL DBs Classified by Data Model
• Column: Accumulo, Cassandra, Druid, HBase, Vertica
• Document: Clusterpoint, Apache CouchDB, Couchbase, MarkLogic, MongoDB, OrientDB
• Key-value: Dynamo, FoundationDB, MemcacheDB, Redis, Riak, FairCom c-treeACE, Aerospike, OrientDB
• Graph: Allegro, Neo4J, InfiniteGraph, OrientDB, Virtuoso, Stardog
• Multi-model: OrientDB, FoundationDB, ArangoDB, Alchemy Database, CortexDB
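As a rough illustration of how these data models differ, the sketch below writes one made-up customer record in four shapes using plain Python structures. The record, the key names, and `products_bought` are hypothetical; real products in each category have their own storage formats and query languages.

```python
# One hypothetical customer and order, expressed in four NoSQL-style shapes.

# Key-value: each value is looked up by a single flat key
key_value = {"customer:42:name": "Ada", "customer:42:city": "Fairfax"}

# Document: one self-contained, nested record
document = {
    "_id": 42,
    "name": "Ada",
    "city": "Fairfax",
    "orders": [{"sku": "A-100", "qty": 2}],
}

# Column (wide-column): row key -> column family -> column -> value
wide_column = {
    "42": {
        "profile": {"name": "Ada", "city": "Fairfax"},
        "orders": {"A-100": 2},
    }
}

# Graph: nodes plus labeled edges that connect them
nodes = {
    "c42": {"type": "customer", "name": "Ada"},
    "pA100": {"type": "product", "sku": "A-100"},
}
edges = [("c42", "BOUGHT", "pA100", {"qty": 2})]


def products_bought(customer_id):
    """Follow BOUGHT edges from a customer node to the product SKUs."""
    return [
        nodes[dst]["sku"]
        for src, label, dst, _props in edges
        if src == customer_id and label == "BOUGHT"
    ]
```

None of the four shapes requires a fixed table schema, which is the contrast with an RDBMS row.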
![Page 13: An Introduction to Predictive Analytics with Big Data and Open Source tools Joe Heary CTO & VP of Technical Operations Zimmerman Associates, Inc. (ZAI)](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697c0061a28abf838cc545c/html5/thumbnails/13.jpg)
Hadoop Distributed Filesystem (HDFS)
Brings compute resources to the data
Implements MapReduce to aggregate into useable summary data
![Page 14: An Introduction to Predictive Analytics with Big Data and Open Source tools Joe Heary CTO & VP of Technical Operations Zimmerman Associates, Inc. (ZAI)](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697c0061a28abf838cc545c/html5/thumbnails/14.jpg)
Hadoop Distributed Filesystem (HDFS)
[Diagram: a Client and the Name Node connect over a TCP/IP network to Data Nodes A, B, C, and D. Blocks 1-5 are spread across the Data Nodes, and each block number appears three times.]
Metadata: Data X -> 1, 2, 3; Data Y -> 4, 5
Name Node contains metadata and location of the data
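The Name Node's bookkeeping can be sketched in a few lines of Python. This is a toy model, not Hadoop's API: `place_blocks` and `locate` are hypothetical names, the replication factor of 3 is taken from each block appearing three times in the diagram, and the round-robin placement is an assumption made for simplicity (the slide does not say which node holds which block).

```python
REPLICATION = 3  # assumption: each block appears three times in the diagram
DATA_NODES = ["A", "B", "C", "D"]


def place_blocks(block_ids, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin."""
    placement = {}
    for offset, block in enumerate(block_ids):
        placement[block] = [
            nodes[(offset + i) % len(nodes)] for i in range(replication)
        ]
    return placement


# Name Node metadata from the slide: file -> block ids
file_blocks = {"Data X": [1, 2, 3], "Data Y": [4, 5]}
# Name Node location map: block id -> Data Nodes holding a copy
block_locations = place_blocks([1, 2, 3, 4, 5], DATA_NODES, REPLICATION)


def locate(file_name, failed=()):
    """Per block of a file, list the nodes a client could still read from."""
    return {
        block: [n for n in block_locations[block] if n not in failed]
        for block in file_blocks[file_name]
    }
```

For example, `locate("Data Y", failed=("A",))` still returns at least two nodes for each block, which is why losing a single Data Node does not lose data in this model.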
![Page 15: An Introduction to Predictive Analytics with Big Data and Open Source tools Joe Heary CTO & VP of Technical Operations Zimmerman Associates, Inc. (ZAI)](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697c0061a28abf838cc545c/html5/thumbnails/15.jpg)
MapReduce in Hadoop Filesystem
[Diagram: Big Data is split into four Input Data blocks; each block feeds a Map task; Map output passes through Shuffle/Sort to two Reduce tasks, whose results are aggregated into the Output.]
No rows of data like RDBMS, only key-value pairs
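The Map, Shuffle/Sort, and Reduce stages can be imitated in a single Python process. The sketch below is a word count over four made-up input splits; it only mirrors the data flow and does not use Hadoop, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(split):
    """Map: emit a (word, 1) key-value pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_sort(pairs):
    """Shuffle/Sort: group all values by key, with keys in sorted order."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single summary value."""
    return key, sum(values)


# Four hypothetical input splits, one per Map task
splits = ["big data", "data to lead", "big big data", "lead"]

mapped = [pair for split in splits for pair in map_phase(split)]
grouped = shuffle_sort(mapped)
output = dict(reduce_phase(key, values) for key, values in grouped.items())
```

Every stage passes key-value pairs rather than table rows, and each Map call sees only its own split, which is what lets a cluster run the Map tasks in parallel.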
![Page 16: An Introduction to Predictive Analytics with Big Data and Open Source tools Joe Heary CTO & VP of Technical Operations Zimmerman Associates, Inc. (ZAI)](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697c0061a28abf838cc545c/html5/thumbnails/16.jpg)
![Page 17: An Introduction to Predictive Analytics with Big Data and Open Source tools Joe Heary CTO & VP of Technical Operations Zimmerman Associates, Inc. (ZAI)](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697c0061a28abf838cc545c/html5/thumbnails/17.jpg)
Marketing Campaign
• 1,000,000 prospects
• $2 each to mail ($2M)
• 1% (1 out of 100) will buy (10,000)
• $220 revenue per sale
Revenue: $220 x 10,000 = $2,200,000
Cost: $2 x 1,000,000 = $2,000,000
Profit = $200,000
![Page 18: An Introduction to Predictive Analytics with Big Data and Open Source tools Joe Heary CTO & VP of Technical Operations Zimmerman Associates, Inc. (ZAI)](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697c0061a28abf838cc545c/html5/thumbnails/18.jpg)
Assigning a Predictive Score
Siegel, E. (2013). Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die. Hoboken: Wiley
![Page 19: An Introduction to Predictive Analytics with Big Data and Open Source tools Joe Heary CTO & VP of Technical Operations Zimmerman Associates, Inc. (ZAI)](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697c0061a28abf838cc545c/html5/thumbnails/19.jpg)
Targeted Marketing with PA
• PA results tell us which prospects are likely to respond
• Identify the 25% of prospects on the list who are 3x more likely to respond
• 1M reduced to 250,000 with a 3% response rate (7,500)
• $220 revenue per sale
Revenue: $220 x 7,500 = $1,650,000
Cost: $2 x 250,000 = $500,000
Profit = $1,150,000 (a 475% increase over the $200,000 from mailing the full list)
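The campaign arithmetic can be checked with a short Python sketch. `campaign_profit` is a hypothetical helper; the inputs are the figures from the two campaign slides. By this arithmetic, the targeted mailing's profit is a 475% increase over the $200,000 earned by mailing the full list.

```python
def campaign_profit(prospects, response_rate, revenue_per_sale, cost_per_mail):
    """Profit = (buyers x revenue per sale) - (prospects mailed x cost per mailing)."""
    buyers = round(prospects * response_rate)
    revenue = buyers * revenue_per_sale
    cost = prospects * cost_per_mail
    return revenue - cost


# Mail everyone: 1,000,000 prospects at a 1% response rate
mass = campaign_profit(1_000_000, 0.01, 220, 2)

# Mail only the top-scored 25%: 250,000 prospects at a 3% response rate
targeted = campaign_profit(250_000, 0.03, 220, 2)

# Relative improvement of the targeted campaign over the mass mailing
increase_pct = (targeted - mass) / mass * 100
```

Most of the gain comes from the $1.5M saved by not mailing the 750,000 prospects who are least likely to respond.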
![Page 20: An Introduction to Predictive Analytics with Big Data and Open Source tools Joe Heary CTO & VP of Technical Operations Zimmerman Associates, Inc. (ZAI)](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697c0061a28abf838cc545c/html5/thumbnails/20.jpg)
Recommendations: Similar to Others
![Page 21: An Introduction to Predictive Analytics with Big Data and Open Source tools Joe Heary CTO & VP of Technical Operations Zimmerman Associates, Inc. (ZAI)](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697c0061a28abf838cc545c/html5/thumbnails/21.jpg)
Recommendations: Closer to Home
![Page 22: An Introduction to Predictive Analytics with Big Data and Open Source tools Joe Heary CTO & VP of Technical Operations Zimmerman Associates, Inc. (ZAI)](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697c0061a28abf838cc545c/html5/thumbnails/22.jpg)
Top 20 Open Source PA Software
• There are several Open Source and Freeware products available to perform Predictive Analytics
• “R” is one of the most popular, but the link below will provide plenty to choose from
http://www.predictiveanalyticstoday.com/top-predictive-analytics-freeware-software/
![Page 23: An Introduction to Predictive Analytics with Big Data and Open Source tools Joe Heary CTO & VP of Technical Operations Zimmerman Associates, Inc. (ZAI)](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697c0061a28abf838cc545c/html5/thumbnails/23.jpg)
Wrap-up and bring it home
• Convergence of technology leads to Big Data
• Your best bet is listening to what the data tells you rather than asking for an answer to a question that you already know the answer to
• The real benefit of Predictive Analytics is the ability to find patterns in data that you were not aware of before
• Creating new markets and new opportunities based on data analysis
• Using Predictive Analytics with Big Data is truly using data to lead!
![Page 24: An Introduction to Predictive Analytics with Big Data and Open Source tools Joe Heary CTO & VP of Technical Operations Zimmerman Associates, Inc. (ZAI)](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697c0061a28abf838cc545c/html5/thumbnails/24.jpg)
Question & Answer