Dr. Chen, Data Base Management
Chapter 6:
Big Data and Analytics
Jason C. H. Chen, Ph.D.
Professor of MIS
School of Business Administration
Gonzaga University
Spokane, WA 99258
Dr. Chen, Business Intelligence
Learning Objectives
• Learn what Big Data is and how it is changing the world of analytics
• Understand the motivation for and business drivers of Big Data analytics
• Become familiar with the wide range of enabling technologies for Big Data analytics
• Learn about Hadoop, MapReduce, and NoSQL as they relate to Big Data analytics
• Understand the role of and capabilities/skills for data scientist as a new analytics profession
(Continued…)
Learning Objectives
• Compare and contrast the complementary uses of data warehousing and Big Data
• Become familiar with the vendors of Big Data tools and services
• Understand the need for and appreciate the capabilities of stream analytics
• Learn about the applications of stream analytics
Opening Vignette…
Big Data Meets Big Science at CERN
• Situation
• Problem
• Solution
• Results
• Answer & discuss the case questions.
Questions for the Opening Vignette
• What is CERN, and why is it important to the world of science?
• How does the Large Hadron Collider work? What does it produce?
• What is the essence of the data challenge at CERN? How significant is it?
• What was the solution? How were the Big Data challenges addressed with this solution?
• What were the results? Do you think the current solution is sufficient?
• 1. What is CERN, and why is it important to the world of science?
• CERN is the European Organization for Nuclear Research. It plays a leading role in fundamental studies of physics.
• It has been instrumental in many key global innovations and breakthrough discoveries in theoretical physics and today operates the world’s largest particle physics laboratory, home to the Large Hadron Collider (LHC).
• 2. How does the Large Hadron Collider work? What does it produce?
• The LHC houses the world’s largest and most sophisticated scientific instruments to study the basic constituents of matter—the fundamental particles. These instruments include purpose-built particle accelerators and detectors.
• Accelerators boost beams of particles to very high energies before the beams are forced to collide with each other or with stationary targets. Detectors observe and record the results of these collisions, in which the particles travel at nearly the speed of light. This process provides physicists with clues about how the particles interact, and provides insights into the fundamental laws of nature.
• 3. What is the essence of the data challenge at CERN? How significant is it?
• Collision events in the LHC occur 40 million times per second, resulting in 15 petabytes of data produced annually at the CERN Data Centre. CERN does not have the capacity to process all of the data that it generates, and therefore relies on numerous other research centers all around the world to access and process the data.
• Processing all the data from these experiments requires an enormously complex, distributed, and heterogeneous computing and data management system. With such vast quantities of data, both structured and unstructured, information discovery is a big challenge for CERN.
• 4. What was the solution? How were the Big Data challenges addressed with this solution?
• CMS’s data management and workflow management (DMWM) group created the Data Aggregation System (DAS), built on MongoDB (a Big Data management infrastructure), to provide the ability to search and aggregate information across this complex data landscape.
• MongoDB’s schema-less structure allowed flexible data structures to be stored and indexed.
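To make the schema-less idea concrete, here is a minimal Python sketch (not CERN’s actual code): plain dicts stand in for MongoDB documents so it runs without a server, and the field names are hypothetical. The point is that records of different shapes can sit side by side and still be aggregated.

```python
# Hypothetical documents of differing shapes, as a schema-less store
# like MongoDB would accept them; plain dicts stand in for a collection.
docs = [
    {"dataset": "/Higgs/Run2012", "site": "CERN", "size_tb": 12.5},
    {"dataset": "/Top/Run2012", "site": "FNAL", "size_tb": 8.0, "replicas": 3},
    {"block": "block-42", "site": "CERN"},  # a different shape entirely
]

# Aggregate total size per site, tolerating missing fields -- the kind of
# free-form lookup a system like DAS performs across heterogeneous records.
totals = {}
for doc in docs:
    site = doc.get("site")
    totals[site] = totals.get(site, 0) + doc.get("size_tb", 0)

print(totals)  # {'CERN': 12.5, 'FNAL': 8.0}
```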
• 5. What were the results? Do you think the current solution is sufficient?
• DAS is used 24 hours a day, seven days a week, by CMS physicists, data operators, and data managers at research facilities around the world. The performance of MongoDB has been outstanding, with an ability to offer a free-text query system that is fast and scalable.
• Without help from DAS, information lookup would have taken orders of magnitude longer. The current solution is outstanding, but more improvements can be made, and CERN is looking to apply Big Data approaches beyond CMS.
Big Data - Definition and Concepts
• Big Data means different things to people with different backgrounds and interests
• Traditionally, “Big Data” = massive volumes of data
  E.g., the volume of data at CERN, NASA, Google, …
• Where does Big Data come from?
  Everywhere! Web logs, RFID, GPS systems, sensor networks, social networks, Internet-based text documents, Internet search indexes, call detail records, astronomy, atmospheric science, biology, genomics, nuclear physics, biochemical experiments, medical records, scientific research, military surveillance, multimedia archives, …
Technology Insights 6.1
The Data Size Is Getting Big, Bigger, …
• Large Hadron Collider - 1 PB/sec
• Boeing jet - 20 TB/hr
• Facebook - 500 TB/day
• YouTube - 1 TB/4 min
• The proposed Square Kilometer Array telescope (the world’s proposed biggest telescope) - 1 EB/day
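To compare these rates, it helps to convert them to a common unit. A quick Python conversion to PB/day (assuming decimal SI units; the figures are the slide’s, the arithmetic is ours):

```python
# Decimal (SI) storage units.
TB = 10**12
PB = 10**15
EB = 10**18

# Rates from the slide, normalized to bytes per day.
rates_per_day = {
    "LHC (1 PB/sec)":           1 * PB * 86_400,
    "Boeing jet (20 TB/hr)":    20 * TB * 24,
    "Facebook (500 TB/day)":    500 * TB,
    "YouTube (1 TB/4 min)":     TB * (24 * 60 // 4),
    "SKA telescope (1 EB/day)": 1 * EB,
}

for name, bytes_per_day in rates_per_day.items():
    print(f"{name}: {bytes_per_day / PB:,.2f} PB/day")
```

The collider dwarfs everything else: 1 PB/sec is 86,400 PB per day, versus 0.5 PB/day for Facebook and 1,000 PB/day for the proposed SKA.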
Big Data - Definition and Concepts
• Big Data is a misnomer!
• Big Data is more than just “big”
• The Vs that define Big Data
  Volume
  Variety
  Velocity
  Veracity
  Variability
  Value
  …
A High-Level Conceptual Architecture for Big Data Solutions (by AsterData / Teradata)
[Figure: Unified Data Architecture - System Conceptual View. Big Data sources (ERP, SCM, CRM, images, audio and video, machine logs, text, Web and social) feed a data platform (access, manage, move) with event processing; the data platform connects to an integrated data warehouse and a discovery platform; analytic tools and apps (math and stats, data mining, business intelligence, applications, languages, marketing) serve users including marketing executives, operational systems, frontline workers, customers, partners, engineers, data scientists, and business analysts.]
Fundamentals of Big Data Analytics
• Big Data by itself, regardless of the size, type, or speed, is worthless
• Big Data + “big” analytics = value
• With the value proposition, Big Data also brought about big challenges
  Effectively and efficiently capturing, storing, and analyzing Big Data
  New breed of technologies needed (developed or purchased or hired or outsourced …)
Big Data Considerations
• You can’t process the amount of data that you want to because of the limitations of your current platform.
• You can’t include new/contemporary data sources (e.g., social media, RFID, sensor, Web, GPS, textual data) because they do not comply with the data storage schema.
• You need to (or want to) integrate data as quickly as possible to keep your analysis current.
• You want to work with a schema-on-demand data storage paradigm because of the variety of data types involved.
• The data is arriving so fast at your organization’s doorstep that your traditional analytics platform cannot handle it.
• …
Critical Success Factors for Big Data Analytics
• A clear business need (alignment with the vision and the strategy)
• Strong, committed sponsorship (executive champion)
• Alignment between the business and IT strategy
• A fact-based decision-making culture
• A strong data infrastructure
• The right analytics tools
• Right people with right skills
Critical Success Factors for Big Data Analytics
[Figure: Keys to success with Big Data analytics - a clear business need; strong, committed sponsorship; alignment between the business and IT strategy; a fact-based decision-making culture; a strong data infrastructure; the right analytics tools; personnel with advanced analytical skills.]
Enablers of Big Data Analytics
• In-memory analytics - storing and processing the complete data set in RAM
• In-database analytics - placing analytic procedures close to where data is stored
• Grid computing & MPP - use of many machines and processors in parallel (MPP - massively parallel processing)
• Appliances - combining hardware, software, and storage in a single unit for performance and scalability
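The in-database idea can be sketched in a few lines: ship the computation to the data instead of pulling raw rows into client code. In-memory SQLite and the toy sales table below are stand-ins for a real analytic RDBMS:

```python
import sqlite3

# An in-memory SQLite database stands in for an analytic RDBMS.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("west", 100.0), ("west", 50.0), ("east", 75.0)])

# In-database analytics: the GROUP BY aggregation runs inside the
# database engine, close to where the data is stored.
rows = con.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 75.0), ('west', 150.0)]
```

At Big Data scale the same principle avoids moving terabytes across the network just to compute a small summary.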
Challenges of Big Data Analytics
• Data volume
  The ability to capture, store, and process the huge volume of data in a timely manner
• Data integration
  The ability to combine data quickly and at reasonable cost
• Processing capabilities
  The ability to process the data quickly, as it is captured (i.e., stream analytics)
• Data governance (… security, privacy, access)
• Skill availability (… data scientist)
• Solution cost (ROI)
Business Problems Addressed by Big Data Analytics
• Process efficiency and cost reduction
• Brand management
• Revenue maximization (e.g., cross-selling/ up-selling/ bundled-selling)
• Enhanced customer experience
• Churn identification, customer recruiting
• Improved customer service
• Identifying new products and market opportunities
• Risk management
• Regulatory compliance
• Enhanced security capabilities
• …
Application Case 6.2
Top 5 Investment Bank Achieves Single Source of the Truth
Questions for Discussion
1. How can Big Data benefit large-scale trading banks?
2. How did MarkLogic’s infrastructure help ease the leveraging of Big Data?
3. What were the challenges, the proposed solution, and the obtained results?
Application Case 6.2
Moving from many old legacy systems to a unified new system:
• Before: it was difficult to identify financial exposure across many systems (separate copies of the derivatives trade store)
• After: it was possible to analyze all contracts in a single database (MarkLogic Server eliminates the need for 20 database copies)
• 1. How can Big Data benefit large-scale trading banks?
• Big Data can potentially handle the high-volume, high-variability, continuously streaming data that trading banks need to deal with.
• Traditional relational databases are often unable to keep up with the data demand.
• 2. How did MarkLogic’s infrastructure help ease the leveraging of Big Data?
• MarkLogic was able to meet two needs: (1) upgrading existing Oracle and Sybase platforms, and (2) compliance with regulatory requirements.
• Their solution provided better performance, scalability, and faster development for future requirements than competitors.
• Most importantly, MarkLogic was able to eliminate the need for replicated database servers by providing a single server with timely access to the data.
• 3. What were the challenges, the proposed solution, and the obtained results?
• The Bank’s legacy system was built on relational database technology. As data volumes and variability increased, the legacy system was not fast enough to respond to growing business needs and requirements.
• It was unable to deliver real-time alerts to manage market and counterparty credit positions in the desired timeframe. Big Data offered the scalability to address this problem.
• The benefits included a new alert feature, less downtime for maintenance, much faster capacity to process complex changes, and reduced operations costs.
Big Data Technologies
• There are a number of technologies for processing and analyzing Big Data, but most share some common characteristics. Namely, they:
  take advantage of commodity hardware to enable scale-out and parallel-processing techniques;
  employ non-relational data storage capabilities to process unstructured and semi-structured data;
  apply advanced analytics and data visualization technology to Big Data to convey insights to end users.
• Three major technologies will transform the business analytics and data management markets: MapReduce, Hadoop, and NoSQL
Big Data Technologies
• MapReduce …
• Hadoop …
• Hive
• Pig
• HBase
• Flume
• Oozie
• Ambari
• Avro
• Mahout, Sqoop, HCatalog, …
Big Data Technologies - MapReduce
• MapReduce
  distributes the processing of very large multi-structured data files across a large cluster of ordinary machines/processors by breaking the processing into small units of work
  is a programming model, not a programming language
• Goal
  achieving high performance with many “simple” computers
  processing and analyzing large volumes of multi-structured data in a timely manner
• Developed and popularized by Google
• Example tasks: indexing the Web for search, graph analysis, text analysis, machine learning, …
MapReduce Processes
• Objective: to count the number of squares of each color
• Input: a set of colored squares
• Responsibility of programmers: coding the map and reduce programs; the remainder of the processing is handled by the software system implementing the MapReduce programming model
• Processes: the MapReduce system first reads the input file and splits it into multiple pieces.
  In this example, there are two splits, but in a real-life scenario, the number of splits would typically be much higher. These splits are then processed by multiple map programs running in parallel on the nodes of the cluster.
• Role of each map program: group the data in a split by color. The MapReduce system then takes the output from each map program and merges (shuffle/sort) the results for input to the reduce program, which calculates the sum of the number of squares of each color.
  In this example, only one copy of the reduce program is used.
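The colored-squares walkthrough maps directly onto a few lines of Python. This is a single-process sketch of the programming model (map, shuffle/sort, reduce), not a distributed implementation:

```python
from collections import defaultdict

# The input "file" has already been divided into two splits.
splits = [["red", "blue", "red"], ["blue", "green", "red"]]

# Map phase: each map task emits (color, 1) pairs for its split.
def map_task(split):
    return [(color, 1) for color in split]

mapped = [map_task(s) for s in splits]

# Shuffle/sort: merge the map outputs, grouping values by key.
grouped = defaultdict(list)
for output in mapped:
    for color, count in output:
        grouped[color].append(count)

# Reduce phase: one reducer sums the counts for each color.
result = {color: sum(counts) for color, counts in grouped.items()}
print(result)  # {'red': 3, 'blue': 2, 'green': 1}
```

In a real cluster the map tasks run in parallel on different nodes and the framework handles the shuffle; only `map_task` and the reduce logic are written by the programmer.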
Big Data Technologies - MapReduce
• How does MapReduce work?
[Fig. 6.4 A Graphical Depiction of the MapReduce Process (raw data → map function → reduce function)]
Big Data Technologies - Hadoop
• Hadoop is an open source framework for storing and analyzing massive amounts of distributed, unstructured data in a cost- and time-effective manner.
• Originally created by Doug Cutting at Yahoo!
• Hadoop clusters run on inexpensive commodity hardware so projects can scale out inexpensively
• Hadoop is now part of the Apache Software Foundation
• Open source - hundreds of contributors continuously improve the core technology
• MapReduce + Hadoop = Big Data core technology
Big Data Technologies - Hadoop
• How Does Hadoop Work?
  Access unstructured and semi-structured data (e.g., log files, social media feeds, other data sources)
  Break the data up into “parts,” which are then loaded into a file system made up of multiple nodes running on commodity hardware using the Hadoop Distributed File System (HDFS).
  Each “part” is replicated multiple times and loaded into the file system for replication and failsafe processing
  One node acts as the Facilitator and another as the Job Tracker
    Facilitator: communicates back to the client information such as which nodes are available, where in the cluster certain data resides, and which nodes have failed
    Job Tracker: determines which data it needs to access to complete the job and where in the cluster that data is located.
    The Job Tracker submits the query to the relevant nodes rather than bringing all the data back into a central location for processing. Processing then occurs at each node simultaneously, or in parallel - an essential characteristic of Hadoop.
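The split-and-replicate step can be illustrated with a toy sketch (not Hadoop code; the block size and node names are made up, though 3 is HDFS’s default replication factor):

```python
# Toy sketch of HDFS-style ingestion: split a "file" into fixed-size
# blocks, then place each block on several nodes for fail-safe processing.
data = "abcdefghij" * 4      # stand-in for a large file (40 bytes)
BLOCK_SIZE = 16              # real HDFS blocks default to 128 MB
REPLICATION = 3              # HDFS's default replication factor
nodes = ["node1", "node2", "node3", "node4"]

# Break the data up into "parts" (blocks).
blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

# Replicate each block onto REPLICATION distinct nodes (round-robin
# here; real HDFS placement is rack-aware).
placement = {
    b_id: [nodes[(b_id + r) % len(nodes)] for r in range(REPLICATION)]
    for b_id in range(len(blocks))
}
print(len(blocks), placement[0])  # 3 ['node1', 'node2', 'node3']
```

Losing any single node still leaves two copies of every block, which is what makes processing fail-safe.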
Big Data Technologies - Hadoop
• How Does Hadoop Work? (cont.) - Jobs are distributed to the clients, and once completed, the results are collected and aggregated using MapReduce
  When each node has finished processing its given job, it stores the results.
  The client initiates a “Reduce” job through the Job Tracker, in which results of the map phase stored locally on individual nodes are aggregated to determine the “answer” to the original query, and then are loaded onto another node in the cluster.
  The client accesses these results, which can then be loaded into one of a number of analytic environments for analysis.
  The MapReduce job has now been completed.
  Once the MapReduce phase is complete, the processed data is ready for further analysis by data scientists and others with advanced data analytics skills.
Big Data Technologies - Hadoop
• Hadoop Technical Components
  Hadoop Distributed File System (HDFS)
  Name Node (primary facilitator)
  Secondary Node (backup to Name Node)
  Job Tracker
  Slave Nodes (the grunts of any Hadoop cluster)
  Additionally, the Hadoop ecosystem is made up of a number of complementary sub-projects: NoSQL (Cassandra, HBase), DW (Hive), …
  NoSQL = not only SQL
Big Data Technologies - Hadoop: Demystifying Facts
• Hadoop consists of multiple products
• Hadoop is open source but available from vendors too
• Hadoop is an ecosystem, not a single product
• HDFS is a file system, not a DBMS
• Hive resembles SQL but is not standard SQL
• Hadoop and MapReduce are related but not the same
• MapReduce provides control for analytics, not analytics
• Hadoop is about data diversity, not just data volume
• Hadoop complements a DW; it’s rarely a replacement
• Hadoop enables many types of analytics, not just Web analytics
Data Scientist
“The Sexiest Job of the 21st Century” - Thomas H. Davenport and D. J. Patil, Harvard Business Review, October 2012
• Data Scientist = Big Data guru
  One with the skills to investigate Big Data
• Very high salaries, very high expectations
• Where do Data Scientists come from?
  M.S./Ph.D. in MIS, CS, IE, … and/or Analytics
  There is no specific degree program for DS!
  PE, PML, … DSP (Data Science Professional)
Skills That Define a Data Scientist
• Curiosity and Creativity
• Internet and Social Media / Social Networking Technologies
• Programming, Scripting, and Hacking
• Data Access and Management (both traditional and new data systems)
• Domain Expertise, Problem Definition, and Decision Modeling
• Communication and Interpersonal Skills
A Typical Job Post for a Data Scientist
Application Case 6.4
Big Data and Analytics in Politics
Questions for Discussion
1. What is the role of analytics and Big Data in modern-day politics?
2. Do you think Big Data analytics could change the outcome of an election?
3. What do you think are the challenges, the potential solution, and the probable results of the use of Big Data analytics in politics?
Application Case 6.4
Big Data and Analytics in Politics
INPUT: Data Sources
• Census data - population specifics: age, race, sex, income, etc.
• Election databases - party affiliations, previous election outcomes, trends and distributions
• Market research - polls, recent trends and movements
• Social media - Facebook, Twitter, LinkedIn, newsgroups, blogs, etc.
• Web (in general) - Web pages, posts and replies, search trends, etc.
• Other data sources
Big Data & Analytics (Data Mining, Web Mining, Text Mining, Multimedia Mining)
• Predicting outcomes and trends
• Identifying associations between events and outcomes
• Assessing and measuring the sentiments
• Profiling (clustering) groups with similar behavioral patterns
• Other knowledge nuggets
OUTPUT: Goals
• Raise money contributions
• Increase number of volunteers
• Organize movements
• Mobilize voters to get out and vote
• Other goals and objectives
• …
• 1. What is the role of analytics and Big Data in modern-day politics?
• Big Data and analytics have a lot to offer to modern-day politics. The main characteristics of Big Data, namely volume, variety, and velocity (the three Vs), readily apply to the kind of data that is used for political campaigns.
• Big Data analytics can help predict election outcomes as well as target potential voters and donors, and has become a critical part of political campaigns.
• 2. Do you think Big Data analytics could change the outcome of an election?
• It may well have changed the outcome of the 2008 and 2012 elections. For example, in 2012, Barack Obama had a data advantage, and the depth and breadth of his campaign’s digital operation, from political and demographic data mining to voter sentiment and behavioral analysis, reached beyond anything politics had ever seen.
• So there was a significant difference in expertise and comfort level with modern technology between the two parties. The usage and expertise gap between the party lines may disappear over time, and this will even the playing field. But new data regarding voter sympathies may itself change the thinking and ideological directions of the major parties. This in itself could shift election outcomes.
• 3. What do you think are the challenges, the potential solution, and the probable results of the use of Big Data analytics in politics?
• One of the big challenges is to ensure some sort of reliability in news media coverage of election issues. For example, most people, including the so-called political experts (who often rely on gut feelings and experiences), thought the 2012 presidential election would be very close. They were wrong.
• On the other hand, data analysts, using data-driven analytical models, predicted that Barack Obama would win easily with close to 99 percent certainty. The results will be an ever-increasing use of Big Data analytics in politics, both by the parties and candidates themselves and by the news media and analysts who cover them.
Big Data and Data Warehousing
• Two paradigms in BI: Data Warehouse and Big Data.
  Both compete with each other for turning data into actionable information.
• However, in recent years, the variety and complexity of data have made the data warehouse incapable of keeping up with changing needs.
• Big Data
  A new paradigm that the world of IT was forced to develop, not because of the volume of the structured data but because of the variety and the velocity.
Big Data and Data Warehousing
• What is the impact of Big Data on DW?
Big Data and RDBMS do not go nicely together
Will Hadoop replace data warehousing/RDBMS?
• Use Cases for Hadoop
Hadoop is the repository and refinery for raw data
Hadoop is a powerful, economical, and active archive
• Use Cases for Data Warehousing
Data warehouse performance
Integrating data that provides business value
Interactive BI tools
Hadoop versus Data Warehouse
When to Use Which Platform
Coexistence of Hadoop and DW
• The following scenarios support an environment where Hadoop and the RDBMS are separate from each other and connectivity software is used to exchange data between the two systems:
1. Use Hadoop for storing and archiving multi-structured data
2. Use Hadoop for filtering, transforming, and/or consolidating
multi-structured data
3. Use Hadoop to analyze large volumes of multi-structured data
and publish the analytical results
4. Use a relational DBMS that provides MapReduce capabilities
as an investigative computing platform
5. Use a front-end query tool to access and analyze data
Coexistence of Hadoop and DW
[Figure: Hadoop and the data warehouse linked by connectivity software. Source: Teradata]
Big Data Vendors
• The Big Data vendor landscape is developing very rapidly
• A representative list (software, hardware, services, …) would include
  Cloudera - cloudera.com
  MapR - mapr.com
  Hortonworks - hortonworks.com
  Also, IBM (Netezza, InfoSphere), Oracle (Exadata, Exalogic), Microsoft, Amazon, Google, …
Top 10 Big Data Vendors with Primary Focus on Hadoop
[Bar chart: vendor revenue comparison on a $0–$70 scale]
Technology Insights 6.4
How to Succeed with Big Data
1. Simplify
2. Coexist
3. Visualize
4. Empower
5. Integrate
6. Govern
7. Evangelize
Big Data and Stream Analytics
• Data-in-motion analytics and real-time data analytics
• One of the Vs in Big Data = Velocity
• The analytic process of extracting actionable information from continuously flowing/streaming data
• Why Stream Analytics?
  It may not be feasible to store the data, or the data may quickly lose its value
• Stream Analytics vs. Perpetual Analytics
  “Stream analytics” involves applying transaction-level logic to real-time observations.
  Used when transaction volumes are high and the time-to-decision is too short.
  “Perpetual analytics” is the term describing the process of performing (near) real-time analytics on data streams.
  Used when the mission is critical and transaction volumes can be managed in real time.
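A minimal stream-analytics sketch in Python: apply transaction-level logic to each observation as it arrives, keeping only a bounded window in memory rather than storing the whole stream (the readings and the spike rule are invented for illustration):

```python
from collections import deque

WINDOW = 3                  # keep only the last 3 observations
window = deque(maxlen=WINDOW)
alerts = []

readings = [10, 12, 11, 30, 13, 12]   # e.g. sensor values arriving over time

for value in readings:
    # Transaction-level rule: flag a value more than twice the recent
    # average, deciding as the data flows past rather than after storage.
    if window and value > 2 * (sum(window) / len(window)):
        alerts.append(value)
    window.append(value)    # older readings fall off automatically

print(alerts)  # [30]
```

Nothing outside the three-element window is ever stored, so the same loop works whether the stream carries six readings or six billion.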
• Q: Why is stream analytics becoming more popular?
• Stream analytics is becoming increasingly popular for two main reasons.
• First, time-to-action has become an ever-decreasing value, and second, we have the technological means to capture and process the data while it is being created.
Application Case 6.7: Turning Machine-Generated Streaming Data into Valuable Business Insights
Stream Analytics
A Use Case in the Energy Industry
[Figure: streaming analytics pipeline for energy. Inputs - sensor data (energy production system status), meteorological data (wind, light, temperature, etc.), and usage data (smart meters, smart grid devices) - flow through data integration and temporary staging into streaming analytics (predicting usage, production, and anomalies), backed by a permanent storage area; outputs drive capacity decisions and pricing decisions for the energy production system (traditional and/or renewable) and the energy consumption system (residential and commercial).]
Stream Analytics Applications
• e-Commerce
• Telecommunication
• Law Enforcement and Cyber Security
• Power Industry
• Financial Services
• Health Services
• Government
End-of-Chapter Application Case
Discovery Health Turns Big Data into Better Healthcare
Questions for Discussion
1. How big is “big data” for Discovery Health?
2. What big data sources did Discovery Health use for their analytic solutions?
3. What were the main data/analytics challenges Discovery Health was facing?
4. What were the main solutions they have produced?
5. What were the initial results/benefits? What do you think will be the future of “big data” analytics at Discovery?
• 1. How big is “big data” for Discovery Health?
• Discovery Health serves 2.7 million members, and its claims system generates a million new rows of data each day. For analytics purposes, they want to use three years of data, which comes to over a billion rows of claims data alone.
• 2. What big data sources did Discovery Health use for their analytic solutions?
• The data sources include members’ claims data, pathology results, members’ questionnaires, data from pharmacies, hospital admissions, and billing data, to name a few. Looking to the future, Discovery is also starting to analyze unstructured data, such as text-based surveys and comments from online feedback forms.
• 3. What were the main data/analytics challenges Discovery Health was facing?
• The challenge faced by Discovery Health was similar to that faced by any company operating in a Big Data environment. Given the huge amount of data that Discovery Health was generating, how could they identify the key insights that their business and their members’ health depend on?
• To find the needles of vital information in the big data haystack, the company not only needed a sophisticated data-mining and predictive modeling solution, but also an analytics infrastructure with the power to deliver results at the speed of business.
• 4. What were the main solutions they have produced?
• To achieve this transformation in its analytics capabilities, Discovery worked with BITanium, an IBM Business Partner with deep expertise in operational deployments of advanced analytics technologies.
• The solution included the use of IBM’s SPSS and PureData System for Analytics.
• 5. What were the initial results/benefits? What do you think will be the future of “big data” analytics at Discovery?
• Discovery Health is now able to run three years’ worth of data for their 2.7 million members through complex statistical models to deliver actionable insights in a matter of minutes. This helps them to better predict and prevent health risks, and to identify and eliminate fraud. The results have been transformative.
• From an analytics perspective, the speed of the solution gives Discovery more scope to experiment and optimize its models. From a broader business perspective, the combination of SPSS and PureData technologies gives Discovery the ability to put actionable data in the hands of its decision makers faster.
End of the Chapter
• Questions, comments