Overview - IBM Big Data Platform
© 2013 IBM Corporation
Overview - Big Data & Analytics
Vikas K Manoria
Technical Consultant – Big Data & Analytics
IBM Big Data & Analytics
Agenda
• What is Big Data?
  - Concepts
  - Characteristics
• Business Motivation
  - Big Data Challenges
  - How Big Data Impacts Every Aspect of Your Business
  - A Big Data Journey
• IBM Big Data Platform
  - InfoSphere Data Explorer
  - InfoSphere BigInsights
  - IBM PureData Systems, InfoSphere Warehouse
  - InfoSphere Streams
• Big Data Use Cases
• Get Started
What is Big Data?
• All kinds of data
  - Large volumes
  - Valuable insight, but difficult to extract
  - May be extremely time sensitive
• Big Data is a Hot Topic Because Technology Makes it Possible to Analyze ALL Available Data
“Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high velocity capture, discovery and/or analysis.”
Source: Matt Eastwood, IDC
Characteristics of Big Data
• V4 = Volume, Velocity, Variety, Veracity
Collectively analyzing the broadening Variety
Responding to the increasing Velocity
Cost efficiently processing the growing Volume
Establishing the Veracity of big data sources
1 in 3 business leaders don’t trust the information they use to make decisions
50x growth in data volume from 2010 to 2020, to 35 ZB
30 Billion RFID sensors and counting
80% of the world’s data is unstructured
Information is at the Center of a New Wave of Opportunity…
• 44x as much data and content over the coming decade: 800,000 petabytes in 2009, 35 zettabytes in 2020
• 80% of the world’s data is unstructured
… And Organizations Need Deeper Insights
• 1 in 3 business leaders frequently make decisions based on information they don’t trust, or don’t have
• 1 in 2 business leaders say they don’t have access to the information they need to do their jobs
• 83% of CIOs cited “Business intelligence and analytics” as part of their visionary plans to enhance competitiveness
• 60% of CEOs need to do a better job capturing and understanding information rapidly in order to make swift business decisions
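As a quick sanity check on the growth figure, the sketch below recomputes the ratio between the 2009 and 2020 volumes quoted on this slide. It assumes decimal (SI) prefixes, so 1 zettabyte equals 1,000,000 petabytes; the variable names are illustrative only.

```python
# Assumption: SI prefixes, 1 ZB = 10**21 bytes and 1 PB = 10**15 bytes.
PB_PER_ZB = 10**6

volume_2009_pb = 800_000          # 800,000 petabytes (from the slide)
volume_2020_pb = 35 * PB_PER_ZB   # 35 zettabytes (from the slide)

growth_factor = volume_2020_pb / volume_2009_pb
print(f"Growth factor: {growth_factor:.2f}x")  # 43.75x, which rounds to 44x
```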
Merging the Traditional and Big Data Approaches
Traditional Approach: Structured & Repeatable Analysis
• Business Users: Determine what question to ask
• IT: Structures the data to answer that question
• Examples: Monthly sales reports, Profitability analysis, Customer surveys
Big Data Approach: Iterative & Exploratory Analysis
• IT: Delivers a platform to enable creative discovery
• Business: Explores what questions could be asked
• Examples: Brand sentiment, Product strategy, Maximum asset utilization
Imagine the Possibilities of Harnessing Your Data Resources
• Big data challenges exist in every organization today
  - Retailer reduces time to run queries by 80% to optimize inventory
  - Stock Exchange cuts queries from 26 hours to 2 minutes on 2 PB
  - Government cuts acoustic analysis from hours to 70 milliseconds
  - Utility avoids power failures by analyzing 10 PB of data in minutes
  - Telco analyses streaming network data to reduce hardware costs by 90%
  - Hospital analyses streaming vitals to detect illness 24 hours earlier
Leveraging Big Data Requires Multiple Platform Capabilities
• Understand and Navigate Federated Big Data Sources: Federated Discovery and Navigation
• Manage and Store Huge Volume of any Data: Hadoop File System, MapReduce
• Structure and Control Data: Data Warehousing
• Manage Streaming Data: Stream Computing
• Analyze Unstructured Data: Text Analytics Engine
• Integrate and Govern all Data Sources: Integration, Data Quality, Security, ILM, MDM
IBM’s Business-centric Big Data Platform
• Enables you to start with a critical business need and expand the foundation for future requirements
• “Big data” isn’t just a technology; it’s a business strategy for capitalizing on information resources
• Getting started is crucial
• Success at each entry point is accelerated by products within the big data platform
• Build the foundation for future requirements by expanding further into the big data platform
A Big Data Journey: Anticipating and Improving Customer Interactions
• Financial and tax preparation software and services
• $4.15B revenue in 2012
Project 1: Big Data Foundation
- Data Warehousing, Data Quality, Customer Data Hub
- Single view of the customer
Project 2: Analytics
- Customer behavior and segmentation analysis
- Reduced customer churn 10%
- $10M new revenue in 12 months
Project 3: Unstructured Data Analytics
- Social media analysis, Log Analysis, Text Analytics
- Augment customer profiles with new data sources
- Data warehouse cost optimization
- Data Exploration
Project 4: Real Time Analytics
- No latency analytics
- Real time behavior prediction
- Real time customer segmentation
IBM Big Data Platform
Solutions
Analytics and Decision Management
Big Data Platform and Application Frameworks
• Visualization & Discovery: Gather, extract and explore data using best of breed visualization
• Applications & Development
• Systems Management
• Accelerators: Speed time to value with analytic and application accelerators
• Hadoop System: Cost-effectively analyze petabytes of structured and unstructured information
• Stream Computing: Analyze streaming data and large data bursts for real-time insights
• Data Warehouse: Deliver deep insight with advanced in-database analytics and operational analytics
• Contextual Discovery: Index and federated discovery for contextual collaborative insights
• Information Integration & Governance: Govern data quality and manage information lifecycle
Big Data Infrastructure
Cloud | Mobile | Security
An example of the big data platform in practice
• Ingestion and Real-time Analytic Zone: Streams, Connectors
• Landing and Analytics Sandbox Zone: Hadoop, MapReduce, Hive/HBase, Col Stores, Documents in variety of formats
• Warehousing Zone: Enterprise Warehouse, Data Marts
• Analytics and Reporting Zone: BI & Reporting, Predictive Analytics, Visualization & Discovery
• Metadata and Governance Zone: ETL, MDM, Data Governance
Example: Integrate big data sources with enterprise data
[Diagram labels]
• IBM Business Analytics: SPSS Modeler, Cognos RTM, Cognos Insight, Cognos BI, Cognos Consumer Insight
• Capabilities shown: Predictive, Real-time Analytics, Export and Explore, Social Media Analysis, Reporting / Analysis Dashboards
• IBM Big Data Platform: InfoSphere BigInsights, PureData Systems
• Data: Data In-Motion, Data At-Rest, Other Sources
Big Data Key Use Cases:
• Big Data Exploration: Find, visualize, understand all big data to improve decision making
• Enhanced 360° View of the Customer: Extend existing customer views (MDM, CRM, etc.) by incorporating additional internal and external information sources
• Operations Analysis: Analyze a variety of machine data for improved business results
• Data Warehouse Augmentation: Integrate big data and data warehouse capabilities to increase operational efficiency
• Security/Intelligence Extension: Lower risk, detect fraud and monitor cyber security in real-time
Big Difference: Schema on Run
• Regular database: Schema on load
  Raw data -> Schema to filter -> Storage (pre-filtered data)
• Big Data (Hadoop): Schema on run
  Raw data -> Storage (unfiltered, raw data) -> Schema to filter -> Output
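The two flows on this slide can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not Hadoop or database code: the records, the `matches_schema` rule, and the `project` helper are all hypothetical and exist only to show where the schema is applied in each model.

```python
import json

# Hypothetical raw records; the third one does not match the expected schema.
RAW_LINES = [
    '{"customer": "a1", "amount": 12.5}',
    '{"customer": "b2", "amount": 7.0, "channel": "web"}',
    '{"note": "free-form text with no amount"}',
]

def matches_schema(record):
    """Illustrative schema: a record needs 'customer' and a numeric 'amount'."""
    return "customer" in record and isinstance(record.get("amount"), (int, float))

def project(record):
    """Keep only the fields the schema defines."""
    return {"customer": record["customer"], "amount": record["amount"]}

# Schema on load: filter while loading, store only conforming rows.
store_on_load = [project(r) for r in map(json.loads, RAW_LINES) if matches_schema(r)]

# Schema on run: store everything untouched, apply the schema per query.
store_raw = list(RAW_LINES)

def query_total(raw_store):
    records = (json.loads(line) for line in raw_store)
    return sum(project(r)["amount"] for r in records if matches_schema(r))

print(len(store_on_load), len(store_raw))   # 2 3
print(query_total(store_raw))               # 19.5
```

The raw store still holds the `channel` field and the free-form note, so a later query with a different schema can use them; the pre-filtered store has already discarded both.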
From Getting Started to Enterprise Deployment: IBM BigInsights
[Chart axes: Breadth of capabilities vs. Enterprise class]
• Apache Hadoop
• Basic Edition: Free download
  - Jaql
  - Integrated install
• Quick Start Edition: New for V2.1. Free. Non-production only
• Enterprise Edition: Sold by # of terabytes managed
  - Accelerators
  - Performance Optimization
  - Visualization Capabilities
  - Pre-built applications
  - Text analytics
  - Spreadsheet-style tool
  - RDBMS, warehouse connectivity
  - Administrative tools, security
  - Eclipse development tools
  - Enterprise Integration
  - Integrated web console
  - . . . .
• PureData for Hadoop: Appliance simplicity. PureData for Hadoop brings BigInsights to the market in an appliance form factor
Usability
• BigSheets or use third party tool vendors like Datameer
• Big SQL
• All key open source components: Java, Hive, Pig, & Jaql, etc.
BigInsights Enterprise Edition
Connectivity and Integration
Streams
Netezza
Text processing engine and library
JDBC
Flume
Infrastructure
Jaql
Hive
Pig
HBase
MapReduce
HDFS
ZooKeeper
Indexing: Lucene
Adaptive MapReduce
Oozie
Text compression
Enhanced security
Flexible scheduler
Optional IBM and partner offerings
Analytics and discovery “Apps”
DB2
BigSheets
Web Crawler
Distrib file copy
DB export
Boardreader
DB import
Ad hoc query
Machine learning
Data processing
. . .
Administrative and development tools
Web console
• Monitor cluster health, jobs, etc.
• Add / remove nodes
• Start / stop services
• Inspect job status
• Inspect workflow status
• Deploy applications
• Launch apps / jobs
• Work with distrib file system
• Work with spreadsheet interface
• Support REST-based API
• . . .
R
Eclipse tools
• Text analytics
• MapReduce programming
• Jaql, Hive, Pig development
• BigSheets plug-in development
• Oozie workflow generation
Integrated installer
Legend: Open Source | IBM
Cognos BI
GPFS (EAP)
Accelerator for machine data analysis
Accelerator for social data analysis
Guardium
DataStage
Data Explorer
Sqoop
HCatalog
Stream Computing Represents a Paradigm Shift
Traditional Computing
• Historical fact finding
• Find and analyze information stored on disk
• Batch paradigm, pull model
• Query-driven: submits queries to static data
Stream Computing
• Current fact finding
• Analyze data in motion, before it is stored
• Low latency paradigm, push model
• Data driven: bring data to the analytics
Real-time Analytics
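The pull and push models contrasted on this slide can be illustrated with a small plain-Python sketch. It is an assumption-laden toy, not InfoSphere Streams code: the readings, the threshold, and the `PushPipeline` class are hypothetical names chosen for illustration.

```python
from typing import Callable, List

# Hypothetical sensor readings used for both models.
READINGS = [61, 64, 98, 70, 102]
THRESHOLD = 95

def pull_query(stored: List[int]) -> List[int]:
    """Traditional (pull): data is stored first, then a query scans it."""
    return [r for r in stored if r > THRESHOLD]

class PushPipeline:
    """Stream (push): each arriving reading is handed to the analytic at once."""

    def __init__(self, on_alert: Callable[[int], None]) -> None:
        self.on_alert = on_alert

    def receive(self, reading: int) -> None:
        if reading > THRESHOLD:
            self.on_alert(reading)  # nothing has been written to storage

# Pull: results are available only after the batch has been stored.
stored = list(READINGS)
batch_alerts = pull_query(stored)

# Push: alerts fire as each reading arrives.
stream_alerts: List[int] = []
pipeline = PushPipeline(stream_alerts.append)
for reading in READINGS:
    pipeline.receive(reading)

print(batch_alerts, stream_alerts)  # [98, 102] [98, 102]
```

Both models find the same readings; the difference is when. The push pipeline raises the first alert as soon as the third reading arrives, while the pull query cannot run until the data has been stored.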
Big Data in real-time with InfoSphere Streams
• Filter / Sample
• Modify
• Annotate
• Fuse
• Classify
• Score
• Windowed Aggregates
• Analyze
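Two of the operators named on this slide, Filter and Windowed Aggregates, can be sketched with Python generators. This is a conceptual illustration only; it does not use SPL or any InfoSphere Streams API, and the valid range, window size, and sample data are assumptions.

```python
from collections import deque
from typing import Deque, Iterable, Iterator

def filter_valid(values: Iterable[float]) -> Iterator[float]:
    """Filter step: drop readings outside a plausible range (assumed 0 to 250)."""
    return (v for v in values if 0 <= v <= 250)

def sliding_mean(values: Iterable[float], size: int) -> Iterator[float]:
    """Windowed aggregate: mean over the last `size` readings, emitted per tuple."""
    window: Deque[float] = deque(maxlen=size)
    for v in values:
        window.append(v)
        if len(window) == size:
            yield sum(window) / size

# Hypothetical stream with two bad readings (-1 and 999).
heart_rates = [72, 75, -1, 78, 81, 999, 84]
means = list(sliding_mean(filter_valid(heart_rates), size=3))
print(means)  # [75.0, 78.0, 81.0]
```

Because both steps are generators, each reading flows through the filter and into the window as it arrives, and only the last three valid readings are ever held in memory.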
Analytic Accelerators Designed for Velocity (and Variety)
• Mining in Microseconds (included with Streams)
• Image & Video (Open Source)
• Simple & Advanced Text (included with Streams) (IBM Research) (Open Source UIMA)
  Text example: (listen, verb), (radio, noun)
• Acoustic (IBM Research) (Open Source)
• Geospatial (IBM Research)
• Predictive (IBM Research)
• Advanced Mathematical Models (IBM Research)
• Statistics (included with Streams)
Putting it all together … end-to-end big data solution
[Diagram labels]
• Components: Streaming Data Sources, InfoSphere Streams, InfoSphere BigInsights, InfoSphere Warehouse, Netezza Appliance, IBM SPSS, IBM Cognos
• Steps: Discover, Model, Score, Measure, Visualize & Publish
Cognos Business Intelligence optimized for Big SQL
• Big SQL enables the Cognos BI server to delegate many types of analytical computations to BigInsights MapReduce processing, instead of computing them locally at a performance cost as it would with Hive
• Faster response times due to increased opportunity for query processing to occur closer to the data
• Not hindered by the latency and other limitations of querying Hadoop via Hive
[Diagram labels]
• Cognos BI Server: Explore & Analyze, Report & Act
• SQL Interface via JDBC
• InfoSphere BigInsights: Hive, Application (MapReduce), Storage (HBase, HDFS)
Performance - Cognos BI + DB2 BLU
• 38x average acceleration of database queries for reporting (2)
• Cognos BI + DB2 BLU + Power
• Query modes shown: Dynamic Query, Compatible Query, Dynamic Cubes
• Faster cube load*
• Faster DB Query*
[Figure: DB2 with BLU, columns C1 to C8]
2. Based on internal tests.
Meeting Big Data Challenges - Fast and Easy!
IBM PureData Systems
• System for Transactions: For apps like E-commerce… Database cluster services optimized for transactional throughput and scalability
• System for Analytics: For apps like Customer Analysis… Data warehouse services optimized for high-speed, peta-scale analytics and simplicity
• System for Operational Analytics: For apps like Real-time Fraud Detection… Operational data warehouse services optimized to balance high performance analytics and real-time operational throughput
• System for Hadoop: For Exploratory Analysis & Queryable Archive… Hadoop data services optimized for big data analytics and online archive with appliance simplicity
Use Cases for a Big Data Platform
Innovate New Products at Speed and Scale
• Social Media - Product/brand Sentiment analysis
• Brand strategy
• Market analysis
• RFID tracking & analysis
• Transaction analysis to create insight-based product/service offerings
Know Everything about your Customer
• Social media customer sentiment analysis
• Promotion optimization
• Segmentation
• Customer profitability
• Click-stream analysis
• CDR processing
• Multi-channel interaction analysis
• Loyalty program analytics
• Churn prediction
Run Zero Latency Operations
• Smart Grid/meter management
• Distribution load forecasting
• Sales reporting
• Inventory & merchandising optimization
• Options trading
• ICU patient monitoring
• Disease surveillance
• Transportation network optimization
• Store performance
• Environmental analysis
• Experimental research
Instant Awareness of Risk and Fraud
• Multimodal surveillance
• Cyber security
• Fraud modeling & detection
• Risk modeling & management
• Regulatory reporting
Exploit Instrumented Assets
• Network analytics
• Asset management and predictive issue resolution
• Website analytics
• IT log analysis
Every Industry can Leverage Big Data and Analytics.
Insurance
• 360° View of Domain or Subject
• Catastrophe Modeling
• Fraud & Abuse
Banking
• Optimizing Offers and Cross-sell
• Customer Service and Call Center Efficiency
Telco
• Pro-active Call Center
• Network Analytics
• Location Based Services
Energy & Utilities
• Smart Meter Analytics
• Distribution Load Forecasting/Scheduling
• Condition Based Maintenance
Media & Entertainment
• Business process transformation
• Audience & Marketing Optimization
Retail
• Actionable Customer Insight
• Merchandise Optimization
• Dynamic Pricing
Travel & Transport
• Customer Analytics & Loyalty Marketing
• Predictive Maintenance Analytics
Consumer Products
• Shelf Availability
• Promotional Spend Optimization
• Merchandising Compliance
Government
• Civilian Services
• Defense & Intelligence
• Tax & Treasury Services
Healthcare
• Measure & Act on Population Health Outcomes
• Engage Consumers in their Healthcare
Automotive
• Advanced Condition Monitoring
• Data Warehouse Optimization
Life Sciences
• Increase visibility into drug safety and effectiveness
Chemical & Petroleum
• Operational Surveillance, Analysis & Optimization
• Data Warehouse Consolidation, Integration & Augmentation
Aerospace & Defense
• Uniform Information Access Platform
• Data Warehouse Optimization
Electronics
• Customer/Channel Analytics
• Advanced Condition Monitoring
Clients Achieve Breakthrough Outcomes With IBM’s Big Data Platform
Imperative Primary Capability Business Value
Run Zero Latency Operations
InfoSphere BigInsights
Reduce maintenance costs and differentiate by optimal turbine placement
PureData for Analytics
Instant Awareness of Risk and Fraud
Analysis time on 2 PB of data cut from 26 hours to 2 minutes
PureData for Analytics
Increased network availability by identifying and fixing holes
Exploit Instrumented Assets
InfoSphere Data Explorer
Provide single point of access to disparate data sources
Secure single point of access to all enterprise data
Analyzed call records to drive real-time promotions & reduce churn
InfoSphere Streams
Know Everything about your Customers
Aircraft Manufacturer
A Catalyst for ISV and Partner Innovation
Traditional Approach -> Transformational Outcomes
• Customer segmentation based on loyalty data -> Capturing information from all interactions to improve customer lifetime value
• Historical analysis of subscriber data -> 2 million events analyzed per minute, delivering real-time insight to mobile operators
• Managing rising cost of care -> Combining data from hundreds of hospitals to improve results across the healthcare continuum
• Anti-corruption and bribery compliance program -> Use Big Data analytics to prioritize and isolate areas of risk or rogue activity
• Manual supply chain integration -> Provide visibility, analysis and reporting across the entire supply chain (planning -> execution)
• Treat-first, seek-payment-later and write off bad debt -> Measure and predict patient payment behavior, reduce risk from bad debt and boost collection rates
• Random parking meter patrols & search for open spots -> Analyzing parking systems to maximize revenue & improve the parking experience in cities
Get started!
Think Big | Pick your Spot | Execute and Deliver Value
• Identify and prioritize business use cases
• New insights and new possibilities
• New revenue opportunities
• Process and performance improvement
• Evolve your existing analytics capabilities
• Build or acquire new skills required
• Measure and communicate success
• Ensure that the business is engaged
• Agree on the key measures for success
© 2013 IBM Corporation, April 24, 2014
Thank You