Overview - IBM Big Data Platform


Upload: vikas-manoria

Post on 15-Jan-2015


Page 1: Overview - IBM Big Data Platform

© 2013 IBM Corporation

Overview - Big Data & Analytics

Vikas K Manoria

Technical Consultant – Big Data & Analytics

[email protected]

Page 2: Overview - IBM Big Data Platform

IBM Big Data & Analytics

Agenda

• What is Big Data?
  - Concepts
  - Characteristics
• Business Motivation
  - Big Data Challenges
  - How Big Data Impacts Every Aspect of Your Business
  - A Big Data Journey
• IBM Big Data Platform
  - InfoSphere Data Explorer
  - InfoSphere BigInsights
  - IBM PureData Systems, InfoSphere Warehouse
  - InfoSphere Streams
• Big Data Use Cases
• Get Started

Page 3: Overview - IBM Big Data Platform

What is Big Data?

• All kinds of data
  - Large volumes
  - Valuable insight, but difficult to extract
  - May be extremely time sensitive
• Big Data is a Hot Topic Because Technology Makes it Possible to Analyze ALL Available Data

“Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high velocity capture, discovery and/or analysis.”
Source: Matt Eastwood, IDC

Page 4: Overview - IBM Big Data Platform

Characteristics of Big Data

• V4 = Volume, Velocity, Variety, Veracity
  - Volume: cost-efficiently processing the growing Volume (50x growth from 2010 to 2020, to 35 ZB)
  - Velocity: responding to the increasing Velocity (30 billion RFID sensors and counting)
  - Variety: collectively analyzing the broadening Variety (80% of the world's data is unstructured)
  - Veracity: establishing the Veracity of big data sources (1 in 3 business leaders don't trust the information they use to make decisions)

Page 5: Overview - IBM Big Data Platform

Information is at the Center of a New Wave of Opportunity…

• 44x as much data and content over the coming decade: from 800,000 petabytes in 2009 to 35 zettabytes in 2020
• 80% of the world's data is unstructured

… And Organizations Need Deeper Insights

• 1 in 3 business leaders frequently make decisions based on information they don't trust, or don't have
• 1 in 2 business leaders say they don't have access to the information they need to do their jobs
• 83% of CIOs cited “Business intelligence and analytics” as part of their visionary plans to enhance competitiveness
• 60% of CEOs need to do a better job capturing and understanding information rapidly in order to make swift business decisions
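The 44x figure on this slide can be checked against its own two data points. The sketch below assumes decimal (SI) units, which the slide does not state explicitly; the function name is illustrative.

```python
# Decimal (SI) storage units, expressed in bytes (an assumption; the slide
# does not say whether decimal or binary units are meant).
PETABYTE = 10**15
ZETTABYTE = 10**21


def growth_factor(start_bytes: int, end_bytes: int) -> float:
    """Return how many times larger end_bytes is than start_bytes."""
    return end_bytes / start_bytes


data_2009 = 800_000 * PETABYTE   # figure quoted on the slide for 2009
data_2020 = 35 * ZETTABYTE       # figure projected on the slide for 2020

factor = growth_factor(data_2009, data_2020)
print(f"Growth factor: {factor:.2f}x")   # 43.75x, which the slide rounds to 44x
```

Under that assumption the ratio is 43.75, consistent with the rounded 44x on the slide.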

Page 6: Overview - IBM Big Data Platform

Merging the Traditional and Big Data Approaches

Traditional Approach: Structured & Repeatable Analysis
• Business Users: Determine what question to ask
• IT: Structures the data to answer that question
• Examples: Monthly sales reports, Profitability analysis, Customer surveys

Big Data Approach: Iterative & Exploratory Analysis
• IT: Delivers a platform to enable creative discovery
• Business: Explores what questions could be asked
• Examples: Brand sentiment, Product strategy, Maximum asset utilization

Page 7: Overview - IBM Big Data Platform

Imagine the Possibilities of Harnessing Your Data Resources

• Big data challenges exist in every organization today
  - Retailer reduces time to run queries by 80% to optimize inventory
  - Stock Exchange cuts queries from 26 hours to 2 minutes on 2 PB
  - Government cuts acoustic analysis from hours to 70 milliseconds
  - Utility avoids power failures by analyzing 10 PB of data in minutes
  - Telco analyses streaming network data to reduce hardware costs by 90%
  - Hospital analyses streaming vitals to detect illness 24 hours earlier

Page 8: Overview - IBM Big Data Platform

Leveraging Big Data Requires Multiple Platform Capabilities

• Understand and Navigate Federated Big Data Sources: Federated Discovery and Navigation
• Manage and Store Huge Volume of any Data: Hadoop File System, MapReduce
• Structure and Control Data: Data Warehousing
• Manage Streaming Data: Stream Computing
• Analyze Unstructured Data: Text Analytics Engine
• Integrate and Govern all Data Sources: Integration, Data Quality, Security, ILM, MDM
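MapReduce is named on this slide as the processing model that sits on top of the Hadoop file system. As a rough illustration of its three phases, here is a single-process word count in plain Python. It is a sketch of the programming model only, not Hadoop's API, and every function name is made up for the example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


docs = ["big data big insight", "data in motion", "data at rest"]
counts = word_count(docs)
print(counts["data"], counts["big"])
```

In a real cluster the map and reduce calls run in parallel on the nodes that hold the data. The sketch keeps everything in one process so the data flow is easy to follow.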

Page 9: Overview - IBM Big Data Platform

IBM’s Business-centric Big Data Platform

• Enables you to start with a critical business need and expand the foundation for future requirements
• “Big data” isn’t just a technology; it’s a business strategy for capitalizing on information resources
• Getting started is crucial
• Success at each entry point is accelerated by products within the big data platform
• Build the foundation for future requirements by expanding further into the big data platform

Page 10: Overview - IBM Big Data Platform

A Big Data Journey: Anticipating and Improving Customer Interactions

• Financial and tax preparation software and services
• $4.15B rev 2012

Project 1: Big Data Foundation
  - Data Warehousing, Data Quality, Customer Data Hub
  - Single view of the customer

Project 2: Analytics
  - Customer behavior and segmentation analysis
  - Reduced customer churn 10%
  - $10M new revenue in 12 months

Project 3: Unstructured Data Analytics
  - Social media analysis, Log Analysis, Text Analytics
  - Augment customer profiles with new data sources
  - Data warehouse cost optimization
  - Data Exploration

Project 4: Real Time Analytics
  - No latency analytics
  - Real time behavior prediction
  - Real time customer segmentation

Page 11: Overview - IBM Big Data Platform

IBM Big Data Platform

[Architecture diagram. Layers: Solutions; Analytics and Decision Management; Big Data Platform and Application Frameworks; Big Data Infrastructure; Cloud | Mobile | Security]

• Visualization & Discovery: Gather, extract and explore data using best of breed visualization
• Applications & Development
• Systems Management
• Accelerators: Speed time to value with analytic and application accelerators
• Hadoop System: Cost-effectively analyze petabytes of structured and unstructured information
• Stream Computing: Analyze streaming data and large data bursts for real-time insights
• Data Warehouse: Deliver deep insight with advanced in-database analytics and operational analytics
• Contextual Discovery: Index and federated discovery for contextual collaborative insights
• Information Integration & Governance: Govern data quality and manage information lifecycle

Page 12: Overview - IBM Big Data Platform

An example of the big data platform in practice

• Ingestion and Real-time Analytic Zone: Streams, Connectors
• Landing and Analytics Sandbox Zone: Hadoop, MapReduce, Hive/HBase, Col Stores, Documents in variety of formats
• Warehousing Zone: Enterprise Warehouse, Data Marts
• Analytics and Reporting Zone: BI & Reporting, Predictive Analytics, Visualization & Discovery
• Metadata and Governance Zone: ETL, MDM, Data Governance

Page 13: Overview - IBM Big Data Platform

Example: Integrate big data sources with enterprise data

[Technology diagram]

• IBM Business Analytics: SPSS Modeler, Cognos RTM, Cognos Insight, Cognos BI, Cognos Consumer Insight
• Functions shown: Predictive, Real-time Analytics, Export and Explore, Social Media Analysis, Reporting / Analysis Dashboards
• IBM Big Data Platform: InfoSphere BigInsights, PureData Systems
• Data In-Motion, Data At-Rest, Other Sources

Page 14: Overview - IBM Big Data Platform

Big Data Key Use Cases:

• Big Data Exploration: Find, visualize, understand all big data to improve decision making
• Enhanced 360° View of the Customer: Extend existing customer views (MDM, CRM, etc.) by incorporating additional internal and external information sources
• Operations Analysis: Analyze a variety of machine data for improved business results
• Data Warehouse Augmentation: Integrate big data and data warehouse capabilities to increase operational efficiency
• Security/Intelligence Extension: Lower risk, detect fraud and monitor cyber security in real-time

Page 15: Overview - IBM Big Data Platform

Big Difference: Schema on Run

• Regular database
  - Schema on load
  - Raw data -> Schema to filter -> Storage (pre-filtered data)
• Big Data (Hadoop)
  - Schema on run
  - Raw data -> Storage (unfiltered, raw data) -> Schema to filter -> Output
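The difference between the two flows on this slide can be sketched in a few lines of Python. The sample records, field names and helper functions below are hypothetical. The point is only where the schema is applied: before storage (schema on load) or at query time (schema on run).

```python
import json

# Hypothetical raw input: two JSON events and one malformed line.
RAW_EVENTS = [
    '{"user": "ann", "amount": "19.90", "channel": "web"}',
    '{"user": "bob", "amount": "5.00"}',
    'not valid json at all',
]


def apply_schema(line):
    """Parse one raw line into (user, amount); return None if it does not fit."""
    try:
        record = json.loads(line)
        return record["user"], float(record["amount"])
    except (ValueError, KeyError, TypeError):
        return None


def channel_schema(line):
    """A second, later schema: extract the channel field the first schema dropped."""
    try:
        return json.loads(line)["channel"]
    except (ValueError, KeyError, TypeError):
        return None


# Schema on load: filter while writing, so storage holds only conforming rows.
def load_with_schema(raw_lines):
    return [row for row in map(apply_schema, raw_lines) if row is not None]


# Schema on run: store everything untouched, apply a schema when querying.
def store_raw(raw_lines):
    return list(raw_lines)


def query_with_schema(raw_store, schema=apply_schema):
    return [row for row in map(schema, raw_store) if row is not None]


warehouse = load_with_schema(RAW_EVENTS)
raw_store = store_raw(RAW_EVENTS)

print(len(warehouse), len(raw_store))             # filtered rows vs. all raw lines
print(query_with_schema(raw_store, channel_schema))
```

Because the raw store keeps every line, a question nobody anticipated at load time (here, the channel field) can still be answered later. The cost is that parsing and filtering are repeated on every query.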

Page 16: Overview - IBM Big Data Platform

From Getting Started to Enterprise Deployment: IBM BigInsights

[Chart axes: Breadth of capabilities (horizontal), Enterprise class (vertical)]

• Apache Hadoop
• Basic Edition (free download)
  - Jaql
  - Integrated install
• Enterprise Edition (sold by # of terabytes managed)
  - Accelerators
  - Performance Optimization
  - Visualization Capabilities
  - Pre-built applications
  - Text analytics
  - Spreadsheet-style tool
  - RDBMS, warehouse connectivity
  - Administrative tools, security
  - Eclipse development tools
  - Enterprise Integration
  - Integrated web console
  - . . . .
• Quick Start Edition: New for V2.1. Free. Non-production only
• PureData for Hadoop: Appliance simplicity. PureData for Hadoop brings BigInsights as an appliance form factor to the market

Usability
• BigSheets or use third party tool vendors like Datameer
• BigSQL
• All key open source components: Java, Hive, PIG, & JAQL, etc.

Page 17: Overview - IBM Big Data Platform

BigInsights Enterprise Edition

[Component diagram. Legend: Open Source, IBM]

• Connectivity and Integration: Streams, Netezza, DB2, JDBC, Flume, Sqoop
• Infrastructure: Jaql, Hive, Pig, HBase, MapReduce, HDFS, ZooKeeper, Indexing, Lucene, Adaptive MapReduce, Oozie, Text compression, Enhanced security, Flexible scheduler, GPFS (EAP), HCatalog
• Analytics and discovery
  - “Apps”: Web Crawler, Distrib file copy, DB export, Boardreader, DB import, Ad hoc query, Machine learning, Data processing, . . .
  - BigSheets
  - Text processing engine and library
  - Accelerator for machine data analysis
  - Accelerator for social data analysis
• Administrative and development tools
  - Web console
    • Monitor cluster health, jobs, etc.
    • Add / remove nodes
    • Start / stop services
    • Inspect job status
    • Inspect workflow status
    • Deploy applications
    • Launch apps / jobs
    • Work with distrib file system
    • Work with spreadsheet interface
    • Support REST-based API
    • . . .
  - Eclipse tools
    • Text analytics
    • MapReduce programming
    • Jaql, Hive, Pig development
    • BigSheets plug-in development
    • Oozie workflow generation
  - Integrated installer
• Optional IBM and partner offerings: Cognos BI, R, Guardium, DataStage, Data Explorer

Page 18: Overview - IBM Big Data Platform

Stream Computing Represents a Paradigm Shift

Traditional Computing
• Historical fact finding
• Find and analyze information stored on disk
• Batch paradigm, pull model
• Query-driven: submits queries to static data

Stream Computing (Real-time Analytics)
• Current fact finding
• Analyze data in motion, before it is stored
• Low latency paradigm, push model
• Data driven: bring data to the analytics
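The pull and push models contrasted on this slide can be illustrated with a small Python sketch. The sensor readings, threshold and class name are hypothetical, and the example does not use any InfoSphere Streams API. It only shows a query scanning stored data versus an analytic that is handed each value as it arrives.

```python
from typing import Callable, List

READINGS = [72, 75, 71, 118, 74]   # hypothetical sensor values
THRESHOLD = 100


# Pull model: data is stored first, then a query scans the static store.
def batch_query(stored: List[int], threshold: int) -> List[int]:
    return [value for value in stored if value > threshold]


# Push model: each arriving value is handed to the analytic immediately.
class StreamAnalytic:
    def __init__(self, threshold: int, on_alert: Callable[[int], None]):
        self.threshold = threshold
        self.on_alert = on_alert
        self.count = 0
        self.mean = 0.0

    def push(self, value: int) -> None:
        # Update running state incrementally; the value itself is not stored.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        if value > self.threshold:
            self.on_alert(value)


alerts: List[int] = []
analytic = StreamAnalytic(THRESHOLD, alerts.append)
for reading in READINGS:       # stands in for a live feed
    analytic.push(reading)

print(alerts, batch_query(READINGS, THRESHOLD))
```

Both approaches flag the same reading. In the push version the alert fires as soon as the value arrives, while the batch query can only run after the data has been stored.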

Page 19: Overview - IBM Big Data Platform

Big Data in real-time with InfoSphere Streams

• Filter / Sample
• Modify
• Annotate
• Fuse
• Classify
• Score
• Windowed Aggregates
• Analyze
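Two of the operator types listed on this slide, Filter and Windowed Aggregates, can be sketched as chained Python generators. This is an illustration of the idea only, not Streams' own language or API. The valid range, window size and function names are assumptions made for the example.

```python
from collections import deque
from typing import Iterable, Iterator


def filter_valid(stream: Iterable[float]) -> Iterator[float]:
    """Filter operator: drop readings outside an assumed plausible range."""
    for value in stream:
        if 0.0 <= value <= 200.0:
            yield value


def sliding_mean(stream: Iterable[float], size: int) -> Iterator[float]:
    """Windowed aggregate: mean of the last `size` tuples, emitted once the window is full."""
    window = deque(maxlen=size)
    for value in stream:
        window.append(value)
        if len(window) == size:
            yield sum(window) / size


readings = [10.0, 20.0, -5.0, 30.0, 999.0, 40.0]
results = list(sliding_mean(filter_valid(readings), size=3))
print(results)
```

Each operator consumes tuples one at a time and keeps only bounded state (here, the last three values), which is why such a pipeline can run over an unbounded stream.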

Page 20: Overview - IBM Big Data Platform

Analytic Accelerators Designed for Velocity (and Variety)

• Mining in Microseconds (included with Streams)
• Image & Video (Open Source)
• Simple & Advanced Text (included with Streams) (IBM Research) (Open Source UIMA)
  - Example: Text -> (listen, verb), (radio, noun)
• Acoustic (IBM Research) (Open Source)
• Geospatial (IBM Research)
• Predictive (IBM Research)
• Advanced Mathematical Models (IBM Research)
  - [Formula graphic: sum over the population of R(s_t, a_t)]
• Statistics (included with Streams)

Page 21: Overview - IBM Big Data Platform

Putting it all together … end-to-end big data solution

[Diagram]

• Products: InfoSphere Streams, InfoSphere BigInsights, InfoSphere Warehouse, Netezza Appliance, IBM SPSS, IBM Cognos
• Input: Streaming Data Sources
• Steps shown: Discover, Model, Score, Measure, Visualize & Publish

Page 22: Overview - IBM Big Data Platform


Page 23: Overview - IBM Big Data Platform


Page 24: Overview - IBM Big Data Platform


Page 25: Overview - IBM Big Data Platform

Cognos Business Intelligence optimized for Big SQL

• Big SQL enables the Cognos BI server to delegate many types of analytical computations to BigInsights MapReduce processing instead of computing them locally at a performance cost, as it would with Hive
• Faster response times due to increased opportunity for query processing to occur closer to the data
• Not hindered by the latency and other limitations of querying Hadoop via Hive

[Diagram: Cognos BI Server (Explore & Analyze, Report & Act) connects through a SQL interface via JDBC to InfoSphere BigInsights, which comprises Hive, Application (Map-Reduce) and Storage (HBase, HDFS)]
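The benefit of letting query processing happen closer to the data can be shown with a generic SQL example. The sketch below uses Python's built-in sqlite3 module purely as a stand-in data engine. It is not Big SQL, Hive or Cognos, and the table and values are invented.

```python
import sqlite3

# In-memory database standing in for the remote data store (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("west", 250.0), ("east", 50.0), ("west", 25.0)],
)


def totals_computed_locally(connection):
    """Fetch every row, then aggregate in the client (the BI server's role here)."""
    totals = {}
    for region, amount in connection.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_pushed_down(connection):
    """Send the aggregation to the data engine; only summary rows come back."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    return dict(connection.execute(query))


local = totals_computed_locally(conn)
pushed = totals_pushed_down(conn)
print(local == pushed, pushed)
```

Both functions return the same totals. The pushed-down version transfers one row per region instead of one row per sale, which is the effect the second bullet on this slide describes.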

Page 26: Overview - IBM Big Data Platform

Performance – Cognos BI + DB2 BLU

• 38x average acceleration of database queries for reporting (2)
• Faster cube load*
• Faster DB query*

[Diagram: Cognos BI (Dynamic Query, Compatible Query, Dynamic Cubes) on DB2 with BLU (columns C1 to C8); Cognos BI + DB2 BLU + Power]

2. Based on internal tests.

Page 27: Overview - IBM Big Data Platform

Meeting Big Data Challenges – Fast and Easy!

IBM PureData Systems

• System for Transactions: For apps like E-commerce… Database cluster services optimized for transactional throughput and scalability
• System for Analytics: For apps like Customer Analysis… Data warehouse services optimized for high-speed, peta-scale analytics and simplicity
• System for Operational Analytics: For apps like Real-time Fraud Detection… Operational data warehouse services optimized to balance high performance analytics and real-time operational throughput
• System for Hadoop: For Exploratory Analysis & Queryable Archive. Hadoop data services optimized for big data analytics and online archive with appliance simplicity

Page 28: Overview - IBM Big Data Platform

Use Cases for a Big Data Platform

Innovate New Products at Speed and Scale
• Social Media - Product/brand Sentiment analysis
• Brand strategy
• Market analysis
• RFID tracking & analysis
• Transaction analysis to create insight-based product/service offerings

Know Everything about your Customer
• Social media customer sentiment analysis
• Promotion optimization
• Segmentation
• Customer profitability
• Click-stream analysis
• CDR processing
• Multi-channel interaction analysis
• Loyalty program analytics
• Churn prediction

Run Zero Latency Operations
• Smart Grid/meter management
• Distribution load forecasting
• Sales reporting
• Inventory & merchandising optimization
• Options trading
• ICU patient monitoring
• Disease surveillance
• Transportation network optimization
• Store performance
• Environmental analysis
• Experimental research

Instant Awareness of Risk and Fraud
• Multimodal surveillance
• Cyber security
• Fraud modeling & detection
• Risk modeling & management
• Regulatory reporting

Exploit Instrumented Assets
• Network analytics
• Asset management and predictive issue resolution
• Website analytics
• IT log analysis

Page 29: Overview - IBM Big Data Platform

Every Industry can Leverage Big Data and Analytics.

Insurance
• 360° View of Domain or Subject
• Catastrophe Modeling
• Fraud & Abuse

Banking
• Optimizing Offers and Cross-sell
• Customer Service and Call Center Efficiency

Telco
• Pro-active Call Center
• Network Analytics
• Location Based Services

Energy & Utilities
• Smart Meter Analytics
• Distribution Load Forecasting/Scheduling
• Condition Based Maintenance

Media & Entertainment
• Business process transformation
• Audience & Marketing Optimization

Retail
• Actionable Customer Insight
• Merchandise Optimization
• Dynamic Pricing

Travel & Transport
• Customer Analytics & Loyalty Marketing
• Predictive Maintenance Analytics

Consumer Products
• Shelf Availability
• Promotional Spend Optimization
• Merchandising Compliance

Government
• Civilian Services
• Defense & Intelligence
• Tax & Treasury Services

Healthcare
• Measure & Act on Population Health Outcomes
• Engage Consumers in their Healthcare

Automotive
• Advanced Condition Monitoring
• Data Warehouse Optimization

Life Sciences
• Increase visibility into drug safety and effectiveness

Chemical & Petroleum
• Operational Surveillance, Analysis & Optimization
• Data Warehouse Consolidation, Integration & Augmentation

Aerospace & Defense
• Uniform Information Access Platform
• Data Warehouse Optimization

Electronics
• Customer/Channel Analytics
• Advanced Condition Monitoring

Page 30: Overview - IBM Big Data Platform


Clients Achieve Breakthrough Outcomes With IBM’s Big Data Platform

Imperative Primary Capability Business Value

Run Zero Latency Operations

InfoSphere BigInsights

Reduce maintenance costs and differentiate by optimal turbine placement

PureData for Analytics

Instant Awareness of Risk and Fraud

Analysis time on 2 PB of data cut from 26 hours to 2 minutes

PureData for Analytics

Increased network availability by identifying and fixing holes

Exploit Instrumented Assets

InfoSphere Data Explorer

Provide single point of access to disparate data sources

Secure single point of access to all enterprise data

Analyzed call records to drive real-time promotions & reduce churn

InfoSphere Streams

Know Everything about your Customers

Aircraft Manufacturer

Page 31: Overview - IBM Big Data Platform

A Catalyst for ISV and Partner Innovation

• Traditional Approach: Customer segmentation based on loyalty data
  Transformational Outcome: Capturing information from all interactions to improve customer lifetime value
• Traditional Approach: Historical analysis of subscriber data
  Transformational Outcome: 2 million events analyzed per minute, delivering real-time insight to mobile operators
• Traditional Approach: Managing rising cost of care
  Transformational Outcome: Combining data from hundreds of hospitals to improve results across the healthcare continuum
• Traditional Approach: Anti-corruption and bribery compliance program
  Transformational Outcome: Use Big Data analytics to prioritize and isolate areas of risk or rogue activity
• Traditional Approach: Manual supply chain integration
  Transformational Outcome: Provide visibility, analysis and reporting across the entire supply chain (planning -> execution)
• Traditional Approach: Treat-first, seek-payment-later and write off bad debt
  Transformational Outcome: Measure and predict patient payment behavior, reduce risk from bad debt and boost collection rates
• Traditional Approach: Random parking meter patrols & search for open spots
  Transformational Outcome: Analyzing parking systems to maximize revenue & improve the parking experience in cities

Page 32: Overview - IBM Big Data Platform

Get started!

Think Big | Pick your Spot | Execute and Deliver Value

• Identify and prioritize business use cases
• New insights and new possibilities
• New revenue opportunities
• Process and performance improvement
• Evolve your existing analytics capabilities
• Build or acquire new skills required
• Measure and communicate success
• Ensure that the business is engaged
• Agree on the key measures for success

Page 33: Overview - IBM Big Data Platform

© 2013 IBM Corporation, April 24, 2014

Thank You