Disaster Recovery For the Real-Time Data Warehouse: Replicating and Parallelizing Big Data

Upload: tervela

Post on 27-Nov-2014


DESCRIPTION

More and more, front-line business operations depend on data warehouses and real-time analysis. Decisions are driven by data that’s captured from all over the enterprise, helping companies like yours compete more fiercely in crowded marketplaces. But are your disaster recovery policies keeping up with the changing role of your real-time data warehouse? The sheer volume of data and the rate at which it changes make traditional backup and restore practices unworkable, so what techniques do work? In these slides, you will learn how to construct disaster recovery procedures that fit your 24-7, up-all-the-time data warehouse.

TRANSCRIPT

Page 1: Disaster Recovery for the Real-Time Data Warehouses

Disaster Recovery For the Real-Time Data Warehouse:

Replicating and Parallelizing Big Data

Page 2: Disaster Recovery for the Real-Time Data Warehouses

What you will learn: 4 strategies

1. Separate operational warehouses from reporting systems

2. Use changed data capture and Big Data replication

3. Implement parallel, active-active data warehouses

4. Maintain a “golden event” warehouse in Hadoop

Confidential & Proprietary

Page 3: Disaster Recovery for the Real-Time Data Warehouses

Analytics Have a Measurable Effect

• For the median Fortune 1000 Company, a 10% increase in data usability corresponds to $2.01B in annual revenue gains

• A “real-time infrastructure” ranks #3 on the CIO’s list of strategies

• Organizations adept at analytics see 1.6x the revenue growth, 2.0x the profit growth, and 2.5x the stock price appreciation of their peers


Sources:

• Big Data, Big Opportunity, University of Texas at Austin, Sept 2011

• A “real-time infrastructure”: Gartner

• “Outperforming in a Data-Rich and Hyper-Connected World,” IBM Center for Applied Insights and Economic Intelligence

Page 4: Disaster Recovery for the Real-Time Data Warehouses

Data Warehousing: Now Part of Operations


real-time pricing

real-time marketing

fraud detection

inventory management

customer service

Page 5: Disaster Recovery for the Real-Time Data Warehouses

Analytics in Business Operations: Constant, Up-to-the-Minute Access to Big Data


ADVERTISING: Click-stream, Mobile ads

UTILITIES: Energy usage, Power production

CAPITAL MARKETS: Market Data, Securities Trading

TRANSPORTATION: Traffic & Logistics, Fleet Deployment

INFORMATION TECHNOLOGY: Network Activity, IT Root-Cause

TELECOMMUNICATIONS: Call Activity, Capacity Allocation

Page 6: Disaster Recovery for the Real-Time Data Warehouses

Expectations have changed


Page 7: Disaster Recovery for the Real-Time Data Warehouses

What we need…vs. what we have


Up-Time
Need: SLAs: 99.999%
Have: Backup and recovery can take days in the event of an outage or system failure

Real-time
Need: Access to information as it happens
Have: ETL processes can take hours before information is available

Distribution
Need: Add new applications as the business demands
Have: Access to warehouse is tightly controlled; performance bottlenecks of a single database can impact mission-critical systems
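To put the 99.999% up-time target next to a recovery that "can take days", the arithmetic can be sketched as below. This is an illustrative calculation only; the function name and the one-day restore figure are assumptions chosen for the example, not numbers from the slides.

```python
def downtime_budget_minutes(availability_pct: float, days: float = 365.0) -> float:
    """Minutes of downtime permitted over `days` at the given availability."""
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return unavailable_fraction * days * 24 * 60


# Budget implied by a "five nines" SLA over one year.
five_nines = downtime_budget_minutes(99.999)

# Assumption for illustration: a single restore from backup takes one full day.
one_day_restore_minutes = 24 * 60
overrun_factor = one_day_restore_minutes / five_nines

print(f"Yearly downtime budget at 99.999%: {five_nines:.2f} minutes")
print(f"A one-day restore exceeds that budget {overrun_factor:.0f} times over")
```

A yearly budget of roughly five minutes leaves no room for a restore measured in hours or days, which is the gap the four strategies aim to close.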

Page 8: Disaster Recovery for the Real-Time Data Warehouses

4 disaster recovery strategies for big data

1. Separate operational warehouses from reporting systems

2. Use changed data capture and Big Data replication

3. Implement parallel, active-active data warehousing

4. Maintain a “golden event” warehouse in Hadoop


Page 9: Disaster Recovery for the Real-Time Data Warehouses

1. Separate operations from reporting

[Diagram labels: application, DB2, Primary Warehouse, Secondary Warehouse, WAN, Operations, Reporting]

Run day-to-day applications in one place. Ad-hoc reporting happens in a separate warehouse.

BENEFIT: Better control over performance

CHALLENGE: Keeping changes in sync
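One way to detect when the two warehouses have drifted apart is to compare a digest of each side's contents. The sketch below is a minimal illustration under stated assumptions: the warehouses are modeled as in-memory dictionaries, and `table_checksum` and `in_sync` are hypothetical helper names, not part of any product mentioned in these slides.

```python
import hashlib


def table_checksum(rows: dict) -> str:
    """Digest of a table's (key, value) pairs, independent of insertion order."""
    digest = hashlib.sha256()
    for key, value in sorted(rows.items()):
        digest.update(f"{key}={value}\n".encode("utf-8"))
    return digest.hexdigest()


def in_sync(primary: dict, secondary: dict) -> bool:
    """True when both warehouses hold identical rows."""
    return table_checksum(primary) == table_checksum(secondary)


# Hypothetical stand-ins for the operational and reporting warehouses.
primary = {"order-1": "shipped", "order-2": "pending"}
secondary = {"order-2": "pending", "order-1": "shipped"}
same_before = in_sync(primary, secondary)

primary["order-2"] = "shipped"  # change not yet propagated to the secondary
same_after = in_sync(primary, secondary)
```

A full-table digest is cheap to reason about but expensive at warehouse scale; in practice the comparison would more likely run per partition or per time window.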

Page 10: Disaster Recovery for the Real-Time Data Warehouses

2. Changed data capture

[Diagram labels: application, Data Fabric (250 MB/s per box, load-balanced, linearly scalable, built-in persistence), Primary Cluster, 1 GB/s, Reporting Cluster, WAN]

Determine what has changed, then replicate it to achieve parity between environments

BENEFIT: Quickly propagate changes to remote sites

CHALLENGE: Identifying changes is difficult. As the volume of data continues to grow, this approach is only a stop-gap.
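The "determine what has changed, then replicate it" step can be sketched with a version watermark. This is a minimal illustration, assuming every row carries a monotonically increasing `version` counter; the `Row` type and both function names are hypothetical, and the sketch does not capture deletes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    key: str
    value: str
    version: int  # assumed: increases every time the row changes


def capture_changes(source: dict, watermark: int) -> list:
    """Return rows changed after `watermark`, oldest change first."""
    changed = [row for row in source.values() if row.version > watermark]
    return sorted(changed, key=lambda row: row.version)


def apply_changes(replica: dict, changes: list, watermark: int) -> int:
    """Apply captured changes to the replica and return the new watermark."""
    for row in changes:
        replica[row.key] = row
        watermark = max(watermark, row.version)
    return watermark


# Initial full copy: everything is newer than watermark 0.
primary = {"a": Row("a", "1", 1), "b": Row("b", "2", 2)}
replica = {}
watermark = apply_changes(replica, capture_changes(primary, 0), 0)

# Later, one row changes; only that row crosses the WAN.
primary["a"] = Row("a", "10", 3)
delta = capture_changes(primary, watermark)
watermark = apply_changes(replica, delta, watermark)
```

Only the delta is shipped on each pass, which is what makes propagation quick. The cost is that every source has to expose a reliable change marker, which matches the difficulty the slide names.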

Page 11: Disaster Recovery for the Real-Time Data Warehouses

3. Parallel, active-active data warehousing

[Diagram labels: Data Fabric (250 MB/s per box, load-balanced, linearly scalable, built-in persistence), Primary Cluster, 1 GB/s, Reporting Cluster, WAN]

Capture application data streams and load to parallel data warehouses over the WAN

BENEFIT: Multiple warehouses are kept up to date

CHALLENGE: Synchronization of many data streams
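Loading one captured stream into several warehouses can be sketched as a fan-out in which each site remembers the last sequence number it applied. This is an illustrative sketch, not the data fabric's actual behavior: the `Warehouse` class, the sequence-numbered events, and `fan_out` are all assumptions made for the example.

```python
class Warehouse:
    """Hypothetical in-memory stand-in for one warehouse site."""

    def __init__(self, name: str):
        self.name = name
        self.rows = {}
        self.last_seq = 0  # highest sequence number applied so far

    def load(self, seq: int, key: str, value: int) -> bool:
        """Apply one event; skip it if this site has already seen it."""
        if seq <= self.last_seq:
            return False
        self.rows[key] = value
        self.last_seq = seq
        return True


def fan_out(events, warehouses) -> None:
    """Deliver every event, in order, to every reachable warehouse."""
    for seq, key, value in events:
        for warehouse in warehouses:
            warehouse.load(seq, key, value)


events = [(1, "sku-1", 10), (2, "sku-2", 5), (3, "sku-1", 7)]
primary = Warehouse("primary")
reporting = Warehouse("reporting")

fan_out(events[:2], [primary, reporting])
fan_out(events[2:], [primary])          # reporting site unreachable for event 3
fan_out(events, [primary, reporting])   # replay: duplicates are skipped
```

Because loads are idempotent per site, a lagging warehouse can catch up by replaying the stream without disturbing sites that are already current.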

Page 12: Disaster Recovery for the Real-Time Data Warehouses

4. “Golden Event” store

[Diagram labels: application, Data Fabric (250 MB/s per box, load-balanced, linearly scalable, built-in persistence), Golden Event Store, Primary Data Warehouse, Reporting Data Warehouse (Optional), New Apps & Analytics]

Capture raw data and store it in Hadoop

BENEFIT: New analytics are always possible

CHALLENGE: Best practices are only just being developed
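The idea behind a raw event store is that any downstream view can be rebuilt, or a new one invented later, by replaying the events. The sketch below illustrates that idea only: an in-memory list stands in for Hadoop, and `GoldenEventStore`, the event fields, and both derived views are hypothetical.

```python
import json


class GoldenEventStore:
    """Append-only log of raw events (in-memory stand-in for Hadoop storage)."""

    def __init__(self):
        self._log = []

    def append(self, event: dict) -> None:
        # Events are stored as serialized text and never modified afterwards.
        self._log.append(json.dumps(event, sort_keys=True))

    def replay(self):
        """Yield every event in the order it was captured."""
        for line in self._log:
            yield json.loads(line)


def rebuild_inventory(store: GoldenEventStore) -> dict:
    """Recreate warehouse state from scratch by replaying all events."""
    inventory = {}
    for event in store.replay():
        inventory[event["sku"]] = inventory.get(event["sku"], 0) + event["qty_delta"]
    return inventory


def units_sold(store: GoldenEventStore) -> int:
    """An analytic defined after the fact, computed from the same raw events."""
    return sum(-e["qty_delta"] for e in store.replay() if e["qty_delta"] < 0)


store = GoldenEventStore()
store.append({"sku": "A", "qty_delta": 10})
store.append({"sku": "A", "qty_delta": -3})
store.append({"sku": "B", "qty_delta": 4})

inventory = rebuild_inventory(store)
sold = units_sold(store)
```

Keeping the raw events rather than only the aggregated tables is what makes the second function possible without re-collecting data.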

Page 13: Disaster Recovery for the Real-Time Data Warehouses

About Tervela Turbo

• New release!

• Capture, share, and distribute data

• Accelerate any of the use cases we discussed today


Page 14: Disaster Recovery for the Real-Time Data Warehouses

Big Data Requires Big Data Movement


As companies implement more big data solutions, the need to use high-performance message delivery with those systems will grow.

Gartner: Hype Cycle for Big Data, 2012

Page 15: Disaster Recovery for the Real-Time Data Warehouses

Key Features and Benefits of Tervela Turbo


Key Features

Data Capture
• Adapters for top data stores
• Flexible multi-language API
• Real-time acquisition

Data Availability
• Parallel loading
• Large-volume buffering
• Automatic retry
• Data replay

Data Distribution
• Continuous loading
• No disruption with bad consumers
• Warehouses, DBs, Hadoop, etc.
• Web, mobile, custom apps

Key Benefits

Real-Time: Regardless of data volume or number of sources

Reliable: For mission-critical operations that can’t go down

Multi-Platform: Feeds explosion of analytic apps on any platform without disrupting other consumers

Page 16: Disaster Recovery for the Real-Time Data Warehouses

Capture, Share, and Distribute

Big Data For Mission-Critical Analytics

www.tervela.com

@tervela

[email protected]

Learn More About Big Data Movement


Access videos, how-to guides, and other educational materials at: tervela.com/datafabric