VLDB 2005 31st International Conference on Very Large Databases

Raghunath Othayoth Nambiar (Hewlett-Packard Company) and Meikel Poess (Oracle Corporation)

Large Scale Data Warehouses on Grid: Oracle Database 10g and HP ProLiant Systems


April 8, 2023 | Large Scale Data Warehouses on Grid: Oracle Database 10g and HP ProLiant Servers | VLDB 2005 - 31st International Conference - Trondheim, Norway


Agenda

• Grid Computing

• Hardware Support

• Software Support

• TPC-H Result

Grid Computing

1) Application and user perspective:
− Just like the power grid: computing power is delivered as requested
2) Implementation perspective:
− Data virtualization
− Resource provisioning
− High availability

From Research to Industry

• Research projects using grid technology:
− SETI@home
− World Community Grid

• Traditionally, companies used islands of systems to implement corporate data warehouses. These islands:
− Cannot share resources
− Are too rigid to answer rapidly changing business needs
− Cannot be scaled indefinitely

HP and Oracle are applying the grid concept to industry data warehouses (DW)

Commercial Grid Market

• IDC calls grid computing the fifth generation of computing

• Commercial grid computing revenue:
− 2003: USD 1 billion
− 2008: USD 12 billion (estimate)

• Forrester Research:
− 37% of enterprises are piloting, rolling out or have implemented some form of grid computing
− 30% of firms are considering grid technology

(Sources: IDC, 2004, www.oracle.com/technology/tech/grid/collateral/idc_oracle10g.pdf; Forrester, 2004, www.forrester.com/go?docid=34449)

N-tier v/s Grid Computing

Traditional multi-tier datacenter infrastructure: web servers, application servers and database servers are preconfigured and pre-allocated.

Grid computing: infrastructure is dynamically provisioned to applications that have been virtualized.

[Figure: traditional layout with the Internet, application servers (middle tier), and separate OLTP database servers and DSS servers on direct-attach storage, versus grid layout with application servers (middle tier), a resource virtualization and provisioning layer, a shared pool of commodity servers, and Storage Area Network (SAN) / Network Attached Storage (NAS)]

Commercial Grid Components

• Commodity hardware (x86-based servers)
• Linux OS: cost-effective
• SAN: highly scalable
• High-speed interconnect (Gigabit Ethernet, InfiniBand)
• Management software (manage as individual servers or as one large virtual server)
• Database layer (ties the resources together; dynamic resource allocation; parallel processing)

Commercial Grid benefits

• High scalability

• High flexibility

• Low total cost of ownership

• High availability

• Easy manageability

Oracle Features for a Data Warehouse Grid

• Dynamic parallel processing

• Data virtualization and dynamic resource provisioning in DW

• Smart inter node parallelism

Dynamic Parallel Processing

• Queries are automatically parallelized to maximize resource utilization

• Degree of Parallelism (DOP) is adjusted according to resource availability and computing demands at parse time

• DOP is automatically adjusted when:
− The number of concurrent users changes
− Nodes are taken down for maintenance
− Nodes are added due to increased computing demand (scale-out)
− Nodes are assigned to a different application
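
The adjustment rules above can be pictured with a small Python model. It assumes, purely for illustration, that each CPU hosts two parallel execution servers and that the available servers are shared evenly among concurrent users; the names GridState and effective_dop are hypothetical, and Oracle's actual adaptive multi-user algorithm is more involved.

```python
from dataclasses import dataclass


@dataclass
class GridState:
    """Resources currently assigned to the data warehouse (hypothetical model)."""
    nodes_online: int         # nodes serving this DW right now
    cpus_per_node: int        # e.g. 4 for a 4-socket ProLiant DL585
    threads_per_cpu: int = 2  # assumption: parallel servers per CPU


def effective_dop(requested_dop: int, state: GridState, concurrent_users: int) -> int:
    """Cap a query's requested DOP by its fair share of the grid.

    Illustrative only: splits the grid's parallel-server capacity evenly
    among active users; this is not Oracle's actual adaptive algorithm.
    """
    capacity = state.nodes_online * state.cpus_per_node * state.threads_per_cpu
    fair_share = capacity // max(1, concurrent_users)
    # Never go below serial execution.
    return max(1, min(requested_dop, fair_share))


if __name__ == "__main__":
    grid = GridState(nodes_online=12, cpus_per_node=4)
    print(effective_dop(96, grid, concurrent_users=1))  # → 96 (whole grid, one user)
    print(effective_dop(96, grid, concurrent_users=4))  # → 24 (more users, smaller DOP)
    grid.nodes_online = 11                              # one node down for maintenance
    print(effective_dop(96, grid, concurrent_users=4))  # → 22
```

Adding nodes (scale-out) or reassigning them to another application simply changes nodes_online, and the next query picks up the new share without any change to the SQL.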

Data Virtualization and Dynamic Resource Provisioning in DW

• Oracle's shared-everything architecture provides data virtualization and provisioning in data warehouses
• Eight nodes share one disk subsystem and are linked by an interconnect; because every node can access all data, nodes can be assigned to workload types (OLAP, Reports, ETL) as demand changes:
− During peak working hours
− During the night
− During short intervals when the DW is synchronized with the OLTP system
− Without response time requirements, all types of workload can run on all nodes

[Figure: nodes 1 to 8, interconnect and shared disk subsystem, with the assignment of nodes to workload types shown for each period]
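
The time-dependent assignment described above can be sketched as a lookup table. The per-period node splits and the 08:00 to 18:00 working hours are invented for illustration (the real splits were in the slide figures), and PLANS, current_period and nodes_for are hypothetical names; in a real deployment this mapping would be maintained by the database's workload management rather than by application code.

```python
from datetime import time

ALL_NODES = frozenset(range(1, 9))  # the eight nodes from the slide figure

# Hypothetical node splits per period; the real ones were in the lost figures.
PLANS = {
    "peak":  {"OLAP": {1, 2, 3, 4}, "Reports": {5, 6, 7}, "ETL": {8}},
    "night": {"OLAP": {1}, "Reports": {2, 3}, "ETL": {4, 5, 6, 7, 8}},
    "sync":  {"OLAP": {1, 2}, "Reports": {3}, "ETL": {4, 5, 6, 7, 8}},
}

PEAK_START, PEAK_END = time(8, 0), time(18, 0)  # assumed working hours


def validate(plans: dict) -> None:
    """Each plan must assign every node to exactly one workload type."""
    for name, plan in plans.items():
        assigned = sorted(n for nodes in plan.values() for n in nodes)
        if assigned != sorted(ALL_NODES):
            raise ValueError(f"plan {name!r} must assign each node exactly once")


def current_period(now: time, syncing: bool) -> str:
    """Pick the allocation plan that applies at a given time of day."""
    if syncing:  # DW is being synchronized with the OLTP system
        return "sync"
    if PEAK_START <= now < PEAK_END:
        return "peak"
    return "night"


def nodes_for(workload: str, now: time, syncing: bool = False,
              has_response_time_sla: bool = True) -> frozenset:
    """Return the nodes a workload type may use right now."""
    if not has_response_time_sla:
        # Without response-time requirements every workload may use every node.
        return ALL_NODES
    return frozenset(PLANS[current_period(now, syncing)][workload])


validate(PLANS)
```

Because the data is shared, switching plans only changes which nodes accept which work; no data has to be moved or re-partitioned.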

Data Virtualization and Dynamic Resource Provisioning in DW

• This concept can be extended to different applications: OLTP, DW and DM workloads share the same eight nodes and disk subsystem

[Figure: nodes 1 to 8, interconnect and shared disk subsystem, shown in two different assignments of nodes to the OLTP, DW and DM applications]
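
Moving from one node assignment to another can be modeled as lending nodes between applications. This is a minimal sketch under the assumption that storage is fully shared, so only the node-to-application mapping changes; move_nodes and the example allocation are hypothetical.

```python
from typing import Dict, Set

Allocation = Dict[str, Set[int]]


def move_nodes(alloc: Allocation, src: str, dst: str, count: int) -> Allocation:
    """Return a new allocation with `count` nodes moved from `src` to `dst`.

    With shared storage, moving a node between applications changes only
    the node-to-application mapping, never the data placement.
    """
    if src not in alloc or dst not in alloc:
        raise KeyError(f"unknown application: {src!r} or {dst!r}")
    if not 0 <= count < len(alloc[src]):
        raise ValueError(f"count must leave {src} with at least one node")
    new = {app: set(nodes) for app, nodes in alloc.items()}
    # Hand over the highest-numbered nodes first (arbitrary, deterministic rule).
    for node in sorted(new[src], reverse=True)[:count]:
        new[src].remove(node)
        new[dst].add(node)
    return new


# Hypothetical example: OLTP lends two nodes to the DW for a heavy reporting window.
daytime = {"OLTP": {1, 2, 3, 4}, "DW": {5, 6, 7}, "DM": {8}}
reporting_window = move_nodes(daytime, "OLTP", "DW", 2)
```

Returning a new allocation instead of mutating the old one makes it easy to switch back when the reporting window ends.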

Smart Inter Node Parallelism

• The optimizer avoids inter-node parallelism when possible, which reduces interconnect traffic and shortens execution time

1) Node locality
− If possible, operations are executed on one node
− When the DOP of an operation can be satisfied with the resources of one server, it executes locally

2) Full partition-wise join
− If two tables are equipartitioned on their join key, the join can be divided into smaller joins between matching partitions

3) Partial partition-wise join
− If only one table is partitioned on the join key, the other table is dynamically repartitioned on the join key to break the large join into smaller joins
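
The three techniques above can be sketched in Python on in-memory rows. Hash partitioning on the join key stands in for the database's partitioning schemes, and the function names, the row format and the two-threads-per-CPU figure are assumptions for illustration; this is not how the Oracle optimizer is implemented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, str]          # (join_key, payload)
Joined = Tuple[int, str, str]  # (join_key, left_payload, right_payload)
Parts = Dict[int, List[Row]]


def hash_partition(rows: List[Row], n: int) -> Parts:
    """Split rows into n partitions by hashing the join key."""
    parts: Parts = defaultdict(list)
    for row in rows:
        parts[hash(row[0]) % n].append(row)
    return parts


def local_join(left: List[Row], right: List[Row]) -> List[Joined]:
    """Plain in-memory hash join of one pair of inputs."""
    index: Dict[int, List[str]] = defaultdict(list)
    for key, payload in right:
        index[key].append(payload)
    return [(key, lp, rp) for key, lp in left for rp in index.get(key, [])]


def full_partition_wise_join(lparts: Parts, rparts: Parts, n: int) -> List[Joined]:
    """Both tables equipartitioned on the join key: partition i joins only
    partition i, so each pair could run on one node without exchanging rows."""
    out: List[Joined] = []
    for i in range(n):
        out.extend(local_join(lparts.get(i, []), rparts.get(i, [])))
    return out


def partial_partition_wise_join(lparts: Parts, right_rows: List[Row], n: int) -> List[Joined]:
    """Only the left table is partitioned: repartition the right table on the fly
    with the same hash function and partition count, then join pairwise."""
    return full_partition_wise_join(lparts, hash_partition(right_rows, n), n)


def runs_locally(dop: int, cpus_per_node: int, threads_per_cpu: int = 2) -> bool:
    """Node locality: keep the operation on one node if that node alone can
    supply the requested DOP (threads_per_cpu is an assumed value)."""
    return dop <= cpus_per_node * threads_per_cpu
```

The pairwise joins return exactly the rows of one big join, but each pair only touches rows with matching hash values, which is what keeps the work (and the traffic) local.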

TPC-H Benchmark

• The industry-standard benchmark for data warehouse applications

• Stresses grid-based data warehouses:
− Complex queries
 • Sequential scans of large amounts of data
 • Aggregations of large amounts of data
 • Multi-table joins
 • Extensive sorting of very large sets of data
− Single-user test
− Multi-user test
− Parallel insert operations
− Parallel delete operations
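
The single-user (power) and multi-user (throughput) tests, together with the insert and delete refresh functions, are combined into the composite metric QphH@Size used in the results table. The sketch below follows the formulas of the TPC-H specification in simplified form: it ignores the specification's timing-precision and reporting rules, and the function names are illustrative.

```python
import math
from typing import Sequence


def power_at_size(query_secs: Sequence[float], refresh_secs: Sequence[float], sf: float) -> float:
    """Power test (single user): 3600 * SF divided by the geometric mean of the
    22 query timings and the 2 refresh-function timings, all in seconds."""
    if len(query_secs) != 22 or len(refresh_secs) != 2:
        raise ValueError("expected 22 query timings and 2 refresh timings")
    timings = [*query_secs, *refresh_secs]
    geo_mean = math.exp(sum(math.log(t) for t in timings) / len(timings))
    return 3600 * sf / geo_mean


def throughput_at_size(streams: int, elapsed_secs: float, sf: float) -> float:
    """Throughput test (multi user): queries completed per hour by all
    concurrent streams (22 queries each), scaled by SF."""
    return streams * 22 * 3600 / elapsed_secs * sf


def qphh_at_size(power: float, throughput: float) -> float:
    """Composite QphH@Size: geometric mean of power and throughput."""
    return math.sqrt(power * throughput)


def price_per_qphh(total_price: float, qphh: float) -> float:
    """Price/performance as listed in TPC-H results."""
    return total_price / qphh
```

Using a geometric mean means a single slow query or refresh run cannot be hidden by many fast ones, which is why the benchmark stresses every part of the grid.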

Benchmarked Configuration: HP ProLiant DL585 Cluster 48P

• 12 x HP ProLiant DL585 servers, each with:
− 4 x AMD Opteron™ 848, 2.2 GHz / 1 MB L2 cache
− 8 GB memory
− 2 x on-board NICs
− 6 x HP FCA 2214 DC
− 1 x InfiniCon Systems InfiniServ 7000 HCA
• Interconnect switch: InfiniCon Systems InfiIO3016
• 2 x HP ProCurve Switch 4148gl
• Storage Area Network: 12 x HP SAN Switch 2/16 and 48 x HP MSA1000

Current results

1,000 GB Results

Rank | Company / System | QphH | Price/QphH | System Availability | Database | Operating System | Date Submitted | Cluster
1 | HP Integrity Superdome Enterprise Server | 68,100 | 59.00 US $ | 01/18/06 | Oracle Database 10g R2 Enterprise Edt w/Partitioning | HP-UX 11i v2 64-bit | 08/08/05 | N
2 | IBM eServer xSeries 346 | 53,451 | 32.80 US $ | 02/14/05 | IBM DB2 UDB 8.2 | SUSE LINUX Enterprise Server 9 | 02/14/05 | Y
3 | HP ProLiant DL585 Cluster 48P | 35,141 | 59.93 US $ | 10/21/04 | Oracle 10g RAC with Partitioning | Red Hat Enterprise Linux AS 3 | 10/22/04 | Y
4 | PRIMEPOWER 2500 | 34,492 | 155.99 Euros | 03/08/04 | Oracle Database 10g Enterprise Edition | Sun Solaris 9 | 09/08/03 | N
*** | PRIMEPOWER 2500 | 34,492 | 140.96 US $ | 03/08/04 | Oracle Database 10g Enterprise Edition | Sun Solaris 9 | 11/13/03 | N
5 | IBM eServer p5 570 with DB2 UDB | 26,156 | 53.43 US $ | 12/15/04 | IBM DB2 UDB 8.2 | IBM AIX 5L V5.3 | 09/15/04 | Y
6 | NEC Express5800/1320Xe (32SMP) | 22,967 | 68.51 US $ | 12/07/05 | Microsoft SQL Server 2005 Enterprise Edition 64bit | Microsoft Windows Server 2003 Datacenter Edition 64-bit | 07/19/05 | N
7 | Unisys ES7000 Orion 440 Enterprise Server | 21,505 | 41.92 US $ | 12/07/05 | Microsoft SQL Server 2005 Enterprise Edition 64bit | Microsoft Windows Server 2003 Datacenter Edition 64-bit | 06/27/05 | N
8 | NEC Express5800/1320Xe (32SMP) | 20,231 | 76.06 US $ | 12/07/05 | Microsoft SQL Server 2005 Enterprise Edition 64bit | Microsoft Windows Server 2003 Datacenter Edition 64-bit | 06/07/05 | N
9 | IBM eServer p655 with DB2 UDB | 20,221 | 69.41 US $ | 06/08/04 | IBM DB2 UDB 8.1 | IBM AIX 5L V5.2 | 12/08/03 | Y
10 | NovaScale 5160 | 15,069 | 44.32 US $ | 12/20/05 | Oracle Database 10g release2 Enterprise Edt | Microsoft Windows Server 2003 Datacenter Edition | 06/20/05 | N

Result Analysis

• Leadership performance
− Query performance of 35,141 QphH @ 1000 GB
− Price/performance of $60/QphH @ 1000 GB

• A database grid of ProLiant systems with multiple Opteron x86 processors delivers performance comparable to large SMP systems

• The Linux operating system delivers the throughput and processing capacity needed to achieve the benchmark result

• Oracle Database 10g with RAC delivers consistent, high-performance query execution in large grid environments
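
The headline numbers can be cross-checked with simple arithmetic from the configuration slide and the results table. The sketch assumes that the "8GB" on the configuration slide is memory per server; the back-calculated system price is only approximate because the published price/performance figure is rounded.

```python
# Figures from the configuration slide and the results table.
NODES = 12
CPUS_PER_NODE = 4          # 4 x AMD Opteron 848 per ProLiant DL585
MEMORY_GB_PER_NODE = 8     # assumption: "8GB" is memory per server
QPHH = 35_141              # QphH @ 1000 GB
PRICE_PER_QPHH = 59.93     # US $ per QphH

processors = NODES * CPUS_PER_NODE           # should match "Cluster 48P"
total_memory_gb = NODES * MEMORY_GB_PER_NODE
# Price/performance is total system price divided by QphH, so invert it.
implied_price = QPHH * PRICE_PER_QPHH
qphh_per_processor = QPHH / processors

print(f"{processors} processors, {total_memory_gb} GB memory")  # → 48 processors, 96 GB memory
print(f"implied system price ~ ${implied_price:,.0f}")          # → implied system price ~ $2,106,000
print(f"~ {qphh_per_processor:.0f} QphH per processor")         # → ~ 732 QphH per processor
```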

Future Hardware for Grid – HP BladeSystems

Conclusion

• Grid is ready for prime time

• In grid computing, resources are provisioned on demand and virtualized for applications to meet today's challenging business needs

• Commodity x86-based servers and blade servers offer reduced total cost of ownership

• Grid overcomes the natural limitations of SMP systems, such as the number of processors, memory and disk arrays