TRANSCRIPT
MicroTerabyte: Leveraging InfiniBand to Build a Powerful, Scalable
Oracle Database and Application Platform
Brian Dougherty
Chief Architect, CMA
Background
• Exploding data volumes are presenting new challenges to small and medium-sized organizations.
• These organizations need a new generation of technology that delivers powerful analytical capability with reduced cost and complexity.
• CMA, in partnership with Dell, QLogic, Oracle, and EMC, has developed a sophisticated solution to address this growing need.
MicroTerabyte Platform: What Is It?
• A pre-configured, integrated, fault-tolerant, high-performance runtime environment built on commodity hardware and software
• Leverages the power of the Oracle database running on commodity Linux servers, and a unified InfiniBand fabric over QLogic multi-protocol fabric directors
• Provides a scalable, reliable, lower cost platform for Business Intelligence, Custom Software, or Commercial Software
MicroTerabyte Platform: Features
• High Performance
• Scalable
• Fault Tolerant
• Commodity Hardware
• Lower Cost
• Smaller Footprint
• Reduced Power Consumption
MicroTerabyte Platform: Attributes
• Red Hat Linux Operating System
• Commodity Hardware (Dell / EMC CX / QLogic)
• Fault-Tolerant 42U Rack Footprint
• Clustered Oracle Database
• Clustered Storage Provisioning Layer
• Unified Storage and Interconnect Fabric
MicroTerabyte Single Rack Configuration
Sample Hardware Components (per rack)
• (2) Dell 1950 Database Servers
• 32GB RAM Each
• 16 Processor Cores
• (1) Dell 1950 ETL Server
• 16GB RAM
• 8 Processor Cores
• (1) Dell 1950 Business Intelligence Server
• 16GB RAM
• 8 Processor Cores
• (1) QLogic 9020 InfiniBand Fabric Director
• (2) FVIC Modules
• (2) EMC CX3-40 Storage Arrays
• 2TB – 5TB Storage
• (1) Dell PowerConnect 48 Port GigE Switch
• (1) Belkin KVM
• (1) Dell Console
Physical Data Guard
MicroTerabyte Mid-Range Configuration
Hardware Components (two racks)
• (4) Dell 1950 Database Servers
• 32GB RAM Each
• 32 Processor Cores
• (1) Dell 1950 ETL Server
• 32GB RAM
• 8 Processor Cores
• (1) Dell 1950 Business Intelligence Server
• 32GB RAM
• 8 Processor Cores
• (1) QLogic 9040 InfiniBand Fabric Director
• (2) FVIC Modules
• (2) EMC CX3-40 Storage Arrays
• 8TB – 15TB Storage
• (2) EMC SATA 1TB Drives (for backup)
• (1) Dell PowerConnect 48 Port GigE Switch
• (1) Belkin KVM
• (1) Dell Console
MicroTerabyte Large-Scale Configuration
Hardware Components (three racks)
• (8) Dell 1950 Database Servers
• 32GB RAM Each
• 64 Processor Cores
• (2) Dell 1950 ETL Servers
• 32GB RAM Each
• 8 Processor Cores
• (2) Dell 1950 Business Intelligence Servers
• 32GB RAM Each
• 8 Processor Cores
• (1) QLogic 9040 InfiniBand Fabric Director
• (4) FVIC Modules
• (4) EMC CX3-40 Storage Arrays
• 24TB – 32TB Storage
• (2) EMC SATA 1TB Drives (for backup)
• (1) Dell PowerConnect 48 Port GigE Switch
• (1) Belkin KVM
• (1) Dell Console
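The aggregate figures in the three configuration lists above can be cross-checked with a little arithmetic. This sketch tallies cores and RAM per configuration; the per-server core count of 8 (two quad-core sockets) is inferred from the quoted totals rather than stated per node.

```python
# Sketch: tallying the per-rack spec lists above to cross-check the quoted
# aggregate figures. Tuples are (server count, GB RAM each, cores each);
# cores-per-server (8) is inferred from the quoted totals.

configs = {
    "single rack": {"db": (2, 32, 8), "etl": (1, 16, 8), "bi": (1, 16, 8)},
    "mid-range":   {"db": (4, 32, 8), "etl": (1, 32, 8), "bi": (1, 32, 8)},
    "large scale": {"db": (8, 32, 8), "etl": (2, 32, 8), "bi": (2, 32, 8)},
}

for name, roles in configs.items():
    cores = sum(n * c for n, _, c in roles.values())
    ram_gb = sum(n * r for n, r, _ in roles.values())
    print(f"{name}: {cores} cores, {ram_gb} GB RAM across all servers")
```

For the large-scale configuration this yields 96 cores and 384 GB of RAM across the database, ETL, and BI nodes, consistent with the 64-core database tier listed above.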
MicroTerabyte Architecture: Mid-Level Diagram
Storage Nodes
Unified Fabric Layer
Compute Nodes
QLogic Multi-Protocol Fabric Director
Dell 1950 PowerEdge Servers / RHEL v5
EMC CX3 / CX4 Flare OS / Navisphere
MicroTerabyte Architecture: Detailed Diagram
Public Networking: Dell PowerConnect 6248
Server Compute: Dell 1950/2950 + QLogic 7104 HCA
Unified Fabric: QLogic 9020/9040 Multi-Protocol Director
Storage Infrastructure: EMC CX3/CX4
Red Hat Linux O/S with QLogic OFED InfiniBand Drivers
Oracle ASMlib, Oracle Clusterware 11g
Oracle ASM 11g
Oracle EE 11g w/ RAC and Partitioning
ERP Application Package
ISV Third-Party Software
CMA BI Suite
Business Intelligence
MicroTerabyte Platform layers:
MT Core
MT O/S Binding
MT Clustered Database / Storage Provisioning
Applications
Oracle Grid Control 11g
MicroTerabyte Platform: Oracle Software
• Physical Data Guard
• Oracle RMAN
• Oracle Database: Partitioning, Parallel Query, Oracle VPD, Oracle RAC
• Oracle ASM, ASMlib
• Oracle Clusterware: CRS, CSS, EVM, Cache Fusion
MicroTerabyte Compute Nodes: Dell 1950/2950
• MicroTerabyte solution consists of 2/4/8 RAC database nodes
• 2 ETL and BI nodes
• Each node is a Dell 1950/2950 consisting of:
– Processor: one/two quad-core Intel Xeon X5355 @ 2.66GHz
– Memory: 16-64 GB
– Hard drives: 218GB-3TB internal storage
– RAID controller: PERC 5/i
– 1 DDR InfiniBand HCA
– Network interface cards: dual gigabit NICs (100baseTx-FD)
– Power supply: 670W, optional hot-plug redundant power (1+1)
– Operating system: Red Hat Enterprise Linux v5
MicroTerabyte Storage Nodes: EMC CX4 Model 480
Front-End Host Connectivity
• Two storage processors per CX4
• Each processor has:
• Four 4Gb Fibre Channel optical ports
• FCP SCSI-3 protocol
Back-End Disk Connectivity
• Each processor has 4Gb Fibre Channel arbitrated loops.
• Multiple RAID groups may be distributed across redundant loops
• Supports a maximum of 480 disk drives
System Memory
• Each processor has 8GB of Memory
Power Consumption (Processor Chassis)
• 355 VA (290W max)
Power Consumption (Disk Expansion Chassis)
• 440 VA (425W max)
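The chassis figures above permit a rough worst-case power estimate for a storage node. A minimal sketch; the number of expansion shelves is an assumption for illustration, not a value from the slides.

```python
# Rough worst-case power estimate for one CX4 storage node, using the maximum
# draws quoted above. The disk-shelf count is an assumed value.

PROCESSOR_CHASSIS_W = 290   # quoted max for the processor chassis
DISK_SHELF_W = 425          # quoted max per disk expansion chassis
num_shelves = 4             # assumption: four expansion shelves

total_w = PROCESSOR_CHASSIS_W + num_shelves * DISK_SHELF_W
print(f"Estimated worst-case draw: {total_w} W")  # 290 + 4 * 425 = 1990 W
```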
MicroTerabyte Unified Fabric Layer: What Is It?
• QLogic 9020 & 9040 Multi-Protocol Fabric Directors
– 9020 with two (2) FVIC IB-FC Virtual I/O Controllers
– 9040 with up to four (4) FVIC IB-FC Virtual I/O Controllers
– Each FVIC provides 10 DDR (20Gb) IB ports & 8 4Gb FC ports
• Supports up to 128 virtual HBA ports per module
• Automatic sensing 1/2/4 Gb/s
• Load balancing
• Automatic port and module failover
• LUN mapping and masking features
• QLogic 7104 Host Channel Adapters
– Dual port, DDR
– IPoIB, RDS, SRP
MicroTerabyte Unified Fabric Layer: General Benefits
• Managing one fabric
• Reduced footprint
• Compact implementation
• Fewer host components needed to support I/O and interconnect
• Increased bandwidth and reduced latency
• Reduced host resources (1 HCA vs. several HBAs)
• Path failover through SRP protocol
• Well positioned to take advantage of advances in InfiniBand technology
MicroTerabyte Unified Fabric Layer: Oracle RAC Benefits
• Scalable platform to support Oracle RAC
• More predictable response times
• Capability to drive more Oracle I/O through fewer compute nodes
• Ability to exploit storage capability at a lower cost
• Reduced Oracle messaging latency via RDS/IB
CONFIGURATION
A mid-size MicroTerabyte configuration, including:
Servers: Four (4) Dell 1950 Intel quad core servers. Each 1950 includes 8 cores, 16GB memory and 1 dual channel HCA. The server is running Red Hat Linux 5 update 1 (2.6.18-53 kernel).
Storage: Two (2) EMC CX3-40 storage arrays. Each array provides 8 4Gbps front-end Fibre Channel connections, for a total of 16 4Gbps front-end ports; approximately 7.5TB of usable storage is configured in 4+1 RAID sets.
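The ~7.5TB usable figure follows from RAID 5 geometry: each 4+1 set exposes four drives' worth of data capacity. A sketch of that arithmetic; the drive size and set count below are illustrative assumptions, not values from the configuration.

```python
# Usable-capacity arithmetic for 4+1 RAID 5 sets: each set stores data on 4
# drives and parity equivalent to 1 drive. Drive size and set count here are
# assumptions chosen to illustrate how ~7.5 TB usable could be reached.

data_drives_per_set = 4      # "4+1" = 4 data drives + 1 parity drive
drive_gb = 146               # assumed drive size
num_sets = 13                # assumed number of RAID sets

usable_gb = num_sets * data_drives_per_set * drive_gb
print(f"{usable_gb} GB usable")  # 13 * 4 * 146 = 7592 GB, roughly 7.5 TB
```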
Unified Interconnect and Storage IB fabric: One (1) QLogic 9040 Multi-Protocol Director with 2 FVIC modules
HARDWARE COST
Approximate total cost (market): $500,000
TEST METHOD AND RESULTS
Testing simulator: Oracle ORION -- Oracle I/O Numbers Calibration Tool for Linux
>> Random I/O Test Results:
Test Type: random I/O
I/O size: 8K
Rate observation 1: 9,600 sustained IOPS @ 1.87 ms per I/O avg node latency
Rate observation 2: 30,566 sustained IOPS @ 2.96 ms per I/O avg node latency
Rate observation 3: 42,615 sustained IOPS @ 2.96 ms per I/O avg node latency
>> Sequential I/O Test Results:
Test Type: sequential I/O
I/O size: 1MB
Rate observation 1: 677MB/sec sustained I/O seq. throughput @ 26.48 ms per 1MB I/O avg node latency
Rate observation 2: 2.098GB/sec sustained I/O seq. throughput @ 91.48 ms per 1MB I/O avg node latency
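By Little's Law, sustained I/O rate multiplied by average latency gives the number of I/Os in flight, a useful consistency check on figures like those above. This sketch applies it to two of the ORION observations; converting 2.098 GB/s into roughly 2,098 one-megabyte I/Os per second assumes decimal units.

```python
# Little's Law: concurrent I/Os = throughput (I/Os per second) x latency (s).
# Applied to the ORION observations above as a consistency check.

def outstanding_ios(iops: float, latency_ms: float) -> float:
    return iops * latency_ms / 1000.0

# Random 8K, observation 3: 42,615 IOPS at 2.96 ms average latency
rand_inflight = outstanding_ios(42_615, 2.96)   # about 126 concurrent I/Os

# Sequential 1MB, observation 2: 2.098 GB/s is roughly 2,098 x 1MB I/Os per
# second (decimal units assumed), at 91.48 ms per I/O
seq_inflight = outstanding_ios(2_098, 91.48)    # about 192 concurrent I/Os

print(round(rand_inflight), round(seq_inflight))
```

The implied queue depths (on the order of 100-200 outstanding I/Os across the cluster) are plausible for a 4-node ORION run driving two arrays.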
MicroTerabyte ORION Benchmarks
MicroTerabyte Oracle Benchmarks
Hardware
• Servers: (4) Dell 1950, dual-socket quad core, 16 GB RAM per server, single dual-port IB HCA
• Unified I/O Fabric: QLogic 9040 with (2) FVIC modules
• Storage: (2) EMC CX-3 Model 40, (16) Fibre Channel ports

Software
• O/S: Red Hat Linux v5.1, kernel 2.6.18-51, UDEV
• Unified I/O Fabric: QLogic IB drivers v4.2.0.0.39 (IB/RDS, IB/SRP, IPoIB)
• Database: Oracle RAC 10gR2, Oracle Clusterware 10gR2, Oracle ASM, Oracle RMAN, Oracle DataGuard
• Storage: Flare O/S, EMC Navisphere
• Backup: EMC Networker

Database Operations
Operations measured: Table Scan, Direct Path Insert, Materialized View Creation, Index Creation (Global Hash), Index Creation (Local Bitmap), DBMS Stats Collection.

Source Table Size (TB unless noted) | Target Segment Size (TB unless noted) | Degree of Parallelism | Row Count (billions unless noted) | Elapsed Time (HH:MM:SS.MS)
0.485 | N/A | 256 | 17 | 00:03:48.49
0.48 | N/A | 448 | 17.7 | 00:03:42.42
1.2 * | N/A | 352 | 44 | 00:09:51.93
2.3 | N/A | 256 | 35 | 00:18:15.33
1.2 * | 1.41 | 128 | 31 | 01:40:03.04
0.579 * | 1.02 | 128 | 9 | 00:51:05.57
1.2 * | 1.96 | 128 | 35 | 02:40:03.04
2.2 | 0.512 * | 64 | 22 | 03:35:01.01
2.2 | 1.2 | 64 | 17 | 01:51:02.79
1.2 | 92 GB | 50 | 44 | 00:32:17.19
1.2 | 0.133 | 50 | 44 | 00:23:35.45
1.96 | 65 GB | 50 | 32 | 00:23:08.48
1.96 | 86 GB | 50 | 32 | 00:24:29.30
0.3 | 105 GB | 128 | 4.5 | 01:37:18.34
0.579 | 168 GB | 128 | 9 | 03:57:23.54
21 GB | 3.4 GB | 128 | 100 million | 00:00:32.55
21 GB | 3.6 GB | 128 | 100 million | 00:00:38.50
DBMS Stats Global Collection: 0.56 | N/A | N/A | 9 | 00:03:30.67
DBMS Stats Global and Local Collection: 0.56 | N/A | N/A | 9 | 00:04:36.97
MicroTerabyte Benchmark Comparisons

Systems compared:
A: CMA MicroTerabyte Platform, Mid-Size Configuration (Dell 1950 / QLogic 9040 / EMC CX Storage)
B: Award-Winning Human Services Enterprise Data Warehouse (SUN E25k, SUN 9990 Storage Array)
C: Award-Winning Medicaid Enterprise Data Warehouse (IBM P690, EMC DMX3 Storage Array)
D: Award-Winning Fraud Investigation Enterprise Data Warehouse (HP Superdome, XP Storage Array)

> Comparative Real-World Database Operations
Full Scan Time of Largest Fact Table: A: 18.25 min | B: 7 min | C: 45 min | D: 21.75 min
Scan Time Normalized / Extrapolated to 2.3 TB: A: 18.25 min | B: 32.2 min | C: 32.4 min | D: 41.5 min
Approximate Direct Path Insert Time: A: 100 min | B: 105 min | C: 215 min | D: N/A
Segment Size: A: 1.4 TB | B: 500 GB | C: 2.3 TB | D: N/A
Rows Inserted: A: 31 billion | B: 2.8 billion | C: 3.5 billion | D: N/A
Local Bitmap Index Create Time: A: 23 min | B: 13.5 min | C: 55 min | D: 43 min
Row Count: A: 35 billion | B: 2.8 billion | C: 4.2 billion | D: 1.1 billion
Table Size (Moderate Cardinality): A: 2.3 TB | B: 500 GB | C: 3.2 TB | D: 1.2 TB

> Environment Configuration Details
Processors: A: 32 cores (8 quad-core) | B: 48 | C: 32 | D: 32
Memory: A: 64 GB | B: 128 GB | C: 256 GB | D: 128 GB
Server HBAs: A: 4 dual-ported HCAs (1 HCA per server) | B: 16 @ 4Gbps | C: 20 @ 4Gbps | D: 16 @ 4Gbps
Storage Array Ports: A: 16 @ 4Gbps (2 EMC CX3 Model 40s) | B: 16 @ 4Gbps | C: 20 @ 4Gbps | D: 16 @ 4Gbps
Operating System: A: Red Hat Linux 5.1 | B: Solaris 9 | C: AIX 5.3 | D: HP-UX 11i
Volume Manager / File System: A: Oracle ASM | B: VxVM and VxFS with ODM | C: AIX Volume Manager with JFS2 | D: VxVM and VxFS with ODM
Multi-Pathing: A: QLogic SRP 4.2 | B: Veritas DMP | C: EMC PowerPath | D: Veritas DMP
Storage Topology: A: SRP over InfiniBand via QLogic FC Gateway | B: Brocade-based Fibre Channel SAN | C: Direct-Attached Fibre Channel | D: Direct-Attached Fibre Channel
RAID Layout: A: 4+1 RAID 5 LUNs, 73 GB drives | B: 4D+4D parity groups, 73 GB 15k drives | C: Metavolumes with 146 GB 15k drives | D: 4D+4D parity groups, 73 GB 15k drives
Database Version: A: Oracle EE 64-bit 10.2.0.3 | B: Oracle EE 64-bit 10.2.0.3 | C: Oracle EE 64-bit 11.1.0.6 | D: Oracle EE 64-bit 10.2.0.3
Database Topology: A: RAC, InfiniBand with RDS | B: non-RAC local IPC | C: non-RAC local IPC | D: non-RAC local IPC
Database Usable Storage: A: 8 TB | B: 8 TB | C: 21 TB | D: 16 TB
Largest Fact Table Row Count: A: 35 billion | B: 2.8 billion | C: 4.2 billion | D: 1.1 billion
Largest Fact Table Size: A: 2.3 TB | B: 500 GB | C: 3.2 TB | D: 1.2 TB
Approximate HW Cost: A: $500K | B: $2M+ | C: $2M+ | D: $2M+
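Combining the scan times normalized to 2.3 TB with the approximate hardware costs gives a rough price-for-performance view of the comparison. A hedged sketch: the "$2M+" entries are treated as a $2M floor, so those figures are lower bounds.

```python
# Rough price-for-performance sketch from the comparison above: dollars per
# (TB/minute) of full-scan throughput, using the times normalized to 2.3 TB.
# The "$2M+" costs are taken as a 2.0M floor, so those results understate cost.

systems = {
    "CMA MicroTerabyte":       (500_000, 18.25),
    "Human Services EDW":      (2_000_000, 32.2),
    "Medicaid EDW":            (2_000_000, 32.4),
    "Fraud Investigation EDW": (2_000_000, 41.5),
}

for name, (cost_usd, scan_minutes) in systems.items():
    tb_per_minute = 2.3 / scan_minutes
    print(f"{name}: ${cost_usd / tb_per_minute:,.0f} per TB/min of scan rate")
```

Even with the $2M floor, the MicroTerabyte configuration delivers the lowest cost per unit of scan throughput among the four systems.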
Summary
• As demonstrated in CMA's MicroTerabyte platform, InfiniBand can provide an extremely capable transport mechanism for unifying interconnect and I/O traffic
• Increases bandwidth and reduces latency
• Reduces Oracle messaging latency via RDS/IB
• Provides more predictable response times
• Reduces host resource requirements (I/O processing workload is off-loaded to the HCA)
• Consistent with Oracle's strategic technology direction
For More Information
Brian Dougherty
Chief Architect, CMA
[email protected]