SAP HANA Performance with Intel Processors
DESCRIPTION
Intel® Xeon® Processor E7-8800/4800/2800 Product Families provide the reference platform for SAP® HANA. SAP HANA benefits from generational platform improvements and new features.
TRANSCRIPT
1
SAP HANA® Performance SAPPHIRE NOW 2013 Orlando
Dietrich O. Banschbach
2
Legal Disclaimer - Notice INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm
Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families: Go to: http://www.intel.com/products/processor_number
Intel® AES-NI requires a computer system with an AES-NI enabled processor, as well as non-Intel software to execute the instructions in the correct sequence. AES-NI is available on select Intel® processors. For availability, consult your reseller or system manufacturer. For more information, see http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni/
No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a computer with Intel® Virtualization Technology, an Intel TXT-enabled processor, chipset, BIOS, Authenticated Code Modules and an Intel TXT-compatible measured launched environment (MLE). Intel TXT also requires the system to contain a TPM v1.s. For more information, visit http://www.intel.com/technology/security
Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, and virtual machine monitor (VMM). Functionality, performance or other benefits will vary depending on hardware and software configurations. Software applications may not be compatible with all operating systems. Consult your PC manufacturer. For more information, visit http://www.intel.com/go/virtualization
Requires a system with Intel® Turbo Boost Technology. Intel Turbo Boost Technology and Intel Turbo Boost Technology 2.0 are only available on select Intel® processors. Consult your PC manufacturer. Performance varies depending on hardware, software, and system configuration. For more information, visit http://www.intel.com/go/turbo
Intel product is manufactured on a lead-free process. Lead is below 1000 PPM per EU RoHS directive (2002/95/EC, Annex A). No exemptions required
Halogen-free: Applies only to halogenated flame retardants and PVC in components. Halogens are below 900ppm bromine and 900ppm chlorine.
Copyright © 2013 Intel Corporation. All rights reserved. Intel, Intel Xeon, the Intel Xeon logo and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others.
NOTE: Some Configuration details are listed in the notes sections. Please use Notes Page under View to print or PDF.
3
Legal Disclaimers - Performance Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
Intel does not control or audit the design or implementation of third party benchmarks or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmarks are reported and confirm whether the referenced benchmarks are accurate and reflect performance of systems available for purchase.
Relative performance is calculated by assigning a baseline value of 1.0 to one benchmark result, and then dividing the actual benchmark result for the baseline platform into each of the specific benchmark results of each of the other platforms, and assigning them a relative performance number that correlates with the performance improvements reported.
SPEC, SPECint, SPECfp, and SPECrate are trademarks of the Standard Performance Evaluation Corporation. See http://www.spec.org for more information.
SAP and SAP NetWeaver are the registered trademarks of SAP AG in Germany and in several other countries. See http://www.sap.com/benchmark for more information.
4
Legal Disclaimers - Optimization Notice
Intel® compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel® SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804 http://software.intel.com/en-us/articles/optimization-notice/
5
SAP HANA performance – platform benefits
Intel® Xeon® Processor E7-8800/4800/2800 Product Families provide the reference platform for SAP® HANA.
SAP HANA benefits from generational platform improvements and new features.
Examples:
• Intel® Turbo Boost Technology: Increases performance by increasing processor frequency and enabling faster speeds when conditions allow
• Intel® Hyper-Threading Technology: Increases performance for threaded applications, delivering greater throughput and responsiveness
• Up to 10 cores and 20 threads; 30 MB of on-die cache
• Up to 2 TB of DDR3 memory on a 4 socket system using 32 GB DIMMs
6
Modular Platform Drives Innovation
Wide Range of Xeon® E7-8800/4800/2800-based Platforms Brought to Market
[Figure: platform topologies. Configurations shown: 2-socket; 4S (32 DIMMs); 4S (64 DIMMs); 2+2 (4S); 4+4 (8S); 2+2+2+2 (8S); 2+2+... (up to 256S) as additional configurations via OEM-specific scaling technology. Legend: Xeon® CPU socket; memory; I/O hub; Intel QuickPath Interconnect; 3rd-party node controller (non-Intel); OEM interconnect.]
Huge variety of systems available for optimized choice
* Other names and brands may be claimed as the property of others.
7
Intel engineers optimized HANA for Xeon E7
Adoption of state-of-the-art microprocessor ISA extensions such as SSE4.x
Decompression and search benefit from a 60-100% speedup
7x faster hash function
3.5x faster implementation of bit-vector operations
Intel Decimal Floating-Point Library
Pre-enabling of Haswell new instructions
Integrated Intel® VTune™ APIs for deep code analysis
Scalability improvements for various usage scenarios
Great scalability on 8-socket glue-less reference design
Intel Engineering Engagement: Co-Engineering since 2005 (TREX → BWA → HANA)
[Figure: Bit-Vector Optimizations. Speed-up (axis 0.00 to 6.00) per bit-vector function.]
[Figure: SAP HANA - Scalability. Speed-up (axis 0 to 8) versus number of sockets (axis 0 to 8), shown against a perfect-scaling reference line.]
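The deck does not spell out which bit-vector operations were optimized. As a rough illustration only (not SAP HANA's or Intel's implementation), the sketch below shows the kind of work a column store typically does with bit vectors: each predicate yields one bit per row, the bit vectors are combined with a bitwise AND, and the set bits are counted. All function names and sample data are hypothetical, and Python integers stand in for the SIMD registers that an SSE4.x implementation would use.

```python
def predicate_bitvector(column, predicate):
    """Return an int whose bit i is set when predicate(column[i]) is true."""
    bits = 0
    for row, value in enumerate(column):
        if predicate(value):
            bits |= 1 << row
    return bits


def matching_rows(bits):
    """Decode a bit vector back into a list of row indices."""
    rows = []
    row = 0
    while bits:
        if bits & 1:
            rows.append(row)
        bits >>= 1
        row += 1
    return rows


# Hypothetical sample columns (illustration only, not from the slides).
quantity = [5, 12, 7, 30, 12, 1]
region = ["EU", "US", "EU", "EU", "US", "EU"]

large_orders = predicate_bitvector(quantity, lambda q: q >= 7)
eu_orders = predicate_bitvector(region, lambda r: r == "EU")

# Combining two predicates is a single bitwise AND over the bit vectors.
both = large_orders & eu_orders
# Counting matches is a population count over the result.
match_count = bin(both).count("1")

print(matching_rows(both), match_count)
```

The point of the representation is that combining predicates and counting hits reduce to a handful of bitwise instructions per machine word, which is the sort of operation that wide SIMD registers and population-count instructions can speed up.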
8
Scanning at the Speed of Light
• Optimized SSE routines can scan 2B symbols/s on 1 core
• A 4-way Intel® Xeon® processor E7-4870 system can scan 50B symbols/s:
• A fast typist can type 400 chars/min (world record: 542)
• You need 7.4B typists to type as fast as scanning the data.
• World population is 7.0B.
• Assume a receipt with 5mm per line
• 1 system can scan 247,939km/s of receipts.
• Speed of light is 299,792km/s.
• Source: Intel internal measurements
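The comparisons on this slide can be checked with a few lines of arithmetic. The sketch below assumes that one scanned symbol corresponds to one typed character and to one receipt line; the slide does not state this mapping, but it is the one under which the numbers line up. The function names are hypothetical. Under that assumption, the 7.4B typists and 247,939 km/s figures are consistent with an unrounded scan rate of about 49.6B symbols/s, which the slide appears to round to 50B.

```python
# Figures taken from the slide.
SCAN_RATE_STATED = 50e9        # symbols/s on a 4-way E7-4870 system (rounded)
TYPIST_CHARS_PER_MIN = 400     # "fast typist"
LINE_HEIGHT_MM = 5             # receipt line height
RECEIPT_SPEED_KM_S = 247_939   # receipt scanning speed quoted on the slide
SPEED_OF_LIGHT_KM_S = 299_792

MM_PER_KM = 1_000_000


def typists_needed(symbols_per_s, chars_per_min=TYPIST_CHARS_PER_MIN):
    """Typists required to produce symbols at the given scan rate."""
    return symbols_per_s * 60 / chars_per_min


def receipt_speed_km_s(symbols_per_s, line_height_mm=LINE_HEIGHT_MM):
    """Receipt length scanned per second, assuming one symbol per line."""
    return symbols_per_s * line_height_mm / MM_PER_KM


def implied_scan_rate(receipt_km_s, line_height_mm=LINE_HEIGHT_MM):
    """Scan rate implied by a receipt speed, assuming one symbol per line."""
    return receipt_km_s * MM_PER_KM / line_height_mm


implied = implied_scan_rate(RECEIPT_SPEED_KM_S)

print(f"implied scan rate: {implied / 1e9:.2f}B symbols/s")
print(f"typists at implied rate: {typists_needed(implied) / 1e9:.1f}B")
print(f"typists at stated 50B rate: {typists_needed(SCAN_RATE_STATED) / 1e9:.1f}B")
print(f"receipt speed at stated 50B rate: {receipt_speed_km_s(SCAN_RATE_STATED):,.0f} km/s")
print(f"fraction of light speed: {RECEIPT_SPEED_KM_S / SPEED_OF_LIGHT_KM_S:.2f}")
```

Using the rounded 50B figure instead gives 7.5B typists and 250,000 km/s, so the small differences from the slide's numbers come from rounding rather than from a different calculation.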
9
Beyond Performance
SAP HANA implements recovery routines that allow the database to survive uncorrectable memory errors in many cases
Normal Status With Error Prevention
First Machine Check Architecture Recovery in Xeon®-based Systems
* Errors detected using Patrol Scrub or Explicit Write-back from cache
MCA needs to be supported by the OS and the application
Previously seen only in RISC, mainframe, and Itanium-based systems
[Figure: memory subsystem diagram showing eight registered DDR3 DIMMs (REG plus DRAM devices), two SMB blocks, and two SMI links.]
Error Corrected
HW Correctable Errors
Error Detected*
Patrol Scrubber scans memory for errors
Un-correctable Error
Error Contained
Bad memory location flagged so data will not be used by OS or applications
Error information passed to OS
System works in conjunction with OS to recover or restart processes and continue normal operation
System Recovery with OS
SAP HANA is enabled to analyze OS error signals and execute its own recovery routines
SAP HANA Recovery
Error information passed to IMDB
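The escalation path on this slide can be read as a small decision flow. The sketch below is a toy model of that flow only: the enum, function, and step strings are hypothetical, and it does not use any real operating-system or SAP HANA interface. The branch for a system without OS support is an assumption, since the slide only states that MCA needs support from both the OS and the application.

```python
from enum import Enum


class ErrorKind(Enum):
    CORRECTABLE = "correctable"
    UNCORRECTABLE = "uncorrectable"


def handle_memory_error(kind, os_supports_mca, app_supports_mca):
    """Return the ordered steps for one detected memory error (toy model)."""
    steps = ["error detected"]
    if kind is ErrorKind.CORRECTABLE:
        # Hardware-correctable errors never reach the OS recovery path.
        steps.append("error corrected by hardware")
        return steps

    # Uncorrectable error: contain it so the bad data is not consumed.
    steps.append("error contained: bad memory location flagged")
    if not os_supports_mca:
        # Assumption: without OS support the recovery flow stops here.
        steps.append("no OS support: recovery flow not available")
        return steps

    steps.append("error information passed to OS")
    if app_supports_mca:
        # An application such as the IMDB can run its own recovery.
        steps.append("error information passed to IMDB")
        steps.append("application runs its own recovery routines")
    else:
        steps.append("OS recovers or restarts affected processes")
    return steps


for step in handle_memory_error(ErrorKind.UNCORRECTABLE, True, True):
    print(step)
```

The model makes the layering explicit: hardware handles correctable errors alone, the OS is the first software layer to see an uncorrectable one, and the database only gets a chance to recover if the layer below it passes the error information up.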
10
What's next?
Brickland-EX Platform & Intel® Xeon® processor E7-8800/4800/2800 v2 product families (codenamed ‘Ivy Bridge-EX’)
22nm process technology
Die shrink of Sandy Bridge microarchitecture
Future microarchitecture after Boxboro-EX/Westmere-EX
Support of PCIe* 3.0
IVB-EX will include new advanced reliability features: MCA Recovery Execution Path, MCA I/O, Enhanced MCA Gen 1, PCIe* Live Error Recovery
IVB-EX will triple the memory capacity: up to 6TB in a 4S system; up to 12TB in an 8S system
Support for up to 24 DIMMs per socket; DDR3 memory (RDIMM, LRDIMM) with up to 64GB density (LRDIMMs)
IVB-EX on track for production in Q4 2013
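The memory capacity figures follow from sockets × DIMMs per socket × DIMM size. The sketch below reproduces them. The 16 DIMMs per socket used for the current generation is inferred from the deck's own numbers (2 TB across 4 sockets with 32 GB DIMMs, and the "4S (64 DIMMs)" configuration), not stated directly; the function name is hypothetical.

```python
GB_PER_TB = 1024


def capacity_tb(sockets, dimms_per_socket, dimm_gb):
    """Total memory capacity in TB for a fully populated system."""
    return sockets * dimms_per_socket * dimm_gb / GB_PER_TB


# Current generation: 4 sockets, 16 DIMMs per socket (inferred), 32 GB DIMMs.
current_4s = capacity_tb(sockets=4, dimms_per_socket=16, dimm_gb=32)

# Ivy Bridge-EX: 24 DIMMs per socket, 64 GB LRDIMMs.
ivb_4s = capacity_tb(sockets=4, dimms_per_socket=24, dimm_gb=64)
ivb_8s = capacity_tb(sockets=8, dimms_per_socket=24, dimm_gb=64)

print(f"current 4S: {current_4s:.0f} TB")
print(f"IVB-EX 4S:  {ivb_4s:.0f} TB")
print(f"IVB-EX 8S:  {ivb_8s:.0f} TB")
print(f"4S increase: {ivb_4s / current_4s:.0f}x")
```

The tripling comes from two factors multiplied together: 1.5x more DIMM slots per socket (16 to 24) and 2x larger DIMMs (32 GB to 64 GB).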
11
Intel® Xeon® Processor E7-8800/4800/2800 Product Families Key Performance Claims Backup
Performance claims as of 15 February 2011:
1. Generational: Up to 40% generational compute-intensive throughput claim based on SPECint*_rate_base2006 benchmark comparing next generation Intel® Xeon® processor E7-4870 (30M cache, 2.40GHz, 6.40GT/s Intel® QPI, formerly codenamed Westmere-EX) scoring 1,010 (includes Intel Compiler XE2011 improvements accounting for about 11% of the performance boost) to X7560 (24M cache, 2.26GHz, 6.40GT/s Intel QPI, formerly codenamed Nehalem-EX) scoring 723 (Intel Compiler 11.1). Source: Intel SSG TR#1131.
2. Scalability: Up to 2.8x scaling transaction improvement claim based on internal OLTP benchmark comparing next generation Intel® Xeon® processor E7-4870 (30M cache, 2.40GHz, 6.40GT/s Intel® QPI, formerly codenamed Westmere-EX) scoring 2.73M transactions (leading database vendor) to X5680 (12M cache, 3.33GHz, 6.40GT/s Intel QPI, formerly codenamed Westmere-EP) scoring 970K transactions. Source: Intel SSG TR#1120.
3. Consolidation: "Up to 29:1 server consolidation performance with return on investment in less than one year" claim estimated based on comparison between 4S MP Intel® Xeon® processor 3.33GHz (single-core with Intel® HyperThreading Technology, 8M LLC cache, 3.33GHz, 800MHz FSB, formerly code named Potomac) and 4S Intel® Xeon® processor E7-4870 (30M cache, 2.40GHz, 6.4GT/s Intel® QPI, formerly code named Westmere-EX) based servers. "Up to 18:1 server consolidation performance with return on investment in about 14 months" claim estimated based on comparison between 4S MP Intel® Xeon® processor 7041 (dual-core with Intel® HyperThreading Technology, 4M cache, 3.00GHz, 800MHz FSB, formerly code named Paxville) and 4S Intel® Xeon® processor E7-4870 (30M cache, 2.40GHz, 6.4GT/s Intel® QPI, formerly code named Westmere-EX) based servers.
Calculation includes analysis based on performance, power, cooling, electricity rates, operating system annual license costs and estimated server costs. This assumes 42U racks, $0.10 per kWh, cooling costs are 2x the server power consumption costs, operating system license cost of $900/year per server, per server cost of $41,523 based on averaged estimated list prices, and estimated server utilization rates. All dollar figures are approximate. Estimated SPECint*_rate_base2006 performance and power results are measured for Intel® Xeon® processor E7-4870 and estimated for Intel Xeon processor 3.33GHz single-core / 7041 dual-core based servers. Platform power was measured during the steady state window of the benchmark run and at idle. Performance gain compared to baseline was 29x for single-core and 18x for dual-core (truncated).
* Baseline single-core platform (measured score of 34.1; idle = 480W; active = 780W): Intel server with four MP Intel® Xeon® processor 3.33GHz processors, 16GB memory (8x 2GB DDR2-400), 1 hard drive, 1 power supply, Microsoft Windows Server* 2008 Enterprise x64 Edition R2 operating system, Intel Compiler 11 built SPECcpu* 2006 November 2009 binaries. Estimated result.
* Baseline dual-core platform (estimated score of 54.6; idle = 546W; active = 812W): Intel server with four Intel® Xeon® processor 7041 processors, 32GB memory (16x 2GB DDR2-400), 1 hard drive, 1 power supply, Microsoft Windows Server* 2008 Enterprise x64 Edition R2 operating system, Intel Compiler 11 built SPECcpu* 2006 November 2009 binaries. Estimated result.
* New platform (measured score of 1,000; idle = 552W; active = 1053W): Intel internal reference server with four Intel® Xeon® processor E7-4870 (30M cache, 2.40GHz, 6.40GT/s Intel® QPI), 256GB memory (64 x Samsung 4GB 2Rx8 PC3L-10600R), 1 hard drive, 3 power supplies, using SUSE* Linux Enterprise Server 11 operating system, Intel C++ and Fortran Composer XE2011 built SPECcpu* 2006 January 2011 binaries. Source: Intel SSG TR#1131.
4. Flexible Virtualization: Up to 25% better virtual machine performance claim based on SPECvirt_sc2010 benchmark comparing next generation Intel® Xeon® processor E7-4870 (30M cache, 2.40GHz, 6.40GT/s Intel® QPI, formerly codenamed Westmere-EX) scoring 2,540 @ 162VMs to X7560 (24M cache, 2.26GHz, 6.40GT/s Intel QPI, formerly codenamed Nehalem-EX) scoring 2,024 @ 126VMs. Source: Intel SSG TR#1118.
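The headline claims above can be reproduced from the quoted scores using the relative-performance definition given in the performance disclaimer: assign the baseline a value of 1.0 and divide each result by the baseline result. The sketch below does only that arithmetic on the scores as quoted; the function and dictionary names are hypothetical.

```python
def relative_performance(score, baseline_score):
    """Score of the new platform relative to a baseline assigned 1.0."""
    return score / baseline_score


claims = {
    # 1. Generational: E7-4870 (1,010) vs X7560 (723), SPECint*_rate_base2006
    "generational": relative_performance(1010, 723),
    # 2. Scalability: E7-4870 (2.73M transactions) vs X5680 (970K)
    "scalability": relative_performance(2.73e6, 970e3),
    # 3. Consolidation: new platform (1,000) vs single-core baseline (34.1)
    "consolidation_single_core": relative_performance(1000, 34.1),
    # 3. Consolidation: new platform (1,000) vs dual-core baseline (54.6)
    "consolidation_dual_core": relative_performance(1000, 54.6),
    # 4. Virtualization: E7-4870 (2,540) vs X7560 (2,024), SPECvirt_sc2010
    "virtualization": relative_performance(2540, 2024),
}

for name, ratio in claims.items():
    print(f"{name}: {ratio:.2f}x")
```

The ratios come out at roughly 1.40x, 2.81x, 29.3x, 18.3x, and 1.25x, which match the "up to 40%", "2.8x", "29:1", "18:1", and "25%" claims once the consolidation figures are truncated as the notes describe.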
12
Orders of magnitude