Results of the Fermilab 64-bit Linux Hardware and Software Evaluation
May 11, 2005 Fermilab 64-bit Linux Evaluation 1
Results of the Fermilab 64-bit Linux Hardware and Software
Evaluation
Spring 2005 HEPiX meeting
Karlsruhe, Germany
Ken Schumacher, Steven Timm
Goals of the Evaluation
• Gain experience with x86_64 architecture of Linux kernel and see if it is a stable OS platform
• Evaluate AMD Opteron CPU and the associated hardware platforms to see if they are reliable hardware platforms.
• Obtain relative performance numbers between Intel Xeon EM64T “Nocona” and AMD Opteron processors
• Obtain relative performance numbers on applications compiled in 32-bit and 64-bit mode.
64-bit hardware
• Intel IA64 as implemented in Itanium 2
  – Not considered in this evaluation
  – Not binary-compatible with IA32 instruction set
  – Expensive
• Intel EM64T Xeon “Nocona”
  – Fermilab already has >240 of these in production
• AMD AMD64 Opteron
• Note: SPEC CINT2000 scores are about the same: Opteron 250 = 1452 and Xeon 3.6 GHz = 1429
Extending 32-bit instruction set
• Intel and AMD scheme very similar
• 48-bit virtual address space
• 64-bit General Purpose Registers
  – Support 64-bit addressing and integer math
  – Eight extra GPR added
  – Eight extra XMM added
• Difference: EM64T supports SSE3 instructions, Opteron has 3DNow!
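As a rough illustration of what the wider registers and address space mean in practice, the sketch below reports the pointer width of the running Python interpreter and the size of an address space for a given number of address bits. The function names are illustrative and not part of the evaluation.

```python
import platform
import struct


def pointer_bits() -> int:
    """Width of a native pointer in bits for the running interpreter."""
    # The struct format "P" is a C void*: 4 bytes in 32-bit mode,
    # 8 bytes in 64-bit mode.
    return struct.calcsize("P") * 8


def address_space_tib(address_bits: int) -> float:
    """Size of an address space with the given number of bits, in TiB."""
    return 2 ** address_bits / 2 ** 40


print(platform.machine(), pointer_bits())
print(address_space_tib(32))  # 32-bit addressing (4 GiB, shown in TiB)
print(address_space_tib(48))  # the 48-bit virtual address space
```

A 48-bit virtual address space works out to 256 TiB, against 4 GiB for 32-bit addressing.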
Vendor Selection
• Only used vendors that Fermilab has previous experience with.
• Requested 12 evaluation units, got 9.
• Opteron units from Koi, ASA, Penguin, CSI, Rackable, IBM, HP, Sun
• Purposely requested a variety of CPU speeds
• Motherboard manufacturers represented include Tyan, Accelertech, Sun (by Newisys), IBM (by MSI), HP.
• Dell Poweredge SC1425 Xeon unit (3.6 GHz) from Dell, as a reference. (Dell doesn’t offer Opteron).
Machine configurations:
Vendor               Processor  Speed    Board
HP (Proliant DL145)  Opt244     1.8 GHz  HP
IBM (eServer 326)    Opt246     2.0 GHz  IBM
Rackable             Opt246HE   2.0 GHz  Tyan
Penguin              Opt248     2.2 GHz  Accelertech
CSI                  Opt248     2.2 GHz  Accelertech
ASA                  Opt248     2.2 GHz  Tyan
Sun (Sunfire v20z)   Opt248     2.2 GHz  Newisys
Koi                  Opt250     2.4 GHz  Tyan
Dell PE SC1425       Xeon       3.6 GHz  Dell
Hardware features
• Dual Opteron boards designed with NUMA
  – Each CPU has its own memory bank
  – No contention between CPUs on front side bus
• Some remote management available on all of them; we did not test it.
• Several with SATA drives, they work fine.
• Broadcom tg3 is network interface on all.
• Rackable has low voltage Opteron 246HE chip, only 55W but same compute power as regular Opteron 246.
Evaluation units
OS Installation
• Successfully installed all systems with 64-bit NPACI-Rocks, Scientific Linux Fermi i386, and Scientific Linux Fermi x86_64.
• Tested operations of XFS file system, OK
• Default SL kernel in version 3.0.3 is 2.4.21-20, ran with that most of the time.
• 2.6.9 kernel needed to take full advantage of NUMA architecture of Opterons, that works too.
Linux kernels and distros
• One architecture x86_64, kernels come compiled for ia32e (Xeon) and amd64 (Opteron).
• Similar to i386 architecture with separate i686 and athlon kernels.
• All other rpms are the same for either.
• Able to run almost all of our 32-bit applications under the 64-bit kernel/distro in compatibility mode with little trouble.
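When 32-bit and 64-bit executables live side by side on one system, it helps to be able to tell them apart. The following is a minimal sketch that reads the ELF header of a binary and reports its word size and target machine. It relies only on the standard ELF layout (class byte at offset 4, byte order at offset 5, machine field at offset 18); the helper names are hypothetical and this is not a tool used in the evaluation.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}   # EI_CLASS values
ELF_MACHINES = {3: "i386", 62: "x86_64"}   # EM_386, EM_X86_64


def describe_elf(header: bytes) -> str:
    """Describe an ELF binary from the first 20 bytes of the file."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    elf_class = ELF_CLASSES.get(header[4], "unknown class")
    # EI_DATA: 1 means little-endian, 2 means big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    machine_name = ELF_MACHINES.get(machine, "machine %d" % machine)
    return elf_class + " " + machine_name


def describe_file(path: str) -> str:
    """Read the header of the file at `path` and describe it."""
    with open(path, "rb") as handle:
        return describe_elf(handle.read(20))
```

For example, `describe_file("/bin/ls")` on an x86_64 installation would be expected to report a 64-bit x86_64 binary, while a legacy executable built for i386 would report 32-bit i386.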
Reliability Testing
• Full Fermilab Acceptance test for 30 days
  – Continual disk activity on both disks
  – Both CPUs continuously busy.
  – 20 days in 64-bit mode, 10 in 32-bit mode
• Excluding one node with two catastrophic disk failures (which was disqualified), other seven Opterons had 97.6% uptime.
• Downtime was due to kernel hangs in 64-bit mode that we haven’t been able to reproduce since.
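To put the uptime figure in perspective, the sketch below converts it into node-days of downtime. It assumes the 97.6% is aggregated over all seven remaining nodes and the full 30-day test; the slides do not say how the figure was computed, so treat the result as an estimate.

```python
def downtime_node_days(nodes: int, days: int, uptime_fraction: float) -> float:
    """Total downtime in node-days for a given aggregate uptime fraction."""
    return nodes * days * (1.0 - uptime_fraction)


# Assumption: seven nodes, 30 days each, 97.6% aggregate uptime.
lost = downtime_node_days(nodes=7, days=30, uptime_fraction=0.976)
print(f"{lost:.2f} node-days, about {lost * 24:.0f} node-hours")
```

Under that assumption the downtime amounts to roughly 5 node-days out of 210.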
Benchmarks
• All major Fermilab computing users contributed benchmarks and people to run them.
  – CDF: reconstruction
  – D0: reconstruction
  – CMS: OSCAR and ORCA simulation and digitization, Root stress test, Pythia
  – SDSS: Supernova search program
  – LQCD: QCDStreams, MILC lattice code
  – General: seti@home, CERN unit benchmark, tiny
• Many more details in our paper
CMS Root Benchmark
64-bit mode gives gains on Opterons of about 40%
Fermi Cycles
• Reconstruction farms use Fermi Cycles (to account for differences in clock speed between Intel and AMD hardware).
• Pentium III 1 GHz is defined to have 1000 Fermi Cycles
• All other platforms take the average of the performance of CDF Reconstruction and D0 Reconstruction, normalized to PIII 1GHz performance.
• D0 and CDF executables are 32-bit, optimized only at Pentium architecture, not recompiled.
• We find D0 legacy executable runs 2.93x faster on Opteron 250, 2.38x faster on Xeon 3.6 (than PIII 1GHz).
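The Fermi Cycles definition can be written as a one-line formula. The sketch below is one reading of it: the average of the D0 and CDF speeds relative to the PIII 1 GHz, scaled so that the PIII itself scores 1000. The function name is hypothetical, and the sample inputs are the Opteron 250 relative speeds reported in the Run II benchmarks slide (2.93 for D0, 3.24 for CDF).

```python
PIII_FERMI_CYCLES = 1000  # Pentium III 1 GHz, by definition


def fermi_cycles(d0_relative: float, cdf_relative: float) -> float:
    """Fermi Cycles from D0 and CDF speeds relative to the PIII 1 GHz."""
    return PIII_FERMI_CYCLES * (d0_relative + cdf_relative) / 2


print(round(fermi_cycles(1.00, 1.00)))  # the reference machine itself
print(round(fermi_cycles(2.93, 3.24)))  # Opteron 250
```

With these inputs the Opteron 250 comes out at roughly 3085 Fermi Cycles.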
Compilers
• Use “tiny” (3000-line mock reconstruction program in Fortran, runs all in cache)
• Opteron 250
  – Legacy executable, i386: 1290 VUPS
  – Gcc 3.4.2 optimized: 2440 VUPS
  – Pathscale compiler: 2677 VUPS
• Xeon 3.6
  – Legacy executable, i386: 1386 VUPS
  – Gcc 3.4.2 optimized: 2309 VUPS
  – Intel 8.1 compiler: 2910 VUPS
  – Intel 8.1 compiler with profile feedback: 4332 VUPS
• Intel Fortran (and C) 8.1 uses SSE3 instructions to optimize, makes it incompatible with Opterons.
• For comparison PentiumIII 1.0 GHz=568 VUPS.
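The VUPS figures are easier to compare as speedups over the legacy i386 executable on the same processor. The sketch below does that arithmetic with the numbers from this slide; the dictionary layout and function name are illustrative only.

```python
# VUPS results for the "tiny" benchmark, copied from the slide.
TINY_VUPS = {
    "Opteron 250": {
        "legacy i386": 1290,
        "gcc 3.4.2 optimized": 2440,
        "Pathscale": 2677,
    },
    "Xeon 3.6": {
        "legacy i386": 1386,
        "gcc 3.4.2 optimized": 2309,
        "Intel 8.1": 2910,
        "Intel 8.1 with profile feedback": 4332,
    },
}


def speedups(results: dict, baseline: str = "legacy i386") -> dict:
    """Speedup of each build relative to the baseline build."""
    base = results[baseline]
    return {build: vups / base for build, vups in results.items()}


for cpu, results in TINY_VUPS.items():
    for build, factor in speedups(results).items():
        print(f"{cpu:12s} {build:32s} {factor:.2f}x")
```

On these numbers, recompiling with gcc alone gives about 1.9x on the Opteron 250 and about 1.7x on the Xeon 3.6, and the vendor-tuned compilers go beyond 2x.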
Run II benchmarks
Processor   D0 evts/s  Rel. PIII  CDF evts/s  Rel. PIII  AVG
Opt244      0.0187     2.28       1.05        2.56       2.42
Opt246      0.0206     2.51       1.12        2.73       2.62
Opt246HE    0.0208     2.54       1.11        2.71       2.62
Opt248      0.0230     2.80       1.24        3.02       2.91
Opt250      0.0240     2.93       1.33        3.24       3.09
Xeon2.4     0.0132     1.61       0.82        2.00       1.80
Xeon2.66    0.0146     1.78       0.90        2.20       1.99
Xeon3.06    0.0164     2.00       1.05        2.56       2.28
Xeon3.0     0.0162     1.98       1.00        2.44       2.21
Xeon3.4     0.0183     2.23       1.12        2.73       2.48
Xeon3.6     0.0190     2.32       1.20        2.93       2.62
Athlon1.66  0.0134     1.63       0.79        1.92       1.78
PIII1.0     0.0082     1.00       0.41        1.00       1.00
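The relative columns in this table follow from the raw event rates: each rate is divided by the PIII 1.0 GHz rate for the same experiment, and AVG is the mean of the two ratios. A short sketch of that normalization, using three rows copied from the table (the function name is hypothetical):

```python
# (D0 events/s, CDF events/s) for selected rows of the table.
RATES = {
    "Opt250": (0.0240, 1.33),
    "Xeon3.6": (0.0190, 1.20),
    "PIII1.0": (0.0082, 0.41),
}


def relative_to_reference(rates: dict, reference: str = "PIII1.0") -> dict:
    """Return (D0 ratio, CDF ratio, average) for each processor."""
    ref_d0, ref_cdf = rates[reference]
    result = {}
    for name, (d0, cdf) in rates.items():
        d0_rel = d0 / ref_d0
        cdf_rel = cdf / ref_cdf
        result[name] = (d0_rel, cdf_rel, (d0_rel + cdf_rel) / 2)
    return result


for name, (d0_rel, cdf_rel, avg) in relative_to_reference(RATES).items():
    print(f"{name:8s} {d0_rel:.2f} {cdf_rel:.2f} {avg:.2f}")
```

Rounded to two decimals, the recomputed values for these rows agree with the printed columns.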
Power Draw
Vendor    Processor  Idle (A)  Loaded (A)
HP        Opt244     1.54      1.74
Rackable  Opt246HE   1.44      1.61
IBM       Opt246     1.55      1.80
Penguin   Opt248     1.70      1.98
ASA       Opt248     1.91      2.20
CSI       Opt248     1.95      2.30
Sun       Opt248     2.00      2.35
Koi       Opt250     2.07      2.44
Dell      Xeon3.6    1.66      2.75
Koi       Xeon3.6    1.70      3.10
Koi       Xeon3.4    1.60      2.90
Koi       Xeon3.0    1.20      2.70
Koi       Xeon3.06   1.10      2.30
Koi       Xeon2.66   1.00      1.90
In general Opterons draw 10-27% less current at full load than comparable Xeon chips.
Four Opteron 248’s vary in current draw, explained by increasing numbers of fans and higher-performance disk drives.
Low voltage Opteron246HE saves 10-15% over high-voltage Opteron 246.
We need to average 10 kVA per rack in our facility. We have many racks now that are at 12 kVA.
10 kVA/rack = 2.1 A/node
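The per-node budget can be reproduced with simple arithmetic. The sketch below assumes 40 nodes per rack on a 120 V supply; neither number appears in the slides, they are chosen because they reproduce the 2.1 A/node figure. It also assumes the table values are amps. It then lists which evaluated nodes stay within the budget at full load.

```python
# Loaded current per node, assumed to be in amps, from the Power Draw table.
LOADED_AMPS = {
    "HP Opt244": 1.74,
    "Rackable Opt246HE": 1.61,
    "IBM Opt246": 1.80,
    "Penguin Opt248": 1.98,
    "ASA Opt248": 2.20,
    "CSI Opt248": 2.30,
    "Sun Opt248": 2.35,
    "Koi Opt250": 2.44,
    "Dell Xeon3.6": 2.75,
    "Koi Xeon3.6": 3.10,
    "Koi Xeon3.4": 2.90,
    "Koi Xeon3.0": 2.70,
    "Koi Xeon3.06": 2.30,
    "Koi Xeon2.66": 1.90,
}


def amps_per_node(rack_va: float, nodes_per_rack: int, volts: float) -> float:
    """Current budget per node for a rack with the given VA budget."""
    return rack_va / (nodes_per_rack * volts)


def nodes_within_budget(loaded_amps: dict, budget_amps: float) -> list:
    """Names of nodes whose loaded current does not exceed the budget."""
    return [name for name, amps in loaded_amps.items() if amps <= budget_amps]


# Assumption: 40 nodes per rack at 120 V (not stated in the slides).
print(round(amps_per_node(10_000, 40, 120), 1))
print(nodes_within_budget(LOADED_AMPS, 2.1))
```

Under these assumptions only the slower Opterons, the Penguin Opteron 248, and the Xeon 2.66 fit within 2.1 A at full load.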
Conclusions
• 64-bit Linux OS is a stable operating platform
• Opteron CPU and associated platforms have sufficient reliability for Fermilab production Farms
• Opteron CPU gives us slightly better performance for significantly less power draw and about the same price as Xeon.
• Using 64-bit compilation and optimization can lead to significant performance gains on AMD and Intel.
References
• Fermilab Evaluation Results:
  – http://www-oss.fnal.gov/scs/public/qualify2005/opteron_external.ps
• AMD Developer Symposium 2002
  – “Optimizing for the AMD Opteron™ Processor” by Tim Wilkens, Ph.D.
  – http://www.amd.com/us-en/assets/content_type/DownloadableAssets/Optimization_-_Tim_Wilkens.pdf
Power Supply Efficiency
• General Information on PS Efficiency
  – http://www.efficientpowersupplies.org/
• “Energy Efficiency of Computer Power Supplies” from CEPE website
  – http://www.cepe.ethz.ch/download/staff/bernard/28_formated.pdf
• http://www.xbitlabs.com/articles/other/display/psu-methodology.html