Dell Accelerates the Business of HPC
Sponsored Whitepaper
June 20, 2016
This paper is the first in a series describing Dell's initiatives and product strategy in commercial
and mid-sized research applications of high performance computing (HPC). The focus of this paper
is on academic and research institutions that run a multitude of workloads. Subsequent papers will
explore Dell's focus on HPC business buyers in specific vertical markets, such as life sciences and
manufacturing.
Executive Summary
HPC buying behavior has changed greatly from previous patterns.
Commercial and mid-sized general research HPC customers are
typically focused on product development and multitenant research
instead of massive basic research projects. The front sections of this
paper summarize the current state of the overall HPC market, and
the later sections introduce Dell's recent participation in the HPC
market and describe representative Dell HPC general research
customer use cases.
Dell is engaged in a number of initiatives and partnerships to address
the specific needs of the growing mid-market HPC customer
community in order to extend HPC into the enterprise. These include
forming a Dell HPC Community, being a founding member of
OpenHPC, focusing on market expertise for specific HPC vertical
markets, funding its expansive Dell HPC Innovation Lab, integrating in-memory analytics
solutions, and offering the HPC community enterprise-class financing, deployment, and support
services.
Dell's HPC System for Research helps Dell understand academic and research institutions that run
a multitude of workloads, so it can better serve them with scalable and affordable HPC resources.
Many of these customers are mid-sized and smaller organizations.
HPC is Still Evolving
There are many definitions for HPC and supercomputing. Most of them compare high performance
systems to more “normal” enterprise or personal computing systems – of course, both normal and
high-performance are moving targets over time.
TIRIAS Research has a more functional description – in general, HPC systems are designed to
accurately simulate reality. Specifically, a large portion of HPC systems are designed to accurately
simulate some aspect of our physical, three-dimensional (3D) world. HPC systems are designed to
break up physical spaces into smaller pieces. These smaller pieces are called "voxels" (a mash-up
of "volume" and "pixel").

Figure 1 Dell's System for Research in Dell's HPC Innovation Lab [Source: Dell]

Each voxel is described by its position in the simulation space, its
resolution (height, width, depth), and a set of mathematical rules that describe what happens in the
voxel during each time-slice of a simulation (and considering what happens in adjacent voxels).
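To make the voxel description concrete, here is a minimal sketch in Python (not Dell's or any production solver's code) of a 3D voxel grid advanced one time slice at a time, with each cell updated from its own state and its six face-adjacent neighbors. The grid size, constants, and diffusion-style update rule are illustrative assumptions.

```python
# Minimal sketch of the voxel idea described above: a 3D grid of cells,
# each updated every time-slice from its own state and its six
# face-adjacent neighbors. The update rule is illustrative only.
import numpy as np

NX, NY, NZ = 64, 64, 64          # grid resolution (voxels per axis), assumed
DX = 0.01                        # voxel edge length in meters, assumed
DT = 1e-3                        # time slice in seconds, assumed
ALPHA = 0.1                      # made-up coupling constant

def step(field):
    """Advance the field one time slice using neighbor coupling."""
    new = field.copy()
    # Interior voxels: combine each cell with its six neighbors.
    new[1:-1, 1:-1, 1:-1] += ALPHA * DT / DX**2 * (
        field[2:, 1:-1, 1:-1] + field[:-2, 1:-1, 1:-1] +
        field[1:-1, 2:, 1:-1] + field[1:-1, :-2, 1:-1] +
        field[1:-1, 1:-1, 2:] + field[1:-1, 1:-1, :-2] -
        6.0 * field[1:-1, 1:-1, 1:-1]
    )
    return new

field = np.zeros((NX, NY, NZ))
field[NX // 2, NY // 2, NZ // 2] = 1.0   # a point disturbance
for _ in range(100):                      # run 100 time slices
    field = step(field)
```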
Current-generation HPC systems can model useful components of real-world systems (for example,
automotive crash test simulations, local weather predictions, and molecular dynamics) at
commercially useful simulation size, time resolution and scale, and model complexity. There are
some classes of HPC systems designed to attack problems in pattern
matching (such as genomics), cryptography, number theory and other abstract mathematical
concepts, but the bulk of HPC system design investment is directed at simulating reality – weather,
biology, product design (from consumer goods to weapons), etc.
The reason HPC performance is still a moving target is that current HPC systems are still not
capable of running realistic simulations in real-time for meaningfully large systems and volumetric
spaces. The difficult part for many people to understand is how far away IT still is from simulating
important small scale physical systems, such as a fully functional single cell organism living in
human blood or a complete automobile and its local neighborhood, including local weather
conditions.
The driving factors behind HPC architecture today remain what they have been for decades (the
rough cost sketch after this list illustrates how they compound):

• Larger volumetric spaces with finer volumetric precision
• Longer time scales with finer time slices
• Increasingly complex models with more intricate interactions between model elements
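A back-of-the-envelope sketch (our illustration, with made-up extents and step sizes, not a vendor figure) shows how the first two factors compound: halving both the voxel edge length and the time slice multiplies the work per run by roughly 16x.

```python
# Illustrative scaling arithmetic: halving the voxel edge length multiplies
# the voxel count by 8, and halving the time slice doubles the step count.
voxels = lambda extent_m, dx_m: (extent_m / dx_m) ** 3
steps = lambda duration_s, dt_s: duration_s / dt_s

base = voxels(10.0, 0.01) * steps(1.0, 1e-3)      # 10 m cube, 1 cm voxels, 1 ms steps
finer = voxels(10.0, 0.005) * steps(1.0, 5e-4)    # halve both dx and dt
print(f"relative work: {finer / base:.0f}x")      # -> 16x
```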
While HPC performance is still a moving target, it has passed the point where simulation size and
complexity are only useful to researchers. As HPC performance continues to increase, more
commercial uses will open up.
Scaling Performance is Getting Harder
HPC hardware architectures are settling on a core of best practices. One key best practice in
simulating physical spaces is to parallelize voxel processing by scaling-out the size of an HPC
cluster. The supercomputer Top500 [1] list now includes HPC clusters with hundreds of thousands
of processors that consume tens of megawatts of power. This is not a sustainable trajectory in many
ways: power consumption, capital expenses, and operational complexity are at the top of the list.
As a result, many governments have funded initiatives [2] to pioneer the next generation of HPC
architecture.

[1] http://www.top500.org/

Figure 2 [Source images: NASA]
Another best practice is the use of compute accelerators, in particular graphics processing units
(GPUs), to parallelize and accelerate voxel calculations. Graphics acceleration technology was
originally created to render increasingly complex two-dimensional (2D) surfaces, rows and columns
of pixels, onto computer monitors more efficiently. GPUs were then designed to render complex 3D
volumes onto 2D pixel arrays, and as pixel processing grew
more sophisticated the HPC community discovered that
GPUs could accelerate many types of simple simulations.
GPU designers leaned into this trend and designed more
flexible pixel processing into their parallel pixel pipelines,
and the result was general purpose GPUs (GPGPU). Intel
designed a different type of highly parallel processing with
its Xeon Phi architecture, and for some HPC algorithms
it has similar acceleration behavior.
Not only do GPGPUs accelerate voxel processing, they also lower
the power consumption per voxel compared with mainstream processors. As a
result, eight of the top 25 clusters listed in the June 2016
supercomputer Top500 list use highly parallel accelerators; six of
those use NVIDIA GPGPUs and three use Intel’s Xeon Phi (one uses
both).
HPC cluster performance is also highly dependent on network
interconnect fabric and storage architectures. There are many
permutations for processor, accelerator, network and storage
architectures and no clear best practices for the growing class of
commercial HPC customers to leverage.
Understanding Results of Simulations is Getting More Difficult
HPC simulations are now modeling many aspects of our physical
world at a level of complexity where people cannot see the subtle
nuances of model behavior by looking at images or time-lapse movies of a simulation run. It is
even harder for people to evaluate the subtle differences in behavior of two simulations with
slightly different initial conditions or voxel behavior.

[2] http://www.exascaleinitiative.org/

Figure 3 NVIDIA's P100 GPU module [Source: TIRIAS Research]

Figure 4 Intel's Xeon Phi processor chip and module [Source: TIRIAS Research]
Most commercial HPC simulations are now run tens or hundreds of times with varying simulation
characteristics. Correlating subtle simulation behavior changes to simulation math or initial
conditions is becoming impossible for unaided human analysts.
Pattern analytics is being pressed into service in HPC applications in order to close this feedback
loop in creating more accurate and faster simulations. Because comparing simulation results adds
time and power consumption on top of running the simulations, accelerating pattern analytics is
now vitally important in commercial HPC applications.
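The sketch below illustrates that feedback loop under simplifying assumptions: the simulate function is a stand-in for an expensive HPC run, and the feature reduction and ranking shown are one of many possible pattern-analytics approaches, not any specific vendor's workflow.

```python
# Illustrative ensemble-comparison sketch: run many simulations with slightly
# different parameters, reduce each run to a small feature vector, and flag
# the runs that deviate most from a baseline.
import numpy as np

def simulate(seed, drag_coefficient):
    """Stand-in for an expensive HPC run; returns a time series."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, 1000)
    return np.exp(-drag_coefficient * t) + 0.01 * rng.standard_normal(t.size)

def features(series):
    """Reduce a run to a handful of numbers people can compare."""
    return np.array([series.mean(), series.std(), series.max(), series[-1]])

baseline = features(simulate(seed=0, drag_coefficient=1.00))
runs = {f"run_{i}": features(simulate(seed=i, drag_coefficient=1.0 + 0.02 * i))
        for i in range(1, 51)}

# Rank runs by how far their feature vectors drift from the baseline.
ranked = sorted(runs.items(), key=lambda kv: np.linalg.norm(kv[1] - baseline),
                reverse=True)
for name, _ in ranked[:5]:
    print("largest deviation:", name)
```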
In-memory big data processing techniques and deep learning algorithms are being deployed to
close the gap. “In-memory” systems implement very large physical memory spaces to load as
much of a data set into memory as possible. This accelerates many operations that use different
portions of a large data set by reducing data transfers between storage and memory.
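As a minimal illustration of the in-memory pattern, the following sketch uses Apache Spark (the engine in the Dell Cloudera appliance discussed later) to pin a result set in cluster memory and reuse it across several analyses. The file path and column names are hypothetical, and a Spark installation is assumed.

```python
# Minimal sketch of the "in-memory" pattern with Apache Spark: load once,
# cache the working set in memory, then reuse it for multiple analyses
# without re-reading from storage each time.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("simulation-results").getOrCreate()

results = spark.read.parquet("/data/simulation_runs.parquet")  # hypothetical path
results.cache()
results.count()   # materializes the cache

# Two different analyses reuse the same cached data set.
by_run = results.groupBy("run_id").agg(F.avg("peak_stress").alias("avg_peak_stress"))
outliers = results.filter(F.col("peak_stress") > 1.0e6)
by_run.show()
outliers.show()
```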
Commercial and Mid-Sized HPC Customers Are Different
Classic HPC customers are usually typecast as expensive, well-staffed, cost-plus government
research projects or fixed-budget academic research purchases dependent on "free" grad-student
time for everything from assembly to administration and applications coding.
Commercial HPC customers are development focused – they are in business to solve problems
and bring solutions to market. These commercial customers want to see fast time-to-results, low
operational cost per result, consistent performance, and operational flexibility in multitenant
provisioning and metering of HPC resources.
Many new HPC customers are already buying cloud services, either from public clouds or
managed private clouds. These customers expect the same multitenant ease of provisioning,
administration and management for their HPC clusters as they do for their public and private cloud
infrastructure. They also expect standard commercial support and maintenance contracts.
However, HPC hardware requirements are very different from those of cloud infrastructure. Generic,
processor based scale-out architectures work very well for many cloud workloads, but that is not
the case for evolving HPC workloads. As mentioned above, HPC architectures are not mature yet
and will continue to evolve for decades.
Vendors who succeed in the commercial and mid-sized HPC markets will borrow cloud
architecture concepts to create multi-tenant HPC clusters, which in effect will become HPC private
clouds. Commercial HPC customers want to charge back time on their HPC cloud to internal
customers, which means that execution time and storage utilization must be metered so they can
be billed.
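A simple sketch of that chargeback arithmetic follows; the rates and job records are invented for illustration and do not represent Dell pricing or any particular scheduler's accounting format.

```python
# Hedged chargeback sketch: node-hours plus storage, billed per tenant.
NODE_HOUR_RATE = 0.75        # internal charge per node-hour (assumed)
TB_MONTH_RATE = 20.00        # internal charge per TB-month of storage (assumed)

jobs = [
    {"tenant": "crash-sim", "nodes": 64, "hours": 12.0, "tb_months": 2.5},
    {"tenant": "cfd",       "nodes": 32, "hours": 48.0, "tb_months": 8.0},
    {"tenant": "genomics",  "nodes": 16, "hours": 6.5,  "tb_months": 1.2},
]

invoices = {}
for job in jobs:
    cost = (job["nodes"] * job["hours"] * NODE_HOUR_RATE
            + job["tb_months"] * TB_MONTH_RATE)
    invoices[job["tenant"]] = invoices.get(job["tenant"], 0.0) + cost

for tenant, total in invoices.items():
    print(f"{tenant}: ${total:,.2f}")
```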
This new class of HPC customers has adopted open source software frameworks, such as the
Linux operating system and OpenStack cloud framework. Unlike research oriented HPC,
development focused customers also leverage commercial off-the-shelf (COTS) software
applications whenever possible. Commercial customers don't want to write the best possible
simulation themselves; they want to run useful, supported simulations in a production environment
for their vertical market. It is a customer’s understanding of their vertical market that drives their
simulation variations. However, their choice of COTS software provider often determines the mix
of compute, storage and networking they must deploy, while the simulation scope and budget
jointly determine the size of the HPC hardware deployment and phases of installation.
Equally as important, these new HPC customers do not want to spend calendar quarters installing,
verifying, and tuning their HPC cloud to extract optimal performance. Their IT staff is neither
budgeted nor equipped to do the experimentation required to find optimal configuration settings
for a complex pile of compute, network and storage gear.
Dell’s HPC Investments
Dell’s self-declared mission is to “democratize HPC.” Dell uses the word “democratize” in the
sense of making HPC cluster ownership accessible to more and smaller organizations over time.
This is not a purchase price strategy – HPC technology is evolving too fast for a hardware pricing
race to the bottom. Dell's HPC strategy is more in line with traditional IT buying practices; Dell
is creating packages of HPC solutions that can be sized and configured for individual customers.
As in traditional IT, Dell offers financing and service packages on top of
hardware and software. And while traditional IT suppliers can talk with many different CIO
organizations, the HPC market is different and Dell has formed an HPC community to better listen
to and learn from its customer base.
Newly Formed HPC Community
The kick-off meeting for Dell's HPC Community [3] occurred in Austin, Texas in mid-April 2016
and was large and well attended. The Dell HPC Community will meet again at ISC 2016 [4] and
then at SC16 [5]. Although there is an independent and precompetitive HPC Advisory Council [6],
Dell benefits from hosting its own channel for listening to HPC customers for direct market and
operational feedback.

[3] http://www.dellhpc.org/program-agenda.html
[4] http://www.isc-hpc.com/
[5] http://sc16.supercomputing.org/
Founding Member of OpenHPC
The OpenHPC Collaborative Project [7] is a precompetitive community effort among processor IP
designers and chip vendors, system vendors, software vendors, and research labs to standardize an
open source HPC software stack. Many of the components of OpenHPC have been deployed for
years at one or more member sites. A key goal is to find the right combinations of these tools that
work together to form a consistent, "close to the metal," low-latency, and yet vendor-agnostic
distribution. This distribution must be updated, integrated, tested, and verified by the community.
OpenHPC includes compute and I/O drivers, message passing libraries, software tools for
developers and administrators, and also performance testing tools.
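The sketch below shows the kind of message-passing code such a stack exists to support: each MPI rank owns a slab of the domain and exchanges boundary ("halo") values with its neighbors every time slice. It uses the mpi4py binding for brevity; mpi4py is not necessarily a component of the OpenHPC distribution itself.

```python
# Minimal halo-exchange sketch with MPI. Run with, for example:
#   mpirun -np 4 python halo.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

local = np.full(1000, float(rank))   # this rank's slab of the domain
left = (rank - 1) % size             # periodic neighbors for simplicity
right = (rank + 1) % size

# Exchange one boundary value with each neighbor.
recv_from_left = np.empty(1, dtype=float)
recv_from_right = np.empty(1, dtype=float)
comm.Sendrecv(sendbuf=local[-1:], dest=right, recvbuf=recv_from_left, source=left)
comm.Sendrecv(sendbuf=local[:1], dest=left, recvbuf=recv_from_right, source=right)

print(f"rank {rank}: got {recv_from_left[0]} from left, {recv_from_right[0]} from right")
```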
Focus on Vertical Market Expertise
Dell has hired HPC subject matter experts to create the Dell HPC Systems solutions portfolio. Dell
announced its HPC System for Research (the focus of this paper), HPC System for Life Sciences,
and HPC System for Manufacturing, and it is working on future solutions for other markets as well. We
plan to dig deeper into Dell’s vertical markets in future papers.
HPC Innovation Lab
Dell's HPC Innovation Lab is not just a simple
software "try before you buy" facility; it is a
focal point for Dell’s joint R&D activities with
partners and system integrators, as well as
coordination with customers. The lab is
housed in a 13,000 square-foot shared facility
containing over 1000 servers of different form
factors and generations. Whenever Dell investigates a new HPC technology, it brings it into this
lab to understand its impact on system behavior and performance. The lab concentrates on the
design, development, and integration of HPC systems, with emphasis on the software stack, plus
compute, interconnect, and storage performance analysis and performance tuning down to BIOS
settings.
[6] http://www.hpcadvisorycouncil.com/
[7] http://www.openhpc.community/
Figure 5 Dell's HPC Innovation Lab [Source: Dell]
The lab hosts Dell's Zenith HPC system. Zenith is designed on Intel's Scalable Systems Framework
(SSF) [8] and today contains 256 two-socket nodes using Intel Xeon E5-2697 v4 processors with
128 GB of memory per node, connected by a non-blocking Omni-Path Architecture (OPA) fabric,
plus 480 TB of Dell HPC NFS storage. Dell says the system is performing at 270 TFLOPS today,
which almost qualifies it for the June 2016 Top500 supercomputer list (number 500 is now at 286
TFLOPS). That is already impressive, and Dell plans to double the size of the system by the end of
the year.
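As a rough sanity check (our estimate, not a Dell figure), the arithmetic below derives a theoretical peak for that configuration, assuming the E5-2697 v4's 18 cores at its 2.3 GHz base clock and 16 double-precision FLOPs per core per cycle with AVX2 FMA; actual AVX clocks are lower, so this is only a ballpark upper bound.

```python
# Ballpark theoretical peak for the Zenith configuration described above.
nodes, sockets_per_node, cores_per_socket = 256, 2, 18
clock_hz, flops_per_cycle = 2.3e9, 16   # assumed base clock and DP FLOPs/cycle

peak_tflops = (nodes * sockets_per_node * cores_per_socket
               * clock_hz * flops_per_cycle / 1e12)
print(f"theoretical peak ~{peak_tflops:.0f} TFLOPS")            # ~339 TFLOPS
print(f"reported 270 TFLOPS is ~{270 / peak_tflops:.0%} of that peak")
```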
Dell uses Zenith to prototype and characterize the performance of advanced technologies in its
Innovation Lab, for general HPC use and specifically for target vertical markets, such as genomics [9]
and manufacturing [10]. Zenith is used for co-development with partners and for customer evaluation
of software scalability and performance. Zenith has been used to create proofs of concept blending
HPC and cloud technologies, HPC and big data analytics, OpenStack distributions for HPC, Dell's
Hadoop analytics framework distribution running on a Lustre file system, and many more. As a side
note, Dell is a gold member of and long-standing contributor to the OpenStack Foundation.
Dell's HPC Innovation Lab was involved with the design, development, and performance analysis
of Dell's newly announced PowerEdge C6320p server [11], based on the latest "Knights Landing"
(code-named "KNL", now the 7200 series) generation of Intel Xeon Phi processors. The big
difference between Intel Xeon Phi 7200 series processors and previous Xeon Phi generations is that
the 7200 series can act as a stand-alone processor, not just a coprocessor. There are four single-
socket Xeon Phi 7200 nodes in the 2U PowerEdge C6320p chassis. The chassis also integrates
Dell's Remote Access Controller 8 (iDRAC8) with the Xeon Phi 7200 to automate systems
management as if the new Xeon Phi were a mainstream Xeon processor. These systems will
appear in both TACC's Stampede 1.5 and Stampede 2 systems, described below. Dell says it will
target the Xeon Phi 7200 series at specific classes of applications, such as computational finance,
molecular dynamics, and weather simulation.
Dell’s HPC Innovation Lab works closely with Dell’s office of the CTO for forward looking
technology exploration, and has evaluated ARM processors, RDMA over Converged Ethernet
(RoCE), special purpose compute accelerators, new file systems, and other architectural concepts.
[8] http://www.intel.com/content/www/us/en/high-performance-computing/product-solutions.html
[9] https://www.dell.com/learn/us/en/555/hpcc/high-performance-computing-life-sciences
[10] http://www.dell.com/en-us/work/learn/assets/business~solutions~whitepapers~en/documents~digital-manufacturing-vrtx-tech-whitepaper.pdf
[11] http://www.dell.com/us/business/p/poweredge-c6320p/pd
"Nobody else buys a system for
the joy of putting it together and
tuning it; that's what we do."
-- Garima Kochhar, Dell Systems
Sr. Principal Engineer
Dell's focus for this lab is not far-future basic research, but rather the practical aspects of
commercializing leading-edge technology at scale.
Dell's HPC Innovation Lab also collaborates with Dell's Extreme Scale Infrastructure (ESI) group;
both labs are located on the same campus. One of ESI's eminently practical general R&D projects
is Dell's recently disclosed Triton water cooling pilot project. Triton is notable for cooling the
processors in a server rack using the inflow water supply pressure; it requires no pumps at the rack
level, and for small numbers of racks it can operate directly from the standard pressure of a
commercial municipal water supply. While the server sled portion of the cooling system is designed
to cool Intel Xeon E5 processors operating at 200 W each, it could easily be modified to cool
leading-edge 250 W to 300 W GPGPUs such as NVIDIA's Tesla P100 module [12]. Triton is an
example of technology developed in the ESI lab for extreme scale customers that is also applicable
to many HPC customers.
Dell’s HPC Innovation Lab also creates the direct experience base for Dell’s “HPC System
Builder,” Dell’s internal sizing tool for recommending properly sized systems, taking into account
configuring, provisioning, and running those systems. HPC System Builder is operational for
research and life sciences customers today, and Dell will add manufacturing to it soon.
Integration with In-Memory Analytic Solutions
A few years ago, in-memory analytics would not have been worth mentioning in a paper about
the HPC market. Today, however, many mid-sized organizations use in-memory analytics to
analyze HPC simulation runs and identify the features and patterns across simulations that
people need to pay attention to.

Dell has been active in this space; it ships the Dell In-Memory Appliance for Cloudera Enterprise
with Apache Spark, a Cloudera Apache Hadoop reference architecture, plus Statistica Big Data
Analytics from Dell.
[12] http://www.nextplatform.com/2016/04/21/nvidias-tesla-p100-steals-machine-learning-cpu/
Figure 6 Dell Triton server sled [Source: Dell]
Dell enables Cloudera's Hadoop MapReduce analytics to directly access HPC results in a Lustre
file system data store [13], using the Bright Cluster Manager (BCM) tool to deploy and configure the
hybrid cluster, plus Intel's Hadoop Adapter for Lustre (HAL) plug-in. The result is that large data
sets do not have to be moved from Lustre to the Hadoop Distributed File System (HDFS), which
consumes time, power, and bandwidth.
Dell is also collaborating closely with SAP to develop mid-market SAP HANA [14] in-memory
database and analytics solutions, including SAP HANA Edge and SAP Predictive Analytics. Dell
develops appliances for SAP HANA and then works with SAP to build platforms for vertical
markets, such as SAP's Foundation for Health [15].
Financing, Deployment and Support Services
Dell Financial Services (DFS) [16] can facilitate purchases across a wide range of customer sizes and
budgets. While not directly related to technology, it is important not to underestimate Dell's ability
to directly assist mid-market customers in financing data center build-out, including HPC hardware,
software, and services.

Dell offers customers the option for Dell to install, configure, and integrate [17] new Dell data center
equipment, remotely manage [18] and support that equipment through its lifecycle, and then remove
and retire the equipment as it reaches the end of its lifecycle.

Dell enables enterprise-class management using Dell's iDRAC, OpenManage, Active System
Manager, and Lifecycle Controller products.
Focus on End-Customer Enablement
At SC15 Dell announced three market initiatives: one to make HPC more widely available to
smaller companies and researchers, and the other two to dive into vertical market applications for
life sciences and manufacturing. TIRIAS Research will cover the vertical market applications in
subsequent papers; our focus here is on democratizing HPC in the general research market.
[13] http://i.dell.com/sites/doccontent/business/solutions/whitepapers/en/Documents/DellHPCStorageWithIntelEELustre.pdf
[14] https://hana.sap.com/abouthana.html
[15] https://help.sap.com/platform_health
[16] https://dfs.dell.com/Pages/DFSHomePage.aspx
[17] http://www.dell.com/learn/in/en/inbsd1/services/deployment-services?s=bsd
[18] https://www.dell.com/en-us/work/learn/assets/legal~service-descriptions~en/documents~remote-hpc-cluster-management-service-en.pdf
Dell HPC System for Research
One of Dell’s HPC System Builder configuration
models is designed to support general research
customers. Building on the description above, the HPC
System Builder tool provides guidance to rapidly and
accurately size a customer purchase for desired general
performance targets, from 4 to 1024 compute nodes per
system, and then optimize the operations of the
installation (including optimizing BIOS
configurations). Customers can tune their deployment
and operations for performance, efficiency, or a balance
of the two. Dell can scale performance based on High
Performance Linpack (HPL) sustained or theoretical
TFLOPS performance, or to a specific customer node
type and count requirement.
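The sketch below illustrates the kind of sizing arithmetic such a tool automates, working back from a target sustained HPL number to a node count; the per-node peak and HPL efficiency here are assumed values, not Dell's published sizing factors.

```python
# Hedged sizing sketch: estimate node count from a target sustained HPL number.
import math

target_sustained_tflops = 100.0   # what the customer wants to sustain on HPL
per_node_peak_tflops = 1.3        # assumed two-socket node theoretical peak
hpl_efficiency = 0.80             # assumed sustained/peak ratio for the cluster

nodes_needed = math.ceil(target_sustained_tflops /
                         (per_node_peak_tflops * hpl_efficiency))
print(f"estimated nodes required: {nodes_needed}")   # ~97 for this example
```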
Much of Dell’s emphasis for mid-sized research
customers is to deliver a multitenant HPC cluster with
balanced throughput. Applications on a single cluster
can span a wide range of research interests in modeling,
rendering and analysis, from complex simulations to
analyzing machine-generated data from sensor systems
and scientific instruments.
Dell’s HPC System for Research marquee customers include:
University of Cambridge: Wilkes [19]
The Wilkes cluster became operational in late 2013 and debuted at #2 on the Green500 list at
3,631 MFLOPS/watt. It contains 128 Dell PowerEdge T620 servers with 256 NVIDIA Tesla K20
GPUs interconnected by 256 Mellanox Connect-IB InfiniBand NICs, and it is attached to a 4 PB
custom Lustre file system. The cluster delivers 183 TFLOPS of CPU performance and 240 TFLOPS
of GPU performance. Wilkes is housed in a water-cooled data center implementing evaporative
coolers and back-of-rack water heat exchangers, yielding a spot PUE of 1.075.

[19] http://www.hpc.cam.ac.uk/services/wilkes

Figure 7 Dell HPC System for Research with PowerEdge R430 nodes [Source: Dell]

Figure 8 Wilkes [Source: University of Cambridge]
The University of Cambridge HPC Solution Centre [20] is a cloud-based resource available to
U.K.-based small and medium businesses (SMBs) to foster national competitiveness in a global
economy. Wilkes also supports the international Square Kilometre Array (SKA) [21] radio telescope
project with CHPC.
Centre for High Performance Computing (CHPC): Lengau (Cheetah) [22]
Lengau is a new HPC cluster built from 1,039 Dell PowerEdge C6320 servers connected by
Mellanox EDR InfiniBand NICs in 19 racks, with 5 PB of attached storage. Each C6320 chassis
contains four dual-socket Xeon E5 server nodes. This CPU-only cluster is ranked at 120 on the
Top500 list at 782 TFLOPS.
Lengau will support South African science, including SKA,
and it will also be available to private, non-academic users
to boost national economic competitiveness.
Indiana University (IU) Pervasive Technology Institute (PTI): Jetstream [23]
Jetstream is a geographically distributed half-
PFLOPS cloud based on the OpenStack cloud
framework and KVM hypervisor. It links IU’s cluster
to an identical cluster at TACC (below) and a small
test cluster at the University of Arizona. The IU and
TACC quarter-PFLOPS clusters are each built from
320 dual-socket Dell PowerEdge M630 blade server
nodes using Xeon E5-2600 v4 family processors.
Each node has 128 GB of memory and 2 TB of local storage, for a total of 40 TB of memory and
a 960 TB storage system per cluster. The combined system has recently become available for
production workloads.

[20] http://www.dell.com/learn/uk/en/ukbsdt1/hpcc/cambridge-hpc-solution-centre
[21] https://www.skatelescope.org/
[22] http://www.chpc.ac.za/index.php/news2/203-chpc-unveils-petascale-machine
[23] http://jetstream-cloud.org/partners.php

Figure 9 CHPC Lengau [Source: Dell]

Figure 10 Jetstream [Source: IU PTI]
Jetstream resources are scheduled through the U.S. National Science Foundation's (NSF)
XSEDE [24] program. Jetstream enables creating customized virtual machines and features the
ability to initiate interactive computing sessions on the cluster, essentially virtual Linux desktops
running in Jetstream's virtual machines, with screens delivered to smartphones and tablets across
cellular networks or to PCs on slow network connections.
Jetstream's anticipated user base includes historically black colleges and universities, minority-
serving institutions, tribal colleges, and higher education institutions in EPSCoR states [25]. A wide
variety of “long tail” applications are planned for Jetstream, including biology, earth science,
geographic information services (GIS), building network analytics tools, social sciences, and
others.
San Diego Supercomputer Center (SDSC): Comet [26]
Comet is a Dell-integrated 2 PFLOPS (peak) cluster using Dell
PowerEdge C6320 servers connected by Mellanox FDR
InfiniBand. Each C6320 chassis contains four dual-socket nodes
using Intel Xeon E5-2680 v3 processors, with 128 GB of
memory and 320 GB of solid-state drive (SSD) storage per node. There are
18 PowerEdge C6320 chassis in each of 27 racks. The Comet
cluster contains 247 TB total memory and 634 TB total SSD
capacity. SDSC’s Data Oasis parallel file storage system is
being upgraded to 7.6 PB of storage.
Comet is scheduled through XSEDE, and also supports NSF’s target of long tail modest-scale
users, with a focus on genomics, social sciences, and economics.
The cluster also contains 36 GPU nodes with two NVIDIA Tesla K80 cards each, and four
large-memory nodes, each containing 1.5 TB of memory. These nodes support specific
applications, such as visualizations, molecular dynamics simulations, and genome assembly.
Comet implements single root I/O virtualization (SR-IOV) and virtual LAN (VLAN) technologies,
which means that researchers can quickly carve out virtual sub-clusters that behave as stand-alone
hardware clusters – they can run their own OS and software stacks. The overall cluster is designed
so that many small node-count clusters can run simultaneously, which boosts overall cluster
utilization and efficiency, as well as availability to researchers.

[24] https://www.xsede.org/overview
[25] https://www.nsf.gov/od/oia/programs/epscor/nsf_oiia_epscor_eligible.jsp
[26] http://www.sdsc.edu/services/hpc/hpc_systems.html#comet

Figure 11 Comet [Source: SDSC]
Texas Advanced Computing Center (TACC)
TACC’s resources are also scheduled through XSEDE, and they also
support NSF’s target of long tail modest-scale users. TACC focuses on
natural and social sciences, engineering, technology, medicine, and many
other applications.
Stampede [27]
The Stampede cluster contains 6,400 Dell PowerEdge C8220 dual-
processor nodes, each using Intel Xeon E5-2680 processors with 32 GB
of memory, for 2.2 PFLOPS peak CPU performance. Those C8220 chassis also contain 6,880
previous-generation Intel Xeon Phi SE10P coprocessors, which contribute an additional 7.4 PFLOPS
of peak accelerator performance. Stampede ranks 12th on the June 2016 Top500 list at 5.2 PFLOPS
sustained performance.
There are also 128 NVIDIA GPU cards for remote visualization, plus 16 more Dell servers for
large data analysis, each containing 1 TB shared memory and two GPUs. The cluster is connected
by Mellanox FDR InfiniBand.
Stampede 1.5
This upgrade to the original Stampede system adds 500 Intel Xeon Phi 7250-based Dell nodes to
the existing cluster. The new nodes' NICs use an Omni-Path (OPA) bridge to InfiniBand. This is a
revised plan; it replaced an earlier upgrade plan that called for adding Xeon Phi 7200 series add-in
cards to existing server chassis. TACC is currently evaluating OPA and the first wave of
pre-production Dell Xeon Phi 7250-based systems; the evaluation system already ranks 116 on the
June 2016 Top500 list.

[27] https://www.tacc.utexas.edu/systems/stampede

Figure 12 Inside Stampede [Source: TIRIAS Research]

Figure 13 Front of Stampede [Source: TACC]
Stampede 2 [28]
This recently announced cluster will deploy in phases during 2017 and 2018 and deliver a peak
performance of up to 18 PFLOPS. Stampede 2 will implement future Dell servers using a mix of
Xeon CPUs and Xeon Phi 7200 series processors connected by OPA. The final phase of the project
will be among the first wave of systems to use Intel’s 3D XPoint non-volatile main memory
technology.
Conclusion
Dell’s strengths in traditional enterprise IT and cloud computing markets directly apply to the
modern HPC market. This was not the case several years ago, but now cloud customers are
deploying increasingly sophisticated and intelligent services at scale. These services are pushing
the state of the art in processor, accelerator (including GPUs and many other types), storage, and
networking technologies. As cloud services push technology, the HPC market benefits via lower
costs and better power efficiency.
TIRIAS Research predicts that more types of simulations and more complex simulations will
waterfall into more affordable HPC deployments over the next few decades.
Dell's investments in HPC innovation and Dell's deep relationships with HPC research centers
serving smaller customers put Dell in a good position to benefit from this waterfall effect by
understanding commercial and mid-market customers' business needs and serving them with
scalable and affordable HPC resources.
[28] https://www.tacc.utexas.edu/-/stampede-2-drives-the-frontiers-of-science-and-engineering-forward
Copyright TIRIAS Research LLC 2016. All rights reserved.
Reproduction in whole or in part is prohibited without written permission from TIRIAS Research LLC.
This report is the property of TIRIAS Research LLC and is made available only upon these terms and
conditions. The contents of this report represent the interpretation and analysis of statistics and information that
is either generally available to the public or released by responsible agencies or individuals. The information
contained in this report is believed to be reliable but is not guaranteed as to its accuracy or completeness.
TIRIAS Research LLC reserves all rights herein. Reproduction or disclosure in whole or in part is permitted
only with the written and express consent of TIRIAS Research LLC.