numascale product ibm
DESCRIPTION
Presentation from the HPC event at IBM Denmark - September 2013, Copenhagen
TRANSCRIPT
Copyright 2013. All rights reserved.
NumaConnect
Einar Rustad, Co-Founder & CTO
September 2013
NumaConnect
• Cache Coherent Global Shared Memory and Shared IO
• Single Image Standard Operating System
• All kinds of APIs
Commodity Servers Tightly Coupled into One Monolithic System with NumaConnect - at Cluster Price
Shared Everything - One Single Operating System Image
NumaConnect - Share Everything
[Diagram: four server nodes, each with CPUs, caches, memory, and I/O; each node attaches a NumaChip with a remote cache, and the NumaChips link directly into the NumaConnect fabric (no switch required)]
Technology Background
• Convex Exemplar (acquired by HP)
– First implementation of the ccNUMA architecture from Dolphin in 1994
• Data General Aviion (acquired by EMC)
– Designed in 1996 with deliveries from 1997 - 2002
– Used Dolphin's chips with 3 generations of processor/memory buses
• I/O attached products for clustering OEMs
– Sun Microsystems (SunCluster)
– Siemens RM600 Server (I/O expansion)
– Siemens Medical (3D CT)
– Philips Medical (3D Ultra Sound)
– Dassault/Thales Rafale
• HPC clusters (WulfKit w. Scali)
– First low-latency cluster interconnect
[Diagram: Dolphin's low-latency clustering HW; Dolphin's cache chip]
NumaChip
• IBM Microelectronics ASIC
• FCPBGA1023, 33mm x 33mm, 1mm ball pitch, 4-2-4 package
• IBM cu-11 technology
• ~2 million gates
• Chip size 9x11mm
NumaConnect Card
NumaChip Top Block Diagram
[Block diagram: H2S with ccHT cave and SM, SDRAM cache and SDRAM tags, SPI init module, CSRs, LC config data and microcode blocks, SCC, and a crossbar switch with LCs and SERDES driving links XA, XB, YA, YB, ZA, ZB]
NumaConnect™ Node Configuration
[Diagram: multi-core CPUs with memory banks and I/O, connected over Coherent HyperTransport to the NumaChip with its NumaCache (cache + tags) and I/O bridge; 6 x4 SERDES links into a 2-D torus]
NumaConnect™ System Architecture
[Diagram: multi-CPU nodes, each with multi-core CPUs, memory banks, an I/O bridge, and a NumaChip with NumaCache, connected in 2-D and 3-D torus topologies]
6 external links - flexible system configurations in multi-dimensional topologies
Cabling Example
2-D Dataflow
[Diagram: request and response paths between CPUs, caches, memory, and NumaChips across the fabric]
Size does Matter
• "640K ought to be enough for anybody" - Bill Gates, 1981
• "We are looking for systems that can hold 10 - 20 TeraBytes in main memory" - Trond J. Suul, Statoil, 2010
Scale-up Capacity
• Single System Image or Multiple Partitions
• Limits
– 256 TeraBytes physical address space
– 4096 nodes
– 196,608 cores
• Largest and most cost-effective coherent shared memory
• NO virtualization SW layer!
Performance
LMbench - Chart
[Chart: Latency - LMbench, nanoseconds (log scale, 100 - 10,000) vs array size (0.00049 - 4096 MBytes); measured latencies include 1.251, 6.421, 16.732, 147.125, 287.01, 308.61, 304.582, 628.583, and 1,087.576 ns at increasing array sizes]
The Memory Hierarchy
Access Times in the Memory Hierarchy:

Level            Access time (ns)   Typical size
L1 cache         1.3                100 KB
L2 cache         6.4                1 MB
L3 cache         16.7               8 MB
Local memory     90                 240 GB
NumaCache        308                4 GB
Remote memory    1,087              < 256 TB
SSD              100,000            n x 1 TB
Rotating disk    ~7,500,000         n x 5 TB
Memory Bandwidth
-------------------------------------------------------------
STREAM version $Revision: 5.9 $
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 180000000000, Offset = 0
Total memory required = 4119873.0 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 828
-------------------------------------------------------------
Function      Rate (MB/s)      Avg time   Min time   Max time
Copy:        1599317.5028       1.9224     1.8008     2.1393
Scale:       1468219.1643       2.0954     1.9616     2.2290
Add:         1664455.1221       2.8375     2.5954     3.0947
Triad:       1492414.0721       3.0478     2.8946     3.3267

University of Oslo:
• 72 nodes - IBM x3755
• 1,728 cores
• 4.6 TBytes shared memory
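The Triad figure above measures the bandwidth of a simple fused multiply-add loop. A minimal serial sketch in C (the actual STREAM benchmark times an OpenMP-parallel version of this kernel over the huge arrays listed above):

```c
#include <stddef.h>

/* STREAM "triad" kernel: a[i] = b[i] + scalar * c[i].
   Each iteration moves three 8-byte doubles, so the reported
   bandwidth is 3 * 8 * n bytes divided by the elapsed time. */
void triad(double *a, const double *b, const double *c,
           double scalar, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + scalar * c[i];
}
```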
Memory Allocation and Initialization
[Chart: time to allocate and initialize a 15 GB array in parallel vs number of processes (1, 8, 16, 104); ScaleMP stays roughly flat at 423.4 - 450.0 seconds, while Numascale drops from 211.1 seconds at 1 process to 0.96 seconds at 104 processes; the ScaleMP/Numascale ratio reaches 9x, 112x, and 222x]
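The scaling shown above follows from Linux's first-touch page placement: a page lands on the NUMA node of the thread that first writes it, so initializing in parallel both splits the work and spreads the pages across nodes. A minimal illustrative sketch (not Numascale's benchmark code); without OpenMP enabled the pragma is ignored and the loop simply runs serially:

```c
#include <stdlib.h>

/* Allocate and zero an array in parallel. With a static schedule,
   each thread first-touches (and thereby places) one contiguous
   chunk of pages on its own NUMA node. */
double *alloc_and_init(size_t n) {
    double *a = malloc(n * sizeof(double));
    if (!a) return NULL;
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++)
        a[i] = 0.0;
    return a;
}
```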
MPI Latency (Pallas)
[Chart: NumaConnect MPI latency in microseconds vs message size, 0 bytes - 1 KB; latencies fall in the 0 - 7 µs range]
MPI Barrier
MPI_Barrier on 32 nodes, 2 - 32 processes (microseconds):

Processes          2      4       8       16      32
NumaConnect-BTL    1.78   4.06    7.43    8.52    10.72
Standard-SM-BTL    8.09   52.67   102.66  155.84  213.38
Improvement ratio  4.5    13.0    13.8    18.3    19.9
MPI Barrier (Pallas)
MPI_Barrier, 2 - 256 processes (microseconds):

Processes          2     4     8     16     32     64     128    256
NumaConnect-BTL    1.3   3.6   6.2   10.2   14.3   14.3   22.1   48.5
Standard-SM-BTL    0.9   2.7   7.2   132.5  330.7  330.7  510.9  969.3
Improvement ratio  0.7   0.7   1.2   13.1   23.1   23.1   23.1   20.0
Scaling Applications with NumaConnect
Atle Vesterkjær, Numascale
September 2013
RWTH TrajSearch code
• OpenMP programs can be made numa-aware by decomposing memory and making sure that all memory is "local" to the CPU running the process that is using the memory. Affinity settings can be used to distribute jobs in a numa-aware way. The graph shows that the application achieves the highest speedup on NumaConnect, even though it was originally adapted to ScaleMP.
[Chart: TrajSearch runtime in seconds and speedup vs number of threads (8 - 512); series: SGI Altix UV (Nehalem EX), ScaleMP (Nehalem EX), Bull BCS (Nehalem EX), ScaleMP (SandyBridge EP), and Numascale, each with its corresponding speedup curve]
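The numa-aware pattern described above can be sketched in OpenMP: initialize data with the same static schedule used by the compute loop, so first-touch places each chunk on the node whose thread will use it, and pin threads (e.g. with the standard OMP_PROC_BIND/OMP_PLACES controls) so that placement is preserved. An illustrative sketch, not the TrajSearch code:

```c
#include <stddef.h>

/* Run with, e.g.:  OMP_PROC_BIND=close OMP_PLACES=cores ./a.out
   so threads stay pinned and first-touch placement holds. */

/* Initialization and compute use the same static schedule, so the
   thread that first touches (and places) a chunk is the one that
   later reads it: all accesses stay node-local. */
void init(double *x, size_t n) {
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++)
        x[i] = (double)i;
}

double sum(const double *x, size_t n) {
    double s = 0.0;
    #pragma omp parallel for schedule(static) reduction(+:s)
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}
```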
Stream on 16 Nodes
[Chart: Streams scaling (16 nodes - 32 sockets - 192 cores), MBytes/sec vs number of CPU sockets (1 - 32), up to ~200,000 MB/s; series: Triad - NumaConnect, Triad - 1041A-T2F, Triad - Linear socket, Triad - Linear node]
Applications that are scalable by design achieve great results on a NumaConnect Shared Memory System.
Porting MPI applications to NC-OpenMP
• OpenMP programs are faster and have less overhead than MPI programs.
• Both numa-aware programs and MPI programs care about memory locality. It is therefore often possible to take an MPI program and convert it to a numa-aware OpenMP program that will have shorter runtime.
• To demonstrate this, the NAS Parallel Benchmark SP has been used.
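Such a conversion typically keeps the MPI code's block decomposition but replaces ranks with threads, so explicit message passing becomes plain loads and stores. An illustrative stencil sketch (not the NPB-SP code):

```c
#include <stddef.h>

/* In an MPI version, each rank owns a block of rows and exchanges
   halo rows with its neighbors via MPI_Send/MPI_Recv. In the
   shared-memory OpenMP version the same decomposition is kept,
   but a neighbor's boundary values are simply read directly;
   the implicit barrier at the end of the loop replaces the
   halo exchange between sweeps. */
void sweep(double *next, const double *cur, size_t n) {
    #pragma omp parallel for schedule(static)
    for (size_t i = 1; i + 1 < n; i++)
        /* cur[i-1]/cur[i+1] may belong to a neighboring thread's
           chunk: what used to be halo data in the MPI version. */
        next[i] = 0.5 * (cur[i - 1] + cur[i + 1]);
}
```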
NPB SP D-class Mop/s total
[Chart: NPB-SP NC-OPENMP, CLASS=D, 256 cores, 8 NumaConnect nodes; Mop/s total vs number of threads: 16 → 20,700.87; 36 → 35,600.82; 64 → 54,029.36; 121 → 62,988.6]
• The SP benchmark exists in both an OpenMP version written for standalone servers and an MPI version written for distributed servers.
• As the computational job is the same, we find the code written for distributed servers (MPI) easier to convert to a NumaConnect-optimized version, since we also need to consider memory locality.
• The graph shows the scalability of the SP benchmark code when converted to a NumaConnect-optimized version.
NPB SP D-class runtime
[Chart: NPB-SP NC-OPENMP D-CLASS, 8 NumaConnect nodes; runtime (sec) vs number of threads: 16 → 1,426.8; 36 → 829.64; 64 → 546.67; 121 → 468.91]
• The overhead introduced by MPI is not needed when we are running on a Shared Memory System.
NPB SP D-class runtime affinity effect
[Left chart: NPB-SP NC-OPENMP D-CLASS runtime (sec) with threads evenly distributed on all NumaConnect nodes: 16 → 1,426.8; 36 → 829.64; 64 → 546.67; 121 → 468.91]
[Right chart: runtime (sec) bound to only the first <n> cores: 16 → 5,701.24; 36 → 2,496.88; 64 → 1,512.98; 121 → 967.72]
• The figure on the left illustrates the great scaling we get from NPB-SP on the NumaConnect system.
• The figure on the right illustrates that using affinity to distribute the job evenly between the different NumaConnect nodes (and cores in the system) leads to shorter runtime.
• These effects will be analyzed more in the next slides.
NPB SP D-class runtime
• The runtime using 121 cores is cut in half when running on 8 NumaConnect nodes, compared to 4 NumaConnect nodes.
• The number of cores per FPU is one when running on 8 NumaConnect nodes and two when running on 4 NumaConnect nodes.
• The AMD Opteron™ 6300 series processor is organized as a Multi-Chip Module (two CPU dies in the package). Each die has 8 cores, but 4 FPUs.
[Chart: NPB-SP NC-OPENMP D-CLASS, 8 NumaConnect nodes; runtime (sec) vs affinity with 121 cores: 0-127 → 998.86; 0-127:2 → 475.28]
NPB SP D-class runtime
• The runtime using 64 cores is cut in half when running on 8 NumaConnect nodes, compared to 4 NumaConnect nodes.
• In this test the number of FPUs is constant.
• The scaling is good when using more nodes.
• The AMD Opteron™ 6300 series memory controller interface allows external connection of up to two memory channels per die. This memory interface is shared among all the CPU core pairs on each die.
[Chart: runtime (sec) decreasing when moving from 4 to 8 NumaConnect nodes using affinity settings, SP CLASS D: 0-127:2 → 800.69; 0-255:4 → 548.13]
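The affinity strings above use start-end:stride notation (as in GNU OpenMP's GOMP_CPU_AFFINITY): 0-255:4 binds threads to every fourth core, spreading them across dies so each thread gets its own FPU and a larger share of the memory channels. A small illustrative helper (hypothetical, not part of any Numascale tool) that expands such a triple into an explicit core list:

```c
#include <stddef.h>

/* Expand an affinity triple of the form start-end:stride
   (e.g. "0-255:4") into an explicit list of core numbers.
   Returns how many cores were selected. */
size_t expand_affinity(int start, int end, int stride,
                       int *cores, size_t max) {
    size_t n = 0;
    for (int c = start; c <= end && n < max; c += stride)
        cores[n++] = c;
    return n;
}
```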
NPB SP D-class runtime MPI vs OpenMP
[Chart: NPB-SP NC-OPENMP D-CLASS runtime (sec), threads evenly distributed on all NumaConnect nodes, OpenMP vs MPI, 16 - 121 processes; the two series essentially overlap: 1,426.8; 829.64; 546.67; 468.91]
• The runtime when using NPB SP D-class is the same for the MPI version as for NC-OpenMP when the number of cores is 121 or less.
• The MPI library overhead in the communication part of the runtime is not significant at these small sizes.
NPB SP D-class runtime MPI vs OpenMP
[Chart: NPB-SP NC-OPENMP D-CLASS runtime (sec) bound to only the first <n> cores, OpenMP vs MPI, 144 - 256 processes; OpenMP: 599.04, 554.57, 478.14, 415.93, 534.34; MPI values include 701.58, 953.39, 729.24, 708.68]
• The runtime when using NPB SP D-class is higher for MPI compared to the NC-OpenMP version when the number of cores is 144 or higher.
• The MPI library overhead in the communication part of the runtime now plays a larger role.
NPB SP E-class runtime
• The SP benchmark E-class scales perfectly from 64 processes (using affinity 0-255:4) to 121 processes (using affinity 0-241:2).
• General statement: larger problems scale better.
• E-class problems are more representative for large shared memory systems and clusters.
[Chart: NPB-SP NC-OPENMP E-CLASS runtime (sec), 8 NumaConnect nodes: 64 processes → 15,242.11; 121 processes → 7,246.13]
NPB SP E-class runtime
• The SP benchmark E-class scales well from 16 to 121 processes on a 32-node system with a total of 384 cores.
• NB: the test with 384 cores was run on an old system and is presented to show scaling, not absolute values.
[Chart: NPB-SP NC-OPENMP E-CLASS, 32 NumaConnect nodes, "time in seconds" vs affinity: 0-383:24 → 53,922.05; 0-383:10 → 32,481.25; 0-383:6 → 19,740.37; 0-363:3 → 11,247.03]
Big Data with Shared Memory
"Any application requiring a large memory footprint can benefit from a shared memory computing environment."
– William W. Thigpen, Chief, Engineering Branch, NASA Advanced Supercomputing (NAS) Division