Fermi Cluster for Real-Time Hyperspectral Scene Generation
Gary McMillian, Ph.D. Crossfield Technology LLC
9390 Research Blvd, Suite I200 Austin, TX 78759-7366
(512)795-0220 x151 [email protected]
AF SBIR Program, Donald Snyder, III Program Manager
Funding provided by Frank Carlen, Multi-Spectral Test
System Architecture & Approach
• Scenes generated by heterogeneous processors, then transported over InfiniBand to the projector(s) using RDMA protocol for high throughput and low latency
• Network interfaces aggregate data from multiple heterogeneous processors in high-speed frame buffers
• Contents of frame buffers output to projector through FPGA Mezzanine Card (FMC) interface
• IEEE 1588 Precision Time Protocol (PTP) provides global time synchronization
• Heterogeneous processors and projector network interfaces scale independently
7/20/11 Crossfield Technology LLC 2
Scalable System Architecture
[Figure: scalable system architecture block diagram. Labels: Processor Nodes (Processor CPU/GPU), InfiniBand Switch, Network Interface Adapters (Network Interface), Projector, HWIL. Link labels: DVI, LVDS, Fiber.]
HWIL Simulation System
[Figure: HWIL simulation system block diagram. Blocks: 1U-4U Heterogeneous Processor; InfiniBand Switch (36-648 ports); 1U Crossfield Network Interface; Projector / HWIL; IEEE 1588 PTP Server + Ethernet. Component labels: CPU, GPU, DDR3 SDRAM, GDDR5 SDRAM, FPGA, PHY, SSD, PCIe Bridge, Network Adapter. Link labels: QPI, PCIe x8, FMC, User-definable Frame Synch/Request.]
Approximate link bandwidths:
• QuickPath Interconnect (QPI): ~100 Gbps
• PCI Express x8: ~32 Gbps (x16: ~64 Gbps)
• DDR3 SDRAM: ~85 Gbps/ch x 3 ch
• GDDR5 SDRAM: ~192 Gbps/ch x 6 ch
• QDR InfiniBand: ~32 Gbps
• VITA 57.1 / FMC: ~100 Gbps SERDES + 120 Gbps LVDS I/O
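As a rough consistency check on the approximate link bandwidths quoted for the HWIL simulation system, the per-channel rates can be multiplied out and the slowest hop on a frame's route identified. The sketch below is illustrative only: the route in `FRAME_PATH` and all names are assumptions made for this example, and the rates are the approximate slide values.

```python
# Approximate link rates from the slide, in Gbps. Multi-channel memories
# are multiplied out to an aggregate figure.
LINKS_GBPS = {
    "GDDR5 SDRAM": 192 * 6,    # ~192 Gbps/ch x 6 ch
    "DDR3 SDRAM": 85 * 3,      # ~85 Gbps/ch x 3 ch
    "QPI": 100,
    "PCIe x16": 64,
    "PCIe x8": 32,
    "QDR InfiniBand": 32,
    "FMC SERDES": 100,
}

# ASSUMPTION: an illustrative route for one frame, from GPU memory to the
# projector-side FMC. The slides do not spell out this exact route.
FRAME_PATH = ["GDDR5 SDRAM", "PCIe x16", "DDR3 SDRAM", "PCIe x8",
              "QDR InfiniBand", "FMC SERDES"]


def gbps_to_gbytes_per_s(gbps):
    """Convert gigabits per second to gigabytes per second."""
    return gbps / 8


def bottleneck(path, links=LINKS_GBPS):
    """Return (name, rate) of the slowest link on the path (first on ties)."""
    name = min(path, key=lambda link: links[link])
    return name, links[name]
```

With these figures the aggregate GDDR5 rate is 1152 Gbps (144 GB/s), in line with the C2070 memory bandwidth quoted in the Tesla comparison, and the narrowest hops on the assumed route are the ~32 Gbps PCIe x8 and QDR InfiniBand links.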
REAL-TIME HIGH PERFORMANCE COMPUTER (HPC)
Real-Time HPC Requirements
• Deterministic & Synchronous
– Synthesized images complete & ready at HWIL frame rate
• High Floating-Point Performance
– Implement physics-based algorithms
• High Bandwidth
– Inter-processor communications for data exchange
– Stream high-resolution images to projector at high frame rates
• High Memory Capacity & Performance
– Processor memory – code, model parameters, data
– Non-volatile storage – code, model parameters, data, logging
Intel Xeon Processor Roadmap
• Sandy Bridge Microarchitecture
– 32 nm process, 4-8 Cores
– 40 lanes PCI Express Gen 3.0
– 4 channels DDR3-1600
• Westmere Microarchitecture
– 32 nm process, 6 Cores
– 40 lanes PCI Express Gen 2.0
– 3 channels DDR3-1333
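The memory channel counts above translate into peak per-socket bandwidth once the DDR3 transfer rate and the 64-bit channel width are multiplied out. The following is a minimal sketch of that arithmetic; the function name is hypothetical and the result is a theoretical peak, not a measured figure.

```python
def ddr3_peak_gb_per_s(mega_transfers_per_s, channels, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s.

    DDR3-1333 means 1333 mega-transfers/s; each transfer moves 8 bytes
    on a standard 64-bit channel.
    """
    return mega_transfers_per_s * bytes_per_transfer * channels / 1000


westmere = ddr3_peak_gb_per_s(1333, channels=3)      # about 32 GB/s per socket
sandy_bridge = ddr3_peak_gb_per_s(1600, channels=4)  # 51.2 GB/s per socket
```

Two Westmere sockets therefore give roughly 64 GB/s per node, which agrees with the node-level main memory bandwidth listed under HPC Performance.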
Nvidia CUDA GPU Roadmap
21 SEP 2010
• Kepler – To be released sometime in 2011, 28 nm process. Estimated performance of 4-6 DP GFLOPS/W
• Maxwell – To be released sometime in 2013, 22 nm process. Estimated performance of 15-16 DP GFLOPS/W
Nvidia Tesla (Fermi Architecture)
• CUDA™ Programming Environment
– C/C++, Fortran, OpenCL, Java, Python or DirectX Compute
• GIGATHREAD™ Engine
– 515 GFLOP Double Precision
– 1030 GFLOP Single Precision
• PARALLEL DATACACHE™ Technology
– 3 - 6 GB GDDR5 memory
– 384-bit bus
– ECC option
• GPUDirect™ with InfiniBand
• PCI Express 2.0 (16 lanes)
– Two DMA engines for bi-directional data transfer
[Figure labels: C2050/C2070, M2050/M2070]
Nvidia Tesla Comparison
| | Tesla C2070 | Tesla M2070 | Tesla M2090 |
| --- | --- | --- | --- |
| Peak double precision floating point performance | 515 GFLOPS | 515 GFLOPS | 665 GFLOPS |
| Peak single precision floating point performance | 1030 GFLOPS | 1030 GFLOPS | 1331 GFLOPS |
| CUDA cores | 448 | 448 | 512 |
| Memory size (GDDR5) | 6 GB | 6 GB | 6 GB |
| Memory bandwidth (ECC off) | 144 GB/s | 150 GB/s | 177 GB/s |
| Thermal Design Power (TDP) | 247 W | 225 W | 250 W |
| Retail price | $2300 | ~$2300 | ~$3500 |
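One way to read the comparison is in terms of double-precision throughput per watt and per dollar. The sketch below simply divides the peak and TDP figures quoted above; the dictionary layout and function names are hypothetical, and peak GFLOPS over TDP is only a coarse proxy for delivered efficiency.

```python
# Figures copied from the comparison above (prices are approximate).
TESLA = {
    "C2070": {"dp_gflops": 515, "tdp_w": 247, "price_usd": 2300},
    "M2070": {"dp_gflops": 515, "tdp_w": 225, "price_usd": 2300},
    "M2090": {"dp_gflops": 665, "tdp_w": 250, "price_usd": 3500},
}


def dp_gflops_per_watt(card):
    """Peak double-precision GFLOPS divided by TDP."""
    return card["dp_gflops"] / card["tdp_w"]


def dp_gflops_per_dollar(card):
    """Peak double-precision GFLOPS divided by the quoted retail price."""
    return card["dp_gflops"] / card["price_usd"]


def rank_by(metric, cards=TESLA):
    """Card names sorted from best to worst under the given metric."""
    return sorted(cards, key=lambda name: metric(cards[name]), reverse=True)
```

On these numbers the M2090 leads on GFLOPS per watt (2.66 DP GFLOPS/W), while the C2070 and M2070 are ahead on GFLOPS per dollar.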
InfiniBand Roadmap
SDR - Single Data Rate
DDR - Double Data Rate
QDR - Quad Data Rate
FDR - Fourteen Data Rate
EDR - Enhanced Data Rate
HDR - High Data Rate
NDR - Next Data Rate
Mellanox ConnectX-2 Network Adapters
• Nvidia GPUDirect™
– InfiniBand Adapter and Nvidia GPU share CPU memory region
• OpenFabrics Enterprise Distribution (OFED) Software
• Bandwidth
– 10G Ethernet
– 10/20/40G InfiniBand
– PCIe 2.0 (8-lanes)
• Performance
– 1 µs Ping latency
– 50M MPI messages/s
• Protocol Support
– Remote Direct Memory Access (RDMA)
– OpenMPI, OSU MVAPICH, HPMPI, Intel MPI, MS MPI, Scali MPI
– TCP/UDP, IPoIB, SDP, RDS
– SRP, iSER, NFS RDMA, FCoIB, FCoE
Mellanox IS5200 InfiniBand Switch
• Non-blocking, full bisectional bandwidth
• 100-300 ns latency
• Up to 216 QSFP ports
– 17.28 Tb/s aggregate throughput
• 9U cabinet
– 6 spine modules
– 12 leaf modules
• 1 kW
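The 17.28 Tb/s aggregate figure is what one obtains by counting each of the 216 ports at 40 Gb/s in both directions. That interpretation is an assumption rather than something stated on the slide; a minimal sketch of the arithmetic, with a hypothetical function name:

```python
def aggregate_throughput_tbps(ports, port_rate_gbps, bidirectional=True):
    """Aggregate switch throughput in Tb/s.

    ASSUMPTION: every port runs at the same line rate, and the aggregate
    counts both directions of each full-duplex port.
    """
    directions = 2 if bidirectional else 1
    return ports * port_rate_gbps * directions / 1000


is5200 = aggregate_throughput_tbps(216, 40)  # 216 x 40 Gb/s x 2 directions
```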
Remote Direct Memory Access (RDMA)
• Remote Direct Memory Access enables data to be transferred from one processor’s memory to another processor’s memory across a network, without significantly involving either operating system
• RDMA supports zero-copy data transfers by enabling the network adapter to transfer data directly to or from application memory, eliminating the need to copy data between application memory and data buffers in the operating system kernel
• RDMA defines READ, WRITE and SEND/RECEIVE operations
• RDMA adapters support thousands of concurrent transactions using work queues
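The work-queue idea can be illustrated with a toy, in-process model. This is not the verbs API and involves no network: the `Node` and `QueuePair` classes are hypothetical stand-ins, and `process()` plays the role of the adapter, moving bytes straight between two application buffers with no intermediate copy.

```python
from collections import deque


class Node:
    """Toy host with one 'registered' application buffer."""

    def __init__(self, name, size):
        self.name = name
        self.memory = bytearray(size)


class QueuePair:
    """Toy work queue plus completion queue between two nodes."""

    def __init__(self, local, remote):
        self.local = local
        self.remote = remote
        self.work_queue = deque()
        self.completions = deque()

    def post_write(self, local_off, remote_off, length):
        """Queue a one-sided WRITE: local buffer -> remote buffer."""
        self.work_queue.append(("WRITE", local_off, remote_off, length))

    def post_read(self, local_off, remote_off, length):
        """Queue a one-sided READ: remote buffer -> local buffer."""
        self.work_queue.append(("READ", local_off, remote_off, length))

    def process(self):
        """Drain queued work requests, as the adapter would."""
        while self.work_queue:
            op, lo, ro, n = self.work_queue.popleft()
            if op == "WRITE":
                self.remote.memory[ro:ro + n] = self.local.memory[lo:lo + n]
            else:
                self.local.memory[lo:lo + n] = self.remote.memory[ro:ro + n]
            self.completions.append(op)


processor = Node("processor", 16)
interface = Node("network_interface", 16)
processor.memory[0:5] = b"frame"

qp = QueuePair(processor, interface)
qp.post_write(local_off=0, remote_off=4, length=5)
qp.process()
```

The application only posts requests and later checks completions; the data movement itself happens without a per-transfer system call, which is the property the bullets above describe.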
[Figure: OpenFabrics Alliance (OFA) open-source software stack. Layers shown: Hardware (InfiniBand HCA, iWARP R-NIC); Provider (hardware-specific drivers); Mid-Layer (OpenFabrics kernel-level verbs / API, SA Client, MAD, SMA, Connection Manager, Connection Manager Abstraction (CMA)); Upper Layer Protocol (IPoIB, SDP, SRP, iSER, RDS, NFS-RDMA RPC, Cluster File Sys); User APIs (OpenFabrics user-level verbs / API, UDAPL, SDP Lib, User Level MAD API); Application Level (Open SM, Diag Tools). Apps and access methods for using the OF stack: IP Based App Access, Sockets Based Access, Various MPIs, Block Storage Access, Clustered DB Access, Access to File Systems. The diagram separates Kernel Space from User Space and marks kernel-bypass paths. Key: Common, InfiniBand, iWARP.]
SA Subnet Administrator
MAD Management Datagram
SMA Subnet Manager Agent
PMA Performance Manager Agent
IPoIB IP over InfiniBand
SDP Sockets Direct Protocol
SRP SCSI RDMA Protocol (Initiator)
iSER iSCSI RDMA Protocol (Initiator)
RDS Reliable Datagram Service
UDAPL User Direct Access Programming Library
HCA Host Channel Adapter
R-NIC RDMA NIC
GPU Server Options
• 1U server
– Dual Xeon 5600 processors & 5520 chipsets
– Three 16-lane + one 8-lane PCIe slots
– Supports 1-3 M2090 + 1-2 IB HCA
• 2U server
– Dual Xeon 5600 processors & 5520 chipsets
– Four 16-lane + two 8-lane PCIe slots (PLX 8647 switch)
– Supports 1-4 M2090 + 1-2 IB HCA
• 4U server
– Dual Xeon 5600 processors & 5520 chipsets
– Eight 16-lane PCIe slots (4 PLX 8647 switches)
– Supports 4-7 C2070 + 1-4 IB HCA
HPC System Configuration
• 4U Servers (64 + 1)
– Dual 6-core, 2.66 GHz Intel Xeon 5650 (Westmere) CPUs
– Dual Intel 5520 (Tylersburg-36D) IOH with 6.4 GT/s QPI
• Four 16-lane PCI Express Gen 2 slots
– Six 8 GB DDR3-1333 DIMMs (48 GB)
– Four Nvidia Tesla C2070 (Fermi) GPUs
– One Mellanox 40G InfiniBand Host Channel Adapter
– One 300 GB, 10K RPM disk drive
• Mellanox 40G InfiniBand Switch (216 ports max)
• Symmetricom IEEE 1588 PTP Master Clock
• APC Smart-UPS RT 6000VA (18) – 76 kW
• 42U Racks (9)
*65 nodes x 1.4 kW/node = 91 kW
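The footnote's load estimate can be set against the quoted UPS capacity with a few lines of arithmetic. This is only a sketch of that comparison: the function names are hypothetical, and the 1.4 kW per node and 76 kW UPS figures are taken directly from the slide.

```python
def cluster_load_kw(nodes, kw_per_node):
    """Total estimated load for a number of identical nodes."""
    return round(nodes * kw_per_node, 3)


def ups_headroom_kw(ups_capacity_kw, load_kw):
    """Positive when the UPS covers the load, negative on a shortfall."""
    return round(ups_capacity_kw - load_kw, 3)


load = cluster_load_kw(65, 1.4)        # 64 + 1 servers at 1.4 kW each
headroom = ups_headroom_kw(76, load)   # against the 76 kW UPS bank
```

On these figures the 91 kW estimated load exceeds the 76 kW UPS bank by 15 kW.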
Advanced HPC System Configuration
• 2U Servers (64 + 1)
– Dual 6-core, 2.66 GHz Intel Xeon 5650 (Westmere) CPUs
– Dual Intel 5520 (Tylersburg-36D) IOH with 6.4 GT/s QPI
• Four 16-lane + two 8-lane PCI Express Gen 2 slots (with switch)
– Six 8 GB DDR3-1333 DIMMs (48 GB)
– Three Nvidia Tesla M2090 (Fermi) GPUs
– Two Mellanox 40G InfiniBand Host Channel Adapters
– One 250 GB SSD (solid state disk)
• Mellanox 40G InfiniBand Switch (216 ports max)
• Symmetricom IEEE 1588 PTP Master Clock
• APC Symmetra PX SY100K100F UPS - 100 kW
• 42U Racks (4+1)
Future HPC System Configuration
• 2U Servers (64 + 1)
– Dual 8-core, 2.3 GHz Intel Xeon E5-2600 (Sandy Bridge) CPUs
• Four 16-lane + two 8-lane PCI Express Gen 3 slots (with switch)
– Eight 8 GB DDR3-1600 DIMMs (64 GB)
– Three Nvidia Tesla M2090 (Fermi) GPUs
– Two Mellanox 56G InfiniBand Host Channel Adapters
– One 250 GB SSD (solid state disk)
• Mellanox 56G InfiniBand Switch (648 ports max)
• Symmetricom IEEE 1588 PTP Master Clock
• APC Symmetra PX SY100K100F UPS - 100 kW
• 42U Racks (4+1)
IEEE 1588 Precision Time Protocol
• IEEE 1588-2008 Precision Time Protocol (PTP) Version 2 overcomes network and application latency and jitter through hardware time stamping at the physical layer of the network.
• IEEE 1588-2008 provides time transfer accuracy in the sub ns range, a significant improvement in time synchronization accuracy over Network Time Protocol (NTP).
• The Symmetricom XLi Grandmaster is IEEE 1588-2008 PTP V2 compliant and time stamps PTP packets with a time stamp accuracy of 50 ns to UTC. Measured synchronization accuracy at a PTP client has been shown to be as good as a 17 ns offset from the XLi Grandmaster. Operating at 100BaseT line speed with deep time stamp packet buffers, the XLi Grandmaster can support thousands of 1588 clients.
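For reference, the clock offset in PTP's delay request-response mechanism is derived from four timestamps: `t1` (master sends Sync), `t2` (slave receives it), `t3` (slave sends Delay_Req) and `t4` (master receives it). The sketch below shows the standard calculation under the usual assumption of a symmetric path; the function name and the example timestamps are made up for illustration.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Return (slave clock offset, mean one-way path delay).

    t1 and t4 are read from the master clock, t2 and t3 from the slave
    clock. ASSUMPTION: the path delay is the same in both directions.
    """
    master_to_slave = t2 - t1   # path delay + offset
    slave_to_master = t4 - t3   # path delay - offset
    offset = (master_to_slave - slave_to_master) / 2
    delay = (master_to_slave + slave_to_master) / 2
    return offset, delay


# Hypothetical timestamps in ns: slave runs 17 ns ahead, path delay 500 ns.
offset_ns, delay_ns = ptp_offset_and_delay(t1=1000, t2=1517, t3=2000, t4=2483)
```

Hardware time stamping matters because any software latency added to `t2` or `t4` shows up directly as error in the computed offset.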
Uninterruptible Power Supply (UPS)
• APC Symmetra PX 100kW
• Scalable to 100kW/100kVA
• 208V 3PH 332A Service
APC Symmetra PX Performance
HPC Performance
| | Node | System |
| --- | --- | --- |
| Cores – CPU/GPU | 12/1536 | 768/98304 |
| CPU SP FP Performance | 128 GFLOP | 8 TFLOP |
| CPU DP FP Performance | 64 GFLOP | 4 TFLOP |
| GPU SP FP Performance | 3990 GFLOP | 255 TFLOP |
| GPU DP FP Performance | 1995 GFLOP | 128 TFLOP |
| Main Memory Size | 48 GB | 3 TB |
| Main Memory BW | 64 GB/s | 4 TB/s |
| Disk Size | 250 GB | 16 TB |
| Disk IOPS (4 KB) | 20K | 1.28M |
| Disk R/W BW | 500/315 MB/s | 32/20 GB/s |
| Network BW | 50 Gb/s | 3.2 Tb/s |
| Power | 1.5 kW | 100 kW |
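The system column appears to be the node column scaled by 64 and rounded. The sketch below reproduces that scaling; the factor of 64 (the compute servers in the "64 + 1" configuration) and the key names are assumptions made for this example.

```python
# ASSUMPTION: system totals count 64 compute nodes (the "64 + 1" servers
# minus one).
COMPUTE_NODES = 64

NODE = {
    "cpu_cores": 12,
    "gpu_cores": 1536,
    "cpu_sp_gflops": 128,
    "cpu_dp_gflops": 64,
    "gpu_sp_gflops": 3990,
    "gpu_dp_gflops": 1995,
    "memory_gb": 48,
    "memory_bw_gb_per_s": 64,
    "disk_gb": 250,
    "network_gb_per_s": 50,   # gigabits per second
    "power_kw": 1.5,
}


def scale_to_system(node, nodes=COMPUTE_NODES):
    """Multiply every per-node figure by the number of compute nodes."""
    return {key: value * nodes for key, value in node.items()}


SYSTEM = scale_to_system(NODE)
```

For example, 3990 GFLOP x 64 is about 255 TFLOP and 48 GB x 64 is 3 TB (3072 GB), matching the system column; power scales to 96 kW, which the slide rounds to 100 kW. The per-node GPU figures are themselves close to three times the M2090 peaks (3 x 665 = 1995 DP GFLOPS).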
HPC Procurement Schedule
• Breadboard Performance Evaluation 15 JUL
• Finalize HPC Configuration 15 JUL
– # Fermi Processors (4 -> 3)
– # IB Adapters (1 -> 2)
– UPS (100 kW), Server (4U -> 2U), SSD
• Request Final Vendor Quotes 1 AUG
• HPC Vendor Selection
– Issue HPC System Purchase Order OCT 31
• HPC System Integration & Test by Vendor
– 6-12 week delivery ARO
• Installation DEC 31
– Prepare electrical supply for UPS
REAL-TIME LINUX
Real-Time Operating System (RTOS)
• Requirements
– No dropped frames during simulation run
– Support Nvidia’s CUDA
– Support InfiniBand Adapter with GPUDirect™
– Support Precision Time Protocol (PTP) IEEE 1588
• Candidate RTOSes
– Concurrent Computer RedHawk
– Red Hat MRG (Messaging, Real-Time, Grid)
Interrupt Dispatch Latency*
*Ravi Malhotra, “Real-Time Performance on Linux-based Systems,” 2011 Freescale Technology Forum
Real-Time Support on Linux*
• Traditionally, Linux is not a real-time operating system
– Designed for server throughput performance rather than embedded systems latency
– Scheduling latencies can be unbounded
– Big kernel lock and other mechanisms (softIRQ) typically end up blocking real-time critical tasks
– Processes cannot be pre-empted while executing system calls
*Ravi Malhotra, “Real-Time Performance on Linux-based Systems,” 2011 Freescale Technology Forum
Sources of Latency & How RT Patch Helps*
*Ravi Malhotra, “Real-Time Performance on Linux-based Systems,” 2011 Freescale Technology Forum
HPC PERFORMANCE MODEL
Hyperformix Workbench Performance Model
Workbench Model Steps
The application consists of 9 steps that comprise the generation and transfer of a frame:
1. Projector requests frame (provides state data)
2. CPU sets up the Frame Generation Process
3. CPU writes task data to CPU Memory (DDR3 SDRAM)
4. CPU tasks the GPU to synthesize the Frame
5. GPU reads the task data from CPU memory
6. GPU synthesizes the Frame
7. GPU transfers the frame data to CPU memory
8. CPU tasks the InfiniBand Network Adapters to transfer the frame to the Crossfield Network Interface via the InfiniBand Switch
9. Network Adapters transfer the frame to FPGA memory using RDMA Protocol
Hyperformix Workbench Performance Model
Workbench Model Results
| Application Steps | Response (µs) |
| --- | --- |
| Application.Step_1_Frame_Request_from_Projector.response | 1.151 |
| Application.Step_2_and_3_Setup_Process_and_write_data_to_memory.response | 0.1923 |
| Application.Step_4_CPU_tasks_GPU.response | 0.1923 |
| Application.Step_5_GPU_reads_data_from_CPU_Memory.response | 0.4148 |
| Application.Step_6_GPU_synthesizes_Frame_first_transfer.response | 1000 |
| Application.Step_7_GPU_xfers_Frame_to_CPU_memory.response | 917.7 |
| Application.Step_8_CPU_tasks_Network_Adapter_to_transfer_Frame_to_NI.response | 0.1682 |
| Application.Step_9_Network_Adapter_xfer_frame_to_NI_FPGA_Memory.response | 2259 |
| Application.Main_RT_App.All_Steps_transfer_RT_2 | 4179 |
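The per-step responses add up to the reported total, and the total bounds the achievable frame rate if the steps run strictly one after another. The sketch below makes that check explicit. The shortened step names are hypothetical, and the frame-rate figure assumes no overlap between consecutive frames, which the slides do not state.

```python
# Response times in microseconds, from the Workbench model results.
STEP_RESPONSE_US = {
    "step_1_frame_request": 1.151,
    "steps_2_3_setup_and_write": 0.1923,
    "step_4_cpu_tasks_gpu": 0.1923,
    "step_5_gpu_reads_task_data": 0.4148,
    "step_6_gpu_synthesizes_frame": 1000,
    "step_7_gpu_to_cpu_transfer": 917.7,
    "step_8_cpu_tasks_adapter": 0.1682,
    "step_9_rdma_to_fpga_memory": 2259,
}


def total_response_us(steps):
    """End-to-end latency if every step runs serially."""
    return sum(steps.values())


def serial_frame_rate_hz(total_us):
    """Upper bound on frame rate with no pipelining between frames."""
    return 1e6 / total_us


def dominant_step(steps):
    """Name of the step with the largest response time."""
    return max(steps, key=steps.get)


total = total_response_us(STEP_RESPONSE_US)
rate = serial_frame_rate_hz(total)
slowest = dominant_step(STEP_RESPONSE_US)
```

The sum rounds to the reported 4179 µs, which corresponds to a little under 240 frames per second under the serial assumption, and the RDMA transfer to FPGA memory (step 9) is the largest single contributor.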
PROJECTOR INTERFACE
Projector Interfaces
FPGA Mezzanine Cards (FMC)
1. Two Dual DVI
2. Parallel Fiber Optic Ports (8-10)
3. Digital Micromirror Device (DMD) Interface
– All modules provide 2 User Definable I/Os, e.g.
• HWIL Synchronization Signal
• Output Next Frame