real-time and embedded systems, fpgas and gpus and embedded systems, fpgas and gpus ... •...

33
FYS3240 PC-based instrumentation and microcontrollers Real-Time and Embedded systems, FPGAs and GPUs Spring 2013 Lecture #10 Bekkeng, 8.1.2013

Upload: dinhminh

Post on 20-Jun-2018

223 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

FYS3240

PC-based instrumentation and microcontrollers

Real-Time and Embedded systems,

FPGAs and GPUs

Spring 2013 – Lecture #10

Bekkeng, 8.1.2013

Page 2: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

Embedded Computing

• An embedded system is a computer system designed to

perform one or a few dedicated functions, often with real-

time computing constraints.

• Embedded processors can be microprocessors,

microcontrollers, FPGAs og GPUs.

• Embedded systems run with limited computer hardware

resources: limited memory, small or non-existent keyboard

and/or screen

Page 3: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

Embedded microprocessors

• Modern x86 CPUs are relatively uncommon in embedded

systems and small low power applications, as well as low-cost

microprocessor markets (e.g. home appliances and toys).

• Simple 8-bit and 16-bit based architectures are common,

although the x86-compatible AMD's Athlon and Intel Atom are

examples of 64-bit designs used in some relatively low power

and low cost segments

Page 4: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

Hardware accelaration

• In computing, Hardware acceleration is the use of computer hardware to

perform some function faster than is possible in software running on the

general-purpose CPU. Examples of hardware acceleration includes using

graphics processing units (GPUs) and instructions for complex operations in

CPUs.

• Normally, processors are sequential, and instructions are executed one by one.

Various techniques are used to improve performance; hardware acceleration is

one of them. The main difference between hardware and software is

concurrency, allowing hardware to be much faster than software. Hardware

accelerators are designed for computationally intensive software code

• The hardware that performs the acceleration, when in a separate unit from the

CPU, is referred to as a hardware accelerator, or often more specifically as

graphics accelerator or floating-point accelerator, etc. Those terms, however,

are older and have been replaced with less descriptive terms like video card or

graphics card.

• Many hardware accelerators are built on top of field-programmable gate array

(FPGA) chips. From wikipedia, edited

Page 5: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

Real-time hardware platform examples

compatible with LabVIEW

• Desktop PC with real-time OS (RTOS)

– as long as the hardware meets certain system requirements

• 8-, 16-, and 32-bit microprocessors

– Using the LabVIEW C generator

• PXI with real-time controller

– often used for high-performance real-time systems such as hardware-in-

the-loop testing

• NI FPGA

• NI CompactRIO

• NI Single-Board RIO

• NI CompactVision

• Industrial PCs/Controllers

• NI Compact FieldPoint

– a PLC (programmable logic controller)

Page 6: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

Build vs. Buy for Embedded systems

• Buy COTS (Commercial-off-the-shelf) hardware when possible

• Examples of when a custom build in necessary:

– High volumes (10,000+)

– An iteration on an existing custom design

– Custom size or shape required

– Very stringent technical requirements

(such as ultralow power consumption)

PC

B d

es

ign

ers

Page 7: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

FPGA = Field Programmable Gate Array

Page 8: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

From FYS 4220

FPGAs give low-latency processing, but they have

limitations in terms of floating-point computations

Page 9: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

FPGA advantages

• High reliability

• High determinism

• High performance

• True parallelism

• Reconfigurable

The highest performance FPGAs today (2012) have 600 MHz clock speed

Page 10: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

Common Applications for FPGAs in

DAQ and control systems

• High-speed control

• HW programmable DAQ-cards

• Onboard processing and data reduction

– e.g. video processing

• Co-processing

– offload the CPU

Video Frame grabber

Video

Camera

Host PC

FPGA

Image processing etc.

FFT

Page 11: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:
Page 12: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

How to program an FPGA ?

• Write VHDL (Hardware Description Language)

• Write C-code (need a development tool)

• Automatic Generation of VHDL code (or a bit stream) from a high

level development tool, such as

– MATLAB

– Simulink

– LabVIEW

VHDL Code

Matlab simulink

Page 13: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

From LabVIEW to Hardware

Page 14: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

With the LabVIEW FPGA Module you develop FPGA applications on

a host computer running Windows, and then LabVIEW compiles and

implements the code in hardware

open close Com. with FPGA

http://zone.ni.com/devzone/cda/tut/p/id/3358

Including GUI

Page 15: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

Include HDL-code in LabVIEW

• The IP Integration Node (replaces the HDL Interface from

LabVIEW 2010) can bring in third-party Xilinx IPs.

– a wizard to import files and configure the interface step by step

• You can also use the IP Integration Node to include your own

VHDL code

• Once you have configured the node, you can use the IP just

like any other LabVIEW node with inputs and outputs.

IP = Intellectual Property

Page 16: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

FPGAs in DAQ-systems

• DAQ-cards with a programmable FPGA

• Multi-rate sampling

– Allows different sampling frequencies on the I/O channels

– For comparison, when using an “ordinary” DAQ-card

(without a user reconfigurable FPGA) all channels must

have the same sampling frequency

• User defined processing in the FPGA

• FPGA-based hardware timing/synchronization

– Remember that without an external driver/buffer the current output (source)

from an FPGA output pin might not be able to driver your external

electronics.

Page 17: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

General Purpose Operating Systems

• Windows, Linux, MacOS, Unix

– Processor time shared between programs

– OS can preempt high priority threads

– Service interrupts –keyboard, mouse, Ethernet…

– Cannot ensure that code finish within specified time limits!

Page 18: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:
Page 19: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

What is a real-time system

• A real-time system gives you determinism

– real-time does not mean “real fast” (it can be slower)!

– real-time means that you can determine (predict) accurately when

a section of your program will execute

• Hard real-time

– systems where it is absolutely imperative that responses occur within the

required deadline (Example: Flight control systems)

• Soft real-time

– allows for some deadlines to be missed with only a slight degradation in

performance but not a complete failure (example: DAQ-systems)

• In contrast, on an ordinary desktop PC (with Windows) the OS

operates on a fairness basis

– Each application gets time on the CPU regardless of its priority

– Even our most time-critical application can be suspended for some routine

maintenance

Page 20: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

Deterministic communication between

real-time threads with RT FIFOs

RT FIFO

RT FIFO

Page 21: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

Ethernet for real-time applications

• Remote I/O can demand reaction in the 5-10 ms region. Motion Control demands

even higher determinism with cycle times into the microsecond region.

• Standard Ethernet communication utilizes TCP/IP, which is inherently non-

deterministic and has a reaction time in the hundreds of milliseconds. In an

effort to boost determinism some networks utilize custom technologies in the

transport and network layers of the Ethernet stack. These networks merely use

TCP/IP as a supplemental channel to provide non real-time data transfers. By

bypassing the TCP/IP protocols, such proprietary networks limit the end user’s

ability to use standard, off-the-shelf Ethernet products such as routers, switches,

firewalls, etc. This limitation destroys one of the fundamental advantages of

standard Ethernet - the availability of low-cost, ubiquitous COTS Ethernet

hardware.

• By using UDP instead of TCP the reaction time comes down to about 10 ms at

best. UDP is not suited for “hard” deterministic distributed systems.

Page 22: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

LabVIEW Embedded system

application development

• Developing the LabVIEW FPGA application for Input/Output

(I/O), timing, synchronization, high speed control and signal

processing.

• Developing the LabVIEW Real-Time application for

deterministic floating point analysis and control as well as

communication with a networked host computer.

• Developing the LabVIEW for Windows application for graphical

user interfaces, supervisory control and data logging.

Page 23: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

Architecture for Advanced Embedded

Applications

Data storage is non-deterministic

PC – Windows OS

Page 24: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:
Page 25: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

Interrupts for Data Acquisition

• In general, there are three approaches to acquiring data from

an external device or synchronizing communication between

devices. These three approaches are described as follows:

• Polling – This method involves periodically reading the status of

the device to determine whether the device needs attention.

• Interrupts – the device is configured to interrupt the processor

whenever the device requires attention.

• Direct Memory Access (DMA) – A dedicated processor, the

DMA controller, transparently transfers data from the device to

computer memory, or vice versa.

Page 26: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

Interrupt-Driven Programming

• In interrupt-driven systems software is designed such that when

a registered event, such as a timer, is received, a response is

fired to respond to this event.

• There are two components of any interrupt-driven system: the

interrupt and the interrupt handler.

• An interrupt is a signal that is generated by hardware, which

indicates an event has occurred that should halt the currently

executing program.

• Interrupt handlers (also referred to as interrupt service

routines - ISRs) are portions of code that are registered with

the processor to execute once a particular interrupt has

occurred. Once the processor is aware of an interrupt, it halts

the currently executing process, performs a context switch to

save the state of the system, and executes the interrupt

handler. Once the interrupt handler code has executed, the

processor returns control to the previously running program.

Page 27: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

GPUs

• GPU = Graphics Processing Unit

• GPUs can be used as hardware accelerators for

numerical/computational tasks

• Can be used in Real-Time High-Performance Computing

systems

• GPUs have more transistors dedicated for processing than a CPU

– The performance gain when using GPUs can be significant

Page 28: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

NVIDIA Tesla GPUs

Page 29: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

CUDA

void saxpy_serial(int n, float a, float* x, float* y)

{

for (int i=0; i<n; ++i)

y[i] = a*x[i] + y[i];

}

// Kjør seriell saxpy

saxpy_serial(n, 2.0, x, y);

__global__ void saxpy_parallel(int n, float a, float* x, float* y)

{

int i = blockIdx.x * blockDim.x + threadIdx.x;

if (i < n) y[i] = a*x[i] + y[i];

}

// Kjør parallell saxpy med 256 tråder/blokk

int nBlocks = (n + 255) / 256;

saxpy_parallel<<<nBlocks, 256>>>(n, 2.0, x, y);

Standard C

Parallel

CUDA C

• CUDA = Compute Unified Device Architecture

• CUDA is developed by Nvidia and is a GPU interface for C

Page 30: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

Application examples

Page 31: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

Application example II

http://www.top500.org/

Page 32: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

LabVIEW and GPUs

• GPUs can not be directly programmed with LabVIEW

– will not compile LabVIEW code for use on a GPU, but rather

enables CUDA functions to be used in LabVIEW.

• However, a framework is designed to integrate GPU execution

into LabVIEW's parallel execution, to execute the CUDA code

• Some applications are tailor made for deployment to GPUs,

such as those related to matrix operations

Page 33: Real-Time and Embedded systems, FPGAs and GPUs and Embedded systems, FPGAs and GPUs ... • Automatic Generation of VHDL code (or a bit stream) ... DMA controller, ...Published in:

Hybrid architecture example

• Numerical Computing

– multicore computer (CPU)

– GPU computing system

• Real-Time Measurement and Control

– PXIe chassis

– Embedded controller with a multicore

CPU

– FPGA-based data acquisition/control

boards