stream processors seminar ppt

Stream Processors: Programmability with Efficiency Presented by Satyam Dhar 4NI08EC056

Upload: satyam-dhar

Post on 27-Aug-2014


Presented by

Satyam Dhar 4NI08EC056

Efficiency refers to the power efficiency of a chip in performing a given task, operation, or calculation. It is measured in GOPS/W (giga-operations per second per watt).

Programmability is the capability of hardware and software to change, that is, to accept a new set of instructions that alters its behavior.

At the system level, a choice must be made between flexibility and power efficiency. Specialized architectures (e.g., ASICs) perform better in speed and power consumption but offer no flexibility. DSPs and microprocessors are highly flexible but do not provide the high efficiency the application needs. Hence, a trade-off must be made between efficiency and programmability.

Stream processing is a computer programming paradigm related to SIMD (single instruction, multiple data). It allows some applications to more easily exploit a limited form of parallel processing. The basic idea is that a single instruction acts on multiple data elements, i.e., a stream of data.

The Main Idea:

[Animated diagram: input streams (Stream 1 to Stream 4) of data elements flow into a programmable kernel, which outputs transformed data.]

Streams:

Streams are sets of data elements. All elements of a stream are of a single data type. Stream elements can be simple, such as a single number, or complex, such as the coordinates of a triangle in 3D space.

Kernels:

Kernels are pieces of code that operate on streams. They take a stream as input and produce a stream as output. Kernels can be chained together, can have one or more input and output streams, and can perform complex calculations.
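The kernel properties above can be sketched in C. This is only an illustration, not code from the presentation: the stream_f32 type and the functions kernel_add, kernel_scale, and add_then_scale are hypothetical names chosen here. One kernel takes two input streams, another takes one, and the two are chained through an intermediate stream.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stream type: a sequence of elements of a single data type. */
typedef struct {
    float *data;
    size_t len;
} stream_f32;

/* Kernel with two input streams and one output stream: element-wise sum. */
void kernel_add(const stream_f32 *a, const stream_f32 *b, stream_f32 *out)
{
    assert(a->len == b->len && a->len == out->len);
    for (size_t i = 0; i < a->len; i++)
        out->data[i] = a->data[i] + b->data[i];
}

/* Kernel with one input stream and one output stream: scale each element. */
void kernel_scale(const stream_f32 *in, float factor, stream_f32 *out)
{
    assert(in->len == out->len);
    for (size_t i = 0; i < in->len; i++)
        out->data[i] = in->data[i] * factor;
}

/* Chained kernels: out = (a + b) * factor.
 * tmp carries the intermediate stream from the first kernel to the second. */
void add_then_scale(const stream_f32 *a, const stream_f32 *b, float factor,
                    stream_f32 *tmp, stream_f32 *out)
{
    kernel_add(a, b, tmp);
    kernel_scale(tmp, factor, out);
}
```

The intermediate stream tmp is produced by one kernel and consumed immediately by the next, which is the pattern a stream processor tries to keep in local storage.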

Conventional, sequential paradigm:

    for (int i = 0; i < 100 * 4; i++)
        result[i] = source0[i] + source1[i];

Parallel SIMD paradigm:

    for (int el = 0; el < 100; el++)  // for each vector
        vector_sum(result[el], source0[el], source1[el]);
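The two loops above leave vector_sum undefined. A minimal sketch of what it might look like follows, assuming a vector width of 4 (suggested by the 100 * 4 loop bound) and flat float arrays. The names VEC_WIDTH, NUM_VECS, sum_sequential, and sum_simd are assumptions made here, and the vector add is emulated with a scalar loop rather than a real SIMD instruction.

```c
#include <assert.h>
#include <stddef.h>

#define VEC_WIDTH 4    /* assumed vector width, inferred from the 100 * 4 bound */
#define NUM_VECS  100  /* number of vectors, as in the loops above */

/* Hypothetical vector_sum: adds two VEC_WIDTH-element vectors.
 * On SIMD hardware this body would be a single vector add;
 * here it is emulated lane by lane in scalar C. */
void vector_sum(float *result, const float *a, const float *b)
{
    for (size_t lane = 0; lane < VEC_WIDTH; lane++)
        result[lane] = a[lane] + b[lane];
}

/* Sequential paradigm: one scalar add per loop iteration. */
void sum_sequential(float *result, const float *source0, const float *source1)
{
    assert(result != NULL && source0 != NULL && source1 != NULL);
    for (size_t i = 0; i < NUM_VECS * VEC_WIDTH; i++)
        result[i] = source0[i] + source1[i];
}

/* SIMD paradigm: one vector operation per loop iteration. */
void sum_simd(float *result, const float *source0, const float *source1)
{
    assert(result != NULL && source0 != NULL && source1 != NULL);
    for (size_t el = 0; el < NUM_VECS; el++)
        vector_sum(&result[el * VEC_WIDTH],
                   &source0[el * VEC_WIDTH],
                   &source1[el * VEC_WIDTH]);
}
```

Both functions compute the same result; the SIMD form runs a quarter as many loop iterations because each iteration handles a whole vector.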

Types of Parallelism and Locality exhibited:

1. Instruction-Level Parallelism
2. Data-Level Parallelism
3. Producer-Consumer Locality

A Stream Program expresses a computation as a signal flow graph with streams of records (the edges) flowing between computation kernels (the nodes).

One huge advantage of Stream Processors: storage structures are partitioned to support many ALUs. Operands for arithmetic operations reside in local register files (LRFs) near the ALUs. Streams of data are stored in a stream register file (SRF). This reduces the on-chip memory required and hence makes the design highly power-efficient.

Hardware Implementation: the Imagine Stream Processor

Transfer data between parts of the chip.

Local storage and reuse of intermediate streams.

Store kernel code.

Execute one kernel at a time.

Connection with other Imagine chips.

A conventional processor has only a few (typically fewer than four) arithmetic units and is thus unable to exploit much of the parallelism exposed by a stream program. It is also unable to realize much kernel locality because it has too few processor registers (typically fewer than 32, compared with thousands for a stream processor).

Most of the energy consumed by a modern microprocessor or DSP goes to data and instruction movement; only about 1% goes to arithmetic calculations. A stream processor exploits data and instruction locality to reduce this overhead, so that approximately 30 percent of its energy is consumed by arithmetic operations.

A stream processor time-multiplexes its hardware over the kernels of an application. All of the clusters work together on one kernel, each operating on different data, then they all proceed to the next kernel, and so on.
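The time-multiplexing described above can be sketched as a nested loop. This is an illustration under stated assumptions, not the actual hardware behavior: NUM_CLUSTERS, kernel_fn, kernel_square, kernel_add_one, and run_kernels are hypothetical names, the cluster count of 8 is assumed for the example, and the clusters, which work in parallel in hardware, are simulated by a sequential inner loop.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CLUSTERS 8  /* assumed cluster count, for illustration only */

/* Hypothetical per-element kernel: maps one input element to one output. */
typedef float (*kernel_fn)(float);

float kernel_square(float x)  { return x * x; }
float kernel_add_one(float x) { return x + 1.0f; }

/* Runs the kernels one at a time over the whole stream (in place).
 * Within a kernel, cluster c processes elements base + c of each group
 * of NUM_CLUSTERS elements, so every cluster sees different data.
 * Only after the whole stream is done does the next kernel start. */
void run_kernels(const kernel_fn *kernels, size_t num_kernels,
                 float *stream, size_t len)
{
    assert(kernels != NULL && stream != NULL);
    for (size_t k = 0; k < num_kernels; k++) {                /* one kernel at a time */
        for (size_t base = 0; base < len; base += NUM_CLUSTERS) {
            for (size_t c = 0; c < NUM_CLUSTERS && base + c < len; c++)
                stream[base + c] = kernels[k](stream[base + c]);
        }
    }
}
```

Calling run_kernels with the array {kernel_square, kernel_add_one} squares every element first and only then adds one to every element, mirroring how all clusters finish one kernel before moving to the next.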

Mapping an application to a stream processor involves two steps:

1. Kernel scheduling, in which the operations of each kernel are scheduled on the ALUs of a cluster.
2. Stream scheduling, in which kernel executions and data transfers are scheduled to use the SRF efficiently and to maximize data locality.

Researchers at Stanford University have developed a set of programming tools that automate both of these tasks so that a stream processor can be programmed entirely in C without sacrificing efficiency.
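To make the stream-scheduling step more concrete, the sketch below uses strip-mining: a long stream is cut into strips small enough to fit in on-chip storage, and both kernels run on a strip before the next strip is loaded. This is one possible approach shown for illustration, not a description of the Stanford tools. SRF_WORDS, strip_scale, strip_offset, and run_strip_mined are hypothetical names, and two small local arrays stand in for the SRF.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SRF_WORDS 64  /* hypothetical SRF capacity per stream, in elements */

/* First kernel: scale each element of a strip. */
void strip_scale(const float *in, float *out, size_t n, float factor)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * factor;
}

/* Second kernel: add an offset to each element of a strip. */
void strip_offset(const float *in, float *out, size_t n, float offset)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] + offset;
}

/* Computes out[i] = in[i] * factor + offset, one strip at a time.
 * The intermediate stream between the two kernels lives only in the
 * small buffers and is never written back to main memory. */
void run_strip_mined(const float *in, float *out, size_t len,
                     float factor, float offset)
{
    float srf_a[SRF_WORDS];  /* stand-ins for SRF-resident streams */
    float srf_b[SRF_WORDS];

    assert(in != NULL && out != NULL);
    for (size_t start = 0; start < len; start += SRF_WORDS) {
        size_t n = len - start;
        if (n > SRF_WORDS)
            n = SRF_WORDS;                               /* last strip may be short */
        memcpy(srf_a, in + start, n * sizeof(float));    /* load strip */
        strip_scale(srf_a, srf_b, n, factor);            /* kernel 1 */
        strip_offset(srf_b, srf_a, n, offset);           /* kernel 2 reuses srf_a */
        memcpy(out + start, srf_a, n * sizeof(float));   /* store result */
    }
}
```

The design choice here is that each input element is loaded once and each output element stored once, regardless of how many kernels are chained in between.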

The benefits of stream processing are limited to applications in which a similar operation is performed on a large data stream. If the work performed on each element is not of the same type, stream processing is inefficient. Inertia is another obstacle: learning to use the stream programming tools and writing a complex streaming application still represent a significant effort.

Though ASICs have efficiency as good as or better than stream processors, they are costly to design and lack flexibility. A single stream processor can be reused across many applications with no incremental design cost. Flexibility also permits new algorithms and functions to be easily implemented.

Due to:

1. competitive energy efficiency,
2. lower recurring costs, and
3. the advantages of flexibility,

we expect stream processors to replace ASICs in the most demanding of signal-processing applications.