programming with simd instructionspdslab/2014/slides/simd-debrup.pdf · 2016-07-22 · flynn’s...

26
Programming with SIMD Instructions Debrup Chakraborty Computer Science Department, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional México D.F., México. email: [email protected] November 13, 2014

Upload: others

Post on 13-Mar-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Programming with SIMD Instructionspdslab/2014/slides/simd-debrup.pdf · 2016-07-22 · Flynn’s Taxonomy A classification of computer architectures by Michael J. Flynn, 1972. Single

Programming with SIMD Instructions

Debrup Chakraborty

Computer Science Department, Centro de Investigación y de Estudios Avanzados delInstituto Politécnico Nacional

México D.F., México.email: [email protected]

November 13, 2014

Page 2: Programming with SIMD Instructionspdslab/2014/slides/simd-debrup.pdf · 2016-07-22 · Flynn’s Taxonomy A classification of computer architectures by Michael J. Flynn, 1972. Single

Flynn’s Taxonomy

A classification of computer architectures by Michael J. Flynn, 1972.

Single Instruction Multiple InstructionSingle Data SISD MISD

Multiple Data SIMD MIMD

Debrup (Computer Science Department, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional México D.F., México. email: [email protected])Programming with SIMD Instructions November 13, 2014 2 / 26

Page 3: Programming with SIMD Instructionspdslab/2014/slides/simd-debrup.pdf · 2016-07-22 · Flynn’s Taxonomy A classification of computer architectures by Michael J. Flynn, 1972. Single

SISD

Single Instruction Multiple InstructionSingle Data SISD MISD

Multiple Data SIMD MIMD

One instruction operating on one data in the same time (traditionalsequential processing).Flynn includes pipelined architectures also in this category.Intel processors < 1996 and AMD < 1998

Debrup (Computer Science Department, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional México D.F., México. email: [email protected])Programming with SIMD Instructions November 13, 2014 3 / 26

Page 4: Programming with SIMD Instructionspdslab/2014/slides/simd-debrup.pdf · 2016-07-22 · Flynn’s Taxonomy A classification of computer architectures by Michael J. Flynn, 1972. Single

MISD

Single Instruction Multiple InstructionSingle Data SISD MISD

Multiple Data SIMD MIMD

Executes different instructions on the same data at the same time.This is not common.

Debrup (Computer Science Department, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional México D.F., México. email: [email protected])Programming with SIMD Instructions November 13, 2014 4 / 26

Page 5: Programming with SIMD Instructionspdslab/2014/slides/simd-debrup.pdf · 2016-07-22 · Flynn’s Taxonomy A classification of computer architectures by Michael J. Flynn, 1972. Single

SIMD

Single Instruction Multiple InstructionSingle Data SISD MISD

Multiple Data SIMD MIMD

Execute the same instruction on multiple data at the same time.First Intel processor: Intel Pentium MMX (1996), MMXinstructionsFirst AMD processor : AMD K6-2 (1998), 3DNow! instructions

Debrup (Computer Science Department, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional México D.F., México. email: [email protected])Programming with SIMD Instructions November 13, 2014 5 / 26

Page 6: Programming with SIMD Instructionspdslab/2014/slides/simd-debrup.pdf · 2016-07-22 · Flynn’s Taxonomy A classification of computer architectures by Michael J. Flynn, 1972. Single

MIMD

Single Instruction Multiple InstructionSingle Data SISD MISD

Multiple Data SIMD MIMD

Executes asynchronously distinct instructions on distinct data.Multiprocessor architectures, clusters etc.

Debrup (Computer Science Department, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional México D.F., México. email: [email protected])Programming with SIMD Instructions November 13, 2014 6 / 26

Page 7: Programming with SIMD Instructionspdslab/2014/slides/simd-debrup.pdf · 2016-07-22 · Flynn’s Taxonomy A classification of computer architectures by Michael J. Flynn, 1972. Single

A Brief History of Intel Processors

Debrup (Computer Science Department, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional México D.F., México. email: [email protected])Programming with SIMD Instructions November 13, 2014 7 / 26

Page 8: Programming with SIMD Instructionspdslab/2014/slides/simd-debrup.pdf · 2016-07-22 · Flynn’s Taxonomy A classification of computer architectures by Michael J. Flynn, 1972. Single

A Brief History of Intel Processors

Debrup (Computer Science Department, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional México D.F., México. email: [email protected])Programming with SIMD Instructions November 13, 2014 8 / 26

Page 9: Programming with SIMD Instructionspdslab/2014/slides/simd-debrup.pdf · 2016-07-22 · Flynn’s Taxonomy A classification of computer architectures by Michael J. Flynn, 1972. Single

A Brief History of Intel Processors

Debrup (Computer Science Department, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional México D.F., México. email: [email protected])Programming with SIMD Instructions November 13, 2014 9 / 26

Page 10: Programming with SIMD Instructionspdslab/2014/slides/simd-debrup.pdf · 2016-07-22 · Flynn’s Taxonomy A classification of computer architectures by Michael J. Flynn, 1972. Single

Time line for SIMD Instruction sets

Debrup (Computer Science Department, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional México D.F., México. email: [email protected])Programming with SIMD Instructions November 13, 2014 10 / 26

Page 11: Programming with SIMD Instructionspdslab/2014/slides/simd-debrup.pdf · 2016-07-22 · Flynn’s Taxonomy A classification of computer architectures by Michael J. Flynn, 1972. Single

Intel SIMD Instruction Sets

MMX instructions: Multimedia extentions. 8 registers of 64 bits.SSE instructions: Streaming SIMD Extensions. Includes 128 bitregisters, and a variety of instructions for bit manipulations,arithmetic etc. Recently includes dedicated instructions forcryptography.AVX instructions: Advanced Vectorial Extension, includes 256 bitregisters.More extensions on the way.

Debrup (Computer Science Department, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional México D.F., México. email: [email protected])Programming with SIMD Instructions November 13, 2014 11 / 26

Page 12: Programming with SIMD Instructionspdslab/2014/slides/simd-debrup.pdf · 2016-07-22 · Flynn’s Taxonomy A classification of computer architectures by Michael J. Flynn, 1972. Single

History of SSE

Debrup (Computer Science Department, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional México D.F., México. email: [email protected])Programming with SIMD Instructions November 13, 2014 12 / 26

Page 13: Programming with SIMD Instructionspdslab/2014/slides/simd-debrup.pdf · 2016-07-22 · Flynn’s Taxonomy A classification of computer architectures by Michael J. Flynn, 1972. Single

How SSE instructions work?

Utilize dedicated registers.

Debrup (Computer Science Department, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional México D.F., México. email: [email protected])Programming with SIMD Instructions November 13, 2014 13 / 26

Page 14: Programming with SIMD Instructionspdslab/2014/slides/simd-debrup.pdf · 2016-07-22 · Flynn’s Taxonomy A classification of computer architectures by Michael J. Flynn, 1972. Single

How SSE instructions work?

Multiple data can be packed in a single register

Debrup (Computer Science Department, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional México D.F., México. email: [email protected])Programming with SIMD Instructions November 13, 2014 14 / 26

Page 15: Programming with SIMD Instructionspdslab/2014/slides/simd-debrup.pdf · 2016-07-22 · Flynn’s Taxonomy A classification of computer architectures by Michael J. Flynn, 1972. Single

How SSE instructions work?

Task: For each f in array compute

f = sqrt(f ).

SISD:for each f in array {

load f to the floating point register

calculate the square root

write the result from the register to memory

}

Debrup (Computer Science Department, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional México D.F., México. email: [email protected])Programming with SIMD Instructions November 13, 2014 15 / 26

Page 16: Programming with SIMD Instructionspdslab/2014/slides/simd-debrup.pdf · 2016-07-22 · Flynn’s Taxonomy A classification of computer architectures by Michael J. Flynn, 1972. Single

How SSE instructions work?

SIMD:for each 4 members in array {

load 4 members to the SSE register

calculate 4 square roots in one operation

write the result from the register to memory

}

Debrup (Computer Science Department, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional México D.F., México. email: [email protected])Programming with SIMD Instructions November 13, 2014 16 / 26

Page 17: Programming with SIMD Instructionspdslab/2014/slides/simd-debrup.pdf · 2016-07-22 · Flynn’s Taxonomy A classification of computer architectures by Michael J. Flynn, 1972. Single

Summary of SSE registers

Number of registers SizeMMX 8 64-bitsSSE 8 128-bits

SSE2 16 128-bits... ... ...

AVX 16 256-bits

Debrup (Computer Science Department, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional México D.F., México. email: [email protected])Programming with SIMD Instructions November 13, 2014 17 / 26

Page 18: Programming with SIMD Instructionspdslab/2014/slides/simd-debrup.pdf · 2016-07-22 · Flynn’s Taxonomy A classification of computer architectures by Michael J. Flynn, 1972. Single

Sample SSE Instructions

movss xmm, m32Load a single-precision (32-bit) floating-point element frommemory into the lower of xmm, and zero the upper 3 elements.memory address does not need to be aligned on any particularboundary.movaps xmm, m128Load 128-bits (composed of 4 packed single-precision (32-bit)floating-point elements) from memory into destination. Memoryaddress must be aligned on a 16-byte boundary.movdqa xmm1, m128,Load 128-bits of integer data from memory into destination.Memory address must be aligned on a 16-byte boundary.

(Other usages possible)

Debrup (Computer Science Department, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional México D.F., México. email: [email protected])Programming with SIMD Instructions November 13, 2014 18 / 26

Page 19: Programming with SIMD Instructionspdslab/2014/slides/simd-debrup.pdf · 2016-07-22 · Flynn’s Taxonomy A classification of computer architectures by Michael J. Flynn, 1972. Single

Sample SSE Instructions

Debrup (Computer Science Department, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional México D.F., México. email: [email protected])Programming with SIMD Instructions November 13, 2014 19 / 26

Page 20: Programming with SIMD Instructionspdslab/2014/slides/simd-debrup.pdf · 2016-07-22 · Flynn’s Taxonomy A classification of computer architectures by Michael J. Flynn, 1972. Single

Sample SSE Instructions

Scalar operations (ss Single scalar)Packed (ps Parallel scalar)

Debrup (Computer Science Department, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional México D.F., México. email: [email protected])Programming with SIMD Instructions November 13, 2014 20 / 26

Page 21: Programming with SIMD Instructionspdslab/2014/slides/simd-debrup.pdf · 2016-07-22 · Flynn’s Taxonomy A classification of computer architectures by Michael J. Flynn, 1972. Single

Initially it was done by "Inline Assembly":

__asm {

MOV EAX Op_A

MOV EBX, Op_B

MOVUPS XMM0, [EAX]

MOVUPS XMM1, [EBX]

ADDPS XMM0, XMM1

MOVUPS [Op_C], XMM0

}

Complicated, not very readable, programmer needs to take care of lowlevel details like register allocation etc.

Debrup (Computer Science Department, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional México D.F., México. email: [email protected])Programming with SIMD Instructions November 13, 2014 21 / 26

Page 22: Programming with SIMD Instructionspdslab/2014/slides/simd-debrup.pdf · 2016-07-22 · Flynn’s Taxonomy A classification of computer architectures by Michael J. Flynn, 1972. Single

How to use these instructions in my C code?

A better alternative is to use Intel intrinsics...

__m128 _mm_add_ps(__m128 a , __m128 b );

They are functions coded in assembly in appropriate header files.The syntax is much intuitive, and the programmer need not takecare of low level details.Most compilers (say GCC, ICC) has a good understanding of theintrinsics and can generate optimized codes with them.

Debrup (Computer Science Department, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional México D.F., México. email: [email protected])Programming with SIMD Instructions November 13, 2014 22 / 26

Page 23: Programming with SIMD Instructionspdslab/2014/slides/simd-debrup.pdf · 2016-07-22 · Flynn’s Taxonomy A classification of computer architectures by Michael J. Flynn, 1972. Single

What do we need?

A processor which supports the instructions that we want to use.An appropriate copiler, which understand intrinsics(GCC or ICC, ingeneral)The headers (.h) which corresponds to the instructions.Compile with appropriate flags to enable the instruction sets.Know the syntax of the instructions.

Debrup (Computer Science Department, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional México D.F., México. email: [email protected])Programming with SIMD Instructions November 13, 2014 23 / 26

Page 24: Programming with SIMD Instructionspdslab/2014/slides/simd-debrup.pdf · 2016-07-22 · Flynn’s Taxonomy A classification of computer architectures by Michael J. Flynn, 1972. Single

Intrinsics

Instructions Headers Flags

-MMX mmintrin.h -mmmx

-SSE xmmintrin.h -msse

-SSE2 emmintrin.h -msse2

-SSE3 pmmintrin.h -msse3

-SSSE3 tmmintrin.h -mssse3

-SSE4.1 et SSE4.2 smmintrin.h -msse4.1 -msse4.2

-AES et PCLMUL wmmintrin.h -maes -mpclmul

Debrup (Computer Science Department, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional México D.F., México. email: [email protected])Programming with SIMD Instructions November 13, 2014 24 / 26

Page 25: Programming with SIMD Instructionspdslab/2014/slides/simd-debrup.pdf · 2016-07-22 · Flynn’s Taxonomy A classification of computer architectures by Michael J. Flynn, 1972. Single

Intrinsics

In intrinsics we can use some nonstandard data types : __m128d

, __m128i

General syntax for function names: _mm_<name>_<type>The prefix _mm_ is always presentThe second part is <name>, generally it is same as the assemblymnemonic, but not always.The final part <type> indicates the packing information.

Debrup (Computer Science Department, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional México D.F., México. email: [email protected])Programming with SIMD Instructions November 13, 2014 25 / 26

Page 26: Programming with SIMD Instructionspdslab/2014/slides/simd-debrup.pdf · 2016-07-22 · Flynn’s Taxonomy A classification of computer architectures by Michael J. Flynn, 1972. Single

Intrinsics

Examples :

__m128i _mm_add_epi8(__m128i a, __m128i b)

__m128i _mm_add_epi32(__m128i a, __m128i b)

__m128i _mm_add_epi64(__m128i a, __m128i b)

__m128i _mm_and_si128(__m128i a, __m128i b)

__m128i _mm_xor_si128(__m128i a, __m128i b)

__m128i _mm_or_si128(__m128i a, __m128i b)

_piX : vector MM (64-bits) packed with X-bit words_epiX : vector XMM (128-bits) packed with X-bit words_si64 : vector MM (64-bits) of a single 64-bit word_si128 : vector XMM (128-bits) of a single 128-bit wordsee instruction list at:https:

//software.intel.com/sites/landingpage/IntrinsicsGuide/

Debrup (Computer Science Department, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional México D.F., México. email: [email protected])Programming with SIMD Instructions November 13, 2014 26 / 26