Intel® IPP. Fighting for the performance
Novosibirsk, 2008
Boris Sabanin


Page 1: Intel® IPP. Fighting for the performance

Novosibirsk, 2008. Boris Sabanin

Page 2: Why Primitives?

"It would be wasteful and ignorant not to provide developers with a common foundation for building them [systems]."

A. P. Ershov, "Fourth-Generation Software" ("Математическое обеспечение 4-го поколения")

Intel® Integrated Performance Primitives

• To optimize deeply
• To make it cross-platform
• To make it orthogonal in functionality
• To test thoroughly
• To develop independently
• To give customers the building blocks

Page 3: Being Primitive

• ANSI C. Portable
• Low overhead. High performance with small data
• Low structure. No conversion
• Basic common operation. For many ISVs
• Atomic. Does one thing. Building blocks, flexible
• Self-contained. Minimal or zero OS dependency
• Predictable. Expected behavior and results
• Well defined. No "result is not defined"
• Well documented. And self-documenting
• Intuitive. Understand once
• No magic. No side effects, explicit behavior

ippsAddC_8u_I
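The function name on this slide can be read field by field. The sketch below is only an illustration in Python, not IPP code: `parse_name` and `add_c_8u_inplace` are hypothetical helpers, the field meanings follow the commonly documented IPP naming scheme (prefix, domain letter, operation, data type, `I` for in-place), and saturation at 255 is assumed as the behavior for 8-bit unsigned data. Whether this exact name is exported by a given IPP release is not checked here.

```python
"""Sketch: reading an IPP-style function name and emulating what it suggests."""

# Assumed field meanings; only a few illustrative entries are listed.
DOMAINS = {"s": "signal (1D)", "i": "image (2D)"}
TYPES = {"8u": "8-bit unsigned", "16s": "16-bit signed", "32f": "32-bit float"}


def parse_name(name):
    """Split e.g. 'ippsAddC_8u_I' into domain, operation, data type, in-place flag."""
    if not name.startswith("ipp"):
        raise ValueError("not an IPP-style name")
    head, *rest = name[3:].split("_")
    return {
        "domain": DOMAINS.get(head[0], "unknown"),
        "operation": head[1:],
        "data_type": TYPES.get(rest[0], "unknown") if rest else None,
        "in_place": "I" in rest[1:],
    }


def add_c_8u_inplace(val, buf):
    """Add the constant `val` to every byte of `buf` in place, saturating at 255 (assumed)."""
    for k in range(len(buf)):
        buf[k] = min(buf[k] + val, 255)


if __name__ == "__main__":
    print(parse_name("ippsAddC_8u_I"))
    data = bytearray([0, 100, 250])
    add_c_8u_inplace(10, data)
    print(list(data))
```

The point of the example is the slide's "intuitive" and "no magic" properties: the name alone tells the caller the domain, the operation, the data type, and that the buffer is modified in place.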

Page 4: High Temperature IPP

[Diagram labels: SW (Applications); HW (CPU & chipset); OS; IPP; Components]

Page 5: IPP & Media. What is Inside?

• Signal & Image Processing
• String Processing
• Computer Vision
• Speech Recognition primitives
• JPEG & JPEG2000 primitives
• Speech, Audio and Video Coding
• Lossless Data Compression
• Small Matrix operations, Vector Math
• Cryptography
• Realistic Rendering
• Data Integrity
• Automatically generated DSP transforms

Page 6: IPP, What Else? For Free?

50+ IPP samples provided as source code

• Video codecs: MPEG2, MPEG4, H264, VC1
• Audio codecs: MP3, AAC, AC3
• JPEG and JPEG2000 codecs
• Speech codecs: G722, G723, G726, G728
• Computer Vision: Face Detection
• Ray Tracing demo
• Interfaces: Java, C#, VB, F90, C++
• Yes. Download free source-code samples

http://www.intel.com/support/performancetools/libraries/ipp/

Page 7: More Optimization Needed

[Chart: Performance vs. Time; labels: MHz, Arch, MMX, SSE, Core, 3GHz]

Optimization is needed. A lot of work.

Page 8: Achieving Performance

• Algorithms
• SIMD
• Threading
• HW accelerators
• Hybrid Solution

Page 9: Algorithm. Right DFT Decomposition

Manually optimized code vs. automatically generated code. The best of 200 decomposition cases is benchmarked.
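The idea of benchmarking several decompositions and keeping the fastest can be sketched in a few lines. This is a toy illustration in Python, not the IPP generator: it assumes that a "decomposition" means an ordering of the prime factors of the transform length used as radices in a Cooley-Tukey DFT, and all function names are hypothetical.

```python
import cmath
import itertools
import time


def dft_naive(x):
    """Direct O(n^2) DFT, used as the reference and as the recursion leaf."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def dft_mixed(x, radices):
    """Cooley-Tukey DFT; `radices` gives the radix used at each recursion level."""
    n = len(x)
    if not radices:
        return dft_naive(x)
    r, m = radices[0], n // radices[0]
    # r sub-transforms of length m over the decimated inputs x[q], x[q + r], ...
    subs = [dft_mixed(x[q::r], radices[1:]) for q in range(r)]
    # Combine the sub-transforms with twiddle factors.
    return [sum(subs[q][k % m] * cmath.exp(-2j * cmath.pi * q * k / n)
                for q in range(r))
            for k in range(n)]


def prime_factors(n):
    """Prime factors of n in ascending order, with multiplicity."""
    factors, p = [], 2
    while n > 1:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    return factors


def best_decomposition(n, repeats=3):
    """Time every distinct radix ordering for length n and return the fastest."""
    x = [complex(j % 7, -(j % 3)) for j in range(n)]
    timings = {}
    for order in sorted(set(itertools.permutations(prime_factors(n)))):
        start = time.perf_counter()
        for _ in range(repeats):
            dft_mixed(x, order)
        timings[order] = (time.perf_counter() - start) / repeats
    return min(timings, key=timings.get), timings


if __name__ == "__main__":
    best, timings = best_decomposition(60)
    print("candidates:", len(timings), "fastest radix order:", best)
```

All orderings compute the same transform; only the running time differs, which is why the choice can be made empirically per platform.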

Page 10: Threading. Function level and above

Primitive level. 1D FFT is optimized and threaded. Performance on Core™2 Duo: 22 GFlops.

Over primitives. Even the single-threaded version of IPP-based GZIP is faster; see the performance chart, in CPU clocks per byte. The threaded version is much faster thanks to the threading modes implemented: multi-file and in-file parallelization.
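The two threading modes named on this slide can be illustrated with the Python standard library. This is a minimal sketch and not the IPP GZIP sample: the function names, the chunk size, and the use of a thread pool are assumptions. It relies on the fact that a concatenation of gzip members is itself a valid gzip stream.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor


def compress_in_file(data, chunk_size=1 << 20, workers=4):
    """In-file mode: compress fixed-size chunks in parallel, then concatenate.

    Each chunk becomes its own gzip member.
    """
    chunks = [data[i:i + chunk_size]
              for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the members stay in sequence.
        return b"".join(pool.map(gzip.compress, chunks))


def compress_multi_file(blobs, workers=4):
    """Multi-file mode: one independent compression task per input."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gzip.compress, blobs))


if __name__ == "__main__":
    payload = bytes(range(256)) * 4096
    packed = compress_in_file(payload, chunk_size=64 * 1024)
    assert gzip.decompress(packed) == payload
    print(len(payload), "->", len(packed), "bytes")
```

Splitting a file into independent chunks usually costs some compression ratio, because each chunk starts with an empty dictionary; that is the trade-off the in-file mode accepts for parallelism.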

Page 11: FFTW Compares FFT Performance

3.60 GHz Intel Xeon Pentium 4 (Prescott), unknown L2 size, 64 bit mode. Linux 2.4.21, Intel C/C++ Compiler 9.0, Intel Fortran Compiler 9.0, Intel Math Kernel Library Version 8.0.1, Intel Integrated Performance Primitives v5.0. Has SSE (4-way single precision SIMD), SSE2 (2-way double precision SIMD), SSE3.

Source: FFTW web site, http://www.fftw.org/speed

[Chart from the FFTW web site; the IPP result is highlighted]
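FFT speeds on the FFTW benchmark pages are nominal rates derived from the transform size and the measured time, using 5 N log2(N) operations for a complex transform and half of that for a real one. A small helper shows the arithmetic; the function name is hypothetical, and the slides do not say which convention their own figures use.

```python
import math


def fft_mflops(n, seconds, real_input=False):
    """Nominal FFT speed in MFLOPS under the 5*N*log2(N) convention."""
    flops = (2.5 if real_input else 5.0) * n * math.log2(n)
    return flops / (seconds * 1e6)


if __name__ == "__main__":
    # A 1024-point complex FFT that takes 10 microseconds.
    print(fft_mflops(1024, 10e-6))
```

Because the operation count is nominal rather than measured, the figure is a scaled inverse of the running time, which keeps results comparable across libraries that use different algorithms.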

Page 12: The Open Source Powered by IPP

• Data Compression: GZIP, ZLIB, BZIP2, LZO
• Image Coding (JPEG): IJG
• Cryptography: OpenSSL
• Computer Vision: OpenCV

Page 13: OpenCV Calls IPP and Wins

The Stanford Racing Team won the Grand Challenge. OpenCV and IPP are used in the computer vision software of "Stanley".

DARPA "Urban Challenge": in November, a 60-mile multi-robot face-off in a simulated city. Powered by Intel Core2 Quad running IPP and OpenCV.

Page 14: HW Acceleration. Low Power Computing

In media, a decrease in CPU utilization is desirable (unlike in HPC), because it lowers power consumption and lets other applications run. IPP video decoders running on the CPU and on HW accelerators (PowerVR technology) are compared on Menlow with Linux.

Page 15: Hybrid Solution. MC+HT+HWA

Page 16: AMD Performance Library

• IPP API compatible
• Much less functionality
• Much lower performance

Page 17: Quality vs. Performance

MSU Graphics Lab reports that the IPP H.264 encoder is in the top 3.

Page 18: IPP Economics

• 16 functional domains
• 10K functions
• 350 MB of source code
• Windows, Linux, MacOSX
• IA32, Intel®64, IA64, XScale
• All development in Russia
• 3 releases a year + out-of-cycle releases
• IPP $199, IPP samples $0

Page 19: IPP Customers

• Microsoft
• Adobe
• Philips Medical
• MathWorks
• Ulead
• Thomson
• Yahoo
• OKI
• Apple
• Symantec
• Pixar
• Envivio
• SGI
• Oracle
• SAP
• Google

Russian?