![Page 1: Intel® IPP. Fighting for the performance. Novosibirsk, 2008, Boris Sabanin](https://reader036.vdocuments.us/reader036/viewer/2022062500/56649e495503460f94b3c5ac/html5/thumbnails/1.jpg)
Intel® IPP. Fighting for the Performance
Boris Sabanin
Novosibirsk, 2008
![Page 2](https://reader036.vdocuments.us/reader036/viewer/2022062500/56649e495503460f94b3c5ac/html5/thumbnails/2.jpg)
Why Primitives?
"It would be wasteful and ill-informed not to give developers a common foundation on which to build their [systems]."
A. P. Ershov, "Fourth-Generation Software"
Intel® Integrated Performance Primitives
• To optimize deeply
• To make it cross-platform
• To make it orthogonal in functionality
• To test thoroughly
• To develop independently
• To give customers the building blocks
![Page 3](https://reader036.vdocuments.us/reader036/viewer/2022062500/56649e495503460f94b3c5ac/html5/thumbnails/3.jpg)
Being Primitive
• ANSI C. Portable
• Low overhead. High performance with small data
• Low structure. No conversion
• Basic common operation. For many ISVs
• Atomic. Does one thing. Building blocks, flexible
• Self-contained. Minimal or zero OS dependency
• Predictable. Expected behavior and results
• Well defined. No "result is not defined"
• Well documented. And self-documented
• Intuitive. Understand once
• No magic. No side effects, explicit behavior
ippsAddC_8u_I
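The name shown on this slide, `ippsAddC_8u_I`, reads as a self-documenting pattern: `ipp` prefix, `s` for the signal-processing domain, `AddC` for "add a constant", `8u` for unsigned 8-bit data, and `I` for in-place operation. The following is a minimal pure-Python model of what such a primitive computes, assuming results saturate at 255. It does not call IPP; the helper name `add_c_8u_inplace` is hypothetical, and the real function's exact signature is not shown on the slide.

```python
def add_c_8u_inplace(val, buf):
    """Model of an in-place 'add constant' on unsigned 8-bit data.

    Assumption: results saturate at 255 instead of wrapping around.
    `val` is expected to be in the range 0..255.
    """
    for i in range(len(buf)):
        buf[i] = min(buf[i] + val, 255)


# One buffer, one operation, no hidden state: the input is overwritten.
data = bytearray([0, 100, 200, 250])
add_c_8u_inplace(10, data)
```

The model mirrors the slide's points: it does one thing, takes a raw buffer with no wrapper structure, and has no side effects beyond the buffer it is given.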
![Page 4](https://reader036.vdocuments.us/reader036/viewer/2022062500/56649e495503460f94b3c5ac/html5/thumbnails/4.jpg)
High Temperature IPP
[Diagram labels: SW (applications), OS, IPP, Components, HW (CPU & chipset)]
![Page 5](https://reader036.vdocuments.us/reader036/viewer/2022062500/56649e495503460f94b3c5ac/html5/thumbnails/5.jpg)
IPP & Media. What is Inside?
• Signal & Image Processing
• String Processing
• Computer Vision
• Speech Recognition primitives
• JPEG & JPEG2000 primitives
• Speech, Audio and Video Coding
• Lossless Data Compression
• Small Matrix operations, Vector Math
• Cryptography
• Realistic Rendering
• Data Integrity
• Automatically generated DSP transforms
![Page 6](https://reader036.vdocuments.us/reader036/viewer/2022062500/56649e495503460f94b3c5ac/html5/thumbnails/6.jpg)
IPP, What Else? For Free?
50+ IPP samples provided as source code
• Video codecs: MPEG2, MPEG4, H264, VC1
• Audio codecs: MP3, AAC, AC3
• JPEG and JPEG2000 codecs
• Speech codecs: G722, G723, G726, G728
• Computer Vision: Face Detection
• Ray Tracing demo
• Interfaces: Java, C#, .VB, F90, C++
• Yes. Download free source-code samples
http://www.intel.com/support/performancetools/libraries/ipp/
![Page 7](https://reader036.vdocuments.us/reader036/viewer/2022062500/56649e495503460f94b3c5ac/html5/thumbnails/7.jpg)
More Optimization Needed
[Chart. Axes: Time, Performance. Labels: MHz, Arch, MMX, SSE, Core, 3GHz]
Optimization is needed. A lot of work.
![Page 8](https://reader036.vdocuments.us/reader036/viewer/2022062500/56649e495503460f94b3c5ac/html5/thumbnails/8.jpg)
Achieving Performance
• Algorithms
• SIMD
• Threading
• HW accelerators
• Hybrid Solution
![Page 9](https://reader036.vdocuments.us/reader036/viewer/2022062500/56649e495503460f94b3c5ac/html5/thumbnails/9.jpg)
Algorithm. Right DFT Decomposition
Manually optimized code vs. automatically generated.
The best of 200 decomposition cases are benchmarked.
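The idea of trying several decompositions and keeping the fastest can be sketched in a few lines. The slide does not say how IPP decomposes a DFT or how its generator works, so the code below is only an illustration under stated assumptions: it uses the textbook Cooley-Tukey split of a length-n transform into `n1` sub-transforms of length `n / n1`, and the function names and candidate factor lists are hypothetical.

```python
import cmath
import time


def naive_dft(x):
    """Direct O(n^2) DFT, used as the base case and as a reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def dft_decomposed(x, factors):
    """Cooley-Tukey DFT that splits by the given factors, in order.

    Once the factor list is exhausted, the remaining length is handled
    by the direct DFT.
    """
    n = len(x)
    if not factors or n == 1:
        return naive_dft(x)
    n1 = factors[0]
    if n % n1 != 0:
        raise ValueError("factor does not divide the transform length")
    n2 = n // n1
    # n1 sub-transforms of length n2 over the decimated inputs x[r::n1].
    subs = [dft_decomposed(x[r::n1], factors[1:]) for r in range(n1)]
    # Recombine with twiddle factors exp(-2*pi*i*r*k/n).
    return [sum(cmath.exp(-2j * cmath.pi * r * k / n) * subs[r][k % n2]
                for r in range(n1))
            for k in range(n)]


def pick_fastest(x, candidates, repeats=3):
    """Time each candidate decomposition and return the fastest one."""
    best = None
    for factors in candidates:
        start = time.perf_counter()
        for _ in range(repeats):
            dft_decomposed(x, factors)
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best[0]:
            best = (elapsed, factors)
    return best[1]


signal = [complex(i % 5, 0) for i in range(16)]
candidates = [[16], [4, 4], [2, 8], [2, 2, 2, 2]]  # hypothetical cases
best = pick_fastest(signal, candidates)
```

All candidates compute the same transform; only the running time differs, which is why the choice can be left to a benchmark on the target machine.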
![Page 10](https://reader036.vdocuments.us/reader036/viewer/2022062500/56649e495503460f94b3c5ac/html5/thumbnails/10.jpg)
Threading. Function level and above
Primitive level. The 1D FFT is optimized and threaded. Performance on Core™2 Duo: 22 GFlops.
Above primitives. Even the single-threaded version of the IPP-based GZIP is faster; see the chart, which shows performance in CPU clocks per byte. The threaded version is much faster because of the two threading modes implemented: multi-file and in-file parallelization.
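The two threading modes named on this slide can be illustrated with the Python standard library. This is a sketch under assumptions, not the IPP sample's implementation, which the slide does not describe: in-file parallelization is modeled by compressing fixed-size chunks as independent gzip members and concatenating them (the gzip format allows concatenated members), and multi-file parallelization by compressing each input independently. The function names and the chunk size are hypothetical.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor


def compress_in_file(data, chunk_size=1 << 20, workers=4):
    """In-file mode: split one input into chunks and compress them in parallel.

    Each chunk becomes its own gzip member; the concatenation is still
    valid gzip data.
    """
    chunks = [data[i:i + chunk_size]
              for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        members = list(pool.map(gzip.compress, chunks))  # order is preserved
    return b"".join(members)


def compress_multi_file(blobs, workers=4):
    """Multi-file mode: compress several independent inputs in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gzip.compress, blobs))


payload = b"Intel IPP sample data. " * 50000
packed = compress_in_file(payload, chunk_size=64 * 1024)
restored = gzip.decompress(packed)
```

Splitting into independent members trades a little compression ratio for parallelism, because each chunk starts with an empty dictionary.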
![Page 11](https://reader036.vdocuments.us/reader036/viewer/2022062500/56649e495503460f94b3c5ac/html5/thumbnails/11.jpg)
FFTW Compares FFT Performance
Test system: 3.60 GHz Intel Xeon Pentium 4 (Prescott), unknown L2 size, 64 bit mode. Linux 2.4.21, Intel C/C++ Compiler 9.0, Intel Fortran Compiler 9.0, Intel Math Kernel Library Version 8.0.1, Intel Integrated Performance Primitives v5.0. Has SSE (4-way single precision SIMD), SSE2 (2-way double precision SIMD), SSE3.
Source: FFTW web site, http://www.fftw.org/speed
[Chart label: IPP]
![Page 12](https://reader036.vdocuments.us/reader036/viewer/2022062500/56649e495503460f94b3c5ac/html5/thumbnails/12.jpg)
The Open Source Powered by IPP
• Data Compression: GZIP, ZLIB, BZIP2, LZO
• Image Coding (JPEG): IJG
• Cryptography: OpenSSL
• Computer Vision: OpenCV
![Page 13](https://reader036.vdocuments.us/reader036/viewer/2022062500/56649e495503460f94b3c5ac/html5/thumbnails/13.jpg)
OpenCV Calls IPP and Wins
The Stanford Racing Team won the Grand Challenge. OpenCV and IPP are used in the "Stanley" computer vision software.
DARPA "Urban Challenge": a 60-mile multi-robot face-off in a simulated city, held in November. Powered by Intel Core2 Quad running IPP and OpenCV.
![Page 14](https://reader036.vdocuments.us/reader036/viewer/2022062500/56649e495503460f94b3c5ac/html5/thumbnails/14.jpg)
HW Acceleration. Low Power Computing
In media workloads, unlike in HPC, lower CPU utilization is desirable: it reduces power consumption and lets other applications run. IPP video decoders running on the CPU and on HW accelerators (PowerVR technology) are compared. Platform: Menlow with Linux.
![Page 15](https://reader036.vdocuments.us/reader036/viewer/2022062500/56649e495503460f94b3c5ac/html5/thumbnails/15.jpg)
Hybrid Solution. MC+HT+HWA
![Page 16](https://reader036.vdocuments.us/reader036/viewer/2022062500/56649e495503460f94b3c5ac/html5/thumbnails/16.jpg)
AMD Performance Library
• IPP API compatible
• Much less functionality
• Much less performance
![Page 17](https://reader036.vdocuments.us/reader036/viewer/2022062500/56649e495503460f94b3c5ac/html5/thumbnails/17.jpg)
Quality vs. Performance
MSU Graphics Lab reports that the IPP H.264 encoder is in the top 3.
![Page 18](https://reader036.vdocuments.us/reader036/viewer/2022062500/56649e495503460f94b3c5ac/html5/thumbnails/18.jpg)
IPP Economics
• 16 functional domains
• 10K functions
• 350 MB of source code
• Windows, Linux, Mac OS X
• IA32, Intel®64, IA64, XScale
• All development in Russia
• 3 releases a year + out-of-cycle releases
• IPP $199, IPP samples $0
![Page 19](https://reader036.vdocuments.us/reader036/viewer/2022062500/56649e495503460f94b3c5ac/html5/thumbnails/19.jpg)
IPP Customers
• Microsoft
• Adobe
• Philips Medical
• MathWorks
• Ulead
• Thomson
• Yahoo
• OKI
• Apple
• Symantec
• Pixar
• Envivio
• SGI
• Oracle
• SAP
• Google
Russian?