
Intel Math Kernel Library (MKL)

Clay P. Breshears, PhD

Intel Software College

NCSA Multi-core Workshop

July 24, 2007

Copyright © 2007, Intel Corporation. All rights reserved.

Performance Libraries: Intel® Math Kernel Library (MKL)

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

Agenda

Performance Features

The Library Sections

• BLAS

• LAPACK*

• DFTs

• VML

• VSL

SciMark 2.0 Optimization Case Study (from Henry Gabb)

• SciMark 2.0 overview

• Tuning with the Intel compiler

• Tuning with the Intel Math Kernel Library

Intel® Math Kernel Library Purpose

Performance, Performance, Performance!

Intel’s engineering, scientific, and financial math library

Addresses:

• Solvers (BLAS, LAPACK)

• Eigenvector/eigenvalue solvers (BLAS, LAPACK)

• Some quantum chemistry needs (dgemm)

• PDEs, signal processing, seismic, solid-state physics (FFTs)

• General scientific, financial [vector transcendental functions (VML) and vector random number generators (VSL)]

Tuned for Intel® processors – current and future

Intel® Math Kernel Library Purpose – Don’ts

But don’t use Intel® Math Kernel Library (Intel® MKL) on …

Don’t use Intel® MKL on “small” counts

Don’t call vector math functions on small n

Geometric transformation:

$(X', Y', Z', W')^{T} = M \, (X, Y, Z, W)^{T}$, where $M$ is the 4x4 transformation matrix

But you could use Intel® Performance Primitives
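For a count as small as the 4x4 geometric transformation on this slide, a hand-written loop is one obvious alternative to a library call. The following is a minimal sketch, not code from the presentation; the function name `transform4` and the row-major storage of the matrix are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Apply a 4x4 transformation matrix m (assumed row-major) to the
   homogeneous point in = (X, Y, Z, W), writing (X', Y', Z', W') to out. */
void transform4(const double m[16], const double in[4], double out[4])
{
    assert(in != out);  /* in-place use would overwrite inputs mid-loop */
    for (size_t row = 0; row < 4; row++) {
        double sum = 0.0;
        for (size_t col = 0; col < 4; col++)
            sum += m[4 * row + col] * in[col];
        out[row] = sum;
    }
}
```

The whole operation is 16 multiplications and 12 additions, so any fixed per-call cost in a general-purpose routine is likely to be a noticeable fraction of the work.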

Intel® Math Kernel Library Environment

Supports 32-bit and 64-bit Intel® processors

Large set of examples and tests

Extensive documentation

• Windows*: Intel and Microsoft compilers; .dll and .lib libraries

• Linux*: Intel and GNU compilers; .a and .so libraries

Resource Limited Optimization

The goal of all optimization is maximum speed

Resource-limited optimization: exhaust one or more resources of the system:

• CPU: Register use, FP units

• Cache: Keep data in cache as long as possible; deal with cache interleaving

• TLBs: Maximally use data on each page

• Memory bandwidth: Minimally access memory

• Computer: Use all the processors/cores available using threading

• System: Use all the nodes available (cluster software)

Threading

Most of Intel® Math Kernel Library could be threaded but:

• Limited resource is memory bandwidth

• Threading Level 1 and Level 2 BLAS is mostly ineffective ( O(n) and O(n^2) work respectively, with little reuse of each element loaded )

There are numerous opportunities for threading:

• Level 3 BLAS ( O(n^3) )

• LAPACK* ( O(n^3) )

• FFTs ( O(n log n) )

• VML, VSL: depends on processor and function

All threading is via OpenMP*

All Intel MKL is designed and compiled for thread safety
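The memory-bandwidth point above can be made concrete by comparing how much arithmetic each BLAS level performs per operand element. This is a rough sketch using textbook leading-order operation counts (2n for a vector update, 2n^2 for a matrix-vector product, 2n^3 for a matrix-matrix product); the function names are hypothetical and the counts ignore caching and scaling factors.

```c
#include <assert.h>

/* Approximate floating-point operations per operand element touched,
   for problems of order n, using textbook operation counts. */

/* Level 1, y = a*x + y: 2n flops over 2n elements (x and y). */
double intensity_level1(double n)
{
    assert(n > 0);
    return (2.0 * n) / (2.0 * n);
}

/* Level 2, y = A*x: 2n^2 flops over n^2 + 2n elements (A, x, y). */
double intensity_level2(double n)
{
    assert(n > 0);
    return (2.0 * n * n) / (n * n + 2.0 * n);
}

/* Level 3, C = A*B: 2n^3 flops over 3n^2 elements (A, B, C). */
double intensity_level3(double n)
{
    assert(n > 0);
    return (2.0 * n * n * n) / (3.0 * n * n);
}
```

For n = 1000 the ratios are about 1, 2, and 667. Only Level 3 does enough work per element for extra cores to help before memory bandwidth becomes the limit.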

SciMark 2.0

Produced by the National Institute of Standards and Technology

ANSI C and Java versions available

Five floating-point-intensive kernels

• FFT: Compute a complex 1D FFT

• SOR: Jacobi successive over-relaxation in 2D

• MC: Compute π by Monte Carlo integration

• MV: Sparse matrix-vector multiplication

• LU: Dense matrix LU factorization

SciMark 2.0 Problem Sizes

Benchmark   Small problem size      Large problem size
FFT         N = 1024                N = 1048576
SOR         100 x 100               1000 x 1000
MC          Problem size not fixed; no distinction between small and large problems
MV          N = 1000, NZ = 5000     N = 100000, NZ = 1000000
LU          100 x 100               1000 x 1000

Benchmark System

Hardware

• CPU (dual-processor system): 3.6 GHz Xeon (2 MB L2 cache), EM64T

• Motherboard: Intel Server Board SE7520AF2

• Memory: 512 MB DDR2

BIOS

• Version: P06

• Adjacent Cache Line Prefetch: ON

• Hardware Prefetch: ON

• Hyper-Threading Technology: OFF

Software

• Operating system: Red Hat Enterprise Linux AS3

• Linux kernel: 2.4.21-20.EL #1 SMP

• Intel C++ Compiler for Linux: 8.1 (l_cce_pc_8.1.024)

• Intel Cluster MKL: 7.2 (l_cluster_mkl_7.2.008)

• GNU C Compiler: gcc 3.2.3

GNU Performance Baseline

[Figure: two bar charts, "Small Problems" (axis 0 to 1000 MFLOPS) and "Large Problems" (axis 0 to 500 MFLOPS), comparing default and optimized gcc builds for the FFT, SOR, MC, MV, and LU kernels and the composite score (Comp.).]

Aggressive optimization significantly improves performance relative to the default optimization level. The following gcc options were used to establish baseline performance: -O3 -march=nocona -ffast-math -mfpmath=sse

Intel C++ Compiler for Linux

Performance

• Automatic vectorization

• Streaming SIMD Extensions 3

• IPO and PGO

• Automatic parallelization and OpenMP support

• Automatic CPU dispatch

• Much more...

Compatibility

• Source and object compatible with gcc and g++

• Supports GNU inline ASM

• ANSI/ISO C/C++ standards compliance

• Conforms to the C++ ABI standard

• Integrated with the Eclipse IDE

Tuning SciMark 2.0 with the Intel Compiler

[Figure: two bar charts, "Small Problems" (axis 0 to 2000 MFLOPS) and "Large Problems" (axis 0 to 1200 MFLOPS), comparing GNU and Intel compiler builds for the FFT, SOR, MC, MV, and LU kernels and the composite score (Comp.).]

The Intel C++ Compiler for Linux improves SciMark 2.0 performance relative to the GNU baseline. Intel compiler options: -O3 -xP -ipo -fno-alias.

Intel® Math Kernel Library Contents: BLAS

BLAS (Basic Linear Algebra Subprograms)

Level 1 BLAS: vector-vector operations

• 15 function types

• 48 functions

Level 2 BLAS: matrix-vector operations

• 26 function types

• 66 functions

Level 3 BLAS: matrix-matrix operations

• 9 function types

• 30 functions

Extended BLAS: level 1 BLAS for sparse vectors

• 8 function types

• 24 functions

Intel® Math Kernel Library Contents: LAPACK

LAPACK (Linear Algebra Package)

• Solvers and eigensolvers. Many hundreds of routines total!

• There are more than 1000 total user-callable and support routines

DFTs (Discrete Fourier Transforms)

• Mixed radix, multi-dimensional transforms

• Multithreaded

VML (Vector Math Library)

• Set of vectorized transcendental functions

• Most of the libm functions, but faster

VSL (Vector Statistical Library)

• Set of vectorized random number generators

Intel® Math Kernel Library Contents

BLAS and LAPACK* both define Fortran interfaces

• Legacy of high performance computation

VSL and VML have Fortran and C interfaces

DFTs have Fortran 95 and C interfaces

cblas interface available

• More convenient for a C/C++ programmer

Intel® Math Kernel Library Optimizations in LAPACK*

Most important LAPACK optimizations:

• Threading – effectively uses multiple cores

• Recursive factorization

  • Reduces scalar time (Amdahl’s law: $t = t_{\text{scalar}} + t_{\text{parallel}}/p$)

  • Extends blocking further into the code

No runtime library support required
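To see why reducing scalar time matters, the Amdahl's law expression on this slide can be evaluated directly. This is an illustrative sketch only; the function names and the sample timings in the note below are made up, not measurements from the presentation.

```c
#include <assert.h>

/* Run time on p processors when t_scalar cannot be parallelized and
   t_parallel divides evenly: t = t_scalar + t_parallel / p. */
double amdahl_time(double t_scalar, double t_parallel, int p)
{
    assert(p > 0);
    return t_scalar + t_parallel / (double)p;
}

/* Speedup on p processors relative to a single processor. */
double amdahl_speedup(double t_scalar, double t_parallel, int p)
{
    return amdahl_time(t_scalar, t_parallel, 1)
         / amdahl_time(t_scalar, t_parallel, p);
}
```

With 10 time units of scalar work and 90 of parallel work, two processors give 55 units, a speedup of about 1.82. Cutting the scalar part to 1 unit (with 99 parallel) raises the two-processor speedup to about 1.98.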

Tuning the SciMark 2.0 LU Kernel

Replacing the SciMark 2.0 LU kernel with the LAPACK dgetrf function requires attention to detail:

• SciMark 2.0 is written in C

• LAPACK defines a Fortran interface

• C is call-by-value

• Fortran is call-by-reference

• C uses row-major ordering

• Fortran uses column-major ordering

• For best performance, dgetrf requires data to be contiguous in memory

• SciMark 2.0 LU kernel allocates a 2D array as pointers-to-pointers (not necessarily contiguous in memory)
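One way to bridge the layout differences listed above is to copy the pointer-to-pointer matrix into a single contiguous, column-major block before calling the Fortran-style routine. The sketch below is an assumption about how that could be done, not the code used in the case study; `to_column_major` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>

/* Copy an n x n matrix stored as an array of row pointers into one
   contiguous block in column-major order, as a Fortran routine expects.
   Element (i, j) lands at index i + j * n.
   Returns NULL if allocation fails; the caller frees the result. */
double *to_column_major(double **rows, int n)
{
    assert(rows != NULL && n > 0);
    double *a = malloc((size_t)n * (size_t)n * sizeof *a);
    if (a == NULL)
        return NULL;
    for (int j = 0; j < n; j++)
        for (int i = 0; i < n; i++)
            a[(size_t)i + (size_t)j * (size_t)n] = rows[i][j];
    return a;
}
```

Because Fortran is call-by-reference, a C caller would then pass every scalar argument by address, along the lines of `dgetrf(&n, &n, a, &n, ipiv, &info)`, where `ipiv` and `info` are caller-provided outputs. Check the exact prototype against the MKL headers in use.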

Tuning the SciMark 2.0 LU Kernel

[Figure: bar chart "SciMark 2.0 LU Kernel", MFLOPS (axis 0 to 7000) for small and large problems; series: GNU baseline, Intel compiler, Intel MKL LAPACK, Intel MKL LAPACK + OpenMP.]

The Intel MKL LAPACK significantly improves performance over the original SciMark 2.0 LU source code.

Intel® Math Kernel Library Contents Discrete Fourier Transforms

One dimensional, two-dimensional, three-dimensional…

Multithreaded

Mixed radix

User-specified scaling, transform sign

Transforms on embedded matrices

Multiple one-dimensional transforms on single call

Strides

C and F90 interfaces; FFTW interface support

Using the Intel® Math Kernel Library DFTs

Basically a 3-step Process

Create a descriptor

Status = DftiCreateDescriptor(MDH, …)

Commit the descriptor (instantiates it)

Status = DftiCommitDescriptor(MDH)

Perform the transform

Status = DftiComputeForward(MDH, X)

Optionally free the descriptor

MDH: MyDescriptorHandle

Tuning the SciMark 2.0 FFT Kernel

#include <mkl.h>

int N = 1024;                       // Size of SciMark 2.0 small FFT problem
double scale = 1.0 / (double)N;

// SciMark creates a random vector of size 2*N to hold real and imaginary parts
double *x = RandomVector ((2 * N), R);
DFTI_DESCRIPTOR *dftiHandle;        // Structure for MKL DFT descriptor

DftiCreateDescriptor (&dftiHandle,  // Transform descriptor
                      DFTI_DOUBLE,  // Precision
                      DFTI_COMPLEX, // Complex-to-complex
                      1,            // Number of dimensions
                      N);           // Size of transform

// Apply scaling factor to backward transform
DftiSetValue (dftiHandle, DFTI_BACKWARD_SCALE, scale);

DftiCommitDescriptor (dftiHandle);

DftiComputeForward (dftiHandle, x); // Apply DFT to array x
DftiComputeBackward (dftiHandle, x);

DftiFreeDescriptor (&dftiHandle);
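The reason the code sets the backward scale to 1/N can be written out as a short derivation. The symbols $\sigma_f$ and $\sigma_b$ (forward and backward scale factors) are introduced here for illustration, and the sign convention (negative exponent for the forward transform) is assumed rather than stated on the slide.

```latex
X_k = \sigma_f \sum_{j=0}^{N-1} x_j \, e^{-2\pi i\, jk/N},
\qquad
y_j = \sigma_b \sum_{k=0}^{N-1} X_k \, e^{+2\pi i\, jk/N}
    = \sigma_f \, \sigma_b \, N \, x_j
```

With the forward scale left at 1 and the backward scale set to 1/N, as in the code above, $y_j = x_j$: a forward transform followed by a backward transform returns the original data up to rounding error.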

Tuning the SciMark 2.0 FFT Kernel

[Figure: bar chart "SciMark 2.0 FFT Kernel", MFLOPS (axis 0 to 2000) for small and large problems; series: GNU baseline, Intel compiler, Intel MKL DFT.]

The Intel MKL DFT significantly improves performance over the original SciMark 2.0 FFT source code.

Intel® Math Kernel Library Contents Vector Math Library (VML)

Vector Math Library: vectorized transcendental functions – like libm but better (faster)

Interface: Have both Fortran and C interfaces

Multiple accuracies

• High accuracy ( < 1 ulp )

• Lower accuracy, faster ( < 4 ulps )

Special value handling: √(-a), sin(0), and so on

Error handling: cannot duplicate libm here

VML: Why Does It Matter?

It is important for financial codes (Monte Carlo simulations)

• Exponentials, logarithms

Other scientific codes depend on transcendental functions

Error functions can be big time sinks in some codes

Intel® Math Kernel Library Contents Vector Statistical Library (VSL)

Set of random number generators (RNGs)

Numerous non-uniform distributions

VML used extensively for transformations

Parallel computation support – some functions

User can supply own BRNG or transformations

Five basic RNGs (BRNGs)

• MCG31, R250, MRG32, MCG59, WH

Non-Uniform RNGs

Gaussian (two methods)

Exponential

Laplace

Weibull

Cauchy

Rayleigh

Lognormal

Gumbel

Using VSL

Basically a 3-step Process

Create a stream pointer

VSLStreamStatePtr stream;

Create a stream

vslNewStream(&stream, VSL_BRNG_MCG31, seed);

Generate a set of random numbers

vsRngUniform(0, stream, size, out, start, end);

Delete a stream (optional)

vslDeleteStream(&stream);

Calculating Pi by Monte Carlo

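$\dfrac{\#\text{ of darts hitting circle}}{\#\text{ of darts in square}} = \dfrac{\tfrac{1}{4}\pi r^2}{r^2} = \dfrac{\pi}{4}$

$\pi = 4 \cdot \dfrac{\#\text{ of darts hitting circle}}{\#\text{ of darts in square}}$

[Figure: quarter circle of radius r inside a square of side r.]

Loop I = 1 to N_samples
    x.coor = random [0..1]
    y.coor = random [0..1]
    dist = sqrt (x^2 + y^2)
    if dist <= 1
        hits = hits + 1
Pi = 4 * hits / N_samples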
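The dart-throwing loop on this slide can be written as plain C without any library support. This is a minimal sketch: the generator `next_uniform` is a hypothetical stand-in (a 64-bit linear congruential generator with Knuth's MMIX constants), not the generator SciMark 2.0 or Intel MKL uses, and `estimate_pi` is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in generator: 64-bit linear congruential
   generator. Returns a double in [0, 1) built from the top 53 bits. */
static double next_uniform(uint64_t *state)
{
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (double)(*state >> 11) * (1.0 / 9007199254740992.0); /* 2^-53 */
}

/* Estimate pi by throwing n_samples darts at the unit square and
   counting those that land inside the quarter circle of radius 1. */
double estimate_pi(long n_samples, uint64_t seed)
{
    assert(n_samples > 0);
    uint64_t state = seed;
    long hits = 0;
    for (long i = 0; i < n_samples; i++) {
        double x = next_uniform(&state);
        double y = next_uniform(&state);
        if (x * x + y * y <= 1.0)   /* same condition as sqrt(...) <= 1 */
            hits++;
    }
    return 4.0 * (double)hits / (double)n_samples;
}
```

Comparing the squared distance against 1 avoids the square root without changing which darts count as hits. The statistical error shrinks roughly as one over the square root of the sample count, so a million samples typically gives two or three correct digits.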

Tuning the SciMark 2.0 MC Kernel

#include <math.h>
#include <mkl.h>

#define BLOCK_SIZE 1024   /* assumed value; not given on the slide */
#define SEED 113          /* assumed value; not given on the slide */

double MonteCarlo_integrate (int Num_samples)
{
    int i, j, blocks, under_curve = 0;
    static double rnBuf[2 * BLOCK_SIZE];
    double rnX, rnY;
    VSLStreamStatePtr stream;

    /* Assumes Num_samples is a multiple of BLOCK_SIZE */
    blocks = Num_samples / BLOCK_SIZE;
    vslNewStream (&stream, VSL_BRNG_MCG31, SEED);

    for (i = 0; i < blocks; i++)
    {
        /* Fill the buffer with BLOCK_SIZE (x, y) pairs in one call */
        vdRngUniform (VSL_METHOD_DUNIFORM_STD, stream,
                      (2 * BLOCK_SIZE), rnBuf, 0.0, 1.0);

        for (j = 0; j < BLOCK_SIZE; j++)
        {
            rnX = rnBuf[2*j];
            rnY = rnBuf[2*j+1];
            if (sqrt(rnX*rnX + rnY*rnY) <= 1.0) under_curve++;
        }
    }
    vslDeleteStream (&stream);

    return ((double) under_curve / Num_samples) * 4.0;
}

Tuning the SciMark 2.0 MC Kernel

[Figure: bar chart "SciMark 2.0 MC Kernel", MFLOPS (axis 0 to 1000); series: GNU baseline, Intel compiler, Intel MKL VSL, Intel MKL VSL + OpenMP.]

The Intel MKL VSL significantly improves performance over the original SciMark 2.0 MC source code.

Best SciMark 2.0 Single Node Performance: Small Problems

[Figure: bar chart of MFLOPS (axis 0 to 2000) for the FFT, SOR, MC, MV, and LU kernels and the composite score (Comp.), GNU vs. Intel.]

Small Problems (MFLOPS)

Kernel   GNU    Intel   Speedup
FFT      510    1817    3.6
SOR      524    1092    2.1
MC       206    1003    4.9
MV       857    832     1.0
LU       884    1827    2.1
Comp.    596    1314    2.2
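The composite (Comp.) figures in this table are consistent with a plain arithmetic mean of the five kernel scores, and the speedup column with the ratio of the Intel score to the GNU score. The sketch below checks that reading; the averaging rule is inferred from the numbers, not stated on the slide, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of count kernel scores (assumed composite definition). */
double mean_score(const double *scores, size_t count)
{
    assert(scores != NULL && count > 0);
    double sum = 0.0;
    for (size_t i = 0; i < count; i++)
        sum += scores[i];
    return sum / (double)count;
}

/* Speedup of a tuned score over a baseline score. */
double speedup(double baseline, double tuned)
{
    assert(baseline > 0.0);
    return tuned / baseline;
}
```

For the small problems, the Intel kernel scores 1817, 1092, 1003, 832, and 1827 average to 1314.2, and 1314 / 596 is about 2.2, matching the Comp. row.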

Best SciMark 2.0 Single Node Performance: Large Problems

[Figure: bar chart of MFLOPS (axis 0 to 2000) for the FFT, SOR, MC, MV, and LU kernels and the composite score (Comp.), GNU vs. Intel; the Intel LU bar runs off the axis and is labeled 6646.]

Large Problems (MFLOPS)

Kernel   GNU    Intel   Speedup
FFT      45     600     13.3
SOR      495    1015    2.1
MC       206    1003    4.9
MV       453    457     1.0
LU       392    6646    16.9
Comp.    318    1944    6.1

Intel® Cluster MKL

Intel Cluster MKL is a superset of MKL for solving large linear algebra problems on a cluster

Intel Cluster MKL contains:

• ScaLAPACK (Scalable LAPACK)

• BLACS (Basic Linear Algebra Communication Subprograms)

Supports MPICH and the Intel MPI Library

Data Layout Critical to Parallel Performance

ScaLAPACK uses 2D block-cyclic data distribution

Example layouts of lower triangular matrix for four processes

[Figure: three layouts of a lower triangular matrix over processes 0 to 3: 1D block distribution, 1D block-cyclic distribution, and 2D block-cyclic distribution. Load balance improves from poor (1D block) to better (2D block-cyclic).]
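The 2D block-cyclic rule itself is short: blocks of rows and columns are dealt out to the process grid in round-robin order. The sketch below shows the owner computation under stated assumptions: square blocks of size nb, the first block on process 0 in each dimension, and row-major numbering of ranks in the grid. The function names are hypothetical and are not ScaLAPACK routines.

```c
#include <assert.h>

/* Process coordinate that owns global index g along one dimension,
   for block size nb and nprocs processes in that dimension
   (first block assigned to process 0). */
int block_cyclic_owner(int g, int nb, int nprocs)
{
    assert(g >= 0 && nb > 0 && nprocs > 0);
    return (g / nb) % nprocs;
}

/* Rank, in assumed row-major grid numbering, of the process that
   owns element (i, j) on a prow x pcol process grid. */
int owner_rank_2d(int i, int j, int nb, int prow, int pcol)
{
    return block_cyclic_owner(i, nb, prow) * pcol
         + block_cyclic_owner(j, nb, pcol);
}
```

With 2 x 2 blocks on a 2 x 2 grid, element (0, 0) belongs to rank 0, (0, 2) to rank 1, (2, 0) to rank 2, (2, 2) to rank 3, and (4, 4) wraps around to rank 0. Because every process holds blocks from all regions of the matrix, each keeps a share of the work as a factorization shrinks the active submatrix, which is where the better load balance comes from.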

Parallelizing the SciMark 2.0 LU Kernel with Intel® Cluster MKL

1. Initialize the process grid

2. Create a descriptor for each distributed matrix

3. Replace the call to dgetrf with pdgetrf (the ‘p’ is for parallel)

Result: LU factorization of a 40000 x 40000 matrix on an 8-node, dual 3.0 GHz Xeon cluster achieves 46000 MFLOPS.
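The quoted rate can be turned into an approximate run time. This sketch assumes the usual leading-order operation count for LU factorization, 2n^3/3; the slide does not say which count was used, and the function names are hypothetical.

```c
#include <assert.h>

/* Approximate flop count of LU factorization of an n x n matrix,
   using the leading-order estimate 2n^3/3 (an assumption here). */
double lu_flops(double n)
{
    assert(n > 0.0);
    return 2.0 * n * n * n / 3.0;
}

/* Seconds needed to perform flops operations at a sustained rate
   given in MFLOPS (millions of operations per second). */
double seconds_at_rate(double flops, double mflops)
{
    assert(mflops > 0.0);
    return flops / (mflops * 1.0e6);
}
```

For n = 40000 the estimate is about 4.27e13 operations, so 46000 MFLOPS corresponds to roughly 930 seconds, a little over 15 minutes. If "dual" means two processors per node, the 16 processors average 46000 / 16 = 2875 MFLOPS each.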

Performance Libraries: Intel® MKL

What’s Been Covered

Intel® Math Kernel Library is a broad scientific/engineering math library

It is optimized for Intel® processors

It is threaded for effective use on multi-core and SMP machines

The Intel C++ Compiler for Linux improves SciMark 2.0 performance without requiring code modifications

With minor code modifications, Intel MKL dramatically improves the FFT, MC, and LU kernels

Some SciMark 2.0 kernels benefit from parallel computing

Useful Links

Intel Software Products

• http://www.intel.com/software/products/

Intel Software Network

• http://www.intel.com/software/

Intel Software College

• http://www.intel.com/software/college/

SciMark 2.0

• http://math.nist.gov/scimark2/index.html
