TRANSCRIPT

Optimizing Code by Selecting Compiler Flags using Parallel Genetic Algorithm on Multicore CPUs
by: Fatemeh Karimi, Spring 2015
Introduction Background Methodology Experimental results
Optimizing Code using Parallel Genetic Algorithm (93/3/5)
An overview
• Introduction
• Background
• Methodology
• Experimental results
• Conclusion
Compiler optimization
• Compiler optimization is the technique of minimizing or maximizing some features of an executable code by tuning the output of a compiler.
• Modern compilers support many different optimization phases; these phases should analyze the code and produce semantically equivalent, performance-enhanced code.
• The vital parameters defining the enhancement of performance include:
  - Execution time
  - Size of code
Introduction
The phase ordering
• The compiler optimization phase ordering poses challenges not only to the compiler developer but also to the multithreaded programmer seeking to enhance the performance of multicore systems.
• Many compilers have numerous optimization techniques, which are applied in a predetermined order.
• These orderings of optimization techniques may not always give optimal code.
[Figure: the search space of phase orderings over the code]
Optimization flags
• The best optimization phase ordering varies with the application being compiled, the architecture of the machine on which it runs, and the compiler implementation.
• Many compilers allow optimization flags to be set by the users.
• Turning on optimization flags makes the compiler attempt to improve performance and code size at the expense of compilation time.
GNU Compiler Collection
• The GNU Compiler Collection (GCC) includes front ends for C, C++, Objective-C, Fortran, Java, Ada, and Go, as well as libraries for these languages.
• In order to control compilation time and compiler memory usage, and the trade-offs between speed and space for the resulting executable, GCC provides a range of general optimization levels, numbered 0-3, as well as individual options for specific types of optimization.
[Figure: the nested optimization levels O1, O2, O3]
Background
Optimization levels
• The impact of the different optimization levels on the input code is described below.

-O0 (or no -O; the default)
• No optimization is performed when translating source code into object code.
• Easy bug elimination, since the object code corresponds directly to the source.
Optimization levels: -O1 (or -O)
• Applies a lot of simple optimizations and eliminates redundancy when translating source code into object code.
• Less compile time; smaller and faster executable code.
Optimization levels: -O2
• Only optimizations that do not require any speed-space trade-offs are used, so the executable should not increase in size.
• Applies everything in -O1 plus additional optimizations, such as instruction scheduling.
• Maximum optimization without increasing the executable size.
• More compile time and more memory usage.
• The best choice for deployment of a program.
Optimization levels: -O3
• Applies everything in -O1 and -O2 plus more expensive optimizations, such as function inlining.
• Faster executable code; maximum loop optimization.
• Bulky code.
Optimization levels
[Figure: summary of the optimization levels]
The challenge
• Which optimization level should be chosen, e.g. for a sequential quick sort versus a parallel quick sort?
• The parallel version introduces the overhead of inter-process communication, so the same level need not suit both.
Genetic algorithm
• Initial population → Selection → Intermediate population (mating pool) → Crossover & mutation → Replacement → Next population
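The cycle above can be sketched as a minimal generational loop (an illustrative sketch only; the operator and parameter choices used in this work appear in the Methodology slides):

```python
def evolve(population, fitness, select, crossover, mutate, generations):
    """Minimal generational GA loop matching the cycle above."""
    for _ in range(generations):
        # Selection: build the intermediate population (mating pool).
        pool = [select(population, fitness) for _ in population]
        # Crossover & mutation: pair parents, produce offspring.
        offspring = []
        for a, b in zip(pool[::2], pool[1::2]):
            c1, c2 = crossover(a, b)
            offspring += [mutate(c1), mutate(c2)]
        # Replacement: offspring form the next population.
        population = offspring
    return population
```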
PGA for compiler optimization
• The work in this research uses the GCC 4.8 compiler on Ubuntu 12.04 with the OpenMP 3.0 library.
Methodology
The master-slave model
• In the master-slave model the master runs the evolutionary algorithm, controls the slaves, and distributes the work.
• The slaves take batches of individuals from the master, evaluate them, and finally send the calculated fitness values back to the master.
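A minimal sketch of this split, assuming Python's multiprocessing as the parallel backend (the actual system uses OpenMP on C code, and the fitness body here is a placeholder):

```python
from multiprocessing import Pool

def fitness(chromosome):
    # Placeholder: the real system would compile the benchmark with the
    # flags this chromosome encodes, run it, and time the execution.
    return float(sum(chromosome))

def master_evaluate(population, slaves=4):
    """The master farms individuals out to slave processes and collects
    the computed fitness values."""
    with Pool(slaves) as pool:
        return pool.map(fitness, population)
```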
Encoding
• Each chromosome is a bit string in which each gene switches one optimization flag on (1) or off (0), e.g.:
  1 1 0 1 0 1 1 1 0 1
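Under this encoding, a chromosome can be decoded into a compiler command line. A sketch, with an illustrative flag list that is not the paper's actual set:

```python
# Illustrative flag subset: real GCC options, but chosen as an example;
# the paper's actual flag list is not reproduced here.
FLAGS = ["-ffast-math", "-funroll-loops", "-ftree-vectorize",
         "-finline-functions", "-fomit-frame-pointer"]

def decode(chromosome):
    """A 1 at gene i turns flag i on; a 0 leaves it off."""
    return [flag for bit, flag in zip(chromosome, FLAGS) if bit]

def gcc_command(chromosome, source="bench.c", out="bench"):
    """Build (but do not run) the compile command for one chromosome."""
    return ["gcc", *decode(chromosome), "-o", out, source]
```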
Fitness function
• In the proposed system the PGA works with a population of six chromosomes on an eight-core machine, and the fitness function is computed at the master core:

  Fitness_i = |exe_with_flag_i - exe_without_flag_i|
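A minimal sketch of this scoring, with the compile-and-time step stubbed out as a callable (run_with_flags and baseline_time are illustrative names, not from the paper):

```python
def fitness(exe_with_flags, exe_without_flags):
    """Fitness_i = |exe_with_flag_i - exe_without_flag_i|."""
    return abs(exe_with_flags - exe_without_flags)

def evaluate(population, run_with_flags, baseline_time):
    """Master-side scoring: time each chromosome's binary (stubbed out
    here) and score it against the unoptimized baseline run."""
    return [fitness(run_with_flags(c), baseline_time) for c in population]
```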
[Diagram: the master node generates the random population and evaluates all individuals; the slave nodes run the GA operators; the algorithm terminates after 200 generations]
Algorithm for slave nodes
Step 1: Receive all the chromosomes, with their fitness values, from the master node.
Step 2: Apply the roulette wheel, stochastic universal sampling, and elitism selection methods, in parallel across the slave cores.
Step 3: Create the next generation by applying two-point crossover.
Step 4: Apply mutation by the two-position-interchange method, producing two new offspring chromosomes.
Step 5: Send both chromosomes back to the master node (the master collects chromosomes from all slaves).
Selection
[Figure: the selection methods (roulette wheel, stochastic universal sampling, elitism)]
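Of the three methods named in the slave algorithm, roulette-wheel (fitness-proportionate) selection can be sketched as:

```python
import random

def roulette_wheel(population, fitnesses):
    """Fitness-proportionate selection: an individual is picked with
    probability proportional to its fitness."""
    total = sum(fitnesses)
    if total == 0:
        return random.choice(population)  # all-zero fitness: uniform pick
    spin = random.uniform(0, total)
    running = 0.0
    for individual, f in zip(population, fitnesses):
        running += f
        if spin < running:
            return individual
    return population[-1]  # guard against floating-point round-off
```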
Crossover and mutation
• Two-point crossover
• Swap mutation (two-position interchange)
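Both operators can be sketched as follows (an illustrative sketch; the cut points and loci are chosen uniformly at random, which the slides do not specify):

```python
import random

def two_point_crossover(p1, p2):
    """Exchange the segment between two random cut points."""
    i, j = sorted(random.sample(range(1, len(p1)), 2))
    return p1[:i] + p2[i:j] + p1[j:], p2[:i] + p1[i:j] + p2[j:]

def swap_mutation(chromosome):
    """Two-position interchange: swap the genes at two random loci."""
    c = list(chromosome)
    i, j = random.sample(range(len(c)), 2)
    c[i], c[j] = c[j], c[i]
    return c
```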
Benchmarks
• All the benchmark programs are parallelized using the OpenMP library to reap the benefits of the PGA.
Experimental results
Performance analysis
• As the figures show, the results after applying the PGA (WGAO) present a major improvement over random optimization (WRO) and over compiling the code without optimization (WOO).
Performance analysis
[Figures: per-benchmark performance comparison charts]
Conclusion
• In compiler optimization research, phase ordering is an important performance-enhancement problem.
• This study indicates that, with the PGA, the performance of the benchmark programs increases as the number of cores increases.
• The major concern in the experiment is the master core's waiting time to collect values from the slaves, which is primarily due to the use of synchronized communication between the master and slave cores.
• Further, it may be noted that, apart from the PRIMS algorithm on the 8-core system, all other benchmarks exhibit better average performance.
Thanks for your attention.
References
[1] Satish Kumar T., Sakthivel S., and Sushil Kumar S., "Optimizing Code by Selecting Compiler Flags using Parallel Genetic Algorithm on Multicore CPUs," International Journal of Engineering and Technology, vol. 32, no. 5, 2014.
[2] Prathibha B., Sarojadevi H., and Harsha P., "Compiler Optimization: A Genetic Algorithm Approach," International Journal of Computer Applications, vol. 112, no. 10, 2015.