Developing Yocto Linux* targeted Apps using Intel Dev Tools
Feilong Huang
Technical Consulting Engineer
DPD/SSG
Legal Information
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm
Intel, VTune, Cilk, Atom and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others
Copyright© 2011 Intel Corporation. All rights reserved.
Executive Summary
• Yocto Linux* requires performance tools in addition to baseline development tools
• Intel® Embedded Software Development Tool Suite 2.3 for Intel® Atom™ Processor integrates Intel® Compiler with Yocto* Application Development Toolkit 1.1 (ADT).
• It provides Sampling Collector for Intel® VTune™ Amplifier XE validated for use on Yocto Linux*.
• System and Application Debuggers permit debug for all layers of the software stack
Agenda
• Intel® Embedded Software Development Tool Suite for Intel® Atom™ Processor
• Integrating Intel® C++ Compiler into ADT
• Using Intel® VTune™ Amplifier XE with Yocto Linux*
• Debugging Yocto Linux* Applications
• Summary
The Development Cycle
Intel® C++ Compiler
• SSSE3 Vectorization
• In-order scheduler
• Memory access optimization
Intel® Integrated Performance Primitives
• Intel® Application Debugger
• Intel® JTAG Debugger
• Intel® Flash Memory Tool
• Intel® VTune™ Amplifier XE
• Sampling Collector for Intel® VTune™ Amplifier XE (SEP)
• Intel® Debuggers
Tool Suite for all Phases of Development Design through Validation
Intel® Embedded Software Development Tool Suite for Intel® Atom™ Processor
Target OS: Linux*
Kernel debug; On-Chip trace & SMP run control
Identify optimization opportunities
Thread Specific Run Control & Thread Grouping
Broad Processor coverage CE4xxx, Z6xx, E6xx series
Intel® C++ Compiler & Intel® Atom™ Processor
• Optimization Switch -xSSE3_ATOM
– In-order scheduler
– IDIV DIVB expansion
– Arithmetic operations feeding addresses turned into LEAs
– All stack adjusts done using LEAs
– Support for movbe instruction
– Intel® Streaming SIMD Extensions 3 (SSE3) instruction support
• Compiler Based Vectorization and Automatic Processor Dispatch -ax[?]
– Single executable optimized for Intel® Atom™ processors and generic code that runs on all IA32 processors
– For each target processor it uses: processor-specific instructions, vectorization
– Low overhead, some increase in code size
Dedicated performance optimizations for the Intel® Atom™ Processor
Build Support for Cross-Build Environments
Embedded cross-build environments for Linux* tend to have varying install locations for
• Preprocessor defines
• GNU tools paths and names
• GNU startup files, C++ includes/runtime
• Location of target system headers and libraries
• The list of default libraries
Intel® C++ Compiler supports
• --sysroot
• Chroot/jailroot installs
• Detailed build environment definition via -platform=<name> (where name is the name of a user-editable environment file)
• Tested against Poky Linux*, MADDE*, CE Linux* SDK
• Yocto Linux* Application Development Toolkit Support
Compiler is flexible in meeting embedded build environment needs
Common Optimization Switches
                                                Windows*      Linux*
Disable optimization                            /Od           -O0
Optimize for speed (no code size increase)      /O1           -O1
Optimize for speed (default)                    /O2           -O2
High-level loop optimization                    /O3           -O3
Create symbols for debugging                    /Zi           -g
Multi-file inter-procedural optimization        /Qipo         -ipo
Profile guided optimization (multi-step build)  /Qprof-gen    -prof-gen
                                                /Qprof-use    -prof-use
Optimize for speed across the entire program    /fast         -fast
OpenMP 3.0 support                              /Qopenmp      -openmp
Automatic parallelization                       /Qparallel    -parallel
/fast is the same as: /O3 /Qipo /Qprec-div- /QxHost
-fast is the same as: -ipo -O3 -no-prec-div -static -xHost
High-Level Optimizer (HLO)
• Compiler switches: /O2, /O3 (Windows*), -O2, -O3 (Linux*)
• Loop level optimizations
– Loop unrolling, cache blocking, prefetching
• More aggressive dependency analysis
– Determines whether or not it's safe to reorder or parallelize statements
• Scalar replacement
– Goal is to reduce memory references by replacing them with register references
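Scalar replacement can be pictured with a hand-written before/after pair. This is only a sketch of the idea, not output from the Intel compiler; the function names sum_naive and sum_scalar_replaced are made up for illustration. The first version reads and writes out[0] through memory on every iteration, the second keeps the running sum in a local variable that can live in a register.

```c
#include <assert.h>
#include <stddef.h>

/* Before: the accumulator is a memory location that is loaded and stored
 * on every loop iteration. */
void sum_naive(const int *a, size_t n, int *out)
{
    assert(out != NULL);
    out[0] = 0;
    for (size_t i = 0; i < n; ++i)
        out[0] += a[i];
}

/* After: the accumulator is a local scalar; memory is written once.
 * This matches sum_naive only if out does not alias a, which is what
 * the dependency analysis has to establish before the compiler may
 * apply the transformation on its own. */
void sum_scalar_replaced(const int *a, size_t n, int *out)
{
    assert(out != NULL);
    int acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc += a[i];
    out[0] = acc;
}
```

Both functions give the same result whenever out points outside the array a. The aliasing case is where they can differ, which is why the transformation depends on the dependency analysis listed above.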
SIMD: Single Instruction Multiple Data
• Scalar processing
– traditional mode
– one operation produces one result
• SIMD processing
– one instruction produces multiple results
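[Figure: a scalar add, where one + takes X and Y and yields a single X + Y, beside a SIMD add, where one + takes (x3, x2, x1, x0) and (y3, y2, y1, y0) and yields (x3+y3, x2+y2, x1+y1, x0+y0)]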
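As a rough illustration of the difference, the loop below is written in plain scalar C. Each iteration performs one add and produces one result. A vectorizing compiler may, under suitable options, replace four such iterations with a single packed add, as in the SIMD case above. The function name add_arrays is hypothetical, and the restrict qualifiers are an assumption added here to tell the compiler the arrays do not overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Element-wise add written as scalar code: out[i] = x[i] + y[i].
 * The loop has no cross-iteration dependency, so it is a typical
 * candidate for compiler vectorization. */
void add_arrays(const float *restrict x, const float *restrict y,
                float *restrict out, size_t n)
{
    assert(x != NULL && y != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i)
        out[i] = x[i] + y[i];
}
```

Whether the loop is actually vectorized depends on the compiler, the switches used and the target processor; the source code stays the same either way.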
Interprocedural Optimizations (IPO)
• Interprocedural optimization performs a static, topological analysis of your application
• ip: Enables inter-procedural optimizations for current source file compilation
• ipo: Enables inter-procedural optimizations across files
– Can inline functions in separate files
– Especially many small utility functions benefit from IPO
Enabled optimizations:
• Procedure inlining (reduced function call overhead)
• Interprocedural dead code elimination, constant propagation and procedure reordering
• Enhances optimization when used in combination with other compiler features
Windows*   Linux*
/Qip       -ip
/Qipo      -ipo
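The kind of code that tends to benefit is a small utility function called from a hot loop. The sketch below keeps both functions in one listing so it is self-contained; in a real project they could sit in separate source files, which is the case where cross-file IPO would be needed to inline the call. The names clamp_u8 and saturate are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Small utility function (imagine it living in its own file, e.g. util.c).
 * Clamps a value to the range 0..255. */
int clamp_u8(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return v;
}

/* Caller (imagine it in a different file, e.g. image.c).
 * Without inlining, every element costs one function call. */
void saturate(int *pixels, size_t n)
{
    assert(pixels != NULL);
    for (size_t i = 0; i < n; ++i)
        pixels[i] = clamp_u8(pixels[i]);
}
```

If the two functions are compiled separately without IPO, the compiler sees only a declaration of clamp_u8 when it compiles the loop, so it cannot inline the body.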
Interprocedural Optimizations (IPO)
Compiling
Linux* icc -c -ipo main.c func1.c func2.c
Pass 1: mock object
Linking
Linux* icc -ipo main.o func1.o func2.o
Pass 2: executable
Interprocedural Optimizations
[Figure: Without IPO, file1.c, file2.c, file3.c and file4.c each go through their own Compile & Optimize step. With IPO, file1.c, file2.c, file3.c and file4.c go through one shared Compile & Optimize step]
-ip   Only between modules of one source file
-ipo  Modules of multiple files/whole application
Profile-Guided Optimizations (PGO)
• Static analysis leaves many questions open for the optimizer like:
– How often is x > y
– What is the size of count
– Which code is touched how often
if (x > y) do_this(); else do_that();
for (i = 0; i < count; ++i)
    do_work();
• Use execution-time feedback to guide (final) optimization
• Enhancements with PGO:
– More accurate branch prediction
– Basic block movement to improve instruction cache behavior
– Better decision of functions to inline (help IPO)
– Can optimize function ordering
– Switch-statement optimization
– Better vectorization decisions
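To make the "how often is x > y" question concrete, the sketch below counts by hand how often each side of the branch is taken over a data set. It only illustrates the kind of information an instrumented run gathers; it is not the instrumentation the compiler inserts, and branch_profile_t and profile_branch are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-rolled branch counters for the test "x > y". */
typedef struct {
    unsigned long taken;      /* number of times x > y was true  */
    unsigned long not_taken;  /* number of times x > y was false */
} branch_profile_t;

/* Walk the data set and record which way the branch goes each time.
 * A static optimizer cannot know this split; a profile run can. */
void profile_branch(const int *x, const int *y, size_t n,
                    branch_profile_t *prof)
{
    assert(x != NULL && y != NULL && prof != NULL);
    for (size_t i = 0; i < n; ++i) {
        if (x[i] > y[i])
            prof->taken++;
        else
            prof->not_taken++;
    }
}
```

With counts like these available, an optimizer could lay out the frequently taken path as the fall-through case. The split is only as representative as the data set used for the profile run.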
PGO Usage: Three Step Process
Step 1: Compile + link to add instrumentation
icc -prof-gen prog.c
Instrumented executable: prog.exe
Step 2: Execute instrumented program
prog.exe (on a typical dataset)
Dynamic profile: 12345678.dyn
Step 3: Compile + link using feedback
icc -prof-use prog.c
Merged .dyn files: pgopti.dpi
Optimized executable: prog.exe
Integrating Intel® C++ Compiler into ADT
• Fully automated via toolsuite installation script
• Warning message if insufficient access rights (root, sudo)
• Warning message if ADT installation in /opt/poky/1.1/ not present:
“Yocto* ADT has not been detected or its directory is not writable”
“The Yocto Project* Application Development Toolkit has not been detected on your system or its directory “/opt/poky/1.1” is not writable. This toolkit is required if you want to use the Intel® Composer XE for building Yocto Project* targeted applications. For automatic Intel® Composer XE integration with the Application Development Toolkit during installation, please make sure that the toolkit is installed and its directory is writable, and then re-check the prerequisites. For manual integration after installation, please consult the product Release Notes.”
Integrating Intel® C++ Compiler into ADT
/opt/intel/atom/composerxe/bin/ia32/yocto.env
*platform: yocto
*yocto_sdk_toolchain: %$(YOCTO_TOOLCHAIN)
*sysroot: %$(YOCTO_SYSROOT)
*target_root: %(sysroot)
*gcc_install: %(sysroot)/usr/lib/gcc/i586-poky-linux/4.5.1
*intel_include: %(intel_root)/../compiler/include
*intel_lib: %(intel_root)/../compiler/lib/ia32
*exec_path: %(yocto_sdk_toolchain)/i586-poky-linux
*exec_prefix: i586-poky-linux-
*gxx_include: %(sysroot)/usr/include/c++/i586-poky-linux/bits
*link_lib_path: %(intel_lib)%(path_separator)%(gcc_install)%(path_separator)%(sysroot)/lib%(path_separator)%(sysroot)/usr/lib:%(sysroot)/usr/lib/i586-poky-linux/4.5.1
*link_start_files: %(sysroot)/usr/lib/i586-poky-linux/4.5.1/crtbegin.o %(sysroot)/usr/lib/crti.o %(sysroot)/usr/lib/crtn.o %(sysroot)/usr/lib/crt1.o
*link_end_files: %(sysroot)/usr/lib/i586-poky-linux/4.5.1/crtend.o
Integrating Intel® C++ Compiler into ADT (cont.)
/opt/intel/atom/composerxe/bin/ia32/yocto.env
*link_default_libs: %{!static?%{i-dynamic|shared?-Bdynamic;-Bstatic}} -lsvml -limf \
    %{!static?-Bdynamic} -lm \
    %{!static?%{i-dynamic|shared?-Bdynamic;-Bstatic}} -lipgo -ldecimal \
    %{i_cxxlink? \
    %{cxxlib-gcc? \
    %{!static?%{i-static|static-libcxa?-Bstatic;-Bdynamic}} -lcxaguard}} \
    %{openmp-stubs?%{!static?%{i-static?-Bstatic;-Bdynamic}} -lompstub} \
    %{!static?%{i-dynamic|shared?-Bdynamic;-Bstatic}} %{pic-libirc?-lirc_pic;-lirc} \
    %{!static?-Bdynamic} -lc \
    %{cxxlib-gcc? \
    %{!cxxlib-nostd?%{!static?-Bdynamic} -lstdc++;%{!static?-Bdynamic} -lsupc++} \
    %{static|static-libgcc? \
    %{!static?-Bstatic} -lgcc -lgcc_eh; \
    %{!shared?%{!static?%{static-libgcc?-Bstatic;-Bdynamic}} -lgcc -lgcc_s}} \
    %{!static?-Bdynamic} -ldl -lc}
Integrating Intel® C++ Compiler into ADT (cont.)
/opt/poky/1.1/environment-setup-i586-poky-linux
export PATH=/opt/poky/1.1/sysroots/i686-pokysdk-linux/usr/bin:/opt/poky/1.0/sysroots/i686-pokysdk-linux/usr/bin/i586-poky-linux:$PATH
export PKG_CONFIG_SYSROOT_DIR=/home/roboima/test-yocto/x86
export PKG_CONFIG_PATH=/home/roboima/test-yocto/x86/usr/lib/pkgconfig
export CONFIG_SITE=/opt/poky/1.1/site-config-i586-poky-linux
export CC=icc
export CXX=icpc
export GDB=i586-poky-linux-gdb
export TARGET_PREFIX=i586-poky-linux-
export CONFIGURE_FLAGS="--target=i586-poky-linux --host=i586-poky-linux --build=i686-linux"
export CFLAGS="-march=i586 -platform=yocto"
export CXXFLAGS="-march=i586"
export LDFLAGS="--sysroot=/home/roboima/test-yocto/x86"
export CPPFLAGS=""
export POKY_NATIVE_SYSROOT="/opt/poky/1.1/sysroots/i686-pokysdk-linux"
export POKY_TARGET_SYSROOT="/home/roboima/test-yocto/x86"
export POKY_DISTRO_VERSION="1.1"
export POKY_SDK_VERSION="1.1"
export POKY_ACLOCAL_OPTS="-I /opt/poky/1.1/sysroots/i686-pokysdk-linux/usr/share/aclocal"
Remove --sysroot references since covered in *.env file
Integrating Intel® C++ Compiler into ADT (cont.)
~/.bashrc
source /opt/intel/composerxe/bin/compilervars.sh ia32
source /opt/poky/1.1/environment-setup-i586-poky-linux
export YOCTO_TOOLCHAIN=/opt/poky/1.1/sysroots/i686-pokysdk-linux/usr/bin
export YOCTO_SYSROOT=/home/roboima/test-yocto/x86
export POKY_ACLOCAL_OPTS="-I /opt/poky/1.1/sysroots/i686-pokysdk-linux/usr/share/aclocal"
Intel® VTune™ Amplifier XE in Embedded
Where is my application…
Spending Time? Wasting Time?
• Focus tuning on functions taking time
• See time on source
• See cache misses on your source
• See functions sorted by # of cache misses
• Linux host and targets
• Low overhead
• No special recompiles
Advanced Profiling for Scalable Performance
Intel® VTune™ Amplifier XE
.TB5 file
Sampling Collector
Host
Features
• Statistical analysis
• Low overhead sampling
• No instrumentation required
• Monitor processor events like cache misses etc.
• View results in source or assembly
Usage Model
• Two components
– Intel® VTune™ Amplifier XE on host
– Sampling Collector on the target
• Collect data on target and analyze it on the host
The Intel® VTune™ Amplifier XE helps identify
optimization opportunities in modules, functions or routines
Intel® VTune™ Amplifier XE in Embedded
Using Intel® VTune™ Amplifier XE Sampling Collector
1. Copy the Sampling Collector for Intel® VTune™ Amplifier XE located at ~/l_MID_DBG_p_2.2.xxx_amplifier_xe/rpm/sep34_linux_ia32.tar.gz onto the Intel® Atom™ Processor based target device running Linux*
2. On the target device unpack the sampling collector using the following command:
# tar -xvzf sep34_linux_ia32.tar.gz
3. Install, customize, and rebuild sampling driver following instructions in Release_Install_All.pdf
=> New customizable script cc-sep3-driver for SEP cross-build requirements for reduced feature Linux* target OSs (CE Linux, Embedded Linux without core package etc.). This script is only part of the Embedded Software Development Tool Suite SEP Drop
Build custom sampling collector for Yocto Project* on host, then deploy to target device
Sampling - How To Find Hotspots
• Pick an event to sample and configure PMU
– Cache misses, branch mis-predictions, Dependency/pipeline stalls
• Start SEP sampling routine and application
• Performance Monitoring Unit (PMU) periodically interrupts the processor
– Time based sampling
– Event based sampling
[Figure: PMU counter registers, Event Counter 1 to Event Counter 5, labelled General Purpose Event Registers and Dedicated Event Registers; each counter is marked "<0" and feeds an IRQ, with SEP acting as the ISR]
Collect
• Execution address in memory (CS:IP)
• OS process and thread ID
• Executable module loaded at that address
Write
• Information into *.TB5 file
• Numbers in counters define sampling rate
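The relationship between the counter value and the sampling rate can be sketched with a small simulation. This is a simplified model written for this transcript, not SEP code and not how the hardware counters are actually programmed. It assumes a counter that is loaded with a negative "sample after" value, counts up by one per event, and raises the interrupt when it reaches zero. The names profile_t, sample_trace and MAX_FUNCS are hypothetical, and a function index stands in for the sampled execution address.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FUNCS 8  /* hypothetical number of distinct code locations */

/* Sample histogram: samples[f] counts interrupts that landed in location f. */
typedef struct {
    unsigned long samples[MAX_FUNCS];
} profile_t;

/* Replay a trace of events. trace[i] is the location that was executing
 * when event i occurred. Every sample_after-th event triggers the
 * simulated interrupt, which records the current location and reloads
 * the counter. */
void sample_trace(const int *trace, size_t n, long sample_after,
                  profile_t *prof)
{
    assert(trace != NULL && prof != NULL && sample_after > 0);
    long counter = -sample_after;
    for (size_t i = 0; i < n; ++i) {
        if (++counter == 0) {                 /* simulated counter overflow */
            int f = trace[i];
            if (f >= 0 && f < MAX_FUNCS)
                prof->samples[f]++;           /* what the ISR would record  */
            counter = -sample_after;          /* reload for the next sample */
        }
    }
}
```

A smaller sample_after value gives more samples and more overhead, a larger one gives fewer samples and a coarser picture. In this model the histogram only approximates the true distribution of events across locations.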
Intel® Debugger for Linux*
Cross-Debug Solution with advanced thread awareness
Menu & Toolbars (installation default)
Start / Stop / Run / Step control
Evaluation Windows
ASM, Register, Memory, Vector Register Windows
Breakpoint, Call Stack, Thread Windows
Two ways of selecting an application to debug
Option 1: LOAD – the application will be loaded and the arguments passed to the app as specified in dialog.
Option 2: ATTACH to a running process by selecting it from the list.
Vector registers
Vector evaluation
Select and drag & drop.
Assembly window
The ASM window will be displayed automatically if there is no source code available.
Intel style assembly
AT&T style assembly
Breakpoint dialog
Code breakpoint
Data breakpoint
Multi threading support
Control which thread to debug
In the Threads window you can select which thread you would like to debug. The context menu allows you to ‘freeze’ and ‘thaw’ individual threads. When you stop or hit a breakpoint, all threads are stopped, and when you continue, all threads run.
Summary / Call to Action
• Intel® Embedded Software Development Tool Suite already includes everything needed to support Yocto Linux
• Integration into ADT is simple and automated
• First class performance tools and debug tools from Intel are already available for the Yocto project
• Get a license and give it a try
Optimization Notice
37 3/23/2012