Strategies to improve embedded Linux applications’ performance beyond ordinary techniques

Anderson Medeiros, Software Engineer, Motorola

André Oriani, Software Engineer, Motorola

Upload: andre-oriani

Post on 25-May-2015


DESCRIPTION

The common recipe for performance improvement is to profile an application, identify the most time-consuming routines, and select them for optimization. Sometimes that is not enough. Developers may have to look inside the OS for performance improvement opportunities, or they might need to optimize code inside a third-party library whose source they cannot access. In those cases, other strategies are needed. This presentation reports the experiences of Motorola's Brazilian developers in reducing the startup time of an application on Motorola's MOTOMAGX embedded Linux platform. Most of the optimization was performed in the binary loading stage, prior to the execution of the entry point function. This endeavor required knowledge of the Linux ABI and the Linux loader, going beyond typical bottleneck searching. The presentation covers prelink, dynamic library loading, tuning of shared objects, and enhancing user experience. A live demo shows the use of prelink and other tools to improve the performance of general Linux platforms when libraries are used.

TRANSCRIPT

Page 1: Strategies to improve embedded Linux application performance beyond ordinary techniques

Strategies to improve embedded Linux applications’ performance

beyond ordinary techniques

Anderson Medeiros Software Engineer, Motorola

André Oriani Software Engineer, Motorola

Page 2: Strategies to improve embedded Linux application performance beyond ordinary techniques

Agenda

•  The performance problem faced by Motorola’s IM team
•  Linux’s Dynamic Loader
•  Prelink
•  Library Tools
•  Dynamically Loading Libraries
•  Tuning Shared Libraries
•  UI Time Perception
•  Q & A

Page 3: Strategies to improve embedded Linux application performance beyond ordinary techniques

Motivation

Page 4: Strategies to improve embedded Linux application performance beyond ordinary techniques

Our Problem

Page 5: Strategies to improve embedded Linux application performance beyond ordinary techniques

The basic recipe

Measure

Analyze

Optimize

Page 6: Strategies to improve embedded Linux application performance beyond ordinary techniques

Our Discovery

[Timeline figure along time axis t: User clicks → fork() → DYNAMIC LOADER → main() → Screen.show()]

Page 7: Strategies to improve embedded Linux application performance beyond ordinary techniques

Linux’s Dynamic Loader

Page 8: Strategies to improve embedded Linux application performance beyond ordinary techniques

Loading a dynamically linked program

[Flowchart: read .interp → load dynamic linker → read .dynamic → load dependency libraries and symbol tables → relocation (.rel.text, .rel.data) → libraries’ .init → program’s entry point]
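The result of these loading steps can be observed from inside a running process. The following is a minimal sketch, assuming glibc and a compiler that defines _GNU_SOURCE by default (as g++ does); it uses dl_iterate_phdr from <link.h> to list the objects the dynamic loader has already mapped. The function names collectObject and loadedObjects are chosen here for illustration only.

```cpp
#include <link.h>    // dl_iterate_phdr, struct dl_phdr_info (glibc)
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Callback invoked once per loaded object; stores the object's name.
static int collectObject(struct dl_phdr_info* info, std::size_t, void* data) {
    assert(data != nullptr);
    auto* names = static_cast<std::vector<std::string>*>(data);
    // The main executable is reported with an empty name.
    const bool hasName = info->dlpi_name != nullptr && info->dlpi_name[0] != '\0';
    names->push_back(hasName ? info->dlpi_name : "(main program)");
    return 0;  // returning zero keeps the iteration going
}

// Returns the names of the objects currently mapped by the dynamic loader.
std::vector<std::string> loadedObjects() {
    std::vector<std::string> names;
    dl_iterate_phdr(collectObject, &names);
    return names;
}
```

Printing the returned names at the top of main() shows every dependency that had to be loaded and relocated before the entry point ran.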

Page 9: Strategies to improve embedded Linux application performance beyond ordinary techniques

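A closer look at relocation

[Flowchart: for each relocation, compute the offset and check the type.
•  Relative: add the load address.
•  Symbol-based: compute the symbol’s hash and go to the hash bucket. If the element matches, adjust the address. If it does not match and the chain is not empty, try the next element. If the chain is empty, move to the next object in the lookup scope. If the lookup scope is empty, the lookup failed.]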
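The symbol-based branch of this flow can be sketched as a toy model. This is not the dynamic linker's code: ToyObject, addSymbol and lookupSymbol are hypothetical names, and the tables are plain vectors rather than ELF sections. Only elfHash follows the classic System V ELF hash function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// The classic System V ELF symbol hash function.
std::uint32_t elfHash(const std::string& name) {
    std::uint32_t h = 0;
    for (unsigned char c : name) {
        h = (h << 4) + c;
        const std::uint32_t g = h & 0xf0000000u;
        if (g != 0) h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

// Hypothetical, simplified model of one loaded object's symbol hash table.
struct ToyObject {
    std::uintptr_t loadAddress;
    std::vector<std::string> names{""};     // index 0 marks the end of a chain
    std::vector<std::uintptr_t> values{0};  // symbol value before relocation
    std::vector<std::uint32_t> bucket;      // first symbol index per bucket
    std::vector<std::uint32_t> chain{0};    // next symbol index in the same bucket

    ToyObject(std::uintptr_t base, std::size_t nbucket)
        : loadAddress(base), bucket(nbucket, 0) { assert(nbucket > 0); }

    void addSymbol(const std::string& name, std::uintptr_t value) {
        const std::uint32_t index = static_cast<std::uint32_t>(names.size());
        std::uint32_t& head = bucket[elfHash(name) % bucket.size()];
        names.push_back(name);
        values.push_back(value);
        chain.push_back(head);  // the new symbol points at the previous chain head
        head = index;
    }
};

// Walks the lookup scope in order; returns true and sets `address` on a match.
bool lookupSymbol(const std::vector<ToyObject>& scope, const std::string& name,
                  std::uintptr_t& address) {
    const std::uint32_t hash = elfHash(name);                // symbol's hash
    for (const ToyObject& object : scope) {                  // next object
        std::uint32_t i = object.bucket[hash % object.bucket.size()];  // hash bucket
        for (; i != 0; i = object.chain[i]) {                // next element
            if (object.names[i] == name) {                   // match
                address = object.loadAddress + object.values[i];  // adjust address
                return true;
            }
        }
    }
    return false;                                            // lookup failed
}
```

Every symbol-based relocation repeats this walk, so the cost grows with the number of objects in the scope, the chain lengths, and the length of the names being compared.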

Page 10: Strategies to improve embedded Linux application performance beyond ordinary techniques

prelink

Page 11: Strategies to improve embedded Linux application performance beyond ordinary techniques

Motivation

Page 12: Strategies to improve embedded Linux application performance beyond ordinary techniques

How does prelink work? I

•  Collects ELF binaries which should be prelinked and all the ELF shared libraries they depend on
•  Assigns a unique virtual address space slot to each library and relinks the shared library to that base address
•  Resolves all relocations in the binary or library against its dependent libraries and stores the relocations into the ELF object
•  Stores a list of all dependent libraries together with their checksums into the binary or library
•  For binaries, it also computes a list of conflicts and stores it into a special ELF section

Note: Libraries must be compiled with the GCC option -fPIC
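To make the idea of a unique virtual address space slot concrete, here is a toy sketch. It is not prelink's actual algorithm: assignSlots and alignUp are hypothetical helpers, and the library names, sizes, and starting address are made-up inputs. It only shows consecutive, page-aligned, non-overlapping base addresses being handed out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Rounds `value` up to the next multiple of `alignment` (a power of two).
std::uint64_t alignUp(std::uint64_t value, std::uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

// Assigns consecutive, non-overlapping base addresses starting at `firstBase`.
// `libraries` holds (name, size in bytes) pairs; all values are hypothetical.
std::map<std::string, std::uint64_t> assignSlots(
        const std::vector<std::pair<std::string, std::uint64_t>>& libraries,
        std::uint64_t firstBase, std::uint64_t pageSize = 4096) {
    std::map<std::string, std::uint64_t> bases;
    std::uint64_t next = alignUp(firstBase, pageSize);
    for (const auto& library : libraries) {
        bases[library.first] = next;                        // this library's slot
        next = alignUp(next + library.second, pageSize);    // start of the next slot
    }
    return bases;
}
```

Because no two slots overlap, each library can be relocated once, ahead of time, for the address it will actually be loaded at.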

Page 13: Strategies to improve embedded Linux application performance beyond ordinary techniques

How does prelink work? II

•  At runtime, the dynamic linker first checks if it is prelinked itself
•  Just before starting an application, the dynamic linker checks if:
    •  There is a library list section created by prelink
    •  They are present in symbol search scope in the same order
    •  None have been modified since prelinking
    •  There aren’t any new shared libraries loaded either
•  If all conditions are satisfied, prelinking is used:
    •  Dynamic linker processes the fixup section and skips all normal relocation handling
•  If at least one condition fails:
    •  Dynamic linker continues with normal relocation processing in the executable and all shared libraries
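The startup check can be pictured with a small sketch. This is a simplified model under stated assumptions, not the dynamic linker's implementation: LibraryRecord and canUsePrelink are hypothetical names, and a name plus checksum stands in for whatever the library list section really records.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record of one dependency: its name and a checksum.
struct LibraryRecord {
    std::string name;
    std::uint32_t checksum;
    bool operator==(const LibraryRecord& other) const {
        return name == other.name && checksum == other.checksum;
    }
};

// Returns true when the prelinked relocations can be used as they are.
bool canUsePrelink(const std::vector<LibraryRecord>& recordedAtPrelink,
                   const std::vector<LibraryRecord>& loadedNow) {
    if (recordedAtPrelink.empty()) return false;  // no library list section
    // Same libraries, same order, unchanged checksums, and nothing extra loaded.
    return recordedAtPrelink == loadedNow;
}
```

When the function returns false, the model corresponds to the fallback path: normal relocation processing for the executable and all shared libraries.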

Page 14: Strategies to improve embedded Linux application performance beyond ordinary techniques

Results

[Timeline figure along time axis t]

Page 15: Strategies to improve embedded Linux application performance beyond ordinary techniques

Library tools

Page 16: Strategies to improve embedded Linux application performance beyond ordinary techniques

How to use prelink?

•  prelink -avf --ld-library-path=PATH --dynamic-linker=LDSO
    •  -a --all
        •  Prelink all binaries and dependent libraries found in directory hierarchies specified in /etc/prelink.conf
    •  -v --verbose
        •  Verbose mode. Print the virtual address slot assignment to libraries
    •  -f --force
        •  Force re-prelinking even for already prelinked objects for which no dependencies changed
    •  --ld-library-path=PATH
        •  Specify special LD_LIBRARY_PATH to be used when prelink queries dynamic linker about symbol resolution details
    •  --dynamic-linker=LDSO
        •  Specify alternate dynamic linker instead of the default

Page 17: Strategies to improve embedded Linux application performance beyond ordinary techniques

Dynamically Loading Libraries

Page 18: Strategies to improve embedded Linux application performance beyond ordinary techniques

Motivation

Page 19: Strategies to improve embedded Linux application performance beyond ordinary techniques

Motivation II

If there are any libraries you are going to use only on special occasions, it is better to load them when they are really needed.

Page 20: Strategies to improve embedded Linux application performance beyond ordinary techniques

The Basics

#include <dlfcn.h>

void* dlopen  (const char* filename, int flags);
void* dlsym   (void* handle, const char* symbol);
char* dlerror (void);
int   dlclose (void* handle);

#echo Although you don’t have to link against the library
#echo you still have to link against libdl
#
#g++ main.cpp -ldl -o program
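As a minimal sketch of how these four calls combine into on-demand loading, the class below opens a library only when the function is first called. LazyMathFunction is a hypothetical name, and the usage note assumes a glibc system where the math library is available as libm.so.6.

```cpp
#include <dlfcn.h>
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Loads `library` on first use and resolves one double(double) function.
class LazyMathFunction {
public:
    LazyMathFunction(std::string library, std::string symbol)
        : library_(std::move(library)), symbol_(std::move(symbol)) {}
    ~LazyMathFunction() { if (handle_) dlclose(handle_); }
    LazyMathFunction(const LazyMathFunction&) = delete;
    LazyMathFunction& operator=(const LazyMathFunction&) = delete;

    bool loaded() const { return handle_ != nullptr; }

    double operator()(double x) {
        if (!function_) load();  // pay the loading cost only when first needed
        return function_(x);
    }

private:
    using Function = double (*)(double);

    void load() {
        if (!handle_) handle_ = dlopen(library_.c_str(), RTLD_LAZY);
        if (!handle_) throw std::runtime_error(dlerror());
        dlerror();  // clear any stale error before calling dlsym
        void* address = dlsym(handle_, symbol_.c_str());
        const char* error = dlerror();
        if (error) throw std::runtime_error(error);
        if (!address) throw std::runtime_error("symbol resolved to a null address");
        function_ = reinterpret_cast<Function>(address);
    }

    std::string library_, symbol_;
    void* handle_ = nullptr;
    Function function_ = nullptr;
};
```

For example, LazyMathFunction cosine("libm.so.6", "cos"); costs nothing at startup, and the library is opened only when cosine(0.0) is first evaluated.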

Page 21: Strategies to improve embedded Linux application performance beyond ordinary techniques
Page 22: Strategies to improve embedded Linux application performance beyond ordinary techniques

Loading C++ Libraries

C++ uses mangling!

math.cpp

int mod (int a, int b);
float mod (float a, float b);

math.o

_Z3modii
_Z3modff
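Mangled names can be decoded again, which helps when checking what string has to be passed to dlsym. The sketch below assumes GCC or Clang, which provide abi::__cxa_demangle in <cxxabi.h>; the wrapper name demangle is chosen here for illustration.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC and Clang)
#include <cassert>
#include <cstdlib>
#include <string>

// Returns the readable form of a mangled name, or the input unchanged
// when it cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* readable = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) return mangled;
    std::string result(readable);
    std::free(readable);  // the buffer was allocated with malloc
    return result;
}
```

With the two overloads of mod(), demangle("_Z3modii") yields mod(int, int) and demangle("_Z3modff") yields mod(float, float). Functions declared extern "C" keep their plain names, which is why they are convenient entry points for dlsym.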

Page 23: Strategies to improve embedded Linux application performance beyond ordinary techniques

The example

#include <iostream>

class Foo {
public:
    Foo() {}
    ~Foo() {}
    void bar(const char* msg) {
        std::cout << "Msg:" << msg << std::endl;
    }
};

Page 24: Strategies to improve embedded Linux application performance beyond ordinary techniques

The solution

Step 1 Define an interface for your class.

[Class diagram: class Foo with + Foo(), + ~Foo(), + void bar(const char*)]

Page 25: Strategies to improve embedded Linux application performance beyond ordinary techniques

The solution

Step 1 Define an interface for your class.

[Class diagram: FooImpl (+ FooImpl(), + ~FooImpl(), + void bar(const char*)) implements Foo <<interface>> (+ virtual void bar(const char*) = 0)]

Page 26: Strategies to improve embedded Linux application performance beyond ordinary techniques

The solution - Lib’s Header file

Step 1 Define an interface for your class

#ifndef FOO_H__
#define FOO_H__

class Foo {
public:
    virtual void bar(const char*) = 0;
};

Page 27: Strategies to improve embedded Linux application performance beyond ordinary techniques

The solution - Lib’s Header file

Step 2 Create “C functions” to create and destroy instances of your class

extern "C" Foo* createFoo();
extern "C" void destroyFoo(Foo*);

Step 3 You might want to create typedefs

typedef Foo* (*createFoo_t)();
typedef void (*destroyFoo_t)(Foo*);

#endif

Page 28: Strategies to improve embedded Linux application performance beyond ordinary techniques

The solution - Lib’s Implementation file

Step 4 Implement your interface and “C functions”

#include "foo.h"
#include <iostream>

class FooImpl : public Foo {
public:
    FooImpl() {}
    virtual ~FooImpl() {}
    virtual void bar(const char* msg) {
        std::cout << "Msg: " << msg << std::endl;
    }
};

// Declared extern "C" in foo.h, so these names are not mangled
Foo* createFoo() {
    return new FooImpl();
}

void destroyFoo(Foo* foo) {
    FooImpl* fooImpl = static_cast<FooImpl*>(foo);
    delete fooImpl;
}

Page 29: Strategies to improve embedded Linux application performance beyond ordinary techniques

The solution - The program

#include "foo.h"
#include <assert.h>
#include <dlfcn.h>

int main() {
    // Load the library only now, when it is actually needed
    void* handle = dlopen("./libfoo.so", RTLD_LAZY);
    assert(handle);

    // Look up the factory function by its unmangled name
    createFoo_t dyn_createFoo = (createFoo_t)dlsym(handle, "createFoo");
    assert(!dlerror());
    Foo* foo = dyn_createFoo();
    if (foo)
        foo->bar("The method bar is being called");

    // Destroy the object inside the library that created it
    destroyFoo_t dyn_destroyFoo = (destroyFoo_t)dlsym(handle, "destroyFoo");
    assert(!dlerror());
    dyn_destroyFoo(foo);

    dlclose(handle);
    return 0;
}

Page 30: Strategies to improve embedded Linux application performance beyond ordinary techniques

Tuning Shared Libraries

Page 31: Strategies to improve embedded Linux application performance beyond ordinary techniques

Inspiration

“How To Write Shared Libraries”, Ulrich Drepper, Red Hat

http://people.redhat.com/drepper/dsohowto.pdf

Page 32: Strategies to improve embedded Linux application performance beyond ordinary techniques

Less is always better

Keep to a minimum…

•  The number of libraries you directly or indirectly depend on
•  The size of the libraries you link against
•  The number of search directories for libraries, ideally one directory
•  The number of exported symbols
•  The length of symbol strings
•  The number of relocations

Page 33: Strategies to improve embedded Linux application performance beyond ordinary techniques

Search directories for libs

Page 34: Strategies to improve embedded Linux application performance beyond ordinary techniques

Reducing search space

Step 1 Set LD_LIBRARY_PATH to empty

Step 2 When linking use the options:
•  -rpath-link <dir> to specify your system’s directory for libraries
•  -z nodeflib to avoid searching /lib, /usr/lib and other places specified by /etc/ld.so.conf and /etc/ld.so.cache

#export LD_LIBRARY_PATH=""
#g++ main.cpp -Wl,-z,nodeflib -Wl,-rpath-link,/lib -lfoo -o program

Page 35: Strategies to improve embedded Linux application performance beyond ordinary techniques

Reducing exported symbols

Using GCC’s attribute feature

int localVar __attribute__((visibility("hidden")));

int localFunction() __attribute__((visibility("hidden")));

class Someclass {
private:
    static int a __attribute__((visibility("hidden")));
    int b;
    int doSomething(int d) __attribute__((visibility("hidden")));

public:
    Someclass(int c);
    int doSomethingImportant();
};
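A common way to keep the attribute readable is to hide it behind macros. The following is a sketch only: EXAMPLE_API, EXAMPLE_LOCAL, scaleInternal and exampleDouble are hypothetical names, and the fallback branch simply drops the attribute on compilers that do not define __GNUC__.

```cpp
#include <cassert>

// Hypothetical macro names; the attribute itself is GCC/Clang syntax.
#if defined(__GNUC__)
#  define EXAMPLE_API   __attribute__((visibility("default")))
#  define EXAMPLE_LOCAL __attribute__((visibility("hidden")))
#else
#  define EXAMPLE_API
#  define EXAMPLE_LOCAL
#endif

// Internal helper: not exported from the shared object.
EXAMPLE_LOCAL int scaleInternal(int value) { return value * 2; }

// Public entry point: exported and callable by clients of the library.
EXAMPLE_API int exampleDouble(int value) { return scaleInternal(value); }
```

When the library is built with GCC's -fvisibility=hidden option, hidden becomes the default and only the functions marked with the export macro need any annotation.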

Page 36: Strategies to improve embedded Linux application performance beyond ordinary techniques

Reducing exported symbols II

You can tell the linker which symbols shall be exported using export maps

{
global:
    cFunction*;

    extern "C++" {
        cppFunction*;
        *Someclass;
        Someclass::Someclass*;
        Someclass::~Someclass*;
        Someclass::method*;
    };

local:
    *;
};

#g++ -shared -fPIC example.cpp -o libexample.so.1 -Wl,-soname=libexample.so.1 -Wl,--version-script=example.map

Page 37: Strategies to improve embedded Linux application performance beyond ordinary techniques

Pros and Cons

Pros

•  Visibility attribute
    •  Compiler can generate optimal code
•  Export maps
    •  More practical
    •  Centralizes the definition of the library’s API

Cons

•  Visibility attribute
    •  GCC-specific feature
    •  Code becomes less readable
•  Export maps
    •  No optimization can be done by the compiler because any symbol may be exported

Page 38: Strategies to improve embedded Linux application performance beyond ordinary techniques

Restricting symbol string’s length

namespace java {
namespace lang {
class Math {
    static const int PI;
    static double sin(double d);
    static double cos(double d);
    static double FastFourierTransform(double a, int b, const int** const c);
};
}
}

_ZN4java4lang4Math2PIE
_ZN4java4lang4Math3sinEd
_ZN4java4lang4Math3cosEd
_ZN4java4lang4Math20FastFourierTransformEdiPPKi
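One way to see why long, similar names cost time: whenever the loader compares two candidate names, it has to walk their shared prefix before it finds a difference. The helper below, with the hypothetical name commonPrefixLength, measures that prefix for the mangled names above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of leading characters that two symbol names share.
std::size_t commonPrefixLength(const std::string& a, const std::string& b) {
    std::size_t n = 0;
    while (n < a.size() && n < b.size() && a[n] == b[n]) ++n;
    return n;
}
```

For _ZN4java4lang4Math3sinEd and _ZN4java4lang4Math3cosEd the function returns 19, so 19 characters are compared before the names diverge. Shorter namespace and class names shrink both the string table and this shared prefix.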

Page 39: Strategies to improve embedded Linux application performance beyond ordinary techniques

Avoiding relocations

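[Diagram: where the string ends up in the ELF file]

char* a = "ABC";
•  The characters A B C \0 are stored in .rodata and the pointer a is stored in .data; the pointer needs a relocation at load time

const char a[] = "ABC";
•  The characters A B C \0 are stored directly in .rodata; there is no pointer to relocate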
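The same idea extends to tables of strings. This is a sketch with made-up data (kNamesByPointer, kNamesInline and colorName are hypothetical names): in position-independent code, an array of pointers needs one relocation per element, while an array of fixed-width character rows holds the bytes itself and needs none.

```cpp
#include <cassert>
#include <cstddef>

// Array of pointers: in a shared object each element needs a relocation,
// because the address of every string is only known at load time.
const char* const kNamesByPointer[] = {"red", "green", "blue"};

// Fixed-width rows: the characters live inside the array itself, so there
// are no pointers to relocate. Each row holds up to 5 characters plus '\0'.
const char kNamesInline[][6] = {"red", "green", "blue"};

// Returns the name stored in the relocation-free table.
const char* colorName(std::size_t index) {
    assert(index < sizeof(kNamesInline) / sizeof(kNamesInline[0]));
    return kNamesInline[index];
}
```

The trade-off is padding: every row is as wide as the longest string, so this layout fits best when the strings have similar lengths.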

Page 40: Strategies to improve embedded Linux application performance beyond ordinary techniques

UI Time perception

Page 41: Strategies to improve embedded Linux application performance beyond ordinary techniques

Motivation

[Figure: two shipping options with the same delivery time (X hours) and the same cost ($), one with no tracking and one with package tracking]

Page 42: Strategies to improve embedded Linux application performance beyond ordinary techniques

Motivation II

Page 43: Strategies to improve embedded Linux application performance beyond ordinary techniques

Improving responsiveness

It is not always possible to optimize code because:

•  You might not have access to the problematic code;
•  It demands too much effort or it is too risky to change;
•  There is nothing you can do (I/O latency, etc.);
•  Other reasons...

Page 44: Strategies to improve embedded Linux application performance beyond ordinary techniques

Can I postpone?

loading Plug-Ins …

Page 45: Strategies to improve embedded Linux application performance beyond ordinary techniques

Can I postpone?

Loading plug-ins
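Postponing can be as simple as queuing the work and running it once the first screen is visible. The class below is a minimal sketch; DeferredTasks is a hypothetical helper, and how runNext() gets called (an idle handler, a timer) depends on the UI toolkit in use.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical helper: collects work that can wait until the UI is visible.
class DeferredTasks {
public:
    void postpone(std::function<void()> task) { tasks_.push(std::move(task)); }

    // Call after the UI is shown, for example from an idle handler.
    // Runs one task per call so the UI can handle input between tasks.
    bool runNext() {
        if (tasks_.empty()) return false;
        std::function<void()> task = std::move(tasks_.front());
        tasks_.pop();
        task();
        return true;
    }

    bool empty() const { return tasks_.empty(); }

private:
    std::queue<std::function<void()>> tasks_;
};
```

Plug-in loading would be registered with postpone() during startup and executed later, one plug-in per runNext() call, so the user sees the application window first.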

Page 46: Strategies to improve embedded Linux application performance beyond ordinary techniques

Can I parallelize?

Page 47: Strategies to improve embedded Linux application performance beyond ordinary techniques

Can I parallelize?

Sending file…
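Running the slow operation on another thread lets the UI keep giving feedback. The sketch below uses std::async from the C++ standard library; sendFile is a stand-in that only sleeps, and sendWithFeedback is a hypothetical name. A real application would refresh a progress indicator where the counter is incremented.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Stand-in for a slow operation such as sending a file (hypothetical).
int sendFile(int bytes) {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return bytes;
}

// Starts the transfer on another thread and keeps "updating the UI" meanwhile.
// Returns how many UI updates happened while the transfer was running.
int sendWithFeedback(int bytes, int& bytesSent) {
    std::future<int> transfer = std::async(std::launch::async, sendFile, bytes);
    int uiUpdates = 0;
    while (transfer.wait_for(std::chrono::milliseconds(10)) !=
           std::future_status::ready) {
        ++uiUpdates;  // e.g. advance a progress animation
    }
    bytesSent = transfer.get();
    return uiUpdates;
}
```

The total time is the same as before, but the user sees continuous progress instead of a frozen screen.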

Page 48: Strategies to improve embedded Linux application performance beyond ordinary techniques

Can I remove it?

Page 49: Strategies to improve embedded Linux application performance beyond ordinary techniques

In conclusion …

•  You learned that libraries may play an important role in the startup performance of your application;
•  You saw how dynamic linking works on Linux;
•  You were introduced to prelink and became aware of its potential to boost the startup;
•  You learned how to load a shared object on demand, so that rarely used libraries are not a burden at startup;
•  You got some tips on how to write libraries to get the best performance;
•  You understood that a UI that provides quick user feedback is more important than performance.

Page 50: Strategies to improve embedded Linux application performance beyond ordinary techniques

Q&A