Strategies to improve embedded Linux application performance beyond ordinary techniques

Post on 25-May-2015


DESCRIPTION

The common recipe for performance improvement is to profile an application, identify the most time-consuming routines, and then select them for optimization. Sometimes that is not enough. Developers may have to look inside the OS for performance improvement opportunities, or they might need to optimize code inside a third-party library whose source they cannot access. For those cases, other strategies must be used. This presentation reports the experiences of Motorola's Brazilian developers in reducing the startup time of an application on Motorola's MOTOMAGX embedded Linux platform. Most of the optimization was performed in the binary loading stage, prior to the execution of the entry point function. This endeavor required knowledge of the Linux ABI and the Linux loader, going beyond typical bottleneck searching. The presentation covers prelink, dynamic library loading, tuning of shared objects, and enhancing user experience. A live demo shows the use of prelink and other tools to improve the performance of general Linux platforms when libraries are used.

TRANSCRIPT

Strategies to improve embedded Linux applications' performance beyond ordinary techniques

Anderson Medeiros Software Engineer, Motorola

André Oriani Software Engineer, Motorola

Agenda

•  The performance problem faced by Motorola's IM team
•  Linux's Dynamic Loader
•  Prelink
•  Library Tools
•  Dynamically Loading Libraries
•  Tuning Shared Libraries
•  UI Time Perception
•  Q & A

Motivation

Our Problem

The basic recipe

Measure

Analyze

Optimize

Our Discovery

[Timeline along time axis t: User clicks → fork() → main() → Screen.show()]

DYNAMIC LOADER

Linux’s Dynamic Loader

Loading a dynamically linked program

[Flowchart: .interp → load dynamic linker → .dynamic → dependency libraries → symbol tables → relocation (.rel.text, .rel.data) → libraries' .init → program's entry point]

A closer look at relocation

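[Flowchart: how the dynamic linker processes one relocation]

•  Compute the offset and check the relocation type
•  Relative relocation: add the load address
•  Symbol-based relocation: compute the symbol's hash and select the hash bucket
    •  Match: adjust the address
    •  No match and the chain is not empty: try the next element
    •  Chain empty and the lookup scope is not empty: try the next object
    •  Lookup scope empty: the lookup failed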
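The symbol-based branch of the flowchart can be sketched in a few lines of C++. This is a toy model under stated assumptions, not the loader's actual code: `ToyObject` and `resolve` are hypothetical names, and the bucket/chain layout only imitates the classic ELF hash table. The hash function itself is the standard System V ELF hash.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Classic System V ELF hash of a symbol name.
std::uint32_t elf_hash(const std::string& name) {
    std::uint32_t h = 0;
    for (unsigned char c : name) {
        h = (h << 4) + c;
        std::uint32_t g = h & 0xf0000000u;
        if (g != 0) h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

// Hypothetical stand-in for one loaded object (not the real ELF structures).
// Symbol index 0 is reserved and marks the end of a chain.
struct ToyObject {
    std::uint32_t load_address;
    std::vector<std::string> names;
    std::vector<std::uint32_t> offsets;  // symbol offset inside the object
    std::vector<std::uint32_t> bucket;   // head symbol index per bucket
    std::vector<std::uint32_t> chain;    // next symbol index in the same bucket

    ToyObject(std::uint32_t base, std::uint32_t bucket_count)
        : load_address(base), names(1), offsets(1, 0),
          bucket(bucket_count, 0), chain(1, 0) {}

    void add(const std::string& name, std::uint32_t offset) {
        std::uint32_t index = static_cast<std::uint32_t>(names.size());
        names.push_back(name);
        offsets.push_back(offset);
        std::uint32_t b = elf_hash(name) % bucket.size();
        chain.push_back(bucket[b]);  // new symbol links to the previous head
        bucket[b] = index;
    }
};

// Walks the lookup scope object by object; inside each object it follows the
// hash chain and compares names. Returns 0 when the lookup fails.
std::uint32_t resolve(const std::vector<ToyObject>& scope, const std::string& name) {
    const std::uint32_t h = elf_hash(name);
    for (const ToyObject& object : scope) {
        std::uint32_t i = object.bucket[h % object.bucket.size()];
        for (; i != 0; i = object.chain[i]) {
            if (object.names[i] == name) {
                return object.load_address + object.offsets[i];  // adjust address
            }
        }
    }
    return 0;
}
```

Every symbol-based relocation pays for a hash computation plus string comparisons across the whole scope, which is why the later slides try to reduce the number of libraries, symbols, and relocations.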

prelink

Motivation

How does prelink work? I

•  Collects ELF binaries which should be prelinked and all the ELF shared libraries they depend on
•  Assigns a unique virtual address space slot for each library and relinks the shared library to that base address
•  Resolves all relocations in the binary or library against its dependent libraries and stores the relocations into the ELF object
•  Stores a list of all dependent libraries together with their checksums into the binary or library
•  For binaries, it also computes a list of conflicts and stores it into a special ELF section

Note: Libraries shall be compiled with the GCC option -fPIC

How does prelink work? II

•  At runtime, the dynamic linker first checks if it is prelinked itself
•  Just before starting an application, the dynamic linker checks if:
    •  There is a library list section created by prelink
    •  The listed libraries are present in the symbol search scope in the same order
    •  None of them has been modified since prelinking
    •  No new shared libraries are loaded either
•  If all conditions are satisfied, prelinking is used:
    •  The dynamic linker processes the fixup section and skips all normal relocation handling
•  If at least one condition fails:
    •  The dynamic linker continues with normal relocation processing in the executable and all shared libraries
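The checks listed above amount to comparing the library list recorded at prelink time with what is actually found at load time. The following is only a minimal sketch of that decision: `LibraryRecord` and `can_use_prelink` are hypothetical names, and an empty recorded list stands in for a missing library list section. It does not reflect the real data structures of the dynamic linker.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record of one dependency, as stored at prelink time.
struct LibraryRecord {
    std::string name;
    std::uint32_t checksum;
};

// Returns true only when the libraries found at load time match the recorded
// list: same libraries, same order, unchanged checksums, nothing extra.
bool can_use_prelink(const std::vector<LibraryRecord>& recorded,
                     const std::vector<LibraryRecord>& loaded) {
    if (recorded.empty()) return false;                   // no library list section
    if (recorded.size() != loaded.size()) return false;   // libraries added or missing
    for (std::size_t i = 0; i < recorded.size(); ++i) {
        if (recorded[i].name != loaded[i].name) return false;          // order differs
        if (recorded[i].checksum != loaded[i].checksum) return false;  // library modified
    }
    return true;
}
```

When the function returns false, the equivalent of the slow path applies: normal relocation processing for the executable and all shared libraries.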

Results

t

Library tools

How to use prelink?

•  prelink -avf --ld-library-path=PATH --dynamic-linker=LDSO
•  -a --all
    •  Prelink all binaries and dependent libraries found in directory hierarchies specified in /etc/prelink.conf
•  -v --verbose
    •  Verbose mode. Print the virtual address slot assignment to libraries
•  -f --force
    •  Force re-prelinking even for already prelinked objects for which no dependencies changed
•  --ld-library-path=PATH
    •  Specify special LD_LIBRARY_PATH to be used when prelink queries dynamic linker about symbol resolution details
•  --dynamic-linker=LDSO
    •  Specify alternate dynamic linker instead of the default

Dynamically Loading Libraries

Motivation

Motivation II

If there are any libraries you are going to use only on special occasions, it is better to load them when they are really needed.

The Basics

#include <dlfcn.h>

void* dlopen (const char* filename, int flags);
void* dlsym (void* handle, const char* symbol);
char* dlerror (void);
int dlclose (void* handle);

Although you don't have to link against the library, you still have to link against libdl:

# g++ main.cpp -ldl -o program
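To make "load them when they are really needed" concrete, here is a small sketch of a wrapper that defers dlopen() until the first symbol is requested. `LazyLibrary` is a hypothetical helper name, not part of libdl; only dlopen, dlsym, dlerror and dlclose are real API calls.

```cpp
#include <dlfcn.h>
#include <string>

// Hypothetical helper: opens a shared object on first use, not at startup.
class LazyLibrary {
public:
    explicit LazyLibrary(const std::string& path) : path_(path), handle_(nullptr) {}
    ~LazyLibrary() {
        if (handle_) dlclose(handle_);
    }
    LazyLibrary(const LazyLibrary&) = delete;
    LazyLibrary& operator=(const LazyLibrary&) = delete;

    bool is_loaded() const { return handle_ != nullptr; }
    const std::string& last_error() const { return error_; }

    // Returns the symbol's address, or nullptr on failure (see last_error()).
    void* symbol(const char* name) {
        error_.clear();
        if (!handle_) {
            handle_ = dlopen(path_.c_str(), RTLD_LAZY);  // the deferred load
            if (!handle_) {
                const char* message = dlerror();
                error_ = message ? message : "unknown dlopen error";
                return nullptr;
            }
        }
        dlerror();  // clear any stale error before calling dlsym
        void* address = dlsym(handle_, name);
        if (const char* message = dlerror()) {
            error_ = message;
            return nullptr;
        }
        return address;
    }

private:
    std::string path_;
    void* handle_;
    std::string error_;
};
```

Constructing a `LazyLibrary` costs nothing at startup; the loading and relocation work is paid only by the code path that actually calls `symbol()`.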

Loading C++ Libraries

C++ uses mangling!

math.cpp

int mod (int a, int b);
float mod (float a, float b);

math.o

_Z3modii
_Z3modff
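Mangled names can be translated back with the demangler that ships with GCC and Clang. The sketch below assumes a toolchain that provides `<cxxabi.h>`; `demangle` is a hypothetical wrapper name around the real `abi::__cxa_demangle` call.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Returns the human-readable form of a mangled C++ symbol name,
// or the input unchanged when it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return mangled;
    }
    std::string result(text);
    std::free(text);  // the buffer was allocated with malloc
    return result;
}
```

For example, `demangle("_Z3modii")` yields `mod(int, int)`. Because dlsym() looks symbols up by their exact name, a C++ function would have to be requested by its mangled name, which is why the solution on the next slides exports `extern "C"` functions instead.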

The example

#include <iostream>

class Foo {
public:
    Foo() {}
    ~Foo() {}
    void bar(const char* msg) {
        std::cout << "Msg:" << msg << std::endl;
    }
};

The solution

Step 1 Define an interface for your class.

[Class diagram: Foo with + Foo(), + ~Foo(), + void bar(const char*)]

[Class diagram: FooImpl (+ FooImpl(), + ~FooImpl(), + void bar(const char*)) implements the interface Foo (+ virtual void bar(const char*) = 0)]

The solution - Lib’s Header file

Step 1 Define an interface for your class

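#ifndef FOO_H__
#define FOO_H__

// Pure interface: the program only needs this declaration, not the implementation.
class Foo {
public:
    virtual ~Foo() {}
    virtual void bar(const char*) = 0;
};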

The solution - Lib’s Header file

Step 2 Create “C functions” to create and destroy instances of your class

Step 3 You might want to create typedefs

extern "C" Foo* createFoo();
extern "C" void destroyFoo(Foo*);

typedef Foo* (*createFoo_t)();
typedef void (*destroyFoo_t)(Foo*);

#endif

The solution - Lib’s Implementation file

Step 4 Implement your interface and “C functions”

#include "foo.h"
#include <iostream>

class FooImpl : public Foo {
public:
    FooImpl() {}
    virtual ~FooImpl() {}
    virtual void bar(const char* msg) {
        std::cout << "Msg: " << msg << std::endl;
    }
};

// Declared extern "C" in foo.h, so dlsym() can find them by their plain names.
Foo* createFoo() {
    return new FooImpl();
}

void destroyFoo(Foo* foo) {
    FooImpl* fooImpl = static_cast<FooImpl*>(foo);
    delete fooImpl;
}

The solution - The program

#include "foo.h"
#include <assert.h>
#include <dlfcn.h>

int main() {
    // Load the library only now, when it is actually needed.
    void* handle = dlopen("./libfoo.so", RTLD_LAZY);
    assert(handle);

    // Look up the factory function by its unmangled "C" name.
    createFoo_t dyn_createFoo = (createFoo_t)dlsym(handle, "createFoo");
    assert(!dlerror());

    Foo* foo = dyn_createFoo();
    if (foo)
        foo->bar("The method bar is being called");

    // Destroy the object inside the library that created it.
    destroyFoo_t dyn_destroyFoo = (destroyFoo_t)dlsym(handle, "destroyFoo");
    assert(!dlerror());
    dyn_destroyFoo(foo);

    dlclose(handle);
    return 0;
}
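One possible refinement, not shown in the slides, is to let a smart pointer call the library's destroy function automatically. The sketch below is self-contained, so it redeclares the `Foo` interface and uses local stand-ins (`LocalFoo`, `createLocalFoo`, `destroyLocalFoo`, all hypothetical) in place of the functions that would normally come from dlsym(). `FooHandle` and `adopt_foo` are also illustrative names.

```cpp
#include <memory>

// Same shape as the interface in foo.h, repeated here to keep the sketch standalone.
class Foo {
public:
    virtual ~Foo() {}
    virtual void bar(const char* msg) = 0;
};
typedef void (*destroyFoo_t)(Foo*);

// Owns a Foo and releases it through the supplied destroy function.
typedef std::unique_ptr<Foo, destroyFoo_t> FooHandle;

FooHandle adopt_foo(Foo* raw, destroyFoo_t destroy) {
    return FooHandle(raw, destroy);
}

// Local stand-ins for what createFoo()/destroyFoo() in the library would do.
int g_live_objects = 0;

class LocalFoo : public Foo {
public:
    void bar(const char*) override {}
};

Foo* createLocalFoo() {
    ++g_live_objects;
    return new LocalFoo();
}

void destroyLocalFoo(Foo* foo) {
    --g_live_objects;
    delete foo;
}
```

In a real program the handle would be built from the pointers returned by dlsym(), and it must go out of scope before dlclose() is called, because the destroy function lives inside the library.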

Tuning Shared Libraries

Inspiration

“How To Write Shared Libraries” Ulrich Drepper- Red Hat

http://people.redhat.com/drepper/dsohowto.pdf

Less is always better

Keep to a minimum…

•  The number of libraries you depend on, directly or indirectly
•  The size of the libraries you link against
•  The number of search directories for libraries, ideally one directory
•  The number of exported symbols
•  The length of symbol strings
•  The number of relocations

Search directories for libs

Reducing search space

Step 1 Set LD_LIBRARY_PATH to empty

Step 2 When linking, use the options:

•  -rpath-link <dir> to specify your system's directory for libraries
•  -z nodeflib to avoid searching /lib, /usr/lib and the other places specified by /etc/ld.so.conf and /etc/ld.so.cache

# export LD_LIBRARY_PATH=""
# g++ main.cpp -Wl,-z,nodeflib -Wl,-rpath-link,/lib -lfoo -o program

Reducing exported symbols

Using GCC’s attribute feature

int localVar __attribute__((visibility("hidden")));

int localFunction() __attribute__((visibility("hidden")));

class Someclass {
private:
    static int a __attribute__((visibility("hidden")));
    int b;
    int doSomething(int d) __attribute__((visibility("hidden")));

public:
    Someclass(int c);
    int doSomethingImportant();
};
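A common way to keep these attributes readable is to hide them behind macros. The sketch below assumes GCC or Clang; the macro names `EXAMPLE_API` and `EXAMPLE_LOCAL` and the two functions are hypothetical.

```cpp
// Hypothetical macro names wrapping GCC's visibility attribute.
#define EXAMPLE_API   __attribute__((visibility("default")))
#define EXAMPLE_LOCAL __attribute__((visibility("hidden")))

// Internal helper: not exported from the shared object, so it adds no entry
// to the dynamic symbol table.
EXAMPLE_LOCAL int clamp_to_byte(int value) {
    if (value < 0) return 0;
    if (value > 255) return 255;
    return value;
}

// Public entry point: the only symbol of this file that clients can link to.
EXAMPLE_API int brighten(int pixel, int amount) {
    return clamp_to_byte(pixel + amount);
}
```

With GCC's `-fvisibility=hidden` option the default is inverted, so only the functions marked `EXAMPLE_API` need an annotation.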

Reducing exported symbols II

You can tell the linker which symbols shall be exported using export maps

{
  global:
    cFunction*;

    extern "C++" {
      cppFunction*;
      *Someclass;
      Someclass::Someclass*;
      Someclass::~Someclass*;
      Someclass::method*;
    };

  local:
    *;
};

# g++ -shared -fPIC example.cpp -o libexample.so.1 -Wl,-soname=libexample.so.1 -Wl,--version-script=example.map

Pro and Cons

Pros

Visibility attribute
•  The compiler can generate optimal code

Export maps
•  More practical
•  Centralize the definition of the library's API

Cons

Visibility attribute
•  GCC-specific feature
•  Code becomes less readable

Export maps
•  No optimization can be done by the compiler because any symbol may be exported

Restricting symbol string length

namespace java {
namespace lang {
class Math {
    static const int PI;
    static double sin(double d);
    static double cos(double d);
    static double FastFourierTransform(double a, int b, const int** const c);
};
}
}

_ZN4java4lang4Math2PIE
_ZN4java4lang4Math3sinEd
_ZN4java4lang4Math3cosEd
_ZN4java4lang4Math20FastFourierTransformEdiPPKi

Avoiding relocations

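char* a = "ABC";
[Figure: the pointer a is stored in .data and points to the string "ABC\0" in .rodata, so the pointer needs a relocation at load time]

const char a[] = "ABC";
[Figure: the characters "ABC\0" are stored directly in .rodata, so no relocation is needed]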

UI Time perception

Motivation

[Illustration: two delivery offers with the same delivery time (X hours) and the same shipping cost ($), one with no tracking and one with package tracking]

Motivation II

Improving responsiveness

It is not always possible to optimize code because:

•  You might not have access to the problematic code;
•  It demands too much effort or it is too risky to change it;
•  There is nothing you can do (I/O latency, etc.);
•  Other reasons...

Can I postpone?

Loading plug-ins…

Can I parallelize?

Sending file…

Can I remove it?
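As one illustration of the "postpone" question, the sketch below defers an expensive step, here a made-up plug-in scan, until the first time its result is needed, so the first screen can be shown earlier. `PluginRegistry` and the loader callback are hypothetical and do not come from the presentation.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical registry that runs its loader on first access instead of at startup.
class PluginRegistry {
public:
    explicit PluginRegistry(std::function<std::vector<std::string>()> loader)
        : loader_(std::move(loader)), loaded_(false) {}

    bool loaded() const { return loaded_; }

    // The expensive load happens here, once, the first time plug-ins are requested.
    const std::vector<std::string>& plugins() {
        if (!loaded_) {
            plugins_ = loader_();
            loaded_ = true;
        }
        return plugins_;
    }

private:
    std::function<std::vector<std::string>()> loader_;
    std::vector<std::string> plugins_;
    bool loaded_;
};
```

This version is not thread-safe; if the load is also moved to a background thread to answer the "parallelize" question, access to the registry needs synchronization.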

In conclusion …

•  You learned that libraries may play an important role in the startup performance of your application;
•  You saw how dynamic linking works on Linux;
•  You were introduced to prelink and became aware of its potential to boost startup;
•  You learned how to load a shared object on demand, preventing some of them from being a burden at startup;
•  You got some tips on how to write libraries to get the best performance;
•  You understood that a UI that provides quick user feedback is more important than performance.

Q&A
