open source kan sneller, the startlog study
TRANSCRIPT
(C) ALbert Mietus, PTS. B&C: 12 April 2006. OpenSource kan sneller ("Open source can be faster"), 'The StartLog Study'
OpenSource kan sneller! (Open source can be faster!)
Historically, Linux has been used on systems where boot time is not relevant.
With EQSLinux it has become easy to start using embedded Linux.
Now we want to make Linux boot faster!
But Linux is complex ...
It always has a thousand solutions ...
And we are not even sure what the problem is ...
So, how do we solve it?
Time to investigate, to study, ...
Today
1. Is there a problem?
  Linux is a slow starter...
• How does Linux boot?
  Some theory and a few quick wins...
• How to measure?
  We need to KNOW, not hope or guess
  Measure, change, re-measure and compare!
• Results and solutions
  Not only for PTS' EQSLinux, but for all (embedded) Linux systems!
Typical Linux ...
Linux was never designed for embedded systems
Unix/Linux used to be a 'server'
  Boot time is not important, flexibility is
Linux@theDesktop is 'hot'
  People have got used to waiting for computers
  There is a lot of 'user-space' waiting
  'Building the desktop' takes a lot longer than the system boot
Example: booting one of our development systems takes 143 seconds!
  ☞ SuSE 9.1, with lots of HW
On most embedded systems, that is not acceptable
  We are used to milliseconds, not thousands of them!
Booting Linux is complex
Flexibility
  Unix brands, Linux distributions, system use
  ☞ E.g. banking versus SW development
  Hardware, systems
  Legacy run levels, compatibility
Lack of design vision
  Background & know-how of developers
  Some developers have other ideas, or "enrich" (don't understand) the concept
  Focus on 'C', 'desktop', 'new'
  No time left for: Makefiles, engineering, booting, embedded
Booting Linux Systems
[Figure: UP-ness (%) against time, across the phases POST (bios), BOOT (grub), Kernel (vmlinuz), modules (*.ko), Start-up (rc-scripts), up to READY (login/application)]
There are 5 phases
  With several sub-phases; some of them are time-outs!
  And thousands of steps
Example development system: 143 seconds!
How to measure?
Reliably measuring the boot steps is difficult
  All is SW-only. So, no external (scope) measurement
  Multiple domains, HW, SW & versions
  • E.g. bios; grub, uboot, redboot; Linux-kernel-2.*.*; scripts
  • Instrumenting is difficult
    – Can't change the BIOS, can hardly change 'BOOT' (there is no space!)
    – Lots & lots of code in a Linux system; fork() is architecture-dependent
  Time resolution: seconds or microseconds?
  • Some steps take less than 1 millisecond!
  • Every statement takes time! a_time() takes too much!
  • We need to see the 'big picture', not only details
Note: monitoring 143 s at ms resolution is (about) 500 meters of print-out!!
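The print-out estimate can be checked with a few lines of arithmetic. This is only a sketch: one log line per millisecond follows from the note, but the line pitch of 3.5 mm is an assumption (it is not on the sheet), chosen as a plausible height of one printed line.

```python
# Rough check of the "500 meters of print-out" estimate.
BOOT_SECONDS = 143        # from the sheet: the example development system
LINES_PER_SECOND = 1000   # one log line per millisecond
LINE_PITCH_MM = 3.5       # ASSUMPTION: height of one printed line, not from the sheet

lines = BOOT_SECONDS * LINES_PER_SECOND
length_m = lines * LINE_PITCH_MM / 1000.0

print(f"{lines} lines, roughly {length_m:.0f} m of paper")
```

With a different line pitch the length changes proportionally; the point is the order of magnitude.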
StartLog Concept
Make it fast
• Capture data in real time, transfer later
• Process data off-line
Make it simple
• Minimize the changes (simple to port to every Linux)
• For kernel, modules and start-up scripts alike
Make it reliable
• Measure the measurement
• Measure, change Linux, re-measure AND compare!
Result: StartLog (patch & post-processing)
Details: patch
Log system call 'exec'
  Starts a new program
  ☞ It will miss 'source *.sh'
  Filename is available
  Arch-independent (Linux)
Use 'dmesg' storage
  It's available
  Easy to read
  Increase size! (Watch for build bug)
Use 'jiffies' for time
  Counter (long unsigned integer)
  Can wrap; random start
  About 1 ms (most systems)
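do_execve(... filename ...) {
    static int PTS_startlog = 0;   /* sequence number of the logged exec */
    // int PTS_i;
    /* One log entry per exec: sequence number, jiffies timestamp, program file name */
    printk("@PTS@startlog=%07d,jiffies=%012lu, do_execve(%s)",
           PTS_startlog++, jiffies, filename);
    /* // Optional: also log the first arguments and environment strings
    for (PTS_i = 0; PTS_i < 9; PTS_i++) printk("%s,", argv[PTS_i]);
    printk(" ; ");
    for (PTS_i = 0; PTS_i < 9; PTS_i++) printk("%s,", envp[PTS_i]);
    printk(")\n");
    */
    ...
}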
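On the host side, the entries written by this printk have to be picked out of the dmesg dump. A minimal sketch in Python follows; the regular expression mirrors the format string of the patch, while the names `ExecRecord`, `LOG_RE` and `parse_line` and the sample values are illustrative assumptions, not part of StartLog.

```python
import re
from typing import NamedTuple, Optional

class ExecRecord(NamedTuple):
    seq: int        # value of PTS_startlog
    jiffies: int    # timestamp in jiffies
    filename: str   # program passed to do_execve

# Mirrors the printk format "@PTS@startlog=%07d,jiffies=%012lu, do_execve(%s)".
LOG_RE = re.compile(r"@PTS@startlog=(\d+),jiffies=(\d+), do_execve\(([^)]*)\)")

def parse_line(line: str) -> Optional[ExecRecord]:
    """Return the record found in one dmesg line, or None for other kernel messages."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    return ExecRecord(int(m.group(1)), int(m.group(2)), m.group(3))

# Illustrative line (made-up values, shaped like the printk output)
sample = "@PTS@startlog=0000012,jiffies=000004294937, do_execve(/bin/sh)"
record = parse_line(sample)
print(record)
```

Using `search` rather than `match` tolerates any prefix the kernel log may put in front of the message.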
Details: processing & Quality
Store log
  dmesg -s 512000 > aFile
  ftp/scp to host
Off-line
  Some awk script
  • Filter, recount jiffies, csv format
  Excel macros & graphing
  • X-Y (scatter) for timeline
  • Histogram (bar), #calls
  • Pie, for bottleneck
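The "recount jiffies, csv format" step can be sketched as follows. The sheets use an awk script for this; the Python below is a stand-in, not that script. It assumes that jiffies is a 32-bit counter on the target (an assumption; the sheets only say it is an unsigned long that can wrap and starts at a random value), and all names and sample values are hypothetical.

```python
import csv
import io

WRAP = 2 ** 32   # ASSUMPTION: jiffies is a 32-bit unsigned long on the target

def recount(jiffies_list, wrap=WRAP):
    """Return jiffies relative to the first sample, tolerating counter wrap."""
    if not jiffies_list:
        return []
    relative = [0]
    for prev, cur in zip(jiffies_list, jiffies_list[1:]):
        step = (cur - prev) % wrap      # modular difference survives a wrap
        relative.append(relative[-1] + step)
    return relative

def to_csv(records):
    """records: list of (seq, jiffies, filename) tuples -> CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["seq", "rel_jiffies", "program"])
    rel = recount([r[1] for r in records])
    for (seq, _, name), t in zip(records, rel):
        writer.writerow([seq, t, name])
    return out.getvalue()

# Illustrative records (made-up values); the counter wraps before the 3rd one
records = [(0, WRAP - 30, "/sbin/init"), (1, WRAP - 10, "/bin/sh"), (2, 15, "/bin/mount")]
csv_text = to_csv(records)
print(csv_text)
```

Starting every run at zero is what makes two boots comparable in one X-Y graph.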
Customisable
  It is easy to adapt to find each problem.
Influence (real-time)
  Each log: 200-300 µsec
  ☞ Depending on CPU!
  Near-linear slowdown
  Especially for interesting parts
  Accurate for slow steps!
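The cost of the measurement itself follows from the per-log figure. A small sketch: the 200-300 µs range is from the sheet, while the number of logged execs (2000) is an illustrative round number and `logging_overhead_s` is a hypothetical helper.

```python
def logging_overhead_s(n_logs, per_log_us=(200, 300)):
    """Return (low, high) total logging overhead in seconds for n_logs log entries."""
    low_us, high_us = per_log_us
    return n_logs * low_us / 1e6, n_logs * high_us / 1e6

low, high = logging_overhead_s(2000)   # 2000 logged execs: illustrative round number
print(f"overhead between {low:.1f} s and {high:.1f} s")
```

Because the overhead grows linearly with the number of logs, it weighs most on short, exec-heavy parts and hardly at all on slow steps.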
Repeatability
  Variation ∆jiffies: <5 (95%)
  Double, triple log for check
  Use series of three or more
  Example: when flashing memory, the first boot always takes a few extra seconds somewhere!
Environment
  Watch it! It has an effect on boot speed
  E.g. dhcp timeout!
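The repeatability check over a series of runs can be sketched like this. It is one possible reading of "variation ∆jiffies < 5": the spread of each step's relative timestamp across runs. The function names and the three runs are made up for illustration.

```python
def spread_per_step(runs):
    """runs: list of runs, each a list of relative jiffies (one per logged step).
    Returns max - min per step across the runs."""
    return [max(vals) - min(vals) for vals in zip(*runs)]

def fraction_within(spreads, limit=5):
    """Fraction of steps whose spread stays below `limit` jiffies."""
    return sum(1 for s in spreads if s < limit) / len(spreads)

# Three illustrative runs (made-up values) of the same four steps
runs = [
    [0, 110, 250, 900],
    [0, 112, 251, 903],
    [0, 111, 249, 1300],   # last step is an outlier, e.g. a timeout
]
spreads = spread_per_step(runs)
share = fraction_within(spreads)
print(spreads, share)
```

A step that falls outside the limit in only one run of the series points at the environment (first boot after flashing, a dhcp timeout) rather than at the system itself.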
Some Results
The following sheets show some results.
  They show what CAN be (& is) measured
  Some general examples are selected
  • Measurements are very project-specific
  • And are (often) very boring for others
  Often, the POST and BOOT phases are excluded
  • Not generic, not changeable
  • Not measured with StartLog (but can be measured and added to the graphs!)
  For that reason, details are not explained
  Please contact me directly/offline for your specific questions
All times (numbers) are in jiffies
All systems are NON-optimized!
A first impression
Booting and clocking the time does give some info, but
  Only limited information
  • No real phases (but close, for a 1st look)
  • What to improve?
  • Hard to explain: (where) is the system busy ('overloaded'), or waiting?
We need more detail! And concentrate on
  • Linux/OpenSource parts
  • Delays (system is doing nothing)
[Figure "Global": elapsed seconds (0 to 80) per phase (System "on", Boot choice, First log, Ready (login)); series: Trident, Gumstix, Start (HDD) and Start, each with and without network]
Some timelines
[Figure: starting-process number (0 to 300) against time (0 to 4000 jiffies); series: Gumstix (typical), Gumstix (network timeout)]
Gumstix: 2 timelines
  Exactly the same, but for
  • the horizontal line, which is
  • a network timeout (see next sheet)
  A lot fewer processes are started: only look at the % of the total number!
  It uses fewer modules than the Embedded PC systems
[Figure: starting-process number (0 to 2000) against time (0 to 35000 jiffies); series: EmbeddedPC-1 (start-1), EmbeddedPC-2 (start-3), EmbeddedPC-3 (HD instead of CF), EmbeddedPC-4 (TR-1)]
Embedded PC: 4 timelines
  Similar but different
  Notice:
  • Speed: fastest is yellow, slowest is purple
  • Horizontal lines: no progress!
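A horizontal line in such a timeline is a long gap between two consecutive exec logs, so it can also be found without a graph. A minimal sketch; `find_stalls`, the threshold and the sample timeline are hypothetical.

```python
def find_stalls(timestamps, min_gap):
    """Return (start, end, gap) for consecutive logs more than min_gap jiffies apart."""
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > min_gap:
            stalls.append((prev, cur, gap))
    return stalls

# Illustrative timeline (made-up jiffies of successive exec logs)
timeline = [100, 130, 150, 1750, 1760, 1800]
stalls = find_stalls(timeline, min_gap=500)   # the threshold is an arbitrary choice
print(stalls)
```

The program logged just before each stall is the first place to look: it is either busy for a long time or waiting on a timeout.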
Bottleneck && Network timeout
Normal (top)
  Kernel 23%
  Modules 35%
  Networking 28%
  *-mount 10% (S01 + S10 + S11)
Network timeout (bottom)
  Times are equal, but for S20networking
  When the dhcp-server is gone (no network, cable or server),
  it takes an extra 16 seconds to boot
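The 16 seconds can be traced back to the pie data of this sheet: the two charts differ in one large slice, 487 versus 2091 jiffies, presumably S20networking. The sketch below converts that difference to seconds. It assumes a tick rate of 100 jiffies per second on this board; that value is not stated on the sheets and is inferred here only because it makes the difference match the quoted 16 seconds.

```python
HZ = 100  # ASSUMPTION: jiffies per second on this board; not stated on the sheet

def jiffies_to_seconds(jiffies, hz=HZ):
    """Convert a jiffies count to seconds for a given tick rate."""
    return jiffies / hz

normal = 487      # slice value from the "Gumstix" pie
timeout = 2091    # corresponding slice from the "Gumstix-NetworkTimeout" pie
extra_s = jiffies_to_seconds(timeout - normal)
print(f"extra boot time: {extra_s:.2f} s")
```

On a system where a jiffy is about 1 ms the same conversion applies with `hz=1000`.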
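[Figure: two pie charts of time per boot step, in jiffies: "Gumstix" (top) and "Gumstix-NetworkTimeout" (bottom).
 Legend: /etc/rcS.d/S01mountvirtfs, /etc/rcS.d/S05module-init-tools, /etc/rcS.d/S10mountall, /etc/rcS.d/S11RAMdisk, /etc/rcS.d/S15hostname, /etc/rcS.d/S17sysklogd, /etc/rcS.d/S20networking, /etc/rcS.d/S25cron, /etc/rcS.d/S30thttpd, Kernel.
 Slice values, in the order extracted: Gumstix: 109, 613, 34, 16, 26, 487, 22, 25, 396, 150, 27; Gumstix-NetworkTimeout: 613, 34, 16, 26, 2091, 22, 22, 147, 27, 396, 109]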
Program Count
[Figure: bar chart "count programs" (0 to 140) with "log of count" on a second axis (0 to 5). Programs: /sbin/thttpd, /bin/egrep, /etc/rcS.d/S15hostname, /etc/rcS.d/S01mountvirtfs, /usr/sbin/modprobe, /bin/grep, /usr/bin/expr, /bin/mount, /etc/init.d/rcS, /sbin/modprobe, /bin/uname, /bin/login, /etc/rcS.d/S11RAMdisk, /sbin/dhcpcd, /sbin/hotplug, /bin/bash, /bin/sh, /etc/rcS.d/S25cron, /etc/rcS.d/S05module-init-tools]
[Figure: bar chart "Count (program)" (0 to 1400) with "Log of count" on a second axis (0 to 5). Programs: /Config/run/dhcp/dhcpcd.exe, /bin/dmesg, /bin/egrep, /bin/hostname, /bin/mount, /bin/uname, /etc/rcS.d/S01mountvirtfs, /etc/rcS.d/S10mountall, /etc/rcS.d/S15hostname, /etc/rcS.d/S20networking, /etc/rcS.d/S30thttpd, /sbin/getty, /sbin/ifconfig, /sbin/modprobe, /sbin/syslogd, /usr/bin/[, /usr/bin/expr, /usr/bin/utelnetd, /usr/sbin/crond, /usr/sbin/modprobe]
Often-exec'ed programs are good candidates to optimize
Found some surprises
• 'hotplug' is called directly by the kernel
• Even when it does not exist!
• Called 123 (aside) to 1315 (below) times!
Most: called a few times
Some: an awful lot
☞ Use a logarithmic 2nd axis
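Counting the exec'ed programs is a one-liner once the file names are extracted from the log. A minimal sketch: the count of 123 for hotplug is taken from the sheet, the other counts are made up, and the base-10 logarithm is an assumption (the sheets do not say which base the "log of count" axis uses).

```python
import math
from collections import Counter

def count_programs(filenames):
    """Count how often each program was exec'ed, most frequent first."""
    return Counter(filenames).most_common()

# Illustrative exec log: 123 hotplug calls as on the sheet, the rest made up
execs = ["/sbin/hotplug"] * 123 + ["/bin/sh"] * 10 + ["/bin/mount"] * 2 + ["/bin/login"]
table = [(name, n, math.log10(n)) for name, n in count_programs(execs)]
for name, n, log_n in table:
    print(f"{name:15s} {n:5d} {log_n:4.1f}")
```

The logarithmic column keeps programs that are called once or twice visible next to one that is called hundreds of times.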
Summary (1/4)
Problems & Solutions
Linux is a slow starter
  It needs more attention than a traditional RTOS
There are thousands of 'improvements' on the Net
  • Google on 'make embedded Linux boot faster': 1.4 million hits, of which 198 PowerPoint presentations in the past 3 months (excluding this one)
  • Usually they are (more or less 'good') ideas. But what do they improve? Or change?
  • Does it apply to your HW, system, version, ... too? Do you know your bottleneck?
  • How to measure that improvement?
Summary (2/4)
Booting Linux
Flexibility & power do come at a cost
• Embedded Linux boots a lot faster than 'normal Linux'
There is much more than 'the Linux kernel'
• At least 5 phases
• Thousands of steps
The 'environment' has influence
• dhcp example: 16 extra seconds without networking!
Linux is OpenSource ... You can change it!
There is more OpenSource than 'Linux' only
• Non-kernel stuff; other OSes (both Unix-like and others)
Summary (3/4)
StartLog
Capture, measure and visualize the Linux start-up
• Simple, reliable, repeatable
Cheap
• It is a concept, with little, free code
• Easy and fast to operate
  – For interpretation, Linux know-how is needed
Usable on all Linux versions
• So you can improve your system!
Improve: 'Measure, pin-point, change, re-measure'
Summary (4/4)
Generic Quick Wins
• Disable timeouts
• Disable/remove unneeded kernel modules
• Trade off time/space
  Uncompressed images are (usually) faster!
Advice
  Make specific (non-generic) boot scripts
  Use delayed/background processing
    E.g. start network (dhcp) late, background fsck (BSD only)
Measure and compare what is going on!
Questions and More info
Although the sheets are overloaded with info, it's only a fraction of what's available.
More info:
  Most patches & scripts are available
  This presentation is available
  – See the 'note-pages'! (Print hidden sheets!)
  See the website(s) for the latest versions.
Questions:
  http://www.PTS.nl
  http://www.EQSL.PTS.nl
  ☎ 035 6926969
  [email protected]
  http://albert.mietus.nl
  [email protected]
  http://www.PassieVoorTechniek.nl