![Page 1: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/1.jpg)
Ineffective and effective way to find out
latency bottlenecks by Ftrace
16 Feb, 2012
Yoshitake KobayashiAdvanced Software Technology Group
Corporate Software Engineering CenterTOSHIBA CORPORATION
Copyright 2012, Toshiba Corporation.
![Page 2: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/2.jpg)
Agenda
� About Ftrace
� Problem definition
� Some actual examples to fix latency issue
� Conclusion
2Yoshitake Kobayashi - Embedded Linux Conference 2012 -
![Page 3: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/3.jpg)
About Ftrace
� Ftrace is a lightweight internal tracer� Event tracer
� Function tracer
� Latency tracer
� Stack tracer
� The trace log stay in the kernel ring buffer
3Yoshitake Kobayashi - Embedded Linux Conference 2012 -
� The trace log stay in the kernel ring buffer
� Documentation available in kernel source tree� Documentation/trace/ftrace.txt
� Documentation/trace/ftrace-design.txt
![Page 4: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/4.jpg)
About Ftrace
� Ftrace is a lightweight internal tracer� Event tracer
� Function tracer
� Latency tracer
� Stack tracer
� The trace log stay in the kernel ring buffer
4Yoshitake Kobayashi - Embedded Linux Conference 2012 -
� The trace log stay in the kernel ring buffer
� Documentation available in kernel source tree� Documentation/trace/ftrace.txt
� Documentation/trace/ftrace-design.txt
![Page 5: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/5.jpg)
Definition of latency
� Latency is a measure of time delay experienced in a system, the precise definition of which depends on the system and the time being measured. Latencies may have different meaning in different contexts.
* http://en.wikipedia.org/wiki/Latency_(engineering)
� In this presentation
5Yoshitake Kobayashi - Embedded Linux Conference 2012 -
time
expected timeto wake up
actual timeto wake up
latency
![Page 6: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/6.jpg)
Problem definition
� Evaluate the scheduling latency by Cyclictest*…. but
0
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
0 200 400 600 800 1000
latency(μ秒)
回数
Latency (us)
Num
ber
of s
ampl
es
23ms
Why?
6Yoshitake Kobayashi - Embedded Linux Conference 2012 -
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
latency (us)
Num
ber
of s
ampl
es
Why?
*Cyclictest (RTwiki): https://rt.wiki.kernel.org/articles/c/y/c/Cyclictest.html
![Page 7: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/7.jpg)
Problem definition
� Evaluate the scheduling latency by Cyclictest*…. but
0
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
0 200 400 600 800 1000
latency(μ秒)
回数
Latency (us)
Num
ber
of s
ampl
es
23ms
Why?
7Yoshitake Kobayashi - Embedded Linux Conference 2012 -
� Need to identify the cause of above problems
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
latency (us)
Num
ber
of s
ampl
es
Why?
*Cyclictest (RTwiki): https://rt.wiki.kernel.org/articles/c/y/c/Cyclictest.html
![Page 8: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/8.jpg)
Problem definition
� Evaluate the scheduling latency by Cyclictest*…. but
0
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
0 200 400 600 800 1000
latency(μ秒)
回数
Latency (us)
Num
ber
of s
ampl
es
23ms
Why?
Case 1
Case 2
8Yoshitake Kobayashi - Embedded Linux Conference 2012 -
� Need to identify the cause of above issues
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
latency (us)
Num
ber
of s
ampl
es
Why?
Case 2
*Cyclictest (RTwiki): https://rt.wiki.kernel.org/articles/c/y/c/Cyclictest.html
![Page 9: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/9.jpg)
First mistake
� Incorrect use of trace-cmd
# trace-cmd start –p function_graph ; target_prog ; trace-cmd stop
� The target_prog start with higher priority and stop if it missed
9Yoshitake Kobayashi - Embedded Linux Conference 2012 -
deadline
![Page 10: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/10.jpg)
First mistake
� Incorrect use of trace-cmd
# iostress ‘50%’ &
# trace-cmd start –p function_graph ; target_prog ; trace-cmd stop
� The target_prog start with higher priority and stop if it missed
SCHED_OTHER SCHED_FIFO SCHED_OTHER
10Yoshitake Kobayashi - Embedded Linux Conference 2012 -
deadline
� Ftrace record the logs in the kernel ring buffer� The older data just overwritten by new data
� Evaluate with too much workload is not a good idea
� Need to care about scheduling priority� Run a shell with higher priority
� Stop logging in the target_prog
![Page 11: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/11.jpg)
An example for stop tracing (Cyclictest)
� rt-tests/src/cyclictest/cyclictest.c
� Cyclictest has following option“-b USEC” : send break trace command when latency > USEC
if (!stopped && tracelimit && (diff > tracelimit)) {stopped++;tracing(0);shutdown++;
11Yoshitake Kobayashi - Embedded Linux Conference 2012 -
shutdown++;pthread_mutex_lock(&break_thread_id_lock);if (break_thread_id == 0)
break_thread_id = stat->tid;break_thread_value = diff;pthread_mutex_unlock(&break_thread_id_lock);
}
![Page 12: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/12.jpg)
Case1: Evaluation environment
� CPU: Intel Pentium 4(2.66GHz)
� Memory: 512MB
� Kernel: Linux 2.6.31.12-rt21� echo -1 > /proc/sys/kernel/sched_rt_runtime_us
12Yoshitake Kobayashi - Embedded Linux Conference 2012 -
![Page 13: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/13.jpg)
Case1: Summary of evaluation result
� Evaluated without other CPU work load Cycle: 300us
0
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
0 200 400 600 800 1000
latency(μ秒)
回数
0
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
0 200 400 600 800 1000
latency(μ秒)
回数
0
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
0 200 400 600 800 1000
latency(μ秒)
回数
Cycle: 500us Cycle: 1000us
Latency (us)
Num
ber
of s
ampl
es
Latency (us)
Num
ber
of s
ampl
es
Latency (us)
Num
ber
of s
ampl
es13Yoshitake Kobayashi - Embedded Linux Conference 2012 -
Cycle [us]
Min.Ave. Max Number of DL
misses
300 18.118 19.162 25.629 0500 18.171 19.269 32.615 01000 17.935 19.361 27207.563 1
![Page 14: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/14.jpg)
Case1: Check the trace log (First attempt)
� Stop the Cyclictest if the evaluated latency is higher than the given maximum� Backward search for sys_clock_gettime() function
� Check the log for sys_rt_sigtimedwait() function
5586.207717 | 0) | 5586.207717 | 0) | 5586.207717 | 0) | 5586.207717 | 0) | activate_task() {activate_task() {activate_task() {activate_task() {
5586.207718 | 0) | enqueue_task() {5586.207718 | 0) | enqueue_task() {5586.207718 | 0) | enqueue_task() {5586.207718 | 0) | enqueue_task() {
5586.207718 | 0) | enqueue_task_rt() {5586.207718 | 0) | enqueue_task_rt() {5586.207718 | 0) | enqueue_task_rt() {5586.207718 | 0) | enqueue_task_rt() {
5586.207718 | 0) | cpupri_set() {5586.207718 | 0) | cpupri_set() {5586.207718 | 0) | cpupri_set() {5586.207718 | 0) | cpupri_set() {
14Yoshitake Kobayashi - Embedded Linux Conference 2012 -
5586.207719 | 0) 0.407 us | _atomic_spin_lock_irqsave();5586.207719 | 0) 0.407 us | _atomic_spin_lock_irqsave();5586.207719 | 0) 0.407 us | _atomic_spin_lock_irqsave();5586.207719 | 0) 0.407 us | _atomic_spin_lock_irqsave();
5586.207720 | 0) 0.448 us | _atomic_spin_unlock_irqrestore();5586.207720 | 0) 0.448 us | _atomic_spin_unlock_irqrestore();5586.207720 | 0) 0.448 us | _atomic_spin_unlock_irqrestore();5586.207720 | 0) 0.448 us | _atomic_spin_unlock_irqrestore();
5586.207721 | 0) 0.430 us | _atomic_spin_lock_irqsave();5586.207721 | 0) 0.430 us | _atomic_spin_lock_irqsave();5586.207721 | 0) 0.430 us | _atomic_spin_lock_irqsave();5586.207721 | 0) 0.430 us | _atomic_spin_lock_irqsave();
5586.207722 | 0) 0.444 us | _atomic_spin_unlock_irqrestore();5586.207722 | 0) 0.444 us | _atomic_spin_unlock_irqrestore();5586.207722 | 0) 0.444 us | _atomic_spin_unlock_irqrestore();5586.207722 | 0) 0.444 us | _atomic_spin_unlock_irqrestore();
5586.207723 | 0) 4.391 us | }5586.207723 | 0) 4.391 us | }5586.207723 | 0) 4.391 us | }5586.207723 | 0) 4.391 us | }
5586.207723 | 0) 0.394 us | update_rt_migration();5586.207723 | 0) 0.394 us | update_rt_migration();5586.207723 | 0) 0.394 us | update_rt_migration();5586.207723 | 0) 0.394 us | update_rt_migration();
5586.207724 | 0) 6.158 us | }5586.207724 | 0) 6.158 us | }5586.207724 | 0) 6.158 us | }5586.207724 | 0) 6.158 us | }
5586.2077255586.2077255586.2077255586.207725 | 0) 7.040 us | }| 0) 7.040 us | }| 0) 7.040 us | }| 0) 7.040 us | }
5586.2358355586.2358355586.2358355586.235835 | 0) ! 28117.53 us | | 0) ! 28117.53 us | | 0) ! 28117.53 us | | 0) ! 28117.53 us | }}}}
5586.235836 | 0) | check_preempt_curr_rt() {5586.235836 | 0) | check_preempt_curr_rt() {5586.235836 | 0) | check_preempt_curr_rt() {5586.235836 | 0) | check_preempt_curr_rt() {
5586.235836 | 0) 0.415 us | resched_task();5586.235836 | 0) 0.415 us | resched_task();5586.235836 | 0) 0.415 us | resched_task();5586.235836 | 0) 0.415 us | resched_task();
5586.235837 | 0) 1.397 us | }5586.235837 | 0) 1.397 us | }5586.235837 | 0) 1.397 us | }5586.235837 | 0) 1.397 us | }
Here!
![Page 15: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/15.jpg)
Case1: Check the kernel source (First attempt)
� The activate_task() defined in kernel/sched.c
static void static void static void static void inc_nr_runninginc_nr_runninginc_nr_runninginc_nr_running(struct rq *rq)(struct rq *rq)(struct rq *rq)(struct rq *rq)
{{{{
rqrqrqrq---->nr_running++;>nr_running++;>nr_running++;>nr_running++;
}}}}
・・・・
・・・・
・・・・
static voidstatic voidstatic voidstatic void
activate_taskactivate_taskactivate_taskactivate_task(struct rq *rq, struct task_struct *p, int (struct rq *rq, struct task_struct *p, int (struct rq *rq, struct task_struct *p, int (struct rq *rq, struct task_struct *p, int
Only an increment instruction
15Yoshitake Kobayashi - Embedded Linux Conference 2012 -
activate_taskactivate_taskactivate_taskactivate_task(struct rq *rq, struct task_struct *p, int (struct rq *rq, struct task_struct *p, int (struct rq *rq, struct task_struct *p, int (struct rq *rq, struct task_struct *p, int
wakeup, bool head) wakeup, bool head) wakeup, bool head) wakeup, bool head)
{{{{
if (task_contributes_to_load(p))if (task_contributes_to_load(p))if (task_contributes_to_load(p))if (task_contributes_to_load(p))
rqrqrqrq---->nr_uninterruptible>nr_uninterruptible>nr_uninterruptible>nr_uninterruptible--------;;;;
enqueue_task(rq, p, wakeup, head);enqueue_task(rq, p, wakeup, head);enqueue_task(rq, p, wakeup, head);enqueue_task(rq, p, wakeup, head);
inc_nr_runninginc_nr_runninginc_nr_runninginc_nr_running(rq);(rq);(rq);(rq);
}}}}
Here!
![Page 16: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/16.jpg)
Case1: Check the trace log (Second attempt)
� Result of the function graph tracer
184.976213 | 0) 0.404 us | pre_schedule_rt();184.976213 | 0) 0.404 us | pre_schedule_rt();184.976213 | 0) 0.404 us | pre_schedule_rt();184.976213 | 0) 0.404 us | pre_schedule_rt();
184.976214 | 0) | put_prev_task_rt() {184.976214 | 0) | put_prev_task_rt() {184.976214 | 0) | put_prev_task_rt() {184.976214 | 0) | put_prev_task_rt() {
184.976214184.976214184.976214184.976214 | 0) | | 0) | | 0) | | 0) | update_curr_rt() {update_curr_rt() {update_curr_rt() {update_curr_rt() {
185.004323185.004323185.004323185.004323 | 0) 0.537 us | sched_avg_update();| 0) 0.537 us | sched_avg_update();| 0) 0.537 us | sched_avg_update();| 0) 0.537 us | sched_avg_update();
185.004324 | 0) ! 28109.76 us | 185.004324 | 0) ! 28109.76 us | 185.004324 | 0) ! 28109.76 us | 185.004324 | 0) ! 28109.76 us | }}}}
185.004324 | 0) ! 28110.62 us | }185.004324 | 0) ! 28110.62 us | }185.004324 | 0) ! 28110.62 us | }185.004324 | 0) ! 28110.62 us | }
185.004325 | 0) 0.496 us | pick_next_task_rt();185.004325 | 0) 0.496 us | pick_next_task_rt();185.004325 | 0) 0.496 us | pick_next_task_rt();185.004325 | 0) 0.496 us | pick_next_task_rt();
185.004326 | 0) 0.442 us | perf_counter_task_sched_out();185.004326 | 0) 0.442 us | perf_counter_task_sched_out();185.004326 | 0) 0.442 us | perf_counter_task_sched_out();185.004326 | 0) 0.442 us | perf_counter_task_sched_out();
185.004327 | 0) 0.434 us | native_load_sp0();185.004327 | 0) 0.434 us | native_load_sp0();185.004327 | 0) 0.434 us | native_load_sp0();185.004327 | 0) 0.434 us | native_load_sp0();
185.004328 | 0) 0.465 us | native_load_tls();185.004328 | 0) 0.465 us | native_load_tls();185.004328 | 0) 0.465 us | native_load_tls();185.004328 | 0) 0.465 us | native_load_tls();
Here!
16Yoshitake Kobayashi - Embedded Linux Conference 2012 -
185.004328 | 0) 0.465 us | native_load_tls();185.004328 | 0) 0.465 us | native_load_tls();185.004328 | 0) 0.465 us | native_load_tls();185.004328 | 0) 0.465 us | native_load_tls();
![Page 17: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/17.jpg)
Case1: Check the kernel source (Second attempt)
� Update_curr_rt() defined in kernel/sched.cstatic void static void static void static void update_curr_rtupdate_curr_rtupdate_curr_rtupdate_curr_rt(struct rq *rq)(struct rq *rq)(struct rq *rq)(struct rq *rq)
{{{{
struct task_struct *curr = rqstruct task_struct *curr = rqstruct task_struct *curr = rqstruct task_struct *curr = rq---->curr;>curr;>curr;>curr;
struct sched_rt_entity *rt_se = &currstruct sched_rt_entity *rt_se = &currstruct sched_rt_entity *rt_se = &currstruct sched_rt_entity *rt_se = &curr---->rt;>rt;>rt;>rt;
struct rt_rq *rt_rq = rt_rq_of_se(rt_se);struct rt_rq *rt_rq = rt_rq_of_se(rt_se);struct rt_rq *rt_rq = rt_rq_of_se(rt_se);struct rt_rq *rt_rq = rt_rq_of_se(rt_se);
u64 delta_exec;u64 delta_exec;u64 delta_exec;u64 delta_exec;
if (!task_has_rt_policy(curr))if (!task_has_rt_policy(curr))if (!task_has_rt_policy(curr))if (!task_has_rt_policy(curr))
return;return;return;return;
delta_exec = rqdelta_exec = rqdelta_exec = rqdelta_exec = rq---->clock >clock >clock >clock ---- currcurrcurrcurr---->se.exec_start;>se.exec_start;>se.exec_start;>se.exec_start;
if (unlikely((s64)delta_exec < 0))if (unlikely((s64)delta_exec < 0))if (unlikely((s64)delta_exec < 0))if (unlikely((s64)delta_exec < 0))
17Yoshitake Kobayashi - Embedded Linux Conference 2012 -
if (unlikely((s64)delta_exec < 0))if (unlikely((s64)delta_exec < 0))if (unlikely((s64)delta_exec < 0))if (unlikely((s64)delta_exec < 0))
delta_exec = 0;delta_exec = 0;delta_exec = 0;delta_exec = 0;
schedstat_set(currschedstat_set(currschedstat_set(currschedstat_set(curr---->se.exec_max, max(curr>se.exec_max, max(curr>se.exec_max, max(curr>se.exec_max, max(curr---->se.exec_max, delta_exec));>se.exec_max, delta_exec));>se.exec_max, delta_exec));>se.exec_max, delta_exec));
currcurrcurrcurr---->se.sum_exec_runtime += delta_exec;>se.sum_exec_runtime += delta_exec;>se.sum_exec_runtime += delta_exec;>se.sum_exec_runtime += delta_exec;
account_group_exec_runtime(curr, delta_exec);account_group_exec_runtime(curr, delta_exec);account_group_exec_runtime(curr, delta_exec);account_group_exec_runtime(curr, delta_exec);
currcurrcurrcurr---->se.exec_start = rq>se.exec_start = rq>se.exec_start = rq>se.exec_start = rq---->clock;>clock;>clock;>clock;
cpuacct_charge(curr, delta_exec);cpuacct_charge(curr, delta_exec);cpuacct_charge(curr, delta_exec);cpuacct_charge(curr, delta_exec);
sched_rt_avg_update(rq, delta_exec);sched_rt_avg_update(rq, delta_exec);sched_rt_avg_update(rq, delta_exec);sched_rt_avg_update(rq, delta_exec);
Maybe here!But different
result.
![Page 18: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/18.jpg)
Case1: Check the trace log (Third attempt)
641.941738 | 0) | schedule() {641.941738 | 0) | schedule() {641.941738 | 0) | schedule() {641.941738 | 0) | schedule() {
641.941738 | 0) | 641.941738 | 0) | 641.941738 | 0) | 641.941738 | 0) | __schedule() {__schedule() {__schedule() {__schedule() {
641.941739 | 0) 0.421 us | rcu_qsctr_inc();641.941739 | 0) 0.421 us | rcu_qsctr_inc();641.941739 | 0) 0.421 us | rcu_qsctr_inc();641.941739 | 0) 0.421 us | rcu_qsctr_inc();
・・・・・・・・・・・・
641.941751 | 0) | pre_schedule_rt() {641.941751 | 0) | pre_schedule_rt() {641.941751 | 0) | pre_schedule_rt() {641.941751 | 0) | pre_schedule_rt() {
641.941752 | 0) 0.418 us | pull_rt_task();641.941752 | 0) 0.418 us | pull_rt_task();641.941752 | 0) 0.418 us | pull_rt_task();641.941752 | 0) 0.418 us | pull_rt_task();
641.941752 | 0) 1.281 us | }641.941752 | 0) 1.281 us | }641.941752 | 0) 1.281 us | }641.941752 | 0) 1.281 us | }
641.941753 | 0) | put_prev_task_rt() {641.941753 | 0) | put_prev_task_rt() {641.941753 | 0) | put_prev_task_rt() {641.941753 | 0) | put_prev_task_rt() {
641.941753 | 0) | update_curr_rt() {641.941753 | 0) | update_curr_rt() {641.941753 | 0) | update_curr_rt() {641.941753 | 0) | update_curr_rt() {
641.941754 | 0) 0.426 us | sched_avg_update();641.941754 | 0) 0.426 us | sched_avg_update();641.941754 | 0) 0.426 us | sched_avg_update();641.941754 | 0) 0.426 us | sched_avg_update();
641.941755 | 0) 1.310 us | }641.941755 | 0) 1.310 us | }641.941755 | 0) 1.310 us | }641.941755 | 0) 1.310 us | }
641.941755 | 0) 2.178 us | }641.941755 | 0) 2.178 us | }641.941755 | 0) 2.178 us | }641.941755 | 0) 2.178 us | }
641.941756 641.941756 641.941756 641.941756 | 0) 0.423 us | pick_next_task_fair();| 0) 0.423 us | pick_next_task_fair();| 0) 0.423 us | pick_next_task_fair();| 0) 0.423 us | pick_next_task_fair();
18Yoshitake Kobayashi - Embedded Linux Conference 2012 -
641.941756 641.941756 641.941756 641.941756 | 0) 0.423 us | pick_next_task_fair();| 0) 0.423 us | pick_next_task_fair();| 0) 0.423 us | pick_next_task_fair();| 0) 0.423 us | pick_next_task_fair();
641.969863641.969863641.969863641.969863 | 0) 0.839 us | pick_next_task_rt();| 0) 0.839 us | pick_next_task_rt();| 0) 0.839 us | pick_next_task_rt();| 0) 0.839 us | pick_next_task_rt();
641.969864 | 0) 0.422 us | pick_next_task_fair();641.969864 | 0) 0.422 us | pick_next_task_fair();641.969864 | 0) 0.422 us | pick_next_task_fair();641.969864 | 0) 0.422 us | pick_next_task_fair();
641.969865 | 0) 0.421 us | pick_next_task_idle();641.969865 | 0) 0.421 us | pick_next_task_idle();641.969865 | 0) 0.421 us | pick_next_task_idle();641.969865 | 0) 0.421 us | pick_next_task_idle();
641.969866 | 0) 0.461 us | perf_counter_task_sched_out();641.969866 | 0) 0.461 us | perf_counter_task_sched_out();641.969866 | 0) 0.461 us | perf_counter_task_sched_out();641.969866 | 0) 0.461 us | perf_counter_task_sched_out();
Here!
![Page 19: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/19.jpg)
Case1: Check the kernel source (Third attempt)
� Pick_next_task() also defined in kernel/sched.c
static inline struct task_struct * static inline struct task_struct * static inline struct task_struct * static inline struct task_struct *
pick_next_taskpick_next_taskpick_next_taskpick_next_task(struct rq *rq) (struct rq *rq) (struct rq *rq) (struct rq *rq)
{ { { {
・・・・・・・・・・・・
if (likely(rqif (likely(rqif (likely(rqif (likely(rq---->nr_running == rq>nr_running == rq>nr_running == rq>nr_running == rq---->cfs.nr_running)) { >cfs.nr_running)) { >cfs.nr_running)) { >cfs.nr_running)) {
p = fair_sched_class.pick_next_task(rq); p = fair_sched_class.pick_next_task(rq); p = fair_sched_class.pick_next_task(rq); p = fair_sched_class.pick_next_task(rq);
if (likely(p)) if (likely(p)) if (likely(p)) if (likely(p))
return p; return p; return p; return p;
} } } }
class = sched_class_highest; class = sched_class_highest; class = sched_class_highest; class = sched_class_highest;
19Yoshitake Kobayashi - Embedded Linux Conference 2012 -
class = sched_class_highest; class = sched_class_highest; class = sched_class_highest; class = sched_class_highest;
for ( ; ; ) {for ( ; ; ) {for ( ; ; ) {for ( ; ; ) {
p = classp = classp = classp = class---->pick_next_task(rq); >pick_next_task(rq); >pick_next_task(rq); >pick_next_task(rq);
if (p) if (p) if (p) if (p)
return p; return p; return p; return p;
・・・・・・・・・・・・
class = classclass = classclass = classclass = class---->next; >next; >next; >next;
} } } }
}}}}
asmlinkage void __sched asmlinkage void __sched asmlinkage void __sched asmlinkage void __sched __schedule__schedule__schedule__schedule(void)(void)(void)(void)
{{{{
・・・・・・・・・・・・
put_prev_task(rq, prev);put_prev_task(rq, prev);put_prev_task(rq, prev);put_prev_task(rq, prev);
next = next = next = next = pick_next_taskpick_next_taskpick_next_taskpick_next_task(rq);(rq);(rq);(rq);
Differentresult again!
![Page 20: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/20.jpg)
Case1: Summary of the evaluation results
� Latency occurs in different points each time and difficult to identify the cause
� Only occurs on specific hardware
� Maybe SMI (System Management Interrupt)
20Yoshitake Kobayashi - Embedded Linux Conference 2012 -
![Page 21: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/21.jpg)
Learn a lesson from Case1
� One shot trace log is not enough
21Yoshitake Kobayashi - Embedded Linux Conference 2012 -
![Page 22: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/22.jpg)
Case2: Evaluation environment� CPU: ARM Coretex-A8� Memory: 512MB� Kernel: Linux 2.6.31.12-rt21
� A function graph tracer patch appliedhttp://elinux.org/Ftrace_Function_Graph_ARM
22Yoshitake Kobayashi - Embedded Linux Conference 2012 -
![Page 23: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/23.jpg)
Case2: Summary of evaluation result (First attempt)
� Evaluated without other CPU workload Cycle: 300us Cycle: 500us Cycle: 1000us
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
Latency (us)
Num
ber
of s
ampl
es
Latency (us)
Num
ber
of s
ampl
es
Latency (us)
Num
ber
of s
ampl
es23Yoshitake Kobayashi - Embedded Linux Conference 2012 -
Cycle [µs] Min. Ave. Max # of DL misses
300 11 19.025 921 3018500 11 19.458 684 5026
1000 11 23.011 717 0
![Page 24: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/24.jpg)
Case2: Summary of evaluation result (First attempt)
� Evaluated with approx 60% CPU workload Cycle: 300us Cycle: 500us Cycle: 1000us
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
Latency (us)
Num
ber
of s
ampl
es
Latency (us)
Num
ber
of s
ampl
es
Latency (us)
Num
ber
of s
ampl
es24Yoshitake Kobayashi - Embedded Linux Conference 2012 -
Cycle [µs] Min. Ave. Max # of DL misses
300 13 34.165 980 3018500 13 33.234 760 5025
1000 12 35.014 714 0
![Page 25: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/25.jpg)
504.294647 | 0) | sys_timer_getoverrun() {504.294647 | 0) | sys_timer_getoverrun() {504.294647 | 0) | sys_timer_getoverrun() {504.294647 | 0) | sys_timer_getoverrun() {
504.294649 | 0) 2.000 us | lock_timer();504.294649 | 0) 2.000 us | lock_timer();504.294649 | 0) 2.000 us | lock_timer();504.294649 | 0) 2.000 us | lock_timer();
504.294654 | 0) 6.625 us | }504.294654 | 0) 6.625 us | }504.294654 | 0) 6.625 us | }504.294654 | 0) 6.625 us | }
504.294656 | 0) | 504.294656 | 0) | 504.294656 | 0) | 504.294656 | 0) | sys_rt_sigtimedwait() {sys_rt_sigtimedwait() {sys_rt_sigtimedwait() {sys_rt_sigtimedwait() {
504.294658 | 0) | dequeue_signal() {504.294658 | 0) | dequeue_signal() {504.294658 | 0) | dequeue_signal() {504.294658 | 0) | dequeue_signal() {
504.294660 | 0) | __dequeue_signal() {504.294660 | 0) | __dequeue_signal() {504.294660 | 0) | __dequeue_signal() {504.294660 | 0) | __dequeue_signal() {
504.294662 | 0) 1.625 us | next_signal();504.294662 | 0) 1.625 us | next_signal();504.294662 | 0) 1.625 us | next_signal();504.294662 | 0) 1.625 us | next_signal();
・・・・
・・・・
・・・・
・・・・
・・・・
Case2: Check the trace log (First attempt)
?
25Yoshitake Kobayashi - Embedded Linux Conference 2012 -
・・・・
・・・・
・・・・
・・・・
・・・・
504.297163 | 0) | recalc_sigpending() {504.297163 | 0) | recalc_sigpending() {504.297163 | 0) | recalc_sigpending() {504.297163 | 0) | recalc_sigpending() {
504.297165 | 0) | recalc_sigpending_tsk() {504.297165 | 0) | recalc_sigpending_tsk() {504.297165 | 0) | recalc_sigpending_tsk() {504.297165 | 0) | recalc_sigpending_tsk() {
504.297168 | 0) 5.250 us | }504.297168 | 0) 5.250 us | }504.297168 | 0) 5.250 us | }504.297168 | 0) 5.250 us | }
504.297170 | 0) ! 2513.625 us | 504.297170 | 0) ! 2513.625 us | 504.297170 | 0) ! 2513.625 us | 504.297170 | 0) ! 2513.625 us | }}}}
504.297175 | 0) | 504.297175 | 0) | 504.297175 | 0) | 504.297175 | 0) | sys_clock_gettime()sys_clock_gettime()sys_clock_gettime()sys_clock_gettime() {{{{
504.297177 | 0) | posix_ktime_get_ts() {504.297177 | 0) | posix_ktime_get_ts() {504.297177 | 0) | posix_ktime_get_ts() {504.297177 | 0) | posix_ktime_get_ts() {
504.297178 | 0) | ktime_get_ts() {504.297178 | 0) | ktime_get_ts() {504.297178 | 0) | ktime_get_ts() {504.297178 | 0) | ktime_get_ts() {
504.297181 | 0) | asm_do_IRQ() {504.297181 | 0) | asm_do_IRQ() {504.297181 | 0) | asm_do_IRQ() {504.297181 | 0) | asm_do_IRQ() {
Here!
?
![Page 26: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/26.jpg)
504.295229 | 0) | __do_softirq() {504.295229 | 0) | __do_softirq() {504.295229 | 0) | __do_softirq() {504.295229 | 0) | __do_softirq() {
504.295232 | 0) | run_timer_softirq() {504.295232 | 0) | run_timer_softirq() {504.295232 | 0) | run_timer_softirq() {504.295232 | 0) | run_timer_softirq() {
504.295234 | 0) 1.750 us | hrtimer_run_pending();504.295234 | 0) 1.750 us | hrtimer_run_pending();504.295234 | 0) 1.750 us | hrtimer_run_pending();504.295234 | 0) 1.750 us | hrtimer_run_pending();
504.295239 | 0) 1.750 us | preempt_schedule();504.295239 | 0) 1.750 us | preempt_schedule();504.295239 | 0) 1.750 us | preempt_schedule();504.295239 | 0) 1.750 us | preempt_schedule();
504.295243 | 0) | ehci_watchdog() {504.295243 | 0) | ehci_watchdog() {504.295243 | 0) | ehci_watchdog() {504.295243 | 0) | ehci_watchdog() {
504.295245 | 0) | ehci_work() {504.295245 | 0) | ehci_work() {504.295245 | 0) | ehci_work() {504.295245 | 0) | ehci_work() {
504.295250 | 0) 1.875 us | free_cached_itd_list();504.295250 | 0) 1.875 us | free_cached_itd_list();504.295250 | 0) 1.875 us | free_cached_itd_list();504.295250 | 0) 1.875 us | free_cached_itd_list();
504.295256 | 0) 4.625 us | qh_completions();504.295256 | 0) 4.625 us | qh_completions();504.295256 | 0) 4.625 us | qh_completions();504.295256 | 0) 4.625 us | qh_completions();
504.295265 | 0) 3.375 us | qh_completions();504.295265 | 0) 3.375 us | qh_completions();504.295265 | 0) 3.375 us | qh_completions();504.295265 | 0) 3.375 us | qh_completions();
504.295272 | 0) 3.250 us | qh_completions();504.295272 | 0) 3.250 us | qh_completions();504.295272 | 0) 3.250 us | qh_completions();504.295272 | 0) 3.250 us | qh_completions();
504.295279 | 0) 3.250 us | qh_completions();504.295279 | 0) 3.250 us | qh_completions();504.295279 | 0) 3.250 us | qh_completions();504.295279 | 0) 3.250 us | qh_completions();
504.295286 | 0) 3.625 us | qh_completions();504.295286 | 0) 3.625 us | qh_completions();504.295286 | 0) 3.625 us | qh_completions();504.295286 | 0) 3.625 us | qh_completions();
Case2: Identify the latency (First attempt)
Point!
26Yoshitake Kobayashi - Embedded Linux Conference 2012 -
504.295286 | 0) 3.625 us | qh_completions();504.295286 | 0) 3.625 us | qh_completions();504.295286 | 0) 3.625 us | qh_completions();504.295286 | 0) 3.625 us | qh_completions();
504.295292 | 0) 3.375 us | qh_completions();504.295292 | 0) 3.375 us | qh_completions();504.295292 | 0) 3.375 us | qh_completions();504.295292 | 0) 3.375 us | qh_completions();
504.295299 | 0) 3.375 us | qh_completions();504.295299 | 0) 3.375 us | qh_completions();504.295299 | 0) 3.375 us | qh_completions();504.295299 | 0) 3.375 us | qh_completions();
504.295305 | 0) 3.375 us | qh_completions();504.295305 | 0) 3.375 us | qh_completions();504.295305 | 0) 3.375 us | qh_completions();504.295305 | 0) 3.375 us | qh_completions();
504.295312 | 0) 3.250 us | qh_completions();504.295312 | 0) 3.250 us | qh_completions();504.295312 | 0) 3.250 us | qh_completions();504.295312 | 0) 3.250 us | qh_completions();
504.295319 | 0) 3.250 us | qh_completions();504.295319 | 0) 3.250 us | qh_completions();504.295319 | 0) 3.250 us | qh_completions();504.295319 | 0) 3.250 us | qh_completions();
504.295325 | 0) 3.375 us | qh_completions()504.295325 | 0) 3.375 us | qh_completions()504.295325 | 0) 3.375 us | qh_completions()504.295325 | 0) 3.375 us | qh_completions();;;;
This function is used to process and free completed qtds for a qh.(drivers/usb/host/ehci-q.c)
![Page 27: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/27.jpg)
Case2: Cause and possible solution
� Cause� Too many qh_completions() function call
� Default polling threshold is 100ms
� Possible solution� Change the polling threshold to 10ms
27Yoshitake Kobayashi - Embedded Linux Conference 2012 -
diff diff diff diff --------git a/drivers/usb/host/ehcigit a/drivers/usb/host/ehcigit a/drivers/usb/host/ehcigit a/drivers/usb/host/ehci----hcd.c b/drivers/usb/host/ehcihcd.c b/drivers/usb/host/ehcihcd.c b/drivers/usb/host/ehcihcd.c b/drivers/usb/host/ehci----hcd.c hcd.c hcd.c hcd.c
index 0c9b7d2..db2efd2 100644 index 0c9b7d2..db2efd2 100644 index 0c9b7d2..db2efd2 100644 index 0c9b7d2..db2efd2 100644
------------ a/drivers/usb/host/ehcia/drivers/usb/host/ehcia/drivers/usb/host/ehcia/drivers/usb/host/ehci----hcd.chcd.chcd.chcd.c
+++ b/drivers/usb/host/ehci+++ b/drivers/usb/host/ehci+++ b/drivers/usb/host/ehci+++ b/drivers/usb/host/ehci----hcd.chcd.chcd.chcd.c
@@ @@ @@ @@ ----83,7 +83,7 @@ static const char hcd_name [] = "ehci_hcd";83,7 +83,7 @@ static const char hcd_name [] = "ehci_hcd";83,7 +83,7 @@ static const char hcd_name [] = "ehci_hcd";83,7 +83,7 @@ static const char hcd_name [] = "ehci_hcd";
#define EHCI_TUNE_FLS 2 /* (small) 256 frame schedule */#define EHCI_TUNE_FLS 2 /* (small) 256 frame schedule */#define EHCI_TUNE_FLS 2 /* (small) 256 frame schedule */#define EHCI_TUNE_FLS 2 /* (small) 256 frame schedule */
#define EHCI_IAA_MSECS 10 /* arbitrary */ #define EHCI_IAA_MSECS 10 /* arbitrary */ #define EHCI_IAA_MSECS 10 /* arbitrary */ #define EHCI_IAA_MSECS 10 /* arbitrary */
----#define EHCI_IO_JIFFIES (HZ/10) /* io watchdog > irq_thresh */ #define EHCI_IO_JIFFIES (HZ/10) /* io watchdog > irq_thresh */ #define EHCI_IO_JIFFIES (HZ/10) /* io watchdog > irq_thresh */ #define EHCI_IO_JIFFIES (HZ/10) /* io watchdog > irq_thresh */
+#define EHCI_IO_JIFFIES (HZ/100) /* io watchdog > irq_thresh */+#define EHCI_IO_JIFFIES (HZ/100) /* io watchdog > irq_thresh */+#define EHCI_IO_JIFFIES (HZ/100) /* io watchdog > irq_thresh */+#define EHCI_IO_JIFFIES (HZ/100) /* io watchdog > irq_thresh */
#define EHCI_ASYNC_JIFFIES (HZ/20) /* async idle timeout */#define EHCI_ASYNC_JIFFIES (HZ/20) /* async idle timeout */#define EHCI_ASYNC_JIFFIES (HZ/20) /* async idle timeout */#define EHCI_ASYNC_JIFFIES (HZ/20) /* async idle timeout */
#define EHCI_SHRINK_FRAMES 5 /* async qh unlink delay */ #define EHCI_SHRINK_FRAMES 5 /* async qh unlink delay */ #define EHCI_SHRINK_FRAMES 5 /* async qh unlink delay */ #define EHCI_SHRINK_FRAMES 5 /* async qh unlink delay */
![Page 28: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/28.jpg)
Case2: Summary of evaluation result (Second attempt)
� Evaluated without other CPU workload
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
Cycle: 300us Cycle: 500us Cycle: 1000us
Latency (us)
Num
ber
of s
ampl
es
Latency (us)
Num
ber
of s
ampl
es
Latency (us)
Num
ber
of s
ampl
es28Yoshitake Kobayashi - Embedded Linux Conference 2012 -
Cycle Min Ave. Max # of DL miss
300 11 17.590 209 0500 11 17.583 117 0
1000 11 17.610 154 0
![Page 29: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/29.jpg)
Case2: Summary of evaluation result (Second attempt)
� Evaluated with approx 60% CPU workload
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
Cycle: 300us Cycle: 500us Cycle: 1000us
Latency (us)
Num
ber
of s
ampl
es
Latency (us)
Num
ber
of s
ampl
es
Latency (us)
Num
ber
of s
ampl
es29Yoshitake Kobayashi - Embedded Linux Conference 2012 -
Cycle Min Ave Max # of DL miss
300 13 33.365 259 0500 13 29.695 228 0
1000 12 33.222 199 0
Maximum latency reduced to 1/4
![Page 30: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/30.jpg)
Learn a lesson from Case2
� Need to check what kind of functions are called
� Need to count the number of callees
30Yoshitake Kobayashi - Embedded Linux Conference 2012 -
![Page 31: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/31.jpg)
How to stabilize latency less than 50µs
� Required spec is the following:� Single process
� No network, No USB devices, No graphic device, No storage device
� latency < 50µs
� Evaluation result (cyclictest: cycle=250µs)
4500000
5000000
31Yoshitake Kobayashi - Embedded Linux Conference 2012 -
0
500000
1000000
1500000
2000000
2500000
3000000
3500000
4000000
4500000
0 10 20 30 40 50 60 70 80 90 100
回数
latency(μ秒)Latency (us)
Num
ber
of s
ampl
es
Why?
![Page 32: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/32.jpg)
Second mistake
2000
4000
6000
8000
10000
12000
14000
回数
Num
ber
of s
ampl
escycle = 250 µs
32Yoshitake Kobayashi - Embedded Linux Conference 2012 -
� Deadline miss ratio: 100%
� Function graph tracing requires additional overhead
0
2000
0 500 1000 1500 2000
latency(μ秒)Latency (us)
![Page 33: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/33.jpg)
Evaluate the latency with trace ON
2000
4000
6000
8000
10000
12000回数
Num
ber
of s
ampl
escycle = 1500 µs
33Yoshitake Kobayashi - Embedded Linux Conference 2012 -
� This result seems to have same characteristics
� Trace with the following setting� Stop if the latency is more than 790µs
� Stop if the latency is between 575 and 585
0
2000
0 200 400 600 800 1000
latency(μ秒)Latency (us)
![Page 34: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/34.jpg)
Conclusion
� Evaluating with too much workload is not a good idea� Required log data is lost
� One shot trace log is not enough� Need to check performance characteristics between trace on and off
� Need to check if latency occurs at similar points
Require some creative thinking to identify the cause
34Yoshitake Kobayashi - Embedded Linux Conference 2012 -
� Require some creative thinking to identify the cause� Check if the same function takes different execution time
� Statistics approach may be applied
� Check all functions included in specific area to identify each function’s overhead
� Eliminate callee’s overhead to calculate pure execution time for each function’s algorithm
![Page 35: Ineffective and effective way to find out latency bottlenecks …...Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff57aca8c552511c817a886/html5/thumbnails/35.jpg)
35Yoshitake Kobayashi - Embedded Linux Conference 2012 - 2008 / 7 / 24TOSHIBA Confidential