linux kernel hacking free course - sprg.uniroma2.it · april 05, 2006 irq distribution in...
TRANSCRIPT
April 05, 2006 IRQ distribution in multiprocessor systems 1
Linux Kernel Hacking Free Course
G.Grilli, University of Rome “Tor Vergata”
IRQ DISTRIBUTION INMULTIPROCESSOR SYSTEMS
3rd edition
April 05, 2006 IRQ distribution in multiprocessor systems 2
Linux Kernel Hacking Free Course 3rd edition Contents:
What is an interruptWhat is an interrupt
Synchronous and asynchronous interruptsSynchronous and asynchronous interruptsInterrupts in uniprocessor and smp achitecturesInterrupts in uniprocessor and smp achitectures
The advanced programmable interrupt controller (APIC)The advanced programmable interrupt controller (APIC)
IRQs' distribution problem starting with IntelIRQs' distribution problem starting with Intel®® Pentium 4 architecture Pentium 4 architecture
Negative impact on massively parallel applicationsNegative impact on massively parallel applications
An experimental kernel with a new irq balancing capabilityAn experimental kernel with a new irq balancing capability
Hint: using smp_affinity() to bind irq lines to specific processorsHint: using smp_affinity() to bind irq lines to specific processors
April 05, 2006 IRQ distribution in multiprocessor systems 3
Linux Kernel Hacking Free Course 3rd edition What is an interrupt?
An interrupt is defined as an event that alters the sequence of instructions An interrupt is defined as an event that alters the sequence of instructions executed by a processor.executed by a processor.
Such events correspond to electrical signals generated by hardware circuits Such events correspond to electrical signals generated by hardware circuits both inside and outside the CPU chip.both inside and outside the CPU chip.
Interrupts are often divided into “Interrupts are often divided into “synchronoussynchronous” and “” and “asynchronousasynchronous” ” interruptsinterrupts
April 05, 2006 IRQ distribution in multiprocessor systems 4
Linux Kernel Hacking Free Course 3rd edition Interrupts and Exceptions (Intel® classification)
synchronous interrupts (“interrupts”)synchronous interrupts (“interrupts”)
asynchronous interrupts (“exceptions”)asynchronous interrupts (“exceptions”)
They are produced by the CPU control unit (CU) while executing instructions. They are produced by the CPU control unit (CU) while executing instructions. “Synchronous” because the CU issues them only after terminating the execution of an “Synchronous” because the CU issues them only after terminating the execution of an instruction (programming errors or anomalous conditions).instruction (programming errors or anomalous conditions).
They are generated by other hardware devices at arbitrary times with respect to the They are generated by other hardware devices at arbitrary times with respect to the CPU clock signals (interval timers and I/O devices).CPU clock signals (interval timers and I/O devices).
April 05, 2006 IRQ distribution in multiprocessor systems 5
Linux Kernel Hacking Free Course 3rd edition Interrupts (overview schema)
data bus, addresses and control
I/O controller
Network Card Hard Disk Mouse Keyboard
main memory processor
I/O controller I/O controller I/O controller
April 05, 2006 IRQ distribution in multiprocessor systems 6
Linux Kernel Hacking Free Course 3rd edition Interrupts – uniprocessor scenario (1)
disk drive
disk controller
Interrupt controller
CPU
Phase 1:Phase 1: CPU asks the disk controller to perform some operations CPU asks the disk controller to perform some operations
April 05, 2006 IRQ distribution in multiprocessor systems 7
Linux Kernel Hacking Free Course 3rd edition Interrupts – uniprocessor scenario (2)
disk drive
disk controller
Interrupt controller
CPU
Phase 2:Phase 2: the disk controller has completes its task and raises an interrupt on the bus the disk controller has completes its task and raises an interrupt on the bus using a specific irq line. using a specific irq line.
April 05, 2006 IRQ distribution in multiprocessor systems 8
Linux Kernel Hacking Free Course 3rd edition Interrupts – uniprocessor scenario (3)
disk drive
disk controller
Interrupt controller
CPU
Phase 3:Phase 3: the interrupt controller sets the interrupt signal to be handled by the cpu the interrupt controller sets the interrupt signal to be handled by the cpu
Phase 4:Phase 4: the interrupt controller sends on the bus the number which identifies the device the interrupt controller sends on the bus the number which identifies the device raising the interruptraising the interrupt
April 05, 2006 IRQ distribution in multiprocessor systems 9
next instruction
current instructioncurrent instruction
Interrupt Interrupt handlerhandler
interrupt vectorinterrupt vector
disk drive
disk controller
Interrupt controller
CPU
Interrupts – uniprocessor scenario (4)
Linux Kernel Hacking Free Course 3rd edition
Phase 5:Phase 5: the CPU receiving the interrupt request changes the current instructions' flow the CPU receiving the interrupt request changes the current instructions' flow by jumping to the proper interrupt handlerby jumping to the proper interrupt handler
April 05, 2006 IRQ distribution in multiprocessor systems 10
Proces s or1
Proces s or2
Proces s orN
. . .
Address BusAddress Bus
Data BusData Bus
bus bus locklock
bus bus locklock
bus bus locklock
bus bus locklock
bus bus locklock
bus bus locklock
Linux Kernel Hacking Free Course 3rd edition Symmetrical MultiProcessing architecture (SMP)
April 05, 2006 IRQ distribution in multiprocessor systems 11
Linux Kernel Hacking Free Course 3rd edition The Advanced Programmable Interrupt Controller (APIC) (1)
In order to deliver interrupts to each CPU in the system (granting the parallelism of a In order to deliver interrupts to each CPU in the system (granting the parallelism of a smp architecture), Intelsmp architecture), Intel®® introduced starting from Pentium III a new component, the introduced starting from Pentium III a new component, the I/O APICI/O APIC..
In multiprocessor systems based on 80x86 architecture, each processor includes a In multiprocessor systems based on 80x86 architecture, each processor includes a local APIC.local APIC.
Each lEach local APICocal APIC has 32 bit registers, an internal clock, a local timer device and two has 32 bit registers, an internal clock, a local timer device and two additional IRQ lines (LINT0 and LINT1) reserved for additional IRQ lines (LINT0 and LINT1) reserved for local APIClocal APIC interrupts. interrupts.
April 05, 2006 IRQ distribution in multiprocessor systems 12
Linux Kernel Hacking Free Course 3rd edition The Advanced Programmable Interrupt Controller (APIC) (2)
The I/O APIC consists of a set of 24 IRQ lines, a 24entry Interrupt Redirection Table, The I/O APIC consists of a set of 24 IRQ lines, a 24entry Interrupt Redirection Table, programmable registers and a message unit for sending and receiving APIC programmable registers and a message unit for sending and receiving APIC messages over the APIC bus.messages over the APIC bus.
Any entry in the redirection table can be individually programmed to indicate the Any entry in the redirection table can be individually programmed to indicate the interrupt vector and priority, the destination processor and how the processor is interrupt vector and priority, the destination processor and how the processor is selected.selected.
Generally speaking, the I/O APIC acts like an “IRQ router” with respect to the Local Generally speaking, the I/O APIC acts like an “IRQ router” with respect to the Local APICs.APICs.
April 05, 2006 IRQ distribution in multiprocessor systems 13
Linux Kernel Hacking Free Course 3rd edition APIC Overview in a quad processor architecture
backside cachebackside cache
CPU #0CPU #0
backside cachebackside cache
CPU #1CPU #1
backside cachebackside cache
CPU #2CPU #2
backside cachebackside cache
CPU #3CPU #3
BUSBUS
HOST BRIDGEHOST BRIDGEram bankram bank
PCI bus + slotsPCI bus + slots
APIC busAPIC bus
Local APICLocal APIC
I/OAPICI/OAPIC
* starting from Pentium IV, the APIC bus has been replaced by the system bus
April 05, 2006 IRQ distribution in multiprocessor systems 14
Linux Kernel Hacking Free Course 3rd edition How the I/O APIC ditributes irqs among the CPUs
Static distributionStatic distribution
Dynamic distributionDynamic distribution
The I/O APIC sends the IRQ signal according to the redirection table. The interrupt can be The I/O APIC sends the IRQ signal according to the redirection table. The interrupt can be delivered to one specific CPU, to a subset of CPUs, or to all CPUs at once (broadcast mode).delivered to one specific CPU, to a subset of CPUs, or to all CPUs at once (broadcast mode).
The IRQ signal is delivered to the local APIC of the processor that is executing the process The IRQ signal is delivered to the local APIC of the processor that is executing the process with the lowest priority.with the lowest priority.
The I/O APIC consists of a set of 24 IRQ lines, a 24entry Interrupt Redirection Table, The I/O APIC consists of a set of 24 IRQ lines, a 24entry Interrupt Redirection Table, programmable registers and a message unit for sending and receiving APIC messages over programmable registers and a message unit for sending and receiving APIC messages over the APIC bus.the APIC bus.
April 05, 2006 IRQ distribution in multiprocessor systems 15
Linux Kernel Hacking Free Course 3rd edition How the I/O APIC ditributes irqs among the CPUs (cont.)
Every Local APIC has a programmable task priority register (TPR), which is used to compute Every Local APIC has a programmable task priority register (TPR), which is used to compute the priority of the currently running process. Intel expects this register to be modified in an the priority of the currently running process. Intel expects this register to be modified in an operating system kernel at every task switch.operating system kernel at every task switch.
If two or more CPUs share the lowest priority, the load is distributed between them using the If two or more CPUs share the lowest priority, the load is distributed between them using the arbitration techniquearbitration technique::
each CPU has an arbitration priority ranging from 0 to 15 (highest)each CPU has an arbitration priority ranging from 0 to 15 (highest)
everytime an interrupt is delivered to a CPU, its priority is set to 0 while the priority everytime an interrupt is delivered to a CPU, its priority is set to 0 while the priority of the other CPUs is increased by 1of the other CPUs is increased by 1
when the arbitration priority register becomes greater than 15, it is set to the when the arbitration priority register becomes greater than 15, it is set to the previous arbitration priority of the winning CPU increased by 1 previous arbitration priority of the winning CPU increased by 1
April 05, 2006 IRQ distribution in multiprocessor systems 16
Linux Kernel Hacking Free Course 3rd edition
The Pentium 4 local Apic doesn't have an arbitration priority register and the The Pentium 4 local Apic doesn't have an arbitration priority register and the mechanism is hidden in the bus arbitration circuitrymechanism is hidden in the bus arbitration circuitry
If the Operating System kernel does not regularly update the TPRs, If the Operating System kernel does not regularly update the TPRs, performance may be suboptimal because interrupts might always be served performance may be suboptimal because interrupts might always be served by the same CPU!by the same CPU!
Integrating this mechanism inside the kernel can be a critical taskIntegrating this mechanism inside the kernel can be a critical task
Problems start with Intel® Pentium 4 processor family
April 05, 2006 IRQ distribution in multiprocessor systems 17
CPU#0
CPU#1
CPU#2
CPU#3
CPU#0
CPU#1
CPU#2
CPU#3
10 sec10 sec 12 sec12 sec
1 2 3 1 2 3
Ideal case:Ideal case: Real case:Real case:
The impact of load imbalance in massively parallel applications
Linux Kernel Hacking Free Course 3rd edition
In massively parallel application the delay in a job execution slows down the entire In massively parallel application the delay in a job execution slows down the entire calculus due to a synchronization phase.calculus due to a synchronization phase.
April 05, 2006 IRQ distribution in multiprocessor systems 18
Linux Kernel Hacking Free Course 3rd edition The current kernel implementation
Developed and submitted by Nitin Kamble (IntelDeveloped and submitted by Nitin Kamble (Intel®® Corporation) Corporation)
Integrated into the kernel starting from version 2.5.52 as a kernel thread Integrated into the kernel starting from version 2.5.52 as a kernel thread named “irqd”named “irqd”
Problems related to this mechanism:Problems related to this mechanism:
IRQs are not migrated if the interrupt rate is below an high IRQs are not migrated if the interrupt rate is below an high treshold (usually, hundreds of interrupts per second)treshold (usually, hundreds of interrupts per second)
Even when the threshold is reached, irq balancing is suboptimal Even when the threshold is reached, irq balancing is suboptimal (see the next slide)(see the next slide)
April 05, 2006 IRQ distribution in multiprocessor systems 19
Linux Kernel Hacking Free Course 3rd edition Example of bad irq distribution even under heavy irq load
#10 ping flooding#10 ping flooding
#4 disk interrupt generators#4 disk interrupt generators
““THORTHOR” quad Intel” quad Intel®® Xeon 1.5GHz, 4GB RAM, Xeon 1.5GHz, 4GB RAM, NIC 10/100 Mbps, Linux kernel 2.6.4NIC 10/100 Mbps, Linux kernel 2.6.4
Irq frequence:~ 65 KHzIrq frequence:~ 65 KHz
Irq frequence:~ 260 HzIrq frequence:~ 260 HzCPU#1CPU#1
CPU#0CPU#0
CPU#2CPU#2
CPU#3CPU#3
1 KHzIRQ#0 (TIMER)IRQ#0 (TIMER)
IRQ#29 (SCSI)IRQ#29 (SCSI)
IRQ#0 (NIC)IRQ#0 (NIC)
65 KH
z
5 Hz
260 Hz
??
??
April 05, 2006 IRQ distribution in multiprocessor systems 20
Linux Kernel Hacking Free Course 3rd edition New irq distribution mechanism (experimental)
Implemented as kernel thread in Linux kernel 2.6.4Implemented as kernel thread in Linux kernel 2.6.4
Intel Hyperthreading technology aware (physical CPU is seen by the operating Intel Hyperthreading technology aware (physical CPU is seen by the operating system as a couple of logical CPUs).system as a couple of logical CPUs).
At every execution, the kernel thread updates the data structures and tries to At every execution, the kernel thread updates the data structures and tries to find out the most and the least loaded CPU.find out the most and the least loaded CPU.
An heuristic function is used in order to evaluate the cpu load related to irq An heuristic function is used in order to evaluate the cpu load related to irq traffic. This function checks both the interrupt requests raised till the last thread traffic. This function checks both the interrupt requests raised till the last thread execution and the global ones, raised from the mechanism's startup time.execution and the global ones, raised from the mechanism's startup time.
If the most loaded CPU has “N” irq lines, the kernel thread tries to migrate the If the most loaded CPU has “N” irq lines, the kernel thread tries to migrate the first “N/2 + 1” lines to the least loaded CPU.first “N/2 + 1” lines to the least loaded CPU.
April 05, 2006 IRQ distribution in multiprocessor systems 21
IRQ0IRQ0
IRQ1IRQ1
IRQ2IRQ2
IRQ3IRQ3
IRQ4IRQ4
IRQ5IRQ5
IRQ6IRQ6
IRQ7IRQ7
IRQ8IRQ8
High rate IRQHigh rate IRQ Unassigned IRQUnassigned IRQ
CPU#1CPU#1 CPU#2CPU#2 CPU#3CPU#3CPU#0CPU#0
Linux Kernel Hacking Free Course 3rd edition Algorithm example (1)
Low rateIRQLow rateIRQ
April 05, 2006 IRQ distribution in multiprocessor systems 22
IRQ5IRQ5
IRQ6IRQ6
IRQ7IRQ7
IRQ8IRQ8
IRQ0IRQ0
IRQ1IRQ1
IRQ2IRQ2
IRQ3IRQ3
IRQ4IRQ4
CPU#2CPU#2 CPU#1CPU#1 CPU#3CPU#3CPU#0CPU#0
High rate IRQHigh rate IRQ Unassigned IRQUnassigned IRQLow rateIRQLow rateIRQ
Linux Kernel Hacking Free Course 3rd edition Algorithm example (2)
April 05, 2006 IRQ distribution in multiprocessor systems 23
Linux Kernel Hacking Free Course 3rd edition Algorithm example (3)
IRQ8IRQ8 IRQ5IRQ5
IRQ6IRQ6
IRQ7IRQ7
IRQ0IRQ0
IRQ1IRQ1
IRQ2IRQ2
IRQ3IRQ3
IRQ4IRQ4
CPU#2CPU#2 CPU#1CPU#1 CPU#3CPU#3CPU#0CPU#0
High rate IRQHigh rate IRQ Unassigned IRQUnassigned IRQLow rateIRQLow rateIRQ
April 05, 2006 IRQ distribution in multiprocessor systems 24
IRQ8IRQ8 IRQ3IRQ3
IRQ4IRQ4
IRQ5IRQ5
IRQ6IRQ6
IRQ7IRQ7
IRQ0IRQ0
IRQ1IRQ1
IRQ2IRQ2
CPU#1CPU#1 CPU#2CPU#2 CPU#3CPU#3CPU#0CPU#0
Linux Kernel Hacking Free Course 3rd edition Algorithm example (4)
High rate IRQHigh rate IRQ Unassigned IRQUnassigned IRQLow rateIRQLow rateIRQ
April 05, 2006 IRQ distribution in multiprocessor systems 25
Linux Kernel Hacking Free Course 3rd edition The new data structure
nr_c
punr
_cpu locloc tottot eureur iclicl irqlirql
nextnext
ppl_unitppl_unit
ppl_unitppl_unit
ppl_unitppl_unit
nr_ir
qnr
_irq hitshits
nextnext
irq_unitirq_unit
irq_unitirq_unit
irq_unitirq_unit
ppl_unitppl_unit
irq_unitirq_unit
There are no static data structures...is it better?(give a look to the gcc prefetch capability, lesson 8)
April 05, 2006 IRQ distribution in multiprocessor systems 26
Linux Kernel Hacking Free Course 3rd edition Irq distribution in kernel 2.6.4
Irq lines are binded to CPU0
April 05, 2006 IRQ distribution in multiprocessor systems 27
Linux Kernel Hacking Free Course 3rd edition Irq distribution in kernel 2.6.4irqd
Irq lines are distributed among the cpus sibling cpus (hyperthreading enabled)
April 05, 2006 IRQ distribution in multiprocessor systems 28
86,55786,55785,97085,970
89,63289,63285,11385,113
92,13992,13986,74586,745
92,51892,51887,53987,539
94,45794,45788,22088,220
95,56095,56088,74488,744
95,57395,57389,94089,940
95,89095,890114,240114,240
87,02087,02083,79483,794
87,25487,25484,78184,781
88,35888,35887,84487,844
88,71588,71588,18088,180
89,00289,00289,88389,883
89,27889,27889,08089,080
89,91189,91191,38291,382
90,28090,280100,290100,290
88,27088,27083,72183,721
88,97288,97286,04486,044
89,44089,44086,67386,673
89,65489,65487,51387,513
89,74589,74588,65788,657
91,23091,23089,64589,645
91,45691,45689,89489,894
92,42692,426100,204100,204
Kernel 2.6.4Kernel 2.6.4
Kernel 2.6.4irqdKernel 2.6.4irqd
Row 8
Row 7
Row 6
Row 5
Row 4
Row 3
Row 2
Row 1
0.000 25.000 50.000 75.000 100.000 125.000
Execution times 1 ping flooding
pro
cess
es
execution time
Row 18
Row 16
Row 14
Row 12
0.000 25.000 50.000 75.000 100.000 125.000
Execution times 5 ping flooding
pro
cess
es
execution time
Linux Kernel Hacking Free Course 3rd edition Benchmarks results for kernel 2.6.4 and 2.6.4irqd (adhoc mpp benchmark)
Row 28
Row 26
Row 24
Row 22
0.000 25.000 50.000 75.000 100.000 125.000
Execution times 3 ping flooding
pro
cess
es
execution time
April 05, 2006 IRQ distribution in multiprocessor systems 29
Linux Kernel Hacking Free Course 3rd edition Hint: the smp_affinity() utility
The The smp_affinity()smp_affinity() is used to assign IRQs to specific processors (or is used to assign IRQs to specific processors (or groups of processors)groups of processors)
It allows you to control how your system will respond to various hardware eventsIt allows you to control how your system will respond to various hardware events
In this way you can easily redistribute the work load related to I/O devicesIn this way you can easily redistribute the work load related to I/O devices
started by Ingo Molnar in kernel 2.4started by Ingo Molnar in kernel 2.4
some more informations related to SMP IRQ AFFINITY mechanism can be some more informations related to SMP IRQ AFFINITY mechanism can be found here:found here:/usr/src/linux2.X.X/Documentation/IRQaffinity.txt/usr/src/linux2.X.X/Documentation/IRQaffinity.txt
April 05, 2006 IRQ distribution in multiprocessor systems 30
Linux Kernel Hacking Free Course 3rd edition Hint: the smp_affinity() example (1)
The number held in the "smp_affinity" file is presented in hexadecimal format and represents which processors any interrupts on certain irq line should be routed to.
[root@thor]# cat /proc/irq/30/smp_affinity[root@thor]# cat /proc/irq/30/smp_affinity ffffffffffffffff
F F F F F F F FF F F F F F F F
1111 1111 1111 1111 1111 1111 1111 11111111 1111 1111 1111 1111 1111 1111 1111
CPU#31CPU#31 CPU#0CPU#0““0” means: “the cpu cannot handle the irq#30”0” means: “the cpu cannot handle the irq#30”““1” means: “the cpu can handle the irq#30”1” means: “the cpu can handle the irq#30”
In this case, every cpu is allowed to handle the irq#30”In this case, every cpu is allowed to handle the irq#30”
April 05, 2006 IRQ distribution in multiprocessor systems 31
Hint: the smp_affinity() example (2)
Let's try to change the value stored in the smp_affinity bitmask in order to allow only cpu#0 to handle irq#30:
[root@archimedes /proc]# echo 1 > /proc/irq/30/smp_affinity [root@archimedes /proc]# echo 1 > /proc/irq/30/smp_affinity [root@archimedes /proc]# cat /proc/irq/30/smp_affinity [root@archimedes /proc]# cat /proc/irq/30/smp_affinity 0000000100000001
0 0 0 0 0 0 0 10 0 0 0 0 0 0 1
0000 0000 0000 0000 0000 0000 0000 00010000 0000 0000 0000 0000 0000 0000 0001
CPU#31CPU#31 CPU#0CPU#0
Linux Kernel Hacking Free Course 3rd edition
Now only CPU#0 is allowed to handle the irq#30”Now only CPU#0 is allowed to handle the irq#30”