Cache profiling on ARM Linux


Upload: prabindh-sundareson

Posted on 19-May-2015


DESCRIPTION

Explains how to measure cache performance for Linux applications and kernel usage using peemuperf.

TRANSCRIPT

Page 1: Cache profiling on ARM Linux

peemuperf

Cache monitoring on ARM Linux

2012


What is the PMU?

• Cortex-A series processors contain event counting hardware which can be used to profile and benchmark code, including generation of cycle and instruction count figures and to derive figures for cache misses and so forth. The performance counter block contains a cycle counter which can count processor cycles, or be configured to count every 64 cycles. There are also a number of configurable 32-bit wide event counters which can be set to count instances of events from a wide-ranging list (for example, instructions executed, or MMU TLB misses). These counters can be accessed through debug tools, or by software running on the processor, through the CP15 Performance Monitoring Unit (PMU) registers. They provide a non-invasive debug feature and do not change the behavior of the processor. CP15 also provides a number of controls for enabling and resetting the counters and to indicate overflows (there is an option to generate an interrupt on a counter overflow). The cycle counter can be enabled independently of the event counters.

• From the ARM Architecture Reference Manual
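The quoted description mentions controls for enabling and resetting the counters, and a cycle counter that can tick once every 64 cycles. In the ARMv7 PMCR these controls correspond to the E, P, C and D bits. The minimal C sketch below only computes register values and assumes that ARMv7 bit layout. The helper names are hypothetical and are not part of peemuperf. On real hardware, privileged code would write the value through CP15.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ARMv7 PMCR (Performance Monitor Control Register) control bits.
 * On hardware PMCR is written with: mcr p15, 0, <Rt>, c9, c12, 0
 * (real code would usually read-modify-write to keep other bits). */
#define PMCR_E (1u << 0) /* enable all counters */
#define PMCR_P (1u << 1) /* reset all event counters to zero */
#define PMCR_C (1u << 2) /* reset the cycle counter to zero */
#define PMCR_D (1u << 3) /* cycle counter ticks once every 64 cycles */

/* Hypothetical helper: PMCR value that enables and resets all
 * counters, optionally selecting the divide-by-64 cycle mode. */
static uint32_t pmcr_enable_value(bool div64)
{
    uint32_t v = PMCR_E | PMCR_P | PMCR_C;
    if (div64)
        v |= PMCR_D;
    return v;
}

/* Hypothetical helper: convert a raw cycle counter (PMCCNTR) reading
 * into an approximate processor cycle count, given the PMCR setting. */
static uint64_t cycles_from_ccnt(uint32_t ccnt, uint32_t pmcr)
{
    return (pmcr & PMCR_D) ? (uint64_t)ccnt * 64u : (uint64_t)ccnt;
}
```

The divide-by-64 mode trades resolution for range. A 32-bit counter that would wrap after about 2^32 cycles then lasts 64 times longer.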


Profiling alternatives

• OProfile
  – Supported in the mainline kernel (drivers/oprofile)
  – ARM support enabled
  – Relies on interrupts from the HW unit when event counters overflow
  – Falls back to a timer when no HW event monitors are available

• Unfortunately, various errata in current ARM Cortex-A8/A9 devices make interrupt-based monitoring unreliable
  – To be fixed in later ARM cores

• Because of this, OProfile supports only timer-based CPU cycle measurement on the majority of ARM cores, at least up to kernel 3.2


Latest status

• http://lists.infradead.org/pipermail/linux-arm-kernel/2012-June/103189.html

• Convert OMAP2/3 devices to use HWMOD for creating a PMU device. To support PMU on OMAP2/3 devices we only need to use MPU sub-system and so we can simply use the MPU HWMOD to create the PMU device. The MPU HWMOD for OMAP2/3 devices is currently missing the PMU interrupt and so add the PMU interrupt to the MPU HWMOD for these devices.

• This change also moves the PMU code out of the mach-omap2/devices.c files into its own pmu.c file as suggested by Kevin Hilman to de-clutter devices.c.

• Cc: Ming Lei <ming.lei at canonical.com>
  Cc: Will Deacon <will.deacon at arm.com>
  Cc: Benoit Cousson <b-cousson at ti.com>
  Cc: Paul Walmsley <paul at pwsan.com>
  Cc: Kevin Hilman <khilman at ti.com>

• Signed-off-by: Jon Hunter <jon-hunter at ti.com>
  ---
  arch/arm/mach-omap2/Makefile                       |  1 +
  arch/arm/mach-omap2/devices.c                      | 33 -----------
  arch/arm/mach-omap2/omap_hwmod_2xxx_ipblock_data.c |  6 ++
  arch/arm/mach-omap2/omap_hwmod_3xxx_data.c         |  6 ++
  arch/arm/mach-omap2/pmu.c                          | 59 ++++++++++++++++++++
  arch/arm/plat-omap/include/plat/irqs.h             |  1 +
  6 files changed, 73 insertions(+), 33 deletions(-)
  create mode 100644 arch/arm/mach-omap2/pmu.c


Patch status

• The patch set mentioned on the previous slide is at various stages of integration for different SoC architectures

• BeagleBoard/OMAP35x is supported

• It is not supported on AM335x as of 2012; support is expected in mainline by 2013

• In the interim, what are the options?


What is the need?

• Cache usage monitoring is key to measuring the aspects of performance that depend on external memory bandwidth

• The current OProfile does not support this on several SoCs


peemuperf

• A tool to measure overall Linux performance using the ARM PMU hardware: CPU cycles, L1 and L2 cache misses, stalls, NEON events, and so on

• Consists of a kernel module that does the heavy lifting and exposes all profile information to userspace via a proc entry


Configurable parameters

• evdelay=500 evlist=1,68,3,4 evdebug=1

• evdelay – Sampling interval (milliseconds)

• evlist – Comma-separated array of event IDs (see section 3.2.49, "c9, Event Selection Register", of the Cortex-A8 TRM)

• evdebug – Controls debug output messages
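An evlist value such as "1,68,3,4" has to be turned into individual event IDs, with no more IDs than there are hardware counters. Inside the module this would typically be handled by module_param_array(). The hypothetical userspace helper below only illustrates the parsing and validation idea. The helper name, the counter cap passed by the caller, and the 8-bit event-number range (0 to 255) are assumptions, not peemuperf code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: parse a comma-separated event list such as
 * "1,68,3,4" into ids[]. At most max_ids entries are stored, so the
 * caller can pass the number of event counters the CPU provides.
 * Returns the number of IDs stored, or -1 on a malformed entry or an
 * ID outside the assumed 8-bit event field (0..255). */
static int parse_evlist(const char *s, unsigned ids[], size_t max_ids)
{
    size_t n = 0;

    while (*s != '\0' && n < max_ids) {
        char *end;
        unsigned long v = strtoul(s, &end, 10);

        if (end == s || v > 0xFFu)
            return -1;          /* not a number, or out of range */
        ids[n++] = (unsigned)v;

        if (*end == ',')
            s = end + 1;        /* skip the separator, parse next ID */
        else if (*end == '\0')
            s = end;            /* reached the end of the list */
        else
            return -1;          /* unexpected character */
    }
    return (int)n;
}
```

Entries beyond max_ids are silently ignored here. A real module might instead warn when the user asks for more events than the CPU has counters.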


Userspace access

• The proc entry is /proc/peemuperf

• Output uses the format <COUNTER #> : <COUNTER VALUE>, for example:
  – Counter[0] : 48
  – Counter[1] : 77448
  – Counter[2] : 13
  – Counter[3] : 115058
  – Overflow flag: = 0, Cycle Count: = 5739253
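A userspace tool could collect these values by reading the proc entry line by line. The sketch below assumes that each counter appears on its own line as "Counter[n] : value", with variable spacing around the colon. The exact layout of the real output is an assumption, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical parser for one "Counter[<n>] : <value>" line. The space
 * before ':' in the format matches zero or more blanks, and %lu skips
 * blanks after it, so "Counter[1] :77448" and "Counter[2]: 13" both
 * parse. Returns 1 on success and fills *idx and *val. */
static int parse_counter_line(const char *line, int *idx, unsigned long *val)
{
    return sscanf(line, " Counter[%d] :%lu", idx, val) == 2;
}

/* Hypothetical reader: print every counter line found in the file
 * (e.g. "/proc/peemuperf"). Returns -1 if the file cannot be opened. */
static int dump_peemuperf(const char *path)
{
    FILE *f = fopen(path, "r");
    char line[256];
    int idx;
    unsigned long val;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL)
        if (parse_counter_line(line, &idx, &val))
            printf("counter %d = %lu\n", idx, val);
    fclose(f);
    return 0;
}
```

Lines that do not match, such as the overflow flag and cycle count line, are skipped.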


A8 vs A9

• A8 has 4 performance counters

• A9 has 6

• peemuperf configures itself dynamically, based on a run-time query of the number of available counters
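On ARMv7 cores, one way to perform such a run-time query is the N field (bits 15:11) of the PMCR register. It reports the number of implemented event counters, excluding the cycle counter. This C sketch assumes that layout and that peemuperf relies on something equivalent. The helper names are hypothetical. Per the slide, a Cortex-A8 would be expected to report 4 and a Cortex-A9 6.

```c
#include <assert.h>
#include <stdint.h>

/* PMCR.N, bits [15:11] of the ARMv7 PMCR, holds the number of
 * implemented event counters (the cycle counter is not included).
 * On hardware PMCR is read with: mrc p15, 0, <Rt>, c9, c12, 0 */
static unsigned pmcr_num_counters(uint32_t pmcr)
{
    return (pmcr >> 11) & 0x1Fu;
}

/* Hypothetical helper: PMCNTENSET mask enabling event counters
 * 0..n-1 plus the cycle counter, which is controlled by bit 31. */
static uint32_t pmcntenset_mask(unsigned n)
{
    assert(n <= 31);
    return ((1u << n) - 1u) | (1u << 31);
}
```

Sizing the event list from this value, rather than hard-coding 4 or 6, lets the same module run on both cores.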


Default events monitored

• 1 ==> Instruction fetch that causes a refill at the lowest level of instruction or unified cache

• 68 ==> Any cacheable miss in the L2 cache

• 3 ==> Data read or write operation that causes a refill at the lowest level of data or unified cache

• 4 ==> Data read or write operation that causes a cache access at the lowest level of data or unified cache
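The default list pairs event 3 (L1 data refills) with event 4 (L1 data accesses), so an L1 data-cache refill ratio can be derived for each sampling interval. The sketch assumes the 32-bit counters are sampled every evdelay milliseconds. It also assumes each counter wraps at most once between two samples. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Difference between two successive readings of a 32-bit counter.
 * The result is reduced modulo 2^32, so it stays correct as long as
 * the counter wrapped at most once between the two samples. */
static uint32_t counter_delta(uint32_t prev, uint32_t curr)
{
    return curr - prev;
}

/* Refills per access for one sampling interval, from the deltas of
 * event 3 (refill) and event 4 (access). Returns 0.0 when there were
 * no accesses, to avoid dividing by zero. */
static double l1d_refill_ratio(uint32_t refill_delta, uint32_t access_delta)
{
    return access_delta ? (double)refill_delta / (double)access_delta : 0.0;
}
```

Short sampling intervals keep the single-wrap assumption safe. The overflow flag reported in /proc/peemuperf would be one way to detect when it is violated.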


Source

• github.com/prabindh/peemuperf