Exposing and Fixing Common App Performance Problems


Upload: riverbed-technology

Post on 14-Apr-2017


TRANSCRIPT

Page 1: Exposing and Fixing Common App Performance Problems

Take Control of Application Performance

Jon C. Hodgson
Technical Director, Advanced Technology Group
APM Subject Matter Expert

“Hidden in Plain Sight”

Page 2: Exposing and Fixing Common App Performance Problems

© 2015 Riverbed Technology. All rights reserved. 2

Anatomy of a Transaction

[Diagram: a single transaction traced from BEGIN at the client browser to END at page render time.
Components shown: Client Browser, WAN, Web (IIS Web Server, .NET Worker Process, OS TCP/IP Stack), LAN, App (Apache, Java App Server, OS TCP/IP Stack, VMware), Remote Calls (Web Service, DB etc.).
Sources of delay: Request Payload, Response Payload, Network/Bandwidth/Latency, Queuing, Code Processing, Hypervisor Oversubscription.
Data sources: Packets, Code Instrumentation, Metrics, EUE.]

Page 3: Exposing and Fixing Common App Performance Problems


Crash Course: Application Architecture

[Diagram: two application stacks drawn as nested layers.
Host layers: Physical OS Resources (CPU, RAM, I/O etc.), Operating System, VMware Hypervisor, Guest Operating System, OS Resources (CPU, RAM, I/O etc.).
Java stack: java.exe OS Process, Java JVM (OS Process), JVM Heap (Reserved RAM), Java Code, Java Application.
.NET stack: w3wp.exe OS Process, .NET CLR (OS Process), CLR Heap (Reserved RAM), .NET Code, .NET Application.]

Page 4: Exposing and Fixing Common App Performance Problems


The Flaw of Averages & Aggregates

Page 5: Exposing and Fixing Common App Performance Problems


The Flaw of Averages

A classic example of the Flaw of Averages involves the statistician who drowned crossing a river that was, on average, 3 ft. deep.

Source: The Flaw of Averages: Why We Underestimate Risk in the Face of Uncertainty by Sam L. Savage, with illustrations by Jeff Danziger – http://flawofaverages.com

Page 6: Exposing and Fixing Common App Performance Problems


CPU Aggregation

[Charts, each on a 0-100 scale: Core 1, Core 2, Core 3, Core 4, and the Host CPU Average.
Scenarios shown: Runaway Thread, Runaway Thread (Hyperthreading), Intermittent CPU Spikes, Overloaded Host.]

Page 7: Exposing and Fixing Common App Performance Problems


CPU Aggregation

[Charts, repeated from the previous slide, each on a 0-100 scale: Core 1, Core 2, Core 3, Core 4, and the Host CPU Average.
Scenarios shown: Runaway Thread, Runaway Thread (Hyperthreading), Intermittent CPU Spikes, Overloaded Host.]
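To make the aggregation effect concrete, here is a minimal Python sketch. The per-core numbers are invented for illustration (they are not read from the charts), and `host_average` is a hypothetical helper. It shows that a runaway thread pinning one core out of four produces the same host average as a uniformly light load.

```python
from statistics import mean

def host_average(per_core):
    """Aggregate per-core utilization (0-100) into one host-wide figure."""
    return mean(per_core)

# Illustrative values only (assumed, not taken from the slide charts).
runaway_thread = [100, 0, 0, 0]      # one core pegged by a single thread
light_load = [25, 25, 25, 25]        # every core lightly used
overloaded = [100, 100, 100, 100]    # every core saturated

for name, cores in [("runaway thread", runaway_thread),
                    ("light load", light_load),
                    ("overloaded host", overloaded)]:
    print(f"{name:16s} host avg = {host_average(cores):5.1f}%  "
          f"busiest core = {max(cores)}%")
```

Only the per-core view (here, the busiest core) separates the first two cases; the host average reports 25% for both.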

Page 8: Exposing and Fixing Common App Performance Problems


Data Granularity

[Charts, each on a 0-100 scale: the same CPU scenarios (Runaway Thread, Runaway Thread (Hyperthreading), Intermittent CPU Spikes, Overloaded Host) shown at three granularities: 1 sec, 15 sec Sampled, 15 sec Averaged.]
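The difference between the three granularities can be sketched in a few lines of Python. The series below is made up (a 3-second spike in an otherwise idle minute), and `sample_every` and `average_every` are hypothetical helpers, not part of any monitoring product.

```python
def sample_every(series, step):
    """Keep one reading per window (the first) and discard the rest."""
    return series[::step]

def average_every(series, step):
    """Replace each window of `step` readings with its mean."""
    windows = [series[i:i + step] for i in range(0, len(series), step)]
    return [sum(w) / len(w) for w in windows]

# Assumed 1-second CPU readings over 60 s: 5% idle except a 3-second spike.
one_sec = [5] * 60
one_sec[20:23] = [100, 100, 100]

sampled = sample_every(one_sec, 15)     # readings at t = 0, 15, 30, 45
averaged = average_every(one_sec, 15)   # one mean per 15-second window

print("1 sec peak:          ", max(one_sec))
print("15 sec sampled peak: ", max(sampled))
print("15 sec averaged peak:", max(averaged))
```

With these numbers, sampling misses the spike entirely and averaging flattens it to 24%, so neither 15-second view shows a core hitting 100%.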

Page 9: Exposing and Fixing Common App Performance Problems


Case Study: It’s Not the Database …

…Or Is It?

Page 10: Exposing and Fixing Common App Performance Problems


§ Customer’s application was running very slowly

§ Everything looked fine on the app server & database server

§ Database CPU usage was very low: only 8.4%

Pcpu  time   args
8.4   41:41  oracle (DESCRIPTION=(LOCAL=NO)(SDU=1521))

“It’s not the database”

Page 11: Exposing and Fixing Common App Performance Problems


Database CPU: Disaggregated

Aggregate CPU Load ~ 9% Average

At any given moment, a single CPU is pegged at 100% while the others are mostly idle. Why is the aggregate ~ 9%? Why is oracle the top process with only 8.4% utilization?

ANSWER: 100% CPU / 12 CPUs = 8.3%

It was the database after all!

[Chart: CPU Load of CPUs 1-12 (Scale 0-100%)]
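The arithmetic in the ANSWER above can be turned into a quick check. This is a minimal sketch: `busy_cores` and `looks_single_thread_bound` are hypothetical helpers, and the 0.05 tolerance is an arbitrary assumption.

```python
def busy_cores(aggregate_pct, n_cpus):
    """Convert a host-wide CPU percentage into the number of fully busy CPUs."""
    return aggregate_pct * n_cpus / 100.0

def looks_single_thread_bound(aggregate_pct, n_cpus, tolerance=0.05):
    """True when the aggregate figure is about what one pegged CPU would produce."""
    return abs(busy_cores(aggregate_pct, n_cpus) - 1.0) <= tolerance

# Figures from the case study: oracle at 8.4% on a 12-CPU server.
print(round(busy_cores(8.4, 12), 2))
print(looks_single_thread_bound(8.4, 12))
```

A process that reports roughly one busy CPU's worth of load on a multi-CPU host is worth inspecting per CPU before it is ruled out.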

Page 12: Exposing and Fixing Common App Performance Problems


Case Study: Forgotten freeware claims 10,000 CPUs

Page 13: Exposing and Fixing Common App Performance Problems


Using AppInternals, the customer noticed a CPU spike hopping from core to core:

Hidden in Plain Sight

Page 14: Exposing and Fixing Common App Performance Problems


§ The top process was a freeware sysadmin utility, with 6.25% CPU

§ 100% CPU / 16 CPUs = 6.25% CPU

§ This utility, running on Windows 2008 servers, had not been updated since 2003

§ It was part of the default build on 10,000+ servers

Hidden in Plain Sight

Page 15: Exposing and Fixing Common App Performance Problems


Now That You Know… …you’ll see this everywhere

Android Phone

Dual Core XP

Quad Core Windows 7

8 Core Win 2012
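One way to spot the pattern on any machine is to know the aggregate value a single pegged CPU produces for a given CPU count. The sketch below is illustrative: `one_core_share` is a hypothetical helper, and the CPU counts are the ones mentioned in these slides (dual core, quad core, 8 core, and the 12- and 16-CPU servers from the case studies).

```python
def one_core_share(n_cpus):
    """Aggregate CPU % reported when exactly one logical CPU is pegged at 100%."""
    return 100.0 / n_cpus

for n in (2, 4, 8, 12, 16):
    print(f"{n:2d} CPUs -> {one_core_share(n):.2f}%")
```

A process that sits steadily at one of these values is a candidate for a runaway single thread.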

Page 16: Exposing and Fixing Common App Performance Problems


The Power of Big Data

Page 17: Exposing and Fixing Common App Performance Problems


REMOVING THE HAYSTACK

Page 18: Exposing and Fixing Common App Performance Problems


“Slowest” Not Always “Worst”

2x slower, few transactions

12x slower, many transactions

“Slowest”

“Worst”
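The slide does not define "worst", so the sketch below makes an assumption: it ranks transaction types by total excess time (count times the extra seconds per transaction) and compares that with ranking by raw response time. The transaction names and figures are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TxnType:
    name: str
    count: int          # transactions observed
    baseline_s: float   # normal response time, in seconds
    observed_s: float   # degraded response time, in seconds

    @property
    def slowdown(self):
        """How many times slower than baseline."""
        return self.observed_s / self.baseline_s

    @property
    def total_excess_s(self):
        """Assumed impact metric: extra seconds summed over all transactions."""
        return self.count * (self.observed_s - self.baseline_s)

# Hypothetical figures, chosen to mirror the two labels on the slide.
txns = [
    TxnType("report", count=20, baseline_s=30.0, observed_s=60.0),     # 2x slower, few
    TxnType("search", count=50_000, baseline_s=0.5, observed_s=6.0),   # 12x slower, many
]

slowest = max(txns, key=lambda t: t.observed_s)
worst = max(txns, key=lambda t: t.total_excess_s)
print("slowest:", slowest.name, f"({slowest.slowdown:.0f}x)")
print("worst:  ", worst.name, f"({worst.slowdown:.0f}x)")
```

Under this assumed metric the long-running report is the slowest transaction, but the high-volume search accounts for far more lost time.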

Page 19: Exposing and Fixing Common App Performance Problems


Case Study: Remote Dependency Blues

Page 20: Exposing and Fixing Common App Performance Problems


§ 3x production load test against 100 servers lasting 3 hours
– 7 million front-end transactions with tens of millions of backend calls
– All captured by AppInternals with call tree details

§ They could never reach their performance goals

Scalability Testing for Government Compliance

[Chart: Throughput (hits/second)]

Expected Behavior

Throughput would stall & then repeatedly thrash until the entire environment was reset

Page 21: Exposing and Fixing Common App Performance Problems


§ Application Development identified GetQuotes.jws as the root cause

§ The team that owned that web service disputed the finding
– Another APM product had previously identified this as well
– This was dismissed as a “Red Herring”

Remote Dependency: GetQuotes.jws

Page 22: Exposing and Fixing Common App Performance Problems


Big Data Reveals a Back-End Pattern

Transactions which call GetQuotes.jws
This pattern correlates with the load thrashing

This pattern precedes every burst of traffic

Transactions which do not call GetQuotes.jws
These do not show any relationship to the issue

Throughput stall & thrashing

[Chart axes: AppInternals Response Time (s); Load Generator Throughput]

§ AppInternals clearly proved that GetQuotes.jws was the root cause of the thrash

§ The Application Owner used this information to force the web service team to take ownership of their issue
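The comparison on this slide, transactions that call a dependency versus those that do not, can be sketched as a simple partition over captured transaction records. The record layout (`start_s`, `response_s`, `calls`) and the sample values are hypothetical and do not reflect the AppInternals data format.

```python
from collections import defaultdict

def max_response_by_bucket(transactions, dependency, bucket_s=60):
    """Split transactions by whether they call `dependency`, then report the
    worst response time per time bucket for each group."""
    with_dep = defaultdict(float)
    without_dep = defaultdict(float)
    for t in transactions:
        bucket = int(t["start_s"] // bucket_s)
        target = with_dep if dependency in t["calls"] else without_dep
        target[bucket] = max(target[bucket], t["response_s"])
    return dict(with_dep), dict(without_dep)

# Hypothetical records: start time (s), response time (s), remote calls made.
txns = [
    {"start_s": 5,   "response_s": 0.4,  "calls": {"GetQuotes.jws"}},
    {"start_s": 20,  "response_s": 0.3,  "calls": {"db"}},
    {"start_s": 70,  "response_s": 45.0, "calls": {"GetQuotes.jws", "db"}},
    {"start_s": 80,  "response_s": 0.3,  "calls": {"db"}},
    {"start_s": 130, "response_s": 52.0, "calls": {"GetQuotes.jws"}},
]

with_dep, without_dep = max_response_by_bucket(txns, "GetQuotes.jws")
print("calls GetQuotes.jws:   ", with_dep)
print("no GetQuotes.jws call: ", without_dep)
```

Plotting the two groups separately against throughput is what removes the haystack: in this toy data only the group that calls the dependency degrades over time.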

Page 23: Exposing and Fixing Common App Performance Problems


§ Multiple applications were affected

§ Dozens of transaction types were degraded

§ Months of effort had previously been wasted chasing phantoms

Business Impact

Page 24: Exposing and Fixing Common App Performance Problems


Key Takeaways

Troubleshoot in the context of the entire stack

Averages, aggregates & sampling can mask issues

Learn to spot tell-tale patterns

“Slowest” is not always “Worst”

Leverage Big Data approaches to eliminate noise

Page 25: Exposing and Fixing Common App Performance Problems


Try instantly at www.appinternals.com No Installation Required!

Page 26: Exposing and Fixing Common App Performance Problems

Thank You
