PNSQC Presentation, October 2014
Scalability: Pushing the Limits
Neha Rai, Tim Schooley, Tejas Patil
2
3
So what is “Scalability”?
“Scalability is the ability of a system to successfully handle an increasing workload, or its ability to be expanded without major architectural changes, or detriment.”
For a good read, check out “Characteristics of Scalability and Their Impact on Performance”, André B. Bondi, AT&T Labs
4
Once upon a time, there was…
5
Policy
User authentication
Auditing
Reports
Key escrow
6
[1] By Brian Snelson (originally posted to Flickr as Final assembly) [CC-BY-2.0 (http://creativecommons.org/licenses/by/2.0)], via Wikimedia Commons
[2] By U.S. Navy photo by Lt. Arwen Chisholm [Public domain], via Wikimedia Commons
Mission Critical
(Photographs for example only; not indicative of actual customers)
[1] [2]
7
DB AH AH
McAfee ePolicy Orchestrator
Drive Encryption
Agent-Server Communication Interval (ASCI)
(Agent Handler(s))
McAfee Agent
8
Effects of changing the ASCI, with 100,000 clients
[Chart: average number of client requests per second (vertical axis, 0 to 30) against Agent-Server Communication Interval in hours (horizontal axis, 1 to 16)]
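The shape of this chart can be approximated with simple arithmetic, assuming each client contacts the server once per ASCI and that check-ins are spread evenly across the interval. Both are simplifying assumptions rather than something the slides state, and the function name below is hypothetical. A minimal sketch:

```python
def average_requests_per_second(clients: int, asci_hours: float) -> float:
    """Average check-in rate if every client contacts the server once per ASCI.

    Assumes check-ins are spread evenly over the interval (an idealization).
    """
    if asci_hours <= 0:
        raise ValueError("asci_hours must be positive")
    return clients / (asci_hours * 3600)


if __name__ == "__main__":
    # Tabulate the rate for 100,000 clients at a few ASCI values.
    for hours in (1, 2, 4, 8, 16):
        rate = average_requests_per_second(100_000, hours)
        print(f"ASCI {hours:>2} h: {rate:5.1f} requests/second")
```

Under these assumptions, 100,000 clients with a 4-hour ASCI average about 6.9 requests per second, and halving the ASCI doubles the load.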
9
So we integrated into ePO...
Policy
Users
Auditing
Reports
Key escrow
10
• Does it meet our scalability expectation?
We had a number in mind, based on existing ePO scalability guidelines (goal of 100,000).
• Will it work for existing customers?
Mission critical. It has to work.
• Does it meet our quality goals?
Do we know what happens when the system reaches its limits?
Are we ready to roll it out?
11
Without testing the limits,
bad things™ can happen.
[Chart axes: “[Confidence in] ability to meet demand” and “Investment ($) in pushing the limits”]
12
Key take-away #1:
Understand the risks of not doing Scalability Testing
(this will help you determine if you need to do it)
13
DB AH AH
“5_G>I’N^O!”
What to test?
• Covers many components
• High impact failure case
• Simple result to interpret
• Covers high complexity code
• Covers a very common use case
1.5x ASCI
14
[1] By David B. Gleason from Chicago, IL (The Pentagon) [CC-BY-SA-2.0 (http://creativecommons.org/licenses/by-sa/2.0)], via Wikimedia Commons
[2] By Rev Stan, Harry Potter studio tour: The cupboard under the stairs [CC-BY-SA-2.0 (http://creativecommons.org/licenses/by-sa/2.0)], via Flickr
OK, where are you going to get all the clients from?
(Note: this will depend on your architecture)
You might not have one of these!
15
DB AH AH
ePolicy Orchestrator
“5_G>I’N^O!”
N nodes, N nodes
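To get a feel for what N simulated nodes do to the server, a toy model can help. The sketch below is not the simulator used for these tests; it assumes each node checks in once per ASCI at a random offset, and all names are hypothetical. It compares the average per-second load with the busiest second.

```python
import random
from collections import Counter


def simulate_check_ins(nodes: int, asci_seconds: int, seed: int = 0) -> Counter:
    """Count check-ins per second when each node picks a random offset in one ASCI."""
    rng = random.Random(seed)  # seeded so repeated runs are comparable
    per_second = Counter()
    for _ in range(nodes):
        per_second[rng.randrange(asci_seconds)] += 1
    return per_second


def summarize(per_second: Counter, asci_seconds: int) -> tuple[float, int]:
    """Return (average requests/second over the interval, busiest second's count)."""
    total = sum(per_second.values())
    return total / asci_seconds, max(per_second.values())


if __name__ == "__main__":
    asci = 4 * 3600  # 4-hour ASCI, in seconds
    counts = simulate_check_ins(100_000, asci)
    average, peak = summarize(counts, asci)
    print(f"average {average:.1f}/s, busiest second {peak}/s")
```

Even with an even random spread, the busiest second carries more than the average, which is one reason to measure behavior at the peak rather than rely on averages alone.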
16
So why did we have to simulate?
(Optimization)
x 100
Not testing Steve’s true ability
to cook under heavy demand.
17
So why did we have to simulate?
Meaningful data helps uncover
the limitations of the system.
(for us, it was user data)
18
Example causes of limitations
Larger calculations
Cache memory
Connection pools
Contention
Disk IO
Network IO
Recommendation: keep the hardware consistent, and don’t
use virtualization unless you expect your customers to use it.
19
20
21
Key take-away #2:
Define your test scenarios sensibly.
Aim for broad coverage
Keep acceptance criteria simple
Target complex areas
Suitable tools for gathering results
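A simple acceptance criterion can often be reduced to a single pass/fail check. The sketch below assumes one possible reading of the “1.5x ASCI” note from the earlier slide: every node must have checked in within 1.5 times the ASCI. The slides do not spell the criterion out, so treat both the rule and the names as hypothetical.

```python
from typing import Iterable


def all_nodes_reported(seconds_since_check_in: Iterable[float],
                       asci_seconds: float,
                       tolerance: float = 1.5) -> bool:
    """True if every node's latest check-in is no older than tolerance * ASCI.

    The 1.5 default is an assumed reading of the slide, not a documented rule.
    """
    deadline = tolerance * asci_seconds
    return all(age <= deadline for age in seconds_since_check_in)


if __name__ == "__main__":
    asci = 4 * 3600
    ages = [1200.0, 9000.0, 20_000.0]  # illustrative values only
    print("PASS" if all_nodes_reported(ages, asci) else "FAIL")
```

A single boolean like this is easy to interpret across runs, and it still exercises many components, because any stalled part of the pipeline shows up as a late node.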
22
So how did we run the tests?
(the goal was 100k, but we needed to find the limit)
[Chart axes: # Nodes and # requests/second]
Increasing cost (setup time)
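Because each larger run costs more setup time, it helps to limit how many runs are needed to locate the limit. One common approach, sketched here as an assumption and not as the procedure used for these tests, is to ramp the node count geometrically until a run fails, then bisect between the last pass and the first failure. The `passes` callable stands in for a full test run and is hypothetical, and the sketch assumes that once a node count fails, every larger count fails too.

```python
from typing import Callable


def find_node_limit(passes: Callable[[int], bool],
                    start: int,
                    ceiling: int,
                    resolution: int = 1) -> int:
    """Return a passing node count within `resolution` of the largest one.

    Returns 0 if even small counts fail. Assumes pass/fail is monotonic.
    """
    if start < 1 or ceiling < start or resolution < 1:
        raise ValueError("need 1 <= start <= ceiling and resolution >= 1")
    low, high = 0, start  # low: largest known pass; high: next count to try
    # Phase 1: double the node count until a run fails or the ceiling passes.
    while True:
        if not passes(high):
            break  # high is now the smallest known failure
        low = high
        if high == ceiling:
            return ceiling
        high = min(high * 2, ceiling)
    # Phase 2: bisect between the last pass and the first failure.
    while high - low > resolution:
        mid = (low + high) // 2
        if passes(mid):
            low = mid
        else:
            high = mid
    return low


if __name__ == "__main__":
    # Stand-in for a real test run: pretend the system copes up to 130,000 nodes.
    limit = find_node_limit(lambda n: n <= 130_000, start=1_000,
                            ceiling=200_000, resolution=1_000)
    print(f"limit is roughly {limit} nodes")
```

A coarser `resolution` trades precision for fewer expensive runs, which matters when setup time grows with the number of nodes.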
23
What were our findings?
(bearing in mind this was a new integration)
• The first scalability tests were fireworks.
– Crashes, memory leaks, deadlocks.
– All uncovering high severity defects.
• We identified bottlenecks, then optimized.
– Expensive calculations.
– Expensive SQL transactions.
• We finally obtained a level of confidence.
– Now we’re ready to sell it.
24
The results
ePO, Agent Handler and SQL server hardware:
Dell PowerEdge R515, 2.6 GHz 6C, 8 GB, 7.2K SATA
Dell PowerEdge R715, 2x 2.0 GHz 8C, 8 GB, 15K SAS
ASCI: 4 hours
Nodes: 100,000
Average requests per second (to DB): ~7
All tests passed on this configuration.
Notes: no other point products were installed.
These results are advisory only.
25
How might this apply elsewhere?
26
Cost vs Gain
[Confidence in] ability to meet demand
Inve
stm
ent ($
) in
pu
shin
g t
he lim
its
0
Law of diminishing returns
27
Key take-away #3:
Invest in Scalability appropriately
(it’s a bottomless pit, if you want it to be)
28
Summary
• Understand the risks of your system not meeting its Scalability requirements.
• Define your test scenarios sensibly.
• Invest appropriately in Scalability testing.
• Have fun, and enjoy the fireworks!
29