DSM Scalability Considerations for Unicenter NSM r11
- Last Updated June 5 2006
© 2005 Computer Associates International, Inc. (CA). All trademarks, trade names, service marks and logos referenced herein belong to their respective companies.
Best Practice Summary – see notes
- Polling 50k local objects from one DSM is fine for r11
- Manage polling so that it does not exceed 600 polls per second
- The aws_snmp -m parameter must be configured to allow this load
- Manage the poll cycle so that average use is above 20% and below 50% of the poll time window
- More than 100 DSMs can report to one MDB
Detailed DSM Performance
Objectives
- Understand issues affecting DSM performance
- Understand issues affecting scalability
- Consider architectural options
- Recommendations
Issues affecting DSM performance
Understand issues affecting DSM performance
- Hardware
- Local vs remote DSM(s)
- Cold start vs. warm start
- Electronic proximity to hosts
- Network configuration and congestion
- Number of hosts
- Number of managed objects
- Polling configuration
Hardware
- See Hardware Requirements in the NSM r11 Implementation Guide for the latest guidance
Hardware
- Does hardware matter?
- 30,000 objects ~= 2 subnets with 50 objects per host
[Chart: Hardware Comparison. Objects (0 to 35,000) vs. elapsed time (:00 to :40); series: 2x3.0 GHz HT, 500 MHz]
Local vs remote DSM(s)
- For smaller implementations, a local DSM on the MDB machine is OK
- For larger implementations, remote DSM(s) should be strongly considered
- A DSM should be electronically close to what it polls; it may connect to a remote MDB
Local vs remote DSM(s)
[Chart: Local vs Remote DSM. DSM objects (0 to 70,000) vs. elapsed time (:00 to :60); series: Local DSM, Remote DSM (60k), Remote DSM (30k)]
Multiple Remote DSMs
- Multiple remote DSMs have a synergistic effect
[Chart: 2 Remote DSMs Startup. Objects (0 to 90,000) vs. elapsed time (:00 to :60); series: Remote DSM 1, Remote DSM 2, Combined]
Local vs remote DSM(s)
- The combined effect of a local and a remote DSM is not as strong
[Chart: Local & Remote DSM Startup. Objects (0 to 90,000) vs. elapsed time (:00 to :60); series: Local DSM, Remote DSM, Combined]
Cold start vs. warm start
- Set the “WarmStart=yes” option in %AGENTWORKS_DIR%\services\config\atmanager.ini
- Warm start uses previously discovered objects
- Reduces MDB access time
- Reduces discovery process time
- Object status must still be confirmed
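A quick way to confirm the setting before restarting the DSM is to scan the file for the WarmStart line. The Python sketch below relies only on what the slide states (a `WarmStart=yes` line in atmanager.ini); the function name is hypothetical, it does not check which section the key sits in, and treating the last occurrence as the effective one and comparing case-insensitively are assumptions about how the DSM reads the file.

```python
import re

# Matches an uncommented "WarmStart=<value>" line, tolerating spaces around "=".
# Assumption: keys are compared case-insensitively, as Windows .ini readers
# usually do; the section the key lives in is not checked.
_WARM_START = re.compile(r"^\s*WarmStart\s*=\s*(\S+)\s*$", re.IGNORECASE)


def warm_start_enabled(ini_text: str) -> bool:
    """Return True if the last WarmStart setting in ini_text is 'yes'."""
    value = None
    for line in ini_text.splitlines():
        match = _WARM_START.match(line)
        if match:
            # Assumption: a later occurrence overrides an earlier one.
            value = match.group(1)
    return value is not None and value.lower() == "yes"
```

To use it, read atmanager.ini into a string and pass it in; a commented-out line such as `;WarmStart=yes` does not count, because the pattern requires the key at the start of the line.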
Cold start vs. warm start
- Startup measured as time to DSM settling
[Figure: DSM start complete]
Cold start vs. warm start
- Startup elapsed times
[Chart: Cold Start vs. Warm Start. Elapsed time (0:00 to 1:26) by startup type: Cold Start, Warm Start]
Electronic proximity to hosts
- Standard best practice: no more than 3 hops
- High performance LAN access to hosts and MDB
- Avoid WAN links
- Given a choice, put a DSM close to what it polls, instead of close to its MDB
- Missed traps are an indication of excessive load or a busy network; reduce the polling/trap distance
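The placement rule can be expressed as a small assignment step: for each subnet, pick the candidate DSM with the fewest hops, and flag any subnet whose best option is still more than 3 hops away. This is a hypothetical sketch; the hop-count table is an assumed input (for example, gathered with traceroute), and MDB distance is deliberately left out, following the guidance to favor proximity to the polled hosts.

```python
from typing import Dict, List, Tuple

MAX_HOPS = 3  # "Standard best practice: no more than 3 hops"


def assign_to_nearest_dsm(
    hops: Dict[str, Dict[str, int]],
) -> Tuple[Dict[str, str], List[str]]:
    """Assign each subnet to the candidate DSM with the fewest hops to it.

    hops[subnet][dsm] is a hop count measured beforehand (hypothetical
    input format); every subnet must list at least one candidate DSM.
    Returns (subnet -> DSM assignment, subnets still beyond MAX_HOPS).
    """
    assignment: Dict[str, str] = {}
    too_far: List[str] = []
    for subnet, by_dsm in sorted(hops.items()):
        # Fewest hops wins; ties are broken by DSM name for repeatability.
        dsm, distance = min(by_dsm.items(), key=lambda item: (item[1], item[0]))
        assignment[subnet] = dsm
        if distance > MAX_HOPS:
            too_far.append(subnet)
    return assignment, too_far
```

Subnets returned in the second list are candidates for a new DSM placed closer to them, rather than for longer polling paths.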
LAN Polling
[Chart: Polls Per Second, 27,475 DSM Objects, aws_snmp -m 600. Polls per second (0 to 4,500) vs. time (18:47:14 to 19:51:03); series: Polls Per Second, Average, +2 Std Dev]
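The chart's Average and +2 Std Dev lines can be reproduced from a series of per-interval polls-per-second samples and compared against the 600 polls/second guideline. A minimal sketch, assuming the samples are already collected (how they are gathered is not covered here) and using the population standard deviation, since the slide does not say which variant was plotted; the function name is hypothetical.

```python
import statistics
from typing import Sequence, Tuple

POLL_CEILING = 600  # polls/second; the deck's recommended upper limit


def poll_rate_summary(samples: Sequence[float]) -> Tuple[float, float, bool]:
    """Return (average, average + 2 std dev, whether that band stays under the ceiling)."""
    mean = statistics.mean(samples)
    # Assumption: population standard deviation (pstdev), not sample stdev.
    band = mean + 2 * statistics.pstdev(samples)
    return mean, band, band <= POLL_CEILING
```

Checking the +2 Std Dev band rather than the average alone leaves room for the load peaks that the poll cycle must absorb.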
Network configuration and congestion
- DSM should usually handle whole subnets
- Fast/stable path to MDB
- Network utilization
- Errors, timeouts, and retries
- Missed traps must be addressed
- Poll cycle must have free time for load peaking
- Size counts
WAN Polling
[Chart: Polls Per Second, 35,178 DSM Objects, aws_snmp -m 1000. Polls per second (0 to 4,000) vs. time (21:38:47 to 22:44:39); series: Polls Per Second, Average, +2 Std Dev]
Number of hosts
- Affects startup and first stage discovery
- Affects total DSM object population
- Affects DSM host configuration
Number of objects
- Each managed host may spawn dozens of objects
- Agents
- Watchers
- Split DSMs to keep the number of objects constrained
- Split DSMs to keep each one electronically close to what it polls
- Obrowser and query with no argument display the objects; the number of objects actually polled is usually smaller
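One way to apply the splitting guidance is to pack whole subnets into DSMs so that none exceeds the 50k-object figure from the summary. The sketch below uses first-fit decreasing bin packing, a technique chosen for illustration rather than anything the deck prescribes; the function name and per-subnet object counts are assumed inputs. Because Obrowser counts overstate the objects actually polled, packing by them errs on the safe side. The sketch ignores electronic distance, so the resulting groups still need to be checked against the hop guidance.

```python
from typing import Dict, List

MAX_OBJECTS_PER_DSM = 50_000  # "50k local objects polled in one DSM is fine for r11"


def split_subnets(objects_per_subnet: Dict[str, int],
                  cap: int = MAX_OBJECTS_PER_DSM) -> List[List[str]]:
    """Group whole subnets into DSMs so that no DSM exceeds `cap` objects.

    First-fit decreasing: the largest subnets are placed first, each into
    the first DSM that still has room. A subnet larger than `cap` gets a
    DSM of its own (and would need to be split further by hand).
    """
    dsms: List[List[str]] = []
    loads: List[int] = []
    for subnet, count in sorted(objects_per_subnet.items(),
                                key=lambda item: (-item[1], item[0])):
        for i, load in enumerate(loads):
            if load + count <= cap:
                dsms[i].append(subnet)
                loads[i] += count
                break
        else:
            # No existing DSM has room: start a new one.
            dsms.append([subnet])
            loads.append(count)
    return dsms
```

Keeping subnets whole matches the guidance that a DSM should usually handle whole subnets.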
Polling configuration – see notes
- Polling interval
- Polling rate for r11 DSM sustained at up to 1,000 polls/second (laboratory only – do not exceed 600)
- Speeds discovery (?)
- Not needed for status polling
- A 10 to 20 minute polling interval is still best practice
- 50,000 pollable objects at a 10 minute polling interval is about 83 polls/second (50,000 / 600 seconds)
- Timeouts are critical
- Assume timeout 10 seconds, retry 2 = 30 second delay (10 seconds x 3 attempts)
- A DSM thread waits for a reply or timeout on each SNMPGET
- IP policy makes extensive use of SNMPGET
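The arithmetic on this slide can be written out directly. A minimal sketch with hypothetical function names: the timeout model assumes a fixed wait per attempt with no backoff, and the last function applies Little's law under the assumption, suggested by the slide but not stated precisely, that each outstanding SNMPGET holds one DSM thread until it is answered or times out.

```python
def polls_per_second(objects: int, interval_minutes: float) -> float:
    """Average poll rate if every object is polled once per interval."""
    return objects / (interval_minutes * 60)


def worst_case_delay(timeout_s: float, retries: int) -> float:
    """Seconds an unanswered SNMPGET waits: the first attempt plus each retry.

    Assumes the same timeout on every attempt (no backoff).
    """
    return timeout_s * (1 + retries)


def outstanding_requests(rate_per_s: float, wait_s: float) -> float:
    """Little's law: requests in flight = arrival rate x time each one waits."""
    return rate_per_s * wait_s
```

`polls_per_second(50_000, 10)` gives about 83.3, and `worst_case_delay(10, 2)` gives 30. If, hypothetically, 10 polls per second went to unreachable hosts, `outstanding_requests(10, 30)` shows about 300 requests waiting at any moment, which is why timeouts are critical.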
Polling configuration
- Calculating polling rates
- Target no more than 50% MaxPollRate utilization and no less than 20% MaxPollRate utilization
- Example at a MaxPollRate of 200/sec: a five minute interval is 300 seconds, so do not attempt more than 30k polls per interval (300 seconds x 0.50 x 200 polls per second = 30k objects polled every 5 minutes)
- Configure [aws_snmp] MaxPollRate in atservices.ini
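The MaxPollRate rule of thumb reduces to two small formulas: the object budget for an interval at 50% utilization, and the utilization that a given object count implies. A sketch with hypothetical names; the 20% and 50% bounds come from the slide.

```python
TARGET_MAX = 0.50  # no more than 50% MaxPollRate utilization
TARGET_MIN = 0.20  # no less than 20% MaxPollRate utilization


def max_objects(max_poll_rate: float, interval_s: float,
                share: float = TARGET_MAX) -> int:
    """Objects that fit in one interval when using `share` of MaxPollRate."""
    return int(interval_s * share * max_poll_rate)


def utilization(objects: int, max_poll_rate: float, interval_s: float) -> float:
    """Fraction of the interval's poll capacity that `objects` would consume."""
    return objects / (max_poll_rate * interval_s)


def in_target_band(objects: int, max_poll_rate: float, interval_s: float) -> bool:
    """True if the load falls inside the recommended 20% to 50% band."""
    u = utilization(objects, max_poll_rate, interval_s)
    return TARGET_MIN <= u <= TARGET_MAX
```

`max_objects(200, 300)` returns 30000, matching the worked example; `in_target_band(50_000, 200, 300)` returns False because that load would use about 83% of the window.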
Issues affecting scalability
Issues affecting scalability
- Hardware
- What hardware is available?
- Can it support MDB + DSM?
- Network
- How electronically close are managed objects?
- Is there capacity to handle polling and trap traffic?
- How reliable is the network?
- Geographic proximity
- Do managed objects exist on the other side of a WAN?
- Polling
- What are the polling requirements?
Issues affecting scalability
- Type of host activity
- Web server
- Application server
- Database server
- Batch server
Architectural options
Architectural Options
- Local DSM
- Fine for smaller shops
- Add remote DSMs as necessary
- Add remote DSMs to improve performance
- Use several smaller DSMs
- Closer to managed objects (most important tuning choice!)
- Faster startup
- More robust (not single point of failure)
- Reduces effect of an outage
- Bridged MDBs
- Distribute MDBs for better DSM access; not critical unless bandwidth to the MDB is limited and update activity is high
Questions?