cg health reports interpretation


Upload: ahmad-raza

Post on 17-Oct-2014


eHealth Health Reports

eHealth Health Reports are a set of comprehensive reports that leverage historical data from the eHealth Console. Using this end-to-end historical data from the eHealth database, Health Reports help to analyze trends, calculate averages, and evaluate the health of the internet infrastructure. With this information, operators can determine how efficiently applications and systems are running, whether critical resources are available, and what capacity planning initiatives make sense.

Health Report Concepts

What is a Baseline Period

The Health Report uses historical data collected from the eHealth database to analyze trends, calculate averages, and evaluate the health of critical resources. One analysis technique involves comparing the current statistics against the historical norm and highlighting any significant deviations. For example, historically, the daily traffic volume handled by a group of circuits may be around 20 Gbytes. If, one day, the Health Report indicates that the daily volume jumped to 35 Gbytes, this could point to a sudden shift in workload, rerouting of traffic, or even repeated retransmission of traffic [1]. Thus the historical norm, referred to as the baseline in eHealth, is an important set of statistics that provides a framework for comparison and evaluation.

The baseline period is a rolling period that projects back from the day the report is run. The Health Report compares hourly information to the same hour of the day and daily information to the same days of the week in the baseline period. The length of the baseline period depends on the type of analysis. For daily trend analysis, the Health Report uses a six-week baseline period as the default. For weekly analysis, the default is 13 weeks, and for monthly analysis, the default is 12 months.
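The same-weekday comparison can be sketched in Python. This is only an illustration: the source does not say how eHealth combines the baseline samples, so taking their plain average is an assumption, and the function names and sample data are hypothetical.

```python
from datetime import date, timedelta
from statistics import mean

def same_weekday_baseline(daily_volume, report_day, weeks=6):
    """Average the values recorded on the same weekday in the `weeks` weeks
    before report_day (ASSUMPTION: the baseline is a plain average)."""
    samples = []
    for k in range(1, weeks + 1):
        day = report_day - timedelta(weeks=k)
        if day in daily_volume:
            samples.append(daily_volume[day])
    return mean(samples) if samples else None

def deviation_from_baseline(current, baseline):
    """Percentage change of the current value relative to the baseline."""
    return (current - baseline) / baseline * 100.0

# Hypothetical data: 20 Gbytes on each of the six previous same weekdays.
report_day = date(2002, 5, 15)
history = {report_day - timedelta(weeks=k): 20.0 for k in range(1, 7)}

baseline = same_weekday_baseline(history, report_day)
change = deviation_from_baseline(35.0, baseline)
print(f"baseline={baseline:.1f} Gbytes, change={change:+.1f}%")
```

With the 20 Gbyte norm and the 35 Gbyte day from the example above, the sketch reports a 75% increase over the baseline.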

What are Health Variables and Health Indices

The Health Report automatically evaluates the performance of network devices, systems, and applications by analyzing a set of key statistics. These statistics are referred to as Health Variables. Using a human analogy, the health variables can be compared to our vital signs such as heart rate, blood pressure, and body temperature. By monitoring and analyzing the vital signs, we can deduce the health of the human body. In a similar way, health variables allow us to identify potential problems in the internet infrastructure. The Health Variables are different for each technology. For example, the variables for an ethernet port are utilization, collisions, other errors, and broadcast/multicast. For a system, the variables are communication, CPU, memory, storage, and system. For application response elements, the variables are average response, failed attempts, jitter, and unavailability. The sections for the eHealth—System and Application, eHealth—Response, and eHealth—Network contain a detailed description of the Health Variables for each technology type.

Each health variable is evaluated against a set of thresholds to determine its Health Index. A high Health Index indicates problems, while a low one indicates a healthy element. Since a Health Report is generated for a group of elements, the average Health Index provides an important indication of the overall health of the group.


© Copyright Concord Communications, Inc. 2002

[1] The exact cause of the change can become apparent when you examine the other information provided in the Health Report. For example, over the same time period, if there were outages in other parts of the network, then the increase in traffic was likely caused by the rerouting of traffic elsewhere. On the other hand, if the routers reported seeing a large number of errors, then the traffic increase could be explained by repeated retransmissions.


For example, the Health Index thresholds for ethernet are listed in this section as a table of Excellent, Good, Fair, and Poor ranges for each variable.

A variable that falls into a certain range is assigned a grade, as follows:

■ Excellent receives a grade of 0

■ Good receives a grade of 2

■ Fair receives a grade of 4

■ Poor receives a grade of 8

For example, during the poll, a network device such as a router reports the following statistics for one of its ethernet ports:

■ Utilization was 15% [2]

■ Collision rate was 2%

■ Other error rate was 0%

■ There were 180 broadcasts or multicasts per second

Based on these statistics, eHealth will assign the following grades to the ethernet element:

■ Utilization: Good (2)

■ Collisions: Excellent (0)

■ Other errors: Excellent (0)

■ Broadcast/multicast: Good (2)

■ Total Health Index: 4
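The grading above can be reproduced with a short Python sketch that uses the ethernet thresholds and grades given in this section. The names are hypothetical, and the handling of a value that sits exactly on a range boundary (here it takes the worse grade) is an assumption, since the published ranges overlap at their endpoints.

```python
# Upper bounds of the Excellent, Good and Fair ranges for each ethernet
# variable, taken from the ethernet threshold table in this section.
ETHERNET_THRESHOLDS = {
    "utilization": (10, 20, 35),             # percent
    "collisions": (5, 9, 15),                # percent
    "other_errors": (3, 7, 10),              # percent
    "broadcast_multicast": (100, 200, 300),  # per second
}
GRADES = (("Excellent", 0), ("Good", 2), ("Fair", 4), ("Poor", 8))

def grade(variable, value):
    """Return (label, grade) for one health variable.
    ASSUMPTION: a value exactly on a boundary takes the worse grade."""
    bounds = ETHERNET_THRESHOLDS[variable]
    for bound, result in zip(bounds, GRADES):
        if value < bound:
            return result
    return GRADES[-1]

def health_index(stats):
    """Total Health Index: the sum of the grades of all variables."""
    return sum(grade(name, value)[1] for name, value in stats.items())

# The poll from the example above.
poll = {"utilization": 15, "collisions": 2,
        "other_errors": 0, "broadcast_multicast": 180}
for name, value in poll.items():
    print(name, grade(name, value))
print("Total Health Index:", health_index(poll))
```

For the sample poll this yields Good (2) for utilization and broadcast/multicast, Excellent (0) for the two error variables, and a total of 4, matching the grades listed above.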

Trend Analysis

As mentioned previously, the historical data over the baseline period is used to calculate the “norm”, which is simply the statistical average of a health variable over the period. At the same time, the historical data is also used to calculate the trends of all the health variables.

For each variable, the data over the baseline period is used to construct a trend line using the statistical technique of linear regression. This allows eHealth to “predict” future values for each variable based on the historical data. Specifically, the trend line is compared to a pre-set trend threshold to determine how long it will take for the trend line to reach the threshold.

Figure 1 shows an example of trend analysis. For this variable, the trend threshold has been set to 40. Based on historical data over the previous six weeks, the trend line was constructed as shown. If the current trend continues, the trend line predicts that the variable will reach the threshold in 16 days.
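A minimal Python sketch of this kind of projection follows. It fits an ordinary least-squares line to daily samples and extrapolates to the threshold. The source confirms only that linear regression is used; the function names, the daily sampling, and the sample history are hypothetical.

```python
def fit_trend(values):
    """Least-squares line through the points (0, values[0]), (1, values[1]), ...
    Returns (slope, intercept). Needs at least two samples."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def days_to_threshold(values, threshold):
    """Days after the last sample until the trend line reaches the threshold.
    Returns 0.0 if it is already there and None if the trend is not rising."""
    slope, intercept = fit_trend(values)
    current = intercept + slope * (len(values) - 1)
    if current >= threshold:
        return 0.0
    if slope <= 0:
        return None
    return (threshold - current) / slope

# Hypothetical six-week history: 42 daily samples rising by 0.5 per day.
history = [11.5 + 0.5 * day for day in range(42)]
print(round(days_to_threshold(history, 40), 1))
```

The hypothetical history was chosen so that the trend line stands at 32 on the last day and rises by 0.5 per day, which puts the threshold of 40 roughly 16 days away, as in Figure 1.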


eHealthTM Suite Content Guide


Health Index thresholds for ethernet:

Variable             Excellent  Good     Fair     Poor
Utilization          0%-10%     10%-20%  20%-35%  35% and over
Collisions           0%-5%      5%-9%    9%-15%   15% and over
Other errors         0%-3%      3%-7%    7%-10%   10% and over
Broadcast/multicast  0-100      100-200  200-300  300 and over

[2] Note that all the statistics represent averages over the duration of the poll. If the polling frequency is five minutes, then the statistic indicates that the average utilization over the five-minute interval was 15%.


Exception Points

eHealth assigns exception points to elements based on the Health Index and on the Trend behavior of each variable. The Situations to Watch chart and its supplemental report display the Trend behavior of variables for elements.

An element can receive 100 exception points for each variable, with 66 points allocated to the Health Index and 34 to the Trend analysis. The Trend analysis is further divided between how close the predicted value is to the threshold (Trend proximity) for 17 points and how rapidly the trend is increasing (Trend slope) for the other 17 points. The exception points are totaled for each element for the reporting period. The maximum number of exception points a LAN/WAN element can receive is 400.

In addition to the Health Index and Trend behavior, eHealth assigns exception points to elements that have suddenly experienced new faults.

Health Index Exception Points

The higher the Health Index an element receives for a variable, the more exception points eHealth assigns that element. For example, an element with few faults for a reported day would receive a low Health Index for faults and few or no exception points. If that same element experienced a high percentage of faults during the day, the element would probably receive a high Health Index and all 66 exception points.

Elements rarely experience extreme problems that would cause them to receive the maximum Health Index for an entire day. To make sure that all error conditions are assigned the appropriate number of exception points, eHealth aggressively assigns exception points to lower health indices using a logarithmic equation.


Figure 1 – Trend Analysis

[Chart: a trend line on a scale of 0 to 40 rising toward the Trend Threshold of 40, which it reaches in 16 days.]

Trend Proximity Exception Points

eHealth evaluates the predicted behavior from the Trend analysis to determine how close the predicted value for a variable is to the threshold. The closer the predicted value, the more exception points eHealth assigns to the element.

If the predicted value is less than 90% of the threshold, eHealth assigns no exception points. eHealth starts assigning exception points when the predicted value is at 90% of the threshold and assigns all 17 points once the predicted value is at or above 200% of the threshold.
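The two endpoints of this rule can be sketched in Python. The source states only where points start (90% of the threshold) and where they reach the maximum (200%); the linear ramp between those endpoints is an assumption made for illustration, and the function name is hypothetical.

```python
MAX_PROXIMITY_POINTS = 17

def proximity_points(predicted, threshold):
    """Trend proximity exception points for one variable.
    ASSUMPTION: points grow linearly between 90% and 200% of the threshold;
    the source gives only the two endpoints."""
    ratio = predicted / threshold
    if ratio < 0.9:
        return 0.0
    if ratio >= 2.0:
        return float(MAX_PROXIMITY_POINTS)
    return MAX_PROXIMITY_POINTS * (ratio - 0.9) / (2.0 - 0.9)

for predicted in (4.0, 7.25, 10.0):
    print(predicted, round(proximity_points(predicted, 5.0), 2))
```

With a threshold of 5, a predicted value of 4 (80%) earns nothing, 10 (200%) earns all 17 points, and values in between earn a share that depends on the assumed ramp.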

Trend Slope Exception Points

eHealth assigns exception points to the trend slope based on the steepness of the slope; that is, the steeper the trend, the more points. The steepness of the trend slope is determined by the number of days predicted for the element to go from a value of zero to the threshold value.

If the number of days exceeds 120 (four months), no exception points are assigned. If the number of days is 30 or less (one month), the maximum 17 points are assigned.
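As a rough Python sketch, the number of days from zero to the threshold is the threshold divided by the slope of the trend line per day. The 120-day and 30-day limits come from the text above; the linear scale between them, the zero score for flat or falling trends, and the function name are assumptions.

```python
MAX_SLOPE_POINTS = 17

def slope_points(threshold, slope_per_day):
    """Trend slope exception points for one variable.
    ASSUMPTIONS: flat or falling trends earn no points, and points scale
    linearly between the 120-day and 30-day limits given in the text."""
    if slope_per_day <= 0:
        return 0.0
    days_zero_to_threshold = threshold / slope_per_day
    if days_zero_to_threshold > 120:
        return 0.0
    if days_zero_to_threshold <= 30:
        return float(MAX_SLOPE_POINTS)
    return MAX_SLOPE_POINTS * (120 - days_zero_to_threshold) / (120 - 30)

for slope in (0.25, 0.5, 2.0):
    print(slope, round(slope_points(40, slope), 2))
```

For a threshold of 40, a slope of 0.25 per day (160 days) earns no points and a slope of 2 per day (20 days) earns all 17.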

Sudden New Errors

If an element has been fault free for at least eight days and suddenly experiences new faults during the reporting period, eHealth assigns that element 34 exception points. This type of error condition may be the result of the addition of a faulty or incorrectly configured piece of equipment to the network.
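The rule can be sketched as a small Python check. The daily fault counts, the function name, and the reading of "at least eight days" as the eight days immediately before the reporting day are assumptions made for illustration.

```python
def sudden_new_error_points(daily_faults, fault_free_days=8, points=34):
    """daily_faults holds fault counts per day, oldest first, with the
    reporting day last. ASSUMPTION: the fault-free window is the eight
    days immediately before the reporting day."""
    if len(daily_faults) < fault_free_days + 1:
        return 0  # not enough history to decide
    *history, today = daily_faults
    recent = history[-fault_free_days:]
    if today > 0 and all(count == 0 for count in recent):
        return points
    return 0

print(sudden_new_error_points([0] * 8 + [3]))             # clean history, new faults
print(sudden_new_error_points([0, 1] + [0] * 6 + [3]))    # a fault within the window
```

The first call earns the 34 points; the second does not, because a fault occurred within the preceding eight days.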

Fine-tuning Exceptions

You can control which elements appear in the Exceptions report by adjusting:

■ The number of exception points an element needs to accumulate to appear on the reports.

■ The maximum number of elements appearing on the reports.

■ The settings for the Health Index and Trend thresholds.

You control the number of exception points an element must receive before it appears on the report by modifying the minimum points cutoff for that element in the Service Profile. Any element whose accumulated exception points are less than the minimum does not appear in the Exceptions section. The default minimum points cutoff is 25.

Through the Service Profile, you can:

■ Set the minimum exception points cutoff

■ Change the Health Index thresholds

■ Change the Trend thresholds

For example, by lowering the minimum cutoff to 20, more elements will show up in the Exceptions section. On the other hand, by raising the cutoff to 40, elements will only show up in the reports if they are in serious trouble.
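The effect of the cutoff can be illustrated with a small Python sketch. The element names and point totals are hypothetical, and the function only mimics the filtering described above; it is not the Service Profile itself.

```python
def exceptions_section(points_by_element, min_cutoff=25, max_elements=None):
    """Elements that reach the minimum points cutoff, highest total first.
    max_elements=None means no limit on the number of elements listed."""
    selected = [(name, pts) for name, pts in points_by_element.items()
                if pts >= min_cutoff]
    selected.sort(key=lambda item: item[1], reverse=True)
    return selected[:max_elements]

# Hypothetical exception point totals for four elements.
totals = {"router-a": 97.4, "switch-b": 41.1, "seg-c": 28.3, "port-d": 22.0}

for cutoff in (20, 25, 40):
    names = [name for name, _ in exceptions_section(totals, min_cutoff=cutoff)]
    print(cutoff, names)
```

Lowering the cutoff from the default of 25 to 20 brings in the fourth element, while raising it to 40 leaves only the two elements with the highest totals.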


Health Report Layout

A Health Report can be generated for a group of elements. The report period can be a day, a week, or a month. Depending on the technology, the contents of the report can vary. However, each report has the same format, which consists of the following major sections.

The Exceptions section appears only on the daily report. It identifies the elements that have encountered problems during the report period, that is, the elements whose exception points exceeded the minimum cutoff. The exceptions provide a high-level “to do” list that prioritizes problems across large networks. A sample of the exceptions and the method of assigning exception points will be discussed later.


Figure 2 – Exceptions Section (Daily eHealth Report, LAN/WAN, report for 05/15/2002)

Exceptions Summary (Exception Summary Report)

Rank  Element Name                Speed     Element Type  Ranking Points  Total Exceptions  Leading Exception
1     2600_ISDN-B1/0-2-RH-link-5  64.0 Kb   WAN           97.4            2                 Availability
2     2600_ISDN-B1/0-1-RH-link-4  64.0 Kb   WAN           90.4            2                 Availability
3     bass-enet-port-2            64.0 Kb   WAN           81.5            1                 Availability
4     2600_ISDN-B1/1-1-RH-link-7  64.0 Kb   WAN           81.5            1                 Availability
5     2600_ISDN-B1/1-2-RH-link-8  10.0 Mb   MIB2LAN       73.6            2                 Error Health Index
6     122.122.15.230-seg-1        10.0 Mb   Ethernet      61.6            3                 Utilization Health Index
7     labroute11-N0-RH-link-18    4.3 Gb    WAN           51.2            1                 Availability
8     122.122.15.190-enet-port-4  100.0 K   MIB2LAN       41.1            1                 Availability
9     122.122.15.190-enet-port-2  10.0 Mb   MIB2LAN       41.1            1                 Availability
10    baybox-seg                  10.0 Mb   Ethernet      28.3            2                 Collision Health Index

Exceptions Detail

1) 2600_ISDN-B1/0-2-RH-link-5, 64.0 Kbs, WAN. Total Number of Exceptions: 2. Total Exception Points: 97.4.

Rank  Leading Exception         Points  Detail
1     Availability              82.2    Unavailable for 21.87 hours during period of report
2     Utilization Health Index  15.2    Between 12 AM and 2 AM (Out)

[Chart: Availability, Total Time (0 to 100), 24 hours on 05/15/2002.]

2) 2600_ISDN-B1/0-1-RH-link-4, 64.0 Kbs, WAN. Total Number of Exceptions: 2. Total Exception Points: 90.4.

Rank  Leading Exception         Points  Detail
1     Availability              72.9    Unavailable for 10.39 hours during period of report
2     Utilization Health Index  17.5    Between 12 AM and 2 AM (Out)

[Chart: Availability, Total Time (0 to 100), 24 hours on 05/15/2002.]

3) bass-enet-port-2, 10.0 Mbs, MIB2LAN. Total Number of Exceptions: 2. Total Exception Points: 73.6.

Rank  Leading Exception   Points  Detail
1     Error Health Index  59.6    Between 12 AM and 12 AM (Out)
2     Error Trend         14.0    Threshold= 5.00 Prediction= 9.04 Actual= 8.38 (Out)

[Chart: Errors In and Errors Out, Frames per second (0 to 150), 24 hours on 05/15/2002.]

The next section of the Health Report is the Summary, which includes the following graphs and table:

■ Total Volume graph

■ Average Volume graph

■ Health Index graph

■ Situations to Watch table


Figure 3 – Summary Section (Daily eHealth Report, LAN/WAN, report for 05/15/2002)

[Chart: Total Volume Graph. Title: Total Network Volume by Day. Y-axis: Bytes, 0 to 500G. X-axis: days from Mon 04/03 to Mon 05/15. Legend: Trend, Bytes. Callout: Significantly higher volume vs. baseline for this day.]

[Chart: Average Volume Graph. Title: Average Network Volume by Hour. Y-axis: Bytes, 0 to 25G. X-axis: hours from 12:00 AM to 11:00 PM. Legend: Historical, Current.]

[Chart: Health Index Graph. Title: Average Health Index by Hour. Y-axis: 0 (Better) to 16 (Worse). X-axis: hours from 12:00 AM to 11:00 PM. Legend: Errors, Ethernet Errors, Nonunicast, Discards, Collisions, Utilization.]

Situations to Watch Table (callout on rank 1: Volume will increase above threshold in 2 days)

Rank  Element Name                     Variable              Threshold Value  Daily Average Actual  Daily Average Predicted  Days to (from) Threshold
1     122.122.15.230-seg-1             Volume (Bandwidth %)  20.000           23.818                19.402                   2
2     2600_ISDN-B1/1-2-RH-link-8(In)   Errors (% Frames)     5.000            0.023                 1.628                    Increasing
3     baybox-seg                       Volume (Bandwidth %)  20.000           2.398                 2.720                    Increasing
4     2600_ISDN-B1/0-2-RH-link-5(In)   Errors (% Frames)     5.000            0.045                 0.392                    Increasing
5     2600_ISDN-B1/0-1-RH-link-4(Out)  Volume (Bandwidth %)  80.000           4.743                 2.172                    Increasing
6     AREA51-enet-port-2(Out)          Volume (Bandwidth %)  80.000           1.652                 1.710                    Increasing
7     2600_ISDN-B1/0-2-RH-link-5(Out)  Volume (Bandwidth %)  80.000           2.868                 1.537                    Increasing
8     AREA51-enet-port-2(In)           Volume (Bandwidth %)  80.000           0.690                 0.712                    Increasing
9     100MBcard-slot1-enet-port-9(In)  Nonunicast Rate       100.000          5.449                 5.132                    Increasing
10    100MBslot1-enet-port-10(In)      Nonunicast Rate       100.000          1.362                 1.460                    Increasing

The next section, the Top Ten, lists the ten elements with the highest volume or highest Health Index using the following graphs and tables:

■ Volume Leaders graph and table

■ Health Index Leaders table

■ Volume Change Leaders table

■ Health Index Change Leaders table


Figure 4 – Top Ten Section (Daily eHealth Report, LAN/WAN, report for 05/15/2002)

In the tables below, Prior is the prior rank, BW is Bandwidth, and HI is Health Index.

[Chart: Volume Leaders Graph. Title: Volume Leaders. Y-axis: Bytes, 0 to 30G. X-axis: ranks 1 to 10.]

Volume Leaders Table (Volume Leaders in Bytes)

Rank  Prior  Element Name                      Speed      Bytes   vs Baseline  BW Avg  BW Peak  HI Avg  HI Peak
1     1      100MBslot1-seg-8                  100.0 Mbs  29.5 G  15.3%        2.7%    14.9%    0.1     2.0
2     7      122.122.15.230-seg-1              10.0 Mbs   25.7 G  53.3%        23.8%   88.8%    2.3     8.0
3     32     100MBcard-slot1-enet-port-11(In)  800.0 Mbs  23.3 G  52.4%        0.3%    1.9%     0.0     0.0
4     22     100MBcard-slot1-enet-port-9(In)   800.0 Mbs  18.6 G  15.3%        0.2%    2.3%     0.0     0.0
5     2      100MBslot1-enet-port-9(Out)       800.0 Mbs  15.6 G  6.6%         0.2%    3.1%     0.0     0.0
6     3      100MBslot1-enet-port-12(In)       800.0 Mbs  12.2 G  34.8%        0.1%    2.2%     0.0     0.0
7     11     100MBslot1-enet-port-9(In)        800.0 Mbs  10.6 G  28.2%        0.1%    1.7%     0.0     0.0
8     10     belly-enet-port-4                 100.0 Mbs  8.2 G   36.9%        0.8%    17.8%    0.0     2.0
9     6      100MBslot1-enet-port-11(In)       800.0 Mbs  5.1 G   -11.0%       0.1%    1.1%     0.0     0.0
10    9      100MBslot1-enet-port-12(Out)      800.0 Mbs  5.1 G   35.2%        0.1%    0.5%     0.0     0.0

Health Index Leaders Table

Rank  Prior  Element Name                     Contributor  HI Avg  HI Peak  BW Avg  BW Peak
1     1      bass-enet-port-2(Out)            Errors       4.8     8.0      0.9%    69.1%
2     2      122.122.15.230-seg-1             Utilization  2.3     8.0      23.8%   88.8%
3     3      baybox-seg                       Collisions   0.3     16.0     2.4%    38.1%
4     43     2600_ISDN-B1/0-1-RH-link-4(Out)  Utilization  0.2     8.0      4.7%    97.2%
5     5      ctron-ssr2-1-seg-5               Utilization  0.1     10.0     2.1%    76.6%
6     44     2600_ISDN-B1/0-2-RH-link-5(Out)  Utilization  0.1     8.0      2.9%    97.2%
7     4      100MBslot1-seg-8                 Utilization  0.1     2.0      2.7%    14.9%
8     10     AREA51-enet-port-2(Out)          Utilization  0.1     4.0      1.7%    86.6%
9     9      shadows-link-2(In)               Utilization  0.0     4.0      52.1%   87.0%
10    96     belly-enet-port-4                Utilization  0.0     2.0      0.8%    17.8%

Volume Change Leaders Table

Rank  Prior  Element Name                      Bytes    vs Baseline
1     1      ctron-ssr2-1-seg-9                1.7 M    600.0%
2     58     2600_ISDN-B1/0-1-RH-link-4(In)    5.3 M    176.4%
3     21     2600_ISDN-B1/0-1-RH-link-4(Out)   32.8 M   131.1%
4     3      AREA51-enet-port-2(Out)           1.8 G    124.0%
5     10     100MBcard-slot1-enet-port-10(In)  1.9 G    101.3%
6     2      ctron-ssr2-1-seg-6                111.9 M  -96.0%
7     39     AMRAAM-enet-port-2(Out)           27.2 M   -93.3%
8     35     122.122.15.190-enet-port-2(Out)   81.7 M   83.6%
9     13     2600_ISDN-B1/0-RH-link-3(In)      32.0 K   -74.7%
10    19     ANT-enet-port-2(In)               45.7 M   -69.3%

Health Index Change Leaders Table

Rank  Prior  Element Name                     HI Avg  HI Prior  Change
1     2      122.122.15.230-seg-1             2.34    0.32      2.02
2     44     2600_ISDN-B1/0-1-RH-link-4(Out)  0.17    0.00      0.17
3     1      bass-enet-port-2(Out)            4.79    4.95      -0.15
4     45     2600_ISDN-B1/0-2-RH-link-5(Out)  0.14    0.00      0.14
5     7      ctron-ssr2-1-seg-5               0.15    0.06      0.08
6     6      100MBslot1-seg-8                 0.08    0.15      -0.08
7     8      2600_ISDN-B1/0-1-RH-link-4(In)   0.00    0.06      -0.06
8     12     2600_ISDN-B1/1-2-RH-link-8(In)   0.00    0.05      -0.05
9     96     belly-enet-port-4                0.04    0.00      0.04
10    10     AREA51-enet-port-2(Out)          0.05    0.01      0.04

The Element Top N section provides charts that compare the health and performance of each element in a group for the top number of elements that you specify. Each chart appears on its own page. The charts show the elements from highest or most utilization to least. The charts vary for each technology type, but they can include Health Index comparisons and utilization comparisons. Health reports for systems offer several optional charts that you can choose to include in your Element Top N section.


Figure 5 – Element Top N Section (Daily eHealth Report, LAN/WAN, report for 05/15/2002)

[Chart: Top N by Bandwidth Utilization Graph. Title: Top 75 by Bandwidth Utilization. Y-axis: Time (%), 0 to 100. X-axis: element names. Legend (utilization bands): > 100, 91 - 100, 81 - 90, 71 - 80, 61 - 70, 51 - 60, 41 - 50, 31 - 40, 21 - 30, 11 - 20, 1 - 10, 0.]

[Chart: Top N by Health Graph. Title: Top 75 by Element Health Index. Y-axis: 0 (Better) to 16 (Worse). X-axis: element names. Legend: Congestion, Errors, Ethernet Errors, Nonunicast, Discards, Collisions, Utilization.]

The Element Detail section compares data for each element in the report. The graphs that are included for each element depend on the technology.

Typical graphs that are presented for each element may include:

■ Volume versus Baseline graph

■ Utilization graph

■ Average Health Index graph

Each page of the Element Detail section displays up to 25 elements.


Figure 6 – Element Detail Section (Daily eHealth Report, LAN/WAN, report for 05/15/2002)

[Chart: Volume Versus Baseline Graph. Title: Element Volume vs Baseline by Day. Y-axis: Bytes, 0 to 40G. X-axis: element names. Legend: Baseline High, Baseline Average, Baseline Low, Volume.]

[Chart: Bandwidth Utilization Graph. Title: Bandwidth Utilization. Y-axis: Time (%), 0 to 100. X-axis: element names. Legend (utilization bands): > 100, 91 - 100, 81 - 90, 71 - 80, 61 - 70, 51 - 60, 41 - 50, 31 - 40, 21 - 30, 11 - 20, 1 - 10, 0.]

[Chart: Average Health Index Graph. Title: Element Health Index. Y-axis: 0 (Better) to 16 (Worse). X-axis: element names. Legend: Errors, Ethernet Errors, Nonunicast, Discards, Collisions, Utilization.]

The last section provides the following graphs for each element:

■ Availability graph

■ Reachability graph

■ Latency graph


Figure 7 – Availability, Reachability, Latency Section (Daily eHealth Report, LAN/WAN, report for 05/15/2002)

[Chart: Availability Graph. Title: Availability. Y-axis: % Availability, 90 to 100. X-axis: element names. Legend: Upper Margin of Error, Observed, Lower Margin of Error, Availability, Planned Downtime.]

[Chart: Reachability Graph. Title: Reachability. Y-axis: % Reachability, 90 to 100. X-axis: element names. Legend: Upper Margin of Error, Observed, Lower Margin of Error, Reachability.]

[Chart: Latency Graph. Title: Latency. Y-axis: Time (%), 0 to 100. X-axis: element names. Legend: > 5K msec, 1K - 5K msec, 500 - 1K msec, 70 - 500 msec, < 70 msec.]

The following is a walkthrough of a Health Report generated for a group of LAN/WAN elements. Health Reports for other technologies retain the same format, but the contents of the charts and tables are customized for each technology.

Exceptions

Exceptions reports highlight potential problems in order of priority. You can use Exceptions reports as a daily “to do” list to assign valuable technical staff to the most critical issues first.

In the daily report, eHealth identifies all elements that have accumulated a high number of exception points as the result of such occurrences as errors, high bandwidth utilization, and trends. The Exceptions section includes the Exceptions Summary followed by the Exceptions Detail.

Elements only appear in the Exceptions section when their accumulated exception points exceed a minimum number. You can control this minimum cutoff.

Exceptions Summary

The Exceptions Summary lists those elements with the highest accumulated exception points in descending order. The summary table provides the following information on elements that have accumulated more exception points than the minimum cutoff.


Column Description

Rank The elements are ranked by the total number of exception points.

Element Name The name of the element.

Speed The speed of the element.

Element Type The type of element.

Ranking Points The total number of exception points accumulated by the element.

Total Exceptions The total number of different exception conditions accumulated by the element.

Leading Exceptions The exception condition that received the most points for the element.
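The ranking that the Exceptions Summary describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ElementExceptions` record, the `exceptions_summary` function, and the dictionary keys are hypothetical names, not part of eHealth, and the guide does not describe how eHealth computes the ranking internally.

```python
from dataclasses import dataclass


@dataclass
class ElementExceptions:
    """Hypothetical record: exception points per condition for one element."""
    name: str
    points_by_exception: dict  # exception condition -> points


def exceptions_summary(elements, min_cutoff):
    """Rank elements whose total exception points exceed min_cutoff."""
    rows = []
    for el in elements:
        total = sum(el.points_by_exception.values())
        if el.points_by_exception and total > min_cutoff:
            # The leading exception is the condition with the most points.
            leading = max(el.points_by_exception, key=el.points_by_exception.get)
            rows.append({
                "element": el.name,
                "ranking_points": total,
                "total_exceptions": len(el.points_by_exception),
                "leading_exception": leading,
            })
    # Highest accumulated points first, then assign ranks 1, 2, 3, ...
    rows.sort(key=lambda r: r["ranking_points"], reverse=True)
    for rank, row in enumerate(rows, start=1):
        row["rank"] = rank
    return rows
```

For example, an element with 27.0 points for one exception and 23.5 points for another would be listed with 50.5 ranking points and two total exceptions, provided 50.5 is above the cutoff.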

Health Report Walkthrough


Exceptions Detail

The Exceptions Detail report lists each element that has accumulated exception points and provides the details about what caused the exception points. The Exceptions Detail report lists up to five elements on each page. For each element, the report summarizes the exceptions and provides a detailed listing of each exception condition and a thumbnail graph of the leading exceptions.

The summary information is:

■ Name of the element

■ Speed of the element

■ Type of element

■ Total number of exception conditions seen for this element

■ Total number of exception points accumulated by this element

For each element, the report lists the following information:


Column Description

Leading Exceptions The names of the leading exceptions and the rank based on number of points received for each exception.

Points Total number of points that exception received.

Detail Detail information about the exception, such as its earliest and latest occurrence. During the period in which an exception is reported, the element might have had periods in which the condition causing the exception was not observed.

Figure 8 – Exception Detail

The thumbnail graph provided with each element is a Trend report of the leading exception condition. The Trend report shown in the thumbnail depends on the exception: eHealth uses the report that best represents the data for that exception.

In the sample panel, the element Switch-seg-112 encountered two exceptions. Both utilization and collisions were high, thus contributing to the exception conditions. As a result, 50.5 exception points were assigned.

Note: Any gaps in the thumbnail graph result from insufficient data about the element for that period. For example, the element might have been down during the period.

1) Switch-seg-112   10.0 Mbs   Ethernet
Total Number of Exceptions 2   Total Exception Points 50.5

Leading Exceptions            Points  Detail
1  Utilization Health Index     27.0  Between 10AM and 8PM
2  Collision Health Index       23.5  Between 6PM and 8PM

[Thumbnail graph: Bandwidth Utilization, 24 hours on 05/15/2002. Y axis: Percent, 0 to 100. X axis: time of day, 12m through 12m. Callout: High bandwidth utilization.]


The Average Network Volume graph provides the total network volume in frames, bytes, or percentage of bandwidth utilization for all reported elements. For reports run for a day, each bar represents an hour of the day. For reports run for a week, each bar represents a day of the week. For reports run for a month, each bar represents a week of the month.

The graph includes the historical volume for the baseline period. This volume is the average of the baseline period for the same period of time. For example, an Hourly Network Volume graph for a group of LAN/WAN elements was generated for the previous Tuesday. Each hour is compared to the average volume for the preceding five Tuesdays in the six-week baseline period.
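The Tuesday example above can be illustrated with a short Python sketch. It assumes hourly volume totals are available as `(datetime, bytes)` pairs; the function name `hourly_baseline` and its parameters are hypothetical and are not an eHealth interface.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def hourly_baseline(samples, report_day, weeks=6):
    """Average volume per hour over the same weekday in the baseline period.

    samples: iterable of (datetime, volume_bytes) hourly totals.
    Returns {hour: average} built from the (weeks - 1) preceding same
    weekdays; the report day itself is excluded, as in the Tuesday example.
    """
    baseline_days = {report_day - timedelta(weeks=k) for k in range(1, weeks)}
    by_hour = defaultdict(list)
    for ts, volume in samples:
        if ts.date() in baseline_days:
            by_hour[ts.hour].append(volume)
    return {hour: sum(v) / len(v) for hour, v in sorted(by_hour.items())}
```

With the default of six weeks, the historical value for 9:00 AM on a Tuesday report is the mean of the 9:00 AM volumes on the five preceding Tuesdays.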

Summary


Figure 9 – Total Network Volume by Day

[Chart: Total Network Volume by Day. Y axis: Bytes, 0 to 25G. X axis: one bar per day, Wed-03/27 through Wed-05/08. Legend: Trend, Bytes. Callouts: Unusually low volume; Trend line.]

Figure 10 – Average Network Volume by Hour

[Chart: Average Network Volume by Hour. Y axis: Bytes, 0 to 1.2G. X axis: one bar per hour, 12:00 AM through 11:00 PM. Legend: Historical, Current. Callout: Lower than usual after-hour traffic.]

The Total Network Volume graph shows the total volume for all reported elements in frames, bytes, or percentage of bandwidth utilization. For daily reports, each bar represents a day of the baseline period, with a maximum of 56 days. For weekly reports, each bar represents a week of the baseline period. The most recent day or week is listed on the right. For monthly reports, each bar represents a month of the baseline period. For historical comparison, the trend line over the baseline period is also shown.

This panel provides a graphic representation of your LAN/WAN traffic. As eHealth accumulates data, you can identify the regular patterns of your LAN/WAN traffic. For example, your network might normally have more activity on Mondays than Fridays and more activity at the end of the month. By monitoring traffic volume and trend indicators, you can baseline current activity and project capacity requirements to address network and application growth or seasonal business cycles. By default, eHealth sets the Y axis to display the volume as the number of total bytes using a floating scale.

The above panel shows a typical day-of-the-week variation. The aggregate network traffic is generally higher during weekdays than on weekends. However, traffic was unusually low on 4/18 (Thursday) and 4/19 (Friday) but unusually high on 4/21 (Sunday). In addition, the trend line slopes slightly upward, which indicates that the traffic has been increasing slowly.
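One common way to draw a trend line through daily totals is an ordinary least-squares fit. The guide does not state which method eHealth uses, so the following Python sketch is an assumption made for illustration; `trend_line` is a hypothetical helper.

```python
def trend_line(daily_volumes):
    """Least-squares line through (day_index, volume) points.

    Returns (slope, intercept). A positive slope means the volume has been
    growing over the period; the slope is in volume units per day.
    """
    n = len(daily_volumes)
    if n < 2:
        raise ValueError("need at least two days of data")
    mean_x = (n - 1) / 2
    mean_y = sum(daily_volumes) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_volumes))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

A slope that is small relative to the daily totals corresponds to the "slightly upward" trend described for the sample panel.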


If you notice significant changes between the baseline and what was reported, use the following tables and graphs to identify which elements were responsible for the sudden changes:

■ Volume Leaders

■ Health Index Leaders

■ Volume Change Leaders

■ Health Index Change Leaders

In addition, use the Exceptions section and the At-a-Glance reports to further investigate these changes. By default, eHealth sets the Y axis to display the volume as the total bytes using a floating scale.

In the sample panel, traffic was highest between 8:00 AM and 5:00 PM. Also, historical data (the black line) indicates that there is usually a significant amount of after-hour traffic. However, on the day being reported, the after-hour traffic was very low.


Figure 11 – Average Health Index by Hour

The Hourly/Daily Health Index graph displays the Health Index for the network as a whole by averaging the Health Index assigned to each element in the report. For daily reports, each bar represents an hour of the day. For weekly reports, each bar represents a day of the week. For monthly reports, this graph is replaced by the “Total Network Volume by Day.” The legend to the right of the graph identifies the key performance metrics that caused the Health Index rating. For more information, refer to “About the Health Index.”

This graph provides an overview of the health of all your LAN/WAN elements. A high Health Index usually indicates that many elements have significant problems. To identify which elements experienced problems, use the Health Index Leaders table. This graph can also indicate when a problem started on the network.

In the sample panel, the network was healthy throughout the day. During the business hours, usage was higher, and as a result, the Health Index for utilization and congestion was more noticeable, especially at 10:00 AM and 2:00 PM.
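The averaging behind the Hourly Health Index graph can be sketched as follows. This is a minimal Python illustration, assuming each element's hourly Health Index is available as a dictionary; the function name and the input layout are hypothetical.

```python
def average_health_index_by_hour(element_indices):
    """Average the Health Index over all elements for each hour.

    element_indices: {element_name: {hour: health_index}}.
    Returns {hour: mean Health Index over the elements reporting that hour}.
    """
    by_hour = {}
    for hourly in element_indices.values():
        for hour, index in hourly.items():
            by_hour.setdefault(hour, []).append(index)
    return {hour: sum(v) / len(v) for hour, v in sorted(by_hour.items())}
```

Because the result is a mean, a single element with a very high Health Index can be hidden by many healthy elements, which is why the Health Index Leaders table is the next place to look.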

[Chart: Average Health Index by Hour. Y axis: Health Index, 0 (Better) to 16 (Worse). X axis: one bar per hour, 12:00 AM through 11:00 PM. Legend: Nonunicast, Errors, Congestion, Discards, Utilization. Callout: High Health Index between 10:00 AM and 2:00 PM, primarily due to abnormally high errors and discards.]


The Situations to Watch table lists the elements that have exceeded or are predicted to exceed a Trend threshold indicating a potential issue that could affect performance. Situations to Watch provides you with another opportunity to address issues before they impact business-critical processes. eHealth compares the data collected from the previous day to the Trend threshold calculated for each variable. For more information, refer to About Trend Thresholds in this guide. The Situations to Watch table always lists ten elements, unless the group has fewer than ten elements. An element can be listed more than once if more than one variable is increasing. This table provides the following information:


Column Description

Rank The element that has exceeded or is closest to a threshold for a variable is listed first.

Element Name The name of the element.

Variable The name of the variable.

Threshold Value The value assigned to the threshold for the variable.

Daily Average Actual The value reported for the variable based on what the element actually did.

Daily Average Predicted The value predicted for the variable for the reported day. This value is based on the data in the baseline period.

Days to (from) Threshold A number indicates the predicted number of days until the variable reaches the threshold. A number in parentheses, e.g., (23), indicates the number of days calculated that the variable has exceeded the threshold. A zero (0) indicates that the variable is at threshold. Increasing indicates that the variable is approaching the threshold very slowly. Decreasing indicates that the variable is improving; that is, moving away from the threshold.
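The meaning of the Days to (from) Threshold column can be illustrated with a small Python sketch. It assumes a straight-line trend and a threshold that is an upper limit; the guide does not say how eHealth calculates this value, so the function `days_to_from_threshold`, the `slope_per_day` input, and the `slow_days` cutoff are all hypothetical.

```python
import math


def days_to_from_threshold(predicted, slope_per_day, threshold, slow_days=365):
    """Return the cell text for the 'Days to (from) Threshold' column.

    Assumes a linear trend (slope_per_day) through the predicted daily
    average. Cases the guide does not describe return None.
    """
    if predicted == threshold:
        return "0"
    if predicted < threshold:
        if slope_per_day < 0:
            return "Decreasing"  # moving away from the threshold
        if slope_per_day == 0:
            return None  # flat trend: not described in the guide
        days = math.ceil((threshold - predicted) / slope_per_day)
        # slow_days is an assumed cutoff for "approaching very slowly".
        return "Increasing" if days > slow_days else str(days)
    # Above the threshold: estimate how long ago the trend line crossed it.
    if slope_per_day > 0:
        return f"({math.floor((predicted - threshold) / slope_per_day)})"
    return None
```

Under these assumptions, a variable at 10.0 that rises by 0.5 per day toward a threshold of 20.0 would be shown as 20, and one that has climbed to 31.5 at the same rate would be shown as (23).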

Situations to Watch

                                                              Threshold  Daily Average      Days to (from)
Rank  Element Name                Variable                        Value  Actual  Predicted  Threshold
   1  countenetport1(Total)       Collisions (% Frames)           15.00    2.21       2.21  Increasing
   2  Backboneseg108(Total)       Volume (Bandwidth %)            20.00    3.93       2.13  Increasing
   3  Accountingseg(Total)        Volume (Bandwidth %)            20.00    3.64       1.89  Increasing
   4  MUDDAenetport2(Total)       Ethernet Errors (% Frames)       3.00    0.02       0.14  Increasing
   5  Backboneseg112(Total)       Volume (Bandwidth %)            20.00    4.89       2.52  Increasing
   6  countenetport1(Total)       Volume (Bandwidth %)            20.00    1.04       0.82  Increasing
   7  calvinenetport4(Total)      Collisions (% Frames)           15.00    1.32       0.36  Increasing
   8  FRONTDESKenetport2(Total)   Nonunicast Rate                100.00    9.62      10.95  Increasing
   9  EXCELSIORenetport2(Total)   Nonunicast Rate                100.00   10.18      11.36  Increasing
  10  calvinenetport4(Total)      Volume (Bandwidth %)            20.00    1.70       0.51  Increasing

Figure 12 – Situations to Watch

In the sample panel above, many of the elements have already exceeded the threshold. In particular, the first element is marked as chronic, which means that it has exceeded the threshold for a long period of time (at least three times the baseline period) and is not improving.

You should investigate any element that has exceeded or is within two weeks or less of reaching a threshold. Use either the At-a-Glance or Trend report for the specific element.

For example, if the errors are high for an element, the At-a-Glance or Trend report can be used to gather additional information on the problem. Were there a lot of errors during the day? When did the errors first start occurring? Is there a correlation between the error rate and other variables such as utilization and collisions?


The predicted and actual daily averages for an element that has exceeded threshold are generally higher than the threshold value. An element for which the problem has probably been fixed will have a high predicted daily average (because of the trend line from the past), but the actual daily average is below threshold.


Some tips on using the table are:

If you observe:                                        Possible cause:
High bandwidth utilization and/or high collision rate  The link is overutilized; upgrade may be necessary
Low bandwidth utilization and high collision rate      Possible hardware problem
High number of errors                                  Faulty adapter, cabling, or hub
High discard rate                                      The link is a bottleneck
High rate of broadcasts and multicasts                 Broadcast storm


Top Ten

The Volume Leaders graph and table list the ten elements that had the highest volume for the previous day. The Volume Leaders charts provide a quick picture of the busiest LAN/WAN elements in your environment. Sudden changes (e.g., the appearance of a new element) can indicate a potential capacity or performance issue and should be investigated.

This graph and table always display ten elements, unless the group has fewer than ten elements. The table displays the volume attributed to each element and the Health Index assigned to it. The volume is displayed as the number of bytes and as a percentage of bandwidth utilized. The elements are ordered according to the number of frames transmitted. This table provides the following information:


Column Description

Rank The rank of the element for the report day.

Prior Rank The rank of the element the day before the report day.

Element Name The name of the element.

Speed The speed of the element.

Volume The number of bytes that the element handled.

Volume vs Baseline The percentage of difference between the average for the day the report was run and the average for the entire baseline period.

Bandwidth Average The average bandwidth utilization for the report day.

Bandwidth Peak The highest value for any poll during the report day.

Health Index Average The average Health Index for the element during the report day.

Health Index Peak The highest Health Index assigned to the element during the report day.

Volume Leaders

[Chart: Volume Leaders in Bytes. Y axis: Bytes, 0 to 12G. X axis: rank, 1 through 10.]

      Prior                                            Volume               Bandwidth     Health Index
Rank   Rank  Element Name                       Speed   Bytes  vs Baseline    Avg   Peak   Avg  Peak
   1      1  MUDDA-enet-port-2(Total)       100.0 Mbs  11.3 G        51.2%   1.1%   5.7%   0.0   0.0
   2      2  Backbone-seg-1001(Total)       100.0 Mbs   7.3 G        26.6%   0.2%   2.2%   0.4   8.0
   3      3  Switch-rptrGroup-1(Total)      100.0 Mbs   7.3 G        26.6%   0.7%   7.3%   0.4   8.0
   4      4  sparrow-enet-port-1(Total)      10.0 Mbs   6.3 G       100.7%   5.9%  40.0%   0.0   0.0
   5      5  Production-enet-port-1(Total)   10.0 Mbs   5.9 G        53.6%   5.5%  13.7%   0.3   2.0
   6      6  Dept-20-seg-4(Total)            10.0 Mbs   5.8 G        54.3%   5.5%  13.6%   0.3   2.0
   7      7  hub1-seg(Total)                 10.0 Mbs   5.8 G        53.7%   5.4%  13.6%   0.3   2.0
   8     10  Marketing-seg(Total)            10.0 Mbs   5.6 G        60.1%   5.2%  13.4%   1.8  10.0
   9     11  Engineering-seg(Total)          10.0 Mbs   5.5 G        60.5%   5.1%  13.3%   1.9  10.0
  10     12  Sales-seg(Total)                10.0 Mbs   5.4 G        60.2%   5.1%  13.3%   1.7  10.0

Figure 13 – Volume Leaders

These elements have the heaviest volumes


In the sample panel, the first element generated the most traffic volume. It also registered a high (51.2%) percentage increase in volume versus the baseline. However, judging from the bandwidth utilization and the Health Index, it is a healthy element. Furthermore, by comparing the current and prior rank, it can be seen that the ranking has largely stayed the same.

Generally, the same elements will appear in this table every day. Any change in appearance (that is, an element that normally appears is absent or a new element appears) can indicate a problem. Any element with a high Health Index should be investigated, particularly if that element also displays low volume. Significant differences between the peak and average Health Index can also indicate a problem.

The following tables can provide more information on LAN/WAN elements:

■ Health Index Leaders table

■ Volume Change Leaders table

■ Health Index Change Leaders table

In addition, At-a-Glance reports can also be generated to further investigate any potential problems.


Health Index Leaders

The Health Index Leaders table lists the ten elements that received the highest Health Index values on the reported day. This table always lists ten elements, unless the group has fewer than ten elements. Some of the listed elements could actually be healthy. This table includes the name of the variable that is causing the high Health Index. The highest Health Index an element can receive is 32. For more information, refer to the About the Health Index section. This table provides the following information:

Column Description

Rank The rank of the element for the report day.

Prior Rank The rank of the element for the day before the report day.

Element Name The name of the element.

Contributor The name of the variable contributing to the high Health Index.

Health Index Average The average Health Index for the element during the report day.

Health Index Peak The highest Health Index assigned to the element during the report day.

Bandwidth Average Average bandwidth utilization for the report day.

Bandwidth Peak The highest bandwidth utilization for any polling interval during the report day.

      Prior                                              Health Index  Bandwidth
Rank   Rank  Element Name                   Contributor   Avg  Peak      Avg   Peak
   1      2  Engineering-seg(Total)         Collisions    1.9  10.0     5.1%  13.3%
   2      3  Marketing-seg(Total)           Collisions    1.8  10.0     5.2%  13.4%
   3      1  Sales-seg(Total)               Collisions    1.7  10.0     5.1%  13.3%
   4      4  Backbone-seg-112(Total)        Utilization   0.9  16.0     4.9%  62.8%
   5      9  Backbone-seg-108(Total)        Utilization   0.4   6.0     3.9%  30.0%
   6      7  Backbone-seg-1001(Total)       Collisions    0.4   8.0     0.2%   2.2%
   7      8  Switch-rptrGroup-1(Total)      Collisions    0.4   8.0     0.7%   7.3%
   8     11  hub1-seg(Total)                Utilization   0.3   2.0     5.4%  13.6%
   9     14  Accounting-seg(Total)          Utilization   0.3   2.0     3.6%  16.6%
  10      5  Production-enet-port-1(Total)  Utilization   0.3   2.0     5.5%  13.7%

Figure 14 – Health Index Leaders

These 10 elements have the highest Health Index, indicating poor health due to high collisions and utilization


In the sample panel, traffic (utilization and collisions) contributed to the Health Index of all the top ten leaders. In this case, generating collision rate Trend Reports for the top elements would be appropriate to further quantify the extent of the collisions.

Any element with a high Health Index or significant change in rank should be investigated. An element with high bandwidth utilization may have experienced an unusually high amount of traffic for a brief period. You should investigate that element if it also had a high Health Index for errors, discards, or congestion.


Volume Change Leaders

      Prior                                     Volume
Rank   Rank  Element Name                        Bytes  vs Baseline
   1    100  NT3-enet-port-2(Total)              3.7 G       181.5%
   2     26  NANTUCKET-enet-port-2(Total)        2.1 G       167.9%
   3     30  calvin-enet-port-4(Total)           1.8 G       151.4%
   4     46  bigbird-enet-port-4(Total)          3.1 G       146.0%
   5      3  SKYWALKER-enet-port-3(Total)        2.3 M       121.8%
   6     22  Accounting-seg(Total)               3.9 G       112.0%
   7      5  sparrow-enet-port-1(Total)          6.3 G       100.7%
   8     27  Backbone-seg-108(Total)             4.2 G       100.2%
   9      1  ShivaLANRoverE/PLUS-link-2(Total)     0.0      -100.0%
  10    126  ShivaLANRoverE/PLUS-link-3(Total)     0.0      -100.0%

Figure 15 – Volume Change Leaders

The Volume Change Leaders table shows the elements that experienced the largest percentage change in volume between the report day and the baseline period average. This table always displays ten elements, unless the group has fewer than ten elements. An increase in volume is indicated as a positive value, and a decrease in volume is indicated as a negative value. Elements are ordered by the magnitude of the change, regardless of whether the change is positive or negative.

This table provides the following information:

Column Description

Rank The rank of the element for the report day.

Prior Rank The rank of the element for the day before the report day.

Element Name The name of the element.

Volume in Bytes The number of bytes it processed on the report day.

Volume vs Baseline The percentage of change between the average for the day the report was run and the average for the entire baseline period.
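The Volume vs Baseline percentage and the ordering of this table can be sketched in Python as follows. This is an illustration only: `volume_change_leaders` and its inputs are hypothetical, and the handling of elements without a baseline is an assumption.

```python
def volume_change_leaders(day_avg, baseline_avg, top_n=10):
    """Rank elements by the size of their percentage change versus baseline.

    day_avg, baseline_avg: {element: average volume}.
    Returns [(element, pct_change)], largest absolute change first.
    """
    changes = []
    for element, baseline in baseline_avg.items():
        if baseline == 0 or element not in day_avg:
            continue  # assumption: skip elements with no usable baseline
        pct = (day_avg[element] - baseline) / baseline * 100.0
        changes.append((element, pct))
    # Order by magnitude, so a -100% change ranks alongside a +100% change.
    changes.sort(key=lambda item: abs(item[1]), reverse=True)
    return changes[:top_n]
```

An element whose traffic stopped entirely gets -100%, which is why such elements can appear in the same list as elements whose volume doubled.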

Some tips on using the table are:

If you observe:                                        Possible cause:
High bandwidth utilization and/or high collision rate  The link is over-utilized; upgrade may be necessary
Low bandwidth utilization and high collision rate      Possible hardware problem
High number of errors                                  Faulty adapter, cabling, or hub
High discard rate                                      The link is a bottleneck
High rate of broadcasts and multicasts                 Broadcast storm

These elements have the biggest change in volume as compared to the baseline period average


In the sample panel, the top element (NT3-enet-port-2) experienced a 181.5% increase in volume. On the other hand, the ninth and tenth elements both experienced traffic that completely stopped (-100%). Normally, this might mean that the links were down. However, in this case, the two links represented dial-up remote connections that were not used during the reporting period.

Any large change in volume should be investigated. For example, if a link was down and the traffic had to be rerouted through a different link, the first link would experience a large percentage decrease of volume, while the second link would experience a large percentage increase.


Health Index Change Leaders

The Health Index Change Leaders table lists the elements that had the largest change in the Health Index from the previous day to the reporting day. This table always displays ten elements, unless the group has fewer than ten.

A positive change (an increase in the Health Index) represents a deterioration in the health of the element, while a negative change (a decrease in the Health Index) represents an improvement. This table provides the following information:

Column Description

Rank The rank of the element for the report day.

Prior Rank The rank of the element for the day before the report day.

Element Name The name of the element.

Health Index Average The average Health Index for the element during the report day.

Health Index Prior The average Health Index for the element during the previous day.

Health Index Change The change in the Health Index between the previous day and the report day.

      Prior                              Health Index
Rank   Rank  Element Name                  Avg  Prior  Change
   1      2  Engineering-seg(Total)       1.91   0.75    1.16
   2      3  Marketing-seg(Total)         1.83   0.75    1.07
   3      1  Sales-seg(Total)             1.74   0.75    0.99
   4      4  Backbone-seg-112(Total)      0.91   0.72    0.19
   5     16  Accounting-seg(Total)        0.34   0.17    0.17
   6     11  192.124.15.47-seg-4(Total)   0.00   0.17   -0.17
   7     15  Backbone-seg-108(Total)      0.44   0.27    0.17
   8      9  hub1-seg(Total)              0.34   0.25    0.09
   9      8  Dept-20-seg-4(Total)         0.34   0.25    0.09
  10     14  calvin-enet-port-4(Total)    0.00   0.08   -0.08

Figure 16 – Health Index Change Leaders

In the sample panel, most of the elements in the Top Ten list experienced an increase (deterioration) in the Health Index. Nonetheless, the increase was modest and did not indicate a problem. The Health Index of the top element went from 0.75 to 1.91.

Changes in the Health Index could indicate a one-time occurrence, the start of a pattern, or a serious problem. To determine the cause for the change, use this table with:

■ Hourly Health Index graph

■ Health Index Leaders table

These elements have the biggest change in Health Index, most of them indicating a deterioration of health compared to the previous day


Element Top 75

The Element Top N section of a Health report compares the health and performance of the Top N elements in a group and enables you to compare their performance on several variables. By default, each chart appears on its own page, displaying 75 elements from highest or most utilization to least. Additional pages display more elements. By specifying a larger or smaller number of elements per page, eHealth administrators can increase or decrease the number of elements in each chart.

The Element Top N section of a Health report for LAN/WAN elements contains the following charts:

■ Bandwidth Utilization

■ Element Health Index

The following shows part of the Top 75 Bandwidth Utilization chart (this shows the top 25).


Figure 17 – Top 75 by Bandwidth Utilization

The elements are sorted such that the ones on the left are the most heavily utilized. For a detailed discussion on how to read the bandwidth utilization chart, please refer to the following section on Element Detail.

Top upgrade candidates


The following shows part of the Top 75 Element Health Index chart (this shows the top 25).

The elements are sorted such that the ones on the left have the highest Health Index. In this example, the first three elements all had a high Health Index because of errors. For a detailed discussion on how to read the Element Health Index chart, please refer to the following section on Element Detail.

Figure 18 – Top 75 by Element Health Index

Elements with the highest Health Index are the least healthy elements


Element Detail

This section of the report provides an element summary that compares the data for each element using the following charts:

■ Element Volume vs Baseline chart

■ Bandwidth Utilization chart

■ Element Health Index chart


Figure 19 – Volume versus Baseline

[Chart: Element Volume vs Baseline by Day. Y axis: Bytes, 0 to 18G. X axis: one bar per element, in alphanumeric order (concord-E0-enet-port-1 through switch1-rptrGroup-1). Legend: Baseline High, Baseline Average, Baseline Low, Volume.]

Some tips on using this graph are:

If the baseline values are:                                       Indicates element volume:
High and low far apart and the average is very high or very low   Experienced sudden and dramatic change
High and low far apart and the average in the middle              Is inconsistent
Close                                                             Is consistent and steady
High and top of bar are equal                                     Hit a new high
Low and top of bar are equal                                      Hit a new low
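These reading rules can be expressed as a small Python sketch. The guide gives no numeric definitions of "far apart" or "very high or very low", so the `wide_spread` and `off_center` cutoffs below are hypothetical, as is the function name.

```python
import math


def interpret_baseline(volume, low, average, high, wide_spread=0.5, off_center=0.25):
    """Apply the Volume vs Baseline reading tips to one element.

    volume is the top of the bar; low, average, and high are the baseline
    values. wide_spread and off_center are assumed cutoffs.
    """
    notes = []
    if math.isclose(volume, high):
        notes.append("hit a new high")
    elif math.isclose(volume, low):
        notes.append("hit a new low")
    # Relative gap between baseline high and low.
    spread = (high - low) / high if high > 0 else 0.0
    if spread < wide_spread:
        notes.append("is consistent and steady")
    else:
        # Where the average sits between low (0.0) and high (1.0).
        position = (average - low) / (high - low)
        if position < off_center or position > 1 - off_center:
            notes.append("experienced sudden and dramatic change")
        else:
            notes.append("is inconsistent")
    return notes
```

With these cutoffs, an element whose baseline runs from 2G to 18G with an average of 3G, and whose bar reaches 18G, would be reported as having hit a new high after a sudden change.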

The Volume vs Baseline graph displays the volume in frames, bytes, or percentage of bandwidth utilization for the reporting days. Each bar represents the volume carried by that element over the report period.

The graph displays three baseline values: the baseline high, low, and average. For more information, refer to the About the Baseline Period section. This graph displays up to 25 elements in alphanumeric order.

If you are reporting on more than 25 elements, the remaining elements are listed on additional pages. This graph can help you separate anomalies from long-term trends by identifying elements that experience above- or below-average volumes. You should investigate the cause for any significant changes.

By default, eHealth sets the Y axis to display the volume as the number of total bytes using a floating scale.

New highs for these elements


In Figure 19, all the elements showed high variations in traffic volume, as indicated by the wide gap between the baseline high and low. A few elements also hit a new high, since the baseline high coincided with the top of the bar.

Some elements are inherently half-duplex, for example, classic Ethernet. When a device is transmitting, it cannot be receiving. On the other hand, elements such as a WAN link or a frame relay virtual circuit are full-duplex. There is traffic in two directions: incoming and outgoing. For a full-duplex element, there are two bars for each element, one for each direction. The following is an example of the Volume vs Baseline chart for full-duplex elements.


Figure 20 – Volume versus Baseline (Full-Duplex)

[Chart: Element Volume vs Baseline by Day. Y axis: Bytes, 0 to 8G. X axis: two bars per element, in alphanumeric order (concord-E0-enet-port-1 through switch1-rptrGroup-1). Legend: Baseline High, Baseline Average, Baseline Low, Volume.]

In the sample panel, each element has two bars: the one on the left is for “In” data and the one on the right is for “Out” data. From this graph, we can see that for some of the elements (for example, the first one), the traffic is not balanced. There was more traffic going out of the element than coming in.

The Bandwidth Utilization chart displays how much time an element spends in a particular bandwidth utilization range. This graph displays up to 25 elements in alphanumeric order. If you are reporting on more than 25 elements, the remaining elements are listed on additional pages. The Y axis represents time as a percentage, and the X axis displays each element. The height of a section of a bar indicates the percentage of time that element spent in a particular bandwidth utilization range.

If the bar for an element does not reach 100% (or the top of the panel), the element missed some polls. A missed poll can result from problems in the network or when the SNMP agent at the device is down.
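The way a bar is built up from polls, and why missed polls leave it short of 100%, can be sketched in Python. The range labels follow the chart legend; the exact range boundaries, the function name, and the `expected_polls` parameter are assumptions made for illustration.

```python
import math

# Range labels as in the chart legend: 0, 1 - 10, 11 - 20, ..., 91 - 100, > 100.
RANGES = ["0"] + [f"{lo} - {lo + 9}" for lo in range(1, 100, 10)] + ["> 100"]


def utilization_distribution(polls, expected_polls):
    """Percentage of time spent in each bandwidth utilization range.

    polls: utilization percentages from the polls that succeeded.
    expected_polls: number of polls scheduled for the period.
    The values sum to less than 100 when some polls were missed.
    """
    counts = dict.fromkeys(RANGES, 0)
    for u in polls:
        if u <= 0:
            label = RANGES[0]
        elif u > 100:
            label = RANGES[-1]
        else:
            # Assumed boundaries: (0, 10] -> "1 - 10", (10, 20] -> "11 - 20", ...
            label = RANGES[math.ceil(u / 10)]
        counts[label] += 1
    return {label: 100.0 * n / expected_polls for label, n in counts.items()}
```

If six of eight scheduled polls succeed, the sections of the bar add up to 75%, and the missing 25% corresponds to the two missed polls.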

[Callouts for Figure 20: New low; Out data; In data]

This graph provides a very concise description of the traffic pattern of the elements. For example, if the entire bar is between 1-10%, it means that the traffic was uniformly low. If the entire bar is 0, the element was probably down. If the bar shows mostly low utilization with some time spent in the high utilization ranges, then the traffic was bursty in nature. Based on these utilization and traffic patterns, you can redistribute workload across resources according to actual usage. For example, you may be able to move traffic from an over-utilized WAN link to an under-utilized link, avoiding costly and unnecessary upgrades. Or, you may be able to reduce WAN costs by eliminating low-volume links altogether. Balancing the workload across all your resources enables you to optimize performance, improve network efficiency and maximize your current IT investment.
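The reading rules above (all time at 0, all time in 1-10%, mostly low with some time in the high ranges) can be written down as a small Python sketch. The cut-off values `high_from` and `mostly` are hypothetical choices for this illustration, as is the representation of a bar as a dictionary keyed by the lower bound of each range; the report itself leaves this judgment to the reader.

```python
def describe_pattern(dist, high_from=71, mostly=80.0):
    """Describe one bar of the Bandwidth Utilization chart.

    dist maps the lower bound of a utilization range (0, 1, 11, ..., 91, 101)
    to the percent of time spent in that range.
    high_from and mostly are assumed cut-offs, not eHealth settings.
    """
    observed = sum(dist.values())
    if observed == 0:
        return "no data"
    zero = dist.get(0, 0.0)
    low = dist.get(1, 0.0)
    high = sum(pct for lower, pct in dist.items() if lower >= high_from)
    if zero == observed:
        return "probably down"
    if low == observed:
        return "uniformly low"
    # Mostly idle or low, with at least some time in the high ranges.
    if 100.0 * (zero + low) / observed >= mostly and high > 0:
        return "bursty"
    return "mixed"


print(describe_pattern({0: 100}))
print(describe_pattern({1: 100}))
print(describe_pattern({1: 90, 91: 10}))
print(describe_pattern({41: 60, 51: 40}))
```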

The following is an example of the Bandwidth Utilization graph for full-duplex elements:

Figure 21 – Bandwidth Utilization

Figure 22 – Bandwidth Utilization (Full-Duplex)

[Charts omitted. Panel title: Bandwidth Utilization. Y axis: Time (%), from 0 to 100; X axis: one bar per element. Legend (bandwidth utilization ranges, %): 0, 1 - 10, 11 - 20, 21 - 30, 31 - 40, 41 - 50, 51 - 60, 61 - 70, 71 - 80, 81 - 90, 91 - 100, > 100. Callouts in the charts: "Low utilization", "Low utilization with high traffic bursts", "High utilization in the incoming direction", "Probably down".]

The Element Health Index graph displays the average Health Index for all of the elements in the report. This graph displays up to 25 elements in alphanumeric order. If you are reporting on more than 25 elements, the remaining elements are listed on additional pages. The Y axis represents the average Health Index assigned to each element, and the X axis displays the variables causing the Health Index for each element. The legend to the right of the graph identifies the particular variable. For more information on the variables, refer to the About the Health Index section.

In the sample panel (Figure 23), one element exhibited a high number of errors. At this point, an At-a-Glance report (see explanation later) would be useful to investigate this problem further.

The following sample panel is an example of an Element Health Index graph for full-duplex elements:

Figure 23 – Health Index

Figure 24 – Health Index (Full-Duplex)

[Charts omitted. Panel title: Element Health Index. Y axis: Health Index, from 0 (Better) to 16 (Worse); X axis: one bar per element. Legend for Figure 23: Utilization, Collisions, Discarded Frames, Errors, Ethernet Errors, Nonunicast. Legend for Figure 24: Utilization, Discarded Frames, Congestion, Errors, Nonunicast.]

In the sample panel, congestion was observed in the incoming direction only, possibly because the traffic was heaviest in that direction.


Availability/Reachability/Latency

This supplemental report consists of three graphs: Availability, Reachability, and Latency. It measures the level of service the network is providing.

Figure 25 – Availability

Figure 26 – Reachability

[Charts omitted. Y axis: % Availability (Figure 25) or % Reachability (Figure 26), with ticks at 90, 95 and 100; X axis: one entry per element. Legend: Upper Margin of Error, Lower Margin of Error, and Availability Observed or Reachability Observed. Figure 25 also shows Planned Downtime.]

The Availability chart shows the percentage of time each element was active and running.

eHealth measures availability with the ifLastChange MIB variable for each interface. If the interface status changes between polls, eHealth uses the value of ifLastChange to estimate the amount of time the interface was available.
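The exact calculation eHealth performs is not described here. As a minimal sketch of the idea, the following Python function estimates how long an interface was available between two polls. It relies on the standard definition of ifLastChange (the value of sysUpTime, in hundredths of a second, at the time the interface entered its current operational state). The function name and parameters are hypothetical, and the sketch assumes at most one status change per polling interval and no agent restart between the polls.

```python
def available_seconds(prev_uptime, curr_uptime, if_last_change, is_up):
    """Estimate seconds an interface was available between two polls.

    prev_uptime, curr_uptime: sysUpTime at the previous and current poll (TimeTicks).
    if_last_change: ifLastChange read at the current poll (TimeTicks).
    is_up: True if the interface is operationally up at the current poll.
    Assumes at most one status change in the interval (an illustrative simplification).
    """
    interval = curr_uptime - prev_uptime
    changed = prev_uptime < if_last_change <= curr_uptime
    if not changed:
        # Status did not change: available for the whole interval or not at all.
        up_ticks = interval if is_up else 0
    elif is_up:
        # Interface came up during the interval.
        up_ticks = curr_uptime - if_last_change
    else:
        # Interface went down during the interval.
        up_ticks = if_last_change - prev_uptime
    return up_ticks / 100.0  # TimeTicks are hundredths of a second


# Hypothetical 5-minute polling interval (30000 TimeTicks).
went_down = available_seconds(100000, 130000, 112000, is_up=False)
came_up = available_seconds(100000, 130000, 112000, is_up=True)
unchanged = available_seconds(100000, 130000, 50000, is_up=True)
print(went_down, came_up, unchanged)
```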

This graph can be used to identify elements that experienced availability problems during the reporting period. In the sample panel, the first element only achieved an availability of approximately 98%.

The margin of error takes into consideration the time during which the eHealth poller was down and no data was collected as a result. In the example, the eHealth poller was down 3% of the time. During the time the poller was up, all the elements were available 100% of the time except for the first element. Hence the availability for those elements has an upper margin of 100% (assuming the elements were up during the remaining 3% of the time) and a lower margin of 97% (assuming they were down during the remaining 3%). The actual availability for each element can be anywhere in between.
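The arithmetic behind the two margins can be sketched in a few lines of Python. The function name and parameters are hypothetical, and the formula is simply one way to reproduce the 97% to 100% range of the example above; it is not presented as the eHealth implementation.

```python
def availability_margins(observed_pct, poller_down_pct):
    """Return (lower, upper) availability margins in percent.

    observed_pct: availability measured while the poller was collecting data.
    poller_down_pct: share of the reporting period with no data collected.
    """
    covered_pct = 100.0 - poller_down_pct
    # Time the element is known to have been up, as a share of the whole period.
    known_up = observed_pct * covered_pct / 100.0
    lower = known_up                    # element down for the whole gap
    upper = known_up + poller_down_pct  # element up for the whole gap
    return lower, upper


# The example from the text: poller down 3% of the time, element up whenever polled.
lower, upper = availability_margins(100, 3)
print(lower, upper)
```

With a poller outage of 3% and an element that was up whenever it was polled, the sketch returns a lower margin of 97 and an upper margin of 100, matching the figures in the text.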

The Reachability chart shows the percentage of time that the elements are reachable. eHealth considers an element unreachable when a missed poll occurs.

In the sample panel, the elements did not achieve 100% reachability during the reporting period; it varied between 98% and 99%. This graph can be used to identify the element or groups of elements that are most prone to reachability problems.


Just as in the case of the Availability chart, the margin of error takes into account the time that the eHealth poller was down. The upper margin of error represents the most optimistic estimate and the lower margin of error represents the most pessimistic estimate of what happened during the time when no data was collected.

Figure 27 – Latency

[Chart omitted. Panel title: Latency. Y axis: Time (%), from 0 to 100; X axis: one bar per element. Legend (latency ranges): < 70 msec, 70 - 500 msec, 500 - 1K msec, 1K - 5K msec, > 5K msec.]

The Latency chart displays the latency distribution for each interface, that is, the percentage of time that the latency is within a certain range. eHealth measures latency as the length of time in milliseconds a ping takes to reach a network device and return to the eHealth workstation. In addition to measuring latency from the perspective of the eHealth station, it is also possible to define alternate latency sources which can initiate ping messages. These alternate sources can be routers located elsewhere in the network.

In the sample panel, most of the elements experienced latency of under 70 msec. One element, however, had latency of between 70 msec and 500 msec 20% of the time. When users complain about slow network response, this graph can be used to determine whether the delay is network related.


The eHealth console can discover and collect statistics from a wide range of network devices and resources. These devices can be grouped into the following technologies:

Technology   Key Technologies
Network      LAN/WAN, router/switch, frame relay, ATM and remote access.
Response     eHealth gateway that enables the collection of response time data from a variety of sources for real-time analysis, historical reporting and service-level documentation.
System       eHealth gateway that enables the collection of system and application data from eHealth SystemEDGE, eHealth AIMs, and third-party SNMP agents for real-time analysis and historical reporting.

Technology Support and Device Certification

Support for each technology requires a software key. Entering the appropriate key enables the software to discover and collect information from devices of that technology.

Using SNMP, the eHealth console collects statistics from a wide variety of standard MIB objects defined by the IETF (Internet Engineering Task Force). For example, a router may support MIB-II objects, the ethernet MIB, the frame relay MIB and the ATM MIB. In addition, many devices also provide vital performance and availability statistics, such as processor and buffer utilization, through private or proprietary MIB objects. In order to ensure that eHealth can extract the most accurate information from the devices, the Certification Group goes through a rigorous process of decoding, testing, and certifying a large number of network devices from all the major manufacturers. You can find a complete list of the supported devices in the certified devices database on the Concord Web site.

A typical network today consists of devices from different vendors. These devices may provide performance and availability statistics from different MIB objects. In order to present a unified view for all devices of the same technology class, eHealth normalizes all statistics into generic vendor-independent variables. For example, eHealth may be collecting ethernet performance statistics from a database server, a LAN switch, a router and an RMON probe. Each device may report statistics through a different combination of standard and private MIB objects. eHealth normalizes the statistics into four generic health variables: bandwidth utilization, collisions, errors, and non-unicasts. Using this technique, all ethernet ports across the enterprise network can be monitored and assessed using a common, consistent framework.

Health Reports on Other Technologies

eHealth automatically evaluates the performance of network devices, systems and applications by analyzing a set of key statistics. These statistics are referred to as Health Variables. During each poll, each Health Variable is evaluated against a set of thresholds to determine its Health Index.

In order to effectively support different technologies, the Health Variables and the associated Health Index ranges have been carefully designed to reflect the characteristics of each technology. For example, the collision rate is a very important measure of the health of a half-duplex ethernet segment. Thus the collision rate is one of the Health Variables used. On the other hand, the CPU utilization of a router indicates the load of the device and is used as a Health Variable in the case of routers.

The following is a description of the variables and the thresholds used for different technologies.

LAN/WAN Health Variables

LAN/WAN health variables are used to monitor the health of LAN/WAN ports based on statistics collected from routers, switches, probes or end-systems. The following tables show the default Health Index Ranges for ethernet, token ring, and WAN.

Default Ethernet Health Index Ranges

Variable                                   Excellent     Good           Fair            Poor
Utilization as a percentage of bandwidth   0 up to 10%   10 up to 20%   20 up to 35%    35% and over
Collisions as a percentage of frames       0 up to 5%    5 up to 9%     9 up to 15%     15% and over
Other errors as a percentage of frames     0 up to 3%    3 up to 7%     7 up to 10%     10% and over
Non-unicast frames per second              0 up to 100   100 up to 200  200 up to 300   300 and over

Default Token Ring Health Index Ranges

Variable                                   Excellent     Good           Fair            Poor
Utilization as a percentage of bandwidth   0 up to 15%   15 up to 30%   30 up to 60%    60% and over
Hard errors per hour                       0 up to 20    20 up to 100   100 up to 300   300 and over
Soft errors per minute                     0 up to 20    20 up to 100   100 up to 1000  1000 and over
Non-unicast frames per second              0 up to 100   100 up to 200  200 up to 300   300 and over

For example, during the poll, a network device such as a router reports the following statistics for one of its Ethernet ports:

■ Utilization was 15%

■ Collision rate was 2%

■ Other error rate was 0%

■ There were 180 broadcasts or multicasts per second

Based on these statistics, eHealth will assign the following grades to the Ethernet element:

■ Utilization: Good (2)

■ Collisions: Excellent (0)

■ Other errors: Excellent (0)

■ Broadcast/multicast: Good (2)

■ Total Health Index: 4
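The grading in this example can be reproduced with a short Python sketch that looks up each statistic in the default Ethernet ranges. The names used are hypothetical. The example above confirms only 0 points for Excellent and 2 for Good; the values used here for Fair (4) and Poor (8) are assumptions made so that the sketch is complete.

```python
import bisect

GRADES = ("Excellent", "Good", "Fair", "Poor")

# Points per grade. Excellent and Good follow the example in the text;
# Fair and Poor are ASSUMED values for illustration only.
POINTS = {"Excellent": 0, "Good": 2, "Fair": 4, "Poor": 8}

# Lower bounds of Good, Fair and Poor from the default Ethernet ranges.
ETHERNET_RANGES = {
    "utilization_pct": (10, 20, 35),
    "collisions_pct": (5, 9, 15),
    "other_errors_pct": (3, 7, 10),
    "nonunicast_per_sec": (100, 200, 300),
}


def grade(value, bounds):
    """Map a value to a grade; "10 up to 20" includes 10 and excludes 20."""
    return GRADES[bisect.bisect_right(bounds, value)]


def health_index(stats, ranges):
    """Return ({variable: grade}, total points) for one poll of one element."""
    grades = {name: grade(stats[name], bounds) for name, bounds in ranges.items()}
    return grades, sum(POINTS[g] for g in grades.values())


# The statistics reported in the example.
stats = {
    "utilization_pct": 15,
    "collisions_pct": 2,
    "other_errors_pct": 0,
    "nonunicast_per_sec": 180,
}
grades, total = health_index(stats, ETHERNET_RANGES)
print(grades, total)
```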

For a full explanation of the health variables, please refer to the glossary.

The Health Index Ranges can be modified through the Service Profile. For example, a service provider may provide different levels of service to different customers. For regular customers, the default Service Profile is used. However, for premium customers, a different Service Profile with more stringent thresholds can be defined.
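To illustrate the effect of a stricter Service Profile, the following Python sketch grades the same utilization reading under two profiles. The "default" profile uses the default Ethernet utilization ranges; the "premium" ranges, the profile names, and the data structure are hypothetical and do not describe how eHealth stores Service Profiles.

```python
import bisect

GRADES = ("Excellent", "Good", "Fair", "Poor")

# Lower bounds of Good, Fair and Poor for utilization (% of bandwidth).
PROFILES = {
    "default": {"utilization_pct": (10, 20, 35)},  # default Ethernet ranges
    "premium": {"utilization_pct": (5, 10, 20)},   # hypothetical stricter ranges
}


def grade(profile_name, variable, value):
    """Grade a value against the thresholds of the named profile."""
    bounds = PROFILES[profile_name][variable]
    return GRADES[bisect.bisect_right(bounds, value)]


default_grade = grade("default", "utilization_pct", 12)
premium_grade = grade("premium", "utilization_pct", 12)
print(default_grade, premium_grade)
```

The same 12% utilization is graded Good under the default ranges and Fair under the stricter hypothetical ranges.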

Default WAN Health Index Ranges

Variable                                      Excellent     Good            Fair            Poor
Utilization as a percentage of bandwidth      0 up to 70%   70 up to 80%    80 up to 90%    90% and over
Discarded frames as a percentage of frames    0 up to 1%    1 up to 5%      5 up to 10%     10% and over
Errors as a percentage of frames              0 up to 1%    1 up to 3%      3 up to 5%      5% and over
Non-unicast frames per second                 0 up to 100   100 up to 200   200 up to 300   300 and over

Frame Relay Health Index Ranges

These variables are used to monitor the health of frame relay PVCs (Permanent Virtual Circuits) based on statistics reported by either frame relay switches or routers with frame relay interfaces.

Default Frame Relay Health Index Ranges

Variable                                      Excellent      Good             Fair             Poor
Utilization as a percentage of bandwidth      0 up to 100%   100 up to 125%   125 up to 150%   150% and over
Congestion per million frames                 0 up to 1000   1000 up to 3000  3000 up to 5000  5000 and over
Errors per million frames                     0 up to 1000   1000 up to 3000  3000 up to 5000  5000 and over

For example, during the poll, a frame relay switch reports the following statistics for one of its PVCs:

■ Utilization was 115%

■ Congestion was 2500 per million

■ There were no errors

Based on these statistics, eHealth will assign the following grades to the frame relay element:

■ Utilization: Good (2)

■ Congestion: Good (2)

■ Errors: Excellent (0)

■ Total Health Index: 4


ATM Health Index Ranges

These variables are used to monitor the health of ATM ports, paths and channels based on statistics reported by either ATM switches or routers with ATM interfaces. The variables are slightly different depending on the type of ATM element.

In the case of ATM paths and channels, only permanent virtual circuits are supported. Thus an ATM path element must be a Permanent Virtual Path (PVP) and an ATM channel element must be a Permanent Virtual Channel (PVC).

ATM Port Health Index Ranges

Variable                                                      Excellent     Good           Fair           Poor
Utilization as a percentage of bandwidth                      0 up to 70%   70 up to 80%   80 up to 90%   90% and over
Discarded cells as a percentage of total cells                0 up to 0.1   0.1 up to 0.2  0.2 up to 0.3  0.3 and over
Errors as a percentage of cells with errors over total cells  0 up to 0.1   0.1 up to 0.2  0.2 up to 0.3  0.3 and over
Unavailable                                                   32 points per second unavailable

ATM Path Health Index Ranges

Variable                                                      Excellent     Good           Fair           Poor
Utilization as a percentage of bandwidth                      0 up to 70%   70 up to 80%   80 up to 90%   90% and over
Discarded cells as a percentage of total cells                0 up to 0.1   0.1 up to 0.2  0.2 up to 0.3  0.3 and over
Unavailable                                                   32 points per second unavailable

ATM Channel Health Index Ranges

Variable                                                      Excellent     Good           Fair           Poor
Utilization as a percentage of bandwidth                      0 up to 70%   70 up to 80%   80 up to 90%   90% and over
Discarded cells as a percentage of total cells                0 up to 0.1   0.1 up to 0.2  0.2 up to 0.3  0.3 and over
Unavailable                                                   32 points per second unavailable

Router/Switch Health Index Ranges

These variables are used to monitor the health of routers and switches. A router or a switch typically consists of a number of LAN/WAN interfaces. Some of these statistics are computed by aggregating the statistics of each of the LAN/WAN interfaces. For example, the error rate of the router represents the total errors as a percentage of the total frames across all the interfaces.

Most routers provide information on only five variables. That is, a router provides statistics for buffer miss ratio or for buffer utilization, but rarely for both. Switches usually provide information on only three variables: line utilization, faults, and discards. Switches rarely provide data on buffers or CPUs.
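The aggregation described for the router error rate (total errors as a percentage of total frames across all interfaces) can be sketched in Python as follows. The function name and the per-interface counters are hypothetical sample data.

```python
def aggregate_error_rate(interfaces):
    """Return total errors as a percentage of total frames.

    interfaces: list of (frames, errors) tuples, one per LAN/WAN interface.
    """
    total_frames = sum(frames for frames, _ in interfaces)
    total_errors = sum(errors for _, errors in interfaces)
    if total_frames == 0:
        return 0.0  # no traffic observed, so no error rate to report
    return 100.0 * total_errors / total_frames


# Hypothetical router with three interfaces.
interfaces = [(1_000_000, 500), (200_000, 5_500), (800_000, 0)]
rate = aggregate_error_rate(interfaces)
print(round(rate, 3))
```

Because the totals are summed before dividing, a busy interface carries more weight than a quiet one. In this sample the aggregate rate is 0.3%, although one lightly loaded interface on its own has an error rate of 2.75%.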

Variable                                                       Excellent   Good       Fair        Poor
Buffer miss ratio as a percentage of attempts for router       0 – 5%      5 – 10%    10 – 20%    over 20%
  elements (for enhanced switch elements, this variable is
  memory utilization)
Buffer utilization as a percentage of total buffers            0 – 40%     40 – 60%   60 – 80%    over 80%
CPU utilization as a percentage of processor bandwidth         0 – 40%     40 – 60%   60 – 80%    over 80%
Line utilization                                               0 – 70%     70 – 80%   80 – 90%    over 90%
Errors as a percentage of total frames                         0 – 2%      2 – 8%     8 – 15%     over 15%
Discards as a percentage of total frames                       0 – 2%      2 – 8%     8 – 15%     over 15%

Remote Access Health Index Ranges

These variables are used to monitor the health of remote access elements. Statistics are collected from a Remote Access Server (RAS) which may report on the system itself, a modem pool, a modem, or an ISDN interface.

Default Modem Health Index Ranges

Variable                                        Excellent       Good              Fair            Poor
Discards as a percentage of total frames        0 up to 1%      1 up to 5%        5 up to 10%     over 10%
Retrains                                        8 points per retrain
Frame errors as a percentage of total frames    0 up to 1%      1 up to 3%        3 up to 5%      over 5%
Modem errors per second                         0 up to 0.001   0.001 up to 0.01  0.01 up to 0.1  over 0.1

Default ISDN Health Index Ranges

Variable                                        Excellent       Good              Fair            Poor
Discards as a percentage of total frames        0 up to 1%      1 up to 5%        5 up to 10%     over 10%
Frame errors as a percentage of total frames    0 up to 1%      1 up to 3%        3 up to 5%      over 5%

Default Modem Pool Index Ranges

Variable                                           Excellent        Good                Fair              Poor
Average discards as a percentage of frames         0 up to 1%       1 up to 5%          5 up to 10%       over 10%
Average retrains per modem per call minute         0 up to 0.0005   0.0005 up to 0.005  0.005 up to 0.05  over 0.05
Average frame errors as a percentage of frames     0 up to 1%       1 up to 3%          3 up to 5%        over 5%
Average modem errors per modem per second          0 up to 0.0001   0.0001 up to 0.001  0.001 up to 0.01  over 0.01
Percent pool busy                                  0 up to 70%      70 up to 80%        80 up to 90%      over 90%
Blocked pool (a modem pool that has a percent      8 points per poll
  pool busy of 100%)

Default RAS Health Index Ranges

Variable                                           Excellent        Good                Fair              Poor
Average discards as a percentage of frames         0 up to 1%       1 up to 5%          5 up to 10%       over 10%
Average retrains per modem per call minute         0 up to 0.0005   0.0005 up to 0.005  0.005 up to 0.05  over 0.05
Average frame errors as a percentage of frames     0 up to 1%       1 up to 3%          3 up to 5%        over 5%
Average modem errors per modem per second          0 up to 0.0001   0.0001 up to 0.001  0.001 up to 0.01  over 0.01
RAS connect time percentage                        0 up to 70%      70 up to 80%        80 up to 90%      over 90%

System Health Index Ranges

These variables are collected from system elements. Statistics are collected from computer systems that have SNMP agents that support the host resources MIB. For example, these include systems that are running the SystemEDGE agent. These agents report performance statistics related to the CPU, storage (disk and partition statistics), memory (physical and virtual), communications, processes, and system.

These variables are grouped into the following categories:

Communication   Obtained from the Health Index of the LAN/WAN interfaces on the system
CPU             CPU utilization
Memory          Paging; Swapping; Virtual memory utilization; Physical memory utilization
Storage         Disk faults; File cache miss rate; Partition; Allocation failures; System partition utilization; User partition utilization
System          CPU imbalance; Unavailable
Process Set     Unavailable

Variable                       Excellent   Good       Fair       Poor
CPU imbalance                  0 – 1%      1 – 10%    10 – 30%   over 30%
CPU utilization                0 – 40%     40 – 60%   60 – 80%   over 80%
Disk faults                    32 points for each fault
File cache miss rate           0 – 5%      5 – 10%    10 – 20%   over 20%
Paging (pages/sec)             0 – 1       1 – 5      5 – 10     over 10
Swapping (pages/sec)           0 – .1      .1 – .2    .2 – .4    over .4
Virtual memory utilization     0 – 40%     40 – 70%   70 – 90%   over 90%
Physical memory utilization    0 – 20%     20 – 65%   65 – 80%   over 80%
Allocation failures            16 points for each failure
System partition utilization   0 – 85%     85 – 90%   90 – 95%   over 95%
User partition utilization     0 – 80%     80 – 85%   85 – 90%   over 90%
Unavailable                    Up to 8 points per second unavailable, with a percentage of the 8 points for the same percentage the system was not available during the polling interval (for example, for 50% of a polling interval, the system receives 4 points).
Process set unavailable        Up to 8 points per second unavailable, with a percentage of the 8 points for the same percentage the process set was not available during the polling interval (for example, for 50% of a polling interval, the process set receives 4 points).
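The point-based rows of the System Health Index Ranges (Unavailable, Disk faults, Allocation failures) do not use Excellent to Poor ranges. A minimal Python sketch of that scoring, with hypothetical function names and inputs, might look like this:

```python
def unavailable_points(unavailable_fraction, max_points=8):
    """Points for the Unavailable variable in one polling interval.

    unavailable_fraction: share of the polling interval (0.0 to 1.0) during
    which the system or process set was not available.
    """
    if not 0.0 <= unavailable_fraction <= 1.0:
        raise ValueError("unavailable_fraction must be between 0 and 1")
    return max_points * unavailable_fraction


def event_points(disk_faults=0, allocation_failures=0):
    """Points for counted events: 32 per disk fault, 16 per allocation failure."""
    return 32 * disk_faults + 16 * allocation_failures


# A system unavailable for 50% of a polling interval receives 4 points.
half_interval = unavailable_points(0.5)
# Hypothetical poll with one disk fault and two allocation failures.
events = event_points(disk_faults=1, allocation_failures=2)
print(half_interval, events)
```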


Response Health Index Ranges

These variables are collected from response elements. Statistics on the performance of applications can be collected from one of the following sources:

■ SystemEDGE agents with Service Availability

■ Application Response (AR) agents

■ Cisco routers with the Service Assurance Agent (SAA) option

These agents monitor transactions and store data on network delay and on application response time. eHealth–Response periodically receives data from the agents and stores it in the eHealth database where it is available for reporting. eHealth–Response provides reports based on response endpoints (both sources and destinations) and response paths. For more information on response elements, please refer to the section Managing Response with eHealth.

Variable                                             Excellent   Good   Fair   Poor
Average response/limit as a percentage               0           50     75     100
Failed attempts as a percentage of total attempts    0           10     20     400
Jitter in milliseconds                               0           5      10     20
Unavailable percentage                               32 points per second unavailable