Performance optimization, troubleshooting & dimensioning of a radio access network through the “raw counters” KPI. Roberto Carretto, Vodafone Omnitel N.V., Technology – Network Engineering – RAN Optimization. 2nd TMA PhD School, Napoli – June 6th 2011

Page 1

Performance optimization, troubleshooting & dimensioning of a radio access network through the “raw counters” KPI

Roberto Carretto
Vodafone Omnitel N.V.

Technology – Network Engineering – RAN Optimization

2nd TMA PhD School
Napoli – June 6th 2011

Page 2

… a challenging world for an RF engineer …

• Competition and cost reduction require a very high level of effectiveness and efficiency

• The progressive refinement of the tools for network management is a way to achieve both targets

Customer very “sensitive”

WEB2.0

Total convergence

Mature Market

Digital divide

Crisis

Page 3

RAN “raw counters”: a deposit of information

• The RAN “raw” counters are the ideal feed for most of the management tools and processes in place at Vodafone Italy

• All the radio procedures can be monitored in depth

• The “raw counters based” KPIs summarize the carried traffic and the performance achieved, and highlight any issues that occurred

• Steering their evolution and validating them thoroughly has become one of the most important tasks for the Operators, now with the same priority as the development of the radio features

[Diagram labels: NodeB (×3), RNC, Data OSS, Database, Vodka, DrawSiti]

Date     mRAB calls F1   mRAB DCR F1   mRAB calls F2   mRAB DCR F2
21-mar   174,101         0.54          51,170          0.92
22-mar   180,301         0.56          54,605          0.89
23-mar   183,745         0.55          55,218          0.87
24-mar   188,331         0.51          56,323          0.9
25-mar   192,585         0.58          58,650          0.89
28-mar   179,874         0.62          54,629          0.95
29-mar   188,439         0.74          58,747          1.06
30-mar   192,917         0.58          61,549          0.97
31-mar   194,007         0.59          59,995          0.93
01-apr   197,977         0.58          59,439          0.8
04-apr   186,883         0.62          56,184          0.91
05-apr   210,744         0.73          69,523          1.25
06-apr   189,757         0.67          57,869          1.07
07-apr   193,859         0.68          61,765          1.04
08-apr   204,914         0.58          64,804          0.95
11-apr   197,798         0.59          67,311          0.99
12-apr   217,676         0.63          83,416          1.58
13-apr   227,818         0.6           87,703          2.1
14-apr   225,366         0.65          87,875          2.06
15-apr   225,895         0.56          86,817          1.7
18-apr   187,266         0.48          59,052          0.91
19-apr   199,003         0.47          61,991          0.9
20-apr   201,159         0.49          63,340          0.9
21-apr   185,576         0.48          57,720          0.81
22-apr   138,839         0.43          42,599          0.73
26-apr   147,160         0.44          45,423          0.83
27-apr   177,756         0.43          55,720          0.76
28-apr   183,240         0.49          56,605          0.85
29-apr   185,505         0.49          57,408          0.93
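As a rough illustration of how such daily figures can be combined into a single KPI, the sketch below derives a call-weighted mRAB DCR across the two carriers for three of the days listed above. It assumes that DCR is expressed as a percentage of calls and that F1 and F2 can simply be pooled; neither assumption is stated on the slide, and the function name `combined_dcr` is hypothetical.

```python
# Sketch only: assumes DCR is a percentage of calls and that the two carriers
# (F1, F2) can be pooled. Values are copied from the daily figures above.
daily = {
    "12-apr": {"F1": (217_676, 0.63), "F2": (83_416, 1.58)},
    "13-apr": {"F1": (227_818, 0.60), "F2": (87_703, 2.10)},
    "18-apr": {"F1": (187_266, 0.48), "F2": (59_052, 0.91)},
}


def combined_dcr(carriers):
    """Return the call-weighted DCR (%) over all carriers of one day."""
    total_calls = sum(calls for calls, _ in carriers.values())
    # Estimated drops per carrier = calls * DCR / 100 (assumed reconstruction,
    # since the slide reports the rate and not the drop counter itself).
    est_drops = sum(calls * dcr / 100 for calls, dcr in carriers.values())
    return 100 * est_drops / total_calls


if __name__ == "__main__":
    for day, carriers in daily.items():
        print(f"{day}: combined mRAB DCR = {combined_dcr(carriers):.2f}%")
```

Weighting by calls keeps the lower-traffic carrier from dominating the combined figure, which a plain average of the two rates would not do.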

Page 4

Radio Network Optimization

• The configuration of the deployed radio features can be tuned and progressively optimized

• The performance of the new services can be monitored and improved

• The configuration of each cell can be tailored to the actual needs of the offered traffic, potentially leading to the “Self-Organizing Network” (SON) approach

[Figure: HSDPA abnormal release rate. Series: HSDPA_MAC_D_all, HSDPA_normal_rel, Abn_Rel_HSDPA. Axes: Abn rel rate % (0.00 to 9.00) and N abs (0 to 1,800,000); x-axis: date, from 23 March to 5 October]

Page 5

Radio Network Dimensioning

[Figure: Distribution of sites across classes and criticality levels (February weekend). Bar chart of site counts (axis 0 to 4000) for High Loaded, Mid Loaded and Low Loaded sites. Data labels as extracted, some of them fused: 692, 214, 11773111, 3065, 581678]

• Both the radio base stations and the related controllers can be properly dimensioned

• The “real-time” monitoring of the radio KPIs makes it possible to react immediately to “lack of capacity” issues or to unexpected traffic peaks

[Figure: Erlang BH per month, from April 2007 to March 2009 (axis 180,000 to 380,000). Series: Measurement Eo_Month; Max traffic @ Target Efficiency = 65.8%; Max traffic @ Target Efficiency = 63.3%; Outline; Outline revision + You & Me (21 Apr 08); Baseline 2+10 (01 Aug 08)]

Page 6

Radio Network Troubleshooting

• The “real-time” monitoring of the radio KPIs and its correlation with the alarms reported by the RAN equipment speeds up the maintenance of the radio network, thus limiting the impact of any issues that occur

Page 7

However, they are not perfect …

• Very effective from a pure network perspective, the radio “raw counters” provide only a partial view of the actual user experience

• They do not provide real “customer based” KPIs, nor do they allow monitoring of the “end-to-end” performance actually achieved

• Different approaches are required to close this gap (probes, collection of the network elements’ internal logs, etc.), all of them requiring huge additional investments

User experience

Customer “Smith”
• #Calls = 3
• #Customer Dropped = 2
• #Customer successful call = 1
• Customer Drop Call rate = 66%

Cell KPIs

Cell “NA0101”
• #Calls = 692
• #Dropped = 2
• #Successful call = 690
• Drop Call rate = 0.29%
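The gap between the two views can be reproduced with a few lines of code. The sketch below is only an illustration: the per-call records are hypothetical (raw counters do not carry a customer identity), and it assumes that all three of Smith's calls were made in cell NA0101 and that both dropped calls of the cell were his.

```python
from collections import defaultdict


def drop_call_rate(dropped, calls):
    """Drop call rate in percent; 0.0 when no calls were made."""
    return 100.0 * dropped / calls if calls else 0.0


# Hypothetical call records for one cell: (customer, dropped?).
# 692 calls in total, 2 of them dropped, both belonging to customer "Smith".
records = [("Smith", True), ("Smith", True), ("Smith", False)]
records += [(f"user{i}", False) for i in range(689)]

# Cell view: this is all that the raw counters can see.
cell_calls = len(records)
cell_dropped = sum(1 for _, dropped in records if dropped)
cell_dcr = drop_call_rate(cell_dropped, cell_calls)

# Customer view: needs per-user data, which the raw counters do not provide.
per_customer = defaultdict(lambda: [0, 0])  # customer -> [calls, dropped]
for customer, dropped in records:
    per_customer[customer][0] += 1
    per_customer[customer][1] += int(dropped)
smith_calls, smith_dropped = per_customer["Smith"]
smith_dcr = drop_call_rate(smith_dropped, smith_calls)

print(f"Cell DCR: {cell_dcr:.2f}%")
print(f"Customer 'Smith' DCR: {smith_dcr:.1f}%")
```

The cell-level rate stays far below any plausible alarm threshold, while two out of three calls failed for the individual customer (the slide shows this ratio as 66%).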

Page 8

RAN “raw counters” next steps: Automated Network Troubleshooting

• Among all the potential uses of the radio “raw counters”, the one that may enable “automated troubleshooting” processes is certainly the most innovative, as well as the most interesting for an Operator

• In the usual troubleshooting process, most of the effort is spent in the “Cause Diagnosis” phase. An automated correlation of raw counters and alarms with skills and experience could lead to a dramatic reduction of both the effort and the time spent

[Diagram: troubleshooting cycle with the stages Network Monitoring, Fault Detection, Cause Diagnosis and Solution Deployment]

• Fault Detection: based on the symptoms, determine which cells have a problem

• Cause Diagnosis: determine what causes the problem

• Solution Deployment: implement the solution, i.e. how to fix the problem

Effort split shown on the slide: 10% / 70% / 20%, with Cause Diagnosis taking the 70% share

Page 9

Automated Network Troubleshooting: the Vodafone-Italy case

1. Detect a wrong behavior, e.g. cells with a high DCR

2. What could have caused the fault?

• Interference in TRX

• Bad coverage at cell border

• HW fault

• Outage of neighbor cell

• Other …

3. How do we detect the presence of interference in TRX?

• Increased TCH fail rate

• IOI levels increase

• Excessive TCH alarms seen

• Increased number of intra-cell HOs

• Increased number of UL quality HOs

• Increased number of UL interference HOs

• Others ...

4. KPIs, alarms ... which of them?

• Bayesian Networks?

• Bayesian Networks are based on Bayes’ formula for reversing the causal direction of conditional probabilities, allowing one to reason about causes based on information about the effects or symptoms
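For reference, Bayes’ formula can be written with the cause and symptom symbols Ck and Sj that the deck uses for its Bayesian network. The summation index i over the candidate causes is notation introduced here, and the form below assumes that the causes are mutually exclusive and exhaustive, which the slides do not state:

```latex
\[
\mathrm{Prob}(C_k \mid S_j) \;=\;
\frac{\mathrm{Prob}(S_j \mid C_k)\,\mathrm{Prob}(C_k)}
     {\sum_{i} \mathrm{Prob}(S_j \mid C_i)\,\mathrm{Prob}(C_i)}
\]
```

The left-hand side is the quantity of interest for diagnosis (how likely a cause is, given an observed symptom), while the right-hand side uses only the prior Prob(Ck) and the conditional probabilities Prob(Sj | Ck).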

Page 10

The “Bayesian Networks” approach

[Diagram: Bayesian network for the “High DCR” problem]

Macro problem, which defines the model to be used: High DCR

Causes (logical or physical problems to be diagnosed):

• Ck, Prob(Ck): Interference in TRX

• Ck+1, Prob(Ck+1): Bad Coverage

• Ck+2, Prob(Ck+2): HW Fault

Symptoms (KPIs, alarms) observed during daily network monitoring:

• Sj: Incr. TCH fail rate

• Sj+1: IOI levels increase

• Sj+2: Excessive TCH alarms

• Sj+3: Incr. Qual. HOs

• Sj+4: Avg. DL signal level < -95 dBm

Conditional probabilities on the edges: Prob(Sj | Ck), Prob(Sj | Ck+2), Prob(Sj+1 | Ck), Prob(Sj+2 | Ck), Prob(Sj+2 | Ck+2), Prob(Sj+3 | Ck+1), Prob(Sj+4 | Ck+1)

• Prob(Ck) is determined by the planner’s expertise (fault probability); it represents the “a priori knowledge”

• Prob(Sj | Ck) is calculated from network data and alarms according to the following process:

1. Identify faulty cells (i.e. with DCR > x%)

2. Discard those cells not relevant for the analysis (e.g. due to low traffic or known issues)

3. Analyze the remaining cells and perform fault diagnosis (e.g. cell LA0151: high DCR caused by a fault in the MHA)

4. Collect symptom statistics for the cause identified at step 3
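The process above can be sketched in a few lines. This is not the Vodafone implementation: the diagnosed cells, symptom names and prior values are invented for illustration, and the posterior uses a naive-Bayes simplification (one cause at a time, symptoms conditionally independent given the cause), which is a special case of the Bayesian network on the slide.

```python
from collections import Counter, defaultdict

# Hypothetical output of steps 1-3: (cause found by the engineer, symptoms seen).
diagnosed = [
    ("interference", {"tch_fail", "ioi_increase"}),
    ("interference", {"ioi_increase"}),
    ("bad_coverage", {"quality_ho", "low_dl_level"}),
    ("hw_fault", {"tch_fail", "tch_alarms"}),
]
SYMPTOMS = ["tch_fail", "ioi_increase", "tch_alarms", "quality_ho", "low_dl_level"]
# Prob(Ck): a priori knowledge from the planner (illustrative numbers only).
PRIOR = {"interference": 0.5, "bad_coverage": 0.3, "hw_fault": 0.2}


def likelihoods(cases, symptoms):
    """Step 4: estimate Prob(Sj | Ck) as a smoothed relative frequency."""
    n_cases = Counter(cause for cause, _ in cases)
    hits = defaultdict(Counter)
    for cause, seen in cases:
        hits[cause].update(seen)
    # Laplace smoothing (+1 / +2) avoids zero probabilities with few cases.
    return {c: {s: (hits[c][s] + 1) / (n + 2) for s in symptoms}
            for c, n in n_cases.items()}


def posterior(observed, prior, like, symptoms):
    """Prob(Ck | observed symptoms) via Bayes' formula, assuming the symptoms
    are conditionally independent given the cause (naive-Bayes assumption)."""
    score = {}
    for cause, p in prior.items():
        for s in symptoms:
            p_s = like[cause][s]
            p *= p_s if s in observed else 1.0 - p_s
        score[cause] = p
    total = sum(score.values())
    return {cause: p / total for cause, p in score.items()}


if __name__ == "__main__":
    like = likelihoods(diagnosed, SYMPTOMS)
    result = posterior({"ioi_increase", "tch_fail"}, PRIOR, like, SYMPTOMS)
    for cause, p in sorted(result.items(), key=lambda kv: -kv[1]):
        print(f"{cause}: {p:.2f}")
```

With only a handful of diagnosed cells the estimates are dominated by the smoothing term, which is why an exhaustive collection of fault and symptom statistics matters so much for this approach.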

Page 11

Automated Network Troubleshooting: many things still to do ….

[Diagram: blocks labelled Knowledge, Daily KPIs, Faulty Data, Knowledge Builder, Decision Engine, Verification & Model Tuning, Fault Probability, Engineer, Model (Sj, Ck), Faulty Cells & Prob(Ck, Sj)]

• The attempt has failed so far, because the actual fault statistics could not be collected exhaustively

• For the same reason, an exhaustive collection of the symptom statistics has been impossible so far

… but the principle is certainly sound and promising … and, furthermore, absolutely required by a mature operator!

Page 12

Conclusions

• A mature Cellular Operator playing in a mature market needs to automate its operational processes more and more

• Most of the needed tools are already available, built into the network elements currently in place

• The Operators need to learn how to use them in more innovative, as well as more effective, ways

• The support of all the competence centres, such as universities, is clearly indispensable

Page 13

Thank you for your attention!

Any questions?