Performance optimization, troubleshooting & dimensioning of a radio access network through the “raw counters” KPI
Roberto Carretto
Vodafone Omnitel N.V.
Technology – Network Engineering – RAN Optimization
2nd TMA PhD School
Napoli – June 6th 2011
… a challenging world for an RF engineer …
• Competition and cost reduction require a very high level of effectiveness and efficiency
• Progressively refining the network management tools is one way to achieve both targets
Customer very “sensitive”
WEB2.0
Total convergence
Mature Market
Digital divide
Crisis
RAN “raw counters”: a deposit of information
• The RAN “raw” counters are the ideal feed for most of the management tools and processes in place at Vodafone Italy
• All the radio procedures can be monitored in depth
• The “raw counters based” KPIs summarize the carried traffic and the performance achieved, and highlight any issues that occurred
• Steering the evolution of the counters and validating them thoroughly has become one of the most important tasks for Operators, now with the same priority as the development of the radio features
[Diagram labels: NodeB (x3), RNC, Data OSS, Database, Vodka, DrawSiti]
Date     mRAB calls F1   mRAB DCR F1   mRAB calls F2   mRAB DCR F2
21-mar   174,101         0.54          51,170          0.92
22-mar   180,301         0.56          54,605          0.89
23-mar   183,745         0.55          55,218          0.87
24-mar   188,331         0.51          56,323          0.9
25-mar   192,585         0.58          58,650          0.89
28-mar   179,874         0.62          54,629          0.95
29-mar   188,439         0.74          58,747          1.06
30-mar   192,917         0.58          61,549          0.97
31-mar   194,007         0.59          59,995          0.93
01-apr   197,977         0.58          59,439          0.8
04-apr   186,883         0.62          56,184          0.91
05-apr   210,744         0.73          69,523          1.25
06-apr   189,757         0.67          57,869          1.07
07-apr   193,859         0.68          61,765          1.04
08-apr   204,914         0.58          64,804          0.95
11-apr   197,798         0.59          67,311          0.99
12-apr   217,676         0.63          83,416          1.58
13-apr   227,818         0.6           87,703          2.1
14-apr   225,366         0.65          87,875          2.06
15-apr   225,895         0.56          86,817          1.7
18-apr   187,266         0.48          59,052          0.91
19-apr   199,003         0.47          61,991          0.9
20-apr   201,159         0.49          63,340          0.9
21-apr   185,576         0.48          57,720          0.81
22-apr   138,839         0.43          42,599          0.73
26-apr   147,160         0.44          45,423          0.83
27-apr   177,756         0.43          55,720          0.76
28-apr   183,240         0.49          56,605          0.85
29-apr   185,505         0.49          57,408          0.93
Radio Network Optimization
• The configuration of the deployed radio features can be tuned and progressively optimized
• The performance of new services can be monitored and improved
• The configuration of each cell can be tailored to the actual needs of the offered traffic, potentially leading to the “Self-Organizing Network” (SON) approach
[Figure: HSDPA abnormal release rate. Series: HSDPA_MAC_D_all, HSDPA_normal_rel, Abn_Rel_HSDPA. Axes: N abs (0 to 1,800,000), Abn rel rate % (0.00 to 9.00), Date (23 March to 5 October).]
Radio Network Dimensioning
[Figure: Distribution of sites across classes and criticality (February weekend). Classes: High Loaded, Mid Loaded, Low Loaded. Vertical axis: 0 to 4000.]
• Both the radio base stations and the related controllers can be properly dimensioned
• “Real-time” monitoring of the radio KPIs makes it possible to react immediately to “lack of capacity” issues or to unexpected traffic peaks
[Figure: traffic trend, April 2007 to March 2009. Vertical axis: Erlang BH (180,000 to 380,000). Series: Measurement Eo_Month; Max traffic @ Target Efficiency = 65.8%; Max traffic @ Target Efficiency = 63.3%; Outline; Outline revision + You & Me (21 Apr 08); Baseline 2+10 (01 Aug 08).]
Radio Network Troubleshooting
• “Real-time” monitoring of the radio KPIs, correlated with the alarms reported by the RAN equipment, speeds up the maintenance of the radio network and thus limits the impact of the issues that occur
However, they are not perfect ….
• Although very effective from a pure network perspective, the radio “raw counters” provide only a partial view of the actual user experience
• They do not provide real “customer based” KPIs, nor do they allow monitoring of the “end-to-end” performance actually achieved
• Different approaches are required to close this gap (probes, collection of the network elements' internal logs, etc.), all of them requiring huge additional investments
User experience: Customer “Smith”
• #Calls = 3
• #Customer Dropped = 2
• #Customer successful call = 1
• Customer Drop Call rate = 66%
Cell KPIs: Cell “NA0101”
• #Calls = 692
• #Dropped = 2
• #Successful call = 690
• Drop Call rate = 0.29%
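The gap between the two views can be reproduced with a few lines of arithmetic. The sketch below is only illustrative: the function name `drop_call_rate` is an assumption, and the inputs are the figures quoted on the slide. Note that 2 dropped calls out of 3 is 66.7% when rounded, which the slide shows as 66%.

```python
def drop_call_rate(dropped: int, calls: int) -> float:
    """Return the drop call rate in percent (0.0 when there were no calls)."""
    if calls == 0:
        return 0.0
    return 100.0 * dropped / calls


# Figures quoted on the slide.
cell_dcr = drop_call_rate(dropped=2, calls=692)    # cell "NA0101"
customer_dcr = drop_call_rate(dropped=2, calls=3)  # customer "Smith"

print(f"Cell NA0101 drop call rate:    {cell_dcr:.2f}%")
print(f"Customer Smith drop call rate: {customer_dcr:.1f}%")
```

The same two dropped calls are negligible when averaged over the whole cell, but dominate the experience of the single customer who suffered them.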
RAN “raw counters” next steps: Automated Network Troubleshooting
• Among all the potential uses of the radio “raw counters”, enabling “automated troubleshooting” processes is certainly the most innovative, as well as the most interesting for an Operator
• In the usual troubleshooting process, most of the effort is spent in the “Cause Diagnosis” phase; an automated correlation of raw counters, alarms, skills and experience could dramatically reduce both the effort and the time spent
[Diagram: troubleshooting cycle with the phases Network Monitoring, Fault Detection, Cause Diagnosis, Solution Deployment]
• Fault Detection: based on the symptoms, determine which cells have a problem
• Cause Diagnosis: determine what causes the problem
• Solution Deployment: implement the solution (how to fix the problem)
10% Effort 70% Effort 20% Effort
Automated Network Troubleshooting: the Vodafone-Italy case
1. Detect a wrong behavior, e.g. cells with high DCR
2. What could have caused the fault?
• Interference in TRX
• Bad Coverage at cell Border
• HW Fault
• Outage of Neighbor Cell
• Other …
3. How do we detect the presence of Interference in TRX?
• Increased TCH Fail Rate
• IOI Levels increase
• Excessive TCH alarms seen
• Increased number of intra-cell HOs
• Increased number of UL Quality HOs
• Increased number of UL Interference HOs
• Others ...
4. KPI, Alarms ... which of them?
• Bayesian Networks?
• Bayesian Networks are based on Bayes’ formula for reversing the causal direction of conditional probabilities, allowing one to reason about causes based on information about the effects or symptoms
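For reference, Bayes' formula for a single observed symptom can be written as follows. The notation is an assumption made for illustration: C_k denotes a candidate cause, S_j an observed symptom, and the causes are taken to be mutually exclusive and exhaustive.

```latex
P(C_k \mid S_j) = \frac{P(S_j \mid C_k)\, P(C_k)}{\sum_{i} P(S_j \mid C_i)\, P(C_i)}
```

The left-hand side is the quantity of interest for diagnosis (how likely a cause is, given the symptom), while the right-hand side uses only the prior probability of each cause and the probability of the symptom under each cause.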
The “Bayesian Networks” approach
[Diagram: Bayesian network for the “High DCR” model]
Macro problem which defines the model to be used: High DCR
Causes (logical or physical problem to be diagnosed):
• Ck, Prob(Ck): Interference in TRX
• Ck+1, Prob(Ck+1): Bad Coverage
• Ck+2, Prob(Ck+2): HW Fault
Symptoms (KPIs, alarms) observed during daily network monitoring:
• Sj: Incr. TCH fail rate
• Sj+1: IOI Levels increase
• Sj+2: Excessive TCH Alarms
• Sj+3: Incr. Qual. HOs
• Sj+4: Avg. DL signal level < -95 dBm
Conditional probabilities on the cause-to-symptom links: Prob(Sj | Ck), Prob(Sj | Ck+2), Prob(Sj+1 | Ck), Prob(Sj+2 | Ck), Prob(Sj+2 | Ck+2), Prob(Sj+3 | Ck+1), Prob(Sj+4 | Ck+1)
• Prob(Ck) is set from the planner's expertise (fault probability); it represents the “a priori knowledge”
• Prob(Sj | Ck) is calculated from network data and alarms according to the following process:
1. Identify faulty cells (i.e. with DCR > x%)
2. Discard the cells not relevant for the analysis (e.g. due to low traffic or known issues)
3. Analyze the cells from step 1 and perform fault diagnosis (e.g. cell LA0151: high DCR caused by a fault in the MHA)
4. Collect symptom statistics for the cause identified at step 3
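The process above can be sketched in code. The example below is a minimal illustration, not the tool described in the talk: it assumes a single cause per faulty cell and symptoms that are independent given the cause (a naive Bayes simplification of a Bayesian network). The diagnosed cases, the priors, and the names `estimate_likelihoods` and `diagnose` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical diagnosed cases (steps 1-3): confirmed cause and observed symptoms.
DIAGNOSED_CASES = [
    ("Interference in TRX", {"Incr. TCH fail rate", "IOI levels increase"}),
    ("Interference in TRX", {"IOI levels increase", "Excessive TCH alarms"}),
    ("Bad coverage", {"Incr. Qual. HOs", "Avg. DL signal < -95 dBm"}),
    ("HW fault", {"Incr. TCH fail rate", "Excessive TCH alarms"}),
]

# Hypothetical priors Prob(Ck), standing in for the planner's expertise.
PRIORS = {"Interference in TRX": 0.5, "Bad coverage": 0.3, "HW fault": 0.2}

SYMPTOMS = sorted(set().union(*(s for _, s in DIAGNOSED_CASES)))


def estimate_likelihoods(cases, symptoms):
    """Step 4: estimate Prob(Sj | Ck) as a smoothed frequency per cause."""
    cause_count = Counter(cause for cause, _ in cases)
    hits = defaultdict(Counter)
    for cause, observed in cases:
        for symptom in observed:
            hits[cause][symptom] += 1
    # Laplace smoothing (+1 / +2) avoids zero probabilities on small samples.
    return {
        cause: {s: (hits[cause][s] + 1) / (n + 2) for s in symptoms}
        for cause, n in cause_count.items()
    }


def diagnose(observed, priors, likelihoods, symptoms):
    """Return Prob(Ck | symptoms), assuming independent symptoms given Ck."""
    scores = {}
    for cause, prior in priors.items():
        p = prior
        for s in symptoms:
            p_s = likelihoods[cause][s]
            p *= p_s if s in observed else (1.0 - p_s)
        scores[cause] = p
    total = sum(scores.values())
    return {cause: p / total for cause, p in scores.items()}


likelihoods = estimate_likelihoods(DIAGNOSED_CASES, SYMPTOMS)
posterior = diagnose(
    {"IOI levels increase", "Incr. TCH fail rate"}, PRIORS, likelihoods, SYMPTOMS
)
for cause, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{cause}: {p:.2f}")
```

With so few diagnosed cases the estimates are dominated by the smoothing and the priors, which illustrates why a large and reliable collection of fault and symptom statistics is a precondition for this approach.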
Automated Network Troubleshooting: many things still to do ….
[Diagram blocks: Daily KPIs; Faulty Data; Knowledge; Knowledge Builder; Model (Sj, Ck); Decision Engine; Faulty Cells & Prob(Ck, Sj); Verification & Model Tuning; Fault Probability; Engineer]
• The attempt has failed so far, because of the difficulty of exhaustively collecting the actual fault statistics
• For the same reason, an exhaustive collection of the symptom statistics has not been possible so far
… but the principle is certainly sound and promising … and, furthermore, absolutely required by a mature Operator!
Conclusions
• A mature Cellular Operator playing in a mature market needs to automate more and more of its operational processes
• Most of the tools needed are already available, built into the network elements currently in place
• Operators need to learn how to use them in more innovative, as well as more effective, ways
• The support of all the competence centres, such as Universities, is essential
Thank you for your attention!
Any questions?