ipg_log
TRANSCRIPT
Alarm: UtranCell_InternalResourceUnavailable
86%
Usage Count: 28
Network: WCDMA
Service: W-RAN
Node: CPP RNC 3820
Alarm: UtranCell_InternalResourceUnavailable
Cold Restart of TX board in cell 3 Node does not work
Cold restart of Node B on older CV does not work
Remodule Node B in RNC and problem disappears
Module Error:
[2011-01-11 15:57:09.388] RnhLmCellCPT(rnhCellRoC[52]) ../src/RnhCellRoC.cpp:7901 INFO:rnhCellRoC[cellFroId xxx, iubLinkFroId xxx] failed to unlock cell, reason : RnhCellDataD::errorStatusNoDrhResources
lhsh 000600 drh_ccrh_hostdata
0006: 0x21000067 24 10 23769 0 0xffffffff releasing sendReleaseRspToClient
0006: 0x21000068 23 29 19088 0 0xffffffff
0006: 0x21000069 24 10 22361 0 0xffffffff releasing sendReleaseRspToClient
0006: 0x21000026 24 10 23637 0 0xffffffff releasing sendReleaseRspToClient
In the absence of crashes in the ET-IPG, search for cold restarts:
lh etipg te log read | grep -i restart
[2011-03-09 02:07:05.032] Ipet_atish_proc atish_trafind.c:351 INFO:TrafficIndication: COLD restart
[2011-03-09 02:07:05.088] Ipet_scish_proc scish_trafind.c:435 INFO:TrafficIndication: COLD restart
ETIPG Log:
lhsh 002500 dumpelg
LOG ENTRIES:
seqNr date time message
2 100729 104733 000;;Subrack 02;Slot 25
3 100903 102509 000;VANRNC1;Subrack 00;Slot 25
4 101103 200144 000;CXP9013831_R9YC/28
5 101130 050310 NPU HW alarm: n2=0x0, n3=0x0, n4=0x80000002, n4top=0x80000001, n5=0x0, n8=0x0
6 110222 180100 NPU HW alarm: n2=0x0, n3=0x0, n4=0x80000002, n4top=0x80000001, n5=0x0, n8=0x0
7 110222 191901 NPU HW alarm: n2=0x0, n3=0x0, n4=0x80000002, n4top=0x80000001, n5=0x0, n8=0x0
8 110223 064546 NPU HW alarm: n2=0x0, n3=0x0, n4=0x80000002, n4top=0x80000001, n5=0x0, n8=0x0
9 110226 202109 NPU HW alarm: n2=0x0, n3=0x0, n4=0x80000002, n4top=0x80000001, n5=0x0, n8=0x0
10 110226 233552 NPU HW alarm: n2=0x0, n3=0x0, n4=0x80000002, n4top=0x80000001, n5=0x0, n8=0x0
11 110227 055434 NPU HW alarm: n2=0x0, n3=0x0, n4=0x80000002, n4top=0x80000001, n5=0x0, n8=0x0
12 110309 020556 NPU HW alarm: n2=0x0, n3=0x0, n4=0x80000002, n4top=0x80000001, n5=0x0, n8=0x0
Coli printouts from commands:
lh mod drh_ccrh_topdata
lh mod drh_ccrh_celldata all
lh mod drh_ccrh_hostdata
0202: [723]: cellRef= xxx, clientModuleId = 3, spmFroId = xx, msgBoard = [ releasing ]
0202: [737]: cellRef= xxx, clientModuleId = 3, spmFroId = xx, msgBoard = [ releasing ]
0216: [1303]: cellRef= xxx, clientModuleId = 16, spmFroId = xx, msgBoard = [ releasing ]
0216: [614]: cellRef= xxx, clientModuleId = 16, spmFroId = xx, msgBoard = [ releasing ]
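Note: cell rows like the ones above can be scanned for hanging cells offline, once the printout has been saved to a file. The following is a minimal sketch in Python, not part of the original solution. It assumes the row layout shown in this document (board prefix, index in brackets, cellRef, msgBoard in brackets); the function name find_hanging_cells is hypothetical, and the real printout may carry additional fields.

```python
import re

# One cell row of the "drh_ccrh_celldata all" printout, as shown in this
# document. Assumption: the real printout keeps this general layout.
CELL_ROW = re.compile(
    r"(?P<board>\d{4}):\s*\[(?P<index>\d+)\]:\s*cellRef=\s*(?P<cell>\S+?),"
    r".*?msgBoard\s*=\s*\[\s*(?P<flags>[^\]]*?)\s*\]"
)

# Flags that indicate a hanging cell according to the conditions in this document.
HANGING_FLAGS = {"releasing", "clearing"}


def find_hanging_cells(printout: str):
    """Return (board, index, cellRef, flags) for each cell whose msgBoard
    contains 'releasing' or 'clearing'."""
    hits = []
    for match in CELL_ROW.finditer(printout):
        flags = set(match.group("flags").split())
        if flags & HANGING_FLAGS:
            hits.append((match.group("board"), int(match.group("index")),
                         match.group("cell"), sorted(flags)))
    return hits
```

An empty result means no cell row carried either flag; it does not prove that the node is healthy.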
HS configuration updated in Node B
References: TR HN50575
REMEDY:
CONDITIONS:
1. Ensure the SW is below the version in which this is fixed.
2. The alarm UtranCell_InternalResourceUnavailable is present.
3. The printouts of the Coli commands lh mod drh_ccrh_topdata, lh mod drh_ccrh_celldata all and lh mod drh_ccrh_hostdata show msgBoard = [ releasing ].
4. Customer permission is granted to use the workaround, which will affect traffic in the module with the problematic cell.
PROCEDURE:
Locate the RncLmCell load module in the affected RNC module and restart it with "lh modx progkill RncLmCell".
Note: X = RNC module.
SOLUTION:
CONDITIONS:
The fault is caused by a hang in RncLmCell in the RNC module. The UtranCell_InternalResourceUnavailable alarms are triggered by cells hanging in 'releasing' or 'clearing' state in the DrhCcRh block. These cells cannot be released because the IpTp (IP termination point) sessions associated with them are also hanging in 'releasing' state. Such hanging sessions are caused by a fault in an audit procedure that is performed after an ET-IPG crash or restart. When the ET-IPG goes down, the application receives two signals: hostStateChangeInd and serverDownInd (from IPAPPLSCI). On the first of these signals, the IP sessions associated with the restarted ET-IPG are marked as 'releasing' and a sessionReleaseReq signal is sent to IPAPPLSCI (CPP interface) to release the sessions. However, instead of a release response, the application receives a serverDownInd signal, which triggers the audit. The audit procedure skips the removal of IP sessions that are marked as 'releasing', so these sessions are never removed.
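Note: the audit fault described above can be illustrated with a toy model. The sketch below is Python and purely illustrative; it is not the DrhCcRh implementation. The class and method names (IpTpSession, SessionTable, host_state_change, server_down_audit) and the skip_releasing switch are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class IpTpSession:
    session_id: int
    faulty: bool = False      # set by the audit for sessions on the restarted board
    releasing: bool = False   # set while a release response is awaited


@dataclass
class SessionTable:
    sessions: dict = field(default_factory=dict)

    def host_state_change(self, ids):
        """First signal: flag the sessions on the restarted board as releasing."""
        for sid in ids:
            self.sessions[sid].releasing = True
            # A sessionReleaseReq would be sent here; no response ever arrives.

    def server_down_audit(self, ids, skip_releasing=True):
        """Second signal: the audit removes faulty sessions.

        skip_releasing=True models the faulty behaviour described above.
        skip_releasing=False models what one would expect after a correction.
        """
        for sid in ids:
            session = self.sessions[sid]
            session.faulty = True
            if session.releasing and skip_releasing:
                continue  # the session stays in the table and keeps the cell hanging
            del self.sessions[sid]
```

With the default skip_releasing=True, a session flagged as releasing survives the audit, which corresponds to the hanging 'releasing' entries in the drh_ccrh_hostdata printout.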
Signal flow (sender -> receiver: signal)
Participants: RNHCell (Mod_A), DrhCcRh (Mod_B), Aal2Eri (Mod_B), Aal2Nci (Mod_A), ApplSci (IP-ET)
RNHCell -> DrhCcRh: initialResourceReq
DrhCcRh -> RNHCell: initialResourceCfm
RNHCell -> DrhCcRh: ipTpUdpReq
DrhCcRh -> ApplSci: setUpUdpSessionReq
ApplSci -> DrhCcRh: sessionSetUpCfm
DrhCcRh -> RNHCell: ipTpUdpCfm
RNHCell -> DrhCcRh: modifyIpTpReq
DrhCcRh -> ApplSci: modufyUdpSessionReq
ApplSci -> DrhCcRh: sessionModifyCfm
DrhCcRh -> RNHCell: modifyIpTpReq
RNHCell -> DrhCcRh: reserveAal2CepReq
DrhCcRh -> Aal2Eri: reserveLocalAal2CepReq
Aal2Eri -> DrhCcRh: reserveAal2CepCfm
DrhCcRh -> RNHCell: reserveAal2CepCfm
RNHCell -> Aal2Nci: nodeConnReq
Aal2Nci -> ApplSci: connectCep?
ApplSci -> Aal2Nci: disconnectCep?
Aal2Nci -> RNHCell: connCfm
>>>> {restart of Mod_A} <<<< (RNHCell and Aal2Nci are marked X from here on)
(sender not shown) -> DrhCcRh: connNotOkInd
DrhCcRh -> Aal2Eri: releaseLocalAal2CepReq
Aal2Eri -> DrhCcRh: releaseAal2CepCfm
DrhCcRh -> ApplSci: sessionReleaseReq
>>>> {FLOW IS HANGED - no response from ApplSci towards DrhCcRh!} <<<<
ApplSci -> DrhCcRh: sessionReleaseCfm
PROCEDURE:
Upgrade the RNC to W10.1.3.7.
M3UA goes down when ET-MFG hanging
86%
Usage Count: 11
Network: WCDMA
Node: CPP RNC 3810
Service: W-RAN P7.1
Software: CPP RNC P7.1.4 EU4
Software: CPP RNC CXP9013831 R9YC/6
All M3UA associations go down
All M3UA connections are disabled
Alarm: M3UA Association Down
Alarm: Contact to Default Router 1 Lost
Alarm: Contact to Default Router 0 Lost
ETMFG - te log read
Ipet_ipps_proc pcidrv_coli.c:3284 ERROR:ttyram: Could not send data
ETMFG - te log read
Ipet_scish_proc scish_root.c:429 INFO:ColiDumpReq not yet implemented for BHRI
Ipet_ethost_proc ethost_root.c:437 INFO:ColiDumpReq not yet implemented for BHRI
Ipet_ethost_proc ethost_root.c:456 INFO:ColiDumpReq not yet implemented for INTERNAL_HOST
Refer to UABtr79810; WRNae89971; WRNae88084; HM15053; HM27791; HM52198; HM52208; HM52231; HL93961
A memory leak in the ET-MFG causes the ET-MFG to hang. When the problem occurs, the ET-MFG is not able to handle traffic.
SOLUTION:
CONDITIONS:
PROCEDURE:
1. CPP has provided the solution for this: EU55 for P7.1.4 on 2010-09-03; W10.1.1-4 scheduled for delivery on Sept 15th, and W10.1.2.
REMEDY:
CONDITIONS:
PROCEDURE:
1. Cold restart of the ET-MFG, or cold restart of the RNC node.
This memory problem is closely related to fragmented traffic. The counters pmIpReasmReqds, pmIpReasmOks and pmIpReasmFails (IpAccessHostEt) can be used to check fragmented traffic.
# This helps us to check the impact of each defined interface individually
IpInterface pmDot1qTpVlanPortInFrames pmDot1qTpVlanPortOutFrames pmIfStatsIpOutDiscards pmIfStatsIpOutRequests
# This helps us to check whether any discarded data was sent/received at the ET-MFG GigaBitEthernet ports.
GigaBitEthernet pmIfInDiscardsLink1 pmIfInDiscardsLink2 pmIfOutDiscardsLink2 pmIfOutDiscardsLink2
# Information for the fragmented traffic, to be activated at the customer's convenience.
IpAccessHostEt pmIpReasmFails pmIpReasmOks pmIpReasmReqds pmIpFragCreates pmIpFragFails pmIpFragOks
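Note: the reassembly counters named above are cumulative, so they are best compared between two samples. The sketch below is a minimal Python example under stated assumptions: the two samples are plain dictionaries keyed by counter name (how they are read from the node is not shown), counter wrap is not handled, and the function name reassembly_summary and the derived failRatio value are hypothetical helpers, not Ericsson-defined metrics.

```python
def reassembly_summary(before: dict, after: dict) -> dict:
    """Compute deltas of the IP reassembly counters between two samples."""
    names = ("pmIpReasmReqds", "pmIpReasmOks", "pmIpReasmFails")
    delta = {name: after[name] - before[name] for name in names}
    finished = delta["pmIpReasmOks"] + delta["pmIpReasmFails"]
    # Share of finished reassembly attempts that failed.
    # None means that nothing was reassembled in the interval.
    delta["failRatio"] = (delta["pmIpReasmFails"] / finished) if finished else None
    return delta
```

A non-zero pmIpReasmReqds delta shows that fragmented traffic reached the host in the interval; what ratio counts as abnormal has to be judged against the node's own baseline.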
see also SCS1079568
WCDMA RNC W10 : Failed to unlock utrancell in IUB over IP site
86%
Usage Count: 4
Network: WCDMA
Node: CPP RNC
Service: W-RAN
Alarm: UtranCell_InternalResourceUnavailable
Utrancell does not come up
ReadErrorLog: ModuleMP, te log read
RnhLmCellCPT(rnhCellRoC[n]) ../src/RnhCellRoC.cpp:7901 INFO:rnhCellRoC[cellFroId n, iubLinkFroId n] failed to unlock cell, reason : RnhCellDataD::errorStatusNoDrhResources
ReadErrorLog: ModuleMP, te e trace3 drhCcRhRouterC
RnhLmCellCPT(drhCcRhRouterC) ../src/DrhCcRhRouterC.cpp:491 TRACE3:cellRef=n is already in our list. We'll reject until the old one is removed
ReadErrorLog: ModuleMP, drh_ccrh_celldata all
00nn: List of all CC SPs owned by ccRhModule n:
00nn: List of cells from SpId n:
00nn: [n]: cellRef=<cellRef>, clientModuleId = 0, spmFroId = <spmFroId> msgBoard = [ releasing ]
ReadErrorLog: ModuleMP, drh_ccrh_hostdata
00nn: IpTp table:
00nn: ipTpSessionId ipHostFroId piuId serverSessionId clientPortIndex clientId msgBoard
…
00nn: 0x.. .. .. .. 0 0xffffffff releasing
00nn: 0x.. .. .. .. 0 0xffffffff releasing
00nn: 0x.. .. .. .. 0 0xffffffff releasing
…
00nn: 0x.. .. .. .. 0 0xffffffff releasing
ReadErrorLog: ET-IPG, te log read
Ipet_scish_proc scish_session.c:5234 ERROR:SciShSession applaudit req on non tagged session, true is returned
Ipet_lh_proc ipplh_agent.c:783 INFO:Restart Rank COLD and updated State is: 4
Ipet_atish_proc atish_trafind.c:398 INFO:TrafficIndication: COLD restart
ReadErrorLog: ET-IPG, llog
Board restart rank=Coldwithtest Proc=Cs_boardManager_proc Err=0xB0AD0006 (eri_api). Board manager restart. Restart ordered by system manager
ET-IPG crash or ET-IPG restart with Cold With HW test
Transmission outage
Node B restart
Root cause of the problem found:
Some cells could not come up after the IP-ET board had been restarted. UtranCell_InternalResourceUnavailable alarms were raised.
Cause: The problem is triggered by an ET-IPG crash. DrhCcRh starts releasing all IpTp (IP termination point) sessions created on all IpAccessHostEts located on the restarted ET-IPG board. During this procedure a release request signal is sent towards ApplSci. All these sessions are flagged as "releasing" until a response from the ApplSci service is received. But no such response will be received, since the ET-IPG board is restarted. Instead, a serverDownInd signal is received. After that, the IP service initialization procedure is performed, including an audit between DrhCcRh and ApplSci whose purpose is to clean up unused (marked as faulty) IpTp sessions.
The fault is located in the audit handling: if an IpTp session is marked as faulty, it should be removed and the release procedure should continue. But if an IpTp session is marked as faulty and is also flagged as "releasing", it is not removed, and DrhCcRh keeps waiting for the release response from ApplSci. This hanging IpTp session prevents the cell from being released, so the affected cell hangs after it is locked and eventually cannot be unlocked.
References:
HN49221 : W10B: Sector not come up after Node B Restart
Mapped to HN61121 : W10B: Sector not come up after Node B Restart
HM48225
SOLUTION:
CONDITIONS:
1. The ET-IPG board has crashed or has been restarted with rank "Cold With HW Test".
2. The Utrancell does not come up after a Node B restart or after a transmission outage.
3. Unlocking the Utrancell fails.
4. There are hanging IpTp sessions in the ModuleMP: check the printout of "lh mod drh_ccrh_celldata all". If cells carry either the "releasing" or the "clearing" flag, the ModuleMP has hanging IpTp sessions.
Note! This procedure requires software delivery. Please contact your local Ericsson Support for more
information.
PROCEDURE:
The correction will be delivered in W11.0.1.2 (CXP9014711/3-R2C)
REMEDY:
CONDITIONS:
1. The ET-IPG board has crashed or has been restarted with rank "Cold With HW Test".
2. The Utrancell does not come up after a Node B restart or after a transmission outage.
3. Unlocking the Utrancell fails.
4. There are hanging IpTp sessions in the ModuleMP: check the printout of "lh mod drh_ccrh_celldata all". If cells carry either the "releasing" or the "clearing" flag, the ModuleMP has hanging IpTp sessions.
This procedure is for recovery of the problem
PROCEDURE:
Restart the RncLmCell process on the problematic ModuleMP. Refer to KCS document SCS1003029 "CPP : How To restart a board a process or JVM. Using telnet, NCLI, Moshell or EMAS".
WCDMA RNC : High Module MP load on extension subrack with only one ET-MFX
86%
Usage Count: 3
Network: WCDMA
Service: W-RAN
Node: CPP RNC
Alarm: Ethernet Switch Port Fault
RRC degradation in one extension subrack
High RRC Failure in one subrack
RRCSucc degradation on all Module of an RNC Extension Subrack
High processor load can be observed in extension subrack.
Module MP overload in an entire RNC subrack (processor load >85%)
High MP load in ETMFX
ReadErrorLog: ET-MFX
Ipet_scish_proc scish_root.c:320 INFO:Changing priority from 21 to 19
Ipet_scish_proc scish_root.c:320 INFO:Changing priority from 19 to 21
No access/connectivity to ET-MFX board
Root cause of the fault found. Configuration problem:
- High RRC failures on sites belonging to Module MPs on extension subracks which have one ET-MFX board
- High MP load on Module MPs on extension subracks with one ET-MFX board
- No or bad connectivity to the impacted ET-MFX boards
Investigation:
This problem is due to a dimensioning issue. There is too much Iublink activity on the ES for one ET-MFX board. The dimensioning of the node did not follow the Ericsson recommendation and did not take full advantage of the Spanning Tree Protocol.
Connectivity to the ET-MFX boards was restored after remoduling sites to other subracks that have two load-sharing ET-MFX boards.
General recommendations about ET-MFX usage:
1- It is recommended to have two ET-MFX boards per subrack for load sharing and redundancy, so that if one ET-MFX board is lost the other takes all the traffic.
2- ET-MFX load sharing is supported only within a subrack. Inter-subrack ET-MFX load sharing is not supported.
3- If both ET-MFX boards in a subrack are lost, the Iublinks need to be re-allocated manually to a new subrack (Iublink preferredSubrack attribute) to remain operational.
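Note: recommendation 1 can be checked against a board inventory. The sketch below is a minimal Python example. The inventory format (a dictionary mapping subrack name to a list of ET-MFX board positions) and the function name subracks_below_recommendation are assumptions for illustration; how the inventory is collected from the node is not shown.

```python
def subracks_below_recommendation(boards_by_subrack: dict, minimum: int = 2):
    """Return the subracks that have fewer ET-MFX boards than recommended."""
    return sorted(
        subrack
        for subrack, boards in boards_by_subrack.items()
        if len(boards) < minimum
    )
```

Subracks returned by this check are candidates for a second ET-MFX board, or for moving Iub links to a subrack with two boards.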
SOLUTION:
CONDITIONS:
1- High MP load on module MP's on Extension subracks with one ET-MFX board
2- Alarm: Ethernet Switch Port Fault
This procedure is for correction of the configuration problem.
PROCEDURE:
Add a second ET-MFX board to the subrack that has only one ET-MFX board.
REMEDY:
CONDITIONS:
1- High MP load on module MP's on Extension subracks with one ET-MFX board
2- Alarm: Ethernet Switch Port Fault
This procedure is a workaround to avoid the problem until a better configuration is in use.
PROCEDURE:
Manually re-allocate the RBSs on Module MPs with high MP load to a new subrack (preferably an extension subrack with two ET-MFX boards), so that the Iub links remain operational.
Board Restart: ET-IPG Error code: 0xB0AD0006 Process: Cs_boardManager_proc
86%
Usage Count: 2
Data collection for ET-IPG restart Error code: 0xB0AD0006 Process: Cs_boardManager_proc
Network: WCDMA
Service: W-RAN
Node: CPP RNC
Board Restart: Board manager restart. Restart ordered by system manager
Process Restart: ET-IPG
Error code: 0xB0AD0006 (Reported via CELLO:ERI IF)
Process: Cs_boardManager_proc
Restart type: Processor
ERROR NUMBER 0xB0AD0006 WITH EXTRA DATA 0x00A60ABC WAS REPORTED BY
PROCESS Cs_boardManager_proc
TYPE PRI-10
BLOCK osemain
ReadErrorLog: ET-IPG
Ipet_scish_proc scish_session.c:5233 ERROR:SciShSession applaudit req on non tagged session, true is returned
Refer to TR HO87223 for details
Root cause not found
A timing issue between the memory and the network processor (NPU) causes the NPU HW alarms and the ET-IPG board restarts.
The timing was already improved in TR HM97390. The delay value was determined experimentally on the basis of the worst board available at that time. The faulty board needs to be analyzed in order to adjust the timing further.
SOLUTION:
CONDITIONS:
1. ET-IPG restarts without board alarms
2. ReadErrorLog: ET-IPG
Ipet_scish_proc scish_session.c:5234 ERROR:SciShSession applaudit req on non tagged session, true is returned
3. This procedure is to send the faulty board to PLM for further analysis
PROCEDURE:
If the problem happens again, do the following steps:
1. Collect dcgm/dcgi
2. Replace the ET-IPG board with a good one
3. Send the board to the following address with a new TR number:
Ericsson AB
Ulf Wallgren
SE KI30 06401
Färögatan 6.
SE-164 80 Stockholm
Sweden
SCS1198264, SCS1049250
PLM needed the board for further investigation. After the event the board is working fine, so the customer does not want to send the board.
They will wait until the problem occurs again.
ERROR:SciShSession applaudit req on non tagged session, true is returned.
86%
Usage Count: 2
Network: WCDMA
Node: CPP RNC 3810
Software: CPP P7FP CU4 EU67
High speech drop on IP/IUB
ET-MFX shows Ipet_scish_proc scish_session.c:4168 ERROR:SciShSession applaudit req on non tagged session, true is returned
Upgrade from P7FP CU4 EU44 to P7FP CU4 EU67
The error trace states that the application set up a session before the audit was finished. The reason for the call drops cannot be localized for now.
REMEDY:
CONDITIONS:
1.- During the failure, the following message is found in the error logs of the ET-MFX:
[2010-12-07 20:44:42.660] Ipet_scish_proc scish_session.c:4168 ERROR:SciShSession applaudit req on non tagged session, true is returned
[2010-12-07 20:44:42.660] Ipet_scish_proc scish_session.c:4168 ERROR:SciShSession applaudit req on non tagged session, true is returned
[2010-12-07 20:44:42.660] Ipet_scish_proc scish_session.c:4168 ERROR:SciShSession applaudit req on non tagged session, true is returned
[2010-12-07 20:44:42.660] Ipet_scish_proc scish_session.c:4168 ERROR:SciShSession applaudit req on non tagged session, true is returned
[2010-12-07 20:44:42.660] Ipet_scish_proc scish_session.c:4168 ERROR:SciShSession applaudit req on non tagged session, true is returned
[2010-12-07 20:44:42.660] Ipet_scish_proc scish_session.c:4168 ERROR:SciShSession applaudit req on non tagged session, true is returned
[2010-12-07 20:44:42.660] Ipet_scish_proc scish_session.c:4168 ERROR:SciShSession applaudit req on non tagged session, true is returned
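Note: bursts of this message can be counted from a saved T&E log. The sketch below is a minimal Python example and not part of the original remedy. It assumes the line layout shown above ([timestamp] process file:line LEVEL:message) and also copes with several entries run together on one line; the function name count_audit_errors is hypothetical.

```python
import re
from collections import Counter

# T&E log entry in the form shown above:
# [timestamp] process file.c:line LEVEL:message
TE_LINE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]\s+"
    r"(?P<proc>\S+)\s+(?P<file>\S+):(?P<line>\d+)\s+"
    r"(?P<level>[A-Z0-9]+):(?P<msg>.*?)(?=\s*\[\d{4}-\d{2}-\d{2} |\s*$)",
    re.MULTILINE,
)

NEEDLE = "applaudit req on non tagged session"


def count_audit_errors(log_text: str) -> Counter:
    """Count ERROR entries containing the audit message, per timestamp."""
    counts = Counter()
    for entry in TE_LINE.finditer(log_text):
        if entry.group("level") == "ERROR" and NEEDLE in entry.group("msg"):
            counts[entry.group("ts")] += 1
    return counts
```

Many hits on the same timestamp, as in the excerpt above, indicate one burst rather than a steady stream of errors.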
PROCEDURE:
1.- Cold restart the ET-MFX board
Speech performance degraded after ET-IPG restart
86%
Usage Count: 2
Network: WCDMA
Service: W-RAN W10.1
Node: CPP RNC
Software: CPP RNC W10.1.2
Software: CPP RNC CXP9014711/2 R3F
Product Name: ET-IPG
Product ID: ROJ1192345/1
Speech performance degraded after ET-IPG restart
Board Restart: Board manager restart. Restart ordered by system manager
Process Restart: ET-IPG
Error code: 0xB0AD0006
Process: Cs_boardManager_proc
Restart type: Processor
Speech performance affected on same subrack, where ET-IPG is located
ReadErrorLog: ET-IPG
Cls_Cls_atmPdr_proc atmpdr.c:596 INFO:Lost 1 packets on channel 14 due to Error: Errors: Length
ReadErrorLog: ET-IPG
Cls_atmPdr_proc atmpdr.c:598 INFO:Egress VPI:0 VCI:134 Ingress VPI:0 VCI:131 Tag:0x4a1001bfbfbfbf
ReadErrorLog: ET-IPG
Ipet_scish_proc scish_session.c:5233 ERROR:SciShSession applaudit req on non tagged session, true is returned
ReadErrorLog: ET-IPG
app6drProc app6dr_bh_hwsup.c:1519 INFO:NPU HW alarm: n2=0x80000002, n3=0x0, n4=0x0, n4top=0x0, n5=0x0, n8=0x0
The ET-IPG recovered by itself after the restart, but the statistics show that RRC, RAB and CCSR degradation started when the ET-IPG board restarted.
After the ET-IPG restart the following was captured: Ipet_scish_proc scish_session.c:5233 ERROR:SciShSession applaudit req on non tagged session, true is returned
This trace means that the application sent a CELLO_IPAPPLSCI_AUDIT_SET_REQ signal for a session Id that was not marked with the special flag at CPP-IPET RO level. This can happen when the application begins to set up sessions over IPAPPLSCI before it has finished the session audit; sessions that are set up after the beginning of the audit procedure do not have the AUDIT flag. A session audit takes place after a warm restart of the ET board and after application reconnection, or the application can start the audit procedure by sending CELLO_IPAPPLSCI_AUDIT_SET_REQ.
The trace points out that the application should not set up sessions over IPAPPLSCI before the end of the audit procedure.
In any case, these sessions are kept after the end of the session audit and traffic is not affected. See HN32045-AA001 for more details. Apart from this, no suspicious traces have been seen in the log, so the reason for the traffic degradation after the ET-IPG restart is unclear.
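Note: the tagging behaviour described above can be pictured with a toy model. The Python sketch below is illustrative only and is not the CPP implementation. The class SessionAudit and its methods are hypothetical names, and the assumption that sessions still tagged at the end of the audit are removed is an interpretation of the text, not a documented fact.

```python
class SessionAudit:
    """Toy model of session tagging during an audit (hypothetical names)."""

    def __init__(self, session_ids):
        self.tagged = {sid: False for sid in session_ids}
        self.errors = []

    def begin_audit(self):
        # Every session that exists when the audit starts gets the audit tag.
        for sid in self.tagged:
            self.tagged[sid] = True

    def setup_session(self, sid):
        # A session set up while the audit is running is never tagged.
        self.tagged[sid] = False

    def audit_set_req(self, sid):
        # The application confirms that it still uses the session.
        if not self.tagged.get(sid, False):
            self.errors.append(f"applaudit req on non tagged session {sid}")
            return True  # logged as an error, but the session is kept
        self.tagged[sid] = False
        return True

    def end_audit(self):
        # Assumption: sessions still tagged were not confirmed and are removed.
        removed = [sid for sid, tag in self.tagged.items() if tag]
        for sid in removed:
            del self.tagged[sid]
        return removed
```

In this model a session set up after begin_audit produces the error entry when it is audited, yet it survives end_audit, which matches the statement that such sessions are kept and traffic is not affected.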
References: HN32045, Restart is handled in solution SCS1049250
REMEDY:
CONDITIONS:
1. Speech RAB, RRC and CCSR success rate degradation
2. For recovery
PROCEDURE:
1. Soft lock the ET-IPG that restarted
2. Cold restart ET-IPG
3. Unlock ET-IPG
REMEDY:
CONDITIONS:
For data collection
PROCEDURE:
1. Run the data collection script attached to the TR several times, with a 1 minute delay between runs, using the following command: ipg_dcs_r1.mos <node password> (see note, ipg_dcs_r1.mos)
and provide the captured logs for further analysis.
2. Log in to the affected ET-IPG, then enable and capture the following traces: te e all SCISH_SESSION
Also capture the output from the following coli commands:
SciShDump -o 0 -c
EtHostDump -o 0 -c
ipg_dcs_r1.mos - Copy the script below the line and run it as described in the procedure
________________________________________________________________
l+mmo $tempdir/dummy
# to be silent...
l echo "### ET-IPG Data Collection Script - version R01"
####################
# Script Variables #
####################
$NumberOfRepeats = 1
$WaitTime = 0
##########################################
# ipg data function step 1 call function #
##########################################
func get_data_from_ipg_step1_call
if $board ~ all
  for $board1 in group_ipg
    get_data_from_ipg_step1 $board1
  done
else if $board ~ ^[0-9]+$
  get_data_from_ipg_step1 $board
fi
endfunc
#############################
# ipg data function step 1 #
#############################
func get_data_from_ipg_step1
if $1 ~ ^[0-9]+$
  $etipg = $1
else
  return
fi
#start logging
l+mmo $logdir/$nodename_$ipaddress_step0_$etipg_$date.log
lhsh $etipg appdh info
for $v1 = 0 to 8
  lhsh $etipg appdh ipif $v1
  lhsh $etipg appdh dist $v1
  lhsh $etipg appdh rps $v1
  lhsh $etipg apphost data $v1
done
lhsh $etipg apphost info
lhsh $etipg applh info
lhsh $etipg applh attr
#stop logging
l-
endfunc
##########################################
# ipg data function step 2 call function #
##########################################
func get_data_from_ipg_step2_call
if $board ~ all
  for $board2 in group_ipg
    get_data_from_ipg_step2 $board2
  done
else if $board ~ ^[0-9]+$
  get_data_from_ipg_step2 $board
fi
endfunc
#############################
# ipg data function step 2 #
#############################
func get_data_from_ipg_step2
if $1 ~ ^[0-9]+$
  $etipg = $1
else
  return
fi
lt InternalEthernetPort
lt IpInterface
lma vlnids GigabitEthernet
mr vlnids GigabitEthernet
#start logging
l+mmo $logdir/$nodename_$ipaddress_ipg_dcg_$etipg_stage$var_$date.log
lhsh $etipg apparp info
for $mo in vlnids
  lhsh $etipg apparp print $mo
done
for $v2 = 0 to 8
  lhsh $etipg appdh cnt $v2
done
lhsh $etipg appph info
lhsh $etipg appph cnt
lhsh $etipg applh cnt
lhsh $etipg;appapi;pm all;q;
lhsh $etipg;appapi;npr 3.0.0xc 6;q
lhsh $etipg;appapi;npr 8.0.0xe00 0x48;q;
lhsh $etipg;appapi;npr 8.0.0x3002c0 0x48;q;
lhsh $etipg;appapi;npr 8.0.0x320000 0x44;q;
lhsh $etipg;appapi;npr 8.0.0x330000 0x4c;q;
lhsh $etipg;appapi;npRGS;q;
lhsh $etipg;appapi;npRSS 0 40;q;
#stop logging
l-
endfunc
#############
# BP traces #
#############
func get_BP_traces
if $1 ~ ^[0-9]+$
  $intboardaddr = $1
else
  return
fi
if $debuglevel = 3
  lhsh $intboardaddr; te log clear
  lhsh $intboardaddr; te e send_sig Ipet_ipps_proc
fi
if $debuglevel = 4
  lhsh $intboardaddr; te log clear
  lhsh $intboardaddr; te e rec_sig Ipet_ipps_proc
fi
if $debuglevel = 5 && $mycpp_version = old5
  lhsh $intboardaddr; te log clear
  lhsh $intboardaddr; te e trace3 Ipet_ipps_proc
else if $debuglevel = 5 && $mycpp_version = new
  lhsh $intboardaddr; te log clear
  lhsh $intboardaddr; te e trace3 IPET_NPCI_IF
fi
if $debuglevel = 6 && $mycpp_version = old5
  lhsh $intboardaddr; te log clear
  lhsh $intboardaddr; te e trace4 Ipet_ipps_proc
else if $debuglevel = 6 && $mycpp_version = new
  lhsh $intboardaddr; te log clear
  lhsh $intboardaddr; te e trace4 IPET_NPCI_IF
fi
if $debuglevel = 7 && $mycpp_version = new
  lhsh $intboardaddr; te log clear
  lhsh $intboardaddr; te e param Ipet_ipps_proc
fi
if $debuglevel = 8 && $mycpp_version = old5
  #start logging
  l+mmo $logdir/$nodename_$ipaddress_BP_traces_debuglevel_$debuglevel_$intboardaddr.log
  lhsh $intboardaddr; te log read
  lhsh $intboardaddr; te default Ipet_ipps_proc
  #stop logging
  l-
else if $debuglevel = 8 && $mycpp_version = new
  #start logging
  l+mmo $logdir/$nodename_$ipaddress_BP_traces_debuglevel_$debuglevel_$intboardaddr.log
  lhsh $intboardaddr; te log read
  lhsh $intboardaddr; te default Ipet_ipps_proc
  lhsh $intboardaddr; te default IPET_NPCI_IF
  #stop logging
  l-
fi
endfunc
###########
# MO data #
###########
func get_MO_data
l echo "\n## Collecting MO information ##\n"
#start logging
l+mmo $logdir/$nodename_$ipaddress_MO_data_$date.log
get GigaBitEthernet
get IpInterface
get IpAccessHostGpb
pcr pmGigaBitEthernet GigaBitEthernet
pcr pmIpInterface IpInterface
pcr pmIpAccessHostGpb IpAccessHostGpb
if $mycpp_version = old5
  get UdpHostMainMsb
  get IpAccessHostMsb
  pcr pmIpAccessHostMsb IpAccessHostMsb
else
  get IpAccessHostEt
  get IpAccessHostSpb
  pcr pmIpAccessHostEt IpAccessHostEt
  pcr pmIpAccessHostSpb IpAccessHostSpb
fi
if $mycpp_version = old5 && $debuglevel = 2
  pdiff GigaBitEthernet|ipinterface|IpAccessUdpHostMsb|IpUdpHostMainMsb|IpAccessHostMsb|IpAccessHostGpb
else if $mycpp_version != old5 && $debuglevel = 2
  pdiff GigaBitEthernet|ipinterface|IpAccessHostEt|IpAccessHostGpb|IpAccessHostSpb
fi
#stop logging
l-
endfunc
###################
# get PM counters #
###################
func get_PM_counters
#start logging
l+mmo $logdir/$nodename_$ipaddress_PM_counters_stage$var_$date.log
pget GigaBitEthernet
pget IpInterface
pget IpAccessHostGpb
if $mycpp_version = old5
  pget IpAccessHostMsb
else
  pget IpAccessHostEt
  pget IpAccessHostSpb
fi
#stop logging
l-
endfunc
###################
# del PM scanners #
###################
func del_PM_scanners
pdel pmGigaBitEthernet
pdel pmIpInterface
pdel pmIpAccessHostGpb
if $mycpp_version = old5
  pdel pmIpAccessHostMsb
else
  pdel pmIpAccessHostEt
  pdel pmIpAccessHostSpb
fi
endfunc
###################
# SPAS statistics #
###################
func get_spashwinfo
l echo "### Collecting SPAS statistics ..."
#start logging
l+mmo $logdir/$nodename_$ipaddress_SPAS_statistics_$date.log
########################################
#             ipg Boards               #
########################################
if $board ~ all
  for $board1 in group_ipg
    lhsh $board1; spashwinfo all
    lhsh $board1; spashwinfo egrq
    lhsh $board1; spashwinfo ingrq
  done
else if $board ~ ^[0-9]+$
  lhsh $board; spashwinfo all
  lhsh $board; spashwinfo egrq
  lhsh $board; spashwinfo ingrq
fi
########################################
#             GPB Boards               #
########################################
for $board1 in group_gpb
  lhsh $board1; spashwinfo all
  lhsh $board1; spashwinfo egrq
  lhsh $board1; spashwinfo ingrq
done
########################################
#             SCB Boards               #
########################################
for $board1 in group_scb
  lhsh $board1; spashwinfo all
  lhsh $board1; spashwinfo egrq
  lhsh $board1; spashwinfo ingrq
done
#stop logging
l-
endfunc
##########################################
# T&E, alarm, event logs and other info #
##########################################
func get_logs
l echo "\n## Collecting Alarm and Event logs ##\n"
#start logging
l+mmo $logdir/$nodename_$ipaddress_Alarm_and_Event_logs_$date.log
lgaer
#stop logging
l-
l echo "\n## Get boards configuration ##\n"
#start logging
l+mmo $logdir/$nodename_$ipaddress_cabx_$date.log
cabx
#stop logging
l-
endfunc
####################
# M F G - M A I N #
####################
func focus_on_ipg
get_MO_data
get_logs
get_data_from_ipg_step1_call
for $var = 1 to $NumberOfRepeats
  get_data_from_ipg_step2_call
  get_PM_counters
  wait $WaitTime
done
del_PM_scanners
if $debuglevel = 2
  get_spashwinfo
fi
###################
##   BP Traces   ##
###################
if $board ~ all && $debuglevel > 2
  for $board4 in group_ipg
    get_BP_traces $board4
  done
else if $board ~ ^[0-9]+$ && $debuglevel > 2
  get_BP_traces $board
fi
endfunc
#########
# USAGE #
#########
func print_usage
l echo "\n###########################################################################################"
l echo "Syntax: run <script name> <password to node> <debuglevel> all|<specific>\n"
l echo "where '<debuglevel>' is a value from 1 upwards telling type of info grabbed and"
l echo "where 'all|<specific>' means all boards or a specific one which is referred as 012300"
l echo "(If only password to node set script will run with debug level=1 and collect information"
l echo "from all boards)"
l echo "example: run /home/xxkuzyaa/tmp/ipg_dcg.mos x 2 000900"
l echo "\n###########################################################################################"
l echo "\n<debuglevel>"
l echo "--------------------------------"
l echo " 1\t Collect ipg TE Log, NP counters, MO and PM counters"
l echo " (without pdiff), Alarm and Event logs"
l echo " 2\t Collect ipg TE Log, NP counters, MO and PM counters, Alarm and Event logs, SpasHwInfo,"
l echo " 3\t BP traces: enable send_sig on Ipet_ipps_proc"
l echo " 4\t BP traces: enable rec_sig on Ipet_ipps_proc"
l echo " 5\t BP traces: enable trace3 on Ipet_ipps_proc (CPP5.1) or trace3 on IPET_NPCI_IF (CPP6,7)"
l echo " 6\t BP traces: enable trace4 on Ipet_ipps_proc (CPP5.1) or trace4 on IPET_NPCI_IF (CPP6,7)"
l echo " 7\t BP traces: enable param on Ipet_ipps_proc (CPP6,7)"
l echo " 8\t BP traces: Read and store T&E log"
l echo "\n###########################################################################################"
endfunc
##########################
####     M A I N      ####
##########################
# check arguments
if $1
  l echo "\nStarting ..."
  $password = $1
  unset $1
else
  print_usage
  l-
  return
fi
if $2 ~ ^[0-9]+$
  $debuglevel = $2
  unset $2
else
  $debuglevel = 1
fi
if $3 = all || $3 ~ ^[0-9]+$
  $board = $3
  unset $3
else
  $board = all
fi
#some info to the user
l echo "\n####################################################################################################"
l echo "### Data collection executing ..."
l echo "### Result is stored here: $logdir/$nodename_$ipaddress_$ipg_or_ipg_..."
l echo "####################################################################################################"
$date = `date +%y%m%d-%H%M`
#start logging
l+mmo $logdir/$nodename_$ipaddress_ipg_dcg_$date.log
ba group_ipg ipg
#####################################################
#Print all user variables and scripting variables
#####################################################
uv
pv
readclock
##################################
# Get the MO's
##################################
lt all
readclock
##################################
# Check the MOM version...
##################################
#Possible printouts:
#$cellomomversion = 6-LSV31-1
#$cellomomversion = 6.1-LSV13-2
#$cellomomversion = 7-LSV26_13-3
#$celloversion = 7-LSV34.6BC1-1
if $cellomomversion >= 7 || $celloversion >= 7
  $mycpp_version = new
fi
if $cellomomversion >= 6 && $cellomomversion < 7
  $mycpp_version = old6
fi
if $celloversion >= 6 && $celloversion < 7
  $mycpp_version = old6
fi
if $cellomomversion >= 5 && $cellomomversion < 6
  $mycpp_version = old5
fi
if $celloversion >= 5 && $celloversion < 6
  $mycpp_version = old5
fi
l echo "\n###################################"
l echo "## MOM version is $mycpp_version "
l echo "###################################\n"
##################################
# Do the work!
##################################
focus_on_ipg
readclock
unset $date
#stop logging
l-
#stop silent logging
l-
#Done
ET-MFX Board Restart : OSE_ECORRUPTED_POOL
86%
Usage Count: 2
Network: WCDMA
Node: CPP RNC 3810
Service: W-RAN P7.1
Software: CPP RNC P7.1.4 EU55
Software: CPP RNC CXP9013831 R9YC/65
ET-MFX Board Restart : OSE_ECORRUPTED_POOL
ReadErrorLog: ET-MFX
Exs_spi_proc exspi_proc_write_normal.c:280 ERROR:Normal IO write failed with 3, page 0x80, reg 0x38, size 2, data 0x7C sender 0x101DD
Ipet_scish_proc scish_session.c:2166 ERROR:Illegal sessionId=4294967295
Ipet_scish_proc scish_session.c:2166 ERROR:Illegal sessionId=4294967295
Ipet_scish_proc scish_session.c:2166 ERROR:Illegal sessionId=4294967295
Root cause not found. The problem is reported on a dbm2-based board, where all load modules share a common pool, called mainpool, for signal allocations. It is quite possible that a signal buffer corrupts other signals that lie adjacent to it. In such cases, problems like OSE_ECORRUPTED_POOL can be reported on different processes.
REMEDY:
CONDITIONS:
1. The problem happens frequently in the RNC.
PROCEDURE:
1. Enable the traces below on the module MP that is using the ET-MFX board.
lh modx te e trace1 drhTrBrIpC
lh modx te e trace1 drhCcRhC
lh modx te e rec_sig send_sig param trace1 cpxApplSciC
The conditions under which OSE_ECORRUPTED_POOL is reported are most likely related to a user error. This kind of error is reported by the kernel when it detects that a buffer presented to it via system calls such as send(), sender(), restore() etc. is corrupted. The usual cause is that some other process writes to a buffer outside its allocated size, which overwrites the next buffer. In other words, the fault is likely to be in some other part of the code that overwrites the buffer, but the problem is reported only when the corrupted signal is presented to the kernel via system calls such as send, receive, restore etc.
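Note: the reason why the error is reported against a process other than the one that caused it can be shown with a toy pool. The Python sketch below is illustrative only; it does not reproduce the OSE pool layout. The marker bytes MAGIC and ENDMARK, the class ToyPool and its methods are assumptions made for this example.

```python
MAGIC, ENDMARK = 0xA5, 0xEE


class ToyPool:
    """Toy pool: each buffer is [MAGIC][payload][ENDMARK], laid out back to back."""

    def __init__(self, sizes):
        self.mem = bytearray()
        self.offsets = []
        for size in sizes:
            self.offsets.append((len(self.mem), size))
            self.mem += bytes([MAGIC]) + bytes(size) + bytes([ENDMARK])

    def write(self, index, data):
        # No bounds check on purpose: this is the "user error".
        start, _ = self.offsets[index]
        self.mem[start + 1:start + 1 + len(data)] = data

    def send(self, index):
        # The check runs only when a buffer is handed over, as with a system call.
        start, size = self.offsets[index]
        if self.mem[start] != MAGIC or self.mem[start + 1 + size] != ENDMARK:
            raise RuntimeError(f"corrupted pool detected on buffer {index}")
```

Writing six bytes into a four-byte buffer 0 overwrites its end marker and the first byte of buffer 1. The error is then raised when the owner of buffer 1 calls send, although the owner of buffer 0 caused it.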