Overview of Madeira case study and functional requirements

MADEIRA Design Document Scenario Description for the Madeira Project MAD-WP4:0001 Ver. D1.03 Author(s) L. Fallon Ericsson J. Neilsen Ericsson A. Leddy Ericsson M. Zach Siemens Date: 11/01/2005 Pages: 40



MADEIRA

Design Document

Scenario Description for the Madeira Project

MAD-WP4:0001 Ver. D1.03

Author(s):
L. Fallon, Ericsson
J. Neilsen, Ericsson
A. Leddy, Ericsson
M. Zach, Siemens

Date: 11/01/2005
Pages: 32


MADEIRA 11/01/2005 Scenario Description MAD-WP4:0001 Ver. D1.03

Table of Contents

1 Document Information
  1.1 Document History
  1.2 Keywords
  1.3 Glossary and Abbreviations
  1.4 Purpose of the Document
  1.5 Project Internal References
  1.6 External References
  1.7 Relationship to Other Documents
  1.8 Open Issues

2 Approach

3 Background

4 Scenario Description
  4.1 Scene 1: Wireless Base Station Deployment
  4.2 Scene 2: Wireless Meshed Network Formation
  4.3 Scene 3: Wireless Equipment Communication
  4.4 Scene 4: Wireless Meshed Network Re-Formation
  4.5 Scene 5: Preferred Base Station Fails
  4.6 Scene 6: Bridging Base Station Fails
  4.7 Scene 7: Gateway Base Station Fails
  4.8 Scene 8: All Bridging Base Stations for a Base Station Fail
  4.9 Scene 9: All Gateway Base Stations Fail
  4.10 Scene 10: Wireless Meshed Network Load Balancing

5 Scenario Realization using Traditional Approaches

6 Scenario Realization using the Madeira Approach

7 Appendices
  7.1 Appendix 1: Fault Management Overview


Index of Tables

Table 1: Document History
Table 2: Glossary and Abbreviations
Table 3: Project Internal References
Table 4: External References
Table 5: Open Issues
Table 6: Fault Cases for Failure of Preferred Base Station
Table 7: Fault Cases for Failure of Bridging Base Station
Table 8: Fault Cases for Failure of Gateway Base Station
Table 9: Fault Cases for Failure of All Bridging Base Stations for a Base Station
Table 10: Fault Cases for Failure of All Gateway Base Stations
Table 11: Fault Cases for Wireless Mesh Load Balancing


Index of Figures

Figure 1: Deployment of Wireless Base Stations
Figure 2: Typical Primary and Secondary Mesh Connectivity
Figure 3: Base Station Capabilities for a Network Subset
Figure 4: NE Connection to Meshed Network
Figure 5: Meshed Network Reformation
Figure 6: Preferred Base Station Failure
Figure 7: Bridging Base Station Failure
Figure 8: Gateway Base Station Failure
Figure 9: Failure of all Bridging Base Stations for a Base Station
Figure 10: Failure of all Gateway Base Stations
Figure 11: Traditional Approach to Meshed Network Management


1 Document Information

1.1 Document History

Version  Date        Comments                                      Editor
D1.01    2004-11-03  Initial Issue                                 L. Fallon, Ericsson
D1.02    2004-11-10  Updated after Consortium Review               L. Fallon, Ericsson
D1.03    2005-01-11  Updated after Discussions at Ipswich Meeting  L. Fallon, Ericsson

Table 1: Document History

1.2 Keywords

MADEIRA, P2P, Network Management

1.3 Glossary and Abbreviations

Term                 Explanation
3G                   Third Generation Mobile Network
BS                   Base Station
IP                   Internet Protocol
LAN                  Local Area Network
Management Function  The mechanism used to manage the network, whatever that might be

Table 2: Glossary and Abbreviations

1.4 Purpose of the Document

This document describes and analyses the scenario for the Madeira project and suggests how that scenario might be realised on real networks. It goes on to describe how a subset of that scenario might be deployed on real equipment and in a simulated environment for demonstration purposes.

1.5 Project Internal References

No.  Short Code        Document Reference
I1   MAD-WP1-REQ-0001  Madeira: Requirement Specification, Architecture and Interfaces
I2   MAD-WP1-DD-0002   Madeira Architectural Spike
I3                     Missing Reference to Madeira Modelling Documents
I4                     Missing Reference to Operator Overview Document
I5                     Missing Reference to State of Art documents on mesh network management (TSSG)

Table 3: Project Internal References

1.6 External References

No.  Short Code  Document Reference
E1   X.733       ITU Recommendation X.733: Information technology - Open Systems Interconnection - Systems Management: Alarm Reporting Function

Table 4: External References


1.7 Relationship to Other Documents

The Operator Overview document [I4] and the Mesh Networking State of Art document [I5] are input to this document.

This document is used as input into the requirements [I1], architecture [I2], and modelling [I3] for the Madeira Project.

1.8 Open Issues

Issue 1
Description: The scenario described in this document addresses the management of connectivity in a homogeneous network. Extending the scenario to include service provisioning, service re-provisioning and service assurance over a set of heterogeneous networks might be interesting. Such an extension of the scenario might look at how connectivity is configured when both an 802.11b wireless network and a wireless network with wide area coverage, such as 3G or 802.16, are available. In such an extension, the secondary connectivity path might be over the network with wide area coverage.
Closing comments: 7th-8th December, 2004, Ipswich: The Project agreed to write an Operator Overview document that describes the macro service view into which the scenario described in this document fits. That document will link to this scenario document. The Operator Overview document should be owned by one of the operator partners in the project. This issue will be incorporated into the Operator View document [I4].

Issue 2
Description: It is not clear if the configuration changes should be initiated automatically by the management functionality itself or should be initiated by the operator.
Closing comments: 7th-8th December, 2004, Ipswich: The Madeira Project will investigate both manual and automatic initiation.

Issue 3
Description: A study of the radio meshing techniques available today is required so that known issues with today’s approaches can be addressed.
Closing comments: 7th-8th December, 2004, Ipswich: The TSSG agreed to undertake this study. Ericsson agreed to distribute some reports and surveys on meshing techniques (AP2, Ipswich).

Issue 4
Description: Currently, the scenario is framed such that it applies to a conference at a single location only. The scenario will be updated to include a conference distributed over many locations.
Scene x: Multi-operator scenario over 2 locations
At some point, some users need to negotiate strategies with users in the alternative venue. The OSS/NMS for that system negotiates with all the sub-networks for service provision and access.
On the way to the second venue on the metro train, the group of users wish to finish a presentation before reaching the second venue. For this purpose, they need to make use of the city’s UMTS network. The OSS/NMS for the mobile system must negotiate with all the sub-networks for service provision and access.
A different operator provides the WLAN/LAN services at the other venue. On reaching the second venue, the OSS/NMS for the new system must negotiate with all the sub-networks for service provision and access.
Closing comments: 7th-8th December, 2004, Ipswich: The Project agreed to write an Operator Overview document which describes the macro service view into which the scenario described in this document fits. That document will link to this scenario document. The Operator Overview document should be owned by one of the operator partners in the project. This issue will be incorporated into the Operator View document [I4].

Issue 5
Description: The project definitions and project strategy for mobility versus portability must be defined. Will it be possible to move APs and NEs and will it be possible to keep the services active while those devices are moving?
Closing comments: 7th-8th December, 2004, Ipswich: The Madeira Project agreed that Service Portability is in the scope of the project but Service Mobility is not in the scope of the project.

Issue 6
Description: Concerning Fault Management, a first classification must be carried out, for example in a table:
· How is a fault detected? (Might be by user/equipment, Base Station itself, other Base Station, or higher management layer, e.g. for scene 9)
· Origin of fault
· More detailed case differentiation
· Is the problem solved autonomously by the management function, or is operator intervention required?
· What is the proposed repair action?
Closing comments: 7th-8th December, 2004, Ipswich: Siemens have distributed and presented material on this issue. That material will be included in this version of the scenario document.

Issue 7
Description: Scenes where service degradation or cessation occurs (especially scenes 8 and 9) must be tied into the business layer; the upper layers of management must get involved. A discussion will be carried out in the consortium on this issue. For example, if a customer calls and complains that they cannot access a specific telephone number, e.g. an emergency number, what does the operator actually do? And what does the management functionality actually do to help identify where the problem lies?
Closing comments: 7th-8th December, 2004, Ipswich: This issue will be covered in the Operator View document (see Issues 1 and 4). BT agreed to investigate if work carried out on the Android project might be useful here [I4].

Issue 8
Description: This document describes the scenario. The implementation of that scenario on a Madeira prototype, on real equipment and on simulators, must also be described by the project. That description could be in this document but it may be more appropriate to have a separate document for implementation. To be discussed in the consortium.
Closing comments: 7th-8th December, 2004, Ipswich: This will be included in a later version of this scenario document when the Madeira Management Solution is defined and described.

Issue 9
Description: For comparison, and to emphasise Madeira-specific approaches, this document must examine to what extent this scenario (with given connectivity conditions) might be realized using a traditional hierarchical network management approach, and where the shortcomings and limitations of such an approach lie.
Closing comments: 7th-8th December, 2004, Ipswich: This will be included in the current version of the scenario document.

Table 5: Open Issues


2 Approach

The scenario for the Madeira project must satisfy three criteria. First, it must be challenging: it must provide a number of tasks that can be used to exercise the management approach being championed by the Madeira project. Second, it must be grounded: it must describe identifiable, familiar, and realistic management problems that could, or better, do arise. Third, it must be practical: it must be possible to demonstrate it on a small scale on real, available, and inexpensive network equipment, and to simulate it on a large scale on simulators.

The scenario selected by the Madeira project looks at the challenging area of management of wireless meshed networks, identifying a number of management tasks to be implemented using the management approach advocated in Madeira. The scenario is grounded because it identifies real problems that arise during the deployment and management of wireless meshed networks; operators will authenticate the scenario during the project. Finally, the scenario is practical because its target network is a Wireless LAN meshed network: such a network can be built using inexpensive Wireless LAN equipment, and it can also be simulated.

It is important to point out that the management approach being championed by Madeira can be used to manage any network; wireless or wired. In this scenario, the Wireless LAN network is a metaphor for any carrier-grade wireless or wired network.

The scenario goes into some detail describing the particular issues in managing a wireless meshed network; a certain level of understanding is required in order to come to grips with the domain being managed. One should remember, however, that the purpose of the scenario is to validate the management approach being taken by Madeira, not to provide a comprehensive solution for management of wireless meshed networks.


3 Background

The traditional method of deploying a wireless network is to use a wired network for backhaul and to connect wireless base stations to the wired network at various points. The wireless base stations are deployed independently of one another, with each base station having its own connection to the wired backhaul network.

The concept of wireless meshed networks is gaining currency. In a wireless meshed network, wireless base stations exist just as they do in a traditional wireless network configuration. The difference is that each base station may or may not have a wired connection to a backhaul network. In a wireless meshed network, the wireless network sector of each base station co-operates with the surrounding sectors to provide backhaul connectivity to all network elements connected to the meshed network. Only some of the network sectors are connected directly to a wired backhaul network and thus to external networks. Other sectors of the wireless meshed network use adjacent sectors for external connectivity.

The following capabilities must be realised in the auto-configuration feature of a wireless meshed network in order to implement the scenes in the scenario:

1. The ability to form a wireless network using whatever wireless base stations are in range

2. The ability to provide backhaul connectivity to any network element in range over that wireless network

3. The ability to add a network element to the network in a manner which will have the least impact on existing network elements on the network

4. The ability to redistribute network resources to the remaining network elements in the most fair and equitable manner when a network element is removed from the network

5. The ability to add base stations to the network in a manner which will have the least impact on existing network elements on the network

6. The ability to redistribute network resources in the most fair and equitable manner when a base station is removed from the network

7. The ability to identify which base stations in the wireless network have external backhaul connectivity

8. The ability to provide external backhaul connectivity to network elements that require it, even if those network elements are connected to a base station that is not directly connected to an external backhaul network

9. The ability to re-configure connectivity to the backhaul network in the most fair and equitable manner when a new base station with connectivity to the backhaul network is added to the network

10. The ability to re-configure connectivity to the backhaul network in the most fair and equitable manner when a base station with connectivity to the backhaul network is removed from the network
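Capabilities 7 and 8 in the list above can be illustrated with a small sketch. It models the mesh as an undirected graph of radio links and uses a breadth-first search to find which base stations can reach a wired gateway. The topology, the names `links` and `gateways`, and the function `reachable_from_gateways` are assumptions made for illustration only; this document does not prescribe a data model or an algorithm.

```python
from collections import deque

def reachable_from_gateways(links, gateways):
    """Return the set of base stations that have some path to a gateway.

    links: iterable of (a, b) pairs, each an undirected radio link.
    gateways: base stations wired directly to the backhaul network.
    """
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Start the search from every gateway at once and spread outwards.
    seen = set(gateways)
    queue = deque(gateways)
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Hypothetical chain of five stations; only BS04 is wired to the backhaul.
links = [("BS00", "BS01"), ("BS01", "BS02"), ("BS02", "BS03"), ("BS03", "BS04")]
connected = reachable_from_gateways(links, gateways=["BS04"])

# BS99 is a made-up station with no radio link to the mesh.
deployed = {"BS00", "BS01", "BS02", "BS03", "BS04", "BS99"}
isolated = deployed - connected
```

Any station left in `isolated` has no external backhaul connectivity, which is the condition the later failure scenes are concerned with.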


4 Scenario Description

The annual Total Worldwide Information Technology (TWIT) exposition in Technoville is becoming more popular every year. The organisers of TWIT realise that the Futura Centre in Technoville, where the conference is normally held, is too small to accommodate the number of exhibitors and attendees that wish to attend the conference. In fact, they have had to restrict the number of both attendees and exhibitors for the last two years. The Futura Centre is a purpose-built exhibition centre, with wired LAN connections to each exhibition space. The TWIT organisers have been providing isolated hotspots at the Futura Centre for the last two years at TWIT.

The Technoville football team, the Zombies, have just moved into a modern stadium, the Zombie Bowl. The Zombie Bowl has a closable roof and a playing surface that can be converted into a conference floor. The Zombie Bowl has 4 times more floor space than the Futura Centre, but costs approximately the same as the Futura Centre to hire. The Zombie Bowl has a fibre LAN at the periphery of the conference floor, but has no wired connections to the individual exposition spaces.

4.1 Scene 1: Wireless Base Station Deployment

The organisers of TWIT decide to deploy a meshed wireless network to provide connectivity to all the participants at the conference for the duration of the conference. They deploy the network so that each exposition space has coverage from two wireless base stations. The wireless base stations at the periphery of the conference floor are connected to the fibre LAN. The wireless base stations in the centre of the conference floor are not directly connected to the fibre LAN.

Figure 1: Deployment of Wireless Base Stations

Scene Challenge: Deploy all wireless base stations and activate wireless coverage


Operator Perspective: The operator distributes the base stations throughout the conference space, placing them in a manner that gives optimal coverage for the conference. S/he connects base stations that are adjacent to a wired network to that network. S/he then switches all the base stations on.

Management Issues: How are base stations discovered?

Is there a difference of node type at this stage?

What unique management functionality is needed or are all nodes equal?

Domain Specific Issues: What is the basic bootstrapping required at the transport layer?

4.2 Scene 2: Wireless Meshed Network Formation

The TWIT organisers ask the management function of the network to set up a wireless meshed network.

Figure 2: Typical Primary and Secondary Mesh Connectivity

The management function sets up the wireless meshed network. The network depicted in Figure 2 shows the connectivity that might result from such an activity. The network is set up so that each base station has a primary path to one of the available LAN gateway points and a secondary path to another of the available LAN gateway points. The management function ensures that the network is configured so that a network element in any exhibition space can connect to at least two base stations to get connectivity. The management functionality must set up two connections to the Internet for each base station so that, in the event that one connection fails or is disconnected, a base station has another connection that can be used to manage the base station.

There are three capabilities that a base station can have:


i. Providing wireless connectivity to network elements

ii. Acting as a gateway for external connectivity

iii. Bridging connections for other base stations

Figure 2 shows a sample configuration for the meshed network. All the base stations in the network provide wireless connectivity for network elements. BS04, BS21, and BS23 act as gateways for the meshed network. All base stations can bridge connections if they are configured to do so.

Figure 3: Base Station Capabilities for a Network Subset

Figure 3 shows how the base stations might be configured in the mesh for a subset of the network. The primary connection of BS00 uses BS10 as a bridging node and BS21 as its gateway node; the secondary connection of BS00 uses BS01, BS02, and BS03 as bridging nodes and BS04 as its gateway node. BS12 connects directly to gateway nodes BS23 and BS21 for its primary and secondary connections respectively. BS24 connects directly to BS23 for its primary connection and bridges over BS14 to gateway BS04 for its secondary connection.

Gateway nodes are directly connected to the wired LAN and thus do not bridge for their primary connection. They must bridge for their secondary connections. In Figure 3, BS21 uses BS23 as a gateway for its secondary connection and bridges over BS22 to reach it.
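The configuration described for Figure 3 can be written down as data and checked against the rules stated in this scene: each path ends at a gateway, and the primary and secondary paths of a base station use different gateways. The following is a minimal sketch under those assumptions. The dictionary layout and the names `PATHS`, `check_paths`, and `bridging_nodes` are illustrative and are not part of the Madeira design.

```python
# Gateways named in the description of Figure 2.
GATEWAYS = {"BS04", "BS21", "BS23"}

# Paths as described for Figure 3; each list runs from the base station
# to its gateway. A gateway's primary path is just the gateway itself.
PATHS = {
    "BS00": {"primary": ["BS00", "BS10", "BS21"],
             "secondary": ["BS00", "BS01", "BS02", "BS03", "BS04"]},
    "BS12": {"primary": ["BS12", "BS23"],
             "secondary": ["BS12", "BS21"]},
    "BS24": {"primary": ["BS24", "BS23"],
             "secondary": ["BS24", "BS14", "BS04"]},
    "BS21": {"primary": ["BS21"],
             "secondary": ["BS21", "BS22", "BS23"]},
}

def check_paths(station, paths, gateways):
    """Return a list of problems found in one station's path pair."""
    problems = []
    for role in ("primary", "secondary"):
        path = paths[role]
        if path[0] != station:
            problems.append(f"{role} path does not start at {station}")
        if path[-1] not in gateways:
            problems.append(f"{role} path does not end at a gateway")
    if paths["primary"][-1] == paths["secondary"][-1]:
        problems.append("primary and secondary paths use the same gateway")
    return problems

def bridging_nodes(path):
    """Nodes strictly between the base station and its gateway."""
    return path[1:-1]

# Collect problems per station; an empty dict means the configuration is consistent.
report = {bs: check_paths(bs, p, GATEWAYS) for bs, p in PATHS.items()}
report = {bs: problems for bs, problems in report.items() if problems}
```

A check of this kind is one way an operator tool could confirm that the mesh the management function reports matches the redundancy that was requested.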

Scene Challenge: Management functionality configures primary and secondary backhaul connectivity for all base stations

Operator Perspective: The operator asks the management functionality to build a wireless meshed network. S/he provides the management functionality with some parameters such as the level of redundancy required, the maximum number of bridged base stations to use in a connection, and the identities of the gateway base stations. The management function sets up the meshed network and reports back to the operator. The operator can then carry out some tests to ensure that all base stations are connected and that test network elements can connect using the meshed network. The operator then declares the meshed network to be operational.
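The three operator parameters named above (level of redundancy, maximum number of bridged base stations, and gateway identities) could drive path selection roughly as follows. This is a sketch only: the class `MeshParameters`, the functions `shortest_path` and `plan_paths`, and the small topology are hypothetical, and breadth-first search is swapped in as a simple stand-in for whatever selection process the management function would really use. The paths returned go to different gateways but are not guaranteed to be node-disjoint.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class MeshParameters:
    redundancy: int       # number of paths wanted per base station
    max_bridges: int      # most bridging base stations allowed in one path
    gateways: frozenset   # identities of the gateway base stations

def shortest_path(neighbours, start, goal):
    """Breadth-first search; returns the node list from start to goal, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbours.get(node, ()):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None

def plan_paths(neighbours, station, params):
    """Pick up to `redundancy` paths, each to a different gateway, shortest first."""
    candidates = []
    for gateway in sorted(params.gateways):
        path = shortest_path(neighbours, station, gateway)
        # A path of n nodes has n - 2 bridging nodes between station and gateway.
        if path is not None and len(path) - 2 <= params.max_bridges:
            candidates.append(path)
    candidates.sort(key=len)
    return candidates[:params.redundancy]

# Hypothetical radio links, chosen to match the BS00 paths described for Figure 3.
neighbours = {
    "BS00": ["BS01", "BS10"], "BS01": ["BS00", "BS02"], "BS02": ["BS01", "BS03"],
    "BS03": ["BS02", "BS04"], "BS04": ["BS03"], "BS10": ["BS00", "BS21"],
    "BS21": ["BS10"],
}
params = MeshParameters(redundancy=2, max_bridges=3,
                        gateways=frozenset({"BS04", "BS21"}))
primary, secondary = plan_paths(neighbours, "BS00", params)
```

With these inputs the shorter path through BS10 becomes the primary path and the longer path through BS01, BS02, and BS03 becomes the secondary path. Lowering `max_bridges` to 2 would leave BS00 with no secondary path, which is the kind of fault condition raised under Management Issues.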

Management Issues: How are nodes partitioned; what is the best selection process?

What mechanism is used to do resource discovery?

Where is the state information for the mesh stored and cached?

Is there an overseer (Super Peer) to resolve conflicts?

Which fault conditions such as failure to establish secondary paths must be considered?

Domain Specific Issues: How is IP routing set up in this context?

4.3 Scene 3: Wireless Equipment Communication

A participant at TWIT decides to turn on an item of wireless equipment.

The wireless equipment determines which base stations are in range and designates one of those base stations as its preferred base station. All other base stations that are in range are designated as alternative base stations. The wireless equipment connects to the Internet using the primary connection of its preferred base station and starts using services available on the Internet over that connection. Such services might include email, access to corporate office environment, web services, or the ability to access financial systems.

Figure 4 shows how three network elements connect to the meshed network. NE01 can see four base stations: BS00, BS01, BS10, and BS11. NE01 determines that the best base station to connect to is BS01, so that base station is designated as its preferred base station. Base stations BS00, BS10, and BS11 are designated as alternative base stations. NE02 can also see four base stations: BS12, BS21, BS22, and BS23. NE02 determines that the best base station to connect to is BS12, so that base station is designated as its preferred base station. Base stations BS21, BS22, and BS23 are designated as alternative base stations. NE03 can also see four base stations: BS13, BS14, BS23, and BS24. NE03 determines that the best base station to connect to is BS24, so that base station is designated as its preferred base station. Base stations BS13, BS14, and BS23 are designated as alternative base stations.


Figure 4: NE Connection to Meshed Network
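The designation step in this scene can be sketched as a simple ranking. The scenario says only that the wireless equipment picks the "best" base station; using a signal quality value as the criterion, and the function name `designate`, are assumptions made for this illustration. The signal values are made up.

```python
def designate(in_range):
    """Split the base stations in range into a preferred one and alternatives.

    in_range maps base station name -> signal quality (higher is better).
    Signal quality is an assumed criterion; the scenario only says "best".
    """
    if not in_range:
        raise ValueError("no base station in range")
    # Rank by quality, breaking ties by name so the result is repeatable.
    ranked = sorted(in_range, key=lambda bs: (-in_range[bs], bs))
    return ranked[0], ranked[1:]

# NE03 from Figure 4, with invented signal values.
preferred, alternatives = designate(
    {"BS13": 40, "BS14": 55, "BS23": 62, "BS24": 80})
```

Keeping the alternatives in ranked order gives the wireless equipment an obvious next choice if its preferred base station fails, as happens in Scene 5.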

Scene Challenge: Connect network element to Internet, and the services available over the Internet, over bridged backhaul

Operator Perspective: The operator can view and monitor the connectivity of any network element in the network. S/he can trace the connection being used from the terminating base station, across bridging base stations, to the gateway base station being used.

Management Issues: How is security authentication achieved at both network and user level?

Domain Specific Issues: Are there routing priorities for certain user types or is it best effort?

4.4 Scene 4: Wireless Meshed Network Re-Formation

The TWIT organisers add some base stations to the network, move some base stations around, and remove some base stations from the network because the conference director wishes for all participants to converge for a Q/A forum. The management functionality re-configures the meshed network, exactly as described in Scene 2.


Figure 5: Meshed Network Reformation

Figure 5 shows how the network depicted in Figure 2 might look after a reformation. The management functionality has reconfigured the primary and secondary paths to the base stations as a result of base stations being moved around. The paths being used by the base stations have altered considerably compared to the previous paths.

Scene Challenge: Meshed network adapts to utilize extra resources

Operator Perspective: The operator asks the management functionality to re-form the wireless meshed network. S/he can provide the management functionality with some parameters or can ask the management functionality to use the parameters set at network initiation. The management function adapts the meshed network and reports back to the operator. The operator can then carry out some tests to ensure that all base stations are connected and that test network elements can connect using the meshed network.

Management Issues: How are existing users affected by this operation?

What criteria are used to initiate this operation: monitoring or operator action?

Who or what stores the new topology information, and is there any cache of it?

Which fault conditions such as failure to establish secondary paths must be considered?

The post condition for removal of a base station is the same as for a base station failure in later scenes. How will the management functionality distinguish this case from a case where a base station fails?


Domain Specific Issues: None Identified

4.5 Scene 5: Preferred Base Station Fails

Figure 6: Preferred Base Station Failure

In this scene, the base station being used by the wireless equipment fails. The management functionality re-configures the meshed network, exactly as described in Scene 2.

The wireless equipment switches over to an alternative base station, using that base station’s primary path for connectivity to the Internet. All services remain available and are not interrupted. The meshed network’s fault management function issues an X.733 compliant [E1] alarm on its northbound interface. The table below details the fault handling cases that arise in this scene.

Scene case 5.1
  Root of problem: BS x outage (complete)
  Fault detection (who, how?): WE (NE) cannot connect via BS x
  Impact (from view of detecting element): Loss of service
  Way of restoring: Automatic (WE -> alt. BS -> NMF)
  Repair action: WE contacts its alternative BS and triggers reconfiguration
  Severity of raised alarm: Warning
  Post condition (after alarm has been raised): Network reconfigured: secondary gets new primary path; new secondary path has been established

Scene case 5.2
  Root of problem: as 5.1
  Fault detection (who, how?): Other BS (BS y, using BS x for bridging) realizes loss of communication
  Impact (from view of detecting element): Loss of service for all WE connected via BS x
  Way of restoring: Automatic (BS y -> alt. B-BS -> NMF)
  Repair action: BS y contacts alternative B-BS and triggers reconfiguration
  Severity of raised alarm: Warning
  Post condition (after alarm has been raised): as 5.1

Scene case 5.3
  Root of problem: as 5.1
  Fault detection (who, how?): Other BS (BS z, used by BS x for bridging) realizes loss of communication
  Impact (from view of detecting element): No direct impact for WE connected via BS z, but loss of service for all WE connected via BS x or any other BS y that uses BS x for bridging
  Way of restoring: Automatic (BS z -> NMF)
  Repair action: BS z triggers reconfiguration
  Severity of raised alarm: Warning
  Post condition (after alarm has been raised): as 5.1

Scene case 5.4
  Root of problem: BS x outage (partial; only WE p connectivity lost)
  Fault detection (who, how?): BS x
  Impact (from view of detecting element): Loss of service for WE p connected via BS x
  Way of restoring: Automatic (BS x -> NMF)
  Repair action: BS x triggers reconfiguration
  Severity of raised alarm: Warning
  Post condition (after alarm has been raised): as 5.1

Scene case 5.5
  Root of problem: as above
  Fault detection (who, how?): as above
  Impact (from view of detecting element): as above
  Way of restoring: as above
  Repair action: as above
  Severity of raised alarm: Minor
  Post condition (after alarm has been raised): Network reconfigured: secondary gets new primary path; but no new secondary path is available (see footnote 1)

Table 6: Fault Cases for Failure of Preferred Base Station

Scene Challenge: Meshed network reports fault and adapts its backhaul connectivity to deal with base station failure

Operator Perspective: The operator is informed by an alarm that a preferred base station has failed and that the management functionality has re-formed the wireless meshed network. S/he can view the report generated by the management functionality and carry out some tests to ensure that all base stations are connected and that test network elements can connect using the meshed network. The operator may then repair or replace the failed base station and carry out a wireless network re-formation as described in Scene 4 to bring that base station back into the meshed network. The management functionality will then cease the alarm.

Management Issues: Where is the reported fault sent?

Reconfiguration events (CM) and alarm events (FM) must be distinguished; these events are strongly interrelated. Alarms may trigger reconfigurations and reconfigurations may trigger alarms. How are these events correlated by the management functionality in network elements and management entities?

When multiple alarms are generated, where are they correlated, and what level of fault information is presented to which actors within the system?

Arbitrary combinations and multiplicities of the considered faults are possible; how will the management functionality distinguish between them?

How is topology-related information represented in a dynamic and distributed network, and how is topology information reported in alarms?

The assumption is that automatic reconfiguration works as expected and is fast enough to ensure no interruption of service. Must we also consider cases where reconfiguration fails and/or reconfiguration delays cause loss of service?

Footnote 1: This row with alarm severity Minor suggests the following escalation strategy. For all cases in Scene 5, an automatic reconfiguration is triggered to maintain connectivity. As long as this reconfiguration results in a new primary and secondary path for the connectivity of each BS, the overall state of the system with respect to connectivity has not changed, and therefore only an alarm with severity Warning is raised. If a secondary path is not available after reconfiguration (as in the case of multiple BS outages), an alarm with severity Minor is raised, as there is now a higher risk of service loss.

Domain Specific Issues: In the recovery scenario, how is IP address allocation resolved?
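The escalation strategy suggested in the footnote to Table 6 can be sketched in a few lines. This is only an illustration under stated assumptions: the names Severity and severity_after_reconfiguration are hypothetical and not part of the Madeira design, and the sketch covers only the Warning and Minor cases discussed for this scene.

```python
from enum import Enum


class Severity(Enum):
    WARNING = "warning"
    MINOR = "minor"


def severity_after_reconfiguration(has_primary: bool, has_secondary: bool) -> Severity:
    """Pick the alarm severity once automatic reconfiguration has finished.

    Assumes a primary path exists; losing all paths is the subject of
    Scenes 8 and 9 and is deliberately not handled here.
    """
    if not has_primary:
        raise ValueError("no primary path: outside the Scene 5 escalation rule")
    # Primary and secondary both present: connectivity state is unchanged,
    # so a Warning is enough. No secondary: higher risk of service loss,
    # so the alarm is escalated to Minor.
    return Severity.WARNING if has_secondary else Severity.MINOR
```

Keeping the rule in one function makes it easy to reuse for Scene 6, where the same escalation is suggested.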

4.6 Scene 6: Bridging Base Station Fails

In this scene, a bridging base station in the connection being used by the wireless equipment fails. The management functionality re-configures the meshed network, exactly as described in Scene 2.

Figure 7: Bridging Base Station Failure

The wireless equipment continues to use its preferred base station for connection to the Internet. All services remain available and are not interrupted. The meshed network’s fault management function issues an X.733 compliant [E1] alarm on its northbound interface. The table below details the fault handling cases that arise in this scene.

| Scene case | Root of problem | Fault detection (who, how?) | Impact (from view of detecting element) | Way of restoring | Repair action | Severity of raised alarm | Post condition (after alarm has been raised) |
|---|---|---|---|---|---|---|---|
| 6.1 | B-BS x outage | Other BS (BS y, using BS x for bridging) realizes loss of communication | Loss of service for all WE connected via BS x | Automatic (BS y -> Alternative B-BS -> NMF) | BS y contacts alternative B-BS and triggers reconfiguration | Warning | Network reconfigured: secondary gets new primary path; new secondary path has been established |
| 6.2 | -“- | Other BS (BS z, used by BS x for bridging) realizes loss of communication | No direct impact for WE connected via BS z, but loss of service for all WE connected via BS x or any other BS y that uses BS x for bridging | Automatic (BS z -> NMF) | BS z triggers reconfiguration | Warning | -“- |
| 6.3 | -“- | -“- | -“- | -“- | -“- | Minor | Network reconfigured: secondary gets new primary path; but no new secondary path is available (see footnote 2) |

Table 7: Fault Cases for Failure of Bridging Base Station

Scene Challenge: Meshed network reports fault and adapts its backhaul connectivity to deal with bridging base station failure

Operator Perspective: The operator is informed by an alarm that a bridging base station has failed and that the management functionality has re-formed the wireless meshed network. S/he can view the report generated by the management functionality and carry out some tests to ensure that all base stations are connected and that test network elements can connect using the meshed network. The operator may then repair or replace the failed base station and carry out a wireless network re-formation as described in Scene 4 to bring that base station back into the meshed network. The management functionality will then cease the alarm.

Management Issues: Is there state information in the bridge that needs to be re-formed or recovered and taken over by the new base station?

When multiple alarms are generated, where are they correlated, and what level of fault information is presented to which actors within the system?

Arbitrary combinations and multiplicities of the considered faults are possible; how will the management functionality distinguish between them?

Reconfiguration events (CM) and alarm events (FM) must be distinguished; these events are strongly interrelated. Alarms may trigger reconfigurations and reconfigurations may trigger alarms. How are these events correlated by the management functionality in network elements and management entities?

How is topology-related information represented in a dynamic and distributed network, and how is topology information reported in alarms?

The assumption is that automatic reconfiguration works as expected and is fast enough to ensure no interruption of service. Must we also consider cases where reconfiguration fails and/or reconfiguration delays cause loss of service?

Domain Specific Issues: Are there issues for our IP routing protocol in this scenario?
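The post conditions in Table 7 ("secondary gets new primary path", with or without a new secondary path) can be illustrated with a small sketch. It assumes each base station tracks one primary and one secondary bridging next hop; the Paths record, the fail_over function and the candidate list are hypothetical names introduced for the illustration, not part of the Madeira design.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Paths:
    primary: Optional[str]    # bridging BS currently carrying traffic
    secondary: Optional[str]  # standby bridging BS, if any


def fail_over(paths: Paths, failed: str, candidates: Sequence[str]) -> Paths:
    """Return the paths of a BS after the bridging BS `failed` has gone down.

    `candidates` lists the bridging BSs this BS could use (an assumption).
    """
    # Keep whichever of the current paths does not run via the failed BS.
    survivors = [p for p in (paths.primary, paths.secondary) if p and p != failed]
    # Candidates that are neither failed nor already in use.
    spare = [c for c in candidates if c != failed and c not in survivors]
    # The surviving path with the highest rank becomes the primary path.
    primary = survivors[0] if survivors else None
    if primary is None and spare:
        primary = spare.pop(0)
    # Keep a surviving secondary, otherwise try to establish a new one.
    if len(survivors) > 1:
        secondary = survivors[1]
    else:
        secondary = spare[0] if spare else None
    return Paths(primary, secondary)
```

A result with secondary set to None corresponds to case 6.3, where no new secondary path is available.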

4.7 Scene 7: Gateway Base Station Fails

In this scene, the gateway base station in the connection being used by the wireless equipment fails. The management functionality re-configures the meshed network, exactly as described in Scene 2.

Footnote 2: This row with alarm severity Minor suggests the following escalation strategy. For all cases in Scene 6, an automatic reconfiguration is triggered to maintain connectivity. As long as this reconfiguration results in a new primary and secondary path for the connectivity of each BS, the overall state of the system with respect to connectivity has not changed, and therefore only an alarm with severity Warning is raised. If a secondary path is not available after reconfiguration (as in the case of multiple BS outages), an alarm with severity Minor is raised, as there is now a higher risk of service loss.

Figure 8: Gateway Base Station Failure

The wireless equipment continues to use its preferred base station for connection to the Internet. All services remain available and are not interrupted. The meshed network’s fault management function issues an X.733 compliant [E1] alarm on its northbound interface. The table below details the fault handling cases that arise in this scene.

| Scene case | Root of problem | Fault detection (who, how?) | Impact (from view of detecting element) | Way of restoring | Repair action | Severity of raised alarm | Post condition (after alarm has been raised) |
|---|---|---|---|---|---|---|---|
| 7.1 | Gateway BS g outage | Other BS (BS y, using BS g as gateway) realizes loss of communication | Loss of service for all WE connected via BS y | Automatic (BS -> Alternative G-BS (or B-BS) -> NMF) | BS y contacts alternative gateway BS (or B-BS) and triggers reconfiguration | Minor | Network reconfigured; but higher risk for loss of service and maybe reduced performance |
| 7.2 | -“- | Other G-BS (h) realizes loss of communication | No direct impact for connections via G-BS h, but loss of service for all connections via G-BS g | Automatic (G-BS -> NMF) | G-BS h triggers reconfiguration | Minor | -“- |

Table 8: Fault Cases for Failure of Gateway Base Station

Scene Challenge: Meshed network reports fault and adapts its backhaul connectivity to deal with gateway base station failure

Operator Perspective: The operator is informed by an alarm that a gateway base station has failed and that the management functionality has re-formed the wireless meshed network. S/he can view the report generated by the management functionality and carry out some tests to ensure that all base stations are connected and that test network elements can connect using the meshed network. The operator may then repair or replace the failed base station and carry out a wireless network re-formation as described in Scene 4 to bring that base station back into the meshed network. The management functionality will then cease the alarm.

Management Issues: Is there correlation of faults sent to a higher layer?

When multiple alarms are generated, where are they correlated, and what level of fault information is presented to which actors within the system?

Arbitrary combinations and multiplicities of the considered faults are possible; how will the management functionality distinguish between them?

Reconfiguration events (CM) and alarm events (FM) must be distinguished; these events are strongly interrelated. Alarms may trigger reconfigurations and reconfigurations may trigger alarms. How are these events correlated by the management functionality in network elements and management entities?

How is topology-related information represented in a dynamic and distributed network, and how is topology information reported in alarms?

The assumption is that automatic reconfiguration works as expected and is fast enough to ensure no interruption of service. Must we also consider cases where reconfiguration fails and/or reconfiguration delays cause loss of service?

Domain Specific Issues: None Identified
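In Table 8, two different detectors (BS y and G-BS h) raise alarms for the same gateway outage. One possible way to correlate such alarms, sketched here purely as an illustration, is to group them by the root of the problem that each alarm names. The key used below follows the X.733 attributes managedObjectClass, managedObjectInstance and probableCause that Appendix 1 lists for this purpose; the Alarm record, its field names and the correlate function are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Alarm(NamedTuple):
    reporter: str        # element that detected the fault, e.g. "BS y"
    root_moc: str        # managedObjectClass of the suspected root
    root_moi: str        # managedObjectInstance of the suspected root
    probable_cause: str  # probableCause reported by the detector


RootKey = Tuple[str, str, str]


def correlate(alarms: Iterable[Alarm]) -> Dict[RootKey, List[str]]:
    """Group alarms that name the same root and list who reported each."""
    groups: Dict[RootKey, List[str]] = defaultdict(list)
    for alarm in alarms:
        key = (alarm.root_moc, alarm.root_moi, alarm.probable_cause)
        groups[key].append(alarm.reporter)
    return dict(groups)
```

The sketch says nothing about where in the system this grouping runs, which is exactly the open management issue raised above.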

4.8 Scene 8: All Bridging Base Stations for a Base Station Fail

In this scene, all bridging base stations for a base station fail. In this case, the meshed network cannot be reconfigured to re-establish connectivity for that base station.

Figure 9: Failure of all Bridging Base Stations for a Base Station

The wireless equipment has lost its preferred and alternative connections to the Internet and therefore has lost access to its services. The wireless equipment must connect to an alternative network to re-establish its services. The meshed network’s fault management function issues an X.733 compliant [E1] alarm on its northbound interface. The table below details the fault handling cases that arise in this scene.

| Scene case | Root of problem | Fault detection (who, how?) | Impact (from view of detecting element) | Way of restoring | Repair action | Severity of raised alarm | Post condition (after alarm has been raised) |
|---|---|---|---|---|---|---|---|
| 8.1 | Multiple B-BS outage (all for BS x) | BS x | Loss of service for all WE connected via BS x | Indeterminate; BS -> WE -> indication for end user | Instruction for end user (retry, move to other BS, contact administrator, etc.) | Major (see footnote 3) | End user informed; operator not yet (but see below) |
| 8.2 | -“- | Other BS/G-BS (BS z, used by any of the failing B-BSs for bridging) realizes loss of communication | No direct impact for connections via BS z, but loss of service for all connections via the failing B-BS(s) | Manual (BS -> NMF -> operator) | BS triggers reconfiguration but this fails; the operator has to ensure that connectivity of BS x is restored | Major | The operator has been informed and has all the information needed to solve the problem |

Table 9: Fault Cases for Failure of All Bridging Base Stations for a Base Station

Scene Challenge: Meshed network reports service failure fault

Footnote 3: From the viewpoint of the affected BS, this fault may be indistinguishable from the fault described in case 9.1.

Operator Perspective: The operator is informed by an alarm that all the bridging base stations for a base station have failed and that network elements connected to that base station have lost service. S/he can repair or replace the failed base stations and carry out a wireless network re-formation as described in Scene 4 to bring those base stations back into the meshed network. The management functionality will then cease the alarm.

Management Issues: Where is the error report sent?

Where is the management station in this context?

When multiple alarms are generated, where are they correlated, and what level of fault information is presented to which actors within the system?

In the case where one or more isolated parts, “islands”, of the meshed network arise, how will the management functionality adjust and coordinate the restoration process and synchronize the fault management information after the overall network connectivity has been re-established?

Arbitrary combinations and multiplicities of the considered faults are possible; how will the management functionality distinguish between them?

Reconfiguration events (CM) and alarm events (FM) must be distinguished; these events are strongly interrelated. Alarms may trigger reconfigurations and reconfigurations may trigger alarms. How are these events correlated by the management functionality in network elements and management entities?

How is topology-related information represented in a dynamic and distributed network, and how is topology information reported in alarms?

The assumption is that automatic reconfiguration works as expected and is fast enough to ensure no interruption of service. Must we also consider cases where reconfiguration fails and/or reconfiguration delays cause loss of service?

Domain Specific Issues: None Identified
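The "islands" mentioned in the management issues for this scene can be made concrete with a reachability check. The sketch below assumes the meshed network is available as an adjacency map of base stations and that the set of failed stations is known; both assumptions, as well as the function name isolated_stations, are illustrative only. It uses a plain breadth-first search from the working gateways.

```python
from collections import deque
from typing import Dict, Iterable, Set


def isolated_stations(links: Dict[str, Set[str]], gateways: Iterable[str],
                      failed: Set[str]) -> Set[str]:
    """Return working stations that can no longer reach any working gateway.

    `links` must contain every station as a key, mapped to its neighbours.
    """
    alive = set(links) - failed
    reached = {g for g in gateways if g in alive}
    queue = deque(reached)
    while queue:  # breadth-first search outwards from the working gateways
        node = queue.popleft()
        for peer in links[node]:
            if peer in alive and peer not in reached:
                reached.add(peer)
                queue.append(peer)
    # Anything alive but unreached belongs to an island.
    return alive - reached
```

With one gateway G, two bridging stations B1 and B2, and a base station X behind them, failing both B1 and B2 leaves X isolated, which is the situation of case 8.1.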

4.9 Scene 9: All Gateway Base Stations Fail

In this scene, all gateway base stations fail. In this case, the meshed network cannot be reconfigured to re-establish connectivity.

Figure 10: Failure of all Gateway Base Stations

The wireless equipment has lost its preferred and alternative connections to the Internet and therefore has lost access to its services. The wireless equipment must connect to an alternative network to re-establish its services. The meshed network’s fault management function issues an X.733 compliant [E1] alarm on its northbound interface. The table below details the fault handling cases that arise in this scene.

| Scene case | Root of problem | Fault detection (who, how?) | Impact (from view of detecting element) | Way of restoring | Repair action | Severity of raised alarm | Post condition (after alarm has been raised) |
|---|---|---|---|---|---|---|---|
| 9.1 | All G-BSs outage (complete) | Any BS x | Loss of service for all WE connected via BS x | Indeterminate; BS -> WE -> indication for end user | Instruction for end user (retry, contact administrator, etc.) | Critical | End user informed; operator not yet (but see below) |
| 9.2 | -“- | Network management element (close to operator) | Loss of service for all WE connected via all BSs | Manual (NMF -> operator) | The operator has to ensure that function and/or connectivity of G-BSs is restored | Critical | The operator has been informed and has all the information needed to solve the problem |
| 9.3 | All G-BSs outage (only wireless connectivity lost) | G-BS g | Loss of service for all connections via G-BS g | Manual (G-BS -> NMF) | G-BS triggers reconfiguration but this fails; the operator has to ensure that function and/or connectivity of G-BSs is restored | Critical | The operator has been informed and has all the information needed to solve the problem |

Table 10: Fault Cases for Failure of All Gateway Base Stations

Scene Challenge: Meshed network reports service failure fault

Operator Perspective: The operator is informed by an alarm that all the gateway base stations for a meshed network have failed and that all network elements connected to the meshed network have lost service. S/he can repair or replace the failed base stations and carry out a wireless network formation as described in Scene 2 to bring the meshed network back into service. The management functionality will then cease the alarm.

Management Issues: When multiple alarms are generated, where are they correlated, and what level of fault information is presented to which actors within the system?

In the case where one or more isolated parts, “islands”, of the meshed network arise, how will the management functionality adjust and coordinate the restoration process and synchronize the fault management information after the overall network connectivity has been re-established?

Arbitrary combinations and multiplicities of the considered faults are possible; how will the management functionality distinguish between them?

Reconfiguration events (CM) and alarm events (FM) must be distinguished; these events are strongly interrelated. Alarms may trigger reconfigurations and reconfigurations may trigger alarms. How are these events correlated by the management functionality in network elements and management entities?

How is topology-related information represented in a dynamic and distributed network, and how is topology information reported in alarms?

The assumption is that automatic reconfiguration works as expected and is fast enough to ensure no interruption of service. Must we also consider cases where reconfiguration fails and/or reconfiguration delays cause loss of service?

Domain Specific Issues: None Identified

4.10 Scene 10: Wireless Meshed Network Load Balancing

During the course of the day, one TWIT participant has been giving some interesting demonstrations. That user is attracting a lot of attention from other participants, and a crowd is gradually forming around him. The area around him requires more capacity to accommodate the newcomers. The management functionality determines that the load on the base stations in that location is critical. The management functionality reconfigures the meshed network so that network elements using a critically loaded base station as their preferred base station are disconnected from that base station and reconnected to a less heavily loaded alternative base station.

The meshed network’s fault management function may issue an X.733 compliant [E1] alarm on its northbound interface. The table below details the fault handling cases that arise in this scene.

| Scene case | Root of problem | Fault detection (who, how?) | Impact (from view of detecting element) | Way of restoring | Repair action | Severity of raised alarm | Post condition (after alarm has been raised) |
|---|---|---|---|---|---|---|---|
| 10.1 | BS critically loaded | WE / NE (observes performance degradation below defined threshold) | Severe degradation of service | Automatic (WE -> alt. BS -> NMF) | WE contacts alternative BS and triggers reconfiguration | Warning | Network reconfigured: secondary gets new primary path; new secondary path has been established |
| 10.2 | -“- | BS x (load exceeds a defined threshold) | Severe degradation of service for all WE connected via BS x | Automatic (BS -> NMF) | BS x triggers reconfiguration | Warning | -“- |
| 10.3 | -“- | Other BS (BS y, using BS x for bridging, observes performance degradation below defined threshold) | Severe degradation of service for all WE connected via BS y | Automatic (BS -> Alternative B-BS -> NMF) | BS y contacts alternative B-BS and triggers reconfiguration | Warning | -“- |
| 10.4 | -“- | -“- | -“- | -“- | -“- | Minor | Network reconfigured: secondary gets new primary path; but no new secondary path with adequate performance is available |

Table 11: Fault Cases for Wireless Mesh Load Balancing

Scene Challenge: Meshed network adapts to utilize extra resources

Operator Perspective: The operator is informed (by an alarm or event?) that the management functionality has re-formed the wireless meshed network in order to balance the network load. S/he can view the report generated by the management functionality and carry out some tests to ensure that all base stations are connected and that test network elements can connect using the meshed network.

Management Issues: Should a warning alarm (or an event) be issued in this case?

How are the network elements connected to the critically loaded base stations forced onto alternative base stations?

Will the task of forcing the network elements to an alternative base station result in temporary loss of services?

Is some of the management functionality for realising this scene deployed on network elements?

Is an alarm with severity warning raised here, or is a notification of successful reconfiguration issued? This depends on the operating principles being used; does such a situation require a possible intervention or not?

Domain Specific Issues: IP addressing and routing

Wireless channel allocation
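The reconfiguration described at the start of this scene, moving network elements from a critically loaded base station to a less heavily loaded alternative, can be sketched as a simple greedy pass. The load metric (number of attached elements), the critical threshold and the function name rebalance are assumptions made for the illustration; the scene itself does not define how load is measured or how elements are forced to move.

```python
from typing import Dict, List


def rebalance(load: Dict[str, int], preferred: Dict[str, str],
              alternatives: Dict[str, List[str]], critical: int) -> Dict[str, str]:
    """Move network elements off critically loaded base stations.

    load: attached elements per BS (hypothetical metric, all BSs present).
    preferred: network element -> BS it currently uses.
    alternatives: network element -> other BSs it can reach.
    critical: load at or above which a BS counts as critically loaded.
    """
    load = dict(load)            # work on copies so the inputs stay untouched
    assignment = dict(preferred)
    for ne, bs in preferred.items():
        if load[bs] < critical:
            continue             # this BS is not (or no longer) critical
        # Only consider alternatives that stay below the threshold afterwards.
        options = [a for a in alternatives.get(ne, []) if load[a] + 1 < critical]
        if options:
            target = min(options, key=lambda a: load[a])
            load[bs] -= 1
            load[target] += 1
            assignment[ne] = target
    return assignment
```

An element with no suitable alternative simply stays where it is, which loosely corresponds to case 10.4, where no path with adequate performance is available.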

5 Scenario Realization using Traditional Approaches

Management of the Meshed Network scenario presented in the previous chapter poses many challenges if a traditional approach is adopted. The current generation of management architectures is hierarchical.

Figure 11: Traditional Approach to Meshed Network Management

Fault management is event based, with fault events being propagated upwards from network elements using SNMP traps, CORBA notifications, or even spontaneous text-based printouts. These events are filtered, aggregated, and correlated as they percolate upwards through the management layers, with filtration, aggregation, and correlation being applied using static topology data stored in management systems.

Configuration management of connectivity is file based, with data files being generated manually or by off-line planning tools, again using static topology data stored in management systems. These files are “run” on network elements, by the element managers or NMSs, thereby applying the configuration changes, and updating the topology data in the management systems. Configuration management for services is command-response based, with commands propagated down from service management to the network elements and responses going in the opposite direction. The service management system that issues service configuration requests to the network uses static topology data stored in the management systems to implement service provisioning.

The traditional approach outlined above does not support two new networking concepts well: (a) dynamic change of network topology and (b) dynamic change of network element role. The following challenges therefore arise for a traditional management approach.

1. It is very difficult for a traditional management system to dynamically reconfigure the network if an off-line file based mechanism is used for re-configuration of network connectivity

2. Traditional approaches assume that every network element is always connected to the hierarchical management system; in a meshed network, some of the elements may be disconnected from upper management layers during reconfigurations or for extended periods

3. Alarm correlation in currently deployed systems assumes that topology information is static; in a meshed network, the topology data is dynamic and time-based

4. Service provisioning assumes that the network capabilities are static; in a meshed network, the service capabilities of the network can change over time

5. Current configuration applications assume that the features offered by network elements are static or configurable by management request; in a network such as that described in the previous chapter, network element roles can change as the network connectivity changes

6. Traditional management approaches assume a static network topology, abstracted at different levels in the TMN, and changed in a controlled manner by pushing the network topology down into the network. Such an approach allows the management system to keep track of and update its topology at each level on an ongoing basis.
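Challenge 3 in the list above notes that topology data in a meshed network is dynamic and time-based. One way to picture what that implies for alarm correlation is a topology store that can answer "who were the neighbours of this station at the time of the event?". The following is a minimal sketch under that assumption; the class TopologyHistory and its methods are hypothetical and do not describe the Madeira approach.

```python
import bisect
from typing import Dict, FrozenSet, List


class TopologyHistory:
    """Keeps timestamped snapshots of each station's neighbours."""

    def __init__(self) -> None:
        self._times: Dict[str, List[float]] = {}
        self._peers: Dict[str, List[FrozenSet[str]]] = {}

    def record(self, station: str, time: float, peers: FrozenSet[str]) -> None:
        # Snapshots are assumed to arrive in increasing time order per station.
        self._times.setdefault(station, []).append(time)
        self._peers.setdefault(station, []).append(peers)

    def peers_at(self, station: str, time: float) -> FrozenSet[str]:
        """Neighbours of `station` as they were at `time` (empty if unknown)."""
        times = self._times.get(station, [])
        # Index of the latest snapshot taken at or before `time`.
        index = bisect.bisect_right(times, time) - 1
        return self._peers[station][index] if index >= 0 else frozenset()
```

A correlator that looks up the topology at the event time, rather than the current topology, would not be misled by a reconfiguration that happened after the fault was detected.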

The challenges above lead us to propose the Madeira approach as an alternative approach for managing such networks. The next chapter outlines how the Madeira approach is used to manage the scenario laid out in Chapter 4.

6 Scenario Realization using the Madeira Approach

This chapter will be written in a later version of this document when the Madeira approach to management has been defined and demonstrated.

7 Appendices

7.1 Appendix 1: Fault Management Overview

The tables in Scenes 4.5 to 4.10 identify the fault handling cases for each of those scenes. A fault event may be detected by more than one component, so each root cause may have more than one row in the table for its scene. The detector of a particular fault initiates an alarm for that fault, and will also initiate an automatic repair action if such a repair action is possible.

The tables list the following fields:

Scene case number

Root cause of the problem

Component that detected the fault

The impact of the fault as evaluated by the detecting component before network reconfiguration, without considering the result of any automatic self-restoration

Proposed Severity of the alarm being raised, taking into account the result of any network reconfiguration

Post condition after the alarm has been raised

The Network Management Function (NMF) mentioned in the tables should be considered as an abstract capability whose location is not defined. This function could be distributed in the meshed network or be a centralized external function.

The following information must be present in all alarms issued. The first five items correspond to the classifications used in the tables in Scenes 4.5 to 4.10.

1. The identified root of the problem. This corresponds to X.733 attributes managedObjectClass (MOC), managedObjectInstance (MOI), and probableCause. This information is essential for correlating multiple alarms from multiple sources for the same problem.

2. The reporting MOC, MOI and/or details on alarm correlation in the case of consolidation of more than one alarm event.

3. Service impact of the fault if restoration is not performed automatically.

4. Repair action. This is important if restoration is not performed automatically.

5. Severity. This is required to allow alarm prioritization.

6. Event Type.

7. Event Time. More than one time stamp may be required in some cases, for example the time of failure detection, the time of a correlated event, and the time of forwarding the alarm.

8. All information required for mapping the alarm to a topology view.

9. Information on how the alarm might be cleared; automatic clearance, or manual clearance required by operator after resolving problem.
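The nine items above can be collected into a single record. The sketch below is an illustration only: the class name MeshAlarm, the snake_case field names and the missing_fields check are assumptions, chosen to mirror the list item by item; only managedObjectClass, managedObjectInstance and probableCause are attribute names taken from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MeshAlarm:
    # 1. Identified root of the problem
    managed_object_class: str
    managed_object_instance: str
    probable_cause: str
    # 2. Reporting element
    reporting_moc: str
    reporting_moi: str
    # 3. to 5. Service impact, repair action, severity
    service_impact: str
    repair_action: str
    severity: str
    # 6. and 7. Event type and one or more named time stamps
    event_type: str
    event_times: Dict[str, float]
    # 8. Information for mapping the alarm to a topology view
    topology_info: Dict[str, str] = field(default_factory=dict)
    # 9. How the alarm is cleared
    manual_clearance_required: bool = False

    def missing_fields(self) -> List[str]:
        """Names of required items that are empty in this alarm."""
        required = (
            "managed_object_class", "managed_object_instance", "probable_cause",
            "reporting_moc", "reporting_moi", "service_impact",
            "repair_action", "severity", "event_type",
        )
        missing = [name for name in required if not getattr(self, name)]
        if not self.event_times:
            missing.append("event_times")
        return missing
```

Holding the time stamps in a dictionary keeps room for the several event times mentioned in item 7 without fixing their names.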

All alarms can be considered as events with an inherent state, and have to be cleared, either automatically or manually by the operator. We have to consider the mechanism for alarm clearance for all cases. Our assumption here is that each alarm shall be cleared only if:

1. The original state of the network has been restored completely, i.e. the failed element has been repaired/replaced and re-configuration has been successful (triggering clearance of the original alarm)

2. The operator manually clears the alarm, which means a final acceptance of the changed configuration (note the impact of this on CM).
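The two clearance conditions above can be expressed as a small predicate, reading them as alternatives (either one is sufficient), which is an interpretation rather than something the text states explicitly. The AlarmState record and the may_clear function are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass


@dataclass
class AlarmState:
    element_repaired: bool = False    # failed element repaired or replaced
    reconfiguration_ok: bool = False  # re-configuration after repair succeeded
    operator_cleared: bool = False    # operator accepted the changed configuration


def may_clear(state: AlarmState) -> bool:
    """True if the alarm may be cleared under either condition."""
    # Condition 1: the original state of the network is completely restored.
    restored = state.element_repaired and state.reconfiguration_ok
    # Condition 2: the operator clears the alarm manually.
    return restored or state.operator_cleared
```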