
SUNSEED, Grant agreement No. 619437

D3.2.2 Design specification of communication solution for the smart grid support

Deliverable report


NOTICE The research leading to the results presented in the document has received funding from the European Community's Seventh Framework Programme under Grant agreement number 619437. The content of this document reflects only the authors’ views. The European Commission is not liable for any use that may be made of the information contained herein. The contents of this document are the copyright of the SUNSEED consortium.


Document Information

1 Distribution level codes:

PU Public

RP Restricted to other programme participants (including the Commission Services)

RE Restricted to a group specified by the consortium (including the Commission Services)

CO Confidential, only for members of the consortium (including the Commission Services)

Call identifier FP7-ICT-2013-11

Project acronym SUNSEED

Project full title Sustainable and robust networking for smart electricity distribution

Grant agreement number 619437

Deliverable number D3.2.2

WP / Task WP3 / T3.2

Type (distribution level)1 PU

Due date of deliverable 31.12.2016 (Month 35)

Date of delivery 30.12.2016

Status, Version 1.0

Number of pages 82 pages

Responsible person, Affiliation Ljupco Jorguseski, TNO

Authors Jimmy Jessen Nielsen, AAU

Ljupco Jorguseski, Haibin Zhang, Sylvie Dijkstra-Soudarissanane, TNO

Varun Nair, TNO

Zhu Ziming, TREL

Rudolf Susnik, Luka Premelc, Milovan Stropnik, TS

Herve Ganem, GTOSA

Reviewers David Gustincic, TS

Christian Richter, Gemalto M2M

Design specification of communication solution for the smart grid support, Version 1.0


Revision history

Version | Date | Author(s) | Notes | Status

0.1 | 20.10.2016 | L. Jorguseski | ToC and suggestions per chapter/sections | Done

0.5 | 29.11.2016 | L. Jorguseski, Ziming Z., R. Susnik, J. Nielsen | Draft content of chapters | Done

0.8 | 22.12.2016 | L. Jorguseski and co-authors | Second review round | Done

1.0 | 29.12.2016 | L. Jorguseski, H. Ganem, R. Susnik | Final version | Done



Table of Contents

Table of Figures

Table of Acronyms

SUNSEED project

Executive Summary

1 Introduction

1.1 Approach and Relation to the Rest of the Project

1.2 Outline of This Report

2 Analysing and re-engineering of access networks for smart grid support

2.1 LTE capacity vs delay analysis

2.2 LTE access bottleneck analysis

2.3 Station regrouping for contention based IEEE 802.11ah wireless LAN

2.4 Ultra-reliable communication via multiple access networks

3 Design criteria and approach for joint DSO and telecom operator network solution

3.1 GIS based topology approach

3.2 Design of data and control planes

3.3 Design for secure data transfer and storage

4 SUNSEED recommendations for joint DSO and telecom operator networking for smart grids

Appendix A: LTE and NB-IoT performance comparison for different environments



Table of Figures

Figure 1: Example NAN smart grid network provided via LTE cellular network.

Figure 2: Illustration of random location generation for the WAMS and SM nodes

Figure 3: Achievable 95% of the maximum delay of WAMS nodes (top) and SM nodes (bottom) for urban deployment scenario. The WAMS vs SM ratio is 1/3.

Figure 4: Different Operating Modes in NB-IoT (reproduced from [11])

Figure 5: Uplink and Downlink In-Band Operation

Figure 6: Half-Duplex FDD Type A and Type B

Figure 7: Example mapping of transport block to Resource Unit, and uplink slots in NB-IoT

Figure 8: BLER vs SNR for N_REP = 1, 15 kHz single-tone SC-FDMA transmission (TBS = 776 bits)

Figure 9: Allocation of N_SC, N_REP, N_RU and TBS for the NB-IoT analysis

Figure 10: 95% Maximum Delay for different number of WAMS and SM nodes communicating via NB-IoT in urban environment

Figure 11: Simplified illustration of downlink and uplink subframe organization in a 1.4 MHz system.

Figure 12: Message exchange between a smart meter and the eNodeB.

Figure 13: Flow diagram of LTE access reservation protocol: one-shot transmission model and full 𝑚-retransmissions model (dashed lines).

Figure 14: Probability of outage in LTE with respect to the number of M2M arrivals per second in a 1.4 MHz and 5 MHz system for different models, payloads and number of RAOs.

Figure 15: Outage comparison for only ARP and data transmission (ARP + Data) and full message exchange (ARP + Signaling + Data).

Figure 16: Outage comparison for different number of RAOs per frame in a 5 MHz system with a payload of 1 kbyte (ARP + Signaling + Data).

Figure 17: Outage comparison for different payload sizes and channel constraints in a 5 MHz system.

Figure 18: Collecting hidden node information using the IEEE 802.11ah MAC.

Figure 19: Example of a hidden node status table.

Figure 20: Example of a hidden traffic table (bottom) based on given active levels of the STAs (top) and hidden node information as in Figure 19.

Figure 21: The Viterbi concept used in the proposed regrouping algorithm.

Figure 22: An example of the network layout.

Figure 23: Performance of the Viterbi-like STA regrouping algorithm (N=4).

Figure 24: Performance of the Viterbi-like STA regrouping algorithm (N=8).

Figure 25: Conceptual illustration of latency-reliability function.

Figure 26: Multiple paths between M2M device (left) and remote host (right).

Figure 27: Transmission strategies, with 2-out-of-3 as example of 𝑘-out-of-𝑁. The time instant 𝜏 is when the payload can be successfully decoded.

Figure 28: Packet size and RTT relationship for ping measurements in TS mobile network.

Figure 29: Latency-reliability curves 𝐹𝑖(𝑥, 𝐵) for all considered technologies for 𝐵 = 1500 bytes.

Figure 30: CTMC model of states in the three interface system. Colors indicate the number of interfaces up/down as: Green: 3/0, yellow: 2/1, orange: 1/2, red: 0/3. An arrow represents a failure rate in the right direction and restoration rate in the left direction, e.g., 𝜆𝐶1 and 𝜇𝐶1 between states 1 and 2.

Figure 31: Reliability results for scenario 𝒜.

Figure 32: Reliability results for scenario ℬ.

Figure 33: Reliability results for scenario 𝒞. Note: the target latency 𝑙2 = 0.9 s only applies to the last strategy.

Figure 34: Reliability results for scenario 𝒟.

Figure 35: Reliability results for scenario ℰ, where we found 𝛾 = 0.55.

Figure 36: Efficiency results for scenario ℰ.

Figure 37: The list of all implemented and some visible Infrastructure EP layers on the field trial location Kromberk

Figure 38: The list of all implemented and some visible Infrastructure TS layers on the field trial location Kromberk

Figure 39: Architecture of telecom operator's network.

Figure 40: Sunseed network topology with focus on core network and data center components.

Figure 41: OSI layer 3 details of mobile access network structure used in Sunseed architecture.

Figure 42: Connecting measurement devices through fixed access network. In such scenario DSL or FTTH modem is needed.

Figure 43: Connecting DSO's network to telecom operator's network.

Figure 44: Satellite access network and connections on the client side (measurement location) and on the core side (PF Sense FW as a VPN endpoint and gateway to core network via OPT1 interface).

Figure 45: Nagios monitoring server's position within Sunseed network.

Figure 46: Entry page of Sunseed Nagios web GUI. One can observe sites which are alive (green ones) and those which are temporarily unavailable (red ones).

Figure 47: Example of graph in which LTE signal strength is displayed.

Figure 48: General principle of connecting user or site into certain network domain using VPN.

Figure 49: Starting point for generating one-time-password as part of credentials to establish client

Figure 50: Overview of the network and application security for the Sunseed state estimation use case.

Figure 51: Comparison of 95th Percentile Maximum Delay for Urban, Suburban and Rural Environments - 1 PRB (LTE Random and Time Scheduler)

Figure 52: Comparison of 95th Percentile Maximum Delay for Urban, Suburban and Rural Environments - 6 PRBs (LTE Random and Time Scheduler)

Figure 53: Comparison of 95th Percentile Maximum Delay for Urban, Suburban and Rural Environments - 20 PRBs (LTE Random and Time Scheduler)

Figure 54: SINR Distribution of users in urban, suburban and rural environments.

Figure 55: Comparison of 95th Percentile maximum delay for NB-IoT and LTE (1 PRB) in the case of urban, suburban and rural environments.



Table of Acronyms

Acronym Description

3GPP 3rd Generation Partnership Project

AAA Authorization, Authentication and Accounting

AP Access Point

API Application Programming Interface

APN Access Point Name

ARP Access Reservation Protocol

BLER Block Error Rate

BPSK Binary Phase Shift Keying

BRAS Broadband Remote Access Server

CDF Cumulative Distribution Function

dB Decibel

DHCP Dynamic Host Configuration Protocol

DSO Distribution System Operator

DSL Digital Subscriber Line

DSLAM Digital Subscriber Line Access Multiplexer

EDGE Enhanced Data rates for Global Evolution (2nd generation data mobile network)

FFA Fair Fixed Assignment

FTTH Fiber-To-The-Home

FW Firewall

GGSN Gateway GPRS Support Node

GIS Geographic Information System

GPRS General Packet Radio Service

GRE Generic Routing Encapsulation

GSM Global System for Mobile communications (2nd generation mobile network)

GUI Graphical User Interface

IoT Internet of Things

IPsec IP security

ISP Internet Service Provider

LAN Local Area Network

LTE Long-Term Evolution (4th generation mobile network)

M2M Machine-to-Machine

MAC Medium Access Control

MCS Modulation and Coding Scheme

MIMO Multiple Input Multiple Output

MPLS Multiprotocol Label Switching

MSAN Multi-Service Access Node

MTC Machine-Type Communication

MTTR Mean Time to Restoration

NAN Neighbourhood Area Network

NAT Network Address Translation



OFDM Orthogonal Frequency Division Multiplexing

OSI Open Systems Interconnection (standardization model proposed by ISO along with ITU-T)

P-GW Packet data network Gateway

PRACH Physical Random Access Channel

PDSCH Physical Downlink Shared Channel

PHICH Physical Hybrid-ARQ (Automatic Repeat Request) Indicator Channel

PHY Physical

PRAW Periodic Restricted Access Window

PRB Physical Resource Block

PSS Primary Synchronization Signal

PUSCH Physical Uplink Shared Channel

QoS Quality of Service

RAN Radio Access Network

RAO Random Access Opportunity

RAR Random Access Response

RAW Restricted Access Window

RB Resource Block

SM Smart Meter

SNMP Simple Network Management Protocol

SSS Secondary Synchronization Signal

STA Station

TIM Traffic Indication Map

TTI Transmission Time Interval

UE User Equipment

UMTS Universal Mobile Telecommunications System (3rd generation mobile network)

VPN Virtual Private Network

W3C World Wide Web Consortium

WAMS Wide Area Measurement System

WAMS-SPM WAMS with Synchro-Phasor Measurement capability

WAN Wide Area Network

WP Work Package



SUNSEED project

SUNSEED proposes an evolutionary approach to the utilisation of the communication networks already present at both energy and telecom operators. These can be suitably connected to form a converged communication infrastructure for future smart energy grids offering open services. The life cycle of such communication network solutions consists of six steps: overlap, interconnect, interoperate, manage, plan and open. Joint communication networking operations start with an analysis of the regional overlap of energy and telecommunications operator infrastructures. The geographical overlap of energy and communications infrastructures identifies vital DSO energy and support grid locations (e.g. distributed energy generators, transformer substations, cabling, ducts) that are covered by both energy and telecom communication networks. Coverage can be realised with known wireline (e.g. copper, fiber) or wireless and mobile (e.g. Wi-Fi, 4G) technologies. Interconnection assures end-to-end secure communication on the physical layer between energy and telecom, whereas interoperation provides network visibility and reach of smart grid nodes from both operator (utility) sides. Monitoring, control and management gather measurement data from a wide area of sensors and smart meters and assure stable distributed energy grid operation by using novel intelligent real-time analytical knowledge discovery methods. For full utilisation of future network planning, we will integrate various public databases. Applications built on open standards (W3C) with application programming interfaces (APIs) exposed to 3rd parties enable the creation of new businesses related to the energy and communication sectors (e.g. virtual power plant operators, energy services providers for optimizing home energy use) or enable public wireless access points (APs) (e.g. Wi-Fi nodes at distributed energy generator locations). The SUNSEED life cycle steps promise much lower investments and total cost of ownership for future smart energy grids with dense distributed energy generation and prosumer involvement.

Project Partners

1. TELEKOM SLOVENIJE, D.D.; TS; Slovenia

2. AALBORG UNIVERSITET; AAU; Denmark

3. ELEKTRO PRIMORSKA, PODJETJE ZA DISTRIBUCIJO ELEKTRICNE ENERGIJE, D.D.; EP; Slovenia

4. ELEKTROSERVISI, ENERGETIKA, MERILNI LABORATORIJ IN NEPREMICNINE, D.D.; ES; Slovenia

5. INSTITUT JOZEF STEFAN; JSI; Slovenia

6. GEMALTO SA; GTOSA; France

7. GEMALTO M2M GMBH; GTOM2M; Germany

8. NEDERLANDSE ORGANISATIE VOOR TOEGEPAST NATUURWETENSCHAPPELIJK ONDERZOEK-TNO; TNO; The Netherlands

9. TOSHIBA RESEARCH EUROPE LIMITED; TREL; United Kingdom

Project webpage http://www.sunseed-fp7.eu/




Executive Summary

This deliverable presents the final results of the SUNSEED research on communication networks for smart grids. The investigations focus on wireless wide area networks that provide communication links to the various smart grid nodes installed in the low/medium voltage electricity grid. The analysis of LTE cellular networks shows that LTE systems can provide suitable networking capabilities for future smart grid applications if the delay requirement is not strict (e.g. a few seconds or more). For strict end-to-end delay requirements (e.g. up to 1 s), the analysis shows that up to a few hundred (e.g. around 250) smart grid nodes can be supported per LTE cell with a low number of resources reserved at the LTE cellular operator (e.g. 1 PRB). If this delay requirement is relaxed, or if commercial conditions allow a higher number of resources to be reserved at the LTE cellular operator, then up to a few thousand smart grid nodes (e.g. around 2000 with 6 PRBs reserved) can be supported, which makes LTE suitable for practical deployments. It should be stressed that the LTE scheduler has to be time-based and has to allocate the available PRBs flexibly to the end nodes in order to optimize the delay performance. Additions to the LTE system such as NB-IoT might be attractive due to the low complexity and cost of the end nodes, yet the analysis shows that the delay increases roughly three times compared to regular LTE communication. Therefore, NB-IoT deployments can only be used for non-delay-critical smart grid applications. The LTE telecom operator should consider tuning the inactivity timer of LTE connections so that WAMS-SPM nodes are always connected. In many deployments this timer is set to around 10 s, meaning that a WAMS-SPM reporting every 10 s could in principle be forced to go through the access reservation protocol (ARP) for each report. As the analysis showed, this procedure has a large signaling overhead, which decreases the overall capacity of an LTE cell. Setting the timer slightly higher would keep the WAMS-SPMs always connected. Maintaining the connection requires a slight overhead, but this is negligible compared to the signaling required for the ARP. In the SUNSEED field trial the WAMS-SPMs are configured to send measurements every 1 s; in this case the LTE connection does not time out and the device stays connected.

The analysis of multi-interface communications showed that the reliability of real-time communications (e.g. for WAMS-SPM reports) can be increased by several orders of magnitude by transmitting simultaneously via different communication technologies. In cases where less capable technologies such as GPRS, EDGE, UMTS or HSDPA are used to complement, for example, a primary LTE link for increased reliability, these complementing technologies may be unable to deliver a large payload (such as a WAMS-SPM measurement of ~800 bytes) in time. For such cases, we showed that by encoding and splitting the payload across the available complementing interfaces, so that each interface carries less data than the original payload, the packet transmission latency can be significantly reduced while still adding some redundancy and thereby increasing the overall reliability. Besides the 3GPP cellular technologies investigated and tested in SUNSEED, several other emerging network technologies directly target the support of a large number of low-capability devices, such as WAMS devices. Next to LTE cellular networks, for example, we also consider the need to overprovision the infrastructure in order to provide extra capacity and reliability to the network.
In this direction, there is a positive outlook on the use of non-3GPP technologies for smart grids, in particular low-power local area and wide area networks such as 802.11ah, Sigfox and LoRa. These technologies are designed for emerging IoT applications with a low cost of deployment. For instance, the DSO is able to deploy its own local area 802.11ah network for collecting data between the measurement nodes and the data concentrators in dense areas such as blocks of apartments, or where the capacity of the telecom operator's LTE network is not guaranteed. A LoRa network can also be deployed cost-effectively on the basis of the telecom operator's current cellular infrastructure. In short, we envisage that a heterogeneous network can achieve optimal performance for smart grid communications.

Based on the design plan and the experience gained through the physical implementation of the SUNSEED field trial network, the basic idea is a unified logical network, which enables efficient real-time data exchange and real-time monitoring. Unification of the network can also be seen as network convergence, since heterogeneous access network technologies and topologies should be part of such a unified network. To keep the solution open, standardized methods should be used, i.e. the IP protocol as the basis of the joint network. The telecom operator can have either a passive or an active role. An example of a passive role was identified with a satellite ISP, in which an ISP (or telecom operator) provides internet access only. In such a scenario, the core network is controlled by the DSO, while the telecom operator provides only access technologies; various IP-based tunneling protocols can help create a unified logical network, e.g. IPsec, GRE, APNs (within the mobile network), etc. Multiple telecom operators could join such a network. In an active role, the telecom operator controls the core network. The DSO's network is typically enterprise-grade, while the telecom operator's network is carrier-grade. This means that the telecom operator would normally have to put some effort into supporting and adapting its network to certain specifics of the DSO's network, which, from the joint network's perspective, is just one more access technology among the other access networks already operated by the telecom operator.
Having control over the whole joint network enables the telecom operator to manage it more flexibly and efficiently, as many existing solutions can be reused. Examples include the existing security infrastructure and procedures, MPLS technology in the core network, and L2 mechanisms used instead of L3 tunneling protocols (less overhead). Owing to this flexibility, one telecom operator can manage multiple DSO networks within its network, while each DSO can have its own logical network entity. Security mechanisms are an integral part of the joint network, securing both the telecom operator's network and the DSO's enterprise network. Regarding internal security within the joint network, it is necessary to define and implement to what extent certain devices are allowed to communicate with other devices and systems within the network (databases, monitoring system, human users, etc.). Security should incorporate authentication and authorization mechanisms for accessing the network, whether the party is a smart meter, a database or a human user retrieving data.
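A small counting model makes the inactivity-timer argument in this summary concrete. The sketch below rests on our own simplifying assumptions, not on the SUNSEED analysis:

- The LTE connection is released exactly when the idle gap between two periodic reports reaches the inactivity timer.
- Every report that finds the connection released must first run the ARP.

The function name and parameters are hypothetical. No radio or signaling details are modelled.

```python
import math


def count_arp_runs(report_period_s: float, inactivity_timer_s: float,
                   duration_s: float) -> int:
    """Count reports that must first run the access reservation protocol (ARP).

    Simplified, hypothetical model: a node reports periodically; the LTE
    connection is released when the idle gap between two reports reaches the
    inactivity timer (a gap equal to the timer is treated as a release), and a
    report that finds the connection released must run the ARP again.
    """
    n_reports = math.floor(duration_s / report_period_s)
    if n_reports == 0:
        return 0
    # The very first report always has to establish the connection.
    released_between_reports = report_period_s >= inactivity_timer_s
    return 1 + (n_reports - 1) * int(released_between_reports)


# One hour of WAMS-SPM reporting for three (report period, timer) combinations.
for period, timer in [(10.0, 10.0), (10.0, 12.0), (1.0, 10.0)]:
    print(period, timer, count_arp_runs(period, timer, 3600.0))
```

Under these assumptions, three cases give the following ARP counts over one hour:

- A 10 s report period with a 10 s timer triggers 360 ARP runs.
- A slightly longer timer (12 s) reduces this to a single run.
- The 1 s trial configuration also needs only a single run.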



1 Introduction

The scope of the networking research within the SUNSEED project is to investigate the convergence of the networking infrastructure of the telecom operator and the electricity distribution system operator (DSO) for future smart grid applications. The so-called neighbourhood area network (NAN), as illustrated in Figure 1, is seen as the most challenging part of this research question. This is due to the potentially large number of nodes attached to the electricity network at the low and medium voltage levels that will exchange data with smart grid applications. Besides the large number of nodes, the heterogeneity of the deployable network technologies is also an issue. This heterogeneity stems from the availability of networking infrastructure in a particular geographical area, the required performance, the associated costs, etc.

Figure 1: Example NAN smart grid network provided via LTE cellular network.2

This deliverable captures the networking research results of the SUNSEED project for the NAN part of the future smart grid, mostly related to wireless networks. The reason is that the different wireless networks throughout Europe (especially cellular networks such as GSM, UMTS and LTE) currently provide wide coverage and are accessible in large portions of the geographical area where the electricity grids are located.

1.1 Approach and Relation to the Rest of the Project

This deliverable presents the final results of the SUNSEED research in WP3, Task 2 on communication networks for smart grids. The major investigations focus on wireless wide area networks providing communication links to the various smart grid nodes installed in the low/medium voltage electricity grid. The requirements from WP2 are taken into account when evaluating the performance of the wireless access networks. Additionally, the experience and insights from the SUNSEED trial deployments are a major input to the chapter on design criteria for the communication networks. The security work in WP3/Task 4 is complemented with the security approaches in the access network and on the transport link. The deliverable also gives the final conclusions and recommendations for the communication network support of the future smart grid.

2 Note that instead of the LTE cellular network, GSM and UMTS wide area wireless networks can also be used, as well as WiFi networks or fixed communication lines (e.g. xDSL, fiber optics, etc.).

1.2 Outline of This Report

The remainder of the deliverable is organised as follows. Chapter 2 presents the analysis of wireless networks (e.g. LTE and IEEE 802.11ah) for smart grid support and recommends improvements to them. Chapter 3 presents the design criteria for the joint telecom operator and DSO network solution. Chapter […] illustrates the communication flow and the relevant communication protocols for the joint telecom and DSO network. The deliverable concludes with the overall recommendations for joint telecom and electricity distribution operator networking in Chapter 4.



2 Analysing and re-engineering of access networks for smart grid support

This chapter finalizes SUNSEED's research on the wireless access networks enabling the neighbourhood area network and wide area network for the smart grid. Section 2.1 and Section 2.2 focus, respectively, on the capacity evaluation of LTE cellular systems for a desired end-to-end delay requirement and on the access bottlenecks. The evaluation of improved station grouping for 802.11ah networks is presented in Section 2.3. The chapter concludes with the study on ultra-reliable communications for smart grids in Section 2.4.

2.1 LTE capacity vs delay analysis

In this section, we investigate the end-to-end delay performance of WAMS and SMs, taking into account the communication requirements and assumptions in [1]. The investigation aims at quantifying the end-to-end delay as a function of three factors:

- the number of active nodes in the LTE cell,
- the amount of LTE resources available for smart grid traffic,
- the scheduling scheme that decides the transmission turns of the different WAMS and SM nodes.

A snapshot simulation approach is used. A number of nodes, denoted N_UE, is randomly generated within a circle around the central LTE site located at (0,0), as illustrated in Figure 2, for a cell range of 0.34 km assuming an urban deployment. The node locations are uniformly distributed in the Cartesian domain, hence the user density is uniform in the circular region (of radius equal to the cell range) around the central site at (0,0). Each site is assumed to have 3 sectors with an antenna pattern model based on [2]. The uplink signal-to-interference-plus-noise ratio (SINR) is calculated for each user with respect to the central site and each of the 6 neighbour sites (shown in green). The serving site/sector is then the one for which the UE experiences the maximum SINR. The users covered by the central site are shown in blue in Figure 2, while the users covered by the neighbouring sites are coloured yellow. It should be noted that the users served by the central site and by the surrounding sites are scattered within the central circle. This is due to the randomness of the log-normal shadowing component included in the radio propagation modelling. Finally, only the users covered by a cell (i.e. sector) are considered in the subsequent scheduling process.



Figure 2: Illustration of random location generation for the WAMS and SM nodes

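For an arbitrary i-th node, the achievable wideband signal-to-interference-plus-noise ratio sinri is calculated as follows:

$$\mathrm{sinr}_i = \frac{p_{\mathrm{UE}}\, L_i\, G_{\mathrm{div}}}{12\, N_{\mathrm{PRB}} \left( n_{\mathrm{th},i} + i_{\mathrm{ul},i} \right)} \tag{1}$$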

where pUE is the uplink transmission power of a given node (e.g. 23 dBm) and Li is the propagation loss for the location of the i-th node (including antenna gain, path loss and shadowing, but not multipath fading). Gdiv is the environment-specific macro-diversity gain [3]. NPRB denotes the (fixed) number of PRBs assigned per node. Since each PRB contains 12 subcarriers, pUE/(12·NPRB) is the transmit power per allocated subcarrier. nth_i and iul_i are the thermal noise (including the receiver noise figure) and the inter-cell interference experienced by the i-th node on the allocated PRBs, respectively. For simplicity, we assume an uplink inter-cell interference level of iul_i = 3 dB for all allocated PRBs. This is motivated by the fact that, in busy hours, an operator typically plans and operates the LTE uplink with a target uplink inter-cell interference level.
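To make (1) concrete, the following Python sketch evaluates sinri in dB for a single node. It is a minimal illustration under assumptions, not the simulator used in this study, and several of its inputs are our own choices. The function and parameter names are hypothetical. The per-subcarrier thermal noise is assumed to be -174 dBm/Hz over a 15 kHz subcarrier plus an assumed receiver noise figure. The 3 dB uplink interference level is interpreted as an interference-over-thermal rise. The loss Li is passed as a positive value in dB and applied as the linear factor of (1).

```python
import math


def db_to_lin(x_db: float) -> float:
    """Convert dB (or dBm) to linear scale (ratio or mW)."""
    return 10.0 ** (x_db / 10.0)


def lin_to_db(x: float) -> float:
    """Convert a linear ratio (or mW) to dB (or dBm)."""
    return 10.0 * math.log10(x)


def uplink_sinr_db(p_ue_dbm: float, loss_db: float, g_div_db: float, n_prb: int,
                   noise_figure_db: float = 5.0, iot_db: float = 3.0,
                   scs_hz: float = 15e3) -> float:
    """Wideband uplink SINR of one node following Eq. (1) (illustrative sketch)."""
    # Transmit power per allocated subcarrier: p_UE / (12 * N_PRB)
    p_sc_mw = db_to_lin(p_ue_dbm) / (12 * n_prb)
    # Thermal noise per subcarrier incl. noise figure (assumed values)
    n_th_mw = db_to_lin(-174.0 + lin_to_db(scs_hz) + noise_figure_db)
    # Assumption: iot_db is an interference-over-thermal rise, so i_ul = n_th * (IoT - 1)
    i_ul_mw = n_th_mw * (db_to_lin(iot_db) - 1.0)
    # L_i enters (1) as a linear factor: a loss of loss_db dB becomes 10^(-loss_db/10)
    rx_sc_mw = p_sc_mw * db_to_lin(-loss_db) * db_to_lin(g_div_db)
    return lin_to_db(rx_sc_mw / (n_th_mw + i_ul_mw))


# Doubling N_PRB spreads the same p_UE over twice as many subcarriers.
print(uplink_sinr_db(23.0, 120.0, 0.0, 1) - uplink_sinr_db(23.0, 120.0, 0.0, 2))  # about 3.01
```

The usage line shows why allocating more PRBs to a power-limited node lowers its SINR. This effect matters for the flexible allocation in Section 2.1.2.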

Based on sinri, an appropriate Modulation and Coding Scheme (MCSi) can be selected for the i-th node such that the corresponding Block Error Rate (BLER) is in the range [0, 10%]. In our analysis, we assume an average BLER of 5%. For the numerical evaluations, the MCS selection is performed using the SINR vs BLER performance results derived via link-level simulations [4].

2.1.1 Random scheduling with fixed PRB allocation

This LTE resource allocation approach follows the fair fixed assignment (FFA) described in [7].

Let NPRB denote the fixed number of PRBs allocated to each node. From the achievable sinri in (1), an MCSi can be selected for the i-th node. The corresponding transport block size TBSi in bits can then be derived according to the 3GPP specification [8]. Consequently, the number of TTIs Nr_TTIi needed by the i-th node to transmit a packet of size P, assuming no retransmissions, is:

$$N_{\mathrm{r\_TTI},i} = \left\lceil \frac{P}{\mathrm{TBS}_i} \right\rceil \tag{2}$$
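Equation (2) is a ceiling division. A minimal Python version follows. The function name is hypothetical, and the packet and TBS sizes in the usage line are arbitrary illustrations, not values from this study.

```python
def nr_ttis(packet_bits: int, tbs_bits: int) -> int:
    """Number of TTIs needed to send a P-bit packet without retransmissions, Eq. (2)."""
    if packet_bits < 0 or tbs_bits <= 0:
        raise ValueError("packet size must be >= 0 and TBS must be > 0")
    # Integer ceiling division: ceil(P / TBS) without floating-point rounding
    return -(-packet_bits // tbs_bits)


print(nr_ttis(1000, 256))  # 4: three transport blocks carry only 768 bits
```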

A scheduling loop is executed, advancing in steps of one TTI (i.e. 1 ms). At each scheduling turn, the following steps are performed:

Step-1: Determine the number of PRBs that are free for initial transmission allocation, denoted NPRB,free. Ideally, all PRBs of the cell, denoted NTOTAL,PRB, are free (e.g. NTOTAL,PRB = 50 PRBs for a 10 MHz LTE carrier). However, an operator can also decide to reserve only a fraction of the LTE carrier for supporting smart grid nodes.3 The resources claimed by the following WAMS and SM nodes have to be subtracted from NPRB,free:

o Nodes that transmitted in the previous TTI (if applicable) but need additional TTIs to finish their transmission, see Step-3b.

o Nodes that need to retransmit a packet (if applicable) that was originally transmitted at least 8 TTIs earlier [9] 4, see also Step-3a below.

Step-2: Randomly select at most ⌊NPRB,free/NPRB⌋ nodes for initial transmission in the given TTI. These are nodes that are neither continuing their transmission from the previous TTI nor scheduled for retransmission. Note again that NPRB is the fixed number of PRBs that can be allocated to a single node. For each selected node, the following is performed:

o Reduce Nr_TTIi by one TTI, as this node is scheduled for initial transmission. If Nr_TTIi is larger than zero, then this node is also scheduled for transmission in the next TTI.

o If the packet is erroneously received, schedule this node for retransmission at the earliest 8 TTIs later [9]. This is assumed to happen on average for 5% of the transmissions. Otherwise, set the delay to the current TTI.

Step-3a: Randomly select from the nodes that are scheduled for retransmission. For each of these nodes, the following is performed (Nr_TTIi is not reduced, as this is a retransmission):

o If the packet is erroneously received, schedule this node for retransmission at the earliest 8 TTIs later and set the UE delay to the current TTI plus 8 ms. This is assumed to happen on average for 5% of the transmissions. Otherwise, set the UE delay to the current TTI.

Step-3b: Allocate NPRB resources to the nodes that are continuing their transmission from the previous TTI (if applicable). For each such node, the following is performed:

o Reduce Nr_TTIi by one, as this node is scheduled for transmission. If Nr_TTIi is larger than zero, then this node is also scheduled for transmission in the next TTI.

o If the packet is erroneously received, schedule this node for retransmission 8 TTIs later. This is assumed to happen on average for 5% of the transmissions. Otherwise, set the delay to the current TTI.

The scheduling loop is stopped when all nodes have sent their packets including retransmissions.
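The random scheduling loop with fixed PRB allocation can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the simulator used in this study. All names are hypothetical. Each TTI transmission fails independently with probability `bler` (5% in the text). A failed TTI is retransmitted at the earliest `harq_rtt` = 8 TTIs later. A node's delay is taken as the TTI in which its last transmission succeeds. A node that is still continuing its packet defers any due retransmission so that it transmits at most once per TTI. WAMS/SM prioritization is omitted.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    remaining: int                                     # initial-transmission TTIs left (Nr_TTI_i)
    retx_due: List[int] = field(default_factory=list)  # earliest TTIs of pending retransmissions
    started: bool = False
    continuing: bool = False                           # must continue in the next TTI
    delay: int = 0                                     # TTI of the last successful reception


def random_fixed_scheduler(n_ttis, n_prb_free, n_prb=1, bler=0.05, harq_rtt=8,
                           seed=0, max_tti=10**6):
    """Per-node delay (in TTIs) under random scheduling with a fixed PRB allocation."""
    rng = random.Random(seed)
    nodes = [Node(remaining=n) for n in n_ttis]
    capacity = n_prb_free // n_prb  # nodes that can be served in one TTI
    if capacity < 1:
        raise ValueError("N_PRB,free must be at least N_PRB")

    def transmit(node, t):
        # A TTI transmission fails with probability bler and is then
        # retransmitted at the earliest harq_rtt TTIs later.
        if rng.random() < bler:
            node.retx_due.append(t + harq_rtt)
        else:
            node.delay = max(node.delay, t)

    t = 0
    while any(n.remaining > 0 or n.retx_due for n in nodes):
        t += 1
        if t > max_tti:
            raise RuntimeError("scheduling loop did not terminate")
        # Step-1: continuing nodes keep their PRBs ...
        cont = [i for i, n in enumerate(nodes) if n.continuing]
        slots = capacity - len(cont)
        # ... and due retransmissions (in random order) are served next.
        due = [i for i, n in enumerate(nodes)
               if not n.continuing and n.retx_due and min(n.retx_due) <= t]
        rng.shuffle(due)
        retx = due[:slots]
        slots -= len(retx)
        # Step-2: randomly pick new nodes for the remaining PRBs.
        fresh = [i for i, n in enumerate(nodes) if not n.started and n.remaining > 0]
        new = rng.sample(fresh, min(slots, len(fresh)))
        for i in cont + new:  # Step-2 / Step-3b: initial or continued transmission
            n = nodes[i]
            n.started = True
            n.remaining -= 1
            n.continuing = n.remaining > 0
            transmit(n, t)
        for i in retx:  # Step-3a: retransmission, Nr_TTI_i unchanged
            n = nodes[i]
            n.retx_due.remove(min(n.retx_due))
            transmit(n, t)
    return [n.delay for n in nodes]


print(random_fixed_scheduler([3, 1, 2, 4], n_prb_free=2, seed=1))
```

In this sketch, PRBs are claimed first by continuing nodes, then by due retransmissions, and only then by new nodes. This order mirrors the subtraction described in Step-1.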

2.1.2 Time-based scheduling with flexible PRB allocation

This LTE resource allocation scheme uses a flexible allocation NPRB,i per node. This means that as many LTE PRBs as possible are allocated to the i-th node in order to transmit its remaining data Di.

3 For example, if an operator allocates 10% of a 10 MHz LTE carrier to smart grid traffic, then NTOTAL,PRB = 0.1 × 50 = 5 PRBs.

4 In LTE, the minimum time needed for a transmitter to learn that its previous transmission was erroneously received and must be retransmitted is 8 TTIs.



Note that at the initial transmission of the i-th node, Di equals the size of the measurement report to be transmitted.

Let NPRB,free denote the total number of PRBs that can be allocated per TTI, excluding the PRBs needed (if applicable) for retransmissions. Note that this definition deviates from the NPRB,free definition in the previous section. In this allocation approach, a node that transmitted in the previous TTI and has unfinished data is not required to continue in the subsequent TTIs until it finishes its packet transmission.

The flexible allocation NPRB,i per node also results in a flexible TBSNPRB,i per node per TTI, which is again derived according to the 3GPP specification [8]. Note from (1) that sinri decreases as NPRB,i increases, because pUE remains constant and is spread over more subcarriers. The effect depends on the radio conditions (i.e. propagation loss and uplink interference). Allocating NPRB,i,MAX + 1 PRBs to the node could decrease its sinri to a level where no data is correctly received at the base station, even with the most robust modulation and coding. Therefore, the node might only be able to utilize NPRB,i,MAX ≤ NPRB,free resources.

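The maximum TBS per TTI for a given NPRB,free and the i-th node, denoted TBSMAX,i, is defined as:

$$\mathrm{TBS}_{\mathrm{MAX},i} = \max_{l = 1,\ldots,\min\{N_{\mathrm{PRB,free}},\, N_{\mathrm{PRB},i,\mathrm{MAX}}\}} \left\{ \mathrm{TBS}_{l,i} \right\}$$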

TBSMAX,i is then used to calculate the ‘time-based’ scheduling parameter Ki for each node at each TTI as follows:

$$K_i = \frac{D_i}{\mathrm{TBS}_{\mathrm{MAX},i}} + \mathrm{nr\_TTI}_i$$

Here, nr_TTIi is the number of TTIs that the i-th node has been waiting to be scheduled (nr_TTIi equals zero when the transmission is initiated). The scheduling loop is then defined as follows:

Step-1: For each TTI, select all nodes that have data to transmit (i.e. Di > 0) and calculate their scheduling parameter Ki as given above.

o Order the nodes that have data to transmit by Ki in descending order, i.e. the node with the largest Ki first, followed by the node with the second-largest Ki, etc.

Step-2: Select the node with the largest Ki for transmission in this TTI:

o If Di ≥ TBSMAX,i, then reduce Di by TBSMAX,i. Otherwise, select the minimum TBSNPRB,i ≥ Di and set Di to zero. Allocate the corresponding NPRB,i to the node. This avoids allocating more PRBs than needed to transmit the remaining data.

o Remove the scheduled i-th node from the list of ordered nodes for scheduling in this TTI.

o Reduce NPRB,free by the NPRB,i allocated to the node. If NPRB,free ≥ 1, recalculate the scheduling parameter Ki with the new value of NPRB,free and start Step-2 again. Otherwise, proceed with Step-3.

Step-3: If there is a node with Di > 0, proceed to the next TTI and go to Step-1. Otherwise, stop.
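The time-based scheduling loop can be sketched in Python as follows. This is a simplified illustration, not the simulator used in this study. The function name and the `tbs(i, l)` callback are hypothetical. In the study, the callback would return the 3GPP TBS [8] for node i with l PRBs, whereas the usage example uses a made-up linear TBS. HARQ retransmissions and WAMS/SM prioritization are omitted. We also assume that nr_TTIi counts the TTIs since the node was last scheduled and is reset when the node is served.

```python
from typing import Callable, List


def time_based_scheduler(data_bits: List[int], tbs: Callable[[int, int], int],
                         n_prb_max: List[int], n_prb_free: int,
                         max_tti: int = 10**6) -> List[int]:
    """Per-node delay (TTI in which the last bit is sent) for time-based scheduling."""
    remaining = list(data_bits)      # D_i
    waiting = [0] * len(remaining)   # nr_TTI_i (assumed: TTIs since last scheduled)
    delay = [0] * len(remaining)

    def tbs_max(i: int, free: int) -> int:
        # TBS_MAX,i = max over l = 1..min(N_PRB,free, N_PRB,i,MAX) of TBS_{l,i}
        best = max(tbs(i, l) for l in range(1, min(free, n_prb_max[i]) + 1))
        if best <= 0:
            raise ValueError(f"node {i} cannot transmit any data")
        return best

    t = 0
    while any(d > 0 for d in remaining):
        t += 1
        if t > max_tti:
            raise RuntimeError("scheduling loop did not terminate")
        free = n_prb_free
        candidates = {i for i, d in enumerate(remaining) if d > 0}  # Step-1
        while free >= 1 and candidates:
            # Step-2: K_i = D_i / TBS_MAX,i + nr_TTI_i for the current free PRBs
            k = {i: remaining[i] / tbs_max(i, free) + waiting[i] for i in candidates}
            i = max(sorted(candidates), key=lambda j: k[j])  # ties: lowest index
            cap = min(free, n_prb_max[i])
            best = tbs_max(i, free)
            if remaining[i] >= best:
                alloc = min(l for l in range(1, cap + 1) if tbs(i, l) == best)
                remaining[i] -= best
            else:
                # Smallest allocation whose TBS still covers the remaining data
                alloc = min(l for l in range(1, cap + 1) if tbs(i, l) >= remaining[i])
                remaining[i] = 0
            free -= alloc
            candidates.discard(i)
            waiting[i] = 0
            if remaining[i] == 0:
                delay[i] = t
        for j in candidates:  # Step-3: unserved nodes with data keep waiting
            waiting[j] += 1
    return delay


bits_per_prb = [72, 144, 256]  # hypothetical per-node spectral efficiency
print(time_based_scheduler([600, 600, 600], lambda i, l: bits_per_prb[i] * l, [5, 5, 5], 5))
```

Adding the waiting term nr_TTIi to Ki prevents a node from being starved indefinitely by nodes with more remaining data.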

2.1.3 Numerical results and analysis

This section presents the results from snapshot simulations of an LTE cell populated with randomly placed SM and WAMS nodes, as illustrated in Figure 2. For each snapshot, the scheduling procedure from Section 2.1.1 or Section 2.1.2 is executed. The delay, in number of TTIs, for sending the packets generated by the individual SM and WAMS nodes is then determined. The snapshot realizations are repeated many times. In each realization, the individual packet delays are collected, and the maximum delay per snapshot is derived from them. The 95% maximum delay bound is then quantified over all snapshots, i.e. the bound is chosen such that 95% of all generated snapshots have a maximum delay smaller than or equal to it. This bound quantifies the time interval within which all measurements from the installed WAMS and SM nodes can be collected and made available to the smart grid estimation application.
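The 95% maximum delay bound described above is an empirical quantile of the per-snapshot maxima. A minimal Python sketch follows. The function name is hypothetical. We assume the quantile convention that returns an actually observed maximum, so that at least 95% of the snapshots do not exceed it.

```python
def max_delay_bound(snapshot_delays, percent=95):
    """Smallest per-snapshot maximum delay that at least `percent`% of snapshots do not exceed."""
    maxima = sorted(max(delays) for delays in snapshot_delays)
    k = (percent * len(maxima) + 99) // 100 - 1  # integer ceil(percent/100 * n) - 1
    return maxima[max(k, 0)]


# 20 snapshots whose maximum delays are 1, 2, ..., 20 TTIs
print(max_delay_bound([[1, i] for i in range(1, 21)]))  # 19
```

With 20 snapshots, the bound is the 19th smallest maximum, so 19 of 20 snapshots (95%) stay at or below it.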

In each TTI, packets from WAMS nodes are prioritized over those from SM nodes. This reflects the greater importance of WAMS nodes in smart grids and their (to some extent) stricter delay requirement [5] relative to SM nodes [6]. It is also assumed that all active WAMS and SM nodes trigger their uplink measurement transmissions at instants uniformly distributed within a 1 s interval. This models the practical situation in which the uplink transmissions from the smart grid nodes do not all occur synchronously at the same time instant. The same approach is applicable to other time intervals.

For details about the radio propagation modelling and network configuration of the LTE cellular deployment in the urban scenario, the reader is referred to [10].

The snapshot simulation results cover two schedulers: the random scheduler with a fixed allocation of 1 PRB per node, and the time-based scheduler with flexible PRB allocation. The number of LTE PRBs available for smart grid traffic, NPRB,free, equals 1, 6, 10, or 20 PRBs, modelling up to 40% reservation of a 10 MHz LTE carrier. The 95% bound of the maximum delay is presented in Figure 3, with the two LTE resource allocation approaches labelled ‘Random Scheduler’ and ‘Time Scheduler’. The following observations can be made:

For both schedulers, the 95% maximum delay of WAMS and SM nodes increases with the traffic load, as expected. The exception is the 95% maximum delay of WAMS nodes with 20 PRBs. It hardly increases up to 5000 nodes per LTE cell, because there are sufficient LTE resources to accommodate the generated traffic.

For both schedulers, the 95% maximum delay of WAMS and SM nodes decreases as NPRB,free is increased from 1 to 20 PRBs, as expected. The ‘Time Scheduler’ clearly outperforms the ‘Random Scheduler’. Therefore, time-based scheduling combined with flexible PRB allocation is strongly recommended for supporting smart grid state estimation applications.

The LTE cellular operator can decide the number of PRBs to reserve for smart grid traffic from two inputs: the envisaged number of installed nodes per LTE cell and the 95% maximum delay bound required by the smart grid operator. For example, suppose the 95% maximum delay must be below 1 s for both WAMS and SM nodes. Reserving only 1 PRB then suffices if the number of nodes per LTE cell is not expected to exceed e.g. 250. If a higher number of nodes per LTE cell is expected, increasing the number of reserved PRBs to 6 can accommodate up to 2000 WAMS or SM nodes.



Figure 3: Achievable 95% of the maximum delay of WAMS nodes (top) and SM nodes (bottom) for urban deployment scenario. The WAMS vs SM ratio is 1/3.

The differences in LTE delay performance for other deployment environments, such as suburban and rural, are not significant. For more details, the reader is referred to Appendix A (see also Figure 51 to Figure 53).

2.1.4 Conclusions and Recommendations

This study investigates the uplink delay performance of an LTE system supporting packet transmissions from SM and WAMS nodes in smart grid applications. The analysis presented leads to the following conclusions:

a) The scheduling scheme, combined with fixed or flexible PRB allocation per node, has a significant impact on the 95% maximum delay performance of LTE for both SM and WAMS nodes. Time-based scheduling with flexible PRB allocation per node is recommended. It significantly improves the uplink delay performance compared to random scheduling with fixed PRB allocation.

b) A 95% maximum uplink delay of 1 s for measurement reports from WAMS and SM nodes is difficult to achieve for a high number of nodes (e.g. more than 250) without reserving more than 1 PRB of the LTE carrier. Note that reserving a considerable percentage of the LTE carrier resources for smart grid traffic might be too costly for the cellular network operator. If instead the 95% maximum delay requirement is relaxed from e.g. 1 s to 5 s, Figure 3 shows that the LTE cell can accommodate up to 2000 nodes with 1 PRB.

c) Given the results in Figure 3, together with the desired 95% maximum delay and the expected number of WAMS and SM nodes per cell, the cellular network operator has sufficient input to plan the number of PRBs to be reserved within an LTE carrier.

2.1.5 NB-IoT extension in LTE and its applicability to smart grids

In Release 13, 3GPP standardised the Narrowband Internet of Things (NB-IoT) technology to address Internet of Things (IoT) deployments. NB-IoT is primarily designed for low data rates in the downlink and uplink (e.g. up to 250 kbps). It also targets low-cost terminals with very efficient battery usage, e.g. a battery lifetime of more than 10 years. Battery lifetime is not a concern for smart grid measurement nodes, as they typically have access to a power supply. The scope of this section is instead to quantify the achievable end-to-end delay when smart grid nodes communicate via NB-IoT. Based on this delay, it can be decided which kinds of smart grid applications can be supported, and how many smart grid nodes per LTE cell. These insights are needed to decide whether the NB-IoT technology in LTE is suitable for supporting smart grids in the future.

2.1.5.1 Brief NB-IoT Technology Specification

The system bandwidth for a NB-IoT is (a multiple of) 180 kHz and is driven by two key reasons:

a) Narrow bandwidth reduces the complexity of the receiver electronics and thus reduces the device cost.

b) It allows for easy re-fitting with LTE and GSM networks.

Within an LTE deployment, NB-IoT has two options. It can occupy a single PRB (180 kHz) within an LTE carrier, the so-called In-Band Operation, or it can be placed within the guard band of the LTE carrier, the so-called Guard-Band Operation. Additionally, the NB-IoT carrier can be deployed separately from the LTE carrier (with a 20 kHz guard band), the so-called Stand-Alone Operation, which allows easy co-existence with e.g. GSM 200 kHz carriers. This is illustrated in Figure 4. The time-frequency structure of NB-IoT downlink and uplink transmissions for in-band operation is presented in Figure 5.



Figure 4 Different Operating Modes in NB-IoT (reproduced from [11])

Figure 5 Uplink and Downlink In-Band Operation

To keep terminals low-cost and simple, and to save energy, the NB-IoT technology supports only half-duplex FDD operation. This means that the UE cannot receive and transmit simultaneously. 3GPP defines two types of half-duplex FDD operation [12], see Figure 6:

- Type A: Guard period created by not receiving the last part of the DL subframe immediately preceding a UL subframe.

- Type B: Guard period created by not receiving a DL subframe immediately before and after a UL subframe.

Figure 6 Half-Duplex FDD Type A and Type B

The NB-IoT uplink transmission uses SC-FDMA, as in LTE, with an allocated resource grid of NSC subcarriers transmitted over Nsymb symbols. With normal cyclic prefix, an uplink slot consists of Nsymb = 7 SC-FDMA symbols, and NSC = 1, 3, 6 or 12 subcarriers can be allocated. The selection of NSC is influenced by the targeted coverage requirement (i.e. maximum link budget). NB-IoT supports two subcarrier spacing options in the uplink, 3.75 kHz and 15 kHz, and the uplink slot duration depends on this spacing. The slot duration for 3.75 kHz spacing (2 ms) is 4 times that for 15 kHz spacing (0.5 ms). Hence, the subcarrier spacing is an important consideration when minimising delay.



A Resource Unit (RU) in NB-IoT is a combination of a number of allocated subcarriers and a number of UL slots. It is the smallest unit to which a transport block can be mapped, as presented in Figure 7. It is analogous to the term “Resource Block” (RB) in LTE, except that multiple RUs extend in the time domain rather than in the frequency domain as in LTE.

Figure 7 Example mapping of transport block to Resource Unit, and uplink slots in NB-IoT

Each RU may be repeated NREP times (i.e. containing the same data) before the channel coding process. The objective of the repetitions is to increase coverage. The effective number of consecutive UL slots for an NB-IoT transmission is then N = NREP·NRU·Nslots, where NRU is the number of allocated RUs and Nslots is the number of UL slots per RU. A maximum of 10 RUs can be allocated at a time, and a maximum of 128 repetitions is allowed.
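The relation N = NREP·NRU·Nslots translates directly into uplink air time. The following Python sketch illustrates it. The function name is hypothetical. The slots-per-RU values for each (subcarrier spacing, NSC) pair with normal cyclic prefix follow 3GPP TS 36.211. The sketch only checks the NRU and NREP bounds stated above and ignores scheduling and HARQ gaps.

```python
# UL slots per RU for (subcarrier spacing in kHz, N_SC), normal CP, per 3GPP TS 36.211.
SLOTS_PER_RU = {(15.0, 12): 2, (15.0, 6): 4, (15.0, 3): 8, (15.0, 1): 16, (3.75, 1): 16}
SLOT_MS = {15.0: 0.5, 3.75: 2.0}           # UL slot duration per subcarrier spacing
ALLOWED_REPS = {2 ** k for k in range(8)}  # N_REP = 1, 2, 4, ..., 128


def npusch_duration_ms(scs_khz: float, n_sc: int, n_ru: int, n_rep: int) -> float:
    """Air time of one NB-IoT uplink transport block: N = N_REP * N_RU * N_slots slots."""
    if (scs_khz, n_sc) not in SLOTS_PER_RU:
        raise ValueError("unsupported subcarrier spacing / N_SC combination")
    if not 1 <= n_ru <= 10 or n_rep not in ALLOWED_REPS:
        raise ValueError("N_RU must be 1..10 and N_REP a power of two up to 128")
    n_slots = n_rep * n_ru * SLOTS_PER_RU[(scs_khz, n_sc)]
    return n_slots * SLOT_MS[scs_khz]


print(npusch_duration_ms(15.0, 12, 1, 1))  # 1.0 (one 12-tone RU spans 2 slots)
print(npusch_duration_ms(3.75, 1, 1, 1))   # 32.0 (single tone at 3.75 kHz)
```

The 32-fold difference between these two configurations shows why the subcarrier spacing and NSC matter when minimising delay.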

For single-subcarrier operation, NB-IoT supports only π/2-BPSK and π/4-QPSK, while QPSK is supported for multi-subcarrier transmission. The Transport Block Size index (ITBS), and hence the TBS, can be determined from the number of RUs and the achievable modulation and coding. The TBS values are specified in [8] and illustrated in Table 1. The maximum TBS is 1000 bits.

Table 1 Transport Block Size reference table (from [8], Table 16.5.1.2-2)



2.1.5.2 Modeling the NB-IoT Uplink Resource Allocation

This section describes the NB-IoT uplink resource allocation model used for the simulation evaluations in Section 2.1.5.3. The model is based on the NB-IoT technology specifics presented in Section 2.1.5.1. In [13], link-level simulations were performed for 15 kHz single-carrier M-PUSCH transmission. The resulting Block Error Rate (BLER) performance is shown in Figure 8 for different modulation and coding schemes. For more than one subcarrier, non-differential QPSK is used, so we add a gain of 2 dB to the existing link-level simulation results. This is a rough estimate based on existing studies [14][15]. In [16], repetitions were studied as a coverage enhancement technique for Machine Type Communications (MTC) with the Extended Pedestrian A channel. Based on the results in [16], we roughly conclude that doubling the number of repetitions may provide up to 3 dB gain. Repetitions follow the rule of diminishing returns, i.e. the gain decreases as the number of repetitions increases. Accordingly, for up to 16 repetitions we assume a gain of 3 dB for every doubling of repetitions, while for 32, 64 and 128 repetitions we add 2, 1 and 0.5 dB, respectively.
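The gain assumptions above can be encoded in a few lines of Python. This is a sketch of our reading of the text, not the study's code. We interpret the 2, 1 and 0.5 dB as incremental gains for the doublings to 32, 64 and 128 repetitions. All names are hypothetical. The required SINR is a hypothetical input; in the study it would come from BLER curves such as Figure 8 at the 10% BLER target.

```python
ALLOWED_REPS = (1, 2, 4, 8, 16, 32, 64, 128)
# Incremental gain (dB) obtained when doubling to the given N_REP, as assumed in the text
STEP_GAIN_DB = {2: 3.0, 4: 3.0, 8: 3.0, 16: 3.0, 32: 2.0, 64: 1.0, 128: 0.5}
MULTI_TONE_GAIN_DB = 2.0  # added for N_SC > 1 (non-differential QPSK)


def repetition_gain_db(n_rep: int) -> float:
    """Cumulative repetition gain relative to N_REP = 1."""
    if n_rep not in ALLOWED_REPS:
        raise ValueError("N_REP must be a power of two between 1 and 128")
    return sum(g for r, g in STEP_GAIN_DB.items() if r <= n_rep)


def effective_sinr_db(sinr_db: float, n_rep: int, n_sc: int) -> float:
    """SINR to compare against the single-tone, N_REP = 1 BLER curves."""
    extra = MULTI_TONE_GAIN_DB if n_sc > 1 else 0.0
    return sinr_db + repetition_gain_db(n_rep) + extra


def min_repetitions(sinr_db: float, required_sinr_db: float, n_sc: int):
    """Smallest N_REP meeting the (hypothetical) required SINR, or None if out of coverage."""
    for n_rep in ALLOWED_REPS:
        if effective_sinr_db(sinr_db, n_rep, n_sc) >= required_sinr_db:
            return n_rep
    return None


print(repetition_gain_db(16), repetition_gain_db(128))  # 12.0 15.5
```

Under these assumptions, 128 repetitions yield at most 15.5 dB. A node needing more than that (plus the 2 dB multi-tone gain, if applicable) is out of coverage.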

Figure 8 BLER vs SNR for NREP = 1, 15 kHz single-tone SC-FDMA transmission (TBS = 776 bits)

The overall allocation model is presented in Figure 9. For each node, the serving cell is determined from the node location in the simulated cellular system, based on the maximum uplink SINR for NSC = 12, as coverage is of less importance than the end-to-end delay. If the achievable uplink SINR is lower than e.g. -22.4 dB, the UE is considered out of coverage and removed from the simulations. All nodes served by one of the central LTE cells are then considered for scheduling and determination of the end-to-end delay. To minimize this delay, the most efficient modulation and coding scheme and the lowest number of repetitions that still satisfy the 10% BLER requirement are selected. This determines the NRU, NREP and TBS for the particular user, which are applied at its transmission turn as decided by the LTE scheduler.



Figure 9 Allocation of NSC, NREP, NRU, and TBS for the NB-IoT analysis

2.1.5.3 NB-IoT numerical results and analysis

To quantify the end-to-end delay of the NB-IoT technology, the same snapshot simulation approach as in Section 2.1.3 was used. It was combined with the time-based scheduler described in Section 2.1.2 and 1 PRB reserved for NB-IoT transmissions. The resulting 95% maximum delay for different numbers of WAMS and SM nodes per LTE cell is presented in Figure 10. As expected, the delay increases with the number of nodes supported by the LTE cell. The WAMS nodes have lower delays due to their higher priority when scheduled for resources.



For a fair comparison with respect to reserved resources, the NB-IoT delays in Figure 10 are compared with the delays in Figure 3 for regular LTE with 1 PRB reserved and the time-based scheduler. NB-IoT results in significantly higher delays (e.g. roughly 3 times higher). Alternatively, for the same delay performance, roughly 3 times fewer users can be served with NB-IoT than with regular LTE.

Figure 10: 95% Maximum Delay for different number of WAMS and SM nodes communicating via NB-IoT in urban environment

The main reasons for these larger delays with NB-IoT transmissions are as follows:

a) NB-IoT was designed for coverage and energy-efficient communication, while low delay was not a design goal.

b) The uplink NB-IoT transmissions use only BPSK and QPSK modulation, whereas up to 64-QAM is supported in the LTE uplink. As a result, with 1 PRB allocated, the maximum Transport Block Size (TBS) that can be transmitted in a single subframe is 208 bits for NB-IoT vs 712 bits for LTE. For NB-IoT, this corresponds to a single resource unit of 12 subcarriers with 15 kHz subcarrier spacing. Hence, the achievable throughputs for users in good coverage conditions are much lower, which leads to longer transmission times.

c) For the cell range considered in the simulations (0.34 km), the coverage analysis showed that about 98% of the users are covered by LTE. The remaining 2% have SINRs lower than the minimum required for the lowest LTE modulation and coding scheme. However, exactly these users were covered by NB-IoT (in the considered simulation scenario there was 100% coverage with NB-IoT), because repetitions drastically reduce the SINR required for successful NB-IoT communication. Consequently, the delay of those users increases. This also indirectly increases the waiting time of other users, leading to a higher overall delay.
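The peak TBS values quoted in item b) allow a back-of-the-envelope check of the delay difference. The calculation assumes one transport block per 1 ms subframe in both systems and ignores repetitions, HARQ and queueing:

```latex
R_{\mathrm{NB\text{-}IoT}} = \frac{208\ \mathrm{bits}}{1\ \mathrm{ms}} = 208\ \mathrm{kbit/s},
\qquad
R_{\mathrm{LTE}} = \frac{712\ \mathrm{bits}}{1\ \mathrm{ms}} = 712\ \mathrm{kbit/s},
\qquad
\frac{R_{\mathrm{LTE}}}{R_{\mathrm{NB\text{-}IoT}}} = \frac{712}{208} \approx 3.4
```

This ratio is of the same order as the roughly 3 times higher delay observed for NB-IoT. The simulated delay ratio additionally reflects the repetitions used by poorly covered users, described in item c), and the resulting queueing.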

The differences in NB-IoT delay performance (also with respect to LTE) for other deployment environments, such as suburban and rural, are not significant. For more details, the reader is referred to Appendix A (see also Figure 55).



2.1.6 References

[1] FP7 SUNSEED, Deliverable D2.1.1, “Preliminary requirements and architectures for DSO-telecom converged communication networks in dense DEG smart energy grid networks”, July 2014.

[2] F. Gunnarsson et al., “Downtilted Base Station Antennas – A Simulation Model Proposal and Impact on HSPA and LTE Performance”, IEEE Vehicular Technology Conference, Calgary, Fall 2008.

[3] R. Litjens, Y. Toh, H. Zhang and O. Blume, ‘Assessment of the Energy Efficiency Enhancement of Future Mobile Networks’, Proceedings of WCNC ‘14, Istanbul, Turkey, 2014.

[4] Motorola, 3GPP contribution R1-081638, TBS and MCS Signaling and Tables, 2008.

[5] Yuzhe Xu; Fischione, C., "Real-time scheduling in LTE for smart grids," Communications Control and Signal Processing (ISCCSP), 2012 5th International Symposium on, May 2012

[6] IEC 61850-5: Communication networks and systems for power utility automation – Part 5: Communication requirements for functions and device models, January 2013

[7] D.C. Dimitrova, J.L. van den Berg, G. Heijenk, R. Litjens, “LTE uplink scheduling - flow level analysis”, Multiple Access Communications, Lecture Notes in Computer Science, Volume 6886, 2011.

[8] 3GPP TS 36.213, Evolved Universal Terrestrial Radio Access (E-UTRA); Physical layer procedures.

[9] 3GPP TR 36.912 version 12.0.0, Feasibility study for Further Advancements for E-UTRA (LTE-Advanced), September 2014.

[10] L. Jorguseski et al., “LTE Delay Assessment for real-time Management of Future Smart-Grids”, 1st EAI SmartGIFT Conference, Liverpool, UK, May 19-20, 2016.

[11] Y.P. Eric Wang et al., “A Primer on 3GPP Narrowband Internet of Things (NB-IoT)”, available from https://arxiv.org/ftp/arxiv/papers/1606/1606.04171.pdf.

[12] 3GPP TS 36.211, Evolved Universal Terrestrial Radio Access (E-UTRA); Physical channels and modulation.

[13] 3GPP contribution R1-157421, NB-IoT - Performance of 15 kHz subcarrier spacing for NB-IoT uplink shared channel.

[14] 3GPP contribution R1-160271, NB-IoT - NB-PUSCH design.

[15] 3GPP contribution R1-160272, NB-IoT - Link performance of NB-PUSCH.

[16] Nokia, 3GPP contribution R1-150254, Coverage Enhancements.

2.2 LTE access bottleneck analysis

For traditional mobile broadband use, the capacity of an LTE network is usually limited by the resources available for data payload transmissions. This changes when LTE is used for Machine-to-Machine (M2M) or machine-type communications (MTC), such as smart grid data traffic, where a large number of individual data packets are sent from many different devices in the network. In that case, the capacity of the different control channels may instead be the limiting factor. The goal of the analysis presented in this section is to identify the potential bottlenecks in LTE when used for MTC. Different bottleneck scenarios are studied through a proposed mathematical model and protocol simulations. The work is published in the journal article [1] and extends the work in the article [13], which was presented in D3.2.1. In the following, we provide an overview of the methodology and results, and refer to [1] for details about the mathematical model.

As already outlined, the traffic generated by smart grid monitoring devices is an example of MTC/M2M traffic, characterized by sporadic transmissions of small amounts of data from a very large number of terminals. This is in sharp contrast with the bursty, high data-rate traffic patterns of human-centered services. Another important difference is that smart grid services typically require a higher degree of network reliability and availability than human-centered services [9]. So far, cellular access has been optimized for human-centered traffic, and M2M-related standardization efforts have come into focus only recently [17]. Due to the sporadic, i.e. intermittent, nature of M2M communications, it is typically assumed that M2M devices have to establish a connection to the cellular access network every time they report.
Usually, the inactivity timer in LTE is around 10 s. After 10 s without activity, the UE therefore has to go through the steps of the LTE access procedure to obtain a new connection. The following description of this procedure shows that connection establishment requires extensive signaling in both the uplink and the downlink. The total amount of signaling information exchanged may well outweigh the information contained in the data report. Moreover, the total number of resources available in the uplink and downlink is limited. With a massive number of M2M devices, the signaling traffic related to establishing many connections may therefore place a significant burden on the operation of the access protocol. Thus, it is of paramount importance to consider the whole procedure associated with the transmission of a data report in order to properly estimate the number of M2M devices that can be supported in the LTE access network.

2.2.1 LTE access procedure

In this section, we first describe the organization of the LTE access resources and channels in the downlink and uplink. We then turn to the description of connection establishment.

2.2.1.1 Downlink

In the case of frequency division duplexing (FDD), the downlink resources in LTE are divided into time-frequency units, where the smallest unit is denoted a resource element (RE). Specifically, time is divided into frames, where every frame has ten subframes, and each subframe has a duration of 𝑡𝑠 = 1 ms. An illustration of a subframe is presented in Figure 11. In time, each subframe is composed of 14 OFDM symbols. The number of bits carried by each symbol depends on the modulation used, which can be QPSK, 16QAM or 64QAM. The system bandwidth determines the number of frequency units available in each subframe, which is typically measured in resource blocks (RBs). An RB is composed of 12 frequency units (subcarriers) and 14 symbols, i.e. a total of 168 REs. The number of RBs in the system varies from 6 RBs in a 1.4 MHz system to 100 RBs in a 20 MHz system. In the downlink, there are two main channels: the physical downlink control channel (PDCCH) and the physical downlink shared channel (PDSCH). The PDCCH carries information about the signaling/data being transmitted on the current PDSCH. It also carries information about the resources that devices need to use for the physical uplink shared channel (PUSCH), as illustrated in Figure 11. Therefore, signaling and data messages consume resources in both the control and the shared data channels. The PDCCH is composed of the first 𝑁CFI symbols in each subframe. This value is controlled by the CFI parameter indicated in the physical control format indicator channel (PCFICH) [2], see Figure 11.5 The CFI takes values 𝑁CFI = {1, 2, 3}. It is recommended to use 𝑁CFI = 3 for system bandwidths of 1.4 MHz and 5 MHz and 𝑁CFI = 2 for system bandwidths of 10 MHz to 20 MHz [3]. The remaining resources are used for the physical broadcast channel (PBCH), the primary and secondary synchronization signals (PSS and SSS, respectively), and the PDSCH, as shown in Figure 11.6 Clearly, resources for MAC messages in the PDSCH are scarce.
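The resource split described above can be checked with simple arithmetic. The following Python sketch counts the gross REs per subframe in the control region (the first NCFI symbols) and in the remaining symbols. The function name is hypothetical. The counts are upper bounds, because REs used for reference signals, PCFICH, PHICH, PBCH, PSS and SSS are not subtracted.

```python
# Number of RBs per LTE channel bandwidth in MHz
N_RB = {1.4: 6, 3.0: 15, 5.0: 25, 10.0: 50, 15.0: 75, 20.0: 100}
SUBCARRIERS_PER_RB = 12
SYMBOLS_PER_SUBFRAME = 14  # normal cyclic prefix


def downlink_re_split(bw_mhz: float, n_cfi: int):
    """Gross REs per subframe in the control region (first N_CFI symbols) and in the rest."""
    if n_cfi not in (1, 2, 3):
        raise ValueError("N_CFI must be 1, 2 or 3")
    res_per_symbol = N_RB[bw_mhz] * SUBCARRIERS_PER_RB
    control = res_per_symbol * n_cfi
    rest = res_per_symbol * (SYMBOLS_PER_SUBFRAME - n_cfi)
    return control, rest


print(downlink_re_split(1.4, 3))   # (216, 792)
print(downlink_re_split(10.0, 2))  # (1200, 7200)
```

In a 1.4 MHz system with the recommended NCFI = 3, at most 792 REs per subframe remain for PDSCH and the other channels. This illustrates why MAC messages compete for scarce resources.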

5 Note that not all REs in the control region are used for the PDCCH; some of them are reserved for other channels such as the PCFICH and the physical hybrid ARQ indicator channel (PHICH).

6 Note that PSS and SSS only occur every 5 subframes.



Figure 11: Simplified illustration of downlink and uplink subframe organization in a 1.4 MHz system.

2.2.1.2 Uplink

The uplink resources are organized similarly to the downlink, with the main difference that the smallest resource that can be addressed is an RB. The physical uplink shared channel (PUSCH) is used by devices for signaling and data messages, and several devices can be multiplexed in the same subframe. As shown in Figure 11, the physical uplink control channel (PUCCH) takes place in RB 0 in slot 0 and then in RB 5 in slot 1 (x = 0), where 𝑥 denotes the PUCCH index.7 In other words, to enable frequency diversity, the PUCCH transmission takes place in the lowest and highest parts of the frequency grid. When present, the PRACH occupies 6 RBs. It occurs periodically, from once every two frames (20 subframes) to once every subframe. A typical PRACH period is once every 5 subframes [4].

2.2.1.3 Connection establishment

The connection establishment in LTE starts with the access reservation procedure (ARP). The ARP in LTE consists of the exchange of four MAC messages between the accessing device, in the following denoted user equipment (UE), and the eNodeB, as shown in Figure 12. The first message (MSG 1) is a random access preamble sent in the first available random access opportunity (RAO), where an RAO is a PRACH subframe. The number of subframes between two RAOs varies between 1 and 20 and is denoted 𝛿RAO; in other words, 𝛿RAO indicates the number of subframes between PRACH occurrences. The UEs contend using preambles randomly chosen from the set of 64 orthogonal preambles. Typically only 𝑑 = 54 of them are available for contention purposes, the rest being reserved for timing alignment. The contention is based on slotted ALOHA [5][6]. Unlike in typical ALOHA scenarios, however, the eNodeB can only detect which preambles have been activated, not whether multiple activations (collisions) have occurred. In particular, this assumption holds in small/urban cells [7].8

7 The PUCCH index is used to indicate to the user which PUCCH resources shall be used.

Figure 12: Message exchange between a smart meter and the eNodeB.

Via MSG 2, the eNodeB returns a random access response (RAR) to all detected preambles. The contending devices listen to the downlink channel, expecting MSG 2 within the time period 𝑡RAR. If no MSG 2 is received and the maximum number of MSG 1 transmissions (𝑚 + 1) has not been reached, the device backs off and restarts the random access procedure after a randomly selected backoff interval 𝑡𝑟 ∈ [0, 𝑊c − 1]. If received, MSG 2 includes uplink grant information that indicates the RB in which the connection request (MSG 3) should be sent. The connection request specifies the requested service type, e.g., voice call, data transmission or measurement report. When two devices select the same preamble (MSG 1), they receive the same MSG 2 and therefore collide when they send their MSG 3s in the same RB. In contrast to collisions of MSG 1, the eNodeB is able to detect collisions of MSG 3. The eNodeB only replies to the MSG 3s that did not experience a collision, by sending MSG 4 (i.e., RRC Connection Setup). MSG 4 may carry two different outcomes: either the required RBs are allocated, or the request is denied due to insufficient network resources. The latter is, however, unlikely for M2M communications, due to the small payloads. If MSG 4 is not received within the time period 𝑡CRT after MSG 1 was sent, the random access procedure is restarted. Finally, if a device does not successfully complete all the steps of the random access procedure within 𝑚 + 1 MSG 1 transmissions, an outage is declared.
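The contention outcome of a single RAO can be illustrated with a minimal Monte Carlo sketch. It assumes that each contending device picks one of the 𝑑 preambles uniformly at random, and it ignores timers, backoff and the PDCCH/PUSCH limits; the function names are hypothetical. The eNodeB only sees the set of activated preambles (MSG 1), whereas a device succeeds only if no other device picked the same preamble, since otherwise the MSG 3s collide. The closed-form means are the standard results for n devices choosing uniformly among 𝑑 preambles.

```python
import random
from collections import Counter
from typing import Optional, Tuple


def simulate_rao(n_contenders: int, d: int = 54,
                 rng: Optional[random.Random] = None) -> Tuple[int, int]:
    """Simulate MSG 1 / MSG 3 contention in one RAO.

    Returns (activated, successful): the number of distinct preambles the
    eNodeB detects, and the number of devices that were alone on their
    preamble (their MSG 3 does not collide).
    """
    if rng is None:
        rng = random.Random()
    picks = Counter(rng.randrange(d) for _ in range(n_contenders))
    activated = len(picks)
    successful = sum(1 for count in picks.values() if count == 1)
    return activated, successful


def expected_rao(n_contenders: int, d: int = 54) -> Tuple[float, float]:
    """Closed-form means for a fixed number of contenders."""
    q = 1.0 - 1.0 / d          # probability that a device avoids a given preamble
    activated = d * (1.0 - q ** n_contenders)
    successful = n_contenders * q ** (n_contenders - 1) if n_contenders else 0.0
    return activated, successful


if __name__ == "__main__":
    rng = random.Random(42)
    n, trials = 30, 5000
    sims = [simulate_rao(n, rng=rng) for _ in range(trials)]
    print("simulated:", sum(a for a, _ in sims) / trials,
          sum(s for _, s in sims) / trials)
    print("expected: ", expected_rao(n))
```

The gap between activated preambles and successful devices is exactly the set of collisions that the eNodeB cannot see at MSG 1 and only discovers at MSG 3.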

8 If the cell size is more than twice the distance corresponding to the maximum delay spread, the eNodeB may be able to detect that a preamble has been activated by two or more users, but only if the users are separable in terms of the power delay profile [13, 14].



Table 2: List of messages exchanged between the smart meter and the eNodeB.

After the ARP exchange finishes, there is an additional exchange of MAC messages between the smart meter and the eNodeB, whose main purpose is to establish security and quality of service for the connection, as well as to indicate the status of the buffer at the device. Besides MAC messages, PHY messages are also part of the connection establishment [8]. Table 2 presents a complete account of both the PHY and MAC messages exchanged during connection establishment, data report transmission and connection termination (the PHY messages are indicated in gray). As can be seen from the table, every downlink message requires a downlink grant in the PDCCH. Similarly, every time a smart meter wishes to transmit in the uplink after the ARP, it first needs to request uplink resources by transmitting a scheduling request in the PUCCH.9 This is followed by an uplink grant in the PDCCH from the eNodeB.
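The grant rule stated above can be turned into a simple count of control-channel usage. In the sketch below, the message sequence is hypothetical (it is not the content of Table 2), and the number of CCEs consumed per grant is an assumed aggregation level; only the rule that every downlink message needs a PDCCH grant, and every uplink message needs a PUCCH scheduling request plus a PDCCH grant, is taken from the text. Function names are hypothetical.

```python
from typing import Iterable, Tuple


def control_overhead(directions: Iterable[str]) -> Tuple[int, int]:
    """Count PDCCH grants and PUCCH scheduling requests for post-ARP messages."""
    grants = srs = 0
    for direction in directions:
        if direction == "DL":
            grants += 1            # downlink grant on the PDCCH
        elif direction == "UL":
            srs += 1               # scheduling request on the PUCCH
            grants += 1            # followed by an uplink grant on the PDCCH
        else:
            raise ValueError(f"unknown direction {direction!r}")
    return grants, srs


def max_reports_per_s(grants_per_report: int, cces_per_subframe: int,
                      cces_per_grant: int) -> float:
    """Upper bound on reports/s if the PDCCH were the only bottleneck."""
    grants_per_subframe = cces_per_subframe // cces_per_grant
    return 1000.0 * grants_per_subframe / grants_per_report  # 1000 subframes/s


if __name__ == "__main__":
    # Hypothetical message sequence, NOT the sequence listed in Table 2.
    seq = ["DL", "UL", "DL", "UL", "UL", "DL"]
    grants, srs = control_overhead(seq)
    print(grants, srs)  # 6 3
    # mu = 21 CCEs is one of the values in Table 3; 4 CCEs per grant is assumed.
    print(max_reports_per_s(grants, 21, 4))
```

The bound reflects the PDCCH alone and ignores PRACH and PUSCH constraints, so it is only meant to show why each extra signaling message directly reduces the number of reports the control channel can serve.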

9 We note that the amount of resources reserved for PUCCH is very small for scheduling periodicity above 40 ms [14] and therefore will not be considered in the following text and analysis.



2.2.2 Analysis overview

For simplicity, we assume a single LTE cell with 𝑁 UEs. However, it should be noted that the proposed model could easily be adapted to a more realistic scenario with inter-cell interference, as the main difference would be a decreased packet transmission success probability, mainly due to a lower SNIR. Further, we assume that the smart grid application associated with the UEs generates new uplink transmission attempts according to a Poisson process with aggregate rate 𝜆I, as depicted in Figure 13; note that 𝜆I is expressed in transmission attempts per second.

Figure 13: Flow diagram of LTE access reservation protocol: one-shot transmission model and full 𝑚-retransmissions model (dashed lines).

In particular, 𝜆I = 𝑁 ⋅ 𝜆app, where 𝜆app is the transmission generation rate at each UE. For each new data transmission, up to 𝑚 retransmissions are allowed, resulting in a maximum of 𝑚 + 1 transmissions. When transmissions fail and retransmissions occur, an additional load is put on the access reservation protocol, since the backlogged retransmissions 𝜆R add to the total rate 𝜆T. The total rate 𝜆T corresponds to the preamble activations by UEs in the PRACH. After the PRACH stage, the traffic represented by 𝜆A corresponds to the detected preambles, where 𝜆A ≤ 𝜆T, since UEs that collide on the same preamble activate only one preamble. As shown in Figure 13, we split the access reservation model into two parts: (i) the one-shot transmission part in Figure 13(a) (solid lines only), which models the bottlenecks at each stage of the access reservation protocol; and (ii) the 𝑚-retransmission part in Figure 13(b) (dashed lines), where a finite number of retransmissions and backoffs is modeled. The modeling approach used for the two parts extends our preliminary work [13] by taking into account the details of the PDCCH, PDSCH and PUSCH channels. For the detailed model description, see [1].
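Under the Poisson assumption, the first stage of the one-shot model can be sketched numerically. The snippet below is an illustration, not the model of [1]: it takes 𝜆R as a given input instead of deriving it from the retransmission loop, assumes that the arrivals between two RAOs contend together in the next RAO, and uses Poisson splitting over the 𝑑 preambles. It also shows why 𝜆A ≤ 𝜆T: with a mean load L per RAO, the expected number of activated preambles is d(1 − exp(−L/d)), which never exceeds L. The function name is hypothetical.

```python
import math
from typing import Dict


def arp_rates(n_ues: int, report_period_s: float, lam_r: float = 0.0,
              delta_rao: int = 5, d: int = 54) -> Dict[str, float]:
    """Aggregate rates (per second) at the PRACH stage under Poisson arrivals.

    lam_r (backlogged retransmissions) is an input here; in the full model it
    follows from the m-retransmission part.
    """
    lam_app = 1.0 / report_period_s          # per-UE generation rate
    lam_i = n_ues * lam_app                  # lambda_I = N * lambda_app
    lam_t = lam_i + lam_r                    # lambda_T: all preamble activations
    raos_per_s = 1000.0 / delta_rao          # one RAO every delta_rao 1-ms subframes
    load = lam_t / raos_per_s                # mean contenders per RAO
    # Poisson splitting: each preamble is picked by Poisson(load / d) UEs.
    p_active = 1.0 - math.exp(-load / d)
    lam_a = d * p_active * raos_per_s        # lambda_A: detected preambles
    lam_single = load * math.exp(-load / d) * raos_per_s  # preambles with one UE
    return {"lam_I": lam_i, "lam_T": lam_t, "lam_A": lam_a,
            "lam_single": lam_single}


if __name__ == "__main__":
    rates = arp_rates(n_ues=10_000, report_period_s=10.0)
    print({name: round(value, 1) for name, value in rates.items()})
```

The difference between lam_A and lam_single is the rate of detected preambles that will nevertheless end in a MSG 3 collision.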

2.2.3 System Performance Evaluation

In this section we first describe the traffic models used here. Thereafter we present and discuss numerical results, where we compare results from our analytical model to the simulation results.

2.2.3.1 Model of the Smart Grid Traffic

At the time of writing, there is no standardized traffic model that could be used to describe the reporting activities of the eSMs. In the following, we develop a model by considering typical smart metering traffic models and enhancing them to achieve the PMU-like functionalities that eSMs are expected to have. In the literature there are different examples of traffic models for smart meters, such as [14]-[17]. Of these, the OpenSG Smart Grid Networks System Requirements Specification (described in [14]) from the Utilities Communications Architecture (UCA) user group is the most coherent and detailed network requirement specification. This specification describes the typical configuration where billing reports are collected as often as every 1 hour for industrial smart meters and every 4 hours for residential smart meters. While this is sufficient for billing purposes, such a low reporting frequency does not allow real-time monitoring and control. A way to enable this, as proposed and analysed in our work in [12], would be to drastically increase the reporting frequency of all smart meters so that reports are collected, e.g., every 10 seconds. While such a configuration is not described in OpenSG [14], it is mentioned there that on-demand meter read response messages are 100 bytes, so we use this value in the following evaluation.

Besides the basic measurements of consumption and production, distribution system operators need to collect more detailed information on the distribution grid behavior in the form of power phasors from certain, strategically chosen measurement points. As an example in the following numerical results, we assume that every 10 seconds an eSM sends a measurement report that consists of the concatenated PMU measurements (1 Hz sample rate) from the preceding 10-second measurement interval, i.e., 10 samples per report. The samples are timestamped with GPS time precision, as specified in IEEE 1588 [18] (precision clock synchronization) and in the synchrophasor standard C37.118 [19]. Assuming that the floating-point PMU frame format of C37.118 is used and that each sample covers 6 phasors, 1 analog value and 1 digital value, each PMU sample amounts to 76 bytes. Adding a UDP header (8 bytes) and an IPv6 header (40 bytes) to each report of 10 PMU samples yields an eSM packet of 808 bytes. Assuming that additional headers, e.g., for security purposes, are needed, we round this up to an assumed eSM packet size of 1000 bytes.
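The eSM packet size derivation above can be checked with a few lines of arithmetic. The constants are the values stated in the text; rounding up to the next multiple of 1000 bytes is our reading of the headroom reserved for additional headers, and the function name is hypothetical.

```python
import math

PMU_SAMPLE_BYTES = 76   # one sample: 6 phasors, 1 analog and 1 digital value
UDP_HEADER_BYTES = 8
IPV6_HEADER_BYTES = 40


def esm_packet_bytes(report_interval_s: int = 10, sample_rate_hz: int = 1,
                     round_to: int = 1000):
    """Return (samples per report, raw packet size, assumed packet size) in bytes."""
    samples = report_interval_s * sample_rate_hz
    raw = samples * PMU_SAMPLE_BYTES + UDP_HEADER_BYTES + IPV6_HEADER_BYTES
    assumed = math.ceil(raw / round_to) * round_to  # headroom, e.g. security headers
    return samples, raw, assumed


if __name__ == "__main__":
    print(esm_packet_bytes())   # (10, 808, 1000)
```

Changing the sample rate or reporting interval shows directly how the eSM payload, and hence the PUSCH load, scales.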

2.2.3.2 Numerical Results

In order to evaluate the performance of the LTE system for smart metering and validate the proposed model, we have developed an event-driven simulator in MATLAB. This simulator models the main downlink and uplink channels: the downlink control and data channels (PDCCH and PDSCH, respectively) and the uplink data and random access channels (PUSCH and PRACH). The uplink control channel (PUCCH) can be shared among multiple users, and its impact on the performance for typical configurations can be neglected [8][15]. We consider a typical 5 MHz (25 RBs) cell configured with one RAO every 5 ms (𝛿RAO = 5), 54 available preambles (𝑑) for contention and a backoff value of 20 ms [20]. In addition, we investigate the performance of the smallest LTE bandwidth, 1.4 MHz (6 RBs), where 𝛿RAO = 20. Link adaptation is out of the scope of this report, and therefore we focus on the lowest modulation in LTE (QPSK). The packet fragmentation threshold 𝑁frag is set to 6 RBs, which corresponds to the maximum uplink transmission bandwidth foreseen for LTE-M (low-cost LTE for M2M) [20][21]. The maximum number of PRACH retransmissions for a given data packet is set to a typical value (𝑚 = 9) [20]. Further, we consider SMs and eSMs reporting every 10 s, which allows more frequent monitoring of the grid [12]. The report size is set to 𝑅𝑆 = {100, 1000} bytes, representing the small (SM) and large (eSM) payloads described in the previous section; the one-order-of-magnitude difference between them illustrates the impact of the payload size on the system performance. However, we note that the proposed model can also be used for other payload sizes and reporting intervals. The remaining parameters of interest are listed in Table 3.

Table 3: LTE simulation and model parameters

Parameter | Value
Preambles per RAO (𝑑) | 54
Subframes between RAOs (𝛿RAO) | 20 or 5
Max number of retransmissions (𝑚) | 0 or 9
CFI value | 3 [20]
Number of CCEs (𝜇) | 6 or 21
System bandwidth | 1.4 MHz or 5 MHz
eNodeB processing time | 3 ms
UE processing time | 3 ms
MSG 2 window (𝑡RAR) | 10 ms
Contention time-out (𝑡CRT) | 40 ms
Backoff limit (𝑊c) | 20 ms
Rest of messages window | 40 ms

Figure 14: Probability of outage in LTE with respect to the number of M2M arrivals per second in 1.4 MHz and 5 MHz systems for different models, payloads and numbers of RAOs.

The evaluation is performed in terms of outage and number of supported users. The outage probability is defined as the probability that a device is not served before reaching the maximum number of PRACH transmissions. First, we consider the case where the data transmission starts immediately after the ARP (i.e., after MSG 4). That is, only the messages shown in bold text in Table 2 are exchanged.10 Figure 14 shows the outage probability 𝑃outage for 1.4 MHz and 5 MHz systems, both for the SM and eSM traffic models. It can be seen that the analytical model accurately captures the outage point, where the system becomes unstable and outage events become overwhelming. Since the intention is to characterize when the system is reliable, we focus on the region where the service outage is below 10%. The impact of the payload (MAC layer limitations) becomes clear in Figure 14. A 1.4 MHz system can support about 100 arrivals/s for large eSM payloads (1000 bytes) and up to 1000 arrivals/s for small SM payloads (100 bytes). As expected, increasing the bandwidth increases the capacity of the system, raising the number of supported arrivals to 700 arrivals/s and 4000 arrivals/s, respectively. It should be noted that if the ARP is neglected and the focus is solely on the data capacity, as in [10][11], up to 9000 arrivals/s can be supported. Compared with our results, where the different ARP limitations are taken into account, it is clear that for M2M scenarios data-capacity-based analyses are too simplistic and give overly optimistic results [10][11], as was also pointed out in [22].
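Since every device reports every 10 s, the supported arrival rates translate directly into a number of supported devices through 𝜆I = 𝑁 ⋅ 𝜆app. A minimal sketch of that conversion, using the rates quoted above for Figure 14 (the function name is hypothetical):

```python
def supported_devices(arrivals_per_s: float, report_period_s: float = 10.0) -> int:
    """Invert lambda_I = N * lambda_app, with lambda_app = 1 / report period."""
    return round(arrivals_per_s * report_period_s)


if __name__ == "__main__":
    # Supported arrival rates quoted in the text for Figure 14 (ARP + data only).
    cases = {"1.4 MHz, eSM (1000 B)": 100, "1.4 MHz, SM (100 B)": 1000,
             "5 MHz, eSM (1000 B)": 700, "5 MHz, SM (100 B)": 4000}
    for label, rate in cases.items():
        print(f"{label}: {supported_devices(rate)} devices")
```

For example, the 1000 arrivals/s supported for small payloads in a 1.4 MHz cell correspond to 10 000 SMs reporting every 10 s; a longer reporting period scales the device count up proportionally.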

10 The case where the data transmission occurs immediately after the ARP, without the additional signaling listed in Table 2, is denoted as lightweight-signaling access and corresponds to an extreme case of signaling overhead reduction, beyond what has been proposed in 3GPP [15, 16, 20].



Figure 15: Outage comparison for only ARP and data transmission (ARP + Data) and full message exchange (ARP + Signaling + Data).

In Figure 15 we investigate the impact of the additional signaling messages that follow the ARP, as described in Section 2.2.1.3. The striking conclusion is that, for both the 1.4 MHz and 5 MHz cases, the number of supported arrivals decreases by a factor of roughly 2.5 to 2.7, from 1000 to 400 arrivals/s and from 4000 to 1500 arrivals/s, respectively. The additional signaling must therefore be accounted for, as it has a large impact on the system performance.

Figure 16: Outage comparison for different number of RAOs per frame in a 5 MHz system with a payload of 1 kbyte (ARP + Signaling + Data).



Further, in Figure 16 we illustrate the outage performance as the number of RAOs per frame is increased, i.e., as the distance between RAOs is decreased as 𝛿RAO = {10, 5, 2, 1} subframes, for the 5 MHz system with large payload and the entire sequence of messages. Although increasing the number of RAOs per frame is seen as the optimal solution for massive M2M [9], it does not help when the other limitations of the system are considered. The best performance (supporting up to 750 arrivals/s) is achieved with a single RAO per frame (𝛿RAO = 10), while the worst performance occurs when the maximum number of RAOs per frame is selected (𝛿RAO = 1). Similar behavior can be observed for other cases.
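One contributing effect can be quantified directly: each RAO reserves 6 RBs that are then unavailable for the PUSCH in that subframe. The sketch below computes the average share of uplink RBs consumed by the PRACH in a 5 MHz (25 RB) cell for the 𝛿RAO values of Figure 16. It isolates this single effect under that assumption and does not reproduce the outage curves, which also depend on the PDCCH and the signaling messages; the function name is hypothetical.

```python
def prach_share(delta_rao: int, n_rb_ul: int = 25, prach_rbs: int = 6) -> float:
    """Average fraction of uplink RBs reserved for the PRACH.

    One RAO (prach_rbs RBs) every delta_rao subframes, out of n_rb_ul RBs
    per subframe.
    """
    return prach_rbs / (n_rb_ul * delta_rao)


if __name__ == "__main__":
    for delta in (10, 5, 2, 1):
        print(f"delta_RAO = {delta:2d}: {prach_share(delta):.1%} of uplink RBs")
```

The reservation grows from 2.4% of the uplink RBs at 𝛿RAO = 10 to 24% at 𝛿RAO = 1, capacity that is taken away from the MSG 3, signaling and data transmissions on the PUSCH.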

Figure 17: Outage comparison for different payload sizes and channel constraints in a 5 MHz system.

We conclude by illustrating, in Figure 17, the importance of considering not only the ARP limitations but also the PHY and MAC layer limitations. The scenario considered is a 5 MHz system with 2 RAOs per frame (𝛿RAO = 5) and payloads of 100 bytes and 1 kbyte. The 100-byte case is limited by the number of PDCCH messages required, and therefore the outage peaks at approximately 1.5 ⋅ 10⁴ arrivals/s. In the 1 kbyte case, the major limitation is the MAC layer, or more specifically the PUSCH, which limits the number of supported arrivals to 7000 arrivals/s; compared with the PDCCH-limited 100-byte case, the supported number of arrivals per second is roughly halved when the PUSCH limitation is considered. On the other hand, if only the collisions in the PRACH are considered, up to 3.9 ⋅ 10⁴ arrivals/s can be supported, which is far above the actual performance of the system.

2.2.4 Conclusion

One of the main messages is that studying the performance of LTE access under massive M2M traffic requires a fundamentally different approach from the study of human-type traffic. Specifically, for M2M it is necessary to take into account the features of the actual channels used to exchange signaling information, such as the PRACH, PDCCH and PUSCH. For small payloads, the main limitations are posed by the PDCCH, or by the PRACH if the system bandwidth is very large. On the other hand, for larger payloads (1000 bytes), the limitations are posed by the PUSCH. It was also shown that, surprisingly, increasing the number of RAOs does not always help, as in most cases providing more RAOs per frame than a certain limit degrades the performance.



While it is possible to obtain these results for any given scenario using tedious simulations, e.g., for different payload sizes or RAO configurations, we have shown that the analytical model developed in [1], which can be rapidly implemented and evaluated, accurately identifies the service outage breaking point. The proposed modeling and evaluation of LTE access can easily be extended to include more limitations, such as the PDSCH if the M2M service is also intensive in downlink messages. However, judging from [14], the downlink is barely used in smart grid monitoring applications, except for occasional software and firmware updates, and it is natural to assume that its impact can be neglected in such cases. Another major insight is that the additional signaling that follows the ARP has a very large impact on the capacity in terms of the number of supported devices; in the assessed setup, we observed a reduction in capacity by almost a factor of 3 (about 2.5 to 2.7). This calls for a more efficient M2M connection establishment procedure in future LTE standardization, e.g., a lightweight procedure in which the data report is sent immediately after the ARP. We conclude by noting that, to the best of our knowledge, this is the first study that accurately models and shows the full impact of connection establishment on the support of massive M2M reporting in LTE and, as such, it may provide a basis for future standardization work.

2.2.5 References

[1] G. C. Madueno, J. J. Nielsen, D. M. Kim, N. K. Pratas, C. Stefanovic, and P. Popovski, “Assessment of LTE wireless access for monitoring of energy distribution in the smart grid,” IEEE J. Sel. Areas Commun., vol. 34, no. 3, pp. 675–688, Mar. 2016.

[2] 3GPP, “TS 36.212 E-UTRA Multiplexing and channel coding (Section 5.3.4),” Tech. Rep., 2015.

[3] 3GPP, “TS 36.508 E-UTRA and EPC Common test environments for User Equipment (UE) conformance testing,” Tech. Rep., 2015.

[4] 3GPP, “MTC simulation assumptions for RACH performance evaluation,” 3rd Generation Partnership Project (3GPP), TR R2-105212, August 2010.

[5] 3GPP, “TS 36.321 E-UTRA medium access control (MAC) protocol specification,” Tech. Rep., 2015.

[6] 3GPP, “TS 36.213 E-UTRA physical layer procedures,” Tech. Rep., 2015.

[7] S. Sesia, I. Toufik, and M. Baker, LTE-The UMTS Long Term Evolution: From Theory to Practice. Wiley, 2011.

[8] 3GPP, “TR 36.822: LTE Radio Access Network (RAN) enhancements for diverse data applications, Rel. 11,” Tech. Rep., September 2011.

[9] “IEEE Vision for Smart Grid Communications: 2030 and Beyond,” IEEE Vision for Smart Grid Communications: 2030 and Beyond, pp. 1–390, May 2013.

[10] C. Hagerling, C. Ide, and C. Wietfeld, “Coverage and capacity analysis of wireless M2M technologies for smart distribution grid services,” in IEEE International Conference on Smart Grid Communications (SmartGridComm 2014). IEEE, 2014, pp. 368–373.

[11] NIST, “NIST PAP2 guidelines for assessing wireless standards for smart grid application,” 2012.

[12] J. J. Nielsen, G. Corrales Madueno, N. K. Pratas, R. B. Sørensen, C. Stefanovic, and P. Popovski, “What can wireless cellular technologies do about the upcoming smart metering traffic?” IEEE Communications Magazine, vol. abs/1502.01188, 2015.

[13] J. J. Nielsen, D. M. Kim, G. Corrales Madueno, N. K. Pratas, and P. Popovski, “A Tractable Model of the LTE Access Reservation Procedure for Machine-Type Communications,” in Proceedings of IEEE Globecom, 2015.

[14] E. Hossain, Z. Han, and H. V. Poor, Smart grid communications and networking. Cambridge University Press, 2012.

[15] J. G. Deshpande, E. Kim, and M. Thottan, “Differentiated services QoS in smart grid communication networks,” Bell Labs Technical Journal, vol. 16, no. 3, pp. 61–81, 2011.

[16] R. H. Khan and J. Y. Khan, “A comprehensive review of the application characteristics and traffic requirements of a smart grid communications network,” Computer Networks, vol. 57, no. 3, pp. 825–845, 2013.

[17] M. Laner, P. Svoboda, N. Nikaein, and M. Rupp, “Traffic Models for Machine Type Communications,” in Proc. of the International Symposium on Wireless Communication Systems (ISWCS 2013), Aug. 2013.

[18] K. Lee, J. C. Eidson, H. Weibel, and D. Mohl, “IEEE 1588-standard for a precision clock synchronization protocol for networked measurement and control systems,” in Conference on IEEE, vol. 1588, 2005, p. 2.

[19] K. Martin, D. Hamai, M. Adamiak, S. Anderson, M. Begovic, G. Benmouyal, G. Brunello, J. Burger, J. Cai, B. Dickerson, V. Gharpure, B. Kennedy, D. Karlsson, A. Phadke, J. Salj, V. Skendzic, J. Sperr, Y. Song, C. Huntley, B. Kasztenny, and E. Price, “Exploring the IEEE standard C37.118-2005 synchrophasors for power systems,” IEEE Transactions on Power Delivery, vol. 23, no. 4, pp. 1805–1811, Oct. 2008.

[20] 3GPP, “MTC simulation assumptions for RACH performance evaluation,” 3rd Generation Partnership Project (3GPP), TR R2-105212, August 2010.

[21] 3GPP, “Overview of 3GPP release 13,” 3rd Generation Partnership Project (3GPP), Tech. Rep.

[22] G. Corrales Madueno, C. Stefanovic, and P. Popovski, “Reengineering GSM/GPRS Towards a Dedicated Network for Massive Smart Metering,” in Proc. of the IEEE International Conference on Smart Grid Communications (SmartGridComm 2014), Nov. 2014.

2.3 Station regrouping for contention-based IEEE 802.11ah wireless LAN

The IEEE 802.11 task group TGah proposes a group-based contention scheme in the MAC of its 802.11ah amendment in order to cope with a massive number of power-constrained machine-type communication stations (STAs) per access point (AP) [1][2]. The PHY layer of IEEE 802.11ah is based on MIMO OFDM transmission with 32 or 64 subcarriers. Channel bandwidths of 1 MHz and 2 MHz are expected to be adopted in Europe. The MAC layer is designed to maximise the number of supported stations while keeping energy consumption to a minimum. A group-synchronised distributed coordination function is featured in the MAC of the amendment. The 13-bit Association IDentifier (AID) used in the latest 802.11ah draft supports more than 8000 STAs per AP. The AID classifies stations into 4 pages, each including 4 or 8 Traffic Indication Maps (TIMs), with 32 sub-blocks in each TIM. The AIDs are allocated to the STAs according to a grouping mechanism. In addition, three categories of STA are defined. TIM stations need to listen to the AP's beacon to send or receive data. They are active in the scheduled Restricted Access Windows (RAWs), and different TIMs can be allocated different RAWs, so that only the STAs within the same group contend with each other for transmission at a given time. Non-TIM stations negotiate with the AP to access the channel in the Periodic Restricted Access Window (PRAW) defined in every TIM period. Unscheduled stations are allowed to access the channel in the Any Access window of the TIM period. STAs that are not in the currently permitted RAW can switch to sleep mode to save power. With this mechanism, collisions and delays caused by the contention-based MAC process can be reduced significantly in dense network scenarios. The detailed STA regrouping technique is left implementation specific.
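The hierarchical addressing can be illustrated by packing and unpacking an AID. The sketch assumes the 13-bit hierarchical AID layout of the 802.11ah amendment, with a 2-bit page, a 5-bit block, a 3-bit sub-block and a 3-bit STA index field (4 × 32 × 8 × 8 = 8192 identifiers, consistent with the "more than 8000 STAs" above); the field layout is an assumption of this sketch rather than a quotation of the draft, and the function names are hypothetical.

```python
from typing import Dict

# Assumed 13-bit AID layout, most significant field first.
FIELDS = (("page", 2), ("block", 5), ("sub_block", 3), ("sta", 3))
AID_BITS = sum(bits for _, bits in FIELDS)   # 13


def encode_aid(page: int, block: int, sub_block: int, sta: int) -> int:
    """Pack the hierarchical fields into one AID value."""
    aid = 0
    for (name, bits), value in zip(FIELDS, (page, block, sub_block, sta)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
        aid = (aid << bits) | value
    return aid


def decode_aid(aid: int) -> Dict[str, int]:
    """Split an AID back into its hierarchical fields."""
    if not 0 <= aid < (1 << AID_BITS):
        raise ValueError(f"AID must fit in {AID_BITS} bits")
    fields: Dict[str, int] = {}
    shift = AID_BITS
    for name, bits in FIELDS:
        shift -= bits
        fields[name] = (aid >> shift) & ((1 << bits) - 1)
    return fields


if __name__ == "__main__":
    aid = encode_aid(page=2, block=3, sub_block=5, sta=6)
    print(aid, decode_aid(aid))
    # 4334 {'page': 2, 'block': 3, 'sub_block': 5, 'sta': 6}
```

Because STAs sharing a page, block or sub-block share the upper AID bits, an AP can address a whole group in a TIM or RAW with a compact prefix, which is what makes AID allocation a natural lever for grouping.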
Besides the number of STAs contending, which grouping reduces, unexpected transmissions from so-called 'hidden nodes' are another major cause of collisions in IEEE 802.11 wireless LANs. The potential hidden node problem should therefore also be carefully managed when grouping STAs. The ultimate goal is to identify all potential hidden node pairs and allocate them to different contention groups. Several recent works address the hidden node problem with STA regrouping techniques. For example, the authors of [3] proposed that the STAs frequently sense the signals from other STAs and report the recorded MAC addresses to the AP. The technique proposed in [4] uses several traffic sniffers to capture the traffic of nodes and report it to the AP, which then schedules transmissions to mitigate misbehaving nodes. These approaches often require additional signalling and possibly a large memory space in each STA for storing the MAC addresses of the detected STAs. For IEEE 802.11ah, the authors of [5] proposed a new frame in the Power Saving Poll that includes an eight-byte first Tx time stamp, from which the AP generates a hidden node matrix and regroups the STAs accordingly. This approach was able to eliminate most hidden nodes without extra processing at the STAs. We propose a novel method consisting of a signalling process and a regrouping algorithm. The aim of the regrouping is to minimise the potential transmission collisions caused by the hidden node problem. First, the AP acquires knowledge of the potential hidden node pairs and of the traffic requirements of the STAs in the network. The AP then regroups the STAs into different contention groups according to a centralised Viterbi-like algorithm. The proposed technique can also be used in other contention-based wireless networks where grouping is supported in the MAC protocol.

2.3.1 The procedure for collecting hidden node information

Figure 18 depicts the time slot allocation in the IEEE 802.11ah group-based MAC protocol. The proposed hidden node information collection takes place in the multicast, downlink (DL) and uplink (UL) slots of every TIM, as indicated in Figure 18, for a minimum of two full transmission periods, i.e., until each STA has reported its hidden node information twice. All STAs need to listen to the channel during the whole signalling period to collect information. The detailed signalling procedure is described below.

Figure 18: Collecting hidden node information using the IEEE 802.11ah MAC.

The AP broadcasts an explicit roll-call order. As a result, all connected STAs implicitly know the order of the following transmissions and which STA is expected to respond. In the following DL period, all STAs listen to the communications. If the DL packet is addressed to STA_i, STA_i replies with an ACK. All the other STAs try to capture this ACK and record the listening result: if STA_m can hear the ACK from STA_i, it records STA_i as audible in its hidden node table. In the UL stage, the STAs upload their records to the AP in their given turns, if any new hidden node has been recorded. From these reports, the AP creates a table of the potential hidden nodes in the network, in which a pair of STAs that cannot hear each other is marked as hidden. Figure 19 shows an example of a 10 by 10 table (matrix), where an entry of 1 indicates that the corresponding pair of STAs is hidden. The AP keeps updating the table until no new report is collected, which requires at least two full rounds of roll calls.
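The table construction at the AP can be sketched as follows. The report format (for each reporting STA, the set of STAs whose ACKs it overheard) and the rule of marking a pair as hidden whenever either STA failed to hear the other are assumptions of this sketch; the function name is hypothetical.

```python
from typing import Dict, List, Set


def hidden_node_matrix(n: int, audible: Dict[int, Set[int]]) -> List[List[int]]:
    """Build the AP's n x n hidden node table from roll-call reports.

    audible[m] holds the STAs whose ACKs STA m overheard. A pair (i, j) is
    marked 1 (potentially hidden) if either STA did not hear the other;
    treating asymmetric links conservatively is an assumption of this sketch.
    """
    table = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            heard_ij = j in audible.get(i, set())
            heard_ji = i in audible.get(j, set())
            if not (heard_ij and heard_ji):
                table[i][j] = 1
    return table


if __name__ == "__main__":
    # STA 0 and STA 2 never hear each other; both hear STA 1.
    reports = {0: {1}, 1: {0, 2}, 2: {1}}
    for row in hidden_node_matrix(3, reports):
        print(row)
    # [0, 0, 1]
    # [0, 0, 0]
    # [1, 0, 0]
```

The resulting matrix is symmetric by construction, which matches the pairwise nature of the hidden node relation shown in Figure 19.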



Figure 19: Example of a hidden node status table.

In addition to the hidden node information, the STAs also report their expected traffic requirements. The AP quantifies each requirement as a transmission active level; for example, a number in [0, 10] represents the amount of expected traffic, where a larger number indicates that the STA has more data to send. In the SUNSEED scenario, the traffic expectation can also be based on the type of the STA, i.e., WAMS device or smart meter. If two STAs are potentially hidden from each other, their transmission attempts may collide. To minimise this impact, active hidden node pairs should be separated into different groups. The active levels are integrated into the hidden node table to form a comprehensive picture of the potential impact of hidden nodes, referred to as the hidden traffic table. Figure 20 gives an example of such a table. Malicious STAs might report fake information; however, this is beyond the scope of this report.
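The report does not specify how active levels and hidden node indicators are combined into the hidden traffic table. The sketch below assumes, purely for illustration, that the hidden traffic of a pair is the product of the hidden indicator and the two active levels, so that a hidden pair only weighs heavily when both STAs expect to send data; the function name is hypothetical.

```python
from typing import List


def hidden_traffic_table(hidden: List[List[int]],
                         active: List[float]) -> List[List[float]]:
    """Weight each hidden pair by the active levels of both STAs.

    Assumed rule: traffic[i][j] = hidden[i][j] * active[i] * active[j].
    """
    n = len(active)
    return [[hidden[i][j] * active[i] * active[j] for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    hidden = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
    active = [5, 2, 8]            # active levels on the [0, 10] scale
    for row in hidden_traffic_table(hidden, active):
        print(row)
    # [0, 0, 40]
    # [0, 0, 0]
    # [40, 0, 0]
```

Other combinations (e.g., the sum of the two active levels) would fit the description equally well; what matters for the regrouping step is that the table is symmetric and zero for pairs that can hear each other.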

Figure 20: Example of a hidden traffic table (bottom) based on a given active levels of the STAs (top) and hidden node information as in Figure 19.



2.3.2 The Viterbi-like STA regrouping algorithm

Having gathered the above information about hidden nodes, the AP applies a Viterbi-like algorithm to allocate the STAs into groups so as to minimise the intra-group hidden traffic and hence the potential collisions in the whole network. The algorithm works as follows. First, the STAs are sorted according to their active level, and the STAs with the highest traffic levels are put into empty groups. The idea is then to calculate the hidden traffic that a STA would cause in every group and to allocate it to the group that gives the least additional hidden traffic. The subsequent W STAs (W is known as the Viterbi window size) are considered together. Each new STA is allocated to the group where the lowest hidden traffic would be created by its presence. If the STA has the same impact on more than one existing group, multiple allocation choices are kept, which creates multiple paths in the Viterbi diagram. This process continues until all W STAs have been considered. The AP then decides the allocation of these STAs according to the path associated with the smallest added hidden traffic. Using the same mechanism, the next W STAs are assigned to the existing groups, until all STAs are regrouped. Figure 21 illustrates the algorithm. STAs 2, 14, 25 and 28 are the most active STAs; hence, they are first assigned to Group1, Group2, Group3 and Group4, respectively. A Viterbi window of five is used, so STAs 31, 46, 69, 74 and 89 are considered next. STA 31 would create the same hidden traffic in Group1 and Group2. In this case, two possible allocations of STA 31 exist, and therefore two paths connect the starting state with the second state. Then, for each path, there are another two possibilities for the allocation of STA 46, so the paths extend further.
After calculating the potential impact of STA 89 on all the possible grouping results (paths), we allocate it to Group2. The optimal path is selected as the current state for the allocation of the next five STAs.
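A compact Python sketch of the regrouping procedure described above follows. It implements the described steps on a symmetric hidden traffic table: seeding the groups with the most active STAs, greedy per-STA placement within a window of W STAs, branching only when several groups tie, and committing the cheapest path at the end of each window. The cap max_paths on the number of surviving paths is a hypothetical addition to bound memory, and the sketch is not the authors' implementation.

```python
from typing import List, Tuple


def viterbi_regroup(traffic: List[List[float]], active: List[float],
                    n_groups: int, window: int, max_paths: int = 64) -> List[int]:
    """Return group_of[sta] for every STA.

    traffic[i][j]: hidden traffic between STAs i and j (0 if they hear each other).
    active[i]:     active level of STA i.
    """
    order = sorted(range(len(active)), key=lambda s: -active[s])
    group_of = [-1] * len(active)
    members: List[List[int]] = [[] for _ in range(n_groups)]

    # 1) The n_groups most active STAs each seed an empty group.
    for g, sta in enumerate(order[:n_groups]):
        group_of[sta] = g
        members[g].append(sta)
    rest = order[n_groups:]

    def added_cost(sta: int, g: int, tentative: List[Tuple[int, int]]) -> float:
        # Hidden traffic the STA would add to group g: committed members plus
        # STAs tentatively placed in g earlier on the same path.
        cost = sum(traffic[sta][m] for m in members[g])
        return cost + sum(traffic[sta][m] for m, mg in tentative if mg == g)

    # 2) Process the remaining STAs W at a time.
    for start in range(0, len(rest), window):
        paths: List[Tuple[float, List[Tuple[int, int]]]] = [(0.0, [])]
        for sta in rest[start:start + window]:
            extended = []
            for total, tentative in paths:
                costs = [added_cost(sta, g, tentative) for g in range(n_groups)]
                best = min(costs)
                # Greedy choice per STA; ties branch into several paths.
                for g, c in enumerate(costs):
                    if c == best:
                        extended.append((total + c, tentative + [(sta, g)]))
            extended.sort(key=lambda p: p[0])
            paths = extended[:max_paths]      # hypothetical memory bound
        # 3) Commit the path with the smallest added hidden traffic.
        _, chosen = paths[0]
        for sta, g in chosen:
            group_of[sta] = g
            members[g].append(sta)
    return group_of


if __name__ == "__main__":
    traffic = [[0, 0, 10, 0], [0, 0, 0, 9], [10, 0, 0, 0], [0, 9, 0, 0]]
    print(viterbi_regroup(traffic, active=[10, 9, 1, 1], n_groups=2, window=2))
    # [0, 1, 1, 0]: each hidden pair ends up in different groups
```

In this toy example STA 2 is hidden from STA 0 and STA 3 from STA 1, so the sketch places each of them with the STA it can hear, leaving zero intra-group hidden traffic.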

Figure 21: The Viterbi concept used in the proposed regrouping algorithm.

The proposed regrouping is conducted by the AP, which enables global scheduling but requires reasonably high computation capability at the AP. The complexity of the algorithm is determined by the number of groups 𝑁 and the window size W, and can grow as 𝑁^W in the worst case, when every STA in a window ties on all groups. The conventional overlapped Viterbi sliding window can also be employed to implement the algorithm. If the window size equals the total number of STAs, the result is globally optimal; however, this can create undesirable complexity in a massive WLAN. A proper choice of W is therefore vital, and the trade-off between hidden traffic reduction and system complexity should be carefully considered.
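The complexity concern can be made concrete: if every STA in a window ties on all 𝑁 groups, the number of paths multiplies by 𝑁 per STA, giving up to 𝑁^W paths per window. A short sketch (function name hypothetical) shows how quickly this bound grows for 𝑁 = 4 over the window sizes of 5 to 30 considered in the simulations:

```python
def worst_case_paths(n_groups: int, window: int) -> int:
    """Upper bound on Viterbi paths in one window when every STA ties on all groups."""
    return n_groups ** window


if __name__ == "__main__":
    for w in (5, 10, 20, 30):
        print(f"N = 4, W = {w:2d}: up to {worst_case_paths(4, w):.2e} paths")
```

In practice ties are rare, so the number of surviving paths stays far below this bound, but the exponential worst case is why the window size cannot simply be set to the number of STAs.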

2.3.3 System level simulations

We demonstrate the proposed technique and evaluate its performance via MATLAB-based system level simulations of the SUNSEED WAMS scenarios. We simulate networks containing 500 to 4000 STAs (WAMS devices and smart meters) that are randomly distributed within the coverage radius. The WAMS devices are assumed to be ten times more active than the smart meters, and the number of WAMS devices is one ninth of the number of smart meters. Figure 22 depicts one network layout with a total of 1000 STAs. We consider a single network with low transmit power, which achieves a radius of around 600 m. The main communication parameters are taken from [1]. In particular, the carrier sense threshold is set to -95 dBm and the contention window is within [31, 1023]. The data rate is set to MCS0, with a constant transmit power of 15 dBm and a noise figure of 7 dB. The operating frequency is 900 MHz with 1 MHz bandwidth. RTS/CTS is disabled.

Figure 22: An example of the network layout.

We allocate the STAs into four groups (N = 4) using the centralised Viterbi-like algorithm, with Viterbi windows between 5 and 30. For comparison, we consider simple grouping, where the STAs are grouped only according to geographical regions without considering their potential hidden node impact. Table 4 lists the collision rates achieved when only simple grouping is used. As expected, as the number of groups increases, i.e., as fewer STAs contend at the same time, the collision rate decreases. We are more interested in the difference in collision reduction between the proposed technique and simple grouping. Figure 23 depicts the performance of the regrouping. In the network of 500 STAs, a reduction of approximately 40% in data traffic collisions is obtained by using the proposed technique with a Viterbi window of 10. As the number of STAs increases, the grouping technique achieves larger reductions: with 4000 STAs in the network, a 55% reduction is seen. This also implies that STA regrouping is more critical for larger networks. Collision reduction can be improved further by using larger Viterbi windows, down to 35% of the collided transmissions obtained with simple grouping. However, a large window does not benefit all networks: with a window larger than 20, the performance decreases for networks of 3000 or fewer STAs. We therefore infer that the Viterbi window should be carefully chosen to achieve the best performance of the proposed algorithm. Similar results are obtained with eight groups (N = 8), as depicted in Figure 24; compared with the previous case, the performance improves only slightly. We therefore claim that the collision reduction achieved by the proposed technique is robust and does not depend on the number of groups.

Table 4: Collision reduction by using region based grouping (1000 STAs).

Number of STAs per group    Collision rate
8                           0.19%
16                          0.42%
32                          0.82%
64                          1.54%
128                         3.03%
256                         5.91%
1000 (no grouping)          31.48%

Figure 23: Performance of the Viterbi-like STA regrouping algorithm (N=4).



Figure 24: Performance of the Viterbi-like STA regrouping algorithm (N=8).

2.3.4 References

[1] IEEE P802.11 Task Group AH, “P802.11ah draft 8.0,” May 2016. Available online: http://grouper.ieee.org/groups/802/11/Reports/tgah%20update.htm

[2] T. Adame, A. Bel, B. Bellalta, J. Barcelo, and M. Oliver, “IEEE 802.11ah: the WiFi approach for M2M communications,” IEEE Wireless Communications, vol. 21, pp. 144–152, December 2014.

[3] W. Y. Choi, “Clustering Algorithm for Hidden Node Problem In Infrastructure Mode IEEE 802.11 Wireless LANs,” in 10th International Conference on Advanced Communication Technology (ICACT), vol. 2, pp. 1335–1338, Feb 2008.

[4] M. Shanthi and S. Suresh, “Detecting misbehavior node in Wifi networks by co-ordinated sampling of network monitoring,” in 2014 International Conference on Information Communication and Embedded Systems (ICICES), pp. 1–4, Feb 2014.

[5] S.-G. Yoon, J.-O. Seo, and S. Bahk, “Regrouping algorithm to alleviate the hidden node problem in 802.11ah networks,” Computer Networks, vol. 105, pp. 22–32, 2016.



2.4 Ultra-reliable communication via multiple access networks

The SUNSEED communication architecture is largely based on WAMS nodes being connected through Telekom Slovenije’s mobile networks. In SUNSEED, the communication for the WAMS-SPM nodes in particular is mission-critical: for the state estimation algorithm to produce an accurate view of the power grid state, measurements from the majority of deployed WAMS-SPM nodes should be received within approximately 1 s after being measured. In case of a communication path breakdown, for example a mobile network outage caused by equipment failure or a fractured optical fiber, real-time power grid monitoring will be compromised if the communication solution relies on a single technology only. In SUNSEED we have therefore investigated the possibility of exploiting multiple communication technologies to improve the resilience against failures in the communication between the WAMS-SPM nodes and the core network in which the real-time power measurements are collected. The studies have considered both technologies whose error causes can be regarded as independent, such as LTE (4G) and Wi-Fi, and technologies that may fail due to a common error cause, such as LTE (4G) and UMTS/HSDPA (3G), which are likely to be served from the same base station tower. These studies are documented in the articles [1] and [2]. The following description summarizes the main points; the articles can be consulted for details.

2.4.1 Reliability in communications

In reliability engineering, reliability is the probability that a system can provide uninterrupted delivery of acceptable service for a given mission time [3]. The most basic approach to estimating reliability is to assume exponentially distributed failure times and then to use parallel and series system theory or a continuous-time Markov chain (CTMC) state model to calculate the probability of no system failure during the mission time [12]. However, these methods alone do not account for the typical latency and jitter of communication technologies. In communications, reliability is typically defined as a system's ability to deliver a given amount of information (a data packet) within a certain application-dependent deadline [4].
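As a minimal illustration of this classical approach, assuming exponentially distributed failure times with a constant failure rate λ, a component survives a mission time t with probability R(t) = exp(-λt). A series system survives only if all components survive, and a parallel (redundant) system survives if at least one does, provided failures are independent. The failure rates in the usage lines are hypothetical. As noted above, this approach says nothing about latency or jitter.

```python
import math


def exp_reliability(failure_rate_per_h, mission_h):
    """P(no failure during the mission) for an exponential failure time."""
    return math.exp(-failure_rate_per_h * mission_h)


def series(reliabilities):
    """All components must survive (independent failures)."""
    return math.prod(reliabilities)


def parallel(reliabilities):
    """At least one component must survive (independent failures)."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


# Hypothetical example: two independent links, each failing on average
# once per 1000 h, over a 24 h mission.
r_link = exp_reliability(1e-3, 24.0)
print("series:", series([r_link, r_link]))
print("parallel:", parallel([r_link, r_link]))
```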

Figure 25: Conceptual illustration of latency-reliability function.

To represent the latency characteristics of an interface, we use the latency CDF [4]. In the example in Figure 25, the blue curve is the probability that a packet is delivered from source to destination within a certain deadline. That is, for a given latency 𝑥 on the x-axis, the y-axis shows the reliability, i.e., the probability 𝑃(𝑋 ≤ 𝑥) that the latency 𝑋 does not exceed 𝑥. Such a curve can be produced from network monitoring measurements, e.g., by continually pinging the remote host and recording the success rate and latency.



The shape of the curve depends on two factors: 1) the variability of latency, i.e., the latency distribution, due to factors such as medium access, routing, queuing and processing delays, and 2) the loss probability due to various failures between the two end hosts. Specifically, losses can be caused by transmission errors such as low SINR, network access overload, packet drops and congestion, or by infrastructure failures such as cable fractures, equipment malfunction or power outage. The combined loss probability resulting from transmission errors and infrastructure failures is denoted 𝑃e in the figure. Unless 𝑃e = 0, the curve never reaches 1, meaning that it is technically not a CDF. We therefore refer to it as the latency-reliability function in the rest of this section.
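A minimal sketch of building an empirical latency-reliability function from ping-style monitoring data: lost probes (recorded as `None`, an assumed convention) stay in the denominator, so the curve saturates at 1 − 𝑃e instead of 1. The latency values are hypothetical; converting an RTT into a one-way latency is left to the caller.

```python
from typing import Optional, Sequence


def latency_reliability(samples: Sequence[Optional[float]], x: float) -> float:
    """Empirical F(x) = P(X <= x); lost probes (None) count as never delivered."""
    if not samples:
        raise ValueError("no samples")
    delivered = sum(1 for s in samples if s is not None and s <= x)
    return delivered / len(samples)


def loss_probability(samples: Sequence[Optional[float]]) -> float:
    """Empirical P_e: fraction of probes that were never answered."""
    return sum(1 for s in samples if s is None) / len(samples)


# Hypothetical one-way latencies in seconds; None marks a lost probe.
log = [0.041, 0.038, None, 0.052, 0.047, 0.120, 0.044, None, 0.039, 0.050]
for deadline in (0.040, 0.050, 1.0):
    print(deadline, latency_reliability(log, deadline))
print("1 - P_e =", 1 - loss_probability(log))
```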

2.4.2 System model

We consider an M2M device that needs to communicate reliably with a specific end host, e.g., a monitoring device reporting measurements, status and alarm messages to a control unit. The M2M device has 𝑁 communication interfaces (wired and cellular) available to reach the end host. An example deployment with two cellular interfaces and one wired interface is depicted in Figure 26. Interfaces that are physically separated fail almost independently, whereas cellular connections that share the same base station may have a higher degree of failure correlation. When information is transmitted through the different communication interfaces, the individual data packets are subject to varying delays and packet losses, which can be characterized by the latency-reliability function shown in Figure 25. For the analysis conducted in this work, we assume that the latency-reliability functions of the interfaces are available; they can be obtained from network monitoring measurements of the end-to-end delay.

Figure 26: Multiple paths between M2M device (left) and remote host (right).

2.4.2.1 Transmission Strategies

In this work we consider the following three strategies (see examples in Figure 27):

2.4.2.1.1 Cloning

Cloning is a simple approach to increasing reliability, in which the source device sends a full copy of the payload through each of the 𝑁 available interfaces. Since the receiver needs only one copy to decode the message, this makes the communication robust, but the strategy requires 𝑁 times the resources of a traditional transmission.



2.4.2.1.2 Splitting

Splitting covers the strategies in which, instead of sending a full copy on each interface, only a fraction of the payload is sent on each interface. This allows reliability and latency to be traded off through the selection of the fraction sizes. For this, we assume that the payload is encoded using a fountain code such as the Raptor code [5]. From the payload, we can thus generate a desired number of coded fragments to send through different interfaces. The receiver is able to decode the payload with very high probability as long as it receives coded fragments corresponding to approximately 105% of the initial payload size. We denote this threshold as 𝛾d = 1.05. The coded fragments of a payload that are to be sent on the same interface are sent together in a redundancy packet. For a given payload message, we let the Raptor code generate coded fragments of a relatively small size, e.g., 10 bytes, and the challenge (for weighted splitting) is then to determine how many fragments to assign to each interface. Depending on whether identical or different types of interfaces are used, splitting can be realized through either k-out-of-N splitting or weighted splitting, respectively:

k-out-of-N splitting generates 𝑁 equally sized redundancy packets from the payload, so that the receiver only needs to receive 𝑘 of them to successfully decode the message. This strategy allows reliability and latency to be traded off: large redundancy packets (small 𝑘) lead to higher reliability but longer transmission times, whereas small redundancy packets (large 𝑘) give less error protection but shorter transmission times.

Weighted splitting aims to split the payload across interfaces so that the sizes of the individual redundancy packets are optimized according to a specific objective, such as minimizing the expected overall transmission latency or maximizing the reliability for a given latency constraint. As our analyses show, however, the optimal solution is not trivial to find.
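As a worked example of the fragment bookkeeping, assuming the 10-byte fragment size and 𝛾d = 1.05 mentioned above: a 1500-byte payload corresponds to 150 source fragments, so the receiver needs ⌈1.05 · 150⌉ = 158 coded fragments in total. The sketch converts payload fractions 𝛾𝑖 into per-interface fragment counts and redundancy-packet sizes; rounding up per interface is an assumption and the function names are hypothetical. With three interfaces at 𝛾𝑖 = 0.55, each carries 83 fragments, so any two of them together deliver 166 ≥ 158 fragments, which behaves like a 2-out-of-3 split.

```python
import math

FRAGMENT_BYTES = 10  # coded fragment size assumed in the text
GAMMA_D = 1.05       # decoding threshold: ~105% of the payload must arrive


def fragments_needed(payload_bytes):
    """Coded fragments the receiver needs to decode with high probability."""
    source = math.ceil(payload_bytes / FRAGMENT_BYTES)
    return math.ceil(GAMMA_D * source)


def allocate_fragments(payload_bytes, gammas):
    """Map payload fractions gamma_i to (fragment count, redundancy-packet bytes)."""
    source = math.ceil(payload_bytes / FRAGMENT_BYTES)
    counts = [math.ceil(g * source) for g in gammas]
    return [(c, c * FRAGMENT_BYTES) for c in counts]


print(fragments_needed(1500))  # 158
print(allocate_fragments(1500, [0.55, 0.55, 0.55]))
```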

Figure 27: Transmission strategies, with 2-out-of-3 as example of 𝑘-out-of-𝑁. The time instant 𝜏 is when the payload can be successfully decoded.
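For 𝑁 interfaces with identical latency-reliability functions that fail independently, the reliability of 𝑘-out-of-𝑁 splitting at a deadline 𝑥 follows from the binomial distribution. Each redundancy packet carries a fraction 𝛾d/𝑘 of the payload and arrives in time with probability 𝑝 = 𝐹(𝑥, 𝛾d𝐵/𝑘), and decoding succeeds if at least 𝑘 of the 𝑁 packets arrive: R(x) = Σ_{j=k}^{N} C(N, j) p^j (1 − p)^(N−j). Cloning corresponds to 𝑘 = 1 (with a full uncoded copy per interface). A minimal sketch under these independence assumptions, with 𝑝 supplied by the caller and hypothetical values in the usage lines:

```python
from math import comb


def k_out_of_n_reliability(p, k, n):
    """P(at least k of n independent packets arrive in time), each w.p. p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Hypothetical per-packet in-time probabilities: smaller packets (larger k)
# are assumed to be more likely to meet the deadline.
for k, p in [(1, 0.90), (2, 0.95), (3, 0.97)]:
    print(f"{k}-out-of-3: {k_out_of_n_reliability(p, k, 3):.4f}")
```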

It is necessary to characterize the relationship between payload size and latency distribution, because the duration of a packet transmission usually depends on the packet size. We denote the latency-reliability function of interface 𝑖 as 𝐹𝑖(𝑥, 𝐵). This gives the probability of being able to transmit a data packet of 𝐵 bytes from a source to a destination via interface 𝑖 within a latency deadline 𝑥. In other words, the value of 𝐹𝑖(𝑥, 𝐵) is the achievable reliability (𝑃(𝑋 ≤ 𝑥), cf. Figure 25) for a latency value 𝑥 and payload size 𝐵. In the following, we let $\gamma_i \in [0, \gamma_d]$ denote the fraction of the payload assigned to interface 𝑖. Further, $P_e^{(i)}$ denotes 𝑃e (defined in Figure 25) for the 𝑖th interface.
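Under the additional assumption that the interfaces fail independently, the reliability of a weighted split (𝛾1, …, 𝛾𝑁) at a deadline 𝑥 can be written as a sum over the sets S of interfaces whose redundancy packets arrive in time: R(x) = Σ over S with Σ_{i∈S} γ_i ≥ γ_d of Π_{i∈S} F_i(x, γ_i B) · Π_{i∉S} (1 − F_i(x, γ_i B)). The sketch below enumerates all 2^N subsets, which also illustrates why optimizing the 𝛾-values becomes expensive beyond a handful of interfaces. The latency-reliability function in the usage lines is a hypothetical placeholder, not a measured curve.

```python
from itertools import product
from typing import Callable, Sequence

GAMMA_D = 1.05  # decoding threshold (fraction of the payload that must arrive)


def weighted_split_reliability(
    F: Sequence[Callable[[float, float], float]],
    gammas: Sequence[float],
    x: float,
    payload_bytes: float,
) -> float:
    """R(x) of a weighted split, assuming independently failing interfaces.

    F[i](x, B) is the latency-reliability function of interface i and
    gammas[i] the payload fraction whose coded fragments are sent on it.
    """
    # P(redundancy packet on interface i arrives within x)
    p = [F[i](x, g * payload_bytes) for i, g in enumerate(gammas)]
    total = 0.0
    # Enumerate all 2^N arrival patterns (which packets made the deadline).
    for arrived in product((False, True), repeat=len(gammas)):
        received = sum(g for g, a in zip(gammas, arrived) if a)
        if received >= GAMMA_D - 1e-9:  # enough coded data to decode
            prob = 1.0
            for p_i, a in zip(p, arrived):
                prob *= p_i if a else 1.0 - p_i
            total += prob
    return total


def placeholder_F(x, size_bytes):
    """Hypothetical F_i: every packet arrives in time with probability 0.9."""
    return 0.9


print(weighted_split_reliability([placeholder_F] * 3, [0.55] * 3, 0.7, 1500))
print(weighted_split_reliability([placeholder_F] * 2, [1.05, 1.05], 0.7, 1500))
```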

2.4.2.2 Influence of payload size on latency

As the relationship between payload size and round-trip time (RTT)/latency for the different technologies is needed for the analyses, Telekom Slovenije has provided ping statistics from which the plots in Figure 28 have been created. The plots show the relationship between the packet size 𝐵 and the RTT, from which the one-way latency can usually be well approximated as RTT/2. Since the data exhibit a close to linear relationship, we have performed a linear regression analysis to determine parameters from which the RTT for any packet size can be estimated. In the analysis, we assume that the latency of transmissions of packet size 𝐵 through a specific interface/path is Gaussian distributed with mean

$$\mu = \frac{\alpha B + \beta}{2}\ \text{[ms]} \qquad (13)$$

and, due to lack of information about the distribution, we assume $\sigma = \mu/10$ [ms]. The parameters 𝛼 and 𝛽 characterize the assumed linear relationship between packet size and delay for an interface. Their values are shown in Table 5.
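One plausible way to turn this model into 𝐹𝑖(𝑥, 𝐵), assumed here rather than taken from the text, is to scale the Gaussian CDF by (1 − 𝑃e(𝑖)) so that the curve saturates at the loss floor: 𝐹𝑖(𝑥, 𝐵) = (1 − 𝑃e(𝑖)) · Φ((𝑥 − 𝜇)/𝜎), with 𝜇 from Eq. (13) and 𝜎 = 𝜇/10. The values of 𝛼, 𝛽 and 𝑃e below are hypothetical placeholders (the regression results are those of Table 5, not reproduced here); 𝛼 is in ms per byte, 𝛽 and the deadline 𝑥 in ms.

```python
import math


def mean_latency_ms(alpha, beta, payload_bytes):
    """Eq. (13): one-way latency approximated as half the fitted RTT."""
    return (alpha * payload_bytes + beta) / 2.0


def gaussian_latency_reliability(x_ms, payload_bytes, alpha, beta, p_e):
    """F_i(x, B) = (1 - P_e) * Phi((x - mu) / sigma), with sigma = mu / 10."""
    mu = mean_latency_ms(alpha, beta, payload_bytes)
    sigma = mu / 10.0
    phi = 0.5 * (1.0 + math.erf((x_ms - mu) / (sigma * math.sqrt(2.0))))
    return (1.0 - p_e) * phi


# Hypothetical interface: alpha = 0.05 ms/byte, beta = 40 ms, P_e = 1e-3.
for x in (50.0, 60.0, 70.0):
    print(x, round(gaussian_latency_reliability(x, 1500, 0.05, 40.0, 1e-3), 4))
```

Because 𝜇 grows with 𝐵, evaluating the same function for a smaller payload moves the curve to the left, consistent with the behaviour noted for Figure 29.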

Figure 28: Packet size and RTT relationship for ping measurements in TS mobile network.

Table 5: Linear regression parameters from RTT measurements and assumed reliability values of equipment.



Based on the assumed Gaussian distribution and the approximated mean latency values, the resulting latency-reliability characteristics are shown in Figure 29 for the case of 𝐵 = 1500 bytes; with smaller values of 𝐵, the curves shift towards the left.

Figure 29: Latency-reliability curves 𝐹𝑖(𝑥, 𝐵) for all considered technologies for 𝐵 = 1500 bytes.

2.4.3 Reliability of multi-interface transmissions

In the articles [1] and [2] we present in detail how the reliability of multi-interface transmissions can be evaluated for the considered transmission strategies in different scenarios, both for independent and for correlated failures. For correlated failures, it is necessary to include the failure correlations explicitly in the system model. We therefore consider a specific case study in which two cellular interfaces and one Wi-Fi interface are available. The operation states and possible transitions of this system are depicted in Figure 30. The total system reliability can be calculated by assigning failure and restoration rates in the state model and combining the resulting state probabilities with the latency-reliability characterization of each interface, as described in detail in [1].



Figure 30: CTMC model of states in the three-interface system. Colors indicate the number of interfaces up/down as: green: 3/0, yellow: 2/1, orange: 1/2, red: 0/3. An arrow represents a failure rate in the right direction and a restoration rate in the left direction, e.g., 𝜆𝐶1 and 𝜇𝐶1 between states 1 and 2.
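A simplified sketch of how state probabilities and per-interface latency-reliability values combine, for the Cloning strategy. Instead of solving the full CTMC of Figure 30, it assumes four independently failing and repaired elements (base station BS, cellular interfaces C1 and C2, Wi-Fi W), each a two-state Markov component with steady-state availability 𝜇/(𝜆 + 𝜇); the joint steady state is then the product of these. C1 and C2 are usable only while BS is up, which is what couples their failures. The total reliability is Σ_s π_s · [1 − Π_{i usable in s} (1 − F_i(x))]. All rates and F_i(x) values below are hypothetical, and the element breakdown is an assumption rather than the exact model of [1].

```python
from itertools import product
from math import prod


def availability(fail_rate, repair_rate):
    """Steady-state P(up) of a two-state (up/down) Markov component."""
    return repair_rate / (fail_rate + repair_rate)


def cloning_reliability(avail, f_in_time):
    """Sum over joint up/down states of BS, C1, C2 and W (independent elements).

    avail:     dict element -> steady-state availability
    f_in_time: dict interface -> F_i(x), P(copy on interface i arrives by x)
    """
    elements = ["BS", "C1", "C2", "W"]
    total = 0.0
    for state in product((True, False), repeat=len(elements)):
        up = dict(zip(elements, state))
        pi = prod(avail[e] if up[e] else 1.0 - avail[e] for e in elements)
        # Common cause: cellular interfaces need the base station to be up.
        usable = [i for i in ("C1", "C2") if up[i] and up["BS"]]
        if up["W"]:
            usable.append("W")
        total += pi * (1.0 - prod(1.0 - f_in_time[i] for i in usable))
    return total


avail = {  # hypothetical rates per hour
    "BS": availability(1 / 8760, 1 / 4),
    "C1": availability(1 / 4380, 1 / 2),
    "C2": availability(1 / 4380, 1 / 2),
    "W": availability(1 / 2190, 1 / 8),
}
F_x = {"C1": 0.95, "C2": 0.97, "W": 0.90}  # hypothetical F_i(x) at one deadline
print(cloning_reliability(avail, F_x))
```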

2.4.4 Results and discussion

For the numerical results we consider the different scenarios specified in Table 6.

Table 6: Interface and parameter specifications of scenarios 𝒜, ℬ, 𝒞, 𝒟, and ℰ.

2.4.4.1 Independent interfaces

Initially, we study the simple scenario 𝒜, for which we solved the weighted splitting between two interfaces analytically in [2]. Notice that 𝐥 and 𝐰 are parametrized so that the numerical optimization calculates the expected latency in the same way as the analytical optimization. The results are shown in Figure 31 and show a good visual correspondence between the analytical result and the brute-force search. The brute-force search has a slightly lower expected latency, due to a different weight assignment. We attribute this minor difference to the use of the approximation of 𝔼[max(𝑋𝐴, 𝑋𝐵)] from [6].
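For reference, the expected maximum of two jointly Gaussian latencies can be evaluated with Clark's expression [6]: for 𝑋𝐴 ~ N(μ_A, σ_A²) and 𝑋𝐵 ~ N(μ_B, σ_B²) with correlation ρ, E[max] = μ_A Φ(α) + μ_B Φ(−α) + a φ(α), where a = sqrt(σ_A² + σ_B² − 2ρσ_Aσ_B) and α = (μ_A − μ_B)/a. When a payload is split over two interfaces and both parts are needed, the decoding time is max(𝑋𝐴, 𝑋𝐵), which is why this quantity appears in the expected-latency analysis. A minimal sketch; the function names are hypothetical.

```python
import math


def _pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_max_gaussian(mu_a, sigma_a, mu_b, sigma_b, rho=0.0):
    """Clark's expression for E[max(X_A, X_B)] with X_A, X_B jointly Gaussian."""
    a = math.sqrt(sigma_a**2 + sigma_b**2 - 2.0 * rho * sigma_a * sigma_b)
    if a == 0.0:  # degenerate case: X_A - X_B is a constant
        return max(mu_a, mu_b)
    alpha = (mu_a - mu_b) / a
    return mu_a * _cdf(alpha) + mu_b * _cdf(-alpha) + a * _pdf(alpha)


# Hypothetical one-way latencies in seconds for two interfaces.
print(expected_max_gaussian(0.30, 0.03, 0.25, 0.025))
```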

Figure 31: Reliability results for scenario 𝒜.

With respect to splitting in general, the most important question we seek to answer is whether it is worth spending the additional effort required to find the optimal 𝛾-values for weighted splitting, or whether one of the simpler 𝑘-out-of-𝑁 strategies suffices. Intuitively, if the technologies used are all identical, a 𝑘-out-of-𝑁 strategy will be optimal. But how much better is a weighted scheme in a heterogeneous scenario? To answer this, we study three different scenarios specified in Table 6. The resulting reliabilities of the different transmission strategies for scenario ℬ are shown in Figure 32. The most distinctive observation is that in the low-latency region 𝑥 < 0.3 s, only the 1-out-of-4 and Weighted strategies provide any reliability. However, around the target latency 𝑥 = 0.7 s, both the 2-out-of-4 and 1-out-of-4 strategies achieve higher reliability than the 1-out-of-4 since the payload is split between the interfaces. Nevertheless, the optimal weight assignment used by the Weighted strategy leads to the highest reliability at 𝑥 = 0.7 s. The weight assignment in terms of 𝛾-values is shown in the figure legend. Compared to the 1-out-of-4 (Cloning) strategy, we see a significant improvement in reliability from 0.95 to 0.997 at the target latency 𝑥 = 0.7 s. In terms of latency, at 𝑅 = 0.997, we see a reduction from 1.05 s to 0.7 s.



Figure 32: Reliability results for scenario ℬ.

While scenario ℬ demonstrated how latency can be lowered, the results for scenario 𝒞 in Figure 33 show two examples of latency-reliability trade-offs, obtained with and without the starred 𝑙 and 𝑤 values in Table 6. In both cases the weighted strategy achieves some reliability in the low-latency region (𝑥 < 0.2 s), similar to the 1-out-of-5 strategy, and it matches the reliability of the 2-out-of-5 strategy around 𝑥 = 0.4 s. The difference between the two results is that the latter transmits more redundancy data and achieves higher reliability in the 𝑥 > 0.4 s region.



Figure 33: Reliability results for scenario 𝒞. Note: the target latency 𝑙2 = 0.9 s only applies to the last strategy.

The last results, for scenario 𝒟, are shown in Figure 34 and are interesting since they demonstrate a more mixed data allocation. This results in a reliability of 0.9999 at 𝑥 = 0.5 s, one order of magnitude better than any of the 𝑘-out-of-𝑁 strategies, which only reach 0.999.



Figure 34: Reliability results for scenario 𝒟.

2.4.4.2 Interfaces with failure correlation

For this case study, we consider that, besides failing independently, C1 and C2 can also fail simultaneously due to a common base station (BS) failure. This is reflected in the Markov chain (MC) model results, whereas the ideal results do not account for common-cause failures. To evaluate the resulting performance of the considered transmission modes, actual data on the mean time to repair (MTTR) and the availability levels of the different technologies have been used, and the unspecified failure and restoration rates have been determined from these numbers. The approach to parametrizing the CTMC model is explained in [1]. Table 7 presents the failure and restoration rates used in the numerical evaluation.

Table 7: Case study failure and restoration rates



With the failure and restoration rates fully specified, the resulting latency-reliability performance is calculated using the methods outlined in [1]. The model results have been verified using Matlab-based simulation. We first simulated the transitions between the states of the CTMC model in Figure 30, with exponential sojourn times given by the rates in Table 7. We then replayed the state sequence and, for every 1 min of simulated time, drew a random Gaussian latency value for each interface available in the current state. Depending on the packet fragments required by the strategy, either a transmission latency or a timeout value resulted. The CDF of these values is shown with crosses in Figure 35.
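The simulation procedure described above can be sketched for a single element as follows: exponential up and down sojourn times are drawn from a failure rate λ and a restoration rate μ, the resulting state sequence is sampled once per simulated minute, and a Gaussian latency is drawn only when the element is up (a timeout value otherwise). The rates, latency parameters and timeout value are hypothetical, the sketch is in Python rather than Matlab, and the full verification tracks all states of Figure 30 rather than one element. As a sanity check, the fraction of up samples should approach μ/(λ + μ).

```python
import random


def simulate_latencies(fail_rate, repair_rate, horizon_min, lat_mean_ms,
                       lat_std_ms, timeout_ms=float("inf"), seed=None):
    """One latency sample per simulated minute for a single up/down element.

    fail_rate, repair_rate: exponential rates per minute (lambda, mu).
    """
    rng = random.Random(seed)
    up = True
    next_switch = rng.expovariate(fail_rate)  # time of the first failure
    samples = []
    for minute in range(horizon_min):
        # Replay all state changes that happened up to this sampling instant.
        while next_switch <= minute:
            up = not up
            next_switch += rng.expovariate(fail_rate if up else repair_rate)
        samples.append(rng.gauss(lat_mean_ms, lat_std_ms) if up else timeout_ms)
    return samples


def empirical_reliability(samples, deadline_ms):
    """Fraction of samples that meet the deadline (timeouts never do)."""
    return sum(1 for s in samples if s <= deadline_ms) / len(samples)


# Hypothetical: mean time to failure 1000 min, mean time to repair 10 min.
samples = simulate_latencies(1 / 1000, 1 / 10, 500_000, 50.0, 5.0, seed=1)
print("reliability at 60 ms:", empirical_reliability(samples, 60.0))
print("expected availability:", (1 / 10) / (1 / 10 + 1 / 1000))
```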

Figure 35: Reliability results for scenario ℰ, where we found 𝛾 = 0.55.

In all plots in Figure 35 we see that the Cloning strategy, which uses three times as much bandwidth as a single-interface transmission, achieves the highest reliability in the high-latency region. The impact of failure correlations is visible from the difference between the ideal and MC model curves. For cloning, the difference amounts to more than one order of magnitude at high latency values. This difference results from the fact that both cellular interfaces depend on the base station being operational: in the states where the base station has failed (model states 5, 8, 10, 11, 13, 14, 15, 16), neither C1 nor C2 is operational.¹² The 2-out-of-3 strategy uses only half the total bandwidth of cloning. However, its dependence on at least two working interfaces causes the lack of reliability before 0.2 s. While in the ideal case (without failure correlation) the 2-out-of-3 strategy is able to reduce latency up to almost 0.9 s at 𝑅 = 0.996, the MC model result, which accounts for correlated failures, shows it to be the worst strategy, except in the small interval 0.35–0.42 s where it reaches the same reliability level as cloning. Finally, the Weighted strategy shows the best performance for low latency (𝑥 ≤ 0.5 s), whereas it is only slightly worse than Cloning (ideal) for higher latency values. Notably, the difference between the ideal and MC model results for this strategy is minimal. We explain this by the fact that in the weighted strategy the cellular interfaces inherently depend on each other also in the ideal case, whereas for cloning and 2-out-of-3 the two cellular interfaces are independent in the ideal case but dependent in the MC model. Besides the level of reliability that each strategy can achieve, we also show the efficiency, as nines per 𝐵 bytes transmitted, in Figure 36. While the ideal results show that both the 2-out-of-3 and cloning strategies are better than the weighted strategy, this observation does not hold for the case with correlated failures (MC model). In this case the weighted strategy is the best choice over the whole span of latency values.

¹² The value 𝑃e = 0.99 used for the base station may be high compared to a real-life system; however, the main point of the analysis is to show how such factors can be modeled.
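The efficiency metric can be computed as the number of nines, −log10(1 − 𝑅), divided by the amount of data transmitted in units of the payload size 𝐵. A minimal sketch, assuming that a strategy with payload fractions 𝛾𝑖 transmits Σ𝛾𝑖 · 𝐵 bytes (so cloning over three interfaces counts as 3𝐵); the reliability values in the usage lines are hypothetical.

```python
import math


def nines(reliability):
    """Number of nines, e.g. 0.999 -> 3."""
    return -math.log10(1.0 - reliability)


def efficiency(reliability, gammas):
    """Nines per B bytes transmitted, with gammas the payload fractions sent."""
    return nines(reliability) / sum(gammas)


print("cloning:", efficiency(0.999, [1.0, 1.0, 1.0]))     # 3B transmitted
print("weighted:", efficiency(0.999, [0.55, 0.55, 0.55]))  # 1.65B transmitted
```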

Figure 36: Efficiency results for scenario ℰ.

2.4.5 Conclusions

It is expected that 5G will integrate various communication technologies to support ultra-reliable low-latency communication (URLLC) use cases. In this work we denote this integration as interface diversity and consider different strategies for utilizing multiple interfaces simultaneously to achieve high reliability and low latency. By encoding every payload message using a rateless code, coded fragments can be freely assigned to interfaces, which allows transmission latency and reliability to be traded off. We have considered both static 𝑘-out-of-𝑁 strategies and optimized weighted strategies; the latter are, however, computationally challenging when more than 3-4 interfaces are considered. For evaluating performance, we have proposed an analysis framework that combines traditional reliability models with technology-specific latency probability distributions. The proposed models can be used both for systems with independently failing communication paths and for systems with common error causes, e.g., when cellular technologies reside in the same base station tower. Our main findings are that 1) interface diversity strategies can lower the latency by up to around 40% in some of the considered scenarios, which consider mixes of Wi-Fi, 2G, 3G and 4G technologies; 2) in some cases only the optimized weighted strategy (and not 𝑘-out-of-𝑁) can deliver latency reduction and reliability at low latencies; and 3) the optimized weighted strategy enables fine-tuning of the latency-reliability trade-off for a specific scenario.

2.4.6 References

[1] J. J. Nielsen and P. Popovski, “Latency analysis of systems with multiple interfaces for ultra-reliable M2M communication,” in The 17th IEEE International Workshop on Signal Processing Advances in Wireless Communications. IEEE, 2016, pp. 1–6.

[2] J. J. Nielsen and P. Popovski, “Latency-optimized interface diversity for ultra-reliable low latency communication (URLLC),” submitted to ICC’17, 2017.

[3] M. Rausand and A. Høyland, System reliability theory: models, statistical methods, and applications. John Wiley & Sons, 2004, vol. 396.

[4] E. G. Ström, P. Popovski, and J. Sachs, “5G ultra-reliable vehicular communication,” arXiv preprint arXiv:1510.01288, 2015.

[5] D. J. MacKay, “Fountain codes,” IEE Proceedings-Communications, vol. 152, no. 6, pp. 1062–1068, 2005.

[6] C. E. Clark, “The greatest of a finite set of random variables,” Operations Research, vol. 9, no. 2, pp. 145–162, 1961.



3 Design criteria and approach for joint DSO and telecom operator network solution

3.1 GIS based topology approach

A Geographic Information System (GIS) can be used for a static representation of spatial topological datasets, e.g., an electricity grid network. In Sunseed, however, we also need a highly efficient and dynamic GIS, capable of providing real-time data about energy and communication data flows within local smart grid networks.

In the process of smart grid topology determination, the primary objective of data visualization is to enable a unified view, i.e., an integral presentation of the power grid infrastructure and the communication network infrastructure with its coverage, together with power grid elements and communication network elements, i.e., the assets of the electrical utility and the telecom operator. In other words, the GIS must be capable of simultaneously representing, spatially, the energy network infrastructure and energy flow as well as the communication network infrastructure and data flow.

Figure 37 and Figure 38 show the list of all implemented infrastructure spatial data layers, marked with the Core “Infrastructure” and Workspace “EP” and “TS” tags. Some of the layers are also visible on the map of the field trial location Kromberk. All implemented Infrastructure layers are:

- Elektro Primorska (EP):
  - Power nodes
  - Power lines
  - Distributed generation
  - MV remote controlled switches
  - MV manual switches
  - MV poles
  - LV poles
  - House consoles

- Telekom Slovenije (TS):
  - xDSL and FTTH access nodes and user locations
  - GSM, UMTS, LTE radio access network base stations and cells
  - GSM, UMTS and LTE radio access network coverage



Figure 37: The list of all implemented, and some visible, Infrastructure EP layers at the field trial location Kromberk

Figure 38: The list of all implemented, and some visible, Infrastructure TS layers at the field trial location Kromberk



In the process of smart grid topology determination, the GIS plays a very important role. Besides providing general orientation in space and a unified view of the power grid and communication infrastructure, the GIS offers many useful spatial functions and tools. For broadband radio access network (RAN) connections of grid nodes, particularly valuable functions are visibility and terrain profile analysis. The web-based open-source GIS application, specially designed for and used in RAN planning, optimisation and monitoring at Telekom Slovenije, also offers highly advanced modules for the calculation of radio coverage, as well as modules for statistical processing and visualization of RAN parameters. With all this functionality, the GIS is a very valuable tool for various Sunseed project teams and plays an important role in the process of grid topology definition.

3.2 Design of data and control planes

For transferring data from the smart meters’ locations to the database servers, we use three different technologies, i.e., mobile network, fixed network and satellite network (see Figure 39). The general requirement at the locations where smart meters and/or WAMS devices are installed is the ability to access the (public) internet. Internet access must be reliable and must provide an acceptable bit rate even in peak traffic hours, whereas end-to-end security mechanisms need not be provided by the telecommunication technology itself, since they are part of the Sunseed architecture, which is discussed thoroughly in Section 3.3.

[Figure 39 diagram elements: IP MPLS core network (VPN Sunseed); mobile access network; satellite access network; fixed network (site-to-site VPNs); fixed network (residential & SME); data center (database etc.); laboratory network (monitoring etc.)]

Figure 39: Architecture of telecom operator's network.

Core network

The Sunseed network is a rather complex, geographically distributed network consisting of a data center, laboratory networks where the servers for monitoring communication and measurement devices are located, and various access networks through which measurement sites at remote locations communicate with the data center and the servers in the laboratory network. This is illustrated in Figure 40. The data center, which serves mainly for storing measured and computed data, is physically connected to the telecom operator’s (i.e., Telekom Slovenije’s) MPLS core network via Ethernet (fiber) infrastructure. Similarly, the monitoring servers are connected to TS’s MPLS core network, although they are not part of the data center. Sites where measurements take place are connected to the core network by different access technologies, which are discussed in more detail below. Access technologies can be divided into three categories:

- mobile network: sites connecting via TS’s LTE, 3G or EDGE network, all aggregated on the mobile core’s GGSN (or, specifically, the P-GW in the LTE network) and connected to the MPLS core through the Juniper M320 edge router (Juniper M320 SDP);

- fixed network: TS’s DSL and FTTH sites, all aggregated on the BRAS SmartEdge; the fixed network is also used to connect the DSO’s network to the core MPLS network through the so-called LAR router;

- VPN tunnels: suitable for any site/location where internet access is available but not provided by TS (e.g., satellite links); the tunnels are aggregated on a PFsense firewall, which is connected to the MPLS core through the Juniper M320 edge router (Juniper M320 LAB).

Through VPNs, APNs (mobile network) and VLANs (fixed network), all devices are connected to the same logical network. The same holds for the DSO’s network, which is connected directly to the MPLS core through the ELAS and LAR routers. The ELAS and LAR routers are part of TS’s backbone; they are located at remote locations and cover certain geographical regions. Within the core MPLS network, Sunseed has its own VPN. Edge routers serve as gateways to each of the access networks listed above. Besides the routers mentioned (Juniper M320 SDP, Juniper M320 LAB, BRAS SmartEdge, LAR), there is an additional router (Cisco ASR 9000) to which the data center is connected. For the Sunseed project, the data center runs as a virtual platform. Physically, it is part of TS’s data center, with its own core network infrastructure built around Nexus servers and a Checkpoint firewall. The latter also serves as a VPN endpoint; VPN tunnels terminating on the Checkpoint firewall are used only to access and manipulate data stored in Sunseed’s database, which is also why the data center infrastructure uses public IP addresses. Note (see Figure 40) that the servers are connected through a load balancer, since large amounts of data are expected. There is another data center in the laboratory (BrihtaLAB) where the monitoring systems are installed. This network is connected to the core MPLS network through a PFsense firewall and the Juniper M320 LAB edge router.
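The aggregation chain described above can be summarized as a small lookup structure, which makes the hop sequence from each access category to the data center explicit. The sketch only restates nodes named in this section; the exact ordering of hops inside the MPLS core and the dictionary and function names are assumptions for illustration.

```python
# Access category -> aggregation / edge nodes, as listed in this section.
ACCESS_PATHS = {
    "mobile": ["LTE/3G/EDGE radio access", "GGSN (P-GW in LTE)", "Juniper M320 SDP"],
    "fixed": ["DSL/FTTH access", "BRAS SmartEdge"],
    "vpn": ["third-party internet access (e.g. satellite)", "PFsense firewall",
            "Juniper M320 LAB"],
}
CORE = "TS IP/MPLS core (Sunseed VPN)"
DATA_CENTER_ROUTER = "Cisco ASR 9000"


def path_to_data_center(access):
    """Ordered list of nodes from an access site to the Sunseed data center."""
    if access not in ACCESS_PATHS:
        raise ValueError(f"unknown access category: {access!r}")
    return ACCESS_PATHS[access] + [CORE, DATA_CENTER_ROUTER, "data center"]


for category in ACCESS_PATHS:
    print(category, "->", " -> ".join(path_to_data_center(category)))
```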



Figure 40: Sunseed network topology with focus on core network and data center components.

IP addressing plan

The Sunseed network is its own logical entity based on the IPv4 protocol. The network is identified by the class-B-sized address pool 10.161.0.0/16 and the class-C-sized address pool 10.122.248.0/24. The latter is used for data center components only. The address pool 10.161.0.0/16 is further divided into:

- addresses for access from the mobile network: 10.161.1.1 – 10.161.49.254;
- addresses for access from the fixed network: 10.161.64.1 – 10.161.89.254;
- addresses for access via VPN (e.g., satellite network): 10.161.93.0/24, 10.161.94.0/24, 10.161.126.0/24 and /27 subnets from 10.161.127.0/27 to 10.161.127.224/27;
- data center: 10.161.91.0/24;
- BrihtaLAB (monitoring): 10.161.90.40/29 and 10.161.92.0/24;
- intermediate segments: /30 subnets from 10.161.90.0/30 to 10.161.90.36/30 and from 10.161.90.48/30 to 10.161.90.252/30;
- addresses for the DSO’s network of smart meters: 10.161.128.0/17;
- the rest of the range is not used: 10.161.95.0 – 10.161.127.254.
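The addressing plan above can be checked mechanically with Python's standard `ipaddress` module. The sketch encodes the pools as inclusive address ranges, classifies an address, and verifies that the pools do not overlap. The /30 intermediate segments are merged into the ranges 10.161.90.0–10.161.90.39 and 10.161.90.48–10.161.90.255, and the /27 VPN blocks in 10.161.127.0/24 are merged with 10.161.126.0/24; the pool labels and function names are hypothetical.

```python
import ipaddress

# (pool name, first address, last address), inclusive, following the plan above.
POOLS = [
    ("data center components", "10.122.248.0", "10.122.248.255"),
    ("mobile access", "10.161.1.1", "10.161.49.254"),
    ("fixed access", "10.161.64.1", "10.161.89.254"),
    ("intermediate segments", "10.161.90.0", "10.161.90.39"),
    ("BrihtaLAB (monitoring)", "10.161.90.40", "10.161.90.47"),
    ("intermediate segments", "10.161.90.48", "10.161.90.255"),
    ("data center", "10.161.91.0", "10.161.91.255"),
    ("BrihtaLAB (monitoring)", "10.161.92.0", "10.161.92.255"),
    ("VPN access", "10.161.93.0", "10.161.94.255"),
    ("VPN access", "10.161.126.0", "10.161.127.255"),
    ("DSO smart meters", "10.161.128.0", "10.161.255.255"),
]


def _span(first, last):
    return ipaddress.ip_address(first), ipaddress.ip_address(last)


def classify(address):
    """Return the name of the pool an IPv4 address belongs to."""
    ip = ipaddress.ip_address(address)
    for name, first, last in POOLS:
        lo, hi = _span(first, last)
        if lo <= ip <= hi:
            return name
    return "unassigned"


def pools_disjoint():
    """True if no two pools in POOLS overlap."""
    spans = sorted(_span(first, last) for _, first, last in POOLS)
    return all(prev[1] < cur[0] for prev, cur in zip(spans, spans[1:]))


for addr in ("10.161.10.5", "10.161.90.42", "10.161.200.1", "10.122.248.10"):
    print(addr, "->", classify(addr))
print("pools disjoint:", pools_disjoint())
```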



IP pools for access networks are divided by region and by access technology; in addition, addresses are divided into those for WAN interfaces and those for LAN interfaces. We have:

- region Nova Gorica: - mobile:

WAN: /32 classes from 10.161.1.1/32 to 10.161.1.64/32

LAN: /27 classes from 10.161.10.0/27 to 10.161.17.224/27 - fixed:


LAN: /27 classes from 10.161.70.0/27 to 10.161.73.224/27 - region Tolmin:

- mobile:




LAN: /27 classes from 10.161.74.0/27 to 10.161.77.224/27 - region Razdrto:

- mobile:




LAN: /27 classes from 10.161.78.0/27 to 10.161.81.224/27 - region Koper:

- mobile:




LAN: /27 classes from 10.161.82.0/27 to 10.161.85.224/27 - region “Telekom”:

- mobile:




LAN: /27 classes from 10.161.86.0/27 to 10.161.89.224/27 Mobile network Being geographically wide-spreaded, the mobile network is the most flexible way of connecting measurement locations into the smart grid network. For telecom operator with its own mobile network, this scenario is also more economical than, for example, using satellite network which is even wider spreaded comparing to mobile network. Due to the flexibility of mobile access, measurement sites with mobile connection represents the largest part of sites within Sunseed experimental network while in real production environment we do not expect same situation. As



discussed in WP 2 business models, it is more economical to prefer existing fixed network access accounts where it is possible. Classic mobile networks are primarily optimized to achive high data throughputs, while in case of smart grid (or IoT speaking more generally) we deal with lots of clients which do not claim high bitrates, but have, in certain case, rather very predictable and very low bitrate. Even low bitrates became remarkable when number of clients/sites increases as is expected in such smart grid which afterwards affects access, transport and core network regarding licenses, line capacities and processing capacities to mention just few of them. However, from technical point of view these issues are part of mobile network operator’s daily tasks, while economic impact is more thoroughly discussed in WP 2. When choosing modem for mobile data network we primarily focus on LTE network, while 3G and EDGE technologies must be supported as well since LTE network is still being deployed in certain regions. Insight into coverage is obtained through GIS system. All technologies mentioned are capable to provide enough bitrate required for successful communication in Sunseed network. For Sunseed experimental network Teltonika RUT950 modem with router capabilities integrated have been chosen. Regarding network connectivity we must provide secure access from measuring devices directly to data center which resides in core network, see Figure 41. To provide suitable solution regarding security and optimal bitrate we use private Acess Point Names (APNs) and routing behind mobile station (routing behind MS) function. Like VPN tunnels or VLANs this functions enable extending private network to remote locations and enable ability to route private IP addresses within such network. Each geographical region in which measurement sites communicates via mobile network (Nova Gorica, Razdrto etc.) has its own APN (eg. 
sunseed.ng.ts.si, sunseed.ra.ts.si) and its own private range of IP addresses which are delivered by mobile modem’s (Teltonika) DHCP server. Range of IP addresses is linked to modem’s WAN interface which further enables routing behind MS, ie. router/modem does not perform NAT when it forwards data packets to GGSN. On the other side, GGSN is connected to core network through which Sunseed’s data center is reachablereachable. Relations and routing rules regarding data delivery through mobile network are configured in Radius server’s database where SIM cards and usernames allowed to connect into Sunseed’s APNs are provisioned as well. Using private APNs to build geographically distributed private and secure network represents well-known and reliable way to avoid additional overhead in data communication which is unavoidable in case if OpenVPN, IPsec or some other tunneling L3 transparent protocol would be chosen. When properly designed and managed, mobile networks are known to be very reliable regarding security. This holds true in case of private APNs as well, therefore there is no known issues preventing us not to choose such solution.
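Routing behind MS works only because the gateway knows which modem each measurement LAN sits behind. The modem does not apply NAT, so downlink packets addressed to a WAMS must be forwarded to the modem that owns the corresponding LAN prefix. The sketch below models this provisioning as a table of (APN, WAN /32, LAN /27) entries taken from Figure 41 and resolves a destination by longest-prefix match. We assume these entries would be kept in RADIUS as Framed-IP-Address and Framed-Route attributes (RFC 2865); the text states only that the routing rules are configured in the Radius server’s database. The function name is hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class MobileSite:
    apn: str
    wan: ipaddress.IPv4Network  # modem WAN address (/32)
    lan: ipaddress.IPv4Network  # measurement LAN routed behind the modem (no NAT)

def site(apn: str, wan: str, lan: str) -> MobileSite:
    return MobileSite(apn, ipaddress.ip_network(wan), ipaddress.ip_network(lan))

# Example provisioning entries from Figure 41 (one RUT950 per regional APN)
SITES = [
    site("sunseed.ng.ts.si", "10.161.1.1/32", "10.161.10.0/27"),
    site("sunseed.to.ts.si", "10.161.2.1/32", "10.161.18.0/27"),
    site("sunseed.ra.ts.si", "10.161.3.1/32", "10.161.26.0/27"),
    site("sunseed.kp.ts.si", "10.161.4.1/32", "10.161.34.0/27"),
    site("sunseed.test.ts.si", "10.161.5.1/32", "10.161.42.0/27"),
]

def downlink_next_hop(dst: str) -> Optional[tuple]:
    """Longest-prefix match of dst against all provisioned WAN and LAN prefixes.

    Returns (APN, modem WAN address) that the gateway forwards to, or None when
    dst is not behind any modem (the packet is then routed into the core).
    """
    addr = ipaddress.ip_address(dst)
    best = None
    for s in SITES:
        for prefix in (s.wan, s.lan):
            if addr in prefix and (best is None or prefix.prefixlen > best[0].prefixlen):
                best = (prefix, s)
    if best is None:
        return None
    return best[1].apn, best[1].wan.network_address

print(downlink_next_hop("10.161.42.7"))  # WAMS of the test site -> modem 10.161.5.1
```

Because the WAMS keeps its private source address end to end, the data center sees the real device address. This is also what makes location-aware monitoring possible.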



[Figure 41 diagram: Teltonika RUT950 LTE modems, one per regional APN, each with a WAN /32 and a LAN /27: sunseed.ng.ts.si (10.161.1.1/32, 10.161.10.0/27), sunseed.to.ts.si (10.161.2.1/32, 10.161.18.0/27), sunseed.ra.ts.si (10.161.3.1/32, 10.161.26.0/27), sunseed.kp.ts.si (10.161.4.1/32, 10.161.34.0/27) and sunseed.test.ts.si (10.161.5.1/32, 10.161.42.0/27, WAMS with DHCP address 10.161.42.7/27). The GGSN and the Juniper M320 SDP use 10.161.90.17/30 and 10.161.90.18/30, with routes 10.161.0.0/16 via 10.161.90.18 and 0.0.0.0/0 via 10.161.90.17 towards the core network.]

Figure 41: OSI layer 3 details of mobile access network structure used in Sunseed architecture.

Fixed network

The fixed network is a vital part of every telecom operator, so connecting measurement sites through it is an obvious choice. In the Sunseed project it turned out that only a few sites could be connected this way. In real production, however, we expect the majority of measurement sites to be connected through the fixed network, the target being households that already use fixed connections for triple-play services (see Figure 42). From the technological point of view, we describe here connection via DSL lines and FTTH lines. These differ at the physical layer but are similar in terms of the network connectivity needed for the Sunseed project. The basic principle of connecting measurement devices to the Sunseed network is the same as for the other access technologies, i.e. geographically distributed devices must be connected to the same logical network to be able to send data to the Sunseed data center. As mentioned previously, the BRAS serves as the gateway between the fixed access network and the core MPLS network. The access network on the users’ side of the BRAS provides L2 connectivity, so a dedicated VLAN has been created within the fixed access network. In this architecture, the BRAS (SmartEdge) serves as DHCP server and gateway (MPLS edge router) for the measurement devices connected to the corresponding modems. As stated in section “IP addressing plan”, IP addresses have been divided and are therefore pre-reserved for each location separately. The BRAS identifies a measurement device’s location from the unique ID of the DSLAM port to which the modem is connected. This port ID is communicated in an option field of the DHCP request. If a measurement device’s bitrate needs to be limited, this can also be configured in the BRAS.
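The BRAS behaviour described above can be sketched as a lookup from DSLAM port ID to a pre-reserved address. The text says only that the port ID travels in an option field of the DHCP request. This sketch assumes it is carried in the Agent Circuit ID sub-option of the DHCP Relay Agent Information option (option 82, RFC 3046), a common mechanism for DSL line identification. The port IDs and the reservation table are hypothetical; the addresses are taken from the Nova Gorica fixed LAN pool.

```python
import ipaddress
from typing import Optional

AGENT_CIRCUIT_ID = 1  # RFC 3046 sub-option 1 (Agent Circuit ID)

def parse_option82(value: bytes) -> dict:
    """Split the value of DHCP option 82 into {sub-option code: sub-option data}."""
    subopts, i = {}, 0
    while i < len(value):
        if i + 2 > len(value):
            raise ValueError("truncated sub-option header")
        code, length = value[i], value[i + 1]
        data = value[i + 2:i + 2 + length]
        if len(data) != length:
            raise ValueError("truncated sub-option data")
        subopts[code] = data
        i += 2 + length
    return subopts

# Hypothetical reservations: DSLAM port ID -> address pre-reserved for that location
RESERVATIONS = {
    b"DSLAM-NG-01 atm 1/1/5": ipaddress.ip_interface("10.161.70.2/27"),
    b"DSLAM-NG-01 atm 1/1/6": ipaddress.ip_interface("10.161.70.34/27"),
}

def address_for(option82_value: bytes) -> Optional[ipaddress.IPv4Interface]:
    """Return the reserved address/prefix for the requesting port, or None (no lease)."""
    circuit_id = parse_option82(option82_value).get(AGENT_CIRCUIT_ID)
    return RESERVATIONS.get(circuit_id)

port = b"DSLAM-NG-01 atm 1/1/5"
request_opt82 = bytes([AGENT_CIRCUIT_ID, len(port)]) + port
print(address_for(request_opt82))  # 10.161.70.2/27
```

Keying reservations on the port rather than on the device MAC address means a replaced measurement device at the same location receives the same address without any reprovisioning.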



[Figure 42 diagram: WAMS (DHCP address 10.161.42.7/27) on the LAN 10.161.42.0/27 behind a DSL/FTTH modem, connected through the BRAS SE600 to the core network.]

Figure 42: Connecting measurement devices through the fixed access network. In this scenario a DSL or FTTH modem is needed.

The above scenario is suitable for end users, mainly residential customers and small and medium enterprises. For large enterprises, or for a DSO’s own communication network, a direct connection to the telecom operator’s access network backbone is the more common scenario. In this scenario a tunnel (e.g. IPsec, GRE) is established between the DSO’s endpoint and the telecom operator’s network (see Figure 43). As in the other scenarios, this enables communication with private IP addresses, so no public address space is needed. The DHCP server for devices at the DSO’s site is located in the DSO’s LAN.

[Figure 43 diagram: Elektro Primorska firewall with the EP subnet (10.161.128.1/17) and WAMS subnets 10.161.192.0/24, 10.161.193.0/24 and 10.161.194.0/24, connected via ELAS Kromberk and the fixed access network backbone to LAR Nova Gorica and the IP MPLS core network (VPN Sunseed); interface addresses shown: 10.161.90.49/29, 10.161.90.50/29 and 10.161.90.51/29.]

Figure 43: Connecting DSO's network to telecom operator's network.

In our particular case (Telekom Slovenije and Elektro Primorska), the tunnel is established between the EP firewall and the edge router of the TS access network backbone (ELAS). Data traverse the access network backbone and are delivered into the core MPLS network by the router LAR. On EP’s side, EP’s own DHCP server delivers IP addresses to the measurement devices according to the IP addressing plan. Since all data coming from EP go directly into the telecom operator’s network, no additional security mechanisms are needed.



Satellite network – VPN tunnel

Communication via satellite is possible almost everywhere in Europe, since Europe is well covered by satellite signals. In some cases, however, communication with the satellite is not possible because of physical obstacles, i.e. the view towards the desired satellite is blocked by mountains, high buildings, etc. Telecom operators usually do not operate satellite networks, so this approach is less cost-optimized than using networks that the telecom operator owns and operates. We propose using a satellite link where there is no mobile or fixed network and any extension of those networks would cost more than the satellite link. In our case the satellite network provides only internet connectivity. This differs from the mobile and fixed network scenarios, which are more tightly integrated into the Sunseed infrastructure. It is not a big concern, however, since security is provided by a VPN tunnel. Satellite network operators may also provide L2 connectivity, which would enable integration with the Sunseed infrastructure similar to the approach used for the mobile and fixed networks, but in our case this would mean excessive costs. As mentioned above, data are sent through an OpenVPN tunnel to give WAMSes and smart meters access to the data center. The local endpoint of the tunnel is on an InHand router and the other endpoint is on the PFsense FW in the core network. In the mobile and fixed network scenarios the WAMS is connected directly to the network access element (e.g. a modem). With satellite access, by contrast, there is a router between the modem and the WAMS, because the satellite modem itself has no routing capabilities and network access operates in L3 mode only (in our case). The router obtains a public IP address on its WAN side, which allows it to create VPN or any other connections.
The router’s public IP address does not need to be static for the VPN tunnel, unless the policy of the firewall that terminates the tunnel requires it. Whether a static or a dynamic IP is used is a matter of configuration, which should also take security issues into consideration. On the LAN side of the router, IP addresses are delivered by the router’s DHCP server and must be configured so that they can be routed through the VPN tunnel into the core network. Routing can also be configured to allow access to the satellite modem and to the internet, which is extremely useful for maintaining the satellite modem and link.

Using a VPN adds communication overhead. In our experience the additional bitrate is no more than 10%, which is not significant compared with the bitrate of a standard satellite user profile and the bitrate needed by the measurement equipment at a typical site. Typical satellite user profiles start at 10 Mbit/s downlink and 2 Mbit/s uplink, which is more than enough for our purposes. Attention should also be paid to the total data volume, however, which is usually limited on a monthly basis. Once the quota is exceeded, fair-usage policy principles are applied (typically 128 kbit/s or 256 kbit/s in each direction). In extreme weather conditions (heavy rain, heavy snowfall) the satellite link may fail, while in bad weather its performance is normally degraded, i.e. the bitrate is reduced. This would normally not cause major problems, since a typical measurement site does not generate more than a few hundred kilobits per second. Congestion on the satellite link can also degrade performance. To avoid this, we use a business user profile. Under this profile, traffic from Sunseed measurement sites is prioritized over consumer traffic and therefore has the same priority as VoIP traffic.

Figure 44 depicts the connections in the satellite access network scenario. The primary means of securing the delivery of measured data to the data center (part of the core network) is the VPN, which is established over the satellite internet access. In our case the router obtains a static public IP address. The router also provides an IP address to the WAMS; this address is private and routable only within the Sunseed private/core network.
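The overhead and quota remarks above can be expressed as numbers. The sketch below estimates a site’s monthly data volume and its share of a link. It assumes the ~800-byte WAMS-SPM measurement sent every second in the SUNSEED field trial (see section 4), the 10% upper bound for VPN overhead given above and a 30-day month. MQTT, TLS, TCP and IP headers are ignored, so real volumes will be somewhat higher.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day billing month
VPN_OVERHEAD = 0.10                 # upper bound reported in the text

def monthly_volume_gb(payload_bytes_per_s: float, overhead: float = VPN_OVERHEAD) -> float:
    """Monthly volume in GB (10**9 bytes) for a constant payload rate plus VPN overhead."""
    return payload_bytes_per_s * (1 + overhead) * SECONDS_PER_MONTH / 1e9

def link_load(payload_bytes_per_s: float, link_bit_s: float, overhead: float = VPN_OVERHEAD) -> float:
    """Fraction of a link's bitrate consumed by the site's traffic plus VPN overhead."""
    return payload_bytes_per_s * 8 * (1 + overhead) / link_bit_s

wams_rate = 800  # bytes/s: one ~800-byte WAMS-SPM measurement per second (payload only)
print(f"{monthly_volume_gb(wams_rate):.2f} GB per month")         # 2.28 GB per month
print(f"{link_load(wams_rate, 2e6):.2%} of a 2 Mbit/s uplink")     # 0.35% of a 2 Mbit/s uplink
print(f"{link_load(wams_rate, 128e3):.2%} of a 128 kbit/s limit")  # 5.50% of a 128 kbit/s limit
```

On these assumptions, a single WAMS stream would still fit within a 128 kbit/s fair-usage limit. A site aggregating several devices, or carrying heavier traffic, would approach both the quota and the throttled rate much sooner.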

[Figure 44 diagram: measurement site with WAMS, router (LAN 10.161.129.0/27, VPN-SAT address 10.161.126.9/24) and satellite modem (LAN 193.77.30.0/24 delivered from the satellite ISP); satellite ISP with internet GW, DHCP server and OSS system; PFsense FW with public WAN 89.143.232.140/32, OVPN-GW 10.161.126.1/24 and OPT1 10.161.90.34/30 towards the core network; further labels: 192.168.100.1, DHCP server 10.161.127.0/27.]

Figure 44: Satellite access network and connections on the client side (measurement location) and on the core side (PF Sense FW as a VPN endpoint and gateway to core network via OPT1 interface).

Monitoring the network

A telecom operator typically monitors the CPEs it owns. For Telekom Slovenije this means monitoring DSL/FTTH modems and satellite modems; other equipment at the customers’ sites is not monitored. All access and core network elements are monitored as well, and so are the elements/servers in the data center if we limit ourselves to the Sunseed project network. From the telecom operator’s perspective, the Sunseed communication equipment (i.e. mobile modems, routers, etc.) is owned by the Sunseed project, so a separate system for monitoring these elements has to be set up. A Nagios server, located in the laboratory (see the previous subsections), has been implemented to monitor the remote devices, as illustrated in Figure 45.

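[Figure 45 diagram: TS MPLS network connected via the Juniper M320 LAB to the LAB PFsense FW (route 10.161.0.0/16 via GW 10.161.90.33; LAN 10.161.90.41/29), behind which the Nagios server has address 10.161.90.42/29.]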

Figure 45: Nagios monitoring server's position within Sunseed network.



Monitoring is based on the SNMP protocol. Various parameters are monitored, e.g. the round-trip time between Nagios and the device, uptime, signal strength (for mobile modems), temperature, memory used, etc. Parameter values are recorded and displayed as graphs in the Nagios web GUI, as illustrated in Figure 46 and Figure 47.
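A custom Nagios check for one of the monitored parameters could look like the sketch below. It follows the standard Nagios plugin convention: exit code 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN, and a single output line with optional performance data after `|`. It classifies an LTE signal-strength reading. The thresholds are hypothetical, and the SNMP query is left out; the reading is passed in as an argument instead.

```python
# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

def check_signal(dbm, warn: float = -95.0, crit: float = -105.0):
    """Map an LTE signal-strength reading (dBm, higher is better) to a Nagios result.

    The thresholds are hypothetical examples. Returns (exit_code, output_line);
    the part after '|' is performance data that graphing add-ons can record.
    """
    if dbm is None:
        return UNKNOWN, "UNKNOWN - no SNMP reply from modem"
    if dbm <= crit:
        code = CRITICAL
    elif dbm <= warn:
        code = WARNING
    else:
        code = OK
    return code, f"{STATUS[code]} - signal {dbm:g} dBm | signal={dbm:g};{warn:g};{crit:g}"

def main(argv: list) -> int:
    """Plugin entry point: argv[0] is the reading in dBm (fetched via SNMP elsewhere)."""
    reading = float(argv[0]) if argv else None
    code, line = check_signal(reading)
    print(line)
    return code
```

A wrapper script would call `sys.exit(main(sys.argv[1:]))`, so that Nagios receives the exit code as the service state and stores the performance data for graphs such as the one in Figure 47.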

Figure 46: Entry page of Sunseed Nagios web GUI. One can observe sites which are alive (green ones) and those which are temporarily unavailable (red ones).



Figure 47: Example of graph in which LTE signal strength is displayed.

Monitoring uses the same (logical) network that is used for data transfer, as described previously, i.e. no separate (logical) L2 or L3 networks are used for monitoring.



3.3 Design for secure data transfer and storage

Telecom operators make their networks available as part of the services they offer, and as such they provide solutions that secure their communication channels using mechanisms and credentials under their control. A private VPN (or, specifically, an APN in the mobile network) enables the telecom operator to offer its customer a (logical) virtual network within its own network, which provides an extra level of isolation from the public network. In some cases, however, customer data transiting through the telecom network needs its own protection, under the customer’s control, to ensure that it can be transferred privately through telecom networks. Such protection may be achieved at two levels:

At the network level, using VPNs. With VPNs, the customer data is encapsulated in a tunnel and may transit encrypted through the public network. VPNs generally provide overall protection for the data transiting between two sites, but they do not offer application-specific protection.

Data may also be protected at the transport level, which offers application-specific protection.

Security in Sunseed is addressed at both of these levels: network and transport.

3.3.1 Network access security

At the network level, security in mobile networks is standardized: the user (or device) is authenticated via the SIM card, and encryption is applied both in the radio network and in the transit network. In the Sunseed case, data are in addition isolated in Sunseed’s private APNs. They do not traverse the public internet, since they are delivered directly from the mobile network edge elements into the Sunseed MPLS VPN. APNs in the mobile network can be considered tunnels. They are enabled by the GGSN, which is usually provisioned by a Radius server and its corresponding database. A mobile modem can send data within a given APN only if it knows the APN’s username and password. Provisioning of each APN therefore consists of its credentials, IP addressing and routing on the GGSN. We presume that, as for mobile networks, the level of security provided by the telecom operator is also acceptable for the fixed network. In the fixed access network there is no encryption from the modem to the DSLAM, but each port on the DSLAM/MSAN is authenticated and authorized in the provisioning system. Each user’s data is isolated at the physical level, i.e. each user has their own copper pair, or their own fibre in the FTTH network. Eavesdropping is possible in principle but unlikely. For Sunseed’s measurement sites, a secure data channel is not available by default when communicating via satellite links, where a third party (the satellite ISP) is involved and the communication channel traverses the public internet. The same situation arises when someone wants to connect to the Sunseed database from their office. In both cases a VPN tunnel is used. VPN tunnels for remote measurement sites are created using the open-source application OpenVPN. The server endpoint is the firewall at the edge of the core network, and the client endpoint is the router connected to the satellite modem.
The VPN tunnel should always be up, so if the physical connection fails, the router should be configured to re-establish the tunnel immediately once the connection becomes available again. For persons who want to connect to the Sunseed database, a Checkpoint firewall is used. For that purpose, client VPNs and site-to-site VPNs are used.



Authentication and data protection need to be addressed for network access security in order to control the use of the VPN (a distinct level of authentication is also required at the application level to control application data protection). The OpenVPN server (see Figure 48), or any other VPN concentrator/server (e.g. Checkpoint), is responsible for authenticating and authorizing VPN creation requests, and therefore needs to maintain a database containing the relevant data. Such a database could, however, also be located on another server, and various protocols could be used to access the authentication and authorization data, e.g. Radius, Diameter, Tacacs, etc. There are a few different authentication methods, the two most practical being username and password, and digital certificates. The goal of this procedure is to authenticate each VPN creator/user uniquely, so usernames (and the corresponding passwords) must be unique for each user. Likewise, if digital certificates are used, each user must have a unique certificate. The latter requires setting up a PKI, including a Certificate Authority server responsible for issuing and validating digital certificates. Each user/device should have its own pair of private and public keys, which enables the authentication server to authenticate it uniquely. After the user/device has been authenticated and authorized by the VPN concentrator/server, encryption is applied to keep the transferred data secret. Symmetric encryption keys are usually used for encrypting the data transfer, in contrast to the asymmetric keys (private and public) used in the authentication phase. Symmetric encryption is much faster, i.e. it requires far less computational power than asymmetric encryption. A suitable encryption algorithm must still be chosen, however, and the ciphering keys should be changed periodically to prevent attackers from breaking them.
Nowadays, the AES algorithm is generally considered to provide an appropriate level of security.

[Figure 48 diagram: OpenVPN client on the internet connecting to the OpenVPN server on the PFsense FW (89.143.232.140/32), which gives access to the Sunseed network.]

Figure 48: General principle of connecting a user or site to a network domain using a VPN.

In the particular case of the Sunseed OpenVPN server, which terminates the VPNs from remote measurement sites, users/devices are authenticated by username, password and digital certificate. The device in turn authenticates the OpenVPN server by the server’s digital certificate. There are no special authorization rules, since all sites have the same level of rights, i.e. they have full access to the Sunseed core network, the data center, the monitoring servers and the other measurement devices. Access to other measurement devices could be denied by authorization rules (e.g. access lists) or by routing rules within the network itself. Encryption in the OpenVPN tunnels is performed with the AES algorithm. The configuration used on the client (remote site) side of these tunnels is as follows:

    ; Viscosity client settings (plain comments as far as OpenVPN itself is concerned)
    #viscosity startonopen false
    #viscosity dhcp true
    #viscosity dnssupport true
    #viscosity name
    ; routed tunnel device, kept open across restarts
    dev tun
    persist-tun
    persist-key
    ; data-channel encryption and packet authentication
    cipher AES-256-CBC
    auth SHA1
    ; TLS client of the Sunseed OpenVPN server on UDP 1194, retrying name resolution indefinitely
    tls-client
    client
    resolv-retry infinite
    remote 89.143.232.140 1194 udp
    lport 0
    ; verify the server's identity
    verify-x509-name "OVPNSRV" name
    ns-cert-type server
    ; credentials: username/password prompt, CA certificate, TLS-auth key (direction 1), client certificate and key
    auth-user-pass
    ca ca.crt
    tls-auth ta.key 1
    cert cert.crt
    key key.key

Personal (client) VPN connections are used to access the data for analysis. In this scenario, each person from a Sunseed partner organization who is authorized to access the data center is assigned their own username, password and PIN code. Security is enhanced by a one-time password, which is generated every time the user wants to establish a VPN to the data center. The procedure for establishing a client VPN is as follows:

- the user opens the VPN web page and enters his/her username and password (see Figure 49),
- the user receives a one-time password in his/her mailbox and/or on his/her mobile phone via SMS,
- the user starts the client VPN application and enters the username, and the PIN followed by the one-time password in the password field.

The back-end processes run on an Apache web server, a MySQL database and a FreeRADIUS server. Cisco AnyConnect Secure Mobility Client is used as the client VPN application.
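The server side of the procedure above can be sketched as follows. An OTP is issued after a successful web login, and a later VPN login is accepted only if its password field equals the stored PIN immediately followed by that OTP. In Sunseed this logic sits behind the Apache web server, MySQL database and FreeRADIUS server. The class below is a hypothetical, in-memory illustration of the idea, not their configuration. The OTP length, validity period and single-use policy are assumptions.

```python
import hmac
import secrets

OTP_DIGITS = 6   # assumption
OTP_TTL_S = 300  # assumption: an OTP is valid for 5 minutes

class OtpStore:
    """Hypothetical in-memory OTP store (one pending OTP per username)."""

    def __init__(self) -> None:
        self._pending = {}  # username -> (otp, issued_at)

    def issue(self, username: str, now: float) -> str:
        """Called after the web-page login succeeded; the OTP is then sent by e-mail/SMS."""
        otp = "".join(secrets.choice("0123456789") for _ in range(OTP_DIGITS))
        self._pending[username] = (otp, now)
        return otp

    def verify(self, username: str, stored_pin: str, password_field: str, now: float) -> bool:
        """Check a VPN login whose password field holds the PIN followed by the OTP."""
        entry = self._pending.pop(username, None)  # single use: consumed by any attempt
        if entry is None:
            return False
        otp, issued_at = entry
        if now - issued_at > OTP_TTL_S:
            return False
        # constant-time comparison avoids leaking how many characters matched
        return hmac.compare_digest(password_field.encode(), (stored_pin + otp).encode())
```

Consuming the OTP on every attempt, successful or not, means a guessed or intercepted value cannot be replayed. The user simply requests a new OTP through the web page.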

Figure 49: Starting point for generating the one-time password used as part of the credentials to establish a client VPN.



3.3.2 Security at the transport level

In addition to the network access security described above, application-level security was implemented in Sunseed to protect application data flows at the transport level. Network access security is managed either by the network access providers (such as Telekom Slovenia) or by the project IT team. Application security, by contrast, is usually managed by the application managers. The overview in Figure 50 summarizes the different types of data protection that coexist in the implementation of the Sunseed state estimation use case. This use case involves software components executing in three different computing environments:

WAMS devices are deployed in the field. They publish power measurement data using the mqtt protocol, either via a 4G LTE data connection or via a fixed internet connection secured with a VPN.

The data from the WAMS devices transits via the wireless mobile or fixed access network and then via the IP MPLS core network to reach the Telekom Slovenia data center, as explained in section 3.2. In this center the data is received by an mqtt broker.

The state estimation application is executed in a remote computing center at the JSI facility. In terms of security, the mqtt data originating from the WAMS devices is protected on the wireless part of the data link by the mobile network security layer, which is established using the credentials stored in the SIM card. The data link uses a dedicated APN, which can be considered a type of data tunneling mechanism.

Figure 50: Overview of the network and application security for the Sunseed state estimation use case.



The mqtt connection between the WAMS devices and the mqtt broker is protected at the transport level using the TLS security protocol and client certificates. Client certificates are obtained from the authorization server using a REST API described in SUNSEED’s deliverable D3.4.1 (SUNSEED, 2015). When WAMS are deployed in the field, security bootstrapping will take place automatically (plug-and-play security) in two steps:

When the WAMS device first connects to a communication network, an initial security bootstrap takes place as a one-time operation and results in the creation of long-term credentials, which will be used to protect applications.

Upon first execution of an application on the WAMS (such as the power publishing application) a second security bootstrap (application security bootstrap) takes place resulting in the delivery of the actual credentials which will be used by the application to authenticate.

These two security bootstrap operations are described in SUNSEED’s deliverable D3.4.2. The authorization server also manages publish/subscribe access control using authorization delegation mechanisms, which are likewise described in SUNSEED’s deliverable D3.4.1. On the left-hand side of Figure 50, the state estimation application is executing in the JSI computing center. A site-to-site VPN links this computing center with the Sunseed data center, providing network access security. The state estimation application subscribes to the mqtt data streams published by the deployed WAMS. The mqtt link between the state estimation application and the mqtt broker is protected using the TLS security protocol with client-side certificates. The state estimation application obtains these certificates, as well as the proper authorizations to subscribe to WAMS data, from the authorization server using the protocols described in (SUNSEED, 2015). Figure 50 clearly shows that different security layers are used in Sunseed: an outer network access security envelope carries an inner transport security envelope. The outer envelope makes it possible to manage data protection and access control at the network access level. It provides global data protection for a data link between two sites, which may carry several application data streams, so the data protection is common to all application data. The second data security layer is used to manage:

Authentication (via the delivery of client certificates per application)

Authorization at the application level. In the example provided, this means defining who can publish and subscribe on which topic. Thus, in the use case outlined in Figure 50, WAMS devices will be granted the right to publish power data on a specific mqtt topic, while the state estimation application will be granted the right to subscribe to this data. The data protection resulting from this second level is per application: different application data streams are protected with different credentials.
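The topic-level authorization described above can be illustrated with a small access-control check. It uses MQTT topic-filter matching, where `+` matches exactly one level and `#` matches all remaining levels, including the parent level. The ACL maps a client identity (assumed here to come from the CN of the client’s TLS certificate) and an action to the topic filters it may use. The client names and topic names are hypothetical. For simplicity, subscribe rights are checked per delivered topic rather than per requested filter.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """MQTT topic-filter matching: '+' matches one level, '#' the remaining levels.

    '$'-prefixed system topics are not special-cased in this sketch.
    """
    f_levels, t_levels = topic_filter.split("/"), topic.split("/")
    for i, level in enumerate(f_levels):
        if level == "#":
            return True  # also matches the parent level, e.g. "a/#" matches "a"
        if i >= len(t_levels) or (level != "+" and level != t_levels[i]):
            return False
    return len(f_levels) == len(t_levels)

# Hypothetical ACL: (client identity, action) -> allowed topic filters
ACL = {
    ("wams-0001", "publish"): ["sunseed/wams/0001/power"],
    ("state-estimation", "subscribe"): ["sunseed/wams/+/power"],
}

def is_allowed(client: str, action: str, topic: str) -> bool:
    """True if any filter granted to (client, action) matches the topic."""
    return any(topic_matches(f, topic) for f in ACL.get((client, action), []))

print(is_allowed("wams-0001", "publish", "sunseed/wams/0001/power"))         # True
print(is_allowed("wams-0001", "publish", "sunseed/wams/0002/power"))         # False
print(is_allowed("state-estimation", "subscribe", "sunseed/wams/0002/power"))  # True
```

Each WAMS can therefore publish only on its own topic, while the state estimation application reads all WAMS power topics through one wildcard grant but cannot publish on them.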



4 SUNSEED recommendations for joint DSO and telecom operator networking for smart grids

SUNSEED’s analysis of the LTE wireless cellular network shows that, depending on the required end-to-end delay performance, the LTE system can provide suitable networking capabilities for future smart grid applications. If the end-to-end delay requirement is rather stringent (e.g. up to 1 s), the analysis shows that up to a few hundred (e.g. around 250) smart grid nodes can be supported per LTE cell with a low number of resources reserved at the LTE cellular operator (e.g. 1 PRB). If this delay requirement is relaxed, or the commercial conditions allow more resources to be reserved at the LTE cellular operator, up to a few thousand smart grid nodes can be supported (e.g. around 2000 with 6 reserved PRBs), which makes LTE suitable for many practical deployments. It should be stressed that the LTE scheduler has to be time-based and allocate the available PRBs flexibly to the end nodes in order to optimize delay performance. Additions to the LTE system such as NB-IoT might be attractive because of the low complexity and cost of the end nodes. The analysis shows, however, that the delay increases roughly threefold compared with regular LTE communication. NB-IoT deployments can therefore be used only for non-delay-critical smart grid applications. The telecom LTE operator should consider tuning the inactivity timer of LTE connections so that WAMS-SPM nodes are always connected. In many deployments this timer is set to around 10 s. A WAMS-SPM reporting every 10 s could then, in principle, be forced to go through the access reservation protocol (ARP) each time. As the analysis showed, this procedure has a large signalling overhead, and the overall capacity of an LTE cell would decrease as a result. Setting the timer slightly higher would keep the WAMS-SPMs always connected.
Maintaining the connection requires a slight overhead, but this is negligible compared with the signalling required for the ARP. In the SUNSEED field trial the WAMS-SPMs are configured to send measurements every 1 s. In this case the LTE connection does not time out, and the device stays always connected. The analysis of multi-interface communications showed that the reliability of real-time communications (e.g. for WAMS-SPM reports) can be increased by several orders of magnitude by transmitting simultaneously via different communication technologies. Less capable technologies such as GPRS, EDGE, UMTS or HSDPA may be used to complement, for example, a primary LTE link for increased reliability. In such cases a large payload (such as a WAMS-SPM measurement of ~800 bytes) may be impossible for those complementing technologies to deliver in time. We showed that the payload can instead be encoded and split across the available complementing interfaces, so that each interface carries less data than the original payload. This can significantly reduce the packet transmission latency while still adding some redundancy and thereby increasing overall reliability. Besides the 3GPP cellular technologies investigated and tested in SUNSEED, several other emerging network technologies directly target the support of large numbers of low-capability devices such as the WAMS device. Although LTE provides high capacity for carrying WAMS data, we also consider the need to overprovision the infrastructure in order to provide extra capacity and reliability to the network.
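The payload-splitting idea from the multi-interface analysis can be illustrated with the simplest erasure code. The ~800-byte measurement is split into k chunks and one XOR parity chunk is added. Each of the k + 1 chunks is sent over a different interface, and the payload is rebuilt from any k chunks that arrive in time. This single-parity scheme is our illustrative choice; the coding actually evaluated in SUNSEED may differ.

```python
import operator
from functools import reduce

def _xor(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally long blocks."""
    return bytes(reduce(operator.xor, column) for column in zip(*blocks))

def encode(payload: bytes, k: int) -> list[bytes]:
    """Split payload into k equal chunks (zero-padded) plus one XOR parity chunk."""
    size = -(-len(payload) // k)  # ceiling division
    padded = payload.ljust(size * k, b"\x00")
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return chunks + [_xor(chunks)]

def decode(received: dict[int, bytes], k: int, length: int) -> bytes:
    """Rebuild the payload from any k of the k + 1 chunks (index k is the parity)."""
    if len(received) < k:
        raise ValueError("need at least k of the k + 1 chunks")
    chunks = dict(received)
    missing = [i for i in range(k) if i not in chunks]
    if missing:
        lost = missing[0]  # with k chunks received, at most one data chunk is missing
        chunks[lost] = _xor([chunks[i] for i in range(k + 1) if i != lost])
    return b"".join(chunks[i] for i in range(k))[:length]

measurement = bytes(i % 251 for i in range(800))  # stand-in for an ~800-byte WAMS-SPM report
parts = encode(measurement, k=3)                  # 4 chunks of 267 bytes, one per interface
late = 1                                          # suppose the chunk on interface 1 is late
arrived = {i: p for i, p in enumerate(parts) if i != late}
assert decode(arrived, k=3, length=len(measurement)) == measurement
```

With k = 3, each interface carries 267 bytes instead of 800, and the loss or late arrival of any single chunk can be tolerated. Larger k reduces the per-interface load further, but the scheme still tolerates only one missing chunk.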



There is a positive view on the use of non-3GPP technologies for smart grids, in particular low-power local and wide area networks such as 802.11ah, Sigfox and LoRa. These technologies are designed for emerging IoT applications with low deployment cost. For instance, the DSO can deploy its own local area 802.11ah network for collecting data between the measurement nodes and the data concentrators in dense areas such as blocks of apartments, or where the telecom operator’s LTE network capacity is not guaranteed. A LoRa network can also be deployed cost-effectively on top of the telecom operator’s existing cellular infrastructure. In short, we envisage that a heterogeneous network can achieve optimal performance for smart grid communications. Based on the design plan and the experience gained through the physical implementation of the SUNSEED field trial network, the basic idea is a unified logical network that enables efficient real-time data exchange and real-time monitoring. Unifying the network can also be seen as network convergence, since heterogeneous access network technologies and topologies should be part of such a unified network. To keep the solution open, standardized methods should be used, i.e. the IP protocol as the basis of the joint network. In the joint network, the telecom operator can have either a passive or an active role. In the Sunseed test network, the passive role was observed with the satellite ISP, where an ISP (or telecom operator) provides internet access only. In such a scenario, the core network is under the DSO’s control, while the telecom operator provides only the access technologies. Various IP-based tunneling mechanisms can help create a unified logical network, e.g. IPsec, GRE, APNs (within the mobile network), etc. Multiple telecom operators could join this type of network. In an active role, the telecom operator controls the core network.
The DSO’s network is typically enterprise-grade, while the telecom operator’s network is carrier-grade. The telecom operator would therefore normally have to put some effort into supporting and adapting its network to certain specifics of the DSO’s network. From the joint network’s perspective, the DSO’s network is just one more access technology among the other access networks already operated by the telecom operator. Having control over the whole joint network enables the telecom operator to manage it more flexibly and efficiently, as many existing solutions can be reused, such as security infrastructure and procedures, MPLS technology in the core network, and L2 mechanisms in place of L3 tunneling protocols (less overhead). Thanks to this flexibility, one telecom operator can manage multiple DSO networks within its network, while each DSO has its own logical network entity. Security mechanisms are an integral part of the joint network, protecting both the telecom operator’s network and the DSO’s enterprise network. For internal security within the joint network, it is necessary to define and implement the extent to which particular devices are allowed to communicate with other devices and systems within the network (databases, monitoring system, human users, etc.). Security should incorporate authentication and authorization mechanisms for accessing the network, whether by a smart meter, a database or a human user retrieving data. For network management and maintenance, the IP addressing plan should be prepared carefully. It is strongly recommended that IP addresses be delivered to devices via the DHCP protocol. Ideally, IP addresses would be delivered from a central point, i.e. the core network, which is possible only in a complete layer 2 network. Since this is not always possible (at least not in the mobile network), devices must be appropriately preconfigured according to their location.
In any case, the IP address should reflect the physical location of the device, which is especially helpful for monitoring the network. For this purpose, DHCP options (optional fields carried in DHCP messages) are a helpful means.
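One way to make addresses reflect physical location is to encode the location hierarchy directly in the address, and to let DHCP relay agents tag requests with location information. The sketch below assumes a hypothetical plan 10.<region>.<substation>.<device>/24 (not the SUNSEED plan) and encodes DHCP option 82 (Relay Agent Information, RFC 3046), one DHCP option commonly used to convey where a request entered the network.

```python
from __future__ import annotations

import ipaddress

# Hypothetical addressing plan (not from the SUNSEED design):
# 10.<region>.<substation>.<device>/24, one /24 per secondary substation.
PLAN_PREFIX = 10


def location_subnet(region: int, substation: int) -> ipaddress.IPv4Network:
    """Return the /24 subnet reserved for one substation under the plan."""
    if not (0 <= region <= 255 and 0 <= substation <= 255):
        raise ValueError("region and substation must each fit in one octet")
    return ipaddress.IPv4Network(f"{PLAN_PREFIX}.{region}.{substation}.0/24")


def location_of(address: str) -> tuple[int, int]:
    """Recover (region, substation) from a device address, e.g. for monitoring."""
    octets = ipaddress.IPv4Address(address).packed
    if octets[0] != PLAN_PREFIX:
        raise ValueError("address is outside the planned range")
    return octets[1], octets[2]


def option82(circuit_id: bytes, remote_id: bytes) -> bytes:
    """Encode DHCP option 82 (Relay Agent Information, RFC 3046) with the
    Circuit ID (sub-option 1) and Remote ID (sub-option 2) sub-options."""
    payload = (bytes([1, len(circuit_id)]) + circuit_id
               + bytes([2, len(remote_id)]) + remote_id)
    if len(payload) > 255:
        raise ValueError("option 82 payload exceeds 255 bytes")
    return bytes([82, len(payload)]) + payload
```

For example, location_subnet(3, 17) yields 10.3.17.0/24, and a relay agent at a hypothetical data concentrator could insert option82(b"SS17-port4", b"DC-03") so that the DHCP server and the monitoring system can both map a lease back to its substation.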



Appendix A: LTE and NB-IoT performance comparison for different environments

Figure 51 to Figure 53 show the results of the LTE Random and Time scheduler simulations for the urban, suburban and rural environments. For both scheduler implementations, the delay values differ little between the three environments, except for the Time scheduler with 20 PRBs for SM nodes at high numbers of users. In that case, the delay is highest for the rural environment, followed by the suburban and urban environments.

Figure 51: Comparison of 95th Percentile Maximum Delay for Urban, Suburban and Rural Environments – 1 PRB (LTE Random and Time Scheduler)

Figure 52: Comparison of 95th Percentile Maximum Delay for Urban, Suburban and Rural Environments – 6 PRBs (LTE Random and Time Scheduler)




Figure 53: Comparison of 95th Percentile Maximum Delay for Urban, Suburban and Rural Environments – 20 PRBs (LTE Random and Time Scheduler)

The observations above can be explained by the SINR distribution of the users in the three environments, illustrated in Figure 54. Both the mean and the standard deviation of the users' SINR decrease from urban to suburban to rural. The mean SINR decreases because of the larger cell radius, while the standard deviation decreases because the shadowing standard deviation is smaller. With 20 PRBs, users in the urban environment, having higher SINRs, achieve higher throughputs and are more likely to finish their transmissions quickly, which indirectly reduces the waiting time of the cell-edge users. In the suburban and rural environments, by contrast, users with lower mean SINR need longer transmission times, which indirectly also increases the waiting time of the cell-edge users. This effect becomes less pronounced as the number of available PRBs decreases, because users with good SINR cannot fully exploit their higher achievable throughput when only 1 or 6 PRBs are available, and hence the waiting time of the other users is not significantly affected.
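The waiting-time argument can be illustrated with a deliberately simple model that is not the Random or Time scheduler used in the simulations: users are served one after another, each with all PRBs, at a capped Shannon rate, and a cell-edge user is served last. The PRB bandwidth of 180 kHz is the LTE value; the spectral-efficiency cap, payload size and SINR values are hypothetical.

```python
import math

PRB_BANDWIDTH_HZ = 180e3  # one LTE PRB: 12 subcarriers x 15 kHz
MAX_SPECTRAL_EFF = 4.0    # assumed cap (bit/s/Hz) standing in for the highest MCS


def rate_bps(sinr_db: float, n_prb: int) -> float:
    """Capped Shannon rate of a user holding n_prb PRBs (simplifying assumption)."""
    sinr = 10 ** (sinr_db / 10)
    return n_prb * PRB_BANDWIDTH_HZ * min(math.log2(1 + sinr), MAX_SPECTRAL_EFF)


def cell_edge_delay_s(other_sinrs_db, edge_sinr_db, payload_bits, n_prb):
    """Delay of a cell-edge user served after all other users in FIFO order:
    its waiting time (sum of the others' transmission times) plus its own."""
    waiting = sum(payload_bits / rate_bps(s, n_prb) for s in other_sinrs_db)
    return waiting + payload_bits / rate_bps(edge_sinr_db, n_prb)


if __name__ == "__main__":
    urban = [15, 10, 5, 0]  # hypothetical SINRs (dB), higher mean
    rural = [8, 3, -2, -5]  # hypothetical SINRs (dB), lower mean
    for name, sinrs in (("urban", urban), ("rural", rural)):
        d = cell_edge_delay_s(sinrs, edge_sinr_db=-8, payload_bits=8000, n_prb=20)
        print(f"{name}: {d * 1e3:.2f} ms")
```

With these made-up numbers the cell-edge user's delay is lower for the higher-SINR (urban-like) set, because the users ahead of it finish sooner. The model does not attempt to capture how the effect depends on the number of PRBs.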

Figure 54: SINR Distribution of users in urban, suburban and rural environments.




Figure 55: Comparison of 95th percentile maximum delay for NB-IoT and LTE (1 PRB) in the case of urban, suburban and rural environments.

Figure 55 shows a similar comparison of the 95th percentile maximum delay between NB-IoT and LTE (1 PRB) for the urban, suburban and rural environments. For NB-IoT, the delay is highest in the urban environment, followed by the suburban and rural environments, which is the opposite of the trend observed for LTE. The reason is the extended coverage (down to an SINR of -20 dB) and the low throughput of NB-IoT. Referring to the SINR distribution in Figure 54, a greater proportion of users (e.g. the tail of the distribution below -5 dB) is covered by NB-IoT in the urban environment than in the suburban or rural environments. Furthermore, unlike in LTE, users with good SINR conditions do not experience significantly higher throughputs, because of the limited uplink throughput of NB-IoT. Since the proportion of served cell-edge users is highest in the urban environment, and users with good SINR conditions cannot profit from higher throughput, the overhead due to NB-IoT repetitions is also highest there, which leads to longer transmission and waiting times for users in the urban environment.
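The repetition overhead can be made concrete with a rough estimate. The sketch assumes ideal combining, so that R repetitions add 10·log10(R) dB of effective SINR, and a hypothetical single-transmission SINR requirement of 1 dB; the repetition numbers are those allowed for NPUSCH in 3GPP Release 13. The SINR samples are invented to mimic a wider urban distribution with a deeper low-SINR tail.

```python
import math

# Allowed NPUSCH repetition numbers in NB-IoT (3GPP Rel-13).
NPUSCH_REPETITIONS = (1, 2, 4, 8, 16, 32, 64, 128)
# Hypothetical SINR needed for one transmission; chosen so that 128
# repetitions reach roughly the -20 dB coverage limit quoted above.
REQUIRED_SINR_DB = 1.0


def repetitions_needed(sinr_db: float):
    """Smallest allowed repetition number whose ideal combining gain,
    10*log10(R) dB (assumption), lifts the user to REQUIRED_SINR_DB.
    Returns None if even 128 repetitions are not enough (out of coverage)."""
    for reps in NPUSCH_REPETITIONS:
        if sinr_db + 10 * math.log10(reps) >= REQUIRED_SINR_DB:
            return reps
    return None


def mean_repetitions(sinrs_db):
    """Average repetition number over the users that NB-IoT can serve."""
    reps = [r for r in map(repetitions_needed, sinrs_db) if r is not None]
    if not reps:
        raise ValueError("no user is within NB-IoT coverage")
    return sum(reps) / len(reps)


if __name__ == "__main__":
    urban = [20, 12, 5, -8, -14, -18]  # hypothetical: higher mean, deeper tail
    rural = [8, 4, 0, -3, -6, -10]     # hypothetical: lower mean, narrower spread
    print("urban:", mean_repetitions(urban))
    print("rural:", mean_repetitions(rural))
```

Under these assumptions the urban-like sample needs on average 28.5 repetitions per served user versus about 5.3 for the rural-like sample, because its deep-tail users are still within the -20 dB coverage and must be served with many repetitions.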
