
Optimal Allocation of Smart Meters to Real or Virtual Data Concentrators

Christian Johansson

ICS

Master Thesis

Stockholm, Sweden 2015

XR-EE-ICS 2015:007


Abstract

The Smart Grid is the new, modernized electrical distribution grid. It enables many applications, such as automation, and improves the reliability and efficiency of electrical distribution. A key feature of the smart grid is the AMI (Advanced Metering Infrastructure), the system that measures, collects and analyzes energy use.

When designing an LV/MV AMI smart grid, several communication media can be used; two of them are PLC (Power Line Communication) and GPRS (General Packet Radio Service). Choosing one over the other affects both communication performance and overall economic cost.

This thesis describes a method to optimally assign the smart meter communication devices in an AMI network either to the secondary substation via the power lines (PLC) or to the head-end via GPRS. When a meter is assigned to the secondary substation, the data collected from it is either managed by a Data Concentrator or forwarded by a Gateway towards the head-end, where a Virtual Data Concentrator is located. As an alternative to PLC, GPRS can be used for wireless communication between the smart meters and the Virtual Data Concentrator.

The proposed method uses MatLab to read the user's input data, such as the locations of the smart meters and of the power line links. These data are compiled into a network topology consisting of smart meter nodes and the power line links between them. To simplify comparisons, the network is then clustered into groups, using one clustering method for PLC and another for GPRS.

The cluster data is then compared with packet loss ratio data obtained from simulations or other sources and stored in a repository. These comparisons, together with various constraints set by the user, determine whether each communication option is fit for use.

For each communication type deemed fit for use, the CAPEX and OPEX costs are calculated from the user's input. This thesis uses data acquired from DSOs (Distribution System Operators) to analyze the costs of four network locations and to determine the cheapest assignment for each of them under various settings.


Sammanfattning (Summary)

Smart Grids, or smart electricity networks, are the modern electrical distribution networks of the future. Smart grids enable many applications, such as automation, reliability and efficient electrical distribution. A key feature of smart grids is AMI (Advanced Metering Infrastructure), a system that collects, measures and analyzes energy consumption.

When designing a low- or medium-voltage AMI grid, several media types can be used; two of them are PLC (Power Line Communication) and GPRS (General Packet Radio Service). Choosing one over the other can lead to differences both in communication performance and in economic cost.

This thesis describes a method for optimally assigning the smart electricity meters in an AMI network either to the secondary substation via the electrical lines (PLC) or wirelessly to the head-end via GPRS. When the meters are assigned to the secondary substation, their data is collected by either a Data Concentrator (DC) or a Gateway (GW). A Gateway then forwards the data to the head-end, where a Virtual Data Concentrator (VDC) is located. Alternatively, GPRS is used to communicate wirelessly between the smart meters and the VDC.

The proposed method uses MATLAB to read the user's input data, for example the positions of the smart meters and of the power lines that link them to the secondary substation. The data read is used to build a network topology consisting of smart meters and power lines. To make it easier to compare the created topologies with simulated generic topologies, the network is grouped into clusters. The clustering method differs between PLC and GPRS.

The cluster data is then used for comparison with packet loss ratios obtained from either simulations or other sources. These comparisons determine whether communication over the given media type meets the user's communication requirements.

If the communication capability of the topology is considered sufficient, each communication type has its cost calculated from CAPEX and OPEX. In this thesis, costs and topology data obtained from DSOs (Distribution System Operators) are used to analyze the costs of four electrical networks and to determine the cheapest assignments for each of them under various settings.


Acknowledgement

I would like to express my deepest gratitude to my master thesis supervisor Mikel Armendariz, who gave me invaluable help during my project.

I would also like to thank the ICS department at KTH for providing me with an interesting project and for being very helpful throughout.

I would like to thank the DISCERN project, and in particular UFD (Union Fenosa Distribucion), for their comments and for providing the electrical grids used in this thesis.

My thanks also go to Fariba Aalamifar (University of British Columbia), Yunta Huete

Angel (UFD), Miguel Garcia Lobo (Gas Natural Fenosa Engineering) and prof. Ljiljana

Trajkovic (Simon Fraser University).


Contents
Chapter 1 Introduction
Chapter 2 Background
2.1 The DISCERN Project
2.2 Power Line Communication
2.2.1 The Electrical Grid and the PLC Channel
2.2.2 PLC Standards and Protocols
2.2.3 PLC Simulation Tools
2.3 General Packet Radio Service
2.3.1 GPRS/GSM Network
2.3.2 GPRS Simulation Tools
2.4 Thesis Goals and Objectives
Chapter 3 Method
3.1 The Model
3.2 Input Data
3.3 Topology Creation
3.4 Smart Meter Clustering
3.4.1 PLC Clustering
3.4.2 GPRS Clustering
3.5 Simulation Data Comparisons
3.6 Cost Function
3.6.1 CAPEX and OPEX
3.6.2 Combinatory Cost Function
3.7 Optimization Problem
Chapter 4 Simulations
4.1 PLC Simulations
4.1.1 Channel
4.1.2 Nodes and Net-devices
4.1.3 Protocols and Applications
4.1.4 PLC Simulation Results and Discussion
4.2 GPRS Simulations
Chapter 5 Theory
5.1 K-means Clustering
Chapter 6 Discussion
6.1 Results
6.2 Future Work
References


Chapter 1 Introduction

With increasing use of power around the world, the current electrical grid is being pushed to its limits. By implementing a Smart Grid, the utilities in an electrical grid can communicate and cooperate to improve operations, increase power distribution efficiency, and make better use of renewable energy sources and various automation functions.

The use of smart grids is increasing, but there are many standards, each with its own advantages and disadvantages. Some of these standards are bound to a region, because electrical distribution practices and local laws differ between regions. Until a global standard is established, one needs to choose based on the communication medium and the locality.

The key factor for a successful smart grid is two-way communication between the smart utility meter and the utility company. AMI (Advanced Metering Infrastructure) is the system that collects, measures and analyzes energy usage. Each customer uses a smart meter to measure their energy consumption, and the meter sends the data to the energy supplier, which registers it. The two-way communication makes it possible for the consumer to decrease costs through demand response (adjusting consumption as the electricity price changes, for example during peak load).

For the smart meters to communicate with the energy supplier, or more precisely with the AMI head-end, a communication medium is needed. There are several alternatives, but they can all be categorized as either wired or wireless. This thesis focuses on Power Line Communication (PLC) and General Packet Radio Service (GPRS).


Chapter 2 Background

2.1 The DISCERN Project

The DISCERN project (Distributed Intelligence for Cost-effective and Reliable Solutions) is a collective effort in which DSOs (Distribution System Operators), research institutions and technology providers share information regarding the control and monitoring of distribution networks.

The purpose of the project [14] is to improve the understanding of the complex LV/MV distribution network and its economic viability, and to ensure higher standards of security and reliability. DISCERN's objective is to enhance the European distribution grids and to provide DSOs with tested and validated solutions.

This thesis aims to contribute to the project by analyzing power line communication within the LV/MV distribution network, using network topologies contributed by a DSO. It fits within work package WP6.

2.2 Power Line Communication

2.2.1 The Electrical Grid and the PLC Channel

The idea of communicating through the electrical grid is not new, but only recently has the technique matured enough for practical use on a larger scale. The power lines were originally intended exclusively for power distribution, at the standard mains frequency of 50 or 60 Hz.

Because the power wiring is adapted for normal AC power usage, it has a limited capability to carry higher frequencies. Apart from the limitations of the cables themselves, each country or region has its own laws dictating which frequencies may be used. Transmissions must take place within the license-free frequency bands, which span 3-148 kHz [17] (the narrowband) and 2-30 MHz [18] (the broadband). This regulation exists because PLC is regarded as an unshielded transmission that shares its frequencies with radio.

Higher frequencies allow higher data rates, but at the cost of range. Broadband PLC is usually used for Local Area Networks (LANs), for example within a building; to access the Internet, it then needs a router and another medium such as Ethernet.


Lower frequencies, such as the narrowband mentioned above, offer only low data rates, but they allow transmission over longer distances within the LV/MV (low-voltage/medium-voltage) electrical grid.

Of the two alternatives, the lower frequency range is the most suitable for AMI communication between smart meter and data concentrator. In the EU, only the 3-95 kHz part of this narrowband range is reserved for utility applications; this band is called CENELEC A [5]. The PRIME (PoweRline Intelligent Metering Evolution) protocol uses the higher end of this band, 42-89 kHz [19].

While there is no official standard for PLC communication, PRIME is currently the prevalent protocol in the EU. An alternative to PRIME is the G3 [20] protocol; both have their advantages and disadvantages. In the DISCERN Project framework, and therefore in this thesis, PRIME is used.

As the power lines were not designed for data transmission, they form a harsh environment for communication. The distribution grid behaves as a complex channel whose characteristics depend not only on frequency, but also on time, location and noise. This makes PLC a difficult communication medium to generalize, the main impairments being high signal attenuation and high noise levels. This is especially true in the low frequency bands, where examples of noise are:

Continuous background noise, either time-variant (changing with the line voltage) or time-invariant (constant over long periods, for example thermal noise).

Narrowband noise, for example from broadcast stations.

Impulsive noise, consisting of abrupt impulses of short duration but high amplitude, either synchronous (light dimmers) or asynchronous (switching regulators) to the AC line voltage.

Most of the noise is caused by devices connected to the same power line, but it may also come from nearby sources that are not directly connected to the power line.

Apart from noise, signal attenuation is also a problem for PLC communication. The most obvious cause is the line loss of the power lines themselves: as with every medium, the further a carrier signal travels, the weaker it becomes, and if it becomes too weak it may be drowned by the noise. This effect can be countered with repeaters, either strategically placed along the network or by using the smart meters themselves as repeaters. Another cause of signal attenuation is the impedance of all the loads connected to the power line.

There are other causes of signal attenuation, such as passing through transformers (from LV to MV, for example) and multipath propagation (caused by reflections). Keeping the communication within the LV network (the transformer is likely located in the secondary substation, together with the data concentrator) and operating in the lower frequency band avoids these effects. There is, however, always a probability of impedance mismatch at the power line branching points (where the power lines are extended or fork).


2.2.2 PLC Standards and Protocols

As mentioned, there is no official standard for PLC. Which protocols to consider depends largely on what PLC is used for and where, and the fact that the Smart Grid itself is not standardized does not help. To narrow down the choices, two rival PLC specifications are used in the EU today: PRIME and G3. Both are sets of protocols developed to optimize PLC communication within the narrowband. Generally, PRIME is designed for low-voltage lines with low noise, while G3 is designed for medium-voltage lines. Which of these protocols will "win" may be a political choice, but the parameters used in this thesis are consistent with PRIME.

PRIME is based on Orthogonal Frequency Division Multiplexing (OFDM) and adaptively uses three differential modulation schemes (DBPSK, DQPSK and D8PSK), with or without FEC (Forward Error Correction). It also defines a MAC layer and an IP layer. The MAC layer uses CSMA/CA (Carrier Sense Multiple Access with Collision Avoidance) and ARQ (Automatic Repeat reQuest).
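To illustrate what differential modulation means, the following Python sketch encodes bits in the phase change between consecutive symbols, so that 1, 2 or 3 bits per symbol correspond to DBPSK, DQPSK and D8PSK. It is a generic, noise-free illustration under stated assumptions: the natural-binary bit-to-phase mapping, the function names and the absence of OFDM, FEC and scrambling are simplifications, not the exact mapping of the PRIME specification.

```python
import cmath
import math


def dpsk_modulate(bits, bits_per_symbol):
    """Encode bits as phase changes between consecutive unit-amplitude symbols."""
    if len(bits) % bits_per_symbol:
        raise ValueError("number of bits must be a multiple of bits_per_symbol")
    m = 2 ** bits_per_symbol                  # 2, 4 or 8 phases
    phase = 0.0
    symbols = [cmath.exp(1j * phase)]         # reference symbol
    for k in range(0, len(bits), bits_per_symbol):
        # Natural binary mapping of the bit group to a phase step (assumption;
        # practical systems usually use a Gray mapping).
        group = bits[k:k + bits_per_symbol]
        index = int("".join(str(b) for b in group), 2)
        phase += 2 * math.pi * index / m
        symbols.append(cmath.exp(1j * phase))
    return symbols


def dpsk_demodulate(symbols, bits_per_symbol):
    """Recover bits from the phase difference between consecutive symbols."""
    m = 2 ** bits_per_symbol
    step = 2 * math.pi / m
    bits = []
    for prev, cur in zip(symbols, symbols[1:]):
        delta = cmath.phase(cur * prev.conjugate()) % (2 * math.pi)
        index = round(delta / step) % m
        bits.extend(int(b) for b in format(index, f"0{bits_per_symbol}b"))
    return bits


data = [1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1]
for n, name in ((1, "DBPSK"), (2, "DQPSK"), (3, "D8PSK")):
    assert dpsk_demodulate(dpsk_modulate(data, n), n) == data, name

# A constant channel phase rotation does not affect differential decoding.
rotated = [s * cmath.exp(1j * 0.7) for s in dpsk_modulate(data, 3)]
assert dpsk_demodulate(rotated, 3) == data
```

Because only phase differences carry information, a constant phase rotation of all received symbols leaves the decoded bits unchanged, so the receiver does not need to track the absolute carrier phase.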

Being designed for AMI, PRIME defines Base Nodes and Service Nodes. The Base Node is the data concentrator, which is normally located at the secondary substation. The Service Nodes are the remaining nodes of the subnetwork, which in this case are the smart meters. A Service Node is either 'disconnected', a 'terminal' or a 'switch'. A terminal is registered in the network and ready to communicate with the Base Node. A switch additionally acts as a repeater and communicates with other Service Nodes. It has been shown that the availability of the Service Nodes is very good, although disconnections may occur because of noise variations (during the simulations, "worst case" noise will be added).

The state of a Service Node depends on the network conditions (such as noise and attenuation), which makes it dynamic. In the simulations discussed in later chapters, each smart meter node is considered to be in the switch state.
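The Service Node states can be read as a small state machine. The sketch below is a simplified, hypothetical model: the event names register, promote, demote and unregister describe transitions between the three states named above, while message formats, timers and the Base Node's decision logic are omitted.

```python
from enum import Enum


class NodeState(Enum):
    DISCONNECTED = "disconnected"
    TERMINAL = "terminal"
    SWITCH = "switch"


# Simplified, hypothetical transition table: event -> (required state, new state).
TRANSITIONS = {
    "register": (NodeState.DISCONNECTED, NodeState.TERMINAL),
    "promote": (NodeState.TERMINAL, NodeState.SWITCH),
    "demote": (NodeState.SWITCH, NodeState.TERMINAL),
}


class ServiceNode:
    def __init__(self, name):
        self.name = name
        self.state = NodeState.DISCONNECTED

    def handle(self, event):
        """Apply an event and return the new state."""
        if event == "unregister":                 # e.g. connection lost due to noise
            self.state = NodeState.DISCONNECTED
            return self.state
        required, new_state = TRANSITIONS[event]
        if self.state is not required:
            raise ValueError(f"{event!r} is not allowed in state {self.state.value!r}")
        self.state = new_state
        return self.state

    def can_relay(self):
        """Only a switch repeats traffic for other Service Nodes."""
        return self.state is NodeState.SWITCH


meter = ServiceNode("SM-1")
meter.handle("register")
meter.handle("promote")
assert meter.can_relay()
```

In the simulations described later, every smart meter corresponds to a node that has reached the switch state, i.e. one for which can_relay() returns True.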

Advantages of PLC

PLC infrastructure is already in place, which makes implementation cheaper.

No third-party communication supplier is required, from an electricity company's perspective.

Bandwidth is sufficient for AMI.

Disadvantages of PLC

Technically challenging transmission medium, which is noisy and difficult to model.

Not viable for applications that need higher bandwidth.

2.2.3 PLC Simulation Tools

The simulation tool chosen for the task is NS-3 (Network Simulator 3), together with an externally developed module [4]. NS-3 by itself is a flexible tool consisting of a library of modules, but it does not support PLC by default.


By writing the simulation program in C++ or Python, the programmer can include (import) the modules needed to create a network of communicating nodes operating over a PLC channel.

A more detailed description of the program and of how it is used is given in chapter 4.1.

2.3 General Packet Radio Service

GPRS (General Packet Radio Service) [21] is a packet-oriented mobile data service that uses the GSM network, and is also known as 2.5G. GSM/GPRS is a widely available cellular communication system, mostly used for mobile phones. GPRS throughput and latency depend on how many users share the service.

2.3.1 GPRS/GSM Network

In a cellular system, antennas are installed in a grid of regularly shaped cells (for example hexagons) to cover the area (fig 1). The antennas in the GSM/GPRS network are called BTS (Base Transceiver Stations) and forward packets from the Mobile Stations (MS, in this case the smart meters).

fig 1. Architecture of a cellular network, such as GSM [3].


fig 2. GPRS architecture [8].

The packets received by a BTS are collected by a BSC (Base Station Controller); together, these are commonly referred to as the BSS (Base Station Subsystem). The BSC then forwards the received signal either to the Mobile Switching Center (MSC), if it is a standard GSM phone call, or to the Serving GPRS Support Node (SGSN), if it is GPRS data. For this to work, the BSC needs a Packet Control Unit (PCU), which is either an additional hardware router or incorporated into the BSC.

The SGSN authenticates the source and collects charging information; it can also be viewed as a gateway to services within the local GPRS network. From the SGSN, the packets are forwarded to a Gateway GPRS Support Node (GGSN), a router that acts as the interface to external networks such as the Internet or other GPRS networks. The GGSN can also act as a packet filter and collect tariff information from the external network if needed.

At the BSC/PCU, phone calls are normally given higher priority than GPRS data transmissions, because calls are much more demanding on the network (the channel is continuously occupied during a call). Therefore, the GPRS blocking rate increases with increasing GSM call activity, which in turn increases the overall packet loss.
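The qualitative claim above can be made concrete with the Erlang B model. This is an illustrative assumption, not a model used in this thesis: with N traffic channels in a cell and A erlangs of offered voice traffic, the mean number of channels left idle for GPRS is N - A(1 - B(N, A)), where B is the Erlang B blocking probability. The cell size of 7 channels and the voice loads below are hypothetical.

```python
def erlang_b(channels, offered_erlangs):
    """Erlang B blocking probability, computed with the standard recursion."""
    b = 1.0
    for n in range(1, channels + 1):
        b = offered_erlangs * b / (n + offered_erlangs * b)
    return b


def idle_channels_for_gprs(channels, voice_erlangs):
    """Mean number of traffic channels not carrying voice calls."""
    carried_voice = voice_erlangs * (1.0 - erlang_b(channels, voice_erlangs))
    return channels - carried_voice


# Hypothetical cell with 7 traffic channels and increasing voice load.
for load in (1.0, 3.0, 5.0):
    print(f"{load:.0f} E of voice -> {idle_channels_for_gprs(7, load):.2f} channels left for GPRS")
```

As the voice load grows, fewer channels remain for GPRS on average, so GPRS packets are more likely to wait in, or be dropped from, the PCU buffer.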

From the outside, the MS communicates using standard IP and reaches the end-point via IP. Inside the GPRS network, however, an IP-based GPRS tunneling protocol is used. Between the MS and the SGSN, the Sub-Network Dependent Convergence Protocol (SNDCP) and Logical Link Control (LLC) are used. The main functions of SNDCP are [9]:

Compression and decompression of user data and protocol control information.

Packet Data Protocol multiplexing (saving bandwidth).

Segmentation of N-PDUs (network protocol data units) into LL-PDUs (LLC protocol data units), and re-assembly from LL-PDUs into N-PDUs.

Once SNDCP has performed the compression and data unit conversion, LLC acts as the interface between the network layer (such as the now compressed IP) and the link layer (MAC). LLC provides encryption within the same network, and the encryption is renewed when new external networks are reached.
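The order of operations described above (compress, then segment into LL-PDUs, and the reverse on reception) can be sketched as follows. This is an illustration under assumptions: zlib stands in for the compression algorithms actually standardized for SNDCP, the dictionary-based segment format and the 500-byte default payload size are hypothetical, and header compression and PDP multiplexing are not modelled.

```python
import zlib


def sndcp_segment(n_pdu: bytes, max_payload: int = 500):
    """Compress an N-PDU and split it into numbered LL-PDU payloads."""
    compressed = zlib.compress(n_pdu)         # stand-in for the SNDCP compressor
    segments = []
    for seq, start in enumerate(range(0, len(compressed), max_payload)):
        segments.append({
            "seq": seq,
            "more": start + max_payload < len(compressed),   # more segments follow
            "payload": compressed[start:start + max_payload],
        })
    return segments


def sndcp_reassemble(segments):
    """Reorder LL-PDU payloads, check completeness and restore the N-PDU."""
    ordered = sorted(segments, key=lambda s: s["seq"])
    if [s["seq"] for s in ordered] != list(range(len(ordered))):
        raise ValueError("missing segment in sequence")
    if not ordered or ordered[-1]["more"]:
        raise ValueError("final segment missing")
    return zlib.decompress(b"".join(s["payload"] for s in ordered))


reading = b"meter=SM-1;kwh=12.5;" * 50        # hypothetical meter report
parts = sndcp_segment(reading, max_payload=16)
assert sndcp_reassemble(list(reversed(parts))) == reading
```

Compressing before segmenting means fewer LL-PDUs have to cross the radio link, which is the bandwidth saving the SNDCP functions above aim for.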


fig 3. Layout of the layer 3 GPRS tunneling protocol [8].

The Quality of Service (QoS) offered by GPRS is divided into:

Service Precedence: a three-level priority system with the levels High, Normal and Low. It is used to prioritize packet transmissions during congestion, when low-priority packets are discarded.

Reliability: defines maximum values of packet loss, duplication and corruption.

Delay: the end-to-end transmission time, including all delays within the GPRS network.

Throughput: usually depends on the agreement between customer and supplier, although billing is usually done per packet sent.
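As an illustration of how Service Precedence could be applied when a buffer overflows, the sketch below keeps a bounded buffer and, when it is full, discards the least important packet (the newest one among equal priorities). The drop policy, the class name and the buffer capacity are assumptions for illustration only; how a real operator handles congestion is not specified here.

```python
from collections import deque

# Lower number = more important; follows the High/Normal/Low levels above.
PRECEDENCE = {"high": 0, "normal": 1, "low": 2}


class PrecedenceBuffer:
    """Bounded buffer that discards the least important packet on overflow."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()       # (packet_id, precedence) in arrival order
        self.dropped = []

    def enqueue(self, packet_id, precedence):
        self.queue.append((packet_id, precedence))
        if len(self.queue) > self.capacity:
            # Scan newest-first so that, among equal precedence, the newest is dropped.
            victim = max(reversed(self.queue), key=lambda p: PRECEDENCE[p[1]])
            self.queue.remove(victim)
            self.dropped.append(victim[0])

    def dequeue(self):
        """Serve packets in arrival order (FIFO)."""
        return self.queue.popleft()


buf = PrecedenceBuffer(capacity=2)
buf.enqueue("a", "low")
buf.enqueue("b", "high")
buf.enqueue("c", "normal")     # overflow: the low-priority packet "a" is discarded
print(buf.dropped)             # ['a']
```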

When it comes to billing, the GPRS supplier usually bases the tariff on the data volume or the number of packets sent, but other agreements between customer and supplier can also be made:

Volume: payment is based on the data volume sent, and is proportional to the number of smart meters.

Duration: a given timeslot, with high priority, for the specific AMI data polling. The length of the timeslot is proportional to the number of data polls, which is directly proportional to the number of smart meter devices.

Flat rate: a fixed monthly fee for either unlimited data volume or data volume up to a specified maximum.

Location: the location of the AMI network may have an impact on price.

Time: the price may change with the time of day, as the network load varies.

Quality of Service: costs may be modified depending on the required priority, delay or throughput.
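The sketch below compares the volume-based and flat-rate tariffs for a given number of smart meters. All function names, prices, per-meter data volumes and the data cap are hypothetical placeholders, not figures from this thesis or from any operator; the point is only that the volume cost grows linearly with the number of meters, whereas a flat rate stays constant until its cap is exceeded.

```python
import math


def volume_cost(n_meters, mb_per_meter, price_per_mb):
    """Volume tariff: monthly cost grows linearly with the number of meters."""
    return n_meters * mb_per_meter * price_per_mb


def flat_rate_cost(n_meters, mb_per_meter, monthly_fee, cap_mb=None):
    """Flat rate: fixed fee; treated as unusable (infinite cost) above the cap."""
    if cap_mb is not None and n_meters * mb_per_meter > cap_mb:
        return math.inf
    return monthly_fee


def cheaper_tariff(n_meters, mb_per_meter, price_per_mb, monthly_fee, cap_mb=None):
    volume = volume_cost(n_meters, mb_per_meter, price_per_mb)
    flat = flat_rate_cost(n_meters, mb_per_meter, monthly_fee, cap_mb)
    return "volume" if volume <= flat else "flat"


# Hypothetical figures: 2 MB per meter and month, 0.5 per MB, flat fee 80 up to 500 MB.
for n in (50, 100, 300):
    print(n, cheaper_tariff(n, 2.0, 0.5, 80.0, cap_mb=500.0))
```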


2.3.2 GPRS Simulation Tools

The proposed method can use simulation results from any source, as long as they are provided in the correct input format. Because GPRS networks vary greatly, it can be difficult to set up a single simulation that is accurate for most of the smart meter topologies fed into the clustering method.

The first issue is that, while the smart meters are stationary, the GPRS network shares its buffer with mobile users travelling between the cells. The number of mobile users depends on the location and is typically higher in urban than in rural areas. Normally, the service provider prioritizes GSM (calls) over GPRS in time slot allocation, but this again depends on the service provider and the level of QoS.

The second issue is whether a clustered group of smart meters resides within a single cell or is divided across multiple cells. Even within the same cell, signals may be picked up by other base stations, because of signal strength or because the BTS is busy with GSM calls. The easiest way to handle this issue is either to obtain specific information about a particular GPRS network, or to assume that each cluster lies within a single cell and that its smart meters do not change BTS.

Depending on the service provider, the queuing process may utilize different queuing

schemes such as various versions of FIFO (First In First Out), FCFS (First Come First

Served), EDF (Earliest Deadline First) and SJF (Shortest Job First) amongst others. In

the case of buffer overflow, incoming packets will be dropped and considered lost.
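Since the packet loss ratio is the metric used in the comparisons of the method, it may help to see how buffer overflow translates into packet loss. The following slot-based sketch assumes a drop-tail FIFO buffer with a fixed service capacity per time slot, with arrivals processed before service in each slot; the function name and this timing model are hypothetical.

```python
def drop_tail_plr(arrivals_per_slot, service_per_slot, buffer_size):
    """Packet loss ratio of a drop-tail FIFO buffer in a slotted model.

    In each slot, new packets arrive first; packets that find the buffer full
    are dropped. Then up to service_per_slot packets leave in FIFO order.
    For counting losses only the buffer occupancy matters, so an integer suffices.
    """
    occupancy = 0
    arrived = dropped = 0
    for arrivals in arrivals_per_slot:
        for _ in range(arrivals):
            arrived += 1
            if occupancy < buffer_size:
                occupancy += 1
            else:
                dropped += 1          # buffer overflow: packet is lost
        occupancy -= min(service_per_slot, occupancy)
    return dropped / arrived if arrived else 0.0


# Hypothetical burst: 5 meter reports arrive at once into a 3-packet buffer.
print(drop_tail_plr([5, 0, 0], service_per_slot=1, buffer_size=3))   # 0.4
```

Bursty polling, where many meters report in the same slot, is what drives the loss here; the same average load spread over time would not overflow the buffer.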

2.4 Thesis Goals and Objectives

The two media alternatives used in this thesis are PLC (Power Line Communication) and GPRS (General Packet Radio Service). The main reasons for choosing them are availability and price: for PLC the infrastructure already exists, and GPRS uses the widespread GSM network. Both provide low data rates, which are adequate for AMI purposes.

The purpose of this thesis is to analyze smart meter to data concentrator communication in LV/MV electrical grids. The communication should follow the PRIME protocol, which operates within the CENELEC A band.

Specifically, the goal is to optimally assign the smart meters in a network to either a Data Concentrator located at the secondary substation, a Gateway also located at the secondary substation, or directly to a Virtual Data Concentrator located at the head-end. The first and second options both use the LV/MV PLC network for communication, while the third uses wireless GPRS communication.


Table 1. List of scenarios to be compared.

The goal is to develop a method in which any PLC network topology can be entered into a MatLab program that outputs the smart meter assignments. The assignments are based both on economics and on communication availability: the economic factors are based on CAPEX and OPEX, while the communication availability is derived from PLC and GPRS simulations.


Chapter 3 Method

3.1 The Model

To assign smart meters to a data concentrator or a virtual data concentrator, several tasks need to be done. At the highest level, one inputs topology data into a MatLab program and gets an output of assigned smart meters. At a lower level, the method relies on simulations written in NS-3/C++. An overview of the model is shown in the figure below.

fig 3. Overview model of the method of assigning smart meters.

In total, three tools are used to reach the results: MatLab, Riverbed Network Management (formerly OPNET) and NS-3. In addition, a spreadsheet program such as Excel should be used to create the input data file.

3.2 Input Data

The main purpose of the MatLab program is to combine the various input data and produce the output, the smart meter assignments. Two types of input go into the MatLab program: user data input and program data input.

The user data input consists of LV grid network topology data and various user options. The topology data is presented as Excel spreadsheets with a specific layout, so that the MatLab program can identify and read the correct fields. There are two different files for the topology, one for the households and one for the cable links. The households, which can be seen as smart meter hubs, are identified by position, identification code, network area name and allocated power. The cable links are identified similarly, but with two positions, one for each end of the link. In addition, the position of the secondary substation of each network area must be given, so that the feeders can later be identified. Other user inputs, such as various output options and CAPEX/OPEX costs, are entered directly in the MatLab program.
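A minimal sketch of how the two topology files could be represented and read is shown below. It assumes the spreadsheets have been exported to CSV and uses hypothetical column names (code, area, x, y, power_kw for households; area, x1, y1, x2, y2 for cable links), an assumed unit of kW and a hypothetical area name; the actual spreadsheet layout expected by the MatLab program is not reproduced here.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Household:
    code: str
    area: str
    x: float
    y: float
    power_kw: float       # allocated power (unit assumed to be kW)


@dataclass
class CableLink:
    area: str
    end_a: tuple          # (x, y) of one end of the cable link
    end_b: tuple          # (x, y) of the other end


def read_households(csv_text):
    rows = csv.DictReader(io.StringIO(csv_text))
    return [Household(r["code"], r["area"], float(r["x"]), float(r["y"]),
                      float(r["power_kw"])) for r in rows]


def read_cable_links(csv_text):
    rows = csv.DictReader(io.StringIO(csv_text))
    return [CableLink(r["area"], (float(r["x1"]), float(r["y1"])),
                      (float(r["x2"]), float(r["y2"]))) for r in rows]


# Secondary substation position per network area, needed to identify feeders.
substations = {"AREA1": (0.0, 0.0)}      # hypothetical area name and coordinates

households = read_households("code,area,x,y,power_kw\nHH1,AREA1,12.0,4.5,9.2\n")
links = read_cable_links("area,x1,y1,x2,y2\nAREA1,0,0,12,0\n")
```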


The program data input is the collective name for the data fed into the MatLab program from other programs. It consists primarily of the simulation data matrix created with NS-3 and Riverbed Network Management. Although a user can edit this simulation data, this is not intended and should only be done if better simulation results become available.

The MatLab program is intended to read any network topology, as long as it lies within the simulation parameters (distance to the secondary substation and number of smart meters). If a household is located further from the secondary substation than has been simulated, it is considered 'too far away' for communication.

3.3 Topology Creation

To create a network topology, the MatLab program reads the input data and then creates household nodes and link nodes.

The household nodes (HH-nodes) are positioned according to their X and Y coordinates and are attributed with their allocated power. The allocated power varies greatly from one HH-node to another, depending on the household's occupancy and power consumption. Here we assume that each customer has similar power consumption, so the allocated power of a household indirectly indicates how many customers, and hence smart meters, it contains. With the definitions

$N_j$: approximate number of smart meters in network $j$

$P_{ss,j}$: total allocated power of network $j$

$P_{hh,i}$: allocated power of household $i$ in network $j$

the power per smart meter and the number of smart meters at household $i$ are

$$P_{sm,j} = \frac{P_{ss,j}}{N_j}, \qquad N_{hh,i} = \left\lceil \frac{P_{hh,i}}{P_{sm,j}} \right\rceil$$

Having the exact number and position of the smart meters would be preferable and would yield a more accurate model of the actual network. In the current approach, the number of smart meters deployed is proportional to the allocated power, which gives each household node a density weight. The exact number of smart meters is therefore not important, as long as it is high enough for the weights to be accurate. For the size of the network samples used in this thesis, around 150 smart meters are deployed per sub-network. The number of smart meters deployed at each household node is rounded upwards, which usually makes the total exceed 150.
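The allocation rule above (power per meter P_sm,j = P_ss,j / N_j, and the number of meters per household obtained by dividing its allocated power by P_sm,j and rounding upwards) can be sketched as follows. The sketch assumes that P_ss,j equals the sum of the households' allocated powers; the function name and the small tolerance applied before rounding are illustrative choices, not part of the thesis.

```python
import math


def allocate_smart_meters(household_power, target_meters):
    """Number of smart meters per household, proportional to allocated power.

    household_power maps household code -> allocated power. The total is
    assumed to equal P_ss,j, so P_sm,j = total / N_j.
    """
    total_power = sum(household_power.values())        # P_ss,j
    power_per_meter = total_power / target_meters       # P_sm,j
    # Round upwards as in the text; subtracting a tiny tolerance first avoids
    # adding an extra meter because of floating-point error (e.g. 25.000000000000004).
    return {hh: math.ceil(p / power_per_meter - 1e-9)
            for hh, p in household_power.items()}


meters = allocate_smart_meters({"HH1": 1.0, "HH2": 2.0}, target_meters=4)
print(meters, sum(meters.values()))   # rounding up gives 5 meters instead of 4
```

The example shows why the deployed total is usually slightly above the target N_j: every household with a fractional share gains up to one extra meter.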


Fig 4a. Model of Network 28CZ15 with active household identification codes and the number of allocated smart meters.


Fig 4b. Model of Network 28CGD8 with active household identification codes and the number of allocated smart meters.


Fig 4c. Model of Network 28CCH6 with active household identification codes and the number of allocated smart meters.


Fig 4d. Model of Network 28CCI5 with active household identification codes and the number of allocated smart meters.


To calculate the cable distance from each household node (and its assigned smart meters) to the secondary substation, each household node must be assigned to a nearby link node. A straight cable line is represented by two end points, each of which is categorized as a link node. From the user input data, the MatLab program links the end points by adding them to a matrix, together with the point-to-point distance between them. Furthermore, to connect the cable lines, link nodes that lie on top of each other are linked with a distance of zero. Each household node is assigned to the closest link node, with a margin of 1 meter.

Having assigned each household to a link node, and having calculated the distances between all connected link nodes, the shortest path to the secondary substation is calculated using Dijkstra's algorithm [1]. The worst-case computational complexity of Dijkstra's algorithm is O(|E| + |V| log |V|) when a Fibonacci-heap priority queue is used, where |V| is the number of vertices (nodes) and |E| the number of edges (links).
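A minimal Python sketch of the topology steps above, assuming link nodes are given as (x, y) coordinates and cable lines as index pairs. The names `build_link_graph`, `nearest_link_node` and `dijkstra`, and the 1 m tolerance for treating two link nodes as coincident, are illustrative assumptions rather than the thesis's MatLab implementation. It uses an adjacency list instead of a matrix, and a binary heap (`heapq`), which gives O((|E| + |V|) log |V|) rather than the O(|E| + |V| log |V|) bound quoted above, which requires a Fibonacci heap.

```python
import heapq
import math

def build_link_graph(coords, segments, tol=1.0):
    """coords: (x, y) per link node; segments: (a, b) index pairs, one per cable line.
    Returns an adjacency dict node -> list of (neighbour, metres)."""
    adj = {i: [] for i in range(len(coords))}
    for a, b in segments:
        d = math.dist(coords[a], coords[b])  # point-to-point cable length
        adj[a].append((b, d))
        adj[b].append((a, d))
    # Connect cable lines: link nodes lying on top of each other get a zero-length edge
    for a in range(len(coords)):
        for b in range(a + 1, len(coords)):
            if math.dist(coords[a], coords[b]) < tol:
                adj[a].append((b, 0.0))
                adj[b].append((a, 0.0))
    return adj

def nearest_link_node(coords, point):
    """Index of the link node closest to a household position."""
    return min(range(len(coords)), key=lambda i: math.dist(coords[i], point))

def dijkstra(adj, source):
    """Shortest cable distance from source (the secondary substation) to every node."""
    dist = {node: math.inf for node in adj}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

coords = [(0, 0), (30, 40), (30, 40), (30, 100), (60, 40)]  # node 0: secondary substation
adj = build_link_graph(coords, [(0, 1), (2, 3), (2, 4)])   # three straight cable lines
hh_node = nearest_link_node(coords, (31, 98))
print(hh_node, dijkstra(adj, 0)[hh_node])  # link node 3, 110.0 m of cable
```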

The feeder nodes are also part of the topology. Feeder nodes are link nodes that begin at the secondary substation and branch outwards. The difference between feeder nodes and link nodes is that feeder nodes are not linked when they lie on top of each other. This allows the power lines to be separated by feeder, which is important during the PLC clustering.

3.4 Smart Meter Clustering

When analyzing the communication capabilities of a smart meter, one needs to consider its surrounding environment, which differs depending on the medium.

3.4.1 PLC Clustering

In PLC, the smart meters communicate through the electrical grid towards the secondary substation, where there is either a data concentrator or a gateway. A smart meter shares its medium with the other smart meters on the same feeder, at various distances from each other. Apart from the channel characteristics, such as background noise levels and frequency band, the main concerns regarding communication quality are channel congestion and signal attenuation. Channel congestion is caused by additional traffic from other communication devices. The received signal quality is mainly degraded by cable distance, which attenuates the signal, and by channel noise. To account for these two issues, the distance from each smart meter to the secondary substation and the density of nearby smart meters are investigated. To find the distance and density of the smart meters in a network, the data points representing the smart meter positions can be clustered.

There are several clustering methods that can be used for grouping the smart meters. Going by position alone can cause two smart meters to be considered to share the same medium even though they do not, so the first step is to separate the set of smart meters into feeder clusters. The number of feeder clusters corresponds to the number of feeder nodes populated with smart meters. Each smart meter is assigned to a feeder cluster by searching for a path from the smart meter to every feeder node; the feeder node to which a path exists is the feeder it belongs to. In some cases there are paths from one smart meter to several feeder nodes, most often because the input data places a link node of one feeder on top of a link node of another feeder. In this case the shortest path is used, but the consequence is that both feeders can end up in the same feeder cluster, which may or may not be intended. To prevent this, the link cable edges should be placed more than one meter apart.
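The feeder assignment can be sketched as below, as a minimal Python illustration under stated assumptions: the graph is an adjacency list in which feeder nodes are not joined to one another, and `make_adj`, `assign_feeders` and the example network are hypothetical. Distances from every feeder node are computed with Dijkstra's algorithm, and each smart meter's link node goes to the reachable feeder with the shortest path. The example reproduces the merging problem described above: an accidental zero-length link between the two feeders pulls node 5, which belongs to feeder 3, into feeder 0.

```python
import heapq
import math

def make_adj(n_nodes, edges):
    """Undirected adjacency list from (a, b, metres) tuples."""
    adj = {i: [] for i in range(n_nodes)}
    for a, b, w in edges:
        adj[a].append((b, w))
        adj[b].append((a, w))
    return adj

def shortest_distances(adj, source):
    """Dijkstra's algorithm with a binary heap; unreachable nodes stay at infinity."""
    dist = {node: math.inf for node in adj}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

def assign_feeders(adj, feeder_nodes, meter_nodes):
    """Map each meter's link node to the reachable feeder node with the shortest path."""
    dist_from = {f: shortest_distances(adj, f) for f in feeder_nodes}
    assignment = {}
    for m in meter_nodes:
        reachable = [(dist_from[f][m], f) for f in feeder_nodes if dist_from[f][m] < math.inf]
        assignment[m] = min(reachable)[1] if reachable else None  # None: no path to any feeder
    return assignment

# Feeder 0: nodes 0-1-2, feeder 3: nodes 3-4-5; nodes 2 and 5 accidentally coincide
adj = make_adj(6, [(0, 1, 20.0), (1, 2, 15.0), (3, 4, 10.0), (4, 5, 30.0), (2, 5, 0.0)])
print(assign_feeders(adj, [0, 3], [2, 4, 5]))
```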

After the network is divided into feeder clusters, the smart meters within each feeder are still scattered. To divide them into groups by proximity, K-means clustering is used. Several other clustering methods could be used, such as hierarchical clustering, distribution-based clustering and density-based clustering. K-means, a centroid-based clustering method, is chosen for its simplicity and widespread use.

One of the prerequisites of K-means is choosing the number of clusters K. This can be done by several methods, such as the elbow method or the silhouette method, applied case by case. In this thesis, K has been determined by experimenting with different values. There are a few things to consider when choosing K:

- K must be less than or equal to the number of smart meters.
- Too small a K results in smart meters lying too far away from their centroid.
- Too large a K can result in empty clusters, or in clusters being split into parts that lie too close to each other.

The MatLab program automatically reduces the chosen K until the first requirement is fulfilled, so the challenge lies in the other two points. One suggestion is to use the “rule of thumb” [2]

$$K \approx \sqrt{n/2} \qquad \text{(f 3.4.1-1)}$$

with n as the number of objects (smart meters or households).

When analyzing congestion, one needs to consider the other clusters sharing the same feeder. The number of populated clusters in a feeder is hereafter called Q and is used during the simulations. Q ranges from zero, when there are no smart meters on a feeder, to a maximum of K.

$$0 \le Q \le K \qquad \text{(f 3.4.1-2)}$$

As the simulations are done independently of the MatLab program, K must not exceed the largest simulated Q. If K is larger, the program will try to read outside the repository bounds and exit. This is normally a problem when feeders get merged and an unusually large n enters (f 3.4.1-1). It can be handled either by capping K or by increasing the simulated Q. As merging should be an anomaly, the program caps K:

$$K = \min\left(K,\ Q_{sim}\right) \qquad \text{(f 3.4.1-3)}$$

where $Q_{sim}$ is the largest Q that has been simulated.
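Formulas (f 3.4.1-1) and (f 3.4.1-3) together can be sketched as a small Python function. This is an illustration, not the thesis's code: the name `choose_k` is hypothetical, rounding upwards is assumed (the GPRS examples in Section 3.4.2 round up), and the additional cap at n enforces the first requirement in the list above. For GPRS, where no Q cap applies, `q_sim` is left as `None`.

```python
import math

def choose_k(n_objects, q_sim=None):
    """Rule-of-thumb K = ceil(sqrt(n/2)) (f 3.4.1-1), capped at Q_sim (f 3.4.1-3) and at n."""
    if n_objects < 1:
        return 0
    k = math.ceil(math.sqrt(n_objects / 2))
    k = min(k, n_objects)      # K must not exceed the number of objects
    if q_sim is not None:
        k = min(k, q_sim)      # PLC: K may not exceed the largest simulated Q
    return k

print(choose_k(150, q_sim=3))  # PLC feeder with ~150 meters: capped to 3
print(choose_k(61))            # GPRS, 61 households: 6
```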


For the network samples provided for the thesis, $Q_{sim}$ is set to 3, which also sets K in K-means to 3. This K produces at most three clusters within each feeder cluster, depending on the data set. Also, when determining Q, the populated clusters must be of similar size to be counted. This prevents very small clusters from being treated as larger than they are in the simulation, and vice versa. If a PLC cluster is about the same size as the other two clusters combined, Q is reduced from 3 to 2.

When all smart meters and households have been put into PLC clusters, each meter has its density, distance and Q attributes set to the average of its cluster. These attributes are compared with the repository PLC simulation results to see whether the communication quality is acceptable.

Fig 5. PLC clusters of Network 28CZ15 with K = 3.

3.4.2 GPRS Clustering

When communicating via a wireless medium such as GPRS, there is no need to consider the electrical grid's cable lines. Here, the smart meters communicate point-to-point with a nearby BTS (Base Transceiver Station), which in turn forwards the data towards the head-end where a virtual data concentrator is operating.

Apart from the GSM channel characteristics such as noise, the smart meter communication is susceptible to congestion from other communication devices sharing the same channel, especially nearby ones. Since GPRS uses a cellular GSM network, smart meters in different parts of the grid may be located in different cells.

Because of this, K-means clustering can be used to group the smart meter data points. K-means creates a centroid for each cluster; the centroid is the mean position of all the points (smart meters or households) in the cluster. For smart meters, the centroid represents their mean position; for households, it can be regarded as a center of mass with the allocated power as weights. The centroid of each K-means cluster is used to calculate the mean point-to-point distance from the cluster to the BTS. This distance affects signal attenuation and is, together with the density of the cluster's smart meters, one of the variables used to look up the repository GPRS simulation results.

When determining K for the GPRS K-means clustering, one could use the “rule of thumb” discussed in the PLC clustering section. However, since all smart meters in the same household share its location, the number of households is used in formula (f 3.4.1-1):

$$K \approx \sqrt{n/2} \qquad \text{(f 3.4.2-1)}$$

where n is the number of active households.

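Below are examples from the four networks provided for the thesis; the calculated K for each is rounded upwards.

28CCH6: 19 households, $K \approx \sqrt{19/2} \approx 3.08 \rightarrow 4$

28CCI5: 34 households, $K \approx \sqrt{34/2} \approx 4.12 \rightarrow 5$

28CGD8: 15 households, $K \approx \sqrt{15/2} \approx 2.74 \rightarrow 3$

28CZ15: 61 households, $K \approx \sqrt{61/2} \approx 5.52 \rightarrow 6$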

After K has been determined, the number of smart meters at each household node is used as a weight, so that the centroids are positioned according to the density of smart meters.
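A minimal weighted K-means (Lloyd's algorithm) sketch in Python illustrating this step, with the number of smart meters per household node as the weight and Euclidean distance. The function `weighted_kmeans`, the explicit initial centroids (MatLab's `kmeans` chooses its own starting points by default) and the BTS position are assumptions for illustration. An empty cluster keeps its previous centroid, which is one way the number of populated clusters can end up smaller than K. The final line computes the centroid-to-BTS distance used for the repository lookup.

```python
import math

def weighted_kmeans(points, weights, init_centroids, max_iter=100):
    """Lloyd's algorithm where each point (household) counts with its weight (meters)."""
    centroids = [tuple(c) for c in init_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each household joins the nearest centroid (Euclidean distance)
        labels = [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: weighted mean (center of mass) of each cluster
        new_centroids = []
        for c, old in enumerate(centroids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            total = sum(weights[i] for i in members)
            if total == 0:
                new_centroids.append(old)  # empty cluster keeps its centroid
                continue
            new_centroids.append(tuple(
                sum(weights[i] * points[i][dim] for i in members) / total
                for dim in range(2)))
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids

households = [(0, 0), (4, 0), (100, 100), (104, 100)]
meters = [3, 1, 1, 1]          # smart meters per household node
labels, centroids = weighted_kmeans(households, meters, [(0, 0), (100, 100)])
bts = (1, 300)                 # hypothetical BTS position
print(labels, [round(math.dist(c, bts), 1) for c in centroids])
```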

Figure 6 shows the plot of the 28CCI5 network, where K was calculated to be 5. As can be seen, only 4 clusters are created because of the positions of the data points.

Fig 6. GPRS clusters of Network 28CCI5, K = 5.


3.5 Simulation Data Comparisons

One of the reasons for clustering the smart meters is to simplify an otherwise complex

topology into entities consisting of Density (D), Distance (L) and number of adjacent

clusters (Q). These three values can then be compared to a repository matrix of

simulation results, to acquire the corresponding packet drop rate of the cluster.

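| Q = i | Distance L1 | Distance L2 | Distance L3 |
|---|---|---|---|
| Density D1 | P11 | P12 | P13 |
| Density D2 | P21 | P22 | P23 |
| Density D3 | P31 | P32 | P33 |
| Density D4 | P41 | P42 | P43 |

Table 2. An example of a small repository of packet losses $P_{LD}$ for Q = i; rows are densities D and columns are distances L.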

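If the packet loss rate of the cluster is above the threshold set in the user input data, the option is considered too poor for communication, which is expressed by the loss factor

$$S_{LDQ} = \begin{cases} 0, & P_{LDQ} \le P_{th} \\ \infty, & P_{LDQ} > P_{th} \end{cases} \qquad \text{(f 3.5-1)}$$

where $P_{LDQ}$ is the repository packet loss for the cluster's distance L, density D and Q, and $P_{th}$ is the user-defined threshold. This factor is added to the smart meter's cost function, so that the option will never be chosen as the least-cost option. If all alternatives are infinite, the smart meter is set to inactive and assigned to neither a data concentrator nor a virtual data concentrator.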
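The repository lookup and (f 3.5-1) can be sketched in Python as follows. This is a hedged illustration: the repository is assumed to be a dictionary keyed by (Q, D, L) with packet loss in percent, the cluster's averaged D and L are assumed to have been rounded to simulated values beforehand, and `loss_factor`, the 90 % threshold and the example entries are hypothetical. A combination that was never simulated (for instance a distance beyond the simulated range, the “too far away” case) is treated as unusable.

```python
import math

def loss_factor(repository, q, d, l, threshold):
    """S_LDQ from (f 3.5-1): 0 if the packet loss is acceptable, infinity otherwise."""
    packet_loss = repository.get((q, d, l))
    if packet_loss is None:  # not simulated, e.g. household "too far away"
        return math.inf
    return 0.0 if packet_loss <= threshold else math.inf

# Hypothetical repository entries: (Q, D, L) -> packet loss in percent
repo = {(1, 3, 25): 84.0, (1, 75, 230): 100.0}
print(loss_factor(repo, 1, 3, 25, threshold=90.0))    # acceptable -> 0.0
print(loss_factor(repo, 1, 75, 230, threshold=90.0))  # above threshold -> inf
```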

3.6 Cost Function

3.6.1 CAPEX and OPEX

The cost of assigning a smart meter differs depending on the scenario and on the CAPEX and OPEX values in the user input.

CAPEX (capital expenditure) is the cost of buying fixed assets or adding to existing fixed assets. This category includes costs such as the deployment of data concentrators and smart meters, infrastructure upgrades and hardware upgrades. One could also use the term transitional CAPEX, which is the cost difference between the old system and the new one. For a company that already owns the power lines, the transitional CAPEX for PLC cabling purchases would be zero.

OPEX (operating expense) is the ongoing cost of maintaining the network and its communication, for example maintenance, tariff and operation costs.

This thesis focuses on three different scenarios (Table 1), or communication topology setups. Each scenario has its own set of CAPEX and OPEX cost values, which are set as input variables for the cost function (chapter 3.6.2), or TCO (Total Cost of Ownership).

The first scenario, where the smart meters communicate to a data concentrator via

PLC, may have the following costs to assess.

Scenario 1

| CAPEX | OPEX |
|---|---|
| Deployment of a Data Concentrator at the secondary substation. | PLC infrastructure maintenance. |
| Deployment of a smart meter for each customer. | Data concentrator and smart meter hardware maintenance. |
| PLC infrastructure upgrades. | |
| Data Gathering & Com. analysis tools | Maintenance/development of the tools |
| Virtual Data Concentrator | Maintenance/development of VDC |

The second scenario would be similar to the first, but with a smaller CAPEX cost

because of the cheaper gateway instead of a data concentrator at the secondary

substation.

Scenario 2

| CAPEX | OPEX |
|---|---|
| Deployment of a Gateway at the secondary substation. | PLC infrastructure maintenance. |
| Deployment of a smart meter for each customer. | Gateway and smart meter hardware maintenance. |
| PLC infrastructure upgrades. | |
| Data Gathering & Com. analysis tools | Maintenance/development of the tools |
| Virtual Data Concentrator | Maintenance/development of VDC |

In the third scenario, the communication devices bypass the PLC infrastructure by communicating directly with a Virtual Data Concentrator via GPRS provided by a third-party supplier. The general difference from the other scenarios is a lower CAPEX but a higher OPEX. The party offering GSM/GPRS network access may have different payment plans; here we consider either a fixed fee per data message or a fixed annual fee for sufficient communication capabilities.

Scenario 3

| CAPEX | OPEX |
|---|---|
| Deployment of a smart meter for each customer. | Smart meter hardware maintenance. |
| Virtual Data Concentrator setup. | GSM/GPRS service provider tariff. |


For the thesis, the CAPEX and OPEX values used have been provided by a DSO. Without disclosing the actual numbers, the costs included in the CAPEX/OPEX calculations are as follows.

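| Cost item | DC (PLC) CAPEX | DC (PLC) OPEX | GW (PLC) CAPEX | GW (PLC) OPEX | GPRS CAPEX | GPRS OPEX |
|---|---|---|---|---|---|---|
| SWITCH | X | - | X | - | X | - |
| MV Supervisor | X | - | X | - | X | - |
| SCADA + DMS | X | X | X | X | X | X |
| GCT | X | X | X | X | - | - |
| MDMS | X | - | X | - | X | - |
| Smart meter | X | X | X | X | X | X |
| Virtual Data Concentrator | - | - | X | X | X | X |
| Power Analysis Tool | X | X | X | X | - | - |
| Network Information System | - | - | - | - | - | - |

Table 3. The CAPEX/OPEX data parameters used in the case study.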

Furthermore, each CAPEX and OPEX cost is divided into a per-smart-meter part and a per-substation part. For this thesis, the “CAPEX per substation” cost of scenario 2 (gateway) is set to 30 % of that of scenario 1. That said, in the case study the largest cost is not the “CAPEX per substation” but the “CAPEX per smart meter”, so the difference in cost is small even with the additional CAPEX and OPEX of the “Virtual Data Concentrator”.

3.6.2 Combinatory Cost Function

The cost function combines CAPEX, OPEX and the loss factor:

$$C_{ij} = \mathrm{CAPEX}_{ij} + \mathrm{OPEX}_{ij} + S_{ij} \qquad \text{(f 3.6-1)}$$

$$\mathrm{CAPEX}_{ij} \ge 0 \qquad \text{(f 3.6-2)}$$

$$\mathrm{OPEX}_{ij} \ge 0 \qquad \text{(f 3.6-3)}$$

where i is the smart meter and j the scenario (Table 1). $\mathrm{CAPEX}_{ij}$ and $\mathrm{OPEX}_{ij}$ are the sums of the costs listed in Table 3, and variations thereof depending on the user, and $S_{ij}$ is the loss factor $S_{LDQ}$ of smart meter i's cluster under scenario j. As can be seen, the loss factor S decides whether the cost C is infinite or not, according to (f 3.5-1).

3.7 Optimization Problem

The task of optimally assigning the smart meters in the smart grid AMI network can be described as an assignment problem. In general terms, an assignment problem is the task of finding a maximum- or minimum-weight matching in a bipartite graph. A weighted bipartite graph is a graph in which each edge has an assigned value (weight), which in this case is the combinatorial cost from (f 3.6-1).


fig 7. Assignment Problem, agents to the left assigned to tasks to the right.

In an assignment problem, agents (in this case the clusters) are assigned to tasks (in this case DC, GW or GPRS) so that the total of the costs $c_{ij}$ is minimized. A semi-assignment problem normally deals with optimally assigning m tasks to n agents so that each task is assigned to one agent. Relaxing the semi-assignment problem lifts the latter constraint, allowing more than one agent to be assigned to a task and also allowing a task to have no agent assigned. Mathematically, this relaxed linear semi-assignment problem [7] can be defined as follows.

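$$\min \sum_{i \in I} \sum_{j \in J} c_{ij} x_{ij} \qquad \text{(f 3.7-1)}$$

subject to

$$\sum_{j \in J} x_{ij} = 1, \quad \forall i \in I \qquad \text{(f 3.7-2)}$$

where

$$x_{ij} \in \{0, 1\} \qquad \text{(f 3.7-3)}$$

$$c_{ij} = C_{ij} \qquad \text{(f 3.6-1)}$$

$$i \in I,\ j \in J \qquad \text{(f 3.7-4)}$$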

Smart meter i belongs to the set of smart meters I and is assigned to task j within the set of tasks (scenarios) J. Formula (f 3.7-2) together with (f 3.7-3) states that each smart meter can only be assigned to one task. By removing the original constraint of the semi-assignment problem,

$$\sum_{i \in I} x_{ij} = 1, \quad \forall j \in J,$$

it is possible for a task (scenario) to have more than one agent (smart meter) assigned to it.

The relaxed linear semi-assignment problem is a special case of the transportation problem, which is a special case of the minimum-cost flow problem, which in turn is a special case of linear programming (LP). In a linear programming problem, a series of linear constraints defines a feasible region, and optimally assigning the smart meters means finding the minimum possible cost within this region.

Because of the relaxations, the Hungarian method, which is often used for combinatorial optimization, cannot be applied. However, since more than one agent per task is allowed, minimizing the cost becomes as simple as choosing the cheapest option for each smart meter, since this minimizes the sum (f 3.7-1). That is not quite enough, though, because an additional constraint is implemented: only one of the tasks can be used per network. This makes the practical method of finding the minimum cost simple: sum the costs of assigning all smart meters to each task and select the cheapest option.
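The selection rule above can be sketched in Python as below. The data layout (per-scenario lists of per-meter CAPEX, OPEX and loss factor, plus an optional per-substation cost reflecting the split described in Section 3.6.1), the function name `choose_scenario` and all numbers are hypothetical and only for illustration. Meters that are infinite in every scenario are set inactive and left out of the sums, as described in Section 3.5; any other infinite cost makes that scenario's network total infinite, so it is never selected.

```python
import math

def choose_scenario(capex, opex, loss, per_substation=None):
    """Pick the single scenario j with the lowest network total of C_ij (f 3.6-1).

    capex, opex, loss: dict scenario -> list of per-meter values (same meter order)
    per_substation: optional dict scenario -> one-off cost per secondary substation
    """
    per_substation = per_substation or {}
    scenarios = list(capex)
    n_meters = len(capex[scenarios[0]])
    # C_ij = CAPEX_ij + OPEX_ij + S_ij for every meter i and scenario j
    cost = {j: [capex[j][i] + opex[j][i] + loss[j][i] for i in range(n_meters)]
            for j in scenarios}
    # A meter whose every alternative is infinite is inactive and excluded
    inactive = {i for i in range(n_meters)
                if all(math.isinf(cost[j][i]) for j in scenarios)}
    totals = {j: sum(c for i, c in enumerate(cost[j]) if i not in inactive)
                 + per_substation.get(j, 0.0)
              for j in scenarios}
    best = min(totals, key=totals.get)
    return best, totals, inactive

inf = math.inf
best, totals, inactive = choose_scenario(
    capex={"DC": [10, 10, 10], "GW": [10, 10, 10], "GPRS": [6, 6, 6]},
    opex={"DC": [2, 2, 2], "GW": [3, 3, 3], "GPRS": [8, 8, 8]},
    loss={"DC": [0.0, 0.0, inf], "GW": [0.0, 0.0, inf], "GPRS": [0.0, inf, inf]},
    per_substation={"DC": 20, "GW": 6, "GPRS": 0},
)
print(best, totals, inactive)  # GW is cheapest; meter 2 is inactive
```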


Chapter 4 Simulations

4.1 PLC Simulations

The PLC simulations are done using NS-3 (Network Simulator 3) [10], a discrete-event network simulator built using C++ and Python. The default NS-3 library does not include PLC support, so an external module named “PLC Software” [4] is used for the PLC-specific properties.

NS-3 is fundamentally a C++ object system, with node objects sending packets to each other via a channel. In the simulations, the smart meters and the data concentrator are created as similar objects. The difference between them is that the data concentrator is the application source, while one of the smart meters is the application sink.

Figure 8. A general model of an NS-3 simulation program.

4.1.1 Channel

The PLC channel is the connection between the nodes, including the PLC cables and various environmental settings. Here, one can specify the frequency range, cable characteristics, noise levels and various physical layer settings.

The channel is specified to emulate a CENELEC A PLC channel, more specifically one using the PRIME protocol. The CENELEC A band extends up to 95 kHz, and PRIME uses its upper part, with OFDM subcarriers between approximately 42 kHz and 89 kHz.

The noise floor is the sum of all noise sources, such as background noise. Noise is generally generated by electrical loads and varies with location, time of day, frequency and the distance to the noise sources. The PLC Software module has a predefined “worst case” background noise based on the power spectrum; for this configuration it is 1e-12 dBm, which is the value used in the simulations. Since the received signal weakens with cable length, the noise floor determines how far the PLC signals can traverse between repeaters before they can no longer be distinguished from the noise.

Repeaters may either be placed as standalone devices along the PLC network or, as in this case, be built into the smart meters as a function. Because of this, one can assume that the signal of each meter in a PLC cluster is repeated by the cluster's smart meters towards the secondary substation. Therefore, the signal is presumably repeated for the last time when it reaches the cluster's smart meter with the shortest cable distance to the secondary substation.

The simulation program models this by connecting a point-to-point NAYY150SE PLC cable from the data concentrator node to a link node placed at the distance of the smart meter closest to the secondary substation. Propagating signals experience increasing attenuation with length and frequency [6]. The simulated distances range from 1 meter to 250 meters, as the furthest smart meter in the case study is located less than 250 meters away.

The channel uses OFDM, whose primary advantage over single-carrier alternatives is its ability to cope with harsh channel conditions such as narrowband interference and frequency-selective fading due to multipath. OFDM splits the data over subcarriers of different frequencies (here within the upper CENELEC A band). PRIME uses OFDM with adaptive equalization to overcome intersymbol interference. Intersymbol interference occurs when one symbol in a signal interferes with subsequent symbols. This kind of interference is comparable to noise and is usually caused by multipath propagation in wireless media (a signal arriving via different paths), but in this case most likely by signal reflection (some signal power being reflected back towards its origin).

To prevent signal reflection, an outlet with an impedance matched to the cable's characteristic impedance is installed at the end of each cable line. A common value for this impedance is 50 Ω (as in radio-frequency systems) because of the long length of the cable compared to the signal wavelength. A smaller impedance, such as 5 Ω, increases the packet loss ratio.

With OFDM, the data signal is modulated in parallel onto the subcarriers using BPSK (binary phase-shift keying). PSK (phase-shift keying) is one of three basic methods of conveying digital data by modulating an analog carrier signal; the other two are amplitude-shift keying and frequency-shift keying. BPSK is the simplest (and most robust) form of PSK and uses two different phases to represent ones and zeros.

Figure 9. A BPSK signal over time.
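To illustrate BPSK, below is a minimal single-carrier sketch in Python: a 1 is sent as a carrier burst with phase 0 and a 0 with phase π, and a coherent receiver recovers the bits from the sign of the correlation with a reference carrier. It is not a model of the PRIME physical layer (which spreads the data over many OFDM subcarriers); the 60 kHz carrier, sample rate and samples per bit are arbitrary illustrative values, and the channel is noise-free.

```python
import math

def bpsk_modulate(bits, carrier_hz, sample_rate, samples_per_bit):
    """Phase 0 for a 1, phase pi for a 0, on a cosine carrier."""
    signal = []
    for k, bit in enumerate(bits):
        phase = 0.0 if bit else math.pi
        for n in range(samples_per_bit):
            t = (k * samples_per_bit + n) / sample_rate
            signal.append(math.cos(2 * math.pi * carrier_hz * t + phase))
    return signal

def bpsk_demodulate(signal, carrier_hz, sample_rate, samples_per_bit):
    """Coherent detection: correlate each bit period with the reference carrier."""
    bits = []
    for k in range(len(signal) // samples_per_bit):
        corr = 0.0
        for n in range(samples_per_bit):
            idx = k * samples_per_bit + n
            corr += signal[idx] * math.cos(2 * math.pi * carrier_hz * idx / sample_rate)
        bits.append(1 if corr > 0 else 0)  # positive correlation -> phase 0 -> bit 1
    return bits

params = dict(carrier_hz=60e3, sample_rate=1e6, samples_per_bit=100)
tx = bpsk_modulate([1, 0, 1, 1, 0], **params)
print(bpsk_demodulate(tx, **params))  # recovers [1, 0, 1, 1, 0]
```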

4.1.2 Nodes and Net-devices

A node in ns-3 can be viewed as an empty computer chassis, into which a host can install various hardware and software. For a node to communicate with another node, it needs an interface that connects it to the PLC channel.

A net-device can be viewed as a peripheral card acting as a medium between the motherboard and the outgoing interface to the PLC channel. The PLC net-device contains various link-layer settings and protocols, such as MAC.

To simulate an arbitrary power line network, the smart meter clusters are categorized by density, distance to the DC and the number of clusters sharing the same feeder (denoted here as Q). The simulation program allows one to enter these variables, and then creates a simple network that approximates a feeder with these clusters.

Since the cluster data from the MatLab program (or rather the user input data) is not fed directly into the NS-3 simulation code, not every possible topology combination can be foreseen and simulated. Therefore, each of the additional clusters (Q - 1 of them) is approximated to be of similar size to the chosen smart meter's cluster (explained in 3.4.1).

The NS-3 program places the smart meters (SM), link nodes (LN) and the data concentrator (DC) according to the figures below:

Fig 10. Simulation placement of a [D, L, Q] = [3, 25, 1] topology.

Fig 11. Simulation placement of a [D, L, Q] = [3, 25, 2] topology.

As can be seen from figure 10 and figure 11, the distance L is measured from the closest smart meter in the cluster (SM-0) to the DC. The data transfer takes place between these two nodes, while the other nodes are there to fill up the densities of the clusters in the feeder.

There are one-meter links between the nodes to prevent errors during the execution of the code. Because of this, large topologies will have their furthest SM nodes far away from the DC node, but their signals will be repeated by nearby link nodes (placed so that they act like a shared medium) up to the one closest to the DC. The distance from the closest smart meter of the additional cluster to the shared link node is approximately D∙(Q-1) meters, which in a [D, L, Q] = [3, 25, 2] topology will be 6 meters. This only adds a small attenuation, but in a [D, L, Q] = [100, 25, 3] topology the distance from the furthest cluster to the shared link node will be 200 m. While this distance is large and will introduce a more attenuated signal at the shared link node, it corresponds to clusters being further away from the relevant data transfer. Looking at figure 11, if one were interested in analyzing the data transfer between SM-3 and the DC, it would be like simulating a

[D, L, Q] = [3, 25 + Q∙D, 2] = [3, 31, 2]

topology, as the relevant data transfer is by default between SM-0 and the DC. It is also worth mentioning that the number of assigned smart meters in the MatLab program ranges from approximately 150 to 190, depending on power allocation and the user input (fewer smart meters can be used). If the three clusters are of similar size, each holds about 190/3 ≈ 63 smart meters, so the closest smart meter of the third cluster has an additional distance of at most:

$$L_{additional} \approx D \cdot (Q-1) \approx 63 \cdot (3-1) = 126\ \text{m}$$

However, since each PLC network usually has more than one feeder, each cluster is usually small enough for this not to make a difference.

4.1.3 Protocols and Applications

With the nodes and the PLC channel configured, protocols and applications are needed, both to enable data transmission and to analyze the packets sent.

The PLC Software module's PLC channel implements ARQ and CSMA/CA by default, which is in line with the PRIME protocol. ARQ (automatic repeat request) is an error-control method used during packet transmission that relies on acknowledgements and time-outs to achieve more reliable data transmission over the unreliable power line environment. If the sender does not receive an acknowledgement from the receiver before the time-out, it resends the packet. It continues to do so until the maximum number of retransmissions has been reached.
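A minimal stop-and-wait ARQ sketch in Python, assuming each transmission independently fails to be acknowledged before the time-out with a fixed probability p; the function names, the probability model and the limit of three retransmissions in the example are illustrative assumptions, not the PLC module's implementation. With r retransmissions, a packet is dropped only if all r + 1 attempts fail, so the delivery ratio is $1 - p^{r+1}$.

```python
import random

def send_with_arq(p_fail, max_retransmissions, rng):
    """Return the attempt number on which an ACK arrived, or None if the packet is dropped."""
    for attempt in range(1, max_retransmissions + 2):  # first try + retransmissions
        if rng.random() >= p_fail:
            return attempt  # ACK received before the time-out
        # no ACK before the time-out: loop around and retransmit
    return None  # retransmission limit reached

def delivery_ratio(p_fail, max_retransmissions, n_packets, seed=1):
    rng = random.Random(seed)
    delivered = sum(send_with_arq(p_fail, max_retransmissions, rng) is not None
                    for _ in range(n_packets))
    return delivered / n_packets

# Analytically 1 - 0.5**4 = 0.9375 for p = 0.5 and three retransmissions
print(delivery_ratio(0.5, 3, 20000))
```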

CSMA/CA (Carrier Sense Multiple Access with Collision Avoidance) is a multiple access method operating in the data link layer. CSMA/CA is mostly used in wireless networks, where it helps mitigate problems such as the hidden node problem, but it is also used in PRIME because of its ability to control shared media, which the power line network has in common with a wireless network. With CSMA/CA, a node only transmits data when it senses that the PLC channel is idle. If the channel is not idle, the node waits for a period of time (the backoff time) before trying again.
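The sensing and backoff behaviour can be sketched in Python as below. The slot model, the binary exponential backoff window (starting at 2^3 slots and growing to at most 2^5), the attempt limit and the function name are assumptions chosen for illustration; PRIME and the PLC module define their own backoff parameters.

```python
import random

def csma_ca_transmit_slot(is_busy, start_slot, rng, min_be=3, max_be=5, max_attempts=5):
    """Return the slot in which the node transmits, or None if it gives up.

    is_busy(slot) -> bool models carrier sensing of the shared PLC channel.
    """
    slot, be = start_slot, min_be
    for _ in range(max_attempts):
        if not is_busy(slot):
            return slot                  # channel idle: transmit now
        slot += rng.randint(1, 2 ** be)  # busy: wait a random backoff time
        be = min(be + 1, max_be)         # widen the backoff window
    return None                          # channel never found idle

rng = random.Random(7)
busy_until = 10                          # another meter occupies slots 0-9
print(csma_ca_transmit_slot(lambda s: s < busy_until, 0, rng))
```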

To be sent, data needs to be in the form of packets. The packets are created and sent by a traffic-generator application called “OnOffApplication”, which generates UDP packets with a size of 341 bytes at a data rate of 21.4 kbps [5] for 200 seconds. The longer the application runs, the more packets can be analyzed, giving higher accuracy. Below is an example where one DC sends packets to an SM (D = L = Q = 1).

| Density | Length (m) | Simulation time | Delivery rate (%) | Packet loss ratio (%) | Execution time |
|---|---|---|---|---|---|
| 1 | 1 | 30 s | 100 | 0 | 2 s |
| 1 | 1 | 60 s | 62 | 37 | 5 s |
| 1 | 1 | 100 s | 35 | 64 | 10 s |
| 1 | 1 | 200 s | 17 | 82 | 11 s |
| 1 | 1 | 400 s | 15 | 84 | 55 s |
| 1 | 1 | 800 s | 15 | 84 | 109 s |

Table 4. Simulation results showing how the results depend on the simulation time.
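As a rough check on the traffic volume behind Table 4, assuming the OnOffApplication stays in its on state for the whole run (its on/off times are not stated here, so this is an upper-bound assumption), the configured rate and packet size give:

$$\frac{21.4 \times 10^{3}\ \text{bit/s}}{341 \times 8\ \text{bit/packet}} = \frac{21400}{2728} \approx 7.84\ \text{packets/s}, \qquad 7.84\ \text{packets/s} \times 200\ \text{s} \approx 1569\ \text{packets}$$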

Because of the vast number of simulations required to compose the repository used in the MatLab program, the packet loss accuracy must be weighed against the execution time. Table 4 shows that a simulation time of 200 s gives almost as accurate results as a simulation time four times as long. The table shows the execution times of the smallest possible PLC network; with a density closer to one hundred smart meters, the execution time rises exponentially to approximately 3800 seconds. This time depends on the computer used for the simulations, which also dictates the maximum density/distance before a SIGKILL occurs and terminates the simulation.

To use the application, IPv4 is installed on all smart meter nodes. PRIME supports both IPv4 and IPv6, but NS-3 has better support for IPv4 than for IPv6, and the main advantage of IPv6, its larger address space, is not needed in the simulations.

With packets now being sent from the DC to a smart meter, the FlowMonitor module is added to the code. FlowMonitor monitors and registers packet-related events at each node it is installed on. Specifically, in this case, FlowMonitor registers transmitted packets, received packets, packet delay and packet loss ratio.

4.1.4 PLC Simulation Results and Discussion

The most accurate way to build a packet loss repository from the PLC simulations would be to simulate every combination of D, L and Q. That option would, however, not be feasible within a reasonable time. Depending on the D/L/Q setup, each simulation may take from 10 seconds to an hour. Limiting the density to 100 per Q (beyond that the result is mostly 100 % loss or a SIGKILL), the total number of simulations is approximately 75,000 (3 values of Q × 100 densities × 250 distances).

Because of this, a method to reduce the number of simulations needed, a design of experiments (DoE), has to be applied. Since the simulation program is deterministic, alternatives that rely on variance are dismissed.

Because the distance-related changes in packet loss are small, usually one or two percentage points, the following algorithm can be applied to fill the empty entries in the repository:

For each Q and for every fifth density value, starting from 1:
1. Simulate the first (L = 1) and the last (L = 250) distance.
2. If the results differ, simulate the distance halfway between them (L = 125).
3. If a newly simulated result differs from its closest neighbouring result (behind or ahead), simulate halfway between the two.
4. Repeat until there are no differences left.

When nothing is left to fill in according to the criteria above, linear interpolation is done between the simulated results:

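$$k = \frac{P(L_b) - P(L_a)}{\lvert L_b - L_a \rvert} \qquad \text{(f 2.3.4-1)}$$

Then

$$P(L) = P(L_a) + k\,(L - L_a), \quad L_a < L < L_b \qquad \text{(f 2.3.4-2)}$$

where $L_a$ and $L_b$ are neighbouring simulated distances and P is the simulated packet loss.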

After every fifth row has been filled, each remaining row is assigned the values of the filled row closest to it. This method decreases the number of simulations from approximately 75,000 to roughly 250, depending on the simulation results. While more time-efficient, the method is still very time-consuming.
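The filling procedure for one (Q, D) row can be sketched in Python as follows. This is one reading of the steps above, not the thesis's code: `simulate` stands in for an NS-3 run (here a hypothetical step function), every adjacent pair of simulated distances whose results differ is bisected until the pair is only 1 m apart, and the gaps are then filled by linear interpolation as in (f 2.3.4-1) and (f 2.3.4-2).

```python
def adaptive_samples(simulate, l_min=1, l_max=250):
    """Bisect between neighbouring simulated distances until no differing pair has a gap."""
    results = {l_min: simulate(l_min), l_max: simulate(l_max)}
    while True:
        ls = sorted(results)
        gaps = [(a, b) for a, b in zip(ls, ls[1:])
                if results[a] != results[b] and b - a > 1]
        if not gaps:
            return results
        for a, b in gaps:
            mid = (a + b) // 2
            results[mid] = simulate(mid)

def fill_row(results):
    """Linear interpolation (f 2.3.4-1, f 2.3.4-2) between neighbouring simulated distances."""
    ls = sorted(results)
    row = {}
    for a, b in zip(ls, ls[1:]):
        k = (results[b] - results[a]) / abs(b - a)  # slope between the two samples
        for l in range(a, b + 1):
            row[l] = results[a] + k * (l - a)
    return row

calls = []
def simulate(l):  # hypothetical stand-in: 84 % loss up to 199 m, 100 % from 200 m
    calls.append(l)
    return 84.0 if l < 200 else 100.0

row = fill_row(adaptive_samples(simulate))
print(len(calls), row[150], row[230])  # 10 simulations instead of 250
```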

The packet loss obtained from the simulations usually increases only slightly with length. The density usually dictates how long the distance can be before a 100 % packet loss is reached. The number of clusters Q usually shortens this distance a bit further, while also decreasing the density that can be simulated before a SIGKILL termination. A hundred percent packet loss is first reached at [D = 75, L = 230 m, Q = 1], and remains a hundred percent from [D = 100, L = 1 m, Q = 1] onwards. These ranges vary with Q, mostly because of the increased number of nodes. A topology that cannot be simulated because of a SIGKILL is also considered to have a hundred percent packet loss.

While both packet delay and packet loss can be tracked during the simulations, the most interesting case is when communication is impossible. Unless the delay is too large to ignore, there is no major interest in the throughput, so the focus lies on when the packet loss reaches 100 %. For this PLC channel configuration, the packet loss is consistently high and ranges between 82 % and 88 % (when it is not 100 %). The packet loss depends mostly on the chosen noise floor; higher noise increases the packet loss because it drowns the PLC signals over the cable distance.

4.2 GPRS Simulations

Even though most NS-3 users focus on wireless simulations, NS-3 has no support for GPRS/GSM. Instead, lacking simulation tools of our own, we use externally acquired simulation data.

For our case study, where availability is of more concern than the exact packet loss rate of specific topologies, we can take some liberties when selecting the data for the packet loss repository. A simulation study in [23] varies the queuing scheme, the arrival rate and the ratio between GPRS and GSM calls. Since the GPRS service provided is not known, it is assumed that the smart meter communication shares the same channels as both other GPRS traffic and prioritized GSM voice calls. This dynamic scheme contrasts with a static scheme, in which there is a fixed division between GSM and GPRS. In the latter, the GPRS traffic (smart meter communication included) would still be buffered and prioritized according to the level of QoS provided.

The scenarios in [23] vary the queuing scheme and the ratio between GSM (session) and GPRS (packet) traffic. We use their results for a FIFO queue and 20 % packet traffic, which gives the approximation:

(f4.2-1)

The data is then stored in the GPRS packet loss repository, which is used alongside the PLC packet loss repository. Comparing the overall packet loss of GPRS with that of PLC shows that the GPRS packet loss is much lower.


fig 12. Simulation results for PLC and GPRS
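With both repositories in place, a cluster's candidate media can be found by comparing each medium's packet loss with a user-set threshold. This is a minimal sketch under stated assumptions: the function name is hypothetical, a medium is assumed acceptable only when its loss is strictly below the threshold, and the loss values are illustrative, not results from either repository.

```python
from typing import Dict, List


def feasible_media(losses: Dict[str, float], threshold: float) -> List[str]:
    """Return the media whose packet loss ratio is strictly below the threshold."""
    return sorted(medium for medium, loss in losses.items() if loss < threshold)


cluster_losses = {"PLC": 0.85, "GPRS": 0.02}  # illustrative values, not results
print(feasible_media(cluster_losses, 0.90))  # ['GPRS', 'PLC']
print(feasible_media(cluster_losses, 0.05))  # ['GPRS']
```

A strict threshold is a design choice here; using `<=` instead would accept a medium whose loss equals the threshold exactly.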


Chapter 5 Theory

5.1 K-means Clustering

K-means clustering is a popular prototype-based method for partitioning a set of data points into smaller groups. The problem is to partition n observations into k mutually exclusive clusters, where k is set by the user. Each observation (data point) is assigned to the cluster whose centroid (mean) is closest to it. The end result is a Voronoi diagram, in which each observation is assigned to a cell.

fig 13. Voronoi diagram of 10 cells, each with a centroid.

The problem is defined as finding the global optimum of the objective function

$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$$

(f 5.1-1)

for a given set of observations X = {x1, x2, …, xn}, where S = {S1, S2, …, Sk} are the clusters. Since µi is the mean of the data points assigned to cluster i, the objective function is minimized when all data points lie as close to their cluster mean µi as possible. Note that the number of clusters k cannot exceed the number of observations n (k ≤ n).

The distance used is normally the Euclidean distance, defined for two points a and b in d dimensions as:

$$\lVert a - b \rVert = \sqrt{\sum_{j=1}^{d} (a_j - b_j)^2}$$

(f 5.1-2)

The Euclidean distance is the length of the line segment ab connecting points a and b. An alternative is the Manhattan distance, or city-block distance, which is the sum of the absolute differences along each dimension. The city-block distance can be seen as walking along city streets, while the Euclidean distance is the distance as the crow flies.
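For two points in the plane, such as smart meter coordinates, the difference between the two distances is easy to see. A minimal Python sketch; the function names are illustrative:

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance ("as the crow flies"), as in (f 5.1-2)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """City-block distance: sum of absolute differences along each dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))


print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
```

The city-block distance is never shorter than the Euclidean one, and the two coincide only when the points differ along a single dimension.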

Non-Euclidean measures such as correlation can also be used between the points (the mean and the observation data point):

$$\operatorname{dist}_{\mathrm{corr}}(x, \mu) = 1 - \frac{\operatorname{cov}(x, \mu)}{\sigma(x)\,\sigma(\mu)}$$

(f 5.1-3)


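where the covariance is

$$\operatorname{cov}(x, \mu) = \frac{1}{d} \sum_{j=1}^{d} (x_j - \bar{x})(\mu_j - \bar{\mu})$$

(f 5.1-4)

and the standard deviation is

$$\sigma(x) = \sqrt{\frac{1}{d} \sum_{j=1}^{d} (x_j - \bar{x})^2}$$

(f 5.1-5)

where

$$\bar{x} = \frac{1}{d} \sum_{j=1}^{d} x_j$$

(f 5.1-6)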
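The correlation-based distance can be computed directly from the covariance, standard deviation and mean. This sketch assumes the distance is one minus the correlation coefficient (the convention MATLAB's kmeans uses for its 'correlation' option), averages over the d coordinates of each point, and assumes neither point has all coordinates equal, since σ would then be zero. The helper names are illustrative.

```python
import math
from typing import Sequence


def mean(v: Sequence[float]) -> float:
    """Average over the d coordinates of a point."""
    return sum(v) / len(v)


def cov(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance between the coordinates of two points."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def std(x: Sequence[float]) -> float:
    """Standard deviation of a point's coordinates."""
    return math.sqrt(cov(x, x))


def correlation_distance(x: Sequence[float], mu: Sequence[float]) -> float:
    """One minus the correlation coefficient between x and mu."""
    return 1.0 - cov(x, mu) / (std(x) * std(mu))


print(correlation_distance((1, 2, 3), (2, 4, 6)))  # approximately 0.0 (same shape)
print(correlation_distance((1, 2, 3), (3, 2, 1)))  # approximately 2.0 (opposite shape)
```

Unlike the Euclidean distance, this measure treats (1, 2, 3) and (2, 4, 6) as essentially identical, because it compares the shape of the coordinate profile rather than its magnitude.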

Since the centroid positions are not known in advance and have to be searched for before the minimum of the objective function is found, the problem is computationally hard: in general, the k-means problem is NP-hard. With the dimension d and the number of clusters k fixed, the problem can be solved exactly with a computational complexity of $O(n^{dk+1}\log n)$.

Because the problem is NP-hard, heuristic algorithms are needed to solve it in practice. A heuristic algorithm does not guarantee an optimal solution, but one that is close enough for the intended use. The most commonly used algorithm for the k-means problem is Lloyd's algorithm, also known simply as the “k-means algorithm” or “Voronoi iteration”.

The algorithm starts by initializing k centroids, one for every cluster, chosen uniformly at random from the data points. Each data point is then assigned to its closest centroid (the assignment step), and the centroids are recomputed as the mean of their assigned data points (the update step). These two steps are repeated until the algorithm converges. Depending on where the initial centroids are located, the results may differ slightly. Lloyd's k-means algorithm has a complexity of O(nkdi) [12], where i is the number of iterations.

As the number of iterations before convergence is often small, the algorithm is in practice considered to be linear in n.
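Lloyd's algorithm as described above can be sketched in plain Python. This is a minimal illustration under simple assumptions (points as tuples, squared Euclidean distance, convergence declared when the assignments stop changing, an empty cluster keeps its previous centroid), not the MatLab implementation used in this thesis; the function names are illustrative.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Point]) -> Point:
    return tuple(sum(c) / len(points) for c in zip(*points))


def lloyd(points: List[Point], k: int, seed: int = 0,
          max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm with uniform random initialization."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k data points chosen uniformly at random
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # converged: assignments unchanged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = centroid(members)
    return centroids, labels


def objective(points: List[Point], centroids: List[Point], labels: List[int]) -> float:
    """Within-cluster sum of squares, the quantity minimized in (f 5.1-1)."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
cents, labels = lloyd(pts, k=2)
print(labels)                         # first three points share one label, last three the other
print(objective(pts, cents, labels))  # approximately 2.667 (= 8/3)
```

Each iteration costs O(nkd), which is where the O(nkdi) complexity comes from.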

There are, however, two factors that can make the algorithm a poor choice. In the worst case, the algorithm can be very slow to converge. The second is the initial placement of the centroids, which can lead to poor results (usually because the centroids are located too close to each other). By changing how the initial centroids are chosen (the seeding), an improved k-means method can be used [13]. This method is called k-means++ and is used by programs such as MatLab.

1a. Choose one center c1 uniformly at random from the set of data points (x ∈ X).

1b. Choose another center ci, selecting x ∈ X with probability $\frac{D(x)^2}{\sum_{x' \in X} D(x')^2}$, where D(x) is the distance from x to the nearest center already chosen. This step is called “D² weighting”.

1c. Repeat 1b until all k centroids have been chosen.

2. Now that all centroids have been initially located, proceed with the standard Lloyd's algorithm, starting from the assignment step.


With this seeding, the expected value of the objective function (f 5.1-1) is within a factor O(log k) of the optimum [13]. According to [12], k-means++ consistently outperforms the standard k-means algorithm.
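Steps 1a to 1c can be sketched as follows, assuming points are tuples and using Python's `random.choices` for the D² weighted draw. The guard for an all-zero weight vector (every point coincides with a chosen center) is an added assumption for degenerate data, not part of the steps listed above, and the function name is illustrative. The resulting centers would then replace the uniformly random ones in a Lloyd iteration.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points: List[Point], k: int, rng: random.Random) -> List[Point]:
    """Choose k initial centroids with D^2 weighting (steps 1a to 1c)."""
    centers = [rng.choice(points)]  # 1a: first center uniformly at random
    while len(centers) < k:         # 1c: repeat until k centers are chosen
        # D(x)^2: squared distance from each point to its nearest chosen center.
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:            # degenerate data: fall back to a uniform draw
            centers.append(rng.choice(points))
            continue
        # 1b: pick x with probability D(x)^2 / sum of D(x')^2.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seeds = kmeanspp_seeds(pts, 2, random.Random(1))
print(seeds)  # two distinct data points; D² weighting makes one per group very likely
```

Points far from every chosen center get large weights, which is why the seeds tend to spread out instead of landing close to each other.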

Before using k-means, one must determine how many clusters K are needed, a task considered to be one of the algorithm's biggest disadvantages. A good choice of K yields clusters that are well separated from each other and whose data points lie close to their centroid, while keeping the number of clusters small. Too few clusters result in a larger mean distance to the centroids; too many result in cluster centroids lying too close to each other.

One way to choose K is the rule of thumb [2] (f. 3.4.2-1), which is a simple way to provide the algorithm with k without external constraints. Another method is the Elbow Method, which analyses how the within-cluster sum of squares decreases as K increases. This shows where a further increase of K stops yielding a significant improvement (see fig 14).

fig 14. An illustration of the “Elbow Method”.
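Both ways of choosing K can be sketched briefly. The sketch assumes that the rule of thumb in (f. 3.4.2-1) is the common k ≈ √(n/2) attributed to [2], and it locates the elbow as the K with the largest second difference in the within-cluster sum of squares (WCSS), which is one simple heuristic, not necessarily the one used in this thesis. The WCSS values are illustrative and the function names hypothetical.

```python
import math
from typing import Sequence


def rule_of_thumb_k(n: int) -> int:
    """k ~ sqrt(n / 2), rounded to the nearest integer (at least 1)."""
    return max(1, round(math.sqrt(n / 2)))


def elbow_k(wcss: Sequence[float]) -> int:
    """Pick the K after which the decrease in WCSS slows the most.

    wcss[i] is the within-cluster sum of squares for K = i + 1. The elbow is
    taken where (drop before K) minus (drop after K) is largest.
    """
    best_k, best_bend = 1, float("-inf")
    for i in range(1, len(wcss) - 1):
        bend = (wcss[i - 1] - wcss[i]) - (wcss[i] - wcss[i + 1])
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


print(rule_of_thumb_k(150))                       # 9
print(elbow_k([1000, 600, 200, 170, 150, 140]))   # 3
```

In practice the WCSS values would come from running k-means once for each candidate K on the same data.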


Chapter 6 Discussion

6.1 Results

To test the program, the following user input data were used, along with four network topologies provided by a DSO:

The number of smart meters per network is set to approximately 150, rounded upwards depending on the allocated power.

The packet loss rate threshold is set to 90 % and later 5 %.

The smart meter nodes are set as switches.

Three different sets of CAPEX/OPEX, one for each scenario.

Different variations of GPRS CAPEX and OPEX for scenario 3.

The gateway threshold is set to 100 and later 200.

A variable number of years of operation.

Since the four network topologies are fixed, the differences in the results come from adjusting the parameters listed above. The costs of each scenario are shown for each of the four network topologies, together with the scenario each network is assigned to. For each set of input parameters, the costs are plotted against time, which shows both the differences in cost between the scenarios and how the costs develop over time. The time interval used is 0 to 5 years, with markings every quarter of a year.
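A minimal sketch of how such cost curves and their crossing point could be computed, assuming each scenario's cost grows linearly as CAPEX plus yearly OPEX times elapsed time. The GPRS OPEX uses the run 1 tariff of 0.002 € per smart meter and assumes one data polling per day; the DC PLC figures and all function names are hypothetical, chosen only for illustration and not taken from the thesis.

```python
from typing import Dict, List, Optional, Tuple

# (CAPEX in EUR, OPEX in EUR per year); a linear cost model is assumed.
Cost = Tuple[float, float]


def quarter_marks(years: float = 5.0) -> List[float]:
    """Time points 0, 0.25, ..., years (quarter-year markings)."""
    return [q / 4 for q in range(int(years * 4) + 1)]


def total_cost(cost: Cost, t: float) -> float:
    capex, opex_per_year = cost
    return capex + opex_per_year * t


def cheapest_at(costs: Dict[str, Cost], t: float) -> str:
    """Name of the scenario with the lowest accumulated cost at time t (years)."""
    return min(costs, key=lambda name: total_cost(costs[name], t))


def first_cheaper(costs: Dict[str, Cost], a: str, b: str) -> Optional[float]:
    """First quarter mark at which scenario a is strictly cheaper than scenario b."""
    for t in quarter_marks():
        if total_cost(costs[a], t) < total_cost(costs[b], t):
            return t
    return None


meters = 150
costs = {
    "GPRS": (0.0, 0.002 * meters * 365),  # run 1 tariff, one polling per day assumed
    "DC PLC": (200.0, 20.0),              # hypothetical CAPEX and OPEX
}
print(cheapest_at(costs, 0.0))                 # GPRS
print(first_cheaper(costs, "DC PLC", "GPRS"))  # 2.25
```

With a CAPEX-free tariff, GPRS always starts as the cheaper option, and a scenario with higher CAPEX but lower OPEX overtakes it once the OPEX savings cover the initial investment.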

The first set of parameters is chosen so that the number of smart meters in each network is above the gateway threshold; this disables the second scenario, so only the first and third scenarios are plotted. The GPRS tariff is chosen to have no CAPEX and an OPEX of 0.002 € per data polling per smart meter.

The second run uses the same parameters as the first, but with the gateway threshold set to 200, enabling the gateway for all the networks used. To make the difference visible, the “CAPEX per smart meter” cost of scenario 2 is reduced by 50 %.

The third run returns the gateway threshold to 100, but the GPRS service provider now instead offers unlimited network usage for 50 € per month per network, with an initial fee of 20 €.

The fourth run uses the same parameters as the first run, but with a packet loss threshold of 5 %.

The fifth and final run sets the approximate number of smart meters per network to 100 instead of 150, with the gateway threshold set high enough to include all the smart meters. Otherwise, the parameters are the same as in run 1.


Run   Packet loss   Gateway     GPRS [CAPEX, OPEX]    GPRS [CAPEX, OPEX]    SM per
      threshold     threshold   (€ per SM per day)    (€ per SS per year)   network
1     90 %          100         [0, 0.002]            [0, 0]                150
2     90 %          200         [0, 0.002]            [0, 0]                150
3     90 %          100         [0, 0]                [20, 600]             150
4     5 %           100         [0, 0.002]            [0, 0]                150
5     90 %          200         [0, 0.002]            [0, 0]                100

Table 5. Parameters for each run of the program.
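The way the run parameters enable or disable scenarios can be summarized as a small filter. This sketch assumes that the PLC scenarios are disabled unless every PLC cluster in the network is below the packet loss threshold, that the gateway scenario additionally requires the number of smart meters not to exceed the gateway threshold, and that the comparisons are strict; the function name and loss values are hypothetical.

```python
from typing import List, Sequence


def available_scenarios(n_meters: int, plc_cluster_losses: Sequence[float],
                        gprs_loss: float, loss_threshold: float,
                        gateway_threshold: int) -> List[int]:
    """Scenarios (1 = DC PLC, 2 = GW PLC, 3 = GPRS) that satisfy a run's thresholds."""
    scenarios: List[int] = []
    # PLC is only usable if every cluster's loss is below the threshold.
    if max(plc_cluster_losses) < loss_threshold:
        scenarios.append(1)
        # The gateway also requires the network to fit under the gateway threshold.
        if n_meters <= gateway_threshold:
            scenarios.append(2)
    if gprs_loss < loss_threshold:
        scenarios.append(3)
    return scenarios


plc = [0.85, 0.86]  # illustrative per-cluster PLC losses
print(available_scenarios(150, plc, 0.02, 0.90, 100))  # run 1: [1, 3]
print(available_scenarios(150, plc, 0.02, 0.90, 200))  # run 2: [1, 2, 3]
print(available_scenarios(150, plc, 0.02, 0.05, 100))  # run 4: [3]
```

The cheapest of the remaining scenarios would then be chosen from the cost curves.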

Fig 15. First run of the program. The red curve is scenario 1 and the blue is scenario 3.

Fig 16. Second run of the program. The blue curve is scenario 1, the red is scenario 2 and the

black is scenario 3.

Legend of Figs 15 and 16: (1) DC PLC, (2) GW PLC, (3) GPRS.


Fig 17. Third run of the program. The red curve is scenario 1 and the blue is scenario 3.

Fig 18. Fourth run of the program. The black curve is scenario 3.

Legend of Figs 17 and 18: (1) DC PLC, (3) GPRS.


Fig 19. Fifth run of the program. The blue curve is scenario 1, the red is scenario 2 and the black

is scenario 3.

Fig 20. Fifth run of the program, a zoomed in version of Fig 19 between year 0.8 and year 1.

From the first run, one can see that GPRS starts off as the cheaper alternative until approximately 1.5 years of operation. This holds for all of the networks except 28CGD8, for which GPRS is the only alternative because the packet loss of some or all of its PLC clusters is too high.

The second run shows that the gateway scenario is feasible and becomes cheaper than the other options after approximately 1 year. As the packet loss threshold is the same as in the previous run, GPRS is still the only option for the 28CGD8 network.

The third run shows no noticeable difference in GPRS cost compared with the first run, as the change is relatively small.

The fourth run has a low packet loss threshold, which disables the PLC options but

allows GPRS for all the networks.

Legend of Figs 19 and 20: (1) DC PLC, (2) GW PLC, (3) GPRS.


In the fifth run, the number of smart meters per network is set to 100 instead of 150. This makes the clusters smaller and, as a result, lowers the packet loss rate. Network 28CGD8, which previously had scenarios 1 and 2 disabled due to high packet loss, now has both available. Because the difference between scenario 1 and scenario 2 is relatively small, Fig 20 shows a zoomed-in version between year 0.8 and year 1.

The overall results from the MatLab program follow what is expected from the simulation results. Various parameters, such as the different thresholds, have been tested for the four network topologies provided for the thesis.

6.2 Future Work

The results could be improved by using alternative simulation tools for Power Line Communication. With access to only one NS-3 module, the results cannot be cross-checked against another tool to confirm that they are accurate. The work could also be improved by simulating GPRS, as this thesis had to rely on externally acquired packet loss rate data.

There are more clustering methods that could be explored, including not clustering at all. Comparing different clustering methods could make it possible to approximate a larger variety of distribution grids as more equivalent units. The more accurately a distribution grid can be approximated, the more accurately it can be compared with pre-measured results.

One of the main purposes of this thesis was to develop a tool for future use, able to generate many random LV/MV network topologies for analysis (possibly with the help of Monte Carlo methods). Furthermore, the method is not limited to LV/MV distribution grids: with some changes to the code, I would expect it to be usable for MV/HV networks as well, with different parameters and constraints.


References

[1] E. W. Dijkstra, "A Note on Two Problems in Connexion with Graphs", Numerische Mathematik 1, 269-271 (1959).

[2] Kanti Mardia et al., "Multivariate Analysis", Academic Press, 1979.

[3] Richard H. Frenkiel, "Cellular radiotelephone system structured for flexible use of different cell sizes", patent US4144411 A, 1976.

[4] F. Aalamifar, A. Schloegl, D. Harris, L. Lampe, "Modelling Power Line Communication Using Network Simulator-3", IEEE Global Communications Conference (GLOBECOM), Atlanta, GA, USA, December 2013.

[5] Don Shaver, "Low Frequency, Narrowband PLC Standards for Smart Grid – The PLC Standards Gap!", Texas Instruments Incorporated, December 2009, http://cms.comsoc.org/SiteGen/Uploads/Public/Docs_Globecom_2009/6_-_12-03-09_shaver_smart_grid_panel_final.pdf.

[6] Manfred Zimmermann, "A Multi-Path Signal Propagation Model for the Power Line Channel in the High Frequency Range", Institute of Industrial Information Systems, University of Karlsruhe.

[7] A. Volgenant, "Linear and Semi-Assignment Problems: A Core Oriented Approach", University of Amsterdam, 1996.

[8] "Learn GPRS", http://www.tutorialspoint.com/gprs/.

[9] "GPRS Family", http://www.protocols.com/pbook/gprsfamily.htm.

[10] "Network Simulator 3", http://www.nsnam.org.

[11] A. Fernandez Olivera, A. Sendin Escalona, Urrutia Galdos, J. Mateo Arenas, Angueira Buceta, JJ. Ferro Vázquez, "Analysis of PRIME PLC Smart Metering Networks Performance", Iberdrola Engineering and Construction S.A.U., Iberdrola Networks, University of the Basque Country (UPV/EHU), 2013.

[12] "Clustering Algorithms: K-means", Princeton University, http://www.cs.princeton.edu/courses/archive/spr08/cos435/Class_notes/clustering2_toPost.pdf.


[13] David Arthur, Sergei Vassilvitskii, "k-means++: The Advantages of Careful Seeding".

[14] "The DISCERN Project", http://www.discern.eu/project/vision-and-mission.html.

[15] Juan Andrés Negreira, Javier Pereira, Santiago Pérez, "End-to-end measurements over GPRS-EDGE networks", Universidad de la República, Montevideo, Uruguay.

[16] Sami Tabbane, "Quality of Service (QoS) definition and standards", ITU Academy, November 2013.

[17] CENELEC EN 50065-1:2011, "Signaling on low-voltage electrical installations in the frequency range 3 kHz to 148,5 kHz - Part 1: General requirements, frequency bands and electromagnetic disturbances".

[18] Bogdan Baraboi, "Narrowband Powerline Communication Applications and Challenges", Ariane Controls Inc.

[19] PRIME Project, "PRIME Technology Whitepaper: PHY, MAC and Convergence Layers".

[20] "G3-PLC Alliance", http://www.g3-plc.com/.

[21] "3GPP, A Global Initiative", http://www.3gpp.org/technologies/keywords-acronyms/102-gprs-edge.

[22] "Introducing the power of PLC", White Paper, Landis+Gyr, http://www.landisgyr.com/webfoo/wp-content/uploads/2012/11/LG_White_Paper_PLC.pdf.

[23] Maurizio D'Arienzo, Antonio Pescapè, Rajiv Chakravorty, Giorgio Ventre, "A Comparative Simulation Study for Multiple Traffic Scheduling Algorithms over GPRS", Computer Science Department, University of Napoli; University of Cambridge Computer Laboratory.
