Optimal Allocation of Smart Meters to
Real or Virtual Data Concentrators
Christian Johansson
ICS
Master Thesis
Stockholm, Sweden 2015
XR-EE-ICS 2015:007
Abstract
The Smart Grid is the new, modernized electrical distribution grid. It enables
automation and improves the reliability and efficiency of electrical distribution. A
key feature of the smart grid is the AMI (Advanced Metering Infrastructure), which is
the system that measures, collects and analyzes energy use.
When designing an LV/MV AMI smart grid, several communication media can be used,
two of which are PLC (Power Line Communication) and GPRS (General Packet
Radio Service). The choice between them affects both communication performance and
overall economic cost.
This thesis describes a method to optimally assign the smart meter communication
devices in an AMI network to either the secondary substation via the power lines (PLC)
or towards the head-end via GPRS. When assigned to the secondary substation, the
data collected from the smart meters will be managed by a Data Concentrator or
forwarded by a Gateway towards the head-end where a Virtual Data Concentrator
would be located. As an alternative to PLC communication, GPRS can be used to
wirelessly communicate between smart meters and the virtual data concentrator.
The proposed method uses MATLAB to read the user's input data, such as smart
meter and power line link locations. The data is then compiled into a network
topology consisting of smart meter nodes and the power line links between them. For
easier comparison, the network is then clustered into groups. The clustering is done
using two methods, one for PLC and another for GPRS.
The cluster data will then be compared with the packet loss ratio data acquired from
either simulations or other sources, stored in a repository. The comparisons, along
with various constraints set by the user, will then determine if the communication is fit
for use.
If deemed fit for use, each type of communication has its CAPEX and OPEX costs
calculated, based on the user's input. This thesis uses data acquired from DSOs
(Distribution System Operators) to analyze the costs of four network locations and
determine the cheapest assignments for each one under various settings.
Sammanfattning (Summary)
Smart grids are the modern electrical distribution networks of the future.
They enable many applications such as automation, reliability and
efficient electrical distribution. A key feature of smart grids is the
AMI (Advanced Metering Infrastructure), a system that collects, measures and
analyzes energy consumption.
When designing a low- or medium-voltage AMI grid, several media types can be
used, two of which are PLC (Power Line Communication) and GPRS (General
Packet Radio Service). Choosing one over the other can lead to differences in both
communication performance and economic cost.
This thesis describes a method for optimally assigning smart
meters in an AMI network either to the secondary substation via the power
lines (PLC) or to the head-end wirelessly via GPRS. When the meters
are assigned to the secondary substation, their data is collected by either a Data
Concentrator (DC) or a Gateway (GW). A Gateway then forwards the data
to the head-end, where a Virtual Data Concentrator (VDC) is located. Alternatively,
GPRS is used for wireless communication between the smart meters and the VDC.
The proposed method uses MATLAB to read the user's input data, such as
the positions of the smart meters and of the power lines that link
them to the secondary substation. The input data is used to build a
network topology consisting of smart meters and power lines. To make it easier
to compare the created topologies with simulated general topologies, the
network is grouped into clusters. The clustering method differs between PLC and GPRS.
The cluster data is then used to look up the packet loss ratio obtained either from
simulations or from other sources. The comparisons then determine whether
communication over the given media type meets the user's
communication requirements.
If the communication capabilities of the topology are deemed sufficient, the cost of each
communication type is calculated from its CAPEX and OPEX. In this
thesis, cost and topology data obtained from DSOs (Distribution
System Operators) are used to analyze the costs of four electrical networks and to determine the
cheapest assignments for each under various settings.
Acknowledgement
I would like to express my deepest gratitude to my master thesis supervisor Mikel Armendariz,
who has given me invaluable help during my project.
I would also like to thank the ICS department at KTH for providing me with an
interesting project to work with, and being very helpful throughout.
I would like to thank the DISCERN project and in particular UFD (Union Fenosa
Distribucion) for providing me with comments and the electrical grids to use for the
thesis.
My thanks also go to Fariba Aalamifar (University of British Columbia), Yunta Huete
Angel (UFD), Miguel Garcia Lobo (Gas Natural Fenosa Engineering) and prof. Ljiljana
Trajkovic (Simon Fraser University).
Contents
Chapter 1 Introduction
Chapter 2 Background
2.1 The DISCERN Project
2.2 Power Line Communication
2.2.1 The Electrical Grid and the PLC Channel
2.2.2 PLC Standards and Protocols
2.2.3 PLC Simulation Tools
2.3 General Packet Radio Service
2.3.1 GPRS/GSM Network
2.3.2 GPRS Simulation Tools
2.4 Thesis Goals and Objectives
Chapter 3 Method
3.1 The Model
3.2 Input Data
3.3 Topology Creation
3.4 Smart Meter Clustering
3.4.1 PLC Clustering
3.4.2 GPRS Clustering
3.5 Simulation Data Comparisons
3.6 Cost Function
3.6.1 CAPEX and OPEX
3.6.2 Combinatory Cost Function
3.7 Optimization Problem
Chapter 4 Simulations
4.1 PLC Simulations
4.1.1 Channel
4.1.2 Nodes and Net-devices
4.1.3 Protocols and Applications
4.1.4 PLC Simulation Results and Discussion
4.2 GPRS Simulations
Chapter 5 Theory
5.1 K-means Clustering
Chapter 6 Discussion
6.1 Results
6.2 Future Work
References
Chapter 1 Introduction
With increasing use of power around the world, the current electrical grid is reaching its
limits. By implementing a Smart Grid, the utilities in an electrical grid can
communicate and cooperate to improve operations, increase power distribution
efficiency, and better utilize renewable energy sources and various automation
functions.
The use of smart grids is increasing, but there are many competing standards, each
with its own advantages and disadvantages. Some of these standards are
bound by location, because electrical distribution systems and local laws differ between
regions. For now, until a global standard is established, one needs to choose what fits
based on media and locality.
The key factor for a successful smart grid is two-way communication
between a smart utility meter and a utility company. AMI (Advanced Metering
Infrastructure) is the system which collects, measures and analyzes energy usage. Each
customer's smart meter measures the customer's energy consumption and sends the data to the
energy supplier, which in turn registers it. The two-way communication makes it
possible for the consumer to decrease costs through demand response (electricity prices
change with, for example, the peak load).
For the smart meters to communicate with the energy supplier, or the AMI head-end,
they need a communication medium. There are several alternatives, but they
can all be categorized as either wired or wireless. This thesis focuses on Power Line
Communication (PLC) and General Packet Radio Service (GPRS).
Chapter 2 Background
2.1 The DISCERN Project
The DISCERN project (Distributed Intelligence for Cost-effective and Reliable
Solutions) is a collective effort where DSOs (Distribution System Operators), research
institutions and technology providers share information regarding control and
monitoring of distribution networks.
The purpose of the project [14] is to improve the understanding of the complex LV/MV
distribution network and its economic viability, and to ensure higher standards of security
and reliability. DISCERN's objective is to enhance the European distribution grids and
provide DSOs with tested and validated solutions.
This thesis aims to contribute to the project by analyzing power line communication
within the LV/MV distribution network, with the help of network
topologies contributed by a DSO. It fits within the work package WP6.
2.2 Power Line Communication
2.2.1 The Electrical Grid and the PLC Channel
The idea of communicating through the electrical grid is not new, but only recently
has the technique matured enough for practical use on a larger scale. The
power lines themselves were originally meant to be used exclusively for power
distribution, and operate at a standard frequency of 50 or 60 Hz.
Because power line circuits are designed for normal AC power usage, they
have a limited capability to carry higher frequencies. Also, apart from the limitations
of the power line cables themselves, each country or region has its own laws
dictating which frequencies are available for use. The transmissions need to take place
within the license-free frequency bands, which span 3-148 kHz [17] (the
narrowband) and 2-30 MHz [18] (the broadband). The reason for this regulation is
that PLC is regarded as an unshielded transmission, sharing the same frequencies as
radio.
Higher frequencies allow for higher data rates, but at the cost of range. PLC
broadband is usually used for Local Area Networks (LAN), for example within a building. To
access the Internet, such a network still needs a router and another medium such as
Ethernet.
Lower frequencies, such as the narrowband mentioned above, only offer low data rates, but
they allow transmission over longer distances within the LV/MV (low-voltage/medium-voltage)
electrical grid.
Between the two alternatives, the lower frequency range is the most suitable for AMI
communication between smart meter and data concentrator. In the EU, within this
narrowband frequency range, only the 3-95 kHz range is reserved for utility
applications. This band is called CENELEC A [5]. The PRIME
(PoweRline Intelligent Metering Evolution) protocol uses the higher end of this
spectrum, 42-89 kHz [19].
While there is no official standard for PLC communication, PRIME is currently the
prevalent protocol in the EU. An alternative to PRIME is the G3 [20] protocol, and both have
their advantages and drawbacks. In the DISCERN Project framework, and therefore in this thesis,
PRIME is used.
As the power lines were not designed for data transmission, they make a harsh
environment for communication. The distribution grid has the features of a complex
network, whose behavior depends not only on frequency, but also on time, location and noise.
This makes PLC a difficult communication medium to generalize, because the main
factors affecting PLC communication are the high signal attenuation and noise levels.
This is especially true in low frequency bands, where examples of noise are:
- Continuous background noise, both time-variant (changes with line voltage)
  and time-invariant (constant for a long period, for example thermal noise).
- Narrowband noise, for example from broadcast stations.
- Impulsive noise, consisting of abrupt impulses with short duration but high
  amplitude, either synchronous (light dimmers) or asynchronous (switching
  regulators) to the AC line voltage.
Mostly, the noise is caused by devices connected to the same power line, but it may
also be caused by nearby sources not directly connected to the power line.
Apart from the noise, signal attenuation is also a problem for PLC communication.
The most obvious cause is the line loss in the power lines. As with every
media type, the further a carrier signal travels, the weaker it becomes. This effect can
be mitigated by using repeaters, either strategically placed along the network or by using
the smart meters as repeaters. If the signal becomes too weak it may be drowned out by the
noise. Another cause of signal attenuation is the impedance of all the loads connected
to the power line.
There are other causes of signal attenuation, such as passing through transformers (from LV to
MV for example) and multipath propagation (caused by reflections). Keeping the
communication in the LV network (the transformer is likely in the secondary
substation, where the data concentrator is located) and operating in the lower frequency
band largely avoids these effects. There is, however, always a probability of impedance mismatch
occurring at the power line branching points (where the power lines are extended or
forked).
2.2.2 PLC Standards and Protocols
As mentioned, there is no official standard for PLC. Which protocols to consider depends
largely on what PLC is used for and where. It does not help that the Smart Grid itself is
not standardized. To narrow down the choices, there are two rival PLC specifications
in use in the EU today: PRIME and G3. Both are sets of protocols developed to optimize PLC
communication within the narrowband. Generally one can say
that PRIME is designed for low voltage lines with low noise, while G3 is designed for
medium voltage lines. Which of these protocols will "win" may be a political choice,
but for this thesis the parameters used are consistent with PRIME.
PRIME is based on Orthogonal Frequency Division Multiplexing (OFDM) and
adaptively uses three modulation schemes (DBPSK, DQPSK and D8PSK), with or
without Forward Error Correction (FEC). It also has a MAC layer and an IP layer. The MAC
layer uses CSMA/CA (Carrier Sense Multiple Access with Collision Avoidance) and ARQ
(Automatic Repeat reQuest).
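As a rough illustration of how the three modulation schemes trade robustness for rate, the sketch below computes the payload bits carried per data subcarrier in one OFDM symbol. The halving of the payload when FEC is enabled (a rate-1/2 code) is an assumption made for illustration, and the function and constant names are hypothetical.

```python
import math

# Constellation sizes of the three differential PSK schemes named above.
CONSTELLATION_POINTS = {"DBPSK": 2, "DQPSK": 4, "D8PSK": 8}

# Assumption for illustration: enabling FEC halves the payload (rate-1/2 code).
ASSUMED_FEC_RATE = 0.5


def bits_per_subcarrier(scheme: str, fec: bool) -> float:
    """Payload bits per data subcarrier per OFDM symbol."""
    raw_bits = math.log2(CONSTELLATION_POINTS[scheme])
    return raw_bits * (ASSUMED_FEC_RATE if fec else 1.0)
```

Under these assumptions, D8PSK without FEC carries six times the payload of DBPSK with FEC, at the price of being far more sensitive to noise.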
The PRIME technology, being made for AMI, defines Base Nodes and Service
Nodes. The Base Node is the data concentrator, which is normally connected at the
secondary substation. The Service Nodes are the nodes serving the Base Node, which in
this case are the smart meters. A Service Node is either 'disconnected', a
'terminal' or a 'switch'. When a Service Node is a terminal, it is registered in the
network and is ready to communicate with the Base Node. If the Service Node is a
switch, it additionally acts as a repeater and relays traffic for other
Service Nodes. It has been shown that the availability of the Service Nodes is very
good, though there may be disconnects because of noise variations (during the
simulations, "worst case" noise will be added).
The state of the Service Nodes depends on the network conditions (such as noise and
attenuation), making it dynamic. During the simulations, discussed in later chapters,
each smart meter node is considered to be in the switch state.
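The three Service Node states can be captured in a few lines. This is only a sketch of the roles described above, not of the PRIME registration procedure; the class and function names are hypothetical.

```python
from enum import Enum


class ServiceNodeState(Enum):
    DISCONNECTED = "disconnected"
    TERMINAL = "terminal"
    SWITCH = "switch"


def can_reach_base_node(state: ServiceNodeState) -> bool:
    """Terminals and switches are registered and can talk to the Base Node."""
    return state is not ServiceNodeState.DISCONNECTED


def can_relay(state: ServiceNodeState) -> bool:
    """Only a switch repeats traffic for other Service Nodes."""
    return state is ServiceNodeState.SWITCH
```

Treating every smart meter as a switch, as the simulations do, corresponds to both functions returning True for every node.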
Advantages of PLC
- The PLC infrastructure is already in place, making implementation cheaper.
- No third-party communication supplier is required, from an electric company
  perspective.
- Good enough bandwidth for AMI.
Disadvantages of PLC
- Technically challenging transmission medium, which is noisy and difficult to
  model.
- Not viable for applications in need of higher bandwidth.
2.2.3 PLC Simulation Tools
The simulation tool chosen for the task is NS-3 (Network Simulator 3), with the help of
an externally made module [4]. NS-3 by itself is a flexible tool, consisting of a library of
modules, but does not by default support PLC.
By writing the program in C++ or Python, the programmer can include (import) the
modules needed to create a network of communicating nodes, operating within a PLC
channel.
A more detailed description of both the program itself and how it is used is given
in Section 4.1.
2.3 General Packet Radio Service
GPRS (General Packet Radio Service) [21] is a mobile packet-oriented data service using
the GSM network. GSM/GPRS (also known as 2.5G) is a widely available cellular
communication system which is mostly used for mobile phones. The GPRS throughput
and latency depend on how many users are sharing the service.
2.3.1 GPRS/GSM Network
In a cellular system, antennas are installed in a grid of regularly shaped cells (for example
hexagons) to cover the area (fig 1). The antennas in the GSM/GPRS network are called
BTS (Base Transceiver Stations), and forward packets from the Mobile Stations (MS,
in this case smart meters).
Fig 1. Architecture of a cellular network, such as GSM [3].
Fig 2. GPRS architecture [8].
The packets received by a BTS are collected by a BSC (Base Station Controller); together,
these are commonly referred to as the BSS (Base Station Subsystem). The BSC then
forwards the signal it received to either the Mobile Switching Center (MSC), if it is a
standard GSM phone call, or the Serving GPRS Support Node (SGSN) if it is GPRS
data. For this function to work, the BSC needs a Packet Control Unit (PCU), which
is either an additional hardware router or incorporated into the BSC.
The SGSN authenticates the source and collects charging information; it can also be
viewed as a gateway to services within the local GPRS network. From the SGSN the
packets are forwarded to a Gateway GPRS Support Node (GGSN), which is a router
working as an interface to external networks such as the Internet or other GPRS
networks. The GGSN can also act as a packet filter and collect tariff information from
the external network if needed.
When reaching the BSC/PCU, phone calls are normally given priority over
GPRS data transmissions. This is because phone calls are much more demanding on the
network (due to continuous activity during calls). Therefore the GPRS block rate
increases with increasing GSM phone call activity, thus increasing the overall packet
loss.
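The effect of call priority on GPRS blocking can be sketched with a deliberately simplified slot model. It assumes eight time slots per carrier and that every voice call or GPRS request occupies exactly one slot; both the slot accounting and the function name are illustrative assumptions, not a model of a real PCU scheduler.

```python
# Assumption: 8 time slots per carrier, one slot per voice call or GPRS request.
SLOTS_PER_CARRIER = 8


def gprs_block_ratio(active_calls: int, gprs_requests: int,
                     total_slots: int = SLOTS_PER_CARRIER) -> float:
    """Fraction of GPRS requests left without a slot when calls are served first."""
    if gprs_requests <= 0:
        return 0.0
    free_slots = max(0, total_slots - active_calls)
    served = min(gprs_requests, free_slots)
    return 1.0 - served / gprs_requests
```

For example, with four pending GPRS requests the block ratio in this toy model rises from 0.0 at two active calls to 0.5 at six active calls.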
From the outside, the MS communicates via standard IP and reaches the end-point
via IP. Inside the GPRS network, though, an IP-based GPRS tunneling protocol is used.
Between the MS and the SGSN, the Sub-Network Dependent
Convergence Protocol (SNDCP) and Logical Link Control (LLC) are used. The main
functions of SNDCP are to [9]:
- Compress and decompress user data and protocol control information.
- Multiplex Packet Data Protocols (saving bandwidth).
- Segment N-PDUs (network protocol data units) into LL-PDUs (LLC
  protocol data units), and re-assemble LL-PDUs into N-PDUs.
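The segmentation and re-assembly step can be illustrated with plain byte strings. This sketch leaves out compression and any header fields, and the maximum LL-PDU size is a hypothetical parameter, so it shows only the splitting idea rather than the SNDCP frame format.

```python
from typing import List


def segment(n_pdu: bytes, max_ll_pdu_size: int) -> List[bytes]:
    """Split one N-PDU into LL-PDU payloads of at most max_ll_pdu_size bytes."""
    if max_ll_pdu_size <= 0:
        raise ValueError("max_ll_pdu_size must be positive")
    return [n_pdu[i:i + max_ll_pdu_size]
            for i in range(0, len(n_pdu), max_ll_pdu_size)]


def reassemble(ll_pdus: List[bytes]) -> bytes:
    """Rebuild the N-PDU from its in-order LL-PDU payloads."""
    return b"".join(ll_pdus)
```

The round trip `reassemble(segment(data, n))` returns the original bytes for any positive `n`, provided the segments arrive in order.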
Once the compression and data unit conversions have been done by SNDCP, LLC is
used as an interface between the network layer (such as the now compressed IP) and the
link layer (MAC). LLC offers encryption within the same network, and is renewed
when reaching new external networks.
Fig 3. Layout of the layer 3 GPRS tunneling protocol [8].
The Quality of Service (QoS) offered by GPRS is separated into:
- Service Precedence: a three-level priority system categorized as High, Normal
  or Low. This is used to prioritize packet transmissions during congestion, where
  low priority packets are discarded.
- Reliability: defines maximum values of packet loss, duplication and corruption
  of packets.
- Delay: end-to-end transmission time, including all delays within the GPRS
  network.
- Throughput: this usually depends on the agreement between customer and
  supplier, but the billing is usually done per packet sent.
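Service precedence during congestion can be sketched as follows: when more packets are waiting than the link can carry, the lowest precedence classes are discarded first. The capacity parameter and the function name are assumptions made for illustration.

```python
from typing import List, Tuple

# Lower rank means higher precedence.
PRECEDENCE_RANK = {"high": 0, "normal": 1, "low": 2}

Packet = Tuple[str, str]  # (packet id, precedence class)


def admit_packets(packets: List[Packet],
                  capacity: int) -> Tuple[List[Packet], List[Packet]]:
    """Return (sent, discarded); arrival order is kept within each class."""
    # sorted() is stable, so packets of equal precedence keep their order.
    ordered = sorted(packets, key=lambda p: PRECEDENCE_RANK[p[1]])
    return ordered[:capacity], ordered[capacity:]
```

With a capacity of two and one packet of each class waiting, the low precedence packet is the one discarded.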
When it comes to billing, the GPRS supplier usually bases the tariff on the data
volume or packets sent, but other agreements between customer
and supplier can be made:
- Volume: payment is based on the data volume sent, and is proportional to the
  number of smart meters.
- Duration: a given timeslot is reserved for the specific AMI data polling, during which a high
  priority is given. The length of the duration would be proportional to the
  number of data polls, which is directly proportional to the number of smart
  meter devices.
- Flat rate: a fixed monthly fee that allows for either unlimited data volume or a specific
  maximum data volume.
- Location: the location of the AMI network may have an impact on price.
- Time: the price may change depending on the time of day, as the network load
  varies.
- Quality of Service: costs may be modified depending on the required priority,
  delay or throughput.
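To make the difference between the volume and flat-rate agreements concrete, the sketch below compares their monthly cost for a group of meters. All prices, traffic volumes and function names are hypothetical placeholders, not figures from a supplier.

```python
def volume_cost(n_meters: int, bytes_per_meter: float,
                price_per_megabyte: float) -> float:
    """Monthly cost when billing is proportional to the data volume sent."""
    return n_meters * bytes_per_meter / 1e6 * price_per_megabyte


def flat_rate_cost(n_meters: int, fee_per_meter: float) -> float:
    """Monthly cost with a fixed fee per smart meter subscription."""
    return n_meters * fee_per_meter


def cheaper_tariff(n_meters: int, bytes_per_meter: float,
                   price_per_megabyte: float, fee_per_meter: float) -> str:
    """Name of the cheaper agreement; ties go to the volume tariff."""
    by_volume = volume_cost(n_meters, bytes_per_meter, price_per_megabyte)
    by_flat = flat_rate_cost(n_meters, fee_per_meter)
    return "volume" if by_volume <= by_flat else "flat rate"
```

Both costs scale linearly with the number of meters here, so the preferred tariff depends only on the per-meter traffic and the two prices.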
2.3.2 GPRS Simulation Tools
The proposed method allows one to use simulation results acquired from any
source, as long as they are provided in the correct input format. Because of the great variety
of GPRS networks, it can be difficult to set up an accurate simulation that satisfies most
smart meter topologies put into the clustering method.
The first issue is that while the smart meters are stationary, the GPRS network shares
its buffer with mobile users traveling between the cells. The number of mobile users
depends on the location, and is higher in urban areas than in rural areas.
Normally the service provider prioritizes GSM (calls) over GPRS for time slot
allocation, but this again depends on the service provider and the level of QoS.
The second issue to address is whether the clustered groups of smart meters reside
within the same cell or are divided across multiple cells. Even within the same cell, signals
may be picked up by other base stations due to signal strength or if the BTS is busy
with GSM calls. The easiest way to handle this issue is either to have specific
information regarding a specific GPRS network, or to work under the assumption that the
clusters share the same cell without the smart meters changing BTS.
Depending on the service provider, the queuing process may use different queuing
schemes such as various versions of FIFO (First In First Out), FCFS (First Come First
Served), EDF (Earliest Deadline First) and SJF (Shortest Job First), amongst others. In
the case of buffer overflow, incoming packets are dropped and considered lost.
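A finite FIFO buffer with tail drop is the simplest of these schemes and already shows where packet loss comes from. The sketch below is a discrete-time toy model; the tick structure, parameter names and function name are assumptions and do not describe any particular provider's equipment.

```python
from collections import deque
from typing import Iterable, Tuple


def simulate_fifo(arrivals_per_tick: Iterable[int], service_per_tick: int,
                  buffer_size: int) -> Tuple[int, int, int]:
    """Return (served, dropped, still_queued) for a finite FIFO buffer."""
    queue = deque()
    served = dropped = 0
    for arrivals in arrivals_per_tick:
        for _ in range(arrivals):
            if len(queue) < buffer_size:
                queue.append(1)      # packet accepted into the buffer
            else:
                dropped += 1         # buffer overflow: packet is lost
        for _ in range(min(service_per_tick, len(queue))):
            queue.popleft()          # oldest packet is served first
            served += 1
    return served, dropped, len(queue)
```

The packet loss ratio of a run is then `dropped / (served + dropped + still_queued)`.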
2.4 Thesis Goals and Objectives
The two media alternatives used in this thesis are PLC (Power Line Communication)
and GPRS (General Packet Radio Service). The main reasons for using these are
availability and price: for PLC the infrastructure already exists, and
GPRS uses the widespread GSM network. Both provide low data rates,
which are adequate for AMI purposes.
The purpose of this thesis is to analyze smart-meter to data concentrator
communication in LV/MV electrical grids. The communication should follow the
PRIME protocol, which operates in the CENELEC A band.
Specifically, the goal is to optimally assign the smart meters in a network to either a
Data Concentrator located at the secondary substation, a Gateway also
located at the secondary substation, or directly to a Virtual Data Concentrator located
at the head-end. The first and second options both use the LV/MV PLC network for
communication, while the third uses wireless GPRS communication.
Table 1. List of scenarios to be compared.
The goal is to develop a method in which any PLC network topology can be fed into a
MATLAB program to obtain an output of smart-meter assignments. The assignments are
based on both economic factors and communication availability. The
economic factors are based on CAPEX and OPEX, while the communication
availability is derived from PLC and GPRS simulations.
Chapter 3 Method
3.1 The Model
To assign smart meters to a data concentrator or a virtual data concentrator, several
tasks need to be done. At the highest level, one inputs topology data into a MATLAB
program and gets an output of assigned smart meters. At a lower level are the
simulations written in NS-3/C++. For an overview, see the
model in the figure below.
Fig 3. Overview model of the method of assigning smart meters.
In total, three simulation tools are used to reach the results: MATLAB, Riverbed
Network Management (formerly OPNET) and NS-3. In addition to this, a program such
as Excel should be used to create an input data file.
3.2 Input Data
The main purpose of the MATLAB program is to compile various input data and use it to
produce an output, the smart meter assignments. There are two types of input to the
MATLAB program: user data input and program data input.
The user data input consists of LV grid network topology data and various user
options. This data is presented as an Excel spreadsheet with a specific layout, so that
the MATLAB program can identify and read the correct fields of data. There are two
different files for the topology, one for the households and one for the cable links. The
households, which can be seen as smart meter hubs, are identified by position,
identifying code, network area name, and finally the allocated power. The cable links
are similarly identified, but here there are two positions, one for each end of the cable
link. In addition to these, each network area should include the position of its Secondary
Substation, so that the feeders can later be identified. Other user inputs, such as various
output options and CAPEX/OPEX costs, are entered into the MATLAB program itself.
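The two topology files map naturally onto two record types. The sketch below is written in Python rather than MATLAB purely for illustration; the column order, the field names and the assumption that cable links carry a code and area name like the households are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Household:
    position: Point
    code: str
    area: str
    allocated_power: float


@dataclass(frozen=True)
class CableLink:
    start: Point
    end: Point
    code: str
    area: str


def parse_household(row: Sequence[str]) -> Household:
    """Assumed column order: x, y, code, area, allocated power."""
    x, y, code, area, power = row
    return Household((float(x), float(y)), str(code), str(area), float(power))


def parse_cable_link(row: Sequence[str]) -> CableLink:
    """Assumed column order: x1, y1, x2, y2, code, area."""
    x1, y1, x2, y2, code, area = row
    return CableLink((float(x1), float(y1)), (float(x2), float(y2)),
                     str(code), str(area))
```

Each spreadsheet row, read as a sequence of cell values, would be passed to the matching parser.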
The program data input is the collective name for the data inserted into the MATLAB
program from other programs. This consists primarily of the simulation data matrix
created from NS-3 and Riverbed Network Management. While this simulation data can
be edited by a user, this is not intended and should only be done if better simulation
results are obtained.
The MATLAB program is intended to be able to read any network topology, as long as
it is within the simulation parameters (distance to secondary substation, number of
smart meters). If a household is located further away from the secondary substation
than has been simulated for, it is considered 'too far away' for communication.
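The range check amounts to comparing each household's cable distance with the largest distance covered by the simulation data. A minimal sketch, with hypothetical names and return labels:

```python
def classify_household(distance_m: float, max_simulated_distance_m: float) -> str:
    """Label a household by whether simulation data covers its distance."""
    if distance_m <= max_simulated_distance_m:
        return "in range"
    return "too far away"
```

An analogous comparison would apply to the number of smart meters, the other simulation parameter named above.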
3.3 Topology Creation
To create a network topology, MATLAB reads the presented input data. Once it is
read, it creates household nodes and link nodes.
The household nodes (HH-nodes) are positioned according to their X and Y position,
and are attributed with their allocated power. The allocated power varies greatly from
one HH-node to another, depending on the household's occupancy and power
consumption. Here we approximate that each customer has similar power consumption,
so that the allocated power indirectly tells how many customers there are per household.
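- Approximate number of smart meters of network j: $N_j$.
- Total allocated power of network j: $P_{ss,j}$.
- Power per meter of network j: $P_{sm,j} = P_{ss,j} / N_j$.
- Number of smart meters of household i: $N_{hh,i} \approx P_{hh,i} / P_{sm,j}$, where
  $P_{hh,i}$ is the allocated power of household i.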
Having the exact number of smart meters and their positions would be preferable and
yield a more exact model of the actual network. As it is now, the number of smart
meters deployed is proportional to the allocated power, which gives a density weight
to each household node. This way, the exact number of smart meters is not important,
as long as it is sufficiently high to provide accuracy to the weights. For the size of the
specific network samples provided in this thesis, around 150 smart meters are deployed
per sub-network. The number of smart meter nodes deployed at each household node is
rounded upwards, which most often brings the total above 150.
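The allocation can be sketched as follows, under the reading that a household's meter count is its allocated power divided by the network's power per meter and then rounded upwards. The function name and the default target of 150 meters are taken as illustrative assumptions.

```python
import math
from typing import List, Sequence


def meters_per_household(household_power: Sequence[float],
                         target_total: int = 150) -> List[int]:
    """Distribute roughly target_total meters in proportion to allocated power."""
    total_power = sum(household_power)
    power_per_meter = total_power / target_total
    # Rounding upwards guarantees at least one meter per household with power.
    return [math.ceil(p / power_per_meter) for p in household_power]
```

Because every household is rounded upwards, the deployed total is at least the target and usually slightly above it.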
Fig 4a. Model of Network 28CZ15 with active household identification codes and the number of
allocated smart meters.
Fig 4b. Model of Network 28CGD8 with active household identification codes and the number
of allocated smart meters.
Fig 4c. Model of Network 28CC6 with active household identification codes and the number of
allocated smart meters.
Fig 4d. Model of Network 28CC15 with active household identification codes and the number of
allocated smart meters.
To be able to calculate the cable distance from each household node, and its
assigned smart meters, to the secondary substation, each household needs to be assigned
to a nearby link node. A straight cable
line is represented by two end points, each of which is categorized as a link node. From
the user input data, the MATLAB program links the end points by adding them to a
matrix. The point-to-point distance between the nodes is also calculated and added to
the matrix. Furthermore, to connect the cable lines, link nodes which are on top of each
other are linked with a distance of zero. Each household node is assigned to the link
node which is closest, with a margin of 1 meter.
Having assigned each household to a link node, and having calculated the distance
between all connected links, the shortest path to the secondary substation is calculated
using Dijkstra's algorithm [1]. The worst-case computational complexity of Dijkstra's
algorithm is O(|E| + |V| log |V|), where |V| is the number of vertices/nodes and |E| the
number of edges/links.
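As an illustration of this step, the following Python sketch runs Dijkstra's algorithm on a small link matrix. It is not the thesis code: the node names and cable lengths are hypothetical, and the binary heap used here gives O((|E| + |V|) log |V|) rather than the bound quoted above, which requires a Fibonacci heap.

```python
import heapq

def dijkstra(adjacency, source):
    """Return the shortest cable distance from source to every reachable node."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for neighbour, length in adjacency.get(node, []):
            nd = d + length
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(queue, (nd, neighbour))
    return dist

# Hypothetical links (node, node, length in meters). B and B2 lie on top of
# each other and are therefore linked with a distance of zero.
links = [("SS", "A", 40.0), ("A", "B", 25.0), ("B", "B2", 0.0),
         ("B2", "C", 30.0), ("A", "C", 70.0)]
adjacency = {}
for u, v, length in links:
    adjacency.setdefault(u, []).append((v, length))
    adjacency.setdefault(v, []).append((u, length))

dist = dijkstra(adjacency, "SS")  # "SS" stands for the secondary substation
print(dist["C"])
```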
The feeder nodes are also part of the topology. They are link nodes which
begin at the secondary substation and branch outwards. The difference between
feeder nodes and other link nodes is that feeder nodes are not linked to each other when they lie on top
of each other. This allows the power lines to be separated by feeder, which is
important during the PLC clustering.
3.4 Smart Meter Clustering
When analyzing the communication capabilities of a smart meter, one needs to consider
its surrounding environment. Depending on the medium, the environment is different.
3.4.1 PLC Clustering
In PLC the smart meters are communicating through the electrical grid towards the
secondary substation, where there is either a data concentrator or a gateway. A smart
meter shares its medium with other smart meters in the same feeder, with various
distances from each other. Apart from the channel characteristics, such as background
noise levels and frequency band, the main concerns regarding the communication
quality are channel congestion and signal attenuation. The channel congestion is caused
by additional traffic from other communication devices. The signal attenuation is
mainly caused by channel noise and cable distance. To account for these two issues,
the distance from the smart meter to the secondary substation, as well as the density of
nearby smart meters, are investigated. To find the distance and density of the smart
meters in a network, one can cluster the data points representing the smart meter
positions.
There are several clustering methods which can be used for grouping the smart
meters. Going by position alone can cause two smart meters to be considered as
sharing the same medium even though they are not, so the first step is to separate
the set of smart meters into feeder clusters. The number of feeder clusters corresponds
to the number of feeder nodes which are populated with smart meters. A smart
meter is assigned to a feeder cluster by looking for a path from the smart meter to every
feeder node; if a path exists, that is the feeder it belongs to. There are cases where there
are paths from one smart meter to several feeder nodes; most often this happens when
the input data puts a link node of one feeder on top of a link node of another feeder. In
this case the shortest path is used, but the consequence is that both feeders may be
considered to belong to the same feeder cluster, which may or may not be intended. To
prevent this, the link cable edges should be placed more than one meter apart.
After the network is divided into feeder clusters, the smart meters are still scattered.
To divide the smart meters into groups by proximity, K-means clustering is used. There
are several other possible clustering methods, such as hierarchical clustering,
distribution-based clustering and density-based clustering. K-means, which is a centroid-based
clustering method, is chosen for its simplicity and general usage.
One of the prerequisites of using K-means is determining the variable K. This can be
done by several methods, such as the elbow method or the silhouette method, applied
case by case. In this thesis, K has been determined by
experimenting with different K values. There are a few things to consider when
choosing K:
- K must be less than or equal to the number of smart meters.
- Too small a K results in smart meters being too far away from the centroid.
- Too large a K can result in empty clusters, or in clusters being split into parts
  that are too close to each other.
The MatLab program automatically reduces the chosen K until the first requirement is
fulfilled, so the challenge lies in the other two points. One suggestion is to use a "rule of
thumb" [2], which would be
$K \approx \sqrt{n/2}$ (f 3.4.1-1)
with n as the number of objects (smart meters or households).
When analyzing the congestion, one needs to consider the other clusters sharing the
same feeder. The number of populated clusters in a feeder is hereafter called Q, and is
used during the simulations. Q can range from zero, when there are no smart meters in a
feeder, to a maximum of K.
0 ≤ Q ≤ K (f 3.4.1-2)
As the simulations are done independently of the MatLab program, K must not
exceed the largest simulated Q. If K is above this value, the program will try to reach out of bounds
of the simulated data and exit. This is normally only a problem if feeders get merged, so that an
unusually large n enters (f 3.4.1-1). This can be handled by either capping K or
increasing the simulated Q. As the merging should be an anomaly, the program chooses
to cap K:
$K = \min\left(\sqrt{n/2},\ Q_{sim}\right)$ (f 3.4.1-3)
where $Q_{sim}$ is the largest Q that has been simulated.
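A minimal sketch of this choice of K, written in Python for illustration, could look as follows. The function name is hypothetical. Rounding the rule of thumb upwards is an assumption for the PLC case (the thesis states it explicitly only for GPRS), and so is applying the "K must not exceed the number of objects" requirement in the same step.

```python
import math

def choose_k(n_objects, q_sim):
    """Rule-of-thumb K, capped by the largest simulated Q and by n itself."""
    if n_objects <= 0:
        return 0
    k = math.ceil(math.sqrt(n_objects / 2))  # rounding upwards is an assumption
    return min(k, q_sim, n_objects)

print(choose_k(150, q_sim=3))  # a merged feeder with many meters is capped
print(choose_k(4, q_sim=3))    # a small feeder keeps the rule-of-thumb value
```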
For the network samples provided for the thesis, Qsim is set to 3, which also sets the K
in K-means to 3. This K will produce a maximum of three clusters within each
feeder cluster, depending on the data set. Also, when determining Q, the
populated clusters must be of similar size to be counted. This is to prevent very small
clusters from being treated as larger than they are in the simulation, and vice versa. If a PLC
cluster is about the same size as the other two clusters combined, Q will be reduced
from 3 to 2.
When all smart meters and households have been put into PLC clusters, each meter
has its density, distance and Q attributes set to the average of its cluster. These
attributes are compared with the repository PLC simulation results to see
if the communication quality is up to par.
fig 5. PLC clusters of Network 28CZ15 with K = 3.
3.4.2 GPRS Clustering
When communicating via a wireless medium such as GPRS, there is no need to
consider the electrical grid's cable lines. Here, the smart meters communicate point-to-point
with a nearby BTS (Base Transceiver Station), which in turn forwards the data
towards the head-end where a virtual data concentrator is operating.
Apart from the GSM channel characteristics such as noise, the smart meter
communication is susceptible to congestion from other communication devices sharing
the same channel, especially when they are close. Since GPRS uses a cellular GSM network,
smart meters in different parts of the grid may belong to
different cells.
Because of this, K-means clustering can be used to group the smart meter data
points. With K-means, a centroid is created for each cluster of smart meter data points. A
centroid is the mean position of all the points (smart meters or households) in the cluster. When
speaking of smart meters, the centroid represents the mean position; when speaking of
households, the centroid can be seen as a center of mass with the allocated power
as weights. The centroid, created for each K-means cluster, is used to calculate the
mean point-to-point distance from the cluster to the BTS. The distance affects signal
attenuation, and is a variable used for the repository GPRS simulation results together
with the density of the cluster's smart meters.
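The centroid and its distance to the BTS can be illustrated with a short Python sketch. The coordinates, the BTS position and the function name are hypothetical. The number of smart meters per household is used as the weight here, under the assumption that it is proportional to the allocated power.

```python
import math

def weighted_centroid(points, weights):
    """Center of mass of 2-D points, with one weight per point."""
    total = sum(weights)
    x = sum(w * px for (px, _), w in zip(points, weights)) / total
    y = sum(w * py for (_, py), w in zip(points, weights)) / total
    return x, y

# Hypothetical household positions (meters) of one cluster and their meter counts.
households = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)]
meters = [1, 1, 2]
bts = (375.0, 450.0)  # hypothetical BTS position

centroid = weighted_centroid(households, meters)
distance = math.dist(centroid, bts)  # point-to-point distance, cluster to BTS
print(centroid, distance)
```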
When determining K for GPRS K-means clustering, one could use the "rule of
thumb" discussed in the PLC clustering section. However, since all smart meters in the
same household share its location, the number of households is used in formula
(f 3.4.1-1):
$K \approx \sqrt{n/2}$ (f 3.4.2-1)
where n is the number of active households.
Below are examples from the four networks provided for the thesis; the calculated K
for each one is rounded upwards.
28CCH6   19 households   $K \approx \sqrt{19/2} \approx 3.08$, rounded up to 4
28CCI5   34 households   $K \approx \sqrt{34/2} \approx 4.12$, rounded up to 5
28CGD8   15 households   $K \approx \sqrt{15/2} \approx 2.74$, rounded up to 3
28CZ15   61 households   $K \approx \sqrt{61/2} \approx 5.52$, rounded up to 6
After determining K, the number of smart meters at each household node is
taken into account so that each centroid is positioned based on the density of smart meters.
Figure 6 shows the plot of the 28CCI5 network, where K was calculated to be 5. As
can be seen, only 4 clusters are created because of the positions of the data points.
3.5 Simulation Data Comparisons
One of the reasons for clustering the smart meters is to simplify an otherwise complex
topology into entities consisting of Density (D), Distance (L) and number of adjacent
clusters (Q). These three values can then be compared to a repository matrix of
simulation results, to acquire the corresponding packet drop rate of the cluster.
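Q = i        | Distance L →
Density D ↓  | P11  P12  P13
             | P21  P22  P23
             | P31  P32  P33
             | P41  P42  P43
Table 2. An example of a small repository of packet losses $P_{LD}$.
If the packet drop rate of the cluster is above the threshold set by the user data input,
the specific option is considered to be too poor for communication.
$S_{LDQ} = \begin{cases} \infty & \text{if } P_{LDQ} > P_{max} \\ 0 & \text{otherwise} \end{cases}$ (f 3.5-1)
where $P_{LDQ}$ is the packet drop rate from the repository and $P_{max}$ is the threshold set by the user.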
This factor is added to the smart meter's cost function, so that the option will never be
chosen as the least cost option. If all alternatives are infinite, the smart meter is
set to inactive, and assigned to neither a data concentrator nor a virtual data
concentrator.
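The lookup and the resulting loss factor can be sketched in a few lines of Python. The repository layout (one density-by-distance matrix per Q), the packet loss values and the function names are hypothetical. The loss factor is assumed to be zero when the packet drop rate is at or below the threshold and infinite otherwise.

```python
import math

# Hypothetical repository: repository[Q][density row][distance column],
# holding packet loss ratios between 0 and 1.
repository = {
    1: [[0.82, 0.84, 0.86],
        [0.85, 0.88, 1.00]],
}

def loss_factor(packet_loss, threshold):
    """Return 0 if the option is usable, infinity if it is too lossy."""
    return math.inf if packet_loss > threshold else 0.0

def cluster_loss_factor(repository, q, density_row, distance_col, threshold):
    """Look up the packet loss of a cluster and convert it to a loss factor."""
    packet_loss = repository[q][density_row][distance_col]
    return loss_factor(packet_loss, threshold)

usable = cluster_loss_factor(repository, 1, 0, 1, threshold=0.90)
blocked = cluster_loss_factor(repository, 1, 1, 2, threshold=0.90)
print(usable, blocked)
```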
3.6 Cost Function
3.6.1 CAPEX and OPEX
The cost of assigning a smart meter differs depending on the scenario and the
CAPEX and OPEX user input values.
CAPEX (capital expenditure) is the cost of buying fixed assets or adding to
existing fixed assets. This category includes costs such as deployment of data
concentrators, smart meters, infrastructure upgrades and hardware upgrades. One could
also use the term transitional CAPEX, which is the cost difference between the old
system and the new. For a company in possession of power lines, the transitional
CAPEX would be zero with regard to PLC cabling purchases.
OPEX (operating expense) is the ongoing cost of maintaining the network and its
communication. Examples are maintenance, tariff and operation costs.
This thesis focuses on three different scenarios (Table 1), or communication topology
setups. Each of these scenarios has a different set of CAPEX and OPEX cost values to
be set as input variables for the cost function (chapter 3.6.2), or TCO (Total Cost of
Ownership).
The first scenario, where the smart meters communicate with a data concentrator via
PLC, may have the following costs to assess.
Scenario 1
CAPEX | OPEX
Deployment of a Data Concentrator at the secondary substation. | PLC infrastructure maintenance.
Deployment of a smart meter for each customer. | Data concentrator and smart meter hardware maintenance.
PLC infrastructure upgrades. |
Data Gathering & Com. analysis tools | Maintenance/development of the tools
Virtual Data Concentrator | Maintenance/development of VDC
The second scenario would be similar to the first, but with a smaller CAPEX cost
because of the cheaper gateway instead of a data concentrator at the secondary
substation.
Scenario 2
CAPEX | OPEX
Deployment of a Gateway at the secondary substation. | PLC infrastructure maintenance.
Deployment of a smart meter for each customer. | Gateway and smart meter hardware maintenance.
PLC infrastructure upgrades. |
Data Gathering & Com. analysis tools | Maintenance/development of the tools
Virtual Data Concentrator | Maintenance/development of VDC
In the third scenario the communication devices bypass the PLC infrastructure by
communicating directly to a Virtual Data Concentrator via GPRS provided by a third
party supplier. The general difference here from the other scenarios is a lower CAPEX
but a higher OPEX. The party offering the GSM/GPRS network access may have
different payment plans, but here we consider either a fixed fee per data message or
fixed annual fee for sufficient communication capabilities.
Scenario 3
CAPEX | OPEX
Deployment of a smart meter for each customer. | Smart meter hardware maintenance.
Virtual Data Concentrator setup. | GSM/GPRS service provider tariff.
For the thesis, the CAPEX and OPEX used have been provided by a DSO. Without
disclosing the actual numbers, the costs included in the CAPEX/OPEX calculations are
as follows.
                            DC (PLC)      GW (PLC)      GPRS
                            CAPEX OPEX    CAPEX OPEX    CAPEX OPEX
SWITCH                      X     -       X     -       X     -
MV Supervisor               X     -       X     -       X     -
SCADA + DMS                 X     X       X     X       X     X
GCT                         X     X       X     X       -     -
MDMS                        X     -       X     -       X     -
Smart meter                 X     X       X     X       X     X
Virtual Data Concentrator   -     -       X     X       X     X
Power Analysis Tool         X     X       X     X       -     -
Network Information System  -     -       -     -       -     -
Table 3. The CAPEX/OPEX data parameters used in the case study.
Furthermore, each CAPEX and OPEX cost is divided into a per-smart-meter part and a
per-substation part. For this thesis, the 'CAPEX per substation' cost of scenario 2 (gateway) is
set to 30% of that of scenario 1. That said, in the case study the largest cost is
not the 'CAPEX per substation' but the 'CAPEX per smart meter', so the difference in
cost is not great, even with the additional CAPEX and OPEX from 'Virtual Data
Concentrator'.
3.6.2 Combinatory Cost Function
The cost function combines the costs of CAPEX and OPEX with the loss factor.
$C_{ij} = \mathrm{CAPEX}_{ij} + \mathrm{OPEX}_{ij} + S_{ij}$ (f 3.6-1)
$\mathrm{CAPEX}_{ij} \geq 0$ (f 3.6-2)
$\mathrm{OPEX}_{ij} \geq 0$ (f 3.6-3)
where i is the smart meter and j the scenario (Table 1). $\mathrm{CAPEX}_{ij}$ and $\mathrm{OPEX}_{ij}$ are the
sums of costs according to Table 3, and variations thereof depending on the user.
As can be seen, the loss factor S decides whether the cost C will be infinite or not,
according to (f 3.5-1).
3.7 Optimization Problem
The task of optimally assigning the smart meters in the smart grid AMI network can be
described as an Assignment Problem. An assignment problem is, in general terms, the
problem of finding a maximum/minimum weight
matching in a bipartite graph. A weighted bipartite graph is a graph where each edge
has an assigned value (weight), which in this case is the combinatorial cost from (f 3.6-1).
fig 7. Assignment Problem, agents to the left assigned to tasks to the right.
In an Assignment Problem, the goal is to assign agents (in this case the clusters) to
tasks (in this case DC, GW or GPRS) while minimizing the cost cij. A semi-assignment
problem normally deals with optimally assigning m tasks to n agents so that each task is
assigned to one agent. By relaxing the semi-assignment problem, the latter constraint is
lifted, allowing more than one agent to be assigned to a task. This also allows
a task to have no agent assigned. Mathematically, this relaxed linear semi-assignment
problem [7] can be defined as follows.
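$\min \sum_{i \in I} \sum_{j \in J} c_{ij} x_{ij}$ (f 3.7-1)
Subject to
$\sum_{j \in J} x_{ij} = 1, \quad \forall i \in I$ (f 3.7-2)
where
$x_{ij} \in \{0, 1\}$ (f 3.7-3)
$c_{ij} = \mathrm{CAPEX}_{ij} + \mathrm{OPEX}_{ij} + S_{ij}$ (f 3.6-1)
$i \in I, \quad j \in J$ (f 3.7-4)
Smart meter i is within the set of smart meters I, which is assigned to task j within the
set of tasks/scenarios J. Formula 3.7-2 together with 3.7-3 says that a cluster can only
be assigned to one task. By removing the original constraint (from the semi-assignment
problem) of
$\sum_{i \in I} x_{ij} = 1, \quad \forall j \in J$
it is possible for a task (scenario) to have more than one agent (smart meter) assigned to
itself.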
The relaxed linear semi-assignment problem is a special case of the transportation
problem, which is a special case of the minimum cost flow problem, which is a special
case of Linear Programming (LP). A Linear Programming problem has a feasible
region defined by a series of linear constraints. The optimal assignment of smart meters is
found at the minimal possible cost within this region.
Because of the relaxations, one cannot use the Hungarian method which is often used
for combinatorial optimizations. However, since more than one agent per task is
possible, the minimum cost becomes as simple as choosing the cheapest option for each
smart meter, as the sum (f 3.7-1) would be minimized by doing this. That would not be
enough though, since there is an additional constraint implemented that says that only
one of the tasks can be used per network. This makes the practical method of finding
the minimized cost of assigning all smart meters to a scenario quite simple; sum the
costs of having all smart meters assigned to the tasks and select the cheapest option.
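The practical method described above can be sketched in Python as follows. The scenario names follow the text, while the cost values and the function name are hypothetical. How an infinite cost for a single smart meter is treated at network level is not specified here, so the sketch simply lets it make the whole scenario infinite, which is an assumption.

```python
import math

def cheapest_scenario(costs):
    """costs[scenario] holds one cost C_ij per smart meter in the network."""
    totals = {scenario: sum(per_meter) for scenario, per_meter in costs.items()}
    best = min(totals, key=totals.get)
    if math.isinf(totals[best]):
        return None, totals  # no scenario is feasible for this network
    return best, totals

# Hypothetical costs for a network of three smart meters.
costs = {
    "DC":   [120.0, 120.0, 120.0],
    "GW":   [100.0, 100.0, math.inf],  # one meter has too high a packet loss
    "GPRS": [110.0, 110.0, 110.0],
}
best, totals = cheapest_scenario(costs)
print(best, totals)
```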
Chapter 4 Simulations
4.1 PLC Simulations
The PLC simulations are done using NS-3 [10] (Network Simulator 3). NS-3 is a discrete-event
network simulator, built using C++ and Python. The default NS-3 library does not
include PLC support, so an external module named "PLC Software" [4] is used for
PLC-specific properties.
NS-3 is fundamentally a C++ object system, with node objects sending packets to each other via a
channel. In the simulations, the smart meters and the data concentrator are
created as similar objects. The difference between the two is that the data concentrator
is the application source while one of the smart meters is the application sink.
Figure 8. A general model of an NS-3 simulation program.
4.1.1 Channel
The PLC channel is the connection between the nodes, such as PLC cables and various
environmental settings. Here, one can specify the frequency range, cable characteristics,
noise levels and various physical layer settings.
The channel is specified so that it emulates a CENELEC A PLC channel, more
specifically as used by the PRIME protocol. The PRIME protocol uses the upper
CENELEC A frequency band, which ranges from 42 kHz to 95 kHz.
The noise floor is the sum of all noise sources, such as background noise. Noise is
generally generated by electrical loads, and varies with location, time of day,
frequency and the distance to the noise sources. The PLC Software module has a pre-defined
"worst case" background noise based on the power spectrum; for this
configuration it is 1e-12 dBm, which is used in the simulations. The noise floor limits
how much attenuation a signal can tolerate, so the choice of noise floor determines the distance
the PLC signals can traverse between repeaters.
Repeaters may either be placed as standalone devices along the PLC network or, as in
this case, be a smart meter functionality. Because of this, one can assume that each
meter of a PLC cluster has its signal repeated by the cluster's smart meters towards the
secondary substation. Therefore, when reaching the cluster's smart meter with the
shortest cable distance to the secondary substation, the signal is presumably repeated for the
last time.
The simulation program models this by linking a point-to-point NAYY150SE PLC
cable from a data concentrator node to a link node, with a length equal to the distance of the smart meter
closest to the secondary substation. Propagating signals have an attenuation that increases
with length and frequency [6]. The simulated distances range from 1 meter to 250
meters, as the furthest located smart meter used is closer than that.
The channel uses OFDM, whose primary advantage over single-carrier alternatives is
its ability to deal with harsh channel conditions such as narrowband interference and
frequency-selective fading due to multipath. OFDM splits the data into sub-carriers of
different frequencies (here it is within the upper CENELEC A band). PRIME uses
OFDM with adaptive equalization to overcome the intersymbol interference.
Intersymbol interference is when one symbol in a signal interferes with subsequent
symbols. This kind of interference is comparable to noise, and is usually caused by
multipath propagation in wireless media (a signal arrives from different paths), but in
this case most likely because of signal reflection (some signal power gets reflected back
to its origin).
To prevent signal reflection, an outlet is installed at the end of each cable line with an
impedance matched to the cable's characteristic impedance. A common value for this
impedance is 50 Ω (as with radio-frequency systems) because of the long length of the
cable compared to the signal wavelength. A smaller impedance, such as 5 Ω, will
increase the packet loss ratio.
With OFDM, the data signal is modulated in parallel with BPSK (binary phase-shift
keying). PSK (phase-shift keying) is one of three basic methods of conveying digital data by
modulating an analog carrier signal; the other two are amplitude-shift keying and
frequency-shift keying. BPSK is the simplest (and most robust) form of PSK and uses
two different phases to indicate ones and zeros.
Figure 9. A BPSK signal over time.
4.1.2 Nodes and Net-devices
A node in NS-3 can be viewed as an empty computer chassis, in which a host can install
various hardware and software. For a node to be able to communicate with
another node, each node needs an interface connecting it to the PLC channel.
A net-device can be viewed as a peripheral card acting as an intermediary between a
motherboard and the outgoing interface to the PLC channel. This PLC net-device
contains various link layer settings and protocols such as MAC.
To simulate an arbitrary power line network, the smart meter clusters are characterized
by their Density, their Distance to the DC and the number of such clusters sharing the
same feeder (denoted here as Q). The simulation program allows one to enter these
variables, and will then create a simple network representing an approximation of a
feeder with these clusters.
Since the cluster data from the MatLab program, or rather the user input data, is not
fed directly into the NS-3 simulation code, every possible topology combination
cannot be foreseen and simulated. Therefore each additional cluster (Q-1) is
approximated to be of similar size to the chosen smart meter's cluster (explained in
3.4.1).
The NS-3 program will place the smart meters (SM), Link nodes (LN) and the Data
Concentrator (DC) according to the figures below:
fig 10. Simulation placement of a [D, L, Q] = [3, 25, 1] topology.
fig 11. Simulation placement of a [D, L, Q] = [3, 25, 2] topology.
As can be seen from figure 10 and figure 11, the distance L is from the closest smart
meter in the cluster (SM-0) to the DC. The data transfer between smart meter and DC takes place
between these two nodes, while the others are there to fill up the densities of the clusters
in the feeder.
There are one-meter-long links between the nodes to prevent errors during the
execution of the code. Because of this, large topologies will have their furthest SM
nodes far away from the DC node, but their signals will be repeated by nearby link
nodes (which are placed so that they act like a shared medium) until the one closest to
the DC. The distance from the closest smart meter of the additional cluster to the shared
link node is approximately D∙(Q-1) meters, which in a [D, L, Q] = [3, 25, 2] topology
will be 6 meters. This will only add a small attenuation, but in a [D, L, Q] = [100, 25,
3] topology the distance from the furthest cluster to the shared link node will be 200 m. While
this distance is large, and will introduce a more attenuated signal at the shared link
node, it corresponds to clusters being further away from the relevant data transfer.
Looking at figure 11, if one were interested in analyzing the data transfer between
SM-3 and the DC, it would be like simulating a
[D, L, Q] = [3, 25 + Q*D, 2] = [3, 31, 2]
topology, as the relevant data transfer is by default between SM-0 and the DC. It is also
worth mentioning that the number of assigned smart meters in the MatLab program
ranges from approximately 150 to 190 depending on power allocation and the user input
(one can use fewer smart meters). If the clusters are of similar size, the closest smart
meter of the third cluster would have an
additional distance of at most:
→ Ladditional ≈ D∙(Q-1) ≈ 63∙(3 - 1) = 126 m.
However, since each PLC network usually has more than one feeder, the size of each
cluster is usually small enough to not make a difference.
4.1.3 Protocols and Applications
CSMA/CA and ARQ are implemented in the module. With the nodes and the PLC
channel configured, protocols and applications are needed both to enable data
transmissions and to analyze the packets sent.
The PLC Software module's PLC channel implements ARQ and
CSMA/CA by default, which happens to be in line with the PRIME protocol. ARQ is
an error-control method used during the transmission of packets, which uses
acknowledgements and timeouts to attain more reliable data transmission over the
unreliable power line environment. If the sender fails to receive an acknowledgement
from the receiver before the timeout, it will resend the packet. It continues to do so until
the maximum number of retransmissions has been reached.
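The retransmission behaviour of ARQ can be illustrated with a small Python sketch. It is a generic stop-and-wait illustration and not the implementation of the PLC Software module; the function name and the `transmit` callback are hypothetical.

```python
def send_with_arq(transmit, max_retransmissions):
    """Send one packet with stop-and-wait ARQ.

    transmit() returns True when an acknowledgement arrives before the
    timeout. Returns the number of attempts used, or None if the packet is
    dropped after the retransmission limit has been reached.
    """
    for attempt in range(1, max_retransmissions + 2):  # first try plus retries
        if transmit():
            return attempt
    return None

# Hypothetical channel outcomes: two lost acknowledgements, then success.
outcomes = iter([False, False, True])
attempts = send_with_arq(lambda: next(outcomes), max_retransmissions=3)
dropped = send_with_arq(lambda: False, max_retransmissions=3)
print(attempts, dropped)
```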
CSMA/CA (Carrier Sense Multiple Access with Collision Avoidance) is a multiple
access method which operates in the data link layer. CSMA/CA is mostly used in
wireless networks to prevent the hidden node problem, but is also used in PRIME due
to its ability to control shared media, which the power line network has in common with
a wireless network. With CSMA/CA activated, a node only transmits data when it
senses that the PLC channel is idle. If the channel is not idle, the node waits for a period of time
(backoff time) before trying again.
To be sent, data needs to be in the form of packets. The packets are created
and sent by a traffic-generator application called "OnOffApplication". The application
generates UDP packets with a size of 341 bytes at a data rate of 21.4 kbps [5] for 200
seconds. The longer the application runs, the more packets can be analyzed, giving a
higher accuracy. Below is an example where one DC sends packets to one SM (D = L = Q
= 1).
Density | Length | Simulation time | Delivery rate (%) | Packet loss ratio (%) | Execution time
1       | 1      | 30 s            | 100               | 0                     | 2 s
1       | 1      | 60 s            | 62                | 37                    | 5 s
1       | 1      | 100 s           | 35                | 64                    | 10 s
1       | 1      | 200 s           | 17                | 82                    | 11 s
1       | 1      | 400 s           | 15                | 84                    | 55 s
1       | 1      | 800 s           | 15                | 84                    | 109 s
Table 4. Simulation results showing how the results depend on the simulation time.
Because of the vast number of simulations required to compose the repository to be
used in the MatLab program, one must weigh the packet loss accuracy against the execution
time. One can see that a simulation time of 200 s gives almost as accurate
results as a simulation time four times as long. Table 4 shows the execution times
of the smallest possible PLC network; with a density closer to one hundred
smart meters, the execution time rises exponentially to approximately 3800 seconds.
This time is dependent on the computer used for the
simulations, which also dictates the maximum density/distance before a SIGKILL
occurs and terminates the simulations.
To use the application, IPv4 is installed on all smart meter nodes. PRIME supports
both IPv4 and IPv6, but NS-3 has better support for IPv4 than for IPv6, and the main
advantage of IPv6, a larger address space, is not needed during the simulations.
With the packets now being sent from the DC to a smart meter, the FlowMonitor
module is added to the code. The FlowMonitor module's function is to
monitor and register packet-related events occurring at each node it is installed on.
Specifically, in this case, the FlowMonitor registers transmitted packets, received
packets, packet delay and packet loss ratio.
4.1.4 PLC Simulation Results and Discussion
The most accurate way to translate the results from the PLC simulations into a packet
loss repository would be to simulate each variation of D, L and Q. That option would,
however, not be feasible to complete within reasonable time. Depending on the D/L/Q
setup, each simulation may take between 10 seconds and an hour. The total number of
simulations, limiting the density to 100 per Q (beyond that the result is mostly 100 % loss or
SIGKILL), is approximately 75,000 (100 densities, 250 distances and 3 values of Q).
Because of this, there is a need to apply a method that reduces the number of simulations
needed, a design of experiment (DoE). The deterministic nature of the simulation
program rules out alternatives which rely on variance.
Because of the small changes in distance-related packet loss, usually one or two
percentage points, one can apply the following algorithm to fill the empty spaces in the
repository:
For each Q and for every fifth density value starting from 1:
- Simulate the first (L = 1) and the last (L = 250) distance.
- If the results differ, simulate between the two results (L/2 = 125).
- If the result differs from the result closest behind and the result closest ahead,
  simulate between those two.
- Repeat until there are no differences.
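When there is nothing left to fill in according to the criteria above, linear interpolation
is done between the simulated results. For two neighbouring simulated distances $L_a < L_b$
with packet losses $P(L_a)$ and $P(L_b)$:
$\Delta P = \frac{P(L_b) - P(L_a)}{|L_b - L_a|}$ (f 2.3.4-1)
Then
$P(L) = P(L_a) + \Delta P \cdot (L - L_a)$ (f 2.3.4-2)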
After every fifth row is filled, the other rows are assigned the row closest to itself.
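One possible reading of this procedure for a single (Q, D) row is sketched below in Python. It is an interpretation rather than the thesis implementation: the interval between two simulated distances is halved for as long as their results differ, and the remaining gaps are filled by linear interpolation. The stand-in `fake_simulation` function and its packet loss values are hypothetical and replace a real NS-3 run.

```python
def sample_distances(simulate, lo=1, hi=250):
    """Simulate only the distances needed to locate changes in packet loss."""
    results = {lo: simulate(lo), hi: simulate(hi)}
    stack = [(lo, hi)]
    while stack:
        a, b = stack.pop()
        if results[a] == results[b] or b - a <= 1:
            continue  # nothing more to resolve between a and b
        mid = (a + b) // 2
        results[mid] = simulate(mid)
        stack.extend([(a, mid), (mid, b)])
    return results

def interpolate(results, distance):
    """Linear interpolation between the two closest simulated distances."""
    known = sorted(results)
    for a, b in zip(known, known[1:]):
        if a <= distance <= b:
            slope = (results[b] - results[a]) / (b - a)
            return results[a] + slope * (distance - a)
    raise ValueError("distance outside the simulated range")

def fake_simulation(distance):
    """Stand-in for an NS-3 run: 82 % loss, then total loss from 200 m."""
    return 82 if distance < 200 else 100

results = sample_distances(fake_simulation)
print(len(results), interpolate(results, 100))
```

With this stand-in, only a handful of the 250 possible distances need to be simulated for the row, because the packet loss changes at a single point.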
This method decreases the number of simulations from approximately 75,000 to
roughly 250 depending on the results from the simulations. The method, while being
more time efficient, is still very time-consuming.
The packet loss retrieved from the simulations usually increases with length, but
slightly. The density usually dictates how long the distance can be set before reaching a
100 % packet loss. The number of clusters Q usually decreases this distance a bit
further, while also decreasing the density which can be simulated before a SIGKILL
termination. Hundred percent packet loss is first reached at [D = 75, L = 230m, Q = 1],
and remains a hundred percent from [D = 100, L = 1m, Q = 1]. These ranges vary
depending on Q, mostly because of the increased number of nodes. When unable to
simulate a topology because of SIGKILL, it will also be considered to be hundred
percent packet loss.
While both the packet delay and packet loss are available for tracking during the
simulations, the most interesting part is when we are unable to communicate. If there is
no larger interest in the throughput, unless the delay is too great to ignore, the focus lies
when the packet loss becomes 100 %. The packet loss, for this PLC channel
configuration, is consistently high and ranges between 82-88 % (unless 100%). The
packet loss is mostly dependent on the noise floor set; higher noise will increase the
packet loss because it will drown the PLC signals over the cable distance.
4.2 GPRS Simulations
Even though the majority of NS-3 users normally focus on wireless simulations, there
is no support for GPRS/GSM. Instead, lacking simulation tools of our own, we
use externally acquired simulation data.
For our case study, where availability is of more concern than the specific packet loss
rate of the specific topologies, we can take some liberties when selecting the data to use
for our packet loss repository. A simulation study is presented in [23] with variations of
queuing schemes, arrival rates and ratios between GPRS and GSM calls. Not knowing
the GPRS service provided, it is assumed that the smart meter communication
shares the same channels as both other GPRS traffic and prioritized GSM voice calls.
This dynamic scheme is in contrast to a static scheme, in which there is a static division
between GSM and GPRS. In the latter, the GPRS traffic (smart meter communication
included) would still be buffered and prioritized according to the level of QoS provided.
There are various scenarios depicted in [23], which vary the
queuing scheme and the ratio between GSM (session) and GPRS (packet) traffic. We
use the data from their results using a FIFO queue and 20% packet traffic, which gives
an approximation of
( ( ))
( ( ))
(f4.2-1)
The data is then put into the GPRS packet loss repository, which is used alongside
the PLC packet loss repository. When comparing the overall packet loss of GPRS with
that of PLC, one can see that it is much lower.
Chapter 5 Theory
5.1 K-means Clustering
K-means clustering is a popular prototype-based method of partitioning a group of data
points into smaller groups. The problem is to partition n observations
into k mutually exclusive clusters, where k is set by the user. Each observation (data
point) is assigned to the cluster whose centroid is closest.
The end result is a Voronoi diagram, where each observation is assigned to a cell.
fig 13. Voronoi diagram of 10 cells, each with a centroid.
The problem is defined as finding the global optimum of the objective function
$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$$
(f 5.1-1)
for a given set of observations X = {x1, x2, …, xn}, where S = {S1, S2, …, Sk} are the
clusters. As µi is the mean of the data points assigned to cluster Si, one can
see that the objective function is minimized when all data points are as close to
their cluster mean µi as possible. Note that the number of clusters k cannot exceed
the number of observations n (k ≤ n).
Normally the distance used is the Euclidean distance, defined for two points a and b in d
dimensions as:
$$\lVert a - b \rVert = \sqrt{\sum_{j=1}^{d} (a_j - b_j)^2}$$
(f 5.1-2)
The Euclidean distance is the length of the line segment ab connecting the points a and b.
An alternative is the Manhattan distance, or city-block distance, which is the sum of the
absolute differences along each dimension. One can see the city-block distance as
walking along the city streets, while the Euclidean distance is as the crow flies.
Non-Euclidean measures such as the correlation can also be used between the points (the
mean and the observation data point).
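$$\rho(a, b) = \frac{\operatorname{cov}(a, b)}{\sigma(a)\,\sigma(b)}$$
(f 5.1-3)
Where the covariance is
$$\operatorname{cov}(a, b) = \frac{1}{d} \sum_{j=1}^{d} (a_j - \bar{a})(b_j - \bar{b})$$
(f 5.1-4)
And the standard deviation
$$\sigma(a) = \sqrt{\frac{1}{d} \sum_{j=1}^{d} (a_j - \bar{a})^2}$$
(f 5.1-5)
where
$$\bar{a} = \frac{1}{d} \sum_{j=1}^{d} a_j$$
(f 5.1-6)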
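The distance measures discussed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the thesis: the function names are hypothetical, and the correlation is computed with the population (1/d) normalization, which is an assumption (the normalization cancels in the ratio as long as it is used consistently).

```python
import math


def euclidean(a, b):
    """Straight-line ("as the crow flies") distance between points a and b."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def manhattan(a, b):
    """City-block distance: sum of absolute differences along each dimension."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


def correlation(a, b):
    """Correlation between a and b, treating the coordinates as samples.

    Assumption: population (1/d) normalization for covariance and
    standard deviation. Undefined if a or b has zero variance.
    """
    d = len(a)
    mean_a = sum(a) / d
    mean_b = sum(b) / d
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / d
    std_a = math.sqrt(sum((ai - mean_a) ** 2 for ai in a) / d)
    std_b = math.sqrt(sum((bi - mean_b) ** 2 for bi in b) / d)
    return cov / (std_a * std_b)
```

For the points (0, 0) and (3, 4), the Euclidean distance is 5 while the city-block distance is 7, which illustrates why the choice of measure can change which centroid is considered closest.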
As the positions of the centroids are not known in advance, and candidate positions need
to be evaluated before the minimum of the objective function is found, the search space
grows quickly. The problem is in fact NP-hard in general. With the dimension d and the
number of clusters k as constants, the problem can be solved exactly in
O(n^(dk+1) log n) time.
Because the problem is NP-hard, heuristic algorithms are needed to solve it in practice.
A heuristic algorithm does not guarantee an optimal solution, but the result is often
close enough, depending on the application. The most commonly used algorithm for the
k-means problem is Lloyd's algorithm, also known simply as the “K-means algorithm” or
the “Voronoi iteration”.
The algorithm starts by initializing k centroids, one for every cluster, chosen
uniformly at random from the data points. Each data point is then assigned to its
closest centroid, after which the centroids are recomputed as the means of their
assigned data points. The last two steps are repeated until the algorithm converges,
i.e. until the assignments no longer change. Depending on where the initial centroids
are located, there may be differences in the results between runs. Lloyd's K-means
algorithm has the complexity [12] O(nkdi), where i is the number of iterations.
As the number of iterations before convergence is often small, the algorithm is
considered to be linear in practice.
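The iteration described above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the MatLab implementation used in the thesis: the function names, the `max_iter` cap and the handling of empty clusters (the old centroid is kept) are choices made for this example only.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def lloyd_kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm on a list of equally sized tuples.

    Returns (centroids, assignment), where assignment[j] is the index of
    the cluster that points[j] belongs to.
    """
    rng = random.Random(seed)
    # Initialization: k centroids drawn uniformly at random from the data.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
    return centroids, assignment
```

Each pass over the loop costs on the order of n·k·d operations, which is where the O(nkdi) complexity quoted above comes from.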
There are two factors which can make the algorithm a bad choice, though. First, in a
worst-case scenario the algorithm can be very slow to converge. Second, the quality of
the result depends on how the centroids are initially located (bad results usually
arise when they are located too close to each other). By replacing the uniformly random
initialization with a more careful seeding step, an improved [13] k-means method is
obtained. The method is called k-means++ and is used by programs such as MatLab.
1a. Choose one center c1 uniformly at random from the set of data points (x ϵ X).
1b. Choose another center ci, now from x ϵ X with a probability of
$\frac{D(x)^2}{\sum_{x' \in X} D(x')^2}$, where D(x) is the distance from x to the
closest center already chosen. This step is called “D² weighting”.
1c. Repeat 1b until all k centroids have been located.
2. Now that all centroids have been initially located, proceed with the standard Lloyd's
algorithm, starting from the “assignment” step (expectation step).
By changing the initialization in this way, the expected value of the objective function
is within a factor O(log k) of the optimum [13]. According to [12], k-means++
consistently outperforms the standard k-means algorithm.
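The seeding steps 1a to 1c can be sketched in Python as shown below. This is an illustrative sketch only: the function name `kmeanspp_seeds` is hypothetical, and the code is not taken from MatLab or from the thesis. It uses `random.choices` from the Python standard library to draw a point with probability proportional to its weight.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeanspp_seeds(points, k, seed=0):
    """Pick k initial centers from points using D^2 weighting."""
    rng = random.Random(seed)
    # Step 1a: the first center is chosen uniformly at random.
    centers = [rng.choice(points)]
    # Step 1c: repeat step 1b until k centers have been chosen.
    while len(centers) < k:
        # D(x)^2 for every point: squared distance to the nearest
        # center chosen so far (zero for points that are already centers).
        weights = [
            min(squared_distance(p, c) for c in centers) for p in points
        ]
        # Step 1b: draw the next center with probability D(x)^2 / sum D(x')^2.
        # Note: random.choices raises ValueError if all weights are zero,
        # i.e. if every point coincides with an already chosen center.
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers
```

The returned centers would then replace the uniformly random initialization before the usual assignment and update iterations are run. Because points far from all existing centers get a large weight, the seeds tend to be spread out, which addresses the problem of initial centroids lying too close to each other.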
Before one begins to use K-means, one must determine how many clusters K are
needed, a task considered to be one of the algorithm's biggest disadvantages. A good
choice of K yields clusters that are well separated from each other, each with few
assigned data points that have a small mean distance to the centroid. Too few
clusters make the mean distance larger, while too many place the cluster centroids
too close to each other.
One way to choose K is to use the rule of thumb [2] (f. 3.4.2-1), which is a simple way to
provide the algorithm with K without external constraints. Another method is the Elbow
Method, which analyses how the within-cluster “sum-of-squares” changes when K is
increased. This makes it possible to see where a further increase of K stops yielding
a significant improvement (see fig 14).
fig 14. An illustration of the “Elbow Method”.
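The data behind an elbow plot can be produced with a sketch like the one below, which runs k-means for K = 1, 2, …, k_max and records the within-cluster sum of squares for each K. Everything here is an assumption made for illustration (function names, the number of restarts, the compact k-means helper); it is not the procedure implemented in the thesis program.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, rng, max_iter=100):
    """Compact Lloyd iteration; returns (centroids, assignment)."""
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignment


def wcss(points, centroids, assignment):
    """Within-cluster sum of squares, i.e. the k-means objective value."""
    return sum(sq_dist(p, centroids[a]) for p, a in zip(points, assignment))


def elbow_curve(points, k_max, restarts=5, seed=0):
    """Best objective value found for each K = 1..k_max."""
    rng = random.Random(seed)
    curve = []
    for k in range(1, k_max + 1):
        # Several restarts reduce the influence of a poor initialization.
        runs = (kmeans(points, k, rng) for _ in range(restarts))
        curve.append(min(wcss(points, c, a) for c, a in runs))
    return curve
```

Plotting the returned list against K gives a curve of the kind illustrated in fig 14. The “elbow” is the value of K after which the curve flattens out.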
Chapter 6 Discussion
6.1 Results
To test the program, the following user input data was used along with four network
topologies provided by a DSO:
• The number of smart meters per network is set to approximately 150, which
will be rounded upwards depending on allocated power.
• The packet loss rate threshold is set to 90% and later 5%.
• The smart meter nodes are set as switches.
• Three different sets of CAPEX/OPEX, one for each scenario.
• Different variations of GPRS CAPEX and OPEX for scenario 3.
• The gateway threshold is set to 100 and later 200.
• A variable number of years of operation.
As the four network topologies are fixed, differences in the results arise from
adjusting the parameters mentioned above. The costs of each scenario will be
shown for each of the four provided network topologies, as well as the scenario to which
each network is assigned. Each set of parameter inputs will have its costs plotted
against time, which shows both the differences in cost between the scenarios and
the development over time. The time interval used is 0 to 5 years, with markings every
quarter of a year.
The first set of parameters is chosen so that the number of smart meters in each
network is above the gateway threshold. This disables the second scenario, so only
the first and the third scenario are plotted. The GPRS tariff is chosen as having
no CAPEX and an OPEX of 0.002 € per data polling per smart meter.
The second run of the program uses the same parameters as the first, but with the
gateway threshold set to 200, which enables the gateway scenario for all the networks
used. To make the difference visible, the “CAPEX per smart meter” cost of scenario 2 is
reduced by 50%.
The third run returns the gateway threshold to 100, but the GPRS service provider now
instead offers unlimited network usage for 50 € per month per network, with an
initial fee of 20 €.
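The way a cost curve is built from a CAPEX/OPEX pair can be illustrated with the two GPRS tariffs described so far. The sketch below is only an example under stated assumptions: costs are taken to grow linearly as CAPEX plus yearly OPEX times the number of years, each smart meter is assumed to be polled once per day, a year is taken as 365 days, and the function names are hypothetical rather than taken from the thesis program.

```python
def cumulative_cost(capex, opex_per_year, years):
    """Total cost after a number of years, assuming linear OPEX."""
    return capex + opex_per_year * years


def quarterly_costs(capex, opex_per_year, horizon_years=5):
    """Cumulative cost at every quarter from year 0 to horizon_years."""
    return [
        cumulative_cost(capex, opex_per_year, quarter / 4)
        for quarter in range(horizon_years * 4 + 1)
    ]


SMART_METERS = 150      # approximate network size used in the first runs
DAYS_PER_YEAR = 365     # assumption: leap years ignored

# Per-polling tariff: no CAPEX, 0.002 EUR per polling per smart meter.
# Assumption: one polling per smart meter per day.
per_poll_opex = 0.002 * SMART_METERS * DAYS_PER_YEAR

# Flat tariff: 20 EUR initial fee, 50 EUR per month per network.
flat_opex = 50 * 12

per_poll_curve = quarterly_costs(0, per_poll_opex)
flat_curve = quarterly_costs(20, flat_opex)
```

Each list holds 21 values, one per quarter over the 5-year interval, and corresponds to the GPRS cost of a single network. The PLC scenarios would be handled the same way with their own CAPEX and OPEX values.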
The fourth run uses parameters identical to the first run, but with the packet loss
threshold set to 5%.
The fifth and final run sets the approximate number of smart meters per network to
100 instead of 150, with the gateway threshold set high enough to include all the
smart meters. Otherwise the parameters are the same as in run 1.
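How the two thresholds enable or disable scenarios in these runs can be sketched as follows. This is a simplified illustration, not the thesis program: it assumes a single packet loss value per technology and network (the program works per cluster), the function names are hypothetical, and the GPRS packet loss value used in the usage lines is made up for the example.

```python
def feasible_scenarios(n_meters, plc_loss, gprs_loss,
                       loss_threshold, gateway_threshold):
    """Return the scenarios that pass the packet loss and gateway checks.

    All loss values and the loss threshold are fractions between 0 and 1.
    """
    scenarios = []
    if plc_loss <= loss_threshold:
        scenarios.append("DC PLC")          # scenario 1
        # The gateway is only allowed up to a maximum number of meters.
        if n_meters <= gateway_threshold:
            scenarios.append("GW PLC")      # scenario 2
    if gprs_loss <= loss_threshold:
        scenarios.append("GPRS")            # scenario 3
    return scenarios


def cheapest(scenarios, costs):
    """Pick the cheapest feasible scenario, or None if none is feasible."""
    return min(scenarios, key=lambda s: costs[s]) if scenarios else None


# Hypothetical packet loss values: 85 % for PLC, 2 % for GPRS.
run1 = feasible_scenarios(150, 0.85, 0.02, 0.90, 100)  # gateway disabled
run2 = feasible_scenarios(150, 0.85, 0.02, 0.90, 200)  # gateway enabled
run4 = feasible_scenarios(150, 0.85, 0.02, 0.05, 100)  # PLC disabled
```

With these example values, `run1` contains scenarios 1 and 3, `run2` contains all three, and `run4` contains only GPRS, which mirrors the intent of the first, second and fourth parameter sets.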
Program   Packet loss   Gateway     GPRS CAPEX and OPEX      GPRS CAPEX and OPEX      SM/network
run #     threshold     threshold   in € (per SM per day)    in € (per SS per year)
1         90%           100         [0, 0.002]               [0, 0]                   150
2         90%           200         [0, 0.002]               [0, 0]                   150
3         90%           100         [0, 0]                   [20, 600]                150
4         5%            100         [0, 0.002]               [0, 0]                   150
5         90%           200         [0, 0.002]               [0, 0]                   100
Table 5. Parameters for each run of the program.
Fig 15. First run of the program. The red curve is scenario 1 (DC PLC) and the blue is
scenario 3 (GPRS).
Fig 16. Second run of the program. The blue curve is scenario 1 (DC PLC), the red is
scenario 2 (GW PLC) and the black is scenario 3 (GPRS).
Fig 17. Third run of the program. The red curve is scenario 1 (DC PLC) and the blue is
scenario 3 (GPRS).
Fig 18. Fourth run of the program. The black curve is scenario 3 (GPRS).
Fig 19. Fifth run of the program. The blue curve is scenario 1 (DC PLC), the red is
scenario 2 (GW PLC) and the black is scenario 3 (GPRS).
Fig 20. Fifth run of the program, a zoomed-in version of Fig 19 between year 0.8 and year 1.
From the first run, one can see that GPRS is the cheaper alternative up
until approximately 1.5 years of operation. This is true for all of the
networks except 28CGD8, which has GPRS as its only alternative because the packet
loss of some or all of its PLC clusters is too high.
The second run shows that the gateway is acceptable, and cheaper than the other
options after approximately 1 year. As the packet loss threshold is the same for this
run as for the previous one, the 28CGD8 network still has GPRS as its only option.
The third run shows no noticeable difference in GPRS cost from the first run, as the
change is comparatively small.
The fourth run has a low packet loss threshold, which disables the PLC options but
still allows GPRS for all the networks.
In the fifth run the number of smart meters per network is set to 100 instead of 150,
which makes the clusters smaller and, as a result, lowers the packet loss rate.
Previously, network 28CGD8 had scenarios 1 and 2 disabled due to high packet loss; in
this run both become available. Because the difference between scenario 2 and scenario 1
is relatively small, Fig 20 shows a zoomed-in version between year 0.8 and year 1.
The overall results from the MatLab program follow what is to be expected from
the simulation results. Different parameters, such as the various thresholds, have been
tested for the four network topologies provided for the thesis.
6.2 Future Work
The results could be improved by having alternative simulation tools for the Power Line
Communication. With access to only one NS-3 module, one cannot cross-check the
results and conclude that they are accurate. One could also improve the work by
simulating GPRS, as this thesis had to rely on externally acquired packet loss rate data.
There are more clustering methods which can be explored, including not clustering at
all. Comparisons between different clustering methods could possibly approximate a
larger variety of distribution grids into more equivalent units. The more accurately one
can approximate a distribution grid, the more accurately one can compare it with
pre-measured results.
One of the main purposes of this thesis was to develop a tool for future use, able
to generate many random LV/MV network topologies for analysis (possibly with the help
of a Monte Carlo algorithm). Also, the method is not limited to LV/MV distribution
grids. With some changes to the code, I would imagine it could be used for MV/HV
networks as well, though with different parameters and constraints.
References
[1] E.W. Dijkstra, “A Note on Two Problems in Connexion With Graphs”,
Numerische Mathematik 1, 269-271 (1959).
[2] Kanti Mardia et al., “Multivariate Analysis”, Academic Press (1979).
[3] Richard H. Frenkiel, “Cellular radiotelephone system structured for
flexible use of different cell sizes”, patent US4144411 A (1976).
[4] F. Aalamifar, A. Schloegl, D. Harris, L. Lampe, “Modelling Power Line
Communication Using Network Simulator-3”, IEEE Global
Communications Conference (GLOBECOM), Atlanta, GA, USA,
December 2013.
[5] Don Shaver, “Low Frequency, Narrowband PLC Standards for Smart
Grid – The PLC Standards Gap!”,
http://cms.comsoc.org/SiteGen/Uploads/Public/Docs_Globecom_2009/6_-_12-03-09_shaver_smart_grid_panel_final.pdf,
Texas Instruments Incorporated, December 2009.
[6] Manfred Zimmermann, “A Multi-Path Signal Propagation Model for the
Power Line Channel in the High Frequency Range”, Institute of
Industrial Information Systems, University of Karlsruhe.
[7] A. Volgenant, “Linear and Semi-Assignment Problems: A Core Oriented
Approach”, University of Amsterdam, 1996.
[8] “Learn GPRS”, http://www.tutorialspoint.com/gprs/.
[9] “GPRS Family”, http://www.protocols.com/pbook/gprsfamily.htm.
[10] “Network Simulator 3”, http://www.nsnam.org.
[11] A. Fernandez Olivera, A. Sendin Escalona, Urrutia Galdos, J. Mateo
Arenas, Angueira Buceta, JJ. Ferro Vázquez, “Analysis of PRIME PLC
Smart Metering Networks Performance”, Iberdrola Engineering and
Construction S.A.U., Iberdrola Networks, University of the Basque
Country (UPV/EHU), 2013.
[12] “Clustering Algorithms: K-means”,
http://www.cs.princeton.edu/courses/archive/spr08/cos435/Class_notes/clustering2_toPost.pdf,
Princeton University.
[13] David Arthur & Sergei Vassilvitskii, “k-means++: The Advantage of
Careful Seeding”.
[14] “The DISCERN Project”,
http://www.discern.eu/project/vision-and-mission.html.
[15] Juan Andrés Negreira, Javier Pereira, Santiago Pérez, “End-to-end
measurements over GPRS-EDGE networks”, Universidad de la
República, Montevideo, Uruguay.
[16] Sami Tabbane, “Quality of Service (QoS) definition and standards”, ITU
Academy, November 2013.
[17] CENELEC EN 50065-1:2011: “Signaling on low-voltage electrical
installations in the frequency range 3 kHz to 148,5 kHz - Part 1: General
requirements, frequency bands and electromagnetic disturbances”.
[18] Bogdan Baraboi, “Narrowband Powerline Communication Applications
and Challenges”, Ariane Controls Inc.
[19] PRIME Project, “PRIME Technology Whitepaper; PHY, MAC and
Convergence layers”.
[20] “G3-PLC Alliance”, http://www.g3-plc.com/
[21] “3GPP, A Global Initiative”,
http://www.3gpp.org/technologies/keywords-acronyms/102-gprs-edge
[22] “Introducing the power of PLC, White Paper”, Landis+Gyr,
http://www.landisgyr.com/webfoo/wp-content/uploads/2012/11/LG_White_Paper_PLC.pdf
[23] Maurizio D'Arienzo, Antonio Pescapè, Rajiv Chakravorty, Giorgio
Ventre, “A Comparative Simulation Study for Multiple Traffic
Scheduling Algorithms over GPRS”, Computer Science Department,
University of Napoli, University of Cambridge Computer Laboratory.