An Investigation into Real-time Fraud Detection in the
Telecommunications Industry
Project TutorDr Abir Hussain
99039598Paul Kinlan
Software Engineering2003
Contents Page
1. ABSTRACT
2. ACKNOWLEDGEMENTS
3. LIST OF FIGURES
4. GLOSSARY OF TERMS
5. INTRODUCTION
6. RISK
7. RESEARCH
7.1 Chapter Summary
7.2 Investigation into the Telecommunications Industry
7.2.1 Mobile Phone Telephony
7.2.2 Fixed Line Telephony
7.2.3 Data Transfer
7.2.4 Next Generation
7.3 Investigation into Fraud
7.3.1 Who suffers from fraud?
7.3.2 Who commits fraud?
7.4 Investigation of Fraud in the Telecommunication Industry
7.4.1 What is Telecommunication Fraud?
7.4.2 What does this mean to the Telecomm companies?
7.4.3 How is Fraud Perpetrated?
7.4.3.1 Subscription Fraud
7.4.3.2 Call Surfing
7.4.3.3 Accounting Fraud
7.4.3.4 Ghosting
7.4.3.5 Information Abuse
7.4.4 How do Telecomm Companies Respond to Fraud?
7.4.5 Some Key Attributes which may Identify Fraud
7.5 Methods to Detect Fraud
7.5.1 Why Call Pattern Analysis is not always enough
7.6 Consideration of Real Time Methods
8. IDENTIFICATION OF PROBLEM AND SPECIFICATION
8.1 Specification
8.2 System Tools Research and Requirements
8.2.1 Further Requirements for the CDR Tool and Development Tool Research
8.2.2 Further Requirements for the Fraud Detection Prototype and Development Tool Research
9. DESIGN
9.1 Chapter Summary
9.2 Combined System
9.3 CDR Tool Design
9.3.1 Flow of Data When Creating a Model
9.3.2 Consideration of the UI
9.3.4 Data Representation and Considerations
9.3.4.1 Internal Data Representation
9.3.4.2 Customer Information
9.3.4.3 Entity Relationship
9.3.4.4 Data Access
9.3.4.5 Index Considerations
9.3.4.6 Aggregating the Data
9.3.4.7 Storing the Models
9.3.4.8 Testing the Model Generator
9.5 The Neural Network
9.5.1 What is a neural network?
9.5.2 Types of Neural Networks
9.5.3 What Neural network to use?
9.5.4 Training a Neural Network
9.5.5 Training Method for the Feed forward Network
9.5.6 Problems Which can be Encountered when Training
9.5.7 Inputs defined in the NN
9.5.8 Proposed inputs
9.5.9 Consideration of the Data Being Presented to the Network
9.5.10 Consideration of the Output of the Network
9.6 Neural Network Creation Tools Design
9.6.1 Training Tool
9.6.2 Data Extraction
9.6.3 Performance Analysis and Testing
9.6.4 Establishing the Most Appropriate Threshold for the Final Network
9.6.5 Testing the Network Creation Tool
9.7 Models Generation
9.7.1 Methods to generate the best models
9.7.2 Brief discussion about the models used
10. ANALYSIS OF RESULTS
10.1 Overview of how to study the graphs
10.2 5 Hidden Nodes
10.3 6 Hidden Nodes
10.4 7 Hidden Nodes
10.5 8 Hidden Nodes
10.6 9 Hidden Nodes
10.7 10 Hidden Nodes
10.8 Final Node
10.8.1 The weights from the input layer to the hidden node
10.8.2 The weights to the Output Layer
10.8.3 The Threshold
10.8.4 Proposed Training Regime
11. EVALUATION
11.1 Does it work?
11.2 Is it Real-time?
11.3 Which Training Method was Most Appropriate
11.4 Other Points About the Neural Network
12. PROJECT MANAGEMENT
13. CONCLUSION
13.1 How I handled the project
13.2 What have I learnt?
14. FURTHER WORK
15. REFERENCES
16. APPENDICES
16.1 Bibliography
16.2 Program Listing
16.2.1 MATLAB Overview
16.2.1.1 How to read the performance information off the CD
16.2.1.2 Function Descriptions
16.2.2 MS Access Overview
16.3 CDR Generation Tool Screen Shots
16.3.1 Screen 1
16.3.2 Screen 2
16.3.3 Screen 3
16.4 Testing Plan
16.4.1 CDR Tool
16.4.2 Neural Network Tools
16.5 Model Descriptions
16.6 Project Management
16.6.2 Time Plan (Interim)
16.6.3 Time Plan (Final)
16.7 Interim Report & Specification
1. Abstract
An investigation into fraud detection in the telecommunications industry, with a focus on the development of a tool to aid the detection process.

Neural networks were employed to find anomalous call patterns for customers over two-week periods that matched the call patterns of previously known fraud.

Customer information was generated using a bespoke tool, and after rigorous testing a final neural network was produced which can successfully classify fraudulent and non-fraudulent customer activity.

Keywords: Fraud Detection, Software Engineering, Customer Detail Record, Database, Neural Network
2. Acknowledgements
I have enjoyed working on this project and I would like to thank my parents and
family for the help and support that they have given me throughout this year.
I would also like to take this opportunity to thank Dr Abir Hussain for the help and
support that she has given me as a project tutor this year.
I hope this report shows the amount of work and effort that went into this project
during my final year studies.
3. List of Figures

Figure 1 Process of a customer of a telecomm company
Figure 2 The Fraud Management Cycle
Figure 3 Roles where an FMS Tool may be used
Figure 4 Subscription Fraud
Figure 5 Surfing Fraud
Figure 6 A) Non-linear problem separation B) Added Dimensions
Figure 7 Normal Linear Sequential Model (Waterfall)
Figure 8 Amended Linear Sequential Model (Waterfall)
Figure 9 Standard model for database communication
Figure 10 An ideal situation for CDR Tool and Fraud Detection Tool
Figure 11 Processing the data through a neural network
Figure 12 Abstract overview of data flow in the system
Figure 13 A Gaussian distribution based on male heights in the UK
Figure 14 The Gaussian distribution function
Figure 15 Gaussian Distribution A
Figure 16 Gaussian Distribution B
Figure 17 Biased Weighting
Figure 18 Customer Generate tool flow diagram
Figure 19 Basic Entity Relationship for customer information
Figure 20 Overview of tables, fields and relevant joins used in the final output query
Figure 21 Sliding Window Effect
Figure 22 An animal neuron
Figure 23 An artificial neuron based on a Binary Threshold Logic Unit
Figure 24 Logistic Sigmoid function & Tan Sigmoid function
Figure 25 An artificial neuron based on a continuous sigmoid output function
Figure 26 Combining logsig(5x-2) + logsig(x+2) - logsig(2½x-12)
Figure 27 The Feed forward Neural Network
Figure 28 A Recurrent Network
Figure 29 Single Threshold system
Figure 30 Dual Threshold System
Figure 31 Training Tool Data Flow
Figure 32 Data extraction tool data flow
Figure 33 Y-Axis for ROC Chart (Sensitivity)
Figure 34 X-Axis for ROC Chart (1 - Specificity)
Figure 35 An incorrectly trained neural network ROC depiction
Figure 36 Actual output of an incorrectly trained network
Figure 37 ROC Chart for a working neural network
Figure 38 Data flow for establishing the performance of the neural networks
Figure 39 Performance of the training algorithm
Figure 40 Output of the neural network after the test stage
Figure 41 Output of the neural network after the validation stage
Figure 42 ROC Chart
Figure 43 MATLAB depiction of a 2 Layer network with 5 nodes in the hidden layer
Figure 44 MATLAB depiction of a 2 Layer network with 6 nodes in the hidden layer
Figure 45 MATLAB depiction of a 2 Layer network with 7 nodes in the hidden layer
Figure 46 MATLAB depiction of a 2 Layer network with 8 nodes in the hidden layer
Figure 47 MATLAB depiction of a 2 Layer network with 9 nodes in the hidden layer
Figure 48 MATLAB depiction of a 2 Layer network with 10 nodes in the hidden layer
Figure 49 ROC Chart for the best performing network
Figure 50 Output from the training data
Figure 51 Performance of the final network while training
Figure 52 Output from the validation data
4. Glossary of Terms

4M's: The four Ms used in FMS fraud classification (Motive, Means, Mode, Method)
Bad Debt: Unpaid credit. Until recently fraud was written off as bad debt; however, the two are fundamentally different
BP: Back propagation, used in the training of a feed forward neural network
Cell: A receiver or transmitter which a GSM phone communicates with
False Negative: Incorrect classification of an event considered to be TRUE; the event is given as FALSE
False Positive: Incorrect classification of an event considered to be FALSE; the event is given as TRUE
FML: A Fraud Management Company
FMS: Fraud Management System (a system used to detect and manage fraud)
GSM: Groupe Spécial Mobile, also known as Global Systems for Mobile Communication
Internal Fraud: Someone in the company using inside knowledge to defraud the company
IP: Internet Protocol
Means: The nature of the fraud used to satisfy the motive
Method: The detailed method used in the 4M's classification
MLP: Multi-layer Perceptron
Mode: The generic fraud method used
Motive: The objective of the fraud
NN: Neural Network
NRF: Non-Revenue Fraud. Intent to avoid the cost of a call, but no intention to make a profit from it
PABX: Private Automatic Branch Exchange
PRS: Premium Rate Service
True Negative: Correct classification of an event considered to be FALSE; the event is given as FALSE
True Positive: Correct classification of an event considered to be TRUE; the event is given as TRUE
UMTS: Universal Mobile Telecommunications Service
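The four classification outcomes defined above (true/false positives and negatives) can be illustrated with a short sketch. Python is used here purely for illustration, and the function name is this sketch's own, not part of the project's MATLAB code:

```python
def confusion_counts(actual, predicted):
    """Tally the four classification outcomes for boolean labels.

    actual:    True if the event really is fraud.
    predicted: True if the detector flagged it as fraud.
    """
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a and p:
            counts["TP"] += 1   # True Positive: fraud correctly flagged
        elif not a and not p:
            counts["TN"] += 1   # True Negative: normal correctly passed
        elif not a and p:
            counts["FP"] += 1   # False Positive: normal wrongly flagged
        else:
            counts["FN"] += 1   # False Negative: fraud missed
    return counts
```

These four counts are exactly what the ROC analysis later in the report is built from.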
5. Introduction
The project aims to detect fraud in the telecommunication industry from the
perspective of the customer and the telephone calls that they make. Several different
methods of detection can be used, but I intend to present the one method that I feel is
the most suitable, for reasons given later in this report. At the end, a prototype
system will be presented to prove that the chosen method of fraud detection is feasible.
This project differs from the normal software engineering process, in which the
stakeholders would be identified, requirements gathered from the stakeholders,
research into the system then carried out, and the design process following from
this.

Rather, it is an investigation into how fraud occurs in the telecommunication
industry and how it can be combated, with the added slant of a prototype system being
implemented to show that a particular method can be used successfully to detect
fraud.
Essentially, I have identified a problem in the telecommunication industry; after
researching the problem area, I will propose a system that could be developed and
produce a prototype to show whether it will work. It is not a case of building one
prototype, however: due to the nature of the prototype, many will have to be created
and empirically tested to find which performs best.
This type of software engineering process might be used, for instance, by a start-up
company or new business venture. They have found a market niche and think they
can exploit it by solving the problem. What is then required is a process of research
and prototype development; if the prototype is not successful, then it may be that
their current theory is not valid and a new direction of attack is needed.
The next chapter will include a brief introduction to the risk involved in this project
and a summary of the research that I have done to make this project possible.
6. Risk
Often software engineers talk about inherent risk in each of the projects they
undertake. This project is no different, even though the slant of this project is slightly
different to what would be considered a "normal" software engineering project.
Pressman highlights eleven key components in the overall risk for a project; however,
only a few can be uniquely attributed to this project [1]:
6. Is the project scope stable?
8. Are the requirements stable?
10. Are there enough people on the team to complete the task?
As can be seen, each of these key risks is associated with manpower and the time
taken to complete the tasks. If the project requirements are not stable, then the
likelihood of a successfully completed project is reduced, since the requirements
gathering process will be failing, indicating that the customer will not get the
product they wanted.

Additionally, if the project has varying scope, it will not satisfy the requirements
which it was originally intended to.

The two highlighted risks can also have an adverse effect on the number of people
needed to complete the task. The longer it takes to pin down a suitable product with
stable requirements and a well defined project scope, the more people the software
development team will need to complete the project successfully.
7. Research
7.1 Chapter Summary
In this chapter, various methods to detect telecommunication fraud are investigated.
This meant that I had to understand the telecoms industry. From this, I discovered
that the telecoms industry is massive, with many different sectors; a tool to detect
fraud in general is therefore impractical for a single project, which led me to focus
on a subset of the industry. Researching further into this sub-sector, I found again
that there are many different fraud types and methods of detecting them, so I decided
to further refine the category of fraud I was looking for.

Once I had decided on the type of fraud to detect, it was important to understand the
methods used to detect it. I concluded that the most suitable solution for me was a
Neural Network based solution, for reasons established in the following sections.
7.2 Investigation into the Telecommunications Industry
The telecommunication sector is a huge arena, and each area of the sector covers a
vast domain of communication. Identified below are several areas in which the
telecommunication companies operate: mobile phone telephony, traditional land-based
communication, data transfer and the next generation of mobile services.
7.2.1 Mobile Phone Telephony:
The phone system in use throughout Europe and the majority of the world is a
standard called GSM (Global Systems for Mobile Communications). Each mobile phone
registers itself to a "cell" (hence cellular phone), communicating by broadcasting
over the airwaves to its cellular base station, which then essentially forms a
traditional circuit switched network with the destination [2].

Traditionally, cellular services have been more expensive than fixed line services,
but they are of a similar nature; hence similar processes are adhered to when setting
up customer accounts, and accounts have to be paid for in a similar way, i.e. via a
contract in which payment is required at the end of each billing period. Normally the
contract would include a free phone as part of the deal.

More recently, however, prepaid credit schemes are being used, where customers pay
"up front" for the services they require, which includes having to buy the mobile
phone. Prepaid credit was introduced into Europe in early 1996 [3] as a method for
the telephone operators to reduce the risk of having "bad credit" users on their
system (people who have failed credit checks due to issues such as late bill
payment). The system follows the same principle as the prepaid card schemes that have
been used on public telephone systems for many years. The user buys a certain amount
of "talk time" minutes from a retailer and inputs this into their mobile phone. The
telecomm company is then aware of the credit available to that customer. Once the
customer has used up all their credit, the phone becomes unable to make outgoing
calls (except for emergencies and credit top-up). This has been extremely popular
with the teenage market, where contracts for mobiles are not possible.
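The prepaid mechanism described above amounts to a simple balance check before each outgoing call. A minimal sketch follows; the class, method names and emergency numbers are illustrative, not taken from any operator's actual system:

```python
EMERGENCY_NUMBERS = {"999", "112"}  # always callable, even with no credit

class PrepaidAccount:
    """Toy model of the prepaid credit scheme: the operator tracks the
    remaining talk-time credit and blocks outgoing calls once it is gone."""

    def __init__(self, credit_minutes=0):
        self.credit_minutes = credit_minutes

    def top_up(self, minutes):
        # Credit bought from a retailer and entered into the phone.
        self.credit_minutes += minutes

    def can_call(self, number):
        # Emergency (and top-up) calls remain possible with zero credit.
        return number in EMERGENCY_NUMBERS or self.credit_minutes > 0
```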
7.2.2 Fixed Line Telephony:
The traditional breadwinner of the telecommunication industry, with nearly every
house in the UK (95% for 1999-2000) having one or more telephone lines [4]. Over
these, normal voice traffic occurs; but in the last 10 years substantial increases in
Internet traffic, as many households get wired up to the Internet, and drastic
increases in daily use of the Internet (in October 2002 it was reported that 45% of
households have access to the Internet) [5], have forced the telecommunication
operators to reconsider the pricing structures they offer their customers.

The services of a fixed line system are normally contract based, with the bill being
settled by the customer at the end of each billing period.

Traditional operation of fixed line telephony is based on circuit switched networks:
when a call is started, the local switch at the telecommunication substation makes a
circuit (possibly via other switching stations) with the remote switch, which in turn
rings the dialled telephone number. This circuit is then maintained for the duration
of the call, and all information follows one fixed path to the destination [6].
7.2.3 Data Transfer:
Initially, data transfer services consisted solely of a carrier, such as BT,
providing a permanent connection to the Internet or between a company's sites;
essentially a dedicated communication line placed between the two ends. Heavy
contracts are drawn up between the provider and the customer, in which the payment
terms can consider the quantity of data transferred as well as the speed of the line
and what it is being used for [7]. It must be noted that BT normally provides the
communication infrastructure, with other companies acting as partners reselling the
service; this was initially put in place to stop BT becoming a monopoly [8]. These
services were expensive and designed mainly for the corporate sector. Because of
leased line pricing structures and the work which needs to be carried out to connect
customers to BT networks, leased lines were never meant to be available to the
general public.

Other data transfer technologies exist and are coming to the forefront; ADSL
(Asymmetric Digital Subscriber Line) and DSL (Digital Subscriber Line) are designed
to operate over normal twisted pair copper cable and thus are potentially available
to every home in the UK. With the recent introduction of broadband Internet access
services such as ADSL and DSL, providers have had to put in place extra facilities to
handle the increased traffic, as they are responsible for routing the data on to the
Internet.
7.2.4 Next Generation:
This is where the distinction of services differs from traditional mobile and fixed
line services. Next Generation services, more commonly known as 3G, offer facilities
such as video conferencing, video on demand and broadband Internet access across the
airwaves; these are just some of the services that telecommunication companies are
gearing up to accommodate.

The technology that 3G communication operates on is similar in nature to current GSM,
in the sense that each handset communicates with the base station in its cell;
however, it uses a new communication protocol called UMTS (Universal Mobile
Telecommunication Service). UMTS communicates on different frequencies and in a
slightly different method to GSM, which allows vastly superior data transfer
rates [9] with the added advantage of allowing the mobile telecom companies a smooth
transition between technologies.

Unfortunately for the telecom companies, they invested a lot of money (billions of
pounds) in acquiring the licenses for the use of the frequencies required by UMTS, so
take-up by consumers may be slow, as the telecomm companies may want to recoup some
of their cost by heavily charging early adopters for use of the services [10].
Each of the above areas (with the exception of data transfer services) has very
similar sub-sections that, when combined, provide the final service to the customer.
Subscription: This is the initial contact that the telecommunication operator has
with the customer. They will establish and verify the details of the customer. Once
completed, the company will move on to the next stage of the process. This process
will only happen once per client.

Activation: Once the customer's credentials have been verified and the subscription
process has been completed, the customer will be set up on the network. This process
may involve an engineer creating a connection at the user's premises, or, in the case
of a mobile phone, the SIM card being activated. Like the previous stage
(subscription), this should only occur once for the customer.
Customer Use: The customer has been set up on the company's network, and will be
allowed to use the service within the limits of the agreed parameters, such as credit
limits and usage agreements. These will be established at the start of the contract,
but will run throughout the lifetime of the agreement, subject to any renegotiations
of the contract.

Billing & Payment: Coinciding with "Customer Use" is the billing of the service
provided, along with the payment. This could be seen as two separate sections, as it
requires both parties' cooperation. The company will invoice the customer for the use
of the network at set intervals (monthly, quarterly etc.) outlined in their
agreement. The customer is then expected (required) to pay for the services they used
within the timescale set out in their contract.

Termination of service: Once the contract has either been revoked by the operator or
ceased at the request of the customer, the telecommunication company must issue a
final invoice and then terminate the user's privileges on the system.
The previous processes (figure 1) will only occur once per account item, such that if
the user requests a new line or additional services, the above steps will be repeated
and will generally pursue the same structure.
Figure 1 Process of a customer of a telecomm company: Subscription → Activation → Customer Use → Billing and Payment → Termination
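The five stages of figure 1 can be read as a small state machine, with Billing and Payment running alongside Customer Use until the contract ends. The transition table below is this sketch's own reading of the figure, not code from the project:

```python
# Allowed transitions between the account stages of figure 1.
# Billing and Payment coincides with Customer Use, so the two
# alternate until the contract is terminated.
TRANSITIONS = {
    "Subscription": {"Activation"},
    "Activation": {"Customer Use"},
    "Customer Use": {"Billing and Payment", "Termination"},
    "Billing and Payment": {"Customer Use", "Termination"},
    "Termination": set(),  # final invoice issued, account closed
}

def is_valid_history(stages):
    """Check that a sequence of stages follows the process in figure 1."""
    if not stages or stages[0] != "Subscription":
        return False
    return all(b in TRANSITIONS[a] for a, b in zip(stages, stages[1:]))
```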
Two very important areas in the previous process (figure 1) are customer use and
billing. Whenever the customer uses their phone, information about the call
parameters is logged; using this information, the customer's bills are calculated.
The information is normally logged in what is called a CDR (Customer Detail Record or
Call Detail Record; the terms are used interchangeably), and is as follows:
Customer Number or ID
Destination Number
o Call Type (PRS, International, Local etc)
Call Start time
o Call time type (off-peak, on-peak)
Call End time
o Duration and final cost of the call
The secondary bullet points are by-products of the parent points, and are sometimes
a combination of several points. For instance, the final cost of the call is a
combination of the call type, the time of the call and the duration of the call. These
by-products may be generated at the time of the call, to speed up generation of the
phone bill at the end of the customer's billing period, or they may be generated when
the bill is being calculated; the latter requires less storage space in the company's
call logs.
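The CDR layout above can be sketched as a small data structure. The field names and per-minute rates below are invented for illustration; real tariffs vary by operator and contract.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical per-minute rates keyed by (call type, time type).
RATES = {
    ("international", "off-peak"): 0.50, ("international", "on-peak"): 0.80,
    ("prs", "off-peak"): 1.50, ("prs", "on-peak"): 1.50,
    ("local", "off-peak"): 0.02, ("local", "on-peak"): 0.05,
}

@dataclass
class CallDetailRecord:
    customer_id: str
    destination: str
    call_type: str   # e.g. "prs", "international", "local"
    time_type: str   # "on-peak" or "off-peak"
    start: datetime
    end: datetime

    @property
    def duration_minutes(self) -> float:
        # By-product of the parent fields: derived from start and end times.
        return (self.end - self.start).total_seconds() / 60

    @property
    def cost(self) -> float:
        # By-product combining call type, time of call and duration.
        return round(self.duration_minutes * RATES[self.call_type, self.time_type], 2)
```

Storing only the parent fields and deriving duration and cost at billing time trades storage space for computation, as noted above.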
Other telecomm companies are another major source of revenue for a telecom
company, through a process called "Interconnection Charging". The telecom
company will charge each operator a fee for every call originating on the
competitor's network that is routed to its own network. For instance, BT will
charge NTL a set fee for each call originating on NTL with a destination on BT. This
practice is very common between mobile phone operators, as well as fixed line
operators11.
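As a toy illustration of interconnection charging as described above (the network names come from the example; the flat per-call fee, in pence, is an invented figure):

```python
# Interconnection charging sketch: the terminating operator charges a set
# fee for every call that originated on a competitor's network.
def interconnect_charge_pence(calls, origin_network, fee_pence):
    """Total (in pence) owed by origin_network for calls it sent onto our network."""
    return sum(fee_pence for call in calls if call["origin"] == origin_network)

calls = [{"origin": "NTL"}, {"origin": "NTL"}, {"origin": "BT"}]
ntl_owes = interconnect_charge_pence(calls, "NTL", fee_pence=2)  # two NTL-originated calls
```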
Now that the levels of service that the companies offer have been established, it is
important to decide which areas this project will concentrate on, because of the
differences in the core services. For example, ADSL broadband accounts will not
operate in the same way as mobile telephone accounts, so the business
processes and the implementation will be very different.
This project will focus on detecting fraud that can occur with circuit-switched
communication methods, particularly call-based systems not derived using IP
solutions. In the next sections, topics will be covered with the emphasis on fraud
occurring in the following sectors:
Mobile;
Fixed Line.
Bearing the above market sectors in mind, it must be noted that when detecting fraud
across both sectors, only attributes present in both can be used as indicators of
fraud.
Common attributes of both mobile and fixed line telephony are, in particular, the
types of calls that take place. A mobile user will make calls to other mobiles, fixed
lines (local and national), international numbers, free rate numbers and Premium Rate
Service (PRS) numbers. The same applies to users of fixed line services.
There are, however, items that are dissimilar between the two technologies: when a
mobile user makes a call, the current cell the phone is in is also recorded.
Obviously this is of no use when analysing call data for fixed lines.
7.3. Investigation into Fraud
Fraud on its own can be defined as "an intentional deception resulting in injury to
another person" or the act of "a person who makes deceitful pretences". Some useful
synonyms can also be used to describe fraud [con, swindle, racket, hoax, scam, deceit,
deception] and a fraudster [impostor, pretender, fake, faker, role player].
Fraud in general is a very broad subject, but can normally be boiled down to one
simple motivation: "the need to make money". However, fraud can be committed in many
ways, and for many reasons beyond the need to make money, making many different
people from all walks of life susceptible to it. Other reasons include crackers
wanting kudos from their peers (breaking into a system and taking information or
money) and people wanting to save money rather than make money; the list continues.
7.3.1 Who suffers from fraud?
In the end we all do. Fraud in the insurance industry due to false claims can
increase every customer's premiums; fraud in the financial industry can mean
higher rates of interest on mortgages, loans and credit cards while also
reducing the interest rates for savers; fraud in the telecommunication industry can
result in higher call bills. This is because the companies being defrauded still
need to make money, so any losses due to fraud are normally passed on to the
consumer.
Fraud against the individual also deserves a brief discussion. It can take many
different guises: a street seller may "persuade" a person into donating money to a
dying child; a phone scammer may persuade people to part with their credit card
details for a fictional product; or an email may dupe people into depositing money
into a Nigerian bank account with a promise of returns far greater than those ever
given.
The psychological effect of fraud is unmeasured but considerable. It is easy to
see that if an individual (rather than a corporation) has been defrauded, a once
confident person can easily be transformed into a person who no longer
trusts his or her own judgement. Beyond the financial difficulties induced, the fear
of criticism from peers is also high, since perhaps the subject had to borrow
money from a family member or business colleague. The fear of these parties
finding out may result in anxiety, guilt, and fear of being held responsible,
possibly culminating in depression.12
For both types of fraud (fraud against a corporation and fraud against an individual)
the number of different styles of fraud is uncountable. Whenever companies or law
enforcement agencies think they have battened down the hatches on fraud, another
scheme presents itself to the fraudsters and the cycle continues.
7.3.2 Who commits fraud?
Now that we have established a reason why fraud is committed, we must also ask
what type of person commits fraud.
The people who commit fraud can be broken down into at least two categories:
The opportunist: The opportunist commits fraud as a one-off. Word of mouth may
spread that a particular company is susceptible to fraud using a certain process; for
instance, obtaining a loan by using false details, or faking an injury to obtain more
financial aid from an insurance company. Frauds in this case are normally
committed by ordinary people who essentially want to make a quick buck.
The crime syndicate: The crime syndicate will normally commit fraud to provide
money for other crimes such as drug trafficking. They will hit a service for all the
money that they can get. The people who operate these schemes, unlike the
opportunists, are very professional and will always be looking for new methods to
defraud people and companies, since it is in their interest to keep supplying extra
money to the syndicate.
Fraud is unstoppable: even when one avenue to fraud has been closed, another will
present itself; and as fraud detection systems become more complex, the methods used
to defraud people will also become more complex.
7.4 Investigation of Fraud in the Telecommunication Industry
When establishing what "fraud" is in the Telecommunication Industry, it is important
to answer several questions.
1. What is Telecommunication Fraud?
2. What does Fraud mean to the Telecomm companies?
3. How is the Fraud perpetrated?
4. How do Telecomm companies respond to fraud?
5. Some key attributes which may identify fraud.
7.4.1 What is Telecommunication Fraud?
The Telecommunications (Fraud) Act 1997 effectively defines what fraud is in
the telecommunication industry. In broad terms, the act states that "to use or obtain a
service dishonestly", including "to use or to allow the supply of a dishonest
service", is considered to be fraud. 13
Fraud in the telecommunication industry can be broken into two major categories:
revenue-based fraud and non-revenue-based fraud.
Revenue fraud consists of any type of fraud whose purpose is to make money for the
individual perpetrating it. This can be achieved in ways such as:
Selling high cost international calls by severely
undercutting the cost that the telephone company charges;
Calling high rate PRS lines with no intention of paying for the
cost of the call.
Non-revenue fraud is normally fraudulent use of the telecommunication network for
reasons other than making money. Motivations for non-revenue fraud include:
Removing any chance of criminals being placed under surveillance
or having phones tapped by criminal investigation agencies, by
making illicit use of the network;
Providing free or heavily reduced call costs to friends and
family;
Showing their peers (other crackers) that they have the skill
to breach the telecomm companies' security.
7.4.2 What does this mean to the Telecomm companies?
It has been reported that, worldwide, fraud accounts for approximately 3%14 of
telecomm companies' annual revenue. In 1999 the UK alone suffered losses of at
least £720 million, split over the following categories.15
Calling Card    Cellular        International   Other           Total
$150 Millions   $100 Millions   $500 Millions   $250 Millions   $1100 Millions

Table 1 Losses Due to Fraud in the UK (in dollars)
However, this only accounts for fraud that has been detected; fraud can often go
undetected and unreported. Fraud may go unreported, or at least unpublished, due to
the nature of business contracts and customer confidence, if the perceived levels of
fraud are high in relation to the revenue generated. The knock-on effects of fraud
and lost income include higher bills as the losses are passed on to the customer, and
higher churn rates as more people become dissatisfied with the service and the
perceived security that the company offers. Taken together, this can negatively
affect shareholders' confidence as annual revenue decreases and expansion is drawn
back.
7.4.3 How is Fraud Perpetrated?
Telecomm fraud can be broken into several generic classes. These classes describe
the mode in which the operator was defrauded and include subscription fraud, call
surfing, ghosting, accounting fraud and information abuse 16. Each mode can be used
to defraud the network for revenue-based or non-revenue-based purposes.
7.4.3.1 Subscription Fraud
Subscription fraud occurs when an unsuspecting party has their
identity stolen, or when a customer tries to evade payment. Essentially,
the personal details provided to the company are erroneous and designed
to deceive the company into setting up an account. Reasons for this
may include a customer knowing that they are a credit liability due
to CCJs or other credit problems, or a fraudster needing to obtain
"legitimate" access to the telecomm network to perpetrate further
modes of fraud.
Subscription fraud causes serious financial losses to
telecommunication operators, but in many instances may not be
attributed to fraud. If someone does not pay their bill, the
telecomm company has to establish whether the person was fraudulent or
merely unable to pay. This may result in a lot of subscription fraud
being classified as bad debt. The BT Group in 2001-2002 estimated
that bad debt cost the company ~£79 million. 16
7.4.3.2 Call Surfing
Call surfing is when an outside party gains unauthorised access
to the operator's network through methods such as call
forwarding, cloning or shoulder surfing.
Call surfing can include gaining access to a company's PABX
(Private Automatic Branch Exchange), either via social engineering
or through lack of security. Examples of social engineering: a person
rings the company's telephone administrators claiming to be a BT
engineer performing a line test, and asks for the password so that
they can negotiate access to the call-back facility of the PABX; or an
employee in a large company receives a call from a person claiming
to have got the wrong extension, who asks to be put
through to extension 901, with 9 being the external dialling code of
the PBX and 01 being the international prefix.18
These may seem unrealistic scenarios, but it is all too easy for someone
to gain access to a system this way. Once the cracker has access to
the PABX, they can use it to forward calls internationally or to
premium rate service lines. All they pay for is the cost of the call to
the company, while the company picks up the cost of the call to the
actual destination. The cracker may even escape paying for the original
call if they have covered their tracks, for instance via subscription fraud.
Cloning of mobile phones is another issue, especially
since early mobile phones operated on analogue networks: the signal
emanating from the phone was easy to detect and read, so the
details used to identify each phone uniquely were susceptible to
interception. With this information, the fraudster could
reprogram one of their own phones to match those unique details.
Once done, the con artist could use the phone to make all
the calls they needed without the original phone owner knowing
(until they receive the telephone bill, that is).19
7.4.3.3 Accounting Fraud
Accounting fraud can occur through manipulation of accounting
systems and may be used to help someone avoid paying for
the service. Normally this is an internal problem. Accounting fraud
would typically occur when someone wants to obtain
cash back at the end of their billing period, or have their bill reduced. 20
7.4.3.4 Ghosting
Ghosting requires knowledge of the internal systems; for example, an
employee might set up an account that would never be billed, or
remove billing details from the system. Alternatively, schemes
may involve creating a piece of tone-generating hardware that
fools the switch centre into thinking that a call is a free call, or
is operating from a public telephone. Essentially, the perpetrators are
"ghosts" on the system, as there is little or no trace of them ever being
present on the network. 21
7.4.3.5 Information Abuse
Information abuse occurs when an employee uses the
telecommunications company's software to access privileged
information about clients or systems. This information may be passed
on to third parties and used in further fraud. However, it is not solely
limited to this; for instance, company A might place spies inside
company B to find out information about any alliances that company
B might have. Again, this is an internal fraud. 22
FML (a fraud management company) developed a system called the 4m's to help
fraud analysts decide whether a particular case they are studying is more than likely
fraudulent. It can be used to understand how each of the previously mentioned
methods of perpetrating fraud (sections 7.4.3.1 – 7.4.3.5), and the reasons for doing
so, fit in with each case of fraud. 23
The 4m's are defined as Motive, Mean, Mode and Method:
Motive: The reason why the fraudster commits the fraud. This
could range from generating money or saving money to kudos or
simple malicious intent.
Mean: Used to satisfy the motive. If it is revenue-based fraud, how
is the money made: by selling international calls at a
reduced rate; calling PRS services; using access codes supplied by
an informant.
Mode: The generic class of fraud being committed, such
as subscription fraud or call surfing.
Method: The way in which the fraud was committed; for
instance, how the call surfing was achieved.
An example of where this system of classification could be used: a person orders a
new telephone line with incorrect identification details; once the telephone line has
been installed, the person offers international and PRS calls at heavily reduced rates.
Then, after the billing period, the person vanishes and never uses the phone again.
Fitting this example into the 4m's classification, we can see that the person's
Motive was to make money. Their Mean was call selling. The Mode was exploiting a
vulnerability in the telecom company's subscription process (subscription fraud), and
the Method was using false details with no intent to pay for the services used.
A second example of where this classification could be used: an employee who
works in the calling card printing division sells valid PIN numbers for pre-paid
calling cards to third parties.
Applying the 4m's classification, their Motive would be to make
money. Their Mean is facilitation, supplying fraudulent access to the network. The
Mode is information abuse, and the Method was disclosure of pre-paid card
numbers.
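The two worked examples above suggest how a 4m's classification might be recorded. A minimal sketch, with the record type and field values purely illustrative:

```python
from dataclasses import dataclass

@dataclass
class FraudCase:
    """One fraud case classified under FML's 4m's scheme."""
    motive: str   # why the fraud was committed, e.g. making money
    mean: str     # how the motive is satisfied, e.g. call selling
    mode: str     # the generic class of fraud, e.g. subscription fraud
    method: str   # the concrete technique used

# The first worked example, encoded as a record.
case = FraudCase(motive="make money",
                 mean="call selling",
                 mode="subscription fraud",
                 method="false details with no intent to pay")
```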
7.4.4 How do Telecomm Companies Respond to Fraud?
Telecommunication companies respond to cases of fraud in a manner similar to that
used in the financial industry.
The telecommunication operator should have a company wide fraud management
scheme, which can be broken down into four discrete steps (figure 2)24.
1. Prevention
2. Detection
3. Analysis & Investigation
4. Resolution & Reaction
Prevention is the most important step: if the fraud is stopped before it happens, the
company loses no money. However, if it cannot be prevented, the next best thing is to
detect it either as it happens or in the early stages of its occurrence. This means
that losses will be reduced from what they would have been had the fraud gone
undetected.
Once a case has been detected, analysis must take place to ensure that a customer
account really is being abused, since if service is withdrawn for insufficient reasons,
customers may be entitled to pursue legal action against the company.
Figure 2 The Fraud Management Cycle (1. Prevention → 2. Detection → 3. Analysis & Investigation → 4. Resolution & Reaction)
Once sufficient evidence of motive has been established, it is then up to the company
how it chooses to react. Disabling the account and putting measures in place to
prevent (stage 1) that type of fraud from reappearing is the ideal solution.
Unfortunately, the measures taken are normally reactionary, since the fraud has
already occurred. The company will receive an indication that a customer account is
potentially fraudulent, and it is up to the company to investigate the claim. Only
once enough evidence has been established that fraud was taking place can the
telecommunication company take appropriate action to remove the fraud from the
network.
7.4.5 Some Key Attributes which may Identify Fraud.
A telecommunication company will look for several key attributes when trying to
ascertain whether a fraudster is using its network25:
1. The customer is new to the network and has requested many
features of the phone system straight away.
2. The customer has a high average call duration and high average
call cost, which can indicate PRS or international fraud.
3. The customer has an unnaturally low spread of call types (i.e. the
calls are mostly PRS or international).
4. The average duration of the time between calls is very small and
varies very little, which can indicate auto-diallers.
It must be noted again that any of these attributes may not correctly indicate fraud
(the customer could be a legitimate user); therefore a human investigator (part of the
fraud team) would have to establish whether a fraud alert from a fraud management
system (FMS) is valid.
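Attributes 2-4 above lend themselves to simple statistics over a customer's call records. The sketch below assumes hypothetical field names and invented thresholds that a real fraud team would tune empirically:

```python
from statistics import mean, pstdev

def suspicion_indicators(calls):
    """Return which heuristic attributes fire for a customer's call list.

    Each call is a dict with 'duration' (minutes), 'cost', 'call_type'
    and 'start' (minutes since midnight); all thresholds are illustrative.
    """
    flags = set()
    # Attribute 2: high average call duration combined with high average cost.
    if mean(c["duration"] for c in calls) > 30 and mean(c["cost"] for c in calls) > 10:
        flags.add("high-duration-high-cost")
    # Attribute 3: unnaturally low spread of call types.
    if len({c["call_type"] for c in calls}) <= 1:
        flags.add("low-call-type-spread")
    # Attribute 4: very small, very regular gaps between calls (auto-dialler).
    starts = sorted(c["start"] for c in calls)
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    if gaps and mean(gaps) < 2 and pstdev(gaps) < 0.5:
        flags.add("auto-dialler-pattern")
    return flags
```

Any flag returned is only grounds for investigation; as stated above, a human investigator must still confirm the alert.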
7.5 Methods to Detect Fraud
Clearly, telecomm companies will not tell us or the public the methods fraudsters use
to defraud their systems. However, it is possible to find some of the methods that
fraudsters use from a variety of sources:
The Internet is a good source of information from fraud
groups. Unfortunately, many of these groups are not about to tell
strangers how they defraud the networks; if one of those
strangers happens to be a telecomm company, then the methods
used by the group quickly become outdated.
Fraud forums are organisations set up to
accommodate the combined interests of all the companies in a
particular market. An example of this is TUFF (The
Telecommunications United Kingdom Fraud Forum). They
operate by charging subscription fees (normally so high that only
telecomm operators can join, allaying any hope of a member of
the public joining to find out which fraud detection methods are
used); members then share their experiences with fraud and how
to deal with it effectively.
There are several known and established methods of fraud detection in the
telecommunication industry. What follows is a discussion of the methods that I
found the industry is currently using.
Telecommunication companies, like financial institutions, employ people to detect
fraud occurring within their business domain. The role of the fraud analyst is to find
fraudulent use of the services that the company offers. With this in mind, we must
note that each investigation costs the company money (for instance, one fraud
analyst may only be able to investigate ten customers per day). Therefore, if a high
number of customers who are considered fraudulent turn out to be non-fraudulent, the
company loses money, and resources that could have been used to investigate
genuinely fraudulent cases have been wasted. It is in the interest of the company to
find as many fraudulent users of the service as possible, while limiting the time
spent dealing with false positives.
The fraud analyst may apply the 4m's principle to ascertain what fraud is taking place,
how it is taking place, and why it is taking place on their network. Once the case is
understood, the fraud analyst will be able to recommend changes to the
company's operating procedures, to help stop this type of fraud from happening again.
Fraud Management Systems (FMSs) are the tools used by fraud analysts, and their
role is pivotal in ensuring that the company detects and highlights as many
fraudulent accounts as possible, while limiting the number of customers the fraud
analysts have to deal with.
This is especially important in the telecommunication sector due to the sheer wealth
of data generated every time a phone call takes place: it would be impossible for a
fraud analyst to monitor every customer account on the system, making the task of
detecting fraud by hand almost impossible.
The FMS must provide a substantially low False Positive (FP) rate combined with a
low False Negative (FN) rate. A low FP rate means that only a small proportion of the
cases flagged as fraudulent turn out to be clear; likewise, a low FN rate means that
few people who are actually fraudulent get past the FMS checks. Obviously the system
should catch all fraudsters, while minimising the number of people it might wrongly
accuse.

Figure 3 Roles where an FMS Tool may be used (Subscription Department, Billing Department and Fraud Investigation Department all feeding into the FMS tool)
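The two rates can be made concrete with the usual confusion-matrix arithmetic; the counts below are invented for illustration:

```python
def fp_fn_rates(tp, fp, tn, fn):
    """False-positive rate: share of genuine customers wrongly flagged.
    False-negative rate: share of actual fraudsters the FMS let past."""
    fp_rate = fp / (fp + tn)
    fn_rate = fn / (fn + tp)
    return fp_rate, fn_rate

# e.g. 40 fraudsters caught (TP), 10 missed (FN),
# 5 clear customers flagged (FP), 945 correctly cleared (TN).
fp_rate, fn_rate = fp_fn_rates(tp=40, fp=5, tn=945, fn=10)
```

With these invented counts the FMS would wrongly accuse roughly 0.5% of genuine customers while still letting 20% of the fraudsters past.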
It must be noted that a successful FMS is not to be used solely by the fraud analyst;
it must also be used elsewhere in the business process (figure 3) and be able to fit
into the whole fraud management scheme. Suggested owners of the FMS include the
finance department, the security department and customer care. The obvious
implication is that all groups should have a role in the use of the FMS, especially
if there is a company-wide policy dictating the response to fraud.
At this juncture it is important to specify the type of fraud that this project will
focus on detecting. Due to the sheer number of different types of fraud available to
study, it is important to concentrate on a single type of fraud for this project.
The focus is fraud that occurs from the customer perspective, such that the developed
system will detect when a customer is making fraudulent use of the operator's network
with means and methods such as call selling, PRS abuse and other non-revenue fraud.
These are normally related to the modes of surfing (figure 5) and subscription fraud
(figure 4), since both use methods that evade the payment structure of the network
operators.
Figure 4 Subscription Fraud (methods: false details, no intention to pay)
![Page 34: Contents report 3.doc · Web viewThe more failures when training means we are more likely not to get a satisfactorily trained network using that particular network architecture. 10.2](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff49c13aa0d0279ba44efb0/html5/thumbnails/34.jpg)
Reports suggest that at least 50% of operators' losses due to fraud are caused by "Call
Selling", "PRS Abuse", "Internal Abuse" and "Non-Revenue Fraud" (all of which will be
collectively referred to as Call Selling from now on)26. It is important to note that
even though it may be a customer who is caught defrauding the network, the underlying
problem may in fact be internal, with employees supplying external "agencies" with
commercially confidential material.
Figure 5 Surfing Fraud (methods: PBX dial-through, cloning, calling card scam, social engineering)
It is important to establish where call selling fits into the 4m's classification and
into the four-stage fraud management scheme. Call selling is normally detected as
a by-product of monitoring customer use of the network; and since the fraudulent
customer is already on the network, we can say straight away that stage 1 of fraud
management (prevention, figure 2) has failed. Anyone caught fraudulently using the
network can therefore be said to have bypassed the subscription fraud detection
process, since they would have applied to use the network either with false details,
or with correct details but no intention to pay for the services used.
Indicators of active subscription fraud can be identified by checking that the
customer is who they say they are. Checks are normally carried out to verify that the
background information the customer supplies is valid; these can consist of
electoral register checks, credit application checks and previous address checks.
Systems also exist that can cross-reference a customer's application with customer
applications held by other companies, to find consistencies and inconsistencies
between the supplied details.
We can therefore go through the four stages of the fraud management lifecycle and
amend the subscription process. Unfortunately, detecting that someone intends to
defraud the network by checking subscription details is never 100% successful (they
might have used legitimate details but had no intention to pay for the services), so
the next step is to detect fraudulent use of the network as it occurs. Once
fraudulent methods have been identified, the company can amend the system to help
detect such use earlier. Since this is always going to be a reactionary process, the
earlier the fraud is found, the earlier it can be stopped and the more money will be
saved.
Establishing when call selling is taking place is where an FMS (Fraud Management
System) is required, due to the volume of call data generated whenever customers use
their phones. There are several accepted ways to detect fraudulent use of a
telecommunications network, including27:
Matching a user's call usage pattern to a known pattern that fraudsters
use.
Establishing that there is sufficient change in a customer's usage
pattern to warrant investigation.
Ascertaining whether a customer's usage profile has exceeded set limits
defined by the fraud analyst.
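The second approach above, detecting a sufficient change in a customer's own usage, might be sketched as follows; the 3x factor is an invented threshold an analyst would tune:

```python
from statistics import mean

def usage_change_alert(history_daily_cost, current_daily_cost, factor=3.0):
    """Alert when today's spend exceeds `factor` times the customer's
    own historical daily average (illustrative change-detection rule)."""
    baseline = mean(history_daily_cost)
    return current_daily_cost > factor * baseline

# A customer averaging 2 pounds/day who suddenly spends 20 trips the alert.
```

Comparing a customer against their own history, rather than a global limit, lets the same rule fit both light and heavy legitimate users.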
Firstly, if the telecommunication company is well established, it is more than likely
to know the call patterns associated with fraudulent use. One can therefore assume
that if a call pattern matches an established fraudulent pattern, the customer account
it belongs to warrants further investigation.
Unfortunately, things are never actually this easy. Fraudsters understand that to be
able to defraud the telecommunications companies in the future they must evolve their
methods, as they know that the companies are not stupid and will spot when particular
frauds are occurring. Likewise, the telecommunication companies know that to keep the
fraudsters at bay they must constantly evolve their methods of detection and
prevention. An appropriate analogy is a two-horse race with the fraudster always one
step ahead: when the phone operators catch up, the fraudster steps up an extra gear
and moves ahead again.
Some of the tools that fraud analysts can use when detecting fraud can be
summarised as follows:
o Rules-based systems: Based on knowledge obtained from experts in
telecommunications fraud, the fraud analyst creates a set of rules
that try to match certain aspects of a customer's profile against set
thresholds.
o Bayesian knowledge networks: A graph of related events is created,
with an arc between events based on the dependency of one event
on another. A solution can then be built up from evidence
presented to the network using conditional probabilities.
Unfortunately, this needs a professional in both telecommunications
fraud and Bayesian belief networks and, without going too in-depth,
this solution has been shown to be less reliable28 than other methods.
o Neural networks: Based on past data, a neural network should be
able to classify an input pattern and ascertain whether it matches,
or is sufficiently similar to, a pattern the network has already
learnt29.
Rules-based systems 30 31 32: Rules-based systems require knowledge of the exact
parameters of fraud. In addition, since there are seemingly unlimited methods of
defrauding via call selling, the rule set required to capture the
fraudsters would also need to be sufficiently large. This is not feasible:
each check takes a finite period of time, and the larger the rule set, the longer the
checks take, possibly for little gain in fraud detection rates.
Imagine a system developed to check customer accounts against 500
rules, and now imagine that a group of fraudsters establishes a new method of
defrauding the phone company. After a couple of weeks or months the company becomes
aware of the method and adds a new rule to handle it; however, there is a
500-rule limit, so some other rule must be dropped. How do they decide which rules
to remove without making themselves susceptible to the older methods of fraud? Do
they assume that no one will use the older tricks? That would be foolish, since they
would be neglecting the opportunist fraudster who might only know the older
methods.
Additionally, rules systems are not dynamic in nature; they normally consist of
checking the parameters of customer accounts against threshold values set by the
fraud analyst. Therefore, the rules may miss fraudsters who keep their call
patterns below the levels at which they would be deemed fraudulent.
Rules-based systems are also open to internal abuse, since a person looking at the rule
set can easily discern its workings. For instance, if a fraudster learns that
accounts are only flagged when the average cost per call exceeds £5.00, all they
have to do is keep their average call cost at around £4.50. This is an overly simplistic
example, but it effectively highlights some of the problems with rules-based systems.
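This threshold-evasion weakness can be sketched in a few lines. Everything here is invented for illustration (the £5.00 rule, the field names, the accounts); it is not the report's actual rule set:

```python
# Hypothetical rules-based check illustrating the weakness described above.
# The threshold, field name and accounts are all invented for this sketch.

def is_flagged(account):
    """Flag any account whose average call cost exceeds a fixed threshold."""
    AVG_COST_THRESHOLD = 5.00  # pounds
    return account["avg_call_cost"] > AVG_COST_THRESHOLD

honest_heavy_user = {"avg_call_cost": 6.20}  # flagged, possibly a false positive
evasive_fraudster = {"avg_call_cost": 4.50}  # deliberately sits just under the limit

print(is_flagged(honest_heavy_user))   # True
print(is_flagged(evasive_fraudster))   # False: the fraud goes undetected
```

Anyone who knows the threshold can stay just beneath it indefinitely, which is exactly the internal-abuse problem the text describes.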
Bayesian knowledge network systems 33 34 35: The parameters of fraud are known to the
telecommunications company, based on certain features ascertained from the customer
base. The fraud analyst sets up relationships between each piece of
knowledge and associates a probability describing how much a given piece of
knowledge influences an event B, the event in this case being
the customer committing fraud. For example: given that the average call
duration is x and most calls occur in the evening, is the customer fraudulent?
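This kind of conditional reasoning reduces, in its simplest form, to Bayes' rule. The probabilities below are made up purely for illustration; a real belief network chains many such conditionals together:

```python
# Illustrative single-step Bayes'-rule calculation with invented numbers.
p_fraud = 0.01                  # prior: assume 1% of accounts are fraudulent
p_evidence_given_fraud = 0.60   # P(long evening calls | fraud), invented
p_evidence_given_clear = 0.05   # P(long evening calls | clear), invented

# Total probability of seeing the evidence at all
p_evidence = (p_evidence_given_fraud * p_fraud
              + p_evidence_given_clear * (1 - p_fraud))

# Posterior belief that the account is fraudulent, given the evidence
p_fraud_given_evidence = p_evidence_given_fraud * p_fraud / p_evidence
print(round(p_fraud_given_evidence, 3))  # → 0.108
```

Even strong evidence only lifts the belief to about 11% here, because fraud is rare: this is why such systems combine many pieces of evidence before flagging an account.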
Systems have been researched that use two belief networks. The first network is
modelled by the fraud analyst, with the relationships between pieces of knowledge
established from previously detected fraud. The second is a network
that is automatically generated from all the clear (non-fraudulent) data,
with a network normally created for each customer class36. The data for each
customer is then passed through both networks and the results of both are
considered: the first contains a belief of how fraudulent a customer is, the second a
belief of how clear a customer is.
However, what if the fraud analyst missed some important relationships when
encoding knowledge in the system; how would the system respond? What if the
customer were perpetrating a new type of fraud that had never been modelled before?
My assumption is that the networks would not be able to respond sufficiently. For
instance, consider a customer who simply never intended to pay a bill, but whose
calling pattern was similar to that of the average customer.
Bayesian belief networks can be used to generate a better understanding of the
customer base, helping the fraud analyst discover relationships in the data that
might otherwise have been missed. Michiaki Taniguchi states that other methods of fraud
detection exist which provide higher true classification rates with lower
false positive rates.
The Neural Network 37 38 39: Michiaki Taniguchi has shown that neural networks are
better at classifying fraud than the previous two methods (rules-based and
Bayesian knowledge). Depending on the construction of the neural network, rates of
85% classification without a single mistake have been recorded.
What is a Neural Network? Kevin Gurney states:
A Neural Network is an interconnected assembly of simple
processing elements, units or nodes, whose functionality is loosely
based on the animal neuron. The processing ability of the network is
stored in the interunit connection strengths, or weights, obtained by
a process of adaptation to, or learning from, a set of training
patterns.
Simply, given an input pattern, the neural network will discern from past training
which class it assumes the pattern belongs to. Essentially, during training, each
node in the neural network builds up weightings for specific features presented in the
training data.
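As a toy illustration of learned weights (not the project's actual network), a single artificial neuron can be trained with the classic perceptron rule on two invented, normalised call features; the data, labels and learning rate below are all fabricated:

```python
# Minimal perceptron sketch: one neuron learns weights for two invented
# features (normalised call count, normalised average duration).

def predict(weights, bias, x):
    # Step activation: fire (1) if the weighted sum exceeds the threshold
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

# Fabricated training set: label 1 = "fraud-like", 0 = "clear"
data = [([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.1, 0.2], 0), ([0.2, 0.1], 0)]

weights, bias, lr = [0.0, 0.0], 0.0, 0.1
for _ in range(20):                       # a few passes over the data
    for x, target in data:
        error = target - predict(weights, bias, x)
        weights = [w + lr * error * xi for w, xi in zip(weights, x)]
        bias += lr * error

print([predict(weights, bias, x) for x, _ in data])  # → [1, 1, 0, 0]
```

After a handful of passes the weights encode the separation between the two classes, which is the "processing ability stored in the connection strengths" of Gurney's definition in miniature.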
It can be seen that a neural network tries to imitate the reasoning process of a human
expert, where a human would build up an image of the solution by combining evidence
and weighting each piece against the experience of similar
problems. There may be many factors that a human uses to decide the best
solution to a problem.
Unlike rules-based fraud detection methods and Bayesian belief networks, a neural
network does not need a fraud analyst to establish the relationships between
customer attributes and fraud; rather, the fraud analyst need only classify the
customer accounts according to whether they are thought to be fraudulent or not.
For the network to classify data accurately, the data it uses has to be of good
quality: if no relationships between features of the data can be established, it is
unlikely that the network will be able to encode weightings for those features in
its internal structure.
Neural networks are often used in data modelling or statistical analysis of problems
with many nonlinear relationships, for instance weather forecasting,
financial forecasting and fraud detection. This is because neural networks have been
shown to have an innate ability to classify non-linear problems. An example helps
show how this applies to fraud40:
If we look at two variables, Number of Calls and Average Call
Duration, with each point being a customer (see figure 6A), there is
no way to draw a straight line between the two classes (fraud, red;
clear, black). Things get harder still when more
variables are added and the number of dimensions increases (figure 6B),
when drawing a hyperplane between the classes becomes nearly
impossible.
Figure 6 A) Non-linear problem separation B) Added Dimensions
Neural network technologies are commonly used in pattern recognition precisely
because they are good at solving non-linear problems, where there may be a pattern
that can be discerned but which is very hard for humans to see. The more
dimensions we have, the harder it becomes to separate each class of data with a line,
plane or hyperplane.
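The classic XOR problem is the smallest sketch of this: no single straight line separates its two classes, yet a network with one hidden layer classifies it correctly. The weights below are hand-chosen for illustration rather than learned:

```python
# XOR: the smallest non-linearly-separable problem. A single neuron
# (one straight line) cannot solve it, but one hidden layer can.

def step(v):
    return 1 if v > 0 else 0

def two_layer_net(x1, x2):
    # Hand-chosen illustrative weights
    h1 = step(x1 + x2 - 0.5)   # fires when at least one input is on
    h2 = step(x1 + x2 - 1.5)   # fires only when both inputs are on
    return step(h1 - 2 * h2 - 0.5)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
print([two_layer_net(a, b) for a, b in inputs])  # → [0, 1, 1, 0]
```

The hidden units carve the plane into regions that no single line could, which is the same capability that lets a network separate fraud from clear traffic in many dimensions.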
Neural networks offer several other advantages over the two other systems of fraud
detection (rules-based and Bayesian knowledge). They have the ability to
generalise a solution; that is, to classify where it thinks a particular customer
account stands in relation to fraud. The customer account information does not have
to exactly match the data that the neural network has been trained on. This can be
good for detecting fraud that is not being perpetrated in the same manner as other
fraud, but has similar characteristics.41
Another case for neural networks is their ability to adapt to changing circumstances.
Not only can they generalise, they can also be retrained (once a
sufficient training regime has been put in place) with new data; so if the fraudsters
evolve their methods, the neural network can easily be adapted to accommodate
these changes, with little extra effort from the fraud analysts.
Neural networks also suffer less from internal fraud attacks than other methods of
fraud detection do. A neural network can be considered a black box: data is supplied
to the network and a response is returned without it being apparent how the network
came to its decision. Rules systems, and to some extent Bayesian networks, are
susceptible to internal fraud, in that a user of the system can quite easily infer
the criteria used to decide whether a customer account will be flagged as
fraudulent. Because simply looking at the nodes of a neural network gives no
evidence of how it classifies its data, it would take a professional with a great
deal of neural network experience to deduce any information describing the
reasoning process. In this sense, the neural network is more secure than the other
methods of fraud detection.42 43
Once the neural network has been trained, deciding whether a customer account is
fraudulent is very efficient. The internal reasoning process normally consists of
matrix multiplications, which can be carried out very quickly. The most
time-consuming part of the solution is summarising the customer account from a
database of all customer call information, an operation common to each of the
methods described in this chapter. Once the data has been summarised, it can be
presented to the neural network and a response is returned almost immediately.
Compare this to having to first summarise the data and then trawl a rule set,
comparing each rule against the data: a rules system effectively analyses every
customer variable for an account systematically. That process can be more
intensive, and thus slower, than the neural network method.
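The "matrix multiplications" a trained network performs amount to a couple of weighted sums per layer. The sketch below uses invented weights and a logistic activation purely to show the shape of the computation, not the project's actual network:

```python
import math

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One feed-forward pass: two matrix-vector products plus activations."""
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)

# Invented weights for a 2-input, 2-hidden-unit, 1-output network
score = forward(x=[0.7, 0.3],
                w_hidden=[[0.5, -0.4], [0.8, 0.2]],
                b_hidden=[0.1, -0.3],
                w_out=[1.2, -0.9],
                b_out=0.05)
print(0.0 < score < 1.0)  # sigmoid output always lies between 0 and 1
```

Each pass is a fixed, small number of arithmetic operations, which is why classification after training is near-instant compared with scanning a large rule set.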
7.5.1 Why Call Pattern Analysis is not always enough
Call pattern analysis should not be the only method of fraud detection
employed in the telecommunications industry. If we have to capture the
fraudsters while they are using the network, then they have already evaded our first
check (ascertaining that they are who they say they are). There is also the fact
that a customer may mimic a normal user and then neglect to pay the bill after the
second month. For instance, if the system is tied into the billing department's
systems, we may notice that a person claims to be a company, runs lots of
international calls through the account as a large company might, and then closes
down when payment is due. What is to say that the company did not fold on purpose?
7.6 Consideration of Real Time Methods
Part of the emphasis of this project is to investigate real-time methods used in
fraud detection within the telecommunications industry.
There are two types of real-time behaviour in computer systems, HARD real-time and
SOFT real-time; these are said to be "traditional real-time systems".
Hard real-time systems are normally associated with hardware-based systems, where
the timing of responses from the software controlling the hardware must follow
strict guidelines. Hard real-time systems have to be predictable, ensuring that the
timing of events and response actions is always known and adhered to.
Soft real-time systems, on the other hand, deal with timing requirements in a more
relaxed manner, where event timings are non-deterministic. The programming for such
systems is therefore said to be more complex than for their hard real-time
counterparts.
A more commonly used meaning of "real-time systems" is "the successful
achievement of results with acceptable optimality and predictability of timeliness".
This is the definition that I intend to use throughout this project.
I intend to develop a prototype system that, once presented with the relevant
customer call details, will return a response near-instantaneously; hence the
"Real Time" part of the project title.44
8 Identification of Problem and Specification
Normally, a customer would come to a software company with a set of requirements. The
development house would then analyse the problem domain, propose a solution using
certain formal methods and then, if the customer is satisfied, they would agree
the design and the implementation would follow.
This project has taken a different direction: I initially identified a problem in an
industry and proposed to find a solution to it. I have therefore also had to take on
the role of the customer, which required in-depth research into the industry. Figure
7 shows the normal model of systems development; this model has been amended
(figure 8) to accommodate this project.45
Figure 7 Normal Linear Sequential Model (Waterfall)
From now on, this project follows what could be considered a normal software
engineering project (minus the analysis stage, as it has already been done).
Figure 8 Amended Linear Sequential Model (Waterfall)
[Figure 7 flow: Customer identifies problem area → Asks software company to develop solution → Software company analyses problem → Software company proposes solution → Software company designs solution → Software company produces and delivers solution]

[Figure 8 flow: I identify problem area in an industry → I perform research to understand the industry → I propose a solution → I research methods to bring solution to life → I implement the methods and test the viability of solution → Present final solution]
8.1 Specification
From the research provided, it can be shown that a system to detect "Call Surfing"
via methods such as "Call Selling" and "PRS fraud" will help save the
telecommunications industry potentially millions of pounds per annum. The proposed
solution can be summarised as:
Develop a prototype system using neural networks that will
analyse the call patterns of individual customers, returning a
status of whether it thinks the call pattern is fraudulent or not.
The results of which will ascertain if such a solution is valid.
Using the above criteria as a starting base, previous research (chapter 7) shows
that the problem is more complex than the simple statement above suggests. The
development aspects of the system can be broken down into the following stages.
1. Develop a customer call generation tool. The tool will model
how classes of customers behave, given user-defined parameters.
2. Model neural networks using the generated data mentioned
above, with a training regime, testing methods and validation of
results.
A customer call generation tool needs to be created because I am unable to obtain
any real call information from telecom companies. The customer call generator will
be able to generate all the customers and their calls needed for this project.
8.2 System Tools Research and Requirements
The aim of this section is to explain the reasoning behind the selection of the tools
used to develop a system that can detect fraud, and to discuss the requirements of
the project further. This is essential because, now that the minimum requirements
for the solution are known, before we can design how the package as a whole will
work we must build up a more concrete set of requirements and understand how the
development environments will help and hinder development.
The requirements of the system can be broken down into two separate stages: one for
the CDR Tool and the other for the neural network. I established the requirements
myself to give limits to the project; these limits are imposed to stop feature bloat
and to minimise the risk of the project not being completed in time. The project
would have to meet these requirements to be judged successful.
The requirements were gathered after the research stage (sections 7.2-7.5) into
fraud, the telecommunications industry and fraud in the telecommunications
industry. Following on from this, several theories and methods of using a neural
network presented themselves as possible solutions to the problem; for various
reasons, some were decided against. What follows is therefore the final set of
requirements, deduced from a subset of all the initial theories. These theories were
based on, and developed in tandem with, the system tools research. (Notes available
on request.)
Since this project is an investigation, it requires the development of a prototype
tool. A prototype, as the name suggests, does not have to be a fully functioning
product aimed at the people who are to use it. Instead it is a proof of concept,
saying "yes, this solution is viable and will work following these principles".
It would be ideal to have development tools perfect for the task in hand;
unfortunately this can never be the case, for reasons including the limited range of
software the university possesses and the cost of the software that I can afford.
8.2.1 Further Requirements for the CDR Tool and Development Tool Research
When considering the features for a CDR (Call Detail Record) generation
tool, it is important to understand all the data pertinent to a call. This is needed
because the analysis of that data will drive the creation of the detection methods,
which directly affects the success of the fraud detection tool.
The CDR Tool must be able to create groups of customers that follow a given model.
This implies that the models must be specifiable in a form that makes the model
information easy to use from a human perspective, while keeping the data format
flexible enough that algorithms can easily be developed to create the customer
information.
Customer attributes will be considered in the design section of this project, as
further research is needed to judge which attributes are the main drivers of a
customer's account information and which may be inferred from the main attributes.
Due to the huge amount of data needed to model CDRs suitably, it is safe to assume
that an RDBMS (Relational Database Management System) will be required. The main
question is: what type of RDBMS should the project use?
Points such as interoperability with programming tools, data extraction facilities, and
performance must all be understood.
It is widely considered that SQL is the de facto standard for information extraction
from an RDBMS, so there is little argument that a tool must be able to communicate
directly with the database using this declarative language.46
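As a sketch of the kind of extraction query involved, the snippet below summarises per-customer usage with a SQL aggregate. The table, columns and figures are invented, and SQLite stands in for a full RDBMS purely for illustration:

```python
import sqlite3

# In-memory stand-in for the project's RDBMS; the schema is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (customer_id INTEGER, duration REAL, cost REAL)")
conn.executemany("INSERT INTO calls VALUES (?, ?, ?)",
                 [(1, 120.0, 0.50), (1, 300.0, 1.20), (2, 60.0, 0.20)])

# Summarise each customer's usage: the kind of aggregate a fraud tool needs.
rows = conn.execute("""
    SELECT customer_id, COUNT(*) AS num_calls,
           AVG(duration) AS avg_duration, SUM(cost) AS total_cost
    FROM calls
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()
print(rows)
```

The same declarative query would run, essentially unchanged, against any SQL RDBMS reached over ODBC, which is the portability argument made below.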
Tools for each stage (generation of data and the fraud detection system) must be
able to communicate with the RDBMS (figure 9). It is worth highlighting here a
remote communication protocol called ODBC (Open Database Connectivity), developed
by Microsoft47. ODBC allows a program to access RDBMSs from many different vendors,
with little or no need to alter the client application if the database were changed
during the project. ODBC also removes the question of where the RDBMS is physically
located, as it does not require the client application to implement any network
communication protocols.
Because of the decision that ODBC will be used, we are essentially free to choose
whatever RDBMS is available. The options considered are as follows (not an
exhaustive list of all the database systems available):
o MySQL, a widely used, efficient multi-user open-source RDBMS,
used on many websites throughout the internet. However,
several failings remove this RDBMS from the running,
including (at the time of assessing the requirements) no
support for subqueries, limited join facilities and no support
for SQL views. 48
o PostgreSQL, a heavyweight multi-user open-source RDBMS
alternative to Oracle, with excellent performance and uptime,
its own SQL-style procedural language to enable easier data
manipulation, and competent ODBC drivers. It can run in a
Windows environment, but still requires ODBC to connect to it. 49

[Figure 9 Standard model for database communication: CDR Generation Tool ↔ DB ↔ Fraud Detection Tool]
o Oracle, a heavyweight business-class RDBMS with excellent
performance and scalability. The feature set required for this
project would not cover even half of what Oracle offers. Oracle
has for many years had its own data input language, Oracle
Forms, as well as its own procedural language. While I have
worked with Oracle in a professional environment, for this
project its functionality is judged to be overkill. Additionally,
the Oracle DB would always reside on the university servers, and
access to such services could become unavailable through
circumstances beyond my control.50
o MS Access 2000, a business-orientated RDBMS, although it
does not support many of the higher-end features of some of the
other commercial databases, such as efficient multi-user support.
MS Access 2000 has its own implementation of VBA (Visual
Basic for Applications), which supplies a far superior interface
and development language to the other RDBMSs mentioned,
through the use of Windows forms allowing for easy prototyping
and application development; this is partly due to the ability to
model, control and access the data types and the underlying data
store with no extra work.51
The ability for MS Access 2000 to have the CDR generation tool sitting directly on
top of the RDBMS is a tremendous advantage: keeping everything in one location will
let me develop the software in more than one place, rather than establishing
connections to remote databases, which could prove cumbersome, slow and prone to
failure (depending on the internet connection). (Figure 10)
8.2.2 Further Requirements for the Fraud Detection Prototype and Development Tool Research
On the fraud detection side of the requirements, we have to choose a tool that can
access the data store, perform extra processing of the data and show the results of
the tool's performance. As previously mentioned, the tool will simply be a
prototype, a proof of concept, and therefore will not require the user interface
that would normally be needed if we were developing a system put out to tender.
The Fraud Detection Tool can be visualised as two separate stages: gathering the
data from the RDBMS, and processing it with the neural network.
Since the Fraud Detection Tool will require the use of a neural network, there are two
options:
1. Create a neural network from scratch with a programming language.
[Figure 10 An ideal situation for the CDR Tool and Fraud Detection Tool]

[Figure 11 Processing the data through a neural network: Retrieve from DB → Transform to suit NN → Pass to NN → Display results]
2. Create a neural network using a tool inside a package especially
designed for prototyping and mathematical work.
The correct choice is clearly a software package that can simulate a neural
network. Developing a neural network from scratch would be a separate project in
itself: given the many different types of neural network available, I would have to
understand the precise workings of each to ensure they were correct, and doing so
would require time that I do not have if I am to create a fraud detection tool.
The requirements for the neural network cannot be as firmly set as those for the CDR
Tool, since this is the research part of the project. Creating the final neural
network is a process of trying many different architectures and training methods,
then combining the results to arrive at a final, optimal network.
The design of the neural network and the training methods, along with an overview
of neural networks, is covered later in the design section of this project.
Nevertheless, the final neural network must meet several defined requirements:
o It must detect fraud to a reasonable level.
o A final network must be produced that 'would' be used if the
model created applies to what happens in the real world.
o A threshold level must be established to indicate which
classification the data is in, e.g. any value of 0.75 and above is
clear, whilst anything beneath this value is fraudulent.
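The thresholding requirement amounts to a one-line decision rule. A sketch using the example 0.75 cut-off from the requirements, with invented scores:

```python
# Illustrative thresholding of network outputs; 0.75 is the example
# cut-off from the requirements, and the scores below are invented.
CLEAR_THRESHOLD = 0.75

def classify(score):
    """Map a network output in [0, 1] to a class label."""
    return "clear" if score >= CLEAR_THRESHOLD else "fraudulent"

print([classify(s) for s in [0.92, 0.75, 0.40]])  # → ['clear', 'clear', 'fraudulent']
```

Choosing where the cut-off sits trades false positives against missed fraud, which is why it is listed as an explicit requirement rather than fixed in advance.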
Having cut a large swathe through the number of potential systems by removing the
need to hand-develop every neural network, I can concentrate on developing the
prototype: swiftly creating and testing the most suitable networks for the project
and establishing which prototype system is the most adept at classifying fraudulent
customers.
With the ability to create neural networks swiftly, it would be wise to require the
system to automate their training. This frees me from being involved in creating
each network by hand. Once the networks have been trained, the system should be
able to prune those that cannot classify the results correctly. The nature of
neural networks means we can never guarantee 100% correct classification of the
data, so we will need some method of visualising the results on completion.
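The automate-train-prune idea can be sketched as a loop over candidate models, keeping only those whose validation accuracy clears a bar. The model names, stand-in classifiers, data and threshold are all fabricated for this sketch:

```python
# Hypothetical train-and-prune loop; the "networks" are stand-in callables
# and the validation data is fabricated for illustration.

def accuracy(model, dataset):
    correct = sum(1 for x, label in dataset if model(x) == label)
    return correct / len(dataset)

validation = [(0.9, 1), (0.8, 1), (0.2, 0), (0.1, 0)]

candidates = {
    "net_a": lambda x: 1 if x > 0.5 else 0,   # separates the toy data
    "net_b": lambda x: 1,                     # degenerate: always predicts fraud
}

KEEP_THRESHOLD = 0.9
survivors = [name for name, model in candidates.items()
             if accuracy(model, validation) >= KEEP_THRESHOLD]
print(survivors)  # → ['net_a']
```

A real version would train each candidate before scoring it; the pruning step is the same, so only networks that actually learn the data survive to the comparison stage.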
These requirements all point to systems that have either neural network packages
included or the ability to install them as an add-on. It comes as no surprise that
I am limited to the software the university has available; this includes:
o MATLAB
o Visual Basic
o Visual C
Several tools aid the production of neural networks; however, none that I have
found has the inherent ability to provide the statistical functions, data
processing, custom function generation and ODBC database connectivity that MATLAB
provides. Although both of the Microsoft Visual programming languages are very
flexible and enable rapid prototyping, they are unfortunately not suited to the
rapid prototyping needed for this project, since many of the statistical functions
and matrix operations required for neural network analysis are not provided as
standard (and the quality of third-party neural network packages varies wildly
between implementations). MATLAB, on the other hand, provides all the data
processing functionality required by this project, with tried and tested neural
network packages and ODBC connectivity.
9 Design
9.1 Chapter Summary
This chapter deals with the design of both parts of the system: the CDR (Call Detail Records) Tool and the NN Fraud Detection Tool. The design is based on the requirements determined during the research and presented in the specification.
This chapter will not deal directly with each algorithm used in the program, nor will it show every data processing stage in detail. Rather, it will describe the important algorithms used to generate the data; the data generated from the model attributes; the data produced as a consequence of the supplied data and input parameters; and an overall flow showing how the system will generate the data for each of the customers in the models.
In addition, because the neural network is not a full tool, but rather an experiment into whether such a process is a practical way to detect fraud, only issues concerning the creation of the NN will be documented. This may include brief discussions of the helper tools used to create the NN and to test the efficiency of the model used, but will mainly focus on the methods used to train and test the network.
The design section is a verbose description of the design process sprinkled with flow
charts and diagrams showing the major points discussed. This method of
documentation was chosen to give the reader a fuller understanding of the effort
involved and the algorithms used. Rough design work can be obtained on request.
9.2 Combined System
The system as a whole must be considered before we can look into its principle
components. As described in the specification (8.1), the target of the system is to
show that neural networks can be used to detect fraud. This project differs from
normal 'stake holder' projects as there are no stake holders other than myself. This
What can be seen below is a very abstract view of how the whole system will work.
The first two stages are strictly based in Microsoft Access; the third stage is a task shared between MATLAB and Microsoft Access. Primarily, MATLAB uses the ODBC functionality of Microsoft Access to gather information about the customers; however, the query generation functionality of MS Access allows better aggregation of the data through the use of views. This means that most of the work is done on the database server and not in the neural network suite, which is how all good database-driven applications should be created.
The final stages (4, 5 and 6 in figure 12) are based inside MATLAB and are there to train the network and decide which network is most suited to our problem.
Figure 12 Abstract overview of data flow in the system
1) Define Parameters for models
2) Random creation of call information based on models
3) Aggregate the data based on bi-weekly stats
4) Train on ¼ of the data
5) Test on another ¼ of the data
6) Validate on the final ½ of the data and complete performance analysis
9.3 CDR Tool Design
The CDR Tool is the first major design hurdle that must be overcome when developing the neural network solution because, as highlighted in the Research and Specification chapters, real-world call data is impossible for me to obtain from telecom companies.
The role of the CDR Tool is to generate all the call data needed to train, test and
validate the neural network models. The theory is that every customer belonging to a
telecommunication company will fit in to a certain demographic or customer profile,
be they fraudulent or not.
If a telecommunication company provided all the information about its entire customer base, we should be able to take a sample of the customer population and aggregate the data in such a way that each customer would fit into one of many profiles that we have defined.
Each customer profile group will have distinct attributes to which every customer in that group bears some relationship. An example is probably best:
After studying the customer information, we know that the
telecommunications company's customer base represents 20 distinct
classes of customer, some of which follow:
People who use the telephone rarely and only in the evening;
People who use the Internet regularly at any time in the day;
People who have friends and family who are based abroad;
Companies who are shops and stores;
Companies who are small, but have a national customer base;
Companies who are large and have customers worldwide.
Each of the customers in each of the classes would not be the same as
each other, but their attributes would be similar. As above,
customer type one is unlikely to make any PRS or international
calls, but may make local calls to their friends and
families. People in this group will not have the exact same calling
pattern, but they will have the same attributes.
As you can see, if there are enough classes, every customer could be placed uniquely into one of them.
Following from this, it is therefore acceptable to work in reverse from what a telecommunication company would do when looking at its customers. If we can think of the potential classes of customers, we can then build models that mimic individual customers based on the classes in which they should fit.
The problem is how to define a model in such a way that the customers do not have the same calling patterns, but their calling attributes fit those of the model.
A statistical property called the "normal distribution" or "Gaussian distribution" shows how a population is distributed in relation to a property. It is broadly related to the histogram function, but instead of showing the number of people in each class, it shows the probability of that property occurring, since it is normalised against the sample.
Figure 13 A Gaussian distribution based on male heights in the UK
The height of the curve represents the probability of the measurement at that given distance away from the mean. The graph above represents the heights of men in the UK, who might average about 195 cm. Therefore, the probability of picking a man whose height is 195 cm is the highest, while the probability of a man with a height of 220 cm is far smaller.
As shown, the normal distribution is built by centring the graph on the mean (the
highest probability of an occurrence), and can be generalised using the following
function.52
Figure 14 The Gaussian distribution function
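For reference, the function the figure shows is the standard Gaussian probability density, where mu is the mean and sigma the standard deviation:

```latex
P(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}
```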
Why is this property good for generating Call Detail Records based on models?
Eric W. Weisstein states the Central Limit Theorem as: the mean of any set of variates with any distribution having a finite mean and variance tends to the Gaussian distribution [sic]. In essence, what this implies is that real data will never actually follow the mathematical ideal of a Gaussian distribution, but many types of data distribution can be said to broadly follow one.53
This is good, as it means that we can use this principle by assuming that when a class
of customers is to be generated, everyone in that group will differ from the mean by a
random amount within a limit defined by the distribution. Nevertheless, the majority
of people will tend towards having the properties similar to that of the mean.
What must follow, if we are to generate customers that randomly differ from each other but follow a Gaussian distribution, is a function that can generate normally distributed numbers.
Two properties define a Gaussian distribution: the mean and the standard deviation. The mean has the effect of centring where the population will be based, and the standard deviation affects the spread of the population. A higher standard deviation means that further outlying values have a higher probability of occurring than normal, thus flattening the graph.
What can be seen in the following two charts is a Gaussian distribution each with the same mean, but the standard deviation in B is larger than that of A, with the effect that P(x) at the mean is smaller in figure 16 than in figure 15. The two graphs have been scaled so that they look similar in height.
An algorithm exists that will generate random numbers following a Gaussian distribution: the Polar Box-Muller54 method. It generates two random numbers drawn from a Gaussian distribution with mean 0 and standard deviation 1 from two uniformly distributed random numbers between 0 and 1 (many programming languages provide functions for generating uniformly distributed random numbers). It is then a simple case of manipulating these two generated random numbers to fit other distributions; for example, a mean of 34 and a standard deviation of 5 would involve multiplying the numbers by 5 and then adding 34 to the result.
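A sketch of the Polar Box-Muller method described above (in Python for brevity; the project itself uses MATLAB). The scaling to a mean of 34 and a standard deviation of 5 mirrors the example in the text:

```python
import math
import random

def gaussian_pair(mean=0.0, sd=1.0):
    # Polar Box-Muller: draw a uniform point (u, v) in the unit disc
    # (rejecting points that fall outside it), then transform it into
    # two independent standard-normal variates.
    while True:
        u = 2.0 * random.random() - 1.0
        v = 2.0 * random.random() - 1.0
        s = u * u + v * v
        if 0.0 < s < 1.0:
            break
    factor = math.sqrt(-2.0 * math.log(s) / s)
    # Scale by the standard deviation and shift by the mean, as
    # described above (e.g. multiply by 5, then add 34).
    return mean + sd * u * factor, mean + sd * v * factor
```

For example, `gaussian_pair(34, 5)` returns two random values centred on 34 with a spread of 5.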
Furthermore, it is assumed that each customer in each model would have a similar pattern of days (Monday, Tuesday, etc.) on which they place telephone calls; for instance, a casual user of the phone might make calls in the evening or on Saturdays. These patterns would generally not differ from customer to customer in the model, but would follow an uneven distribution, nothing like a Gaussian distribution. Gaussian distributions are said to be of use only when the
Figure 15 Gaussian Distribution A
Figure 16 Gaussian Distribution B
variable in question is "continuous" (like people's heights) and not discrete (like days of the week - Monday, Tuesday, etc. - or call types such as PRS, Free, Local, etc.).
Therefore, a method that allows weights to be assigned to each discrete variable needs to be developed. A solution exists based on the process of biased weightings. For instance, when tossing an unfair coin, it could be said that the heads side is unfairly biased (i.e. it is more likely to land heads up).
The method will pick a random number between 0 and 1, then, using the weighted probabilities, it will assess which property the random number belongs to. An example: a set of three discrete properties exists (A, B and C); A is twice as likely to occur as B, and likewise with C, such that A = 50%, B = 25% and C = 25%, where A+B+C = 100%, as seen in figure 17. Our random number generator produces an even spread of random numbers (every number will occur the same number of times given enough attempts).55
Therefore, if the random number is >=0 and <0.5 it is class A; if >=0.5 and <0.75, class B; and so on. This indicates that together B and C should occur the same number of times that A occurs, with B and C occurring the same number of times as each other.
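The biased-weighting method above can be sketched as follows (Python for brevity; the weights dictionary mirrors the A/B/C example in the text):

```python
import random

def weighted_choice(weights):
    # weights maps each class label to its probability; the values are
    # assumed to sum to 1. A uniform random number in [0, 1) is
    # compared against the running cumulative probability, so a class
    # with weight 0.5 owns half of the interval.
    r = random.random()
    cumulative = 0.0
    for label, probability in weights.items():
        cumulative += probability
        if r < cumulative:
            return label
    return label  # guard against floating-point rounding at 1.0
```

With `{"A": 0.5, "B": 0.25, "C": 0.25}`, A is drawn roughly twice as often as B or C.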
Now that we have the functions to create Gaussian-distributed random numbers as well as biased numbers, it needs to be decided which properties of a customer account are the affecters of the call data. An example will provide a better understanding of what is meant:
The cost of a customer's monthly bill could be said to be random,
but each customer in the group will have similar bills. However,
the cost of the calls is not the affecter; it is the average
duration of the calls, the types of calls being made and the period
(off peak, on peak). The cost of a call is a function of these
properties: y = f(call period, call duration, call type).
Figure 17 Biased Weighting
When considering which inputs should be based on a Gaussian distribution, the call parameters are good examples, as what is normally held in a Call Detail Record is the type of call, the start of the call, the end of the call and the cost of the call.
Other attributes do not need to follow a Gaussian distribution; for instance, each model will contain the probability that a particular call will happen on a given day (Monday, Tuesday, etc.).
Each attribute that is said to follow a Gaussian distribution will need two variables to be specified: the mean (where we want the population for the variable to centre) and the standard deviation, which says how much on average each of the elements differs from the mean, i.e. the spread of the data. The other attributes, which need to be randomly generated but follow a set weighting, need only have their weights established and scaled to 1 in relation to the other variables in the set.
Attribute affecters are the attributes that have a direct effect on other attributes; these are therefore the ones that will be specified when considering how each of the models is generated.
Attributes such as call cost and average call cost do not need to be generated when each customer is being created; rather, they are produced as a consequence of other variables. The call cost would be established by assessing the time period in which the call was made, how long the call was, and the type of call made (PRS, International, etc.). Table 2 documents all the call parameters that the CDR Tool will have to model.
Table 2 Call Parameters for the CDR Tool

Call Type (Weighted, Discrete)
There are a discrete number of different call types available (Free, Local, National, International, PRS and Mobile). Within each customer class it is assumed that the attributes of the call type are similar for each customer. The probability of each type of call occurring is then established.

Average call duration compared to the population (Random, Gaussian)
Each customer's average call duration will differ from other customers', but again they are expected to fall within a particular group class, because the average call duration for each call type is similar. Each customer in the group will have a similar spread in the duration of each of their calls.

Average Call Duration (Random, Gaussian)
Each customer's calls will not all be the same duration. However, their calls will be of a similar nature; for instance, local calls might average 5 minutes but vary from 1 minute to 7 minutes. Averages might vary depending on the time of the call, since people are more likely to make off-peak calls than on-peak calls. The average call duration is established from the random number generated by the "Average call duration compared to the population" variable.

Time of call (Weighted, Discrete)
People can be expected to make calls at any time of the day; however, the likelihood that they will make a call at 3am is far less than the probability of them making a call at 6:30pm. The distribution is not continuous, but can be separated into discrete time segments.

Call Day (Weighted, Discrete)
Like the time of day when a call can be placed, each customer class can be expected to make more calls on certain days than others. For instance, a small shop's busiest day might be Saturday, while on Monday the shop is closed to allow the staff to have a weekend. Therefore, the probability of a call occurring on Monday for that customer is nil.

Number of Calls Made (Random, Gaussian)
Each customer in the specified customer class will make a number of calls per period similar to that of the other customers in their class. Each of these is based around a mean number of calls, with each customer varying from the mean by some random amount.

Number of Calls Received (Random, Gaussian)
Like the number of calls made, the number of calls received for each customer in the group will be similar to the others', varying by a random amount.

Number of Calls per call type (Weighted, Discrete)
Every call that a customer makes will not be a random call type; rather, the calls will be spread over the different call types (Local, National, International, Free, PRS and Mobile). A normal customer is going to make hardly any PRS calls.

Number of telephone lines (Random, Gaussian)
Every customer will have one or more telephone lines. If the customer is a company they will normally have many phone lines; if only one line were modelled (and since calls cannot overlap), the times of the calls would not be accurate, and therefore the costs of calls would also not be accurate for that model. The number of telephone lines will follow a Gaussian distribution, with each customer's number of lines randomly dictated by the distribution of the population. Households will normally have only one phone, but they could have two lines.
To make the design of the CDR Generation Tool easier, some assumptions have to be made:
Once a call starts, the billing period in which it starts is the billing period the call is charged at. If a call started at 7:59am and on-peak billing starts at 8:00am, the whole call will be classed as an off-peak call.
Calls in this model will always start on a 5-minute boundary.
The granularity of the received calls will be a day.
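Under these assumptions, the derived call-cost attribute can be sketched as a pure function of its affecters. The rate table below is entirely hypothetical (made-up pence-per-minute values for illustration; the real tariffs are model parameters), and the billing-period assumption above is honoured by billing the whole call at the period in which it starts:

```python
# Hypothetical pence-per-minute rates; illustrative values only.
RATES = {
    ("Local", "off-peak"): 1.0,          ("Local", "on-peak"): 3.0,
    ("National", "off-peak"): 2.0,       ("National", "on-peak"): 6.0,
    ("International", "off-peak"): 20.0, ("International", "on-peak"): 30.0,
    ("PRS", "off-peak"): 60.0,           ("PRS", "on-peak"): 60.0,
    ("Free", "off-peak"): 0.0,           ("Free", "on-peak"): 0.0,
}

def call_cost(call_type, start_period, duration_minutes):
    # Cost is a consequence of the affecters (type, period, duration).
    # Per the assumption above, the period the call starts in is the
    # period the whole call is billed at.
    return RATES[(call_type, start_period)] * duration_minutes
```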
9.3.1 Flow of Data When Creating a Model
Rather than give in-depth commentaries on each stage of the model generation and detract from the intent of the project, what follows is a high-level description and a flow diagram (figure 18) showing an abstraction of the generation of all the customers inside a model.
Create all customer Phone numbers and number of calls they can make
For each customer
o Establish the types of numbers they can call
o Establish phone numbers they can telephone for each of the call types
o For each call they make
Generate start time and duration
Check the start time and duration do not overlap any other calls
Once the calls have been created for each customer, store the call information in
the database.
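The inner call-generation loop, with its overlap check, might look like the following sketch (Python for brevity; `gen_start` and `gen_duration` are hypothetical caller-supplied functions standing in for draws from the model's distributions):

```python
def generate_customer_calls(n_calls, gen_start, gen_duration):
    # gen_start() and gen_duration() draw a call start time and a
    # duration from the model's distributions (e.g. the weighted and
    # Gaussian generators described earlier).
    calls = []
    for _ in range(n_calls):
        while True:
            start = gen_start()
            end = start + gen_duration()
            # Reject candidate calls that overlap an existing call,
            # matching the "does call overlap" check in the flow.
            if all(end <= s or start >= e for s, e in calls):
                calls.append((start, end))
                break
    return sorted(calls)
```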
Figure 18 Customer Generate tool flow diagram
[Flow summary: Start; establish every customer's phone number; establish the number of calls each customer will make and receive for that model; establish all the phone numbers the customer can ring based on the distribution; generate the call type based on the spread of call types; generate the call start time and duration based on the distribution; if the call overlaps an existing one, make a new call, otherwise continue; once all calls for the customer are finished, move to the next customer; once all customers are finished, dump to the data store; End.]
9.3.2 Consideration of the UI
In this project the UI is a means to an end; simply put, the UI is a shell to allow simple access to the model parameters. This is not an exercise in requirements gathering with respect to stakeholders, nor is this project aimed at user interface design. The user interface for the CDR Tool is simply a utility that helps to input model information about each of the customer models.
Therefore, no usability studies were carried out for this project and no task analysis took place.
9.3.4 Data Representation and Considerations
The format in which the data is stored needs to be considered so as to allow easy access to the data when it comes to aggregating it for analysis by the neural network.
9.3.4.1 Internal Data Representation
The internal representation of the data in the CDR Tool is an important aspect to consider. The quality of the data representation will affect the overall success of the project: incorrectly defined data requirements and specification will have the knock-on effect of not giving the neural network data that enables it to generalise efficiently. The input requirements of the neural network will be discussed later in the report when considering the neural network design, but the data access queries that generate the data are shown in this section.
9.3.4.2 Customer Information
Each customer needs to have an account. Within this account, data needs to be stored
concerning the model that was used to generate this information and the status that is
assigned to them (fraud/non-fraud).
Following from this, it would be useless not to have information about each call, including items such as the call start and end times (including the date of the call), the type of call (Local, Free, National, etc.), the call cost, and the number that the customer was calling.
Additionally needed is information about the number of calls received during the period. Each received call only needs to store the date of the call. This is due to the decision not to use much information pertaining to incoming calls when considering what the neural network will use.
9.3.4.3 Entity Relationship
The data is partially normalised to first normal form. This is partly due to speed aspects of the system, as some repeated data is kept in a table; this includes items such as start date and end date, which could be represented in another table but would then require extra joins, which would slow the system down dramatically.
Figure 19 Basic Entity Relationship for customer information
The customer information is first created with information such as the phone number; this phone number is then used as the unique identifier in the remaining tables. For analysis' sake, the modelID that was used to create the customer is also stored.
The remaining tables hold information pertaining to the calls that the customer makes, such as the start time, the end time, the cost of the call, the phone number that was called and the type of call being made.
Incoming call information is also stored, but only requires the phone number that called the customer and the date on which the call took place. This is because very little information pertaining to incoming calls will be used in the final network, and no further analysis of incoming call patterns will be needed for this project.
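The entity relationship just described might be sketched as the following schema. This is an illustrative reconstruction only (the project's actual tables live in the Microsoft Access database on the accompanying CD, and the column names here are assumptions); SQLite is used purely so the sketch is runnable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    phoneNumber  TEXT PRIMARY KEY,   -- unique identifier reused below
    modelID      INTEGER NOT NULL,   -- model used to generate the customer
    status       TEXT NOT NULL       -- 'fraud' or 'non-fraud'
);
CREATE TABLE OutgoingCall (
    phoneNumber  TEXT NOT NULL REFERENCES Customer(phoneNumber),
    startTime    TEXT NOT NULL,
    endTime      TEXT NOT NULL,
    callType     TEXT NOT NULL,      -- Local, Free, National, ...
    cost         REAL NOT NULL,
    calledNumber TEXT NOT NULL
);
CREATE TABLE IncomingCall (
    phoneNumber  TEXT NOT NULL REFERENCES Customer(phoneNumber),
    callerNumber TEXT NOT NULL,
    callDate     TEXT NOT NULL       -- only the date is kept
);
""")
```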
9.3.4.4 Data Access
Data retrieval is an important part of the system; if the data cannot be aggregated easily then it is practically useless. The system needs to be able to generate the information required by the neural network so that the network can learn the features in the data that result in correct classification.
What follows is a brief look at the final query used to pull all the data out of the database. This is a visual representation of the query rather than an SQL code representation. It must be noted that this query pulls data from other queries; these external queries can be considered "views" in SQL relational database terms. They are present for brevity, so as not to make the final query overly complex with the mass of joins it has to perform.
All the queries used in this project can be viewed in the Microsoft Access 2000
Database supplied on CD in the appendix of this project.
The query shown in figure 20 is the main query used by the neural network software
and gathers aggregated data about low risk calls and high risk calls over a two week
period.
Figure 20 Overview of tables, fields and relevant joins used in the final output query
Descriptions of the queries used can be seen in the appendix (16.2.1)
9.3.4.5 Index Considerations
When considering which indexes should be present on a table, there are some well-established guidelines that can be adhered to, to allow for efficient access to the data:
1. Any field that is used in a join operation
2. Any field that is used in a WHERE clause
3. Any information that is suitably unique throughout the data
4. Any information that may be used with an aggregate function such as AVG, SUM, etc.
The following indexes have been applied to help speed up access to the database.
9.3.4.6 Aggregating the Data
When considering telephone records, it is important to understand, with respect to pattern analysis, the time span over which the data should be aggregated. Too coarse a granularity may mean that patterns suitable for fraud detection cannot be noticed, while too broad a time range has the potential to catch the fraud but too late to take any preventative action against the fraudsters. In either case the fraud detection tool would be considered useless.
The time period chosen for this project is two weeks; however, this can be changed with very little effort. The reason for this decision is that billing normally occurs once a month, and a system that waits this long to capture the fraudster will be unable to catch fraudsters who defraud the operator for the first month and then leave. A technique called sliding windows can be used to help capture fraudulent new customers to the company.
A sliding window (figure 21) can be seen as a view onto a certain portion of the data, in this case customer phone usage. The window is a set width and does not cover the whole of the data set. The data in the window is analysed and used; next, the window (or view) is moved slightly along the data set. Doing this, we can build up a day-by-day summary of two weeks of customer usage.
Figure 21 Sliding Window Effect
Now, if we have a window the size of two weeks, we can analyse this information nearly as soon as the customer has arrived on the network (by the end of the second week we have a good start for monitoring their call patterns). Once the first two weeks have been analysed, we can set the window's starting position to the 2nd day of the first week and analyse the customer account again, up to the 1st day of the third week (the 15th day). This would then occur each day for the customer.
This method shows that by the end of the first month we have been able to analyse 14 different combinations of two weeks' worth of call data. By the second month we will be able to analyse at least 30 full combinations of two weeks' worth of call data. This is better than simply analysing the customer's account once every two weeks or once every time the customer is to be billed.
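The sliding-window aggregation described above can be sketched as follows (Python for brevity; in the project the aggregation is done by the Access queries, and the per-window statistic here is reduced to a simple cost total for illustration):

```python
from datetime import date, timedelta

def sliding_windows(calls, first_day, last_day, width_days=14):
    # calls: list of (call_date, cost) pairs for one customer.
    # Yields (window_start, total_cost) for each one-day slide of a
    # two-week window over the observed period.
    day = first_day
    while day + timedelta(days=width_days) <= last_day + timedelta(days=1):
        end = day + timedelta(days=width_days)
        total = sum(cost for d, cost in calls if day <= d < end)
        yield day, total
        day += timedelta(days=1)
```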
9.3.4.7 Storing the Models
The models must also be stored in the database to allow the creation of the customer detail records. Again, this is normalised to first normal form to reduce the number of columns in each table and ensure they are easy to work with. All of the parameters used on the forms are stored in one of these tables: information relating to off-peak calls is stored in the "off-peak" table, data relating to on-peak calls in the "on-peak" table, and other information in the model table.
9.3.4.8 Testing the Model Generator.
Testing is an important step in the lifecycle of a project: if the project does not meet its requirements or does not function correctly (i.e. breaks), then the people who need it will not want the program. If the code in an investigation such as this does not work, then it is likely that the results will be incorrect and the final outcome of the project will be wrong.
Testing a project normally takes the form of two different stages:
1. To ensure that the project meets the requirements
2. Functionality testing to ensure that no bugs have been
introduced into the code.
Requirements testing normally takes the form of black box testing and functionality
testing takes the guise of white box testing.
Black box testing can be considered as testing without knowledge of the internal workings of the program being tested. For example, the tester would only know the allowed input parameters and what outputs are likely to be returned, but not how the program actually arrives at the result.
Black box testing can be considered to be testing with regard to the specifications; no other information about the program, such as code listings, is required. Therefore, the tester and the developer can be independent of one another, thus avoiding a programmer being biased toward his own work and adversely affecting the testing.
White box testing is testing with full knowledge of the internal workings of the program being tested. The tester in this instance would normally be the programmer and would know the code paths that a particular piece of code should take. White box testing is used to check the robustness of the code.
Because the output will not always be the same for every run of the CDR Tool, testing
is a bit harder than normal. Therefore, unit testing took place throughout the
development of the CDR Tool. This meant hand testing each major functional
requirement, which is essentially black box testing.
Of particular interest are the random number generators used. Since they are not simple random number generators, code had to be developed to make the random numbers follow either a Gaussian distribution or a weighted distribution.
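For illustration, both kinds of generator can be sketched as follows. The CDR Tool itself was written differently, so these function names and signatures are purely illustrative; Python's standard library already supplies a Gaussian sampler, and the weighted pick is a simple cumulative-sum selection.

```python
import random

def gaussian_sample(mean, std, rng=random):
    # A normally distributed value with the given mean and standard deviation.
    return rng.gauss(mean, std)

def weighted_sample(values, weights, rng=random):
    # Pick one of `values` with probability proportional to its weight.
    total = sum(weights)
    r = rng.uniform(0, total)
    cumulative = 0.0
    for value, weight in zip(values, weights):
        cumulative += weight
        if r <= cumulative:
            return value
    return values[-1]  # guard against floating point rounding at the top end

rng = random.Random(42)
durations = [gaussian_sample(20, 5, rng) for _ in range(1000)]
call_type = weighted_sample(["local", "national", "international"], [70, 25, 5], rng)
```

Testing such generators is statistical rather than exact: one checks that the sample mean and spread fall within tolerances of the requested parameters, which is the approach taken in the test plans.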
Black box testing has been chosen for this project. Normally with black box testing the programmer/designer and the tester are independent and have no contact with each other; however, this cannot be the case here as I am the only person on this project.
The tests have been designed around testing whether the CDR Tool can correctly generate the call models; this implies that the random number generators are known to function correctly and within given parameters. Therefore, the test plans will not only cover whether models have been generated correctly but also whether the random number generators can produce numbers with the desired parameters.
To keep this section sufficiently short, the test plan has not been included here; the black box test plans and results are supplied in the appendix (16.4.1).
9.5 The Neural Network
The Neural Network is the second stage of the project and is probably the most important section, as the aim of the project is to detect fraud and the NN is the method that will be used to detect it.
Following on from the research it is important to establish how a neural network
functions and what type of Neural Network will be used in the final solution. It is
important to model a couple of types of neural networks and then empirically decide
which would be the best to use if a full tool were to be implemented.
What follows is a discussion on the training methods used, the Neural Network
Structures available for modelling, and an overview of the Performance Analysis used
in the project to determine which Neural Network is the best at classifying fraud.
Following the discussion about neural networks will be a brief discussion of the tools created to aid the development of the neural networks.
This is not a tutorial on neural networks; rather, the design of the network is presented by showing the major design decisions and why they were taken. This is because MATLAB already provides all the tools needed to create and train a neural network; all that is needed is the reasoning behind the choices made.
No mathematical proof as to how neural networks work has been provided in this chapter, so as to keep the project succinct and to the point.
9.5.1 What is a neural network?
A neural net is a system set up to mimic the function of an animal neuron. The basic neural network consists of a single node (or one neuron). This node can have many inputs and, for a single node, normally one output. An animal neuron can be seen to operate by summing the electrical signals entering the neuron through the dendrites, collected from the synapses, and firing a signal down the axon if the input signals attain some limit, sometimes called the activation (figure 22).
Figure 22 An animal neuron
The artificial node has weighted inputs (similar to the synapses) from other nodes (or cells); this is achieved by multiplying each input value by a weight value (the weight value essentially says how important the input is). The node will perform a summation of these inputs and fire (or activate) when a certain level is reached. 56
The first artificial neural nodes (figure 23) would simply output a 0 if the activation level was not reached, or a 1 if it was attained; these were known as Perceptrons and used threshold logic units (TLUs), which took the form of a step function. This system was adequate if only binary information was being supplied, and it could tolerate noisy inputs (the inputs not being exactly 0 or 1). However, systems using this approach were shown to be unable to classify problems that were not linearly separable, such as the XOR problem. The activation level was normally hand set to a value that would give correct classification for the problem. 57
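The behaviour of such a node can be captured in a short sketch, assuming illustrative names. The hand-set weights and threshold below classify logical AND correctly; no choice of weights and threshold for a single TLU can do the same for XOR, since XOR is not linearly separable.

```python
def tlu(inputs, weights, threshold):
    # Binary threshold logic unit: fire (output 1) if the weighted sum reaches the threshold.
    activation = sum(i * w for i, w in zip(inputs, weights))
    return 1 if activation >= threshold else 0

# Hand-set weights and threshold that classify logical AND:
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, tlu([x1, x2], [1, 1], 2))
```

The printed truth table fires only for the input (1, 1), which is exactly the line-drawing behaviour the text describes: a single TLU can only separate the input space with one straight boundary.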
Figure 23 An artificial neuron based on Binary Threshold Logic Unit
Real neurons are not believed to operate in this manner, but rather receive inputs as a continuous pattern of information, which can be summarised as a continuous range mimicking that of an analogue system. This meant that the TLU could no longer simply operate using a step function, but needed a function that would output a value signifying which side of the problem domain the solution is in, based on the strength of the inputs.58
Because the data being input is continuous, if we were to use a system such as a feed forward network, which connects nodes to other nodes, then the output would also have to be continuous. There are several functions available, which normally take the form of a bipolar output or a unipolar output. A bipolar output (figure 24, tan sigmoid) is one that takes the range −1 to 1, whereas a unipolar output (figure 24, logistic sigmoid) is one that takes the range 0 to 1. These functions can take an infinitely large range of inputs and convert it to a scale of 0 to 1 or −1 to 1.59
[Figure 23 content: inputs I1, I2, I3 with weights W1, W2, W3; act = I1·W1 + I2·W2 + I3·W3; if act < θ then out = 0, else out = 1; output is 0 or 1]
Figure 24 Logistic Sigmoid function & Tan Sigmoid function
Figure 25 An artificial neuron based on a continuous sigmoid output function
If we have data that is non-linearly separable, how can we find a curve to fit it? Sigmoid functions such as the logistic sigmoid can be altered and summated in such a way that we can get them to fit our problem domain (see figure 26), which could approximate our problem.
There is no way to know in advance how many sigmoid operations we need to get the solution to fit the curve, and this is where the training of the neural network comes in, which will be described in detail later. Suffice to say that each of the hidden nodes will output a value based on its sigmoid-based activation function; the output node's role is to combine these results, in a manner similar to our combining of the sigmoid functions seen in figure 26.
[Figure 25 content: inputs I1, I2, I3 with weights W1, W2, W3; act = I1·W1 + I2·W2 + I3·W3; out = sigmoid(act), giving an output in the range 0 to 1]
Figure 26 Combining logsig(5x − 2) + logsig(x + 2) − logsig(2½x − 12)
Figure 26 could represent "anything above the line is in class 1 while beneath the line
class 2". This is precisely what the neural network should achieve.
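The combination shown in figure 26 is easy to reproduce numerically. This is an illustrative Python sketch (the project's own work was done in MATLAB): it evaluates the logistic sigmoid and the summated curve from the figure's caption at a few sample points.

```python
import math

def logsig(x):
    # Logistic sigmoid: maps any real input into the range 0 to 1.
    return 1.0 / (1.0 + math.exp(-x))

def combined(x):
    # The summation of altered sigmoids shown in figure 26.
    return logsig(5 * x - 2) + logsig(x + 2) - logsig(2.5 * x - 12)

for x in (-10, 0, 2, 15):
    print(x, round(combined(x), 4))
```

Evaluating the curve shows it rising from near 0 on the far left towards 2 around x = 2, then dipping back towards 1 as the subtracted sigmoid switches on; each term shifts and scales a single sigmoid, and the sum bends the overall curve to fit the problem.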
This is why some classes of neural networks are good at function generalisation: you train the network with the inputs of a function, and after a training session it should be able to mimic the output of the function without actually knowing its exact inner workings.60 Rather, it establishes how the function operates through a process of comparing the output of the function with its own output, and adjusting its internals so that it becomes more like the function it is trying to mimic.
9.5.2 Types of Neural Networks
So far, I have discussed briefly what a neural network is while mentioning some types of networks; what follows is a brief discussion of the types of neural networks available.
There are several main types of neural networks: feed forward networks; feedback (or recurrent) networks; self-organising networks; the list continues.
A feed forward network operates by passing the output from one layer of nodes to be the input of another, lower layer; hence the data feeds forward through the network. The inputs are supplied to the network; each node then calculates its output value based on its activation function and passes it on as input to the neurons in the layer below. Each node in the layer above can be attached to every node in the layer beneath. Each connection is then assigned a weight, and this weight acts as an "importance level" for the information being output from the node above to the node below. Networks which have every node in one layer connected to every node in the layer below are said to be fully connected. For conciseness, fully connected networks are the only networks that will be considered in this project.
Figure 27 The Feed forward Neural Network (inputs → hidden layer → output layer, with weights on each connection)
Figure 27 presents a feed forward network known as the MLP (Multi-layer Perceptron). The MLP is a good choice and is generally described by many textbooks as easy to implement and understand, while also being good at finding solutions to many problem domains.61
MLPs normally consist of an input layer, followed by one or two hidden layers (so called because we have no direct access to their inputs or outputs); finally, an output layer is added so that we can gather the results from the network. The MLP can be considered a fully connected feed forward network of Perceptrons, with one or more hidden layers, using a continuous output function for each of the nodes.
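A single forward pass through such a network can be sketched briefly. This Python sketch is illustrative only (MATLAB performs the real work in this project); biases are omitted and the weights are random, so the output is meaningless except to show the flow of data and the squashing of the final result into the sigmoid's range.

```python
import math
import random

def logsig(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, hidden_weights, output_weights):
    # One forward pass through a fully connected two-layer MLP:
    # each hidden node squashes its weighted sum, and a single output node
    # squashes the weighted sum of the hidden outputs.
    hidden_out = [logsig(sum(i * w for i, w in zip(inputs, weights)))
                  for weights in hidden_weights]
    return logsig(sum(h * w for h, w in zip(hidden_out, output_weights)))

rng = random.Random(1)
n_inputs, n_hidden = 3, 5
hidden_weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
output_weights = [rng.uniform(-1, 1) for _ in range(n_hidden)]
y = forward([0.2, 0.7, 0.1], hidden_weights, output_weights)
print(0.0 < y < 1.0)   # the output is always squashed into the range 0 to 1
```

Training, discussed below, is simply the process of replacing these random weights with ones that make the output meaningful.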
The recurrent network, also known as a feedback network, will have some of the hidden nodes supply their activation to a node (known as a context unit) in a lower layer; the output from the context unit is then used again by the nodes that supplied its input, so that, essentially, a feedback loop occurs (figure 28). This type of neural network is normally used with temporal information, that is, to detect patterns that vary over time. Further discussion of recurrent networks is outside the scope of this project.62
Figure 28 A Recurrent Network (hidden-layer activations fed back through a context unit)
Many other types of neural network exist, but for brevity and simplicity, only feed-forward networks based on the Multi-layer Perceptron, and techniques associated with it, will be considered. The number of hidden layers will be set to one, as this is suitable for solving many non-linear problems, whilst having two hidden layers would dramatically increase the number of test networks required and the training time.
9.5.3 Which Neural Network to Use?
The neural network that this project will use is the Multi-layer Perceptron (MLP). This type of network is a feed forward network, and will be trained using back-propagation.
The network will consist of three layers: an input layer, a hidden layer and an output layer (similar to figure 27); however, it is known as a two-layer network, since two layers work on the data. MLPs can have more hidden layers, but the effort needed for training is greatly increased with every additional hidden layer; additionally, this design, with one hidden layer, is sufficient to solve many non-linear problems, including function approximation.
Multi-layer Perceptrons have one or more layers of hidden nodes. There are no set rules for how many hidden nodes there should be, so the best way to determine this is through a process of trial and error. However, knowing roughly where to start (with the number of hidden nodes) has come under consideration.
Carl G. Looney shows that 63:
Masahiko recommends K−1 hidden nodes, where K is the
number of distinct training patterns. This is not feasible, since
the number of unique training patterns could be as high as 1400
because the data is in continuous form.
Hayashi gives a relation between the number of hidden nodes
M, the number of input features N, the number of output nodes
J and some constant c. This is a more reasonable method and,
by adjusting c, we can get a range of hidden nodes from 5 to 10.
A network with too few hidden nodes will not be able to generalise sufficiently, while a network with too many hidden nodes may require phenomenally more training cycles to converge on a solution.
The number of hidden neurons in the final neural network will be determined empirically through a process of trial and error. There will be a set minimum (5) and a set maximum (10) number of neurons; it is then a simple task of establishing which network structure performs best, and that network will be used as the final one. The process of deciding which network outperforms the rest will be discussed later.
9.5.4 Training a Neural Network.
For a Multi-layer Perceptron, training normally takes place using an algorithm called back-propagation, taking advantage of methods such as gradient descent and gradient descent with momentum.
Training is required so that the hidden nodes can learn to mimic the relationships
between the input patterns supplied and the output that the network should achieve.
A good training algorithm will know how to minimise the difference between the
desired output and the actual output of the network.
Back propagation operates by signalling the errors backwards through the feed-
forward network, each node in a layer will then adjust its input weights by how much
it determines it is responsible for the error in the output. Each node in the next layer
of the network then repeats this process by determining the level of responsibility it
has in the error of the layer beneath it.64
Fundamental to the function of neural networks is the algorithm used to minimise the error that the weights are responsible for. These algorithms are said to be training functions; in essence, they provide the ability for the network to learn from its mistakes.
There are many training functions available for a Multi-layer Perceptron, each with their own distinct advantages and disadvantages; however, the two used in this project are Gradient Descent and Gradient Descent with Momentum, for the following reasons:
Gradient Descent attempts to minimise the error of the output of the
neural network by considering the error to be a function of the weights.
The weights are adjusted in a manner that does not allow the error to
increase; by following this downward trend (the negative gradient) of
the function, a point is found where the error no longer decreases, but
also does not increase.65 The amount by which the weights are changed
is governed by:
1. The size of the error
2. A property called the learning rate, which governs how large a
weight change can be and thus how fast training can take place.
Gradient Descent with Momentum: this uses the same process as
normal gradient descent, but also takes into account the weight change
from the previous iteration of training (or epoch): the magnitude of the
last weight alteration, scaled by a momentum coefficient, is added to
the suggested weight change for this epoch to determine the change in
the weights. This can allow faster convergence on a global minimum
solution, as it can ride over small deviations in the error function and
potentially skip local minima.66
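The weight update for both methods can be written as one line; with the momentum coefficient set to zero, it reduces to plain gradient descent. The following is a minimal Python sketch with illustrative names and a one-dimensional error function, not the MATLAB implementation the project actually uses.

```python
def update_weight(weight, gradient, previous_delta, learning_rate=0.1, momentum=0.5):
    # delta = -(learning rate * error gradient) + (momentum * previous weight change)
    delta = -learning_rate * gradient + momentum * previous_delta
    return weight + delta, delta

# Minimise the error function error(w) = (w - 3)^2, whose gradient is 2 * (w - 3):
w, delta = 0.0, 0.0
for _ in range(100):
    gradient = 2 * (w - 3)
    w, delta = update_weight(w, gradient, delta)
print(round(w, 3))   # converges on the minimum at w = 3
```

The momentum term carries part of the previous step forward, which is what lets the update roll through shallow bumps in the error surface instead of stalling in them.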
Both of the training functions mentioned above are considered to be the de facto standard training algorithms used to train an MLP, and both consider the error of the network as a function of its internal weights between the nodes; essentially, incorrect weights are responsible for incorrect outputs from the network.
Table 3
Error Function: An error function for a node. As the weights are adjusted the error decreases. Point A is the global minimum, Point B is the local minimum and Point C is the starting position.
Local Minima using Gradient Descent: Using gradient descent, the back propagation algorithm will follow the steepest gradient from the starting position to change the weights and reduce the error. Unfortunately it gets stuck in the local minimum (B) and cannot get out, since moving either left or right results in an increase in the error.
Global Minima using Gradient Descent with Momentum: Adding a momentum term into the function increases the change in the weights after each iteration, depending on the results of the previous iteration and the suggested change for this epoch. This may result in the local minimum (B) being skipped over and the global minimum (A) being found.
The principal factor in both of these training algorithms is the method they employ to minimise the difference between the obtained output of the network and the desired output of the network. In combination with the learning rate, the size of the error has a direct effect on how much each of the weights should be changed.
Because it is extremely unlikely that the neural network will be trained after presenting all the training patterns once, it is necessary to keep presenting all the training patterns until either we have reached a limit where we decide the network will not be able to train any more (it may have reached a local minimum, for instance) or the network has been trained enough that we can be confident it can classify all of our inputs. Each time one complete training cycle ends, we say that a training 'epoch' has finished.
Weight adjustment can take two different forms. The first is batch mode and the
second is incremental mode. Batch mode will find the error across the whole training
set after each epoch of training and adjust the weights accordingly using this error.
Incremental mode alters the weights after each pattern has been presented to the
network.
Batch mode will be used in this project for simplicity as it is the default that
MATLAB implements.
9.5.5 Training Method for the Feed forward Network
The training method used for the feed forward network can be considered to consist of the following steps.
Initialise Weights
Do
For each training pattern
Train the network on current pattern
End For
Until output error is sufficiently small 68
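The steps above can be fleshed out into runnable form. MATLAB's toolbox performs the real training in this project; as a stand-in, this illustrative Python sketch trains a single linear node with the delta rule on a known linear target, t = x1 + 2·x2 + 1, so every name and parameter here is an assumption for illustration.

```python
import random

def train_network(patterns, targets, learning_rate=0.1, max_epochs=1000, tolerance=1e-4):
    # Initialise weights to small random values (never all zero).
    rng = random.Random(0)
    weights = [rng.uniform(-0.5, 0.5) for _ in range(len(patterns[0]))]
    for epoch in range(max_epochs):
        total_error = 0.0
        for pattern, target in zip(patterns, targets):   # for each training pattern
            output = sum(p * w for p, w in zip(pattern, weights))
            error = target - output
            total_error += error ** 2
            # Delta rule: adjust each weight in proportion to its input and the error.
            weights = [w + learning_rate * error * p for w, p in zip(weights, pattern)]
        if total_error < tolerance:                      # until error is sufficiently small
            break
    return weights, epoch + 1

# Learn t = x1 + 2*x2 + 1 (the final input of each pattern is a bias fixed at 1):
patterns = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]]
targets = [1, 3, 2, 4]
weights, epochs = train_network(patterns, targets)
```

The outer loop is the "Do … Until" of the pseudocode, the inner loop its "For each training pattern", and the tolerance test its stopping condition; back-propagation generalises the same per-pattern weight adjustment to the hidden layers.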
Before a network can be trained, the weights must be set to random initial values; this is done so that the activity of measuring the error between the desired output and the actual output can begin. If the weights were all set to zero, they would have the effect of removing any of the input data from the network. The initial random weighting of the network is the subject of much discussion in the field of neural networks. Suffice to say
that certain combinations of random weightings will mean the neural network can never be trained, whilst other combinations can produce mediocre or amazing results. MATLAB provides default functionality for choosing random initial weights; this functionality will be used when creating the neural network.
There is no need to develop the training function, as this is already in place as part of the neural network toolbox provided in MATLAB. However, the "Train the network on current pattern" step varies widely from neural network to neural network. The neural networks developed in response to this project will be trained using gradient descent and gradient descent with momentum, both used with back propagation. Other training methods, while they may be better, are outside the scope of this project.
Training of the network will take place using a selection of ¼ of all the available data. The network will be tested using another ¼ of the data to see how well it performs. Finally, the network will be validated on the remaining ½ of the data, to ensure that it can generalise to data that it should never have seen69. Data is not taken in sequential order, but from equally distributed points throughout the data; this is standard practice, as it helps to avoid measuring a customer's aggregated account details twice and gives a better spread of sampling data.
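One way to realise this ¼/¼/½ split with equally distributed points is to stride through the records rather than cut sequential blocks. This is an illustrative Python sketch, not the project's actual selection code.

```python
def split_data(records):
    # Training: every 4th record starting at index 0 (1/4 of the data).
    training = records[0::4]
    # Testing: every 4th record offset by 2 (another 1/4).
    testing = records[2::4]
    # Validation: the remaining records (1/2), still spread evenly through the set.
    validation = [r for i, r in enumerate(records) if i % 4 in (1, 3)]
    return training, testing, validation

records = list(range(100))
train, test, validate = split_data(records)
print(len(train), len(test), len(validate))   # 25 25 50
```

Because the three strides interleave, each subset samples the whole span of the data, rather than (say) the training set containing only the earliest customers.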
Training using fixed parameters for the learning rate and the number of training sessions (epochs) alone is not sufficient to successfully determine whether this project has succeeded. Therefore, a variety of epochs ranging from 500 to 3500 will be used, along with a sufficient number of variations of the learning rate. By adjusting the learning rate, we may achieve faster convergence on a solution; it may also enable us to escape local minima with respect to the error function. For the same reason, gradient descent with momentum is used (to try to avoid getting caught in local minima).
9.5.6 Problems Which can be Encountered when Training
When training the neural network we must be aware of two problems. The solutions to these problems have already been defined, but the reasoning has not yet been justified.
When the network is training we can reach three conclusions:70 71
1. The network was not trained enough. It cannot classify enough
of the training data for us to say it will be of any use to us when
attempting to classify data it has never seen before.
2. The network can be over trained. An over trained network may
correctly classify all the data it has been trained on, but it will be
too inflexible when it comes to classifying data that it has never
encountered before.
3. The network can classify most of the input correctly, and it has
not been trained to such an extent that it can no longer correctly
classify data that it has never seen.
Obviously, the third situation is the most suitable, and it is obtained by varying the number of epochs that the network uses to train itself on the training data. We can then produce a selection of neural networks from which we choose the best.
9.5.7 Inputs defined in the NN.
When considering how to design the neural network, it is important to understand the relation of the inputs to the neural network with respect to the output it produces. The adage "garbage in, garbage out" is pertinent: if the neural network cannot, through its training regime, discern any semblance of a pattern, the likelihood of correct classification of the data is drastically reduced. This is the reason why research into the telecommunications industry was carried out; had none taken place, then more likely than not a neural network that correctly classifies a customer's account would not have been created.
Once input data has been generated it must be processed. By reducing the number of
inputs to the network, pre-processing helps the network to learn more effectively, as
the likelihood of data correlation being identified is greatly increased. Two methods
commonly used are transformation and normalisation.
Transformation can combine the generated data inputs into a single input; that is, altering the input in such a way that it still represents something close to what the pre-combined inputs meant. This implies that we can represent the same information with fewer input nodes.72
Normalisation alters an input feature, such as call duration, so as to distribute the data evenly across the data set and to scale the data to match the data range of the input neurons (so that the input data is correctly scaled between −1 and 1, or 0 and 1, depending on the activation function used; no matter what the data, it will always be in the range specified). 73
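A simple linear rescaling achieves this. The Python sketch below is illustrative (the project performs this step elsewhere); it maps the smallest observed value to the lower bound and the largest to the upper bound, the default range matching a bipolar activation function.

```python
def normalise(values, lower=-1.0, upper=1.0):
    # Linearly rescale a feature so every value falls in [lower, upper].
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant feature: map everything to the midpoint
        return [(lower + upper) / 2.0] * len(values)
    scale = (upper - lower) / (hi - lo)
    return [lower + (v - lo) * scale for v in values]

durations = [3, 20, 45, 120, 7]        # call durations in minutes
print(normalise(durations))            # the minimum maps to -1, the maximum to 1
```

Note that the scaling preserves the ordering and relative spacing of the values; it only changes their range, which is what the input neurons require.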
In most neural network applications, transformation involves statistical manipulation of the input data. For example, to understand the quantitative value of a customer call account, the types of information that are pertinent must be established. A mean value of a customer's account attribute is not sufficient on its own; the standard deviation or variance must also be included, so that the nature of the average can be ascertained. For instance, if the mean call duration is 20 minutes, we would want to know the spread of the call durations, as a smaller spread could indicate that some automatic calling device is being used.
Therefore, the processed inputs may include items such as differences and ratios, rather than the raw values as separate inputs. Each of the neurons in the network's input layer will represent one of the processed inputs.
Many transformation and normalisation methods exist; however, some are more appropriate than others for specific applications. Now that the basic network architecture has been decided upon (an MLP using back-propagation), it is important to establish which inputs must be used in the network. These inputs will be the final ones used, and thus it is important to establish why they have been chosen. In the telecommunications industry itself, rather than using a model generation tool (which will, no matter how well developed, never truly represent the client base), the analysis of the inputs to the network would be far more in-depth.
However, one must also consider the speed at which the data can be extracted from the database. It is no good extracting information about a customer from the database if it takes 3 hours to get that piece of information.
9.5.8 Proposed inputs.
On initial inspection, it may seem logical to have every call type (PRS, free, local,
national, mobile and international) available as an input to the neural network, with
associated information, such as average call cost and average call duration, along with
the spread of the data for each.
However, this method would mean that for our test situation approximately 30 inputs would be required (six call types, each with at least four inputs). This is too many: too many inputs make it harder for the network to learn how to generalise, and some of the inputs may be redundant, having no effect on the output.
There are several ways to deal with this; one is by a process of elimination and
combination (remove ones we know are not having an effect, or combine several
inputs), the other method is via automated principle component analysis, which
attempts to find components in the data that have high correlations.
I propose instead a simple segregation of the call types into high risk and low risk.
This greatly reduces the number of inputs that the network requires.
High risk calls are international, mobile and PRS calls, whilst low risk
calls are local, national and freephone calls. All inputs to
the neural network are shown in table 4.
Table 4 Input Parameters to the neural network

Number of Low Risk Calls: The number of low risk calls made during the period of analysis.

Number of High Risk Calls: The number of high risk calls made during the period of analysis.

Incoming calls in the period: The balance of outgoing and incoming calls during the period can be used as an indicator of fraud. If the customer makes a plethora of calls whilst receiving relatively few in comparison, the customer might be running some sort of call selling scheme.

High risk ratio: The proportion of high risk calls among all the calls made; a high value can be indicative of fraud occurring.

Low risk ratio: A high proportion of low risk calls among all the calls made could indicate that the customer is not acting fraudulently, at least with regard to scams such as call selling.

Low risk average cost per call: The average cost of the customer's calls graded as low risk. This is present to prevent the high risk average call cost feature from dominating the neural network's decision when distinguishing between fraud and non-fraud customers.

High risk average cost per call: If the average cost of the high risk calls is high, it could indicate that some sort of fraud is occurring, such as call selling or PRS fraud. Combined with a high ratio of high risk calls, this is a further indication of fraud, as a lot of money is being spent on calls.

Low risk cost per call standard deviation: Like many of the other features, this is present to balance out the effect of the corresponding high risk feature.

High risk cost per call standard deviation: If the standard deviation of the high risk call costs is small, it may be indicative of a scheme such as those run by PRS fraudsters, who might be using some sort of automated dialler.

Low risk average call duration: The average duration of the calls classified as low risk. This is present to prevent the high risk average call duration feature from overpowering the network's decision when distinguishing between fraud and non-fraud customers.

High risk average call duration: A high average call duration on high risk calls might indicate a customer making long duration calls to PRS or international numbers, and may therefore be indicative of fraud.

Low risk call duration standard deviation: Like many of the other features, this is present to balance out the effect of the corresponding high risk feature.

High risk call duration standard deviation: A low standard deviation of call durations might indicate an automated dialler scheme is in operation, since these may use fixed redialling machines that stay connected for a set period of time.
Why are both the high risk and the low risk properties of a customer's account
being considered as inputs to the neural network?
Take for example a company which operates both nationally and internationally: a
proportion of its calls will be high risk and some will be low risk. If
we only considered the high risk calls, such a customer might appear fraudulent,
and the neural network would have no way to pick up the distinction, since high
call durations to international numbers can be either fraudulent or legitimate.
Therefore, with the ratios and properties of the low risk calls also present as inputs,
the network should be able to establish that many high risk calls in the
presence of relatively few low risk calls is more indicative of fraud than many
high risk calls in the presence of plenty of low risk calls.
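To make the table concrete, here is a hedged sketch of how the table 4 features could be computed from one period's call records. It is written in Python rather than the project's MATLAB, and the record fields and risk split used here are assumptions made for illustration:

```python
import statistics

# Assumed risk split, following the segregation proposed above.
HIGH_RISK = {"international", "mobile", "prs"}
LOW_RISK = {"local", "national", "free"}

def extract_features(calls, incoming_count):
    """Build the table 4 input vector for one analysis period.

    `calls` is a list of dicts with hypothetical keys
    'type', 'cost' and 'duration'.
    """
    high = [c for c in calls if c["type"] in HIGH_RISK]
    low = [c for c in calls if c["type"] in LOW_RISK]
    total = max(len(calls), 1)  # guard against an empty period

    def avg(xs):
        return statistics.mean(xs) if xs else 0.0

    def sd(xs):
        return statistics.pstdev(xs) if len(xs) > 1 else 0.0

    return {
        "n_low": len(low),
        "n_high": len(high),
        "incoming": incoming_count,
        "high_ratio": len(high) / total,
        "low_ratio": len(low) / total,
        "low_avg_cost": avg([c["cost"] for c in low]),
        "high_avg_cost": avg([c["cost"] for c in high]),
        "low_cost_sd": sd([c["cost"] for c in low]),
        "high_cost_sd": sd([c["cost"] for c in high]),
        "low_avg_dur": avg([c["duration"] for c in low]),
        "high_avg_dur": avg([c["duration"] for c in high]),
        "low_dur_sd": sd([c["duration"] for c in low]),
        "high_dur_sd": sd([c["duration"] for c in high]),
    }
```

Note how a run of identically priced high risk calls gives a zero high risk cost standard deviation, the pattern the table associates with automated diallers.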
Each of the inputs will be bipolar, i.e. normalised between -1 and 1. This
resolves the issue of a network input being zero, which can effectively turn a node
off: a zero input contributes zero no matter what the weight, so even if the feature is
important the network cannot take any notice of it. If the input is instead
-1, the network can still operate on the feature and learn that it is distinctive of
one class of data.
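A minimal sketch of one such normalisation, linearly rescaling each feature from its observed range onto [-1, 1]. This is one of several possible schemes, shown in Python for illustration only:

```python
def normalise_bipolar(values):
    """Linearly rescale a list of feature values to the bipolar range [-1, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.
        return [0.0] * len(values)
    return [2 * (v - lo) / (hi - lo) - 1 for v in values]
```

For example, `normalise_bipolar([0, 5, 10])` gives `[-1.0, 0.0, 1.0]`, so the smallest observed value becomes -1 rather than 0 and never switches a node off.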
9.5.9 Consideration of the Data Being Presented to the Network
The data that is being presented to the neural network is aggregated over two week
periods (as mentioned in section 9.3.4.6).
To ensure that enough data is available for training, two months' worth of data is
generated by the call generation tool. Therefore, for each customer, four windows
onto the data are presented to the network, each starting and ending on a two
week boundary.
More information could be extracted from each customer if a sliding window
algorithm were used (incrementing the start of the window by one day at a time).
However, given the nature of the call generator, even if the windows onto the data
were moved, the customers' calling patterns would still be similar, as their day-to-day
activity over a two week period does not vary to a large degree. To keep things simple
and efficient, the windows onto the data are therefore spaced on the two week
boundaries. This also helps with training: only one pattern per customer per period is
trained on, so the network is not trained on any one customer more than another,
which should help its generalisation ability, since it is given the broadest
representation of customers available.
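The fixed two-week windowing could be sketched as follows; this is an illustrative Python version, and the day-number representation of call times is an assumption:

```python
WINDOW_DAYS = 14  # the two-week aggregation period

def split_into_windows(calls, n_windows=4):
    """Group one customer's calls into fixed, non-overlapping two-week
    windows, each starting and ending on a two-week boundary.

    `calls` is a list of (day, call) pairs, with day counted from 0
    at the start of the two-month generation period.
    """
    windows = [[] for _ in range(n_windows)]
    for day, call in calls:
        idx = day // WINDOW_DAYS
        if 0 <= idx < n_windows:  # ignore calls outside the period
            windows[idx].append(call)
    return windows
```

Each window then yields one input pattern per customer, rather than the many overlapping patterns a sliding window would produce.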
9.5.10 Consideration of the Output of the Network.
The output of the network is probably one of the most important parts of the design.
If the output is wrong or ambiguous, then any FMS based on this kind of neural
network will not work.
When training the network, the output will be compared against a 1 for
a non-fraudulent account and a -1 for a fraudulent account. This decision has been
taken for two reasons: first, the activation functions on the other nodes
in the network are bipolar, so it stands to reason that keeping things consistent will be
better; second, MS Access stores a TRUE value in a Boolean field as -1, so keeping
the fraudulent output at -1 saves having to transform every network output.
When running data through a completed network, the output will lie within the
bounds of -1 and 1, but it will never actually reach -1 or 1, due to the nature of the
tan-sigmoid function (the results get arbitrarily close to the two limits but never
reach them). The closer the output is to either bound, the more confident the neural
network is that the supplied input belongs to a certain class.
The network also requires a threshold function: above the threshold value, the
customer account for the specified period is deemed non-fraudulent; otherwise it is
deemed fraudulent.
There are two schools of thought on thresholding the output: two thresholds can be
used, or one. In the two threshold method (figure 29), anything above the first
threshold is clear and anything beneath the second is fraud; anything lying in the
middle ground is considered fuzzy (i.e. the network is not too sure). The single
threshold system (figure 30) draws one boundary between clear and fraud.
The single threshold system has been chosen, because if a customer account appears
in the "not too sure" region of the two threshold scheme, it would still be wise to
investigate that customer's account further. The distinction between the two methods
is therefore diminished, as all suspect accounts have to be investigated before a fraud
analyst can take action to prevent the customer's access to the telecomm network.
All accounts the network is not too sure about would also have to be investigated, so
the fraud analysts might as well have them classified as fraud in the first place.
Figure 29 Dual Threshold System
Figure 30 Single Threshold system
The output of the network uses only one node for simplicity's sake, although more
can be used, depending on the number of output classes needed. This project only
requires one output because the data splits into two separate classes (clear, signified
by 1, and fraud, signified by -1); however, two output nodes, one for each class,
could equally have been used.
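The tan-sigmoid output and the single-threshold decision can be sketched as follows; this is an illustrative Python fragment, and the threshold of 0.0 is a placeholder, the real value being chosen by hand as described in section 9.6.4:

```python
import math

def classify(net_output, threshold=0.0):
    """Single-threshold decision: above the threshold the account is
    clear (+1); otherwise it is flagged as fraudulent (-1)."""
    return 1 if net_output > threshold else -1

# The tan-sigmoid output node squashes any activation into (-1, 1),
# so the network can approach but never reach the target values.
output = math.tanh(3.2)
decision = classify(output)
```

A confident clear account yields an output near 1, a confident fraud near -1, and borderline outputs near the threshold are exactly the cases a fraud analyst would still investigate.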
9.6 Neural Network Creation Tools Design
Several tools have to be designed to facilitate network training, data extraction and
performance analysis. It is important to create software for these functions because,
even though individual networks can easily be created by hand in MATLAB, without
such tools the performance and relevance of large numbers of networks cannot be
analysed.
9.6.1 Training Tool
The focus of the training tool is to generate a large set of networks that can later be
analysed. A large set is needed because, depending on the initial weights, many
networks never converge on an optimal solution when trained; some never converge
on any sort of solution at all.
Before any neural network can be recommended as an appropriate solution, it must
be shown that the one being presented is the best available; if other networks could
perform better, why are they not being presented as the optimal solution?
Training takes place by varying the number of nodes in the hidden layer, ranging
from 5 to 10. Networks with each hidden node count are trained in a similar fashion,
as follows:
- The number of epochs must be altered, to ascertain whether extra training affects
the overall performance.
- The learning rate must be altered. Too small a learning rate might cause the
network to get caught in locally optimal solutions, or mean that it takes too long to
converge; a learning rate that is too large may never converge at all, as the changes
in the weights become so large that the weights oscillate around the minimum
without ever settling on it.
- Different combinations of the data must be trained on. If the data the network
trains on were not chosen randomly, the network might not see a diverse enough
range of inputs to learn from, and its ability to generalise to unseen inputs would be
severely diminished.
The algorithm used is depicted in figure 31 and is the same for both the normal
gradient descent method and gradient descent with momentum; all that changes is
the training function which MATLAB uses to train the network.
Figure 31 Training Tool Data Flow
(The flow in figure 31: separate the data into training, test and validation sets,
choosing any one of four combinations of the data, and normalise each set; then, for
each hidden node count, each learning rate and each epoch count, train the network
with x nodes, learning rate y and z epochs, simulate it with the test and validation
data, and store the network parameters and performance information; repeat until all
combinations are complete.)
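The nested search of figure 31 amounts to a grid over data splits, hidden node counts, learning rates and epoch counts. A schematic Python version with a stand-in train function follows; the learning rates and epoch counts shown are invented, and the project performs this loop in MATLAB:

```python
import itertools

HIDDEN_NODES = range(5, 11)          # 5-10 hidden nodes, as above
LEARNING_RATES = [0.01, 0.05, 0.1]   # illustrative values only
EPOCH_COUNTS = [500, 1000, 2000]     # illustrative values only
DATA_COMBOS = range(4)               # 4 train/test/validation splits

def train_network(nodes, lr, epochs, combo):
    """Stand-in for the MATLAB training call (traingd / traingdm);
    returns a fake performance record for illustration only."""
    return {"nodes": nodes, "lr": lr, "epochs": epochs,
            "combo": combo, "mse": 1.0 / (nodes * epochs)}

# One trained-and-stored network per point on the grid, as in figure 31.
results = [train_network(n, lr, ep, c)
           for c, n, lr, ep in itertools.product(
               DATA_COMBOS, HIDDEN_NODES, LEARNING_RATES, EPOCH_COUNTS)]
```

Even this modest grid yields 4 × 6 × 3 × 3 = 216 candidate networks, which is why the later performance analysis has to be automated.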
9.6.2 Data Extraction
The data extraction process is a simple piece of the development, but without it there
would be little or no way to get the data out of the Microsoft Access 2000 database.
Essentially, MATLAB connects to the database via an ODBC connection specified as
a 'User DSN'. Once connected, the SQL statement is passed through the connection;
the results are then 'fetched' from the database and stored in a MATLAB variable.
There are two main types of data extraction query. The first simply returns the
information about each customer in bi-weekly summaries. The second is then issued
to ascertain whether each of these summaries is fraudulent or clear.
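The connect, query, fetch and store flow can be illustrated with Python's built-in sqlite3 standing in for the MATLAB/ODBC/Access stack; the table and column names are invented for the example:

```python
import sqlite3

# In-memory database standing in for the Access database behind the DSN.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (customer TEXT, cost REAL)")
conn.executemany("INSERT INTO calls VALUES (?, ?)",
                 [("A", 1.5), ("A", 2.5), ("B", 0.5)])

# Pass the SQL statement through the connection, then fetch the
# results and store them in a variable; the MATLAB tool performs the
# same connect/query/fetch/store steps over its ODBC 'User DSN'.
cursor = conn.execute(
    "SELECT customer, SUM(cost) FROM calls GROUP BY customer")
rows = cursor.fetchall()
conn.close()
```

The per-customer aggregation shown here is the shape of the first query type; the second query type would look up the fraud flag for each summary instead.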
9.6.3 Performance Analysis and Testing
The performance analysis takes the form of throwing away the networks that could
never classify any of their data, and then establishing which of the remaining neural
networks performed the best.
The performance of a neural network can be measured using the mean squared error
(MSE) on the output of the network once it has been trained. The MSE represents the
average error (the difference between the expected output and the actual output) over
all of the input patterns presented to the network. Although this might seem a good
indicator of performance, alone it is not: the network might have been trained too
heavily on one subset of the data, so that its ability to classify unknown and unseen
data is severely reduced. It is also hard to visualise what the MSE signifies in relation
to how many completely erroneous results occurred (for instance, 10 customers might
be incorrectly classified, but the MSE could still be low if the rest of the customers
are classified with little error). This is where the Receiver Operating Characteristic
(ROC) curve comes in. The ROC curve gives a visual representation of how well a
neural network performs by showing misclassification rates.
Figure 32 Data extraction tool data flow (connect via ODBC, process the SQL
statement, store the results)
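A toy illustration of the point, with invented numbers: one account in ten, the only fraudster, is completely misclassified, yet the MSE gives little hint of it:

```python
def mse(targets, outputs):
    """Mean squared error between expected and actual network outputs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)

targets = [1.0] * 9 + [-1.0]   # nine clear accounts, one fraudster
outputs = [0.95] * 9 + [0.9]   # the fraudster is scored as clear
error = mse(targets, outputs)
```

Here `error` is only about 0.36, on a scale where the worst possible per-pattern error is 4, even though 100% of the fraudulent accounts were missed; this is exactly the blindness the ROC analysis is brought in to expose.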
The first step of the performance analysis is to generate a ROC chart for the data that
the neural network has been trained on. The ROC chart can be used to analyse the
ratios of false negatives, false positives, true positives and true negatives. The area
under the ROC curve is considered a good indicator of how well classification has
been performed.
For this project, the following guidelines for case classification have been set up. A
non-fraudulent account is considered a positive (because this is how it comes out of
the database); many systems consider a fraudulent account as the positive class, but
the two conventions are interchangeable.
True Positive: A correctly classified non-fraudulent customer account.
False Positive: A fraudulent account incorrectly classified as non-fraudulent.
True Negative: A correctly classified fraudulent customer account.
False Negative: A non-fraudulent account incorrectly classified as fraudulent.
If we measure the performance with the non-fraudulent customers in mind, we can
establish the threshold at which we achieve 100% classification of the non-fraudsters,
and roughly what percentage of fraudsters will then also be classified as non-fraud (if
we want every customer who is not fraudulent to be classified as such). This matters
because totally missing the fraudsters means lost revenue, whereas saying a customer
is fraudulent when they are not (although important) may not lose the company as
much money.
A ROC chart is generated by plotting the sensitivity (true positive fraction) on the
y-axis against the corresponding (1 - specificity) value (false positive fraction) for all
available thresholds on the x-axis.74, 75
Y-Axis (Sensitivity):
Figure 33 Y-Axis for ROC Chart (Sensitivity)
Sensitivity = Number of correctly classified Positive (Non-Fraud) Accounts /
(Number of correctly classified Positive (Non-Fraud) Accounts + Number of
accounts which should have been classified non-fraud but were classified as fraud)
X-Axis (1 - Specificity):
Figure 34 X-Axis for ROC Chart (1 - Specificity)
1 - Specificity = 1 - [Number of correctly classified Negative (Fraud) Accounts /
(Number of correctly classified Negative (Fraud) Accounts + Number of accounts
which should have been classified fraud but were classified as non-fraud)]
For each threshold value applied to the network output (ranging from -1 to 1), the
sensitivity and 1 - specificity are calculated and plotted. For the final network, we
then locate the threshold at which the highest number of correctly classified results is
obtained.
The area under the ROC curve is used as an indicator because it provides an overall
score of how well the neural network performed, no matter where the threshold value
is located. The larger the area, the less overlap there is between the two classes (one
class being judged as the other, e.g. non-fraudulent accounts being classified as
fraudulent); an area of 0.5 means that the tool being used to judge which class an
input belongs to cannot distinguish between the two classes of data.
The area under the graph can be determined using the trapezoidal rule.76
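The ROC construction and the trapezoidal area computation described above can be sketched as follows; this is a simplified Python version rather than the project's MATLAB code, with labels using +1 for non-fraud and -1 for fraud as elsewhere in this chapter:

```python
def roc_points(scores, labels):
    """Sweep thresholds over the network scores and return the ROC
    points (1 - specificity, sensitivity); assumes both classes are
    present, with label +1 = non-fraud (positive), -1 = fraud."""
    P = sum(1 for l in labels if l == 1)
    N = len(labels) - P
    pts = [(1.0, 1.0), (0.0, 0.0)]  # threshold below/above every score
    for t in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if s > t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s > t and l == -1)
        pts.append((fp / N, tp / P))
    return sorted(pts)

def auc_trapezoid(points):
    """Area under the ROC curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

scores = [0.9, 0.8, -0.7, -0.8]   # network outputs in (-1, 1)
labels = [1, 1, -1, -1]           # +1 clear, -1 fraud
area = auc_trapezoid(roc_points(scores, labels))  # 1.0: fully separated
```

With perfectly separated scores the area is 1.0; as the two classes' score distributions overlap, the area falls towards 0.5.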
By only considering the networks with an area under the ROC chart of more than
0.6, we can remove the networks that would never be able to classify the results.
The output of one such network, which could not classify the results correctly, can be
seen in figure 35. (The area under the curve can be read as the probability that a
randomly chosen non-fraudulent account receives a higher score from the network
than a randomly chosen fraudulent one; an area below 0.6 therefore indicates a high
overlap between the two sets of results.)
Figure 35 An incorrectly trained neural network ROC depiction
As can be seen in figure 35, the area is less than 0.6. Looking at the results, the
graph essentially indicates that no useful classification took place: all of the
fraudulent accounts ended up in the non-fraudulent set, although no non-fraudulent
customers were classified as fraudulent. Had classification succeeded, the curve of
the graph would be bowed, looking something like an exponential curve.
Figure 36 Actual output of an incorrectly trained network
Figure 36 is further proof that not a single fraudster was classified correctly (a value
of 1 being clear and -1 being a fraudster).
The next ROC chart (figure 37) shows a network whose performance was extremely
good. The network can correctly classify approximately 90% of the non-fraudulent
customer accounts without classifying a single one of the fraudulent accounts as
non-fraudulent.
However, according to figure 37, if we want to correctly classify 100% of the non-
fraudulent accounts, we must accept that about 8% of the fraudsters will be classed as
clear. That is to say, 8% of all the fraudsters would slip through the net if we wanted
to ensure that not a single person was falsely accused of fraud.
Figure 37 ROC Chart for a working neural network
The second stage of the performance analysis is to establish how well the network
performs on a totally new data set, i.e. whether the neural network can work with
data it has not been trained on. If it can, we will see very little difference between
the areas underneath the two ROC charts generated on the separate sets of data. If
not, there will be a decrease in the area under the second chart, implying that the
network's performance has also decreased.
Initially, we remove any network where the area under the ROC chart for the first set
of data is less than 0.6, which indicates that the network may have classified one of
the two output classes (fraud or non-fraud) 100% incorrectly. We then compare the
results of each of the remaining networks against both data sets. The complete
process can be seen in figure 38.
99
![Page 100: Contents report 3.doc · Web viewThe more failures when training means we are more likely not to get a satisfactorily trained network using that particular network architecture. 10.2](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff49c13aa0d0279ba44efb0/html5/thumbnails/100.jpg)
Figure 38 Data flow for establishing the performance of the neural networks
9.6.4 Establishing the Most Appropriate Threshold for the Final Network.
Now that the final network has been established, a value for the threshold must be
chosen: all call patterns whose neural network output is greater than the threshold
are considered non-fraudulent, while outputs lying beneath the threshold are
classified as fraudulent. This value will be ascertained by hand, because it is assumed
that if the network works well there will be a visible distinction between the
boundaries of the two classes.
(The flow in figure 38: for each network whose ROC area exceeds 0.6, test the
network on the new data, calculate the new area under the ROC and establish the
difference between the two areas; once all networks have been tested, find the
network with the smallest difference between the two ROC charts and the highest
chart area, and show that network's information.)
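The selection step of figure 38 can be sketched as follows; the candidate records and field names are invented for illustration:

```python
def pick_best(networks):
    """Discard networks whose ROC area on the training-era data is not
    above 0.6, then prefer the smallest gap between the two areas,
    breaking ties by the larger area on the new data."""
    viable = [n for n in networks if n["auc_train"] > 0.6]
    return min(viable,
               key=lambda n: (abs(n["auc_train"] - n["auc_new"]),
                              -n["auc_new"]))

candidates = [
    {"name": "net-a", "auc_train": 0.55, "auc_new": 0.54},  # discarded
    {"name": "net-b", "auc_train": 0.95, "auc_new": 0.80},  # large gap
    {"name": "net-c", "auc_train": 0.90, "auc_new": 0.88},  # small gap
]
best = pick_best(candidates)  # net-c: smallest drop on unseen data
```

Preferring the smallest gap between the two areas rewards networks that generalise, rather than those that merely fit their own training data well.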
9.6.5 Testing the Network Creation Tool.
The principles used to test the CDR generation tool (section 9.3.4.8) will also be
applied to the tools used to create the neural networks; that is, black-box testing will
take place on the functions. Because a full program is not being made for the
prototype, only functions that help to speed up the analysis of the results need to be
created. Essentially, what is being created is a set of macros in MATLAB, and the
testing will provide proof that the functions work as specified. Test plans and results
are supplied in the appendix (16.4.2).
9.7 Models Generation
The models used to generate the call data are an important aspect of the system. If
they are incorrect, then we cannot prove that the system will help fraud analysts to
detect fraud. Having too few call models may mean that the neural network can
model the problem too easily, while having too many may mean that it is unable to
establish the features associated with each customer class. This is an important
consideration, as a real telecoms company has more call classes and customers than I
can generate or research myself.
9.7.1 Methods to generate the best models.
There are several issues that need to be dealt with when designing the models. These
issues can be summarised as follows:
1. I am unable to obtain the real call data generated by telecom companies;
2. The classes of customer are unknown; the best that can be done is to think of
the types of people who use the system;
3. The call patterns for the classes are also unknown, so the best that can be hoped
for is to mimic call patterns after some research.
The best method I could find to generate the customer classes is to look at the phone
bills of friends and family, and of companies that I am associated with. Even though
this only accounts for a small cross-section of the customer accounts on a telecomm
system, I can use the data gathered to model how the group each participant belongs
to behaves, and then extrapolate from this, along the lines of: "this is how company
X, a small company of 5 people, operates; therefore a medium sized company with
30 employees might make 9 times as many calls, with more of them international".
Obviously, this method is not the most accurate way to model customer classes and
is very flawed, but it is the best available for this project.
There were two stages of model creation. The first ran into problems because there
was little overlap between the model groups and the neural network classified them
too well; this effectively meant the models generated were unlike those that might be
seen in real-world telecom fraud analysis. This problem is discussed further in the
evaluation of the results.
The quantity of data being presented to the networks is also important. If we trained
the network only on fraudulent accounts, it would not be able to classify clear
accounts, and vice versa. If the proportion of fraudulent accounts presented to the
network is greater than the proportion of fraudsters seen in the wild, then this
prototype tool would be useless, because it would be too sensitive to fraudulent
information and would more than likely classify non-fraudulent people as fraudulent.
Determining the proportion of fraud is an important step. As mentioned in section
7.4.2, approximately 3% of telecoms revenue is lost to fraud. We could therefore say
that 3% of customers are fraudulent, so among 1000 customer accounts about 30
would be fraudulent. This is a rough estimate: a person trying to defraud a telecomm
company is unlikely to make only a few calls; more likely they will hit the company
for all they can, so the number of fraudsters is probably smaller in proportion than
the revenue lost to fraud. However, telecommunication companies do not publish
this information, so assuming that roughly 3% of all customers are fraudulent is
reasonable.
The second set of models (which are the ones used in the final tests) allow for overlap
between fraudulent and non-fraudulent customers. The models and reasons for
selection will be discussed next.
102
![Page 103: Contents report 3.doc · Web viewThe more failures when training means we are more likely not to get a satisfactorily trained network using that particular network architecture. 10.2](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff49c13aa0d0279ba44efb0/html5/thumbnails/103.jpg)
9.7.2 Brief discussion about the models used.
A more in-depth discussion of the models can be found in the appendix (section 16.5),
but what follows in table 5 is an outline of all the models used.
Table 5 Models used in the creation of the customer data

Normal Average (200 accounts): A normal customer making normal use of the network.
Normal No International (200 accounts): A normal customer who makes no international calls whatsoever.
FRAUD - CALL SELLING International (6 accounts): An international call selling operation.
FRAUD - PRS (12 accounts): A PRS scam which makes many short duration calls, all of a similar length.
FRAUD - PRS 2 (10 accounts): A PRS scam which makes lots of long duration calls.
Business - Shop (100 accounts): A small shop; has one, possibly two, phone lines.
Business - Small (50 accounts): A small business; has about 5 phone lines.
Business - Medium (40 accounts): A medium business; has about 10 phone lines.
Business - Large (20 accounts): A large business; has many phone lines.
Home - Internet Access (200 accounts): A home user who makes long duration calls on the internet, and automatically redials when cut off.
Home - Plenty Of International (200 accounts): A home user who makes lots of international calls for relatively long periods of time.
Fraud Home Call Sell PRS Hidden (10 accounts): A PRS scam hidden inside what appears to be a normal customer account.

The ratio of fraudulent to clear customers is therefore 38 fraudulent accounts to
1010 clear accounts, i.e. roughly 3.8% of customers are fraudulent.
The above models represent, as well as the model generator classes allow, classes that are not easily separable from each other. This can be seen in accounts such as Fraud Home Call Sell PRS Hidden, where PRS fraud is hidden under the guise of a normal customer. Another example is a normal customer (Home - Plenty Of International) who has many of the properties of a call seller; this could simply be a customer who wants to call relatives abroad.
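To illustrate how such ambiguity might arise in generated data, the following is a minimal sketch (in Python; the distribution parameters are hypothetical and are not those of the actual model generator classes described in the appendix):

```python
# Hypothetical sketch: two deliberately overlapping customer models drawn
# from similar, but not identical, call-count and call-duration distributions.
import random

random.seed(1)

def generate_account(model):
    """Return (calls per day, mean international call duration in minutes)."""
    if model == "Home - Plenty Of International":
        return random.gauss(8, 2), random.gauss(25, 8)
    if model == "Fraud Home Call Sell PRS Hidden":
        # Slightly more calls, slightly longer durations: overlaps the
        # normal model, so no single threshold separates the two classes.
        return random.gauss(11, 3), random.gauss(32, 10)
    raise ValueError(model)

normal = [generate_account("Home - Plenty Of International") for _ in range(200)]
hidden = [generate_account("Fraud Home Call Sell PRS Hidden") for _ in range(10)]
```

Because many fraudulent samples fall inside the normal range, the network is forced to learn the ambiguity rather than a simple cut-off.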
The neural network must be trained to a level where it can understand the ambiguity
in the customer information.
10. Analysis of Results
What follows is the analysis of the results from the neural networks. Not every individual network that was created will be discussed, but rather the final networks which have been shown to perform the best.
Each network class (number of nodes in the hidden layer) will be combined into a super group of neural networks. A new set of data will then be presented to each network in this super group; this data has never been seen before and differs from the data on which each of the networks was trained.
This data will be generated using the same model parameters, but because of the method in which the random data is generated, no two customer call patterns will be the same. This reinforces the notion that neural networks can generalise a solution from data they have never encountered before.
The final network chosen will be the one that has performed best across both sets of data, together with a short conclusion as to why I think it outperformed the rest of the networks.
Neural networks with an area under the ROC curve of less than 0.6 will be discarded and not considered, because they have not been able to ascertain when a fraudulent account is actually fraudulent.
10.1 Overview of how to study the graphs
When each network has been produced, various performance functions are used to measure how good the network is at classifying the results. The performance measures used in this project are fourfold. It is important to understand these graphs, as they are presented on a CD in the appendix; there are too many to reproduce in this report.
1. The performance function of the training set after each epoch.
Figure 39 Performance of the training algorithm
The graph is drawn on a semi-logarithmic scale and represents how the
error was reduced through the training of the network. After each
epoch the mean squared error is established and plotted. Here we can
see that after 1500 training epochs the mean squared error (the
performance) was 0.0112287. This graph was produced using a network
with 7 hidden nodes, trained with a learning rate of 0.1 for 1500
training cycles using normal gradient descent as the error reduction
function. As can be seen, the rate of change in the error was initially
high; it then tapered off and remained fairly constant for the
remainder of the training run.
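As an illustration of such a training loop, the following minimal sketch (in Python rather than the MATLAB used in the project) records the mean squared error after each epoch, as in the performance graph described above; the single sigmoid neuron, toy data and learning rate are purely illustrative:

```python
# Illustrative sketch, not the project's code: plain gradient descent on a
# single sigmoid neuron, logging the mean squared error after every epoch.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy patterns: (inputs, target), where 0 = clear and 1 = fraudulent.
patterns = [((0.1, 0.2), 0.0), ((0.2, 0.1), 0.0),
            ((0.9, 0.8), 1.0), ((0.8, 0.9), 1.0)]

weights, bias, lr = [0.0, 0.0], 0.0, 0.1
mse_history = []
for epoch in range(1500):
    sq_err = 0.0
    for inputs, target in patterns:
        out = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        err = target - out
        sq_err += err * err
        # Delta rule for a sigmoid unit (plain gradient descent step).
        grad = err * out * (1.0 - out)
        weights = [w + lr * grad * x for w, x in zip(weights, inputs)]
        bias += lr * grad
    mse_history.append(sq_err / len(patterns))

print(mse_history[0], mse_history[-1])  # the error falls as training proceeds
```

Plotting `mse_history` on a semi-logarithmic scale would reproduce the shape of the performance graph: a steep initial drop followed by a long, nearly constant tail.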
2. The output of the network in relation to the expected output of the
network after the test stage.
Figure 40 Output of the neural network after the test stage
This graph shows that the network was trained using approximately
1200 different customer profile patterns.
The blue dots represent the desired output of the network and can be
clearly seen at the extremes of the output. The red dots represent the
output the network actually produced (in black and white these are the
points not at the extremes), while the green lines represent the
difference between the desired output and the actual output.
Although some of the predicted outputs are far from the expected
output, accurate placement of the threshold value will still mean they
are correctly classified. However, as the magnitude of the error
approaches two, it signifies that the pattern in question will almost
certainly never be classified correctly.
3. The output of the network in relation to the expected output of the
network after the validation stage.
Figure 41 Output of the neural network after the validation stage
The above graph is based on the same principle as the training test.
However, this time it is run on the remaining set of the data, which the
network has never seen. It is simply used to assess the performance of
the neural network visually. The more green lines that approach a
magnitude of two, the more likely it is that the network will fail to
classify customers correctly, as this is data the network has never
seen before.
This graph is also a good indicator of the future performance of the
network: since the data presented has not been seen before, if the
network can classify it well then it is likely to classify other
unseen data just as well.
The ROC chart, however, is a better measure of performance.
4. The ROC graph showing the classification performance of the network:
the rate of correctly classified positives (sensitivity) plotted
against the rate of false positives as the threshold varies.
Essentially this captures the misclassification rate.
Figure 42 ROC Chart
The above graph represents a network with 7 hidden nodes, trained
on the second available data set with a learning rate of 0.1 for
1500 training cycles, using normal gradient descent as the error
reduction function.
What we can see is that given a suitable threshold value (at and
above which accounts are considered clear, and beneath which they
are deemed fraudulent) we can expect roughly 80% correct
classification of non-fraudulent cases without a single fraudulent
customer account being misclassified. However, if we want
100% correct classification of true positive events (non-fraudulent
customers), then we have to accept that 10% of the cases
presented as non-fraudulent will be misclassified fraudsters
(false positives, i.e. they have slipped through the net).
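The construction of such an ROC chart can be sketched as follows (in Python; the scores and labels are illustrative rather than project data, and "positive" follows the text's convention of meaning a non-fraudulent account):

```python
# Illustrative sketch: build ROC points by sweeping the threshold over all
# network output scores, then compute the area under the curve.
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs as the
    threshold is swept from high to low. Label 1 = clear, 0 = fraudulent."""
    pts = []
    thresholds = sorted(set(scores), reverse=True) + [min(scores) - 1.0]
    pos = sum(labels)            # number of clear accounts
    neg = len(labels) - pos      # number of fraudulent accounts
    for t in thresholds:
        tp = sum(1 for s, l in zip(scores, labels) if s > t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s > t and l == 0)
        pts.append((fp / neg, tp / pos))
    return pts

def auc(pts):
    """Trapezoidal area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Perfectly separated scores give an area of 1, as with the final network.
pts = roc_points([0.9, 0.8, 0.7, 0.2, 0.1], [1, 1, 1, 0, 0])
print(auc(pts))  # 1.0
```

An area of 1 means some threshold classifies every account correctly; an area near 0.5 means the network does no better than chance, which is why networks with an area below 0.6 were discarded.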
All 17,000 performance graphs are available on the CD supplied in the appendix.
For a network to be considered as the final network, the difference between the areas under its two ROC graphs should be minimal. The performance is quantified in the following manner:
1. Establish the maximum difference between the areas among the networks that have not been discarded. This is taken as 100% difference.
2. Every network is then classified as a percentage of this maximum difference.
3. The networks with the lowest percentage area difference are considered.
4. The network with the largest area is put forward as the final network.
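The quantification steps above can be sketched as follows (in Python; the candidate names and areas are made up for illustration and are not results from the project):

```python
# Illustrative sketch of the selection procedure: each candidate network is
# (name, area under ROC1, area under ROC2); names and figures are invented.
candidates = [
    ("A", 0.9994, 0.9990),
    ("B", 1.0000, 1.0000),
    ("C", 0.9800, 0.9300),
]

# 1. The maximum area difference among surviving networks counts as 100%.
max_diff = max(abs(a1 - a2) for _, a1, a2 in candidates)

# 2. Classify every network as a percentage of that maximum difference.
scored = [(name, a1, abs(a1 - a2) / max_diff * 100) for name, a1, a2 in candidates]

# 3. Consider only the networks with the lowest percentage difference.
lowest = min(pct for _, _, pct in scored)
shortlist = [(name, a1) for name, a1, pct in scored if pct == lowest]

# 4. The network with the largest area is put forward as the final network.
final = max(shortlist, key=lambda na: na[1])[0]
print(final)  # "B"
```

This favours networks that are both accurate and consistent across the two data sets, rather than accurate on one set only.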
To find the final neural network, I could simply take the network with the
highest area under the ROC curve and the smallest difference between the two sets of
data; however, to be more thorough, a more in-depth analysis of each of the groupings of
nodes will take place. The final network will be the best performing
network out of all of the networks generated, together with a description of why it has
been chosen.
The number of training failures is also taken into account for each different size of network (the
number of hidden nodes). The more failures when training, the more likely we are
not to get a satisfactorily trained network using that particular network architecture.
10.2 5 Hidden Nodes
Figure 43 MATLAB depiction of a 2 Layer network with 5 nodes in the hidden layer
For networks with a hidden layer of 5 nodes, the performance classification after training, based on the area under the ROC chart, was as follows.
Table 6

| Difference Percentage | Networks in this group | Percentage of fully trained networks |
| --- | --- | --- |
| 0-10 | 266 | 95% |
| 10-20 | 10 | 4% |
| 20-30 | 3 | 1% |
| 30-40 | 0 | 0% |
| 40-50 | 0 | 0% |
| 50+ | 1 | 0% |

432 networks in total, with 280 fully trained networks.
280/432 * 100 = 64.81% successfully trained networks
Network proposed as final network from this group:
Internal Name: NN5_T4v2_LR0.4_EPOCHS3500
This network was trained using a learning rate of 0.4 with 3500 training epochs.
The area under each of the ROC graphs
Area under ROC1: 0.9994
Area under ROC2: 0.9994
The performance of these networks is good; nearly 65% were successfully trained. The
highest performing network could classify 100% of the data patterns presented to it,
although better performing networks were present in other groups. As can be seen in table 6, a lot of
the networks had very similar performance, and the areas under the ROC curves did
not differ much for the majority of the networks.
10.3 6 Hidden Nodes
Figure 44 MATLAB depiction of a 2 Layer network with 6 nodes in the hidden layer
For networks with a hidden layer of 6 nodes, the performance classification after training, based on the area under the ROC chart, was as follows.
Table 7

| Difference Percentage | Networks in this group | Percentage of fully trained networks |
| --- | --- | --- |
| 0-10 | 183 | 73% |
| 10-20 | 7 | 3% |
| 20-30 | 37 | 15% |
| 30-40 | 8 | 3% |
| 40-50 | 5 | 2% |
| 50+ | 11 | 4% |

432 networks in total, with 251 fully trained networks.
251/432 * 100 = 58.10% successfully trained networks
Network proposed as final network from this group:
Internal Name: NN6_T5v2_LR0.5_EPOCHS3000
This network was trained using a learning rate of 0.5 with 3000 training epochs.
Area under ROC1: 0.9993
Area under ROC2: 0.9993
The performance of these networks is good; nearly 58% were
successfully trained. The highest performing network could classify 100% of the data
patterns presented to it, although better performing networks were present in other groups. As can be
seen in table 7, a lot of the networks had very similar performance and the areas under
the ROC curves did not differ much for the majority of the networks. However, more
networks in this node group differed from each other than in any of the other network
groups, suggesting that the training performance for these networks was erratic or that
some networks had been overtrained.
10.4 7 Hidden Nodes
Figure 45 MATLAB depiction of a 2 Layer network with 7 nodes in the hidden layer
For networks with a hidden layer of 7 nodes, the performance classification after training, based on the area under the ROC chart, was as follows.
Table 8

| Difference Percentage | Networks in this group | Percentage of fully trained networks |
| --- | --- | --- |
| 0-10 | 187 | 81% |
| 10-20 | 33 | 14% |
| 20-30 | 4 | 2% |
| 30-40 | 4 | 2% |
| 40-50 | 1 | 0% |
| 50+ | 3 | 1% |

432 networks in total, with 232 fully trained networks.
232/432 * 100 = 53.70% successfully trained networks
Network proposed as final network from this group:
Internal Name: NN7_T6v4_LR0.6_EPOCHS3500
Area under ROC1= 1
Area under ROC2= 1
The highest performing network could classify 100% of the data patterns presented to
it.
The performance of these networks is good; 53% were successfully
trained, with 81% of the networks' performance deviating little between the two datasets.
As can be seen in table 8, a lot of the networks had very similar performance and the
areas under the ROC curves did not differ much for the majority of the networks.
However, a fair few of the networks' performances differed quite a bit, again
suggesting that some training anomalies arose and that some networks were perhaps overtrained.
10.5 8 Hidden Nodes
Figure 46 MATLAB depiction of a 2 Layer network with 8 nodes in the hidden layer
For networks with a hidden layer of 8 nodes, the performance classification after training, based on the area under the ROC chart, was as follows.
Table 9

| Difference Percentage | Networks in this group | Percentage of fully trained networks |
| --- | --- | --- |
| 0-10 | 238 | 95.58% |
| 10-20 | 7 | 2.81% |
| 20-30 | 1 | 0.40% |
| 30-40 | 0 | 0.00% |
| 40-50 | 0 | 0.00% |
| 50+ | 3 | 1.20% |

432 networks in total, with 249 fully trained networks.
249/432 * 100 = 57.64% successfully trained networks
Network proposed as final network:
Internal Name NN8MOMENTUM_T5v1_LR0.5_EPOCHS1500
Area under ROC1 = 0.9994
Area under ROC2 = 0.9994
The performance of these networks is good; nearly 58% were
successfully trained. The highest performing network could classify 100% of the data
patterns presented to it, although better performing networks were present in other groups. As can be
seen in table 9, a lot of the networks had very similar performance and the areas under
the ROC curves did not differ much for the majority of the networks. The network
trained using gradient descent with momentum performed the best this time and, as can
be seen, needed only 1500 epochs to be trained; compare this with the networks trained
using normal gradient descent, which needed in the range of
2500–3500 training epochs to reach good performance.
10.6 9 Hidden Nodes
Figure 47 MATLAB depiction of a 2 Layer network with 9 nodes in the hidden layer
For networks with a hidden layer of 9 nodes, the performance classification after training, based on the area under the ROC chart, was as follows.
Table 10

| Difference Percentage | Networks in this group | Percentage of fully trained networks |
| --- | --- | --- |
| 0-10 | 213 | 94% |
| 10-20 | 6 | 3% |
| 20-30 | 3 | 1% |
| 30-40 | 0 | 0% |
| 40-50 | 0 | 0% |
| 50+ | 4 | 2% |

432 networks in total, with 226 fully trained networks.
226/432 * 100 = 52.31% successfully trained networks
Network proposed as final network:
NN9_T6v2_LR0.6_EPOCHS2500
Area under ROC1 = 0.9995
Area under ROC2 = 0.9995
The performance of these networks is good; just over 52% were
successfully trained. The highest performing network could classify 100% of the data
patterns presented to it when given a suitable threshold, although better performing networks
were present in other groups. As can be seen in table 10, a lot of the networks had very similar
performance and the areas under the ROC curves did not differ much for the majority
of the networks. The best network in this group was trained using normal gradient
descent with a learning rate of 0.6 for 2500 epochs.
10.7 10 Hidden Nodes
Figure 48 MATLAB depiction of a 2 Layer network with 10 nodes in the hidden layer
For networks with a hidden layer of 10 nodes, the performance classification after training, based on the area under the ROC chart, was as follows.
Table 11 Results for the 10 node network

| Difference Percentage | Networks in this group | Percentage of fully trained networks |
| --- | --- | --- |
| 0-10 | 205 | 95.35% |
| 10-20 | 6 | 2.79% |
| 20-30 | 0 | 0.00% |
| 30-40 | 1 | 0.47% |
| 40-50 | 0 | 0.00% |
| 50+ | 3 | 1.40% |

432 networks in total, with 215 fully trained networks.
215/432 * 100 = 49.77% successfully trained networks
Network proposed as final network.
NN10MOMENTUM_T1v1_LR0.1_EPOCHS3500
Area under ROC1 = 0.9994
Area under ROC2 = 0.9994
Around 95% of the 215 networks that were successfully trained varied little in
comparison to the rest of the network groups. The best performing network
was trained using gradient descent with momentum, but still needed a lot of training
(3500 epochs); this is contrary to what I initially expected, namely that networks
trained with gradient descent and momentum would require significantly less training
to reach the same performance levels as networks trained on gradient descent alone.
10.8 The Final Network
The final network, the network that I propose to use for the system, has the following
properties:
Name: NN7_T6v4_LR0.6_EPOCHS3500
Inputs: 14
Hidden Neurons: 7
Output Nodes: 1
Learning rate: 0.6
Training data set: 4
Epochs: 3500
Training Algorithm: Gradient Descent
117
![Page 118: Contents report 3.doc · Web viewThe more failures when training means we are more likely not to get a satisfactorily trained network using that particular network architecture. 10.2](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff49c13aa0d0279ba44efb0/html5/thumbnails/118.jpg)
The reasons for choosing this network are as follows:
Area under ROC1= 1
Area under ROC2= 1
Figure 49 ROC Chart for the best performing network
Figure 49 shows that no misclassification took place; this was the
best performing network when considering the area under both ROC curves.
Figure 50 Output from the training data.
It is clear from figure 50 why the areas under the ROC charts were 1: there
is a clear region in which none of the customer accounts can be confused
between the two classes, with very little error between the desired output and
the actual output of the network.
Figure 51 Performance of the final network while training
The overall MSE (mean squared error) for this network after training is not the
smallest of any of the neural networks. This implies that although less error could
have been achieved during training, the neural networks with a smaller
MSE might have been overtrained, becoming too specific and unable to
cope thoroughly with new unseen data. However, the training session for this network
was very good: after each iteration the network came closer and closer to converging on
a solution. The rate of change of the error was extremely steep in the first 100
training epochs, after which the rate of training slowed but appeared
constant, indicating that the error was being reduced steadily after each training
session.
Figure 52 Output from the validation data
Again, it is clear from figure 52 why the areas under the ROC charts were 1: it can be
seen that when presented with the validation data there is a clear region in which none
of the customer accounts can be confused between the two classes. The
actual classification of all the clear customers' patterns is very close to the desired
output, and even the classification of all the fraudulent accounts is very close to the
expected output of the network. Both of these cases reinforce the conclusion that this neural
network is the best one available.
10.8.1 The weights from the input layer to the hidden nodes
The weights attached to each link are shown below, rounded to two decimal places.
These would be the weights used if I were to propose this system to a company whose real
data was similar to my generated data.

Input weights (one row per hidden node, one column per input):

| Hidden node | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.72 | -0.27 | 0.06 | 0.65 | 0.75 | 0.01 | -0.18 | 0.36 | -0.43 | 0.51 | -0.08 | -0.46 | 0.19 | -0.46 |
| 2 | -0.28 | -0.04 | 0.30 | -0.13 | -0.22 | 0.22 | 0.32 | 0.15 | -0.59 | -0.82 | 0.61 | 0.37 | 0.60 | 0.28 |
| 3 | -0.52 | -0.64 | 0.36 | -0.23 | 0.79 | -1.47 | 0.61 | -0.78 | -0.41 | 0.04 | 0.96 | -1.25 | -0.26 | -0.75 |
| 4 | 0.56 | 0.30 | 0.36 | -0.34 | 0.95 | -0.05 | -0.17 | -0.88 | -0.33 | -1.37 | 0.81 | 0.33 | 0.16 | 0.87 |
| 5 | -0.25 | 0.63 | 0.52 | -1.09 | -0.24 | -0.28 | 0.21 | 2.14 | -0.11 | 1.91 | -0.19 | -2.65 | 0.31 | -0.25 |
| 6 | 0.64 | 0.27 | -0.31 | -0.58 | 0.45 | -0.26 | 0.61 | -0.14 | 0.38 | -0.27 | 0.34 | -0.59 | -0.60 | -0.37 |
| 7 | 0.39 | -0.43 | 0.58 | 0.39 | -0.61 | 0.28 | -0.73 | -0.32 | -0.47 | 0.69 | -0.39 | -0.14 | 0.11 | 0.00 |
10.8.2 The weights to the Output Layer
Weights from the hidden nodes to the output node:

| Hidden node | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Output node | 0.87 | 0.02 | 2.49 | 1.94 | -4.00 | 0.28 | -0.66 |
10.8.3 The Threshold
The reason why this network was chosen over the rest is simply that it was trained
well enough to correctly classify 100% of the supplied patterns for both the
training and validation data and the totally unseen data.
A threshold value of 0.2 allows for 100% classification of fraudulent and non-
fraudulent data, while giving leeway for a degree of uncertainty when
deciding whether an account is fraudulent. Because this threshold is near the middle of the
output range, all future customer accounts should be correctly classified even if they
vary somewhat more than anticipated from the expected customer models.
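As an illustration of how these weights and the threshold would be applied, the following is a minimal sketch (in Python rather than MATLAB). The activation functions and bias terms were not reported alongside the tables, so tanh activations and zero biases are assumed here, and the input profile is purely illustrative:

```python
# Illustrative sketch: forward pass through the final network's reported
# weights, then thresholding. ASSUMPTIONS: tanh activations, zero biases
# (neither was stated with the weight tables), made-up input values.
import math

W_HIDDEN = [  # 7 hidden nodes x 14 inputs, from section 10.8.1
    [0.72, -0.27, 0.06, 0.65, 0.75, 0.01, -0.18, 0.36, -0.43, 0.51, -0.08, -0.46, 0.19, -0.46],
    [-0.28, -0.04, 0.30, -0.13, -0.22, 0.22, 0.32, 0.15, -0.59, -0.82, 0.61, 0.37, 0.60, 0.28],
    [-0.52, -0.64, 0.36, -0.23, 0.79, -1.47, 0.61, -0.78, -0.41, 0.04, 0.96, -1.25, -0.26, -0.75],
    [0.56, 0.30, 0.36, -0.34, 0.95, -0.05, -0.17, -0.88, -0.33, -1.37, 0.81, 0.33, 0.16, 0.87],
    [-0.25, 0.63, 0.52, -1.09, -0.24, -0.28, 0.21, 2.14, -0.11, 1.91, -0.19, -2.65, 0.31, -0.25],
    [0.64, 0.27, -0.31, -0.58, 0.45, -0.26, 0.61, -0.14, 0.38, -0.27, 0.34, -0.59, -0.60, -0.37],
    [0.39, -0.43, 0.58, 0.39, -0.61, 0.28, -0.73, -0.32, -0.47, 0.69, -0.39, -0.14, 0.11, 0.00],
]
W_OUTPUT = [0.87, 0.02, 2.49, 1.94, -4.00, 0.28, -0.66]  # section 10.8.2
THRESHOLD = 0.2

def classify(profile):
    """Forward pass; at or above the threshold → clear, below → fraudulent."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, profile))) for row in W_HIDDEN]
    score = math.tanh(sum(w * h for w, h in zip(W_OUTPUT, hidden)))
    return score, ("clear" if score >= THRESHOLD else "fraudulent")

score, label = classify([0.5] * 14)  # hypothetical, already-normalised inputs
print(score, label)
```

In deployment the 14 inputs would be the normalised customer profile attributes extracted from the call detail records, and the single thresholded output would drive the fraud alert.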
10.8.4 Proposed Training Regime
Part of the reason why neural networks are attractive is that they can easily be
retrained. Every month (or whatever period the telecomm company decides), the phone
company could retrain the network with a fresh selection of all the customer data, this
time including the frauds that had been found in the previous month. The
retraining would allow new trends to be picked up by the neural network with
minimal fuss, as the process of gathering the training data would be the same each
time the network needs to be retrained.
If the network needed to be retrained, the results suggest the best
method to use is normal gradient descent with a learning rate of 0.6. This seemed to
give the best trade-off between fast convergence on a solution (minimising the time
taken to reach a minimum) and the risk of the network failing to train because
the learning rate was too high and gradient descent could not find the minimum in the error
function.
11. Evaluation
This is an evaluation of the success of the project. The overall aim of the project was
to generate a solution that would be viable in the telecommunications industry; the
evaluation will therefore only cover the success of the neural network.
11.1 Does it work?
Broadly speaking, the project was in my view a complete success. The top performing
networks could classify close to 100% of the input patterns supplied to them; compare
this with the performance of Michiaki Taniguchi's neural network [37], which achieved
rates of 85% correct detection. However, I am the first to admit that this success
might be sullied by the fact that the models created could be considered too
simplistic, and may not truly mimic how real customers behave. This was my first
ever attempt at a research-based software engineering project, which brought several
other firsts: I had to learn about creating models of real-world situations,
understand statistics that I had never used before, and learn about neural networks.
I accept that the current solution may not work directly for a telecommunications company,
because the neural network was trained on data that had to be generated using a specially
developed tool; but the techniques developed here, with a little tweaking and
further evaluation, could be used to great effect in the telecommunications industry.
The main problem when judging which network performed the best was that too
many of the networks that I created performed exceptionally well on both tests. This
led to difficulty in deciding which network should be the final network.
11.2 Is it Real-time?
The short answer is yes, it can operate in real time. The longer answer is that the network
itself can operate in real time and provide a real-time response once presented with
an input. It is the data acquisition which is the slow part, and this would bias
the classification of whether this project can operate in real time towards 'no, it is
not real time'. This can be mitigated by the ambiguity in what real-time
actually means: as defined near the beginning of this
project, soft real-time is the definition most applicable to this type of project.
The overall speed of the neural network is extremely quick; the computer used to test
the networks can classify some 5000 two-week customer profiles in roughly 3
seconds.
11.3 Which Training Method was Most Appropriate
During the analysis of the results, I found that simple gradient descent was more
effective than gradient descent with momentum. The results
indicated that a very large proportion of the networks that did not train correctly used
gradient descent with momentum as their training function. This is contrary to my
initial understanding and will need more testing by adjusting the momentum
coefficient.
11.4 Other Points About the Neural Network
One additional thing I noticed when developing the neural networks was that as the
number of hidden nodes increased, the number of unsuccessfully trained networks also
increased. A network with a ROC area of less than 0.6 could not
classify any of the fraudulent customer call patterns, and such networks were
disregarded. I would have expected more training to be needed each time
extra hidden nodes were added to the network, with performance degrading
if training was not increased (which it was not). However, the
results suggest that the reason these networks did not train was that the
random numbers assigned to each weight before training started were initialised
poorly, or were somehow insufficient. I am not sure why this happened,
and further study is required to find the root cause of this problem.
11.5 The CDR Generation Tool
The CDR tool was a success for this project. It allowed me to generate millions of
call details for thousands of customers, without which I would not have been able to
test whether a neural network was a worthwhile solution.
The CDR tool could generate over 1000 customers' details in less than 20 minutes, with
each customer having hundreds of calls.
The models were not the most accurate, but could be developed further if more
research into customer calling patterns takes place. More accurate models would
produce a neural network that can more accurately model customer calling patterns.
12. Project Management
The project required a great deal of time management. I am fairly happy
with the way in which I worked and the manner in which I followed the time plan I
gave initially and the amended time plan that was handed in at Christmas. Several
problems arose, including the timing of coursework for other modules and the
unexpected duration of the analysis of the results. What follows are the three time
plans:
1. The predicted time plan at the start of the project
2. The amended time plan at Christmas
3. The final time plan of all the work that has been carried out.
As can be seen, I managed to stick fairly well to the time plan; however, I ended
up increasing the time for development and reducing the time for testing. The time
spent generating the CDRs and analysing the call data was also extended and shifted,
since at the start of the project I did not consider that the development stage and the
call generation stage were dependent on each other and really should have been
specified as the same task.
To preserve the continuity of the report, the time plans are provided in the appendix.
13. Conclusion
This project has shown, to an extent, that fraud detection using call pattern analysis with neural networks can work. It has, however, been limited by the lack of real-world data: even though a competent call generation tool can be created, I feel it will never be a substitute for real data from a real telecommunications company, which would provide real results from the neural network and prove how effective a neural network can be at detecting fraud. Especially as the methods fraudsters use are constantly evolving, a method of detection that can evolve with their changing trends (both via generalisation and retraining) will no doubt be a useful tool in any fraud analyst's belt.
This project further shows that a neural network is not the only solution needed to detect fraud. First, a company-wide fraud strategy needs to be in place, with a company-wide sharing of resources. For instance, neural networks may not be the optimal method of detecting when people are using false information; moreover, there is little point in having a fraudulent call detection system in place if the subscription process keeps allowing the same fraudster back onto the network.
13.1 How I handled the project
This section is a critical appraisal of where I feel some of the flaws in this project were located. I am doing this because it will allow me to identify in the future where potential problems may arise. The successes of the project are briefly discussed in the next section (13.2).
Starting from the training in fraud detection in the financial market that I already possessed, I have had to transfer that knowledge to a totally different domain of fraud detection. This meant learning more about fraud in general, about the telecommunication industry, and specifically about fraud in the telecommunication industry.
I feel that the research stage of this project went well, although I found it difficult to present only information that was pertinent to the project without going off on too much of a tangent.
The distinction between calls being either High Risk or Low Risk was a piece of the system that I liked, as it allowed me to vastly reduce the number of inputs to the neural network; this would have had the effect of allowing the network to train faster and generalise better. I would have liked to have run some principal component analysis on the data in MATLAB, as this might have highlighted more areas where I could have combined data inputs while still preserving their meaning. Unfortunately time was not on my side, so I was never able to include this in the project, as I would have had to analyse the performance of both methods of combining the data and compare and contrast their benefits.
I feel that the need to develop a call generation tool detracted focus from the neural network work, effectively making this the size of two projects; as a result, some of the information I would have liked to include (such as the mathematical functioning of the neural networks) had to be left out to keep the report within reasonable limits. That said, even after shortening it, the report still seems a little too long. Neither of the project design stages could be documented completely in this report, as there would simply be too much information for the reader to process, so I opted for a prose description of the important algorithms pertinent to the project. Low-resolution flow charts were provided (charts that document the whole system operation rather than individual functions) to give insight into how all the separate parts of the project fit together.
So many neural networks were created, many of them with very similar performance, that I had to devise a performance metric that would let me classify the results in a way that justified my final choice of network. I am pleased with the results of this metric, although I am not sure whether it is an original piece of work (I have found no mention of it anywhere) or whether it would stand up to rigorous statistical scrutiny. I suspect it would not, and with more time I would have liked to find better ways to measure the relative
performance of the neural networks. I therefore feel this is another area where my learning and implementation lacked clarity and drive.
The neural networks were generated twice. I have already mentioned that I thought the models were possibly slightly weak (section 11.1); this was the second time that I had created a set of models for the data. The first time there were only seven models, and the neural networks had no trouble at all classifying them (they are the first seven models shown in the appendix). The second time I created several more models and changed the CDR Generator; the neural networks had a harder time classifying the results correctly, which brought their performance more into line with that of a network that might be deployed in a real-world situation.
A revision of the code used to create the customer call records occurred during the lifetime of the project; this was to add attributes to the call records that allow them to mimic customers more effectively. Even though this happened in the middle of the development cycle, the framework I already had in place meant it did not affect the overall timing of the project, although the development time was extended.
The revision to the code took the form of:
1. Adding the probability of calls happening on a given day. (This was left out until I realised that calls in real life are not spread evenly over the week; rather, certain days are more likely to have calls than others.)
2. Including the available phone numbers that a customer can ring. (Although not used by the neural networks, this added extra realism to the data generated.)
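The first of these revisions, weighting call generation towards certain days of the week, can be sketched as follows. This is an illustrative Python version (the original tool was built in Microsoft Access), and the day weights shown are invented for the example:

```python
import random

# Hypothetical per-day call probabilities: weekdays busier than the weekend.
# These weights are invented for illustration; they sum to 1.0.
DAY_WEIGHTS = {
    "Mon": 0.16, "Tue": 0.16, "Wed": 0.16, "Thu": 0.16,
    "Fri": 0.18, "Sat": 0.10, "Sun": 0.08,
}

def pick_call_day(rng: random.Random) -> str:
    """Pick a day for a generated call, biased by DAY_WEIGHTS."""
    days = list(DAY_WEIGHTS)
    weights = list(DAY_WEIGHTS.values())
    return rng.choices(days, weights=weights, k=1)[0]

rng = random.Random(42)
sample = [pick_call_day(rng) for _ in range(10000)]
# Fridays should come up noticeably more often than Sundays.
print(sample.count("Fri") > sample.count("Sun"))
```

The same biased-sampling idea applies to any CDR attribute (call duration, destination type) that should not be uniformly distributed.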
The time plans were altered accordingly and can be seen in appendix 16.6.3. I don't think this slight mishap could have been avoided; as in many software engineering projects using the linear sequential model, you have to go back a few stages to fix the problem (in this case it was a simple matter of reanalysing the inputs to the neural network and an easy fix to the design of the CDR tool).
13.2 What have I learnt?
I have learnt the fundamentals of neural networks. Obviously I have only scratched the surface of neural network theory and design, but I feel that the knowledge I have gained is the first step on a long road to understanding neural networks and their applications, not only in the fraud detection industry but elsewhere as well.
I have also learnt that creating neural networks is not simply a process of training a network and then putting it to use. Rather, a proper plan of how you intend to test the possible range of neural networks needs to be in place before testing even begins. This includes sufficient analysis of the results the networks produce, so that the network chosen for a proposed system is the most efficient one we can develop.
I have learnt about prototyping systems such as MATLAB and how they can be used to model potentially commercially viable systems. Having started with absolutely no experience of MATLAB, I feel I can now quite confidently use it to model any future systems I come to develop in my career.
This was a good exercise and a first step into model generation. Models are useful for predicting whether a certain theory can be shown to be correct (as in this project): you can test the theory against the models before moving on to real-world data. This is also a drawback, because a model of a population can never have the same significance as real data; it follows that something which works for a model may not work for the real thing.
I have shown that Microsoft Access can be used to great effect for customer model generation. Granted, some of the models may have been overly simplistic, but if the tool were used in a live environment a higher degree of refinement would take place. I found that MS Access would start to struggle when more than 300 MB of data was stored in the database; given the choice, I would therefore use MS Access as the front-end tool to a more robust RDBMS.
A model generator such as the one developed for this project would be useful for start-up communications companies. New companies are more prone to fraud, as fraudsters may (sometimes correctly) assume that a new company has neither the resources of a more established company to detect fraud nor the experience to know what fraudulent customer activity may look like.
Overall, this project has taught me more about time management and the software engineering process than any other project or coursework throughout my time at university, since the workload it required has been astoundingly large. The next stage is to take the techniques used in creating the network and apply them to real data from the telecommunication industry; only then can I state fully whether a neural network solution to fraud detection is viable.
14. Further Work
I have identified several key areas in which this project could be extended. Some are features I would have liked to include had I had the time, while others are extensions that may enable this project to be used as the basis of further work.
The first expansion on this work would be to use real customer data gathered from real telecom companies. This would enable me, or anyone following on from this project, to finally prove that neural networks are suitable for fraud detection.
A system that can monitor change in behaviour, in addition to what the current implementation achieves, would be an extremely useful tool. Fraudsters may try to hide their fraudulent acts by impersonating a genuine customer for their first few billing periods; then, once their accounts have reached a certain maturity, they activate their major fraud operation. Such a system should be able to alert the fraud management team to suspicious call activity that is out of the norm for the customer.
To account for behaviour change of this kind, an enhancement would require altering how the CDR generator handles time. I have had several thoughts on how this could be implemented, based on the following two ideas.
1. Use the current neural network, but have the data access tools aggregate the data over periods ranging from two months down to two weeks. This essentially reduces the granularity of the analysis; if there is a substantial shift towards fraud in the smaller time windows relative to the others, it may be that someone has started a fraud operation.
2. Use the same principle as above, but this time, when looking at two time periods in sequential order, make the input to the network analysing the second time period
include the fraud score that the previous run produced; thus, if the weighting towards fraud was higher on a previous data segment, the next run through the neural network takes this into account.
These are just theories that would need to be fleshed out further before embarking on a research project.
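As a rough illustration of the first idea, the sketch below compares the network's fraud score over the most recent, smaller time windows against a longer baseline. It is a Python sketch with invented scores and an invented threshold, not part of the actual system:

```python
def change_towards_fraud(scores, recent=2, threshold=0.25):
    """Flag a possible behaviour change if the mean fraud score of the
    last `recent` windows exceeds the baseline mean by `threshold`.

    `scores` is a list of per-window fraud scores (0..1) produced by the
    network; window count and threshold are invented for illustration.
    """
    if len(scores) <= recent:
        return False
    baseline = scores[:-recent]
    head = scores[-recent:]
    baseline_mean = sum(baseline) / len(baseline)
    recent_mean = sum(head) / len(head)
    return recent_mean - baseline_mean > threshold

# A customer who looks normal for months, then suddenly scores high:
history = [0.05, 0.08, 0.06, 0.07, 0.55, 0.60]
print(change_towards_fraud(history))  # the last two windows jump sharply
```

The second idea would instead feed `baseline` scores back in as an extra network input rather than comparing means outside the network.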
The system could also have been extended by including some method of establishing whether the customer is constantly calling hotspots (known high-risk phone numbers). For instance, calling particular foreign PRS lines would be considered suspicious, and this is not currently handled by either the CDR generation tool or the neural network model.
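A minimal sketch of such a hotspot check, as it might feed an extra input to the network; the hotspot numbers here are invented placeholders, and Python is used purely for illustration:

```python
# Hypothetical list of known high-risk destinations (e.g. foreign PRS lines).
HOTSPOTS = {"+880900123456", "+672900777888"}

def hotspot_ratio(dialled_numbers):
    """Fraction of a customer's calls that went to known hotspot numbers."""
    if not dialled_numbers:
        return 0.0
    hits = sum(1 for n in dialled_numbers if n in HOTSPOTS)
    return hits / len(dialled_numbers)

calls = ["+441512345678", "+880900123456", "+880900123456", "+441517654321"]
print(hotspot_ratio(calls))  # 0.5 -> could feed the network as an extra input
```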
Establishing the time between telephone calls would also be a good enhancement, as it could be used to establish with a higher degree of certainty whether a customer is using an automated dialling tool.
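The inter-call timing check might look something like the following Python sketch; the five-second cut-off is an invented example threshold:

```python
from statistics import median

def looks_automated(call_times, min_gap_seconds=5.0):
    """Guess whether call start times (seconds, sorted ascending) suggest
    an automated dialler: a tiny, regular gap between successive calls."""
    if len(call_times) < 3:
        return False
    gaps = [b - a for a, b in zip(call_times, call_times[1:])]
    return median(gaps) < min_gap_seconds

# Calls fired every two seconds look machine-driven:
print(looks_automated([0, 2, 4, 6, 8]))        # True
# Calls spread over hours do not:
print(looks_automated([0, 1800, 7200, 9000]))  # False
```

The median is used rather than the mean so that a single long pause does not mask an otherwise machine-like burst of calls.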
Knowing how long a customer has been with the company, and using that as an input to the neural network, would also be a good indicator of fraud. A customer of five years is unlikely to be fraudulent, so when analysing their call patterns you would allow some leeway if their recent call pattern has changed or exhibited fraudulent traits. However, a customer could be about to move house, use the phone as much as they can, and then never pay the bill (which is fraud), so obviously further research into this is required.
Furthermore, this project can be considered the starting point for a full-blown FMS. The techniques used here could be implemented in a system that automatically generates neural networks based on supplied call information, trains on that information, and is then used in a live environment. The FMS could then be linked to other systems inside the business, such as Billing and Subscription, and all of these departments could benefit from such a tool.
Although detection of fraud in fixed-line communication has been the focus of this project, other important areas of the telecommunication industry also need fraud
detection, in particular IP fraud, where hackers know how to obtain free Internet-based services from the telephone company. Using pattern recognition methods similar to those used in this project, I think the work could be extended to detect when a customer's Internet service usage is suspicious.
16. Appendices
16.1 Bibliography
Experiences in Mobile Phone Fraud: Jukka Hynninen; http://www.niksula.cs.hut.fi/~jthynnin/mobfra.html
This paper provides information concerning fraud with mobile phones in particular.

Principles of Data Mining: Hand, Mannila & Smyth
Discusses some of the considerations when trying to extract data for statistical processing from a database.

A Basic Course in Statistics: Clarke & Cooke
This book supplied information and algorithms concerning statistical properties such as probability, standard deviations and normal distributions.

Introducing Statistics: Upton & Cook
This book likewise supplied information and algorithms concerning statistical properties such as probability, standard deviations and normal distributions.

An Introduction to Neural Networks: Kevin Gurney
This book was my main reference on neural networks; I recommend it to anyone unfamiliar with neural networks who would like further information about them.

Pattern Recognition Using Neural Networks; Theory and Algorithms for Engineers and Scientists: Carl G. Looney
This book was one of my secondary references on neural networks, used when I was unsure about a particular point or my main reference was not clear enough.

Fundamentals of Neural Networks; Architecture, Algorithms and Applications: Laurene Fausett
This book was the second of my secondary references on neural networks, used in the same way.

Applications of Neural Networks to Telecommunication Systems: RJ Frank et al.
Contains ideas about methods of detecting fraud using neural networks, as well as some other uses of neural networks in the telecom industry.

Activity Monitoring: Noticing Interesting Changes in Behaviour: Tom Fawcett & Foster Provost
Contains information I would use if I were to extend this project into monitoring for changes in behaviour.

Detecting Fraud in the Real World: Michael H. Cahill et al.
Contains a great deal of information about different fraud detection methods in relation to credit cards, telecoms, and fraud in general.
16.2 Program Listing
The program listings have not been supplied in this report, so that its size is kept to a minimum; instead they have been included on a CD stored in the appendices.
There are two sections to the program listing, the MATLAB section and the Microsoft Access 2000 section:
The MATLAB section contains the workspaces that were used throughout development, together with the code created for the automated neural network creation and testing tools. Each of the MATLAB files is fully commented and is an exact digital copy of what would normally be present in this section. A brief overview of what each function achieves is included in this section.
The Microsoft Access 2000 section contains the MDB that is the actual application created to make all of the customer calls. The code listing on its own would be useless without the necessary information about the Windows forms used in this project. However, the documentation tool that Microsoft Access provides outputs more information than is needed, and even the basic information that would give an overview of the system would run to approximately 30 pages. Beyond the code listings there are the queries that were used throughout the project to aggregate the data; many of these queries are extremely long and so deeply nested that a simple printout of the code gives no indication of what is happening in the system.
Instead, screenshots have been provided with an explanation of the parameters used in model creation.
Overall, printing the listings would have extended the report by more than 60 pages, none of which would provide any significant advantage over supplying the code listings on a CD.
16.2.1 MATLAB Overview
This section details the code and the performance information related to the
MATLAB section of this project.
16.2.1.1 How to read the performance information off the CD
Each of the performance graphs is stored on the CD under the directory Matlab. To make the graphs easier to read, each network with a different number of hidden nodes stores its graphs in the directory nnx, where x signifies the number of hidden nodes.
Each neural network that was created has a name. This name is used in each of the file names, with the type of graph appended to the end of the file name.
The name contains information about the network parameters. For example:
NN5_T1v1_LR0.1_EPOCHS1000
This network has 5 hidden neurons (NN5), was trained with plain gradient descent (no training qualifier in the name), carries versioning information (T1), uses training data combination 1 (v1), a learning rate of 0.1 (LR0.1), and 1000 training epochs (EPOCHS1000).
NN5MOMENTUM_T3v1_LR0.3_EPOCHS1000
Networks trained using gradient descent with momentum are identified by the MOMENTUM qualifier in the name.
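For reference, the naming scheme above is regular enough to parse mechanically. The Python sketch below is my own illustration (the field names are labels I have chosen for the parts described above):

```python
import re

# Pattern for names such as NN5_T1v1_LR0.1_EPOCHS1000 or
# NN5MOMENTUM_T3v1_LR0.3_EPOCHS1000.
NAME_RE = re.compile(
    r"^NN(?P<hidden>\d+)(?P<momentum>MOMENTUM)?"
    r"_T(?P<version>\d+)v(?P<data_combo>\d+)"
    r"_LR(?P<lr>[\d.]+)_EPOCHS(?P<epochs>\d+)$"
)

def parse_network_name(name):
    """Split a network file name into its parameter fields."""
    m = NAME_RE.match(name)
    if not m:
        raise ValueError(f"unrecognised network name: {name}")
    return {
        "hidden_nodes": int(m.group("hidden")),
        "momentum": m.group("momentum") is not None,
        "version": int(m.group("version")),
        "data_combination": int(m.group("data_combo")),
        "learning_rate": float(m.group("lr")),
        "epochs": int(m.group("epochs")),
    }

print(parse_network_name("NN5_T1v1_LR0.1_EPOCHS1000"))
```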
Each network that was created has at least four graphs:
Normal performance graph (takes the form of network name.png)
ROC chart for the first performance test (network name_ROC.png)
Output of the network after the test stage (network name-Testoutput.png)
Output of the network after validation (network name-Validationoutput.png)
16.2.1.2 Function Descriptions
A brief description of the functions used to generate and test the neural networks is provided below.
AreaROC: Returns the area, the sensitivity and the 1 − specificity of the neural network. The output is NOT stored to disk.

countPercentageGroup: Counts the number of neural networks whose error is (100 − x)% smaller than the largest error (the largest difference between networks). For instance, to find all the networks whose ROC area differs from the best by only 10% of the largest error, their error must be 90% smaller than the largest error.

getData: Retrieves all the data needed for the inputs of the neural networks. All data aggregation is performed in the MS Access queries, not in this function.

getDistance: Gets the difference between the areas of the ROC charts of two neural networks.

getOutputData: Gets the desired outputs from the database. These are the outputs that the network compares itself against during training.

makeNNMom: Creates, trains and outputs all the performance information for the neural networks trained using gradient descent with momentum.

makeNNs.m: Creates, trains and outputs all the performance information for the neural networks trained using gradient descent.

normaliseData: Computes the parameters needed to normalise the data so that the inputs fall in the range −1 to 1.

plotErrors: Creates the graphs used to visualise how the neural network output compares to the expected output. Stores the graphs on disk.

retest: Tests all the neural networks on new data which the networks have never seen before. Establishes the ROC chart for the new simulation and stores information about the area under the ROC chart and the difference between ROC chart areas for the different network simulations.

ROC: Used to store and generate information about the ROC chart for each neural network (such as all the data points and the area) and also outputs the graph to disk.

seperateData: Splits the data into four separate sets so that the neural network is trained on a "random" sample of data.

setupConnection.m: Creates the ODBC connection used by every function that pulls data from the database.
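As an illustration, the behaviour of normaliseData and seperateData might be sketched as follows. This is a hypothetical Python approximation of the MATLAB functions (the actual implementations are on the CD); the interleaved ¼/¼/½ split matches the seperateData tests described later in the testing plan.

```python
def normalise_params(columns):
    """Return per-column (min, max) pairs used to map values into [-1, 1]."""
    return [(min(col), max(col)) for col in columns]

def normalise(value, lo, hi):
    """Linearly map value from [lo, hi] into the range [-1, 1]."""
    if hi == lo:
        return 0.0  # constant column: map everything to the middle
    return 2.0 * (value - lo) / (hi - lo) - 1.0

def separate_data(rows):
    """Interleave rows into 1/4 training, 1/4 test and 1/2 validation sets."""
    train, test, validation = [], [], []
    for i, row in enumerate(rows):
        slot = i % 4
        if slot == 0:
            train.append(row)
        elif slot == 1:
            test.append(row)
        else:
            validation.append(row)
    return train, test, validation
```

Normalising inputs this way keeps all network inputs on a comparable scale, which helps gradient descent converge; interleaving the split means each set sees customers from across the whole data file rather than one contiguous block.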
16.2.2 MS Access Overview
The code itself is documented inside MS Access; however, the queries that are used are not, and they require a brief explanation of what they are meant for. The SQL is not reproduced here, as it can be seen by entering the design view of any of the queries.
Free: Appends the list of valid freephone STD codes into the database.

Intl: Appends the list of valid international STD codes into the database.

Local: Appends the list of valid local STD codes into the database.

Mobile: Appends the list of valid mobile STD codes into the database.

National: Appends the list of valid national STD codes into the database.

PRS: Appends the list of valid PRS STD codes into the database.

qryAllOutputs4Weeks: Gathers all the outputs that are required for the neural network to train from.

qryCallInfo4Weeks: Gathers information about the number of calls made per customer over a bi-monthly period.

qryCallSummary4Weeksv2: The amalgamation of all the other queries. It is used by the neural network software to provide aggregated data to the network for training.

qryCodeCounts: Used when creating a customer's valid dialling-code proportions.

qryGetLocalCodes: Gets all the local STD codes so that valid phone numbers can be generated.

qryHighRiskCallSummary4Weeks2: Gathers all data about high-risk calls for each customer's bi-monthly period.

qryLowRiskCallSummary4Weeks2: Gathers all data about low-risk calls for each customer's bi-monthly period.

qryModels: Used on the main form to store information about all the models that need to be created.

qryPeriodIncomming4Weeks: Gets all information about all incoming calls for a customer.

qryTimeData: Gets the data required for the form about when calls can be made.

qryTimeData: Gets the data required for the chart about when calls can be made.

qryTimePlans: Gets a list of all the available time plans for a model.
16.3 CDR Generation Tool Screen Shots
Screen shots are provided to show the work that has been done on the CDR Tool, while also showing what some of the non-obvious parameters are for.
The source code for any of the forms can be viewed by clicking on the form icon in MS Access, going to the "View" menu and selecting "Code". Alternatively, the code has been included in a plain text file on the CD.
16.3.1 Screen 1
Information concerning the number of calls everyone in the model will make.
16.3.2 Screen 2
Information concerning the durations of each type of call that the population and each customer will have in the model.
16.3.3 Screen 3
Information pertaining to the times at which calls can occur for a given model.
16.4 Testing Plan
16.4.1 CDR Tool
The following set of checks establishes whether the random number generators function correctly. Because of the way in which random numbers are generated, we can never predict exactly how they will be distributed in a population; some tolerance has therefore been allowed when judging whether a test passes. It must also be noted that the more tests we run, the closer the results will be to the predicted values.
All of the random number generation tests were run through the VBA output window in Microsoft Access 2000.
Test 1: Generate 500 random Gaussian-distributed numbers with mean 0 and standard deviation 1 (assess whether values generated by the polar Box-Muller method follow the distribution).
Expected: STDEV ~1; MEAN ~0 (+/- 0.5)
Obtained: STDEV 0.985; MEAN 0.047. PASS

Test 2: Generate 100 random numbers based on three bias values of A) 50%, B) 25%, C) 25%.
Expected: 50 random numbers classified as A, 25 as B and 25 as C (10% error).
Obtained: A: 52; B: 25; C: 23. PASS

Test 3: Generate 500 random Gaussian-distributed numbers with mean 20 and standard deviation 5 (assess whether the values generated by the polar Box-Muller method can be transformed).
Expected: STDEV ~5; MEAN ~20 (+/- 0.5)
Obtained: STDEV 4.973; MEAN 20.01. PASS
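The polar Box-Muller method referred to in tests 1 and 3 can be sketched as follows. This is a Python approximation of the VBA generator (the function name is illustrative): it rejects points outside the unit circle, then transforms the accepted point into a standard normal variate, which is scaled and shifted exactly as in test 3.

```python
import math
import random

def gauss_polar(mean=0.0, stdev=1.0, rng=random.random):
    """Polar Box-Muller: draw one Gaussian variate with the given mean
    and standard deviation."""
    while True:
        # Sample a point uniformly in the square [-1, 1] x [-1, 1] ...
        u = 2.0 * rng() - 1.0
        v = 2.0 * rng() - 1.0
        s = u * u + v * v
        # ... and keep it only if it lies strictly inside the unit circle.
        if 0.0 < s < 1.0:
            break
    factor = math.sqrt(-2.0 * math.log(s) / s)
    # u * factor is a standard N(0, 1) variate; scale and shift it.
    return mean + stdev * (u * factor)
```

Generating 500 samples with mean 20 and standard deviation 5 and checking the sample statistics reproduces the shape of test 3 above.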
Now that the correct functionality of the random number generators has been established, we move on to testing the model generators. The following tests are performed firstly to see whether the models can be generated from the random numbers supplied, and secondly to check that the models which have been created follow the pattern that the model specifies.
It must again be noted that all the models are based on random numbers whose properties are given as weightings, means and standard deviations. The results obtained will therefore not match exactly what is expected, but will follow the attributes defined by the model. Obviously, the more models we generate in a group, the closer the distributions will match the predicted distributions.
All of these tests were performed using SQL written specifically to check that the model generator worked correctly.
Test 1: Create 200 customers whose number of calls follows a distribution with an average of 300 calls over the time period and a standard deviation of 50.
Expected: STDEV ~50; MEAN ~300
Obtained: STDEV 46.01; MEAN 297.1. PASS (the more models created, the closer in line the results will be)

Test 2: Create 200 customers whose number of incoming calls follows a distribution with an average of 600 calls over the period and a standard deviation of 80.
Expected: STDEV ~80; MEAN ~600
Obtained: STDEV 86.975; MEAN 598. PASS (the more models created, the closer in line the results will be)

Test 3: Create a population of 200 customers whose distribution of call types is: Free Rate 1.5%; Local Rate 61%; National Rate 12%; International Rate 6%; PRS Rate 0.5%; Mobile Rate 19%.
Expected: approximately those proportions.
Obtained: Free Rate 1.3%; Local Rate 59.1%; National Rate 13.1%; International Rate 5.3%; PRS Rate 0.3%; Mobile Rate 20.9%. PASS

Test 4: Create 200 customers whose telephone calls have the following probability of falling on a given day: Monday 5%; Tuesday 10%; Wednesday 10%; Thursday 10%; Friday 10%; Saturday 35%; Sunday 20%.
Expected: approximately those proportions.
Obtained: Monday 4%; Tuesday 11%; Wednesday 11%; Thursday 12%; Friday 8%; Saturday 33%; Sunday 21%. PASS

Test 5: Create 200 customers and check that the population's average off-peak call durations follow Gaussian distributions with: Free MEAN 3 minutes, STDDEV 3; Local MEAN 25, STDDEV 10; National MEAN 5, STDDEV 2; International MEAN 12, STDDEV 5; PRS MEAN 2, STDDEV 3; Mobile MEAN 5, STDDEV 5.
Expected: approximately those means and standard deviations.
Obtained: Free MEAN 3.2, STDDEV 3.5; Local MEAN 27.3, STDDEV 8.6; National MEAN 4.01, STDDEV 2.03; International MEAN 12.59, STDDEV 5.793; PRS MEAN 1.44, STDDEV 2.32; Mobile MEAN 5.2, STDDEV 5.64. PASS

Test 6: From the 200 customers in the previous test, check that the average call duration for each customer's off-peak call types has the standard deviation specified for the Gaussian distribution: Free 5; Local 10; National 5; International 5; PRS 2; Mobile 8.
Expected: approximately those values.
Obtained: Free 4.371; Local 10.15; National 3.58; International 4.45; PRS 2.19; Mobile 8.31. PASS

Test 7: Create 200 customer accounts (companies) with an average of 20 phone lines and a standard deviation of 3 across the population.
Expected: MEAN ~20; STDEV ~3
Obtained: MEAN 22.331; STDEV 3.586. PASS

Test 8: Check that for 200 customers a call is 7 times more likely to occur at around 7:30 pm than at around 7:30 am.
Expected: roughly 7 times as many calls at around 7:30 pm as at around 7:30 am.
Obtained: average number of calls made at around 7:30 pm: 2.6 per customer; at around 7:30 am: 0.36 per customer (a ratio of roughly 7.2). PASS
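The biased selections exercised in these tests (call-type proportions, day-of-week weightings) amount to sampling from a discrete weighted distribution. A minimal Python sketch of that idea, not the VBA code itself:

```python
import random

def pick_weighted(weights, rng=random.random):
    """Return the index of a category chosen with probability proportional
    to its weight (weights need not sum to 1)."""
    total = sum(weights)
    r = rng() * total
    cumulative = 0.0
    for i, w in enumerate(weights):
        cumulative += w
        if r < cumulative:
            return i
    return len(weights) - 1  # guard against floating-point round-off
```

With weights [50, 25, 25], roughly half of a large number of draws land in the first category and a quarter in each of the others, as in the 50%/25%/25% bias test above.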
The final set of tests carried out on the CDR tool were simple functionality tests to ensure the logic is correct.
16.4.2 Neural Network Tools
Each of the major functions used in creating the training tool had to be tested sufficiently.
AreaROC: Test that, for any given network, it returns the correct area under the ROC chart. Result: PASS

countPercentageGroup: Create a temporary set of networks that have certain errors in relation to their performance when trained on both data sets. 10 networks are given an error of between 10% and 20%; ensure they are counted. Result: PASS

getData: Check that 1500 bi-monthly customer accounts are obtained from the database. Result: 1500 accounts returned.

getOutputData: Check that 1500 bi-monthly customer accounts are obtained from the database. Ensure they align with the inputs already obtained. Result: 1500 accounts returned. PASS

makeNNMom Test 1: Create a set of networks with 5 hidden nodes, with learning rates from 0.1 to 0.9 (increments of 0.1) and 500 to 1000 epochs (500-epoch increments). Are 18 networks created? Result: 18 networks created. PASS

makeNNMom Test 2: Create a set of networks with 5 and 6 hidden nodes, with learning rates from 0.1 to 0.9 (increments of 0.1) and 500 to 1000 epochs (500-epoch increments). Are 36 networks created? Result: PASS

makeNNs Test 1: Create a set of networks with 6 hidden nodes, with learning rates from 0.1 to 0.9 (increments of 0.1) and 500 to 1000 epochs (500-epoch increments). Are 18 networks created? Result: PASS

makeNNs Test 2: Create a set of networks with 5 and 6 hidden nodes, with learning rates from 0.1 to 0.9 (increments of 0.1) and 500 to 1000 epochs (500-epoch increments). Are 36 networks created? Result: PASS

plotErrors Test 1: Is the graph of errors correct in relation to the output of the network and the expected output? Result: PASS

plotErrors Test 2: Is the graph of errors correct in relation to the output of the network and the expected output, and is the graph stored on disk? Result: PASS

Retest Test 1: For each network presented, is the network re-simulated on the newly created data? Result: PASS

Retest Test 2: For each network presented, is the network re-simulated on the newly created data, and is the performance information stored? Result: PASS

ROC Test 1: Create a data set with 0 misclassified results out of 100. Check that the area is 1.0. Result: PASS

ROC Test 2: Create a data set with 20 misclassified results out of 100. Check that the area is approximately 0.85. Result: PASS

seperateData Test 1: Run a test to establish that the data is split ¼ training, ¼ test and ½ validation. Check that the data is interleaved. Result: PASS

seperateData Test 2: Run the test again to check that the function has separated the data on a different view of the data. Result: PASS
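The area checks in ROC Tests 1 and 2 can be reproduced with a small sketch: the area under the ROC curve equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one (the Mann-Whitney rank statistic). This is a Python illustration of the check, not the MATLAB code used in the project:

```python
def roc_area(scores, labels):
    """Area under the ROC curve computed as the fraction of
    positive/negative pairs ranked correctly, counting ties as half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

A data set with no misclassified results (every positive scored above every negative) gives an area of exactly 1.0, matching ROC Test 1; a completely uninformative scorer gives 0.5.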
16.5 Model Descriptions
The actual model parameters can be seen by starting up the Microsoft Access
Database located on the CD.
However, a brief description of each of the models used in this project is included below.
Normal Average (200 accounts)
This customer is classed as an average user: one who will make 25 telephone calls each week and receive approximately 32 calls a week (this was modelled on my relatives). The majority of their calls are local-rate calls, in combination with some national calls. International calls are unlikely, but still occur. The probability that they make a PRS call is even smaller than that of international calls, but such calls may still occur.

Normal No International (200 accounts)
This customer has the same parameters as the "Normal Average" customer; however, they make no international calls at all.

FRAUD - CALL SELLING International (6 accounts)
This type of call selling is based around the method (the 4M's) of making money by selling on calls to international numbers at heavily subsidised rates. We can therefore assume that each of the calls will take a fairly long time to complete, while the calls can occur at practically any time of day, and on any day of the week (as there are international fraud hotspots all over the world). They are unlikely to call other number groups such as local, national, etc.

FRAUD - PRS (12 accounts)
The first type of PRS fraud is based on the fraud where the perpetrators use an automated dialling tool to call the PRS numbers. These can normally be identified because certain characteristics occur, such as the call length being nearly always the same duration, with very little deviation.

FRAUD - PRS 2 (10 accounts)
The second type of PRS fraud occurs when a customer "hammers" PRS lines; that is, they will either call one number or many numbers (normally related to the same owner of the PRS line), with each of the calls taking an extremely long time.

Business - Shop (100 accounts)
Consider a small shop such as a sweet shop or hairdresser's. They may not make many calls during the day, and even fewer during the twilight hours. However, they are more than likely to receive calls from suppliers or customers. Another point to consider is the days on which they will make calls: many small shops are open on Saturdays, which means they may close on a Monday instead. This affects the times of calls and the call cost, since if the majority of the calls they place occur on a Saturday, then these calls all incur off-peak tariffs.

Business - Small (50 accounts)
A small business could be expected to make roughly twice as many calls as it receives. Its calling times also differ from those of the general public, in that the majority of calls will be placed in the 9am-5pm range. You would also expect it to make roughly equal numbers of local and national calls. Again, this all depends on the business type and would need to be established further. This is the first model to approximate the use of multiple lines, because a company is likely to make and receive multiple calls at the same time. Additionally, no PRS calls are made.

Business - Medium (40 accounts)
Assumptions are based on a medium-sized company being approximately four times larger than a company in the "small" class. Additional features have been included, such as the probability that a call is an international call being higher than the same parameter in the "Business - Small" class.

Business - Large (20 accounts)
A large company could be considered about nine times larger than a medium-sized company with respect to the number of calls it makes and receives. Typical accounts might include industries such as banks. However, the call properties, such as average call duration, would be roughly comparable to those of a smaller company. Again, this is business specific.

Home - Internet Access (200 accounts)
Home users can be said to fit into two categories: those who use the Internet and those who do not. This is a very broad generalisation; however, I have used it to establish whether the neural network could tell the difference between these accounts and those used in any type of fraud that involves long call durations. For instance, an Internet user will more than likely make a lot of calls that last two hours, and may have a re-dialler on the system that automatically logs them back on to the Internet. This could be said to mimic both the PRS auto-dialler model and the PRS long-call-duration model.

Home - Plenty Of International (200 accounts)
There are situations in which a home customer would regularly call international numbers, such as when a family has emigrated to the UK while the rest of their family is still living in their country of origin. The location the call is being placed to is not taken into account; for instance, there is no distinction between someone calling Australia and someone calling Zimbabwe.

Fraud Home Call Sell PRS Hidden (10 accounts)
This model tries to mimic the "Normal Average" model, but with one distinct difference: these customers are actually hiding PRS fraud, making long-duration calls to the PRS lines in the evening.
16.6 Project Management
16.6.1 Time Plan (Initial)
16.6.2 Time Plan (Interim)
16.6.3 Time Plan (Final)
16.7 Interim Report & Specification