
Maturaarbeit Oktober 2016

A neural network learns to play Mortal Kombat 3

Author, class: Carlo Hartmann, M4a

Supervising teacher: Andreas Umbach

Contents

1 Abstract

2 Foreword
  2.1 Motivation
  2.2 Acknowledgment

3 Introduction

4 Neural Network
  4.1 Neuron
  4.2 The topology of a neural network
  4.3 Purpose of Learning Algorithms

5 Learning Algorithms
  5.1 Supervised Learning
    5.1.1 The Classical Perceptron
    5.1.2 The modern Perceptron
    5.1.3 Linear separability
    5.1.4 Error surface
    5.1.5 Perceptron learning
  5.2 Unsupervised Learning
    5.2.1 Clusters and the clustering problem
    5.2.2 Competitive learning

6 Image Recognition
  6.1 OpenCV
    6.1.1 Template matching
    6.1.2 Difference

7 Own project
  7.1 Mortal Kombat 3
  7.2 Observation of the actions
  7.3 Losing control over the computer
  7.4 Two different approaches
    7.4.1 SOM
    7.4.2 Supervised neural network
    7.4.3 Structure of the neural network
    7.4.4 The results of the network

8 Discussion

Chapter 1

Abstract

The purpose of this project is to get a better understanding of neural networks and how to create a neural network that is able to play Mortal Kombat 3.

In the first part of my project I created several programs that were able to do basic tasks: taking screenshots, starting the emulator, comparing two images and constructing a basic neural network with neurons and layers. After this I used my knowledge about neural networks to create a network that was able to play the game to a certain extent.


Chapter 2

Foreword

2.1 Motivation

I have always been fascinated by the human brain. What is the other person thinking, and why? How will he react to this? How is he able to learn so well? These were questions I had asked myself for a long time. Neural networks were created to address such questions: by simulating a learning process we are able to get a better understanding of the human brain. This is exactly why I chose this subject.

The idea of creating a neural network that could learn to play a video game was not my own. In 2015 a famous YouTuber called SethBling created a neural network that was able to play Super Mario World. I wanted to create a neural network that could do the same, but doing the exact same thing with Mario would not have felt like my own work, so I chose to do my paper about another classic: Mortal Kombat 3.

2.2 Acknowledgment

I really want to thank my supervisor Andreas Umbach for helping me with this project. Every time I was stuck he helped me by giving me new ideas on how to approach the subject and by pointing me in the right direction past the various obstacles I ran into during this project.


Chapter 3

Introduction

The goal of this project is to understand how neural networks are built and how they are capable of learning. This acquired knowledge was to be used to create a neural network that is able to play Mortal Kombat 3. The network should be able to learn on its own.

The program is written in Java, since this is the language I am most familiar with and have used before. Java is also a very handy programming language with access to a lot of different libraries. As a development environment I used Eclipse, and a Super NES emulator was used for the emulation.


Chapter 4

Neural Network

4.1 Neuron

The most basic component of the human brain is the neuron. The brain consists of billions of neurons, whose purpose is to process information. Even though their task is so important, the structure of each neuron is rather simple. As shown in Fig. 4.1, the biological neuron is composed of a nucleus, dendrites, a cell body and an axon. The axon serves as a conductor and transmits signals to the dendrites of a different neuron at an intersection called a synapse. [Soares, F. M. / Souza, A. M. F. (2016)]

The neurons used in artificial neural networks are modeled after their biological counterpart. Just like the biological neuron, the artificial ones have an input and an output component. This is visualized in Fig. 4.2. Every neuron has a set number of inputs, and each input has a specific weight value. The weight values are the components that determine what kind of output will result at the end; altering those values results in a different output. During the learning process the learning algorithm changes these values. What the algorithms do exactly will be explained in chapter 5. The actual body of the neuron has two functions implemented into it: the integration (also referred to as summation) function and the activation function. The integration function is needed because we want to use a primitive activation function with only one parameter: it reduces the n possible arguments to a single value that the activation function later uses as its argument. This results in an output. The output is

Figure 4.1: Biological neuron.


Figure 4.2: Artificial neuron.

Figure 4.3: Basic structure of layers.

either just the result of the activation function or, if the output goes into another layer, sometimes also carries a specific weight value. What layers are exactly will be explained in the next section. [Rojas, R. (1996)]
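The two functions described above can be sketched in a few lines of Java. This is a minimal illustration, not the paper's own code; the class and method names are chosen for clarity. The integration function collapses the n weighted inputs into a single value, which a simple threshold activation then maps to the neuron's output.

```java
// Minimal sketch of an artificial neuron (illustrative names, not the paper's code).
public class Neuron {
    // Integration (summation) function: reduces n weighted inputs to a single value.
    public static double integrate(double[] inputs, double[] weights) {
        double sum = 0.0;
        for (int i = 0; i < inputs.length; i++) {
            sum += inputs[i] * weights[i];
        }
        return sum;
    }

    // A primitive activation function with one parameter: fire 1 once the threshold is reached.
    public static double activate(double s, double threshold) {
        return s >= threshold ? 1.0 : 0.0;
    }

    // Full neuron: integration followed by activation.
    public static double output(double[] inputs, double[] weights, double threshold) {
        return activate(integrate(inputs, weights), threshold);
    }
}
```

Changing the weight values here changes which inputs can push the sum over the threshold, which is exactly the lever the learning algorithms of chapter 5 operate on.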


4.2 The topology of a neural network

Layers define the capabilities of a network: the more layers the network has, the more information is processed in it. A neural network is always separated into several layers, and every network consists of at least an input layer and an output layer. The input layer is the first layer in every network; it receives and processes the incoming information. The output layer is the last layer of a network: it receives values from either the input layer or a hidden layer, processes them one last time, and has a direct influence on the outside world. Hidden layers are the body of every network. The number of hidden layers can vary from zero to as many as you want; every added layer enhances the network's capacity to represent more complex knowledge. Figure 4.3 shows the structure of a basic neural network with one hidden layer. [Soares, F. M. / Souza, A. M. F. (2016)]
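The layered topology described above can be sketched as follows. This is an illustrative example, not the paper's implementation: each layer is stored as a weight matrix, and a forward pass sends the input layer's values through every hidden layer to the output layer.

```java
// Sketch of a layered feed-forward network (illustrative, not the paper's code).
public class LayeredNetwork {
    // weights[l][j][i] = weight from neuron i of layer l to neuron j of layer l+1
    private final double[][][] weights;

    public LayeredNetwork(double[][][] weights) {
        this.weights = weights;
    }

    // Forward pass: values flow from the input layer through all layers to the output layer.
    public double[] forward(double[] input) {
        double[] values = input;
        for (double[][] layer : weights) {
            double[] next = new double[layer.length];
            for (int j = 0; j < layer.length; j++) {
                double s = 0.0;
                for (int i = 0; i < values.length; i++) {
                    s += layer[j][i] * values[i];
                }
                next[j] = s > 0 ? 1.0 : 0.0;  // simple threshold activation
            }
            values = next;
        }
        return values;
    }
}
```

With an empty weight array this degenerates to a network of only an input and an output layer; each additional matrix adds one hidden layer.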

4.3 Purpose of Learning Algorithms

Learning algorithms are responsible for the actual learning process that networks are meant to achieve. As stated in section 4.1, learning algorithms optimize the weight values to enhance the knowledge of the network. Every input has a specific weight value that determines what kind of output will result at the end, and every single weight in the network has an influence that can change the whole outcome.


Chapter 5

Learning Algorithms

People do not learn everything in the same way. One example is how people learn vocabulary: they see the word in their language and see what it means in a different one, so they know what kind of result is expected. Another example is how they learned to walk: they had no idea how to move their own body to achieve it, but they tried over and over again, failed many times, and gradually learned it. Learning algorithms are likewise separated into different types. There are two big classifications: supervised learning and unsupervised learning. [Soares, F. M. / Souza, A. M. F. (2016)]

5.1 Supervised Learning

"A learning algorithm is an adaptive method by which a network of computing units self-organizes to implement the desired behavior." (Rojas, 1996: p. 77). This is how Rojas described the behavior of a learning algorithm. In supervised learning this self-organization is achieved by presenting examples of desired input-output pairs. The network is then able to adjust the weights between the neurons in order to ensure that a specific input results in a specific output. For such an algorithm to learn, a large amount of data is necessary; insufficient data renders the approach inoperative. How such a scenario can be avoided will be explained in section 5.2. How a learning algorithm processes input-output examples is shown in Fig. 5.1. An input is fed into the network, the network processes the information and gives an output, and that output is then compared to the expected output. If the two outputs are identical, the network is not changed; if they differ, the network parameters are adjusted. This process is repeated thousands of times. After a certain number of iterations, the network's output should converge to the desired output for every input. [Rojas, R. (1996)]

5.1.1 The Classical Perceptron

Perceptrons were a big step for neural networks. In 1958 Rosenblatt, an American psychologist, proposed the concept of the perceptron. The innovative part of it was the introduction of numerical weights and a special interconnection pattern. The classical perceptron, as proposed by Rosenblatt, is shown in Fig. 5.2.

While the perceptron used nowadays works differently from what was originally proposed, the concept remains the same. The classical perceptron has a projection area, sometimes labeled the retina. This retina sends binary values to a layer of computing units. The connections between the retina and the first layer of computing units are


Figure 5.1: Learning process.

Figure 5.2: The classical perceptron [after Rosenblatt 1958].


Figure 5.3: Predicates and weights of a perceptron.

deterministic and non-adaptive. This means that they are not weighted and will not be changed in the process of learning. Connections are selected stochastically; this was done so that the model is biologically plausible, since the goal of a neural network is to simulate the processes in a human brain. The whole idea behind the system is to train it so that it is able to recognize certain input patterns in the connection area, which in turn leads to the appropriate path through the connections to the output layer. In this model the learning algorithm must derive suitable weights. [Rojas, R. (1996)]

5.1.2 The modern Perceptron

Minsky and Papert saw big potential in Rosenblatt's system. They took the essential features of his system to study its computational capabilities. Their new perceptron is a simplification of Rosenblatt's classical perceptron. For practical reasons I will from now on refer to Rosenblatt's perceptron as the classical perceptron and to Minsky and Papert's perceptron simply as the perceptron. The perceptron also has a retina of pixels with binary values on which patterns are projected. Some pixels are connected to so-called predicates, logic elements that compute a single bit according to the input. Those predicates then transmit their binary values to a weighted threshold element. That threshold element is responsible for the final decision in e.g. a recognition problem (see Fig. 5.3).

"A simple perceptron is a computing unit with threshold θ which, when receiving the n real inputs x1, x2, ..., xn through edges with the associated weights w1, w2, ..., wn, outputs 1 if the inequality ∑_{i=1}^{n} wi·xi ≥ θ holds and otherwise 0" (Rojas, 1996: p. 60). This is the first definition that was officially given to the perceptron. Before this, a threshold element was associated with either a whole set of predicates or a network of computing elements. What constitutes a network of computing elements exceeds the scope of my matura thesis. This definition by Rojas refers to a perceptron


as an isolated threshold element which computes its output without any delay. The perceptron also separates its input space into two half-spaces, labeled 0 and 1. [Rojas, R. (1996)] [Berger, C. (2016)]
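Rojas's definition above translates almost word for word into code. The following is a sketch under that definition (the class name is illustrative): output 1 if ∑ wi·xi ≥ θ, otherwise 0.

```java
// A simple perceptron per Rojas's definition (illustrative sketch).
public class SimplePerceptron {
    // Outputs 1 if the weighted sum of the inputs reaches the threshold theta, else 0.
    public static int output(double[] x, double[] w, double theta) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += w[i] * x[i];
        }
        return sum >= theta ? 1 : 0;
    }
}
```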

Bias

The bias is an additional input to a perceptron with a fixed input value and its own weight. Simply put, the bias is the output that the perceptron gives when there is zero input. It increases the capacity of the neural network to solve problems. The bias isn't essential, but it can be a very useful tool. Its influence is best shown with the example of an AND gate built from a perceptron: an AND gate returns 1 (true) only if both inputs are true.

To get a better understanding of the bias, an example of an AND gate with a bias is presented. A perceptron with two inputs x1 and x2 is used. In addition to those two input vectors we also have the bias, which has a value of +1. The setup of the perceptron is shown in Fig. 5.4. The weight of the bias in this example is -30; this number varies from situation to situation, and in this scenario it needs to be -30 to work out. First the inputs go through the summation function:

s = ∑ w·x

where w are the weight values and x the input values. For our scenario:

s = -30 + 20·x1 + 20·x2

For the activation function we use the sigmoid function, chosen for its capability of approximating the processing in the human brain. A property of the sigmoid function is that around +4 and -4 its value is already very close to 1 or 0. This is the equation of the function:

g(s) = 1 / (1 + e^(-s))

The output of the perceptron is:

h_θ(x) = g(s)


If the numbers are put into a table:

x1   x2   h_θ(x)
0    0    g(-30) ≈ 0
0    1    g(-10) ≈ 0
1    0    g(-10) ≈ 0
1    1    g(10) ≈ 1

With the bias the AND gate works perfectly; creating an AND gate without a bias is unnecessarily complicated. The most important question is: how do we find the value for the bias? In simple examples like this, it is easiest to rely on intuition. In more complicated networks with hidden layers, the learning algorithm finds the bias for us: it treats the bias as a normal input vector, and its weight vector is then adjusted the same way as those of all the other units. [Berger, C. (2016)]

Figure 5.4: Example for an AND gate with a perceptron.
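The AND-gate example can be written out directly, using the numbers from the text (bias weight -30, input weights 20, sigmoid activation). This is a sketch reproducing the table above, not the paper's own code.

```java
// The AND-gate perceptron from the text: h_theta(x) = g(-30 + 20*x1 + 20*x2).
public class AndGate {
    // Sigmoid activation function g(s) = 1 / (1 + e^(-s)).
    public static double sigmoid(double s) {
        return 1.0 / (1.0 + Math.exp(-s));
    }

    // Output of the perceptron for the two binary inputs.
    public static double output(double x1, double x2) {
        double s = -30 + 20 * x1 + 20 * x2;  // summation with bias weight -30
        return sigmoid(s);
    }
}
```

Running this for all four input combinations reproduces the table: only (1, 1) produces a value close to 1.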

5.1.3 Linear separability

"Two sets A and B of points in an n-dimensional space are called absolutely linearly separable if n + 1 real numbers w1, ..., wn+1 exist such that every point (x1, x2, ..., xn) ∈ A satisfies ∑_{i=1}^{n} wi·xi > wn+1 and every point (x1, x2, ..., xn) ∈ B satisfies ∑_{i=1}^{n} wi·xi < wn+1" (Rojas, 1996: p. 80)

The learning algorithm tries to separate the input data into two different sets. This is where linear separability comes in. As the definition above states, two sets of points in an n-dimensional space can only be called linearly separable if they meet certain requirements. The threshold is important in this case because wn+1 is -θ (the negative of the threshold). The explanation of why wn+1 needs to be -θ exceeds the scope of this introduction. With the help of the summation function and wn+1 we


Figure 5.5: Error function for the AND gate with weights between −0.5 and 1.5.

can cleanly separate all points: if the summation ∑_{i=1}^{n} wi·xi is larger than -θ, the points belong to set A; if it is smaller, they belong to set B. [Rojas, R. (1996)]

5.1.4 Error surface

The error is technically just a count of incorrectly classified points, and the objective of a learning algorithm is to minimize it. There are several valid approaches to achieve this. One of the simplest is a so-called greedy algorithm that computes the local error of a perceptron with a given weight vector, decides in which direction the weight vector needs to be updated, and then selects new weights in that search direction. The error function is visualized in Fig. 5.5 to give a better understanding of how it works. This structure was created by trying to find the right weights for an AND gate: first a fixed threshold θ was chosen, then the weights w1 and w2 were searched between -0.5 and 1.5. The error is calculated by comparing the output value with the expected value. It is clearly visible that there are regions where the error function returns either 2 or 1; what we want is the area where it is 0, which in this function is a triangle. How this structure looks from above, and how the right weight is found, is illustrated in Fig. 5.6. It makes clear that the solution is not always found right away: the weight is adjusted slowly, starting with w0 and updated twice through w1 and w2, until it finally reaches w*. [Rojas, R. (1996)]

5.1.5 Perceptron learning

Figure 5.7 is a flowchart for this specific learning algorithm. For this algorithm there is a training set that consists of two sets, P and N, in an n-dimensional extended input space. The task of the algorithm is to find a weight vector w that separates those two sets. The algorithm only changes the weight vector if a vector in either P or N was not classified


Figure 5.6: Iteration steps to the region of minimal error.

correctly. The flowchart does not have an end, for two reasons. The first is that the algorithm shouldn't stop as long as there is anything to learn: learning is an ongoing process that shouldn't stop as long as there is data to work with. The second is that there is no way to exit the algorithm: the last node doesn't need to make any more decisions, since the only possible answer to the condition it is given is yes. This results in an endless loop of learning. [Rojas, R. (1996)]

5.2 Unsupervised Learning

The possibilities of supervised learning are immense, but in some scenarios it fails. This is where unsupervised learning comes in. The big difference between supervised and unsupervised learning is that the unsupervised kind does not need an expected output to compare against. Instead, it decides what output would be best for a given input and reorganizes the network accordingly. There are two main classes of unsupervised learning: reinforcement learning and competitive learning. In the first method the algorithm reinforces the weights of the network in such a way as to enhance the reproducibility of the desired output; one of the best-known examples of this method is Hebbian learning. Competitive learning, as the name suggests, works in such a way that the elements of the network compete against each other for the right to provide the desired output associated with an input vector. The unit that wins is called the BMU (best matching unit). This matura paper will explain competitive learning in depth. [Rojas, R. (1996)] [Soares, F. M. / Souza, A. M. F. (2016)]

5.2.1 Clusters and the clustering problem

Clusters are the key to the concept of competitive learning. The way the learning algorithm finds the best output is by organizing the input data into so-called clusters. What the clustering looks like is shown in the two figures 5.8.a and 5.8.b. In Fig. 5.8.a there are two sets of input vectors that have been put into a coordinate system. What


The flowchart can be summarized as follows:

1. Generate a weight vector w0 randomly and set t = 0.
2. Select a vector x ∈ P ∪ N at random.
3. If x ∈ P and wt · x ≤ 0, set wt+1 = wt + x and t = t + 1.
4. If x ∈ N and wt · x ≥ 0, set wt+1 = wt - x and t = t + 1.
5. Go back to step 2.

Figure 5.7: Flowchart of the perceptron learning algorithm.
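The steps in the flowchart can be sketched in Java as follows. This is an illustrative implementation, not the paper's own code; since the flowchart itself never terminates, the sketch runs for a fixed number of iterations, and for determinism the initial weight vector is set to zero instead of being generated randomly.

```java
import java.util.Random;

// Sketch of the perceptron learning algorithm from Fig. 5.7 (illustrative).
public class PerceptronLearning {
    public static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    // P and N are the two training sets; returns the learned weight vector w.
    public static double[] train(double[][] P, double[][] N, int iterations, long seed) {
        Random rnd = new Random(seed);
        double[] w = new double[P[0].length];       // w0 (zero here for determinism)
        for (int t = 0; t < iterations; t++) {
            boolean fromP = rnd.nextBoolean();       // pick x from P or N at random
            double[][] set = fromP ? P : N;
            double[] x = set[rnd.nextInt(set.length)];
            if (fromP && dot(w, x) <= 0) {           // x in P misclassified: w <- w + x
                for (int i = 0; i < w.length; i++) w[i] += x[i];
            } else if (!fromP && dot(w, x) >= 0) {   // x in N misclassified: w <- w - x
                for (int i = 0; i < w.length; i++) w[i] -= x[i];
            }
        }
        return w;
    }
}
```

On a linearly separable training set the weight vector stops changing once every vector in P has a positive scalar product with w and every vector in N a negative one.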



Figure 5.8: Left: two sets of vectors P and N. Right: weight vectors for the clusters.

the clustering now does is look at the vectors and try to approximate them with a weight vector. The weight vectors that result from this setup are visible in Fig. 5.8.b. Each weight vector is represented by one computing unit, which means the number of clusters is predefined. This predefinition results in some problems: unsupervised learning is mainly used because we do not actually know the whole structure of the data, so estimating the number of clusters needed can be hard. This applies especially when dealing with multidimensional data sets with an unknown deep structure. So the question now is: "If the number and distribution of clusters is unknown, how can we decide how many computing units and thus how many representative weight vectors we should use?" (Rojas, 1996: p. 103) [Rojas, R. (1996)]

5.2.2 Competitive learning

To get a better understanding of the subject, a network is created for Fig. 5.8 as an example. Since there are 3 clusters A, B and C in Fig. 5.8, the network that processes the problem also needs 3 units. The concept of competitive learning dictates that only one of the units is allowed to actually fire a 1, which makes it necessary for the units to communicate with each other. Figure 5.9 shows a possible network for the problem in Fig. 5.8. In Fig. 5.9 each unit computes its weighted input, but in the end only the best matching unit is allowed to fire a 1; the other units are then prevented from giving any output. The connections between the single units are also visualized in Fig. 5.9. This setup can also be viewed as multiple perceptrons with variable thresholds: in each computation the thresholds are updated to ensure that only one unit is able to fire.

The following learning algorithm is a possible way to identify the clusters of input vectors. First some definitions: X = (x1, x2, ..., xl) is a set of normalized input vectors in n-dimensional space which we want to classify into k different clusters. The network itself consists of k units, each with n inputs and a threshold of zero. Using a threshold of zero does not lose us any generality.

The algorithm stops after a predetermined number of steps. What it does is attract the weight vectors in the direction of the clusters in the input space. Normalizing after substituting wm with wm + xj is a really important step: it prevents one vector from becoming so large that it would win every competition, which would result in several so-called dead units, since during the algorithm only that one unit would ever be updated.


Figure 5.9: A network of three competing units with connections between each unit.

[Rojas, R. (1996)]

Kohonen network

Kohonen networks, also called self-organizing maps, are a kind of network architecture that uses unsupervised learning. They were first created by the Finnish professor Teuvo Kohonen in the early 1980s and work in a similar way to traditional competitive neural networks. In the algorithm the BMU is also calculated and then updated accordingly, but there is one difference: the concept of neighborhood neurons. A Kohonen network also updates the neurons that are nearest to the BMU, although the neighborhood neurons are not changed as much as the BMU itself. This causes neurons that are close to each other to give a similar output, creating a map of the data in which the single clusters are visible. This is why it is also called a self-organizing map, or SOM for short.
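One Kohonen update step can be sketched as below. This is an illustration of the idea just described, not the paper's code: the BMU is found, pulled toward the input, and its grid neighbors are pulled by a smaller amount. The neighborhood falloff used here (rate divided by 1 + distance) is an assumption for the sketch; real SOMs often use a Gaussian falloff that shrinks over time.

```java
// Sketch of a single SOM update step on a 1-D chain of neurons (illustrative).
public class SomStep {
    // neurons[i] is the weight vector of neuron i; x is one input vector.
    public static void update(double[][] neurons, double[] x, double rate, int radius) {
        // Find the best matching unit: smallest squared Euclidean distance to x.
        int bmu = 0;
        double best = Double.MAX_VALUE;
        for (int i = 0; i < neurons.length; i++) {
            double d = 0.0;
            for (int j = 0; j < x.length; j++) {
                double diff = neurons[i][j] - x[j];
                d += diff * diff;
            }
            if (d < best) { best = d; bmu = i; }
        }
        // Move the BMU and its neighbors toward the input; neighbors move less.
        for (int i = 0; i < neurons.length; i++) {
            int dist = Math.abs(i - bmu);
            if (dist > radius) continue;
            double factor = rate / (1 + dist);  // assumed falloff: farther neighbors move less
            for (int j = 0; j < x.length; j++) {
                neurons[i][j] += factor * (x[j] - neurons[i][j]);
            }
        }
    }
}
```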

Figure 5.11 is a visualization of the mapping process. The blue cluster stands for the input data and the grid for the neurons in the network. The yellow marker indicates the nearest neuron calculated by the algorithm, which is then moved toward the input; the adjacent neurons are moved as well. After repeating this process multiple times the network looks as shown in the last part of the illustration. [Rojas, R. (1996)]


The algorithm proceeds as follows:

1. Generate normalized weight vectors w1, ..., wk randomly.
2. Select a vector xj ∈ X randomly.
3. Compute xj · wi for i = 1, ..., k.
4. Select wm such that wm · xj ≥ wi · xj for i = 1, ..., k.
5. Substitute wm with wm + xj, normalize, and go back to step 2.

Figure 5.10: Flowchart of the competitive learning algorithm.
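A single iteration of the competitive learning flowchart in Fig. 5.10 can be sketched as follows (illustrative names, not the paper's code): the unit with the largest scalar product wins, is moved toward the input, and is re-normalized so that no single weight vector can grow large enough to win every competition.

```java
// Sketch of one step of the competitive learning algorithm from Fig. 5.10.
public class CompetitiveStep {
    // Scale a vector to unit length.
    public static void normalize(double[] v) {
        double len = 0.0;
        for (double c : v) len += c * c;
        len = Math.sqrt(len);
        for (int i = 0; i < v.length; i++) v[i] /= len;
    }

    // One iteration: returns the index m of the winning unit.
    public static int step(double[][] w, double[] xj) {
        // Select w_m such that w_m . x_j >= w_i . x_j for all i.
        int m = 0;
        double best = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < w.length; i++) {
            double dot = 0.0;
            for (int j = 0; j < xj.length; j++) dot += w[i][j] * xj[j];
            if (dot > best) { best = dot; m = i; }
        }
        // Substitute w_m with w_m + x_j and normalize.
        for (int j = 0; j < xj.length; j++) w[m][j] += xj[j];
        normalize(w[m]);
        return m;
    }
}
```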

Figure 5.11: Illustration of the training of a self-organizing map.


Chapter 6

Image Recognition

Image recognition has been a big part of my project. Getting the information out of the emulator proved to be a harder task than I first anticipated, and I tried several approaches to solve this problem. The cleanest way would have been to use an XML file, but since that in itself could have been a separate matura paper I decided to search for something easier to use. My supervisor then suggested that I use OpenCV, a programming library that is widely used for image recognition and processing. What I used and how it works is explained in section 6.1.

6.1 OpenCV

OpenCV is an open source computer vision and machine learning software library. It was developed to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products. There are over 2500 optimized algorithms implemented in the library that can be freely modified. Originally the library was developed only for the programming languages C++ and C; later it was made available for Python, and finally Java became one of the languages that could make use of the library. As Java support was only added recently, information on how to use the library is scarce. After some research I found two possible ways to use OpenCV with my program: template matching and difference. [OpenCV (2012)]

Before going into the two methods I used, a basic understanding of how computers see an image is needed. A computer is not able to see as humans do; all it can process are numbers. Every pixel in a picture has a specific number associated with it, and a program on our computer visualizes those numbers so that humans are able to see and understand them. An example of how the computer would "see" a car is shown in Fig. 6.1. In other words, a picture is just a grid or matrix of numbers.
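This grid-of-numbers idea is visible directly in Java, where a BufferedImage exposes exactly such a grid, one packed integer per pixel. The sketch below (class and method names are illustrative) extracts one color channel of every pixel into a plain 2-D array of numbers.

```java
import java.awt.image.BufferedImage;

// Illustration of the "grid of numbers" idea using Java's BufferedImage.
public class PixelGrid {
    // Extract one channel of every pixel into a plain 2-D grid of numbers.
    public static int[][] toGrid(BufferedImage img) {
        int[][] grid = new int[img.getHeight()][img.getWidth()];
        for (int y = 0; y < img.getHeight(); y++)
            for (int x = 0; x < img.getWidth(); x++)
                grid[y][x] = img.getRGB(x, y) & 0xFF;  // lowest byte = blue channel
        return grid;
    }

    // Build a tiny 2x2 image with one blue pixel and return its grid.
    public static int[][] demoGrid() {
        BufferedImage img = new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB);
        img.setRGB(0, 0, 0x0000FF);  // one fully blue pixel in the top-left corner
        return toGrid(img);
    }
}
```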

6.1.1 Template matching

This method uses two different pictures: a template and a picture to match the template against. The concept is pretty simple: the template is slid over the picture, and with one of the two main matching methods it is possible to find the best match. The first method is TM_CCOEFF:

R(x, y) = ∑_{x′,y′} (T′(x′, y′) · I(x + x′, y + y′))


Figure 6.1: The grid of numbers that the computer is able to understand.

This method returns a number between -1 and 1, where 1 means a perfect match, -1 a perfect mismatch, and 0 no correlation at all.

The second method is TM_SQDIFF, which looks like this:

R(x, y) = ∑_{x′,y′} (T′(x′, y′) − I(x + x′, y + y′))²

There are only two differences between the first method and this one: instead of a multiplication there is a subtraction, and the whole term is squared. As a result this method returns a number between 0 and a really big number, 0 being a perfect match. Thanks to the square it's easier to identify a really bad match, since the worse the match is, the faster the number grows. [OpenCV (2011)]
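The TM_SQDIFF formula can be spelled out in plain Java without OpenCV, as a sketch over grayscale pixel grids (names are illustrative). It computes R for a single position (x, y); in practice the value is computed for every position and the smallest one marks the best match.

```java
// Plain-Java sketch of the TM_SQDIFF measure over grayscale pixel grids.
public class SquaredDifference {
    // Squared-difference score of the template placed at position (x, y) in the image.
    public static long sqdiff(int[][] template, int[][] image, int x, int y) {
        long r = 0;
        for (int yp = 0; yp < template.length; yp++) {
            for (int xp = 0; xp < template[0].length; xp++) {
                long d = template[yp][xp] - image[y + yp][x + xp];
                r += d * d;  // squaring makes bad matches grow quickly
            }
        }
        return r;            // 0 means a perfect match
    }
}
```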


Figure 6.2: Absdiff() used on Mortal Kombat 3.

6.1.2 Difference

The second method that was used works in a similar but simpler way. It also takes two images and compares them, but it is not looking for a specific match; instead it looks for the difference between the images. The method is called absdiff() and is described as returning the absolute value of the differences between two arrays. What this function returns is an array of the points that differ. If you then visualize it, you get an image as shown in Fig. 6.2.
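What absdiff() computes can be sketched in a few lines over plain pixel grids (illustrative, not OpenCV's implementation): the element-wise absolute difference of two equally sized grids, where non-zero entries mark the pixels that changed.

```java
// Sketch of the absdiff idea: element-wise absolute difference of two pixel grids.
public class AbsDiff {
    public static int[][] absdiff(int[][] a, int[][] b) {
        int[][] out = new int[a.length][a[0].length];
        for (int y = 0; y < a.length; y++)
            for (int x = 0; x < a[0].length; x++)
                out[y][x] = Math.abs(a[y][x] - b[y][x]);  // non-zero = pixel changed
        return out;
    }
}
```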


Chapter 7

Own project

7.1 Mortal Kombat 3

Mortal Kombat 3 is the third installment of the popular Mortal Kombat series. It is a fighting game that lets players choose from a cast of different characters to fight against an opponent. The game was first published in arcades in 1995 but was soon ported to three home consoles: the Genesis, the Super NES and the Sony PlayStation. Only the PlayStation version was identical to the arcade version, though, due to a deal Sony made with the game's developer, Midway. Because of that the versions on each platform, even though they carry the same name, vary a little. The game itself is best known for its brutality, especially its finishing moves, while the complexity of the game is admired by a lot of people around the world.

7.2 Observation of the actions

The first thing that had to be figured out was how information could be extracted from the emulator, a Super NES emulator. Template matching, explained in 6.1.1, was the first method I tried. A template of one of the characters, Shang Tsung, was created and then matched against a screenshot taken from the game. The method had no problem locating the character while most of its body was visible. The problems came up in situations where the character was, for example, doing a roll, or where the body was obstructed by the enemy character: the program was not able to find the character and seemed to choose a random location on the screen. One possible way to solve this problem would have been to make a template of each possible move the character could do and work with those. The drawback of this method was that the characters would have needed to be predefined, which would remove the need for a neural network. Because of this the second method was tried out.


Figure 7.1: Template Matching used on Mortal Kombat 3.

The second method was much better suited to a neural network, since it only gives back an array of numbers indicating that something changed. Nothing is predefined, so the neural network needs to find out by itself what everything means. The method does have one weakness: it doesn't just show what is new in the second image, it shows the absolute difference. This is also visible in the previously shown Fig. 6.2, where there seem to be 4 characters in the picture. This happens because the area where a character originally stood has changed as well, so that section is also seen as a difference between the images. It is definitely a weakness of the method, but it shouldn't affect the neural network to a great extent.

7.3 Losing control over the computer

The second thing that was needed was a way for the neural network to interact with the emulator. That is done by using the so-called Robot class in Java, which allows a Java program to control the computer. Thanks to this class it is possible to start everything without anyone interacting with the computer, which helped to create a controlled environment free of human interaction. As a consequence, any control over the computer is lost for a while, since every time someone tries to move the mouse or press a button the program overwrites the action. Because of this an emergency stop was implemented: when the Esc key is pressed, the program stops interacting with the computer.
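The Robot-based control with an Esc emergency stop might look roughly like this. The class and method names here are illustrative, not the paper's actual code; only the standard java.awt.Robot and KeyEvent APIs are used, and constructing the Robot requires a graphical environment.

```java
import java.awt.AWTException;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.event.KeyEvent;
import java.awt.image.BufferedImage;

// Sketch of Robot-based emulator control with an Esc emergency stop (illustrative).
public class EmulatorControl {
    private volatile boolean stopped = false;
    private final Robot robot;

    public EmulatorControl() throws AWTException {
        this.robot = new Robot();  // requires a graphical environment
    }

    // Esc is the emergency-stop key.
    public static boolean isEmergencyStop(int keyCode) {
        return keyCode == KeyEvent.VK_ESCAPE;
    }

    // Called from a KeyListener: Esc stops all further interaction.
    public void onKeyPressed(int keyCode) {
        if (isEmergencyStop(keyCode)) stopped = true;
    }

    // Press and release a key in the emulator, unless the emergency stop fired.
    public void press(int keyCode) {
        if (stopped) return;
        robot.keyPress(keyCode);
        robot.keyRelease(keyCode);
    }

    // Take a screenshot of a region of the screen.
    public BufferedImage screenshot(Rectangle area) {
        return robot.createScreenCapture(area);
    }
}
```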

7.4 Two different approaches

My initial plan for the project was to use an unsupervised learning algorithm. For this I used the Kohonen algorithm that was explained in section 5.2.2.

7.4.1 SOM

The thought behind the use of the SOM algorithm was that it might be able to cluster the individual moves of the opponent, so that the program could react to each of them in a specific way. The problem behind this concept is that a person would need to intervene far too much to actually call it unsupervised learning. The method was able to cluster all the information given to the neural network by looking at the similarities between the screenshots that were taken, and in the end the program did have several clusters designated to each move of the opponent. However, the number of input neurons needed to be adjusted each time the opponent changed, so that there would not be any dead neurons. The difficult part for the SOM algorithm was finding the right output. All the algorithm can do in this scenario is cluster the moves of the opponent; knowing how to react is another matter. One thing I thought about was using the health bar as a desired output: the neural network would then try moves out and find a way to either hurt the opponent or avoid getting hurt itself. This was probably a feasible way to solve the problem, but since a desired output had been introduced, the neural network had turned into a supervised one. So it was decided to go all the way and implement supervised learning.
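One step of the competitive-learning scheme described in section 5.2.2 can be sketched like this (a simplified illustration reduced to the winner only, without the neighborhood function a full SOM would use): find the weight vector closest to the input and pull it a small step toward the input.

```java
// Sketch of one winner-take-all Kohonen-style update. Each row of
// `weights` is one neuron's weight vector; the best matching unit (BMU)
// is the row closest to the input in squared Euclidean distance.
public class KohonenStep {

    static int bestMatchingUnit(double[][] weights, double[] input) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int n = 0; n < weights.length; n++) {
            double d = 0;
            for (int i = 0; i < input.length; i++) {
                double diff = weights[n][i] - input[i];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = n;
            }
        }
        return best;
    }

    /** Moves the winning neuron's weights a fraction `rate` toward the input. */
    static void update(double[][] weights, double[] input, double rate) {
        int w = bestMatchingUnit(weights, input);
        for (int i = 0; i < input.length; i++)
            weights[w][i] += rate * (input[i] - weights[w][i]);
    }
}
```

Repeated over many screenshots, similar inputs keep winning the same neuron, which is how the clusters for the opponent's moves form.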

7.4.2 Supervised neural network

The neural network was split into two parts. The input layer was unsupervised and clustered the input data so that each enemy movement was designated to one neuron. The output layer was supervised.

First, data had to be gathered for the network. For this a program was created that would supervise a game of Mortal Kombat 3. Two classes implemented in Java were used: the Robot class and the KeyListener. The Robot class had already been used to control the emulator, as explained in 7.3; here it was also used to take screenshots while playing the game. The KeyListener, as the name suggests, records the keys the player presses. With this the needed data could be gathered by playing the game for several hours. The player does not need to be the best player in the world, since a rough idea of what the network needs to do should suffice.

With the gathered data, information could now be fed into the network. The input layer clusters all the data into a specific number of clusters. After this it feeds into a hidden layer with the same number of neurons; this hidden layer acts as an input layer for the supervised part of the neural network. The supervised learning algorithm was then trained using the clusters and the keys that were pressed in each situation. The adjustments needed to be small to ensure that they do not constantly overwrite each other.
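The supervised half can be sketched as follows (a hypothetical simplification, not the thesis code): because only the winning cluster neuron fires, each training sample only nudges that cluster's outgoing weights toward the keys the human pressed in that situation, and the small learning rate keeps later samples from wiping out earlier ones.

```java
// Sketch of the supervised output stage: a weight matrix from cluster
// neurons to key outputs. Training pulls the winning cluster's row
// toward the 0/1 vector of keys the player actually pressed.
public class OutputLayer {
    final double[][] w;  // w[cluster][key]

    OutputLayer(int clusters, int keys) {
        w = new double[clusters][keys];
    }

    /** One training sample: winning cluster plus the pressed keys (0/1). */
    void train(int cluster, double[] pressedKeys, double rate) {
        for (int k = 0; k < pressedKeys.length; k++)
            w[cluster][k] += rate * (pressedKeys[k] - w[cluster][k]);
    }

    /** The key with the strongest learned association for this cluster. */
    int bestKey(int cluster) {
        int best = 0;
        for (int k = 1; k < w[cluster].length; k++)
            if (w[cluster][k] > w[cluster][best]) best = k;
        return best;
    }
}
```

A small `rate` (say 0.05) means a single unusual sample barely shifts the weights, while a consistently repeated reaction to a move dominates over time.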

7.4.3 Structure of the neural network

As already implied, the neural network has an input layer, a hidden layer and an output layer. Every layer has a specific number of neurons that needs to be adjusted before going up against a specific opponent. The input layer and the hidden layer have exactly the same number of neurons, since the hidden layer just receives the clusters of the input layer and acts as an input layer for the supervised learning algorithm. The output layer always has the same number of neurons, since there is only a fixed number of keys to press. Figure 7.2 shows the structure of the neural network that was used for the project and the connections between the neurons.


Figure 7.2: Structure of the neural network that was used.

7.4.4 The results of the network

The program was to some extent able to compete with the computer opponent, but it had difficulties in certain scenarios. Especially when the whole environment changed drastically, by moving fast in one direction, or when the players overlapped each other, the network was not able to react accordingly. Another suboptimal aspect was that the program always reacted in the same way to certain moves of the opponent. This made it very predictable; against the computer this was not a problem, but a real human could exploit this weakness.


Chapter 8

Discussion

Even though the program had some weaknesses, it still achieved some victories against its opponents. By feeding it a lot of information about the game, it was possible to train a neural network that could compete with mediocre players. The program was also able to process changes on the screen and act accordingly.

There are two big weaknesses that could still be improved considerably. One is that the program only learns from the input data that has been fed to it; during the game it is not able to confirm whether what it is doing is right or wrong. To solve this, a different approach would be needed, e.g. neuroevolution. The second weakness is the image processing; there is still a lot of potential to improve it so that the program has no problems processing the game.


Bibliography

[Rojas, R. (1996)] Neural Networks: A Systematic Introduction. Berlin: Springer-Verlag

[Soares, F. M. / Souza, A. M. F. (2016)] Neural Network Programming with Java. Birmingham: Packt Publishing

[Berger, C. (2016)] Perceptrons: the most basic form of a neural network. Retrieved October 10, 2016, https://appliedgo.net/perceptron/

[OpenCV (2012)] OpenCV: Open Source Computer Vision. Retrieved October 10, 2016,http://opencv.org/about.html

[OpenCV (2011)] OpenCV: Template Matching. Retrieved October 12, 2016, http://docs.opencv.org/2.4/doc/tutorials/imgproc/histograms/template_matching/template_matching.html


"I hereby declare that this matura thesis is my original work and I did not use any unauthorized help to create it. All information from sources, aids and webpages has been depicted truthfully and was cited."
