
Automated Timetabling: Applying evolutionary algorithms to timetabling problems

This dissertation is submitted for the degree of B.Sc Software Engineering (Business Systems)

The University of Manchester

Faculty of Engineering and Physical Sciences

School of Computer Science

Final year project

May 6, 2009

Author: Andrew Wise

Supervisor: Dr. Aphrodite Galata

Abstract

This report details the creation of a generic timetabling application using genetic algorithms. Doing so required the creation of a lightweight, object-oriented genetic algorithms framework in the Java language, named the Darwin GA Framework. This framework features several architectural advancements over previous genetic algorithm libraries and allows rapid development and integration of genetic algorithms into any program.

Previous attempts at building programs that solve timetabling problems have focused on meeting the specific needs of a single domain and its set of problems. Programs such as these don’t allow for the changes in the problem specification that typically arise in real-world scenarios. Because of this, a generic timetabling application has been developed that makes use of a simple specification method for describing timetabling problems, built on top of the author's attempt at a standard model for describing timetabling problems. Previous attempts at standardizing a model for timetable problems have provided a good base for development, but the models lack the subtlety required to express the complex and detailed constraints and heuristics employed when timetables are solved using a manual process. This report details the constraints language, XML instance documents and Java program required to build a generic solver based upon this extension to previous timetable models. The program developed manages to effectively capture all the subtle rules in both the high school scenario, used as a benchmark, and other scenarios considered during development.

The complexity of these rules places extreme pressure on the genetic algorithm, as many of these rules make the problem NP-complete. For problems that are not NP-complete, the revised standard model may now allow other approaches, such as graph-based solvers, to perform more efficiently. For NP-complete problems, an extension to the genetic algorithm approach, utilizing new selection and search operators and parallelized algorithm implementations, may allow solving in less time than is required using the current simple approach. Nevertheless, the current solution manages to achieve running times significantly lower than existing manual methods, with the advantage of finding multiple valid timetables that provide alternatives to the user.


Acknowledgments

I would like to thank the following academics for their advice and guidance throughout this project:

• Dr. Aphrodite Galata

• Dr. Milan Mihajlovic

• Dr. Richard Neville


Table of Contents

Acknowledgments
List of Figures
List of Tables
List of Algorithms

1 Introduction
  1.1 Report Structure
    1.1.1 Programming Languages
  1.2 Project Structure
    1.2.1 Project Brief
    1.2.2 Project Management
  1.3 Preliminary Readings

2 Background: An Introduction to the Theory
  2.1 Timetabling Problems
    2.1.1 Types of Problem
  2.2 What are Genetic Algorithms?
    2.2.1 Genetic reproduction
    2.2.2 Natural Selection
    2.2.3 Genetic Algorithms
    2.2.4 Differences between Genetic Algorithms and Natural Systems
    2.2.5 Taking evolutionary computing further
  2.3 Existing Systems
    2.3.1 Genetic Algorithm Libraries
    2.3.2 Timetable Solvers

3 Requirements: A generic timetable solver
  3.1 System Goals
    3.1.1 Non-functional system requirements
  3.2 Example Scenario
  3.3 Functional Requirements

4 A Formal Definition: Dimensioning the problem domain
  4.1 A Model for Timetable Problems
    4.1.1 What is a timetable problem?
    4.1.2 The essence of a timetable
    4.1.3 Defining a timetable
    4.1.4 Expressing Constraints
  4.2 Genetic Algorithms
    4.2.1 Anatomy of a Genetic Algorithm
    4.2.2 Fitness Calculation
    4.2.3 Selection for Survival
    4.2.4 Selection for Crossover
    4.2.5 Crossover
    4.2.6 Mutation
    4.2.7 Inversion

5 Design and Implementation: Constructing the system
  5.1 System Design
  5.2 The Darwin GA Framework
    5.2.1 Design Goals
    5.2.2 Framework Architecture
    5.2.3 Code Branches
    5.2.4 Using the Framework
    5.2.5 Extending the Framework
  5.3 A Scheduling Agent
    5.3.1 Design
    5.3.2 Internal Model
    5.3.3 Encoding, decoding and doping a Population
    5.3.4 Fitness Calculation
    5.3.5 XML Handling

6 Evaluation
  6.1 Effectiveness of Timetabling Model
  6.2 Future Developments
    6.2.1 Darwin GA Framework
    6.2.2 Scheduling Agent
  6.3 Further Research

Bibliography

A Darwin GA Framework: Class Diagram
B Scheduling Agent: Internal Timetable Model

List of Figures

2.1 Timetabling Problem Visualization
4.1 Timetable Data Input
4.2 Phases of a Genetic Algorithm
4.3 Anatomy of a Genetic Algorithm
4.4 Single-Point Crossover
4.5 Single-point crossover: two children
5.1 System Architecture
5.2 Execution of a simple GA over a single cycle
5.3 Separation of Operator Concerns
5.4 Core Classes and Interfaces of the Darwin Framework

List of Tables

2.1 Timetabling Problem Types
4.1 Selection for Crossover Schemes
5.1 Differences between JGAP and Darwin

List of Algorithms

1 Construct a randomized population: binary genotype
2 Monte Carlo algorithm for proportional selection
3 Random N point crossover, 2 parents → 1 child
4 Random Strong Mutation

1 Introduction

I have written this report primarily to provide a clear summary of my actions throughout this project, but also to provide an introduction to genetic algorithms and timetabling problems. For those not familiar with these areas of research I have provided introductory chapters, followed by a clear narrative of the system development. The later development chapters are recommended for other software engineers or computer scientists, especially those who are interested in these research areas or in developing a system with similar functionality.

* * *

For those wishing to use this report as a basis for further development I have included some preliminary reading recommendations later in this chapter. These should give the reader a firm grounding in the principles required to develop a timetabling system using genetic algorithms.


1.1 Report Structure

At the beginning of each chapter I have provided a brief summary of its purpose for those skimming the report, followed by a slightly more in-depth appraisal of its contents for those giving it a more thorough read. Where possible I have tried to group material by topic rather than by the stage of development, allowing individual aspects of the project to be studied more completely.

1.1.1 Programming Languages

The system developed in this report has been built using version 6 of the Java programming language, although the principles used and functionality developed could equally be applied in any other language. For those attempting to follow the development in this report closely, I recommend that you are familiar with (and choose to develop in) a modern object-oriented (OO) programming language and the underlying paradigm that it follows. I have also made extensive use of generics (or parametrized¹ types, for fans of Gamma et al.) to provide compile-time type checking and a type-safe system. Consequently it would be highly advantageous to choose an equivalent language that implements generic types, unless the developer has a particular desire to refactor the large majority of the code in this report.
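As a rough illustration of the compile-time checking that generics provide, the sketch below parameterizes a fitness function by its genome type. The names GenericsSketch, FitnessFunction and BitCountFitness are hypothetical and chosen only for this example; they are not taken from the system described in this report.

```java
import java.util.ArrayList;
import java.util.List;

final class GenericsSketch {
    /** Hypothetical interface: scores a candidate whose genome has type G. */
    interface FitnessFunction<G> {
        double evaluate(G genome);
    }

    /** Counts the true bits in a boolean-array genome. */
    static final class BitCountFitness implements FitnessFunction<boolean[]> {
        @Override
        public double evaluate(boolean[] genome) {
            int count = 0;
            for (boolean bit : genome) {
                if (bit) {
                    count++;
                }
            }
            return count;
        }
    }

    /** Returns the highest-scoring genome; genome and fitness types must agree at compile time. */
    static <G> G best(List<G> population, FitnessFunction<G> fitness) {
        G fittest = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (G candidate : population) {
            double score = fitness.evaluate(candidate);
            if (score > bestScore) {
                bestScore = score;
                fittest = candidate;
            }
        }
        return fittest;
    }

    public static void main(String[] args) {
        List<boolean[]> population = new ArrayList<boolean[]>();
        population.add(new boolean[] {true, false, false});
        population.add(new boolean[] {true, true, false});
        boolean[] fittest = best(population, new BitCountFitness());
        System.out.println(new BitCountFitness().evaluate(fittest));
        // Passing a List<String> together with BitCountFitness would not compile.
    }
}
```

The point of the design is that a mismatch between genome type and fitness function is reported by the compiler rather than surfacing as a cast failure at run time.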

The programming languages used in this report are:

• Java Standard Edition (Java SE) 6

• XML

• XSLT

1.2 Project Structure

I have undertaken this project using a loosely phased, iterative approach. After an initial design I began to develop prototypes for each aspect of the system. Following the creation of working prototypes for each sub-system I began to iteratively combine elements of my prototypes into a central working system, testing extensively during integration using JUnit and a central testing scheme.

1.2.1 Project Brief

Since undertaking this project the scope and goals have changed significantly. This has been for a variety of reasons, but foremost was the need to differentiate this project from other similar projects being undertaken in the department. Initially the brief for my proposed project was as follows:

¹ Surely they should be called parameterized types?


Develop an evolutionary computing algorithm to allocate rooms to functions in a conference center

Evolutionary Computing / Genetic Algorithms Project:

The project is an application that many businesses may actually require. This is a simple case of an allocation of resources problem. The project involves developing a prototype system that has a small number of companies with differing requirements of the conference centre.

The types of requirements are:

1. Number of people attending (i.e. which rooms are viable for this event)

2. Dates event runs over (i.e. has the centre any space for the new event)

3. Support Equipment required (i.e. OHP, Video, White boards, flip charts, coffee, food etc.)

The prototype should be followed by the development of a booking system that can automatically allocate resources. The resources are allocated automatically using a Genetic Algorithm. During the project the student will be expected to comprehend the underpinning theory of Genetic Algorithms. They would also be expected to write some software for the prototype and final system.

The core principle of the project was to solve timetabling problems with evolutionary computing processes, and more specifically using genetic algorithms. To differentiate this project from others like it, I expanded the scope of the project to mirror this principle. My revised project goal was to create a system, based on a genetic algorithm, which could generically solve any instance of a timetabling problem. Clearly this introduces new elements into the system requirements, so the new subgoal of creating a coherent and comprehensive model of this class of timetabling problem was also added to my revised system brief.

1.2.2 Project Management

1.2.2.1 Development Tools

This project was developed using the NetBeans IDE[1], an integrated development environment primarily targeted at the Java platform. It provides a powerful debugger, version control integration, build support and various code completion and refactoring tools. Using a tool such as this allowed for much faster development cycles and a more robust program owing to a reduction in bugs.

1.2.2.2 Version Control

All the code and documents for this project have been managed using the Subversion version management system - http://subversion.tigris.org/


This has allowed me to effectively run prototypes of my system and develop new features in isolation, testing them and checking for conflicts before merging them with the main branch of my code. This greatly reduced the amount of time I spent doing integration testing during my project and has allowed me to trial many different variations of my code.

1.2.2.3 Licensing

Pending the completion of this project, both the framework for implementing timetabling problems and the Darwin Genetic Algorithm framework - including component libraries - will be released under the GNU General Public License version 3 (GPLv3) http://www.opensource.org/licenses/gpl-3.0.html or another equivalent license at the time of publication.

1.2.2.4 Accessing System

The source code for all elements of this project - released under the above licensing - will be available with documentation at http://labs.novasis.co.uk/ and downloadable via the Subversion version management system.

1.2.2.5 Report Typesetting

This report has been typeset using the LaTeX typesetting system, utilizing TeXnicCenter and MiKTeX, and a custom style template.

LaTeX is a document preparation system built on top of the TeX typesetting language. TeX was created by Donald E. Knuth, and LaTeX was originally written by Leslie Lamport.[2] The primary goal of the system is to allow an average user to typeset documents with a reasonable amount of effort, and many current implementations can generate PDFs, giving a cross-platform document. It provides a powerful mechanism for expressing mathematical notation and formatting large documents without extensive typesetting by hand.[2] I have also made use of BibTeX, an extension to the system which typesets bibliographic data and includes it in the document.

The LaTeX source will be available after completion of the report via the above Subversion repository.


1.3 Preliminary Readings

Readers may benefit from reading the following sources first, although the material in them is in no way a prerequisite for this report.

• Evolutionary Computation - [3] - A solid introduction to genetic algorithms and their related algorithms; anyone looking to develop the genetic algorithms in this report further should start with this book.

• Concrete Mathematics - [4] - Provides a comprehensive introduction to discrete mathematics.

• Software Engineering by I. Sommerville - Gives a good introduction to the software development process for those not familiar with the activity.

• Fundamentals of Database Systems - [5] - Provides a useful introduction to Relational Algebra, which informs the creation of the timetable model used in this report.

• Genes and DNA - [10] - Provides an excellent introduction to the natural processes genetic algorithms are based upon.


2 Background

An Introduction to the Theory

For those not familiar with timetabling problems, what follows is a brief introduction and background to the problems that will be addressed in the following chapters. There is also an overview of Evolutionary Computing techniques, including a more in-depth introduction to Genetic Algorithms, for those readers not familiar with these approaches to designing algorithms.

* * *

Much of the material on genetic algorithms that can be found in this chapter is covered in much greater depth in the suggested preliminary reading. What I have written here is a general introduction to the topic that will provide the reader with enough knowledge to comprehend the following chapters without delving into the literature. The reader should feel free to skim through sections they are familiar with.


2.1 Timetabling Problems

Timetabling problems have been an important area of research since the 1960s. They are of particular interest in the fields of operational research and artificial intelligence.[6] They are a prime example of a class of problems for which a solution is particularly difficult to compute. When a human solves such a problem they bring their own knowledge about the situation and employ several difficult-to-define heuristics in order to come up with a good (but often sub-optimal) solution. It is easy to see why the problem is such a difficult one if we consider the definition given by Burke et al.[7], which covers most cases of timetable problems:

A timetabling problem is a problem with four parameters: T, a finite set of times; R, a finite set of resources; M, a finite set of meetings; and C, a finite set of constraints. The problem is to assign times and resources to the meetings so as to satisfy the constraints as far as possible.

The set of meetings is, at its worst, the Cartesian product of the set T and the set R. Put simply, the number of possible assignments that an algorithm may have to attempt is the number of times in T multiplied by the number of resources in R. This may be drawn in entity relationship form, as in Figure 2.1. The set of meetings is limited only by the members of set C, which express constraints on the allowable meetings in M; however, in the worst case these constraints would allow all possible meetings to be placed in M and so would not reduce the complexity of the problem.
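The relationship between T, R, M and C can be sketched in a few lines of Java. This is a simplified illustration rather than the model developed later in the report: the class and method names are hypothetical, times and resources are plain strings, and each constraint is assumed to judge one meeting at a time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TimetableModelSketch {
    /** One candidate meeting: a time slot paired with a resource. */
    static final class Meeting {
        final String time;
        final String resource;

        Meeting(String time, String resource) {
            this.time = time;
            this.resource = resource;
        }
    }

    /** A member of the constraint set C: decides whether a meeting is allowed. */
    interface Constraint {
        boolean allows(Meeting meeting);
    }

    /** Enumerates T x R and keeps only the meetings that every constraint allows. */
    static List<Meeting> candidateMeetings(List<String> times, List<String> resources,
                                           List<Constraint> constraints) {
        List<Meeting> meetings = new ArrayList<Meeting>();
        for (String time : times) {
            for (String resource : resources) {
                Meeting meeting = new Meeting(time, resource);
                boolean allowed = true;
                for (Constraint constraint : constraints) {
                    if (!constraint.allows(meeting)) {
                        allowed = false;
                        break;
                    }
                }
                if (allowed) {
                    meetings.add(meeting);
                }
            }
        }
        return meetings;
    }

    public static void main(String[] args) {
        List<String> times = Arrays.asList("Mon-09", "Mon-10", "Tue-09");
        List<String> resources = Arrays.asList("Room A", "Room B");

        // With an empty constraint set every pair is allowed: |T| x |R| meetings.
        List<Constraint> none = new ArrayList<Constraint>();
        System.out.println(candidateMeetings(times, resources, none).size());

        // A constraint barring Room B on Mondays shrinks the set of meetings.
        List<Constraint> constraints = new ArrayList<Constraint>();
        constraints.add(new Constraint() {
            @Override
            public boolean allows(Meeting m) {
                return !(m.resource.equals("Room B") && m.time.startsWith("Mon"));
            }
        });
        System.out.println(candidateMeetings(times, resources, constraints).size());
    }
}
```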

If an algorithm were to find the optimum solution using a brute force approach (trying every single possibility) then it would have to try every possible combination of the meetings in M. In the worst case the algorithm would have to attempt $2^{|M|} - 1$ possibilities (one for each non-empty subset of M), and so we can begin to appreciate the potentially enormous amount of time required for all but the smallest of problems.
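To give a feel for how quickly this grows, the sketch below evaluates 2^n - 1, the number of non-empty subsets of n meetings, for a few problem sizes. Reading the worst-case figure as a count of non-empty subsets is an assumption made for this illustration, and the class and method names are hypothetical.

```java
import java.math.BigInteger;

final class SearchSpaceSketch {
    /** Number of non-empty subsets of a set containing the given number of meetings. */
    static BigInteger nonEmptySubsets(int meetings) {
        // 1 shifted left by n is 2^n; subtracting 1 excludes the empty subset.
        return BigInteger.ONE.shiftLeft(meetings).subtract(BigInteger.ONE);
    }

    public static void main(String[] args) {
        int[] sizes = {5, 10, 40, 200};
        for (int n : sizes) {
            System.out.println(n + " meetings -> " + nonEmptySubsets(n) + " combinations");
        }
    }
}
```

BigInteger is used because the count overflows a 64-bit long once the number of meetings exceeds 63.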

Figure 2.1: Timetabling Problem Visualization

2.1.1 Types of Problem

Timetable problems take many different forms and exist in numerous domains. Research has primarily focused on the timetabling problems in educational establishments, such as high schools and universities. In these domains the task of assigning classes to teaching periods, and then teachers and rooms to classes, constitutes a timetabling problem. Another example is exam timetabling, where an exam must be scheduled to a slot where all the pupils can attend it and in a room with enough capacity.

Timetabling problems also exist in other domains. One area of research that presents a different class of timetabling problem is the scheduling of train timetables. Timetabling problems such as this require a different approach, as timetable clashes cannot easily be inferred from the timetable data itself. Problems of this class often require further models to simulate the end system in order to validate timetables. Table 2.1 summarizes the types of timetabling problem.

| Problem Type | Problem Description |
| --- | --- |
| Resource Assignment | A class of timetable problem where the entire problem space is encoded by the mappings of resources to time slots |
| Resource Modeling | A class of timetable problem where the entire problem space is not encoded by the mappings of resources to time slots and additional modeling is required to ensure all constraints are satisfied |

Table 2.1: Timetabling Problem Types

Every timetabling problem is unique to its domain and has its own set of constraints that define valid solutions. Each problem may also have a different set of mappings, which define what resources are to be assigned. This has led to a vast number of tailored solutions being created for individual timetabling problems. The problem with this approach is that specialist knowledge is usually required to alter the problem definition if the domain requirements change, and this can cost significant time, money and resources.

2.2 What are Genetic Algorithms?

Genetic algorithms are members of a class of algorithms named evolutionary algorithms. Evolutionary algorithms have been developed from research into evolutionary computation methods. Evolutionary computation methods simulate biological processes to provide a powerful search and optimization paradigm, and the many classes of evolutionary computation models that have been studied are usually referred to, collectively, as evolutionary algorithms.[3]

Evolutionary Algorithms have their roots in work done by many of the pioneers in the field of machine learning. In 1950 Alan Turing raised the possibility of using genetic and evolutionary metaphors in search algorithms.[8] This was followed by Richard M. Friedberg in his 1958 paper A learning machine, where he considered the evolutionary metaphor for teaching a computer to generate programs for solving simple problems.[9]

Evolutionary Algorithms make use of randomness and genetically inspired operations to evolve from a poor solution to a strong solution.[3] Genetic algorithms are a class of algorithms that attempt to mimic biological processes very closely. The problem is encoded into a set of genetic strings, and a series of operations based upon natural genetic reproduction, such as crossover and mutation, is then used to evolve the solution candidates towards a better solution. Just as in nature, a population of individuals is usually needed to allow this breeding process to take place, and the members of the population that breed need to be chosen in some manner, a step known as selection.

2.2.1 Genetic reproduction

All life on earth reproduces by some form of genetic reproduction. Each cell of an organism contains strands of DNA called genes, which together make up the genetic code (or genome) for that organism. Genes are constructed from a sequence of 3 alleles, each of which takes one of the values A, C, G or T. Each gene encodes the construction of proteins in the body by means of mRNA - a chemical that interprets the code of a gene (e.g. ‘CGT’) and constructs a specific block of protein from it.[10] The genetic code and genes of an organism are passed on to new generations by means of genetic reproduction. The complete set of observable properties encoded in the genome of an organism is known as the organism's phenotype.[10]

Two types of genetic reproduction are commonly observed in organisms: meiosis and mitosis. In cellular mitosis a parent cell passes on all of its genetic material to its children; this is the form of reproduction used by asexual organisms. During cellular meiosis two parent cells combine their genetic material to form a new set of children with different genetic material.[10]

During replication of genetic material, in either process, a mistake is occasionally made which causes a change in the genetic material. This process is known as genetic mutation and can be beneficial or detrimental to the ability of an organism to survive. Genetic mutation allows new traits to appear in an organism which may give the organism an advantage over others. Similarly it may remove, or introduce, a trait in a way that makes the organism less able to survive than others.[11]

Cellular meiosis has the additional step of combining genetic material from a set (usually a pair) of parents to produce a child cell. This process is known as genetic crossover and, although the exact mechanism depends on the organism, has the result of taking some genetic material from each parent to create a child with a complete genome.[11]

2.2.2 Natural Selection

Neither process on its own is remarkable; however, when you introduce a third step known as selection they combine to form a very powerful optimization mechanism.

Natural Selection is a widely accepted and well known theory devised by Charles Darwin during the 19th century and presented in his seminal work: On the Origin of Species by Means of Natural Selection. Through extensive observation and experiment Darwin was able to explain how organisms adapt to their environment.

The key premises of natural selection are that (excluding minor fluctuations) an organism’s environment is stable and has finite resources available to the population, and that given an infinite amount of resource a population would continue to breed and grow exponentially ($PopSize_{new} = PopSize_{current} \times e^{growthRate}$)¹. Given that a population remains stable even though many more members are produced than the resources can support, there must be a strong competitive element within a population which results in only a small fraction of the possible new population surviving. This competition is not random but a result of hereditary traits passed to the member, and this uneven survival constitutes Natural Selection.[12]

It is not difficult to see that, when this concept is combined with genetic reproduction, a new beneficial trait allows the members that carry it, and any children that inherit it, to spread throughout the population. It is this concept that provides us with a basis for genetic algorithms.

2.2.3 Genetic Algorithms

Genetic Algorithms were developed by John Holland in the early 1960s [3; 13] whilst trying to understand the principles of designing robust adaptive systems. Holland’s method, based on natural selection and survival of the fittest, involved a set of binary strings that represented the genomes of individual members of the population. He then applied operations such as crossover, mutation and inversion to the strings before assessing their performance with a fitness function. This fitness function took the place of the environment in natural selection, determining how well a population member could perform relative to other population members.

Genetic Algorithms are predominantly used as a search or optimization technique. They draw on the metaphor that living organisms are optimizing themselves for their environment through means of natural selection. The act of optimization requires both exploration, trying new solutions, and exploitation, the improvement of current solutions. Genetic algorithms are especially adept at balancing these two dimensions.[14]
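The ingredients described above (binary strings, crossover, mutation and a fitness function) can be assembled into a very small genetic algorithm. The sketch below is illustrative only: the toy fitness function counts 1 bits, tournament selection and elitism are choices made for this example rather than part of Holland's method, inversion is omitted, and all names are hypothetical.

```java
import java.util.Random;

final class SimpleGaSketch {
    /** Toy fitness function: the number of 1 bits in the genome. */
    static int fitness(boolean[] genome) {
        int ones = 0;
        for (boolean bit : genome) {
            if (bit) ones++;
        }
        return ones;
    }

    /** Tournament selection: pick two members at random and keep the fitter one. */
    static boolean[] select(boolean[][] population, Random rng) {
        boolean[] a = population[rng.nextInt(population.length)];
        boolean[] b = population[rng.nextInt(population.length)];
        return fitness(a) >= fitness(b) ? a : b;
    }

    /** Single-point crossover: head of one parent joined to the tail of the other. */
    static boolean[] crossover(boolean[] mother, boolean[] father, Random rng) {
        int point = rng.nextInt(mother.length + 1);
        boolean[] child = new boolean[mother.length];
        for (int i = 0; i < child.length; i++) {
            child[i] = i < point ? mother[i] : father[i];
        }
        return child;
    }

    /** Mutation: flip each bit independently with the given probability. */
    static void mutate(boolean[] genome, double rate, Random rng) {
        for (int i = 0; i < genome.length; i++) {
            if (rng.nextDouble() < rate) genome[i] = !genome[i];
        }
    }

    /** Returns the member of the population with the highest fitness. */
    static boolean[] fittest(boolean[][] population) {
        boolean[] best = population[0];
        for (boolean[] member : population) {
            if (fitness(member) > fitness(best)) best = member;
        }
        return best;
    }

    /** Runs the GA and returns the fittest genome of the final generation. */
    static boolean[] evolve(long seed, int genomeLength, int populationSize, int generations) {
        Random rng = new Random(seed);
        boolean[][] population = new boolean[populationSize][genomeLength];
        for (boolean[] member : population) {
            for (int i = 0; i < genomeLength; i++) member[i] = rng.nextBoolean();
        }
        for (int g = 0; g < generations; g++) {
            boolean[][] next = new boolean[populationSize][];
            next[0] = fittest(population).clone(); // elitism: the best member survives unchanged
            for (int i = 1; i < populationSize; i++) {
                boolean[] child = crossover(select(population, rng), select(population, rng), rng);
                mutate(child, 1.0 / genomeLength, rng);
                next[i] = child;
            }
            population = next;
        }
        return fittest(population);
    }

    public static void main(String[] args) {
        boolean[] best = evolve(42L, 32, 20, 100);
        System.out.println("Best fitness: " + fitness(best) + " of 32");
    }
}
```

Crossover and selection exploit the good material already in the population, while mutation supplies the exploration; the mutation rate of one expected flip per genome is an arbitrary starting point.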

2.2.4 Differences between Genetic Algorithms and Natural Systems

Genetic Algorithms only have to store data in the genome of each candidate, as they can utilize computer memory to store encoding information - stating where each gene begins and ends. Natural systems have to store encoding information alongside the data in the genome, making use of start and stop codons - specific allele sequences that signify the start or end of a gene. In addition, natural systems have additional translation processes beyond the genome which enable or disable genes in the genome without removing the genetic material. Genetic algorithms also often use different allele values such as 1,0 (binary genome) instead of A,C,G,T (DNA).
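A minimal sketch of this idea follows: the genome is a flat binary string, and the gene boundaries live in a separate array instead of being marked inside the genome. The unsigned-integer interpretation of each gene and all names are assumptions made for the example.

```java
final class GenomeDecodingSketch {
    /**
     * Splits a flat binary genome into unsigned integer gene values.
     * geneLengths is the encoding information held outside the genome.
     */
    static int[] decode(boolean[] genome, int[] geneLengths) {
        int[] values = new int[geneLengths.length];
        int position = 0;
        for (int gene = 0; gene < geneLengths.length; gene++) {
            int value = 0;
            for (int bit = 0; bit < geneLengths[gene]; bit++) {
                // Shift in the next bit, most significant bit first.
                value = (value << 1) | (genome[position++] ? 1 : 0);
            }
            values[gene] = value;
        }
        return values;
    }

    public static void main(String[] args) {
        // Genome 101 0110, read as a 3-bit gene followed by a 4-bit gene.
        boolean[] genome = {true, false, true, false, true, true, false};
        int[] values = decode(genome, new int[] {3, 4});
        System.out.println(values[0] + ", " + values[1]);
    }
}
```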

Genetic algorithms can be written to model natural systems closely, but the benefits of modeling such a system have not been shown in research, given the performance benefits of the abstract approach currently used.

¹ Standard growth formula, derived from the exponential growth formula provided by Wolfram: http://mathworld.wolfram.com/ExponentialGrowth.html

2.2.5 Taking evolutionary computing further

Evolutionary computing methods were not developed primarily for search and optimization tasks, but for machine learning purposes. Perhaps the most obvious extension of genetic algorithms is genetic programming. In genetic algorithms the output represents the values of a set of variables that solve the problem optimally; in genetic programming the output is mapped to symbols of a programming language. As a result it is possible for a genetic process to generate programs that solve specific tasks as optimally as possible.[9]

Another interesting approach is to use evolutionary methods to train finite state machines to demonstrate intelligent behavior. In this case intelligent behavior is determined to be the ability of the machine to predict the next symbol it is going to receive after being presented with an input symbol. The fitness of a machine is measured by its accuracy, and each machine is described by a genetic string that can undergo mutation.[15]
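The fitness measure described here can be sketched as follows. The table-based machine representation, the fixed start state and the names are assumptions made for illustration; in an evolutionary setting the two tables would be flattened into the genetic string that undergoes mutation.

```java
final class FsmPredictorSketch {
    /**
     * Fraction of next symbols the machine predicts correctly.
     * nextState[s][in] is the state reached from state s on input in;
     * prediction[s][in] is the symbol the machine expects to receive next.
     */
    static double fitness(int[][] nextState, int[][] prediction, int[] input) {
        if (input.length < 2) {
            return 0.0; // nothing to predict
        }
        int state = 0; // assumed start state
        int correct = 0;
        for (int i = 0; i + 1 < input.length; i++) {
            int symbol = input[i];
            if (prediction[state][symbol] == input[i + 1]) {
                correct++;
            }
            state = nextState[state][symbol];
        }
        return (double) correct / (input.length - 1);
    }

    public static void main(String[] args) {
        // One-state machine over {0, 1} that predicts the opposite of the symbol just read.
        int[][] nextState = {{0, 0}};
        int[][] prediction = {{1, 0}};
        int[] alternating = {0, 1, 0, 1, 0, 1};
        System.out.println(fitness(nextState, prediction, alternating));
    }
}
```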

2.3 Existing Systems

2.3.1 Genetic Algorithm Libraries

Existing genetic algorithm libraries cover a wide range of programming paradigms and languages. Here I will give a brief overview of the most commonly used implementations.

1. GAlib - C++

2. GAUL - C

3. JGAP - Java

2.3.1.1 GAlib

GAlib[16] was developed by Matthew Wall at MIT and, although not available as public code, it is available at no cost. It is programmed in object-oriented C++ and builds exist on most platforms (Linux, MacOSX, SGI, Sun, HP, DEC, IBM, MacOS and DOS/Windows systems). GAlib was designed to allow easy comparison of various selection schemes, genome structures and genetic algorithms. As such it provides a set of common genetic algorithms and a range of selection schemes and data structures for constructing a genome. It lacks the extensibility required for the program developed in chapter 5 and also requires moderate user input to set up a new genetic algorithm, but the data structures it provides inform possible extensions to the Darwin GA Framework.[Chapter 5]



2.3.1.2 GAUL

GAUL[17] was developed by Stewart A. Adcock and is now available under the GNU General Public License. GAUL provides data structures, operations and extensible structures for defining custom additions to the library in the C language. The use of non-object-oriented C makes it impractical for modern programs, which are inevitably class-based, although if the program were developed in C++ it would be trivial to call the C functions in the library.

2.3.1.3 JGAP

JGAP[18] is a Java-based project led by Klaus Meffert at the community project repository SourceForge. JGAP is an object-oriented, extensible GA library designed to allow genetic algorithms to be incorporated easily into other projects. This project has heavily informed the design of the Darwin framework developed in chapter 5, though it was not used itself because it is targeted as a library rather than a framework for genetic algorithm development. It requires significant end-user input to define elements such as chromosome structures and operators. It is also a very large library and it was decided that a more slimline solution was needed for the purposes of this project. A more in-depth discussion of the differences between Darwin and JGAP can be found in chapter 5, section 5.2.1.

2.3.2 Timetable Solvers

Programs aimed at solving, resolving or aiding in the solution of timetable problems have become important tools in operational management. The majority of software on the market (at the time of writing) appears to be targeted at solving academic timetabling problems in schools and universities. The lack of mature generic problem solvers has led, in part, to the development of this project. Although it is difficult to determine the exact methods that these programs use to solve timetable problems, the fact that they handle only predetermined constraints limits their usefulness in complex scenarios and fails to capture the subjective detail of what makes one valid timetable better than another. Quite often in scenarios involving humans, as opposed to, for example, the scheduling algorithms used in operating system design, a valid solution is not necessarily a good solution. In this project we will be looking at ways of capturing these subjective heuristics to more accurately mimic the human solving methodology.

3

Requirements

A generic timetable solver

Here I shall set out the core requirements and scenario around which the system presented in the report will be built. Section 1.2.1 set out the primary goals of the project and these will be expanded upon to specify the system that needs to be built.

* * *

Specifying such a system is an inherently difficult task given the lack of a true end user. This lack of end-user involvement has made it difficult to specify requirements using standard software engineering techniques; instead I will briefly attempt to account for the underlying needs of a generic timetabling system.

3.1 System Goals

The aim of this project is to develop a generic scheduling agent that can solve unknown timetabling problems without significant user input. Previous systems have primarily been tailored to a specific timetabling domain and as such lack the flexibility required to perform in most real-world environments, where constraints and timetable details change often.

This project will aim to perform the scheduling task with a genetic algorithm; however, as there are many algorithmic approaches to solving these problems, the system should have a modular design such that the scheduling agent is capable of utilizing other algorithms in the future.

The system should be able to handle desirable timetable characteristics. Previous systems (even those tailored to domains) limit the dimensions in which a timetable can be expressed, often interpreting a good timetable as being any timetable in which all resources have been scheduled. Human users of such timetabling software often want to express subtle rules that rank one timetable over another, otherwise valid, timetable.

The system should serve as a framework on which more complete systems, with user interfaces and domain-specific components, can be built. The system should provide a basis for further development and research work in the field of timetabling.

3.1.1 Non-functional system requirements

The following list details the informal non-functional requirements (NFRs) of the system. As the system is not to include a user interface, or have any direct end user other than other development teams and researchers, it is difficult to provide comprehensive NFRs for the system. This system aims to fill a gap in current software provision and as such has no direct competition other than manual processes. Requirements for ease of use and system performance come from existing manual processes; the system should be able to fulfill all tasks currently undertaken manually and provide an equivalent solution in less time and with no significant extra training for users.

1. The system should be extensible to allow future development and integration into external applications.

2. The system should make use of Open Standards for data exchange and system interaction, where available.

3. The system should be cross-platform compatible.

4. The system should be well documented to allow for easy extension and future development.

5. The system should solve the problem more quickly than the equivalent manual method.

6. The system should require little user training to define its inputs and interpret its outputs.

3.2 Example Scenario

Once a generic scheduling agent has been constructed, it should be able to completely model real-world scenarios. As a benchmark for what constitutes a real-world scenario in the context of this problem I will set out below an extension of the common High School Timetable scenario that is seen often in the literature. This scenario has been developed through the author’s observation of the manual process in a current British secondary school.

Our scenario will focus on one facet of the school timetable: assigning classes to timetable periods and assigning teachers to classes within a science department. The department head is given a list of classes that will be attending the department for lessons in the coming year. These classes are grouped together into bands, where all classes in a band must have lessons taught in the same timetable periods. Within each band the classes are ordered by ability, with the most able class being class 1. Each class must have a fixed number of lessons in a week and some classes must have a fixed number of lessons from each subject area: Physics, Biology and Chemistry. Teachers in the department also have a maximum number of lessons they can teach. Some teachers may be part time, and so only be available for specific time slots. A number of rules are then applied to optimize the timetable for the department; these rules are listed below.

1. No class can be timetabled to the same time slot more than once

2. No teacher can teach more than one class in a given time slot

3. A class’s lessons should be spread as evenly across the week as possible, limited to 2 a day where possible

4. If a class has 2 lessons on the same day then they should be placed in adjacent time slots where possible

5. A teacher should teach a spread of abilities; preferably each teacher should have a similar spread

6. In each subject area, a class should have as few different teachers as possible

7. Where possible, a class’s teacher should be a teacher of the same subject area as the lesson

The department has 8 teachers, each of whom can teach a maximum of 8 lessons. There are 8 bands of classes in the department, 6 of which have 4 classes each and the remaining 2 have 8 classes each. Each of the first 6 bands has 3 lessons in the week; the remaining 2 have 6 lessons a week. The timetable period is one 5-day week of 25 lessons, 5 lessons a day.

3.3 Functional Requirements

A complete system should:

1. Make use of genetic algorithms to solve the timetabling problems

2. Allow for the expression of constraints on the timetable

3. Allow for the input of timetable elements, such as classes and bands

4. Allow for model elements to have parameters

5. Allow for expressing constraints on element parameters

6. Allow for loading timetable data, such as the number of classes, class names, etc., into the program

7. Be capable of using existing timetables as a starting point

8. Be capable of changing the solving algorithm to ‘tune’ the program for better performance

9. Be capable of reading in constraints data from the user, without need for compilation

4

A Formal Definition

Dimensioning the problem domain

In this chapter I will attempt to provide a more formal and complete definition for this class of timetabling problems. This will inform the development of a generic problem-solving application in the following chapters. The aim of this formal model will be to constrain the many possible problem dimensions to the minimum set of aspects that describe the problem space. This chapter will also provide design and implementation details for the model and algorithm sections of the project, presented in a modular fashion rather than as separate units of work. Each module will contain both a conceptual and a language-specific view of the design and implementation so as to be useful to all readers.

* * *

During implementation this model will provide a structure with which to define new timetabling problems before they are submitted to the scheduling agent. It will also form the basis for the design of the scheduling agent [see chapter 5], providing a standard input and output format in addition to providing implicit rules that the agent must follow.

4.1 A Model for Timetable Problems

What follows is an attempt at constructing a comprehensive model for defining and manipulating Resource Assignment problems (Table 2.1). This model will provide a structure for defining input and output from the final system in addition to providing a standard internal data structure to allow cooperation between system components.

4.1.1 What is a timetable problem?

Timetabling problems were described by Burke et al.[7] as

a problem with four parameters: T, a finite set of times; R, a finite set of resources; M, a finite set of meetings; and C, a finite set of constraints. The problem is to assign times and resources to the meetings so as to satisfy the constraints as far as possible.

This provides us with an excellent starting place in our exploration of timetabling problems. In essence, a timetable using Burke’s model can be described by a set of couples M. The only limit on the members of set M is the set of rules defined in C.

$$M \mapsto \{C(\langle T : R \rangle)\}$$

4.1.1.1 The Timetable Model

So what do these sets actually represent? To delve a little deeper into timetabling an example is needed, and for this we shall look at a simple high school scenario.

A Simple High School Scenario

Our task is as follows: we are running a high school department and we have been given a list of what groups we will be teaching in each timetable period over a week. We have to assign teachers in the department to each of the periods on our list so that all groups will have a teacher when they have a lesson scheduled.

Our task appears simple, so let’s map the ideas to Burke’s model of a timetable to gain a little clarity. The members of T in Burke’s model are the periods in our weekly timetable: with 5 periods in a day and 5 days in our week, there are 25 periods. The resources (R) that we are attempting to schedule in this situation are the teachers in our department, of which we have 8, known as A, B, C, D, E, F, G, H. Our meetings (M) are the classes that are coming to the department for lessons.

For now we will consider these facts and keep in mind that we have some constraints to apply later. The data we are receiving from the school’s central office provides us with M and the mappings of T to M, so we currently know

$$M \mapsto \{C(\langle T : ? \rangle)\}$$

Clearly to complete the timetable we must provide the mapping for R to M .

Now we must consider our constraints:

1. Every Class must have a teacher

2. Every teacher must only have one class per time period

The first constraint is enforced by the structure of our model, as every meeting is constructed from a couple $\langle T : R \rangle$. The second constraint is given by our function C, which requires that the couples $\langle T : R \rangle$ be unique: two identical couples would indicate that a single teacher has been assigned to the same time period more than once. So we can see that this simple model meets all our needs for this scenario; it can accommodate all our resources, times, classes and constraints:

$$M \mapsto \{C(\langle \{1, \ldots, 25\} : \{A, \ldots, H\} \rangle)\}$$

This scenario introduces several common themes in timetable problems so it seems sensible to give them names. Constraint 1 gives us the

Completeness Constraint: every instance of M receives a value. (Burke defines this as every time slot receiving a value; this redefinition gives us more flexibility.)[7]

Constraint 2 gives us the

No-Clashes Constraint: no resource may participate in the same time slot more than once.[7]

4.1.1.2 Constraints

In this scenario, both of these constraints are fundamental: if either of these constraints is not met then the solution is not valid. Constraints with this property are named hard constraints[7] and define the feasible solutions to a scenario.

S is the set of all solutions to a given timetabling problem. For each hard constraint there exists a binary-valued function $h : S \to \{0, 1\}$ defined for each solution $w \in S$ by

$$h(w) = \begin{cases} 1, & \text{if } w \text{ does not satisfy the constraint} \\ 0, & \text{otherwise} \end{cases}$$

A feasible solution is any solution $w \in S$ that satisfies all the hard constraints, that is, $h(w) = 0$ for all $h$.[7]
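The definition of a hard constraint as a binary-valued function can be sketched in Java as follows. This is only an illustrative sketch: the names `Assignment`, `HardConstraint` and `HardConstraints` are hypothetical and are not taken from the program described in chapter 5. It encodes the no-clashes constraint from the simple scenario and a feasibility test that requires every constraint to return 0.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** One couple <T : R>: a time period and the teacher assigned to it (hypothetical type). */
final class Assignment {
    final int period;
    final String teacher;

    Assignment(int period, String teacher) {
        this.period = period;
        this.teacher = teacher;
    }
}

/** A hard constraint h: returns 1 if the solution violates it, 0 otherwise. */
interface HardConstraint {
    int violated(List<Assignment> solution);
}

final class HardConstraints {
    /** No-clashes: no teacher may appear twice in the same period. */
    static final HardConstraint NO_CLASHES = solution -> {
        Set<String> seen = new HashSet<>();
        for (Assignment a : solution) {
            // add() returns false when this (period, teacher) pair was already seen
            if (!seen.add(a.period + ":" + a.teacher)) {
                return 1;
            }
        }
        return 0;
    };

    /** A solution is feasible when h(w) = 0 for every hard constraint h. */
    static boolean isFeasible(List<Assignment> solution, List<HardConstraint> constraints) {
        for (HardConstraint h : constraints) {
            if (h.violated(solution) != 0) {
                return false;
            }
        }
        return true;
    }
}
```

Keeping each constraint as a separate function mirrors the formal definition and lets new hard constraints be added to the list without changing the feasibility test.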

To develop our model we will now extend the scenario. The 3 steps we will cover next are

1. Assigning rooms to classes in our department

2. Handling part-time teachers

3. Giving (where possible) a class the same teacher for all their lessons

Assigning rooms to the classes in our department under Burke’s model requires a simple extension: our couple with the class properties becomes a triple.

$$M \mapsto \{C(\langle T : R_T : R_R \rangle)\}$$

Where $R_T$ represents the set of teachers and $R_R$ is the set of rooms.

We have 8 rooms in our department, labeled a, b, c, d, e, f, g, h. We will also make the assumption that the rooms are available for all time slots in T. Making this assumption allows us to reason about rooms in the same manner as teachers, with constraints being applied in a similar manner. Dropping this assumption and working with part-time rooms would allow us to treat them like part-time teachers; however, introducing this property to resources changes the complexity of the problem.

Introducing part-time teachers (or other part-time resources) into such a scenario has been the subject of previous research. The presence of these Time-Constrained Resources in a problem has been shown to make a problem NP-complete.[19] Such problems can be expressed using our current model by using our set of constraints and associated constraining function C. This function would limit valid triples to those where $R_T$, or $R_R$ for time-constrained rooms, was only assigned with values of T within a valid range. Such constraints could be rewritten directly as valid triples, as in the following example.

Teacher A may only work on a Monday, which is periods 1 to 5.

$$M \mapsto \{C(\langle \{1, \ldots, 5\} : A \rangle),\ C(\langle T : \{B, \ldots, H\} \rangle)\}$$

Clearly for more complex problems, especially where additional resources are introduced, this notation becomes impractical, but it provides a useful clarification of what values are valid, which is especially handy for implementation.
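One way such a time-constrained resource might be handled in an implementation is sketched below. The class `Availability` and its methods are hypothetical names introduced for illustration; the sketch simply records the valid periods for each restricted teacher and treats unrestricted teachers as available for all of T.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class Availability {
    // teacher label -> periods in which that teacher may be assigned;
    // teachers with no entry are treated as available in every period
    private final Map<String, Set<Integer>> allowed = new HashMap<>();

    /** Restricts a teacher to the inclusive period range [first, last]. */
    void restrict(String teacher, int first, int last) {
        Set<Integer> periods = new HashSet<>();
        for (int p = first; p <= last; p++) {
            periods.add(p);
        }
        allowed.put(teacher, periods);
    }

    /** True when the couple <period : teacher> is permitted. */
    boolean permits(int period, String teacher) {
        Set<Integer> periods = allowed.get(teacher);
        return periods == null || periods.contains(period);
    }
}
```

For the example above, calling `restrict("A", 1, 5)` makes `permits(3, "A")` true and `permits(6, "A")` false, while teachers B to H remain valid for every period.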

Thus far we have considered all lessons in our department as unique: if a class has more than one lesson then it is represented by additional meetings in the set M. What our model does not allow for is a set of related meetings, as would be required if we were to express constraints across all lessons for a single class. It is clear that each $m \in M$ has its own set of properties describing that meeting, so it makes sense to adapt our model as follows.

A Timetable $\mathcal{T}$ is defined as

$$\mathcal{T} = \{C(\langle T : R_T : R_R : M \rangle)\}$$

Where M is a set of classes.

When stating our model like this, it becomes clear that all classes in M can be reasoned about similarly to all resources in $R_T$ and $R_R$. A resource can be considered to be any assignable entity which has a set of properties and a label. A class, in this respect, is no different from a teacher; M is, in fact, a special case of a resource.

In this scenario, M and T are provided for us so this can be considered to be a partially solved timetable. Our task therefore is to complete the mapping for the remaining resources.

Before we leave the construction of our model we need to return to constraints. Our final constraint is subtly different from those we have come across so far: we are attempting to give a class the same teacher for all their lessons ‘where possible’. This is less of a constraint, more a direction for optimization. Constraints of this class are usually referred to as

Soft Constraints: constraints that are desirable but not included in the minimum specification for constructing a feasible timetable.[7]

Often a soft constraint will be partnered with a hard constraint to provide additional detail in the solution.

For example:

Soft Constraint: Give (where possible) a class the same teacher for all their lessons

Hard Constraint: No class should have more than 3 teachers

4.1.2 The essence of a timetable

We now have a more complete model for describing timetable problems:

A timetable problem is a problem with 5 parameters which constructs a timetable $\mathcal{T}$: $T$, a finite set of indivisible time periods; $C$, a finite set of constraints that limit dimensions of the timetable (Hard Constraints); $O$, a finite set of optimization rules that score the performance of a solution (Soft Constraints); $R$, a finite set of resource types $R_i \ldots R_n$ that are sets of assignable timetable elements. The problem is to assign members of subtypes in $R$ to members of $T$, satisfying the set of constraints $C$ and maximizing $O$.

This gives the underlying principles for describing timetables, but in real life a timetable may have many perspectives. In our high school scenario, once we have solved the timetable problem, we will print off a series of timetables for different purposes. These purposes are:

1. Staff timetables: A timetable for each member of staff stating which rooms and which classes they are teaching by time period.

2. Pupil timetables: A timetable for each pupil being taught in the department stating which rooms and teachers they will have by time period.

3. Room timetables: A timetable for each room in the department stating which teachers and classes are in the room by time period.

Each of these timetables can be extracted from our model, each one constituting a different view on our model.[20] Each view comprises a restriction and projection of our model[5]; that is, if we write each entry in our timetable as a row in a table, with each type in a different column, we select only the rows and columns we need around a focus element, such as a specific teacher or room. The number of views available on our timetable is constrained by the number of unique elements that are members of it: each member may have its own view by restricting¹ the model on that element and then projecting on the member and any other components that are to be included in the timetable. It is also possible to transform between different views of the data.[20]
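The restriction and projection that make up a view can be sketched in a few lines of Java. This is a minimal illustration under assumed conventions: each timetable entry is held as a `String[]` row, and the column order (0 = period, 1 = teacher, 2 = class, 3 = room) and the class name `TimetableView` are choices made for this sketch only.

```java
import java.util.ArrayList;
import java.util.List;

final class TimetableView {
    /**
     * Restricts the rows to those whose focusColumn equals focusValue, then
     * projects each surviving row onto the requested columns.
     */
    static List<List<String>> view(List<String[]> rows, int focusColumn,
                                   String focusValue, int... keepColumns) {
        List<List<String>> result = new ArrayList<>();
        for (String[] row : rows) {
            if (!row[focusColumn].equals(focusValue)) {
                continue; // restriction (selection): drop rows for other elements
            }
            List<String> projected = new ArrayList<>();
            for (int c : keepColumns) {
                projected.add(row[c]); // projection: keep only the wanted columns
            }
            result.add(projected);
        }
        return result;
    }
}
```

A staff timetable for teacher A would then be `view(rows, 1, "A", 0, 2, 3)`, giving the period, class and room of each of that teacher's lessons; a room timetable pivots on column 3 in the same way.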

4.1.3 Defining a timetable

The purpose of constructing this generic model for timetabling problems is to allow easy specification and modification of timetabling problems for use in a generic solver. It is intended that this should be implementable as part of a larger application and as such a standard input and output mechanism for the framework should be established.

¹ also known as selection

Due to its prevalence and widespread use in web technologies, the Extensible Markup Language (XML) has become very popular as a language for representing semi-structured data. This makes it a useful language to use when defining input and expressing output from the system. Even if the user of the framework is not familiar with XML, it is easily manipulated and so 3rd party developers should find it easy to provide interfaces to manipulate the XML documents required to express the timetable problem.[21]

Our model requires two XML files as input: first, a definition file which provides the definitions for subtypes of R in our model; second, an instance file which provides instances of each of the resource types expressed in the first file. This two-file approach in conjunction with XML allows for powerful data validation by utilizing the built-in properties of an XML parser. Processing the first file of definitions allows us to construct an XML Schema to which the second must conform. An XML Schema is an XML-formatted document that can be used to limit the semantic structure of an XML document, allowing validation checks to be performed.[22] When processing the second file, the XML Schema can be provided to the parser, which will validate the instance file as it is read. This process is shown in figure 4.1 and negates the need for building a validation process into our framework. Separating the two files also allows different instance data files to use a common definition file. Grobner et al.[20] use a single XML file for describing both the timetable form and data in their paper on a standard framework for timetabling problems. This method requires the timetable model description to be written into every instance file, meaning no two instance files can be guaranteed to be describing the same timetabling problem without additional checking.
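The validation step described above can be sketched with the standard `javax.xml.validation` API that ships with the JDK. This is an illustrative sketch rather than the project's own code: the class `InstanceValidator` and the small schema held in `SCHEMA` are assumptions made for the example, and the schema covers only a single resource type.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class InstanceValidator {
    // Hypothetical, deliberately small instance schema with one resource type.
    static final String SCHEMA =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='timetable'><xs:complexType><xs:sequence>"
        + "<xs:element name='class' type='classType' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "<xs:complexType name='classType'><xs:all>"
        + "<xs:element name='label' type='xs:string'/>"
        + "<xs:element name='size' type='xs:integer'/>"
        + "</xs:all></xs:complexType>"
        + "</xs:schema>";

    /** Returns true when instanceXml conforms to the XML Schema in schemaXml. */
    static boolean isValid(String schemaXml, String instanceXml) {
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(schemaXml)));
            Validator validator = schema.newValidator();
            // With no error handler installed, a validation error is thrown as a SAXException.
            validator.validate(new StreamSource(new StringReader(instanceXml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed XML, or the instance violates the schema
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the parser performs the check, the framework itself needs no hand-written validation code; an instance whose `size` is not an integer is rejected before any timetable objects are built.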

4.1.3.1 XML Data Format

The model definition file needs to provide a description of all resources in a problem, and all the properties of those resources. To do this an XML document is written to conform to the following schema.

Listing 4.1: Timetable input: Model description XML Schema

<xs:element name="timetable">
  <xs:complexType>
    <xs:element name="resource" type="resourceType"
                maxOccurs="unbounded"/>
  </xs:complexType>
</xs:element>

<xs:complexType name="resourceType">
  <xs:all>
    <xs:attribute name="name" type="xs:string" />
    <xs:element name="property" type="propertyType"
                maxOccurs="unbounded" />
  </xs:all>
</xs:complexType>

<xs:complexType name="propertyType">
  <xs:all>
    <xs:attribute name="type" type="" />
    <xs:attribute name="name" type="xs:string" />
  </xs:all>
</xs:complexType>

Figure 4.1: Timetable Data Input

Resources can be defined in this schema as in the following example.

Listing 4.2: Timetable input: Model description XML

<timetable>
  <resource name="class">
    <property type="xs:string">label</property>
    <property type="xs:integer">size</property>
  </resource>
  <resource name="room">
    <property type="xs:string">label</property>
    <property type="xs:integer">size</property>
  </resource>
  ...
</timetable>

The resulting schema that can be constructed from this is shown below and defines the required structure of all instance files.

Listing 4.3: Timetable input: Instance XML Schema

<xs:element name="timetable">
  <xs:complexType>
    <xs:all>
      <xs:element name="class" type="classType"
                  minOccurs="1" maxOccurs="unbounded"/>
      <xs:element name="room" type="roomType"
                  minOccurs="1" maxOccurs="unbounded"/>
    </xs:all>
  </xs:complexType>
</xs:element>

<xs:complexType name="classType">
  <xs:all>
    <xs:element name="label" type="xs:string"/>
    <xs:element name="size" type="xs:integer"/>
  </xs:all>
</xs:complexType>

<xs:complexType name="roomType">
  <xs:all>
    <xs:element name="label" type="xs:string"/>
    <xs:element name="size" type="xs:integer"/>
  </xs:all>
</xs:complexType>

All XML files that meet this schema are valid instance files for this timetable problem. This separation of concerns allows the instance data to be redefined freely without concern for the description of the timetable problem. Listing 4.4 gives an example of a valid instance file for the above schema.

Listing 4.4: Timetable input: Instance XML example

<timetable>
  <class>
    <label>9a1</label>
    <size>29</size>
  </class>
  <class>
    <label>9a2</label>
    <size>27</size>
  </class>
  ...
  <room>
    <label>a</label>
    <size>32</size>
  </room>
  <room>
    <label>b</label>
    <size>30</size>
  </room>
  ...
</timetable>

4.1.3.2 Defining Output

The output can also be described using XML. To output the complete timetable solution the system can almost write out the model verbatim thanks to the structure of XML.

Listing 4.5: Full timetable output: XML example

<timetable>
  <entry timeperiod="4" teacher="A" class="5" room="f"/>
  <entry timeperiod="2" teacher="C" class="2" room="a"/>
  <entry timeperiod="8" teacher="C" class="3" room="e"/>
  ...
</timetable>

To obtain a more meaningful data structure, and generate views on the data, this XML file can then be further processed using XQuery[23]. XQuery, a standard maintained by the W3C, allows the structured querying and manipulation of XML documents[23] in a manner not dissimilar to the manipulation of the relational model using the Structured Query Language (SQL).[5] As such it allows the projection and restriction operations required to construct a view of the timetable. An XQuery-based implementation of view creation allows much greater flexibility than wrapping this functionality within the framework, as it also allows implementors to convert the XML to a local format in the same step.
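The JDK does not include an XQuery processor, so the sketch below substitutes XPath (from the standard `javax.xml.xpath` package) to show the same restriction and projection on the output format. It is an illustration only, not the XQuery approach itself: the class `TeacherView`, the method `entriesFor` and the "period:class:room" string format are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class TeacherView {
    /** Returns "period:class:room" strings for every entry taught by the given teacher. */
    static List<String> entriesFor(String timetableXml, String teacher) {
        // Restriction: keep only <entry> elements whose teacher attribute matches.
        // The teacher label is assumed to contain no quote characters.
        String expr = "/timetable/entry[@teacher='" + teacher + "']";
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expr, new InputSource(new StringReader(timetableXml)),
                    XPathConstants.NODESET);
            List<String> view = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                // Projection: keep only the attributes this view needs.
                view.add(e.getAttribute("timeperiod") + ":"
                        + e.getAttribute("class") + ":"
                        + e.getAttribute("room"));
            }
            return view;
        } catch (XPathExpressionException ex) {
            throw new IllegalArgumentException("Could not evaluate view", ex);
        }
    }
}
```

An XQuery implementation would express the same predicate in its `where` clause and could additionally reshape the result into a local format, which is the flexibility the text refers to.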

4.1.4 Expressing Constraints

Expressing constraints on the model is arguably the most complex part of defining a timetable problem. We need to be able to express both hard and soft constraints in a rigorous manner that can be easily interpreted by both the scheduling agent and the user who defines the problem. In order to express constraints on our model, we require two things:

• A standardized extension to our model that allows the expression of constraints

• A standardized language for expressing constraints on the extension

4.1.4.1 Constraints Extension

Constraints are often expressed on a view of our timetable model. But in order to effectively express a constraint we need to introduce an orthogonal perspective of our data. Considering our school model once more, with some sample data:

$$\mathcal{T} = \{C(\langle T : R_T : R_R : M \rangle)\}$$

$$\mathcal{T} = \begin{pmatrix} 1 & A & 1 & a \\ 3 & C & 1 & b \\ 4 & A & 3 & c \\ 1 & G & 2 & d \end{pmatrix}$$

Each row in the matrix represents a tuple in our timetable set. Implicit in the set implementation is that each tuple must be unique; applying the no-clashes constraint also states that no resource may appear in more than one tuple with the same value of T. However, when examining the data in this form it becomes clear that an orthogonal perspective exists for the data: each column of the matrix gives us an unordered list of all assignments for a resource in our model, labeled here as the extent of a resource. For ease of notation each extent is referred to in the form $\mathcal{T}_{extentLabel}$; for example, the extent of all classes in our model would be $\mathcal{T}_M$.

The importance of extents in our model is clear from an example. Taking the high school constraint,

Giving (where possible) a class the same teacher for all their lessons.

This natural language constraint can be formalized on our model as

for each $m \in M$, minimize the number of unique $T$ (teachers) in the view $(m, \mathcal{T}_T)$

or

$$\forall m \in M, \text{ where } \mathcal{T}' = \{\mathcal{T} \mid \mathcal{T}_M = m\}: \quad \min |\mathrm{unique}(\mathcal{T}'_T)|$$

Extents allow us to retrieve all assignments of a resource type, where the type is assigned to the same tuple as the element about which our view pivots.

Hard constraints can be expressed in a similar way, by imposing a limit rather thana direction in our constraint.
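The teacher-minimization constraint formalized above, and its hard-constraint counterpart, can be sketched in Java as follows. The class `TeacherSpread`, the `String[]` tuple representation and the column positions are assumptions made for this sketch; they are not the representation used by the program in chapter 5.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class TeacherSpread {
    // Column positions in each tuple (an assumed layout for this sketch).
    static final int TEACHER = 1;
    static final int CLASS = 2;

    /** Number of distinct teachers assigned to one class: |unique(T'_T)|. */
    static int distinctTeachers(List<String[]> timetable, String classLabel) {
        Set<String> teachers = new HashSet<>();
        for (String[] tuple : timetable) {
            if (tuple[CLASS].equals(classLabel)) { // restrict to tuples where T_M = m
                teachers.add(tuple[TEACHER]);      // teacher extent, duplicates removed
            }
        }
        return teachers.size();
    }

    /** Soft constraint: total distinct teachers over all classes; lower is better. */
    static int softScore(List<String[]> timetable) {
        int total = 0;
        for (String m : classes(timetable)) {
            total += distinctTeachers(timetable, m);
        }
        return total;
    }

    /** Hard variant: imposes a limit instead of a direction. */
    static boolean withinLimit(List<String[]> timetable, int maxTeachers) {
        for (String m : classes(timetable)) {
            if (distinctTeachers(timetable, m) > maxTeachers) {
                return false;
            }
        }
        return true;
    }

    /** The set of class labels that appear in the timetable. */
    private static Set<String> classes(List<String[]> timetable) {
        Set<String> result = new HashSet<>();
        for (String[] tuple : timetable) {
            result.add(tuple[CLASS]);
        }
        return result;
    }
}
```

The soft and hard forms share the same extent calculation and differ only in how the result is used: `softScore` gives the optimizer a quantity to minimize, whereas `withinLimit` returns a yes-or-no answer such as "no class has more than 3 teachers".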

4.1.4.2 A Constraints Language

From our model of constraints we can devise a set of features that any language used to describe constraints on the timetable model must possess.

Language Features:

1. Ability to define timetable views; implicitly, the language must be able to perform selection and projection on our model

2. Ability to define resource extents

3. Ability to express limits over extents

4. Ability to express optimization directions over extents

5. Ability to access resource properties

Expressing hard constraints is a relatively straightforward process. As a hard constraint is always expressed as a limit we can predict the functional elements required to fully express them. The functional elements we require are:

x < y

x ≤ y

x = y

x ≥ y

x > y

Each of these comparisons returns a binary result, as is required by our model of hard constraints. Feature 5 allows us to impose non-literal hard constraints by accessing the properties of elements in our model. This allows us to make statements such as

$$\forall m \in M, \text{ where } \mathcal{T}' = \{\mathcal{T} \mid \mathcal{T}_M = m\}: \quad \forall R \in \mathcal{T}'_R,\ m.\mathit{size} \le R.\mathit{size}$$

As the extent of a resource is a list of all occurrences, two more functions are also required to allow all hard constraints to be expressed.

$$\mathrm{unique}(\mathcal{T}_i)$$

$$\mathrm{count}(c, \mathcal{T}_i)$$

The unique function returns the extent as a set, removing duplicate occurrences of elements in the extent. The count function returns an integer $n$ with $0 \le n \le |\mathcal{T}_i|$, which is the number of occurrences of $c$ in the extent $\mathcal{T}_i$. This allows us to limit the number of times a resource is assigned in an extent.

Expressing soft constraints is considerably more complex. Whilst we can predict common functionality such as maximize x over y and minimize x over y, a soft constraint can be expressed in many different ways. To allow for this, normal programming constructs such as conditional statements should be provided to allow for complex constraint definitions.
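The two extent functions are simple enough to sketch directly. The class name `Extents` and the helper `atMost` are hypothetical; an extent is assumed here to be held as a `List`, since it is an unordered list that may contain duplicates.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class Extents {
    /** unique(T_i): the extent as a set, duplicates removed (first-seen order kept). */
    static <E> Set<E> unique(List<E> extent) {
        return new LinkedHashSet<>(extent);
    }

    /** count(c, T_i): occurrences of c in the extent; always between 0 and extent.size(). */
    static <E> int count(E c, List<E> extent) {
        int n = 0;
        for (E e : extent) {
            if (e.equals(c)) {
                n++;
            }
        }
        return n;
    }

    /** A hard constraint of the form count(c, T_i) <= limit. */
    static <E> boolean atMost(E c, List<E> extent, int limit) {
        return count(c, extent) <= limit;
    }
}
```

With the teacher extent A, C, A, G from the sample data, `count("A", extent)` is 2 and `unique(extent)` has 3 members; a rule such as "each teacher can teach a maximum of 8 lessons" becomes `atMost(teacher, extent, 8)` for every teacher.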

In the program introduced in chapter 5, constraints are handled with a Java implementation. Ideally a language and parser should be designed[20], either generating a Java implementation or being interpreted by the Java framework, but it is beyond the timescale of this project to do so.

4.2 Genetic Algorithms

A genetic algorithm has no fixed design; the term instead refers to a set of algorithms that follow a common metaphor [section 2.2]. To introduce the concepts of a genetic algorithm I have laid out a ‘typical’ algorithm in this section. The ideas explored here will form the basis of the genetic algorithm framework constructed in chapter 5.

4.2.1 Anatomy of a Genetic Algorithm

A genetic algorithm typically has two phases: 1) Survival of Current Generation and 2) Construction of New Generation. It is not unusual for the operations that make up these phases to change depending on the application; however, these phases always exist in a genetic algorithm and in the stated order.

Figure 4.2: Phases of a Genetic Algorithm

A genetic algorithm may have no clear exit condition, and as such may continue to cycle indefinitely: in practice, limits are placed on the algorithm to ensure that it exits when an acceptable solution is found. Examples of such limits include

• Exiting after a preset number of cycles

• Exiting after the solution reaches a predetermined fitness threshold

• Exiting once the algorithm has converged on a solution and the diversity of the population has dropped below a predetermined threshold value.
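The three exit limits listed above can be combined into a single stopping test, sketched below. The class `Termination` and its parameters are hypothetical, and the sketch assumes that higher fitness is better and that diversity is supplied as a single number by some measure chosen elsewhere.

```java
final class Termination {
    final int maxCycles;
    final double fitnessThreshold;
    final double minDiversity;

    Termination(int maxCycles, double fitnessThreshold, double minDiversity) {
        this.maxCycles = maxCycles;
        this.fitnessThreshold = fitnessThreshold;
        this.minDiversity = minDiversity;
    }

    /** True when any one of the three limits has been reached. */
    boolean shouldStop(int cycle, double bestFitness, double diversity) {
        return cycle >= maxCycles                 // preset number of cycles
                || bestFitness >= fitnessThreshold // fitness threshold reached
                || diversity < minDiversity;       // population has converged
    }
}
```

Combining the limits with a logical OR guarantees that the algorithm exits even when no acceptable solution is found, because the cycle count always increases.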

4.2.1.1 Constructing an Initial Population

A genetic algorithm requires a population of genomes on which to perform operations; as such, the first step in a genetic algorithm is always to construct an initial population. In most scenarios it is appropriate to create a population with randomized genomes (Algorithm 1). In scenarios where there is preexisting data, the initial population can be doped with known solutions, allowing the algorithm to perform a local search around known solutions for other optima. In order to dope a population, a known solution is encoded into a genome, then inserted into the population such that, during selection, it is highly likely to undergo crossover. In most selection schemes this requires that a significant proportion of the candidates in the population have the doped genome, although the exact proportion is dictated by how closely around the existing solution the user wants the search to be run. The higher the proportion of doped genomes to randomized genomes, the tighter the search will be.

    Require: Pn, L ∈ Z, with Pn > 0 and L > 0
    Ensure: population.length = Pn
    Ensure: for all c in population: c.length = L
    population ⇐ LIST
    for i = 1 to Pn do {Create population of Pn candidates}
        c ⇐ BLANKSTRING
        for j = 1 to L do {Create random genome}
            RandomGene ⇐ random integer from the set {0, 1}
            append RandomGene to c
        end for
        append c to population
    end for
    return population

Algorithm 1: Construct a randomized population: binary genotype
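The construction of a randomized population, and the doping described above, can be sketched in Java as follows. This is an illustration under stated assumptions rather than the code used in the project: genomes are assumed to be binary strings, the class and method names are hypothetical, and the doped fraction is simply a parameter chosen by the user.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical helper for building an initial population of binary genomes. */
final class InitialPopulation {
    private InitialPopulation() {}

    /** Builds one random binary genome of the given length. */
    static String randomGenome(int length, Random rng) {
        StringBuilder genome = new StringBuilder(length);
        for (int j = 0; j < length; j++) {
            genome.append(rng.nextInt(2)); // appends 0 or 1
        }
        return genome.toString();
    }

    /**
     * Builds a population of the given size. If knownSolution is non-null,
     * roughly dopedFraction of the members are copies of it; the remainder
     * are randomized.
     */
    static List<String> create(int size, int length, String knownSolution,
                               double dopedFraction, Random rng) {
        int dopedCount = (knownSolution == null) ? 0 : (int) Math.round(size * dopedFraction);
        List<String> population = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            population.add(i < dopedCount ? knownSolution : randomGenome(length, rng));
        }
        return population;
    }
}
```

Raising `dopedFraction` corresponds to the tighter search described in the text; passing `null` as the known solution gives a purely randomized population.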

4.2.1.2 Filling out the algorithm

Within each phase of the algorithm a number of distinct operations are employed to progress the state of the algorithm. Figure 4.3 shows a typical implementation of a genetic algorithm; each phase is terminated when the algorithm returns to a stable population. The three populations shown in the diagram are:

1. Population: The parent population of this generation

2. Breeding Pool: The population that survives to breed

3. New Generation: The population at the end of this generation, containing new child individuals and any surviving parents. This population becomes the parent population in the next generation.

Figure 4.3: Anatomy of a Genetic Algorithm

The operations that comprise each phase in the diagram are fairly typical of a general purpose genetic algorithm and are discussed in more detail in the following sections.



Aside: Eagle-eyed readers will spot the existence of a third phase in the diagram. The algorithm makes a third transition between stable populations as it moves to the next cycle. This phase can be given operations which allow additional selection to take place: algorithms where candidates can have a lifespan of more than one cycle would introduce additional operators here.

4.2.2 Fitness Calculation

In nature, the environment plays an evaluatory role on a population, judging each organism against all other organisms to determine which ones will survive and which will breed. Fitness calculation aims to complete this task computationally on the members of the algorithm’s population: each candidate is judged against a set of criteria and assigned a fitness value. The fitness value of each candidate represents how well the candidate meets the criteria it is tested against, although the exact calculation is left open to the implementor. Typically the score against each criterion will be normalized and then a scaling factor applied to it so that each criterion has a predictable effect on the resulting score. The entire score may then be transformed using some function to provide a greater differential between similarly scored individuals.
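One possible reading of the normalize, scale and transform steps is sketched below. None of this is prescribed by the text: the linear normalization, the weighted average and the power transform are assumptions chosen for illustration, and all names are hypothetical.

```java
/** Hypothetical fitness helpers: normalize each criterion, weight it, then transform. */
final class FitnessSketch {
    private FitnessSketch() {}

    /**
     * Maps a raw score onto [0, 1], where 'worst' maps to 0 and 'best' maps to 1.
     * 'best' may be smaller than 'worst', e.g. when counting timetable clashes.
     */
    static double normalize(double raw, double worst, double best) {
        double scaled = (raw - worst) / (best - worst);
        return Math.max(0.0, Math.min(1.0, scaled));
    }

    /** Weighted average of normalized scores, so the result stays within [0, 1]. */
    static double combine(double[] normalized, double[] weights) {
        double total = 0.0;
        double weightSum = 0.0;
        for (int i = 0; i < normalized.length; i++) {
            total += normalized[i] * weights[i];
            weightSum += weights[i];
        }
        return total / weightSum;
    }

    /** Power transform: an exponent above 1 stretches differences among high scores. */
    static double sharpen(double fitness, double exponent) {
        return Math.pow(fitness, exponent);
    }
}
```

For example, a candidate with 2 clashes out of a worst case of 10 would score `normalize(2, 10, 0)`, which is 0.8 before weighting.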

4.2.3 Selection for Survival

Selection operators allow us to determine which members of the population will enter the pool of available candidates to breed. They take the fitness of the candidates and convert it into a probability that the candidate will breed to form the new generation of candidates.

4.2.3.1 Proportional Selection

Proportional Selection assigns each candidate a share of a probability distribution in proportion to its fitness value. For each candidate to be inserted into the breeding pool, a random number is generated, and the candidate whose share of the distribution contains that number is inserted into the breeding pool. The likelihood that a candidate will be selected increases the higher its fitness is compared to other members of the population. Algorithm 2 and Listing 4.6 show possible implementations of this process, although many stable algorithms exist.[24]

4.2.3.2 Ranked Selection

Ranked Selection orders the candidates based on their fitness and then selects them to enter the breeding pool based on their position in the list, more highly fit individuals being more likely to be selected. This differs from proportional selection as one highly fit individual is not given a large advantage over less fit individuals, as it would only be placed one higher in the list and so only linearly more likely to be selected to breed.
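A linear ranking scheme of this kind might be sketched as follows. The weighting is an assumption made for illustration: the least fit candidate is given rank 1, the fittest rank n, and a candidate of rank r is selected with probability r / (n(n + 1) / 2). The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

/** Hypothetical linear ranked selection over an array of fitness values. */
final class RankedSelection {
    private RankedSelection() {}

    /** Returns the indices of the candidates chosen for the breeding pool. */
    static int[] select(double[] fitness, int poolSize, Random rng) {
        int n = fitness.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Sort candidate indices by ascending fitness: order[0] has rank 1.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> fitness[i]));

        int totalRank = n * (n + 1) / 2;
        int[] pool = new int[poolSize];
        for (int s = 0; s < poolSize; s++) {
            // Draw a ticket in 1..totalRank; rank r owns exactly r tickets.
            int ticket = rng.nextInt(totalRank) + 1;
            int rank = 1;
            while (ticket > rank) {
                ticket -= rank;
                rank++;
            }
            pool[s] = order[rank - 1];
        }
        return pool;
    }
}
```

Note that a candidate whose fitness is many times larger than the rest still only receives one more ticket than the candidate ranked directly below it.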



    Ensure: population.length = breedingPool.length
    breedingPool ⇐ LIST
    for i = 1 to population.length do {Create a distribution for the candidates’ fitness values}
        compute $q_i = \sum_{k=1}^{i} p_k$ {$p_k$ is the fitness of candidate k divided by the total fitness of the population}
    end for
    for i = 1 to population.length do {Select for the breeding pool}
        g ⇐ random number in the range [0, 1], drawn from a uniform probability distribution
        if $0 \le g \le q_1$ then
            append first member of population to breedingPool
        else
            for j = 2 to population.length do
                if $q_{j-1} < g \le q_j$ then
                    append member j of population to breedingPool
                end if
            end for
        end if
    end for
    return breedingPool

Algorithm 2: Monte Carlo algorithm for proportional selection

Listing 4.6: Monte Carlo Selection

    // Build the cumulative fitness distribution.
    double[] fitnessDistribution = new double[operand.populationSize()];
    double totalFitness = 0;
    for (int i = 0; i < operand.populationSize(); i++) {
        totalFitness += operand.getCandidate(i).getFitness();
        fitnessDistribution[i] = totalFitness;
    }
    // Normalize the distribution so that its final entry is 1.
    for (int i = 0; i < operand.populationSize(); i++) {
        fitnessDistribution[i] = fitnessDistribution[i] / totalFitness;
    }
    // Draw one candidate for each place in the mating pool.
    for (int i = 0; i < operand.populationSize(); i++) {
        double g = randomGenerator.nextDouble();
        // The distribution is already normalized, so g is compared with it directly.
        int j = 0;
        while (j < operand.populationSize() - 1 && g > fitnessDistribution[j]) {
            j++;
        }
        matingPool.add(operand.getCandidate(j));
    }



4.2.3.3 Elitist Selection

Elitist Selection schemes allow the most fit individuals of the population to continue into the new population. They may still be used for breeding but are not replaced in the new population by offspring.

4.2.3.4 Generational Selection

An extension of the elitist selection mechanism, a generational selection scheme only replaces a proportion of the population with offspring during each cycle. This allows for good genetic material to breed multiple times before it is replaced, mimicking nature more accurately than other selection mechanisms.
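Both schemes can be viewed as the same replacement step with a different number of survivors. The sketch below is illustrative only: the `Candidate` type and the method name are hypothetical, fitness is assumed to be precomputed with higher values being better, and the offspring list is assumed to hold at least as many members as there are places to fill.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical replacement step covering elitist and generational schemes. */
final class GenerationalReplacement {
    private GenerationalReplacement() {}

    /** Minimal candidate: a genome plus its already-computed fitness. */
    static final class Candidate {
        final String genome;
        final double fitness;

        Candidate(String genome, double fitness) {
            this.genome = genome;
            this.fitness = fitness;
        }
    }

    /**
     * Keeps the 'survivors' fittest parents and fills the remaining places
     * with offspring, so the population size stays constant.
     */
    static List<Candidate> nextGeneration(List<Candidate> parents,
                                          List<Candidate> offspring, int survivors) {
        List<Candidate> ranked = new ArrayList<>(parents);
        ranked.sort(Comparator.comparingDouble((Candidate c) -> c.fitness).reversed());
        List<Candidate> next = new ArrayList<>(ranked.subList(0, survivors));
        next.addAll(offspring.subList(0, parents.size() - survivors));
        return next;
    }
}
```

A small `survivors` value (one or two) behaves like elitist selection; a value that is a sizeable proportion of the population behaves like the generational scheme.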

4.2.4 Selection for Crossover

Once the candidates have been selected to enter the mating pool, candidates must be selected to undergo breeding (crossover) and there are several schemes for doing this.[3]

Scheme                  Description
Random Selection        The most commonly used scheme is random selection. Pairs of candidates are repeatedly selected from the breeding pool, without removing previously selected candidates, and undergo crossover, the results of which are then placed into the new population.
Positive Assortative    Similar individuals are mated with each other. In a binary genome, the similarity of two individuals can be calculated by means of the Hamming distance of the two genomes. Can lead to premature convergence of the algorithm.
Negative Assortative    Dissimilar individuals are mated with each other. As with positive assortative selection, the Hamming distance of the two genomes is employed during selection.

Table 4.1: Selection for Crossover Schemes
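The Hamming distance used by the two assortative schemes in Table 4.1 is simply the number of positions at which two equal-length genomes differ. The sketch below shows one way a mate could be chosen with it; the greedy nearest or furthest choice is an assumption made for illustration, and the names are hypothetical.

```java
import java.util.List;

/** Hypothetical mate selection based on Hamming distance between binary genomes. */
final class AssortativeMating {
    private AssortativeMating() {}

    /** Number of positions at which the two genomes differ. */
    static int hammingDistance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("genomes must have equal length");
        }
        int distance = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                distance++;
            }
        }
        return distance;
    }

    /**
     * Returns the index of the mate for pool.get(first): the most similar
     * candidate when positive is true, the most dissimilar otherwise.
     * Returns -1 if the pool holds no other candidate.
     */
    static int chooseMate(List<String> pool, int first, boolean positive) {
        int bestIndex = -1;
        int bestDistance = positive ? Integer.MAX_VALUE : -1;
        for (int i = 0; i < pool.size(); i++) {
            if (i == first) {
                continue;
            }
            int d = hammingDistance(pool.get(first), pool.get(i));
            boolean better = positive ? d < bestDistance : d > bestDistance;
            if (better) {
                bestDistance = d;
                bestIndex = i;
            }
        }
        return bestIndex;
    }
}
```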

4.2.5 Crossover

The crossover operation is one of the primary search operators in a genetic algorithm (in conjunction with the mutation operator) and allows the algorithm to iterate towards a better solution. The crossover operation allows exploration of the problem space by the algorithm. The metaphor for the crossover operation comes directly from genetic crossover in cellular meiosis. The genetic string of a child is constructed by selecting genetic material from a number of parents and this child is then included in the new generation of the population.[25]

4.2.5.1 Single-point Crossover

Single-point crossover takes the genetic material from 2 parents and creates a child by appending the tail of one parent’s genome to the head of the other, splitting the genomes at a randomized crossover point.

We denote the crossover operator by $C$. $x$ and $y$ denote two sets of chromosomes having fixed length $L$. $z$ is the resultant child set.

$$x = x_1 x_2 \ldots x_L$$

$$y = y_1 y_2 \ldots y_L$$

For this operator we have

$$C(x, y) = z$$

where

$$z = x_1 \ldots x_k y_{k+1} \ldots y_L$$

and $k$ is a random integer, having a uniform distribution, from the set

$$\{1, 2, \ldots, L-1\}$$

Figure 4.4: Single-Point Crossover

Single-point crossover is the simplest crossover mechanism available when constructing a genetic algorithm. However, it can be shown that single-point crossover cannot create certain combinations of genes, reducing the exploration of the algorithm in the problem space.[3]

4.2.5.2 Two-point crossover

In nature, crossover may occur at more than one point in the genome; this gives rise to two-point crossover. Child candidates have a genome structure of the form

$$z = x_1 \ldots x_k y_{k+1} \ldots y_{k'} x_{k'+1} \ldots x_L$$

where $k < k'$.

Two-point crossover may perform better than single-point crossover in some situations; however, it can still be shown that some combinations of genes cannot be produced using this operation. It can therefore be reasoned that we require the ability to have multiple crossover points.[3]
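The two child forms given above translate directly into substring operations, and a small example shows a gene combination that single-point crossover cannot reach. The sketch assumes string genomes of equal length with cut points supplied by the caller; the class and method names are hypothetical.

```java
/** Hypothetical single- and two-point crossover on equal-length string genomes. */
final class TwoPointCrossover {
    private TwoPointCrossover() {}

    /** z = x_1..x_k y_{k+1}..y_L, with 1 <= k <= L-1. */
    static String singlePoint(String x, String y, int k) {
        return x.substring(0, k) + y.substring(k);
    }

    /** z = x_1..x_k y_{k+1}..y_{k2} x_{k2+1}..x_L, with k < k2. */
    static String twoPoint(String x, String y, int k, int k2) {
        if (k >= k2) {
            throw new IllegalArgumentException("k must be less than k2");
        }
        return x.substring(0, k) + y.substring(k, k2) + x.substring(k2);
    }

    /** True if some single cut point yields 'target' from the parents in either order. */
    static boolean reachableBySinglePoint(String x, String y, String target) {
        for (int k = 1; k < x.length(); k++) {
            if (singlePoint(x, y, k).equals(target) || singlePoint(y, x, k).equals(target)) {
                return true;
            }
        }
        return false;
    }
}
```

With parents `11111` and `00000`, `twoPoint(x, y, 1, 4)` gives a child that keeps both ends of the first parent around the middle of the second, and `reachableBySinglePoint` reports that no single cut point can produce that child.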

4.2.5.3 N-point crossover

The logical extension to two-point crossover is N-point crossover. An arbitrary number of crossover points $N$ are selected, where $1 \le N < L$; $N = 1$ gives single-point crossover and $N = 2$ gives two-point crossover (Algorithm 3). N-point crossover can reduce the number of gene combinations that are not attainable with single- and two-point crossover; it also increases the amount of disruption caused to the parent genomes during breeding, which may lead to loss of good genetic material between generations. The relative advantages and disadvantages of N-point crossover are still a point of discussion in research.[3]

    Require: firstParent.length = secondParent.length
    Ensure: child.length = parentLength
    parentLength ⇐ firstParent.length
    x ⇐ 0
    y ⇐ ORDEREDSET
    for i = 1 to N do {Create crossover points}
        add to y a random number R in the range 0 < R < parentLength that is not already in y
    end for
    add parentLength to y {Close the final segment so that the child has full length}
    for i = 1 to N + 1 do {Cross strings to create child}
        if i mod 2 = 1 then {Alternate between parents for each crossover segment}
            append firstParent.StringBetween(x, y_i) to child
        else
            append secondParent.StringBetween(x, y_i) to child
        end if
        x ⇐ y_i
    end for
    return child

Algorithm 3: Random N-point crossover, 2 parents → 1 child

4.2.5.4 Siblings

Thus far, all the crossover operations discussed have produced a single child genome. It is possible to create siblings during the same crossover operation, using the genetic material left over from the first child. Siblings ensure that genetic material is not lost; this comes at the cost of reduced diversity in the breeding process, as half the number of candidates are bred to keep the population size constant (for a 2:2 crossover). An example implementation is shown in listing 4.7.



Listing 4.7: A Sample Java Implementation of N-point Crossover: two children

    for (int i = 0; i < crossoverPoints.length; i++) {
        // Create Crossover Points
        crossoverPoints[i] = randomGenerator.nextInt(parentM.length());
    }
    Arrays.sort(crossoverPoints);
    String childM = new String();
    String childF = new String();
    int i = 0;
    for (int j = 0; j <= crossoverPoints.length; j++) {
        // Crossover Strings at Crossover Points
        if (j < crossoverPoints.length) {
            for (; i < parentM.length() && i < crossoverPoints[j]; i++) {
                if (j % 2 == 0) {
                    childM = childM + parentM.charAt(i);
                    childF = childF + parentF.charAt(i);
                } else {
                    childM = childM + parentF.charAt(i);
                    childF = childF + parentM.charAt(i);
                }
            }
        } else if (j == crossoverPoints.length) {
            for (; i < parentM.length(); i++) {
                if (j % 2 == 0) {
                    childM = childM + parentM.charAt(i);
                    childF = childF + parentF.charAt(i);
                } else {
                    childM = childM + parentF.charAt(i);
                    childF = childF + parentM.charAt(i);
                }
            }
        }
    }
    String[] children = {childM, childF};
    return children;



Figure 4.5: Single-point crossover: two children

4.2.6 Mutation

Alongside crossover, mutation makes up the primary search operators in a genetic algorithm.[3] Mutation mimics the way that genetic information is occasionally transcribed incorrectly by RNA during reproduction. It allows exploitation of the problem space by changing the values of isolated sections of the genome, allowing adjacent areas of the problem space to be tested for possible improvements to the solutions.[3]

Mutation can be implemented in many different ways but all involve changing the value of individual genes in a genome. The common variations in implementation will be discussed here.

4.2.6.1 Random Mutation

Random mutation considers a universal probability that a gene may change its value in isolation from all other factors. A mutation probability is selected for the algorithm and is denoted by $p_m$. All genes in the population then have a probability $p_m$ of undergoing mutation.

Traditionally, small values of $p_m$ are recommended for the typical genetic algorithm we are discussing here. A reasonable heuristic is

$$p_m \in [0.001, 0.01]$$

and generally gives good results for simple scenarios.[3] Another interesting heuristic is that for a (1:1) genetic algorithm [a single parent creates a single offspring only undergoing mutation] the value

$$p_m = \frac{1}{L}$$

is very nearly optimal[26; 27]: this may provide a lower bound for an algorithm’s optimum mutation probability.[26; 27; 28; 29] Clearly, with this method of mutation, the number of genes that undergo mutation is dependent on both the population size $n$ and the genome length $L$. This often requires new mutation rates to be chosen whenever these variables change, and as such makes algorithms very specific to their application. Empirical studies have found more general values for $p_m$ which take account of these variables so that the mutation rate is in fixed proportion. One comprehensive study found (and is backed up by my own testing) that a value of

$$p_m \approx \frac{1.75}{L^{1/2}\, n}$$

works well in most typical scenarios.[30] Algorithm 4 demonstrates random mutation in practice.

    for i = 1 to population.length do
        for j = 1 to population.member(i).genomeLength do
            q ⇐ random number in the interval [0, 1]
            if q < mutationRate then
                invert the value at population.member(i).gene(j)
            end if
        end for
    end for

Algorithm 4: Random Strong Mutation
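Algorithm 4 can be sketched in Java for a single binary genome, together with the empirical rate that scales with population size and genome length. This is an illustration only: the class and method names are hypothetical, genomes are assumed to be strings of '0' and '1', and the heuristic is read here as 1.75 divided by the product of the population size and the square root of the genome length.

```java
import java.util.Random;

/** Hypothetical random strong mutation for binary string genomes. */
final class RandomMutation {
    private RandomMutation() {}

    /** Empirical rate: 1.75 / (n * sqrt(L)) for population size n and genome length L. */
    static double heuristicRate(int populationSize, int genomeLength) {
        return 1.75 / (populationSize * Math.sqrt(genomeLength));
    }

    /** Inverts each gene independently with probability pm. */
    static String mutate(String genome, double pm, Random rng) {
        StringBuilder mutated = new StringBuilder(genome);
        for (int j = 0; j < genome.length(); j++) {
            if (rng.nextDouble() < pm) {
                mutated.setCharAt(j, genome.charAt(j) == '0' ? '1' : '0');
            }
        }
        return mutated.toString();
    }
}
```

For a population of 50 candidates with 100 genes each, the heuristic gives a rate of 0.0035, which sits inside the interval [0.001, 0.01] quoted earlier.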

4.2.6.2 Candidate Mutation

Rather than applying mutation to all genes in a population, this alternative method applies mutation to only those genes belonging to chosen candidates. A secondary mutation rate $p_{mc}$ is chosen and applied to the members of a population. Only those members chosen for mutation will then have their genes submitted to mutation. It should be noted that this does not guarantee mutation will take place, only that it will not take place in the members not chosen by the method. This method heavily increases the probability that members will pass into the new generation without undergoing mutation.

4.2.6.3 Adaptive Mutation

Mutation can aid the algorithm in finding a good solution, but as the algorithm converges on a solution it may also prevent it from finding an optimum solution by introducing variance where it is not needed. Adaptive mutation techniques aim to ensure that mutation never becomes disruptive by changing the value of $p_m$ as the algorithm runs.

The most common types of adaptive techniques are time-dependent and fitness-dependent mutation rates. $p_m$ is expressed in terms of either the number of algorithm cycles passed $t$ or the fitness of the current candidate undergoing mutation $f(c)$. The latter technique pairs well with candidate based mutation.
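The text does not fix a particular form for either adaptive rate, so the sketch below simply assumes linear schedules: the time-dependent rate falls linearly from a starting value to a final value over the run, and the fitness-dependent rate falls linearly as a candidate's normalized fitness rises. The names and both formulas are assumptions made for illustration.

```java
/** Hypothetical linear schedules for an adaptive mutation rate. */
final class AdaptiveMutationRate {
    private AdaptiveMutationRate() {}

    /** Rate after t cycles: moves linearly from pStart (t = 0) to pEnd (t >= maxCycles). */
    static double timeDependent(int t, int maxCycles, double pStart, double pEnd) {
        double progress = Math.min(1.0, (double) t / maxCycles);
        return pStart + (pEnd - pStart) * progress;
    }

    /**
     * Rate for a candidate whose fitness has been normalized to [0, 1]:
     * pMax for the least fit candidate, pMin for the fittest.
     */
    static double fitnessDependent(double normalizedFitness, double pMin, double pMax) {
        return pMax - (pMax - pMin) * normalizedFitness;
    }
}
```

Either value could be passed as the mutation probability for the current cycle or the current candidate in place of a fixed rate.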



4.2.6.4 Strong and Weak Mutation

Strong and weak mutation are two schemes that affect the action taken when a gene is selected to undergo mutation. In strong mutation a gene will always change its value. In weak mutation, once a gene has been selected to undergo mutation, it will take a new value selected randomly from the available genotype. This may result in the gene taking the same value as it had before mutation or taking a new value.[3]
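The difference between the two schemes is easiest to see for a single gene. The sketch below is illustrative: the names are hypothetical, a gene is assumed to be a `char`, and the alphabet is assumed to contain at least two distinct values with no duplicates (otherwise the strong scheme would never terminate).

```java
import java.util.Random;

/** Hypothetical strong and weak mutation of a single gene. */
final class MutationStrength {
    private MutationStrength() {}

    /** Weak mutation: any value from the alphabet, possibly the current one. */
    static char weak(char current, char[] alphabet, Random rng) {
        return alphabet[rng.nextInt(alphabet.length)];
    }

    /** Strong mutation: redraws until the value differs from the current one. */
    static char strong(char current, char[] alphabet, Random rng) {
        char replacement;
        do {
            replacement = alphabet[rng.nextInt(alphabet.length)];
        } while (replacement == current);
        return replacement;
    }
}
```

For a binary alphabet the strong scheme reduces to inverting the gene, while the weak scheme leaves the gene unchanged about half of the time.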

4.2.7 Inversion

Inversion is a secondary search operator in genetic algorithms. It mimics a poorly understood mechanism in genetic reproduction where two genes may occasionally swap positions in the genome. Its application is not clearly understood but it is included here for completeness.[14]


5 Design and Implementation

Constructing the system

In this chapter I have documented the additional system design, beyond the elements discussed in chapter 4, along with the implementation of the system. Those looking for a guide to modifying or extending the system without needing a grounding in the theory should probably start here.

* * *

This chapter is intended to serve as a reference for other software engineers who are working with this system, providing them with the design rationale, implementation and previous testing details of the system features. It is also intended (for the purposes of this project) to summarize the work undertaken and present an overview of the technical aspects of the work.

Where possible I have attempted to provide a conceptual view of any architectural or algorithmic design decisions, although the whole code base will not appear in this report; refer to the project source for a complete code listing (section 1.2.2.3).


CHAPTER 5. DESIGN & IMPLEMENTATION 5.1 System Design

5.1 System Design

The key design principle employed in developing this system was modularity. Creating a generic scheduling agent forces the separation of the scheduling agent from the description of the problem domain, but applying a modular design also allows us to separate the underlying scheduling algorithm from the scheduling agent.

Figure 5.1: System Architecture

For this system the primary elements are as follows:

1. Domain Model Description: XML model of problem and constraints expressed in a constraints language. [section 4.1]

2. Scheduling Agent: An application that can read and interpret the Domain Model Description and submit it to a scheduling algorithm. [section 5.3]

3. Scheduling Algorithm: In this application, this is a genetic algorithm, with functionality provided by the Darwin GA Framework, a genetic algorithm framework developed specifically for this project. [section 5.2]

The domain model description has already been discussed in chapter 4; the design and development of the remaining two components are discussed in the remainder of this chapter. The complete class diagram for the developed system can be found in appendix A.

5.2 The Darwin GA Framework

The Darwin Genetic Algorithms Framework has been developed to provide the genetic algorithm implementation to the scheduling agent developed in this chapter. It was developed to be both easy to use and extensible. The underlying goals for the framework were that it should allow the transparent use of genetic algorithms inside a larger system, while allowing the framework to be extended for further exploitation of genetic algorithms.

5.2.1 Design Goals

The Darwin Framework was developed to meet the genetic algorithm requirements of a timetable scheduling agent. This application has governed the design of the framework, but consideration has been taken to allow the framework to be used in many different applications.

The design of the framework is similar to the design of the existing library JGAP (section 2.3.1.3). In many ways the JGAP library is far more extensive than Darwin; however, it focuses primarily on providing data-structures and operators for genetic algorithms. As such, it supports a large number of gene types and provides a large quantity of operators that are currently missing from Darwin.

Darwin, however, has a significant number of features that expand on the capabilities of JGAP to provide a much more extensible and adaptable system. In addition to providing operators and basic gene representations, Darwin provides an abstraction over the structure of genetic algorithms called pipes (see section 5.2.2.2), which allow the algorithm itself to be changed and extended without changing the underlying framework code. Darwin also splits the crossover and mutation operators into sub-components (see section 5.2.2.3), which allows reuse of code and reduces the number of operators needed in an operator library. Finally, Darwin has been designed using the latest version of the Java programming language, and as such makes extensive use of generic types and generic methods to provide compile-time type checking throughout the framework. This dramatically reduces the risk of run-time errors when extending and implementing the system. Table 5.1 summarizes the differences between the two systems.

System    Advantages
JGAP      Large collection of algorithm operators included, supports multiple genome types.
Darwin    Highly extensible design, dynamic algorithm structure, lightweight, compile-time type checking.

Table 5.1: Differences between JGAP and Darwin

In the following sections I will summarize the design and implementation of the framework sub-components.

5.2.1.1 Programming Language Choice

In order for this framework to be usable by the largest number of people it was decided that the programming language used should be a mainstream, class-based object-oriented language. Due to time constraints and the intensive nature of the algorithm, I decided to use the Java 6 programming platform, as it meets the above requirements and its memory management features would allow for lower-risk development of such an intensive algorithm.

5.2.2 Framework Architecture

The complete class diagram for the Darwin Framework may be found in appendix A. The framework has been designed, according to the goals set out in section 5.2.1, to be easily expandable. As such the core design principle is different to that used in similar libraries and frameworks¹. The main framework elements are

1. The Algorithm Class

2. Pipe Objects

3. Operators

Each element signifies a granular separation of processing within the framework, making it easy to extend and understand.

The framework contains several ancillary elements useful for genetic algorithms and extending the library. These are

1. The Population Class

2. The Chromosome Class

3. The PipeNode interface

Each of these components provides key functionality in the framework and is discussed in the following subsections.

5.2.2.1 Algorithm

The algorithm class is the root class of the framework. It holds all settings for running an algorithm and handles a series of operations key to the execution of an algorithm.

The algorithm class is designed to handle the following operations within the framework:

1. Creation of an Algorithm

2. Setting an initial population

3. Handling exit conditions

4. Handling algorithm execution

5. Accumulation of data from algorithm execution and making it available to listeners

¹ See section 2.3.1.



Instantiating the algorithm class creates an algorithm with a series of default settings. The user can then override these settings using the provided accessor functions. The instance requires a pipe object, which represents the executable algorithm, and a Population on which to execute the algorithm. The algorithm instance maintains the current Population during execution and allows caching of population statistics such as best candidate and fitness variance; these statistics are made available to listeners to allow deeper integration of the framework into a larger program.

Population statistics can also be leveraged to provide exit conditions for the executing algorithm. This is achieved by using methods such as setCutoffVariance() to set thresholds on a statistic that trigger exiting of the algorithm. The algorithm object also provides pause and resume methods which allow a multi-threaded application to control algorithm execution directly. The default exit condition for any algorithm is met when a predefined number of cycles have executed, although this value can be altered or overridden by the implementor.

An algorithm object acts as the root for algorithm execution. Calling the algorithm object’s execute method hands execution to the main algorithm pipe, which begins to transform the initial population. In addition to being able to pause and resume algorithm execution, the algorithm class also provides an overridable method, extendLoop. This allows the implementor to easily add custom code into the algorithm, which runs after each cycle of the main pipe. This allows custom statistics, exit conditions, transformations and triggers to be added to the framework without significant effort.

5.2.2.2 Pipes

Pipes set the Darwin framework apart from previous libraries. Rather than having pre-scripted genetic algorithms included in the framework, which make use of available operators, pipes allow the user to create new algorithm structures by adding operators to a sequence that will be executed in turn. These sequences constitute pipes, with functions to allow manipulation and execution of the data structure.

Each pipe maintains a Linked List of operators, which allows the pipe to recursively execute all operators in the pipe. This saves on memory intensive operations such as passing populations back to the pipe object before they are submitted to the next operator. This structure also allows for branches to be built into the algorithm sequence, which is useful for simulating elitist selection policies and other similar mechanisms. Although the list structure and inherent structure of a genetic algorithm do not directly permit execution branches, the interface structure of the framework states that a pipe may include branching pipes as long as they conform to the parallel pipes type. This requires that all nested pipes be executable as atomic and discrete operations. A parent pipe treats branches as normal operations, with execution being handed to the nested parallel pipe object, which then executes each of its pipes as discrete pipes, before handing the resulting population object back to the parent pipe.

There is no limit to the depth of pipes which may be nested and performance should scale linearly with the number of operations employed, with negligible overhead from the pipe objects themselves.

Figure 5.2: Execution of a simple GA over a single cycle
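The pipe concept described in this section can be illustrated with a small, self-contained sketch. It is not the Darwin source: the type names echo those used in the text, but every signature, the generic population parameter `P`, the merge function and the iterative (rather than recursive) execution are assumptions made to keep the example short.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.function.BinaryOperator;

/** Anything that can transform a population of type P (hypothetical signature). */
interface PipeNode<P> {
    P execute(P population);
}

/** An ordered sequence of nodes, each fed the output of the previous one. */
final class Pipe<P> {
    private final List<PipeNode<P>> nodes = new LinkedList<>();

    Pipe<P> add(PipeNode<P> node) {
        nodes.add(node);
        return this;
    }

    P execute(P population) {
        P current = population;
        for (PipeNode<P> node : nodes) {
            current = node.execute(current);
        }
        return current;
    }
}

/** Runs every branch on the same input and merges the results into one population. */
final class ParallelPipe<P> implements PipeNode<P> {
    private final List<Pipe<P>> branches = new ArrayList<>();
    private final BinaryOperator<P> merge;

    ParallelPipe(BinaryOperator<P> merge) {
        this.merge = merge;
    }

    ParallelPipe<P> branch(Pipe<P> pipe) {
        branches.add(pipe);
        return this;
    }

    @Override
    public P execute(P population) {
        P result = null;
        for (Pipe<P> branch : branches) {
            P output = branch.execute(population);
            result = (result == null) ? output : merge.apply(result, output);
        }
        // With no branches the population passes through unchanged.
        return (result == null) ? population : result;
    }
}
```

An elitist policy could then be expressed as a `ParallelPipe` with one branch that copies the best candidates and another that performs selection, crossover and mutation, the parent pipe seeing only a single node.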

5.2.2.3 Operators

Objects extending the operator class are the core functional elements of each genetic algorithm. Each operator takes and returns a population object, performing an arbitrary operation on the population in the process. This operation is usually a transformation related to one of the genetic algorithm operators discussed in chapter 4. The core transformations for genetic algorithms are provided in the framework, but any operation (including operations unrelated to genetic algorithms) can be added to the framework to extend its functionality.

The core operations provided by the framework (in its current release) cover all the facets of basic genetic algorithms. A Survival operator is provided which allows a user to script a custom evaluation function for members of a population. A proportional selection operator is provided that implements the Monte Carlo algorithm [Algorithm 2, Listing 4.6] for selecting candidates to enter a breeding pool. A basic implementation of an inversion operation is provided for evaluation purposes.

The design of Mutation and Crossover operations differs from previous implementations. Each operation has a root abstract class, leveraging Java’s type system to separate the dual concerns of these operations. Both Mutation and Crossover operations have to select candidates from the population to take action upon, then execute the action upon the candidates. The abstract root class contains a reference to a scheme class (MutationScheme and CrossoverScheme respectively) which holds the execution code for an operation. The abstract class is itself extended to provide an implementation of the selection method. This separation of concerns allows any selection methodology to use any execution process without re-scripting or providing a separate implementation of the operator. The crossover schemes provided in this release are:

1. SinglePoint Crossover

2. DoublePoint Crossover

3. N-Point Crossover



Figure 5.3: Separation of Operator Concerns

These schemes are provided with a Random Crossover implementation, which provides the most common crossover selection mechanism. The mutation selection methods provided are:

1. Fitness Dependent

2. Time Dependent

3. Gene Mutation

4. Chromosome Mutation

with both strong and weak mutation schemes implemented to cover all common mutation processes.
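The separation between selecting candidates and acting upon them can be sketched as follows. The name `CrossoverScheme` comes from the text, but the method signatures, the use of string genomes and the two concrete classes are assumptions made for illustration and should not be read as the framework's actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** How two selected parents are combined: the execution concern (hypothetical signature). */
interface CrossoverScheme {
    String cross(String first, String second, Random rng);
}

/** Root operator: subclasses decide which parents to pair, the selection concern. */
abstract class Crossover {
    private final CrossoverScheme scheme;

    protected Crossover(CrossoverScheme scheme) {
        this.scheme = scheme;
    }

    /** Returns the indices of two parents within the breeding pool. */
    protected abstract int[] selectPair(List<String> pool, Random rng);

    /** Breeds a new population of the same size as the pool. */
    List<String> apply(List<String> pool, Random rng) {
        List<String> next = new ArrayList<>(pool.size());
        while (next.size() < pool.size()) {
            int[] pair = selectPair(pool, rng);
            next.add(scheme.cross(pool.get(pair[0]), pool.get(pair[1]), rng));
        }
        return next;
    }
}

/** Random selection with replacement, as in Table 4.1. */
final class RandomCrossover extends Crossover {
    RandomCrossover(CrossoverScheme scheme) {
        super(scheme);
    }

    @Override
    protected int[] selectPair(List<String> pool, Random rng) {
        return new int[] {rng.nextInt(pool.size()), rng.nextInt(pool.size())};
    }
}

/** Single-point scheme; assumes genomes of equal length of at least 2. */
final class SinglePointScheme implements CrossoverScheme {
    @Override
    public String cross(String first, String second, Random rng) {
        int k = 1 + rng.nextInt(first.length() - 1); // cut point in 1..L-1
        return first.substring(0, k) + second.substring(k);
    }
}
```

Swapping `SinglePointScheme` for another scheme, or `RandomCrossover` for another selection subclass, changes one concern without touching the other, which is the benefit the text attributes to this design.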

5.2.2.4 Populations and Chromosomes

The Population and Chromosome classes provide type definitions for their respective genetic algorithm components.

Each chromosome object represents a single solution candidate in our algorithm, holding both the candidate’s genome and its calculated fitness value. Chromosomes are contained by the Population data-structure, which forms the primary operand of a genetic algorithm. The population type provides accessor methods for retrieving chromosomes from the collection, setting the value of chromosomes in the collection and appending chromosomes to the population.

The roles of the chromosome and population classes are limited in the current framework release, with further expansion planned in this area (see section 5.2.3.1).

5.2.2.5 The PipeNode interface

The PipeNode interface provides a type description for all executable elements in the framework and consequently defines the set of types that can be contained by a pipe. In the framework, only the operator and parallel pipe classes implement this interface; it plays a key role in extending the framework as it allows an implementor to define new types that may be included in a pipe.

5.2.3 Code Branches

In addition to the core code trunk presented in section 5.2.2, the Darwin GA framework has three other active code branches, currently focusing on further extension of the framework’s features. These three branches are

1. The complex genome branch

2. Parallel algorithm implementation

3. Parallel populations branch

Each branch has been created to explore the possibility of leveraging more powerful algorithm structures within the framework in order to speed up the solving of optimization problems.

5.2.3.1 Complex Genomes

This code branch looks at extending the gene implementation of the framework. Currently the framework is designed to handle binary string genomes. Research has indicated that real valued genes significantly improve performance on some problems[3] and it may also be desirable to simulate natural gene structures such as the DNA structure seen in natural systems. This requires a different implementation of genomes within the framework and this branch contains a series of prototype elements designed to test alternative implementations.

The branch currently supports alleles with an arbitrary number of integer values. Future work will incorporate real valued attributes and support for common alternative allele representations, such as the {A, C, G, T} notation used for DNA.

5.2.3.2 Parallel Implementation

There is a large scope for creating parallelizable implementations of genetic algorithms; indeed, other libraries have done so. This branch contains a prototype of the framework, designed to allow operations to be implemented in parallel. The existing design of the framework already has a number of features that lend themselves to a parallel implementation; this prototype is targeted at developing the concept further and analyzing performance. It is linked closely to the implementation of parallel populations, as this gains significant benefits from parallelization.

5.2.3.3 Parallel Populations

The current framework can only handle discrete populations of candidate solutions. Research into genetic algorithms and modeling of natural processes has indicated that performance benefits may be gained by having a set of non-discrete populations that interbreed using a predetermined policy. This code branch aims to modify the current pipe structure to allow for non-discrete populations.
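One possible interbreeding policy is ring migration, in which each sub-population periodically sends a copy of its best candidate to its neighbour. The sketch below assumes this policy for illustration only; the text does not say which policy the branch uses. The class RingMigration is hypothetical, and candidates are reduced to a single integer fitness value (higher is better) to keep the example short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of a ring migration policy between sub-populations ("islands"). */
final class RingMigration {

    /** Copies the best candidate of island i over the worst candidate of island i + 1. */
    static void migrate(List<List<Integer>> islands) {
        int n = islands.size();

        // Pick all emigrants first, so a migrant is not passed on twice in the same round.
        List<Integer> emigrants = new ArrayList<>();
        for (List<Integer> island : islands) {
            emigrants.add(Collections.max(island));
        }

        for (int i = 0; i < n; i++) {
            List<Integer> target = islands.get((i + 1) % n);
            int worstIndex = target.indexOf(Collections.min(target));
            target.set(worstIndex, emigrants.get(i));
        }
    }
}
```

Between migrations each island would run its own pipe independently, which is what makes this structure a natural fit for the parallel implementation branch.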

5.2.4 Using the Framework

The framework’s intended end-users are other development teams looking to integrate genetic algorithms into a larger project. As such, there is no user interface included in the framework, as this functionality would be provided by a third party. What follows is a programmer’s guide to setting up and running the framework as part of a program.

Incorporating the Darwin Framework into an external program requires three actions on the part of the developer.

1. Implement Darwin.Input

2. Create Algorithm Instance

3. Configure Algorithm

5.2.4.1 Darwin.Input

The static Input class is a placeholder in which developers implement a series of rules that score candidates produced by the algorithm. The scoring of candidates is highly dependent on the application domain and as such the framework does not implement these actions directly. The framework provides a series of scaling tools and encode/decode functions to return Java data-types from the genome of candidates so that they can be manipulated efficiently. This class must be implemented unless the framework is extended to provide an alternative survival operator, because the provided survival operator pulls its data from the Input class. See section 5.3.4 for more details on providing an implementation, although many developers may choose to provide their own survival operator by extending the framework.

5.2.4.2 Creating an Algorithm

Creating an algorithm requires the developer to extend the Algorithm class and implement the createPipe() method. In order to do this, the developer also needs to create a pipe containing the operations they wish the algorithm to perform and the order in which they are to be performed. Additionally, they may choose to re-implement the createPopulation() method, which allows direct control over population size and construction. Listing 5.1 provides an example of all these steps.

5.2.4.3 Configuring the Algorithm

Once the algorithm has been created, it needs to be configured. To do this, an instance of the algorithm is created and then its parameters are set using the methods of the Algorithm class. The final step is to call the execute() method of the algorithm, which will run the algorithm. Listing 5.2 shows an example of this step.

Listing 5.2: Configuring and Running an Algorithm

    public static void main(String[] args) {
        BasicEA newAlgorithm = new BasicEA();

        // Set the algorithm parameters, then run it.
        newAlgorithm.enableCache();
        newAlgorithm.setMaxCycles(100);
        newAlgorithm.execute();
    }

Developers of a multi-threaded application, such as one with a user interface, should also note that the pause() and resume() methods allow algorithm execution to be paused and resumed.
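The framework’s own implementation of pause() and resume() is not shown in this report. As a rough sketch of how such a pair of methods is commonly built in Java, the hypothetical class PausableLoop below guards each cycle with a monitor: the worker thread waits while a paused flag is set, and another thread (for example a user-interface thread) clears the flag and wakes it.

```java
/** Hypothetical sketch of a pausable generation loop; not the Darwin Algorithm class. */
final class PausableLoop implements Runnable {
    private final Object lock = new Object();
    private boolean paused = false;          // guarded by lock
    private volatile int cycle = 0;          // written only by the worker thread
    private final int maxCycles;

    PausableLoop(int maxCycles) { this.maxCycles = maxCycles; }

    /** May be called from any thread; takes effect before the next cycle starts. */
    void pause() {
        synchronized (lock) { paused = true; }
    }

    /** May be called from any thread; wakes the worker if it is waiting. */
    void resume() {
        synchronized (lock) {
            paused = false;
            lock.notifyAll();
        }
    }

    int currentCycle() { return cycle; }

    @Override
    public void run() {
        try {
            while (cycle < maxCycles) {
                synchronized (lock) {
                    while (paused) lock.wait(); // block here until resume() is called
                }
                cycle++; // stands in for one generation of the algorithm
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and stop
        }
    }
}
```

Checking the flag only between cycles means a pause never interrupts a generation half-way through, so the population is always left in a consistent state.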

5.2.5 Extending the Framework

The framework is designed to be highly extensible, to allow for third party additions and integration of the framework. What follows is a guide to extending the framework.

The framework is constructed from a core of abstract classes and interfaces which allow for easy local extension of its functionality. These components are:

1. Algorithm Class

2. Pipe Class

3. PipeNode Interface

4. Operator Class

5. Crossover, CrossoverScheme, Mutate and MutateScheme Classes

Figure 5.4 details their relationships. Extending the Algorithm class allows an algorithm configuration to be predefined for easy recall and reuse. Likewise, extending the Pipe class allows creation of reusable pipe configurations.

The PipeNode interface defines the type of objects that may be placed into a pipe. Implementing this interface allows additional executable elements to be placed in the framework, allowing genetic algorithm operations to be combined with other actions and algorithms. The Operator class defines the structure of all genetic algorithm operations, so new operations being added to the framework should extend this class.

Listing 5.1: Creating an Algorithm

    import java.util.ArrayList;

    public class BasicEA extends Algorithm {

        // Builds an initial population of 1000 random 30-bit candidates.
        protected void createPopulation() {
            initialPopulation = new Population();
            ArrayList<Chromosome> candidates = new ArrayList<Chromosome>();
            for (int i = 0; i < 1000; i++) {
                Chromosome candidate = new Chromosome();
                candidate.setGenome(RandomString.newString(30));
                candidates.add(candidate);
            }
            initialPopulation.setCandidates(candidates);
        }

        protected void createPipe() {
            algorithmPipe = new CustomPipe(initialPopulation);
        }

        // Called once per cycle: reports progress and pauses on a perfect score.
        public void extendLoop() {
            System.out.println(currentCycle + " " + cachedCandidate.getFitness() + " "
                    + cachedAverage + " " + cachedVariance);
            if (cachedCandidate.getFitness() == 1) {
                this.pause();
                System.out.println(currentCycle);
            }
        }

        class CustomPipe extends Pipe {
            public CustomPipe(Population operand) {
                setupPipe(operand);
                (new Survival(operand)).execute();
            }

            // The operators are executed in the order in which they are added.
            public void constructPipe() {
                operators.add(new ProportionalSelection(operand));
                operators.add(new RandomCrossover(operand, new N_PointCrossover(6)));
                operators.add(new Mutate_Gene(operand));
                operators.add(new Survival(operand));
            }
        }
    }

Figure 5.4: Core Classes and Interfaces of the Darwin Framework

There are two special cases of operation included in the framework: Crossover and Mutation. These operations are separated into selection actions, implemented by extending the root class, and actions on the selected candidates, implemented by extending the related Scheme class.

New operations that have dual actions like this should be added in a similar fashion, by defining a root abstract class and scheme class which can then be extended to provide each instance of the operator.
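The split between a root class and a scheme class can be sketched as follows. None of these names belong to the Darwin framework: RecombinationOperator, RecombinationScheme, FixedPointScheme and NeighbourPairs are hypothetical, and genomes are plain strings of equal length. The root class decides which candidates are acted upon, while the scheme decides what happens to the selected candidates.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical scheme: the action performed on a selected pair of genomes. */
interface RecombinationScheme {
    String[] recombine(String parentA, String parentB);
}

/** One concrete scheme: swap the tails of both parents at a fixed cut point. */
final class FixedPointScheme implements RecombinationScheme {
    private final int point;

    FixedPointScheme(int point) { this.point = point; }

    public String[] recombine(String a, String b) {
        return new String[] {
            a.substring(0, point) + b.substring(point),
            b.substring(0, point) + a.substring(point)
        };
    }
}

/** Hypothetical root class: the selection action is left to subclasses. */
abstract class RecombinationOperator {
    private final RecombinationScheme scheme;

    RecombinationOperator(RecombinationScheme scheme) { this.scheme = scheme; }

    /** Selection action: which pairs of population indices are recombined. */
    abstract List<int[]> selectPairs(int populationSize);

    /** Applies the scheme to every selected pair and returns the new genomes. */
    List<String> execute(List<String> genomes) {
        List<String> next = new ArrayList<>(genomes);
        for (int[] pair : selectPairs(genomes.size())) {
            String[] children = scheme.recombine(genomes.get(pair[0]), genomes.get(pair[1]));
            next.set(pair[0], children[0]);
            next.set(pair[1], children[1]);
        }
        return next;
    }
}

/** One concrete selection action: pair each even index with its right-hand neighbour. */
final class NeighbourPairs extends RecombinationOperator {
    NeighbourPairs(RecombinationScheme scheme) { super(scheme); }

    List<int[]> selectPairs(int n) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < n; i += 2) pairs.add(new int[] {i, i + 1});
        return pairs;
    }
}
```

Because the two concerns vary independently, any selection action can be combined with any scheme, as in the RandomCrossover and N_PointCrossover pairing used in Listing 5.1.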

5.3 A Scheduling Agent

To meet the project brief, a scheduling agent was developed on top of the Darwin genetic algorithm framework. This scheduling agent is presented here with reference to the timetable model developed in chapter 4.

Aside: The scheduling agent presented here lacks a full implementation of the model developed in chapter 4 due to time constraints on the development of this project. As such this section provides a proof of concept for the material presented in this report.

5.3.1 Design

Here I will cover the major design issues involved in building a scheduling agent. Due to time constraints, the creation of a constraints language and parser was deemed beyond the scope of this project. Nevertheless, I will also include here a brief introduction to a possible model-driven solution which would integrate well with the current scheduling agent built in this section.

5.3.1.1 Interpreting a Constraints Language

Whilst a dedicated language parser, built into the scheduling agent, would provide the cleanest method of expressing constraints on a timetabling model, an alternative exists in using a model-driven approach to generate Java classes directly from a formal constraints language. Unlike a parser, which would populate an internal data-structure and interface with Java operations to provide manipulation of the timetable model, a model-driven approach requires re-compilation of the application after the constraints have been interpreted. This is a significant disadvantage, as our program is intended to be run by non-technical users without the knowledge required to re-compile a program. However, such an approach provides us with a development advantage: utilizing existing model-driven tools significantly reduces the development time required to create a program capable of handling generic timetable constraints.

In a model-driven approach, the constraints are expressed in a formal modeling language such as that provided by the Eclipse Modeling Framework (EMF)[31], which conforms to a meta-model for the language built in a meta-modeling language such as the international standard MOF² (Meta-Object Facility). Such a meta-modeling language allows the model expressed in EMF to be translated into another model by means of a model transformation language such as QVT (Query/View/Transformation, a language specification of the Object Management Group, who oversee many language specifications including UML).[32] In our case we could transform the model to Java, and use a model-to-text (M2T) transformation to turn the model into valid Java code.[32] In the Java code, the constraints will have been translated into classes representing different resource types, with parameters and methods to provide constraints checking. Once compiled, these classes form an internal data-model of a timetable for the scheduling agent. In the following scheduling agent this internal data structure can be seen in the Input class.

5.3.1.2 Use of Generics

The internal model makes extensive use of generic data-structures in order to provide compile-time type checking. This removes the possibility of run-time type errors occurring when the model is being instantiated. This is easily achievable when the model is generated using the model-driven process detailed in section 5.3.1.1, as the generic types can be assigned during the M2T transformation. If a parser were used to interpret the constraints language directly into a Java data-structure, devising a data-structure that uses generics would be far more challenging and quite probably impossible (although I have not formulated a proof of this hypothesis).

5.3.2 Internal Model

For clarity, the code for building the internal timetable model for the high school scenario has been included in appendix B.

This particular implementation solves the no-clashes constraint by utilizing the genetic algorithm implementation. When reading teacher information from the genome, if a teacher has been assigned to a time slot more than once, the last slot to be read is discarded. This method avoids having to perform an additional constraint check, as the completeness constraint for the model dictates that all classes should have a teacher. Removing a teacher from a time slot breaks this second constraint, and so the completeness constraint and the no-clashes constraint are both included in a single constraint check.
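The idea of folding the no-clashes constraint into the completeness check can be sketched as below. This is an illustration under assumptions and not the code of the scheduling agent: the class ClashFreeDecoding and its array-based inputs are hypothetical. A teacher who is already busy in a slot is simply not assigned again, so every clash shows up as an occurrence without a teacher.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: clashes are discarded while decoding and counted as missing teachers. */
final class ClashFreeDecoding {

    /**
     * teacherOf[i] is the teacher index decoded for class occurrence i and
     * slotOf[i] is the time slot in which occurrence i takes place.
     * Returns the number of occurrences left without a teacher.
     */
    static int unstaffedOccurrences(int[] teacherOf, int[] slotOf) {
        Set<String> busy = new HashSet<>(); // (slot, teacher) pairs already in use
        int unstaffed = 0;
        for (int i = 0; i < teacherOf.length; i++) {
            String key = slotOf[i] + ":" + teacherOf[i];
            if (!busy.add(key)) {
                // The teacher is already teaching in this slot: discard the later
                // assignment, which leaves occurrence i without a teacher.
                unstaffed++;
            }
        }
        return unstaffed;
    }
}
```

A single count therefore penalizes both kinds of violation, at the cost of not being able to tell them apart in the fitness score.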

To allow for efficient constraint checking, and because each group is assigned a set of teachers for each time slot and hence all resources must be involved in a group assignment, the model is expressed from a group view. The latter reason means that selecting this view loses no data from our timetable model. The model discussed in chapter 4 dictates that from any complete view (a view that has not had any projections performed upon it) all associated data can be retrieved by transforming the view, so constraints expressed on any other views of the data can still be evaluated by transforming the internal data model. A further feature of this model is that, because all classes have a fixed number of occurrences in the timetable, all classes and occurrences of a class in the timetable are represented in the genome of a candidate by a continuous range of integers. A teacher is then assigned to an occurrence number, requiring a method getClassFromOccurance() to decode the genome value and populate the model.

² ISO/IEC 19502:2005 Information technology – Meta Object Facility (MOF)

For each type of resource an ArrayList object is declared that contains the set of all instances for that resource type. The instances are created when the timetable is instantiated, reading parameter values from the input XML document.[section 4.1.3.1]

When a genome is parsed into the model, the model is updated with the genome values, which can then be checked against the constraints. After each round of constraint checking, the model is cleared ready for the next genome to be parsed. Each resource type in the model contains ArrayList objects which contain the resources that are assigned to it. Each also contains a method hasPeriod() that is used to check if a resource has been assigned to a specified time period; this is used extensively during constraints checking.

5.3.3 Encoding, decoding and doping a Population

To effectively use the Darwin framework to solve the scheduling problem, a population must be created for the GA to run over. The population must be given a genome which is long enough to encode all solutions to our timetable, or else the GA will be unable to fully explore the problem space. Once the genome is encoded, we also require the reciprocal decode function to enable us to extract a solution from the genome of a candidate in the population; the decode function must extract data in a consistent manner over all candidates or else the algorithm would only be performing a random search.

Unlike JGAP, which requires the user to specify the structure of the genome manually, the Darwin framework provides two methods, encode and decode. The encode function calculates the length of genome required to encode an integer variable. Setting the genome length of a population requires passing the number of possible assignments for each model element in the timetable to the encode function and creating a genome whose length is the sum of all values returned by encode. The decode function allows the inverse to be performed: taking a section of genome as input, it returns the variable value encoded by that segment (in this respect it can be considered the mRNA of the genetic algorithm). This method allows the model to be populated from a genome by looping through each of the elements in the model, passing its size to the encode method to find the width of its genome section, and then setting the element’s value to the value returned by the decode method for that section. Listing 5.3 demonstrates the usage of both functions.
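As a small worked example of this arithmetic, the sketch below reuses the two one-line functions of Listing 5.3 inside a hypothetical class GenomeCodecSketch; the helper methods genomeLength and decodeAll are assumptions added for illustration. With five teachers, each model element needs 3 bits, so three elements with 5, 5 and 8 possible assignments need a 9-bit genome.

```java
/** Hypothetical wrapper around the encode/decode functions of Listing 5.3. */
final class GenomeCodecSketch {

    /** Number of bits needed to represent the values 0 .. value - 1. */
    static int encode(int value) {
        return Integer.toBinaryString(value - 1).length();
    }

    /** Reads a binary genome section as a non-negative integer. */
    static int decode(String genomeSection) {
        return Integer.parseInt(genomeSection, 2);
    }

    /** Total genome length: the sum of the widths of all model elements. */
    static int genomeLength(int[] choicesPerElement) {
        int bits = 0;
        for (int choices : choicesPerElement) bits += encode(choices);
        return bits;
    }

    /** Walks the genome left to right, decoding one section per model element. */
    static int[] decodeAll(String genome, int[] choicesPerElement) {
        int[] values = new int[choicesPerElement.length];
        int pointer = 0;
        for (int i = 0; i < choicesPerElement.length; i++) {
            int width = encode(choicesPerElement[i]);
            values[i] = decode(genome.substring(pointer, pointer + width));
            pointer += width;
        }
        return values;
    }
}
```

Note that a 3-bit section can hold the values 0 to 7, so with five teachers the values 5, 6 and 7 do not name a teacher. This is why Listing 5.3 checks the parsed value against teachers.size() before using it.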

5.3.3.1 Doping a Population

Of particular interest in scheduling applications is the ability to dope a population. This mechanism (introduced in section 4.2.1.1), used in combination with the encode and decode functions, allows an existing solution to the scheduling problem to be encoded into a genome and inserted into a population of otherwise random candidates. This process allows the genetic algorithm to try to find similar solutions to an existing timetable when constraints may have changed and it needs adapting.
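A minimal sketch of doping is given below, assuming that every model element has the same number of possible assignments and that candidates are held as bit strings. The class Doping and its methods are hypothetical; only the bit-width calculation follows the encode function of Listing 5.3.

```java
import java.util.List;

/** Hypothetical sketch: encoding a known solution and inserting it into a population. */
final class Doping {

    /** Same width calculation as the encode function of Listing 5.3. */
    static int bitsFor(int choices) {
        return Integer.toBinaryString(choices - 1).length();
    }

    /** Writes each assignment as a fixed-width binary section, padded with leading zeros. */
    static String toGenome(int[] assignment, int choices) {
        int width = bitsFor(choices);
        StringBuilder genome = new StringBuilder();
        for (int value : assignment) {
            String bits = Integer.toBinaryString(value);
            for (int pad = bits.length(); pad < width; pad++) genome.append('0');
            genome.append(bits);
        }
        return genome.toString();
    }

    /** Replaces the first candidate of an otherwise random population with the known solution. */
    static void dope(List<String> population, int[] knownSolution, int choices) {
        population.set(0, toGenome(knownSolution, choices));
    }
}
```

The padding matters: without it the sections would have different widths and the decode step would read the genome out of alignment.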



Listing 5.3: Encoding and Decoding a genome

    // Number of bits needed to represent the values 0 .. value - 1.
    public static int encode(int value) {
        return Integer.toBinaryString(value - 1).length();
    }

    // Reads a binary genome section as an integer.
    public static int decode(String genomeSection) {
        return Integer.parseInt(genomeSection, 2);
    }

    ...

    // Instantiating the model by decoding a genome
    public static void parseString(String bitStream) {
        int bitStreamPointer = 0;

        for (int i = 0; i < timeExtents.length; i++) {
            timeExtents[i] = 0;
        }

        for (int i = 0; i < calculateClassOccurances(); i++) {
            int parsedValue = LocalMath.decode(bitStream.substring(bitStreamPointer,
                    bitStreamPointer + LocalMath.encode(teachers.size())));
            // Values outside the range of teachers are ignored.
            if (parsedValue < teachers.size()) {
                getClassFromOccurance(i).staff.add(teachers.get(parsedValue));
            }
            bitStreamPointer += LocalMath.encode(teachers.size());
        }
    }



5.3.4 Fitness Calculation

The default location for fitness evaluation in the framework is the evaluate method of the Input class. The method takes a candidate genome as input, decodes it to populate the internal model, evaluates each constraint, then returns a score for the candidate based on the validity of the solution it encodes.

Each constraint has associated with it a method and an integer, which together count the number of times the constraint is violated. The method interrogates the model to apply the constraint, returning the number of violations to the integer. Hard constraints are comparatively trivial to express in a method. To express a hard constraint, the method has to loop through each member of the extents involved in the constraint, and at each member apply a conditional statement to express the constraint. Soft constraints require more complex expression. If a language similar to that described in section 4.1.4 is used to describe constraints, including conditional statements and set operators, then expressing soft constraints is a matter of translating the logic from the constraints language to Java. Listing 5.4 shows an interesting constraint from the scenario, which distributes the classes evenly between teachers, dependent on the ability of the class.[section 3.2, constraint 5]

Listing 5.4: Soft Constraint Expression

    int teacher_lesson_count = 0;
    for (Teacher t : teachers) {
        // Count the lessons actually assigned to this teacher.
        int lesson_count = 0;
        for (Class c : classes) {
            for (int i = 0; i < c.occurances() && i < c.staff.size(); i++) {
                if (c.staff.get(i) == t) {
                    lesson_count++;
                }
            }
        }
        // Add the absolute difference from the teacher's target number of lessons.
        if (lesson_count < t.occurances) {
            teacher_lesson_count += t.occurances - lesson_count;
        } else {
            teacher_lesson_count += lesson_count - t.occurances;
        }
    }
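For contrast with the soft constraint of Listing 5.4, a hard constraint needs only a loop and a conditional. The sketch below is an assumed example and not one of the scenario constraints: it counts timetabled periods that fall outside the timetable, using the convention from appendix B that an unassigned period is stored as -1. The class HardConstraintSketch and its array-based input are hypothetical.

```java
/** Hypothetical sketch: a hard constraint expressed as a loop with one conditional. */
final class HardConstraintSketch {

    /**
     * periodsByGroup[g] holds the timetabled periods of group g.
     * Returns the number of periods that are unassigned (-1) or beyond the timetable.
     */
    static int countUnscheduled(int[][] periodsByGroup, int timetableLength) {
        int violations = 0;
        for (int[] periods : periodsByGroup) {
            for (int period : periods) {
                if (period < 0 || period >= timetableLength) {
                    violations++;
                }
            }
        }
        return violations;
    }
}
```

The returned count plays the role of the integer associated with the constraint: zero means the constraint is satisfied, and larger values give the genetic algorithm a gradient to follow.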

To calculate a final score, all the constraint scores are scaled and summed to provide a normalized value. Scaling functions are provided by the LocalMath class.

Listing 5.5: Candidate Fitness Scaling

    double totalScore =
        LocalMath.scale(calculateClassOccurances() - teacher_class_Count,
                0, calculateClassOccurances()) +
        LocalMath.scale((calculateClassOccurances() * teachers.size()) - teacher_time_constraint,
                0, calculateClassOccurances() * teachers.size()) +
        LocalMath.scale(calculateClassOccurances() - class_teacher_consistency++,
                0, calculateClassOccurances()) +
        LocalMath.scale(calculateClassOccurances() - teacher_lesson_count,
                0, calculateClassOccurances());

    // Four scaled scores are summed, so the total is rescaled from the range 0..4.
    return LocalMath.scale(totalScore, 0, 4);
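Listing 5.5 relies on LocalMath.scale, whose body is not shown in this report. The sketch below assumes it is a linear mapping of a value from the range [min, max] onto [0, 1]; this is an assumption, and the class ScalingSketch with its fitness helper is hypothetical. Under that assumption, each constraint contributes 1 when it has no violations and 0 when it has its worst-case number of violations.

```java
/** Hypothetical sketch of linear scaling; the real LocalMath.scale is not shown in the report. */
final class ScalingSketch {

    /** Maps value linearly from [min, max] onto [0, 1], clamping values outside the range. */
    static double scale(double value, double min, double max) {
        if (max == min) return 0.0; // degenerate range: avoid division by zero
        double clamped = Math.max(min, Math.min(max, value));
        return (clamped - min) / (max - min);
    }

    /**
     * violations[i] is the violation count of constraint i and worstCase[i] its maximum.
     * Returns a fitness in [0, 1], where 1 means no constraint is violated.
     */
    static double fitness(int[] violations, int[] worstCase) {
        double total = 0.0;
        for (int i = 0; i < violations.length; i++) {
            total += scale(worstCase[i] - violations[i], 0, worstCase[i]);
        }
        // One scaled score per constraint was summed, so rescale from 0..n.
        return scale(total, 0, violations.length);
    }
}
```

For two constraints with a worst case of 10 violations each, a candidate with 0 and 5 violations scores (1.0 + 0.5) / 2 = 0.75, and only a candidate with no violations reaches the fitness of 1 that Listing 5.1 tests for.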

5.3.5 XML Handling

Following the scheduling agent design set out in chapter 4, our scheduling agent needs to be able to read in XML timetable instances. In addition, the scheduling agent should also be able to output solutions in XML. For this reason, classes that handle reading and writing XML files have been included in the agent’s code base.

5.3.5.1 Reading XML

XML reading is managed by the XMLReader class, which provides a wrapper around the Java Document Object Model (DOM) classes, used to store XML styled data, and the Java XML parser. Providing a separate wrapper for this purpose allows the underlying parser and data-structure to be decoupled from the scheduling agent and changed independently without affecting other system components.

The class constructor takes a valid file path as input, creating a new instance of the XMLReader. On creation of a new XMLReader, the file pointed to by the file path is read and parsed by the XML parser into a DOM instance. The DOM is a tree structure that holds each XML entity as a node of the tree.[21] The XMLReader object provides a number of traversal and inspection operations to allow the DOM to be navigated and interrogated. The nextRoot and previousRoot methods allow forwards and backwards navigation of the current tree level, while the next and previous methods allow the same navigation over the children of the current node. The examine and examineItem methods allow the currently selected child node to be interrogated, or the current node value to be read, respectively.
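The kind of DOM access that such a wrapper hides can be sketched with the standard Java XML classes. The example below is not the XMLReader class: DomSketch is hypothetical, it parses from a string rather than a file path to stay self-contained, and the element name teacher and attribute name are assumed for illustration rather than taken from the instance document format of chapter 4.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: reading attribute values out of a DOM tree. */
final class DomSketch {

    /** Returns the name attribute of every teacher element in the document. */
    static List<String> teacherNames(String xml) {
        try {
            // The whole document is parsed into an in-memory tree first.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            // The tree is then traversed as a separate step.
            NodeList nodes = doc.getElementsByTagName("teacher");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(((Element) nodes.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

Keeping these calls behind a wrapper, as the XMLReader does, means the rest of the agent never depends on the org.w3c.dom types directly.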

The XMLReader class is used by the Input class to instantiate elements of the internal model. In a more complete system, a second abstraction of the XML functions could be provided to automatically populate the internal model without having to traverse the tree as a separate step. This could be done using a SAX (Simple API for XML) parser; rather than constructing a DOM of the XML document, a SAX parser uses an event-based model and calls specified methods when it finds certain elements within the XML document.[21] This style of parser could be used to perform the model instantiation in a single step.
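A minimal sketch of the event-based alternative is shown below, using the SAX classes that ship with Java. The class SaxSketch is hypothetical, and the element name teacher and attribute name are assumed for illustration. The handler is called as each element is encountered, so the model could be populated during parsing with no intermediate tree.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: collecting values through SAX callbacks in a single pass. */
final class SaxSketch {

    /** Returns the name attribute of every teacher element in the document. */
    static List<String> teacherNames(String xml) {
        final List<String> names = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // Called by the parser each time an opening tag is read.
                if ("teacher".equals(qName)) {
                    names.add(attrs.getValue("name"));
                }
            }
        };

        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return names;
    }
}
```

In the scheduling agent, the body of startElement would create model resources directly instead of collecting strings, which is the single-step instantiation described above.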

5.3.5.2 Writing XML

Writing out XML is handled by the XMLWriter class. Although writing XML is a relatively trivial task, requiring the writing of strings containing model parameter values in the XML format, I have provided a number of helper functions for constructing tags in the proper form. The methods startTag and endTag insert the appropriate bracketing around element names to construct a valid XML document. The XMLWriter class also contains file writing capabilities. An output file path is provided during instantiation of the class and all tags added to the output are written out to the file when the object’s write or append method is called.
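The tag helpers can be pictured with the following sketch. It is not the XMLWriter class: TagWriterSketch is hypothetical, it builds the document in memory instead of writing to a file, and the text method with its character escaping is an addition that the report does not describe.

```java
/** Hypothetical sketch of startTag/endTag style helpers that build XML in memory. */
final class TagWriterSketch {
    private final StringBuilder out = new StringBuilder();

    /** Appends an opening tag such as <name>. */
    TagWriterSketch startTag(String name) {
        out.append('<').append(name).append('>');
        return this;
    }

    /** Appends a closing tag such as </name>. */
    TagWriterSketch endTag(String name) {
        out.append("</").append(name).append('>');
        return this;
    }

    /** Appends element content, escaping the characters that are special in XML text. */
    TagWriterSketch text(String value) {
        out.append(value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;"));
        return this;
    }

    @Override
    public String toString() {
        return out.toString();
    }
}
```

Escaping is the one step that makes writing XML less trivial than plain string concatenation: a class name containing an ampersand would otherwise produce a document that the reader could not parse.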


6 Evaluation

6.1 Effectiveness of Timetabling Model

The proposed timetable model allowed for an internal data-structure to be constructed and manipulated in a structured way. Having predictable actions and a standard interface for accessing all elements of the model allowed for the fast construction of multiple problems from different domains. The model, with its associated constraints language and XML files, allows for the complete expression of all constraints in the scenario set out in section 3.2. The split XML input files allow for the definition of complex model elements, such as classes and their relation to bands, which was not possible using past approaches. The instance XML file allows for the loading of specific model data such as the class names and their sizes, allowing constraints to be expressed on any conceivable attribute type the users may require. Hard constraints and the two special no-clashes and completeness constraints account for scenario constraints 1, 2 and 3. The implementation of soft constraints, in conjunction with our standard model, allows for the predictable implementation of the remaining constraints despite their complex and possibly conflicting goals. Conflicting constraints are handled by the genetic algorithm as it finds an acceptable balance between the two constraints.

The Darwin GA Framework provides simple to use and efficient genetic algorithms to the scheduling agent. The downfall of Darwin is its simplistic operators, which limit the strength of the solution process. While this simple implementation allows for the solving of non-NP-complete problems in approximately 2000 cycles of the algorithm (about 2 minutes on my Pentium D PC), with small variations for the actual size of the problem, a complete problem such as the one in the scenario takes many thousands of cycles to find partial solutions. Reduced scenarios have shown that the timetabling model is complete, in that it can model all complex constraints so that they are solvable by the genetic algorithm, but on non-trivial problems the genetic algorithm cannot solve the problem in a feasible timescale. With the timetabling model in place, non-NP-complete problems may be more efficiently solved using other solving algorithms that have shown promise with other optimization tasks, in particular a graph based approach.[7] NP-complete problems may be more effectively solved by employing more advanced genetic algorithm operations and incorporating some of the existing code branches for Darwin into the main code trunk. Real valued genome strings have been shown to have significant benefits over the binary approach employed in Darwin.[3] Additionally, building a parallelizable algorithm could allow for much faster solving times on modern multi-core computers. Indeed, in its current implementation, the genetic algorithm only ever exploited one core on the development system, reducing its capacity by half. Simulating parallel populations with inter-breeding may also show good results, gaining benefits from both accessing additional processing cores and improvements in the algorithm’s search power.[3]
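Fitness evaluation is usually the easiest part of a genetic algorithm to spread over several cores, because each candidate can be scored independently. The sketch below shows one way this could be done with parallel streams, which require Java 8 or later and were therefore not available when this project was written. The class ParallelEvaluation is hypothetical and the fitness function is passed in as a parameter.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch: scoring all candidates of a generation on several cores. */
final class ParallelEvaluation {

    /**
     * Applies the fitness function to every genome, possibly in parallel.
     * The result array is in the same order as the input list.
     */
    static double[] evaluate(List<String> genomes, ToDoubleFunction<String> fitness) {
        return genomes.parallelStream()
                .mapToDouble(fitness)
                .toArray();
    }
}
```

This only works if the fitness function is safe to call from several threads at once. The current Input class decodes each genome into a single shared static model, so it would need one model instance per thread before it could be used in this way.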

6.2 Future Developments

6.2.1 Darwin GA Framework

The Darwin framework, whilst featuring several good architectural features that set it apart from its competitors, lacks the advanced genetic algorithm components that would give it the power to solve problems as complex as those set out in the system scenario.[section 3.2]

Future development should focus on integrating the three current code branches: providing alternative gene types, parallel populations and parallelized implementations would allow the algorithms to solve problems in a much more reasonable time than is currently possible. Development should also seek to increase the number of operators available in the default framework. Although the framework is easily extensible, optimized operators should be included to allow the framework to be distributed for use in other application areas.

6.2.2 Scheduling Agent

The current scheduling agent, whilst being based on the standardized timetable model, does not exploit its full potential. Due to time constraints, the implementation of the constraints language could not be completed. Currently Java coding is required to implement each constraint, although its structure closely resembles that of the constraints language; this falls short of the intended use for the project, which is to allow non-technical users to specify elements of the system. A model-driven approach could reduce the need for Java coding to some degree; however, the program would still need to be recompiled, which runs against the spirit of the project.

Future development should focus on completing the constraints language and building an interpreter into the scheduling agent that can read constraints ‘on-the-fly’ and express them on the internal model. This is not a trivial task; however, it would greatly increase the usefulness of the system.

The original project plan included building a user-interface for the program. With the change in scope of the project, which had originally focused on a domain specific solution, and time constraints due to the project complexity, it was necessary to remove the requirement for a user interface as it could not have been completed by the project deadline. In future development this could be completed to provide a complete program for use by real-world end-users. This would allow a more complete study into the completeness of the timetable model and constraints language, informing future developments in the field.

6.3 Further Research

Completion of a full program would allow future research to be undertaken into the completeness of the timetable model and its constraint language. Currently the model fits all scenarios imagined by its author; however, it is likely that further developments can be made so that it can model scenarios not considered during its creation. The timetable model and the scheduling agent are also designed for solving resource assignment problems.[Table 2.1] Research into applying this approach to resource modeling tasks would help solve many currently challenging tasks such as transport timetabling.


Bibliography

[1] “NetBeans,” 4th May 2009. http://www.netbeans.org/.

[2] F. Mittelbach, M. Goossens, J. Braams, D. Carlisle, and C. Rowley, The LaTeX Companion. Addison-Wesley, 2004.

[3] D. Dumitrescu, B. Lazzerini, L. Jain, and A. Dumitrescu, Evolutionary Computation. CRC Press LLC, 2000.

[4] R. L. Graham, D. E. Knuth, and O. Patashnik, Concrete Mathematics: A Foundation for Computer Science. Addison-Wesley, second ed., 1994.

[5] R. Elmasri and S. B. Navathe, Fundamentals of Database Systems, 2nd Edition. Benjamin/Cummings, 1994.

[6] R. Qu, E. K. Burke, B. McCollum, L. T. Merlot, and S. Y. Lee, “A survey of search methodologies and automated system development for examination timetabling,” J. of Scheduling, vol. 12, no. 1, pp. 55–89, 2009.

[7] E. Burke, D. de Werra, and J. Kingston, “Applications to timetabling,” in Handbook of Graph Theory (K. H. Rosen, J. L. Gross, and J. Yellen, eds.), Discrete Mathematics and its Applications, pp. 445–474, CRC Press LLC, 2004.

[8] A. Turing, “Computing machinery and intelligence,” Mind, vol. 59, 1950.

[9] R. Friedberg, “A learning machine: Part I,” IBM J. Research and Development, vol. 2, 1958.

[10] C. K. Omoto and P. F. Lurquin, Genes and DNA: A Beginner’s guide to genetics and its applications. Columbia University Press, 2004.

[11] P. J. Russell, iGenetics: A Molecular Approach. Pearson Education, 2006.

[12] E. Mayr, The Growth of Biological Thought: Diversity, Evolution, and Inheritance. Harvard University Press, 1983.

[13] J. H. Holland, “Outline for a logical theory of adaptive systems,” JACM, vol. 9, no. 3, 1962.

[14] J. H. Holland, Adaptation in natural and artificial systems. Cambridge, MA, USA: MIT Press, 1992.

[15] L. Fogel, “Autonomous automata,” Industrial Research, vol. 4, 1962.

[16] “GAlib,” 4th May 2009. http://lancet.mit.edu/ga/-GAlib.

[17] “GAUL,” 4th May 2009. http://gaul.sourceforge.net/.

[18] “JGAP,” 4th May 2009. http://jgap.sourceforge.net/index.html.

[19] S. Even, A. Itai, and A. Shamir, “On the complexity of timetable and multicommodity flow problems,” SIAM J. Comput., vol. 5, no. 4, pp. 691–703, 1976.

[20] M. Gröbner, P. Wilke, and S. Büttcher, “A standard framework for timetabling problems,” in PATAT (E. K. Burke and P. D. Causmaecker, eds.), vol. 2740 of Lecture Notes in Computer Science, pp. 24–38, Springer, 2002.

[21] E. T. Ray, Learning XML, Second Edition. O’Reilly, 2003.

[22] C. M. Sperberg-McQueen and H. Thompson, “XML Schema,” 4th May 2009. http://www.w3.org/XML/Schema.

[23] W3C Recommendation, “XQuery 1.0: An XML query language,” 4th May 2009. http://www.w3.org/TR/xquery/.

[24] D. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison-Wesley, 1989.

[25] L. Davis, Handbook of Genetic Algorithms. Van Nostrand Reinhold, 1991.

[26] H. Mühlenbein, “How genetic algorithms really work: Part 1. Mutation and hill-climbing,” in Parallel Problem Solving from Nature, 2 (R. Männer and B. Manderick, eds.), pp. 115–124, Elsevier, 1992.

[27] T. Bäck, “Optimal mutation rates in genetic search,” in Proceedings of the Fifth International Conference on Genetic Algorithms, pp. 2–8, Morgan Kaufmann, 1993.

[28] T. Bäck, Evolutionary Algorithms in Theory and Practice. Oxford University Press, 1996.

[29] J. Smith and T. C. Fogarty, “Self adaptation of mutation rates in a steady state genetic algorithm,” in Proceedings of the 3rd International Conference on Evolutionary Computation, pp. 318–323, IEEE Press, 1996.

[30] J. D. Schaffer, R. A. Caruana, L. J. Eshelman, and R. Das, “A study of control parameters affecting online performance of genetic algorithms for function optimization,” in Proceedings of the Third International Conference on Genetic Algorithms, (San Francisco, CA, USA), pp. 51–60, Morgan Kaufmann Publishers Inc., 1989.

[31] “Eclipse modeling framework,” 4th May 2009. http://www.eclipse.org/modeling/emf/.

[32] J. D. Poole, “Model-driven architecture: Vision, standards and emerging technologies,” 4th May 2009. http://www.omg.org/mda/mda_files/Model-Driven_Architecture.pdf.

A Darwin GA Framework Class Diagram


B Scheduling Agent: Internal Timetable Model

Listing B.1: Internal Timetable Model

    // Fragment: these members and nested classes sit inside the scheduling
    // agent's enclosing class. Requires: import java.util.ArrayList;
    static ArrayList<Group> groups = new ArrayList<Group>();
    static ArrayList<Class> classes = new ArrayList<Class>();
    static ArrayList<Teacher> teachers = new ArrayList<Teacher>();
    static int timetableLength;
    static int[] timeExtents;
    static int totalOccurances = 0;

    // Recomputes the total number of group occurrences.
    void calculateTotalOccurances() {
        totalOccurances = 0;
        for (Group g : groups) {
            totalOccurances += g.occurances;
        }
    }

    // Counts one occurrence per class for every period of its group.
    static int calculateClassOccurances() {
        int total = 0;
        for (Group g : groups) {
            total += g.occurances * g.classes.size();
        }
        return total;
    }

    // Maps a flat occurrence index onto the class that owns it,
    // or returns null when the index is out of range.
    static Class getClassFromOccurance(int occuranceNumber) {
        int pointer = 0;
        for (Group g : groups) {
            for (Class c : g.classes) {
                for (int i : c.periods()) {
                    if (pointer == occuranceNumber) return c;
                    pointer++;
                }
            }
        }
        return null;
    }

    // A set of classes that share the same timetabled periods.
    class Group {
        ArrayList<Class> classes = new ArrayList<Class>();
        String name;
        int occurances = 3;
        private int[] timetabledPeriod = new int[occurances];

        void setPeriod(int index, int period) { timetabledPeriod[index] = period; }
        int[] periods() { return timetabledPeriod; }
        int size() { return classes.size(); }
        void newClass(String name) { classes.add(new Class(name, this)); }

        Group(String name) {
            this.name = name; // "name = name" only reassigned the parameter
            groups.add(this);
            defaultArray();
        }

        Group(String name, int occurances) {
            this(name);
            this.occurances = occurances; // same shadowing fix as above
            timetabledPeriod = new int[occurances];
            defaultArray(); // the resized array must also start unassigned
        }

        void reset() { timetabledPeriod = new int[occurances]; defaultArray(); }

        boolean hasPeriod(int period) {
            for (int i = 0; i < timetabledPeriod.length; i++) {
                if (timetabledPeriod[i] == period) return true;
            }
            return false;
        }

        // Marks every period slot as unassigned (-1).
        void defaultArray() {
            for (int i = 0; i < timetabledPeriod.length; i++) {
                timetabledPeriod[i] = -1;
            }
        }
    }

    // A single class; its periods are those of its parent group.
    class Class {
        Group parentGroup;
        String name;
        ArrayList<Teacher> staff = new ArrayList<Teacher>();

        int occurances() { return parentGroup.occurances; }
        int[] periods() { return parentGroup.timetabledPeriod; }

        boolean hasPeriod(int period) {
            for (int i = 0; i < parentGroup.timetabledPeriod.length; i++) {
                if (parentGroup.timetabledPeriod[i] == period) return true;
            }
            return false;
        }

        Class(String name, Group parentGroup) {
            this.name = name; // "this." added: parameters shadow the fields
            this.parentGroup = parentGroup;
            classes.add(this); // registers in the enclosing class's list
        }
    }

    class Teacher {
        String name;
        int occurances;

        Teacher(String name, int occurances) {
            this.name = name; // "this." added: parameters shadow the fields
            this.occurances = occurances;
            teachers.add(this);
        }
    }
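The flat occurrence index used by getClassFromOccurance can be illustrated with a small standalone sketch. It is not part of the scheduling agent: the class name OccurrenceLookupSketch, the helper names and the sample groups ("7M1", "7M2", "7A1") are all hypothetical. Classes are reduced to plain strings so that the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the model in Listing B.1.
public class OccurrenceLookupSketch {

    /** A group of classes that share the same timetabled periods. */
    static final class Group {
        final String name;
        final int occurrences;                    // periods per timetable cycle
        final List<String> classNames = new ArrayList<>();

        Group(String name, int occurrences) {
            this.name = name;
            this.occurrences = occurrences;
        }
    }

    /** Total number of (class, period) slots across all groups. */
    static int totalClassOccurrences(List<Group> groups) {
        int total = 0;
        for (Group g : groups) {
            total += g.occurrences * g.classNames.size();
        }
        return total;
    }

    /** Walks groups, then classes, then periods, counting slots in order. */
    static String classAt(List<Group> groups, int occurrenceNumber) {
        int pointer = 0;
        for (Group g : groups) {
            for (String className : g.classNames) {
                for (int i = 0; i < g.occurrences; i++) {
                    if (pointer == occurrenceNumber) return className;
                    pointer++;
                }
            }
        }
        return null; // index outside the timetable
    }

    /** Made-up sample data: two maths classes (3 periods), one art class (2). */
    static List<Group> sampleGroups() {
        Group maths = new Group("Year 7 Maths", 3);
        maths.classNames.add("7M1");
        maths.classNames.add("7M2");
        Group art = new Group("Year 7 Art", 2);
        art.classNames.add("7A1");

        List<Group> groups = new ArrayList<>();
        groups.add(maths);
        groups.add(art);
        return groups;
    }

    public static void main(String[] args) {
        List<Group> groups = sampleGroups();
        int total = totalClassOccurrences(groups);
        for (int n = 0; n < total; n++) {
            System.out.println(n + " -> " + classAt(groups, n));
        }
    }
}
```

With this sample data there are 3 × 2 + 2 × 1 = 8 slots: indices 0 to 2 belong to 7M1, 3 to 5 to 7M2, and 6 to 7 to 7A1. An index-based layout of this kind lets a fixed-length sequence of positions refer to individual class occurrences without storing object references.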
