8/3/2019 Project Documentation Carol George
http://slidepdf.com/reader/full/project-documentation-carol-george 1/33
RESEARCH PROPOSAL
TITLE:
EFFECTS OF EMPLOYING
DATA MINING IN A HOTEL MANAGEMENT SYSTEM.
A CASE STUDY OF Norfolk Hotel Nairobi (Fairmont Hotel).
PRESENTED BY:
GEORGE NJUGUNA
REG.NO.
BMIT/OO54/05/09
A Research Proposal submitted in partial fulfillment of the requirements for the Degree of Bachelor of Management & Information Technology
DECLARATION.
I declare that this is my own work and that it has not previously been presented to this or any
other university for marking. It is the creation of my own ideas from beginning to end.
Student GEORGE NJUGUNA
Signature ……………………………………
Date ……………………………………
Supervisor
Signature ……………………………………
Date ……………………………………
ABSTRACT
The overall goal of this research is to come up with a computer system with data mining
capabilities that can automate the operations of a hotel.
The case study is the Norfolk Hotel (Fairmont), a hotel dedicated to providing the best
services to its customers. Currently it uses a system that does not incorporate a data mining
feature.
With this in mind, research was necessary to investigate data mining issues in management
computer systems.
DEDICATION.
I affectionately dedicate this work to all the generous and helpful people, men and women, who
have the gift of sharing and helping others, and to two of them in particular: my parents, Mr.
and Mrs. Anthony Kamau, without whom I could not have made it this far.
To my siblings Willy, Trevor, Emma and Mitchell, who have given me a reason to believe in
myself.
ACKNOWLEDGEMENT.
With the completion of this research proposal, I wish to acknowledge with thanks the help of those
without whom I would not have accomplished half as much in this project.
First, I thank the Almighty God for his strength, grace and mercy, which have been unconditionally
given to me from the start up to now.
Sincere thanks to the friends who in one way or another helped me see this through. Their support,
financial, moral or otherwise, was well appreciated.
Exceptional thanks also to my various lecturers, who have given me inspiration, support, and
criticism.
Individual thanks to my supervisor, Mr. Ngeno, for his guidance throughout the research, and to
my friends Mr. N. Karie, Warui, Melvin Mwangi and Peris Wanjiru for their endless
encouragement.
God Bless you all.
TABLE OF CONTENTS.
CHAPTER 1.
1.0 INTRODUCTION.
1.1 BACKGROUND INFORMATION
Before and during the colonial period there were few if any large hotels in Kenya. Early British
settlers in Kenya often lived in the cities for part of the year but they usually rented a house from
their British predecessors, if they did not own one, rather than staying in a hotel. Numbers of
business and foreign visitors were very small by modern standards. The accommodation
available to them included lodging houses and coaching inns. Lodging houses were more like
private homes with rooms to let than commercial hotels, and were often run by widows.
Coaching inns served passengers from the stage coaches which were the main means of long-
distance passenger transport before railways began to develop in the 1830s. The last surviving
galleried coaching inn in London is the George Inn which now belongs to the National Trust.
A few hotels of a more modern variety began to be built in the early 1900s, the Norfolk Hotel
among them. The precursor of Claridge's had opened its doors in 1812 but, by the mid-19th
century, it had closed down due to bankruptcy and gone into receivership.
The Fairmont Norfolk Hotel is one of the most characterful hotels in East Africa.
Older than the London Ritz, the Norfolk started life on Christmas Day, 1904. Apart from being
one of the oldest hotels in Kenya and indeed the whole African region, the Norfolk has gone
through various transitions over the decades, from restyling and renovations to changes of
ownership. It is dedicated to offering services to tourists from all over the world, including
local tourists. It offers a wide range of services: conference rooms, accommodation for
tourists, hearty meals for all kinds of people, entertainment such as live performances by local
and international singers, and sports such as golf (the major sport), boat riding and camel
riding, to name but a few. It is managed by experienced personnel and has about 500 employees.
Because of tourists' thirst to explore Kenya and the beautiful, cool nature of the country,
especially the Rift Valley, the hotel business has picked up. The Norfolk is among the top hotels
in the region, hence the need to deliver the best services efficiently. The hotel currently uses
a system that does not have data mining features. The directors also intend to expand the hotel
to other towns in the near future.
The aim of the company is:
To provide employment.
To promote both domestic tourism and foreign tourism.
To help tourists explore nature and the different cultures in Kenya.
1.2 STATEMENT OF THE PROBLEM.
Norfolk Hotel Nairobi (Fairmont Hotel) deals with hundreds of customers and carries out
thousands of computerized transactions a year. How to use this data has become a complex
problem for managers. The question that arises is: what next after this accumulation?
As the scope of the hotel widens, there is a need to control costs and understand the service
mixes of different customer groups. How to price services and target certain services at a group
of people is very important for the growth of the hotel.
Each customer has a different buying habit, and habits may differ according to season. Tracking
this is as complex as inferring information from it.
1.2 PROBLEM SOLUTION.
The above problems can adequately be solved by developing a data mining system. The system
will be able to perform, among others, the following functions:
1. Use the data mining capacity of the software to analyze tourist request patterns.
2. The patterns, associations, or relationships among all this data can provide information. For
example, analysis of the hotel’s point of sale transaction data can yield information on
which services are selling and when.
3. Information can be converted into knowledge about historical patterns and future trends.
For example, summary information on service sales can be analyzed in light of promotional efforts to provide knowledge of tourist buying behavior. This helps to determine which services are most susceptible to promotional efforts.
4. The system will also enable Norfolk Hotel Nairobi (Fairmont Hotel) to determine relationships among "internal" factors such as price, service delivery, or staff skills, and "external" factors such as economic indicators, competition, and customer demographics.
5. The hotel will be able to determine the impact on sales, customer satisfaction, and the hotel’s
profits.
6. Finally, it enables managers to "drill down" into summary information to view detailed
transactional data.
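As an illustration of the second function above (finding which services sell and when), the following is a minimal Python sketch. The record layout, service names, and amounts are hypothetical and are not taken from the hotel's actual system; a real implementation would read from the point of sale database.

```python
from collections import Counter

# Hypothetical point-of-sale records: (month, service, amount).
# Field layout and values are illustrative assumptions only.
transactions = [
    ("2010-01", "accommodation", 12000),
    ("2010-01", "conference room", 30000),
    ("2010-01", "meals", 2500),
    ("2010-02", "accommodation", 12000),
    ("2010-02", "meals", 1800),
    ("2010-02", "meals", 2200),
]


def sales_by_service_and_month(records):
    """Sum revenue for every (month, service) pair."""
    totals = Counter()
    for month, service, amount in records:
        totals[(month, service)] += amount
    return totals


def best_seller_per_month(records):
    """Return the highest-revenue service for each month."""
    best = {}
    for (month, service), revenue in sales_by_service_and_month(records).items():
        if month not in best or revenue > best[month][1]:
            best[month] = (service, revenue)
    return best


summary = best_seller_per_month(transactions)
for month in sorted(summary):
    service, revenue = summary[month]
    print(month, service, revenue)
```

The same summary table is also the starting point for a "drill down": a manager can go from a monthly total back to the individual transactions that produced it.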
1.3 RESEARCH OBJECTIVES.
1.3.1 General Objective
To develop a data mining system with capabilities of mining and inferring conclusions
from a large complex hotel’s dataset
1.3.2 Specific Objectives
i. Research data mining as an aid to recognizing patterns in the hotel’s data.
ii. Investigate the current state of the tourist market in Kenya and its growth capacity.
1.4 RESEARCH QUESTIONS
i. What, in general overview, is data mining?
ii. What are the foundations of data mining?
iii. What can data mining do for an underlying business venture?
iv. What are the most commonly used techniques in data mining?
v. What is the essence of visualizing data mining models?
1.5 Justification of the study
This study is intended to help management deal with the ever-growing problem of handling the
large volumes of data that accumulate in complex databases in their organizations. It should thus
simplify the work of top-level management and minimize the time wasted
in trying to maneuver through such systems.
The study will also help employees learn how the system enables Norfolk Hotel Nairobi
(Fairmont Hotel) to determine relationships among "internal" factors such as price, service
delivery, or staff skills, and "external" factors such as economic indicators, competition, and
customer demographics.
On the other hand, present and future researchers will have a starting point on ways of handling
large amounts of information to their advantage.
1.6 Scope of the study
The research will be carried out at Norfolk Hotel Nairobi (Fairmont Hotel) in the Nairobi CBD
area and at adjacent hotels and restaurants.
1.7 Limitations and delimitations of the study
Budget – due to the distance from my learning institution to the venue of the research,
finances will be a major setback, but with some funds from my parents I will manage.
Insufficient time to obtain research data – findings will therefore give a short-run overview and
not the general trend of events.
Time – the short time allowed for completing the research proposal will be a great hindrance.
Primary data.
I intend to use:
Questionnaires.
Interviews.
Observations.
Secondary Sources
Internet - It is a very reliable source of information in the research area
because it describes many systems in use.
Books - When searching for information, e.g. on how hotel automation would enhance
the services provided to users, books help by giving adequate information.
1.7 REQUIREMENTS
Hardware requirement
Pentium 4 PC.
Mouse and keyboard
Printer.
2 GB flash disk for backup.
128 MB of
CHAPTER 2
2.0 DATA MINING
2.1 WHAT IS DATA MINING? OVERVIEW
1. Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.
2. Data mining is the process of sorting through large amounts of data and picking out relevant information. It is usually used by business intelligence organizations and financial analysts, but is increasingly being used in the sciences to extract information from the enormous data sets generated by modern experimental and observational methods. W. Frawley and G. Piatetsky-Shapiro (1992) described data mining as "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data", and D. Hand, H. Mannila and P. Smyth (2001) described it as "the science of extracting useful information from large data sets or databases". Kantardzic, Mehmed (2003) defined data mining in relation to enterprise resource planning as the statistical and logical analysis of large sets of transaction data, looking for patterns that can aid decision making.
3. Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.
Most companies already collect and refine massive quantities of data. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought on-line. When implemented on high performance client/server or parallel processing computers, data mining tools can analyze massive databases to deliver answers to questions such as, "Which clients are most likely to respond to my next promotional mailing, and why?"
2.2 DATA, INFORMATION, AND KNOWLEDGE
2.2.1 Data
Data are any facts, numbers, or text that can be processed by a computer. Today, organizations are accumulating vast and growing amounts of data in different formats and different databases. This includes:
operational or transactional data, such as sales, cost, inventory, payroll, and accounting
non-operational data, such as industry sales, forecast data, and macroeconomic data
metadata - data about the data itself, such as logical database design or data dictionary definitions
2.2.2 Information
The patterns, associations, or relationships among all this data can provide information. For example, analysis of retail point of sale transaction data can yield information on which products are selling and when.
2.2.3 Knowledge
Information can be converted into knowledge about historical patterns and future trends. For example, summary information on retail supermarket sales can be analyzed in light of promotional efforts to provide knowledge of consumer buying behavior. Thus, a manufacturer or retailer could determine which items are most susceptible to promotional efforts.
2.2.4 Data Warehouses
Dramatic advances in data capture, processing power, data transmission, and storage capabilities are enabling organizations to integrate their various databases into data warehouses. Data warehousing is defined as a process of centralized data management and retrieval. Data warehousing, like data mining, is a relatively new term although the concept itself has been around for years. Data warehousing represents an ideal vision of maintaining a central repository of all organizational data. Centralization of data is needed to maximize user access and analysis. Dramatic technological advances are making this vision a reality for many companies. And, equally dramatic advances in data analysis software are allowing users to access this data freely. The data analysis software is what supports data mining.
2.3 THE FOUNDATIONS OF DATA MINING
Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently, generated technologies that allow users to navigate through their data in real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery. Data mining is ready for application in the business community because it is supported by three technologies that are now sufficiently mature:
Massive data collection
Powerful multiprocessor computers
Data mining algorithms
Commercial databases are growing at unprecedented rates. A survey of data warehouse projects found that 19% of respondents are beyond the 200 gigabyte level, while 59% expect to be there by the start of the new millennium. [1] In some industries, such as retail, these numbers can be much larger. The accompanying need for improved computational engines can now be met in a cost-effective manner with parallel multiprocessor computer technology. Data mining algorithms embody techniques that have existed for at least 10 years, but have only recently been implemented as mature, reliable, understandable tools that consistently outperform older statistical methods.
In the evolution from business data to business information, each new step has built upon the previous one. For example, dynamic data access is critical for drill-through in data navigation applications, and the ability to store large databases is critical to data mining. From the user’s point of view, the four steps listed in Table 1 were revolutionary because they allowed new business questions to be answered accurately and quickly.
| Evolutionary Step | Business Question | Enabling Technologies | Product Providers | Characteristics |
| Data Collection (1960s) | "What was my total revenue in the last five years?" | Computers, tapes, disks | IBM, CDC | Retrospective, static data delivery |
| Data Access (1980s) | "What were unit sales in New England last March?" | Relational databases (RDBMS), Structured Query Language (SQL), ODBC | Oracle, Sybase, Informix, IBM, Microsoft | Retrospective, dynamic data delivery at record level |
| Data Warehousing & Decision Support (1990s) | "What were unit sales in New England last March? Drill down to Boston." | On-line analytic processing (OLAP), multidimensional databases, data warehouses | Pilot, Comshare, Arbor, Cognos, Microstrategy | Retrospective, dynamic data delivery at multiple levels |
| Data Mining (Emerging Today) | "What’s likely to happen to Boston unit sales next month? Why?" | Advanced algorithms, multiprocessor computers, massive databases | Pilot, Lockheed, IBM, SGI, numerous startups (nascent industry) | Prospective, proactive information delivery |
Table 1. Steps in the Evolution of Data Mining.
The core components of data mining technology have been under development for decades, in research areas such as statistics, artificial intelligence, and machine learning. Today, the maturity of these techniques, coupled with high-performance relational database engines and broad data integration efforts, make these technologies practical for current data warehouse environments.
2.4 WHAT CAN DATA MINING DO?
Data mining is primarily used today by companies with a strong consumer focus - retail, financial, communication, and marketing organizations. It enables these companies to determine relationships among "internal" factors such as price, product positioning, or staff skills, and "external" factors such as economic indicators, competition, and customer demographics. And, it enables them to determine the impact on sales, customer satisfaction, and corporate profits. Finally, it enables them to "drill down" into summary information to view detail transactional data.
2.4.1 The Scope of Data Mining
Data mining derives its name from the similarities between searching for valuable business information in a large database (for example, finding linked products in gigabytes of store scanner data) and mining a mountain for a vein of valuable ore. Both processes require either sifting through an immense amount of material, or intelligently probing it to find exactly where the value resides. Given databases of sufficient size and quality, data mining technology can generate new business opportunities by providing these capabilities:
Automated prediction of trends and behaviors. Data mining automates the process of finding predictive information in large databases. Questions that traditionally required extensive hands-on analysis can now be answered directly from the data, and quickly. A typical example of a predictive problem is targeted marketing. Data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings. Other predictive problems include forecasting bankruptcy and other forms of default, and identifying segments of a population likely to respond similarly to given events.
Automated discovery of previously unknown patterns. Data mining tools sweep through databases and identify previously hidden patterns in one step. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting fraudulent credit card transactions and identifying anomalous data that could represent data entry keying errors.
Data mining techniques can yield the benefits of automation on existing software and hardware platforms, and can be implemented on new systems as existing platforms are upgraded and new products developed. When data mining tools are implemented on high performance parallel processing systems, they can analyze massive databases in minutes. Faster processing means that users can automatically experiment with more models to understand complex data. High speed makes it practical for users to analyze huge quantities of data. Larger databases, in turn, yield improved predictions.
Databases can be larger in both depth and breadth:
More columns. Analysts must often limit the number of variables they examine when doing hands-on analysis due to time constraints. Yet variables that are discarded because they seem unimportant may carry information about unknown patterns. High performance data mining allows users to explore the full depth of a database, without pre-selecting a subset of variables.
More rows. Larger samples yield lower estimation errors and variance, and allow users
to make inferences about small but important segments of a population.
The most commonly used techniques in data mining are:
Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.
Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID).
Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution.
Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbor technique.
Rule induction: The extraction of useful if-then rules from data based on statistical significance.
Many of these technologies have been in use for more than a decade in specialized analysis tools that work with relatively small volumes of data. These capabilities are now evolving to integrate directly with industry-standard data warehouse and OLAP platforms. The appendix to this white paper provides a glossary of data mining terms.
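To make the nearest neighbor method listed above more concrete, here is a minimal Python sketch. It assumes Euclidean distance and a simple majority vote, which are common choices but are not specified in the text; the guest records and the "business"/"leisure" labels are hypothetical. In practice the features would usually be rescaled first so that no single column dominates the distance.

```python
import math
from collections import Counter


def knn_classify(history, query, k=3):
    """Classify `query` by majority vote among its k nearest historical records.

    `history` is a list of (features, label) pairs with numeric feature tuples.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Rank historical records by Euclidean distance to the query record.
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical historical guests: (nights stayed, spend per night) -> segment.
history = [
    ((1, 90), "business"),
    ((2, 110), "business"),
    ((1, 100), "business"),
    ((7, 60), "leisure"),
    ((10, 55), "leisure"),
    ((6, 70), "leisure"),
]

print(knn_classify(history, (2, 95), k=3))
print(knn_classify(history, (8, 65), k=3))
```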
2.5 HOW DOES DATA MINING WORK?
While large-scale information technology has been evolving separate transaction and analytical systems, data mining provides the link between the two. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. Several types of analytical software are available: statistical, machine learning, and neural networks. Generally, any of four types of relationships are sought:
Classes: Stored data is used to locate data in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by having daily specials.
Clusters: Data items are grouped according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities.
Associations: Data can be mined to identify associations. The beer-diaper example is an example of associative mining.
Sequential patterns: Data is mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.
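The association relationship above can be illustrated with a small Python sketch that counts how often two items appear in the same basket. It uses the standard support and confidence measures from association rule mining, which the text does not name, and the baskets are invented for illustration; it is a simplified pair counter, not a full algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return {(a, b): (support, confidence of a -> b)} for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n  # share of all baskets containing both items
        if support >= min_support:
            rules[(a, b)] = (support, count / item_counts[a])
            rules[(b, a)] = (support, count / item_counts[b])
    return rules


# Hypothetical baskets in the spirit of the beer-diaper example.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
]

rules = pair_rules(baskets, min_support=0.5)
for (a, b), (support, confidence) in sorted(rules.items()):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Confidence here is the share of baskets containing the first item that also contain the second, so the rule "chips -> beer" can be stronger than "beer -> chips" even though both have the same support.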
Data mining consists of five major elements:
Extract, transform, and load transaction data onto the data warehouse system.
Store and manage the data in a multidimensional database system.
Provide data access to business analysts and information technology professionals.
Analyze the data by application software.
Present the data in a useful format, such as a graph or table.
Different levels of analysis are available:
Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.
Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.
Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID). CART and CHAID are decision tree techniques used for classification of a dataset. They provide a set of rules that you can apply to a new (unclassified) dataset to predict which records will have a given outcome. CART segments a dataset by creating 2-way splits while CHAID segments using chi square tests to create multi-way splits. CART typically requires less data preparation than CHAID.
Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbor technique.
Rule induction: The extraction of useful if-then rules from data based on statistical significance.
Data visualization: The visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships.
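The 2-way split that CART performs can be sketched as follows. This Python example covers only the split-selection step on a single numeric field and assumes Gini impurity as the split criterion, a common choice for CART classification that the text does not state; it is not a full tree builder, and CHAID's chi square based multi-way splits are not shown. The data values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_binary_split(values, labels):
    """Find the threshold on one numeric field that minimises weighted Gini.

    Records with value <= threshold go left, the rest go right.
    Returns (threshold, weighted_impurity), or (None, inf) if no split exists.
    """
    best = (None, float("inf"))
    n = len(values)
    # Skip the largest value: splitting there would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical records: nights stayed and whether the guest later returned.
nights = [1, 2, 2, 5, 7, 9]
returned = ["no", "no", "no", "yes", "yes", "yes"]

threshold, impurity = best_binary_split(nights, returned)
print(threshold, impurity)
```

A full tree repeats this search on each resulting subset, over every available field, until the subsets are pure enough or too small to split further.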
How exactly is data mining able to tell you important things that you didn't know or what is going to happen next? The technique that is used to perform these feats in data mining is called modeling. Modeling is simply the act of building a model in one situation where you know the answer and then applying it to another situation that you don't. For instance, if you were looking for a sunken Spanish galleon on the high seas the first thing you might do is to research the times when Spanish treasure had been found by others in the past. You might note that these ships often tend to be found off the coast of Bermuda and that there are certain characteristics to the ocean currents, and certain routes that have likely been taken by the ship’s captains in that era. You note these similarities and build a model that includes the characteristics that are common to the locations of these sunken treasures. With these models in hand you sail off looking for treasure where your model indicates it most likely might be given a similar situation in the past. Hopefully, if you've got a good model, you find your treasure.
This act of model building is thus something that people have been doing for a long time, certainly before the advent of computers or data mining technology. What happens on computers, however, is not much different than the way people build models. Computers are loaded up with lots of information about a variety of situations where an answer is known and then the data mining software on the computer must run through that data and distill the characteristics of the data that should go into the model. Once the model is built it can then be used in similar situations where you don't know the answer. For example, say that you are the director of marketing for a telecommunications company and you'd like to acquire some new long distance phone customers. You could just randomly go out and mail coupons to the general population - just as you could randomly sail the seas looking for sunken treasure. In neither case would you achieve the results you desired and of course you have the opportunity to do much better than random - you could use your business experience stored in your database to build a model.
As the marketing director you have access to a lot of information about all of your customers: their age, sex, credit history and long distance calling usage. The good news is that you also have a lot of information about your prospective customers: their age, sex, credit history etc. Your problem is that you don't know the long distance calling usage of these prospects (since they are most likely now customers of your competition). You'd like to concentrate on those prospects who have large amounts of long distance usage. You can accomplish this by building a model. Table 2 illustrates the data used for building a model for new customer prospecting in a data warehouse.
| | Customers | Prospects |
| General information (e.g. demographic data) | Known | Known |
| Proprietary information (e.g. customer transactions) | Known | Target |
Table 2 - Data Mining for Prospecting
The goal in prospecting is to make some calculated guesses about the information in the lower right hand quadrant based on the model that we build going from Customer General Information to Customer Proprietary Information. For instance, a simple model for a telecommunications company might be:
98% of my customers who make more than $60,000/year spend more than $80/month on long distance
This model could then be applied to the prospect data to try to tell something about the proprietary information that this telecommunications company does not currently have access to. With this model in hand new customers can be selectively targeted.
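The prospecting idea above can be sketched in Python as a two-step process: measure how well the income rule holds on existing customers, then apply it to prospects whose usage is unknown. The $60,000 and $80 cutoffs come from the example rule; the customer and prospect records, and the function names, are hypothetical.

```python
INCOME_CUTOFF = 60000  # yearly income threshold from the example rule
SPEND_CUTOFF = 80      # monthly long distance spend threshold from the example rule

# Hypothetical customers: both general and proprietary information are known.
customers = [
    {"income": 75000, "long_distance": 95},
    {"income": 82000, "long_distance": 120},
    {"income": 64000, "long_distance": 85},
    {"income": 40000, "long_distance": 20},
    {"income": 52000, "long_distance": 90},
]

# Hypothetical prospects: only general information is known.
prospects = [
    {"name": "P1", "income": 90000},
    {"name": "P2", "income": 35000},
    {"name": "P3", "income": 61000},
]


def rule_confidence(records):
    """Share of high-income customers whose spend exceeds the cutoff."""
    high_income = [r for r in records if r["income"] > INCOME_CUTOFF]
    if not high_income:
        return 0.0
    hits = sum(1 for r in high_income if r["long_distance"] > SPEND_CUTOFF)
    return hits / len(high_income)


def target_prospects(records):
    """Apply the rule to prospects: select those the model expects to be heavy users."""
    return [r["name"] for r in records if r["income"] > INCOME_CUTOFF]


confidence = rule_confidence(customers)
targets = target_prospects(prospects)
print(confidence, targets)
```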
Test marketing is an excellent source of data for this kind of modeling. Mining the results of a test market representing a broad but relatively small sample of prospects can provide a foundation for identifying good prospects in the overall market. Table 3 shows another common scenario for building models: predict what is going to happen in the future.
| | Yesterday | Today | Tomorrow |
| Static information and current plans (e.g. demographic data, marketing plans) | Known | Known | Known |
| Dynamic information (e.g. customer transactions) | Known | Known | Target |
Table 3 - Data Mining for Predictions
If someone told you that he had a model that could predict customer usage, how would you know if he really had a good model? The first thing you might try would be to ask him to apply his model to your customer base, where you already knew the answer. With data mining, the best way to accomplish this is by setting aside some of your data in a vault to isolate it from the mining process. Once the mining is complete, the results can be tested against the data held in the vault to confirm the model's validity. If the model works, its observations should hold for the vaulted data.
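The "vault" idea described above is what is usually called a hold-out set. A minimal sketch, assuming a toy model that is just an income cut-off and synthetic records of the form (income, heavy user or not): all names and data here are hypothetical and serve only to show the split, the mining step, and the check against the vaulted data.

```python
import random

def split_vault(records, vault_fraction=0.3, seed=42):
    """Shuffle the records and lock the last share of them away in the 'vault'."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - vault_fraction))
    return shuffled[:cut], shuffled[cut:]

def accuracy(records, threshold):
    """Fraction of records where 'income > threshold' matches the known label."""
    correct = sum(1 for income, heavy in records if (income > threshold) == heavy)
    return correct / len(records)

def fit_threshold(mining_set):
    """Toy 'mining' step: pick the income cut-off that best fits the mining set."""
    candidates = sorted({income for income, _ in mining_set})
    return max(candidates, key=lambda t: accuracy(mining_set, t))

# Synthetic records: (yearly income, is a heavy user). Illustrative data only.
records = [(income, income > 60_000) for income in range(20_000, 100_001, 4_000)]

mining_set, vault = split_vault(records)
threshold = fit_threshold(mining_set)          # the vault is never seen here
mining_accuracy = accuracy(mining_set, threshold)
vault_accuracy = accuracy(vault, threshold)    # the honest estimate
print(f"mining accuracy {mining_accuracy:.2f}, vault accuracy {vault_accuracy:.2f}")
```

If the accuracy on the vault is much lower than on the mining set, the model has probably memorised the mining data rather than found a pattern that generalises.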
Today, data mining applications are available on systems of all sizes for mainframe, client/server, and PC platforms. System prices range from several thousand dollars for the smallest applications up to $1 million a terabyte for the largest. Enterprise-wide applications generally range in size from 10 gigabytes to over 11 terabytes. NCR has the capacity to deliver applications exceeding 100 terabytes. There are two critical technological drivers:
Size of the database: the more data being processed and maintained, the more powerful the system required.
Query complexity: the more complex the queries and the greater the number of queries being processed, the more powerful the system required.
Relational database storage and management technology is adequate for many data mining applications less than 50 gigabytes. However, this infrastructure needs to be significantly enhanced to support larger applications. Some vendors have added extensive indexing capabilities to improve query performance. Others use new hardware architectures such as Massively Parallel Processors (MPP) to achieve order-of-magnitude improvements in query time. For example, MPP systems from NCR link hundreds of high-speed Pentium processors to
achieve performance levels exceeding those of the largest supercomputers.
2.7 AN ARCHITECTURE FOR DATA MINING
To best apply these advanced techniques, they must be fully integrated with a data warehouse as well as flexible interactive business analysis tools. Many data mining tools currently operate outside of the warehouse, requiring extra steps for extracting, importing, and analyzing the data. Furthermore, when new insights require operational implementation, integration with the warehouse simplifies the application of results from data mining. The resulting analytic data warehouse can be applied to improve business processes throughout the organization, in areas such as promotional campaign management, fraud detection, new product rollout, and so on. Figure 1 illustrates an architecture for advanced analysis in a large data warehouse.
Figure 1 - Integrated Data Mining Architecture
The ideal starting point is a data warehouse containing a combination of internal data tracking all customer contact coupled with external market data about competitor activity. Background information on potential customers also provides an excellent basis for prospecting. This warehouse can be implemented in a variety of relational database systems: Sybase, Oracle, Redbrick, and so on, and should be optimized for flexible and fast data access.
An OLAP (On-Line Analytical Processing) server enables a more sophisticated end-user business model to be applied when navigating the data warehouse. The multidimensional structures allow the user to analyze the data as they want to view their business, summarizing by product line, region, and other key perspectives of their business. The Data Mining Server must be integrated with the data warehouse and the OLAP server to embed ROI-focused business analysis directly into this infrastructure. An advanced, process-centric metadata template defines the data mining objectives for specific business issues like campaign management, prospecting, and promotion optimization. Integration with the data warehouse enables operational decisions to be directly implemented and tracked. As the warehouse grows with new decisions and results, the organization can continually mine the best practices and apply them to future decisions.
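The kind of multidimensional summary an OLAP server provides (by product line, by region, or by both) can be illustrated with a small roll-up function. This is a sketch only: the `roll_up` helper, the field names and the revenue figures are hypothetical, and a real OLAP server would pre-compute and index such aggregates rather than scan rows on demand.

```python
from collections import defaultdict

def roll_up(facts, dimensions, measure="revenue"):
    """Sum the measure over every combination of the chosen dimensions."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)

# Hypothetical fact rows, as they might be stored in a warehouse table.
facts = [
    {"product_line": "Rooms", "region": "Africa", "revenue": 1200.0},
    {"product_line": "Rooms", "region": "Europe", "revenue": 800.0},
    {"product_line": "Dining", "region": "Africa", "revenue": 300.0},
    {"product_line": "Dining", "region": "Africa", "revenue": 150.0},
]

by_line = roll_up(facts, ["product_line"])                    # one perspective
by_line_region = roll_up(facts, ["product_line", "region"])   # a finer one
print(by_line)
print(by_line_region)
```

Changing the list of dimensions changes the perspective without touching the underlying data, which is the point of the multidimensional view.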
This design represents a fundamental shift from conventional decision support systems. Rather than simply delivering data to the end user through query and reporting software, the Advanced Analysis Server applies users' business models directly to the warehouse and returns a proactive analysis of the most relevant information. These results enhance the metadata in the OLAP Server by providing a dynamic metadata layer that represents a distilled view of the data. Reporting, visualization, and other analysis tools can then be applied to plan future actions and confirm the impact of those plans.
2.8 VISUALIZING DATA MINING MODELS
The point of data visualization is to let the user understand what is going on. Since data mining usually involves extracting "hidden" information from a database, this understanding process can get somewhat complicated. In most standard database operations nearly everything the user sees is something that they knew existed in the database already. A report showing the breakdown of sales by product and region is straightforward for the user to understand because they intuitively know that this kind of information already exists in the database. If the company sells different products in different regions of the country, there is no problem translating a display of this information into a relevant understanding of the business process.
Data mining, on the other hand, extracts information from a database that the user did not already know about. Useful relationships between variables that are non-intuitive are the jewels that data mining hopes to locate. Since the user does not know beforehand what the data mining process has discovered, it is a much bigger leap to take the output of the system and translate it into an actionable solution to a business problem. Since there are usually many ways to graphically represent a model, the visualizations that are used should be chosen to maximize the value to the viewer. This requires that we understand the viewer's needs and design the visualization with that end-user in mind. If we assume that the viewer is an expert in the subject area but not data modeling, we must translate the model into a more natural representation for them. For this purpose we suggest the use of orienteering principles as a template for our visualizations.
2.9 DATA MINING AND CUSTOMER RELATIONSHIP MANAGEMENT
Customer relationship management (CRM) is a process that manages the interactions between a company and its customers. The primary users of CRM software applications are database marketers who are looking to automate the process of interacting with customers.
To be successful, database marketers must first identify market segments containing customers or prospects with high-profit potential. They then build and execute campaigns that favorably impact the behavior of these individuals.
The first task, identifying market segments, requires significant data about prospective customers and their buying behaviors. In theory, the more data the better. In practice, however, massive data stores often impede marketers, who struggle to sift through the minutiae to find the nuggets of valuable information.
Recently, marketers have added a new class of software to their targeting arsenal. Data mining applications automate the process of searching the mountains of data to find patterns that are good predictors of purchasing behaviors.
After mining the data, marketers must feed the results into campaign management software that, as the name implies, manages the campaign directed at the defined market segments.
In the past, the link between data mining and campaign management software was mostly manual. In the worst cases, it involved "sneaker net," creating a physical file on tape or disk, which someone then carried to another computer and loaded into the marketing database.
This separation of the data mining and campaign management software introduces considerable inefficiency and opens the door for human errors. Tightly integrating the two disciplines presents an opportunity for companies to gain competitive advantage.
2.9.1 How Data Mining Helps Database Marketing
Data mining helps marketing users to target marketing campaigns more accurately, and also to align campaigns more closely with the needs, wants, and attitudes of customers and prospects.
If the necessary information exists in a database, the data mining process can model virtually any customer activity. The key is to find patterns relevant to current business problems.
Typical questions that data mining addresses include the following:
Which customers are most likely to drop their cell phone service?
What is the probability that a customer will purchase at least Ksh1000 worth of merchandise from a particular mail-order catalog?
Which prospects are most likely to respond to a particular offer?
Answers to these questions can help retain customers and increase campaign response rates, which, in turn, increase buying, cross-selling, and return on investment (ROI).
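The second question above (the probability of a purchase of at least Ksh1000) can be approximated, in its simplest form, by counting past outcomes per customer segment. The sketch below assumes a purchase history of (segment, amount) pairs; the segment names, amounts and function name are hypothetical, and a production system would use a proper predictive model rather than raw frequencies.

```python
from collections import defaultdict

def purchase_probability(history, threshold=1000):
    """Estimate, per segment, the share of past purchases worth at least `threshold`."""
    counts = defaultdict(lambda: [0, 0])   # segment -> [purchases >= threshold, all purchases]
    for segment, amount in history:
        counts[segment][1] += 1
        if amount >= threshold:
            counts[segment][0] += 1
    return {segment: hits / total for segment, (hits, total) in counts.items()}

# Hypothetical mail-order history: (customer segment, purchase amount in Ksh).
history = [
    ("repeat", 1500),
    ("repeat", 900),
    ("repeat", 2000),
    ("new", 400),
    ("new", 1200),
]
probabilities = purchase_probability(history)
print(probabilities)
```

Segments with a higher estimated probability would be the first candidates for a campaign, which is how such estimates feed into response rates and ROI.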
CHAPTER 3
3.0 DATA COLLECTION AND ANALYSIS.
3.1 Data Collection Techniques.
These were the fact-finding techniques that I used to collect data about the requirements of the system that I proposed.
The methods I used included:
Questionnaire
Observation
Interview
The main technique used was the questionnaire, as most of the Green Park Golf and Country Club staff members are usually very busy and did not have time for an interview. Interviews were used to ensure an accurate and comprehensive investigation. I mainly interviewed the Hotel manager.
3.1.1 Questionnaire.
Questionnaires were prepared to collect information about various aspects of the system from various respondents. Questionnaires were designed and issued to a selected population. The use of a standardized questionnaire helped to yield more reliable data than other fact-finding techniques, and the wide distribution ensured greater anonymity for respondents, which led to more honest responses.
The questionnaires included both open-ended and closed-ended questions, which were filled in by the respondents. Open-ended questions were used to learn about feelings, opinions, and general experiences or to explore a process or a problem. Closed-ended questions controlled the frame of reference by presenting respondents with specific responses from which to choose.
The responses were not subjected to any influence, as the respondents were required to fill in the questionnaires, and nearly all the respondents responded in time.
3.1.2 Observation.
Observation enables the analyst to have an inside view of the system operations rather than an outside view of the system.
In this case observation was used to capture requirements that might be overlooked by both the interview and the questionnaire, and to separate the achievable requirements from the non-achievable ones. The developer observed how physical tasks were performed and became familiar with the day-to-day activities that go on there, including the customer registration task. The other reason for using observation is that some workers may not be able to describe some work experience in exact terms and may distort some facts about the system.
Advantages
Fast response time.
Enables the analyst to verify information obtained by other methods.
Disadvantages.
Time consuming.
Exaggeration may occur.
Summary of the results gathered from the questionnaires and analysis using pie charts and bar charts
3.1.3 Interview
An interview was conducted with one of the company's managers, and the following points were gathered:
a) There was a need to analyze tourist request patterns.
b) There was a need to find ways to enhance the expansion of the customer base.
After the interview I realized the proposed system would enhance the company's business activity in the following ways:
a) The proposed system would use the data mining capacity of the software to analyze tourist request patterns.
b) The patterns, associations, or relationships among all this data can provide information which can be converted into knowledge about historical patterns and future trends.
c) It would lead to improved profitability due to the enlarged customer base.
d) It would lead to enhanced stock variety, leading directly to improved profitability.
Advantages of using interviews:
The analyst can frame questions differently for individuals depending on their level of understanding. This allowed detailed fact gathering.
The analyst can observe non-verbal communication from the respondents or interviewees.
The response rate tends to be high.
Provides immediate response.
3.2 ANALYSIS OF THE COLLECTED DATA.
The research findings were analyzed to get the actual information about Green Park Hotel and Country Club. The table below shows the number of people who were interviewed and the roles they play in the company.
Person interviewed | Role in the company | Number of interviewee(s)
Manager | To manage the Hotel's resources. | 1
Chief Chef | In charge of the food department. | 2
Cashier | Takes the money and issues receipts | 3
Waiters | They offer services in the hotel | 7
Housekeepers | In charge of room maintenance | 5
Receptionist | Front office personnel | 2
Total | | 20
Summary
The majority of the employees that responded were the waiters. They complained about the slowness of services due to the accumulation of data.
The cashiers also helped to analyze the system, since they use the current system during day-to-day operations.
For the manager, the main concern was the lack of knowledge about customers' future trends, because the current system lacks mining tools that could help in decision making.
The Chief Chef was also concerned because the current system could not help in knowing which food the customers preferred to order and at which time, so as to know their trends.
3.3 FINDINGS AND RESULTS ANALYSIS
The respondents answered all the questions and yielded the following:
No. of respondents | Question | Answer | Conclusion
20 | Are you an employee of Green Park Hotel and Country Club? | All the respondents were employees of the hotel. | All were employees.
20 | If yes, which position do you hold in Green Park Hotel and Country Club? | Manager - 1; Chief Chef - 2; Cashier - 8; Others - 9 |
20 | How long have you been working in Green Park Hotel and Country Club? | Less than 1 year - 1; 1-3 years - 10; More than 3 years - 9 | Majority of them have been in the hotel for less than 3 years.
20 | Are you computer literate? | Yes - 17; No - 3 | Many of them were computer literate.
20 | For how long has Green Park Hotel and Country Club accumulated data? | Most of them felt that it was for quite a long time. |
20 | Which season do you get many customers? | Summer - 12; Winter - 2; Spring - 3; Autumn - 2 | Most respondents selected summer as the season with most customers.
18 | On average how many customers do you serve per day? | Less than 100 - 0; Between 100-500 - 6; Between 500-1000 - 12; More than 1000 - 1 | Majority selected between 500-1000.
9 | How do you store your data? | In files and cabinets - 2; In a DBMS - 7; Others - 0 | Most chose DBMS.
20 | How many transactions does the hotel make per day? | Less than 100 - 0; Between 100-500 - 6; Between 500-1000 - 12; More than 1000 - 1 |
20 | From which region do most of your customers come? | Within Kenya - 2; Within Eastern Africa continent - 3; Within Africa continent - 9; Worldwide - 6 | Within Africa continent.
20 | How fast do you think the employees will adapt to the data mining system? | Very fast - 14; Fast - 3; Slow - 1; Too slow - 0 | The respondents felt that they will adapt to it very fast.
20 | Which of the problems do you think will be solved by the data mining system? | Accumulation of data - 0; Analyzing of customers' habits - 0; Understanding the customer mixes - 3; Pricing and focusing of services to a group of customers - 0; All the above - 20; None of the above - 0 | Employees thought it will solve all the problems that were provided in the question.
19 | Do you think implementing a data mining system will be cheaper or will it bring additional expenses? | It will be cheaper - 12; There are cost implications - 4; Not sure - 3 | Majority thought it will be cheap.
20 | Will the data mining system make good use of resources available in the hotel? | Yes - 17; No - 3 | 85% felt that it will make good use of the resources.
15 | Would you recommend a data mining system for the Hotel? | Yes - 12; No - 3 | Majority of them felt that it would be better to implement one.
15 | If yes, which features of the current system need to be improved? | Majority felt that the data storage and also data mining should be implemented. |
[Chart legend: Manager, Chief Chef, Cashier, Waiters, Housekeeper, Receptionist]
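The conclusions in the findings table rest on simple tallies, so they can be checked mechanically. The sketch below is illustrative: the function names are hypothetical, and the counts are copied from two rows of the table. For instance, 17 "Yes" answers out of 20 respondents works out to 85%.

```python
def percentages(answer_counts):
    """Convert raw answer counts into percentages of all answers given."""
    total = sum(answer_counts.values())
    return {answer: round(100 * count / total, 1)
            for answer, count in answer_counts.items()}

def most_common_answer(answer_counts):
    """Return the answer chosen by the largest number of respondents."""
    return max(answer_counts, key=answer_counts.get)

# Counts taken from the findings table.
resource_use = {"Yes": 17, "No": 3}
busiest_season = {"Summer": 12, "Winter": 2, "Spring": 3, "Autumn": 2}

resource_pct = percentages(resource_use)
top_season = most_common_answer(busiest_season)
print(resource_pct, top_season)
```

Computing percentages from the answers actually given, rather than from the nominal number of respondents, keeps the figures consistent when some respondents skip a question.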
3.4 FINDINGS.
From the respondents' feedback, it can be concluded that most of the respondents recommended the development of a data mining system that will be able to analyze tourist requests, convert the information into knowledge that helps to know the future trend of the Hotel, and enable the management to determine the impact on sales, customer satisfaction and the hotel's profits.
3.5 Secondary sources of information included:-
i) Internet: Material provided on the internet was used to develop a clear understanding of the use of data mining techniques to analyze tourist request patterns, determine the impact on sales, customer satisfaction, etc. This provided a wide variety of information on the development of a data mining system.
ii) Books: Books, magazines and journals on human-computer interaction also contributed to the collection of the required information.
iii) Analyzing documents
The organization’s documents that were analyzed during data collection process were grouped into three
categories:
Documents that might describe the problem: Those analyzed included customer complaints,
interoffice memos, suggestion box notes, reports, work measurement reviews and accounting records.
Documents that might describe the business functions: Those analyzed included: Organizational
policies, departmental objectives and standard operating procedures.
3.6 FEASIBILITY STUDY
A feasibility study was carried out to determine whether the proposed system was worthwhile.
3.6.1 Economic Feasibility
An economic feasibility study of the system was carried out to establish whether the benefits of the proposed system outweigh the cost of implementing it. It was found that the system is viable, since the organization has the resources required for the implementation of the system. The system will be able to determine the impact on sales and the hotel's profits.
BENEFITS ANALYSIS TABLE
BENEFITS | AMOUNT
Costs saved on file buying and maintenance (yearly) | 35,000
Costs saved on salaries (yearly) | 55,000
Loss avoidance (yearly) | 20,000
TOTAL | 110,000
COST ANALYSES TABLE
COST | AMOUNT
Maintenance cost | 9,000
Training cost | 10,000
Operational cost | 15,400
Development cost | 27,000
Total | 61,400
3.6.2 Legal Feasibility
The management is keen on the system's legality and registration, if any. The use of valid development tools and software is of utmost importance to ensure the system passes all the legal requirements and tests. The work is also copyrighted as original work.
3.6.3 Social Feasibility.
In addition to analyzing tourist request patterns, the system will enhance communication and interaction between the employees and customers.
3.6.4 Operational Feasibility.
An operational feasibility study was carried out to address the question of whether the new system's operations are acceptable to the users. The study indicated that the intended users support implementation of the system and that there was no resistance to it, and therefore they will be able to use the system with little training. The system will not affect the existing organizational structure.
The following were the main areas touched on:
The effect of the system on the current organization structure
Implication of the system on existing staff development programmes
Redundancy and retrenchment implications for the employees as a result of the new system.
3.6.5 Technical Feasibility.
A technical feasibility study was conducted to determine whether the proposed system can be implemented using the available hardware, software and technical resources. The study indicated that the institution has enough resources in terms of equipment, personnel and technology, and that the system is likely to be developed successfully.
Technical feasibility was thus aimed at evaluating the following:
The hardware required for the new system
The software required for the new system
Determination of whether the current facilities are adequate or inadequate for the new system after implementation.
Evaluation of the current technology and how applicable it is to the new system.
The inputs, outputs, files and procedures that the proposed system should have as compared to the outputs, files and procedures for the current system.
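The totals in the benefits and cost tables of section 3.6.1 can be re-derived with a few lines of arithmetic. The sketch below is illustrative only: the dictionary and variable names are hypothetical, and it assumes that the cost figures cover the same one-year period as the yearly benefits, which the tables do not state.

```python
# Figures copied from the benefits and cost tables in section 3.6.1.
yearly_benefits = {
    "Costs saved on file buying and maintenance": 35_000,
    "Costs saved on salaries": 55_000,
    "Loss avoidance": 20_000,
}
costs = {
    "Maintenance cost": 9_000,
    "Training cost": 10_000,
    "Operational cost": 15_400,
    "Development cost": 27_000,
}

total_benefits = sum(yearly_benefits.values())
total_costs = sum(costs.values())

# Assumption: both totals refer to the first year of operation.
net_benefit = total_benefits - total_costs
benefit_cost_ratio = total_benefits / total_costs

print(f"Total benefits: {total_benefits:,}")
print(f"Total costs:    {total_costs:,}")
print(f"Net benefit:    {net_benefit:,}")
print(f"Benefit/cost:   {benefit_cost_ratio:.2f}")
```

Under that assumption the benefits exceed the costs, which is consistent with the study's conclusion that the system is economically viable.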
3.7 SYSTEM REQUIREMENT SPECIFICATION
Functional Requirements
1. The system should provide an easy and user-friendly interface for the employees' convenient use.
2. The system should be able to capture and store all the useful data.
3. It should provide full functionality in the sense that it should be able to analyze tourist request patterns.
4. The system should also be able to convert the customers' patterns, associations, or relationships among all data into knowledge so as to help in knowing historical patterns and future trends of the customers.
5. The system should be able to determine relationships among internal factors such as price or service delivery and external factors such as economic indicators and competition.
6. The system should enable the management to drill down into summary information to view detailed transactional data.
7. The system should also enable the management to determine the impact on sales, customer satisfaction and the hotel's profits.
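Requirement 3 (analyzing request patterns) can be illustrated with a very small example: finding the most requested item for each part of the day. This is a sketch under assumptions, not the proposed system's design. The order records, the meal-period boundaries and the function names are all hypothetical.

```python
from collections import Counter

def meal_period(hour):
    """Map an hour of the day (0-23) to a meal period; the boundaries are assumed."""
    if hour < 11:
        return "breakfast"
    if hour < 16:
        return "lunch"
    return "dinner"

def popular_items_by_period(orders):
    """orders: iterable of (hour, item). Returns the most ordered item per meal period."""
    counts = {}
    for hour, item in orders:
        counts.setdefault(meal_period(hour), Counter())[item] += 1
    return {period: counter.most_common(1)[0][0] for period, counter in counts.items()}

# Hypothetical order log: (hour of day, item ordered).
orders = [
    (8, "tea"), (8, "eggs"), (9, "tea"),
    (13, "fish"), (13, "ugali"), (14, "fish"),
    (20, "steak"),
]
favourites = popular_items_by_period(orders)
print(favourites)
```

A summary like this addresses the kind of question raised in the interviews, such as which food customers prefer and at what time, before more advanced mining techniques are applied.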
Non-Functional Requirements.
1. Reliability
The system will be available to the user whenever required.
2. Maintainability
The system can handle changes in the future at minimal costs and effort.
3. Re-Usability
Modules in the system can be used more than once throughout the system and in future versions
of the same.
4. Resource Utilization.
The system should ensure that all resources are effectively used.
3.8 Legal Requirements
The system shall comply with all legal requirements as stated by government law, and must make sure that all copyrights are adhered to, which also protects it from being copied by competitors.
3.8.1 User requirement
To enhance usability, which is an important aspect of web navigation, the website interface should maintain consistency of colors and language and be simple for users to understand. The system is easy to adopt with the aid of the help manual. To guard against repetitive strain injuries, appropriate work design should be implemented, with anti-glare screens in place to protect the users' eyes and regular breaks for relaxation.
3.8.2 Data Requirements
Input data
The system will require the user to input data through the keyboard and the mouse so that he/she
will be authenticated.
Output data
The system will enable querying from the database all the details the system stores and display
them for the user.