chapter 1 introduction to data analysis and decision · pdf file2 chapter 1 introduction to...

28
1 CHAPTER 1 USING QUANTITATIVE METHODS TO SOLVE REAL BUSINESS PROBLEMS A s you embark on your study of data analysis and decision making, you might question the usefulness of quantitative methods to the “real world.” A front-page article in the December 31, 1997, edition of USA Today entitled “Higher Math Delivers Formula for Success” provides some convinc- ing evidence of the applicability of the methods you will be learning.The sub- heading of the article,“Businesses turn to algorithms to solve complex problems,” says it all.Today’s business problems tend to be very complex.In the past, many managers and executives used a “by the seat of your pants” approach to solve problems—that is, they used their business experience, their intuition, and some thoughtful guesswork to obtain solutions. But com- mon sense and intuition go only so far in the solution of the complex prob- lems businesses now face.This is where data analysis and decision making— and the algorithms mentioned in the title of the article—are so useful.When the methods in this book are implemented in user-friendly computer soft- ware packages and are then applied to complex problems, the results can be amazing. Robert Cross, whose company, DFI Aeronomics, sells algorithm- based systems to airlines, states it succinctly:“It’s like taking raw information and spinning money out of it.” The power of the methods in this book is that they are applicable to so many problems and environments.The article mentions the following © Larry Downing/Reuters/Landov Introduction to Data Analysis and Decision Making

Upload: dangkhuong

Post on 06-Feb-2018

226 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CHAPTER 1 Introduction to Data Analysis and Decision · PDF file2 Chapter 1 Introduction to Data Analysis and Decision ... we present a brief discussion of models and the modeling

1

C H A P T E R

1

USING QUANTITATIVE METHODS TO SOLVE REALBUSINESS PROBLEMS

As you embark on your study of data analysis and decision making, youmight question the usefulness of quantitative methods to the “real

world.” A front-page article in the December 31, 1997, edition of USA Todayentitled “Higher Math Delivers Formula for Success” provides some convinc-ing evidence of the applicability of the methods you will be learning.The sub-heading of the article,“Businesses turn to algorithms to solve complexproblems,” says it all.Today’s business problems tend to be very complex. Inthe past, many managers and executives used a “by the seat of your pants”approach to solve problems—that is, they used their business experience,their intuition, and some thoughtful guesswork to obtain solutions. But com-mon sense and intuition go only so far in the solution of the complex prob-lems businesses now face.This is where data analysis and decision making—and the algorithms mentioned in the title of the article—are so useful.Whenthe methods in this book are implemented in user-friendly computer soft-ware packages and are then applied to complex problems, the results can beamazing. Robert Cross, whose company, DFI Aeronomics, sells algorithm-based systems to airlines, states it succinctly:“It’s like taking raw informationand spinning money out of it.”

The power of the methods in this book is that they are applicable toso many problems and environments.The article mentions the following

© L

arry

Dow

ning

/Reu

ters

/Lan

dov

Introduction to Data Analysis and Decision Making

00837_01_ch1_p0001-0028.ps 6/17/05 12:53 PM Page 1

Page 2: CHAPTER 1 Introduction to Data Analysis and Decision · PDF file2 Chapter 1 Introduction to Data Analysis and Decision ... we present a brief discussion of models and the modeling

2 Chapter 1 Introduction to Data Analysis and Decision Making

“success stories” where quantitative analysis has been applied; others will be discussedthroughout this book.

1. United Airlines installed one of DFI’s systems, which cost between $10 million and$20 million. United expects the system to add $50 million to $100 million annuallyto its revenues.

2. The Gap clothing chain uses quantitative analysis to determine exactly how manyemployees should staff each store during the holiday rush.

3. Quantitative analysis has helped medical researchers test potentially dangerousdrugs on fewer people with better results.

4. IBM obtained a $93-million contract to build a computer system for theDepartment of Energy that would do a once-impossible task: make exact real-timemodels of atomic blasts. It won the contract—and convinced the DOE that its sys-tem was cost effective—only by developing quantitative methods that would cut theprocessing time by half.

5. Hotels, airlines, and television broadcasters all use quantitative analysis to implementa new method called “yield management.” In this method, different prices arecharged to different customers, depending on their willingness to pay.The effect isthat more customers are attracted, and revenues increase.

The article concludes by stating that Microsoft’s Excel spreadsheet software contains amini-optimization program called Solver.This is a key statement. Many of the algorithms thatenable the successes discussed in the article are very complex mathematically.They are wellbeyond the grasp of the typical user, including most readers of this book. However, usersno longer need to understand all of the details behind the algorithms.They need only toknow how to model business problems so that appropriate algorithms can be applied andthen how to apply them with user-friendly software. For example, we see in Chapters 14and 15 how to apply Excel’s Solver to a variety of complex problems.You will not learn theintricacies of how Solver does its optimization, but you will learn how to use Solver veryproductively.The same statement applies to the other methods discussed in this book.Youmight not understand exactly what is happening in the computer’s “black box” as it per-forms its calculations, but you will learn how to become a very effective problem solver bytaking advantage of powerful software. ■

1.1 INTRODUCTIONWe are living in the age of technology. This has two important implications for everyoneentering the business world. First, technology has made it possible to collect huge amountsof data. Retailers collect point-of-sale data on products and customers every time a trans-action occurs; credit agencies have all sorts of data on people who have or would like toobtain credit; investment companies have a limitless supply of data on the historical pat-terns of stocks, bonds, and other securities; and government agencies have data on eco-nomic trends, the environment, social welfare, consumer product safety, and virtuallyeverything else we can imagine. It has become relatively easy to collect the data. As aresult, data are plentiful. However, as many organizations are now beginning to discover, itis quite a challenge to analyze and make sense of all the data they have collected.

A second important implication of technology is that it has given many more people thepower and responsibility to analyze data and make decisions on the basis of quantitativeanalysis. Those entering the business world can no longer pass all of the quantitative analy-sis to the “quant jocks,” the technical specialists who have traditionally done the numbercrunching. The vast majority of employees now have a desktop or laptop computer at their

00837_01_ch1_p0001-0028.ps 6/17/05 12:53 PM Page 2

Page 3: CHAPTER 1 Introduction to Data Analysis and Decision · PDF file2 Chapter 1 Introduction to Data Analysis and Decision ... we present a brief discussion of models and the modeling

disposal, they have access to relevant data, and they have been trained in easy-to-use soft-ware, particularly spreadsheet and database software. For these employees, statistics andother quantitative methods are no longer forgotten topics they once learned in college.Quantitative analysis is now an integral part of their daily jobs.

A large amount of data already exists and will only increase in the future. Many com-panies already complain of swimming in a sea of data. However, enlightened companiesare seeing this expansion as a source of competitive advantage. By using quantitativemethods to uncover the information in the data and then acting on this information—againguided by quantitative analysis—they are able to gain advantages that their less enlight-ened competitors are not able to gain. Several pertinent examples of this follow.

■ Direct marketers analyze enormous customer databases to see which customers arelikely to respond to various products and types of promotions. Marketers can thentarget different classes of customers in different ways to maximize profits—and givetheir customers what the customers want.

■ Hotels and airlines also analyze enormous customer databases to see what their cus-tomers want and are willing to pay for. By doing this, they have been able to devisevery clever pricing strategies, where not everyone pays the same price for the sameaccommodations. For example, a business traveler typically makes a plane reserva-tion closer to the time of travel than a vacationer. The airlines know this. Therefore,they reserve seats for these business travelers and charge them a higher price (for thesame seats). The airlines profit, and the customers are happy.

■ Financial planning services have a virtually unlimited supply of data about securityprices, and they have customers with widely differing preferences for various typesof investments. Trying to find a match of investments to customers is a very chal-lenging problem. However, customers can easily take their business elsewhere ifgood decisions are not made on their behalf. Therefore, financial planners are underextreme competitive pressure to analyze masses of data so that they can makeinformed decisions for their customers.

■ We all know about the pressures U.S. manufacturing companies have faced from foreigncompetition in the past couple of decades. The automobile companies, for example,have had to change the way they produce and market automobiles to stay in business.They have had to improve quality and cut costs by orders of magnitude. Although thestruggle continues, much of the success they have had can be attributed to data analysisand wise decision making. Starting on the shop floor and moving up through the organi-zation, these companies now measure almost everything they do, analyze these measure-ments, and then act on the information from these measurements.

We talk about companies analyzing data and making decisions. However, companies don’treally do this; people do it. And who will these people be in the future? They will be you! Weknow from experience that students in all areas of business, at both the undergraduate andgraduate level, will soon be required to describe large complex data sets, run regressionanalyses, make quantitative forecasts, create optimization models, and run simulations. Youare the person who will soon be analyzing data and making important decisions to help gainyour company a competitive advantage. And if you are not willing or able to do so, there willbe plenty of other technically trained people who will be more than happy to replace you.

Our goal in this book is to teach you how to use a variety of quantitative methods toanalyze data and make decisions. We plan to do so in a very hands-on way. We discuss anumber of quantitative methods and illustrate their use in a large variety of realistic busi-ness problems. As you will see, this book includes many examples from finance, market-ing, operations, accounting, and other areas of business. To analyze these examples, wetake advantage of the Microsoft Excel spreadsheet package, together with a number of

1.1 Introduction 3

00837_01_ch1_p0001-0028.ps 6/17/05 12:53 PM Page 3

Page 4: CHAPTER 1 Introduction to Data Analysis and Decision · PDF file2 Chapter 1 Introduction to Data Analysis and Decision ... we present a brief discussion of models and the modeling

powerful Excel add-ins. In each example we will provide step-by-step details of themethod and its implementation in Excel.

This is not a “theory” book. It is also not a book where you can lean comfortably backin your chair, prop your legs up on a table, and read about how other people use quantita-tive methods. It is a “get your hands dirty” book, where you will learn best by actively fol-lowing the examples throughout the book at your own PC. In short, you will learn bydoing. By the time you have finished, you will have acquired some very useful skills fortoday’s business world.

1.2 AN OVERVIEW OF THE BOOKThis book is packed with quantitative methods and examples, probably more than can becovered in any single course. Therefore, we purposely intend to keep this introductorychapter brief so that you can get on with the analysis. Nevertheless, it is useful to introducethe methods you will be learning and the tools you will be using. In this section we providean overview of the methods covered in this book and the software that is used to implementthem. Then in the next section we preview some of the examples we cover in much moredetail in later chapters. Finally, we present a brief discussion of models and the modelingprocess. Our primary purpose at this point is to stimulate your interest in what is to follow.

1.2.1 The Methods

This book is rather unique in that it combines topics from two separate fields: statistics andmanagement science. In a nutshell, statistics is the study of data analysis, whereas man-agement science is the study of model building, optimization, and decision making. In theacademic arena these two fields traditionally have been separated, sometimes widely.Indeed, they are often housed in separate academic departments. However, from a user’sstandpoint it makes little sense to separate them. Both are useful in accomplishing what thetitle of this book promises: data analysis and decision making.

Therefore, we do not distinguish between the “statistics” and “management science”parts of this book. Instead, we view the entire book as a collection of useful quantitativemethods that can be used to analyze data and help make business decisions. In addition,our choice of software helps to integrate the various topics. By using a single package,Excel, together with a number of add-ins, we see that the methods of statistics and man-agement science are similar in many important respects. Most importantly, their combina-tion gives us the power and flexibility to solve a wide range of business problems.

Three important themes run through this book. Two of them are in the title: data analysisand decision making. The third is dealing with uncertainty.1 Each of these themes has sub-themes. Data analysis includes data description, data inference, and the search for relationships in data. Decision making includes optimization techniques for problems withno uncertainty, decision analysis for problems with uncertainty, and structured sensitivityanalysis. Dealing with uncertainty includes measuring uncertainty and modeling uncertaintyexplicitly into the analysis. There are obvious overlaps between these themes and subthemes.When we make inferences from data and search for relationships in data, we must deal withuncertainty. When we use decision trees to help make decisions, we must deal with uncer-tainty. When we use simulation models to help make decisions, we must deal with uncertainty,and we often make inferences from the simulated data.

4 Chapter 1 Introduction to Data Analysis and Decision Making

1The fact that the uncertainty theme did not find its way into the title of this book does not detract from its impor-tance. We just wanted to keep the title reasonably short!

00837_01_ch1_p0001-0028.ps 6/17/05 12:53 PM Page 4

Page 5: CHAPTER 1 Introduction to Data Analysis and Decision · PDF file2 Chapter 1 Introduction to Data Analysis and Decision ... we present a brief discussion of models and the modeling

Figure 1.1 shows where you will find these themes and subthemes in the remaining chap-ters of this book. In the next few paragraphs we discuss the book’s contents in more detail.

1.2 An Overview of the Book 5

Figure 1.1Themes andSubthemes

We begin in Chapters 2 and 3 by illustrating a number of ways to summarize the infor-mation in data sets. These include graphical and tabular summaries, as well as numericalsummary measures such as means, medians, and standard deviations. The material in thesetwo chapters is elementary from a mathematical point of view, but it is extremely impor-tant. As we stated at the beginning of this chapter, organizations are now able to collecthuge amounts of raw data. The question then becomes, What does it all mean? Althoughthere are very sophisticated methods for analyzing data sets, some of which we cover inlater chapters, the “simple” methods in Chapters 2 and 3 are crucial for obtaining an initialunderstanding of the data. Fortunately, Excel and available add-ins now make what wasonce a very tedious task quite easy. For example, Excel’s pivot table tool for “slicing anddicing” data is an analyst’s dream come true. You will be amazed at the complex analysis itenables you to perform—with almost no effort!

After the analysis in Chapters 2 and 3, we step back for a moment in Chapter 4 to seehow we get the data we need in the first place. We know from experience that many studentsand businesspeople are able to perform appropriate statistical analysis once they have thedata in a suitable form. Often the most difficult part, however, is getting the right data, in theright form, into a software package for analysis. Therefore, in Chapter 4 we present a num-ber of extremely useful methods for doing this within Excel. Specifically, we discuss meth-ods for using Excel’s built-in filtering tools to perform queries on Excel data sets, for usingMicrosoft Query (part of Microsoft Office) to perform queries on external databases (suchas Access) and bring the resulting data into Excel, for importing data directly into Excelfrom Web sites, and for “cleansing” data sets (getting rid of “bad” data values). This chapterprovides tools that many analysts need but are usually not even aware of.

Uncertainty is a key aspect of most business problems. To deal with uncertainty, we needa basic understanding of probability. We provide this understanding in Chapters 5 and 6.

00837_01_ch1_p0001-0028.ps 6/17/05 12:53 PM Page 5

Themes Subthemes Chapters Where Emphasized

Page 6: CHAPTER 1 Introduction to Data Analysis and Decision · PDF file2 Chapter 1 Introduction to Data Analysis and Decision ... we present a brief discussion of models and the modeling

Chapter 5 covers basic rules of probability and then discusses the extremely important con-cept of probability distributions. Chapter 6 follows up this discussion by focusing on two ofthe most important probability distributions, the normal and binomial distributions. It alsobriefly discusses the Poisson and exponential distributions, which have many applications inprobability models.

We have found that one of the best ways to make probabilistic concepts “come alive”and easier to understand is by using computer simulation. Therefore, simulation is a com-mon theme that runs through this book, beginning in Chapter 5. Although the final twochapters of the book are devoted entirely to simulation, we do not hesitate to use simula-tion early and often to illustrate difficult statistical concepts.

In Chapter 7 we apply our knowledge of probability to decision making under uncer-tainty. These types of problems—faced by all companies on a continual basis—are charac-terized by the need to make a decision now, even though important information (such asdemand for a product or returns from investments) will not be known until later. The mate-rial in Chapter 7 provides a rational basis for making such decisions. The methods weillustrate do not guarantee perfect outcomes—the future could unluckily turn out differ-ently than we had expected—but they do enable us to proceed rationally and make the bestof the given circumstances. Additionally, the software we use to implement these methodsallows us, with very little extra work, to see how sensitive the optimal decisions are toinputs. This is crucial because the inputs to many business problems are, at best, educatedguesses. Finally, we examine the role of risk aversion in these types of decision problems.

In Chapters 8, 9, and 10 we discuss sampling and statistical inference. Here the basicproblem is to estimate one or more characteristics of a population. If it is too expensive ortime consuming to learn about the entire population—and it usually is—we instead selecta random sample from the population and then use the information in the sample to inferthe characteristics of the population. We see this continually on news shows that describethe results of various polls. We also see it in many business contexts. For example, auditorstypically sample only a fraction of a company’s records. Then they infer the characteristicsof the entire population of records from the results of the sample to conclude whether thecompany has been following acceptable accounting standards.

In Chapters 11 and 12 we discuss the extremely important topic of regression analysis,which is used to study relationships between variables. The power of regression analysis isits generality. Every part of a business has variables that are related to one another, andregression can often be used to estimate possible relationships between these variables. Inmanagerial accounting, regression is used to estimate how overhead costs depend on directlabor hours and production volume. In marketing, regression is used to estimate how salesvolume depends on advertising and other marketing variables. In finance, regression isused to estimate how the return of a stock depends on the “market” return. In real estatestudies, regression is used to estimate how the selling price of a house depends on theassessed valuation of the house and characteristics such as the number of bedrooms andsquare footage. Regression analysis finds perhaps as many uses in the business world asany method in this book.

From regression, we move to times series analysis and forecasting in Chapter 13. Thistopic is particularly important for providing inputs into business decision problems. Forexample, manufacturing companies must forecast demand for their products to make sen-sible decisions about quantities to order from their suppliers. Similarly, fast-food restau-rants must forecast customer arrivals, sometimes down to the level of 15-minute intervals,so that they can staff their restaurants appropriately.

There are many approaches to forecasting, ranging from simple to complex. Someinvolve regression-based methods, in which one or more time series variables are used toforecast the variable of interest, whereas other methods are based on extrapolation. In anextrapolation method the historical patterns of a time series variable, such as product

6 Chapter 1 Introduction to Data Analysis and Decision Making

00837_01_ch1_p0001-0028.ps 6/17/05 12:53 PM Page 6

Page 7: CHAPTER 1 Introduction to Data Analysis and Decision · PDF file2 Chapter 1 Introduction to Data Analysis and Decision ... we present a brief discussion of models and the modeling

demand or customer arrivals, are studied carefully and are then “extrapolated” into thefuture to obtain forecasts. A number of extrapolation methods are available. In Chapter 13we study both regression and extrapolation methods for forecasting.

Chapters 14 and 15 are devoted to spreadsheet optimization, with emphasis on linearprogramming. We assume a company must make several decisions, and there are con-straints that limit the possible decisions. The job of the decision maker is to choose thedecisions such that all of the constraints are satisfied and an objective, such as total profitor total cost, is optimized. The solution process consists of two steps. First, we build aspreadsheet model that relates the decision variables to other relevant quantities by meansof logical formulas. In this first step there is no attempt to find the optimal solution; all wewant to do is relate all relevant quantities in a logical way. The second step is then to findthe optimal solution. Fortunately, Excel contains a Solver add-in that performs this step.All we need to do is specify the objective, the decision variables, and the constraints;Solver then uses powerful algorithms to find the optimal solution. As with regression, thepower of this approach is its generality. An enormous variety of problems can be solved byspreadsheet optimization.

Finally, Chapters 16 and 17 illustrate a number of computer simulation models. This isnot our first exposure to simulation—it is used in a number of previous chapters to illustratestatistical concepts—but here it is studied in its own right. As we discussed previously, mostbusiness problems have some degree of uncertainty. The demand for a product is unknown,future interest rates are unknown, the delivery lead time from a supplier is unknown, and soon. Simulation allows us to build this uncertainty explicitly into spreadsheet models.Essentially, some cells in the model contain random values with given probability distribu-tions. Every time the spreadsheet recalculates, these random values change, which causes“bottom-line” output cells to change as well. The trick then is to force the spreadsheet torecalculate many times and keep track of interesting outputs. In this way we can see whichoutput values are most likely, and we can see best-case and worst-case results.

Spreadsheet simulations can be performed entirely with Excel’s built-in tools. However,this can be quite tedious. Therefore, we use a spreadsheet add-in to streamline the process. Inparticular, we learn how the @RISK add-in can be used to run replications of a simulation,keep track of outputs, create useful charts, and perform sensitivity analyses. With the inher-ent power of spreadsheets and the ease-of-use of such add-ins as @RISK, spreadsheet simu-lation is becoming one of the most popular quantitative tools in the business world.

1.2.2 The Software

The topics we have just discussed are very important. Together, they can be used to solve awide variety of business problems. However, they are not of much practical use unless wehave the software to do the number crunching. Very few business problems are smallenough to be solved with pencil and paper. They require powerful software.

The software included in new copies of this book, together with Microsoft Excel, pro-vides you with a powerful software combination that you will not use for one course andthen discard. This software is being used—and will continue to be used—by leading com-panies all over the world to solve large, complex problems. We firmly believe that theexperience you obtain with this software, through working the examples and problems inthis book, will give you a key competitive advantage in the marketplace.

It all begins with Excel. All of the quantitative methods that we discuss are imple-mented in Excel. We cannot forecast the state of computer software in the long-termfuture, but as we are writing this book Excel is the most heavily used spreadsheet packageon the market, and there is every reason to believe that this state will persist for manyyears. Most companies use Excel, most employees and most students have been trained inExcel, and Excel is a very powerful, flexible, and easy-to-use package.

1.2 An Overview of the Book 7

00837_01_ch1_p0001-0028.ps 6/17/05 12:53 PM Page 7

Page 8: CHAPTER 1 Introduction to Data Analysis and Decision · PDF file2 Chapter 1 Introduction to Data Analysis and Decision ... we present a brief discussion of models and the modeling

Built-in Excel Features

Virtually everyone in the business world knows the basic features of Excel, but relatively fewknow many of its more powerful features. In short, relatively few people are the “powerusers” we expect you to become by working through this book. To get you started, the fileExcel Tutorial.doc on the CD-ROM inside new copies of this book explains some of the“intermediate” features of Excel—features that we expect you to be able to use. Theseinclude the SUMPRODUCT, VLOOKUP, IF, NPV, and COUNTIF functions. They alsoinclude range names, the Data Table command, the Paste Special command, the Goal Seekcommand, and a few others. Finally, although we assume you can perform routine spread-sheet tasks such as copying and pasting, we include a few tips to help you perform these tasksmore efficiently.

In the body of the book we describe several of Excel’s advanced features in moredetail. In Chapters 2 and 3 we introduce pivot tables, the Excel tool that enables you tosummarize data sets in an almost endless variety of ways. (Excel has many useful tools, butwe personally believe that pivot tables are the most ingenious and powerful of all. Wewon’t be surprised if you agree.) Beginning in Chapter 5, we introduce Excel’s RANDfunction for generating random numbers. This function is used in all spreadsheet simula-tions (at least those that do not take advantage of an add-in).

Solver Add-in

In Chapters 14 and 15 we make heavy use of Excel’s Solver add-in. This add-in, developedby Frontline Systems (not Microsoft), uses powerful algorithms—all behind the scenes—to perform spreadsheet optimization. Before this type of spreadsheet optimization add-inwas available, specialized (nonspreadsheet) software was required to solve optimizationproblems. Now we can do it all within a familiar spreadsheet environment.

StatTools Add-in

Much of this book discusses basic statistical analysis. Here we needed to make an importantdecision as we developed the book. A number of excellent statistical software packages are onthe market, including Minitab, SPSS, SAS, StatGraphics, and many others. Although there arenow user-friendly Windows versions of these packages, they are not spreadsheet-based. Wehave found through our own experience that students resist the use of nonspreadsheet pack-ages, regardless of their inherent quality, so we wanted to use Excel as our “statistics package.”(We briefly discuss SPSS and SAS in Chapter 4, but they are not used anywhere else in thebook.) Unfortunately, Excel’s built-in statistical tools are rather limited, and the AnalysisToolPak (developed by a third party) that ships with Excel has significant limitations.

Therefore, we developed an add-in called StatTools that accompanies this book.2

StatTools is powerful, easy to use, and capable of generating output quickly in an easilyinterpretable form. We do not believe you should have to spend hours each time you wantto produce some statistical output. This might be a good learning experience the first time,but after that it acts as a strong incentive not to perform the analysis at all! We believe youshould be able to generate output quickly and easily. This gives you the time to interpretthe output, and it also allows you to try different methods of analysis.

A good illustration involves the construction of histograms, scatterplots, and time seriesgraphs, discussed in Chapter 2. All of these extremely useful graphs can be created in a straight-forward way with Excel’s built-in tools. But by the time you perform all the necessary steps and“dress up” the charts exactly as you want them, you will not be very anxious to repeat the wholeprocess again. StatTools does it all quickly and easily. (You still might want to “dress up” the

8 Chapter 1 Introduction to Data Analysis and Decision Making

2Users of the previous edition of the book will note the change from StatPro to StatTools. Palisade Corporationhas redeveloped StatPro as a commercial package under the name StatTools. The user interface has changed con-siderably (for the better), but the statistical functionality is virtually the same.

Although the tutorial ispresented in a Wordfile, it contains“embedded” Excelspreadsheets thatallow you to practicespreadsheet techniqueswithin Word.

00837_01_ch1_p0001-0028.ps 6/17/05 12:53 PM Page 8

Page 9: CHAPTER 1 Introduction to Data Analysis and Decision · PDF file2 Chapter 1 Introduction to Data Analysis and Decision ... we present a brief discussion of models and the modeling

resulting charts, but that’s up to you.) Therefore, if we advise you in a later chapter, say, to lookat several scatterplots as a prelude to a regression analysis, you can do so in a matter of seconds.

SolverTable Add-in

An important theme throughout this book is sensitivity analysis: How do outputs changewhen inputs change? Typically these changes are made in spreadsheets with a data table, abuilt-in Excel tool. However, data tables don’t work in optimization models, where wewould like to see how the optimal solution changes when certain inputs change. Therefore,we include an Excel add-in called SolverTable to perform this type of sensitivity analysis.It works almost exactly like Excel’s data tables, and it is included with this book. InChapters 14 and 15 we explain how to use SolverTable.

Decision Tools Suite

In addition to StatTools, SolverTable, and built-in Excel add-ins, we also have included inthis book a slightly scaled-down version of Palisade Corporation’s powerful DecisionTools suite. Most of the items in this suite are Excel add-ins—so the learning curve isn’tvery steep. There are six separate packages in this suite: @RISK, PrecisionTree, TopRank,RISKOptimizer, BestFit, and RISKview. The first two are the most important for our pur-poses, but all are useful for certain tasks.

@RISK

The simulation add-in @RISK enables us to run as many replications of a spreadsheet sim-ulation as we like. As the simulation runs, @RISK automatically keeps track of the outputswe select, and it then displays the results in a number of tabular and graphical forms.@RISK also enables us to perform a sensitivity analysis, so that we can see which inputshave the most effect on the outputs. Finally, @RISK provides a number of spreadsheet func-tions that enable us to generate random numbers from a variety of probability distributions.

PrecisionTree

The PrecisionTree add-in is used in Chapter 7 to analyze decision problems with uncer-tainty. The primary method for performing this type of analysis is to draw a decision tree.Decision trees are inherently graphical, and they have always been difficult to implementin spreadsheets, which are based on rows and columns. However, PrecisionTree does thisin a very clever and intuitive way. Equally important, once the basic decision tree has beenbuilt, it is easy to use PrecisionTree to perform a sensitivity analysis on the model inputs.

TopRank

Although we will not use the other Palisade add-ins as extensively as @RISK andPrecisionTree, they are all worth investigating. TopRank is the most general of them. Itstarts with any spreadsheet model, where a set of inputs are used, along with a number ofspreadsheet formulas, to produce an output. TopRank then performs a sensitivity analysisto see which inputs have the largest effect on the output. For example, it might tell uswhich input affects after-tax profit the most: the tax rate, the risk-free rate for investing, theinflation rate, or the price charged by a competitor. Unlike @RISK, TopRank is used whenuncertainty is not explicitly built into a spreadsheet model. However, it considers uncer-tainty implicitly by performing sensitivity analysis on the important model inputs.

RISKOptimizer

RISKOptimizer combines optimization with simulation. There are often times when wewant to use simulation to model some business problem, but we also want to optimize asummary measure, such as a mean, of an output distribution. This optimization can be performed in a trial-and-error fashion, where we try a few values of the decision vari-able(s) and see which provides the best solution. However, RISKOptimizer provides amore automatic (and time-intensive) optimization procedure.

1.2 An Overview of the Book 9

00837_01_ch1_p0001-0028 3/16/06 1:54 PM Page 9

Page 10: CHAPTER 1 Introduction to Data Analysis and Decision · PDF file2 Chapter 1 Introduction to Data Analysis and Decision ... we present a brief discussion of models and the modeling

BestFit

BestFit is used to determine the most appropriate probability distribution for a spreadsheetmodel when we have data on some uncertain quantity. For example, a simulation mightmodel each week’s demand for a product as a random variable. What probability distribu-tion should we use for weekly demand: the well-known normal distribution or possiblysome skewed distribution? If we have historical data on weekly demands for the product,we can feed them into BestFit and let it recommend the distribution that best fits the data.This is a very useful tool in real applications. Instead of guessing a distribution that wethink might be relevant, we can let BestFit point us to a distribution that fits historical datawell. We discuss BestFit briefly in Chapter 6.

RISKview

Finally, RISKview is a drawing tool that complements @RISK. A number of probability dis-tributions are available in @RISK and can be used in simulations. Each has an associated@RISK function, such as RiskNormal, RiskBinomial, and so on. Before selecting any ofthese distributions, however, it is useful (especially for beginners) to see what these distribu-tions look like. RISKview performs this task easily. For any selected probability distribution(and any selected parameters of this distribution), it creates a graph of the distribution, and itallows us to find probabilities for the distribution in a completely intuitive, graphical manner.We use RISKview in Chapter 16 to help learn about potential input probability distributionsfor simulation models.

Software Guide

Figure 1.2 provides a guide to where these various add-ins appear throughout the book. Wedon’t show Excel explicitly in this figure for the simple reason that Excel is used exten-sively in all chapters.

With Excel and the add-ins included in this book, you have a wealth of software atyour disposal. The examples and step-by-step instructions throughout this book will helpyou to become a power user of this software. Admittedly, this takes plenty of practice anda willingness to experiment, but it is certainly within your grasp. When you are finished,we will not be surprised if you rate “improved software skills” as the most valuable thingyou have learned from this book.

10 Chapter 1 Introduction to Data Analysis and Decision Making

Developer Add-In Chapter(s) Where Used

StatTools

@RISK 6, 16–17

RISKview

Figure 1.2Software Guide

Palisade Corporationoriginally marketedBestFit and RISKview as separate products.Although they still existas separate products,their functionality is nowincluded in @RISK.

00837_01_ch1_p0001-0028.ps 6/17/05 12:53 PM Page 10

Page 11: CHAPTER 1 Introduction to Data Analysis and Decision · PDF file2 Chapter 1 Introduction to Data Analysis and Decision ... we present a brief discussion of models and the modeling

1.3 A Sampling of Examples 11

1.3 A SAMPLING OF EXAMPLESPerhaps the best way to illustrate what you will be learning in this book is to preview a fewexamples from later chapters. Our intention here is not to teach you any methods; that willcome later. We only want to indicate the types of problems you will learn how to solve.Each example below is numbered as in the chapter where it appears.

E X A M P L E 3.9

The Spring Mills Company produces and distributes a wide variety of manufacturedgoods. Because of its variety, it has a large number of customers. The company classi-

fies these customers as small, medium, and large, depending on the volume of businesseach does with Spring Mills. Recently, Spring Mills has noticed a problem with itsaccounts receivable. It is not getting paid back by its customers in as timely a manner as itwould like. This obviously costs Spring Mills money. If a customer delays a payment of$300 for 20 days, say, then the company loses potential interest on this amount. The com-pany has gathered data on 280 customer accounts. For each of these accounts, the data setlists three variables: Size, the size of the customer (coded 1 for small, 2 for medium, 3 forlarge); Days, the number of days since the customer was billed; and Amount, the amountthe customer owes. What information can we obtain from these data?

Objective To use charts, summary measures, and pivot tables to understand data onaccounts receivable at Spring Mills.

SolutionIt is always a good idea to get a rough sense of the data first. We do this by calculating sev-eral summary measures for Days and Amount, a histogram of Amount, and a scatterplot ofAmount versus Days. The next logical step is to see whether the different customer sizeshave any effect on either Days, Amount, or the relationship between Days and Amount.There is obviously a lot going on here. We point out the following: (1) there are far fewerlarge customers than small or medium customers; (2) the large customers tend to oweconsiderably more than small or medium customers; (3) the small customers do not tendto be as long overdue as the medium or large customers; and (4) there is no relationshipbetween Days and Amount for the small customers, but there is a definite positive relation-ship between these variables for the medium and large customers. If Spring Mills reallywants to decrease its receivables, it might want to target the medium-size customer group,from which it is losing the most interest. Or it could target the large customers becausethey owe the most on average. The most appropriate action depends on the cost and effec-tiveness of targeting any particular customer group. However, the analysis presented heregives the company a much better picture of what’s currently going on.

This example from Chapter 3 is a typical example of trying to make sense out of a largedata set. Spring Mills has 280 observations on each of three variables. By realistic stan-dards, this is not a large data set, but it still presents a challenge. We examine the data froma number of angles and present several tables and charts. For example, the scatterplots inFigures 1.3 through 1.5 clearly indicate that there is a positive relationship between theamount owed and the number of days since billing for the medium- and large-size cus-tomers, but that no such relationship exists for the small-size customers. As we will see,graphs such as these are very easy to construct in Excel, regardless of the size of the data set.

00837_01_ch1_p0001-0028.ps 6/17/05 12:53 PM Page 11

Page 12: CHAPTER 1 Introduction to Data Analysis and Decision · PDF file2 Chapter 1 Introduction to Data Analysis and Decision ... we present a brief discussion of models and the modeling

12 Chapter 1 Introduction to Data Analysis and Decision Making

Figure 1.4 Scatterplot of Amount versus Days for Medium-size Customers

Figure 1.3 Scatterplot of Amount versus Days for Small-size Customers

00837_01_ch1_p0001-0028.ps 6/17/05 12:53 PM Page 12

Page 13: CHAPTER 1 Introduction to Data Analysis and Decision · PDF file2 Chapter 1 Introduction to Data Analysis and Decision ... we present a brief discussion of models and the modeling

1.3 A Sampling of Examples 13

Figure 1.5 Scatterplot of Amount versus Days for Large-size Customers

E X A M P L E 7.1

SciTools Incorporated, a company that specializes in scientific instruments, has beeninvited to make a bid on a government contract. The contract calls for a specific num-

ber of these instruments to be delivered during the coming year. The bids must be sealed(so that no company knows what the others are bidding), and the low bid wins the contract.SciTools estimates that it will cost $5000 to prepare a bid and $95,000 to supply the instru-ments if it wins the contract. On the basis of past contracts of this type, SciTools believesthat the possible low bids from the competition, if there is any competition, and the associ-ated probabilities are those shown in Table 1.1. In addition, SciTools believes there is a30% chance that there will be no competing bids.

Table 1.1 Probabilities of Low Bids from Competition

Low Bid Probability

Less than $115,000 0.2Between $115,000 and $120,000 0.4Between $120,000 and $125,000 0.3Greater than $125,000 0.1

SolutionThis is a typical example of decision making under uncertainty, the topic of Chapter 7.SciTools has to make decisions now (whether to bid and, if so, how much to bid), withoutknowing what the competition is going to do. The company can’t assure itself of a perfect

00837_01_ch1_p0001-0028.ps 6/17/05 12:53 PM Page 13

Page 14: CHAPTER 1 Introduction to Data Analysis and Decision · PDF file2 Chapter 1 Introduction to Data Analysis and Decision ... we present a brief discussion of models and the modeling

14 Chapter 1 Introduction to Data Analysis and Decision Making

Figure 1.6 Decision Tree for SciTools

outcome, but it can make a rational decision in light of the uncertainty it faces. We will seehow decision trees, produced easily with the PrecisionTree add-in to Excel, not only layout all of the elements of the problem in a logical manner but also indicate the best solu-tion. The completed tree for this problem is in Figure 1.6, which indicates that SciToolsshould indeed prepare a bid, for the amount $115,000.

E X A M P L E 9.5

An auditor wants to determine the proportion of invoices that contain price errors—thatis, prices that do not agree with those on an authorized price list. He checks 93 ran-

domly sampled invoices and finds that two of them include price errors. What can he con-clude, in terms of a 95% one-sided confidence interval, about the proportion of all invoiceswith price errors?

SolutionThis is an important application of statistical inference in the auditing profession. Auditorstry to determine what is true about a population (in this case, all of a company’s invoices)by examining a relatively small sample from the population. The auditor wants an upperlimit so that he is 95% confident that the overall proportion of invoices with errors is no

00837_01_ch1_p0001-0028.ps 6/17/05 12:53 PM Page 14

Page 15: CHAPTER 1 Introduction to Data Analysis and Decision · PDF file2 Chapter 1 Introduction to Data Analysis and Decision ... we present a brief discussion of models and the modeling

1.3 A Sampling of Examples 15

greater than this upper limit. We show the spreadsheet solution in Figure 1.7, which showsthat the auditor can be 95% confident that the overall proportion of invoices with errors isno greater than 6.6% (see cell B10).

Figure 1.7Analysis of AuditingExample

E X A M P L E 11.2

The Bendrix Company manufactures various types of parts for automobiles. The man-ager of the factory wants to get a better understanding of overhead costs. These over-

head costs include supervision, indirect labor, supplies, payroll taxes, overtime premiums,depreciation, and a number of miscellaneous items such as charges for building deprecia-tion, insurance, utilities, and janitorial and maintenance expenses. Some of these overheadcosts are “fixed” in the sense that they do not vary appreciably with the volume of workbeing done, whereas others are “variable” and do vary directly with the volume of work.The fixed overhead costs tend to come from the supervision, depreciation, and miscella-neous categories, whereas the variable overhead costs tend to come from the indirect labor,supplies, payroll taxes, and overtime premiums categories. However, it is not easy to drawa clear line between the fixed and variable overhead components.

The Bendrix manager has tracked total overhead costs over the past 36 months. Tohelp “explain” these, he has also collected data on two variables that are related to theamount of work done at the factory. These variables are

■ MachHrs: number of machine hours used during the month■ ProdRuns: number of separate production runs during the month

The first of these is a direct measure of the amount of work being done. To understand thesecond, we note that Bendrix manufactures parts in fairly large batches. Each batch corre-sponds to a production run. Once a production run is completed, the factory must “set up”for the next production run. During this setup there is typically some downtime while themachinery is reconfigured for the part type scheduled for production in the next batch.Therefore, the manager believes both of these variables might be responsible (in differentways) for variations in overhead costs. Do scatterplots support this belief?

SolutionThis is a typical regression example, in a cost-accounting setting. The manager is trying tosee what type of relationship, if any, there is between overhead costs and the two explana-tory variables: number of machine hours and number of production runs. The scatterplotsrequested appear in Figures 1.8 and 1.9. They do indeed indicate a positive and linear rela-tionship between overhead and the two explanatory variables.

00837_01_ch1_p0001-0028.ps 6/17/05 12:53 PM Page 15

Page 16: CHAPTER 1 Introduction to Data Analysis and Decision · PDF file2 Chapter 1 Introduction to Data Analysis and Decision ... we present a brief discussion of models and the modeling

16 Chapter 1 Introduction to Data Analysis and Decision Making

Figure 1.8 Scatterplot of Overhead versus Machine Hours

Figure 1.9 Scatterplot of Overhead versus Production Runs

00837_01_ch1_p0001-0028.ps 6/17/05 12:53 PM Page 16

Page 17: CHAPTER 1 Introduction to Data Analysis and Decision · PDF file2 Chapter 1 Introduction to Data Analysis and Decision ... we present a brief discussion of models and the modeling

1.3 A Sampling of Examples 17

However, regression goes well beyond scatterplots. It estimates an equation relatingthe variables. This equation can be determined from regression output such as that shownin Figure 1.10. This output implies the following equation for predicted overhead as afunction of machine hours and production runs:

Predicted Overhead � 3997 � 43.54MachHrs � 883.62ProdRuns

Figure 1.10 Multiple Regression Output for Bendrix Example

The positive coefficients of MachHrs and ProdRuns indicate the effects these variableshave on overhead. We will not take the example any further at this point but will simply indicatethat it is easy to generate the output in Figure 1.10 with StatTools. The challenge is learninghow to interpret it. We spend plenty of time in Chapters 11 and 12 on interpretation issues. ■

E X A M P L E 13.3

The file PCDevices.xls contains quarterly sales data (in millions of dollars) for a chip-manufacturing firm from the beginning of 1990 through the end of 2004. Are the com-

pany’s sales growing exponentially through this entire period?

SolutionThis example illustrates a regression-based trend curve, one of several possible forecastingtechniques for a time series variable. A time series graph of the company’s quarterly salesappears in Figure 1.11. It indicates that sales have been increasing steadily at an increasingrate. This is basically what an exponential trend curve implies. To estimate this curve weuse regression analysis to obtain the following equation for predicted quarterly sales as afunction of time:

Predicted Sales � 61.376e0.0663Time

This equation implies that the company’s sales are increasing by approximately 6.6%per quarter during this period, which translates to an annual percentage increase of about29%! As we see in Chapter 13, this is the typical approach used in forecasting. We look ata time series graph to discover trends or other patterns in historical data and then use oneof a variety of techniques to fit the observed patterns and extrapolate them into the future.

00837_01_ch1_p0001-0028.ps 6/17/05 12:53 PM Page 17

Page 18: CHAPTER 1 Introduction to Data Analysis and Decision · PDF file2 Chapter 1 Introduction to Data Analysis and Decision ... we present a brief discussion of models and the modeling

18 Chapter 1 Introduction to Data Analysis and Decision Making

Figure 1.11Time Series Graphof Quarterly Sales at a PC ChipManufacturer

E X A M P L E 15.6

At the present time, the beginning of year 1, the Barney-Jones Investment Corporationhas $100,000 to invest for the next 4 years. There are five possible investments,

labeled A through E. The timing of cash outflows and cash inflows for these investments issomewhat irregular. For example, to take part in investment A, cash must be invested at thebeginning of year 1, and for every dollar invested, there are returns of $0.50 and $1.00 atthe beginnings of years 2 and 3. Similar information for the other investments are as fol-lows, where all returns are per-dollar invested:

■ Investment B: Invest at the beginning of year 2, receive returns of $0.50 and $1.00 atthe beginnings of years 3 and 4.

■ Investment C: Invest at the beginning of year 1, receive return of $1.20 at the begin-ning of year 2.

■ Investment D: Invest at the beginning of year 4, receive return of $1.90 at the begin-ning of year 5.

■ Investment E: Invest at the beginning of year 3, receive return of $1.50 at the begin-ning of year 4.

We assume that any amounts can be invested in these strategies and that the returns are thesame for each dollar invested. However, to create a diversified portfolio, Barney-Jonesdecides to limit the amount put into any investment to $75,000. The company wants aninvestment strategy that maximizes the amount of cash on hand at the beginning of year 5.At the beginning of any year, it can invest only cash on hand, which includes returns fromprevious investments. Any cash not invested in any year can be put in a short-term moneymarket account that earns 3% annually.

SolutionThis is one of many optimization examples we present in Chapters 14 and 15. The typicalsituation is that a company such as Barney-Jones must make several decisions, subject tocertain constraints, that optimize some objective. In this case, Barney-Jones needs to decidethe amounts to invest, subject to some constraints, to maximize its ending cash 4 years from

00837_01_ch1_p0001-0028.ps 6/17/05 12:53 PM Page 18

Page 19: CHAPTER 1 Introduction to Data Analysis and Decision · PDF file2 Chapter 1 Introduction to Data Analysis and Decision ... we present a brief discussion of models and the modeling

1.3 A Sampling of Examples 19

now. Our job is to formulate a spreadsheet model, similar to the one shown in Figure 1.12,that relates the various elements of the problem.

The investment amounts in row 26 are the decision variables, called “changing cells”in Excel’s terminology. When we formulate the model, we can enter any values in thesechanging cells; we do not need to guess “good” values. Then we turn it over to Excel’sSolver add-in. The Solver uses a powerful algorithm to find the optimal values in thechanging cells—that is, the values that optimize the objective while satisfying the con-straints. The values shown in Figure 1.12 are actually the optimal values. They imply thatBarney-Jones can end with final cash of $286,792 by investing as indicated in row 26.

Figure 1.12 Investment Model

E X A M P L E 17.9

We assume that there are two dominant companies in the soft drink industry: “us” and“them.” For this example, we will view everything from the point of view of “us.”

We start with a 45% market share. During each of the next 20 quarters, each company pro-motes its product to some extent. To make the model simple, we will assume that eachcompany each quarter either promotes at a “regular” level or at a “blitz” level. Dependingon each company’s promotional behavior in a given month, the change in our market sharefrom this month to the next is triangularly distributed, with parameters given in Table 1.2.For example, if we blitz and they don’t, then we could lose as much as 1% market share,we could gain as much as 6% market share, and our most likely outcome is an increase of2% market share. We want to develop a simulation model that allows us to gauge the long-term change in our market share for any pattern of blitzing employed by us and them.

00837_01_ch1_p0001-0028.ps 6/17/05 12:53 PM Page 19

Page 20: CHAPTER 1 Introduction to Data Analysis and Decision · PDF file2 Chapter 1 Introduction to Data Analysis and Decision ... we present a brief discussion of models and the modeling

20 Chapter 1 Introduction to Data Analysis and Decision Making

Figure 1.13 Spreadsheet Simulation for Market Share Example

Table 1.2 Parameters of Market Share Change Distributions

Blitzer Minimum Most Likely Maximum

Neither -0.03 0.00 0.03Both -0.05 0.00 0.05Only us -0.01 0.02 0.06Only them -0.06 -0.02 0.01

SolutionThis is a typical example of computer simulation. We make a number of assumptions,build a spreadsheet model around these assumptions, explicitly incorporate uncer-tainty into some of the cells, and see how this uncertainty affects “bottom-line” out-puts. The simulation model appears in Figure 1.13. Several cells in this model arerandom, including all of the numerical values in rows 21 through 24. Therefore, thenumbers you see in this figure represent just one possible scenario of how marketshares might evolve through time. By generating new random values, we see differentscenarios.

Our job is to build the logic and randomness into the spreadsheet model. Then we canuse Excel’s built-in tools or an add-in such as @RISK to replicate the model and keep trackof selected outputs. A typical result from @RISK appears in Figure 1.14. It shows a timeseries graph of how our market share might evolve, given a certain strategy of blitzing byour company and theirs.

00837_01_ch1_p0001-0028.ps 6/17/05 12:53 PM Page 20

Page 21: CHAPTER 1 Introduction to Data Analysis and Decision · PDF file2 Chapter 1 Introduction to Data Analysis and Decision ... we present a brief discussion of models and the modeling

1.4 Modeling and Models 21

Figure 1.14 Summary Chart of Our Market Share for One Set of Strategies

1.4 MODELING AND MODELSWe have already used the term model several times in this chapter. In fact, we have shownseveral spreadsheet models in the previous section. Models and the modeling process arekey elements throughout this book, so we explain them in more detail in this section.3

A model is an abstraction of a real problem. A model tries to capture the essence andkey features of the problem without getting bogged down in relatively unimportant details.There are different types of models, and, depending on an analyst’s preferences and skills,each can be a valuable aid in solving a real problem. We describe three types of modelshere: (1) graphical models, (2) algebraic models, and (3) spreadsheet models.

1.4.1 Graphical Models

Graphical models are probably the most intuitive and least quantitative type of model. Theyattempt to portray graphically how different elements of a problem are related—what affectswhat. A very simple graphical model appears in Figure 1.15. It is called an “influence diagram.” (It can be constructed with the PrecisionTree add-in discussed in Chapter 7, but wewill not use influence diagrams in this book.)

Figure 1.15Influence Diagramfor SouvenirExample

3Management scientists tend to use the terms model and modeling more than statisticians. However, many tradi-tional statistics topics such as regression analysis and forecasting are clearly applications of modeling.

00837_01_ch1_p0001-0028.ps 6/17/05 12:53 PM Page 21

Page 22: CHAPTER 1 Introduction to Data Analysis and Decision · PDF file2 Chapter 1 Introduction to Data Analysis and Decision ... we present a brief discussion of models and the modeling

22 Chapter 1 Introduction to Data Analysis and Decision Making

This particular influence diagram is for a company that is trying to decide how manysouvenirs to order for the upcoming Olympics. The essence of the problem is that the com-pany will order a certain supply, customers will request a certain demand, and the combi-nation of supply and demand will yield a certain payoff for the company. The diagramindicates fairly intuitively what affects what. As it stands, the diagram does not provideenough quantitative details to enable us to “solve” the company’s problem. But this is usu-ally not the purpose of a graphical model. Instead, its purpose is usually to show the impor-tant elements of a problem and how they are related. For complex problems this can bevery helpful and enlightening information for management.

1.4.2 Algebraic Models

Algebraic models are at the opposite end of the spectrum. By means of algebraic equationsand inequalities, they specify a set of relationships in a very precise way, and their precise-ness and lack of ambiguity are very appealing to people with a mathematical background.In addition, algebraic models can usually be stated concisely and with great generality.

A typical example is the “product mix” problem we discuss in Chapter 14. A companycan make several products, each of which contributes a certain amount to profit and con-sumes certain amounts of several scarce resources. The problem is to select the productmix that maximizes profit subject to the limited availability of the resources. All productmix problems can be stated algebraically as follows:

max �n

j�1

pjx j (1.1)

subject to �n

j�1

aijx j � bi, 1 � i � m (1.2)

0 � xj � uj, 1 � j � n (1.3)

Here xj is the amount of product j produced, uj is an upper limit on the amount of product jthat can be produced, pj is the unit profit margin for product j, aij is the amount of resource iconsumed by each unit of product j, bi is the amount of resource i available, n is the numberof products, and m is the number of scarce resources. This algebraic model states very con-cisely that we should maximize total profit [expression (1.1)], subject to consuming nomore of the resources than is available [inequalities (1.2)], and all production quantitiesshould be between 0 and the upper limits [inequalities (1.3)].

Algebraic models such as this appeal to mathematically trained analysts. They areconcise, they spell out exactly which data are required (we would need to estimate the uj’s,the pj’s, the aij’s, and the bi’s from company data), they scale well (a problem with 500 products and 100 resource constraints is just as easy to state as one with only 5 prod-ucts and 3 resource constraints), and many software packages accept algebraic models inessentially the same form as shown here, so that no “translation” is required. Indeed, alge-braic models were the preferred type of model for years—and still are by many analysts.Their main drawback is that they require an ability to work with abstract mathematicalsymbols. Some people have this ability, but many perfectly intelligent people do not.

1.4.3 Spreadsheet Models

A fairly recent alternative to algebraic modeling is spreadsheet modeling. Instead of relat-ing various quantities with algebraic equations and inequalities, we relate them in a spread-sheet with cell formulas. This process is much more intuitive to most people (at least in ourexperience). One of the primary reasons for this is the instant feedback available from

00837_01_ch1_p0001-0028.ps 6/17/05 12:53 PM Page 22

Page 23: CHAPTER 1 Introduction to Data Analysis and Decision · PDF file2 Chapter 1 Introduction to Data Analysis and Decision ... we present a brief discussion of models and the modeling

1.4 Modeling and Models 23

spreadsheets. If you enter a formula incorrectly, it is often immediately obvious (from errormessages or unrealistic numbers) that you have made an error, which you can then go backand fix. Algebraic models provide no such immediate feedback.

A specific comparison might help at this point. We already saw a general algebraicmodel of the product mix problem. Figure 1.16, taken from Chapter 14, illustrates aspreadsheet model for a specific example of this problem. The spreadsheet model shouldbe fairly self-explanatory. All quantities in shaded cells are inputs to the model, the quanti-ties in row 16 are the decision variables (they correspond to the xj’s in the algebraic model),and all other quantities are created through appropriate Excel formulas. To indicate con-straints, we enter inequality signs in appropriate cells.

1234567891011121314151617181920212223242526272829303132

A B C D E F G H IProduct mix model

Input data Range names used:Hourly wage rate $8.00 Frames_produced =Model!$B$16:$E$16Cost per oz of metal $0.50 Maximum_sales =Model!$B$18:$E$18Cost per oz of glass $0.75 Profit =Model!$F$32

Resources_available =Model!$D$21:$D$23Frame type 1 2 3 4 Resources_used =Model!$B$21:$B$23Labor hours per frame 2 1 3 2Metal (oz.) per frame 4 2 1 2Glass (oz.) per frame 6 2 1 2Unit selling price $28.50 $12.50 $29.25 $21.50

Production planFrame type 1 2 3 4Frames produced 1000 800 400 0

<= <= <= <=Maximum sales 1000 2000 500 1000

Resource constraints Used AvailableLabor hours 4000 <= 4000Metal (oz.) 6000 <= 6000Glass (oz.) 8000 <= 10000

Revenue, cost summaryFrame type 1 2 3 4 TotalsRevenue $28,500 $10,000 $11,700 $0 $50,200Costs of inputs Labor $16,000 $6,400 $9,600 $0 $32,000 Metal $2,000 $800 $200 $0 $3,000 Glass $4,500 $1,200 $300 $0 $6,000Profit $6,000 $1,600 $1,600 $0 $9,200

Figure 1.16 Optimal Solution for Product Mix Example

Although a well-designed and well-documented spreadsheet model such as the one inFigure 1.16 is undoubtedly more intuitive for most people than its algebraic counterpart,the art of developing good spreadsheet models is not easy. Obviously, they must be correct.The formulas relating the various quantities must have the correct syntax, the correct cellreferences, and the correct logic. In complex models this can be quite a challenge.

However, correctness is not enough. If spreadsheet models are to be used in the busi-ness world, they must also be well designed and well documented. Otherwise, no one otherthan you (and maybe not even you after a few weeks have passed) will be able to under-stand what your models do or how they work. The strength of spreadsheets is their

00837_01_ch1_p0001-0028.ps 6/17/05 12:53 PM Page 23

Page 24: CHAPTER 1 Introduction to Data Analysis and Decision · PDF file2 Chapter 1 Introduction to Data Analysis and Decision ... we present a brief discussion of models and the modeling

24 Chapter 1 Introduction to Data Analysis and Decision Making

flexibility—you are limited only by your imagination. However, this flexibility can be aliability in spreadsheet modeling unless you plan the design of your models carefully.

Note the clear design in Figure 1.16. Most of the inputs are grouped at the top ofthe spreadsheet. All of the financial calculations are done at the bottom. When there areconstraints, the two sides of the constraints are placed next to each other (as in therange B21:D23). Borders, colors (which appear on the screen but not in this book), andshading are used for added clarity. Descriptive labels are used liberally. Excel itselfimposes none of these “rules,” but you should impose them on yourself.

We have made a conscious effort to establish good habits for you to follow throughoutthis book. We have designed and redesigned our spreadsheet models so that they are as clearas possible. This does not mean that you have to copy everything we do—everyone tends todevelop their own spreadsheet style—but our models should give you something to emulate.Just remember that in the business world, you typically start with a blank spreadsheet. It isthen up to you to develop a model that is not only correct but is also intelligible to you and toothers. This takes a lot of practicing and a lot of editing, but it is a skill well worth developing.

1.4.4 The Seven-Step Modeling ProcessMost of the modeling you will do in this book is only part of the overall modeling processtypically done in the business world. We portray it as a seven-step process, as discussedhere. Of course, not all problems require all seven steps. For example, the analysis of sur-vey data might entail primarily steps 2 (data analysis) and 5 (decision making), without theformal model building discussed in steps 3 and 4.

The Modeling Process1. Define the problem. Typically, a company does not develop a model unless it

believes it has a problem. Therefore, the modeling process really begins by identify-ing an underlying problem. Perhaps the company is losing money, perhaps its marketshare is declining, or perhaps its customers are waiting too long for service. Anynumber of problems might be evident. However, as several people have warned [seeMiser (1993) and Volkema (1995), for example], this step is not always as straight-forward as it might appear. The company must be sure that it has identified the rightproblem before it spends time, effort, and money trying to solve it.

For example, Miser cites the experience of an analyst who was hired by the military toinvestigate overly long turnaround times between fighter planes landing and taking offagain to rejoin the battle. The military was convinced that the problem was caused byinefficient ground crews; if they were sped up, turnaround times would decrease. Theanalyst nearly accepted this statement of the problem and was about to do classical time-and-motion studies on the ground crew to pinpoint the sources of their inefficiency.However, by snooping around, he found that the problem obviously lay elsewhere. Thetrucks that refueled the planes were frequently late, which in turn was due to the ineffi-cient way they were refilled from storage tanks at another location. Once this latter prob-lem was solved—and its solution was embarrassingly simple—the turnaround timesdecreased to an acceptable level without any changes on the part of the ground crews. Ifthe analyst had accepted the military’s statement of the problem, the real problem mightnever have been located or solved.

2. Collect and summarize data. This crucial step in the process is often the mosttedious. All organizations keep track of various data on their operations, but these dataare often not in the form an analyst requires. They are also typically scattered in dif-ferent places throughout the organization, in all kinds of different formats. Therefore,one of the first jobs of an analyst is to gather exactly the right data and summarize the

00837_01_ch1_p0001-0028.ps 6/17/05 12:53 PM Page 24

Page 25: CHAPTER 1 Introduction to Data Analysis and Decision · PDF file2 Chapter 1 Introduction to Data Analysis and Decision ... we present a brief discussion of models and the modeling

data appropriately—as we discuss in detail in Chapters 2 and 3—for use in the model.Collecting the data typically requires asking questions of key people (such as theaccountants) throughout the organization, studying existing organizational databases,and performing time-consuming observational studies of the organization’s processes.In short, it entails a lot of leg work.

3. Formulate a model. This is the step we emphasize, especially in the latter chaptersof the book. The form of the model varies from one situation to another. It could be agraphical model, an algebraic model, or a spreadsheet model. The key is that themodel should capture the key elements of the business problem in such a way that itis understandable by all parties involved. This latter requirement is why we favorspreadsheet models, especially when they are well designed and well documented.

4. Verify the model. Here the analyst tries to determine whether the model developedin the previous step is an accurate representation of reality. A first step in determininghow well the model fits reality is to check whether the model is valid for the currentsituation. This verification can take several forms. For example, the analyst could usethe model with the company’s current values of the input parameters. If the model’soutputs are then in line with the outputs currently observed by the company, the ana-lyst has at least shown that the model can duplicate the current situation.

A second way to verify a model is to enter a number of input parameters (even ifthey are not the company’s current inputs) and see whether the outputs from themodel are reasonable. One common approach is to use extreme values of the inputsto see whether the outputs behave as they should. If they do, then we have anotherpiece of evidence that the model is reasonable.

If certain inputs are entered in the model, and the model’s outputs are not as expected,there could be two causes. First, the model could simply be a poor representation ofreality. In this case it is up to the analyst to refine the model until it provides reason-ably accurate predictions. The second possible cause is that the model is fine but ourintuition is not very good. In this case the fault lies with us, not the model.

A typical example of faulty intuition occurs with random sequences of 0’s and 1’s,such as might occur with successive flips of a fair coin. Most people expect that headsand tails will alternate and that there will be very few sequences of, say, four or moreheads (or tails) in a row. However, a perfectly accurate simulation model of these flipswill show, contrary to what most people expect, that fairly long runs of heads or tailsare not at all uncommon. In fact, one or two long runs should be expected if there areenough flips.

The fact that outcomes sometimes defy intuition is an important reason why modelsare important. These models prove that our ability to predict outcomes in complexenvironments is often not very good.

5. Select one or more suitable decisions. Many, but not all, models are decision mod-els. For any specific decisions, the model indicates the amount of profit obtained, theamount of cost incurred, the level of risk, and so on. If we believe the model is work-ing correctly, as discussed in step 4, then we can use the model to see which deci-sions produce the best outputs.

6. Present the results to the organization. In a classroom setting you are typically fin-ished when you have developed a model that correctly solves a particular problem. Inthe business world a correct model, even a useful one, is not always enough. An ana-lyst typically has to “sell” the model to management. Unfortunately, the people inmanagement are sometimes not as well trained in quantitative methods as the analyst,so they are not always inclined to trust complex models.

1.4 Modeling and Models 25

00837_01_ch1_p0001-0028.ps 6/17/05 12:53 PM Page 25

Page 26: CHAPTER 1 Introduction to Data Analysis and Decision · PDF file2 Chapter 1 Introduction to Data Analysis and Decision ... we present a brief discussion of models and the modeling

There are two ways to mitigate this problem. First, it is helpful to include relevantpeople throughout the company in the modeling process—from beginning to end—sothat everyone has an understanding of the model and feels an ownership for it.Second, it helps to use a spreadsheet model whenever possible, especially if it isdesigned and documented properly. Almost everyone in today’s business world iscomfortable with spreadsheets, so spreadsheet models are more likely to be accepted.

7. Implement the model and update it over time. Again, there is a big differencebetween a classroom situation and a business situation. When you turn in a classroomassignment, you are typically finished with that assignment and can await the next one.In contrast, an analyst who develops a model for a company can usually not pack up hisbags and leave. If the model is accepted by management, the company will then need toimplement it company-wide. This can be very time consuming and politically difficult,especially if the model’s prescriptions represent a significant change from the past. Atthe very least, employees must be trained how to use the model on a day-to-day basis.

In addition, the model will probably have to be updated over time, either because ofchanging conditions or because the company sees more potential uses for the modelas it gains experience using it. This presents one of the greatest challenges for amodel developer, namely, the ability to develop a model that can be modified as theneed arises. Keep this in mind as you develop models throughout this book. Alwaystry to make them as general as possible.

1.5 CONCLUSIONIn this chapter we tried to convince you that the skills in this book are important for you toknow as you enter the business world. The methods we discuss are no longer the soleprovince of the “quant jocks.” By having a PC on your desk that is loaded with powerfulsoftware, you incur a responsibility to use this software to solve business problems. Wehave described the types of problems you will learn to solve in this book, along with thesoftware you will use to solve them. We also discussed the modeling process, a theme thatruns throughout this book. Now it’s time for you to get started!

26 Chapter 1 Introduction to Data Analysis and Decision Making

00837_01_ch1_p0001-0028.ps 6/17/05 12:53 PM Page 26

Page 27: CHAPTER 1 Introduction to Data Analysis and Decision · PDF file2 Chapter 1 Introduction to Data Analysis and Decision ... we present a brief discussion of models and the modeling

Case 1.1 Entertainment on a Cruise Ship 27

C A S E

Cruise ship traveling has become big business.Many cruise lines are now competing for cus-

tomers of all age groups and socioeconomic levels.They offer all types of cruises, from relatively inex-pensive 3- to 4-day cruises in the Caribbean, to 12- to 15-day cruises in the Mediterranean, to several-month around-the-world cruises. Cruiseshave several features that attract customers, manyof whom book 6 months or more in advance: (1)they offer a relaxing, everything-done-for-you wayto travel; (2) they serve food that is plentiful, usu-ally excellent, and included in the price of thecruise; (3) they stop at a number of interestingports and offer travelers a way to see the world;and (4) they provide a wide variety of entertain-ment, particularly in the evening.

This last feature, the entertainment, presents adifficult problem for a ship’s staff. A typical cruisemight have well over 1000 passengers, includingelderly singles and couples, middle-aged people withor without children, and young people, often honey-mooners. These various types of passengers havevaried tastes in terms of their after-dinner prefer-ences in entertainment. Some want traditionaldance music, some want comedians, some wantrock music, some want movies, some want to goback to their cabins and read, and so on. Obviously,cruise entertainment directors want to provide thevariety of entertainment their customers desire—within a reasonable budget—because satisfied cus-tomers tend to be repeat customers. The questionis how to provide the right mix of entertainment.

On a cruise one of the authors and his wife tooka few years ago, the entertainment was of high qualityand there was plenty of variety. A seven-piece showband played dance music nightly in the largest lounge,two other small musical combos played nightly at twosmaller lounges, a pianist played nightly at a piano barin an intimate lounge, a group of professional singersand dancers played Broadway-type shows about twice

weekly, and various professional singers and comedi-ans played occasional single-night performances.4

Although this entertainment was free to all of thepassengers, much of it had embarrassingly low atten-dance. The nightly show band and musical combos,who were contracted to play nightly until midnight,often had less than a half dozen people in the audi-ence—sometimes literally none. The professionalsingers, dancers, and comedians attracted larger audi-ences, but there were still plenty of empty seats. Inspite of this, the cruise staff posted a weekly schedule,and they stuck to it regardless of attendance. In ashort-term financial sense, it didn’t make much differ-ence.The performers got paid the same whether any-one was in the audience or not, the passengers hadalready paid (indirectly) for the entertainment as partof the cost of the cruise, and the only possible oppor-tunity cost to the cruise line (in the short run) wasthe loss of liquor sales from the lack of passengers inthe entertainment lounges. The morale of the enter-tainers was not great—entertainers love packedhouses—but they usually argued, philosophically, thattheir hours were relatively short and they were stillgetting paid to see the world.

If you were in charge of entertainment on thisship, how would you describe the problem with enter-tainment: Is it a problem with deadbeat passengers,low-quality entertainment, or a mismatch between theentertainment offered and the entertainment desired?How might you try to solve the problem? What con-straints might you have to work within? Would youkeep a strict schedule such as the one followed by thiscruise director, or would you play it more “by ear”?Would you gather data to help solve the problem?What data would you gather? How much would finan-cial considerations dictate your decisions? Would theybe long-term or short-term considerations? ■

1.1 ENTERTAINMENT ON A CRUISE SHIP

4There was also a moderately large onboard casino, but it tended toattract the same people every night, and it was always closed whenthe ship was in port.

00837_01_ch1_p0001-0028.ps 6/17/05 12:53 PM Page 27

Page 28: CHAPTER 1 Introduction to Data Analysis and Decision · PDF file2 Chapter 1 Introduction to Data Analysis and Decision ... we present a brief discussion of models and the modeling

00837_01_ch1_p0001-0028.ps 6/17/05 12:53 PM Page 28