dbmspost-chapter05-401

49
220 What You Will Learn in This Chapter How can SQL be used to answer more complex questions? Why are some business questions more difficult than others? What common uses for subqueries? How do you find something that did not happen? How do you include rows from tables in a join even if the rows do not match? What are correlated subqueries? What tricky problems arise and how do you handle them in SQL? What are the SQL data definition commands? What SQL commands alter the data stored in tables? How do you know if your query is correct? Chapter Outline Advanced Queries and Subqueries 5 Chapter Introduction, 221 Sally’s Pet Store, 222 Subqueries, 222 Calculations or Simple Lookup, 223 Subqueries and Sets of Data , 224 Subquery with ANY, ALL, and EXISTS, 227 Subtraction: NOT IN, 227 OUTER JOINS, 229 Correlated Subqueries Are Dangerous, 231 More Features and Tricks with SQL SELECT, 234 UNION, INTERSECT, EXCEPT, 234 Multiple JOIN Columns, 235 Reflexive Join, 236 CASE Function, 239 Inequality Joins, 239 Questions with “Every” Need the EXISTS Clause, 240 SQL SELECT Summary, 243 SQL Data Definition Commands, 244 SQL Data Manipulation Commands, 246 INSERT and DELETE, 247 UPDATE, 248 Quality: Testing Queries, 249 Summary, 250 Key Terms, 251 Review Questions, 251 Exercises, 252 Web Site References, 257 Additional Reading, 258 Appendix: Introduction to Programming, 259 Variables and Data, 259 Variable Scope, 260 Computations, 261 Standard Internal Functions, 263 Input and Output, 264 Conditions, 265 Loops, 266 Subroutines, 267 Summary, 268

Upload: devinliao

Post on 11-Apr-2015

354 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: DBMSPost-Chapter05-401

220

What You Will Learn in This Chapter

How can SQL be used to answer more complex questions?•Whyaresomebusinessquestionsmoredifficultthanothers?•What common uses for subqueries?•Howdoyoufindsomethingthatdidnothappen?•How do you include rows from tables in a join even if the rows do not match?•What are correlated subqueries?•What tricky problems arise and how do you handle them in SQL?•WhataretheSQLdatadefinitioncommands?•What SQL commands alter the data stored in tables?•How do you know if your query is correct?•

Chapter Outline

Advanced Queries and Subqueries

5Chapter

Introduction, 221Sally’s Pet Store, 222Subqueries, 222

Calculations or Simple Lookup, 223Subqueries and Sets of Data , 224Subquery with ANY, ALL, and EXISTS, 227

Subtraction: NOT IN, 227OUTER JOINS, 229CorrelatedSubqueriesAreDangerous,231More Features and Tricks with SQL SELECT,234

UNION, INTERSECT, EXCEPT, 234Multiple JOIN Columns, 235Reflexive Join, 236CASE Function, 239Inequality Joins, 239Questions with “Every” Need the EXISTS Clause, 240SQL SELECT Summary, 243

SQLDataDefinitionCommands,244SQL Data Manipulation Commands, 246

INSERT and DELETE, 247UPDATE, 248

Quality:TestingQueries,249Summary, 250Key Terms, 251Review Questions, 251Exercises, 252Web Site References, 257AdditionalReading,258Appendix:IntroductiontoProgramming,259

Variables and Data, 259Variable Scope, 260Computations, 261Standard Internal Functions, 263Input and Output, 264Conditions, 265Loops, 266Subroutines, 267Summary, 268

Page 2: DBMSPost-Chapter05-401

221Chapter 5: Advanced Queries and Subqueries

A Developer’s View Ariel: Hi Miranda. You look happy.

Miranda: I am. This query system is great. I can see how it will help the managers. Once I get the application done, they can get answers to any questions they have. They won’t have to call me for answers every day. Plus, I can really see how the query system relates to data normalization. With normalization I split the tables so the database could store them properly. Now the query system helps me rejoin them to answer my questions.

Ariel:Doesthatmeanyou’refinallyreadyto create the application?

Miranda: Close, but I’m not quite ready. Yesterday my uncle asked me a question that I don’t know how to

answer.

Ariel: Really, I thought you could do anything with SQL. What was the question?

Miranda: Something about customers who did not order anything last month. I tried several times to get it to work, but the answers I get just aren’t right.

Ariel: It doesn’t sound like a hard question.

Miranda: I know. I can get a list of customers and orders that were placed any time except last month. But every time I join the Customer table to the Order table, all I get are the customers who did place orders. I don’tknowhowtofindsomethingthat’s not there.

IntroductionHow can SQL be used to answer more complex questions? Now that you un-derstand the basics of the SQL SELECT statement as described in Chapter 4, it is time to study more complex questions. One of the most powerful features of the SQL SELECT command is known as a subquery or nested query. This feature enables you to ask complex questions that entail retrieving different types of data or data from different sources. SQL is also more than a query language. It can be usedtocreatetheentiredatabase(datadefinitionlanguage).SQLalsohaspow-erful commands toalter thedata (datamanipulation language).SQLalsohasacoupleofcommandstosetsecurityconditions(datacontrollanguage).Twokeypointswillhelpyou learnhow tousesubqueries: (1)SQLwasde-

signed to work with sets of data—avoid thinking in terms of individual rows, and (2)youcansplitnestedqueries into theirseparatepartsanddealwith thepartsindividually.

The features of SQL covered in Chapter 4 are already quite powerful. Why do you need more features? Consider this common business question for Sally’s Pet Store: Which animals have not been sold? Think about how you might answer that questionusingtheSQLyouknowtothispoint.Thefirststepmightbetochoosethe tables: SaleAnimal and Animal appear to be likely choices. Second, select the columns as output: AnimalID and Name. Third, specify a condition. Fourth, join the tables. These last two steps cause the most problems in this example. How do you specify that an animal has not been sold? You cannot refer to any data in the SaleAnimal table. Because the animal has not been sold, the SaleAnimal table will not contain any entries for it.

Page 3: DBMSPost-Chapter05-401

222Chapter 5: Advanced Queries and Subqueries

Actually,thefourthstep(joiningthetables)causesyouevenmoreproblems.Say you wrote a query like this: SELECT AnimalID, Name FROM Animal IN-NER JOIN SaleAnimal ON (Animal.AnimalID = SaleAnimal.AnimalID). Assoon as you write that JOIN condition, you eliminate all the animals you want to see. The JOIN clause restricts the output—just like a WHERE clause would. In this example, you told the DBMS to return only those animals that are listed in both the Animal and SaleAnimal tables. But only animals that have been sold are listed in the SaleAnimal table, so this query can never tell you anything about animals that have not been sold. The following sections describe two solutions to thisproblem:eitherfixtheJOINstatementsothatitisnotasrestrictiveoruseasubquery.

Sally’s Pet StoreWhy are some business questions more difficult than others? Figure 5.1 shows some more business questions that Sally needs to answer to manage her business. Again, think about how you might answer these questions using the basic SQL of Chapter4.Atfirstglancetheydonotseemtoodifficult.However,eventheeasi-est question—to identify cats that sold for more than the average price—is harder thanitfirstappears.

The common feature of these questions is that they need to be answered in mul-tiple steps. All of these questions require an additional tool: the subquery. Techni-cally,youcananswermulti-stepquestionsbywritingandsavingthefirstpartasaview and then using the view in another query. Sometimes it is easiest to save the initial queries as views. However, you should generally try to use subqueries so theDBMSqueryoptimizercanusethecompletequerytofindthemostefficientsolution.

SubqueriesWhat are the common uses for subqueries?Themostdifficultstepincreatinga query is determining the overall structure. Chapter 4 shows you how to use the

Whichcatssoldformorethantheaveragesalepriceofcats?•Listallcustomerswhopurchasedoneofthefollowingitems(1,2,30,32,33).•Listallcustomerswhoboughtitemsforcats.•ListallofthecustomerswhoboughtsomethinginMarchandwhobought•somethinginMay.(Twotestsonthesamedata!)

Listsanimalsthatsoldformorethanthesalepriceofthecheapestcat.•Has one salesperson made all of the sales on a particular day?•Whichanimalssoldformorethantheaveragepriceofsimilaranimals?•Listallthemanagersandtheirdirectreports.•Convertagerangesintocategories.•Classifypaymentsbynumberofdayslate.•Whichemployeeshavesoldanimalsfromeverycategory?•Listcustomerswhoboughtdogsandalsoboughtcatproducts.•

Figure 5.1Harderquestions.Eventhoughtherearefewconstraintsontheproblems,thesequestionsaremorecomplex.Toanswermanyofthem,weneedtousesubqueriesorouterjoins.

Page 4: DBMSPost-Chapter05-401

223Chapter 5: Advanced Queries and Subqueries

four big questions to determine the structure of simple queries. But you need to recognize when subqueries are needed. If you fail to use a subquery, you are likely to end up with bad results, and waste considerable time in the process. This sec-tion presents the most common situations that require the use of subqueries. The main situations are: Calculations or lookup comparisons, matching sets of data, existence checks, and findingitemsthatarenotinalist.Thelastexampleissocommon and important that it is in a separate section.

Calculations or Simple LookupPerhaps the easiest way to see the value of a subquery is to consider the rela-tively simple question: Which cats sold for more than the average price of cats? If youalreadyknowtheaveragesalepriceofcats(say,$170),thequeryiseasy,asshown in the top half of Figure 5.2.

If you do not know the average SalePrice of cats, it is also easy to write a query to compute the average. If you do not know anything about subqueries, you could write the average value on a piece of paper and then enter it into the main query in place of the 170. However, with a subquery, you can go one step further: The result(average)fromthequerycanbetransferreddirectlytotheoriginalquery.Simplyreplacethevalue($170)withthecompleteSELECTAVGqueryasshownin the lower half of Figure 5.2. In fact, anytime you want to insert a value or com-parison, you can use a subquery instead. You can even go to several levels, so a subquery can contain another subquery and so on. The DBMS generally evaluates theinnermostqueryfirstandpassestheresultsbacktothehigherlevel.

Typically, subqueries for calculations arise in WHERE clauses similar to this example when you need to make a comparison. You can also add subqueries to the SELECT statement to retrieve a value for a calculation. For instance, you might issue a subquery to retrieve a tax rate that is multiplied times a total. Regardless

Whichcatssoldformorethantheaveragesalepriceofcats?

SELECTSaleAnimal.AnimalID,Animal.Category,SaleAnimal.SalePriceFROM AnimalINNERJOINSaleAnimalONAnimal.AnimalID=SaleAnimal.AnimalIDWHERE(Animal.Category=’Cat’)AND(SaleAnimal.SalePrice>170);SELECTSaleAnimal.AnimalID,Animal.Category,SaleAnimal.SalePriceFROM AnimalINNERJOINSaleAnimalONAnimal.AnimalID=SaleAnimal.AnimalIDWHERE(Animal.Category=’Cat’)AND(SaleAnimal.SalePrice> (SELECTAVG(SalePrice) FROM AnimalINNERJOINSaleAnimalONAnimal.AnimalID=SaleAnimal.AnimalIDWHERE(Animal.Category=‘Cat’)));

Figure 5.2Subqueries for calculation. If you know the average price is 170, the query is straightforward. If you do not know the average price, you can use a subquery to compute it. The subquery is always written inside a separate set of parentheses.query(thesubqueryinparenthesesreplacesthe170intheoriginalquery).

Page 5: DBMSPost-Chapter05-401

224Chapter 5: Advanced Queries and Subqueries

of its location, always remember to enclose the entire subquery in parentheses or it will not run.

Two useful practices you should follow when building subqueries are to indent the subquery to make it stand out so humans can read it and to test the subquery before inserting it into the main query. Fortunately, most modern database systems make it easy to create a subquery and then cut and paste the SQL into the main query. Similarly, if you have problems getting a complex query to work, cut out the inner subqueries and test them separately.Themaindrawbacktosubqueriesisthattheyaredifficulttoreadandunder-

stand.Itiseasytomakemistakesanditisdifficulttoreadcomplexqueriescre-ated by other developers. You should always document your work when creating complex queries. Whenever possible, use the SQL comment characters (--)toaddnotes to the query to explain its purpose and how it is supposed to work. Some-times, it isbetter tostorecomplexsubqueriesasviewsanduseafinalquerytoretrieve data from the carefully-named views.

The other trick you will quickly learn is that QBE grids are not very useful when designing subqueries. You almost always need to work with plain SQL statements. If you want to save some typing, you can use QBE to write the join statements, but eventually, you need to copy and paste the SQL text.

Subqueries and Sets of Data Some special SQL operators (IN, ALL, ANY, EXISTS)areoftenusedwithsub-queries. They are a little easier to understand if you begin with simple numbers instead of a subquery. Consider the relatively easy query illustrated in Figure 5.3: List all customers who purchased one of the following items: 1, 2, 30, 32, 33.

You might consider writing a basic SQL statement to answer this question. Just use lots of OR connectors in the WHERE statement. You might start: ItemID = 1 OR ItemID = 2 OR ItemID = 30. Although this approach would work for this simple example, it would entail some extra typing. A better approach is to treat the list as a set of data and use the INoperator:WHEREItemIDIN(1,2,30,32,33).This condition is true if ItemID matches any of the values in the set of numbers.

Although the IN operator saved some typing in this example, it actually has considerably more power. The power comes by noting that the list of numbers can be generated from a subquery. For example, the manager might want a list of cus-tomers who bought products for cats. Although you could look up all ItemID val-

Listallcustomerswhopurchasedoneofthefollowingitems(1,2,30,32,33).

SELECTCustomer.LastName,Customer.FirstName,SaleItem.ItemIDFROM(CustomerINNERJOINSaleONCustomer.CustomerID=Sale.CustomerID)INNERJOINSaleItemONSale.SaleID=SaleItem.SaleIDWHERE(SaleItem.ItemIDIN(1,2,30,32,33))ORDERBYCustomer.LastName,Customer.FirstName;

Figure 5.3Querieswithsets(In).YoucoulduseORstatements,buttheINoperatoriseasierandmore powerful. The WHERE condition is true if the ItemID matches any of the ID values in the accompanying list.

Page 6: DBMSPost-Chapter05-401

225Chapter 5: Advanced Queries and Subqueries

ues for cat products, it is easier to let the DBMS do it. Simply change the WHERE clause to ItemID IN (SELECT ItemID FROM Merchandise WHERE Category = 'Cat').Effectively,thisqueryisthesameastheoneinFigure5.3.Insteadoftryingto memorize the ItemID numbers of cat merchandise, you can use the subquery to retrieve that list of numbers.

Notice one crucial feature with the IN operator: The values in the list must be ofthesamedatatype(domain)asthevariablebeingtested.TheexampleinFigure5.4 compares SaleItem.ItemID to SELECT ItemID. Like a JOIN statement, the IN operator compares similar types of data. For example, mixing ItemID with Em-ployeeID would lead to a nonsensical result. In fact, the IN operator can be used as a substitute for a JOIN statement—although a JOIN query usually has faster performance.

The last examples showed you how the subquery helps you by retrieving data so you do not have to memorize numbers. Keep in mind that the subqueries are dynamic in the sense that if the underlying values change (the average changes ornewcatitemsareintroduced),thequerieswillstillfunctionperfectlywithoutneeding to be rewritten. But, it is even more important to think of subqueries as retrieving sets of data. Think about an apparently simple question: List all of the customers who bought something in March and in May. A beginner might try to answer the question by creating a simple query with the WHERE clause: SaleDate Between 01-Mar And 31-Mar AND SaleDate Between 01-May and 31-May. What is wrong with this approach? Try it. The query will run, but you will not get any matches. Why not? Because the clause is asking the DBMS to return rows where the SaleDate is in March and in May, at the same time! It is not possible for a date to be in two months at the same time.

The answer to this question lies in realizing that you need to get two sepa-rate lists of people: those who bought something in March and those who bought something in May. Then you combine the lists to identify the people in both sets. You can answer this question with a subquery, or you can create two separate views and join them. The subquery illustrates the set operations.Figure5.5showsthesubqueryapproach.Theoutermost(top)queryretrieves

customers who bought something in March, and the subquery retrieves ID num-bers for customers who bought something in May. Either month could be tested first,butitiscriticaltorecognizethatyouneedtwoseparatequeries.TheINop-

Listallcustomerswhoboughtitemsforcats.

SELECTCustomer.LastName,Customer.FirstName,SaleItem.ItemIDFROM(Customer INNERJOINSaleONCustomer.CustomerID=Sale.CustomerID) INNERJOINSaleItemONSale.SaleID=SaleItem.SaleIDWHERE(SaleItem.ItemIDIN (SELECTItemIDFROMMerchandiseWHERECategory=‘Cat’));

Figure 5.4Using IN with a subquery.. The subquery generates a list of ItemIDs designated as cat products. The main query then matches if the ItemID being purchased is in that subquery list.

Page 7: DBMSPost-Chapter05-401

226Chapter 5: Advanced Queries and Subqueries

eratorperformsthematchingsothatthefinalquerydisplaysonlythosecustomerswho fall in both sets of data.

Figure 5.6 shows how to answer the same query with a JOIN statement on saved views. The views are used to retrieve the desired sets, and they highlight thatthesetsareseparate.ThefinalqueryusestheJOINcommandtoretrieveonlythevaluesthatexistinbothofthesavedviews(MarchandMay).Bothapproaches(subqueryandsavedviews)providethesameanswerandyou

generally get to choose which approach you want to use. The drawback to saving views is that you end up with a huge collection of views, and no one remembers which views depend on other views. Some administrator could accidentally delete

ListallofthecustomerswhoboughtsomethinginMarchandwhoboughtsomethinginMay.

SELECTCustomer.LastName,Customer.FirstNameFROMCustomerINNERJOINSaleONCustomer.CustomerID=Sale.CustomerIDWHERE(SaleDateBetween3/1/2007And3/31/2007)AND Customer.CustomerIDIN(SELECTCustomerID FROM SaleWHERE(SaleDateBetween5/1/2007And5/31/2007));

Figure 5.5Combining two separate lists. The question requires you to create two separate lists andthencomparethematchingvalues.ThisqueryusestheINstatementtofindthecustomers that appear in both lists.

ListallofthecustomerswhoboughtsomethinginMarchandwhoboughtsomethinginMay.(Savedviews)

CREATEVIEWMarchCustomersASSELECT CustomerIDFROM SaleWHERE(SaleDateBetween3/1/2007And3/31/2007);

CREATEVIEWMayCustomersASSELECT CustomerIDFROM SaleWHERE(SaleDateBetween5/1/2007And5/31/2007);

SELECTCustomer.LastName,Customer.FirstNameFROM Customer INNER JOIN MarchCustomers ONCustomer.CustomerID=MarchCustomers.CustomerIDINNER JOIN MayCustomers ONMarchCustomers.CustomerID=MayCustomers.CustomerID;

Figure 5.6Combining two separate lists with JOIN. You can save separate lists as views and use the JOIN command to retrieve only the values that match.

Page 8: DBMSPost-Chapter05-401

227Chapter 5: Advanced Queries and Subqueries

a view that is required by another query. On the other hand, the views could be re-used in multiple queries, which might save a developer time on a different project. The bottom line is that you need to know how to write the queries both ways, and choose the method that is best in each situation.

Subquery with ANY, ALL, and EXISTSThe ANY and ALL operators combine comparison of numbers with subsets. In the preceding section, the IN operator compared a value to a list of items in a set: however, the comparison was based on equality. The test item had to exactly match an entry in the list. The ANY and ALL operators work with a less than (<)or greater than (>)operatorandcomparethetestvaluetoalistofvalues.TheEX-ISTS operator simply returns true if the subquery returns any values at all.Figure5.7illustratestheuseoftheANYquery.Itishardtofindasolidbusiness

example that needs the ANY operator. In the example, it would be just as easy to firstfindtheminimumvalueinthelistandthendothecomparison.However,itissometimes clearer to use the ANY operator. The word SOME can be used in place of the word ANY, but they work exactly the same way.

The ALL operator behaves similarly, but the test value must be greater than all of the values in the list. In other words, the test value must exceed the largest val-ue in the list. Hence, the ALL operator is much more restrictive. The ALL operator can be a powerful tool—particularly when used with an equals (=)comparison.For instance, you might want to test whether one salesperson made all of the sales on a particular day, so the WHERE clause would contain this statement: WHERE EmployeeID = ALL (SELECT EmployeeID FROM Sale WHERE SaleDate = '15-JUN').YoucanalsocombineALLwiththegreaterthanoperator(>)tofindvaluesgreater than the maximum value retrieved in a list.

Subtraction: NOT INHow do you find something that did not happen? One question that commonly arises in business settings is illustrated in Figure 5.8 with the question: Which animalshavenotbeensold?Thisquestionisdeceptive.Atfirstglanceitlookslikeyou could just join the Animal table to the SaleAnimal table. But then what? The

Listsanimalsthatsoldformorethanthesalepriceofthecheapestcat.

SELECTAnimal.AnimalID,Name,SalePrice,ListPriceFROM Animal INNERJOINSaleAnimalONAnimal.AnimalID=SaleAnimal.AnimalIDWHERE((SalePrice>Any(SELECTListPrice FROM Animal INNERJOINSaleAnimalONAnimal.AnimalID=SaleAnimal.AnimalIDWHERECategory=‘Cat’))AND(Category=‘Cat’);

Figure 5.7SubquerywithANYandALL.Theexampleidentifiesanyanimalthatsoldformorethan any of the prices of cats. Effectively, it returns values greater than the smallest entry in the subquery list.

Page 9: DBMSPost-Chapter05-401

228Chapter 5: Advanced Queries and Subqueries

standard JOIN statement will display only those animals that appear in both the Animal and the SaleAnimal tables. As soon as you enter the JOIN statement, you automatically restrict your list to only those animals that have been sold. One way to solve this problem is to change the behavior of the JOIN command, which is explored in the next section.

Another useful approach is to think about the problem in terms of sets of data. Think about how you would solve this problem if you had only paper lists: one list of all the animals the store has purchased and one list of all the animals that have been sold. As illustrated in Figure 5.9, to answer the question you would go through the main Animal list and cross off all the animals that have been sold

Which animals have not been sold?

SELECTAnimal.AnimalID,Animal.Name,Animal.CategoryFROM AnimalWHERE(Animal.AnimalIDNOT IN(SELECTAnimalIDFromSaleAnimal));

AnimalID Name Category241219253488181

SimonLeishaGeneVivianRhondaBrandy

FishDogDogDogDogDogDogFish

Figure 5.8Subquery for NOT IN. Which animals have not been sold? Start with a list of all the animals and then subtract the list of those that were sold.

ID Name Category Breed2 Fish Angel4 Simon Dog Vizsla5 Fish Shark6 Rosie Cat Oriental Shorthair7 Eugene Cat Bombay8 Miranda Dog Norfolk Terrier9 Fish Guppy

10 Sherri Dog Siberian Huskie11 Susan Dog Dalmation12 Leisha Dog Rottweiler

SaleID AnimalID SalePrice4 8 $183.385 183 $114.306 58 $132.197 24 $147.588 42 $174.279 53 $46.80

Animal SaleAnimal

Figure 5.9Sample data for subtraction subquery. The NOT IN statement removes animals that have been sold, leaving only those that were not sold.

Page 10: DBMSPost-Chapter05-401

229Chapter 5: Advanced Queries and Subqueries

(whichappearontheSaleAnimallist).TheonesremainingontheAnimallistarethe animals that have not been sold.SQLaccomplishesthesametaskinasimilarfashionusingasubquery.Thefirst

step is to list all the animals—as shown in the main part of the query in Figure 5.8. ButonlylisttheanimalsthatdonotappearintheSaleAnimaltable(thesubquery).The SQL NOT IN command is a useful tool. It is particularly useful for complex queries that involve several constraints. By moving the relevant constraints to the subquery, the main query becomes easier to understand.

OUTER JOINSHow do you include rows from tables in a join even if the rows do not match? Up to this point, the examples and discussions have focused on the INNER JOIN (or equi-join)whenmultipletableswereinvolved.Thistypeofjoinisthemostcommon, and you should use it for most of your queries. However, you need to understand its limitations and the use of a different type of JOIN known as an OUTER JOIN.

To illustrate an OUTER JOIN, consider the question from the previous section: Which animals have not been sold? Note that if you use an INNER JOIN between the Animal and SaleAnimal tables, the result will be a list of only those animals that appear in both tables. The INNER JOIN command instructs the DBMS to match every entry in the SaleAnimal table with a corresponding AnimalID in the Animal table. If it cannot make the match for any reason, the data in question will not be displayed. OUTER JOINs change the way the data is matched from the two tables. In particular, the OUTER JOIN describes what should happen when values in one table do not exist in the second table.In joining two tables,youhave toconsider twobasic situations: (1)Avalue

mightexistinthelefttablewithnomatchingvalueintherighttable,or(2)avaluemight exist in the right table with no matching value in the left table. Of course, it

Which animals have not been sold?

SELECTAnimal.AnimalID,Animal.Name,Animal.CategoryFROM Animal LEFT JOIN SaleAnimalONAnimal.AnimalID=SaleAnimal.AnimalIDWHERE(SaleAnimal.SaleIDIsNull);

AnimalID Name Category241219253488181

SimonLeishaGeneVivianRhondaBrandy

FishDogDogDogDogDogDogFish

Figure 5.10LeftOUTERJOIN.TheleftjoinincludesallrowsfromtheAnimal(left)tableandany matching rows from the SaleAnimal table. If an animal has not been sold, there will be no entry in the SaleAnimal table, so the corresponding entries will be NULL.

Page 11: DBMSPost-Chapter05-401

230Chapter 5: Advanced Queries and Subqueries

really does not matter which table is on the left or right. However, you have to be careful about not mixing them up after you list the tables.

The query in Figure 5.10 illustrates a typical left OUTER JOIN (or just LEFT JOIN).WithaLEFTJOIN,allrowsinthetableontheleftwillbedisplayedinthe results, regardless of what rows exist in the other table. If there is no matching value from the table on the right, NULL values will be inserted into the output. Note how the LEFT JOIN resolves the problem of identifying animals that have not been sold. Because the query will now list all animals, the rows where the SaleID is Null represent animals that are not in the Sale table and have not been sold. Sample output from the query is displayed in Figure 5.11.

The RIGHT JOIN behaves similarly to the LEFT JOIN. The only difference is the order of the tables. If you want to use all the rows from the table on the right side,useaRIGHTJOIN.WhynotjusthaveaLEFTJOINandsimplyrearrangethe tables? Most of the time, that is exactly what you will do. However, if you haveaquerythatjoinsseveraltables,itissometimeseasiertouseaRIGHTJOINinstead of trying to rearrange the tables.

Another join is the full OUTER JOIN (FULL JOIN)thatcombineseveryrowfrom the left table and every row from the right table. Where the rows do not match(fromtheONcondition),thejoininsertsNULLvaluesintotheappropriatecolumns.

Warning: Be careful with OUTER JOINs—particularly full joins. With two large tables that do not have much data in common, you end up with a very large result that is not very useful. Also be careful when using out-er joins on more than two tables in one query. You get different results de-pending on the order in which you join the tables. Many times you will find it necessary to create a viewwith only two tables to create an outer join. You can then use that view in other queries to add more tables.

Finally, note that you have been relying on the SQL 92 syntax, which is fairly easy to read and understand. Some database systems do not yet support that syn-tax. A common older technique to specify OUTER JOINs is to use an asterisk in conjunction with an equals sign; for example, (*=)indicatesaleftOUTERJOIN,since the asterisk is to the left of the equals sign. Older versions of SQL server

A_AnimalID Name Category Breed SA.AnimalID SaleID SalePrice2456789101112

Simon

RosieEugeneMiranda

SherriSusanLeisha

FishDogFishCatCatDogFishDogDogDog

AngelVizslaSharkOrientalBombayNorfolkGuppySiberianDalmatianRottweiler

567891011

1227254101817

19.80173.99251.59183.381.80150.11148.47

Figure 5.11Resultsfromtheleftouterjoin.Notethemissing(Null)valuesforanimalsthathavenot been sold.

Page 12: DBMSPost-Chapter05-401

231Chapter 5: Advanced Queries and Subqueries

used this syntax, and the animal query for SQL Server is displayed in Figure 5.12. Although the syntax is different from SQL 92, the effect is the same. Be on the lookout for this syntax when you read queries developed for older systems—that littleasterisk(*)canbehardtospot,butitradicallyaltersthequeryresults.

Older Oracle queries used yet another syntax to signify an outer join. The WHERE statement in Figure 5.12 would become Animal.AnimalID = (+)SaleAn-imal.AnimalID.Theplussigninparentheses(+)indicatesanOUTERJOIN,butbe careful. The plus is to the right of the equals sign. In other words, if you want a left OUTER JOIN, you place the plus sign to the right of the equals sign. It is opposite from what you might expect and opposite from the asterisk notation. Try reading through hundreds of lines of SQL statements looking for tiny asterisks or plus signs and you will quickly see the value in the SQL-92 syntax. Fortunately, all of the major systems now accept the newer syntax.

Correlated Subqueries Are DangerousWhat are correlated subqueries? Recall the example in Figure 5.2 that asked, Which cats sold for more than the average price of cats? This example used a sub-querytofirstfindtheaveragesalepriceofcatsandthenexaminedallsalesofcatsto display the ones that had higher prices. It is a reasonable business question to extend this idea to other categories of animals. Managers would like to identify all

SELECT ALLFROM Animal, SaleAnimalWHEREAnimal.AnimalID*=SaleAnimal.AnimalIDAndSaleAnimal.SaleIDIsNull

Figure 5.12Older syntax for LEFT JOIN. Which animals have not been sold? Note the asterisk-equalsoperator(*=),whichisusedforaLEFTJOIN.Thereverse(=*)wouldbeusedforaRIGHTJOIN.

Whichanimalssoldformorethantheaveragepriceofsimilaranimals?

SELECTAnimal.AnimalID,Name,Category,SalePriceFROM AnimalINNERJOINSaleAnimalONAnimal.AnimalID=SaleAnimal.AnimalIDWHERE(SaleAnimal.SalePrice>(SELECTAvg(SaleAnimal.SalePrice) FROM Animal INNER JOIN SaleAnimal ONAnimal.AnimalID=SaleAnimal.AnimalID WHERE (Animal.Category=Animal.Category)))ORDERBYSaleAnimal.SalePriceDESC;

Figure 5.13Correlated subquery creation. List the animals that have sold for more than the average price of other animals in their category. The subquery needs to compute the average sale price for the category of animal shown in the main query. But both tables are called “Animal” so this query will not work yet.

Page 13: DBMSPost-Chapter05-401

232Chapter 5: Advanced Queries and Subqueries

animals that were sold for a price greater than the average price of other animals withintheirrespectivecategories(dogsgreaterthantheaveragepriceofdogs,fishcomparedtofish,etc.).

Although this business question is perfectly reasonable, it can lead to serious problems as a query. To start with, it is not immediately obvious how to build this query. One approach might be to take the query in Figure 5.2 and substitute the category “Dog” for “Cat” in both the main and the subquery. This approach will work,butyouwillfirsthavetofindallofthecategoriesandtheneditthequeryforeach category.

Instead of having you enter each category by hand, perhaps the subquery can get the category from the main query and then compute the average sale price for thatcategory.Figure5.13showsafirstattempttocreatethisquery.Thedifficultyis that the main query refers to the Animal table and the subquery also uses the Animal table. Consequently, the WHERE constraint does not make sense. It needs to specify that one of the Animal tables is in the main query and the other is in the subquery.Todo that,youneed torename(alias) the tables,asshowninFigure5.14.

SELECTA1.AnimalID,A1.Name,A1.Category,SaleAnimal.SalePriceFROM Animal As A1INNERJOINSaleAnimalONA1.AnimalID=SaleAnimal.AnimalIDWHERE(SaleAnimal.SalePrice>(SELECTAvg(SaleAnimal.SalePrice) FROM Animal As A2INNERJOINSaleAnimalONA2.AnimalID=SaleAnimal.AnimalID WHERE (A2.Category=A1.Category)))ORDERBYSaleAnimal.SalePriceDESC;

Figure 5.14Correlated subquery that will run. Note the use of aliases to distinguish the two tables. However, this approach might be slow on some DBMSs.

Fish $10.80Dog $156.66Fish $19.80Cat $173.99Cat $251.59Dog $183.38Fish $1.80Dog $150.11Dog $148.47

Category SalePriceComputeAvg:$37.78ComputeAvg:$174.20ComputeAvg:$37.78ComputeAvg:$169.73ComputeAvg:$169.73

Recompute averageforevery row in the main query!

Figure 5.15Problem with correlated subquery. The average is recomputed for every row in the mainquery.EverytimetheDBMSseesafish,itcomputestheaveragetobe$37.78.Itisinefficient(andveryslow)toforcethemachinetorecalculatetheaverageeachtime.

Page 14: DBMSPost-Chapter05-401

233Chapter 5: Advanced Queries and Subqueries

Thequery inFigure5.12will run.However, it canbe extremely inefficient.Even on a fast computer, queries of this type have been known to run for sev-eraldayswithoutfinishing!Thistypeofqueryiscalledacorrelated subquery, because the subquery refers to data rows in the main query. If the query is run as written, the calculation in the subquery must be recomputed for each entry in the main table. The problem is illustrated in Figure 5.15. Essentially, the DBMS starts at the top row of the main Animal table. When it sees the category is Fish, itcomputestheaveragesalepriceoffish($37.78).Thenitmovestothenextrowand computes the average sale price for dogs. When the DBMS encounters the thirdrowandseesFishagain,itmustrecomputetheaverageforfish(still$37.78).Recomputing the average sale price for every single row in the main query is time-consuming. To compute an average, the DBMS must go through every row in the table that has the same category of animal. Consider a relatively small query of100,000rowsandfivecategoriesofanimals.Onaverage,thereare20,000rowsper category. To recompute the average each time, the DBMS will have to retrieve 100,000 * 20,000 or 2,000,000,000 rows!

Unfortunately, you cannot just tell the manager that it is impossible to answer this important business question. Is there an efficientway to answer this ques-tion? The answer illustrates the power of SQL and highlights the importance of thinking about the problem before you try to write a query. The problem with the correlated subquery lies in the fact that it has to continually recompute the average for each category. Think about how you might solve this problem by hand. You wouldfirstmakeatablethatlistedtheaverageforeachcategoryandthensimplylook up the appropriate value when you needed it. As shown in Figure 5.16, the same approach can be used with SQL. Just create the query for the averages using GROUPBYandsaveit.ThenjoinittotheAnimalandSaleAnimaltablestodothe comparison.

Today, you probably do not have to worry too much about the performance of correlated subquries. The high-end DBMSs have good query optimizers that can recognizetheproblemandautomaticallyfindthesolutiontocomputethevaluesquickly and store them in a cache. However, some queries still require hand tun-

Fish $10.80Dog $156.66Fish $19.80Cat $173.99Cat $251.59Dog $183.38Fish $1.80Dog $150.11Dog $148.47

Category SalePrice

Animal + SaleAnimal

Bird $176.57Cat $169.73Dog $174.20Fish $37.78Mammal $80.72Reptile $181.83Spider $118.16

Category AvgOfSalePrice

Saved Query

Animal.Category=Query1.Category

Figure 5.16Moreefficientsolution.CreateandsaveaquerytocomputetheaveragesusingGROUPBYCategory.ThenjointhequerytotheAnimalandSaleAnimaltablestodo the comparison.

Page 15: DBMSPost-Chapter05-401

234Chapter 5: Advanced Queries and Subqueries

ing. Also, you need to remember to look for different ways to approach queries. The solution in Figure 5.16 is much easier to read and verify that the answer is correct.

More Features and Tricks with SQL SELECTWhat tricky problems arise and how do you handle them in SQL? As you may have noticed, the SQL SELECT command is powerful and has plenty of op-tions. There are even more features and tricks that you should know about. Busi-nessquestionscanbedifficulttoanswer.Ithelpstostudydifferentexamplestogain a wider perspective on the problems and solutions you will encounter. One ofthefirstbigquestionsyouwillfaceistheneedtocombine rows from different tables. You also need to know how to handle several other complications, such as joining tables with multiple columns or inequality joins.

UNION, INTERSECT, EXCEPTUp to this point, the tables you have encountered have contained unique columns of data. The JOIN command links tables together so that a query can display and compare different columns of data from tables. Occasionally you will encounter a different type of problem where you need to combine rows of data from similar tables. The UNION operator is designed to accomplish this task.Asanexample,assumeyouworkforacompanythathasofficesinLosAngeles

andNewYork.Eachofficemaintainsitsowndatabase.EachofficehasanEm-ployeefilethatcontainsstandarddataaboutitsemployees.Theofficesarelinkedby a network, so you have access to both tables (call them EmployeeEast and EmployeeWest).ButthecorporatemanagersoftenwanttosearchtheentireEm-ployeefile—forexample, todeterminetotalemployeesalariesof themarketingdepartment. One solution might be to run their basic query twice (once on each table)andthencombinetheresultsbyhand.

SELECTEID,Name,Phone,Salary,‘East’AsOfficeFROM EmployeeEastUNIONSELECTEID,Name,Phone,Salary,‘West’AsOfficeFROMEmployeeWest;

EID Name Phone Salary Office352876372

890361

JonesInezStoiko

SmytheKim

335287367632

98037736

45,00047,00038,000

62,00073,000

EastEastEast

WestWest

Figure 5.17The UNION operator combines rows of data from two SELECT statements. The columns in both SELECT lines must match. The query is usually saved and used when managers need to search across both tables. Note the use of a new, constant column(Office)totrackthesourceofthedata.

Page 16: DBMSPost-Chapter05-401

235Chapter 5: Advanced Queries and Subqueries

As shown in Figure 5.17, the easier solution is to use the UNION operator to create a new query that combines the data from the two tables. All searches and operations performed on this new query will treat the two tables as one large table. Bycombiningthetableswithaview,eachofficecanmakechangestotheoriginaldata on its system. Whenever managers need to search across the entire company, they use the saved query, which automatically examines the data from current ver-sions of both tables.

The most important concept to remember when creating a UNION is that the data fromboth tablesmustmatch (e.g.,EID toEID,Name toName).Anotheruseful trick is to insert a constant value in the SELECT statement. In this example the constant keeps track of which table held the original data. This value can also be used to balance out a SELECT statement if one of the queries will produce a column that is not available in the other query. To make sure both queries return the same number of columns, just insert a constant value in the query that does not contain the desired column. Make sure that it contains the same type of data that is storedintheotherquery(domainsmustmatch).

The UNION command combines matching rows of data from two tables. The basic version of the command automatically eliminates duplicate rows of data. If you want to keep all the rows—even the duplications, use the command UNION ALL. Two other options for combining rows are EXCEPT and INTERSECT. Figure 5.18 shows the difference between the three commands. They all apply to setsofrowsandtheVenndiagramshowsthatthetablesmighthavesomedataincommon(areaB).TheUNIONoperatorreturnsalltherowsthatappearineitherone of the tables, but rows appearing in both tables are only listed once. The IN-TERSECToperatorreturnstherowsthatappearinbothtables(areaB).TheEX-CEPToperatorreturnsonlyrowsthatappearinthefirsttable(areaA).NoticethattheresultoftheEXCEPToperatordependsonwhichtableislistedfirst.

Multiple JOIN ColumnsSometimes you will need to join tables based on data in more than one column. In the Pet Store example, each animal belongs to some category (Cat, Dog, Fish, etc.).Eachcategoryofanimalhasdifferentbreeds.Forexample,aCatmightbe

T1 T2

A B C

AT1 EXCEPT T2BT1 INTERSECT T2A + B + CT1 UNION T2

AT1 EXCEPT T2BT1 INTERSECT T2A + B + CT1 UNION T2

Figure 5.18Operators for combining rows from two tables. UNION selects all of the rows. INTERSECTretrievesonlytherowsthatareinbothtables.EXCEPTretrievesrowsthat exist in only one table.

Page 17: DBMSPost-Chapter05-401

236Chapter 5: Advanced Queries and Subqueries

a Manx, Maine Coon, or Persian; a Dog might be a Retriever, Labrador, or St. Bernard. A portion of the class diagram is reproduced in Figure 5.19. Notice the two lines connecting the Breed and Animal tables. This relationship ensures that only breeds listed in the Breed table can be entered for each type of Animal. A real store might want to include additional features in the Breed table (such as regis-trationorganization,breeddescription,orbreedcharacteristics).Thekeypointisthat the tables must be connected by both the Category and the Breed.

In Microsoft Access QBE, the JOIN can be created by marking both columns and simultaneously dragging the two columns to the Animal table. The syntax for the SQL JOIN command is given in Figure 5.19. Simply expand the ON statement by listing both column connections. In this case, you want both sets of columns to be equal at the same time, so the statements are connected with an AND.

Reflexive JoinA reflexive join or self-join means simply that a table is joined to itself. One column in the table is used to match values in a second column in the same ta-ble. A common business example arises with an Employee table as illustrated in Figure 5.20. Employees typically have one manager. Hence the manager’s ID can be stored in the row corresponding to each employee. The table would be Employee(EID,Name, Phone, . . .,Manager).The interesting feature is that amanager is also an employee, so the Manager column actually contains a value for EID. To get the corresponding name of the manager, you need to join the Em-ployee table to itself.

The only trick with this operation is that you have to be careful with the ON condition. For instance, the following condition does not make sense: ON Em-ployee.Manager = Employee.EID. The query would try to return employees who were their own managers, which is not likely to be what you wanted. Instead, you mustusetwoinstancesoftheEmployeetableanduseanalias(say,E2)torenamethe second copy. Then the correct ON condition becomes ON Employee.Manager = E2.EID. The key to self-joins is to make sure that the columns contain the same type of data and to create an alias for the second copy of the table.

AnimalIDNameCategoryBreedDateBornGender...

CategoryBreed

Breed

Animal

SELECT *FROM Breed INNER JOIN AnimalONBreed.Category=Animal.CategoryANDBreed.Breed=Animal.Breed

Figure 5.19Multiple JOIN columns. The values in the tables are connected only when both the category and the breed match.

Page 18: DBMSPost-Chapter05-401

237Chapter 5: Advanced Queries and Subqueries

SQL1999providesanevenmorepowerful feature related to reflexive joins.Consider the employee example where you want to list all of the people who work for someone—not just the direct reports, but also the people who work for them, and the people who work for that group, and so on down the employee hierarchy tree. The standard provides the WITHRECURSIVEcommand thathas severaloptions to search a data tree. Examine the Pet Store case with the partial Employ-eetable:Employee(EmployeeID,LastName,Title,ManagerID).Youwanttostartat the top with the CEO/owner and list all of the employees who report directly to amanager.Forexample,EmployeeID1(Reeves)istheonlypersonwhoreportsdirectlytoSally(EmployeeID=11),buttwopeople(EmployeeID2and3)reportdirectly to Reeves.

The actual syntax can be slightly different across systems. Figure 5.21 shows the syntax used by SQL Server 2005. The main difference with the standard is that the standardusesWITHRECURSIVE insteadof just theWITHkeyword.Themainstepistodefinethecommontableexpressiontohandletherecursion.Yougiveauniquename(e.g.,DirectReports) to thenewexpressionandspecify thecolumnsthatwillberetrieved.Thethreemainstepsare:(1)Definetherootstart-ingpointforthetreewithaSELECTstatement,(2)DefinetherecursivememberswithasecondSELECTstatementthatlinkstothelevelabove,and(3)WritethefinalSELECTstatementtoexecutetherecursivetableandsortorgroupthere-sults.Intheexample,rootlevelisdefinedbychoosingtheownerwhodoesnotreport toanyone(ManagerID=0).Youmightneed toexamine thedata toknowhowtodefinetherootlevel—itmightbesetbytitle,orbyaNullvalueinsomecolumn. The second step is the one that does most of the work. You retrieve data from the Employee table, but the JOIN statement is the key. Notice that you join the Employee.ManagerID column to the higher-level DirectReports.EmployeeID table. The DirectReports table represents the parent level entry, and this SELECT statement will always have a similar JOIN condition. The third step is the easiest, because now you can treat the DirectReports entity as just another table. Open the

SELECTEmployee.EID,Employee.Name, Employee.Manager,E2.Name

FROM Employee INNER JOIN Employee AS E2

ONEmployee.Manager=E2.EID

EID Name . . . Manager115 Sanchez 765462 Miller 115523 Hawk 115765 Munoz 886

Employee

EID Name Manager Name115 Sanchez 765 Munoz462 Miller 115 Sanchez523 Hawk 115 Sanchez

SQL

Result

Figure 5.20ReflexiveJOINtoconnectEmployeetablewithitself.Amanagerisalsoanemployee.UseasecondcopyoftheEmployeetable(renamedtoE2)togetthemanager’s name.

Page 19: DBMSPost-Chapter05-401

238Chapter 5: Advanced Queries and Subqueries

pet store’s Employee table and work through the results given here to see how the organization structure chart is created.TheLevelcolumnisalsoausefultrick.Youdefineitwiththeroot-levelSE-

LECT statement, and increment it with the recursive SELECT. It provides an easy way to specify the distance from the root. Picture the organizational chart with Smithatthetop,followedbyReevesatLevel1,andGibsonandReasoneratLev-el 2 because both report to Reeves. The recursive query is a powerful statement. Without it, you need to write substantial code to accomplish the same task.

Listallthemanagersandtheirdirectreports.

WITHDirectReports(EmployeeID,LastName,ManagerID,Title,Level)AS( --Root/anchormember(findemployeewithnomanager) SELECTEmployeeID,LastName,ManagerID,Title,0AsLevel FROMEmployeeWHEREManagerID=0 --startinglevel UNION ALL -- Recursive members SELECTEmployee.EmployeeID,Employee.LastName, Employee.ManagerID,Employee.Title,Level+1 FROM Employee INNER JOIN DirectReports ONEmployee.ManagerID=DirectReports.EmployeeID)-- Now exectue the common table expressionSELECTManagerID,EmployeeID,LastName,Title,LevelFROM DirectReportsORDERBYLevel,ManagerID,LastName

ManagerID EmployeeID LastName Title Level0 11 Smith Owner 011 1 Reeves Manager 11 2 Gibson Manager 21 3 Reasoner Manager 22 6 Eaton Animal Friend 32 7 Farris Animal Friend 32 5 James Animal Friend 32 9 O’Connor Animal Friend 32 10 Shields Animal Friend 33 8 Carpenter Worker 33 4 Hopkins Worker 3

Figure 5.21Recursive query. The employee-manager relationship is a classic recursive example. Therecursivequeryrequiresthreesteps:(1)Definetherootlevel,(2)Definetherecursionmemberthatlinkstothehigherlevel,and(3)RuntheSELECTstatementto execute the expression and sort the results.

Page 20: DBMSPost-Chapter05-401

239Chapter 5: Advanced Queries and Subqueries

CASE FunctionSQL 92 added the CASE function to simplify certain types of queries. However, many database systems have not yet implemented all the features of SQL 92. The CASE function evaluates a set of conditions and returns a single value. Similar to theOracledecodefunction,theconditionscanbesimple(R=1)orcomplex.

Perhaps the managers want to classify the animals in Sally’s Pet Store based on their age. Figure 5.22 shows the SQL statement that would create four categories based on different ages. Note the use of date arithmetic using today’s date—Date( )—andDateBorn.Wheneverthisqueryisexecuted,itwillusethecurrentdaytoassign each animal to the appropriate category. Of course, the next logical step is torunaGROUPBYqueryagainstthisviewtocountthenumberofanimalsfall-ing within each age category.

Inequality JoinsA JOIN statement is actually just a condition. Most problems are straightforward and use a simple equality condition. For example, the following statement joins the Customer and Order tables: FROM Customer INNER JOIN Order ON (Cus-tomer.CustomerID=Order.CustomerID).

SQL supports complex conditions including inequality joins, where the com-parison ismadewith inequalityoperators (less than,greater than) insteadofanequals sign. The generic name for any inequality or equality join is a theta join.

There are a few situations where this type of join can be useful. For example, consider a common business problem. You have a table for AccountsReceivable( TransactionID,CustomerID,Amount,DateDue).Managerswouldliketocatego-rize the customer accounts and determine how many transactions are past due by 30, 90, and 120 or more days. This query can be built in a couple of ways. For instance, you could write three separate queries. However, what happens if man-agers decide to change the business rules or add a new category? Then someone hastofindyourthreequeriesandmodifythem.Amoreusefultrickistocreateanew table to hold the business rules or categories. For example, as shown in Fig-ure 5.23, create the table LateCategory(Category, MinDays, MaxDays, Charge, etc.).Thistabledefinesthelatecategoriesbasedonthenumberofdayspastdue.Now use inequality conditions to join the two tables. First, compute the number

Convertagerangesintocategories.

Select AnimalID, CASE WHENDate()-DateBorn<90Then‘Baby’ WHENDate()-DateBorn>=90 ANDDate()-DateBorn<270Then‘Young’ WHENDate()-DateBorn>=270 ANDDate()-DateBorn<365Then‘Grown’ ELSE‘Experienced’ ENDFROMAnimal;

Figure 5.22CASE function to convert DateBorn into age categories. Note the use of date arithmetic to generate descriptions that are always current.

Page 21: DBMSPost-Chapter05-401

240Chapter 5: Advanced Queries and Subqueries

ofdayslateusingthecurrentdate(Date()–AR.DateDue).Finally,comparethenumberofdayslatetominimumandmaximumvaluesspecifiedintheLateCat-egory table.

The ultimate value of this approach is that the business rules are now stored in asimpletable(LateCategory).Ifmanagerswanttochangetheconditionsoraddnew criteria, they simply alter the data in the table. You can even build a form that makes it easy for managers to see the rules and quickly make the needed changes. With any other approach, a programmer would have to rewrite the code for the queries.

Questions with “Every” Need the EXISTS ClauseSome queries need the EXISTScondition.Considerthebusinessquestion:Whichemployees have sold an animal in every category? The word every is the key here. Think about how you would answer that question if you did not have a computer. For each employee you would make a list of animal categories (Bird, Cat, Dog, etc.).ThenyouwouldgothroughthelistofAnimalSalesandcrossoffeachanimalcategorysoldbytheemployee.Whenfinished,youwouldlookattheemployeelisttoseewhichpeoplehaveeveryanimalcrossedoff(oranemptylist).Youwilldo the same thing using queries.

First, you need a query that lists all the animal categories that have not been sold by an employee. Consider the list for EmployeeID 5, with the query shown in Figure 5.24. Notice that it uses the basic NOT IN query. You could also use an OUTER JOIN, but with three tables, it is easier to use NOT IN.

Remember, if this query returns any rows at all, then the selected employee has not sold every one of the animals. What you really want then is a list of employees for whom this query returns no rows of data. In other words, the rows from this queryshouldNOTEXIST.

The next step is to examine the entire list of employees and see which ones do notretrieveanyrowsfromthequeryinFigure5.24.ThefinalqueryisshowninFigure5.24.NotethatthespecificEmployeeID5hasbeenreplacedwiththeEm-

Classifypaymentsbynumberofdayslate.

AR(TransactionID,CustomerID,Amount,DateDue)LateCategory(Category, MinDays,MaxDays,Charge,…)

Month 30 90 3%Quarter 90 120 5%Overdue 120 9999 10%

SELECT *FROMARINNERJOINLateCategoryON((Date()–AR.DateDue)>=LateCategory.MinDays)AND((Date()–AR.DateDue)<LateCategory.MaxDays)

Figure 5.23Inequalityjoin.ManagerswanttoclassifytheAccountsReceivable(AR)dataintothree categories of overdue payments. First, store the business rules/categories in a new table. Then join the table to the AR data through inequality joins.

Page 22: DBMSPost-Chapter05-401

241Chapter 5: Advanced Queries and Subqueries

ployeeID matching the value in the outer loop. Notice that this action results in a correlated subquery. Unfortunately, you cannot avoid the correlated subquery in thistypeofproblem.Thisqueryreturnsoneemployee(Reasoner)whohassoldevery type of animal.

The type of query in Figure 5.25 is commonly used to answer questions that include some reference to “every” item. In some cases, a simpler solution is to just count the number of categories for each employee. One catch to this approach isthattheDBMSmustsupporttheCount(DISTINCT)format.Ingeneral, thesecomplex questions are probably better answered with multiple queries, or with tools provided by a data warehouse approach.Thequery inFigure5.25 isan interestingapplicationof theEXISTSclause.

However, there is an easier way to answer the question. You should build a cross-tab or pivot query that counts the number of animals sold by each employee and by each category. Notice that this question contains two “by each” statements. You could write a simple query that contains both of those variables (Employee andCategory)intheGROUPBYsection.However,mostpeoplefinditeasierto

Listthecategoriesthathavenotbeensoldbyanemployee.

SELECTCategoryFROMCategoryWHERE(Category<>‘Other’ANDCategoryNOTIN(SELECTAnimal.CategoryFROMAnimalINNERJOIN(SaleINNERJOINSaleAnimalONSale.SaleID=SaleAnimal.SaleID)ONAnimal.AnimalID=SaleAnimal.AnimalIDWHERESale.EmployeeID=5))

Figure 5.24List the animal categories that have not been sold by EmployeeID 5. Use a basic NOT IN query.

Whichemployeeshavesoldanimalsfromeverycategory?

SELECTEmployee.EmployeeID,Employee.LastNameFROM EmployeeWHERE Not EXISTS(SELECTCategoryFROMCategoryWHERE(Category<>‘Other’)AndCategoryNOTIN(SELECTAnimal.CategoryFROMAnimalINNERJOIN(SaleINNERJOINSaleAnimalONSale.SaleID=SaleAnimal.SaleID)ONAnimal.AnimalID=SaleAnimal.AnimalIDWHERESale.EmployeeID=Employee.EmployeeID));

Figure 5.25ExampleofNOTEXISTSclause.Listtheemployeeswhohavesoldananimalfromeverycategory(except“Other”).

Page 23: DBMSPost-Chapter05-401

242Chapter 5: Advanced Queries and Subqueries

readtheresultsiftheyarepresentedinatable,withoneGroupByvariable(Em-ployee)astherowsandtheother(Category)asthecolumns.Theneachcellcancontainthecountofthenumberofanimalssoldforaspecificemployeeinagivecategory.

Figure 5.26 shows the basic query. Microsoft Access has a simpler crosstab query, but with traditional SQL, you need to compute each column separately. Hence, you have to use the CASE function to select each category of animal—

Whichemployeeshavesoldanimalsfromeverycategory?

SELECTEmployee.EmployeeID,Employee.LastName, Count(CASECategoryWHEN'Bird'THEN1END)AsBird, Count(CASECategoryWHEN'Cat'THEN1END)AsCat, Count(CASECategoryWHEN'Dog'THEN1END)AsDog, Count(CASECategoryWHEN'Fish'THEN1END)AsFish, Count(CASECategoryWHEN'Mammal'THEN1END)AsMammal, Count(CASECategoryWHEN'Reptile'THEN1END)AsReptile, Count(CASECategoryWHEN'Spider'THEN1END)AsSpiderFROM Employee INNERJOINSaleONSale.EmployeeID=Employee.EmployeeIDINNERJOINSaleAnimalONSale.SaleID=SaleAnimal.SaleIDINNERJOINAnimalONAnimal.AnimalID=SaleAnimal.AnimalIDGROUPBYEmployee.EmployeeID,Employee.LastNameORDERBYEmployee.LastName

Figure 5.26Crosstab query. The columns are built using the CASE statement to select each specificcategory.TherowsareformedbytheGROUPBYclause.NotethatOracleuses the DECODE function instead of the CASE statement.

EID LastName Bird Cat Dog Fish Mammal Reptile Spider8 Carpenter 2 6 10 2 1 0 06 Eaton 1 5 6 0 1 0 07 Farris 1 6 16 0 0 2 02 Gibson 2 6 12 2 1 0 24 Hopkins 2 4 14 0 0 1 05 James 3 8 7 5 1 2 09 O’Connor 0 6 8 2 1 0 03 Reasoner 3 6 14 2 1 1 11 Reeves 0 0 6 0 0 0 010 Shields 1 0 2 0 0 0 0

Figure 5.26Crosstab query. The columns are built using the CASE statement to select each specificcategory.TherowsareformedbytheGROUPBYclause.NotethatOracleuses the DECODE function instead of the CASE statement.

Page 24: DBMSPost-Chapter05-401

243Chapter 5: Advanced Queries and Subqueries

which means you have to know the categories ahead of time. Essentially, you compute each column separately by using a CASE statement to select only rows that match the group condition you want for the column.

Figure 5.27 shows the result of the crosstab query. It is relatively easy to see the types of animals sold by each employee. To answer the overall question of who sold animals from each category, you simply look for a row with no zeros. With this sample data, Reasoner has sold at least one animal from each category, but Gibsoncomescloseswithonlyonecategorycontainingazero.Technically,thisquery contains more information that required to answer the question. However, additionaldataisoftenuseful.IfyouwritetheEXISTSquerytoreturnexactlytheinformation requested, it will return no names. Oftentimes, it is preferable to see thatGibsoncomesclosetomeetingtheconditions,insteadofsimplysayingthatno one meets them exactly.

SQL SELECT SummaryThe SQL SELECT command is powerful and has many options. To help you re-member the various options, they are presented in Figure 5.28. Each DBMS has a similar listing for the SELECT command, and you should consult the relevant Help system for details to see if there are implementation differences. Remember that the WHERE clause can have subqueries. Also remember that you can use the SELECT line to perform computations, both in-line and aggregations across the rows.

Most database systems are picky about the sequence of the various components of the SELECT statement. For example, the WHERE statement should come be-foretheGROUPBYstatement.Sometimestheseerrorscanbehardtospot,soif

SELECTDISTINCTTable.Column{ASAlias},…FROMTable/QueryINNERJOINTable/QueryONT1.ColA=T2.ColBWHERE(Condition)GROUPBYColumnHAVING(GroupCondition)ORDERBYTable.Column{UNIONSecondSelect}

Figure 5.28SQL SELECT options. Remember that WHERE statements can have subqueries.

Someone SELECTFrom FROMIreland INNER JOINWill WHEREGrow GROUPBYHorseradishand HAVINGOnions ORDER BY

Figure 5.29Mnemonic to help remember the proper sequence of the SELECT operators.

Page 25: DBMSPost-Chapter05-401

244Chapter 5: Advanced Queries and Subqueries

you receive an enigmatic error message, verify that the segments are in the proper order. Figure 5.29 presents a mnemonic that may help you remember the proper sequence. Also, you should always build a query in pieces, so you can test each piece. For example, if you use aGROUPBY statement, first check the resultswithout it to be sure that the proper rows are being selected.

SQL Data Definition CommandsWhat are the SQL data definition commands? So far you have focused on only one aspect of a database: retrieving data. Clearly, you need to perform many more operations with a database. SQL was designed to handle all common operations. Onesetofcommands isdescribed in this section:datadefinitioncommands tocreate and modify the database and its tables. Note that the SQL commands can be cumbersome for these tasks. Hence, most modern database systems provide a visual or menu-driven system to assist with these tasks. The SQL commands are generally used when you need to automate some of these tasks and set up or make changes to a database from within a separate program.ThefivemostcommondatadefinitioncommandsarelistedinFigure5.30.In

buildinganewdatabase,thefirststepistoCREATE a SCHEMA. A schema is a collection of tables. In some systems, the command is equivalent to creating a newdatabase.Inothersystems,itsimplydefinesalogicalareawhereeachusercan store tables, which might or might not be in one physical database. The Au-thorization component describes the user and sets a password for security. Most DBMSs also have visually-oriented tools to perform these basic tasks. However, theSQLcommandscanbescriptedandstoredinafilethatcanberunwheneveryou need to recreate the database.

CREATE TABLEisoneofthemainSQLdatadefinitioncommands.Itisusedtodefineacompletelynewtable.Thebasiccommandliststhenameofthetablealong with the names and data types for all of the columns. Figure 5.31 shows the formatforthedatadefinitioncommands.Additionaloptionsincludetheabilitytoassign default values with the DEFAULT command.

SQL 92 provides several standard data types, but system vendors do not yet implement all of them. SQL 92 also enables you to create your own data types with the CREATE DOMAIN command. For example, to ensure consistency you couldcreateadomaincalledDomAddressthatconsistsofCHAR(35).Thenanytable that used an address column would refer to the DomAddress.

With SQL 92, you identify the primary key and foreign key relationships with constraints. SQL constraints are rules that are enforced by the database system. Figure5.32 illustrates thesyntax fordefiningbothaprimarykeyanda foreignkey for an Order table. First, notice that each constraint is given a name (e.g.,

CREATE SCHEMA AUTHORIZATION DBName PasswordCREATETABLETableName(ColumnType,…)ALTERTABLETable{Add,Column,Constraint,Drop}DROP{TableTableName|IndexIndexNameONTableName}CREATEINDEXIndexNameONTableName(ColumnASC/DESC)

Figure 5.30PrimarySQLdatadefinitioncommands.Inmostcasesyouwillavoidthesecommandsanduseavisualormenu-drivensystemtodefineandmodifytables.

Page 26: DBMSPost-Chapter05-401

245Chapter 5: Advanced Queries and Subqueries

pkOrder).Youcanchooseanyname,butyoushouldpickonethatyouwillrecog-nize later if problems arise. The primary key constraint simply lists the column or columns that make up the primary key. Note that each column in the primary key should also be marked as NOT NULL.

The foreign key constraint is easier to understand if you examine the relevant class diagram. Here you want to place orders only to customers who have data in the Customer table. That is, the CustomerID in the Order table must already exist in the Customer table. Hence, the constraint lists the column in the origi-nalOrdertableandthenspecifiesaREFERENCE to the Customer table and the CustomerID.

The ALTER TABLE and DROP TABLE commands enable you to modify the structure of an existing table. Be careful with the DROP command, as it will removetheentiretablefromthedatabase,includingitsdataandstructuraldefini-tion. The ALTER TABLE command is less drastic. It can be used to ADD or DE-LETE columns from a table. Obviously, when you drop an entire column, all the data stored in that column will be deleted. Similarly, when you add a new column, it will contain NULL values for any existing rows.

You can use the CREATEINDEXandDROPINDEXcommandstoimprovethe performance of the database. Indexes can improve performance when re-trieving data, but they can cause problems when many transactions take place in a short period of time. In general, these commands are issued once for a table. Typically, indexes are built for primary key columns. Most DBMSs automatically build those indexes.

Finally, as described in Chapter 4, the CREATE VIEW creates and saves a newquery.Thebasicsyntaxisstraightforward:CREATEVIEWmyviewASSE-LECT…. The command simply gives a name and saves any SELECT statement. Again, these commands are almost always easier to create and execute from a menu-driveninterface.However,becauseyoumayhavetocreateSQLdatadefi-nition statements by hand sometime, so it is good to know how to do so.

CREATE TABLE Customer( CustomerID INTEGERNOTNULL, LastName NVARCHAR(10), …);

ALTER TABLE Customer DROPCOLUMNZIPCode;

ALTER TABLE Customer ADDCOLUMNCellPhoneNVARCHAR(15);

Figure 5.31TheCREATETABLEcommanddefinesanewtableandallofthecolumnsthatit will contain. The NOT NULL command typically is used to identify the key column(s)forthetable.TheALTERTABLEcommandenablesyoutoaddanddeleteentire columns from an existing table.

Page 27: DBMSPost-Chapter05-401

246Chapter 5: Advanced Queries and Subqueries

SQL Data Manipulation CommandsWhat SQL commands alter the data stored in tables? A third set of SQL com-mands demonstrates the true power of SQL. The SELECT command retrieves data, whereas data manipulation commands are used to change the data within the tables. The basic commands and their syntax are displayed in Figure 5.33. These commandsareusedtoinsertdata,deleterows,andupdate(change)thevaluesofspecificcells.Remembertwopointswhenusingthesecommands:(1)Theyoper-

CREATE TABLE Order( OrderID INTEGERNOTNULL,

OrderDate DATE,CustomerID INTEGER

CONSTRAINT pkorder PRIMARYKEY(OrderID),CONSTRAINT fkorder FOREIGNKEY(CustomerID)REFERENCESCustomer(CustomerID)

);

OrderIDOrderDateCustomerID

CustomerIDLastNameFirstNameAddress…

Order Customer

*

Figure 5.32IdentifyingprimaryandforeignkeysinSQL.Keysaredefinedasconstraintsthatareenforced by the DBMS. The primary key constraint lists the columns that make up theprimarykey.Theforeignkeyliststhecolumn(CustomerID)inthecurrenttable(Order)thatislinkedtoacolumn(CustomerID)inasecondtable(Customer).

INSERTINTOtarget(column1,column2,…) VALUES(value1,value2,…)

INSERTINTOtarget(column1,column2,…) SELECT … FROM …

DELETE FROM table WHERE condition

UPDATE table SETColumn1=Value1,Column2=Value2,… WHERE condition

Figure 5.33Common SQL commands to add, delete, and change data within existing tables. The commands operate on entire sets of data, and they utilize the power of the SELECT and WHERE statements, including subqueries.

Page 28: DBMSPost-Chapter05-401

247Chapter 5: Advanced Queries and Subqueries

ateonsetsofdataatonetime—avoidthinkingintermsofindividualrows,and(2)they utilize the power of the SELECT and WHERE statements you already know.

INSERT and DELETEAs you can tell from Figure 5.32, the INSERT command has two variations. The first version (VALUES) is used to insert one rowof data at a time.Except forsome programming implementations, it is not very useful. Most database systems provide a visual or tabular data entry system that makes it easy to enter or edit sin-glerowsofdata.Generally,youwillbuildformstomakeiteasyforuserstoenterand edit single rows of data. On most systems, the data will be inserted directly to the tables. In a few cases, you might have to write your own INSERT statement.

The second version of the INSERT command is particularly useful at copy-ingdatafromonetableintoasecond(target)table.NotethatitacceptsanySE-LECT statement, including one with subqueries, making it far more powerful than it looks. For example, in the Pet Store database, you might decide to move older Animalfilestoadifferentcomputer.Tomoverecordsforallanimalsorderedbe-fore January 1, 2004, you would issue the INSERT command displayed in Figure 5.34. Notice that the subquery selects the animals based on the date they were or-dered. The INSERT command then copies the associated rows in the Animal table into an existing OldAnimals table.ThequeryinFigure5.35justcopiesthespecifiedrowstoanewtable.Thenext

step is to delete them from the main Animals table to save space and improve performance. The DELETE command performs this function easily. As Figure

INSERT INTO OldAnimalsSELECT *FROM AnimalWHERE AnimalID IN (SELECTAnimalOrderItem.AnimalID FROM AnimalOrder INNER JOIN AnimalOrderItem ONAnimalOrder.OrderID=AnimalOrderItem.OrderID WHERE(AnimalOrder.OrderDate<’01-Jan-2007’));

Figure 5.34INSERT command to copy older data rows. Note the use of the subquery to identify the rows to be copied.

DELETEFROM AnimalWHERE AnimalID IN (SELECTAnimalOrderItem.AnimalID FROM AnimalOrder INNER JOIN AnimalOrderItem ONAnimalOrder.OrderID=AnimalOrderItem.OrderID WHERE(AnimalOrder.OrderDate<’01-Jan-2007’));

Figure 5.35DELETE command to remove the older data. Use cut and paste to make sure the subquery is exactly the same as the previous query.

Page 29: DBMSPost-Chapter05-401

248Chapter 5: Advanced Queries and Subqueries

5.34illustrates,yousimplyreplacethefirsttworowsofthequery(INSERTandSELECT)withDELETE.Becarefulnot toalter thesubquery.Youcanuse thecut-and-paste feature to delete only rows that have already been copied to the backup table. Be sure you recognize the difference between the DROP and DE-LETE commands. The DROP command removes an entire table. The DELETE command deletes rows within a table.

UPDATEThe syntax of the UPDATE command is similar to the INSERT and DELETE commands. It, too, makes full use of the WHERE clause, including subqueries. The key to the UPDATE command is to remember that it acts on an entire collec-tion of rows at one time. You use the WHERE clause to specify which set of rows need to be changed.

In the example in Figure 5.36, managers wish to increase the ListPrice of the cats and dogs. The price for cats should increase by 10 percent and the price for dogs by 20 percent. Because these are two different sets of animals, you will often use two separate UPDATE statements. However, this operation provides a good use for the CASE function. You can reduce the operation to one UPDATE state-ment by replacing the 1.10 and 1.20 values with a CASE statement that selects 1.10 for Cats and 1.20 for Dogs.

The UPDATE statement has some additional features. For example, you can change several columns at the same time. Just separate the calculations with a comma. You can also build calculations from any row within the table or query. For example, an animal’s list price could take into consideration the animal’s age with the command SET ListPrice = ListPrice*(1 - 0.001*(Date( )-DateBorn)).This command takes 1/10 of 1 percent off the list price for each day since the animal was born.Notice theuseof the internalDate( ) function toprovide today’sdate in the

last example. Most database systems provide several internal functions that can be used within any calculation. These functions are not standardized, but you can generallygetalist(andthesyntaxchart)fromthesystem’sHelpcommands.TheDate, String, and Format functions are particularly useful.

When using the UPDATE command, remember that all the data in the calcula-tion must exist on one row within the query. There is no way to refer to a previous or next row within the table. If you need data from other rows or tables, you can

UPDATE AnimalSETListPrice=ListPrice*1.10WHERECategory=‘Cat’;

UPDATE AnimalSETListPrice=ListPrice*1.20WHERECategory=‘Dog’;

Figure 5.36Sample UPDATE command. If the CASE function is not available, use two separate statements to increase the list price by 10 percent for cats and 20 percent for dogs.

Page 30: DBMSPost-Chapter05-401

249Chapter 5: Advanced Queries and Subqueries

build a query to join tables. However, you can update data in only a single table at a time.

Quality: Testing QueriesHow do you know if your query is correct? The greatest challenge with complex queries is that even if you make a mistake, you usually get results. The problem is that the results are not the answer to the question you wanted to ask. The only way to ensure the results are correct is to thoroughly understand SQL, to build your queries carefully, and to test your queries.Figure5.37outlinesthebasicstepsfordealingwithcomplexqueries.Thefirst

step is to break complex queries into smaller pieces, particularly when the query involves subqueries. You need to examine and test each subquery separately. You can do the same thing with complex Boolean conditions. Start with a simple con-dition, check the results, and then add new conditions. When the subqueries are correct, use cut-and-paste techniques to combine them into one main query. If necessary, save the initial queries as views, and use a completely new query to combine the results from the views. The third step is to create sample data to test the queries. Find or create data that represents the different possible cases. Opti-mize queries that will become part of an application and run multiple times. Most DBMSs have an optimizer that will suggest performance improvements. You shouldalsolookforalternatewaystowritethequerytofindafasterapproach.

In terms of quality issues, consider the example in Figure 5.38: List custom-ers who bought dogs and also bought cat products. The query consists of four situations:

Customers bought dogs and cat products on the same sale.1. Customers bought dogs and then cat products at a different time.2. Customers bought dogs and never bought cat products.3. Customers never bought dogs but did buy cat products.4.

• Breakquestionsintosmallerpieces.• Testeachquery. • ChecktheSQL. • Lookatthedata. • Checkcomputations.• Combineintosubqueries. • Usethecut-and-pastefeaturestoreduceerrors. • Checkforcorrelatedsubqueries.• Testsampledata. • Identifydifferentcases. • Checkfinalqueryandsubqueries. • Verifycalculations.• TestSELECTqueriesbeforeexecutingUPDATEqueries.• Optimizequeriesthatrunmultipletimes. • Runaqueryoptimizer. • Thinkaboutnewwaystostructurethequery.

Figure 5.37Steps to building quality queries. Be sure there are recent backups of the database before you execute UPDATE or DELETE queries.

Page 31: DBMSPost-Chapter05-401

250Chapter 5: Advanced Queries and Subqueries

Because there are only four cases, you should create data and test each one. If there were thousands of possible cases, you might have to limit your testing to the major possibilities. Thefinalstepinbuildingqueriesinvolvesdatamanipulationqueries(suchas

UPDATE).YoushouldfirstcreateaSELECTquery thatretrieves therowsyouplan to change. Examine and test the rows to make sure they are the ones you wanttoalter.Whenyouaresatisfiedthatthequeryiscorrect,makesureyouhavea recent backup of the database—or at least a recent copy of the tables you want to change. Now you can convert the SELECT query to an UPDATE or DELETE statement and execute it.

SummaryAlways remember that SQL operates on sets of data. The SELECT command re-turns a set of data that matches some criteria. The UPDATE command changes values of data, and the DELETE command deletes rows of data that are in a speci-fiedset.SetscanbedefinedintermsofasimpleWHEREclause.Theycanalsobedefinedbycomplexconditionsinvolvingsubqueriesandmultipletables.ThekeytounderstandingSQListothinkoftheWHEREclauseasdefiningasetofdata.

Subqueries are powerful, but be careful to ensure that the query accurately rep-resents the business questions. You must test subqueries in pieces and make sure you understand exactly what each piece is returning.

In everyday situations, data can exist in one table but not another. For example, you might need a list of customers who have not placed orders recently. The prob-lem can also arise if the DBMS does not maintain referential integrity—and you needtofindwhichordershavecustomerswithnomatchingdatainthecustomertable.Outerjoins(ortheNOTINsubquery)areusefulinthesesituations.

The most important thing to remember when building queries is that if you make a mistake, most likely the query will still execute. Unfortunately, it will not give you the results you wanted. That means you have to build your queries care-

SELECTDISTINCTAnimal.Category,Sale.CustomerIDFROMSaleINNERJOIN(AnimalINNERJOINSaleAnimalONAnimal.AnimalID=SaleAnimal.AnimalID)ONSale.SaleID=SaleAnimal.SaleIDWHERE(Animal.Category=’Dog’)

ANDSale.CustomerIDIN(

SELECTDISTINCTSale.CustomerID FROMSaleINNERJOIN(MerchandiseINNERJOINSaleItem ONMerchandise.ItemID=SaleItem.ItemID) ONSale.SaleID=SaleItem.SaleID WHERE(Merchandise.Category=’Cat’));

Figure 5.38Sample query: Which customers who bought dogs also bought cat products (at any time)?Buildeachqueryseparately.ThenpastethemtogetherinSQLandaddtheconnecting link. Use sample data to test the results.

Page 32: DBMSPost-Chapter05-401

251Chapter 5: Advanced Queries and Subqueries

fully and always check your work. Begin with a smaller query and then add ele-ments until you get the query you want. To build an UPDATE or DELETE query, always start with a SELECT statement and check the results. Then change it to UPDATE or DELETE.

Key Terms

ALLALTER TABLEANYCASEconstraintcorrelated subqueryCREATE DOMAINCREATE SCHEMACREATE TABLECREATEVIEWDELETEDROP TABLEequi-joinEXCEPT EXISTS

FULL JOININinequality joinINSERTINTERSECTLEFT JOINnested queryouter joinreflexivejoinRIGHTJOINschemaself joinsubqueryUNIONUPDATE

Review Questions1. What is a subquery and in what situations is it useful?2. What is a correlated subquery and why does it present problems?3. Howdoyoufinditemsthatarenotinalist,suchascustomerswhohavenot

placed orders recently?4. How do you join tables when the JOIN column for one table contains data

that is not in the related column of the second table?5. How do you join a column in one table to a related column in the same table?6. What are inequality joins and when are they useful?7. What is the SQL UNION command and when is it useful?8. What is the purpose of the SQL CASE function?

A Developer’s ViewMiranda saw that some business questions are more complex than others. SQL sub-queries and outer joins are often used to answer these questions. Practice the SQL subqueries until you thoroughly understand them. They will save you hundreds of hours of work. Think about how long it would take to write code to answer some of the questions in this chapter! For your class project, you should create several que-ries to test your skills, including subqueries and outer joins. You should build and test some SQL UPDATE queries to change sets of data. You should be able to use SQL to create and modify tables.

Page 33: DBMSPost-Chapter05-401

252Chapter 5: Advanced Queries and Subqueries

9. WhatarethebasicSQLdatadefinitioncommands?10. What are the basic SQL data manipulation commands?11. How are UPDATE and DELETE commands similar to the SELECT

statement?

ExercisesSally’s Pet StoreWrite the SQL statements that will answer questions 1 through 25 based on the tables in the Pet Store database. Test your queries in the database. Hint: Many are easier if you split the question into multiple queries.

1. Which supplier has the highest average percentage of shipping cost per value of merchandise order?

2. Which breed of cats has the store never carried?3. What is the average merchandise value per sale as well as the average

number of items sold?4. Estimate shipping cost per animal by dividing shipping cost by the number of

animalsperorder.Computetheprofitmarginasthelistpriceminusthecostminus the estimated shipping cost. List the animals with a negative margin.

5. Computetheaverageprofitmarginforeachmerchandiseitem(ignoringshippingcosts).Computetheweightedaveragecostofanitem,andsubtractthis value from the list price for each item.

SupplierIDNameContactNamePhoneAddressZIPCodeCityID

Supplier

PONumberOrderDateReceiveDateSupplierIDEmployeeIDShippingCost

MerchandiseOrder

OrderIDOrderDateReceiveDateSupplierIDShippingCostEmployeeID

AnimalOrder

OrderIDAnimalIDCost

AnimalOrderItem

CityIDZIPCodeCityStateAreaCodePopulation2000Population1990Population1980CountryLatitudeLongitude

City

EmployeeIDLastNameFirstNamePhoneAddressZIPCodeCityIDTaxPayerIDDateHiredDateReleased

Employee

PONumberItemIDQuantityCost

OrderItem

CategoryRegistration

Category

CategoryBreed

Breed

AnimalIDNameCategoryBreedDateBornGenderRegisteredColorListPricePhoto

Animal

SaleIDSaleDateEmployeeIDCustomerIDSalesTax

Sale

SaleIDItemIDQuantitySalePrice

SaleItem

ItemIDDescriptionQuantityOnHandListPriceCategory

Merchandise

SaleIDAnimalIDSalePrice

SaleAnimal

CustomerIDPhoneFirstNameLastNameAddressZIPCodeCityID

Customer

*

*

*

*

*

*

*

*

*

**

*

*

*

**

*

**

*

Page 34: DBMSPost-Chapter05-401

253Chapter 5: Advanced Queries and Subqueries

6. Createaview(savedquery)thatcomputesthetotalvalueofasale,includingboth animals and merchandise. To be useful later, your query should return the animal, merchandise, and overall totals. Warning: Some sales have only animals and some only merchandise. Hint: Use a Left Join with the Sale table to include all rows.

7. Which employee has the highest average sales per order? Hint: Use the view created in Exercise 6.

8. Which merchandise items had the largest quantity on hand at the start of July 1? Hint: Compute total purchases then subtract total sales.

9. Which employees sold animals, but no merchandise in February? 10. Which merchandise items hold more than the average value of inventory

quantity on hand? Compute the value as QOH times list price.11. Similar to Exercise 10, but make a statistical comparison to identify

the merchandise items with an inventory value more than two standard deviations above the average value. Bonus point: assuming inventory follows a random normal distribution, what is the probability of an item being more than two standard deviations above the mean?

12. Which customers who bought more than the average value of animals (percustomer)alsoboughtmorethantheaveragevalueofmerchandise?Hint: Use the view created in Exercise 6 to compute the total purchases by customer.

13. Which employees ordered merchandise from suppliers in May, but not animals?

14. What is the total number and value of merchandise orders placed with each supplier?Hint:Youmustfirstcomputethetotalvalueperorder.

15. Create a crosstab query to display the total value of sales per month per employee. Hint: Use the view created in Exercise 6. Hint: Use Crosstab in Access,DecodeinOracle,orCASEinSQLServer(asinFigure5.25).Hint:Use a function to return the month number from a date.

16. Which merchandise items with fewer than 500 units on hand have not been ordered in 2007? Use an OUTER JOIN to answer the question.

17. Create the table shown to categorize customers based on sales. Write a query that lists each total sales by each customer and displays the proper label. Hint: Use the view created in Exercise 6.

Category Low HighWeak 0 200Good 200 800Best 800 10,000

18. List the employees and their total merchandise sales expressed as a percentage of total merchandise sales for all employees.

19. List the dogs that took longer than average dog to sell. Do not include dogs thatwerepreordered(saledatebeforetheorderdate).

Page 35: DBMSPost-Chapter05-401

254Chapter 5: Advanced Queries and Subqueries

20.ListtheemployeeswhoreporttoGibson.21. Write an INSERT query to add the Tokinese cat to the Breed table.22. Write a query to decrease the list price by 10 percent of any item that has a

QOH greater than 14 and a price greater than $50.23.UseSQLtocreateaVetstablewithcolumnsforVetID,LastName,

FirstName,andPhoneNumber.BesuretosetVetIDastheprimarykey.24.UseanINSERTcommandtocopyfivenamesfromtheCustomertableand

insertthemintotheVetstablecreatedinExercise23usingCustomerIDvalues 11-15.

25.DeleteanyentriesfromtheVetstablewithanareacodeof502.

Rolling Thunder BicyclesWrite the SQL statements that will answer questions 26 through 50 based on the tables in the Rolling Thunder database. Build your queries in Access.

26. In which state were customers less likely to use a retail store to order a bike? Hint: Look at StoreID values of 1 or 2 and compute the percentage of sales made without retail stores.

27. What is the price per gram of weight savings in switching between the latest ShimanoLXcrankandanXTcrank?

SerialNumberTubeIDQuantity

CustomerIDPhoneFirstNameLastNameAddressZIPCodeCityIDBalanceDue

Customer

CustomerIDTransDateEmployeeIDAmountDescriptionReference

CustomerTrans

StoreIDStoreNamePhoneContactFirstNameContactLastNameAddressZIPCodeCityID

RetailStore

StateTaxRate

StateTaxRate

SerialNumberCustomerIDModelTypePaintIDFrameSizeOrderDateStartDateShipDateShipEmployeeFrameAssemblerPainterConstructionWaterBottleCustomNameLetterStyleIDStoreIDEmployeeIDTopTubeChainStayHeadTubeAngleSeatTubeAngleListPriceSalePriceSalesTaxSaleStateShipPriceFramePriceComponentList

Bicycle

CityIDZipCodeCityStateAreaCodePopulation1990Population1980CountryLatitudeLongitude

Customer

ModelTypeDescriptionComponentID

ModelType

Paint

EmployeeIDTaxpayerIDLastNameFirstNameHomePhoneAddressZIPCodeCityIDDateHiredDateReleasedCurrentManagerSalaryGradeSalaryTitleWorkArea

Employee

BicycleTube

ModelTypeMSizeTopTubeChainStayTotalLengthGroundClearanceHeadTubeAngleSeatTubeAngle

ModelSize

LetterStyleDescription

LetterStyle

PurchaseIDEmployeeIDManufacturerIDTotalListShippingCostDiscountOrderDateReceiveDateAmountDue

PurchaseOrder

SerialNumberTubeNameTubeIDLength

BikeTubes

SerialNumberComponentIDSubstituteIDLocationQuantityDateInstalledEmployeeID

BikeParts

PurchaseIDComponentIDPricePaidQuantityQuantityReceived

PurchaseItem

ManufacturerIDManufacturerNameContactNamePhoneAddressZIPCodeCityIDBalanceDue

Manufacturer

CompGroupGroupNameBikeTypeYearEndYearWeight

Groupo

ComponentIDManufacturerIDProductNumberRoadCategoryLengthHeightWidthWeightYearEndYearDescriptionListPriceEstimatedCostQuantityOnHand

Component

ManufacturerIDTransactionDateEmployeeIDAmountDescriptionReference

ManufacturerTrans

TubeIDMaterialDescriptionDiameterThicknessRoundnessWeightStiffnessListPriceConstruction

TubeMaterial

GroupIDComponentID

GroupCompon

ComponentNameAssemblyOrderDescription

ComponentName

PaintIDColorNameColorStyleColorListDateIntroducedDateDiscontinued

Page 36: DBMSPost-Chapter05-401

255Chapter 5: Advanced Queries and Subqueries

28.Asapercentageoftotalnumberofdays(OrderDatetoShipDate),didconstruction(StartDateversusShipDate)representasubstantiallyhigherpercentage for different model types between 2000 and 2005?

29. Using the latest DateInstalled for a bike part, what was the average number of days to build a race bike in 2005?

30.Bypercentageofbikes,wereCampy(Campagnolo)grouposinstalledonmore race bikes or road bikes in 2005? Hint: Look at just the crank.

31. Compare the average percentage of wheel weight to total component weight forroadandracebikesversusmountain(andfullsuspension)bikesin2004.

32.Usingonequery(withasubquery),listthepercentagevalueofsalesbymodel type for 2005.

33. In 2005, what percentage of the bikes was shipped in less than three days from the order date?

34. Based on the down tube material, list the percentage of bikes ordered in 2005 constructed of each type of material.

35. Which customers bought race bikes after they had already purchased mountain or full suspension bikes?

36. List customers who bought at least one bike before 2000, but have not bought a bike since then.

37.Whichcomponentshaveaninventoryvalueonhand(basedonListPrice)greater than $10,000 but have not been installed on a bike in 2005?

38. List customers who bought bikes in both 2004 and 2005.39. List the employee name and the transaction date and amount for any

employee who participated in either a customer transaction valued over $6,000 or a manufacturer transaction valued over $100,000 in 2005. Hint: Use a UNION query.

40. Write a crosstab query (see Figure 5.25 if you are not using Microsoft Access),todisplaytotalsalesbymodeltypeandyear.Hint:Useafunctiontoreturn the year from a date.

41. List the customers who have purchased at least one road bike and one mountainbike(atanytime).Useasubquery.

42.ListalloftheemployeeswhoreporttoVenetiaan.43. In which years did the average build time for the year exceed the overall

average build time for all years?44. List the customers who have bought hard-tail mountain bikes but not yet

purchased a full-suspension mountain bike.

Page 37: DBMSPost-Chapter05-401

256Chapter 5: Advanced Queries and Subqueries

45. Use SQL to create a new CustomerSize table as shown.

Size RoadLow RoadHigh MTBLow MTBHighSmall 45 52 14 15.5Medium 52 57 15.5 17Big 57 65 17 23

46. Write the query to insert the rows of data row into the table shown in Exercise 45.

47. Write a query to change the RoadLow value for the small bike to 40 in the table for Exercise 45.

48. Using the table in Exercise 45, write a query that assigns the appropriate size label to each bicycle, then create a query to count the number of bicycles built in 2004 and 2005 that fall into each size category by model type

49.WriteaquerytodeletethefirstrowinthetableforExercise46.50. Write a query to delete the entire table created for Exercise 46.

Corner Med 51. List the visits/cases where the company needs to reimburse the patient for

overpayment—where the insurance company payment plus the patient payment exceeds the amount charged.

52. What was the average number of visits per patient in 2007?53. What percent of the visits resulted in more then $1000 in charges?

Corner

Med

Corner

Med

Patient

PatientIDLastNameFirstNameDateOfBirthGenderTelephoneAddressCityStateZIPCode

Visit

VisitIDPatientIDVisitDateInsuranceCompanyInsuranceGroupCodeInsuranceMemberCodePatientAmountPaidDateBillSubmittedDateInsurancePaidAmountInsurancePaid

PatientDiagnoses

VisitIDICD9DiagnosisComments

VisitProcedures

VisitIDICD9ProcedureEmployeeIDAmountCharged

EmployeeCategory

EmployeeCategory

ICD9DiagnosisCodes

ICD9DiagnosisShortDescription

ICD9ProcedureCodes

ICD9ProcedureShortDescription

Employee

EmployeeIDLastNameFirstNameEmployeeCategoryDateHiredDateLeftEmergencyPhone

*

* *

***

*

Page 38: DBMSPost-Chapter05-401

257Chapter 5: Advanced Queries and Subqueries

54. List the percentage of revenue generated by each physician in 2007. Use separate queries.

55. What percentage of the patient visits were not covered by insurance in 2007?56. Which patients did not have visits in 2007? Use an outer join.57. Which insurance company paid the lowest percentage of the total amount

charged?58. In 2007, how many visits involved three or more procedures?59. For 2007, what was the average number of patients per month seen by each

employee?60. Which patients with an ICD9 diagnosis code beginning with E8 have had a

procedure starting with a 9 code at the time of the diagnosis or later?61.UseSQLtocreateatable(InsuranceDelay)thatcanbeusedtocategorize

insurance company payment delays.

Category MinDays MaxDaysMonth 30 60Late 60 90TooLong 90 1000

62. Write the SQL statements to load the three rows of data into the table in Exercise 61.

63.UsethecategoriesdefinedbythetableinExercise61tocountthenumberof items in each category for each of the insurance companies—based on the number of days between submitting the bill and receiving payment. Write thefinalqueryasacrosstabtoshowthenumberofbillsthatfallintoeachcategory for each insurance company.

64. Write the SQL command to delete all of the rows in the table for Exercise 61.65. Write the SQL command to remove the table created in Exercise 61.

Web Site References

http://www.acm.org/sigmod/ AssociationforComputingMachinery—SpecialInterestGroup:ManagementofData.

http://www.acm.org/dl ACMdigitallibrarycontainingthousandsofsearchablefull-textarticles.Seeifyourlibraryhasasubscription.

http://www.oracle.com/pls/db102/portal.portal_db?selected=3

Oracleonlinedocumentationlibrary,includingSQLReference.(Versiondb102willchange.)

http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx

Microsoft SQL Server Books Online reference.

http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx

IBMDB2reference.

http://dev.mysql.com/doc/refman/5.0/en/index.html

MySQLReference.

Page 39: DBMSPost-Chapter05-401

258Chapter 5: Advanced Queries and Subqueries

Additional ReadingCelko, J., Joe Celko’s SQL Puzzles & Answers, San Mateo: Morgan Kaufmann,

1997. [Challenging SQL problems with solutions.]Iseminger, D. ed, Microsoft SQL Server 2000 Reference Library, Redmond:

Microsoft Press, 2000. [The hardcopy documentation that does not ship with SQL Server.]

Kreines, D. and K. Jacobs, Oracle SQL: The Essential Reference, Cambridge, MA: O’Reilly & Associates, 2000. [Reference book for Oracle’s version of SQL.]

Page 40: DBMSPost-Chapter05-401

259Chapter 5: Advanced Queries and Subqueries

Appendix: Introduction to ProgrammingMany books will help you learn to write computer programs. The purpose of this appendix is to review the highlights of programming and to point out some of the features that are important to programming within a DBMS. If you are new to pro-gramming, you should consider reading several other books to explain the details and logic behind programming.

Variables and DataOne of the most important consequences of programming in a database environ-ment is that there can be three categories of data: (1)datastored ina table, (2)dataheldinacontrolonaformorreport,and(3)traditionaldata variables that are used to hold temporary results. Chapter 3 focuses on storing data within tables. Chapter 6 describes how to create forms and the role of data controls. Chapter 8 provides more details of how the three types of variables interact when building applications. For now, you must learn details about basic programming variables.

Any procedure can create variables to hold data. A program variable is like a smallbox:itcanholdvaluesthatwillbeusedortransferredlater.Variableshaveunique names. More importantly, variables can hold a certain data type. Common typesofvariablesaredisplayedinFigure5.1A.Theycangenerallybeclassifiedinto three categories: integers(1,2,-10,…);reals(1.55,3.14,…);andstrings ("123MainStreet","JoseRojas",…).Eachtypeofvariabletakesupadefinedamountofstoragespace.Thisspaceaf-

fects the size of the data that the variable can hold. The exact size depends on the particular DBMS and the operating system. For example, a short integer typically takes 2 bytes of storage, which is 16 bits. Hence it can hold 216 values or numbers between–32,768and32,767.Realnumberscanhavefractionalvalues.Thereareusually two sizes: single and double precision. If you do not need many variables, it is often wise to choose the larger variables (integers and double-precision re-als).Althoughdouble-precisionvariablesrequiremorespaceandtakelonger toprocess, they provide room for expansion. If you choose too small of a variable, a user might crash your application or get invalid results. For example, it would be amistaketousea2-byteintegertocountthenumberofcustomers—sinceafirmcould generally anticipate having more than 65,000 customers. Along the same

Type Bytes RangeShortIntegerLong

FloatDoubleDecimal

String

248

4816

any

-32,768to32,767+/-2,147,483,647+/-9,223,372,036,854,775,807

+/-1.510e45(7digits)+/-5.010e324(15digits)+/-1.010e28(28digits)

any

Figure 5.1AProgram variable types. Ranges are approximate but supported by most vendors. Note that decimal variables help prevent round-off errors

Page 41: DBMSPost-Chapter05-401

260Chapter 5: Advanced Queries and Subqueries

lines, you should use the Currency data type for monetary values. In addition to handling largenumbers, itavoids round-offerrors thatarecommontofloating-point numbers.

Variable ScopeThe scope and lifetime of a variable are crucial elements of programming, particu-larlyinanevent-drivenenvironment.Variablescope refers to where the variable is accessible, that is, which procedures or code can access the data in that variable. The lifetimeidentifieswhenthevariableiscreatedandwhenitisdestroyed.Thetwo properties are related and are generally automatic. However, you can override the standard procedures by changing the way you declare the variable. In most systems, the scope and lifetime are based on where the variable is declared. Alldatavariablesshouldbeexplicitlydeclared:theyshouldbeidentifiedbe-

fore they are used. The Basic language uses a Dim statement to declare variables. Manyother languages declare variables by specifying the data typefirst.Mostcommonly, the variable is created within the event procedure and is a local vari-able. When the procedure starts, the local variable is created. Any code within that procedure can use the variable. Code in other procedures cannot see the variable. When the procedure ends, the local variable and its data are destroyed.

Figure 5.2A shows two buttons on a form. Each button responds to a Click event,sotwoproceduresaredefined.Eachprocedurecanhaveavariablecalledi1, but these two variables are completely separate. In fact, the variables are not created until the button is clicked. Think of the procedures as two different rooms. When you are in one room, you can see the data for that room only. When you leave the room, the data is destroyed.

However, what if you do not want the data to be destroyed when the code ends, or you want to access the variable from other procedures? You have two choices: (1)Change the lifetimeof thevariablebydeclaring it static,or (2)Change thescope of the variable by declaring it in a different location. You should avoid de-claringastaticvariableunlessitisabsolutelynecessary(whichisrare).Ifthevari-able is static, it keeps its value from the previous time the procedure was called. In the example, each time the button is clicked, the value for i3 will remain from the

Module Code

SubButton1_Click()Dimi1AsIntegeri1=3

End Sub

SubButton2_Click()Dimi1AsIntegeri1=7

End Sub

Figure 5.2AVariablescopeandlifetime.Eacheventhasitsownprocedurewithindependentvariables that are created and destroyed each time a routine is executed.

Page 42: DBMSPost-Chapter05-401

261Chapter 5: Advanced Queries and Subqueries

prior click. You might use this trick if you need to count the number of times the button is clicked.Amoreusefultechniqueistochangewherethevariableisdefined.Figure5.3A

showsthateventproceduresaredefinedwithinaformoramodule, which is a collection of related procedures. The variable i2isdefinedfortheentireformormodule. The lifetime of the variable is established by the form, that is, the variable is created and destroyed as the form is opened and closed. The scope of the vari-able is that all procedures in the form can see and change the value. On the other hand, procedures in other forms or modules do not know that this variable exists.Proceduresorfunctionsalsohaveascope.Anyprocedurethatyoudefineona

form can be used by other procedures on that form. If you need to access a vari-ableoraprocedurefrommanydifferentformsorreports,youshoulddefineitona separate module and then declare it as global (or public).

Be careful with global or public variables. A programmer who tries to revise your code might not know that the variable is used in other procedures and might accidentally destroy an important value. On forms the main purpose of a global variable is to transfer a value from one event to another one. For example, you might need to keep the original value of a text control—before it is changed by a user—and compare it to the new value. You need a global variable because two separateeventsexaminethetextcontrol:(1)Theuserfirstentersthecontrol,and(2)Theuserchangesthedata.Itissometimesdifficulttocreateglobalorsharedvariables in certain systems. In these cases, you might need to store the global variables within a special database table.

ComputationsOne of the main purposes of variables is to perform calculations. Keep in mind that these computations apply to individual variables—one piece of data at a time. If you need to manipulate data in an entire table, it is usually best to use the SQL

Form--Module Code

SubButton2_Click()i2=i2+7End Sub

Form

Button1Button2

Dimi2AsInteger

SubButton1_Click()i2=20End Sub

Figure 5.3AGlobalvariables.Variablesthataredefinedintheform’sGeneralsectionareaccessiblebyanyfunctiononthatform(ormodule).

Page 43: DBMSPost-Chapter05-401

262Chapter 5: Advanced Queries and Subqueries

commands described in Chapter 5. Nonetheless, there are times when you need more complex calculations.

Standard arithmeticoperations(add,subtract,multiply,anddivide)areshownin Figure 5.4A. These operators are common to most programming languages. Some nonstandard, but useful operators include exponentiation (raise to a power, e.g.,2^3=2*2*2=8),andintegerdivide(e.g.,9\2=4),whichalwaysreturnsan integer value. The mod function returns the modulus or remainder of an integer division(e.g.,15mod4=3,since15-12=3).Theselasttwofunctionsareuse-fulwhenyouneedtoknowhowmanyofsomeobjectswillfitintoafixedspace.For example, if there are 50 possible lines on a page and you need to print a re-port with 185 lines, then 185 \ 50 = 3 pages, and 185 Mod 50 leaves 35 lines on a fourth page.

Most languages support string variables, which are used to hold basic text data, such asnames, addresses, or shortmessages.A string is a collection (or array)of characters. Sometimes you will need to perform computations on string vari-ables. How can you perform computations on text data? The most common tech-nique is to concatenate(oradd)twostringstogether.Forexample,ifFirstNameis“George”andLastNameis“Jones”,thenFirstName&LastNameis“George-Jones”. Notice that if you want a space to appear between the names, you have to add one: FirstName & " " & LastName.

Operation Common SyntaxArithmeticExponentiationIntegerDivideModulus

+-*/^ or Power\mod

Figure 5.4ACommonarithmeticoperators.Add(+),subtract(-),multiply(*),anddivide(/).Exponentiation and integer arithmetic are often used for special tasks. For example, integer arithmetic is useful for dividing objects into groups.

Concatenation(&or+)Left,Right,Mid,orSubStrTrim, LTrim, RTrimLCase, UCaseInStr or IndexOf“Frank”+“Rose”→“FrankRose”Left(“Jackson”,5)→“Jacks”Trim(“Maria“)→“Maria”Len(“Ramanujan”)→9“8764Main”.IndexOf(““)→5

Figure 5.5ACommon string functions to add strings, extract portions, examine characters, convert case, compare two strings, and format numerical data into a string variable.

Page 44: DBMSPost-Chapter05-401

263Chapter 5: Advanced Queries and Subqueries

Figure 5.5A lists some of the common string functions. You can learn more about the functions and their syntax from the Help system. Commonly used func-tions include the Left, Right, and Mid, which examine portions of the string. For example,youmightwanttoseeonlythefirstfivecharactersontheleftsideofastring.

Standard Internal FunctionsAs you may recall from courses in mathematics, several common functions are used in a variety of situations. As shown in Figure 5.6A, these functions include the standard trigonometric and logarithmic functions, which can be useful in map-ping and procedures involving measurements. You also will need a function to computethesquarerootandabsolutevalueofnumbers.TheInt(integer)functionis useful for dropping the fractional portion of a number. Most languages also pro-vide a random number generator, which will randomly create numbers between 0 and 1. If you need another range of numbers, you can get them with a simple con-version. For example, to generate numbers between 40 and 90, use the following function:y=40+(90-40)*Rnd.

In a database environment, you will often need to evaluate and modify dates. It isalsouseful tohave functions thatprovide thecurrentdate (Date)and time(Now).TwofunctionsthatareusefulinbusinessaretheDateAdd and DateDiff functions. As illustrated in Figure 5.7A, the DateAdd function adds days to a giv-endatetofindsomedateinthefuture.TheDateDifffunctioncomputesthedif-ference between two dates. Usually, you will want to compute the number of days between various dates. However, the functions can often compute the number of months, weeks, years and so on.

Exp,LogAtn, Cos, Sin, TanSqr or SqrtAbsSgnIntRnd

x=loge(ex)

TrigonometricfunctionsSquare rootAbsolutevalue:Abs(-35)→35Signum:Sgn(-35)→-1Integer:Int(2.718)→2Random number

Figure 5.6AStandard mathematical functions. Even in business applications, you often need basic mathematical functions.

Date, Now, TimeDateAdd, DateDiff

Current date and timeDate arithmetic: DateDue=DateAdd(“d”,30,Date())

Figure 5.7ADate and time functions. Business problems often require computing the number of days between two dates or adding days to a date to determine when payments are due.

Page 45: DBMSPost-Chapter05-401

264Chapter 5: Advanced Queries and Subqueries

Input and OutputHandling input and output were crucial topics in traditional programming. These topics are still important, but the DBMS now performs most data-handling rou-tines and the operating system or Web browser handles most of the user interface. Commonformsandreports(Chapters6and7)areusedformostinputandoutputtasks.

Remember that an important feature of a Windows interface is that users con-troltheflowofdataentry;thatis,thedesignerprovidesaform,andusersworkattheir own pace without interruption. Occasionally, you might choose to interrupt theuser—eithertoprovideinformationortogetaspecificpieceofdata.Onecom-mon reason is to display error messages. Two basic functions serve this purpose: MsgBox and InputBox. As shown in Figure 5.8A, a message box can contain but-tons. The buttons are often used to indicate how the user wants to respond to some problem or error.

An InputBox is a special form that can be used to enter very small amounts of text or a single number. Neither the user nor the developer has much control over the form. In most cases you would be better off creating your own blank form. Then you can have more than one text box, and you can specify and control the buttons. The InputBox is usually for temporary use when development time is extremely limited.

Figure 5.8ASample message box. The message box interrupts the user and displays a few limited choices. It often handles errors or problems.

If(Condition1)Then Statements for trueElse Statements for falseIf(Condition2)Then Statements for true End IfEnd If

Figure 5.9AConditions. Basic conditions are straightforward. Indenting conditions highlights the relationships.

Page 46: DBMSPost-Chapter05-401

265Chapter 5: Advanced Queries and Subqueries

ConditionsThe ability to test and respond to conditions is one of the most common reasons for writing your own procedures. The basic conditional statement (if …then … else) is relativelyeasy tounderstand.Thestructure is shown inFigure5.9A.Acondition is evaluated to be true or false. If it is true, then one set of statements is executed; otherwise, the second set is performed.

Conditions can be complex, particularly when the condition contains several AND, and OR connectors. Some developers use a NOT statement to reverse the value of a condition. Be careful when writing conditions. Your goals are to make sure that the condition evaluates to the correct value and to make sure that other developers can understand the code.

You should always include parentheses to specify the order of evaluation and, for complex conditions, create sample data and test the conditions. Also, indent your code. Indenting is particularly important for nested conditions, in which the statements for one condition contain another conditional statement.

response = 1, 2, 3, 4, 5If(response=1)Then ‘ Statements for 1ElseIf(response=2)Then ‘ Statements for 2 ElseIf(response=3)Then ‘ More If statements End If End IfEnd If

Figure 5.10ANested conditions to test for a user response. The code becomes harder to read as more conditions are added.

Response = 1, 2, 3, 4, 5Select Case response Case 1 ‘ Statements for 1 Case 2 ‘ Statements for 2 Case 3 ‘ More Case statements DefaultEnd Case

Figure 5.11AThe Select statement. The select statement tests the response variable against several conditions. If the response matches a case in the list, the corresponding code is executed.

Page 47: DBMSPost-Chapter05-401

266Chapter 5: Advanced Queries and Subqueries

The Select Case statement is a special type of conditional statement. Many pro-cedures will need to evaluate a set of related conditions. As a simple example, consider what happens if you use a message box with three buttons (Yes, No, andCancel).Youwillhavetotesttheuser’schoiceforeachoption.Figure5.10Ashows how the code might look when you use nested conditions.

Figure 5.11A shows the same problem written with the Select Case statement. Note that this code is much easier to read. Now think about what will happen if you have 10 choices. The If-Then code gets much worse, but the Select Case code just adds new lines to the bottom of the list.

LoopsIteration or loops are another common feature in procedures. Although you should useSQL statements (UPDATE, INSERT, etc.) asmuch as possible, sometimesyou will need to loop through a table or query to examine each row individually.

Some of the basic loop formats are illustrated in Figure 5.12A. The For/Next loopisgenerallyusedonlyifyouneedafixednumberofiterations.TheDoloopis more common. An important feature of loops is the ability to test the condition at the top or the bottom of the loop. Consider the example in which the condition says to execute the statements if (x<=10).Whathappenswhenthestartingvalueof x is 15? If you test the condition at the top of the loop, then the statements in the loop will never be executed. On the other hand, if you test the condition at the bottom, then the statements in the loop will be executed exactly one time—before the condition is tested.

Just as with conditions, it is good programming practice to indent the state-ments of the loop. Indents help others to read your code and to understand the logic.Iftherearenoproblemswithinaloop,youreyecaneasilyfindtheendofthe loop.

Be careful with loops: if you make a mistake, the computer may execute the statements of your loop forever. (On most personal computers, Ctrl+Break will usually stop a runaway loop.)A common mistake occurs when you forget tochange the conditional variable (x in the examples). In tracking through adataquery, you might forget to get the next row of data, in which case your code will perform the same operations forever on one row of data. A good programming practiceistoalwayswriteloopsinfoursteps:(1)Writetheinitialcondition,(2)

DoUntil(x>10)‘Statementsx=x+1Loop

DoWhile(x<=10)‘Statementsx=x+1Loop

Do‘Statementsx=x+1LoopUntil(x>10)

Forx=1to10‘StatementsNext x

Figure 5.12AIteration. All versions of loops follow a common format: initialize a counter value, perform statements, increment the counter, and test the exit condition. You can test the condition at the start or end of the loop.

Page 48: DBMSPost-Chapter05-401

267Chapter 5: Advanced Queries and Subqueries

Writetheendingstatement,(3)Writeastatementtoupdatetheconditionalvari-able,and(4)Writetheinteriorcode.Thefirstthreestatementsgiveyouthestruc-ture.Bywritingandtestingthemfirst,youknowthatyouwillbeusingthecorrectdata.

SubroutinesAn important concept in programming is the ability to break the program into smaller pieces as subroutines or functions. A subroutine is a portion of code that can be called from other routines.When the subroutine is finished, control re-turns to the program code that called it. The goal of using subroutines is to break the program into smaller pieces that are relatively easy to understand, test, and modify.

A subroutine is essentially a self-contained program that can be used by many other parts of the program. For example, you might create a subroutine that dis-plays a status message on the screen. As illustrated in Figure 5.13A, you would write the basic routine once. Then anytime you need to display a status message, your program calls this routine. By passing the message to the subroutine, the ac-tual message can change each time. The advantage of using the subroutine is that you have to write it only once. In addition, your status messages can be standard-izedbecausethesubroutinespecifiesthelocation,style,andcolor.Tochangetheformat, you simply modify the few lines of code in the one subroutine. Without thesubroutine,youwouldhavetofindandmodifycodeineverylocationthatdis-played a status message.

A data variable that is passed to a function or a subroutine is known as a pa-rameter. There are two basic ways to pass a parameter: by reference and by value. The default method used by Microsoft Access is pass-by-reference. In this case the variable in the subroutine is essentially the same variable as in the original program. Any changes made to the data in the subroutine will automatically be returned to the calling program. For example, consider the two examples in Figure 5.14A. Changes to the variable j2 in the subroutine will automatically be passed back to the calling program. However, when only the value is passed, a copy is made in the subroutine. Changes made to the data in the subroutine will not be returned to the calling program. Unless you are absolutely certain that you want to

Mainprogram…StatusMessage “Tryingtoconnect.”…StatusMessage “Verifyingaccess.”…Endmainprogram

SubStatusMessage (Msg AsString)‘ DisplayMsg,location,color

End Sub

Figure 5.13ASubroutine. The StatusMessge subroutine can be called from any location. When the subroutineisfinished,itreturnstothecallingprogram.

Page 49: DBMSPost-Chapter05-401

268Chapter 5: Advanced Queries and Subqueries

alter the original value in the calling program, you should always pass variables byvalue.Subroutinesthatusepass-by-referencecancauseerrorsthataredifficulttofindinprograms.Someotherprogrammermightnotrealizethatyoursubrou-tine changed the value of a parameter.

Most languages also enable you to create new functions. There is a slight techni-cal difference between functions and subroutines. Although subroutines and func-tions can receive or return data through pass-by-reference parameters, a function can return a result or a single value directly to the calling program. For instance, your main program might have a statement such as v1 = Min(x, y).Thefunctionwould choose the smaller of the two values and return it to the main program, where it is assigned to the variable v1.

SummaryThe only way to learn how to program is to write your own programs. Reading books, syntax documentation, and studying code written by others will help, but the only way to become a programmer is through experience.Asyouwriteprograms, remember thatyou (or someoneelse)mighthave to

modify your code later. Choose descriptive variable names. Document your state-ments with comments that explain tricky sections and outline the purpose of each section of code. Write in small sections and subroutines. Test each section, and keep the test data and results in the documentation. Keep revision notes so that you know when each section was changed and why you changed it.

Main:j=3DoSum(j)‘jisnowequalto8…

SubDoSum(ByRefj2AsInteger)j2=8End Sub

By ReferenceChangestodatainthesubroutine are passed back to thecallingprogram.

Main:j=3DoSum(j)‘jisstillequalto3…

SubDoSum(ByValj2AsInteger)J2=8End Sub

ByValueCreates a copy of the variable, sochangesarenotreturned.

Figure 5.14ATwo methods to pass data to a subroutine. Pass parameters by value as much as possible to avoid unwanted changes to data.