
In Praise of Computer Architecture: A Quantitative Approach
Fourth Edition

"The multiprocessor is here and it can no longer be avoided. As we bid farewell to single-core processors and move into the chip multiprocessing age, it is great timing for a new edition of Hennessy and Patterson's classic. Few books have had as significant an impact on the way their discipline is taught, and the current edition will ensure its place at the top for some time to come."
Luiz André Barroso, Google Inc.

"What do the following have in common: Beatles' tunes, HP calculators, chocolate chip cookies, and Computer Architecture? They are all classics that have stood the test of time."
Robert P. Colwell, Intel lead architect

"Not only does the book provide an authoritative reference on the concepts that all computer architects should be familiar with, but it is also a good starting point for investigations into emerging areas in the field."
Krisztián Flautner, ARM Ltd.

"The best keeps getting better! This new edition is updated and very relevant to the key issues in computer architecture today. Plus, its new exercise paradigm is much more useful for both students and instructors."
Norman P. Jouppi, HP Labs

"Computer Architecture builds on fundamentals that yielded the RISC revolution, including the enablers for CISC translation. Now, in this new edition, it clearly explains and gives insight into the latest microarchitecture techniques needed for the new generation of multithreaded multicore processors."
Marc Tremblay, Fellow & VP, Chief Architect, Sun Microsystems

"This is a great textbook on all key accounts: pedagogically superb in exposing the ideas and techniques that define the art of computer organization and design, stimulating to read, and comprehensive in its coverage of topics. The first edition set a standard of excellence and relevance; this latest edition does it again."
Milos Ercegovac, UCLA

"They've done it again. Hennessy and Patterson emphatically demonstrate why they are the doyens of this deep and shifting field. Fallacy: Computer architecture isn't an essential subject in the information age. Pitfall: You don't need the 4th edition of Computer Architecture."
Michael D. Smith, Harvard University

"Hennessy and Patterson have done it again! The 4th edition is a classic encore that has been adapted beautifully to meet the rapidly changing constraints of late-CMOS-era technology. The detailed case studies of real processor products are especially educational, and the text reads so smoothly that it is difficult to put down. This book is a must-read for students and professionals alike!"
Pradip Bose, IBM

"This latest edition of Computer Architecture is sure to provide students with the architectural framework and foundation they need to become influential architects of the future."
Ravishankar Iyer, Intel Corp.

"As technology has advanced, and design opportunities and constraints have changed, so has this book. The 4th edition continues the tradition of presenting the latest in innovations with commercial impact, alongside the foundational concepts: advanced processor and memory system design techniques, multithreading and chip multiprocessors, storage systems, virtual machines, and other concepts. This book is an excellent resource for anybody interested in learning the architectural concepts underlying real commercial products."
Gurindar Sohi, University of Wisconsin-Madison

"I am very happy to have my students study computer architecture using this fantastic book and am a little jealous for not having written it myself."
Mateo Valero, UPC, Barcelona

"Hennessy and Patterson continue to evolve their teaching methods with the changing landscape of computer system design. Students gain unique insight into the factors influencing the shape of computer architecture design and the potential research directions in the computer systems field."
Dan Connors, University of Colorado at Boulder

"With this revision, Computer Architecture will remain a must-read for all computer architecture students in the coming decade."
Wen-mei Hwu, University of Illinois at Urbana-Champaign

"The 4th edition of Computer Architecture continues in the tradition of providing a relevant and cutting edge approach that appeals to students, researchers, and designers of computer systems. The lessons that this new edition teaches will continue to be as relevant as ever for its readers."
David Brooks, Harvard University

"With the 4th edition, Hennessy and Patterson have shaped Computer Architecture back to the lean focus that made the 1st edition an instant classic."
Mark D. Hill, University of Wisconsin-Madison

Computer Architecture
A Quantitative Approach
Fourth Edition

John L. Hennessy is the president of Stanford University, where he has been a member of the faculty since 1977 in the departments of electrical engineering and computer science. Hennessy is a Fellow of the IEEE and ACM, a member of the National Academy of Engineering and the National Academy of Science, and a Fellow of the American Academy of Arts and Sciences. Among his many awards are the 2001 Eckert-Mauchly Award for his contributions to RISC technology, the 2001 Seymour Cray Computer Engineering Award, and the 2000 John von Neumann Award, which he shared with David Patterson. He has also received seven honorary doctorates.

In 1981, he started the MIPS project at Stanford with a handful of graduate students. After completing the project in 1984, he took a one-year leave from the university to cofound MIPS Computer Systems, which developed one of the first commercial RISC microprocessors. After being acquired by Silicon Graphics in 1991, MIPS Technologies became an independent company in 1998, focusing on microprocessors for the embedded marketplace.
As of 2006, over 500 million MIPS microprocessors have been shipped in devices ranging from video games and palmtop computers to laser printers and network switches.

David A. Patterson has been teaching computer architecture at the University of California, Berkeley, since joining the faculty in 1977, where he holds the Pardee Chair of Computer Science. His teaching has been honored by the Abacus Award from Upsilon Pi Epsilon, the Distinguished Teaching Award from the University of California, the Karlstrom Award from ACM, and the Mulligan Education Medal and Undergraduate Teaching Award from IEEE. Patterson received the IEEE Technical Achievement Award for contributions to RISC and shared the IEEE Johnson Information Storage Award for contributions to RAID. He then shared the IEEE John von Neumann Medal and the C & C Prize with John Hennessy. Like his co-author, Patterson is a Fellow of the American Academy of Arts and Sciences, ACM, and IEEE, and he was elected to the National Academy of Engineering, the National Academy of Sciences, and the Silicon Valley Engineering Hall of Fame. He served on the Information Technology Advisory Committee to the U.S. President, as chair of the CS division in the Berkeley EECS department, as chair of the Computing Research Association, and as President of ACM. This record led to a Distinguished Service Award from CRA.

At Berkeley, Patterson led the design and implementation of RISC I, likely the first VLSI reduced instruction set computer. This research became the foundation of the SPARC architecture, currently used by Sun Microsystems, Fujitsu, and others. He was a leader of the Redundant Arrays of Inexpensive Disks (RAID) project, which led to dependable storage systems from many companies. He was also involved in the Network of Workstations (NOW) project, which led to cluster technology used by Internet companies. These projects earned three dissertation awards from the ACM.
His current research projects are the RAD Lab, which is inventing technology for reliable, adaptive, distributed Internet services, and the Research Accelerator for Multiple Processors (RAMP) project, which is developing and distributing low-cost, highly scalable, parallel computers based on FPGAs and open-source hardware and software.

Computer Architecture
A Quantitative Approach
Fourth Edition

John L. Hennessy
Stanford University

David A. Patterson
University of California at Berkeley

With Contributions by
Andrea C. Arpaci-Dusseau, University of Wisconsin-Madison
Diana Franklin, California Polytechnic State University, San Luis Obispo
Remzi H. Arpaci-Dusseau, University of Wisconsin-Madison
David Goldberg, Xerox Palo Alto Research Center
Krste Asanovic, Massachusetts Institute of Technology
Wen-mei W. Hwu, University of Illinois at Urbana-Champaign
Robert P. Colwell, R&E Colwell & Associates, Inc.
Norman P. Jouppi, HP Labs
Thomas M. Conte, North Carolina State University
Timothy M. Pinkston, University of Southern California
José Duato, Universitat Politècnica de València and Simula
John W. Sias, University of Illinois at Urbana-Champaign
David A. Wood, University of Wisconsin-Madison

Amsterdam Boston Heidelberg London New York Oxford Paris San Diego San Francisco Singapore Sydney Tokyo

Publisher: Denise E. M. Penrose
Project Manager: Dusty Friedman, The Book Company
In-house Senior Project Manager: Brandy Lilly
Developmental Editor: Nate McFadden
Editorial Assistant: Kimberlee Honjo
Cover Design: Elisabeth Beller and Ross Carron Design
Cover Image: Richard I'Anson's Collection: Lonely Planet Images
Composition: Nancy Logan
Text Design: Rebecca Evans & Associates
Technical Illustration: David Ruppe, Impact Publications
Copyeditor: Ken Della Penta
Proofreader: Jamie Thaman
Indexer: Nancy Ball
Printer: Maple-Vail Book Manufacturing Group

Morgan Kaufmann Publishers is an Imprint of Elsevier
500 Sansome Street, Suite 400, San Francisco, CA 94111

This book is printed on acid-free paper.
© 1990, 1996, 2003, 2007 by Elsevier, Inc.
All rights reserved.
Published 1990. Fourth edition 2007.

Designations used by companies to distinguish their products are often claimed as trademarks or registered trademarks. In all instances in which Morgan Kaufmann Publishers is aware of a claim, the product names appear in initial capital or all capital letters. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.

Permissions may be sought directly from Elsevier's Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail: [email protected]. You may also complete your request on-line via the Elsevier Science homepage (http://elsevier.com), by selecting "Customer Support" and then "Obtaining Permissions."

Library of Congress Cataloging-in-Publication Data
Hennessy, John L.
Computer architecture : a quantitative approach / John L. Hennessy, David A. Patterson ; with contributions by Andrea C. Arpaci-Dusseau . . . [et al.]. 4th ed.
p. cm.
Includes bibliographical references and index.
ISBN 13: 978-0-12-370490-0 (pbk. : alk. paper)
ISBN 10: 0-12-370490-1 (pbk. : alk. paper)
1. Computer architecture. I. Patterson, David A. II. Arpaci-Dusseau, Andrea C. III. Title.
QA76.9.A73P377 2006
004.22--dc22 2006024358

For all information on all Morgan Kaufmann publications, visit our website at www.mkp.com or www.books.elsevier.com

Printed in the United States of America
06 07 08 09 10 5 4 3 2 1

To Andrea, Linda, and our four sons

Foreword
by Fred Weber, President and CEO of MetaRAM, Inc.

I am honored and privileged to write the foreword for the fourth edition of this most important book in computer architecture. In the first edition, Gordon Bell, my first industry mentor, predicted the book's central position as the definitive text for computer architecture and design. He was right. I clearly remember the excitement generated by the introduction of this work.
Rereading it now, with significant extensions added in the three new editions, has been a pleasure all over again. No other work in computer architecture (frankly, no other work I have read in any field) so quickly and effortlessly takes the reader from ignorance to a breadth and depth of knowledge.

This book is dense in facts and figures, in rules of thumb and theories, in examples and descriptions. It is stuffed with acronyms, technologies, trends, formulas, illustrations, and tables. And this is thoroughly appropriate for a work on architecture. The architect's role is not that of a scientist or inventor who will deeply study a particular phenomenon and create new basic materials or techniques. Nor is the architect the craftsman who masters the handling of tools to craft the finest details. The architect's role is to combine a thorough understanding of the state of the art of what is possible, a thorough understanding of the historical and current styles of what is desirable, a sense of design to conceive a harmonious total system, and the confidence and energy to marshal this knowledge and available resources to go out and get something built. To accomplish this, the architect needs a tremendous density of information with an in-depth understanding of the fundamentals and a quantitative approach to ground his thinking. That is exactly what this book delivers.

As computer architecture has evolved from a world of mainframes, minicomputers, and microprocessors, to a world dominated by microprocessors, and now into a world where microprocessors themselves are encompassing all the complexity of mainframe computers, Hennessy and Patterson have updated their book appropriately. The first edition showcased the IBM 360, DEC VAX, and Intel 80x86, each the pinnacle of its class of computer, and helped introduce the world to RISC architecture. The later editions focused on the details of the 80x86 and RISC processors, which had come to dominate the landscape.
This latest edition expands the coverage of threading and multiprocessing, virtualization and memory hierarchy, and storage systems, giving the reader context appropriate to today's most important directions and setting the stage for the next decade of design. It highlights the AMD Opteron and SUN Niagara as the best examples of the x86 and SPARC (RISC) architectures brought into the new world of multiprocessing and system-on-a-chip architecture, thus grounding the art and science in real-world commercial examples.

The first chapter, in less than 60 pages, introduces the reader to the taxonomies of computer design and the basic concerns of computer architecture, gives an overview of the technology trends that drive the industry, and lays out a quantitative approach to using all this information in the art of computer design. The next two chapters focus on traditional CPU design and give a strong grounding in the possibilities and limits in this core area. The final three chapters build out an understanding of system issues with multiprocessing, memory hierarchy, and storage. Knowledge of these areas has always been of critical importance to the computer architect. In this era of system-on-a-chip designs, it is essential for every CPU architect. Finally, the appendices provide a great depth of understanding by working through specific examples in great detail.

In design it is important to look at both the forest and the trees and to move easily between these views. As you work through this book you will find plenty of both. The result of great architecture, whether in computer design, building design, or textbook design, is to take the customer's requirements and desires and return a design that causes that customer to say, "Wow, I didn't know that was possible." This book succeeds on that measure and will, I hope, give you as much pleasure and value as it has me.
Contents

Foreword ix
Preface xv
Acknowledgments xxiii

Chapter 1 Fundamentals of Computer Design
1.1 Introduction 2
1.2 Classes of Computers 4
1.3 Defining Computer Architecture 8
1.4 Trends in Technology 14
1.5 Trends in Power in Integrated Circuits 17
1.6 Trends in Cost 19
1.7 Dependability 25
1.8 Measuring, Reporting, and Summarizing Performance 28
1.9 Quantitative Principles of Computer Design 37
1.10 Putting It All Together: Performance and Price-Performance 44
1.11 Fallacies and Pitfalls 48
1.12 Concluding Remarks 52
1.13 Historical Perspectives and References 54
Case Studies with Exercises by Diana Franklin 55

Chapter 2 Instruction-Level Parallelism and Its Exploitation
2.1 Instruction-Level Parallelism: Concepts and Challenges 66
2.2 Basic Compiler Techniques for Exposing ILP 74
2.3 Reducing Branch Costs with Prediction 80
2.4 Overcoming Data Hazards with Dynamic Scheduling 89
2.5 Dynamic Scheduling: Examples and the Algorithm 97
2.6 Hardware-Based Speculation 104
2.7 Exploiting ILP Using Multiple Issue and Static Scheduling 114
2.8 Exploiting ILP Using Dynamic Scheduling, Multiple Issue, and Speculation 118
2.9 Advanced Techniques for Instruction Delivery and Speculation 121
2.10 Putting It All Together: The Intel Pentium 4 131
2.11 Fallacies and Pitfalls 138
2.12 Concluding Remarks 140
2.13 Historical Perspective and References 141
Case Studies with Exercises by Robert P. Colwell 142

Chapter 3 Limits on Instruction-Level Parallelism
3.1 Introduction 154
3.2 Studies of the Limitations of ILP 154
3.3 Limitations on ILP for Realizable Processors 165
3.4 Crosscutting Issues: Hardware versus Software Speculation 170
3.5 Multithreading: Using ILP Support to Exploit Thread-Level Parallelism 172
3.6 Putting It All Together: Performance and Efficiency in Advanced Multiple-Issue Processors 179
3.7 Fallacies and Pitfalls 183
3.8 Concluding Remarks 184
3.9 Historical Perspective and References 185
Case Study with Exercises by Wen-mei W. Hwu and John W. Sias 185

Chapter 4 Multiprocessors and Thread-Level Parallelism
4.1 Introduction 196
4.2 Symmetric Shared-Memory Architectures 205
4.3 Performance of Symmetric Shared-Memory Multiprocessors 218
4.4 Distributed Shared Memory and Directory-Based Coherence 230
4.5 Synchronization: The Basics 237
4.6 Models of Memory Consistency: An Introduction 243
4.7 Crosscutting Issues 246
4.8 Putting It All Together: The Sun T1 Multiprocessor 249
4.9 Fallacies and Pitfalls 257
4.10 Concluding Remarks 262
4.11 Historical Perspective and References 264
Case Studies with Exercises by David A. Wood 264

Chapter 5 Memory Hierarchy Design
5.1 Introduction 288
5.2 Eleven Advanced Optimizations of Cache Performance 293
5.3 Memory Technology and Optimizations 310
5.4 Protection: Virtual Memory and Virtual Machines 315
5.5 Crosscutting Issues: The Design of Memory Hierarchies 324
5.6 Putting It All Together: AMD Opteron Memory Hierarchy 326
5.7 Fallacies and Pitfalls 335
5.8 Concluding Remarks 341
5.9 Historical Perspective and References 342
Case Studies with Exercises by Norman P. Jouppi 342

Chapter 6 Storage Systems
6.1 Introduction 358
6.2 Advanced Topics in Disk Storage 358
6.3 Definition and Examples of Real Faults and Failures 366
6.4 I/O Performance, Reliability Measures, and Benchmarks 371
6.5 A Little Queuing Theory 379
6.6 Crosscutting Issues 390
6.7 Designing and Evaluating an I/O System: The Internet Archive Cluster 392
6.8 Putting It All Together: NetApp FAS6000 Filer 397
6.9 Fallacies and Pitfalls 399
6.10 Concluding Remarks 403
6.11 Historical Perspective and References 404
Case Studies with Exercises by Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau 404

Appendix A Pipelining: Basic and Intermediate Concepts
A.1 Introduction A-2
A.2 The Major Hurdle of Pipelining: Pipeline Hazards A-11
A.3 How Is Pipelining Implemented? A-26
A.4 What Makes Pipelining Hard to Implement? A-37
A.5 Extending the MIPS Pipeline to Handle Multicycle Operations A-47
A.6 Putting It All Together: The MIPS R4000 Pipeline A-56
A.7 Crosscutting Issues A-65
A.8 Fallacies and Pitfalls A-75
A.9 Concluding Remarks A-76
A.10 Historical Perspective and References A-77

Appendix B Instruction Set Principles and Examples
B.1 Introduction B-2
B.2 Classifying Instruction Set Architectures B-3
B.3 Memory Addressing B-7
B.4 Type and Size of Operands B-13
B.5 Operations in the Instruction Set B-14
B.6 Instructions for Control Flow B-16
B.7 Encoding an Instruction Set B-21
B.8 Crosscutting Issues: The Role of Compilers B-24
B.9 Putting It All Together: The MIPS Architecture B-32
B.10 Fallacies and Pitfalls B-39
B.11 Concluding Remarks B-45
B.12 Historical Perspective and References B-47

Appendix C Review of Memory Hierarchy
C.1 Introduction C-2
C.2 Cache Performance C-15
C.3 Six Basic Cache Optimizations C-22
C.4 Virtual Memory C-38
C.5 Protection and Examples of Virtual Memory C-47
C.6 Fallacies and Pitfalls C-56
C.7 Concluding Remarks C-57
C.8 Historical Perspective and References C-58

Companion CD Appendices
Appendix D Embedded Systems
Updated by Thomas M. Conte
Appendix E Interconnection Networks
Revised by Timothy M. Pinkston and José Duato
Appendix F Vector Processors
Revised by Krste Asanovic
Appendix G Hardware and Software for VLIW and EPIC
Appendix H Large-Scale Multiprocessors and Scientific Applications
Appendix I Computer Arithmetic
by David Goldberg
Appendix J Survey of Instruction Set Architectures
Appendix K Historical Perspectives and References

Online Appendix (textbooks.elsevier.com/0123704901)
Appendix L Solutions to Case Study Exercises

References R-1
Index I-1

Preface

Why We Wrote This Book

Through four editions of this book, our goal has been to describe the basic principles underlying what will be tomorrow's technological developments.
Our excitement about the opportunities in computer architecture has not abated, and we echo what we said about the field in the first edition: "It is not a dreary science of paper machines that will never work. No! It's a discipline of keen intellectual interest, requiring the balance of marketplace forces to cost-performance-power, leading to glorious failures and some notable successes."

Our primary objective in writing our first book was to change the way people learn and think about computer architecture. We feel this goal is still valid and important. The field is changing daily and must be studied with real examples and measurements on real computers, rather than simply as a collection of definitions and designs that will never need to be realized. We offer an enthusiastic welcome to anyone who came along with us in the past, as well as to those who are joining us now. Either way, we can promise the same quantitative approach to, and analysis of, real systems.

As with earlier versions, we have strived to produce a new edition that will continue to be as relevant for professional engineers and architects as it is for those involved in advanced computer architecture and design courses. As much as its predecessors, this edition aims to demystify computer architecture through an emphasis on cost-performance-power trade-offs and good engineering design. We believe that the field has continued to mature and move toward the rigorous quantitative foundation of long-established scientific and engineering disciplines.

This Edition

The fourth edition of Computer Architecture: A Quantitative Approach may be the most significant since the first edition. Shortly before we started this revision, Intel announced that it was joining IBM and Sun in relying on multiple processors or cores per chip for high-performance designs. As the first figure in the book documents, after 16 years of doubling performance every 18 months, single-processor performance improvement has dropped to modest annual improvements. This fork in the computer architecture road means that for the first time in history, no one is building a much faster sequential processor. If you want your program to run significantly faster, say, to justify the addition of new features, you're going to have to parallelize your program.

Hence, after three editions focused primarily on higher performance by exploiting instruction-level parallelism (ILP), an equal focus of this edition is thread-level parallelism (TLP) and data-level parallelism (DLP). While earlier editions had material on TLP and DLP in big multiprocessor servers, now TLP and DLP are relevant for single-chip multicores. This historic shift led us to change the order of the chapters: the chapter on multiple processors was the sixth chapter in the last edition, but is now the fourth chapter of this edition.

The changing technology has also motivated us to move some of the content from later chapters into the first chapter. Because technologists predict much higher hard and soft error rates as the industry moves to semiconductor processes with feature sizes 65 nm or smaller, we decided to move the basics of dependability from Chapter 7 in the third edition into Chapter 1. As power has become the dominant factor in determining how much you can place on a chip, we also beefed up the coverage of power in Chapter 1. Of course, the content and examples in all chapters were updated, as we discuss below.

In addition to technological sea changes that have shifted the contents of this edition, we have taken a new approach to the exercises in this edition. It is surprisingly difficult and time-consuming to create interesting, accurate, and unambiguous exercises that evenly test the material throughout a chapter. Alas, the Web has reduced the half-life of exercises to a few months.
Rather than working out an assignment, a student can search the Web to find answers not long after a book is published. Hence, a tremendous amount of hard work quickly becomes unusable, and instructors are denied the opportunity to test what students have learned.

To help mitigate this problem, in this edition we are trying two new ideas. First, we recruited experts from academia and industry on each topic to write the exercises. This means some of the best people in each field are helping us to create interesting ways to explore the key concepts in each chapter and test the reader's understanding of that material. Second, each group of exercises is organized around a set of case studies. Our hope is that the quantitative example in each case study will remain interesting over the years, robust and detailed enough to allow instructors the opportunity to easily create their own new exercises, should they choose to do so. Key, however, is that each year we will continue to release new exercise sets for each of the case studies. These new exercises will have critical changes in some parameters so that answers to old exercises will no longer apply.

Another significant change is that we followed the lead of the third edition of Computer Organization and Design (COD) by slimming the text to include the material that almost all readers will want to see and moving the appendices that some will see as optional or as reference material onto a companion CD. There were many reasons for this change:

1. Students complained about the size of the book, which had expanded from 594 pages in the chapters plus 160 pages of appendices in the first edition to 760 chapter pages plus 223 appendix pages in the second edition and then to 883 chapter pages plus 209 pages in the paper appendices and 245 pages in online appendices. At this rate, the fourth edition would have exceeded 1500 pages (both on paper and online)!
2. Similarly, instructors were concerned about having too much material to cover in a single course.
3. As was the case for COD, by including a CD with material moved out of the text, readers could have quick access to all the material, regardless of their ability to access Elsevier's Web site. Hence, the current edition's appendices will always be available to the reader even after future editions appear.
4. This flexibility allowed us to move review material on pipelining, instruction sets, and memory hierarchy from the chapters and into Appendices A, B, and C. The advantage to instructors and readers is that they can go over the review material much more quickly and then spend more time on the advanced topics in Chapters 2, 3, and 5. It also allowed us to move the discussion of some topics that are important but are not core course topics into appendices on the CD. Result: the material is available, but the printed book is shorter. In this edition we have 6 chapters, none of which is longer than 80 pages, while in the last edition we had 8 chapters, with the longest chapter weighing in at 127 pages.
5. This package of a slimmer core print text plus a CD is far less expensive to manufacture than the previous editions, allowing our publisher to significantly lower the list price of the book. With this pricing scheme, there is no need for a separate international student edition for European readers.

Yet another major change from the last edition is that we have moved the embedded material introduced in the third edition into its own appendix, Appendix D. We felt that the embedded material didn't always fit with the quantitative evaluation of the rest of the material, plus it extended the length of many chapters that were already running long. We believe there are also pedagogic advantages in having all the embedded information in a single appendix.
This edition continues the tradition of using real-world examples to demonstrate the ideas, and the "Putting It All Together" sections are brand new; in fact, some were announced after our book was sent to the printer. The "Putting It All Together" sections of this edition include the pipeline organizations and memory hierarchies of the Intel Pentium 4 and AMD Opteron; the Sun T1 ("Niagara") 8-processor, 32-thread microprocessor; the latest NetApp Filer; the Internet Archive cluster; and the IBM Blue Gene/L massively parallel processor.

Topic Selection and Organization

As before, we have taken a conservative approach to topic selection, for there are many more interesting ideas in the field than can reasonably be covered in a treatment of basic principles. We have steered away from a comprehensive survey of every architecture a reader might encounter. Instead, our presentation focuses on core concepts likely to be found in any new machine. The key criterion remains that of selecting ideas that have been examined and utilized successfully enough to permit their discussion in quantitative terms.

Our intent has always been to focus on material that is not available in equivalent form from other sources, so we continue to emphasize advanced content wherever possible. Indeed, there are several systems here whose descriptions cannot be found in the literature. (Readers interested strictly in a more basic introduction to computer architecture should read Computer Organization and Design: The Hardware/Software Interface, third edition.)

An Overview of the Content

Chapter 1 has been beefed up in this edition. It includes formulas for static power, dynamic power, integrated circuit costs, reliability, and availability. We go into more depth than prior editions on the use of the geometric mean and the geometric standard deviation to capture the variability of the mean. Our hope is that these topics can be used through the rest of the book.
In addition to the classic quantitative principles of computer design and performance measurement, the benchmark section has been upgraded to use the new SPEC2006 suite.

Our view is that the instruction set architecture is playing less of a role today than in 1990, so we moved this material to Appendix B. It still uses the MIPS64 architecture. For fans of ISAs, Appendix J covers 10 RISC architectures, the 80x86, the DEC VAX, and the IBM 360/370.

Chapters 2 and 3 cover the exploitation of instruction-level parallelism in high-performance processors, including superscalar execution, branch prediction, speculation, dynamic scheduling, and the relevant compiler technology. As mentioned earlier, Appendix A is a review of pipelining in case you need it. Chapter 3 surveys the limits of ILP. New to this edition is a quantitative evaluation of multithreading. Chapter 3 also includes a head-to-head comparison of the AMD Athlon, Intel Pentium 4, Intel Itanium 2, and IBM Power5, each of which has made separate bets on exploiting ILP and TLP. While the last edition contained a great deal on Itanium, we moved much of this material to Appendix G, indicating our view that this architecture has not lived up to the early claims.

Given the switch in the field from exploiting only ILP to an equal focus on thread- and data-level parallelism, we moved multiprocessor systems up to Chapter 4, which focuses on shared-memory architectures. The chapter begins with the performance of such an architecture. It then explores symmetric and distributed memory architectures, examining both organizational principles and performance. Topics in synchronization and memory consistency models are next. The example is the Sun T1 ("Niagara"), a radical design for a commercial product. It reverted to a single-instruction issue, 6-stage pipeline microarchitecture. It put 8 of these on a single chip, and each supports 4 threads.
Hence, software sees 32 threads on this single, low-power chip.

As mentioned earlier, Appendix C contains an introductory review of cache principles, which is available in case you need it. This shift allows Chapter 5 to start with 11 advanced optimizations of caches. The chapter includes a new section on virtual machines, which offers advantages in protection, software management, and hardware management. The example is the AMD Opteron, giving both its cache hierarchy and the virtual memory scheme for its recently expanded 64-bit addresses.

Chapter 6, Storage Systems, has an expanded discussion of reliability and availability, a tutorial on RAID with a description of RAID 6 schemes, and rarely found failure statistics of real systems. It continues to provide an introduction to queuing theory and I/O performance benchmarks. Rather than go through a series of steps to build a hypothetical cluster as in the last edition, we evaluate the cost, performance, and reliability of a real cluster: the Internet Archive. The Putting It All Together example is the NetApp FAS6000 filer, which is based on the AMD Opteron microprocessor.

This brings us to Appendices A through L. As mentioned earlier, Appendices A and C are tutorials on basic pipelining and caching concepts. Readers relatively new to pipelining should read Appendix A before Chapters 2 and 3, and those new to caching should read Appendix C before Chapter 5.

Appendix B covers principles of ISAs, including MIPS64, and Appendix J describes 64-bit versions of Alpha, MIPS, PowerPC, and SPARC and their multimedia extensions. It also includes some classic architectures (80x86, VAX, and IBM 360/370) and popular embedded instruction sets (ARM, Thumb, SuperH, MIPS16, and Mitsubishi M32R). Appendix G is related, in that it covers architectures and compilers for VLIW ISAs.

Appendix D, updated by Thomas M. Conte, consolidates the embedded material in one place.

Appendix E, on networks, has been extensively revised by Timothy M. Pinkston and José Duato. Appendix F, updated by Krste Asanovic, includes a description of vector processors. We think these two appendices are some of the best material we know of on each topic.

Appendix H describes parallel processing applications and coherence protocols for larger-scale, shared-memory multiprocessing. Appendix I, by David Goldberg, describes computer arithmetic.

Appendix K collects the Historical Perspective and References from each chapter of the third edition into a single appendix. It attempts to give proper credit for the ideas in each chapter and a sense of the history surrounding the inventions. We like to think of this as presenting the human drama of computer design. It also supplies references that the student of architecture may want to pursue. If you have time, we recommend reading some of the classic papers in the field that are mentioned in these sections. It is both enjoyable and educational to hear the ideas directly from the creators. Historical Perspective was one of the most popular sections of prior editions.

Appendix L (available at textbooks.elsevier.com/0123704901) contains solutions to the case study exercises in the book.

Navigating the Text

There is no single best order in which to approach these chapters and appendices, except that all readers should start with Chapter 1. If you don't want to read everything, here are some suggested sequences:

• ILP: Appendix A, Chapters 2 and 3, and Appendices F and G
• Memory Hierarchy: Appendix C and Chapters 5 and 6
• Thread- and Data-Level Parallelism: Chapter 4, Appendix H, and Appendix E
• ISA: Appendices B and J

Appendix D can be read at any time, but it might work best if read after the ISA and cache sequences. Appendix I can be read whenever arithmetic moves you.

Chapter Structure

The material we have selected has been stretched upon a consistent framework that is followed in each chapter. We start by explaining the ideas of a chapter.
These ideas are followed by a Crosscutting Issues section, a feature that shows how the ideas covered in one chapter interact with those given in other chapters. This is followed by a Putting It All Together section that ties these ideas together by showing how they are used in a real machine.

Next in the sequence is Fallacies and Pitfalls, which lets readers learn from the mistakes of others. We show examples of common misunderstandings and architectural traps that are difficult to avoid even when you know they are lying in wait for you. Fallacies and Pitfalls is one of the most popular sections of the book. Each chapter ends with a Concluding Remarks section.

Case Studies with Exercises

Each chapter ends with case studies and accompanying exercises. Authored by experts in industry and academia, the case studies explore key chapter concepts and verify understanding through increasingly challenging exercises. Instructors should find the case studies sufficiently detailed and robust to allow them to create their own additional exercises.

Brackets for each exercise (<chapter.section>) indicate the text sections of primary relevance to completing the exercise. We hope this helps readers to avoid exercises for which they haven't read the corresponding section, in addition to providing the source for review. Note that we provide solutions to the case study exercises in Appendix L. Exercises are rated, to give the reader a sense of the amount of time required to complete an exercise:

[10] Less than 5 minutes (to read and understand)
[15] 5–15 minutes for a full answer
[20] 15–20 minutes for a full answer
[25] 1 hour for a full written answer
[30] Short programming project: less than 1 full day of programming
[40] Significant programming project: 2 weeks of elapsed time
[Discussion] Topic for discussion with others

A second set of alternative case study exercises is available for instructors who register at textbooks.elsevier.com/0123704901.
This second set will be revised every summer, so that early every fall, instructors can download a new set of exercises and solutions to accompany the case studies in the book.

Supplemental Materials

The accompanying CD contains a variety of resources, including the following:

• Reference appendices, some guest authored by subject experts, covering a range of advanced topics
• Historical Perspectives material that explores the development of the key ideas presented in each of the chapters in the text
• Search engine for both the main text and the CD-only content

Additional resources are available at textbooks.elsevier.com/0123704901. The instructor site (accessible to adopters who register at textbooks.elsevier.com) includes:

• Alternative case study exercises with solutions (updated yearly)
• Instructor slides in PowerPoint
• Figures from the book in JPEG and PPT formats

The companion site (accessible to all readers) includes:

• Solutions to the case study exercises in the text
• Links to related material on the Web
• List of errata

New materials and links to other resources available on the Web will be added on a regular basis.

Helping Improve This Book

Finally, it is possible to make money while reading this book. (Talk about cost-performance!) If you read the Acknowledgments that follow, you will see that we went to great lengths to correct mistakes. Since a book goes through many printings, we have the opportunity to make even more corrections. If you uncover any remaining resilient bugs, please contact the publisher by electronic mail ([email protected]). The first reader to report an error with a fix that we incorporate in a future printing will be rewarded with a $1.00 bounty. Please check the errata sheet on the home page (textbooks.elsevier.com/0123704901) to see if the bug has already been reported.
We process the bugs and send the checks about once a year or so, so please be patient.

We welcome general comments on the text and invite you to send them to a separate email address at [email protected].

Concluding Remarks

Once again this book is a true co-authorship, with each of us writing half the chapters and an equal share of the appendices. We can't imagine how long it would have taken without someone else doing half the work, offering inspiration when the task seemed hopeless, providing the key insight to explain a difficult concept, supplying reviews over the weekend of chapters, and commiserating when the weight of our other obligations made it hard to pick up the pen. (These obligations have escalated exponentially with the number of editions, as one of us was President of Stanford and the other was President of the Association for Computing Machinery.) Thus, once again we share equally the blame for what you are about to read.

John Hennessy and David Patterson

Acknowledgments

Although this is only the fourth edition of this book, we have actually created nine different versions of the text: three versions of the first edition (alpha, beta, and final) and two versions of the second, third, and fourth editions (beta and final). Along the way, we have received help from hundreds of reviewers and users. Each of these people has helped make this book better. Thus, we have chosen to list all of the people who have made contributions to some version of this book.

Contributors to the Fourth Edition

Like prior editions, this is a community effort that involves scores of volunteers. Without their help, this edition would not be nearly as polished.

Reviewers

Krste Asanovic, Massachusetts Institute of Technology; Mark Brehob, University of Michigan; Sudhanva Gurumurthi, University of Virginia; Mark D. Hill, University of Wisconsin–Madison; Wen-mei Hwu, University of Illinois at Urbana–Champaign; David Kaeli, Northeastern University; Ramadass Nagarajan, University of Texas at Austin; Karthikeyan Sankaralingam, University of Texas at Austin; Mark Smotherman, Clemson University; Gurindar Sohi, University of Wisconsin–Madison; Shyamkumar Thoziyoor, University of Notre Dame, Indiana; Dan Upton, University of Virginia; Sotirios G. Ziavras, New Jersey Institute of Technology

Focus Group

Krste Asanovic, Massachusetts Institute of Technology; José Duato, Universitat Politècnica de València and Simula; Antonio González, Intel and Universitat Politècnica de Catalunya; Mark D. Hill, University of Wisconsin–Madison; Lev G. Kirischian, Ryerson University; Timothy M. Pinkston, University of Southern California

Appendices

Krste Asanovic, Massachusetts Institute of Technology (Appendix F); Thomas M. Conte, North Carolina State University (Appendix D); José Duato, Universitat Politècnica de València and Simula (Appendix E); David Goldberg, Xerox PARC (Appendix I); Timothy M. Pinkston, University of Southern California (Appendix E)

Case Studies with Exercises

Andrea C. Arpaci-Dusseau, University of Wisconsin–Madison (Chapter 6); Remzi H. Arpaci-Dusseau, University of Wisconsin–Madison (Chapter 6); Robert P. Colwell, R&E Colwell & Assoc., Inc. (Chapter 2); Diana Franklin, California Polytechnic State University, San Luis Obispo (Chapter 1); Wen-mei W. Hwu, University of Illinois at Urbana–Champaign (Chapter 3); Norman P. Jouppi, HP Labs (Chapter 5); John W. Sias, University of Illinois at Urbana–Champaign (Chapter 3); David A. Wood, University of Wisconsin–Madison (Chapter 4)

Additional Material

John Mashey (geometric means and standard deviations in Chapter 1); Chenming Hu, University of California, Berkeley (wafer costs and yield parameters in Chapter 1); Bill Brantley and Dan Mudgett, AMD (Opteron memory hierarchy evaluation in Chapter 5); Mendel Rosenblum, Stanford and VMware (virtual machines in Chapter 5); Aravind Menon, EPFL Switzerland (Xen measurements in Chapter 5); Bruce Baumgart and Brewster Kahle, Internet Archive (IA cluster in Chapter 6); David Ford, Steve Kleiman, and Steve Miller, Network Appliances (FAS6000 information in Chapter 6); Alexander Thomasian, Rutgers (queueing theory in Chapter 6)

Finally, a special thanks once again to Mark Smotherman of Clemson University, who gave a final technical reading of our manuscript. Mark found numerous bugs and ambiguities, and the book is much cleaner as a result.

This book could not have been published without a publisher, of course. We wish to thank all the Morgan Kaufmann/Elsevier staff for their efforts and support. For this fourth edition, we particularly want to thank Kimberlee Honjo, who coordinated surveys, focus groups, manuscript reviews, and appendices, and Nate McFadden, who coordinated the development and review of the case studies. Our warmest thanks to our editor, Denise Penrose, for her leadership in our continuing writing saga.

We must also thank our university staff, Margaret Rowland and Cecilia Pracher, for countless express mailings, as well as for holding down the fort at Stanford and Berkeley while we worked on the book.

Our final thanks go to our wives for their suffering through increasingly early mornings of reading, thinking, and writing.
Contributors to Previous Editions

Reviewers

George Adams, Purdue University; Sarita Adve, University of Illinois at Urbana–Champaign; Jim Archibald, Brigham Young University; Krste Asanovic, Massachusetts Institute of Technology; Jean-Loup Baer, University of Washington; Paul Barr, Northeastern University; Rajendra V. Boppana, University of Texas, San Antonio; Doug Burger, University of Texas, Austin; John Burger, SGI; Michael Butler; Thomas Casavant; Rohit Chandra; Peter Chen, University of Michigan; the classes at SUNY Stony Brook, Carnegie Mellon, Stanford, Clemson, and Wisconsin; Tim Coe, Vitesse Semiconductor; Bob Colwell, Intel; David Cummings; Bill Dally; David Douglas; Anthony Duben, Southeast Missouri State University; Susan Eggers, University of Washington; Joel Emer; Barry Fagin, Dartmouth; Joel Ferguson, University of California, Santa Cruz; Carl Feynman; David Filo; Josh Fisher, Hewlett-Packard Laboratories; Rob Fowler, DIKU; Mark Franklin, Washington University (St. Louis); Kourosh Gharachorloo; Nikolas Gloy, Harvard University; David Goldberg, Xerox Palo Alto Research Center; James Goodman, University of Wisconsin–Madison; David Harris, Harvey Mudd College; John Heinlein; Mark Heinrich, Stanford; Daniel Helman, University of California, Santa Cruz; Mark Hill, University of Wisconsin–Madison; Martin Hopkins, IBM; Jerry Huck, Hewlett-Packard Laboratories; Mary Jane Irwin, Pennsylvania State University; Truman Joe; Norm Jouppi; David Kaeli, Northeastern University; Roger Kieckhafer, University of Nebraska; Earl Killian; Allan Knies, Purdue University; Don Knuth; Jeff Kuskin, Stanford; James R. Larus, Microsoft Research; Corinna Lee, University of Toronto; Hank Levy; Kai Li, Princeton University; Lori Liebrock, University of Alaska, Fairbanks; Mikko Lipasti, University of Wisconsin–Madison; Gyula A. Mago, University of North Carolina, Chapel Hill; Bryan Martin; Norman Matloff; David Meyer; William Michalson, Worcester Polytechnic Institute; James Mooney; Trevor Mudge, University of Michigan; David Nagle, Carnegie Mellon University; Todd Narter; Victor Nelson; Vojin Oklobdzija, University of California, Berkeley; Kunle Olukotun, Stanford University; Bob Owens, Pennsylvania State University; Greg Papadapoulous, Sun; Joseph Pfeiffer; Keshav Pingali, Cornell University; Bruno Preiss, University of Waterloo; Steven Przybylski; Jim Quinlan; Andras Radics; Kishore Ramachandran, Georgia Institute of Technology; Joseph Rameh, University of Texas, Austin; Anthony Reeves, Cornell University; Richard Reid, Michigan State University; Steve Reinhardt, University of Michigan; David Rennels, University of California, Los Angeles; Arnold L. Rosenberg, University of Massachusetts, Amherst; Kaushik Roy, Purdue University; Emilio Salgueiro, Unisys; Peter Schnorf; Margo Seltzer; Behrooz Shirazi, Southern Methodist University; Daniel Siewiorek, Carnegie Mellon University; J. P. Singh, Princeton; Ashok Singhal; Jim Smith, University of Wisconsin–Madison; Mike Smith, Harvard University; Mark Smotherman, Clemson University; Guri Sohi, University of Wisconsin–Madison; Arun Somani, University of Washington; Gene Tagliarin, Clemson University; Evan Tick, University of Oregon; Akhilesh Tyagi, University of North Carolina, Chapel Hill; Mateo Valero, Universidad Politécnica de Cataluña, Barcelona; Anujan Varma, University of California, Santa Cruz; Thorsten von Eicken, Cornell University; Hank Walker, Texas A&M; Roy Want, Xerox Palo Alto Research Center; David Weaver, Sun; Shlomo Weiss, Tel Aviv University; David Wells; Mike Westall, Clemson University; Maurice Wilkes; Eric Williams; Thomas Willis, Purdue University; Malcolm Wing; Larry Wittie, SUNY Stony Brook; Ellen Witte Zegura, Georgia Institute of Technology

Appendices

The vector appendix was revised by Krste Asanovic of the Massachusetts Institute of Technology. The floating-point appendix was written originally by David Goldberg of Xerox PARC.

Exercises

George Adams, Purdue University; Todd M. Bezenek, University of Wisconsin–Madison (in remembrance of his grandmother Ethel Eshom); Susan Eggers; Anoop Gupta; David Hayes; Mark Hill; Allan Knies; Ethan L. Miller, University of California, Santa Cruz; Parthasarathy Ranganathan, Compaq Western Research Laboratory; Brandon Schwartz, University of Wisconsin–Madison; Michael Scott; Dan Siewiorek; Mike Smith; Mark Smotherman; Evan Tick; Thomas Willis.

Special Thanks

Duane Adams, Defense Advanced Research Projects Agency; Tom Adams; Sarita Adve, University of Illinois at Urbana–Champaign; Anant Agarwal; Dave Albonesi, University of Rochester; Mitch Alsup; Howard Alt; Dave Anderson; Peter Ashenden; David Bailey; Bill Bandy, Defense Advanced Research Projects Agency; L. Barroso, Compaq's Western Research Lab; Andy Bechtolsheim; C. Gordon Bell; Fred Berkowitz; John Best, IBM; Dileep Bhandarkar; Jeff Bier, BDTI; Mark Birman; David Black; David Boggs; Jim Brady; Forrest Brewer; Aaron Brown, University of California, Berkeley; E. Bugnion, Compaq's Western Research Lab; Alper Buyuktosunoglu, University of Rochester; Mark Callaghan; Jason F. Cantin; Paul Carrick; Chen-Chung Chang; Lei Chen, University of Rochester; Pete Chen; Nhan Chu; Doug Clark, Princeton University; Bob Cmelik; John Crawford; Zarka Cvetanovic; Mike Dahlin, University of Texas, Austin; Merrick Darley; the staff of the DEC Western Research Laboratory; John DeRosa; Lloyd Dickman; J. Ding; Susan Eggers, University of Washington; Wael El-Essawy, University of Rochester; Patty Enriquez, Mills; Milos Ercegovac; Robert Garner; K. Gharachorloo, Compaq's Western Research Lab; Garth Gibson; Ronald Greenberg; Ben Hao; John Henning, Compaq; Mark Hill, University of Wisconsin–Madison; Danny Hillis; David Hodges; Urs Hoelzle, Google; David Hough; Ed Hudson; Chris Hughes, University of Illinois at Urbana–Champaign; Mark Johnson; Lewis Jordan; Norm Jouppi; William Kahan; Randy Katz; Ed Kelly; Richard Kessler; Les Kohn; John Kowaleski, Compaq Computer Corp; Dan Lambright; Gary Lauterbach, Sun Microsystems; Corinna Lee; Ruby Lee; Don Lewine; Chao-Huang Lin; Paul Losleben, Defense Advanced Research Projects Agency; Yung-Hsiang Lu; Bob Lucas, Defense Advanced Research Projects Agency; Ken Lutz; Alan Mainwaring, Intel Berkeley Research Labs; Al Marston; Rich Martin, Rutgers; John Mashey; Luke McDowell; Sebastian Mirolo, Trimedia Corporation; Ravi Murthy; Biswadeep Nag; Lisa Noordergraaf, Sun Microsystems; Bob Parker, Defense Advanced Research Projects Agency; Vern Paxson, Center for Internet Research; Lawrence Prince; Steven Przybylski; Mark Pullen, Defense Advanced Research Projects Agency; Chris Rowen; Margaret Rowland; Greg Semeraro, University of Rochester; Bill Shannon; Behrooz Shirazi; Robert Shomler; Jim Slager; Mark Smotherman, Clemson University; the SMT research group at the University of Washington; Steve Squires, Defense Advanced Research Projects Agency; Ajay Sreekanth; Darren Staples; Charles Stapper; Jorge Stolfi; Peter Stoll; the students at Stanford and Berkeley who endured our first attempts at creating this book; Bob Supnik;
Steve Swanson; Paul Taysom; Shreekant Thakkar; Alexander Thomasian, New Jersey Institute of Technology; John Toole, Defense Advanced Research Projects Agency; Kees A. Vissers, Trimedia Corporation; Willa Walker; David Weaver; Ric Wheeler, EMC; Maurice Wilkes; Richard Zimmerman.

John Hennessy and David Patterson

1.1 Introduction
1.2 Classes of Computers
1.3 Defining Computer Architecture
1.4 Trends in Technology
1.5 Trends in Power in Integrated Circuits
1.6 Trends in Cost
1.7 Dependability
1.8 Measuring, Reporting, and Summarizing Performance
1.9 Quantitative Principles of Computer Design
1.10 Putting It All Together: Performance and Price-Performance
1.11 Fallacies and Pitfalls
1.12 Concluding Remarks
1.13 Historical Perspectives and References
Case Studies with Exercises by Diana Franklin

1 Fundamentals of Computer Design

And now for something completely different.
Monty Python's Flying Circus

1.1 Introduction

Computer technology has made incredible progress in the roughly 60 years since the first general-purpose electronic computer was created. Today, less than $500 will purchase a personal computer that has more performance, more main memory, and more disk storage than a computer bought in 1985 for 1 million dollars. This rapid improvement has come both from advances in the technology used to build computers and from innovation in computer design.

Although technological improvements have been fairly steady, progress arising from better computer architectures has been much less consistent. During the first 25 years of electronic computers, both forces made a major contribution, delivering performance improvement of about 25% per year. The late 1970s saw the emergence of the microprocessor.
The ability of the microprocessor to ride the improvements in integrated circuit technology led to a higher rate of improvement: roughly 35% growth per year in performance.

This growth rate, combined with the cost advantages of a mass-produced microprocessor, led to an increasing fraction of the computer business being based on microprocessors. In addition, two significant changes in the computer marketplace made it easier than ever before to be commercially successful with a new architecture. First, the virtual elimination of assembly language programming reduced the need for object-code compatibility. Second, the creation of standardized, vendor-independent operating systems, such as UNIX and its clone, Linux, lowered the cost and risk of bringing out a new architecture.

These changes made it possible to develop successfully a new set of architectures with simpler instructions, called RISC (Reduced Instruction Set Computer) architectures, in the early 1980s. The RISC-based machines focused the attention of designers on two critical performance techniques: the exploitation of instruction-level parallelism (initially through pipelining and later through multiple instruction issue) and the use of caches (initially in simple forms and later using more sophisticated organizations and optimizations).

The RISC-based computers raised the performance bar, forcing prior architectures to keep up or disappear. The Digital Equipment VAX could not, and so it was replaced by a RISC architecture.
Intel rose to the challenge, primarily by translating x86 (or IA-32) instructions into RISC-like instructions internally, allowing it to adopt many of the innovations first pioneered in the RISC designs. As transistor counts soared in the late 1990s, the hardware overhead of translating the more complex x86 architecture became negligible.

Figure 1.1 shows that the combination of architectural and organizational enhancements led to 16 years of sustained growth in performance at an annual rate of over 50%, a rate that is unprecedented in the computer industry.

The effect of this dramatic growth rate in the 20th century has been twofold. First, it has significantly enhanced the capability available to computer users. For many applications, the highest-performance microprocessors of today outperform the supercomputer of less than 10 years ago.

[Figure 1.1: Performance relative to the VAX-11/780 (log scale, 1 to 10,000) versus year, 1978–2006, for processors from the VAX-11/780 up to the 64-bit Intel Xeon, 3.6 GHz (6505); trend segments are labeled 25%/year, 52%/year, and 20%.]

Figure 1.1 Growth in processor performance since the mid-1980s. This chart plots performance relative to the VAX-11/780 as measured by the SPECint benchmarks (see Section 1.8). Prior to the mid-1980s, processor performance growth was largely technology driven and averaged about 25% per year. The increase in growth to about 52% since then is attributable to more advanced architectural and organizational ideas. By 2002, this growth led to a difference in performance of about a factor of seven. Performance for floating-point-oriented calculations has increased even faster. Since 2002, the limits of power, available instruction-level parallelism, and long memory latency have slowed uniprocessor performance to about 20% per year. Since SPEC has changed over the years, performance of newer machines is estimated by a scaling factor that relates the performance for two different versions of SPEC (e.g., SPEC92, SPEC95, and SPEC2000).

Second, this dramatic rate of improvement has led to the dominance of microprocessor-based computers across the entire range of computer design. PCs and workstations have emerged as major products in the computer industry. Minicomputers, which were traditionally made from off-the-shelf logic or from gate arrays, have been replaced by servers made using microprocessors. Mainframes have been almost entirely replaced with multiprocessors consisting of small numbers of off-the-shelf microprocessors. Even high-end supercomputers are being built with collections of microprocessors.

These innovations led to a renaissance in computer design, which emphasized both architectural innovation and efficient use of technology improvements. This rate of growth has compounded so that by 2002, high-performance microprocessors are about seven times faster than what would have been obtained by relying solely on technology, including improved circuit design.

However, Figure 1.1 also shows that this 16-year renaissance is over. Since 2002, processor performance improvement has dropped to about 20% per year due to the triple hurdles of maximum power dissipation of air-cooled chips, little instruction-level parallelism left to exploit efficiently, and almost unchanged memory latency. Indeed, in 2004 Intel canceled its high-performance uniprocessor projects and joined IBM and Sun in declaring that the road to higher performance would be via multiple processors per chip rather than via faster uniprocessors.
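The "about seven times" comparison can be checked with a short compounding calculation. The sketch below is our reading rather than a method given in the text: it assumes that "relying solely on technology" corresponds to the roughly 35% per year growth attributed earlier to microprocessors riding integrated circuit improvements, and compares that with the roughly 52% per year rate of Figure 1.1 over the 16 years from 1986 to 2002. The function name `compounded_growth` is illustrative.

```python
# Sketch: compare compounded annual performance growth rates.
# Assumption (not stated in the text): "technology alone" is modeled as
# ~35%/year growth, versus ~52%/year with architectural and organizational
# improvements, over the 16 years from 1986 to 2002.

def compounded_growth(annual_rate: float, years: int) -> float:
    """Return the total speedup after compounding annual_rate for the given years."""
    return (1.0 + annual_rate) ** years

YEARS = 16
with_architecture = compounded_growth(0.52, YEARS)
technology_only = compounded_growth(0.35, YEARS)
gap = with_architecture / technology_only

print(f"52%/year over {YEARS} years: {with_architecture:.0f}x")
print(f"35%/year over {YEARS} years: {technology_only:.0f}x")
print(f"ratio: {gap:.1f}x")
```

With these assumed rates the ratio comes out near 6.7, in line with the text's factor of about seven. The result is sensitive to the chosen baseline: comparing against the earlier 25% per year rate instead gives a gap of more than 20x.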
This move to multiple processors per chip signals a historic switch from relying solely on instruction-level parallelism (ILP), the primary focus of the first three editions of this book, to thread-level parallelism (TLP) and data-level parallelism (DLP), which are featured in this edition. Whereas the compiler and hardware conspire to exploit ILP implicitly without the programmer's attention, TLP and DLP are explicitly parallel, requiring the programmer to write parallel code to gain performance.

This text is about the architectural ideas and accompanying compiler improvements that made the incredible growth rate possible in the last century, the reasons for the dramatic change, and the challenges and initial promising approaches to architectural ideas and compilers for the 21st century. At the core is a quantitative approach to computer design and analysis that uses empirical observations of programs, experimentation, and simulation as its tools. It is this style and approach to computer design that is reflected in this text. This book was written not only to explain this design style, but also to stimulate you to contribute to this progress. We believe the approach will work for explicitly parallel computers of the future just as it worked for the implicitly parallel computers of the past.

1.2 Classes of Computers

In the 1960s, the dominant form of computing was on large mainframes: computers costing millions of dollars and stored in computer rooms with multiple operators overseeing their support. Typical applications included business data processing and large-scale scientific computing. The 1970s saw the birth of the minicomputer, a smaller-sized computer initially focused on applications in scientific laboratories, but rapidly branching out with the popularity of time-sharing, that is, multiple users sharing a computer interactively through independent terminals. That decade also saw the emergence of supercomputers, which were high-performance computers for scientific computing.
Although few in number, they were important historically because they pioneered innovations that later trickled down to less expensive computer classes. The 1980s saw the rise of the desktop computer based on microprocessors, in the form of both personal computers and workstations. The individually owned desktop computer replaced time-sharing and led to the rise of servers: computers that provided larger-scale services such as reliable, long-term file storage and access, larger memory, and more computing power. The 1990s saw the emergence of the Internet and the World Wide Web, the first successful handheld computing devices (personal digital assistants or PDAs), and the emergence of high-performance digital consumer electronics, from video games to set-top boxes. The extraordinary popularity of cell phones has been obvious since 2000, with rapid improvements in functions and sales that far exceed those of the PC. These more recent applications use embedded computers, where computers are lodged in other devices and their presence is not immediately obvious.

These changes have set the stage for a dramatic change in how we view computing, computing applications, and the computer markets in this new century. Not since the creation of the personal computer more than 20 years ago have we seen such dramatic changes in the way computers appear and in how they are used. These changes in computer use have led to three different computing markets, each characterized by different applications, requirements, and computing technologies. Figure 1.2 summarizes these mainstream classes of computing environments and their important characteristics.

Feature                                  | Desktop                                  | Server                                   | Embedded
Price of system                          | $500–$5000                               | $5000–$5,000,000                         | $10–$100,000 (including network routers at the high end)
Price of microprocessor module           | $50–$500 (per processor)                 | $200–$10,000 (per processor)             | $0.01–$100 (per processor)
Critical system design issues            | Price-performance, graphics performance  | Throughput, availability, scalability    | Price, power consumption, application-specific performance

Figure 1.2 A summary of the three mainstream computing classes and their system characteristics. Note the wide range in system price for servers and embedded systems. For servers, this range arises from the need for very large-scale multiprocessor systems for high-end transaction processing and Web server applications. The total number of embedded processors sold in 2005 is estimated to exceed 3 billion if you include 8-bit and 16-bit microprocessors. Perhaps 200 million desktop computers and 10 million servers were sold in 2005.

Desktop Computing

The first, and still the largest market in dollar terms, is desktop computing. Desktop computing spans from low-end systems that sell for under $500 to high-end, heavily configured workstations that may sell for $5000. Throughout this range in price and capability, the desktop market tends to be driven to optimize price-performance. This combination of performance (measured primarily in terms of compute performance and graphics performance) and price of a system is what matters most to customers in this market, and hence to computer designers. As a result, the newest, highest-performance microprocessors and cost-reduced microprocessors often appear first in desktop systems (see Section 1.6 for a discussion of the issues affecting the cost of computers).

Desktop computing also tends to be reasonably well characterized in terms of applications and benchmarking, though the increasing use of Web-centric, interactive applications poses new challenges in performance evaluation.

Servers

As the shift to desktop computing occurred, the role of servers grew to provide larger-scale and more reliable file and computing services.
The World Wide Web accelerated this trend because of the tremendous growth in the demand and sophistication of Web-based services. Such servers have become the backbone of large-scale enterprise computing, replacing the traditional mainframe.

For servers, different characteristics are important. First, dependability is critical. (We discuss dependability in Section 1.7.) Consider the servers running Google, taking orders for Cisco, or running auctions on eBay. Failure of such server systems is far more catastrophic than failure of a single desktop, since these servers must operate seven days a week, 24 hours a day. Figure 1.3 estimates revenue costs of downtime as of 2000. To bring costs up to date, Amazon.com had $2.98 billion in sales in the fall quarter of 2005. As there were about 2200 hours in that quarter, the average revenue per hour was $1.35 million. During a peak hour for Christmas shopping, the potential loss would be many times higher.

Hence, the estimated costs of an unavailable system are high, yet Figure 1.3 and the Amazon numbers are purely lost revenue and do not account for lost employee productivity or the cost of unhappy customers.

A second key feature of server systems is scalability. Server systems often grow in response to an increasing demand for the services they support or an increase in functional requirements. Thus, the ability to scale up the computing capacity, the memory, the storage, and the I/O bandwidth of a server is crucial.

Lastly, servers are designed for efficient throughput.
That is, the overall performance of the server (in terms of transactions per minute or Web pages served per second) is what is crucial. Responsiveness to an individual request remains important, but overall efficiency and cost-effectiveness, as determined by how many requests can be handled in a unit time, are the key metrics for most servers. We return to the issue of assessing performance for different types of computing environments in Section 1.8.

| Application | Cost of downtime per hour (thousands of $) | Annual losses (millions of $), 1% downtime (87.6 hrs/yr) | Annual losses (millions of $), 0.5% downtime (43.8 hrs/yr) | Annual losses (millions of $), 0.1% downtime (8.8 hrs/yr) |
|---|---|---|---|---|
| Brokerage operations | $6450 | $565 | $283 | $56.5 |
| Credit card authorization | $2600 | $228 | $114 | $22.8 |
| Package shipping services | $150 | $13 | $6.6 | $1.3 |
| Home shopping channel | $113 | $9.9 | $4.9 | $1.0 |
| Catalog sales center | $90 | $7.9 | $3.9 | $0.8 |
| Airline reservation center | $89 | $7.9 | $3.9 | $0.8 |
| Cellular service activation | $41 | $3.6 | $1.8 | $0.4 |
| Online network fees | $25 | $2.2 | $1.1 | $0.2 |
| ATM service fees | $14 | $1.2 | $0.6 | $0.1 |

Figure 1.3 The cost of an unavailable system is shown by analyzing the cost of downtime (in terms of immediately lost revenue), assuming three different levels of availability and that downtime is distributed uniformly. These data are from Kembel [2000] and were collected and analyzed by Contingency Planning Research.

A related category is supercomputers. They are the most expensive computers, costing tens of millions of dollars, and they emphasize floating-point performance. Clusters of desktop computers, which are discussed in Appendix H, have largely overtaken this class of computer.
As clusters grow in popularity, the number of conventional supercomputers is shrinking, as is the number of companies who make them.

Embedded Computers

Embedded computers are the fastest-growing portion of the computer market. These devices range from everyday machines (most microwaves, most washing machines, most printers, most networking switches, and all cars contain simple embedded microprocessors) to handheld digital devices, such as cell phones and smart cards, to video games and digital set-top boxes.

Embedded computers have the widest spread of processing power and cost. They include 8-bit and 16-bit processors that may cost less than a dime, 32-bit microprocessors that execute 100 million instructions per second and cost under $5, and high-end processors for the newest video games or network switches that cost $100 and can execute a billion instructions per second. Although the range of computing power in the embedded computing market is very large, price is a key factor in the design of computers for this space. Performance requirements do exist, of course, but the primary goal is often meeting the performance need at a minimum price, rather than achieving higher performance at a higher price.

Often, the performance requirement in an embedded application is real-time execution. A real-time performance requirement exists when a segment of the application has an absolute maximum execution time. For example, in a digital set-top box, the time to process each video frame is limited, since the processor must accept and process the next frame shortly. In some applications, a more nuanced requirement exists: the average time for a particular task is constrained, as is the number of instances when some maximum time is exceeded.
Such approaches, sometimes called soft real-time, arise when it is possible to occasionally miss the time constraint on an event, as long as not too many are missed. Real-time performance tends to be highly application dependent.

Two other key characteristics exist in many embedded applications: the need to minimize memory and the need to minimize power. In many embedded applications, the memory can be a substantial portion of the system cost, and it is important to optimize memory size in such cases. Sometimes the application is expected to fit totally in the memory on the processor chip; other times the application needs to fit totally in a small off-chip memory. In any event, the importance of memory size translates to an emphasis on code size, since data size is dictated by the application.

Larger memories also mean more power, and optimizing power is often critical in embedded applications. Although the emphasis on low power is frequently driven by the use of batteries, the need to use less expensive packaging (plastic versus ceramic) and the absence of a fan for cooling also limit total power consumption. We examine the issue of power in more detail in Section 1.5.

Most of this book applies to the design, use, and performance of embedded processors, whether they are off-the-shelf microprocessors or microprocessor cores, which will be assembled with other special-purpose hardware.

Indeed, the third edition of this book included examples from embedded computing to illustrate the ideas in every chapter. Alas, most readers found these examples unsatisfactory, as the data that drives the quantitative design and evaluation of desktop and server computers has not yet been extended well to embedded computing (see the challenges with EEMBC, for example, in Section 1.8). Hence, we are left for now with qualitative descriptions, which do not fit well with the rest of the book.
As a result, in this edition we consolidated the embedded material into a single appendix. We believe this new appendix (Appendix D) improves the flow of ideas in the text while still allowing readers to see how the differing requirements affect embedded computing.

1.3 Defining Computer Architecture

The task the computer designer faces is a complex one: Determine what attributes are important for a new computer, then design a computer to maximize performance while staying within cost, power, and availability constraints. This task has many aspects, including instruction set design, functional organization, logic design, and implementation. The implementation may encompass integrated circuit design, packaging, power, and cooling. Optimizing the design requires familiarity with a very wide range of technologies, from compilers and operating systems to logic design and packaging.

In the past, the term computer architecture often referred only to instruction set design. Other aspects of computer design were called implementation, often insinuating that implementation is uninteresting or less challenging.

We believe this view is incorrect. The architect's or designer's job is much more than instruction set design, and the technical hurdles in the other aspects of the project are likely more challenging than those encountered in instruction set design. We'll quickly review instruction set architecture before describing the larger challenges for the computer architect.

Instruction Set Architecture

We use the term instruction set architecture (ISA) to refer to the actual programmer-visible instruction set in this book. The ISA serves as the boundary between the software and hardware. This quick review of ISA will use examples from MIPS and 80x86 to illustrate the seven dimensions of an ISA. Appendices B and J give more details on MIPS and the 80x86 ISAs.

| Name | Number | Use | Preserved across a call? |
|---|---|---|---|
| $zero | 0 | The constant value 0 | N.A. |
| $at | 1 | Assembler temporary | No |
| $v0–$v1 | 2–3 | Values for function results and expression evaluation | No |
| $a0–$a3 | 4–7 | Arguments | No |
| $t0–$t7 | 8–15 | Temporaries | No |
| $s0–$s7 | 16–23 | Saved temporaries | Yes |
| $t8–$t9 | 24–25 | Temporaries | No |
| $k0–$k1 | 26–27 | Reserved for OS kernel | No |
| $gp | 28 | Global pointer | Yes |
| $sp | 29 | Stack pointer | Yes |
| $fp | 30 | Frame pointer | Yes |
| $ra | 31 | Return address | Yes |

Figure 1.4 MIPS registers and usage conventions. In addition to the 32 general-purpose registers (R0–R31), MIPS has 32 floating-point registers (F0–F31) that can hold either a 32-bit single-precision number or a 64-bit double-precision number.

1. Class of ISA: Nearly all ISAs today are classified as general-purpose register architectures, where the operands are either registers or memory locations. The 80x86 has 16 general-purpose registers and 16 that can hold floating-point data, while MIPS has 32 general-purpose and 32 floating-point registers (see Figure 1.4). The two popular versions of this class are register-memory ISAs such as the 80x86, which can access memory as part of many instructions, and load-store ISAs such as MIPS, which can access memory only with load or store instructions. All recent ISAs are load-store.
2. Memory addressing: Virtually all desktop and server computers, including the 80x86 and MIPS, use byte addressing to access memory operands. Some architectures, like MIPS, require that objects be aligned. An access to an object of size s bytes at byte address A is aligned if A mod s = 0. (See Figure B.5 on page B-9.) The 80x86 does not require alignment, but accesses are generally faster if operands are aligned.
3. Addressing modes: In addition to specifying registers and constant operands, addressing modes specify the address of a memory object. MIPS addressing modes are Register, Immediate (for constants), and Displacement, where a constant offset is added to a register to form the memory address. The 80x86 supports those three plus three variations of displacement: no register (absolute), two registers (based indexed with displacement), and two registers where one register is multiplied by the size of the operand in bytes (based with scaled index and displacement). It also has three more modes like the last three, but without the displacement field: register indirect, indexed, and based with scaled index.
4. Types and sizes of operands: Like most ISAs, MIPS and 80x86 support operand sizes of 8-bit (ASCII character), 16-bit (Unicode character or half word), 32-bit (integer or word), 64-bit (double word or long integer), and IEEE 754 floating point in 32-bit (single precision) and 64-bit (double precision). The 80x86 also supports 80-bit floating point (extended double precision).
5. Operations: The general categories of operations are data transfer, arithmetic/logical, control (discussed next), and floating point. MIPS is a simple and easy-to-pipeline instruction set architecture, and it is representative of the RISC architectures being used in 2006. Figure 1.5 summarizes the MIPS ISA. The 80x86 has a much richer and larger set of operations (see Appendix J).
6. Control flow instructions: Virtually all ISAs, including 80x86 and MIPS, support conditional branches, unconditional jumps, procedure calls, and returns. Both use PC-relative addressing, where the branch address is specified by an address field that is added to the PC. There are some small differences. MIPS conditional branches (BEQ, BNE, etc.) test the contents of registers, while the 80x86 branches (JE, JNE, etc.) test condition code bits set as side effects of arithmetic/logic operations. The MIPS procedure call (JAL) places the return address in a register, while the 80x86 call (CALLF) places the return address on a stack in memory.
7. Encoding an ISA: There are two basic choices on encoding: fixed length and variable length.
All MIPS instructions are 32 bits long, which simplifies instruction decoding. Figure 1.6 shows the MIPS instruction formats. The 80x86 encoding is variable length, ranging from 1 to 18 bytes. Variable-length instructions can take less space than fixed-length instructions, so a program compiled for the 80x86 is usually smaller than the same program compiled for MIPS. Note that the choices mentioned above will affect how the instructions are encoded into a binary representation. For example, the number of registers and the number of addressing modes both have a significant impact on the size of instructions, as the register field and addressing mode field can appear many times in a single instruction.

The other challenges facing the computer architect beyond ISA design are particularly acute at the present, when the differences among instruction sets are small and when there are distinct application areas. Therefore, starting with this edition, the bulk of instruction set material beyond this quick review is found in the appendices (see Appendices B and J).

We use a subset of MIPS64 as the example ISA in this book.
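The alignment rule in item 2 and the fixed 32-bit encoding in item 7 are simple enough to check mechanically. The following is a minimal Python sketch under stated assumptions, not code from the book: is_aligned, bits, and decode_r_format are hypothetical helper names, and the field boundaries (opcode 31–26, rs 25–21, rt 20–16, rd 15–11, shamt 10–6, funct 5–0) are taken from the R format in the Figure 1.6 diagrams, with bit 31 at the opcode end and bit 0 at the funct end.

```python
# Minimal sketch (hypothetical helpers, not from the text): checks the
# alignment rule "A mod s = 0" and splits a fixed-length 32-bit MIPS
# R-format instruction word into its fields.


def is_aligned(address: int, size: int) -> bool:
    """Return True if an access of `size` bytes at byte `address` is aligned."""
    if size <= 0:
        raise ValueError("operand size must be a positive number of bytes")
    return address % size == 0


def bits(word: int, hi: int, lo: int) -> int:
    """Extract bits hi..lo (inclusive) of `word`, where bit 0 is least significant."""
    width = hi - lo + 1
    return (word >> lo) & ((1 << width) - 1)


def decode_r_format(word: int) -> dict:
    """Split a 32-bit R-format word using the bit ranges assumed from Figure 1.6."""
    if not 0 <= word < 1 << 32:
        raise ValueError("MIPS instructions are exactly 32 bits long")
    return {
        "opcode": bits(word, 31, 26),
        "rs": bits(word, 25, 21),
        "rt": bits(word, 20, 16),
        "rd": bits(word, 15, 11),
        "shamt": bits(word, 10, 6),
        "funct": bits(word, 5, 0),
    }


if __name__ == "__main__":
    # 8-byte (double word) accesses, as used by LD/SD in MIPS64.
    for addr in (0, 8, 12):
        print(addr, is_aligned(addr, 8))
    print(decode_r_format(0x00221820))
```

Because every MIPS instruction is exactly 32 bits, each field sits at a fixed bit position and can be pulled out with one shift and one mask; a variable-length encoding such as the 80x86's requires finding instruction boundaries before any field can be located. For example, decode_r_format(0x00221820) yields opcode 0, rs 1, rt 2, rd 3, shamt 0, and funct 0x20.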
| Instruction type/opcode | Instruction meaning |
|---|---|
| Data transfers | Move data between registers and memory, or between the integer and FP or special registers; only memory address mode is 16-bit displacement + contents of a GPR |
| LB, LBU, SB | Load byte, load byte unsigned, store byte (to/from integer registers) |
| LH, LHU, SH | Load half word, load half word unsigned, store half word (to/from integer registers) |
| LW, LWU, SW | Load word, load word unsigned, store word (to/from integer registers) |
| LD, SD | Load double word, store double word (to/from integer registers) |
| L.S, L.D, S.S, S.D | Load SP float, load DP float, store SP float, store DP float |
| MFC0, MTC0 | Copy from/to GPR to/from a special register |
| MOV.S, MOV.D | Copy one SP or DP FP register to another FP register |
| MFC1, MTC1 | Copy 32 bits to/from FP registers from/to integer registers |
| Arithmetic/logical | Operations on integer or logical data in GPRs; signed arithmetic trap on overflow |
| DADD, DADDI, DADDU, DADDIU | Add, add immediate (all immediates are 16 bits); signed and unsigned |
| DSUB, DSUBU | Subtract; signed and unsigned |
| DMUL, DMULU, DDIV, DDIVU, MADD | Multiply and divide, signed and unsigned; multiply-add; all operations take and yield 64-bit values |
| AND, ANDI | And, and immediate |
| OR, ORI, XOR, XORI | Or, or immediate, exclusive or, exclusive or immediate |
| LUI | Load upper immediate; loads bits 32 to 47 of register with immediate, then sign-extends |
| DSLL, DSRL, DSRA, DSLLV, DSRLV, DSRAV | Shifts: both immediate (DS__) and variable form (DS__V); shifts are shift left logical, right logical, right arithmetic |
| SLT, SLTI, SLTU, SLTIU | Set less than, set less than immediate; signed and unsigned |
| Control | Conditional branches and jumps; PC-relative or through register |
| BEQZ, BNEZ | Branch GPRs equal/not equal to zero; 16-bit offset from PC + 4 |
| BEQ, BNE | Branch GPR equal/not equal; 16-bit offset from PC + 4 |
| BC1T, BC1F | Test comparison bit in the FP status register and branch; 16-bit offset from PC + 4 |
| MOVN, MOVZ | Copy GPR to another GPR if third GPR is not zero (MOVN) or zero (MOVZ) |
| J, JR | Jumps: 26-bit offset from PC + 4 (J) or target in register (JR) |
| JAL, JALR | Jump and link: save PC + 4 in R31, target is PC-relative (JAL) or a register (JALR) |
| TRAP | Transfer to operating system at a vectored address |
| ERET | Return to user code from an exception; restore user mode |
| Floating point | FP operations on DP and SP formats |
| ADD.D, ADD.S, ADD.PS | Add DP, SP numbers, and pairs of SP numbers |
| SUB.D, SUB.S, SUB.PS | Subtract DP, SP numbers, and pairs of SP numbers |
| MUL.D, MUL.S, MUL.PS | Multiply DP, SP floating point, and pairs of SP numbers |
| MADD.D, MADD.S, MADD.PS | Multiply-add DP, SP numbers, and pairs of SP numbers |
| DIV.D, DIV.S, DIV.PS | Divide DP, SP floating point, and pairs of SP numbers |
| CVT._._ | Convert instructions: CVT.x.y converts from type x to type y, where x and y are L (64-bit integer), W (32-bit integer), D (DP), or S (SP). Both operands are FPRs. |
| C.__.D, C.__.S | DP and SP compares: __ = LT, GT, LE, GE, EQ, NE; sets bit in FP status register |

Figure 1.5 Subset of the instructions in MIPS64. SP = single precision; DP = double precision. Appendix B gives much more detail on MIPS64. For data, the most significant bit number is 0; least is 63.

Basic instruction formats
R:  opcode (31–26) | rs (25–21) | rt (20–16) | rd (15–11) | shamt (10–6) | funct (5–0)
I:  opcode (31–26) | rs (25–21) | rt (20–16) | immediate (15–0)
J:  opcode (31–26) | address (25–0)

Floating-point instruction formats
FR: opcode (31–26) | fmt (25–21) | ft (20–16) | fs (15–11) | fd (10–6) | funct (5–0)
FI: opcode (31–26) | fmt (25–21) | ft (20–16) | immediate (15–0)

Figure 1.6 MIPS64 instruction set architecture formats. All instructions are 32 bits long. The R format is for integer register-to-register operations, such as DADDU, DSUBU, and so on. The I format is for data transfers, branches, and immediate instructions, such as LD, SD, BEQZ, and DADDIs.
The J format is for jumps, the FR format for floating-point operations, and the FI format for floating-point branches.

The Rest of Computer Architecture: Designing the Organization and Hardware to Meet Goals and Functional Requirements

The implementation of a computer has two components: organization and hardware. The term organization includes the high-level aspects of a computer's design, such as the memory system, the memory interconnect, and the design of the internal processor or CPU (central processing unit, where arithmetic, logic, branching, and data transfer are implemented). For example, two processors with the same instruction set architectures but very different organizations are the AMD Opteron 64 and the Intel Pentium 4. Both processors implement the x86 instruction set, but they have very different pipeline and cache organizations.

Hardware refers to the specifics of a computer, including the detailed logic design and the packaging technology of the computer. Often a line of computers contains computers with identical instruction set architectures and nearly identical organizations, but they differ in the detailed hardware implementation. For example, the Pentium 4 and the Mobile Pentium 4 are nearly identical, but offer different clock rates and different memory systems, making the Mobile Pentium 4 more effective for low-end computers.

In this book, the word architecture covers all three aspects of computer design: instruction set architecture, organization, and hardware.

Computer architects must design a computer to meet functional requirements as well as price, power, performance, and availability goals. Figure 1.7 summarizes requirements to consider in designing a new computer.

| Functional requirements | Typical features required or supported |
|---|---|
| Application area | Target of computer |
| General-purpose desktop | Balanced performance for a range of tasks, including interactive performance for graphics, video, and audio (Ch. 2, 3, 5, App. B) |
| Scientific desktops and servers | High-performance floating point and graphics (App. I) |
| Commercial servers | Support for databases and transaction processing; enhancements for reliability and availability; support for scalability (Ch. 4, App. B, E) |
| Embedded computing | Often requires special support for graphics or video (or other application-specific extension); power limitations and power control may be required (Ch. 2, 3, 5, App. B) |
| Level of software compatibility | Determines amount of existing software for computer |
| At programming language | Most flexible for designer; need new compiler (Ch. 4, App. B) |
| Object code or binary compatible | Instruction set architecture is completely defined: little flexibility, but no investment needed in software or porting programs |
| Operating system requirements | Necessary features to support chosen OS (Ch. 5, App. E) |
| Size of address space | Very important feature (Ch. 5); may limit applications |
| Memory management | Required for modern OS; may be paged or segmented (Ch. 5) |
| Protection | Different OS and application needs: page vs. segment; virtual machines (Ch. 5) |
| Standards | Certain standards may be required by marketplace |
| Floating point | Format and arithmetic: IEEE 754 standard (App. I), special arithmetic for graphics or signal processing |
| I/O interfaces | For I/O devices: Serial ATA, Serial Attached SCSI, PCI Express (Ch. 6, App. E) |
| Operating systems | UNIX, Windows, Linux, CISCO IOS |
| Networks | Support required for different networks: Ethernet, Infiniband (App. E) |
| Programming languages | Languages (ANSI C, C++, Java, FORTRAN) affect instruction set (App. B) |

Figure 1.7 Summary of some of the most important functional requirements an architect faces. The left-hand column describes the class of requirement, while the right-hand column gives specific examples. The right-hand column also contains references to chapters and appendices that deal with the specific issues.

Often, architects also must determine what the functional requirements are, which can be a major task.
The requirements may be specific features inspired by the market. Application software often drives the choice of certain functional requirements by determining how the computer will be used. If a large body of software exists for a certain instruction set architecture, the architect may decide that a new computer should implement an existing instruction set. The presence of a large market for a particular class of applications might encourage the designers to incorporate requirements that would make the computer competitive in that market. Many of these requirements and features are examined in depth in later chapters.

Architects must also be aware of important trends in both the technology and the use of computers, as such trends not only affect future cost, but also the longevity of an architecture.

1.4 Trends in Technology

If an instruction set architecture is to be successful, it must be designed to survive rapid changes in computer technology. After all, a successful new instruction set architecture may last decades; for example, the core of the IBM mainframe has been in use for more than 40 years. An architect must plan for technology changes that can increase the lifetime of a successful computer.

To plan for the evolution of a computer, the designer must be aware of rapid changes in implementation technology. Four implementation technologies, which change at a dramatic pace, are critical to modern implementations:

• Integrated circuit logic technology: Transistor density increases by about 35% per year, quadrupling in somewhat over four years. Increases in die size are less predictable and slower, ranging from 10% to 20% per year. The combined effect is a growth rate in transistor count on a chip of about 40% to 55% per year.
  Device speed scales more slowly, as we discuss below.
• Semiconductor DRAM (dynamic random-access memory): Capacity increases by about 40% per year, doubling roughly every two years.
• Magnetic disk technology: Prior to 1990, density increased by about 30% per year, doubling in three years. It rose to 60% per year thereafter, and increased to 100% per year in 1996. Since 2004, it has dropped back to 30% per year. Despite this roller coaster of rates of improvement, disks are still 50–100 times cheaper per bit than DRAM. This technology is central to Chapter 6, and we discuss the trends in detail there.
• Network technology: Network performance depends both on the performance of switches and on the performance of the transmission system. We discuss the trends in networking in Appendix E.

These rapidly changing technologies shape the design of a computer that, with speed and technology enhancements, may have a lifetime of five or more years. Even within the span of a single product cycle for a computing system (two years of design and two to three years of production), key technologies such as DRAM change sufficiently that the designer must plan for these changes. Indeed, designers often design for the next technology, knowing that when a pr