
Grid Database Design

OTHER AUERBACH PUBLICATIONS

AUERBACH PUBLICATIONS
www.auerbach-publications.com
To Order Call: 1-800-272-7737; Fax: 1-800-374-3401; E-mail: [email protected]

Agent-Based Manufacturing and Control Systems: New Agile Manufacturing Solutions for Achieving Peak Performance, Massimo Paolucci and Roberto Sacile, ISBN: 1574443364
Curing the Patch Management Headache, Felicia M. Nicastro, ISBN: 0849328543
Cyber Crime Investigator's Field Guide, Second Edition, Bruce Middleton, ISBN: 0849327687
Disassembly Modeling for Assembly, Maintenance, Reuse and Recycling, A. J. D. Lambert and Surendra M. Gupta, ISBN: 1574443348
The Ethical Hack: A Framework for Business Value Penetration Testing, James S. Tiller, ISBN: 084931609X
Fundamentals of DSL Technology, Philip Golden, Herve Dedieu, and Krista Jacobsen, ISBN: 0849319137
The HIPAA Program Reference Handbook, Ross Leo, ISBN: 0849322111
Implementing the IT Balanced Scorecard: Aligning IT with Corporate Strategy, Jessica Keyes, ISBN: 0849326214
Information Security Fundamentals, Thomas R. Peltier, Justin Peltier, and John A. Blackley, ISBN: 0849319579
Information Security Management Handbook, Fifth Edition, Volume 2, Harold F. Tipton and Micki Krause, ISBN: 0849332109
Introduction to Management of Reverse Logistics and Closed Loop Supply Chain Processes, Donald F. Blumberg, ISBN: 1574443607
Maximizing ROI on Software Development, Vijay Sikka, ISBN: 0849323126
Mobile Computing Handbook, Imad Mahgoub and Mohammad Ilyas, ISBN: 0849319714
MPLS for Metropolitan Area Networks, Nam-Kee Tan, ISBN: 084932212X
Multimedia Security Handbook, Borko Furht and Darko Kirovski, ISBN: 0849327733
Network Design: Management and Technical Perspectives, Second Edition, Teresa C. Piliouras, ISBN: 0849316081
Network Security Technologies, Second Edition, Kwok T. Fung, ISBN: 0849330270
Outsourcing Software Development Offshore: Making It Work, Tandy Gold, ISBN: 0849319439
Quality Management Systems: A Handbook for Product Development Organizations, Vivek Nanda, ISBN: 1574443526
A Practical Guide to Security Assessments, Sudhanshu Kairab, ISBN: 0849317061
The Real-Time Enterprise, Dimitris N. Chorafas, ISBN: 0849327776
Software Testing and Continuous Quality Improvement, Second Edition, William E. Lewis, ISBN: 0849325242
Supply Chain Architecture: A Blueprint for Networking the Flow of Material, Information, and Cash, William T. Walker, ISBN: 1574443577
The Windows Serial Port Programming Handbook, Ying Bai, ISBN: 0849322138


    Boca Raton London New York Singapore

Grid Database Design

    April J. Wells

Published in 2005 by
Auerbach Publications
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2005 by Taylor & Francis Group, LLC
Auerbach is an imprint of Taylor & Francis Group

No claim to original U.S. Government works
Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1

International Standard Book Number-10: 0-8493-2800-4 (Hardcover)
International Standard Book Number-13: 978-0-8493-2800-8 (Hardcover)
Library of Congress Card Number 2005040962

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

    Library of Congress Cataloging-in-Publication Data

Wells, April J.
Grid database design / April J. Wells.
p. cm.
Includes bibliographical references and index.
ISBN 0-8493-2800-4 (alk. paper)
1. Computational grids (Computer systems) 2. Database design. I. Title.

    QA76.9C58W45 2005

    004'.36--dc22 2005040962

    Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com

    and the Auerbach Publications Web site at http://www.auerbach-publications.com

    Taylor & Francis Group is the Academic Division of T&F Informa plc.


    Preface

Computing has come a long way since our earliest beginnings. Many of us have seen complete revisions of computing technology in our lifetimes. I am not that old, and I have seen punch cards and Cray supercomputers, numbered Basic on an Apple IIe, and highly structured C. Nearly all of us can remember when the World Wide Web began its popularity and when there were only a few pictures available in a nearly all textual medium. Look at where we are now. Streaming video, MP3s, games, and chat are a part of many thousands of lives, from the youngest children just learning to mouse and type, to senior citizens staying in touch and staying active and involved regardless of their locations. The Internet and the World Wide Web have become a part of many households' daily lives in one way or another. They are often taken for granted, and highly missed when they are unavailable. There are Internet cafés springing up in towns all over the United States, and even major cruise lines have them available for not only the passengers, but the crew as well.

We are now standing on the edge of yet another paradigm shift, Grid computing. Grid computing, it is suggested, may even be bigger than the Internet and World Wide Web, and for most of us, the adventure is just beginning. For many of us, especially those of us who grew up with mainframes and stand-alone systems getting bigger and bigger, the new model is a big change. But it is also an exciting change: where will we be in the next five years?

    Goals of This Book

My main goal in writing this book is to provide you with information on the Grid, its beginning, background, and components, and to give you an idea of how databases will be designed to fit into this new computing model. Many of the ideas and concepts are not new, but will have to be addressed in the context of the new model, with many different considerations to be included.

Many people in academia and research already know about the Grid and the power that it can bring to computing, but many in business are just beginning to hear the rumblings and need to be made aware of ways in which the new concepts could potentially impact them and their ways of computing in the foreseeable future.

    Audience

The proposed audience is those who are looking at Grid computing as an option, or those who want to learn more about the emerging technology. When I started out, I wanted to let other database administrators in on what might be coming in the future, and what they could expect that future to look like. However, I believe that the audience is even bigger and should encompass not only database administrators, but systems administrators, programmers, and executives: anyone hearing the rumblings and wanting to know more.

The background in Section 1 is designed as just that, background. If you have a grasp on how we got to where we are now, you may want to read it for the entertainment value, the trip down memory lane, so to speak, or you may just want to skip large portions of it as irrelevant to where you are now.

Section 2 starts the meat of the book, introducing the Grid and its components and important concepts and ideas, and Section 3 delves into the part that databases will play in the new paradigm and how those databases need to act to play nicely together.

    Structure of the Book

This book is broken down into three sections and twelve chapters, as follows:

Section 1

In Section 1 we lay the groundwork. We cover some background on computing and how we got to where we are. We are, in many places and situations, already taking baby steps toward integration of the new paradigm into the existing framework.

Chapter 1

Chapter 1 will cover computing history, how we got here, the major milestones for computing, and the groundwork for the Grid, where we are launching the future today. It includes information on the beginnings of networking and the Internet, as it is the model on which many people are defining the interaction with the Grid.

    Chapter 2

Chapter 2 will provide definitions of where much of the Grid is now, the major players, and many of the components that make up the Grid system.

    Chapter 3

Chapter 3 is sort of the proof of the pudding. It provides a partial list of those commercial and academic ventures that have been the early adopters of Grid and have started to realize its potential. We have a long way to go before anyone can hope to realize anything as ubiquitous as commodity computing, but we have come a long way from our beginnings, too.

    Section 2

Section 2 goes into what is entailed in building a Grid. There are a variety of ideas and components that are involved in the definition, concepts that you need to have your arms around before stepping off of the precipice and flying into the future.

    Chapter 4

Chapter 4 looks at the security concerns and some of the means that can be used to address these concerns. As the Grid continues to emerge, so will the security concerns and the security measures developed to address those concerns.

    Chapter 5

Chapter 5 looks at the underlying hardware on which the Grid runs. With the definition of the Grid being that it can run on nearly anything, from PC to supercomputer, the hardware is hard to define, but there are emerging components being built today specifically with the goal of enabling the new technology.

Chapter 6

Metadata is important in any large system; the Grid is definitely the rule, rather than the exception. Chapter 6 will look at the role that metadata plays and will need to play in the Grid as it continues to evolve.

    Chapter 7

What are the business and technology drivers that are pushing the Grid today and will continue to push it into the future? Chapter 7 looks at not only the technological reasons for implementing a Grid environment (and let us face it, the best reason for many technologists is simply because it is really cool), but also the business drivers that will help to allow the new technology to make its inroads into the organization.

    Section 3

Section 3 delves into the details of databases in a Grid environment. Databases have evolved on their own over the last several decades, and continue to redefine themselves depending on the organization in which they find themselves. The Grid will add environmental impact to the evolution and will help to steer the direction that that evolution will take.

    Chapter 8

Chapter 8 will provide us with an introduction to databases, particularly relational databases, which are where some of the greatest gains can be made in the Grid environment. We will look at the terminology, the mathematical background, and some of the differences in different relational models.

    Chapter 9

Chapter 9 will look at parallelism in database design and how parallelized databases can be applied in the Grid environment.

    Chapter 10

Chapter 10 will take parallelism a step further and look at distributed databases and the ramifications of distributing in a highly distributed Grid environment.

Chapter 11

Finally, Chapter 11 will look at the interaction with the database from the applications and end users. We will look at design issues and issues with interacting with the different ideas of database design in the environment.

    Chapter 12

    Chapter 12 provides a summary of the previous chapters.

    We are standing on the edge of a new era. Let the adventure begin.


    Acknowledgments

My heartiest thanks go to everyone who contributed to my ability to bring this book to completion. Thanks especially to John Wyzalek from Auerbach Publications for his support and faith that I could do it. His support has been invaluable.

As always, my deepest gratitude goes to Larry, Adam, and Amandya for being there for me, standing beside me, and putting up with the long hours shut away and the weekends that we did not get to do a lot of fun things because I was writing. Thank you for being there, for understanding, and for rescuing me when I needed rescuing.


    Contents

    SECTION I: IN THE BEGINNING

1 History ........................................................................................... 3
    Computing 3
    Early Mechanical Devices 3
    Computing Machines 11
    The 1960s 17
    The 1970s 22
    The 1980s 26
    The 1990s 30
    The 21st Century 33

2 Definition and Components ..................................................... 35
    P2P 37
    Napster 38
    Gnutella 38
    Types 40
    Computational Grid 40
    Distributed Servers and Computation Sites 41
    Remote Instrumentation 41
    Data Archives 42
    Networks 43
    Portal (User Interface) 43
    Security 44
    Broker 45
    User Profile 45
    Searching for Resources 46
    Batch Job Submittal 46
    Credential Repository 48
    Scheduler 48
    Data Management 49
    Data Grid 50
    Storage Mechanism Neutrality 51
    Policy Neutrality 51
    Compatibility with Other Grid Infrastructure 51
    Storage Systems 51
    Access or Collaboration Grid 52
    Large-Format Displays 52
    Presentation Environments 53
    Interfaces to Grid Middleware 53
    Others 54
    Scavenging Grid 54
    Grid Scope 56
    Project Grid, Departmental Grid, or Cluster Grid 56
    Enterprise Grid or Campus Grid 58
    Global Grid 58

3 Early Adopters ............................................................................ 59
    Computational and Experimental Scientists 59
    Bioinformatics 60
    Corporations 60
    Academia 60
    University of Houston 61
    University of Ulm, Germany 61
    The White Rose University Consortium 62
    Science 62
    Particle Physics 62
    Industries 63
    Gaming 63
    Financial 65
    Wachovia 66
    RBC Insurance 66
    Charles Schwab 66
    Life Science 67
    The American Diabetes Association 67
    North Carolina Genomics and Bioinformatics Consortium 69
    Spain's Institute of Cancer Research 69
    Petroleum 69
    Royal Dutch Shell 69
    Utilities 70
    Kansai Electric Power Co., Inc. 70
    Manufacturing 70
    Ford Motor Company 70
    Saab Automobile 71
    Motorola 71
    Government 71
    NASA 72
    U.S. Department of Defense 72
    European Union 73
    Flemish Government 74
    Benefits 75
    Virtualization 75

SECTION II: THE PARTS AND PIECES

4 Security ........................................................................................ 83
    Security 83
    Authentication 84
    Reciprocity of Identification 85
    Computational Efficiency 85
    Communication Efficiency 86
    Third-Party Real-Time Involvement 86
    Nature of Security 86
    Secret Storage 87
    Passwords 87
    Private Key 88
    Block Ciphers 89
    Stream Ciphers 89
    Public Key 91
    Digital Signature 96
    Authorization 101
    Delegation of Identity 102
    Delegation of Authority 103
    Accounting 103
    Audit 103
    Access Control 104
    DAC 104
    MAC 105
    Allow and Deny 106
    Satisfy 107
    Role-Based Access 107
    Usage Control 108
    Cryptography 108
    Block Cipher 109
    Stream Ciphers 110
    Linear Feedback Shift Register 110
    One-Time Pad 111
    Shift Register Cascades 111
    Shrinking Generators 112
    Accountability 112
    Data Integrity 115
    Attenuation 116
    Impulse Noise 116
    Cross Talk 116
    Jitter 117
    Delay Distortion 117
    Capability Resource Management 118
    Database Security 121
    Inference 121
    Server Security 124
    Database Connections 125
    Table Access Control 125
    Restricting Database Access 130
    DBMS Specific 131

5 The Hardware ........................................................................... 133
    Computers 133
    Blade Servers 138
    Storage 140
    I/O Subsystems 143
    Underlying Network 143
    Operating Systems 144
    Visualization Environments 144
    People 145

6 Metadata .................................................................................... 147
    Grid Metadata 152
    Data Metadata 153
    Physical Metadata 154
    Domain-Independent Metadata 154
    Content-Dependent Metadata 154
    Content-Independent Metadata 155
    Domain-Specific Metadata 155
    Ontology 155
    User Metadata 155
    Application Metadata 156
    External Metadata 156
    Logical Metadata 157
    User 157
    Data 158
    Resources 158
    Metadata Services 158
    Context 158
    Structure 158
    Define the Data Granularity 159
    Database 159
    Access 159
    Metadata Formatting 160
    XML 161
    What Is XML? 161
    Application 168
    MCAT 169
    Conclusion 170

7 Drivers ....................................................................................... 171
    Business 174
    Accelerated Time to Results 174
    Operational Flexibility 174
    Leverage Existing Capital Investments 175
    Better Resource Utilization 176
    Enhanced Productivity 176
    Better Collaboration 178
    Scalability 178
    ROI 179
    Reallocation of Resources 180
    TCO 181
    Technology 183
    Infrastructure Optimization 183
    Increase Access to Data and Collaboration 183
    Resilient, Highly Available Infrastructure 183
    Make Most Efficient Use of Resources 184
    Services Oriented 185
    Batch Oriented 186
    Object Oriented 186
    Supply and Demand 186
    Open Standards 187
    Corporate IT Spending Budgets 187
    Cost, Complexity, and Opportunity 188
    Better, Stronger, Faster 190
    Efficiency Initiatives 191

SECTION III: DATABASES IN THE GRID

8 Introducing Databases ............................................................. 195
    Databases 195
    Relational Database 196
    Tuples 197
    Attributes 198
    Entities 198
    Relationship 198
    Relational Algebra 198
    Union 198
    Intersection 198
    Difference 199
    Cartesian Product 199
    Select 199
    Project 200
    Join 200
    Relational Calculus 200
    Object Database 202
    Architecture Differences between Relational and Object Databases 203
    Object Relational Database 203
    SQL 205
    Select 206
    Where 206
    And/Or 206
    In 207
    Between 207
    Like 207
    Insert 207
    Update 208
    Delete 208
    Database 209
    Data Model 209
    Schema 209
    Relational Model 209
    Anomalies 209
    Insert Anomaly 210
    Deletion Anomaly 210
    Update Anomaly 210

9 Parallel Database ...................................................................... 213
    Data Independence 213
    Parallel Databases 214
    Start-Up 216
    Interference 216
    Skew 217
    Attribute Data Skew 217
    Tuple Placement Skew 217
    Selectivity Skew 217
    Redistribution Skew 217
    Join Product Skew 218
    Multiprocessor Architecture Alternatives 218
    Shared Everything 218
    Shared Disk 219
    Shared Nothing (Message Passing) 220
    Hybrid Architecture 221
    Hierarchical Cluster 221
    NUMA 222
    Disadvantages of Parallelism 222
    Database Parallelization Techniques 224
    Data Placement 224
    Parallel Data Processing 224
    Parallel Query Optimization 224
    Transaction Management 224
    Parallelism Versus Fragmentation 224
    Round-Robin 225
    Hash Partitioning 225
    Range Partitioning 225
    Horizontal Data Partitioning 226
    Replicated Data Partitioning 226
    Chained Partitioning 227
    Placement Directory 227
    Index Partitioning 228
    Partitioning Data 228
    Data-Based Parallelism 228
    Interoperation 228
    Intraoperation 229
    Pipeline Parallelism 229
    Partitioned Parallelism 230
    Parallel Data Flow Approach 231
    Retrieval 232
    Point Query 232
    Range Query 232
    Inverse Range Query 232
    Parallelizing Relational Operators 233
    Operator Replication 233
    Merge Operators 233
    Parallel Sorting 233
    Parallel Aggregation 234
    Parallel Joins 234
    Data Skew 237
    Load Balancing Algorithm 237
    Dynamic Load Balancing 238

10 Distributing Databases ............................................................. 241
    Advantages 245
    Disadvantages 245
    Rules for Distributed Databases 246
    Fragmentation 248
    Completeness 249
    Reconstruction 249
    Disjointedness 249
    Transparency 249
    Distribution Transparency 250
    Fragmentation Transparency 250
    Location Transparency 250
    Replication Transparency 250
    Local Mapping Transparency 251
    Naming Transparency 251
    Transaction Transparency 251
    Performance Transparency 252
    Vertical Fragmentation 252
    Horizontal Fragmentation 254
    Hybrid 255
    Replication 255
    Metadata 256
    Distributed Database Failures 257
    Failure of a Site 257
    Loss of Messages 257
    Failure of a Communication Link 257
    Network Partition 257
    Data Access 258

11 Data Synchronization .............................................................. 261
    Concurrency Control 262
    Distributed Deadlock 262
    Database Deadlocks 264
    Multiple-Copy Consistency 265
    Pessimistic Concurrency Control 266
    Two-Phase Commit Protocol 267
    Time Stamp Ordering 267
    Optimistic Concurrency Control 268
    Heterogeneous Concurrency Control 270
    Distributed Serializability 271
    Query Processing 271
    Query Transformations 271
    Transaction Processing 271
    Heterogeneity 272

12 Conclusion ................................................................................ 275

Index .................................................................................................. 277

SECTION I: IN THE BEGINNING

The adventure begins. We will start our adventure with the history of computing (not just computers). Computing in one fashion or another has been around as long as man. This section looks at those beginnings and takes a trip through time to the present. It follows computing as its servers and processors grew bigger and bigger, through the introduction of the Internet, and through the rise of the supercomputer.

We will then take those advances and look at the beginnings of distributed computing, first looking at peer-to-peer processing, then at the beginnings of the Grid as it is becoming defined. We look at the different kinds of Grids and how the different definitions can be combined to play together. Regardless of what you want to accomplish, there is a Grid that is likely to fill the need. There are even Grids that include the most overlooked resource that a company has, its intellectual capital.

Finally, we will look at others who have stood where many stand today, on the edge of deciding if they really want to make the step out of the known and into the future with the implementation of the Grid and its new concepts in computing.

This background section will bring you up to speed to where we find ourselves today. Many will skip or skim the material, others will enjoy the walk down memory lane, and others will find it very educational walking through these pages of the first section.

    Enjoy your adventure.


    Chapter 1

    History

In pioneer days they used oxen for heavy pulling, and when one ox couldn't budge a log, they didn't try to grow a larger ox. We shouldn't be trying for bigger computers, but for more systems of computers.

    Rear Admiral Grace Murray Hopper

    Computing

Computing has become synonymous with mechanical computing and the PC, mainframe, midrange, supercomputers, servers, and other modern views on what is computing, but computers and computing have a rich history.

    Early Mechanical Devices

The very first counting device was (and still is) the very first one we use when starting to deal with the concept of numbers and calculations, the human hand with its remarkable fingers (and occasionally, for those bigger numbers, the human foot and its toes). Even before the formal concept of numbers was conceived, there was the need to determine amounts and to keep track of time. Keeping track of numbers, before numbers were numbers, was something that people wanted to do. When the volume of things to be counted grew too large to be determined by the amount of personal fingers and toes (or by the additional available fingers and toes of people close by), whatever was readily at hand was used. Pebbles, sticks, and other natural objects were among the first things to extend the countability and calculability of things. This idea can be equally observed in young children today in counting beads, beans, and cereal.

People existing in early civilizations needed ways not only to count things, but also to allow merchants to calculate the amounts to be charged for goods that were traded and sold. This was still before the formal concept of numbers was a defined thing. Counting devices were used then to determine these everyday calculations.

One of the very first mechanical computational aids that man used in history was the counting board, or the early abacus. The abacus (Figure 1.1), a simple counting aid, was probably invented sometime in the fourth century B.C. The counting board, the precursor to what we think of today as the abacus, was simply a piece of wood or a simple piece of stone with carved, etched, or painted lines on the surface between which beads or pebbles would have been moved. The abacus was originally made of wood with a frame that held rods with freely sliding beads mounted on the rods. These would have simply been mechanical aids to counting, not counting devices themselves, and the person operating these aids still had to perform the calculations in his or her head. The device was simply a tool to assist in keeping track of where in the process of calculation the person was, by visually tracking carries and sums.

Arabic numerals (for example, the numbers we recognize today as 1, 2, 3, 4, 5) were first introduced to Europe around the eighth century A.D., although Roman numerals (I, II, III, IV, V) remained in heavy use in some parts of Europe until as late as the late 17th century A.D. and are often still used today in certain areas. Although math classes taught Roman numerals even as late as the 1970s, many of us probably learned to use our Roman numerals for the primary purpose of creating outlines for reports in school. With the extensive use of PCs in nearly all levels of education today, these outlining exercises may be becoming a lost art. The Arabic number system was likely the first number system to introduce the concepts of zero and the concept of fixed places for tens, hundreds, thousands, etc. Arabic numbers went a long way toward helping in simplifying mathematical calculations.

Figure 1.1 The abacus. (From http://www.etedeschi.ndirect.co.uk/sale/picts/abacus.jpg.)

In 1622, the slide rule, an engineering staple for centuries, was invented by William Oughtred in England, and joined the abacus as one of the mechanical devices used to assist people with arithmetic calculations.

Wilhelm Schickard, a professor at the University of Tübingen in Germany in 1632, could be credited with building one of the very first mechanical calculators. This initial foray into mechanically assisted calculation could work with six digits and could carry digits across columns. Although this initial calculator worked, and was the first device to calculate numbers for people, rather than simply being an aid to their calculating the numbers themselves, it never made it beyond the prototype stage.

Blaise Pascal, noted mathematician and scientist, in 1642 built yet another mechanical calculator, called the Pascaline. Seen using his machine in Figure 1.2, Pascal was one of the few to actually make use of his novel device. This mechanical adding machine, with the capacity for eight digits, made use of the user's hand turning the gear (later, people improving on the design added a crank to make turning easier) to carry out the calculations. In Pascal's system, a one-tooth gear (the ones place) engaged its tooth with the teeth of a ten-tooth gear once every time it revolved. The result was that the one-tooth gear had to revolve 10 times for every full revolution of the ten-tooth gear, and 100 times for every full revolution of the gear above that. This is the same basic principle as the original odometer (the mechanical mechanism used for counting the number of miles, or kilometers, that a car has traveled), in the years before odometers were computerized. This Pascaline calculator not only had trouble carrying, but it also had gears that tended to jam. Because Pascal was the only person who was able to make repairs to the machine, breakage was a time-consuming condition to rectify and was part of the reasons that the Pascaline would have cost more than the salaries of all of the people it replaced. But it was proof that it could be done.
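The carrying principle Pascal's gears implement is the same one a mechanical odometer uses. A minimal sketch of that ripple-carry idea follows; this is my own toy illustration in Python, not a model of the Pascaline's actual gearing, and the function and variable names are hypothetical.

    # Each "wheel" counts 0-9; completing a revolution resets it to 0 and
    # advances the next higher wheel by one, just like a mechanical odometer.
    def advance(wheels, amount=1):
        """wheels[0] is the ones place; carries ripple toward higher places."""
        for _ in range(amount):
            i = 0
            while i < len(wheels):
                wheels[i] += 1
                if wheels[i] < 10:      # no carry needed, stop rippling
                    break
                wheels[i] = 0           # wheel completes a full revolution...
                i += 1                  # ...and bumps the next wheel once

    wheels = [0, 0, 0]                  # a three-digit register
    advance(wheels, 99)
    print(wheels)                       # [9, 9, 0], read right to left as 099
    advance(wheels, 1)                  # one more unit ripples two carries
    print(wheels)                       # [0, 0, 1], read right to left as 100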

Gottfried Leibniz, in 1673, built a mechanical calculating machine that not only added and subtracted (the hard limits of the initial machines), but also multiplied and divided.

Although not a direct advancement in computing and calculating machines, the discovery, in 1780, of electricity by Benjamin Franklin has to be included in the important developments of computing history.


Although steam was effective in driving the early machines, and brute-force manpower was also an option, electricity would prove to be far more efficient than any of the alternatives.

In 1805 Joseph-Marie Jacquard invented an automatic loom that was controlled by punch cards. Although this was not a true computing advance, it proved to have implications in the programming of early computing machines.

Figure 1.2 Pascal and the Pascaline. (From http://www.thocp.net/hardware/pascaline.htm.)

The early 1820s saw the conception of a difference engine by Charles Babbage (Figure 1.3). Although this difference engine (Figure 1.4) was never actually built past the prototype stage (although the British government, after seeing the 1822 prototype, assisted in working toward its completion starting in 1823), it would have been a massive, steam-powered, mechanical calculator. It would have been a machine with a fixed instruction program used to print out astronomical tables. Babbage attempted to build his difference engine over the course of the next 20 years only to see the project cancelled in 1842 by the British government.

Figure 1.3 Charles Babbage. (From http://www.math.yorku.ca/SCS/Gallery/images/portraits/babbage.jpg.)

Figure 1.4 The difference engine. (From http://www.weller.to/his/img/babbage.jpg.)

In 1833, Babbage conceived his next idea, the analytical engine. The analytical engine would be a mechanical computer that could be used to solve any mathematical problem. A real parallel decimal computer, operating on words of 50 decimals, the analytical engine was capable of conditional control, built-in operations, and allowed for the instructions in the computer to be executed in a specific, rather than numerical, order. It was able to store 1000 of the 50-decimal words. Using punch cards, strikingly similar to those used in the Jacquard loom, it could perform simple conditional operations. Based on his realization in early 1810 that many longer computations consisted simply of smaller operations that were regularly repeated, Babbage designed the analytical engine to do these operations automatically.

Augusta Ada Byron, the countess of Lovelace (Figure 1.5), for whom the Ada programming language would be named, met Babbage in 1833 and described in detail his analytic engine as a machine that weaves algebraic patterns in the same way that the Jacquard loom weaved intricate patterns of leaves and flowers. Her published analysis provides our best record of the programming of the analytical engine and outlines the fundamentals of computer programming, data analysis, looping structures, and memory addressing.

Figure 1.5 Augusta Ada Byron, the countess of Lovelace. (From http://www.uni-bielefeld.de:8081/paedagogik/Seminare/moeller02/3frauen/Bilder/Ada%20Lovelace.jpg.)

While Thomas of Colmar was developing the first successful commercial calculator, George Boole, in 1854, published The Mathematical Analysis of Logic. This work used the binary system that has since become known as Boolean algebra.

Another advancement in technology that is not directly related to computers and computing, but that had a tremendous impact on the sharing of information, is the invention of the telephone in 1876 by Alexander Graham Bell. Without it, the future invention of the modem would have been impossible, and the early Internet (ARPANet) would have been highly unlikely.

A giant step toward automated computation was introduced by Herman Hollerith in 1890 while working for the U.S. Census Bureau. He applied for a patent for his machine in 1884 and had it granted in 1889. The Hollerith device could read census information that was punched onto punch cards. Ironically, Hollerith did not get the idea to use punch cards from the work of Babbage, but from watching a train conductor punch tickets. As a result of Hollerith's invention, reading errors in the census were greatly reduced, workflow and throughput were increased, and the available memory of a computer would be virtually limitless, bounded only by the size of the stack of cards. More importantly, different problems, and different kinds of problems, could be stored on different batches of cards and these different batches (the very first use of batch processing?) worked on as needed. The Hollerith tabulator ended up becoming so successful that he ultimately started his own firm, a business designed to market his device. Hollerith's company (the Tabulating Machine Company), founded in 1896, eventually became (in 1924) known as International Business Machines (IBM).

Hollerith's original tabulating machine, though, did have its limitations. Its use was strictly limited to tabulation, although tabulation of nearly any sort. The punched cards that he utilized could not be used to direct more complex computations than these simple tabulations.

Nikola Tesla, a Yugoslavian working for Thomas Edison, in 1903 patented electrical logic circuits called gates or switches.

American physicist Lee De Forest invented in 1906 the vacuum tube, the invention that was to be used for decades in almost all computers and calculating machines, including ENIAC (Figure 1.6), the Harvard Mark I, and Colossus, which we will look at shortly. The vacuum tube worked, basically, by using large amounts of electricity to heat a filament inside the vacuum tube until the filament glowed cherry red, resulting in the release of electrons into the tube. The electrons released in this manner could then be controlled by other elements within the tube. De Forest's original device was called a triode, and the flow control of electrons was to or through a positively charged plate inside the tube. A zero would, in these triodes, be represented by the absence of an electron current to the plate. The presence of a small but detectable current to the plate represented a 1. These vacuum tubes were inefficient, requiring a great deal of space not only for the tubes themselves, but also for the cooling mechanism for them and the room in which they were located, and they needed to be replaced often.

Ever evolutionary, technology saw yet another advancement in 1925, when Vannevar Bush built an analog calculator, called the differential analyzer, at MIT.

In 1928, Russian immigrant Vladimir Zworykin invented the cathode ray tube (CRT). This invention would go on to be the basis for the first monitors. In fact, this is what my first programming teacher taught us to call the monitor that graced the Apple IIe.

Figure 1.6 ENIAC. (From http://ei.cs.vt.edu/~history/ENIAC.2.GIF.)

In 1941, German Konrad Zuse, who had previously developed several calculating machines, released the first programmable computer that was designed to solve complex engineering equations. This machine, called the Z3, made use of strips of old, discarded movie film as its control mechanism. Zuse's computer was the first machine to work on the binary system, as opposed to the more familiar decimal system.

The ones and zeros in a punch card have two states: a hole or no hole. If the card reader read a hole, it was considered to be a 1, and if no hole was present, it was a zero. This works admirably well in representing things in a binary system, and this is one of the reasons that punch cards and card readers remained in use for so long. This discovery of binary representation, as we all know, was going to prove important in the future design of computers.

British mathematician Alan M. Turing in 1936, while at Princeton University, adapted the idea of an algorithm to the computation of functions. Turing's machine was an attempt to convey the idea of a computational machine capable of computing any calculable function. His conceptual machine appears to be more similar in concept to a software program than to a piece of hardware or hardware component. Turing, along with Alonzo Church, is further credited with founding the branch of mathematical theory that we now know as recursive function theory.

In 1936, Turing also wrote On Computable Numbers, a paper in which he described a hypothetical device that foresaw programmable computers. Turing's imaginary idea, a Turing machine, would be designed to perform structured, logical operations. It would be able to read, write, and erase those symbols that were written on an infinitely long paper tape. The type of machine that Turing described would stop at each step in a computation and match its current state against a finite table of possible next instructions to determine the next step in the operation that it would take. This design would come to be known as a finite state machine.

It was not Turing's purpose to invent a computer. Rather, he was attempting to describe problems that can be solved logically. Although it was not his intention to describe a computer, his ideas can be seen in many of the characteristics of the computers that were to follow. For example, the endless paper tape could be likened to RAM, to which the machine can read, write, and erase information.

Computing Machines

Computing and computers, as we think about them today, can be traced directly back to the Harvard Mark I and Colossus. These two computers are generally considered to be the first generation of computers. First-generation computers were typically based around wired circuits containing vacuum valves and used punched cards as the primary storage medium. Although nonvolatile, this medium was fraught with problems, including the problems encountered when the order of the cards was changed, and the problems of a paper punch card encountering moisture or becoming bent or folded (the first use of "do not bend, fold, spindle, or mutilate"). Colossus was an electronic computer built at the University of Manchester in Britain in 1943 by M.H.A. Neuman and Tommy Flowers and was designed by Alan Turing with the sole purpose of cracking the German coding system, the Lorenz cipher. The Harvard Mark I (developed by Howard Aiken, Grace Hopper, and IBM in 1939 and first demonstrated in 1944) was designed more as a general-purpose, programmable computer, and was built at Harvard University with the primary backing of IBM. Figure 1.7 is a picture of the Mark I and Figure 1.8 shows its creators. Able to handle 23-decimal-place numbers (or words) and able to perform all four arithmetic operations, as well as having special built-in programs to allow it to handle logarithms and other trigonometric functions, the Mark I (originally controlled with a prepunched paper tape) was 51 feet long, 8 feet high, had 500 miles of wiring, and had one major drawback: the paper tape had no provision for transfer of control or branching. Although it was not the be all and end all in respect of speed (it took three to five seconds for a single multiplication operation), it was able to do highly complex mathematical operations without human intervention. The Mark I remained in use at Harvard until 1959 despite other machines surpassing it in performance, and it provided many vital calculations for the Navy in World War II.

Figure 1.7 Mark I. (From http://inventors.about.com.)

Figure 1.8 Grace Hopper and Howard Aiken. (From http://inventors.about.com.)

Aiken continued working with IBM and the Navy, improving on his design, and followed the Harvard Mark I with the building of the 1942 concept, the Harvard Mark II. A relay-based computer that would be the forerunner to the ENIAC, the Mark II was finished in 1947. Aiken developed a series of four computers while working in conjunction with IBM and the Navy, but the Mark II had its distinction in the series as a discovery that would prove to be more widely remembered than any of the physical machines on which he and his team worked. On September 9, 1945, while working at Harvard University on the Mark II Aiken Relay Calculator, then LTJG (lieutenant junior grade) Grace Murray was attempting to determine the cause of a malfunction. While testing the Mark II, she discovered a moth trapped between the points at Relay 70, Panel F. The operators removed the moth and affixed it to the computer log, with the entry: "First actual case of bug being found." That event was henceforth referred to as the operators having "debugged" the machine, thus introducing the phrase and concept for posterity: debugging a computer program.

Grace Murray Hopper is credited with discovering the first computer bug in 1945, but perhaps her best-known and most frequently used contribution to computing was her invention, the compiler, in the early 1950s. The compiler is an intermediate program that translates English-like language instructions into the language that is understood by the target computer. She claimed that the invention was precipitated by the fact that she was lazy and ultimately hoped that the programmer would be able to return to being a mathematician.

Following closely, in 1946, was the first-generation, general-purpose giant Electronic Numerical Integrator and Computer (ENIAC). Built by John W. Mauchly and J. Presper Eckert at the University of Pennsylvania, ENIAC was a behemoth. ENIAC was capable of performing over 100,000 calculations per second (a giant leap from the one multiplication operation taking five seconds to complete), differentiating a number's sign, comparing for equality, making use of the logical "and" and the logical "or", and storing a remarkable 20 ten-digit numbers with no central memory unit. Programming of the ENIAC was accomplished by manually varying the switches and cable connections.

ENIAC used a word of ten decimal digits instead of the previously used binary. The executable instructions, its core programs, were the separate units of ENIAC, plugged together to form a route through the machine for the flow of computations. The path of connections had to be redone for each different problem. Although, if you stretch the imagination, this made ENIAC programmable, the wire-it-yourself way of programming was very inconvenient, though highly efficient for those programs for which ENIAC was designed, and was in productive use from 1946 to 1955.

ENIAC used over 18,000 vacuum tubes, making it the very first machine to use over 2000. Because of the heat generated by the use of all of those vacuum tubes, ENIAC, along with the machinery required to keep it cool, took up over 1800 square feet (167 square meters) of floor space. That is bigger than the available floor space in many homes. Weighing 30 tons and containing over 18,000 electronic vacuum valves, 1500 relays, and hundreds of thousands of resistors, capacitors, and inductors, ENIAC cost well over $486,000 to build.

ENIAC was generally acknowledged as being the very first successful high-speed electronic digital computer (EDC).

In 1947, Walter Brattain built the next major invention on the path to the computers of today, the transistor. Originally nearly a half inch high, the point contact transistor was the predecessor to the transistors that grace today's computers (now so small that 7 million or more can fit on a single computer chip). These transistors would replace the far less efficient and less reliable valves and vacuum tubes and would pave the way for smaller, more inexpensive radios and other electronics, as well as being a boon to what would become the commercial computer industry. Transistorized computers are commonly referred to as second-generation computers and are the computers that dominated the government and universities in the late 1950s and 1960s. Because of the size, complexity, and cost, these are the only two entities that were interested in making the investment in money and time. This would not be the last time that universities and government would be on the forefront of technological advancement. Early transistors, although definitely among the most significant advances, had their problems. Their main problem was that, like any other electronic component at the time, transistors needed to be soldered together. These soldered connections had to be, in the beginning, done by hand by a person. As a result, the more complex the circuits became, and the more transistors that were on an integrated circuit, the more complicated and numerous were the soldered connections between the individual transistors and, by extension, the more likely it would be for inadvertent faulty wiring.

The Universal Automatic Computer (UNIVAC) (Figure 1.9), developed in 1951, could store 12,000 digits in random-access mercury delay lines. The first UNIVAC was delivered to the Census Bureau in June 1951. UNIVAC processed each digit serially with a much higher design speed than its predecessor, permitting it to add two ten-digit numbers at a rate of nearly 100,000 additions per second. It operated at a clock frequency of 2.25 MHz, an astonishing speed for a design that relied on vacuum tube circuits and mercury delay-line memory.

The Electronic Discrete Variable Computer (EDVAC) (Figure 1.10) was completed for the Ordnance Department in 1952, the same year that G.W. Dummer, a British radar expert, proposed that electronic equipment could be manufactured as a solid block with no connecting wires. Because EDVAC had more internal memory than any other computing device in history, it was the intention of Mauchly and Eckert that EDVAC carry its program internal to the computer. The additional memory was achieved using a series of mercury delay lines through electrical pulses that could be bounced back and forth to be retrieved. This made the machine a two-state device, or a device used for storing ones and zeros. This mercury-based two-state switch was used primarily because EDVAC would use the binary number system, rather than typical decimal numbers. This design would greatly simplify the construction of arithmetic units. Although Dummer's prototype was unsuccessful, and he received virtually no support for his research, in 1959 both Texas Instruments and Fairchild Semiconductor announced the advent of the integrated circuit.

Figure 1.9 UNIVAC. (From http://www.library.upenn.edu/exhibits/rbm/mauchly/jwm11.html.)

Figure 1.10 EDVAC. (From http://lecture.eingang.org/edvac.html.)

In 1957, the former USSR launched Sputnik. The following year, in response, the United States launched the Advanced Research Projects Agency (ARPA) within the Department of Defense, thereby establishing the United States' lead in military science and technology.

In 1958, researchers at Bell Labs invented the modulator-demodulator (modem). Responsible for converting the computer's digital signals to electrical (or analog) signals and back to digital signals, modems would enable communication between computers.

In 1958, Seymour Cray realized his goal to build the world's fastest computer by building the CDC 1604 (the first fully transistorized supercomputer) while he worked for the Control Data Corporation. Control Data Corporation was the company that Cray cofounded with William Norris in 1957.

This world's fastest would be followed very shortly by the CDC 6000, which used both 60-bit words and parallel processing and was 40 times faster than its immediate predecessor.

With the third generation of computers came the beginnings of the current explosion of computer use, both in the personal home computer market and in the commercial use of computers in the business community. The third generation was the generation that first relied on the integrated circuit or the microchip. The microchip, first produced in September 1958 by Jack St. Clair Kilby, started to make its appearance in these computers in 1963, not only increasing the storage and processing abilities of the large mainframes, but also, and probably more importantly, allowing for the appearance of the minicomputers that allowed computers to emerge from just academia, government, and very large businesses to a realm where they were affordable to smaller businesses. The discovery of the integrated circuit of transistors saw nearly the absolute end of the need for soldering together large numbers of transistors. Now the only connections that were needed were those to other electronic components. In addition to saving space over vacuum tubes, and even over the direct soldering connection of the transistors to the main circuit board, the machine's speed was also now greatly increased due to the diminished distance that the electrons had to follow.


The 1960s

In May 1961, Leonard Kleinrock from MIT wrote, as his Ph.D. thesis, the first paper on packet switching theory, Information Flow in Large Communication Nets.

In August 1962, J.C.R. Licklider and W. Clark, both from MIT, presented On-Line Man Computer Communication, their paper on the galactic network concept that encompasses distributed social interactions.

In 1964, Paul Baran, who was commissioned in 1962 by the U.S. Air Force to conduct a study on maintaining command and control over missiles and bombers after nuclear attack, published, through the RAND Corporation, On Distributed Communications Networks, which introduced the system concept, packet switching networks, and the idea of no single point of failure (especially the use of extended redundancy as a means of withstanding attacks).

In 1965, MIT's Fernando Corbató, along with the other designers of the Multics operating system (a mainframe time-sharing operating system that was begun in 1965 as a research project and was in continued use until 2000, and was an important influence on operating system development in the intervening 35 years), began to envision a computer processing facility that operated much like a power company. In their 1968 article The Computer as a Communications Device, J.C.R. Licklider and Robert W. Taylor anticipated different Grid-like scenarios. And since the late 1960s, there has been much work devoted to developing efficient distributed systems. These systems have met with mixed successes and continue to grapple with standards.

ARPA, in 1965, sponsored a study on time-sharing computers and cooperative networks. In this study, the computer TX-2, located in MIT's Lincoln Lab, and the AN/FSQ32, located at System Development Corporation in Santa Monica, CA, were directly linked via direct dedicated phone lines at the screaming speed of 1200 bps (bits per second). Later, a Digital Equipment Corporation (DEC) computer located at ARPA would be added to form the Experimental Network. This same year, Ted Nelson coined two more terms that would impact the future, hypertext and hyperlink. These two new terms referred to the structure of a computerized information system that would allow a user to navigate through it nonsequentially, without any prestructured search path or predetermined path of access.

Lawrence G. Roberts of MIT presented the first ARPANet plan, Towards a Cooperative Network of Time-Shared Computers, in October 1966. Six months later, in a discussion held at a meeting in Ann Arbor, MI, Roberts led discussions for the design of ARPANet.


In October 1967, at the ACM Symposium on Operating Systems Principles in Gatlinburg, TN, not only did Roberts present his paper Multiple Computer Networks and Intercomputer Communication, but also members of the RAND team (Distributed Communications Networks) and members of ARPA (Cooperative Network of Time-Shared Computers) met with members of the team from the National Physical Laboratory (NPL) (Middlesex, England) who were developing the NPL data network under the direction of Donald Watts Davies. Davies is credited with coining the term packet. The NPL network carried out experiments in packet switching using 768-kbps lines.

In 1969, the true foundation of the Internet was laid. Commissioned by the Department of Defense as a means for research into networking, ARPANet was born. The initial four-node network (Figure 1.11) consisted of four Bolt Beranek and Newman, Inc. (BBN)-built interface message processors (IMPs) using Honeywell DDP-516 minicomputers (Figure 1.12), each with 12K of memory and each connected with AT&T-provided 50-kbps lines. The configuration and location of these computers are as follows:

The first node, located at UCLA, was hooked up on September 2, 1969, and functioned as the network measurement center. As its operating system, it ran SDS SIGMA 7, SEX.

The second node, located at Stanford Research Institute, was hooked up on October 1, 1969, and acted as the network information center. It ran the SDS 940/Genie operating system.

Node 3 was located at the University of California, Santa Barbara and was hooked up on November 1, 1969. Node 3 was running the IBM 360/75, OS/MVT operating system.

The final node, node 4, was located at the University of Utah and was hooked up in December 1969. It ran the DEC PDP-10, Tenex operating system.

Figure 1.11 ARPANet original four-node network. (From http://www.computerhistory.org.)

Charley Kline sent the first packets on the new network on October 29 from the UCLA node as he tried to log in to the network: this first attempt resulted in the entire system crashing as he entered the letter G of LOGIN.

Thomas Kurtz and John Kemeny developed the Beginner's All-Purpose Symbolic Instruction Code (BASIC) in 1963 while they were members of the Dartmouth mathematics department. BASIC was designed to allow for an interactive and simple means for upcoming computer scientists to program computers. It allowed the use of print statements and variable assignments.

Figure 1.12 Interface message processors (IMPs). (From http://www.computerhistory.org.)

Programming languages came to the business community in 1960 with the arrival of the Common Business-Oriented Language (COBOL). Designed to assist in the production of applications for the business world at large, COBOL separated the description of the data from the actual program to be run. This approach not only followed the logic of the likely programmer candidates (separation of data from code), but also allowed for modular programming and component reuse because programmers could separate out these descriptions and eventually whole sections of code that could be used later in many programs.

Niklaus Wirth, a Swiss computer scientist, in the late 1960s released his first programming language, Pascal. Oddly, in this case, the scientific and academic language followed the business language. Although academia had been programming in machine language for decades, this was the first of what we consider to be higher-level programming languages. Pascal forced programmers to write programs in both a structured and logical fashion. This meant that the programmers had to pay very close attention to the different types of data in use and to what they needed to do with the flow of the program. Wirth would follow his release of Pascal with future releases of Modula-II and Modula-III.

Highly important to business computing, in April 1964, IBM introduced the IBM 360, and the commercial mainframe was born. Over the coming decades, the 360 and its descendants would become one of the major moneymakers for IBM and the mainstay of computing in hundreds of businesses.

In 1965, a typical minicomputer cost about $20,000. An integrated circuit that cost $1000 in 1959 cost less than $10 in 1965.

In the 1960s, once computers became more cost effective and viable for smaller private companies, and once the storage capacity of computers became such that more data and programs could be loaded into memory, databases became an option. The first manner of data storage in these computer systems was file processing. In file processing systems, data is partitioned into separate files; each has its own different format, and each application has its own separate program.

The initial forays into databases (where the data is centrally integrated into a database with a common format and centrally managed) were made in 1964 with NASA's Apollo moon project. One of the computer advances spurred by the space project led to the development of GUAM (Generalized Update Access Method) by IBM. Although this was not a commercially available database, it laid the foundation for those that would follow.

Access to the data stored in the database was accomplished through low-level pointer operations linking records. Storage details depended on the type of data to be stored, and adding an extra field to your database required completely rewriting the underlying access and data modification methods. The emphasis was naturally on the records to be processed, not on the overall structure of the system. A user or programmer would need to know the physical structure of the database to query, update, process, or report on the information.

Many of us know the content, if not the origin, of Moore's law. Gordon Moore made the observation in 1965 (just four years after the first integrated circuit) that the number of transistors per square inch in an integrated circuit would double, on average, every year. Although the timeframe has been adjusted somewhat, the law per se has withstood the test of time, with the number of transistors still doubling, on average, every 18 months. This trend is expected to continue for at least another decade and maybe more.

In 1966, IBM released the first commercially available database management system, the Information Management System (IMS), based on the hierarchical data model. The hierarchical data model organizes data in an inverted tree structure in which there is a hierarchy of parent and child data segments. This structure implies that every record can have repeating information stored in its child data segments. The data is stored in a series of records, each record having a set of field values attached to it, with every instance of a specific record collected together as a record type. To create the links between these record types, the hierarchical model uses parent-child relationships and pointers, often bidirectional pointers, to ensure ease of navigation. Although the model was very popular for two decades, and many people have for the last 15 or 20 years been foreseeing its demise, IMS remains a core data storage mechanism for many companies.
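As a purely illustrative sketch of the parent-child idea (not IMS syntax; the segment names and fields here are hypothetical), the hierarchy can be pictured as a small tree of records in which child segments are reached only by navigating downward from their parent:

    # Hypothetical hierarchical structure: one CUSTOMER parent segment
    # with repeating ORDER child segments beneath it.
    customer = {
        "segment": "CUSTOMER",
        "fields": {"id": 1001, "name": "Acme Corp"},
        "children": [
            {"segment": "ORDER", "fields": {"order_no": 1, "amount": 250.00}, "children": []},
            {"segment": "ORDER", "fields": {"order_no": 2, "amount": 75.50}, "children": []},
        ],
    }

    def find_orders(root):
        # Navigation starts at the root and follows parent-child links;
        # there is no way to reach an ORDER segment except through its parent.
        return [child["fields"] for child in root["children"] if child["segment"] == "ORDER"]

    print(find_orders(customer))

The point of the sketch is that access paths follow the tree: a query that does not start from the parent segment has no direct route to the child data, which is exactly why changing the structure meant rewriting the access logic.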

GE was soon to follow in 1967 with the development of the Integrated Data Store (IDS). IDS was based on the network data model.

In 1968, Doug Engelbart demonstrated what would become three of the most common computer programs/applications. He showed an early word processor, an early hypertext system, and a collaborative application. This same year, Gordon Moore, along with Robert Noyce, founded Intel, one of the companies most responsible for upholding Moore's law in the reinvention of the technology every 18 months.

In 1969, the Conference on Data Systems Languages (CODASYL) Database Task Group Report set the standards for network database products. The popularity of the network data model coincided with that of the hierarchical data model; however, fewer companies invested as heavily in the technology. Some data is naturally modeled with more than one parent per child, and the network model permits the modeling of these many-to-many relationships in data. The basic data-modeling construct in the network model is the set, wherein a set consists of an owner record type, a set name, and a member record type. A member record type can play the member role in more than one set, allowing for support of the multiparent concept. Not only can a member record type be a member of more than one set, but it can also be an owner record type in another set, and an owner record type can be either a member or an owner in another set. The CODASYL network model is based on mathematical set theory.
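To make the set construct concrete, here is a minimal illustrative sketch (the record and set names are hypothetical, not CODASYL DDL): each set pairs an owner record type with member records, and the same member record participates in two sets, which is how the model supports more than one parent per child:

    # Hypothetical CODASYL-style sets. A set = (owner record type, set name, member record type).
    # STUDENT owns the ENROLLED_IN set; COURSE owns the HAS_STUDENT set.
    sets = {
        "ENROLLED_IN": {"owner": "STUDENT", "members": []},
        "HAS_STUDENT": {"owner": "COURSE", "members": []},
    }

    # One member record of type ENROLLMENT, linked into both sets (many-to-many).
    enrollment = {"type": "ENROLLMENT", "student": "S1", "course": "C1", "grade": "A"}
    sets["ENROLLED_IN"]["members"].append(enrollment)
    sets["HAS_STUDENT"]["members"].append(enrollment)

    # Navigation is by set membership (following links), not by declarative query.
    for name, s in sets.items():
        print(name, s["owner"], "->", [m["type"] for m in s["members"]])

The design choice this illustrates is that relationships are stored explicitly as set links rather than derived at query time, which makes many-to-many data easy to represent but ties applications to the physical navigation paths.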

The 1970s

The year 1970 saw the introduction of the 256-bit RAM chip by Fairchild Semiconductor, and later the 1-kilobit RAM chip by Intel. Intel also announced the 4-bit microprocessor, the 4004.

Also in 1970, Dr. E.F. Codd, an IBM researcher, proposed a relational data model in a theoretical paper promoting the disconnection of the data access and retrieval methods from the physical data storage. Because of the highly technical and mathematical nature of Codd's original article, its significance was not widely recognized immediately; however, it would become one of the foundations on which future database systems would be built. This model has been standard ever since.

The supercomputers of the 1970s, like the Cray 1, which could calculate 150 million floating point operations per second, were immensely powerful.

Although processing power and storage capacities have increased beyond all recognition since the 1970s, the underlying technology of large-scale-integration (LSI) or very-large-scale-integration (VLSI) microchips has remained basically the same, so it is widely regarded that most of today's computers still belong to the fourth generation.

The first reports out of the ARPANet project started to appear on the scene in 1970. The first publication on the Host-Host Protocol by C.S. Carr, S. Crocker, and V.G. Cerf, HOST-HOST Communication Protocol in the ARPA Network, was presented in the AFIPS Proceedings of SJCC. Computer Network Development to Achieve Resource Sharing was also presented at AFIPS. During this same year, ARPANet started using the Network Control Protocol (NCP), the first host-to-host protocol; the first cross-country link between two entities was created, installed by AT&T at 56 kbps (this initial link would be replaced by one between BBN and RAND); and a second line was installed between MIT and Utah.

The next advance, in November 1971, was the Intel release of the very first microprocessor (the 4004), and the fourth generation of computers was born. Using these microprocessors, much of a computer's processing ability could be located on a single small chip. Although this microprocessor was capable of only 60,000 instructions per second, the future was born, and future releases of these processors would see far greater increases in speed and power.


Intel further pushed the advancement of these fourth-generation computers by coupling the microprocessor with its newly invented RAM chip, on which kilobits of memory could be located on a single chip.

Norman Abramson at the University of Hawaii developed the first packet radio network, ALOHAnet. Becoming operational in July 1970, ALOHAnet connected to ARPANet in 1972.

In 1971, Intel released the very first microprocessor: a highly specialized integrated circuit that was able to process several bits of data at a time. The new chip included its own arithmetic logic unit. The circuits used for controlling and organizing the work took up a large portion of the chip, leaving less room for the data-handling circuitry. Computers up until now had been strictly relegated to use by the military, universities, and very large corporations because of the prohibitive cost of not only the machine, but also its maintenance once it was in place.

The UNIX Time Sharing System First Edition V1 was presented on November 3, 1971; version 2 came out seven months later.

In 1972, Cray left Control Data Corporation to found the Cray Research Company, where he designed the Cray 1 in 1976. The Cray 1 was an 80-megahertz machine that had the ability to reach a throughput of over 100 megaflops (100 million floating point operations per second).

Holding with Moore's law, in 1972, Intel announced the 8008, an 8-bit microprocessor.

In 1975, the cover of Popular Electronics featured a story on the world's first minicomputer kit to rival commercial models, the Altair 8800. The Altair 8800 was produced by Micro Instrumentation and Telemetry Systems (MITS) and retailed for $397. This modest price made it easily affordable for the small but growing hacker community, as well as the intrepid few souls destined to be the next generation of computer professionals.

Furthering the area of networking, ARPANet was expanded to 23 nodes, including UCLA, SRI, UCSB, the University of Utah, BBN, MIT, RAND, SDC, Harvard, Lincoln Lab, Stanford, UIUC, CWRU, CMU, and NASA/Ames. BBN started to use cheaper Honeywell 316 systems to build its IMPs and, because the original IMP could support only four nodes, developed the more robust terminal IMP (TIP) that would support an amazing 64 terminals. In Washington, D.C., at the International Conference on Computer Communications (ICCC) in 1972, ARPANet was demonstrated using the terminal interface processor, now with 40 nodes.

Ray Tomlinson of BBN invented an e-mail program to send messages across a distributed network, deriving the original program from a combination of an intramachine e-mail program (SENDMSG) and an experimental file transfer program (CPYNET). Tomlinson modified his program for ARPANet and it became a quick success. It was in this initial foray into e-mail that the @ sign was chosen from the punctuation keys on Tomlinson's Model 33 Teletype machine to mean "at" in an e-mail address. Several months later, Lawrence Roberts wrote the first e-mail management program to list, selectively read, file, forward, and respond to messages, adding deeper functionality to Tomlinson's creation.

The first computer-to-computer chat took place in 1972, at UCLA, and was repeated during the ICCC in Washington, D.C.

Specifications for TELNET (RFC 318) rounded out the eventful year.

The Altair was not designed for typical home use, or for the computer novice. The kit required extensive assembly by the owner, and once assembled, it was necessary to write the software for the machine because none was commercially available. The Altair 8800 needed to be coded directly in machine code, ones and zeros (accomplished by flipping the switches located directly on the front of the machine), and had an amazing 256 bytes of memory. This made its onboard memory about the size of a paragraph.

Two young hackers who were intrigued by the Altair, having seen the article in Popular Electronics, decided that the Altair needed to have software available commercially. They contacted MITS owner Ed Roberts and offered to provide him with BASIC that would run on the Altair.

The boost that BASIC would give the Altair would be considerable, so Roberts said he would pay for it, but only if it worked. The two hackers, Bill Gates and Paul Allen, worked feverishly and diligently and finished the product barely in time to present it to Roberts. It was a huge success and the basis on which they would design not only BASIC for many other machines, but also operating systems for a wide variety of machines.

In 1973, ARPA was renamed the Defense Advanced Research Projects Agency (DARPA). Under the new DARPA, development started on the protocol that would later be known as TCP/IP (a protocol that allows diverse computer networks to not only communicate, but also interconnect with each other), by a group headed by Vinton Cerf from Stanford and Bob Kahn from DARPA. ARPANet was using the NCP to transfer data and saw the very first international connections, from University College London in England.

Harvard Ph.D. candidate Bob Metcalfe, in his thesis, outlined the idea of what would become Ethernet. The concept was tested out on Xerox PARC's Alto computers. The first Ethernet network was called the Alto Aloha System. In 1976, Metcalfe developed Ethernet, allowing a coaxial cable to move data rapidly, paving the way to today's local area networks (LANs).

Kahn suggested the idea of an Internet and started an internetting research program at DARPA. Cerf sketched a potential gateway architecture on the back of an envelope in a hotel lobby in San Francisco. The two later presented the basic Internet idea at the University of Sussex in Brighton, United Kingdom.

The year 1976 saw IBM's San Jose Research Lab developing a relational database model prototype called System R. AT&T developed UUCP (UNIX to UNIX CoPy), which would be distributed with UNIX in 1977. DARPA started to experiment with TCP/IP and shortly determined that it would be the standard for ARPANet. Elizabeth II, the Queen of the United Kingdom, sent an e-mail on March 26, 1976, from the Royal Signals and Radar Establishment (RSRE) in Malvern.

Dr. Peter Chen, in 1976, proposed the entity-relationship (ER) model for database design. The paper The Entity-Relationship Model: Toward a Unified View of Data, later to be honored as one of the most influential papers in computer science, provided insight into conceptual data models, offering higher-level modeling that allows the data architect or the database designer to concentrate on the use of the data rather than the logical table structure.

The Altair was not the only commercial kid on the block for long. Not long after its introduction, there came an avalanche of more personal-type computers. Steve Jobs and Steve Wozniak started this avalanche in 1977 at the First West Coast Computer Fair in San Francisco with the unveiling of the Apple II. Boasting a built-in BASIC language, color graphics, and a screaming 4100 characters of board memory, the Apple II sold for $1298. Further, programs could be stored, starting with the Apple II, on a simple everyday audiocassette. During the fair, Jobs and Wozniak secured firm orders for 300 of their new machines.

Also introduced in 1977 was the home computer Tandy Radio Shack's TRS-80. Its second incarnation, the TRS-80 Model II, came with an amazing 64,000-character memory and another odd new invention, a disk drive on which to store programs and data. With the introduction of the disk drive, personal computer applications started to take off at a rate similar to that of the computers themselves. The floppy disk remained the most convenient publishing medium for distribution of software for well over a decade.

Not to be outdone, IBM, a company geared to creating business machines and which, up to this time, had been producing mainframes and minicomputers primarily for medium- to large-size businesses, made the decision to get into the new act. It started working on the Acorn, later called the IBM PC (and the term was born). The PC was the first computer designed especially for the home market and featured a modular design. This meant that pieces could easily be added to the architecture either at the time of purchase or later. It is surprising to note that most of the PC's components came from outside of IBM, as building it with IBM parts would have made the resulting machines cost entirely too much for nearly anyone in the home computer market. When it was first introduced, the PC came with 16,000 characters of memory, the keyboard from an IBM electric typewriter, and a connection for a cassette tape recorder (for program and data storage), and it listed for $1265.

In 1978, TCP/IP was split into TCP (Transmission Control Protocol) and IP (Internet Protocol).

USENET, a decentralized newsgroup network initially based on UUCP, was created in 1979 by graduate student Steve Bellovin and programmers Tom Truscott and Jim Ellis at the University of North Carolina. The first message was sent between Duke and UNC.

Again, not to be outdone, IBM created BITNET (Because It's Time Network), introducing the store-and-forward network that would be used for e-mail and listservers.

DARPA established the Internet Configuration Control Board (ICCB) to assist in the management of Internet activity. The ICCB would later (1983) be disbanded and replaced by Task Forces, and yet later by the Internet Activities Board (IAB), formed from the chairs of the Task Forces.

On the lighter side, also getting its start in 1979 was the interjection of emotion into an e-mail message. Kevin MacKenzie e-mailed the MsgGroup on April 12 with the suggestion that adding emotion into dry text could be accomplished by using characters such as -), indicating that the referenced sentence was intended as tongue in cheek. MacKenzie found himself flamed by the masses at the suggestion, but as millions can attest, emoticons have since become widely used.

In the late 1970s and early 1980s there were special database machines that offloaded database management functions onto special processors with intelligent storage devices or database filters. These machines had a high cost of customized hardware and limited extensibility.

The 1980s

In 1981, IBM released the first commercially available database product based on the new relational model, the Structured Query Language/Data System (SQL/DS), for its mainframe systems. The Relational Database Management System (RDBMS) is based on the relational model developed by E.F. Codd. This structure allows for the definition of data structures, storage and retrieval operations, and integrity constraints, with the data and the relationships between the different data sets organized into tables (a collection of records, with each record containing the same fields). Properties of these tables, in a true relational model, include the fact that each row is unique, columns contain values of the same kind, the sequences of the columns and rows within the table are insignificant, and each column has a unique name. When columns in two different tables contain values from the same set (the columns may or may not have the same names), a join operation can be performed to select all of the relevant information, and joining multiple tables on multiple columns allows for the easy reassembly of an entire set of information. The relational database model is based on relational algebra and, by extension, relational calculus.
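As an illustrative sketch of the join idea just described (the table and column names are made up, and Python stand-ins are used here rather than SQL), two tables that share a column of values drawn from the same set can be recombined simply by matching on that column, with no stored pointers at all:

    # Two "tables" as lists of records (rows); each row has the same fields (columns).
    customers = [
        {"cust_id": 1, "name": "Acme Corp"},
        {"cust_id": 2, "name": "Globex"},
    ]
    orders = [
        {"order_no": 100, "cust_id": 1, "amount": 250.00},
        {"order_no": 101, "cust_id": 2, "amount": 75.50},
        {"order_no": 102, "cust_id": 1, "amount": 12.99},
    ]

    # A join matches rows whose shared column (cust_id) holds the same value,
    # reassembling the related information at query time.
    joined = [
        {**c, **o}
        for c in customers
        for o in orders
        if c["cust_id"] == o["cust_id"]
    ]

    for row in joined:
        print(row["name"], row["order_no"], row["amount"])

This is the key contrast with the hierarchical and network models sketched earlier: relationships are derived from values rather than navigated through physical links.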

Also in 1981, Ashton-Tate released dBase II for microcomputer systems.

CSNET (Computer Science NETwork, later to become known as the Computer and Science Network) was also built in 1981 by a collaboration of computer scientists from the University of Delaware, Purdue University, the University of Wisconsin, RAND Corporation, and BBN to provide networking services (initially the primary use would be e-mail) to university scientists with no access to ARPANet.

DB2, produced by IBM in 1982, was an SQL-based database for its mainframes with a batch operating system. DB2 remains one of the most popular relational database systems in business today, now available for Windows, UNIX, Linux, mainframes, and the AS/400 computer systems.

DCA and DARPA established TCP and IP as the protocol suite, commonly known as TCP/IP, for ARPANet, and the Department of Defense determined it to be the standard. This establishment would lead to one of the first definitions of an Internet as a connected set of networks, primarily a set using TCP/IP. By January 1, 1983, TCP and IP had replaced NCP entirely as the core Internet protocol for ARPANet.

Also in 1982, Larry Ellison's Relational Software, Inc. (RSI, currently Oracle Corporation) released the C-based Oracle V3, becoming the first database management system (DBMS) to run not only on mainframes and minicomputers, but also on PCs.

The following year, Microrim created the R:BASE relational database system, based on NASA's mainframe product RIM, using SQL.

In 1983, the University of Wisconsin created the first Domain Name System (DNS), which freed people from the need to remember the numbers assigned to other servers. DNS allowed packets to be directed from one domain to a domain name, with that name translated by the destination server's database into the corresponding IP address.
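As a present-day illustration of that translation step (using Python's standard library resolver as the example mechanism; the hostname shown is arbitrary), resolving a name to an address looks like this:

    import socket

    # Ask the resolver (ultimately backed by DNS name servers) to translate
    # a human-readable domain name into its corresponding IP address.
    hostname = "example.com"   # arbitrary example hostname
    ip_address = socket.gethostbyname(hostname)
    print(hostname, "resolves to", ip_address)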

By 1984, both Apple and IBM had come out with new models. At this time, Apple released the first-generation Macintosh computer, which was the first computer to come with both a graphical user interface (GUI) and a mouse. The GUI would prove to be one of the most important advances in the home computer market. It made the machine much more attractive to home computer users because it was easier and more intuitive to use. Sales of the Macintosh soared like nothing ever seen before. IBM, not to be outdone, released the 286-AT. This machine came with real applications, like a spreadsheet (Lotus 1-2-3) and a word processor (Microsoft Word). These early applications quickly became, and remained for many years, the favorite business applications.

The division of ARPANet into MILNET (designated to serve the needs of the military) and ARPANet (supporting the advanced research components) occurred in 1984.

Speed was added to CSNET, also in 1984, when MCI was contracted to upgrade the circuits to T1 lines with speeds of 1.5 Mbps (25 times as fast as the previous 56-kbps lines). IBM pitched in on the project by providing routers, and Merit managed the new network, which would now be referred to as NSFNET (National Science Foundation Network). The old lines would remain in place, and the network still using those lines would continue to be referred to as CSNET.

In 1984, William Gibson published Neuromancer, the novel that launched the cyberpunk generation. The first novel to win not only the Hugo Award but also the Nebula Award and the Philip K. Dick Award, Neuromancer introduced the rest of the world to cyberspace.

In 1985, the American National Standards Institute (ANSI) adopted SQL as the query language standard.

Exactly 100 years to the day (1885 to 1985) after the last spike was driven into the cross-country Canadian railroad, the last Canadian university became connected to NetNorth. The one-year effort to achieve coast-to-coast Canadian connectivity was successful.

Several firsts rounded out 1985. Symbolics.com was assigned on March 15, making it the first registered domain. Carnegie Mellon (cmu.edu), Purdue (purdue.edu), Rice (rice.edu), UCLA (ucla.edu), and the MITRE Corporation (mitre.org), a not-for-profit working in the public interest on systems engineering, information technology, operational concepts, and enterprise modernization, all became registered domains.

Cray 2 was built in 1985 and was again the fastest computer of its time.

The first Freenet came online in 1986 under the auspices of the Society for Public Access Computing (SoPAC). The National Public Telecomputing Network (NPTN) assumed the Freenet program management in 1989, the same year that the Network News Transfer Protocol (NNTP) was designed to enhance USENET news performance running over TCP/IP.

BITNET and CSNET merged in 1987 to form the Corporation for Research and Educational Networking (CREN), yet another fine work of the National Science Foundation. This was at the same time that the number of hosts on BITNET broke the 1000 mark.

Also in 1987, the very first e-mail link between Germany and China was established using the CSNET protocols, and the first message from China was sent on September 20.

The T1 NSFNET backbone was completed in 1988. The traffic increased so rapidly that plans began immediately on the next major upgrade to the network. Canada, Denmark, Finland, France, Iceland, Norway, and Sweden connected to NSFNET that year.

The Computer Emergency Response Team (CERT) was formed by DARPA in response to the needs that became apparent during the Morris Worm incident. The Morris Worm, released at approximately 5 P.M. on November 2, 1988, from the MIT AI laboratory in Cambridge, MA, quickly spread to Cornell, Stanford, and from there on to other sites. By the next morning, almost the entire Internet was infected. VAX and Sun machines all across the country were rapidly being overloaded by invisible tasks. Users, if they were able to access machines at all, were unable to use the affected machines effectively. System administrators were soon forced to cut many machines off from the Internet entirely in a vain attempt to limit the source of infection. The culprit was a small program written by Robert Tappan Morris, a 23-year-old doctoral student at Cornell University. This was the year of the first great Internet panic.

For those who enjoy chat, 1988 was the year that Internet Relay Chat (IRC) was developed by Jarkko Oikarinen.

Cray Computer Corporation, founded by Seymour Cray, developed the Cray 3 and Cray 4, each one a gigaflop machine based on the 1-gigahertz gallium arsenide processor (the processor developed by Cray for supercomputers).

In 1989, The Cuckoo's Egg was published. The Cuckoo's Egg is a Clifford Stoll novel recounting the real-life drama of one man's attempts to track a German cracker group who infiltrated U.S. facilities by making use of little-known backdoors. A cuckoo lays an egg in another bird's nest; it hatches first and pushes the other eggs out of the nest, forcing the mother of the nest to care for the imposter hatchling. The egg in the book was a password-gathering program that allowed the crackers to get into many systems all over the United States. The year of The Cuckoo's Egg saw Australia, Germany, Israel, Italy, Japan, Mexico, the Netherlands, New Zealand, Puerto Rico, and the United Kingdom joining those already connected to NSFNET.

By the middle of the 1980s it had become obvious that there were several fields where relational databases were not entirely practical due to the types of data involved. These industries included medicine, multimedia, and high-energy physics. All of these industries needed more flexibility in the way their data was represented and accessed.

This need led to research being started in the field of object-oriented databases, where users can define their own methods of access to data and the way that it is represented and manipulated.

This object-oriented database research coincided with the appearance of object-oriented programming languages such as C++.


In the late 1980s, there were technological advancements resulting in cheap commodity disks, commodity processors and memories, and software-oriented database management system solutions. Large-scale multiprocessor systems with more than 1000 nodes were shipped, providing more total computing power at a lower cost. Modular architecture now began to allow for incremental growth and widespread adoption of the relational model for database management systems.

The 1990s

NSFNET's T3 backbone was constructed in 1990. During the construction, the Department of Defense disbanded the ARPANet and replaced it and its 56-kbps lines with NSFNET. ARPANet was taken out of service.

Also in 1990, Argentina, Australia, Belgium, Brazil, Chile, Greece, India, Ireland, Korea, Spain, and Switzerland joined the NSFNET's T1 network.

In 1991, CSNET and its 56-kbps lines were discontinued; CSNET had fulfilled its early role in the provision of academic networking services. One of CREN's defining features is that all of its operational costs are met through the dues paid by its member organizations.

By the early 1990s, the first object-oriented database management systems had started to make an appearance, allowing users to create database systems to store the vast amounts of data resulting from research at places such as CERN, and to store patient records at many major medical establishments.

In 1991, Oracle Corporation, with its two-year-old Oracle 6.2, not only brought clustering to the market with Oracle Parallel Server, but also reached the 1000 transactions per second mark on a parallel computing machine, becoming the first database to run on a massively parallel computer. Wide area information servers (WAIS), invented by Brewster Kahle; Gopher, released by Paul Lindner and Mark McCahill; Pretty Good Privacy (PGP), released by Philip Zimmermann; and the World Wide Web (WWW), released by CERN, were all important services that came onto the computing and networking scene in 1991.

Finally, in 1991, Croatia, the Czech Republic, Hong Kong, Hungary, Poland, Portugal, Singapore, South Africa, Taiwan, and Tunisia connected to NSFNET.

Veronica, a search tool for Gopher space, was released by the University of Nevada in 1992, the year that NSFNET's backbone was upgraded to T3, making the network's speed nearly 45 Mbps.

The first MBONE audio and video multicast also occurred in 1992. MBONE is a service provided by the Distributed Systems Department Collaboration Technologies Group of the Computing Science Department of Berkeley Lab. Currently still somewhat experimental, MBONE's videoconferencing services are not yet available for all operating systems.

Zen and the Art of the Internet, by Brendan Kehoe, was published in 1992. Now available at http://www.cs.indiana.edu/docproject/zen/zen-1.0_toc.html, this book has been a useful tool for many beginners coming to the Internet and has been on the textbook list for university classes.

Antarctica, Cameroon, Cyprus, Ecuador, Estonia, Kuwait, Latvia, Luxembourg, Malaysia, Slovakia, Slovenia, Thailand, and Venezuela all joined NSFNET in 1992, pushing the number of hosts to over 1 million.

The CAVE Automatic Virtual Environment, developed in 1992 by the Electronic Visualization Laboratory at the University of Illinois at Chicago, can be described as a virtual reality theatre display or a spatially immersive display. CAVE is now being produced commercially by FakeSpace Systems.

In 1993, the National Science Foundation created InterNIC to provide Internet services. AT&T provided directory and database services, Network Solutions, Inc., was responsible for registration services, and General Atomics/CERFnet provided information services for the new venture.

The same year, Marc Andreessen, of the University of Illinois and the National Center for Supercomputing Applications (providers of A Beginner's Guide to HTML, available at http://archive.ncsa.uiuc.edu/General/Internet/WWW/HTMLPrimer.html), introduced a graphical user interface to the World Wide Web called Mosaic for X. Through this browser, one could view the pages for the U.S. White House and the United Nations, which both came online in 1993.

Also in 1993, Bulgaria, Costa Rica, Egypt, Fiji, Ghana, Guam, Indonesia, Kazakhstan, Kenya, Liechtenstein, Peru, Romania, the Russian Federation, Turkey, Ukraine, the UAE, and the U.S. Virgin Islands joined the NSFNET's T3 network.

In 1994, the World Wide Web became the second most popular service on the Net, edging out Telnet and sneaking up on FTP, based on the number and percentage of packets and bytes of traffic distributed across NSFNET. Helping to add to these percentages, Algeria, Armenia, Bermuda, Burkina Faso, China, Colombia, Jamaica, Jordan, Lebanon, Lithuania, Macao, Morocco, New Caledonia, Nicaragua, Niger, Panama, the Philippines, Senegal, Sri Lanka, Swaziland, Uruguay, and Uzbekistan added their names to those connected to the NSFNET.

    The National Science Foundation, in 1995, an