
Large-Scale Complex IT Systems

DOI:10.1145/2209249.2209268

The reductionism behind today's software-engineering methods breaks down in the face of systems complexity.

BY IAN SOMMERVILLE, DAVE CLIFF, RADU CALINESCU, JUSTIN KEEN, TIM KELLY, MARTA KWIATKOWSKA, JOHN MCDERMID, AND RICHARD PAIGE

Key Insights

Coalitions of systems, in which the system elements are managed and owned independently, pose challenging new problems for systems engineering.

When the fundamental basis of engineering—reductionism—breaks down, incremental improvements to current engineering techniques are unable to address the challenges of developing, integrating, and deploying large-scale complex IT systems.

Developing complex systems requires a socio-technical perspective involving human, organizational, social, and political factors, as well as technical factors.

ON THE AFTERNOON of May 6, 2010, the U.S. equity markets experienced an extraordinary upheaval. Over approximately 10 minutes, the Dow Jones Industrial Average dropped more than 600 points, representing the disappearance of approximately $800 billion of market value. The share price of several blue-chip multinational companies fluctuated dramatically; shares that had been at tens of dollars plummeted to a penny in some cases and rocketed to values over $100,000 per share in others. As suddenly as this market downturn occurred, it reversed, so over the next few minutes most of the loss was recovered and share prices returned to levels close to what they had been before the crash.

This event came to be known as the "Flash Crash," and, in the inquiry report published six months later,7 the trigger event was identified as a single block sale of $4.1 billion of futures contracts executed with uncommon urgency on behalf of a fund-management company. That sale began a complex pattern of interactions between the high-frequency algorithmic trading systems (algos) that buy and sell blocks of financial instruments on incredibly short timescales.

A software bug did not cause the Flash Crash; rather, the interactions of independently managed software systems created conditions unforeseen


(probably unforeseeable) by the owners and developers of the trading systems. Within seconds, the result was a failure in the broader socio-technical markets that increasingly rely on the algos (see the sidebar "Socio-Technical Systems").

Society depends on complex IT systems created by integrating and orchestrating independently managed systems. The enormous increase in their scale and complexity over the past decade means new software-engineering techniques are needed to help us cope with their inherent complexity. Here, we explain the principal reasons today's software-engineering methods and tools do not scale, proposing a research and education agenda to help address the inherent problems of large-scale complex IT systems, or LSCITS, engineering.

Coalitions of Systems

The key characteristic of these systems is that they are assembled from other systems that are independently controlled and managed. While there is increasing awareness in the software-engineering community of related issues,10 the most relevant background work comes from systems engineering. Systems engineering focuses on developing systems as a whole, as defined by the International Council on Systems Engineering (http://www.incose.org/): "Systems engineering integrates all the disciplines and specialty groups into a team effort forming a structured development process that proceeds from concept to production to operation. Systems engineering considers both the business and the technical needs of all customers with the goal of providing a quality product that meets the user needs."

Systems engineering emerged to take a systemwide perspective on complex engineered systems involving structures and electrical and mechanical systems. Almost all systems today are software-intensive, and systems engineers address the challenge of constructing ultra-large-scale software systems.17 The most relevant aspect of systems engineering is its work on "system of systems," or SoS,12 about which Maier said the distinction between SoS and complex monolithic systems is that SoS elements are operationally and managerially independent. Characterizing SoS, he covered a range of systems, from directed (developed for a particular purpose) to virtual (lacking a central management authority or centrally agreed purpose). LSCITS is a type of SoS in which the elements are owned and managed by different organizations. In this classification, the collection of systems that led to the Flash Crash (an LSCITS) would be called a "virtual system of systems." However, since Maier's article was published in 1998, the word "virtual" has generally taken on a different meaning—virtual machines; consequently, we propose an alternative term that we find more descriptive—"coalition of systems."

The systems in a coalition of systems work together, sometimes reluctantly, as doing so is in their mutual interest. Coalitions of systems are not explicitly designed but come into existence as different member systems interact according to agreed-upon protocols. Like political coalitions, there might even be hostility between various members, and members enter and leave according to their interpretation of their own best interests.

The interacting algos that led to the Flash Crash represent an example of a coalition of systems, serving the purposes of their owners and cooperating only because they have to. The owners of the individual systems were competing finance companies that were often mutually hostile. Each system jealously guarded its own information and could change without consulting any other system.

Dynamic coalitions of software-intensive systems are a challenge for software engineering. Designing dependability into the coalition is not possible, as there is no overall design authority, nor is it possible to centrally control the behavior of individual systems. The systems in the coalition can change unpredictably or be completely replaced, and the organizations running them might themselves cease to exist. Coalition "design" involves the protocols for communications, and each organization using the coalition orchestrates the constituent systems its own way. However, the designers and managers of each individual system must consider how to make it robust enough to ensure


their organizations are not threatened by failure or undesirable behavior elsewhere in the coalition.

System Complexity

Complexity stems from the number and type of relationships between the system's components and between the system and its environment. If a relatively small number of relationships exist between system components and they change relatively slowly over time, then engineers can develop deterministic models of the system and make predictions concerning its properties.

However, when the elements in a system involve many dynamic relationships, complexity is inevitable. Complex systems are nondeterministic, and system characteristics cannot be predicted by analyzing the system's constituents. Such characteristics emerge when the whole system is put to use and change over time, depending on how it is used and on the state of its external environment.

Dynamic relationships include those between system elements and the system's environment that change. For example, a trust relationship is a dynamic relationship; initially, component A might not trust component B, so, following some interchange, A checks that B has performed as expected. Over time, these checks may be reduced in scope as A's trust in B increases. However, some failure in B may profoundly influence that trust, and, after the failure, even more stringent checks might be introduced.
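To make the trust example concrete, here is a minimal Python sketch (not from the article; all names and parameters are hypothetical) of a component A that adapts how often it checks component B: successful interchanges gradually reduce the checks, while a single failure restores the most stringent checking.

```python
import random


class TrustingClient:
    """Component A: adapts how closely it checks component B's results."""

    def __init__(self, check_probability=1.0, min_check_probability=0.1):
        self.check_probability = check_probability        # initially, check everything
        self.min_check_probability = min_check_probability

    def interact(self, provider):
        """One interchange with B; the trust relationship evolves over time."""
        result, correct = provider.perform()
        if random.random() < self.check_probability:      # A decides whether to check B
            if correct:
                # Trust grows: checks are gradually reduced in scope and frequency.
                self.check_probability = max(
                    self.min_check_probability, self.check_probability * 0.9
                )
            else:
                # A failure in B profoundly affects trust: reinstate stringent checks.
                self.check_probability = 1.0
        return result


class Provider:
    """Component B: usually behaves as expected, occasionally fails."""

    def __init__(self, failure_rate=0.01):
        self.failure_rate = failure_rate

    def perform(self):
        correct = random.random() > self.failure_rate
        return ("result", correct)


if __name__ == "__main__":
    a, b = TrustingClient(), Provider()
    for _ in range(1000):
        a.interact(b)
    print(f"final check probability: {a.check_probability:.2f}")
```

The point of the sketch is that the relationship (how much checking A does) is a property of the system in operation, not something fixed at design time.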

Complexity stemming from the dynamic relationships between elements in a system depends on the existence and nature of these relationships. Engineers cannot analyze this inherent complexity during system development, as it depends on the system's dynamic operating environment. Coalitions of systems in which elements are large software systems are always inherently complex. The relationships between the elements of the coalition change because they are not independent of how the systems are used or of the nature of their operating environments. Consequently, the nonfunctional (often even the functional) behavior of coalitions of systems is emergent and impossible to predict completely.

Even when the relationships between system elements are simpler, relatively static, and, in principle, understandable, there may be so many elements and relationships that understanding them is practically impossible. Such complexity is called "epistemic complexity" because it stems from our lack of knowledge of the system rather than from some inherent system characteristic.16 For example, it may be possible in principle to deduce the traceability relationships between requirements and design, but, if appropriate tools are not available, doing so may be practically impossible.

If you do not know enough about a system's components and their relationships, you cannot make predictions about overall behavior, even if the system lacks dynamic relationships between its elements. Epistemic complexity increases with system size; as ever-larger systems are built, they are inevitably more difficult to understand and their behavior and properties more difficult to predict. This distinction between inherent and epistemic complexity is important. As discussed in the following section, it is the primary reason new approaches to software engineering are needed.

Reductionism and Software Engineering

In some respects, software engineering has been incredibly successful. Compared to the systems built in the 1970s and 1980s, modern software is much larger, more complex, more reliable, and often developed more quickly. Software products deliver astonishing functionality at relatively low cost.

Software engineering has focused on reducing and managing epistemic complexity, so, where inherent complexity is relatively limited and a single organization controls all system elements, software engineering is highly effective. However, for coalitions of systems with a high degree of inherent complexity, today's software-engineering techniques are inadequate.

This is reflected in the failures that are all too common in large government-funded projects. The software may be delivered late, be more expensive to develop than anticipated, and be inadequate for the needs of its users. An example of such a project was the attempt, from 2000 to 2010, to automate U.K. health records; the project was ultimately abandoned at a cost estimated at $5 billion–$10 billion.19

The fundamental reason today's software engineering cannot effectively manage inherent complexity is that its basis is in developing individual programs rather than in interacting systems. The consequence is that software-engineering methods are unsuitable for building LSCITS. To appreciate why, we need to examine the essential divide-and-conquer reductionist assumption that is the basis of all modern engineering.

Socio-Technical Systems

Engineers are concerned primarily with building technical systems from hardware and software components and assume system requirements reflect the organizational needs for integration with other systems, compliance, and business processes. Yet systems in use are not simply technical systems but "socio-technical systems." To reflect the fact they involve evolving and interacting communities that include technical, human, and organizational elements, they are sometimes also called "socio-technical ecosystems," though the term socio-technical systems is more common.

Socio-technical systems include people and processes, as well as technological systems. The process definitions outline how the system designers intend the system to be used, but, in practice, users interpret and adapt them in unpredictable ways, depending on their education, experience, and culture. Individual and group behavior also depends on organizational rules and regulations, as well as on organizational culture, or "the way things are done around here."

Defining technical software-intensive systems intended to support an organization's work in isolation is an oversimplification that hinders software engineering. So-called system requirements represent the interface between the technical system and the wider socio-technical system, yet requirements are inevitably incomplete, incorrect, and/or out of date. Coalitions of systems cannot operate on this basis. Rather, engineers must recognize that by taking advantage of the abilities and inventiveness of people, the systems will be more effective and resilient.


Reductionism is a philosophical position that a complex system is no more than the sum of its parts, and that an account of the overall system can be reduced to accounts of individual constituents. From an engineering perspective, this means systems engineers must be able to design a system so it is composed of discrete smaller parts and interfaces allowing the parts to work together. A systems engineer then builds the system elements and integrates them to create the desired overall system.

Researchers generally adopt this reductionist assumption, and their work concerns finding better ways to decompose problems or systems (such as software architecture), better ways to create the parts of the system (such as object-oriented techniques), or better ways to do system integration (such as test-first development). Underlying all software-engineering methods and techniques (see Figure 1) are three reductionist assumptions:

System owners control system development. A reductionist perspective takes the view that an ultimate controller has the authority to take decisions about a system and is therefore able to enforce decisions on, say, how components interact. However, when systems consist of independently owned and managed elements, there is no such owner or controller and no central authority to take or enforce design decisions;

Decisions are rational, driven by technical criteria. Decision making in organizations is profoundly influenced by political considerations, with actors striving to maintain or improve their current positions or to avoid losing face. Technical considerations are rarely the most significant factor in large-system decision making; and

The problem is definable, and system boundaries are clear. The nature of "wicked problems"15 is that the "problem" is constantly changing, depending on the perceptions and status of stakeholders. As stakeholder positions change, the boundaries are likewise redefined.

However, for coalitions of systems, these assumptions never hold true, and many software project "failures," where software is delivered late and/or over budget, are a consequence of adherence to the reductionist view. To help address inherent complexity, software engineering must look toward the systems, people, and organizations that make up a software system's environment. We need to represent, analyze, model, and simulate potential operational environments for coalitions of systems to help us understand and manage, so far as possible, the complex relationships in the coalition.

Challenges

Since 2006, initiatives in the U.S. and in Europe have sought to address the engineering of large coalitions of systems. In the U.S., a report by the influential Software Engineering Institute at Carnegie Mellon University (http://www.sei.cmu.edu/) on ultra-large-scale systems (ULSS)13 triggered research leading to creation of the Center for Ultra-Large Scale Software-Intensive Systems, or ULSSIS (http://ulssis.cs.virginia.edu/ULSSIS), a research consortium involving the University of Virginia, Michigan State University, Vanderbilt University, and the University of California, San Diego. In the U.K., the comparable LSCITS Initiative addresses problems of inherent and epistemic complexity in LSCITS, while Hillary Sillitto, a senior systems architect at Thales Land & Joint Systems U.K., has proposed ULSS design principles.17

Northrop et al.13 made the point that developing ultra-large-scale systems needs to go beyond incremental improvements to current methods, identifying seven important research areas: human interaction; computational emergence; design; computational engineering; adaptive system infrastructure; adaptable and predictable system quality; and policy, acquisition, and management. The SEI ULSS report suggested it is essential to deploy expertise from a range of disciplines to address these challenges.

We agree the research required is interdisciplinary and that incremental improvement in existing techniques is unable to address the long-term software-engineering challenges of ultra-large-scale systems engineering. However, a weakness of the SEI report was its failure to set out a roadmap outlining how large-scale systems engineering can get from where it is today to the research it proposed.

Software engineers worldwide creating large complex software systems require more immediate, perhaps more incremental, research driven by the practical problems of complex IT systems engineering. The pragmatic proposals we outline here begin to address some of these problems, aiming for medium-term, as well as longer-term, impact on LSCITS engineering.

The research topics we propose here might be viewed as part of the roadmap that could lead us from current practice to LSCITS engineering. We see them as a bridge between the short- and medium-term imperative to improve our ability to create coalitions

Figure 1. Reductionist assumptions vs. LSCITS reality.

Control: owners of a system control its development (reductionist assumption) vs. no single owner or controller (LSCITS reality).

Rationality: decisions made rationally, driven by technical criteria (reductionist assumption) vs. decisions driven by political motives (LSCITS reality).

Problem definition: definable problem and clear system boundaries (reductionist assumption) vs. wicked problem and constantly renegotiated system boundaries (LSCITS reality).


of systems and the longer-term vision set out in the SEI ULSS report.

Developing coalitions of systems involves engineering individual systems to work in a coalition, as well as orchestrating and configuring the coalition to meet organizational needs. Based on the ideas in the SEI ULSS report and on our own experience in the U.K. LSCITS Initiative, we have identified 10 questions that can help define a research agenda for future LSCITS software engineering:

How can interactions between independent systems be modeled and simulated? To help understand and manage coalitions of systems, LSCITS engineers need dynamic models that are updated in real time with information from the system itself. These models are needed to help make what-if assessments of the consequences of system-change options. This requires new performance- and failure-modeling techniques in which the models adapt automatically on the basis of system-monitoring data. We do not suggest simulations can be complete or predict all possible problems. However, other engineering disciplines (such as civil and aeronautical engineering) have benefited enormously from simulation, and comparable benefits could be achieved for software engineering.
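As an illustration of the kind of what-if modeling this question asks for, the following Python sketch runs a toy agent-based simulation of independently managed trading agents interacting through a shared market. Everything here is invented for illustration (the agents, the price-impact rule, the shock scenario); it is not the authors' tooling, only a hint at how a scenario such as a large block sale could be replayed against different populations of interacting systems.

```python
import random


class SimpleMarket:
    """Toy shared environment through which independent agents interact."""

    def __init__(self, price=100.0):
        self.price = price

    def execute(self, net_demand):
        # Crude price impact: imbalance between buys and sells moves the price.
        self.price *= 1.0 + 0.001 * net_demand


class MomentumAlgo:
    """Independently managed agent: buys when the price rises, sells when it falls."""

    def __init__(self):
        self.last_price = None

    def decide(self, price):
        signal = 0
        if self.last_price is not None:
            signal = 1 if price > self.last_price else -1
        self.last_price = price
        return signal


def what_if(num_agents, steps=200, shock_at=50, shock_size=-40):
    """Replay a scenario (e.g., a large sell order) against a population of agents."""
    market = SimpleMarket()
    agents = [MomentumAlgo() for _ in range(num_agents)]
    prices = []
    for t in range(steps):
        demand = sum(a.decide(market.price) for a in agents)
        if t == shock_at:
            demand += shock_size            # the external "block sale" scenario
        demand += random.randint(-2, 2)     # background noise traders
        market.execute(demand)
        prices.append(market.price)
    return prices


if __name__ == "__main__":
    for n in (5, 50):
        prices = what_if(num_agents=n)
        print(f"{n} agents: min price {min(prices):.2f}, final {prices[-1]:.2f}")
```

Even in a model this crude, the emergent behavior depends on how many independent agents interact, which is precisely the property that cannot be analyzed by studying any one system in isolation.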

How can coalitions of systems be monitored? And what are the warning signs problems produce? In the run-up to the Flash Crash, no warning signs indicated the market was tending toward an unstable state. To help avoid transition to an unstable system state, systems engineers need to know which indicators provide information about the state of the coalition of systems, how they may be used to provide early warnings of system problems, and, if necessary, how to switch to safe-mode operating conditions that prevent the possibility of damage.
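A minimal sketch of what such monitoring might look like, assuming a single numeric health indicator and a fixed deviation threshold (both assumptions are ours, purely for illustration): the monitor watches a stream of observations and switches to a safe operating mode when a value deviates sharply from recent history.

```python
import random
from collections import deque
from statistics import mean, pstdev


class CoalitionMonitor:
    """Watches a stream of observations and flags transitions toward instability."""

    def __init__(self, window=50, warn_sigma=3.0):
        self.window = deque(maxlen=window)   # recent observations
        self.warn_sigma = warn_sigma
        self.safe_mode = False

    def observe(self, value):
        if len(self.window) >= 10:
            mu = mean(self.window)
            sigma = pstdev(self.window) or 1e-9
            if abs(value - mu) > self.warn_sigma * sigma:
                # Early warning: the coalition may be tending toward an unstable state.
                self.enter_safe_mode(reason=f"value {value:.2f} deviates sharply from recent mean")
        self.window.append(value)

    def enter_safe_mode(self, reason):
        if not self.safe_mode:
            self.safe_mode = True
            # In a real coalition this might throttle traffic or pause trading.
            print(f"SAFE MODE: {reason}")


if __name__ == "__main__":
    monitor = CoalitionMonitor()
    for t in range(200):
        value = random.gauss(100, 1)
        if t >= 150:                          # simulate a drift toward instability
            value *= 1 + 0.05 * (t - 150)
        monitor.observe(value)
```

The open research questions are which indicators to compute for a real coalition and how to set thresholds; the sketch only shows the shape of the monitoring loop.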

How can systems be designed to recover from failure? A fundamental principle of software engineering is that systems should be built so they do not fail, leading to development of methods and tools based on fault avoidance, fault detection, and fault tolerance. However, as coalitions of systems are constructed with independently managed elements and negotiated requirements, avoiding failure is increasingly impractical. Indeed, what seems to be a failure for some users may have no effect on others. Because some failures are ambiguous, automated systems cannot cope on their own. Human operators must use information from the system, intervening to enable it to recover from failure. This means understanding the socio-technical processes of failure recovery, the support the operators need, and how to design coalition members to be "good citizens" able to support failure recovery.
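One way to read "good citizen" in code terms, purely as our own hypothetical sketch rather than anything prescribed by the article, is that each member system exposes enough state and compensating actions for a human operator to drive recovery. The interface and values below are invented.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RecoveryAction:
    name: str
    description: str
    run: Callable[[], None]       # the compensating action an operator can invoke


@dataclass
class GoodCitizenSystem:
    """A coalition member designed to support operator-driven failure recovery."""

    name: str
    actions: List[RecoveryAction] = field(default_factory=list)

    def health_report(self) -> dict:
        """Expose enough internal state for an operator to diagnose a problem."""
        return {"system": self.name, "status": "degraded", "pending_requests": 42}  # illustrative values

    def list_recovery_options(self) -> List[str]:
        """Let the operator see which compensating actions are available."""
        return [f"{a.name}: {a.description}" for a in self.actions]


if __name__ == "__main__":
    billing = GoodCitizenSystem(
        name="billing",
        actions=[
            RecoveryAction("replay", "re-send unacknowledged messages", lambda: None),
            RecoveryAction("quarantine", "hold suspect transactions for review", lambda: None),
        ],
    )
    print(billing.health_report())
    print("\n".join(billing.list_recovery_options()))
```

The technical part is the easy part; the research challenge the article identifies is the surrounding socio-technical process in which operators actually use such hooks.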

How can socio-technical factors be integrated into systems and software-engineering methods? Software- and systems-engineering methods support development of technical systems and generally consider human, social, and organizational issues to be outside the system's boundary. However, such nontechnical factors significantly affect development, integration, and operation of coalitions of systems. Though a considerable body of work covers socio-technical systems, it has not been industrialized or made accessible to practitioners. Baxter and Sommerville2 surveyed this work and proposed a route to industrial-scale use of socio-technical methods. However, much more research and experience is required before socio-technical analyses are used routinely for complex systems engineering.

To what extent can coalitions of systems be self-managing? Needed is research into self-management so systems are able to detect changes in both their operation and operational environment and dynamically reconfigure themselves to cope with the changes. The danger is that reconfiguration will create further complexity, so a key requirement is for the techniques to operate in a safe, predictable, auditable way, ensuring self-management does not conflict with "design for recovery."
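A minimal sketch of "safe, predictable, auditable" self-management, under assumptions of our own (a single load metric, a replica count, and a simple safety predicate): every proposed reconfiguration is checked against the safety predicate before it is applied, and every applied change is written to an audit log.

```python
import json
import time


def self_management_step(current_config, metrics, is_safe, audit_log):
    """One cycle of a monitor-analyze-plan-execute style loop with an audit trail."""
    # Analyze: decide whether the observed load calls for a reconfiguration.
    desired_replicas = max(1, round(metrics["load"] / 100.0))
    new_config = dict(current_config, replicas=desired_replicas)

    # Guard: never apply a change that violates the safety predicate.
    if new_config != current_config and is_safe(new_config):
        audit_log.append({
            "time": time.time(),
            "old": current_config,
            "new": new_config,
            "reason": f"load={metrics['load']}",
        })
        return new_config
    return current_config


if __name__ == "__main__":
    config = {"replicas": 1, "max_replicas": 10}
    log = []
    for load in (90, 250, 480, 120):
        config = self_management_step(
            config, {"load": load},
            is_safe=lambda c: 1 <= c["replicas"] <= c["max_replicas"],
            audit_log=log,
        )
    print(json.dumps(log, indent=2))
```

The sketch shows only the shape of the control loop; the research question is how to make such loops predictable when many of them interact across a coalition.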

How can organizations manage complex, dynamically changing system configurations? Coalitions of systems will be constructed through orchestration and configuration, and desired system configurations will change dynamically in response to load, indicators of system health, unavailability of components, and system-health warnings. Ways are needed to support construction by configuration, managing configuration changes and recording changes, including automated


changes from the self-management system, in real time, so an audit trail includes the configuration of the coalition at any point in time.
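To illustrate what "the configuration of the coalition at any point in time" could mean in practice, here is a small Python sketch of an append-only configuration history that records every change, whether made by an operator or a self-management system, and answers point-in-time queries. The data model is entirely our own assumption.

```python
import bisect
import time


class ConfigurationHistory:
    """Append-only record of coalition configurations, queryable by time."""

    def __init__(self, initial_config):
        self._timestamps = [0.0]
        self._configs = [dict(initial_config)]

    def record_change(self, new_config, source):
        """Record a change, whether made by an operator or by automated self-management."""
        self._timestamps.append(time.time())
        self._configs.append(dict(new_config, _changed_by=source))

    def configuration_at(self, timestamp):
        """Return the configuration that was in force at the given time."""
        index = bisect.bisect_right(self._timestamps, timestamp) - 1
        return self._configs[max(index, 0)]


if __name__ == "__main__":
    history = ConfigurationHistory({"members": ["billing", "records"]})
    history.record_change({"members": ["billing", "records", "analytics"]}, source="operator")
    t = time.time()
    history.record_change({"members": ["billing", "analytics"]}, source="self-management")
    print(history.configuration_at(t))   # reconstructs the configuration in force at time t
```

The hard problems are organizational rather than representational: agreeing who may change what, and keeping the record complete when changes are made by many independent parties.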

How should the agile engineering of coalitions of systems be supported? The business environment changes quickly in response to economic circumstances, competition, and business reorganization. Likewise, coalitions of systems must be able to change quickly to reflect current business needs. A model of system change that relies on lengthy processes of requirements analysis and approval does not work. Agile methods of programming have been successful for small- to medium-size systems where the dominant activity is systems development. For large complex systems, development processes are often dominated by coordination activities involving multiple stakeholders and engineers who are not colocated. How can agile approaches be effective for "systems development in the large" to support multi-organization global systems development?

How should coalitions of systems be regulated and certified? Many such coalitions represent critical systems, failure of which could threaten individuals, organizations, and national economies. They may have to be certified by regulators checking that, as far as possible, they do not pose a threat to their operators or to the wider systems environment. But certification is expensive. For some safety-critical systems, the cost of certification can exceed the cost of development, and certification costs will increase as systems become larger and more complex. Though certification as practiced today is almost certainly impossible for coalitions of systems, research is urgently needed into incremental and evolutionary certification so our ability to deploy critical complex systems is not curtailed by certification requirements. This issue is social, as well as technical, as societies decide what level of certification is socially and legally acceptable.

How can systems undergo "probabilistic verification"? Today's techniques of system testing and more formal analysis are based on the assumption that a system has a definitive specification and that behavior deviating from it can be recognized. Coalitions of systems have no such specification, nor is system behavior guaranteed to be deterministic. The key verification issue will not be whether the system is correct but the probability that it satisfies essential properties (such as safety), taking into account its probabilistic, real-time, nondeterministic behavior.8,11
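The cited work uses probabilistic model checkers such as PRISM; as a simpler illustration of what "the probability that a property holds" means, the sketch below estimates such a probability by Monte Carlo simulation over a toy two-state failure model. The model, rates, and property are invented for illustration and are not the verification approach of the cited papers.

```python
import random


def simulate_run(steps=100, failure_rate=0.002, repair_rate=0.5):
    """One run of a toy two-state (ok/failed) stochastic model."""
    failed = False
    for _ in range(steps):
        if not failed and random.random() < failure_rate:
            failed = True
        elif failed and random.random() < repair_rate:
            failed = False
        yield "failed" if failed else "ok"


def estimate_property(num_runs=10_000, max_downtime=3):
    """Estimate P(the system is never down for more than max_downtime consecutive steps)."""
    satisfied = 0
    for _ in range(num_runs):
        downtime, worst = 0, 0
        for state in simulate_run():
            downtime = downtime + 1 if state == "failed" else 0
            worst = max(worst, downtime)
        if worst <= max_downtime:
            satisfied += 1
    return satisfied / num_runs


if __name__ == "__main__":
    print(f"estimated probability the property holds: {estimate_property():.3f}")
```

Tools such as PRISM compute these probabilities exactly from a formal model rather than by sampling; the point of the sketch is only that the verification question becomes quantitative ("with what probability?") rather than binary ("correct or not?").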

How should shared knowledge in a coalition of systems be represented? We assume the systems in a coalition interact through service interfaces, so the system has no overarching controller. Information is encoded in a standards-based representation. The key problem will not be compatibility but understanding what the information exchange really means. This is addressed today on a system-by-system basis through negotiation between system owners to clarify the meaning of shared information. However, if dynamic coalitions are allowed, with systems entering and leaving the coalition, negotiation is not practical. The key is developing a means of sharing the meaning of information, perhaps through ontologies like those proposed by Antoniou and van Harmelen1 involving the semantic Web.
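One lightweight way to picture this, as a stand-in for the much richer semantic-Web ontologies the article points to, is for each member system to publish a mapping from its local field names onto agreed shared concepts. Everything in the sketch below (concepts, field names, records) is hypothetical.

```python
# Shared concepts the coalition has agreed on (a stand-in for a richer ontology).
SHARED_ONTOLOGY = {
    "patient-identifier": {"type": "string", "meaning": "national patient ID"},
    "admission-date": {"type": "date", "meaning": "date of hospital admission"},
}

# Each member system publishes a mapping from its local vocabulary to shared concepts.
HOSPITAL_A_MAPPING = {"pid": "patient-identifier", "admitted_on": "admission-date"}
HOSPITAL_B_MAPPING = {"nhs_number": "patient-identifier", "adm_date": "admission-date"}


def translate(record, mapping):
    """Re-express a local record in terms of shared concepts, dropping unknown fields."""
    shared = {}
    for local_name, value in record.items():
        concept = mapping.get(local_name)
        if concept in SHARED_ONTOLOGY:
            shared[concept] = value
    return shared


if __name__ == "__main__":
    record_a = {"pid": "123", "admitted_on": "2012-07-01", "ward": "C4"}
    record_b = {"nhs_number": "123", "adm_date": "2012-07-01"}
    print(translate(record_a, HOSPITAL_A_MAPPING))
    print(translate(record_b, HOSPITAL_B_MAPPING))
```

The hard problem the question identifies remains: agreeing and maintaining the shared concepts themselves when systems join and leave the coalition dynamically.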

A major problem researchers must address is lack of knowledge of what happens in real systems. High-profile failures (such as the Flash Crash) lead to inquiries, but much more needs to be known about the practical difficulties faced by developers and operators of coalitions of systems and how to address them as they arise. New ideas, tools, and methods must be supported by long-term empirical studies of the systems and their development processes to provide a solid information base for research and innovation.

The U.K. LSCITS Initiative5 addresses some of them, working with partners from the computer, financial services, and health-care industries to develop an understanding of the fundamental systems engineering problems they face. Key to this work is a long-term engagement with the National Health Information Center to create coalitions of systems to provide external access to and analysis of vast amounts of health and patient data.

The project is developing practical techniques of socio-technical systems engineering2 and exploring design for failure.18 It has so far developed practical, predictable techniques for autonomic system management3,4 and is investigating the scaling up of agile methods14 and exploring incremental system certification9 and development of techniques for system simulation and modeling.

Education. To address the practical issues of creating, managing, and operating LSCITS, engineers need knowledge and understanding of these systems and of techniques outside a "normal" software- or systems-engineering education. In the U.K., the LSCITS Initiative provides a new kind of doctoral

Figure 2. Outline structure for a master's course in LSCITS. The course spans two strands, systems engineering and business, and includes: ultra-large-scale systems; complexity; mathematical modeling; socio-technical systems; systems engineering; systems procurement; systems integration; systems resilience; organizational change; legal and regulatory issues; technology innovation; and program management, with options from computer science, engineering, psychology, and business, plus an industrial project.


degree, comparable in standard to a Ph.D. in computer science or engineering. Students get an engineering doctorate, or EngD, in LSCITS,20 with the following key differences between EngD and Ph.D.:

Industrial problems. Students must work on and spend significant time on an industrial problem; universities cannot simply replicate the complexity of modern software-intensive systems, and few faculty members have experience and understanding of such systems;

Range of courses. Students must take a range of courses focusing on complexity and systems engineering (such as for LSCITS, socio-technical systems, high-integrity systems engineering, empirical methods, and technology innovation); and

Portfolio of work. Students do not have to deliver a conventional thesis, a book on a single topic, but can deliver a portfolio of work around their selected area; it is a better reflection of work in industry and makes it easier for the topic to evolve as systems change and new research emerges.

However, graduating a few advanced doctoral students is not enough. Universities and industry must also create master's courses that educate complex-systems engineers for the coming decades; our thoughts on what might be covered are outlined in Figure 2. The courses must be multidisciplinary, combining engineering and business disciplines. It is not only the knowledge the disciplines bring that is important but also that students be sensitized to the perspectives of a variety of disciplines and so move beyond the silo of single-discipline thinking.

Conclusion

Since the emergence of widespread networking in the 1990s, all societies have grown increasingly dependent on complex software-intensive systems, with failure having profound social and economic consequences. Industrial organizations and government agencies alike build these systems without understanding how to analyze their behavior and without appropriate engineering principles to support their construction.

The SEI ULSS report13 argued that current engineering methods are inadequate, saying: "For 40 years, we have embraced the traditional engineering perspective. The basic premise underlying the research agenda presented in this document is that beyond certain complexity thresholds, a traditional centralized engineering perspective is no longer adequate nor can it be the primary means by which ultra-complex systems are made real." A key contribution of our work in LSCITS is articulating the fundamental reasons this assertion is true. By examining how engineering is grounded in the philosophical notion of reductionism and how reductionism breaks down in the face of complexity, we have shown why traditional software-engineering methods will inevitably fail when used to develop LSCITS. Current software engineering is simply not good enough. We need to think differently to address the urgent need for new engineering approaches to help construct large-scale complex coalitions of systems we can trust.

Acknowledgments

We would like to thank our colleagues Gordon Baxter and John Rooksby of St. Andrews University in Scotland and Hillary Sillitto of Thales Land & Joint Systems U.K. for their constructive comments on drafts of this article. The work reported here was partially funded by the U.K. Engineering and Physical Sciences Research Council (www.epsrc.ac.uk), grant EP/F001096/1.

References

1. Antoniou, G. and van Harmelen, F. A Semantic Web Primer, Second Edition. MIT Press, Cambridge, MA, 2008.

2. Baxter, G. and Sommerville, I. Socio-technical systems: From design methods to systems engineering. Interacting with Computers 23, 1 (Jan. 2011), 4–17.

3. Calinescu, R., Grunske, L., Kwiatkowska, M., Mirandola, R., and Tamburrelli, G. Dynamic QoS management and optimisation in service-based systems. IEEE Transactions on Software Engineering 37, 3 (Mar. 2011), 387–409.

4. Calinescu, R. and Kwiatkowska, M. Using quantitative analysis to implement autonomic IT systems. In Proceedings of the 31st International Conference on Software Engineering (Vancouver, May). IEEE Computer Society Press, Los Alamitos, CA, 2009, 100–110.

5. Cliff, D., Calinescu, R., Keen, J., Kelly, T., Kwiatkowska, M., McDermid, J., Paige, R., and Sommerville, I. The U.K. Large-Scale Complex IT Systems Initiative, 2010; http://lscits.cs.bris.ac.uk/docs/lscits_overview_2010.pdf

6. Cliff, D. and Northrop, L. The Global Financial Markets: An Ultra-Large-Scale Systems Perspective. Briefing paper for the U.K. Government Office for Science Foresight Project on the Future of Computer Trading in the Financial Markets, 2011; http://www.bis.gov.uk/assets/bispartners/foresight/docs/computer-trading/11-1223-dr4-global-financial-markets-systems-perspective.pdf

7. Commodity Futures Trading Commission and Securities and Exchange Commission (U.S.). Findings Regarding the Market Events of May 6th, 2010. Report of the CFTC and SEC to the Joint Advisory Committee on Emerging Regulatory Issues, 2010; http://www.sec.gov/news/studies/2010/marketevents-report.pdf

8. Ge, X., Paige, R.F., and McDermid, J.A. Analyzing system failure behaviors with PRISM. In Proceedings of the Fourth IEEE International Conference on Secure Software Integration and Reliability Improvement Companion (Singapore, June). IEEE Computer Society Press, Los Alamitos, CA, 2010, 130–136.

9. Ge, X., Paige, R.F., and McDermid, J.A. An iterative approach for development of safety-critical software and safety arguments. In Proceedings of Agile 2010 (Orlando, FL, Aug.). IEEE Computer Society Press, Los Alamitos, CA, 2010, 35–43.

10. Goth, G. Ultralarge systems: Redefining software engineering. IEEE Software 25, 3 (May 2008), 91–94.

11. Kwiatkowska, M., Norman, G., and Parker, D. PRISM: Probabilistic model checking for performance and reliability analysis. ACM SIGMETRICS Performance Evaluation Review 36, 4 (Oct. 2009), 40–45.

12. Maier, M.W. Architecting principles for system of systems. Systems Engineering 1, 4 (Oct. 1998), 267–284.

13. Northrop, L. et al. Ultra-Large-Scale Systems: The Software Challenge of the Future. Technical report. Carnegie Mellon University Software Engineering Institute, Pittsburgh, PA, 2006; http://www.sei.cmu.edu/library/abstracts/books/0978695607.cfm

14. Paige, R.F., Charalambous, R., Ge, X., and Brooke, P.J. Towards agile development of high-integrity systems. In Proceedings of the 27th International Conference on Computer Safety, Reliability, and Security (Newcastle, U.K., Sept.). Springer-Verlag, Heidelberg, 2008, 30–43.

15. Rittel, H. and Webber, M. Dilemmas in a general theory of planning. Policy Sciences 4 (Oct. 1973), 155–173.

16. Rushby, J. Software verification and system assurance. In Proceedings of the Seventh IEEE International Conference on Software Engineering and Formal Methods (Hanoi, Nov.). IEEE Computer Society Press, Los Alamitos, CA, 2009, 1–9.

17. Sillitto, H.T. In Proceedings of the 20th International Council on Systems Engineering International Symposium (Chicago, July). Curran & Associates, Inc., Red Hook, NY, 2010.

18. Sommerville, I. Designing for Recovery: New Challenges for Large-Scale Complex IT Systems. Keynote address, Eighth IEEE Conference on Composition-Based Software Systems (Madrid, Feb. 2008); http://sites.google.com/site/iansommerville/keynote-talks/designingForrecovery.pdf

19. U.K. Cabinet Office. Programme Assessment Review of the National Programme for IT. Major Projects Authority, London, 2011; http://www.cabinetoffice.gov.uk/resource-library/review-department-health-national-programme-it

20. University of York. The LSCITS Engineering Doctorate Centre, York, England, 2009; http://www.cs.york.ac.uk/engd/

Ian Sommerville ([email protected]) is a professor in the School of Computer Science, St. Andrews University, Scotland.

Dave Cliff ([email protected]) is a professor in the Department of Computer Science, Bristol University, England.

Radu Calinescu ([email protected]) is a lecturer in the Department of Computer Science, Aston University, England.

Justin Keen ([email protected]) is a professor in the School of Health Informatics, Leeds University, England.

Tim Kelly ([email protected]) is a senior lecturer in the Department of Computer Science, York University, England.

Marta Kwiatkowska ([email protected]) is a professor in the Department of Computer Science, Oxford University, England.

John McDermid ([email protected]) is a professor in the Department of Computer Science, York University, England.

Richard Paige ([email protected]) is a professor in the Department of Computer Science, York University, England.

© 2012 ACM 0001-0782/12/07 $15.00