Bran Selic, Rational Software Canada, [email protected]
Physical Programming: Beyond Mere Logic
Post on 20-Dec-2015
2
What I am Hoping For
[Figure text: THE THEORY AND PRACTICE OF SOFTWARE]
3
The Ideal and the Real
• By focussing on the imperfect world of physical reality we may miss the essence
• Software seems much closer to the “ideal” world
4
The Software World
• Fundamental design principle: separate program logic from the underlying implementation technology
  • separation of concerns
  • software portability
[Figure labels: Program Logic; HL Programming Languages; Computing Environment & Technology]
5
The Real-Time Software World
• Key question: How long will it take?
• The quantitative characteristics of the computing environment encroach upon the purity of the logic
  • software design involves engineering tradeoffs
[Figure labels: Program Logic; HL Programming Languages; Computing Environment & Technology]
6
A Simple Programming Application
• Traverse a transactions log database and print all transactions pertaining to a specific account

    open (DB);
    for i := 1 to DB.size do
        record := read (DB);
        if (record.acctNo = myAccount) then
            print (record);
    enddo;
    close (DB);

[Figure labels: CPU; DB; Printer]
7
Porting to a Distributed Environment
• Can it really be this simple?

Original program:

    open (DB);
    for i := 1 to DB.size do
        record := read (DB);
        if (record.acctNo = myAccount) then
            print (record);
    enddo;
    close (DB);

Distributed version:

    RPC_open (DB);
    for i := 1 to DB.size do
        record := RPC_read (DB);
        if (record.acctNo = myAccount) then
            print (record);
    enddo;
    RPC_close (DB);

[Figure: a CPU with a Printer, connected through a Network to replicated DB servers, each with its own CPU]
8
Some (Unstated!) Assumptions
• The CPU and database are fast enough for the needs of the application
  • e.g., random access database hardware
• The CPU and database fail as a unit
  • i.e., no need to contend with failures of the database
• Communication is reliable
  • order preserving
  • exactly-once semantics
• A system never has anything more important to do than what it is doing at the moment
9
Partial Failures
• Distributed systems can exhibit partial failures
  • fault tolerance: ability to recover from partial failures
• Issue: failure recovery strategy
  • fault detection
  • failure recovery
  • fault diagnosis
• Issue: how do other sites detect that a site has failed?
  • (apparent) lack of activity/response
  • how do we distinguish between a failed site and a lost message?
    • Timeout is the only general mechanism available
  • how long do we wait?
    • Tradeoff between responsiveness and degree of certainty
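The timeout tradeoff above can be made concrete with a small sketch. This is an illustration only, not something taken from the slides: the class name `HeartbeatMonitor`, the heartbeat convention, and the injectable `clock` parameter are all assumptions introduced here.

```python
import time


class HeartbeatMonitor:
    """Suspects a site has failed when no heartbeat arrived within `timeout` seconds.

    Hypothetical helper: a short timeout reacts quickly but raises more false
    alarms; a long timeout is more certain but less responsive.
    """

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock        # injectable so the sketch can be exercised deterministically
        self.last_seen = {}       # site name -> time of the last heartbeat

    def heartbeat(self, site):
        """Record that `site` showed signs of activity just now."""
        self.last_seen[site] = self.clock()

    def suspected(self, site):
        """True if `site` has been silent for longer than the timeout (or was never seen)."""
        last = self.last_seen.get(site)
        if last is None:
            return True
        return self.clock() - last > self.timeout
```

Note that `suspected` can only report silence. It cannot tell a failed site from a lost or delayed message, which is exactly the ambiguity the slide points out.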
10
A More Realistic Distribution Scenario
• Dealing with partial failures

    DB := locate_database (Network) exception abort;
    RPC_open (DB) exception do
        DB := locate_database (Network) exception abort;
    enddo;
    for i := 1 to DB.size do
        record := RPC_read (DB) exception do
            DB := locate_database (Network) exception abort;
            for j := 1 to (i-1) do
                RPC_read (DB) exception abort;
            retry;
        enddo;
        if (record.acctNo = myAccount) then
            print (record);
    enddo;
    RPC_close (DB);

• Most of the code is in the exception handlers!
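A rough Python rendering of the same failover idea is sketched below. It is not a translation of the slide: `SiteFailure`, `Replica`, and `read_account` are hypothetical names, the replicas are assumed to hold identical data, and the sketch uses indexed reads where the slide repositions the cursor by re-reading the first i-1 records.

```python
class SiteFailure(Exception):
    """Raised by a (hypothetical) replica that has stopped responding."""


class Replica:
    """In-memory stand-in for a remote database; fails when asked for index `fail_at`."""

    def __init__(self, records, fail_at=None):
        self.records = records
        self.size = len(records)
        self.fail_at = fail_at

    def read(self, index):
        if index == self.fail_at:
            raise SiteFailure(f"no response while reading record {index}")
        return self.records[index]


def read_account(locate_database, my_account):
    """Collect all records for `my_account`, switching replicas on failure.

    `locate_database()` returns a reachable replica; if it raises, the whole
    operation aborts, mirroring the 'exception abort' clauses in the pseudocode.
    """
    db = locate_database()
    matches = []
    i = 0
    while i < db.size:
        try:
            record = db.read(i)
        except SiteFailure:
            db = locate_database()   # find another replica, or abort by raising
            continue                 # retry the same index on the new replica
        if record["acctNo"] == my_account:
            matches.append(record)
        i += 1
    return matches
```

Even in this reduced form, the recovery path shapes the loop: the plain `for` becomes a `while` so that a failed read can be retried without advancing.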
11
Asynchronous Events and Fault Tolerance
• Partial system failures are only one kind of event that may need to be handled in the course of execution of a distributed program
• Others:
  • high-priority situations (e.g., imminent deadlines)
  • aborts
• These events are often unpredictable
  • may occur at any point in the execution of a program
  • fault tolerance requires that whenever they occur and whatever they are, we need to deal with them
12
Revisiting An Old Assumption
• Is the traditional “main path” focussed programming style appropriate when exceptions are the rule?
[Figure: main path Step N, Step N+1, Step N+2; exceptions branch off to Handler A_N, Handler A_N+1, and Handler B]
13
Asynchronous Event Handling
• This is nicely captured by the state-event matrix of finite state machines
[Figure: state-event matrix. Rows: Step N, Step N+1, Step N+2. Columns: Event A, Event B, ..., Event S. Cells: Handler A_N, Handler A_N+1, Handler A_N+2, Handler B, etc.]
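A state-event matrix maps naturally onto a lookup table keyed by (state, event). The sketch below is one possible rendering. The particular states, events, and handler names are invented for illustration and loosely follow the database-reading example from earlier slides.

```python
from enum import Enum, auto


class State(Enum):
    CLOSED = auto()
    READING = auto()
    RECOVERING = auto()


class Event(Enum):
    OPEN_OK = auto()
    SITE_FAILED = auto()
    RELOCATED = auto()
    DONE = auto()


# The matrix: one cell per (state, event) pair, holding (handler name, next state).
MATRIX = {
    (State.CLOSED, Event.OPEN_OK): ("start_reading", State.READING),
    (State.CLOSED, Event.SITE_FAILED): ("locate_replica", State.RECOVERING),
    (State.READING, Event.SITE_FAILED): ("locate_replica", State.RECOVERING),
    (State.READING, Event.DONE): ("close", State.CLOSED),
    (State.RECOVERING, Event.RELOCATED): ("resume_reading", State.READING),
    (State.RECOVERING, Event.SITE_FAILED): ("locate_replica", State.RECOVERING),
}


def step(state, event):
    """Look up the cell for (state, event); pairs without a cell are ignored."""
    return MATRIX.get((state, event), ("ignore", state))


def run(events, state=State.CLOSED):
    """Feed a sequence of events through the matrix and record which handlers fire."""
    trace = []
    for event in events:
        handler, state = step(state, event)
        trace.append(handler)
    return state, trace
```

Every event has a defined response in every state, so a failure arriving at an unexpected moment is just another cell in the table, with no separate exception path bolted onto a main line.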
14
A Conclusion
• In an event-driven and deadline-based application, a state machine-based programming model may be more appropriate than the traditional algorithmic (“main path”) programming model
• The environment strikes back
  • the program logic is strongly affected by the environment
15
Communication Media Failures
• Message loss
  • due to hardware failures
  • due to software failures (e.g., buffer overflow)
• Message reordering
  • due to different paths
  • due to variable delays (e.g., due to variable message lengths)
  • retransmission due to fault-tolerant protocols
• Message duplication
  • due to faulty hardware
  • retransmission due to fault-tolerant protocols
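One common way a receiver copes with these three failure modes is to number messages. The slides do not prescribe this technique. The sketch below assumes the sender attaches consecutive sequence numbers starting at 0, and `OrderedReceiver` is a hypothetical name.

```python
class OrderedReceiver:
    """Delivers messages in sequence-number order and drops duplicates."""

    def __init__(self):
        self.next_seq = 0     # lowest sequence number not yet delivered
        self.pending = {}     # out-of-order messages held until the gap fills
        self.delivered = []   # payloads in delivery order

    def receive(self, seq, payload):
        """Accept one message from the medium, in whatever order it arrives."""
        if seq < self.next_seq or seq in self.pending:
            return            # duplicate: already delivered or already buffered
        self.pending[seq] = payload
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1

    def missing(self):
        """Sequence numbers below the highest one seen that have not arrived yet."""
        if not self.pending:
            return []
        return [s for s in range(self.next_seq, max(self.pending))
                if s not in self.pending]
```

Reordering and duplication are masked completely, but a lost message only shows up as a gap. The receiver still cannot know whether the missing message is late or gone, so it falls back on a timeout.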
16
Transmission Delays
• Possibility of out of date status information
[Figure: an observer sends “State?” to a Processing Site that switches between on and off, and receives the reply “on”]
17
Relativistic Effects
• Relativistic effects: different observers see different event orderings (due to different and variable transmission delays)
[Figure: timeline with clientA, notifier1, notifier2, and clientB; events E1 and E2 reach the two clients in different orders]
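The effect is easy to reproduce with arithmetic on send times and delays. In this sketch the send times and delay values are made up for illustration, and `observed_order` is a hypothetical helper.

```python
def observed_order(events, delays):
    """Return event names in the order one observer receives them.

    events: {name: send_time}
    delays: {name: transmission delay from the event's source to this observer}
    """
    arrivals = {name: sent + delays[name] for name, sent in events.items()}
    return sorted(arrivals, key=arrivals.get)


# E1 is emitted before E2, as in the figure.
events = {"E1": 0.0, "E2": 1.0}

# clientA is equally close to both notifiers; clientB has a slow path for E1.
client_a = observed_order(events, {"E1": 0.5, "E2": 0.5})
client_b = observed_order(events, {"E1": 3.0, "E2": 0.5})
```

Here `client_a` sees E1 then E2, while `client_b` sees E2 then E1, although both received exactly the same two notifications.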
18
Distribution Transparencies
• Providing supporting layers of functionality that shield the application from the undesirable effects of distribution
  • e.g., reliable communication protocols
[Figure: a client and a server on separate Processing Sites, each above a Reliable Comm Service, connected by the Communications Medium]
19
Impossibility Result No. 1
It is not possible to guarantee that agreement can be reached in finite time over an asynchronous communication medium, if the medium is lossy or one of the distributed sites can fail
Fischer, M., N. Lynch, and M. Paterson, “Impossibility of Distributed Consensus with One Faulty Process” Journal of the ACM, (32, 2) April 1985.
20
Impossibility Result No. 2
Even when communication is fully reliable, it is not possible to guarantee common knowledge if communication delays are unbounded
Halpern, J.Y, and Moses, Y., “Knowledge and common knowledge in a distributed environment” Journal of the ACM, (37, 3) 1990.
21
The “End-To-End” Argument
• Transparency mechanisms are intended to protect the application from observing the undesirable effects of distribution
  • Most transparency types require distributed agreement!
• The end-to-end argument [Saltzer et al.]:
  • if transparency cannot be guaranteed, the application is not really shielded from the effects of distribution
  • the overhead of introducing transparency mechanisms may not be justified
22
Stepping Back...
• Most distribution problems are a consequence of the encroachment of the physical world into the pliable and limitless “logical” world of software
  • the problem is fundamental (e.g., the end-to-end argument)
Traditional Programming = Logic
Physical Programming = Logic + Physics
• like traditional engineers, software designers must take into account the raw material out of which they spin their logic
  • finite resources, finite delays, finite reliability...
23
Quality of Service Concepts
• The physical characteristics of software can be specified using the general notion of Quality of Service (QoS):
  • a specification of how well a service is (to be) performed
  • e.g., throughput, capacity, response time
  • usually a quantitative measure
• QoS specifications are two-sided:
  • offered QoS: the QoS that is offered to clients
  • required QoS: the QoS required by a client
24
Resources and Quality of Service
• Resource: an element whose functional capacity is limited, directly or indirectly, by the finite capacities of the underlying physical computing environment
• The services of a resource are characterized by one or more QoS attributes
  • capacity, reliability, availability, response time, etc.
[Figure: a Client places a Resource Demand on a Resource through service S1; the client's RequiredQoS is matched against the resource's OfferedQoS, annotated {RequiredQoS OfferedQoS} (the relation symbol between the two terms did not survive extraction)]
25
Simple Example
• Concurrent tasks accessing a monitor with known response time characteristics
[Figure: Client1 and Client2 each call access ( ) on myMonitor. Required QoS: {Deadline = 5 ms} and {Deadline = 3 ms}. Offered QoS: {MaxExecutionTime = 4 ms}]
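The numbers in this example can be checked mechanically. The sketch below is one simple reading of it, under an assumption the slide does not state: the monitor is non-preemptive and each competing client may hold it once, for the full worst-case execution time, before our call is served. The function names are hypothetical.

```python
def worst_case_response_ms(max_execution_ms, competing_clients=0):
    """Worst-case time to complete one access, including waiting for competitors.

    Assumption: every competing client gets in first and runs for the full
    worst-case execution time before our own call executes.
    """
    return max_execution_ms * (competing_clients + 1)


def meets_deadline(deadline_ms, max_execution_ms, competing_clients=0):
    """True if the required QoS (deadline) is covered by the offered QoS."""
    return worst_case_response_ms(max_execution_ms, competing_clients) <= deadline_ms


# Offered QoS of myMonitor, from the slide.
MAX_EXECUTION_MS = 4

five_ms_alone = meets_deadline(5, MAX_EXECUTION_MS)          # 4 ms <= 5 ms
three_ms_alone = meets_deadline(3, MAX_EXECUTION_MS)         # 4 ms >  3 ms
five_ms_contended = meets_deadline(5, MAX_EXECUTION_MS, 1)   # 8 ms >  5 ms
```

Under these assumptions the 3 ms deadline can never be guaranteed, and the 5 ms deadline holds only while the other client stays out of the monitor. A real schedulability analysis would also account for priorities and arrival patterns.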
26
Types and “Physical” Types
• The purpose of types is to tell us about the externally relevant properties of software components so that we can validate whether they are being used appropriately
• Physical types: type specifications that incorporate QoS characteristics
• Answer two key engineering questions:
  • can this component support the “load” intended for it?
  • what does this component require to support its offered QoS?
27
Physical Type Example
• A semaphore type:

    class Semaphore {
        {heap = 10 bytes}    -- required QoS
        {CPU 5 MIPS}         -- required QoS
        get() {proc 0.4*CPU us; stack = 4 bytes};
        rel() {proc 0.4*CPU us; stack = 4 bytes};
    }

• Usage:

    mySema : Semaphore;
    mySema.get() {proc 3 us}    -- req. QoS
28
Violation of Encapsulation?
• Aren’t the offered QoS characteristics a consequence of the implementation?
• Not necessarily...
• The offered QoS characteristics can and should be defined independently of the implementation
  • the “worst-case” numbers of traditional engineering
• The contractual obligations that the component designer is willing to assume
29
Physical Type Checking
• Can physical types be statically checked?
• The good news: Yes, they can (in most cases)
• The bad news: typically requires complex analysis methods (queueing network analysis, schedulability analysis, etc.)
  • but then, model checking and theorem proving are not simple either
• Some issues:
  • Typically, QoS-based analyses cannot be done incrementally: the full system context is required
    • but then, the same holds for many formal verification methods
  • Each type of QoS (e.g., bandwidth, CPU performance) combines differently
30
Required QoS
• Like all guarantees, the offered QoS is contingent on the component getting what it needs to do its job
• There are two distinct dimensions to this:
  • the peer dimension
  • the layering dimension
[Figure: a Client uses ResourceA through service S1, and ResourceA uses ResourceB through service S2; the resources run on a CPU provided by the Physical Processor]
31
Logical Viewpoint
• Example: logical view of aircraft simulator software
[Figure: logical components Instructor Station, Airframe, Ground Model, Atmosphere Model, Engines, Control Surfaces, Pilot Controls]
32
Engineering (Realization) Viewpoint
• The realization of a specific set of logical components using facilities of the run-time environment
[Figure: two Processors connected by an Ethernet LAN; each hosts OS processes with their stacks, communicating through TCP/IP sockets]
33
Viewpoints and Mappings
[Figure: the Logical Viewpoint (Instructor Station, Airframe, Ground Model, Atmosphere Model, Engines, Control Surfaces, Pilot Controls) is connected by realization mappings to the Engineering Viewpoint (Processors, Ethernet LAN, OS processes with stacks, TCP/IP sockets)]
34
The Engineering Viewpoint
• The engineering viewpoint represents the “raw material” out of which we construct the logical viewpoint
  • the quality of the outcome is only as good as the quality of the ingredients that are put in
  • as in all true engineering, the quantitative aspects of the logical model are often crucial (How long will it take? How much will be required?…)
35
Distributed Systems Dilemma
• Dilemma: How can we account for the engineering characteristics of the system without prematurely and possibly unnecessarily committing to a specific technology?
• Proposed solution: Include in the logical model a generic (technology-neutral) specification of the required/expected characteristics of the engineering environment
36
Viewpoint Separation
• Required Environment: a technology-neutral environment specification required by the logical elements of a model
[Figure: the Logical Viewpoint sits on a Required Environment layer, which can be realized by Engineering Viewpoint (alternative A), built from UNIX processes, or by Engineering Viewpoint (alternative B), built from WinNT processes]
37
Required Environment Specifications
• What a logical component needs in order to perform its function according to spec
[Figure: the logical element (client) Airframe has required QoS values CPU: 3 MIPs, Bandw.: 70 Mbit/s, Mem: 2 MB. A realization mapping binds it to the engineering elements (resources) CPU and LAN, whose offered QoS values are 20 MB, 3 MIPs, and 100 Mbit/s]
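Checking a realization mapping against such a specification amounts to comparing required and offered values attribute by attribute. The sketch below uses the figures from this slide. The `QoS` record, its field names, and the "offered must be at least required" rule for every attribute are assumptions made for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class QoS:
    """Hypothetical QoS record; all three attributes are 'more is better' capacities."""
    cpu_mips: float
    bandwidth_mbit_s: float
    memory_mb: float


def shortfalls(required, offered):
    """Return {attribute: (required, offered)} for every attribute that falls short."""
    req, off = asdict(required), asdict(offered)
    return {name: (req[name], off[name]) for name in req if off[name] < req[name]}


def satisfies(required, offered):
    """True if the offered environment covers every required value."""
    return not shortfalls(required, offered)


# Values from the slide: what Airframe requires and what the CPU and LAN offer.
airframe_required = QoS(cpu_mips=3.0, bandwidth_mbit_s=70.0, memory_mb=2.0)
environment_offered = QoS(cpu_mips=3.0, bandwidth_mbit_s=100.0, memory_mb=20.0)

mapping_ok = satisfies(airframe_required, environment_offered)
```

With the slide's numbers the mapping is acceptable, although the CPU requirement is met with no margin. If several logical elements are mapped onto the same resource, their demands would have to be combined before the comparison.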
38
Required Environment Partitions
• Logical elements often share common QoS requirements
[Figure: the logical components (Instructor Station, Airframe, Ground Model, Atmosphere Model, Engines, Control Surfaces, Pilot Controls) grouped into QoS domains (e.g., failure unit, uniform comm properties)]
39
QoS Domains
• Specify a domain in which certain QoS values apply throughout:
  • failure characteristics (failure modes, availability, reliability)
  • CPU speeds
  • communications characteristics (delay, throughput, capacity)
  • etc.
• The QoS values of a domain can be compared against those of a concrete engineering environment to see if a given environment is adequate for a specific model
40
“Physical” Programming
• The notions of QoS and QoS domains enable the design of distributed systems that properly account for the effects of distribution and other non-transparent physical phenomena, while allowing for a high degree of portability and technology independence
• They are also the basis for formal verification of realization mappings
  {required QoS QoS of the proposed engineering environment} (the relation symbol between the two terms did not survive extraction)
• May also be used to automatically synthesize engineering environments that satisfy a given QoS specification of a logical model
41
Conclusions and an Appeal...
• The physical aspects of software will not go away
  • ignoring them can be perilous, especially when working with distributed systems
  • most interesting software systems of the future will be distributed and will have stringent dependability requirements (“cannot reboot the Internet”)
• What is needed is a proper theoretical framework for dealing with physical types
• The QoS framework described here is currently being incorporated into a profile of UML for real-time applications