Towards A Configuration Specification Language for Internet Systems

Archana Ganapathi(archanag@cs.berkeley.edu)

Motivation – Internet Services

• Failures impact
  • availability
  • end user satisfaction
  • economic repercussions
• Predominant causes
  • human operator
  • software

[Oppenheimer et al. Architecture, operation, and dependability of large-scale Internet services: three case studies. IEEE Internet Computing special issue on Global Deployment of Data Centers, September/October 2002.]

Recap: Service Failure Cause

Online (Total: 61 failures in 12 months): Hardware 26%, Software 28%, Unknown 11%, Operator 35%

Content (Total: 56 failures in 3 months): Hardware 4%, Software 26%, Unknown 32%, Operator 38%

[Failure Analysis of Two Internet Services - Winter 2003 ROC Research Group Retreat, Granlibakken, CA, January 2003.]

Case Study of Mis-configurations

• ~25 problems from Online & Content
• Errors in component-specific configuration
• Multi-component configuration inconsistency
• Non-configuration failure solvable by reconfiguration?

Configuration Scenarios

• Never intended: Unacceptable behavior
• Anticipated and tested: Problems with solutions (e.g. recovery code)
• Anticipated but not tested: Rare occurrence, high cost of testing
• Never anticipated: New/evolving environments/interactions

Configuration Tools

Apple Netinstall, BCFG, BCONFIG, BigFix, Cfengine, EDG Fabric Management, Grid Weaver, HP Utility DataCentre, ISconf, Jumpstart/Kickstart, LCFG, Microsoft SMS, Netcool, Novadigm Radia, NPACI Rocks, Psgconf, Quattor, Radmind, REMBO, Rdist, RPM, Rsync, SmartFrog, SUE, System Imager, SysTracker, Tivoli, Unison, Xhier, Zenworks

Configuration Languages:

Windows Registry:
    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\10.0\Word\InstallRoot]
    "Path"="C:\\Program Files\\Microsoft Office\\Office10\\"

Shell Script:
    if (! $?YPDOMAIN && -r $LOGHOME/.domainname) then
        setenv YPDOMAIN `cat $LOGHOME/.domainname`
        if ("$YPDOMAIN" == "") unsetenv YPDOMAIN
    endif

XML:
    <server>
        <server-name>oski</server-name>
        <num-connections>3</num-connections>
    </server>
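As a rough illustration of what programmatic access to such configuration data can look like, the sketch below reads the XML fragment above with Python's standard xml.etree.ElementTree module. The function name load_server_config and the dictionary keys are hypothetical and not part of the talk.

```python
import xml.etree.ElementTree as ET

# The XML fragment from the slide, embedded as a string for the example.
CONFIG_XML = """
<server>
    <server-name>oski</server-name>
    <num-connections>3</num-connections>
</server>
"""


def load_server_config(text):
    """Parse a <server> element into a plain dictionary (hypothetical helper)."""
    root = ET.fromstring(text.strip())
    return {
        "server_name": root.findtext("server-name"),
        "num_connections": int(root.findtext("num-connections")),
    }


config = load_server_config(CONFIG_XML)
print(config)
```

Once the values are in a typed structure, checks such as "num_connections must be a positive integer" become ordinary code rather than manual inspection.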

Configuration Needs

• Account for Human Component
• Dynamic Monitoring of System Functionality
• Authenticate Privacy and Integrity
• Programmatic Manipulation of Configuration Data
• Domain Independence

Configuration Needs contd.

User Intent rather than Low Level Assembly Language

Intra-Configuration Constraints (Consistency)

Inter-Configuration Constraints (Conformity)

Formalization and Automatic Derivation

Desired Language Features

Descriptive:
• Capture inter- and intra-component interactions
• User intent and assertions for proper behavior
• Expressions for failure models & recovery code
• Temporal event relationships

Prescriptive:
• Recovery mechanisms for anticipated events
• “Software TDR”

“Learning” Model

[Diagram. Labels: Internet System; Service Requests; Response; Configuration Generator; System spec; Services spec; Event logs; Error Models; Configuration Files/Software; Operator modifications]

LISA Framework

• Formal models for configurations in IS
  • Recovery handlers
  • Assertions & consistency checking
  • Coverage/utilization
• Uncover pitfalls in configuration APIs
  • Dependence analysis
  • Conformity checks
• Use LISA verification modules to authenticate changes

LISA Statement Structure

pre_condition ==> rule_body

• Pre_conditions = temporal sequences.
• Rule_body = action handlers invoked upon matching pattern.

Example:
• pre-condition: “A->B: ping” is not followed by “B->A: ‘I’m alive’” within 5 sec
• rule body: A should time out and try C instead.
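The pre-condition/rule-body example can be sketched in Python as a check over a recorded event log. This is only an illustration under stated assumptions: the event tuple layout, the function name choose_target, and the inclusive 5-second bound are hypothetical choices, not LISA semantics taken from the talk.

```python
def choose_target(events, ping_time, timeout=5.0):
    """Return "B" if B answered A's ping in time, otherwise fall back to "C".

    events is a list of (timestamp, sender, receiver, message) tuples
    (an assumed log format). The pre-condition matches when no
    "I'm alive" reply from B to A is seen within `timeout` seconds
    of the ping; the rule body then redirects A to C.
    """
    for timestamp, sender, receiver, message in events:
        in_window = ping_time <= timestamp <= ping_time + timeout
        if sender == "B" and receiver == "A" and message == "I'm alive" and in_window:
            return "B"  # pre-condition does not match; keep using B
    return "C"  # pre-condition matched: time out and try C instead


log_ok = [(0.0, "A", "B", "ping"), (3.0, "B", "A", "I'm alive")]
log_late = [(0.0, "A", "B", "ping"), (7.0, "B", "A", "I'm alive")]
print(choose_target(log_ok, ping_time=0.0))
print(choose_target(log_late, ping_time=0.0))
```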

Language Features

• IS events and transactions
• specify event order and transactions
• temporal sequences with references to past and future
• logic connectives (and, or, not operators)
• repetition, concatenation and overlap of sequences
• sequence vs con-sequence

LISA syntax

LISA_Statement ::= Assertion Action

Action ::= ==> {<ok message>, <recovery code>} | ε

Assertion ::= assert Property @ ISA_clk ;

Property ::= Sequential_Expression | Logical_Expression | Temporal_Operation

LISA Operators

Logical: and(&), or(|), not(~)

Sequential: concatenation(;), overlap(:)

Implication:
    ->  -- logical if or sequential implication
    <-> -- logical iff implication
    =>  -- temporal ‘next’ implication

Extended Regular Expressions:
    *  -- 0 or more repetition
    +  -- 1 or more repetition
    ?  -- optional
    [] -- count qualifier

LISA Semantics

Semantics defined by a model represented by the triple <A, F, S>.

• A is a non-empty set of atomic propositions.
• S is a finite set of states.
• F is a function that maps each state from S to the alphabet 2^A, with a set of valid atomic propositions. F: S → 2^A

f ⊨ b: Boolean expression b holds under the truth assignment represented by f

    f ⊨ b        ⇔  b ∈ f
    f ⊨ ¬b       ⇔  f ⊭ b
    f ⊨ b1 & b2  ⇔  f ⊨ b1 and f ⊨ b2
    f ⊨ b1 | b2  ⇔  f ⊨ b1 or f ⊨ b2
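The satisfaction rules above translate almost directly into a recursive evaluator. In this minimal sketch a truth assignment f is a Python set of atomic proposition names, and expressions are nested tuples; the tuple encoding and the function name holds are assumptions made for illustration.

```python
def holds(f, b):
    """Return True if Boolean expression b holds under truth assignment f.

    f is a set of atomic propositions that are true in the current state.
    b is either an atom (a string) or a tuple:
    ("not", e), ("and", e1, e2), or ("or", e1, e2).
    """
    if isinstance(b, str):
        return b in f  # atomic case: b holds iff b is in f
    op = b[0]
    if op == "not":
        return not holds(f, b[1])
    if op == "and":
        return holds(f, b[1]) and holds(f, b[2])
    if op == "or":
        return holds(f, b[1]) or holds(f, b[2])
    raise ValueError(f"unknown operator: {op!r}")


state = {"ping", "up"}
print(holds(state, ("and", "ping", ("not", "pong"))))
```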

Examples

• If a is True intermittently or continuously for 3 ISA_cycles, then after that b must be True within 4 ISA_cycles, unless c happened in the meantime.
    assert always ((a[1..3]) => b[1..4] | c) @ISA_clk

• Byzantine fault tolerance: checking that n > 3f always holds [Castro & Liskov]
    assert always (up_nodes > 3*const_f)
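The Byzantine fault tolerance assertion is an invariant over every observed state. A minimal Python sketch of such an "always" check over a recorded trace follows; the trace format, the constant CONST_F = 1, and the helper name first_violation are hypothetical.

```python
CONST_F = 1  # assumed number of tolerated faulty nodes


def first_violation(trace, predicate):
    """Return the index of the first state violating predicate, or None.

    An 'always' assertion holds on the trace exactly when this returns None.
    """
    for index, state in enumerate(trace):
        if not predicate(state):
            return index
    return None


def enough_nodes(state):
    """The invariant from the slide: up_nodes > 3 * const_f."""
    return state["up_nodes"] > 3 * CONST_F


trace = [{"up_nodes": 4}, {"up_nodes": 5}, {"up_nodes": 3}]
print(first_violation(trace, enough_nodes))
```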

Examples contd.

• Network property to guarantee “free of routing loops”: at most one entry in table, count less than number of nodes in network.
    assert always {(seqa < seqb) - (seqa = seqb ^ hop_a > hop_b)}

• Perfect failure detector protocol for completely synchronous systems [Fetzer]; to verify the status of a system component c, a configuration process asserts function ISA_f(c) == “up”.
    function ISA_f (component c) {
        send ping to c;
        wait
            on receive pong from c return "up";
            after 2*τ return "crashed";
    }
    always (on receive ping from sender send pong to sender);
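The ping/pong failure detector can be approximated in Python with two in-process queues standing in for the network. This is a sketch under assumptions: the queues, the responder thread, the probe helper, and the default τ of 0.05 seconds are all hypothetical, and any reply other than pong is treated as "crashed" for simplicity.

```python
import queue
import threading


def isa_f(to_c, from_c, tau):
    """Send ping to c; return "up" on pong, "crashed" after 2*tau seconds."""
    to_c.put("ping")
    try:
        reply = from_c.get(timeout=2 * tau)
    except queue.Empty:
        return "crashed"
    return "up" if reply == "pong" else "crashed"


def responder(inbox, outbox):
    """Component side: on receive ping, send pong. Stops on None."""
    while True:
        message = inbox.get()
        if message is None:
            return
        if message == "ping":
            outbox.put("pong")


def probe(alive, tau=0.05):
    """Run one detection round against a live or a silent component."""
    to_c, from_c = queue.Queue(), queue.Queue()
    if alive:
        worker = threading.Thread(target=responder, args=(to_c, from_c), daemon=True)
        worker.start()
    status = isa_f(to_c, from_c, tau)
    to_c.put(None)  # tell the responder (if any) to stop
    return status


print(probe(alive=True), probe(alive=False))
```

The detector is only "perfect" under the synchrony assumption stated on the slide: a bound τ on message delay must actually hold, otherwise a slow component is misreported as crashed.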

LISA to Verilog

IS-dictation:
    Within 1 to 3 ISA_cycles after ISA_event ping occurs, ISA_event pong must occur
    assert always {~ping; ping} -> {~pong[1..3]; pong} @(ISA_clk)

Verilog program (hand-written; non state-machine model)
    always @(ping) begin
        repeat (1) @(ISA_clk);
        fork: P
            begin
                @(pong);
                $display($time,,"Computer up");
                disable P;
            end
            begin
                repeat (2) @(ISA_clk);
                $display($time,,"Computer crashed");
                disable P;
            end
        join
    end

Consider ISA_clock = 2*τ

    time    ping  pong
    τ       0     0
    3*τ     1     0
    5*τ     0     1
    7*τ     1     1     *** assertion failure 5*τ ► 7*τ
    9*τ     0     0
    11*τ    1     0
    13*τ    0     1
    15*τ    1     0
    17*τ    0     0
    19*τ    0     0
    21*τ    0     0
    23*τ    0     0     *** assertion failure 13*τ ► 21*τ
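The trace above can be replayed against the assertion in a few lines of Python. The sketch assumes one particular reading of {~ping; ping} -> {~pong[1..3]; pong}: the consequent starts in the same cycle in which ping rises, pong must be low for 1 to 3 cycles, and then pong must be high. That reading is an interpretation, not a definition from the talk; it does flag the same two failures as the slide. The function name check_ping_pong is hypothetical.

```python
def check_ping_pong(ping, pong, max_wait=3):
    """Return (start_index, fail_index) pairs for each assertion failure.

    The antecedent matches at cycle i when ping is 0 at i-1 and 1 at i.
    From cycle i on, pong must stay 0 for 1..max_wait cycles and then be 1.
    """
    failures = []
    for i in range(1, len(ping)):
        if not (ping[i - 1] == 0 and ping[i] == 1):
            continue
        for k in range(max_wait + 1):
            j = i + k
            if j >= len(pong):
                break  # trace ended before a verdict was reached
            if pong[j] == 1:
                if k == 0:
                    failures.append((i - 1, j))  # pong too early: no ~pong cycle
                break
            if k == max_wait:
                failures.append((i - 1, j))  # no pong within the window
    return failures


# Samples at τ, 3τ, 5τ, ..., 23τ (index n corresponds to time (2n+1)*τ).
ping = [0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0]
pong = [0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

failures_in_tau = [(2 * a + 1, 2 * b + 1) for a, b in check_ping_pong(ping, pong)]
print(failures_in_tau)
```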

Deployment Run-time

LISA Future Work

• Implement LISA to Verilog compiler
• Implement Internet Service event monitor with simulated events (anticipatory event sequences)
• Incorporate dynamic “learning” phase
• Deploy at actual Internet Service sites

Need Data… Please Help

• What configuration tasks are regularly performed and why
• Good/bad “event sequences”
• Types and impact of configuration failures
• Desired language features for system configuration
