Cloud Computing and Architecture Architectural Tactics (Tonight’s guest star: Availability)

Upload: raymond-cooper

Post on 19-Jan-2016


Page 1: Cloud Computing and Architecture Architectural Tactics (Tonight’s guest star: Availability)

Cloud Computing and Architecture

Architectural Tactics

(Tonight’s guest star: Availability)

Page 2

Quality framework (Bass et al.)

• Central quality attributes
  – Availability
  – Interoperability
  – Modifiability
  – Performance
  – Security
  – Testability
  – Usability

• Other qualities
  – Portability
  – Scalability
  – Variability
  – Flexibility
  – Cost
  – Time to market
  – …

Strongly recommended reading!

Page 3

A Writing Template


· Source of stimulus. This is some entity (a human, a computer system, or any other actuator) that generated the stimulus.

· Stimulus. The stimulus is a condition that needs to be considered when it arrives at a system.

· Environment. The stimulus occurs within certain conditions. The system may be in an overload condition or may be running when the stimulus occurs, or some other condition may be true.

· Artifact. Some artifact is stimulated. This may be the whole system or some pieces of it.

· Response. The response is the activity undertaken after the arrival of the stimulus.

· Response measure. When the response occurs, it should be measurable in some fashion so that the requirement can be tested.
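As a minimal sketch, the six-part template can be captured as a small record type. The class name `QualityAttributeScenario`, its field names, and the `describe` helper are illustrative assumptions, not something defined by Bass et al.; the example values follow the SkyCave availability scenario on page 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityAttributeScenario:
    """Hypothetical record type: one field per part of the writing template."""

    quality_attribute: str
    source: str            # entity that generated the stimulus
    stimulus: str          # condition that arrives at the system
    environment: str       # conditions under which the stimulus occurs
    artifact: str          # the part of the system that is stimulated
    response: str          # activity undertaken after the stimulus arrives
    response_measure: str  # how the response is measured, so it can be tested

    def describe(self) -> str:
        """Render the scenario as one 'name: value' line per template part."""
        parts = [
            ("Quality attribute", self.quality_attribute),
            ("Source", self.source),
            ("Stimulus", self.stimulus),
            ("Environment", self.environment),
            ("Artifact", self.artifact),
            ("Response", self.response),
            ("Response measure", self.response_measure),
        ]
        return "\n".join(f"{name}: {value}" for name, value in parts)


scenario = QualityAttributeScenario(
    quality_attribute="Availability",
    source="Internal to the system",
    stimulus="A crash",
    environment="Normal operation",
    artifact="Database server",
    response="Detect the event, record it in the log, continue in normal operation",
    response_measure="Within 3 seconds",
)
print(scenario.describe())
```

Writing scenarios as records like this makes it easy to check that no part of the template, in particular the response measure, has been left out.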

Page 4

Example: World of Warcraft

CS@AU Henrik Bærbak Christensen 4

Page 5

Example: SkyCave

Quality attribute: Availability
Source: Internal to the system
Stimulus: A crash
Artifact: Database server
Environment: Normal operation
Response: Detects the event, records it in the log, and continues in normal operation
Response measure: Within 3 seconds

Quality attribute: Performance
Source: 1000 independent clients
Stimulus: Generate on average 2 character events per second
Artifact: SkyCave App server
Environment: Normal operation
Response: Events are processed, cave state is updated
Response measure: With at most 5 seconds of latency

Page 6

Tactic

• Tactic
  – A design decision that influences the achievement of a quality attribute response

• Example of modifiability tactic:
  – Encapsulate: Introduce explicit interface to module

Page 7

CloudArch Core Focus

• System quality attributes
  – Availability
  – Modifiability
  – Performance
  – Security
  – Testability
  – Usability
  – Interoperability
  – Scalability

Discussion

• If a system is not available, what is the point of all the other QAs?

• Security?
  – Equals slowness

Page 8

Availability

Page 9

Definition(s)

• Availability (1): Property of software that it is there and ready to carry out its task when you need it to be

• Availability (2): Ability of a system to mask or repair faults such that the cumulative service outage period does not exceed a required value over a specified time interval


• Nygard's stability (resilience, longevity): Ability to keep processing for a long time even when there are transient impulses, persistent stresses, or component failures

Page 10

Measurements

• MTBF: Mean time between failures
• MTTR: Mean time to repair

• But often we talk in percentages!
  – 99%: 3d 15h downtime per year
  – 99.9%: 8h 46m
  – 99.99%: 52m
  – 99.9999%: 32 seconds (!)
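The two views are connected by simple arithmetic. The sketch below uses the usual steady-state formula, availability = MTBF / (MTBF + MTTR), and converts an availability percentage into allowed downtime per year. The function names are illustrative, and a 365-day year is assumed.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a 365-day year


def availability(mtbf: float, mttr: float) -> float:
    """Steady-state availability; MTBF and MTTR must use the same time unit."""
    return mtbf / (mtbf + mttr)


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Downtime per year that a given availability percentage still allows."""
    return (1 - availability_percent / 100) * SECONDS_PER_YEAR


def format_duration(seconds: float) -> str:
    """Format a number of seconds as days, hours, minutes, and seconds."""
    total = round(seconds)
    days, rest = divmod(total, 24 * 60 * 60)
    hours, rest = divmod(rest, 60 * 60)
    minutes, secs = divmod(rest, 60)
    return f"{days}d {hours}h {minutes}m {secs}s"


for percent in (99, 99.9, 99.99, 99.9999):
    downtime = downtime_seconds_per_year(percent)
    print(f"{percent}% -> {format_duration(downtime)}")

# A component that fails on average every 99 hours and takes 1 hour to repair:
print(availability(mtbf=99, mttr=1))
```

Each extra "nine" cuts the downtime budget by a factor of ten, which is why the step from minutes to seconds per year is so demanding.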

Page 11

Tactics

• Lots of techs!

Page 12

Tactics

• Categories
  – Fault detection
  – Recovery
    • Preparation + Repair
    • Reintroduction
  – Prevention

Page 13

Detection

• Ping-echo

• Monitor: Nagios, Zabbix, …

• Exceptions
  – Timeout
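A minimal sketch of ping-echo combined with a timeout exception: the detector pings a component a few times and reports it as suspect if no echo arrives. The `ping` callable is a hypothetical stand-in for a real network call and is assumed to raise `TimeoutError` when no echo arrives in time.

```python
from typing import Callable


def ping_echo(ping: Callable[[], str], attempts: int = 3) -> bool:
    """Return True if the component echoes within `attempts` pings.

    `ping` is a hypothetical callable: it sends one ping and returns the
    reply, or raises TimeoutError when no echo arrives in time.
    """
    for _ in range(attempts):
        try:
            if ping() == "echo":
                return True
        except TimeoutError:
            continue  # no echo this round; ping again
    return False  # no echo at all: report the component as suspected faulty


# Stand-in components for illustration (no real network involved).
def healthy() -> str:
    return "echo"


def crashed() -> str:
    raise TimeoutError("no echo received")


print(ping_echo(healthy))
print(ping_echo(crashed))
```

Allowing several attempts before declaring a fault is a design choice: it avoids false alarms from a single lost ping, at the price of slower detection.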

Page 14

Recover: Prep and Repair

• Active redundancy (hot standby)
  – All receive and process all events
    • Millisecond failover

• Passive redundancy (warm standby)
  – Master-slave
    • Minute failover

• Spare (cold standby)
  – ”I think we have an extra machine in the cellar”
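Active redundancy can be sketched in a few lines: every replica receives and processes every event, so any surviving replica already holds the full state when another one crashes. The `Replica` and `ActiveRedundancyGroup` classes are hypothetical, in-process stand-ins for real servers.

```python
from typing import List


class Replica:
    """Stand-in server that keeps a running total of the events it has seen."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.total = 0
        self.up = True

    def process(self, event: int) -> int:
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.total += event
        return self.total


class ActiveRedundancyGroup:
    """Hot standby: all replicas receive and process all events."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = replicas

    def handle(self, event: int) -> int:
        answers = []
        for replica in self.replicas:
            try:
                answers.append(replica.process(event))
            except ConnectionError:
                continue  # a crashed replica is skipped
        if not answers:
            raise RuntimeError("all replicas are down")
        return answers[0]  # first surviving replica; it has the full state


group = ActiveRedundancyGroup([Replica("a"), Replica("b")])
group.handle(5)
group.replicas[0].up = False  # replica "a" crashes
result = group.handle(2)      # replica "b" answers without any catch-up
print(result)
```

Failover is fast here because no state has to be transferred. A replica that comes back after a crash has missed events, though, which is what the reintroduction tactics address.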

Page 15

Recover: Prep and Repair

• Exceptions
• Rollback
  – Used in DB and [exercise: where else?]
  – Checkpointing
• Retry
• Degradation

Which Nygard patterns?
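Retry and degradation often go together: try the operation a few times, and if it keeps failing, return a reduced but still useful answer. The sketch below assumes failures surface as `ConnectionError`; the function name `retry_then_degrade` and the stand-in operations are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_then_degrade(
    operation: Callable[[], T],
    fallback: Callable[[], T],
    attempts: int = 3,
    delay_seconds: float = 0.0,
) -> T:
    """Try `operation` up to `attempts` times, then use the degraded fallback."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt < attempts - 1:
                time.sleep(delay_seconds)  # pause before the next try
    return fallback()


# Stand-in operations for illustration.
calls = {"count": 0}


def flaky_lookup() -> str:
    calls["count"] += 1
    if calls["count"] < 3:  # fails twice, then succeeds
        raise ConnectionError("database unreachable")
    return "fresh data"


def always_down() -> str:
    raise ConnectionError("database unreachable")


def cached_answer() -> str:
    return "stale data from cache"


first = retry_then_degrade(flaky_lookup, cached_answer)
second = retry_then_degrade(always_down, cached_answer)
print(first)
print(second)
```

The number of attempts and the delay are tuning knobs; retrying too eagerly against an overloaded server can make the outage worse.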

Page 16

Recover: Reintroduction

• Shadow
  – Run in shadow mode until ‘up-to-speed’

• State resync
  – Typical DB behaviour
    • Cold slaves must catch up with primary
  – EcoSense db war story: stale DB
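State resync can be sketched as log replay: the replica remembers how many log entries it has applied and replays whatever it missed while it was away. The `Primary` and `Replica` classes and the (key, value) log format are hypothetical; real databases use their own replication protocols.

```python
from typing import Dict, List, Tuple

LogEntry = Tuple[str, int]  # assumed log format: one (key, value) write


class Primary:
    """Stand-in primary that records every write in an append-only log."""

    def __init__(self) -> None:
        self.log: List[LogEntry] = []

    def write(self, key: str, value: int) -> None:
        self.log.append((key, value))


class Replica:
    """Stand-in replica that catches up by replaying missed log entries."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}
        self.applied = 0  # number of log entries already applied

    def resync(self, primary: Primary) -> int:
        """Replay the entries this replica has missed; return how many."""
        missed = primary.log[self.applied:]
        for key, value in missed:
            self.state[key] = value
        self.applied += len(missed)
        return len(missed)


primary = Primary()
replica = Replica()

primary.write("x", 1)
replica.resync(primary)

# The replica is offline while the primary keeps accepting writes.
primary.write("x", 2)
primary.write("y", 3)

caught_up = replica.resync(primary)  # replays only the two missed writes
print(caught_up, replica.state)
```

Until `resync` has run, the replica serves stale data, so it should stay out of service (or in shadow mode) until it has caught up.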

Page 17

Preventing

• Removal from service
  – ‘Scrubbing’
  – It used to be that a Tomcat server would respawn every 12 hours
    • Easiest way to fix the numerous memory leaks!

• Transactions
  – ACID guarantees
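The atomicity part of ACID can be illustrated with Python's built-in `sqlite3` module: if any statement in a transaction fails, none of its changes survive. The `account` table, its CHECK constraint, and the `transfer` function are made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 0)")
conn.commit()


def transfer(src: str, dst: str, amount: int) -> bool:
    """Move `amount` between accounts; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK constraint failed; the credit was rolled back too


def balances() -> dict:
    return dict(conn.execute("SELECT name, balance FROM account"))


ok = transfer("alice", "bob", 30)       # succeeds
failed = transfer("alice", "bob", 500)  # would overdraw alice: rolled back
print(ok, failed, balances())
```

The credit is deliberately executed before the debit so that the failing case shows a rollback of work already done inside the transaction.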

Page 18

Summary

• All things bad can and will happen to real systems having real users operating in the real world!

• Your systems should strive for high availability and graceful degradation
  – If you want to keep your customers!

• The architectural tool box is big!
