cloudcompungpaerns - summersoc · dr. michael gorriz – cio daimler ag ever since t-systems has...

46
Cloud Compu)ng Pa.erns Founda’ons and Introduc’on Christoph Fehling, Prof. Dr. Frank Leymann Ins:tute of Architecture of Applica:on Systems (IAAS) Phone +497117816 470 Fax +497117816 472 email Leymann| Fehling @iaas.unistuNgart.de Universität StuNgart Universitätsstr. 38 70569 StuNgart Germany Tutorial at SummerSoC 2013 (1 July – 6 July, 2013, Hersonissos, Crete, Greece)

Upload: others

Post on 20-May-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Cloud  Compu)ng  Pa.erns  Founda'ons  and  Introduc'on  

Christoph  Fehling,  Prof.  Dr.  Frank  Leymann  Ins:tute  of  Architecture  of  Applica:on  Systems  (IAAS)  

Phone  +49-­‐711-­‐7816  470  Fax  +49-­‐711-­‐7816  472    e-­‐mail  Leymann|  Fehling  

 @iaas.uni-­‐stuNgart.de  

Universität  StuNgart  Universitätsstr.  38  70569  StuNgart  Germany  

Tutorial  at  SummerSoC  2013  (1  July  –  6  July,  2013,  Hersonissos,  Crete,  Greece)  

Page 2: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

1

Christoph Fehling · Frank Leymann · Ralph Retter · Walter Schupeck · Peter Arbitter

Cloud Computing PatternsFundamentals to Design, Build, and Manage Cloud Applications

Cloud Computing Patterns

Christoph Fehling Frank LeymannRalph Retter · Walter SchupeckPeter Arbitter

Computer Science

Cloud Computing Patterns

Fundamentals to Design, Build, and Manage Cloud Applications

! is book provides CIOs, so" ware architects, project managers, developers, and cloud strategy initiatives with a set of archi-tectural patterns that o# er nuggets of advice on how to achieve common cloud computing-related goals. ! e cloud computing patterns capture knowledge and experience in an abstract format that is independent of concrete vendor products. Readers are provided with a toolbox to structure cloud computing strategies and design cloud application architectures. By using this book cloud-native applications can be implemented and best suited cloud vendors and tooling for individual usage scenarios can be selected. ! e cloud computing patterns o# er a unique blend of academic knowledge and practical experience due to the mix of authors. Academic knowledge is brought in by Christoph Fehling and Professor Dr. Frank Leymann who work on cloud research at the University of Stuttgart. Practical experience in building cloud applications, selecting cloud vendors, and designing enterprise architecture as a cloud customer is brought in by Dr. Ralph Retter who works as an IT architect at T Systems, Walter Schupeck, who works as a Technology Manager in the fi eld of Enterprise Architecture at Daimler AG, and Peter Arbitter, the former head of T Systems’ cloud architecture and IT portfolio team and now working for Microso" .

Voices on Cloud Computing Patterns

Cloud computing is especially benefi cial for large companies such as Daimler AG. Prerequisite is a thorough analysis of its impact on the existing applications and the IT architectures. During our collaborative research with the University of Stuttgart, we identifi ed a vendor-neutral and structured approach to describe properties of cloud o# erings and requirements on cloud environments. ! e resulting Cloud Computing Patterns have profoundly impacted our corporate IT strategy regarding the adoption of cloud computing. ! ey help our architects, project managers and developers in the refi nement of architectural guidelines and communicate requirements to our integration partners and so" ware suppliers.

Dr. Michael Gorriz – CIO Daimler AG

Ever since $%%& T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”. Today these cloud services cover a huge variety of corporate applications, especially enterprise resource planning, business intelligence, video, voice communication, collaboration, messaging and mobility services. ! e book was written by senior cloud pioneers shar-ing their technology foresight combining essential information and practical experiences. ! is valuable compilation helps both practitioners and clients to really understand which new types of services are readily available, how they really work and importantly how to benefi t from the cloud.

Dr. Marcus Hacke – Senior Vice President, T-Systems International GmbH

! is book provides a conceptual framework and very timely guidance for people and organizations building applications for the cloud. Patterns are a proven approach to building robust and sustainable applications and systems. ! e authors adapt and extend it to cloud computing, drawing on their own experience and deep contributions to the fi eld. Each pattern includes an extensive discussion of the state of the art, with implementation considerations and practical examples that the reader can apply to their own projects. By capturing our collective knowledge about building good cloud applications and by providing a format to integrate new insights, this book provides an important tool not just for individual practitioners and teams, but for the cloud computing community at large.

Kristof Kloeckner – General Manager, Rational So" ware, IBM So" ware Group

9 783709 115671

I SBN 978 - 3 - 7091 - 1567 - 1

Fehling · Leymann

Retter · Schupeck · Arbitter

Page 3: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Queue-­‐Based  Programs  

3  

•  Connec:on  based  programs  •  are  opera:ng  dependently  •  based  on  predictable  pairings  

•  specify  program  name  of  partner  

•  Queue  based  programs  •  are  opera:ng  independently  

•  based  on  predictable  or  unpredictable  pairings  

•  specify  queue  name  

Program A Program B Program C Program D

RPC, Sockets,… MQI Message Queuing Interface

Page 4: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Problems  in  Direct  TP  

4  

Client   Server  Request  

Server  Availability  

Client   Server  Reply  

Client  Availability  

X   X  

Unbalanced  Load   Priority  Agnos:c  

Server  1   Server  2   Server  3  

Request  

Request  

Request  

Request  

Request  

Request  

Request  

Server  

Request  

Request  

Request  

Urgent          

Request  

Page 5: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Loose  Coupling  

5      

•  Core  principle:    Reduce  number  of  assump:ons  two  par:es  make  about  each  other  when  they  exchange  informa:on  

•  But:  Making  more  assump:ons  facilitates  to  increase  efficiency    Tight  coupling  for  high-­‐performance  environments  •  But  less  tolerance  to  changes  at  a  partner‘s  side  

“Loose  Coupling”  is  a  very  important  new  term  in  prac:ce  today!    Some:mes,  it  is  used  nearly  as  a  synonym  for  “being  message-­‐based”…    

Page 6: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Loose  Coupling:  Autonomy  Aspects  

6      

•  Reference  Autonomy  •  Producers  and  consumers  don’t  know  each  other  

•  Plajorm  Autonomy  •  Producers  and  consumers  may  be  in  different  environments,  wriNen  

in  different  languages,…  

•  Time  Autonomy  •  Producers  and  consumers  access  channel  at  their  own  pace    

•  Communica:on  is  asynchronous  

•  Data  exchanged  is  persistent  

•  Format  Autonomy  •  Producers  and  consumers  may  use  different  formats  of  data  

exchanged  •  Requires  transforma:on  “on  the  wire”  (aka  media)on)  

Page 7: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Messaging  Styles  

7      

•  Point-­‐to-­‐Point  (that’s  what  has  been  discussed  un:l  now)  

•  Publish-­‐Subscribe  •  Sender  “publishes”    a  message  on  a  “topic”  

•  Zero  or  more  “subscribers”  on  that  topic  get  that  message  delivered  

Sender  Receiver1  

Receiverj  

Receivern  

…  

…  

Receiver1  

Receiverj  

Receivern  

…  

…  Sender  

Zero  or  on

e  consum

er  

Zero  or  more  

consum

er  

Page 8: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Message  Queue  Manager  (MQM)  

8      

•  Message  queue  manager  (MQM)  provides  environment  for  queuing  applica:ons  including  the  MQI  •  provides  reliable  storage  for  queued  messages  •  manages  concurrent  access  to  data  •  ensures  security  and  authoriza:on  •  provides  special  queuing  func:ons  (like  triggering)  

•  Applica:ons  that  need  queuing  facili:es  must  first  connect  to  an  MQM  

•  The  MQM  an  applica:on  is  directly  connected  to  is  called  its  local  queue  manager  

•  The  applica:on  may  run  as  clients  using  the  Message  Queuing  Interface  (MQI)  

Page 9: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Local/Remote  Queues  And  The  MQI  

9      

Program B Program C

MQI(Message Queue Interface)

MQM1 MQM2

Program A

“Sen

d-­‐and-­‐Forget”  

“Store-­‐and-­‐Forward”  

Page 10: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

The  Mover  

10      

MQM1

<transmission queue>

MQM2

<target queue>

Mover MoverChannel

2PCl  

Page 11: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Mul)-­‐Hop  Forwarding  

11      

Application

PUT to Q1

MQM 1

Q1 = MQM2.Q2

MQM 2

Q2 = MQM3.Q3

MQM 3

Q3 is local

Page 12: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Message  Delivery:  Summary  

12  

MQM1

<transmission queue>

MQM2

<target queue>

Network

MQI

PUT into Q1 GET from Q7

Mover MoverChannel

..... Q1=(MQM2, Q7)

.....

<queue directory>.....

Q7=(local, Q7) .....

<queue directory>

Program B Program A

Page 13: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

…In  Case  You  Know  Databases    

13      

DBMS  

SQL  

Tables  

MQM  

MQI#  

Queues  

Applica:on   Applica:on  

Tuple   Message  

Page 14: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Transac)ons  Across  DBMS  And  MQM  

14      

DBMS  

SQL  

MQM  

MQI  

Queues  

Applica:on  

UPDATE  TABLE   PUT  INTO  Q  

Page 15: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

MQM  As  Resource  Manager  

15        

•  Applica:on  can  manipulate  resources  in  mul:ple  resource  managers  •  All  manipula:ons  can  be  grouped  into  a  single  transac:on  •  Transac:on  manager  will  run  Two-­‐Phase-­‐Commit  protocol  to  ensure  that  all  manipula:ons  

are  commiNed  or  undone  •  Note  the  “Poisoned  Message  Problem”:  If  a  message  cannot  be  processed  and  results  in  

ROLLBACK  it  will  be  put  into  the  queue  and  processing  begins  again,  i.e.  ROLLBACK  +  processing  +  ROLLBACK  +  …  (ad  infinity)  

•  Mul:ple  solu:ons:  Synchroniza:on  points,  maximum  number  of  retries  maintained  my  MQM    

MQM   DBMS  

Applica:on  

Begin Transaction PUT INTO Q UPDATE TABLE Commit

Ready  to  Commit?  

Transac:on  Manager  

Page 16: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Queued  TP  

16      

Client   Server  

Server  InputQ  

Client  ResponseQ  

Proper  Transac:on  

Xact  1  Xact  2  

PUT  

GET  PUT  

GET  

Client  pushes  request   Server  pulls  request  

Xact  3  

Client  pulls  response  

Page 17: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Problems  Solved  With  Queued  TP  

17       17  

Client   Server  

Server  

Server  

Urgent  

Client  Response  Queue  

PUT   GET  

PUT  

GET  

No  need  for  server  to  be  available  when  

request  is  send   Fully  dynamic  loadbalancing  

Requests  can  be  reordered  according  to  priority  

No  need  for  client  to  be  available  when  response  is  send  

NOTE  Request  scheduling  can  be  done  more  flexible:  servers  can  be  split  based  on  request  priority  class.  This  enables  to  guarantee  certain  service  level  for  each  priority  class.  

Page 18: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Defini)on  &  Solu)on  Principles  

18      

Performed  workload  is  usually  required  to  be  linear  propor:onal  to  resources  (linear  scalability):    

performed  workload  =      %  resources,    where  the  factor  is  ideally  close  to  one:      é  1    In  principle,  this  can  be  achieved  in  three  different  ways:      •  Work  harder  

Processing  speed    

•  Work  smarter  BeNer  algorithms    

•  Get  help  Introduce  parallelism  

 

Scalability  is  the  ability  to    endure  increasing  workloads    

without  decreasing  an  agreed  service  level  when  underlying  resources  are  also  increased  

Page 19: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Clusters:  Defini)on  

19      

A  cluster  is  a  distributed  system  that    

  consists  of  a  collec:on  of  interconnected  whole  computers    

  is  used  as  a  single,  unified  compu:ng  resource  

Page 20: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Clusters  In  Client/Server  Applica)ons  (1/2)  

20      

Cluster  

Page 21: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Clusters  In  Client/Server  Applica)ons  (2/2)  

21  

Client  request  service  from  cluster    Cluster  selects  node  for  performing  requested  func:on  

  Selec:on  can  be  done  by  client-­‐stub  of  cluster    Node  accesses  shared  data  

  Physically  shared  disc,  I/O  shipping,  shared  DB,...    3-­‐Tier-­‐Architecture:  Convenient  implementa:on!  

Cluster  

Page 22: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Defini)ons  

22      

•  System  called  available  if  it  is  up  and  running    and  produces  correct  results  

•  The  availability  of  a  system  is  the  frac:on  of  :me  it  is  available  •  A  system  is  high  available  if  its  availability  is  close  to  1  •  Thus,  high  availability  is  the  property  of  a  system  to  be  up  and  

running  all  the  :me,  always  producing  correct  results  •  I.e.  the  famous  requirement  is...    

 •  Mission  cri:cal  systems  (DBMS,  TPM,  WFMS,...)  must  be  high  

available  •  A  system  fails  if  it  gives  a  wrong  answer  or  no  answer  at  all  

•  Thus:  

•  The  more  a  system  fails,  the  less  it  is  available  •  The  longer  it  takes  to  repair  a  system  azer  it  fails,    

the  less  it  is  available    

7× 24

Page 23: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Relevant  Points  in  Time  

23      

System  is  available  again  

Fault  is  detected  

!!!  Failure  !!!  

MTBF    (Mean  Time  Between  Failure)  

(:me  system  is  available)  

MTTR  (Mean  Time  To  Repair)  

Fault  Detec:on  Time  

Recovery  Time  

:me  

System  is  up  and  running  (up:me)  

System  is  unavailable  (down:me)  

Observa:on  Interval  (  =  MTBF  +  MTTR  )  

Page 24: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Formaliza)on  

24      

•  Mean Time Between Failures (MTBF or ) •  ... the average time a system runs before it fails

•  MTBF measures system reliability

•  Mean Time To Repair (MTTR or ) •  ... the average time it takes to repair a system after failure

•  MTTR measures system downtime

•  Availability :

•  Thus, availability can be improved by •  improving reliability, i.e. MTBF •  decreasing downtime, i.e. MTTR

ϕ

α α = ϕ

ϕ +

ϕ → ∞ : lim

ϕ→∞

ϕϕ +

= 1

→ 0 : lim

→0

ϕϕ +

= 1

Page 25: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Availability  Classes  

25      

1 hour / day 95.8 (~ 96) 1 Unmanaged

1 hour / week 99.41 (~ 99) 2 Managed

1 hour / month(few minutes / day) 99.86 (~ 99.9) 3 Well managed

1 hour / year 99.9886 (~ 99.99) 4 Fault tolerant

1 hour / 20 years(5 minutes / year) 99.99942 (~ 99.999) 5 High available

30 seconds / year(3 seconds / year) 99.9999048 (~ 99.9999) 6

Customers minimum expectation today

Downtime Availability [%] Class System Type

  Class 1: This really bad!   Class 2: Commodity uni-processor system   Class 3: Standard open-system cluster   Class 4: Cluster with special HW/SW - Achieving class 4 is already a challenge!   Class 5: IBM S/390 Parallel Sysplex HW   Class 6: In-flight aircraft computer   Some vendors already call class 3...4 “High Available” &

The number of 9’s is characterizing the system type

Page 26: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Applica)on  Server  Hot-­‐Pooling  

26      

•   Hot  Pool:    •  A  collec:on  of  applica:on  servers  with  iden:cal  func:onality  sharing  a  common  

pool  input  queue    •  Client  sends  requests  to  pool  input  queue  •  Whenever  a  member  of  the  hot  pool  finished  processing  par:cular  request  it  

immediately  gets  next  request  from  pool  input  queue  •  When  a  member  fails  the  other  members  will  con:nue  processing  requests  from  

pool  input  queue:  Hot  pool  appears  as  “virtual  applica:on  server”  •  Watchdog  will  ini:ate  failed  member’s  restart  

Relevance  of  Queued  TP!  

Server  Server  

Server  Client   Server  

PUT  

GET  

Watchdog   Hot  Pool  

Update  

Page 27: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Ensuring  Availability  In  Hot  Pools:  Transac)ons  

27      

•  Assump:on:    •  pool  input  queue  is  persistent  •  each  hot  pool  member  ensures  message  integrity    

•  Consequence:  As  long  as  single  member  is  running  “virtual  applica:on  server”  is  available  

•  MTTR  is  further  reduced!  

Client   Server  

Xact1  

Xact3  PUT  

GET  PUT  

GET  

Xact3  

Page 28: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Probability  Considera)ons  For  Hot  Pools  

28      

•  Binomial  Distribu)on:    Let  E  be  an  event  that  occurs  with  probability  P;  perform  the  same  experiment  N  :mes  independently  (e.g.  in  parallel)  and  count  how  ozen  a  par:cular  event  E  occurs.  Then,  the  probability  that  E  occurs  exactly  k  :mes  is    

 •  The  availability  of  hot  pool  members  is  binomial  distributed:    

Each  member  in  a  hot  pool  runs  a  copy  of  the  same  sozware,  so  MTTR  and  MTBF,  thus  the  availability  is  the  same  for  each  member.  The  N  independent  experiments  consists  of  running  N  hot  pool  members  in  parallel,  and  the  event  observed  is  “member  available”.  Thus,  the  probability  that  exactly  k  members  are  available  of  a  hot  pool  having  N  members  is    

 •  The  probability  of  at  least  k  members  being  available  is    

Pk = kN( )Pk 1−P( )N−k

Pk = kN( )α k 1−α( )N−k

P≥k = Pi

i=k

N

∑ = iN( )

i=k

N

∑ α i 1−α( )N−i

Page 29: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Availability  Of  A  Hot  Pool  

29      

•  The  services  provided  by  its  hot  pool  members  is  available  as  long  as  at  least  one  member  of  the  hot  pool  is  available:    

 

•  Thus,  the  availability  of                        of  a  hot  pool  with  N  members  each  member  having  the  same  availability                        is:  

 

•  If  an  event  has  a  probability  much  less  than  one  and  if  it  is  memoryless,  i.e.  its  occurrence  is  not  influenced  by  the  occurrence  of  the  same  event  before,  then  the  mean  )me  to  such  an  event  is  the  reciprocal  of  the  probability  of  the  event  

•  The  failing  of  a  hot  pool  is  such  an  event  having  the  probability      thus:  

 

αhot pool = P≥1 = i

N( )i=1

N

∑ α i 1−α( )N−i

αhotpool

αmember

αhot pool = 1− 1−αmember( )N

= iN( )

i=0

N

∑ α i 1−α( )N−i− N

0⎛

⎝⎜⎞

⎠⎟α 0 1−α( )N⎛

⎝⎜

⎠⎟

= α + 1−α( )( )N− 1−α( )N

1−αhot pool,

MTBFhot pool = 1/ 1−αhot pool( )( )

Page 30: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Hot  Pool  Availability  (Sample  Numbers)  

30      

•  Assump:ons:  Hot  pool  member  MTBF=24h,  MTTR=6h,  resul:ng  member  availability  is  80%  (i.e.  unacceptable)  

•  But:  Hot  pool  gets  availability  class  5  already  with  8  members  

0,356735159817

1,78367579909

8,91837899541 44,5918949774

5 6 7 8

Hot Pool Cardinality

0

10

20

30

40

50

Yea

rs

Hot Pool MTBF

168,192

33,6384

6,72768000002

1,34553599999

5 6 7 8

Hot Pool Cardinality

0

50

100

150

200

[min

/ ye

ar]

Downtime

Page 31: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Applica)on  Clusters  

31      

•  An  Applica)on  Cluster  is  defined  to  be  •  a  collec:on  of  interchangeable  nodes  •  each  node  hosts  a  hot  pool  of  one  and  the  same  (applica:on)  server  •  the  nodes  hos:ng  the  cluster  fail  independently  

•  Especially:  •  All  systems  have  access  to  all  of  the  data  (thus  the  DBMS  is  cri:cal  w.r.t.  

availability  of  the  cluster!  Thus,  the  DBMS  may  run  in  a  separate  cluster!)  •  The  cluster  provides  a  single  system  image,  i.e.  clients  submit  requests  to  the  

cluster  not  to  a  par:cular  system  (especially,  all  systems  run  the  same  kind  of  hot  pool)  

   Relevance  of  Queued  TP!      

Server  Server  

Server  

Client  

Server  

Node  1  

Server  Server  

Server  Server  

Node  n  

…  

Cluster  

Hot  Pool  

Hot  Pool  

Page 32: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Take-­‐Over  Across  Hot  Pools  

32      

•  If  complete  hot  pool  fails  (e.g.  OS  or  processor  fails)  automa:c  reroute  of  requests  to  available  hot  pool  on  different  system:  MTTR  is  further  reduced  

•  Either…    •  via  cluster  architecture  (hardware,  system  sozware,...)  •  or  underlying  messaging  system  supports  reroute  directly  (aka  virtual  queue  or  

cluster  queue)  •  or  client  exploits  specific  enhancements  of  underlying  MOM  (e.g.  server  APIs)    

•  Thus,  client  behaves  as  if  it  has  a  session  with  a  hot-­‐pool-­‐plex  •  Mechanism  to  restart  the  failed  hot  pool  depends  on  par:cular  fault  

Server  Server  

Server  

Client  

Server  

Server  Server  

Server  Server  

…  

Hot  Pool  1  

Hot  Pool  n  

MOM  

Page 33: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Cluster  Availability  Versus  Cost  

33      

•  Let                            denote  the  availability  of  a  system  (i.e.  a  hos:ng  environment),  M  the  number  of  systems  in  the  cluster  

•  The  services  provided  by  a  cluster  is  available  as  long  as  at  least  one  system  of  the  cluster  is  available.  As  before:  

•  Basically,  the  cost  Ccluster  of  a  cluster  is  the  sum  of  the  cost  Csystem  of  its  systems  :    Ccluster  =  MCsystem      

•  Assuming  a  unit  price  Csystem  of    $8,000  per  system  and  an  availability                            of  66%  the  following  results:   80

,109739369

8

26,7032464563

9

8,90108215211

10

2,96702738404

11 0,989009127992

12

64000 72000 80000 88000 96000

Cluster Cost [$]

0102030405060708090

Clu

ster

Dow

ntim

e [m

in/y

] Cost Of Availability

αsystem

αsystem

αcluster =1− 1−αsystem( )M

Page 34: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Scaling  Applica)ons  Up  in  the  Cloud  

34      

•  Scale-­‐up  results  in  (massive)  distributed  systems  

App  

App   App  

App  

On  Premise/  Private  Cloud  

Public  Cloud  B  

Public  Cloud  A  

Scale-­‐up  

Scale-­‐up  

Page 35: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

The  Problem  

35      

•  Massive  distributed  systems  operate  on  a  large  (even  world-­‐wide)  scale  

•  Large  scale  creates  addi:onal  challenges:    When  a  system  processes  billions  (even  trillions)  of  requests,  events  that  normally  have  a  low  probability  of  occurrence  are  now  guaranteed  to  happen  

•  These  events  need  to  be  accounted  for  up  front  in  the  design  and  architecture  of  the  system  

Page 36: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

×  

Updates  in  Massive  Distributed  Systems  

36      

•  Data  are  ozen  replicated  across  the  nodes  

•  Clients  updates  single  node  of  the  system  •  System  replicates  updates  to  other  nodes  in  a  lazy  manner  

Node  1   Node  2   Node  N  …  

Client  

Write  vnew  

vold  

vnew  

vnew  

vnew  

1  

1  

1  ×  vold  vnew  

2   ×  vold  vnew  

k  

vnew  

Page 37: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

The  Role  of  Queues  in  Replica)on  

37  

•  Replica:on  is  typically  based  on  (persistent)  queues  

•  Updated  node  sends  new  value  to  other  nodes  via  queues  •  Other  nodes  update  their  local  copy  (lazily)  

Node  1   Node  2   Node  N  …  

Client  

vnew  vold   vold  

vnew   vnew  vold  ⟶vnew   vold  ⟶vnew  k  

1  

2  

1  

   

Page 38: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

In  an  Ideal  World...  

38      

•  ...all  replicas  would  be  updated  at  the  same  :me  –    resul:ng  in  data  consistency  •  Each  node  holding  a  copy  of  a  data  item  has  the  same  value  of  the  

data  item  •  i.e.  the  client  can  access  any  of  the  nodes  to  retrieve  the  actual  latest  value  of  the  data  item  

•  …any  responding  node  would  offer  all  func:ons  of  the  overall  system  –  resul:ng  in  availability  •  Failures  have  no  impact  of  the  available  func:ons  of  the  overall  

systems  •  i.e.  the  client  can  select  any  of  the  nodes  to  request  all  func:ons  

•  ...failure  of  network  connec:ons  would  not  affect  the  correctness  of  the  overall  system  –  resul:ng  in  par::on  tolerance  •  A  par::oning  of  the  network  has  no  impact  on  the  func:onality  of  

the  overall  system  •  i.e.  the  client  can  access  any  of  the  nodes  for  any  func:on  and  will  always  get  same  and  correct  results  

Page 39: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

   

But  Life  is  Cruel    

39        

The  CAP  Theorem  (Brewer,  2000)    

 Out  of  the  following  three  proper:es  of  a  system:  

   Consistency      Availability      Par::on  tolerance    …only  two  can  be  achieved  at  the  same  :me  

•  Since  in  larger  distributed  systems  network  par::ons  are  a  given,  such  systems  must  be  build  with  par::on  tolerance  in  mind!  

•  As  a  consequence,  in  such  systems  consistency  and  availability  cannot  be  achieved  at  the  same  :me!  

Page 40: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Architectural  Relevance  

40      

There  are  only  two  choices:  

 •  Demand  consistency    

 This  means  that  under  certain  condi:ons  the  system  will  not  be  available  

 

•  Relax  consistency    

 This  will  allow  the  system  to  remain    highly  available      

 

Page 41: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Consistency  Models  (Vogels,  2008)  

41      

•  Strong  consistency  

•  Azer  update  completes,  any  subsequent  access  by  anybody  will  return  the  updated  value  

•  Weak  consistency  

•  System  does  not  guarantee  that  subsequent  accesses  will  return  the  updated  value  

•  Period  between  update  and  the  moment  when  it  is  guaranteed  that  any  observer  will  read  the  updated  value  is  called  inconsistency  window  

•  Eventual  consistency  (specific  form  of  weak  consistency)  

•  Storage  system  guarantees  that  if  no  new  updates  are  made  to  the  object,  eventually  all  accesses  will  return  the  last  updated  value  

Page 42: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Tolera)ng  Par))ons  

42      

•  Assume  a  par::oning  is  detected  

•  Each  par::on  is  treated  as  self-­‐contained  replica  set  •  i.e.  reads  and  writes  are  done  to  par::on  only  

•  Merge  is  performed  once  par::oning  is  healed  

➾  This  how  Amazon’s  shopping  cart  applica:on  works  

Page 43: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

The  BASE  Proper)es  

43      

•  Transac:onal  proper:es  for  large-­‐scale  applica:ons  •  …proposed  as  ACID  alterna:ve…  

•  BASE  means  •  Basically  Available  

:=  support  par:al  failures  without  total  failure  –  E.g.  if  a  node  fails,  only  the  users  on  that  node  are  affected  by  the  

failure  

•  Soz  state  :=  state  changes  triggered  by  an  opera:on  may  be  scaNered  across  different  components  of  the  system  

–  E.g.  the  debit  of  a  funds  transfer  is  in  a  database,  but  the  credit  of  the  funds  transfer  is  s:ll  in  a  message  within  some  input  queue  

•  Eventually  consistent  :=  at  some  point  in  :me  the  effects  of  an  opera:on  is  reflected  in  all  affected  components,  and  in  a  consistent  manner  

–  E.g.  finally,  the  debit  and  the  credit  required  for  a  successful  and  consistent  funds  transfer  are  reflected  on  both  databases  

Page 44: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

Example  

   

•  Real  world  funds  transfer  follows  the  BASE  paradigm  

My  Bank   Your  Bank  

Client  

myAcc  ⟶  myAcc  -­‐  100  

Xfer  100€  from  myAcc  to    yourAcc  

yourAcc  ⟶  yourAcc  +  100  

1  

2  2  

3  

•  Debit  and  Credit  can  succeed  independently  (Basically  Available)  •  State  is  scaNered  across  Debit  DB  and  Request  queue  at  [2]  (Soz  state)  •  Funds  transfer  finally  consistently  reflected  in  both  DBs  [3]  (Eventually  consistent)  

Increase  yourAcc  by  100  

   Relevance  of  Queued  TP!      

Page 45: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

A  Note  to  Cau)on  

45      

•  Consequently,  large-­‐scale  distribute  applica:ons  (e.g.  applica:ons  distributed  across  clouds)  require  message  queuing  etc.  

•  Such  middleware  is  available  as  cloud  offerings  •  Amazon  SQS,  Microsoz  Azure,…  

•  But:  These  queuing  services  behave  differently  than  used  from  on-­‐premise  middleware  (MQSeries,  MSMQ,…)  

•  No  exactly-­‐once  delivery,  but  at-­‐least-­‐once  delivery  •  No  2PC  transac:ons  between  MOM  and  DBMS  

•  …  •  Thus,  building  large-­‐scale  applica:ons  in  the  cloud  requires  to  consider  

these  differences  

•  Build  idempotent  opera:ons  

•  Detected  duplicate  messages  

•  …    

Page 46: CloudCompungPaerns - SummerSOC · Dr. Michael Gorriz – CIO Daimler AG Ever since T-Systems has provided a fl exible and reliable cloud platform with its “Dynamic Services”

End  of  First  Part  of  the  Tutorial  

46