© 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

Upload: nelson-grant

Post on 24-Dec-2015

Page 1: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

© 2013 A. Haeberlen, Z. Ives

1

Internet Basics, Faults & Failures

Page 2: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

2

Below HTTP: Routing

Page 3: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

3

The Internet

• The Internet consists of tens of thousands of interconnected networks

• Routers and switches forward the data from one network link to the next

• Request and response travel along a path through these networks (usually, but not always, the 'shortest' path)

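[Figure: a path from the client (in Leads) to the server (in the USA). The path crosses individual network links, switches, and routers in several networks; the networks labeled are PTCL, Cogent, AT&T, Level 3, and Google.]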

Page 4: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

4

Packet switching

• Communication consists of packets

• Each packet traverses the path independently

• No dedicated connection like in the telephone network

• Packets are relatively small (typically up to 1,500 bytes)

• Why is this a good idea?

[Figure: a client and a server in California, connected through the networks UPenn, Cogent, AT&T, Level 3, and Google.]

Page 5: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

5

IP addresses

• How do routers know where to send a packet?

• Each machine is assigned an IP address

• Machines in the same network are given similar addresses, usually from an IP range

• Each packet has a source and a destination address

• Each router has a forwarding table that maps ranges to links over which packets in that range should be sent

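[Figure: the networks Google, UPenn, Cogent, AT&T, and Level 3, with the example addresses 173.194.34.104 and 158.130.53.72 and a router that has to decide ('?') where to forward a packet. Packet layout, bit 0 to bit 31: a leading 4 (indicates this is an IPv4 packet), source IP, destination IP, (data).]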
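The forwarding-table lookup described above can be sketched in a few lines. This is only an illustration: the address ranges, the link names, and the function name `next_link` are made up, and the rule that the most specific matching range wins (longest-prefix match) is the usual convention for IP routers rather than something stated on the slide.

```python
import ipaddress

# Hypothetical forwarding table: IP range -> outgoing link.
# Ranges and link names are invented for illustration.
FORWARDING_TABLE = {
    ipaddress.ip_network("158.130.0.0/16"): "link-to-upenn",
    ipaddress.ip_network("173.194.0.0/16"): "link-to-google",
    ipaddress.ip_network("0.0.0.0/0"): "default-link",
}


def next_link(destination: str) -> str:
    """Return the link for the most specific range that contains destination."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in FORWARDING_TABLE if addr in net]
    # Longest-prefix match: the range with the most fixed bits wins.
    best = max(matches, key=lambda net: net.prefixlen)
    return FORWARDING_TABLE[best]
```

With this table, a packet for 173.194.34.104 goes out on the Google link, and an address that matches no specific range falls through to the default link.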

Page 6: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

6

IP routing

• Networks exchange routing information

• If a connection or router fails, this information is updated

• Result: Global reachability. Any machine on the Internet can (in principle) communicate with any other machine.

[Figure: networks A through N connected to each other; a network announces 'I know how to get to A' to its neighbors.]

Page 7: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

7

Path properties: Bottleneck capacity

• How fast can we send data on our path?

• Limited by the bottleneck capacity

[Figure: a path between Server and Client with the Bottleneck link marked.]

Page 8: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

8

Path properties: Propagation delay

• Speed of light: 299 792 458 m/s

• Latency matters!

[ahae@ds01 ~]$ traceroute www.mpi-sws.org
traceroute to www.mpi-sws.org (139.19.1.156), 30 hops max, 60 byte packets
 1 SUBNET-46-ROUTER.seas.UPENN.EDU (158.130.46.1) 1.744 ms 2.134 ms 2.487 ms
 2 158.130.21.34 (158.130.21.34) 5.327 ms 5.395 ms 5.649 ms
 3 isc-uplink-2.seas.upenn.edu (158.130.128.2) 5.671 ms 5.825 ms 6.175 ms
 4 external3-core1.dccs.UPENN.EDU (128.91.9.2) 6.007 ms 6.283 ms 6.362 ms
 5 external-core2.dccs.upenn.edu (128.91.10.1) 6.830 ms 6.990 ms 7.080 ms
 6 local.upenn.magpi.net (216.27.100.73) 7.250 ms 3.429 ms 3.533 ms
 7 remote.internet2.magpi.net (216.27.100.54) 4.487 ms 3.002 ms 2.925 ms
 8 198.32.11.51 (198.32.11.51) 90.557 ms 90.806 ms 91.028 ms
 9 so-6-2-0.rt1.fra.de.geant2.net (62.40.112.57) 97.403 ms 97.473 ms 97.766 ms
10 dfn-gw.rt1.fra.de.geant2.net (62.40.124.34) 98.834 ms 98.890 ms 99.043 ms
11 xr-fzk1-te2-3.x-win.dfn.de (188.1.145.50) 100.627 ms 101.034 ms 101.387 ms
12 xr-kai1-te1-1.x-win.dfn.de (188.1.145.102) 103.985 ms 104.383 ms 104.528 ms
13 xr-saa1-te1-1.x-win.dfn.de (188.1.145.97) 103.636 ms 103.903 ms 104.139 ms
14 kr-0unisb.x-win.dfn.de (188.1.234.38) 103.983 ms 103.746 ms 103.853 ms
15 mpi2rz-hsrp2.net.uni-saarland.de (134.96.6.28) 104.469 ms 104.355 ms 104.491 ms
[ahae@ds01 ~]$

~6,270 km (one way)

Round-trip time
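A quick calculation shows why latency matters here. The sketch below takes the two numbers from the slide (speed of light and one-way distance) and computes the smallest possible delay; the function and variable names are made up for illustration.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # from the slide
DISTANCE_M = 6_270_000                # ~6,270 km one way, from the slide


def propagation_delay_ms(distance_m: float, speed_m_per_s: float) -> float:
    """Time for a signal to cover distance_m at speed_m_per_s, in milliseconds."""
    return distance_m / speed_m_per_s * 1000.0


one_way_ms = propagation_delay_ms(DISTANCE_M, SPEED_OF_LIGHT_M_PER_S)
min_rtt_ms = 2 * one_way_ms
print(f"one way: {one_way_ms:.1f} ms, round trip: {min_rtt_ms:.1f} ms")
```

This gives roughly 21 ms one way and 42 ms for the round trip. The traceroute above measures about 104 ms, so the speed of light is a hard lower bound and not an estimate; real paths are longer than the straight-line distance, signals in fiber travel slower than light in a vacuum, and routers add delay.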

Page 9: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

9

Path properties: Queueing delay

• What if we send packets too quickly?

• Router stores the packets in a queue until it can send them

• Consequence: End-to-end delay increases

• Where does this matter?

• What if the router runs out of queue space?

• Packets are dropped and lost
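The two effects above (growing delay, then loss) show up in even a very small simulation. This is a sketch under simplifying assumptions: time advances in fixed ticks, a fixed number of packets arrives per tick, and the router drops whatever does not fit (a "drop-tail" queue). The function name and parameters are hypothetical.

```python
def simulate_queue(arrivals_per_tick, sends_per_tick, capacity, ticks):
    """Return (queue_length_history, dropped_packets) for a drop-tail queue."""
    queued = 0
    dropped = 0
    history = []
    for _ in range(ticks):
        # Packets that do not fit into the queue are dropped.
        space = capacity - queued
        accepted = min(arrivals_per_tick, space)
        dropped += arrivals_per_tick - accepted
        queued += accepted
        # The router forwards as many packets as the outgoing link allows.
        queued -= min(queued, sends_per_tick)
        history.append(queued)
    return history, dropped
```

When packets arrive faster than they can be sent, the queue length (and with it the waiting time) climbs until the queue is full, and from then on packets are lost every tick. When the arrival rate is below the sending rate, the queue stays empty.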

Page 10: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

10

TCP

• Transmission Control Protocol (TCP) provides abstraction of a reliable stream of bytes

• Ensures packets are delivered to application in correct order

• Retransmits lost packets

• Tracks available capacity and prevents packets from being sent too fast (congestion control)

• Prevents sender from overwhelming the receiver (flow control)

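[Figure: Sender and Receiver, each running TCP on top of IP. Data packets 1, 2, 3, 4 travel from sender to receiver; acknowledgments (ACK 1, ACK 2) travel back. The receiver's IP layer shows packets 1, 2, and 4, with packet 3 drawn separately.]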

Page 11: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

13

What Can Go Wrong?

Page 12: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

14

Complications in wide-area networks

• Communication is slower, less reliable

• Latencies are higher, more variable

• Bottleneck capacity is lower

• Packet loss, reordering, queueing delays

• Faults are more common

• Broken or malfunctioning nodes

• Network partitions

Page 13: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

15

Faults and failures

• Terminology:

• Fault: Some component is not working correctly

• Failure: System as a whole is not working correctly

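[Figure: a client issues 'Set X:=5' to a replicated system and later asks 'What is X?'. Three cases are shown. Correct: all nodes hold X=5 and the answer is X=5. Fault (masked): one node holds X=3, but the answer is still X=5. Faults causing failure: the answer is X=3.]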
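One way to see the difference between a masked fault and a failure is to let the client take a majority vote over the replicas' answers. Voting is an assumption made for this sketch (the slide does not say how the system combines answers), and the function name is hypothetical.

```python
from collections import Counter


def read_with_voting(replica_values):
    """Return the value reported by a strict majority, or None if there is none."""
    value, count = Counter(replica_values).most_common(1)[0]
    return value if count > len(replica_values) / 2 else None
```

With three replicas that should all hold 5, the answers `[5, 5, 3]` still yield 5: one component is faulty, but the system as a whole is correct. With `[5, 3, 3]` the vote yields 3, so the faults have turned into a failure.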

Page 14: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

16

Faults in distributed systems

• What could possibly go wrong?

• Node loses power

• Hard disk fails

• Administrator accidentally erases data

• Administrator configures node incorrectly

• Software bug is triggered

• Network overloaded, drops lots of packets

• Hacker breaks into some of the nodes

• Disgruntled employee manipulates node

• Fire breaks out in data center where node resides

• Police confiscate node because of illegal activity

• ...

• ...

Page 15: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

17

Common misconceptions about faults

• "Faults are rare exceptions"

• NO! At scale, faults are occurring all the time

• Stopping the system while handling the fault is NOT an option - system needs to continue despite the fault

• "Faulty machines always stop/crash"

• NO! There are many types of faults with different effects

• If your system is designed to handle only crash faults and another type of fault occurs, things can become very bad

Page 16: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

18

Types of faults

• Crash faults

• Node simply stops

• Examples: OS crash, power loss

• Rational behavior

• Owner manipulates node to increase profit

• Example: Lying about performance to get a sale

• Byzantine faults

• Arbitrary - faulty node could do anything (stop, tamper with data, tell lies, attack other nodes, send spam, spy on user...)

• Example: Node compromised by a hacker, data corruption, hardware defect...

Page 17: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

Example Byzantine fault

19

http://status.aws.amazon.com/s3-20080720.html

Page 18: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

20

Correlated faults

• A single problem can cause many faults

• Overloaded machine crashes, which increases the load on other machines: domino effect

• Bug is triggered in a program that is used on lots of machines

• Hacker manages to break into many computers due to a shared vulnerability

• Machines may be connected to the same power grid, cooled by the same A/C, managed by the same admin

• ...

Page 19: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

22

So what can we do?

Page 20: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

23

What can we do?

• Prevention and avoidance

• Example: Prevent crashes with software verification

• Detection

• Example: Cross-check network's route announcements with other information to see whether it is lying, and hold it accountable if it is (e.g., sue for breach of contract)

• Masking

• Example: Store replicas of the data on multiple nodes; if data is lost or corrupted on one of them, we still have the other copies

Page 21: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

24

Masking faults with replication

• Alice can store her data on both servers

• Bob can get the data from either server

• A single crash fault on a server does not lead to a failure

• Availability is maintained

[Figure: Alice and Bob, both connected to Server A and Server B.]

Page 22: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

25

Problem: Maintaining consistency

• What if multiple clients are accessing the same set of replicas?

• Requests may be ordered differently by different replicas

• Result: Inconsistency!

[Figure: Alice and Bob both access Server A and Server B. The writes X:=5 and X:=7 reach the two servers in different orders.]

Page 23: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

26

Types of consistency

• Strong consistency

• After an update completes, any subsequent access will return the updated value

• Weak consistency

• Updated value not guaranteed to be returned immediately, only after some conditions are met (inconsistency window)

• Eventual consistency

• A specific type of weak consistency

• If no new updates are made to the object, eventually all accesses will return the last updated value

Page 24: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

27

Example: Storage system

• Scenario: Replicated storage

• We have N nodes that can store data

• Data contains a monotonically increasing timestamp

• To write a value:

• Pick W replicas and write the value to each, using a fresh timestamp (say, the current wallclock time)

• To read a value:

• Pick R replicas and read the value from each

• Return the value with the highest timestamp

• If any replicas had a lower timestamp, send them the newer value

[Figure: replicas holding timestamped values, e.g., X=3 (v1), X=5 (v2), X=2 (v4).]
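The write and read procedures above can be sketched as follows. The class names, the logical counter used in place of wallclock time, and the random choice of replicas are all assumptions made to keep the example small and deterministic.

```python
import itertools
import random


class Replica:
    def __init__(self):
        self.value = None
        self.timestamp = 0

    def store(self, value, timestamp):
        # Keep only the newest version.
        if timestamp > self.timestamp:
            self.value, self.timestamp = value, timestamp


class ReplicatedStore:
    def __init__(self, n, w, r, seed=0):
        self.replicas = [Replica() for _ in range(n)]
        self.w, self.r = w, r
        self._clock = itertools.count(1)  # stand-in for wallclock time
        self._rng = random.Random(seed)

    def write(self, value):
        timestamp = next(self._clock)  # fresh, monotonically increasing
        for replica in self._rng.sample(self.replicas, self.w):
            replica.store(value, timestamp)

    def read(self):
        chosen = self._rng.sample(self.replicas, self.r)
        newest = max(chosen, key=lambda rep: rep.timestamp)
        for replica in chosen:
            # Send the newer value to replicas that had a lower timestamp.
            replica.store(newest.value, newest.timestamp)
        return newest.value
```

A useful experiment is to vary R and W. If R + W > N, every read set overlaps every write set, so a read should always see the latest completed write; with smaller R and W, reads can return stale values until the repair step has spread the newest version.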

Page 25: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

28

Consensus

• Replicas need to agree on a single order in which to execute client requests

• How can we do this?

• Does the specific order matter?

• Problem: What if some replicas are faulty?

• Crash fault: Replica does not respond; no progress (bad)

• Byzantine fault: Replica might tell lies, corrupt order (worse)

• Solution: Consensus protocol

• Paxos (for crash faults), PBFT (for Byzantine faults)

• Works as long as no more than a certain fraction of the replicas are faulty (PBFT: one third)
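The "certain fraction" can be turned into concrete numbers. The sketch below uses the standard bounds as an assumption: PBFT needs at least 3f+1 replicas to tolerate f Byzantine faults (the "one third" above), and majority-based crash-fault protocols such as Paxos need at least 2f+1. The second bound is not stated on the slide, and the function names are hypothetical.

```python
def max_byzantine_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1."""
    return (n - 1) // 3


def max_crash_faults(n: int) -> int:
    """Largest f such that n >= 2f + 1."""
    return (n - 1) // 2
```

For example, four replicas are the minimum needed to survive a single Byzantine fault, while three replicas are enough to survive a single crash fault.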

Page 26: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

29

How do consensus protocols work?

• Idea: Correct replicas 'outvote' faulty ones

• Clients send requests to each of the replicas

• Replicas coordinate and each return a result

• Client chooses one of the results, e.g., the one that is returned by the largest number of replicas

• If a small fraction of the replicas returns the wrong result, or no result at all, they are 'outvoted' by the other replicas

Page 27: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

30

What If the Network Breaks?

Page 28: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

31

Network partitions

• Network can partition

• Hardware fault, router misconfigured, undersea cable cut, ...

• Result: Global connectivity is lost

• What does this mean for the properties of our system?

[Figure: Alice and Bob, Server A and Server B. One link is marked: 'What if this link breaks?']

Page 29: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

32

Recap: Consistency and partitions

• Use replication to mask limited # of faults

• Can achieve strong consistency by having replicas agree on a common request ordering

• Even non-crash faults can be handled, as long as there are not too many of them (typical limit: 1/3)

• Partition tolerance, availability, consistency?

• Can't have all three (CAP theorem)

• For some services, need to drop one (usually availability)

• If service works with weaker consistency guarantees, such as eventual consistency, can get a compromise (BASE)

• Example: Shopping cart

Page 30: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

33

Cloud Computing

Page 31: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

34

History: The early days

• Cloud computing: A new term for a concept that has been around since the 1960s

• Who invented it?

• No agreement. Some candidates:

• John McCarthy (Stanford professor and inventor of Lisp; proposed the 'service bureau' model in 1961)

• J.C.R. Licklider (contributed key ideas to ARPANET; published a memo on the "Intergalactic Computer Network" in 1963)

• Douglas Parkhill (published a book on "The Challenge of the Computer Utility" in 1966)

Page 32: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

35

History: Becoming a cloud provider

• Early 2000s: Phenomenal growth of web services

• Many large Internet companies deploy huge data centers, develop scalable software infrastructure to run them

• Due to economies of scale, these companies were now able to run computation very cheaply

Technology | Cost in medium DC (~1,000 servers) | Cost in large DC (~50,000 servers) | Ratio
Network | $95 per Mbit/sec/month | $13 per Mbit/sec/month | 7.1
Storage | $2.20 per GByte/month | $0.40 per GByte/month | 5.7
Administration | ~140 servers/admin | >1,000 servers/admin | 7.1

Source: James Hamilton's Keynote, LADIS 2008

Page 33: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

36

History: Incentives

• Idea: Use your existing data center to provide cloud services

• Why is this a good idea?

• Make a lot of money

• Price advantage of 3x-7x: Can offer services much more cheaply than a medium-size company and still make a profit

• Leverage existing investment

• New revenue stream at low incremental cost (example: many Amazon AWS technologies were initially developed for Amazon's internal operations)

Page 34: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

37

History: Incentives (continued)

• Attack an incumbent

• Company with requisite datacenter may want to establish a 'beachhead' before an '800-pound gorilla' emerges

• Leverage existing customer relationships

• IT service organizations like IBM Global Services have extensive customer relationships; provide anxiety-free migration path to existing customers

• Become a platform

• Example: Facebook's initiative to enable plug-in applications is a great fit for cloud computing

Page 35: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

38

History: The pioneers

• Jul 2002: Amazon Web Services launched

• Third-party sites can search and display products from Amazon's web site, add items to Amazon shopping carts

• Mar 2006: Amazon S3 launched

• Innovative 'pay-per-use' pricing model, which is now the standard in cloud computing

• Cheaper than many small/medium storage solutions: $0.15/GB/month of storage, $0.20/GB/month for traffic

• Amazon no longer a pure retailer, entering technology space

• Aug 2006: EC2 launched

• Core computing infrastructure becomes available

Page 36: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

39

History: Widespread adoption

• Apr 2008: Google App Engine launched

• Same building blocks Google uses for its own applications: Bigtable and GFS for storage, automatic scaling and load balancing, ...

• Nov 2009: Windows Azure Beta launched

• Becomes generally available in 21 countries in Feb 2010

• Microsoft’s online services are gradually transitioning to Azure

• Dec 2013: Google Compute Engine launched

• Provides lower level support vs. App Engine, gives full set of services

• Dramatically lower prices, quickly matched by AWS and Azure

Page 37: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

40

One Set of Cloud Services: Amazon Web Services

Page 38: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

41

AWS Documentation

http://aws.amazon.com/documentation/

Page 39: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

42

Why Amazon AWS and not others?

• Amazon is only one of several cloud providers

• Others include Microsoft Azure, Google Compute Engine / App Engine, ...

• There is no common standard (yet)

• Initially, MS and Google supported PaaS

• Gradually each has grown to support both IaaS and PaaS

• AWS is PaaS/IaaS with a broad menu of choices

• So we had to pick one specific provider

• Amazon AWS is going to be used for the rest of this class

• Amazon's only involvement is providing free AWS cycles/storage

• Everything we do on AWS has an equivalent on Azure and GCE/GAE

Page 40: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

43

What is Amazon AWS?

• Amazon Web Services (AWS) provides a number of different services, including:

• Amazon Elastic Compute Cloud (EC2): Virtual machines for running custom software

• Amazon Simple Storage Service (S3): Simple key-value store, accessible as a web service

• Amazon DynamoDB: Distributed “NoSQL” database, one of several in AWS

• Amazon Elastic MapReduce: Scalable MapReduce computation

• Amazon Mechanical Turk (MTurk): A 'marketplace for work'

• Amazon CloudFront: Content delivery network

• ...

[Figure label: Used for the projects]

Page 41: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

44

Setting up an AWS account

• Sign up for an account on aws.amazon.com

• You need to choose a username and a password

• These are for the management interface only

• Your programs will use other credentials (RSA keypairs, access keys, ...) to interact with AWS

Page 42: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

45

AWS credentials

• Why so many different types of credentials?

• Sign-in credentials: AWS web site and management console

• X.509 certificates: Command-line tools, SOAP APIs

• Access keys: REST APIs

• EC2 key pairs: Connecting to an instance (e.g., via ssh)

Page 43: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

46

The AWS management console

• Used to control many AWS services:

• For example, start/stop EC2 instances, create S3 buckets...

Page 44: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

47

REST and SOAP

• How do your programs access AWS?

• Via the REST or SOAP protocols

• Example: Launch an EC2 instance, store a value in S3, ...

• Simple Object Access Protocol (SOAP)

• Not as simple as the name suggests

• XML-based, extensible, general, standardized, but also somewhat heavyweight and verbose

• Increasingly deprecated (e.g., for SimpleDB and EC2)

• Representational State Transfer (REST)

• Much simpler to develop than SOAP

• Web-specific; lack of standards

Page 45: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

48

Example: REST

Sample request:

https://sdb.amazonaws.com/?Action=PutAttributes
&DomainName=MyDomain
&ItemName=Item123
&Attribute.1.Name=Color&Attribute.1.Value=Blue
&Attribute.2.Name=Size&Attribute.2.Value=Med
&Attribute.3.Name=Price&Attribute.3.Value=0014.99
&AWSAccessKeyId=<valid_access_key>
&Version=2009-04-15
&Signature=[valid signature]
&SignatureVersion=2
&SignatureMethod=HmacSHA256
&Timestamp=2010-01-25T15%3A01%3A28-07%3A00

Sample response:

<PutAttributesResponse>
  <ResponseMetadata>
    <StatusCode>Success</StatusCode>
    <RequestId>f6820318-9658-4a9d-89f8-b067c90904fc</RequestId>
    <BoxUsage>0.0000219907</BoxUsage>
  </ResponseMetadata>
</PutAttributesResponse>

[Figure labels: invoked method, parameters, credentials, response elements]

Source: http://awsdocs.s3.amazonaws.com/SDB/latest/sdb-dg.pdf
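The Signature parameter in the request is what ties the request to your access key. The sketch below shows how such a signature could be computed, assuming the Signature Version 2 scheme works the way the parameters suggest: the request is canonicalized (verb, host, path, sorted and percent-encoded parameters) and signed with HMAC-SHA256 using the secret key. The helper names and the example key are hypothetical; consult the developer guide cited above before relying on the details.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def _encode(text: str) -> str:
    # RFC 3986 percent-encoding; only unreserved characters stay as they are.
    return quote(text, safe="-_.~")


def string_to_sign(host: str, params: dict) -> str:
    """Canonical form of a GET request to '/' with the given query parameters."""
    query = "&".join(
        f"{_encode(name)}={_encode(value)}"
        for name, value in sorted(params.items())
    )
    return f"GET\n{host.lower()}\n/\n{query}"


def sign(secret_key: str, host: str, params: dict) -> str:
    """Base64-encoded HMAC-SHA256 of the canonical request."""
    message = string_to_sign(host, params).encode()
    digest = hmac.new(secret_key.encode(), message, hashlib.sha256).digest()
    return base64.b64encode(digest).decode()
```

Note how the timestamp `2010-01-25T15:01:28-07:00` becomes `2010-01-25T15%3A01%3A28-07%3A00` after encoding, which matches the form seen in the sample request. Because the secret key itself is never sent, an eavesdropper who sees the request cannot forge a different one.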

Page 46: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

49

Example: SOAP

Sample request:

<?xml version='1.0' encoding='UTF-8'?>
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV='http://schemas.xmlsoap.org/soap/envelope/'
    xmlns:SOAP-ENC='http://schemas.xmlsoap.org/soap/encoding/'
    xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
    xmlns:xsd='http://www.w3.org/2001/XMLSchema'>
  <SOAP-ENV:Body>
    <PutAttributesRequest xmlns='http://sdb.amazonaws.com/doc/2009-04-15'>
      <Attribute><Name>a1</Name><Value>2</Value></Attribute>
      <Attribute><Name>a2</Name><Value>4</Value></Attribute>
      <DomainName>domain1</DomainName>
      <ItemName>eID001</ItemName>
      <Version>2009-04-15</Version>
    </PutAttributesRequest>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

Sample response:

<?xml version="1.0"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <PutAttributesResponse>
      <ResponseMetadata>
        <RequestId>4c68e051-fe45-43b2-992a-a24017ffe7ab</RequestId>
        <BoxUsage>0.0000219907</BoxUsage>
      </ResponseMetadata>
    </PutAttributesResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

Source: http://awsdocs.s3.amazonaws.com/SDB/latest/sdb-dg.pdf

Page 47: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

50

Amazon Compute Cloud (EC2)

Page 48: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

51

What is Amazon EC2?

• Infrastructure-as-a-Service (IaaS)

• You can rent various types of virtual machines by the hour

• In your VMs, you can run your own (Linux/Windows) programs

• Examples: Web server, search engine, movie renderer, ...

[Figure: two example instance types. One offers 68.4 GB memory, 8 virtual cores (3.25 CU each), 1690 GB storage, and 'high' I/O; the other offers 1.7 GB memory, 1 virtual core (1 CU each), 160 GB storage, and 'moderate' I/O. Source: http://aws.amazon.com/ec2/#pricing (9/11/2013)]

Page 49: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

52

Oh no - where has my data gone?

• EC2 instances do not have persistent storage

• Data survives stops & reboots, but not termination

• So where should I put persistent data?

• Elastic Block Store (EBS)¹

• Ideally, use an AMI with an EBS root (Amazon's default AMI has this property)

If you store data on the virtual hard disk of your instance and the instance fails or you terminate it, your data WILL be lost!

1 http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AmazonEBS.html

Page 50: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

53

Amazon Machine Images¹

• When I launch an instance, what software will be installed on it?

• Software is taken from an Amazon Machine Image (AMI)

• Selected when you launch an instance

• Essentially a file system that contains the operating system, applications, and potentially other data

• Lives in S3

• How do I get an AMI?

• Amazon provides several generic ones, e.g., Amazon Linux, Fedora Core, Windows Server, ...

• You can make your own

• You can even run your own custom kernel (with some restrictions)

1 http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html

Page 51: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

54

Security Groups¹

• Basically, a set of firewall rules

• Can be applied to groups of EC2 instances

• Each rule specifies a protocol, port numbers, etc...

• Only traffic matching one of the rules is allowed through

• Sometimes need to explicitly open ports

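[Figure: an instance behind a security group; traffic from a legitimate user (you or your customers) reaches the instance, traffic from an evil attacker does not.]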
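The "only traffic matching one of the rules is allowed through" behavior can be sketched as a default-deny check. The `Rule` class, the function name, and the example rule set are hypothetical, and real security group rules carry more fields (for instance, the permitted source addresses) that are left out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    protocol: str   # e.g. "tcp" or "udp"
    from_port: int  # first port of the allowed range
    to_port: int    # last port of the allowed range


def is_allowed(rules, protocol, port):
    """Traffic passes only if at least one rule matches; everything else is blocked."""
    return any(
        rule.protocol == protocol and rule.from_port <= port <= rule.to_port
        for rule in rules
    )


# Hypothetical rule set for a web server: ssh and HTTP only.
WEB_SERVER_RULES = [Rule("tcp", 22, 22), Rule("tcp", 80, 80)]
```

With these rules, a request to TCP port 80 gets through, but a connection attempt to a database port such as 3306 is blocked until a rule for it is added. This is why ports sometimes need to be opened explicitly.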

1 http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html

Page 52: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

55

Regions and Availability Zones¹

• Where exactly does my instance run?

• No easy way to find out - Amazon does not say

• Instances can be assigned to regions

• Currently 9 available: US East (Northern Virginia), US West (Northern California), US West (Oregon), EU (Ireland), Asia/Pacific (Singapore), Asia/Pacific (Sydney), Asia/Pacific (Tokyo), South America (Sao Paulo), AWS GovCloud

• Important, e.g., for reducing latency to customers

• Instances can be assigned to availability zones

• Purpose: Avoid correlated faults

• Several availability zones within each region

1 http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html

Page 53: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

56

Network pricing

• AWS does charge for network traffic

• Price depends on source and destination of traffic

• Free within EC2 and other AWS services in same region (e.g., S3)

• Remember: ISPs are typically charged for upstream traffic

[Figure: network pricing table. Source: http://aws.amazon.com/ec2/#pricing (9/11/2013)]

Page 54: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

57

Instance types

• So far: On-demand instances

• Also available: Reserved instances

• One-time reservation fee to purchase for 1 or 3 years

• Usage still billed by the hour, but at a considerable discount

• Also available: Spot instances

• Spot market: Can bid for available capacity

• Instance continues until terminated or price rises above bid

Source: http://aws.amazon.com/ec2/reserved-instances/

Page 55: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

58

Service Level Agreement

http://aws.amazon.com/ec2-sla/ (9/11/2013; excerpt)

4.38 h downtime per year allowed
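The downtime figure and an availability percentage are two views of the same number. The sketch below converts between them, assuming a 365-day year; the function names are made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent: float) -> float:
    """Hours per year a service may be down at the given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


def availability_percent(downtime_hours: float) -> float:
    """Availability implied by the given yearly downtime."""
    return 100 * (1 - downtime_hours / HOURS_PER_YEAR)
```

Under this assumption, 4.38 hours of downtime per year works out to 99.95% availability. Each additional "nine" cuts the allowance by a factor of ten, so 99.995% would leave well under half an hour per year.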

Page 56: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

59

What is Elastic Block Store (EBS)?

• Persistent storage

• Unlike the local instance store, data stored in EBS is not lost when an instance fails or is terminated

• Should I use the instance store or EBS?

• Typically, instance store is used for temporary data

[Figure: an instance connected to EBS storage.]

Page 57: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

60

Volumes

• EBS storage is allocated in volumes

• A volume is a 'virtual disk' (size: 1GB - 1TB)

• Basically, a raw block device

• Can be attached to an instance (but only one at a time)

• A single instance can access multiple volumes

• Placed in specific availability zones

• Why is this useful?

• Be sure to place it near instances (otherwise can't attach)

• Replicated across multiple servers

• Data is not lost if a single server fails

• Amazon: Annual failure rate is 0.1-0.5% for a 20GB volume

Page 58: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

61

EC2 instances with EBS roots

• EC2 instances can have an EBS volume as their root device ("EBS boot")

• Result: Instance data persists independently from the lifetime of the instance

• You can stop and restart the instance, similar to suspending and resuming a laptop

• You won't be charged for the instance while it is stopped (only for EBS)

• You can enable termination protection for the instance

• Blocks attempts to terminate the instance (e.g., by accident) until termination protection is disabled again

• Alternative: Use instance store as the root

• You can still store temporary data on it, but it will disappear when you terminate the instance

• You can still create and mount EBS volumes explicitly

Page 59: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

62

Snapshots

• You can create a snapshot of a volume

• Copy of data in the volume at the time snapshot was made

• Only the first snapshot makes a full copy; subsequent snapshots are incremental

• What are snapshots good for?

• Sharing data with others

• DBpedia snapshot ID is "snap-882a8ae3"

• Access control list (specific account numbers) or public access

• Instantiate new volumes

• Point-in-time backups

Page 60: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

63

Pricing

• You pay for...

• Storage space: $0.10 per allocated GB per month

• I/O requests: $0.10 per million I/O requests

• S3 operations (GET/PUT)

• Charge is only for actual storage used

• Empty space does not count

Page 61: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

64

Creating an EBS volume

[Figure: the 'Create volume' dialog. The volume needs to be in the same availability zone as your instance! The snapshot field holds the DBpedia snapshot ID.]

Page 62: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

65

Mounting an EBS volume

• Step 1: Attach the volume

• Step 2: Mount the volume in the instance

mkse212@vm:~$ ec2-attach-volume -d /dev/sda2 -i i-9bd6eef1 vol-cca68ea5
ATTACHMENT vol-cca68ea5 i-9bd6eef1 /dev/sda2 attaching
mkse212@vm:~$

mkse212@vm:~$ ssh [email protected]

 __| __|_ ) Amazon Linux AMI
 _| ( / Beta
 ___|\___|___|

See /usr/share/doc/system-release-2011.02 for latest release notes. :-)
[ec2-user@ip-10-196-82-65 ~]$ sudo mount /dev/sda2 /mnt/
[ec2-user@ip-10-196-82-65 ~]$ ls /mnt/
dbpedia_3.5.1.owl dbpedia_3.5.1.owl.bz2 en other_languages
[ec2-user@ip-10-196-82-65 ~]$

Page 63: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

66

Detaching an EBS volume

• Step 1: Unmount the volume in the instance

• Step 2: Detach the volume

[ec2-user@ip-10-196-82-65 ~]$ sudo umount /mnt/
[ec2-user@ip-10-196-82-65 ~]$ exit
mkse212@vm:~$

mkse212@vm:~$ ec2-detach-volume vol-cca68ea5
ATTACHMENT vol-cca68ea5 i-9bd6eef1 /dev/sda2 detaching
mkse212@vm:~$

Page 64: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

67

Plan for today

• A brief history of cloud computing

• Introduce one specific commercial cloud

• Amazon Web Services (AWS)

• Elastic Compute Cloud (EC2)

• Elastic Block Store (EBS)

• Other services: Mechanical Turk, CloudFront, ...

• Next time: S3 and SimpleDB

NEXT

Page 65: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

68

AWS Import/Export

• Import/export large amounts of data to/from S3 buckets via physical storage device

• Mail an actual hard disk to Amazon (power adapter, cables!)

• Signature file for authentication

• Discussion: Is this the Right Way to be shipping data, or should we rather be using a network?

Time to transfer 10 TB [AF10]:

Method | Time
Internet (20 Mbps) | 45 days
FedEx | 1 day
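The network figure in the table is easy to sanity-check. The sketch below assumes a decimal terabyte (10^12 bytes) and a link that is fully used the whole time; the function name is hypothetical.

```python
def transfer_days(data_bytes: float, link_bits_per_second: float) -> float:
    """Days needed to push data_bytes through a link of the given speed."""
    seconds = data_bytes * 8 / link_bits_per_second
    return seconds / 86_400  # seconds per day


days = transfer_days(10e12, 20e6)  # 10 TB over 20 Mbps
print(f"{days:.1f} days")
```

Under these assumptions the transfer takes a little over 46 days, which is in line with the roughly 45 days quoted in the table. A courier that delivers a disk overnight is therefore far ahead in bandwidth, even though its latency is a full day.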

Page 66: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

69

Mechanical Turk (MTurk)

• A crowdsourcing marketplace

• Requesters post small jobs (HIT - Human Intelligence Task), offer small rewards ($0.01-$0.10)

[Figure: screenshot of https://www.mturk.com/mturk/ (9/23/2010, 1:58 am)]

Page 67: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

70

CloudFront

• Content distribution network

• Caches S3 content at edge locations for low-latency delivery

• Some similarities to other CDNs like Akamai, Limelight, ...

Page 68: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

71

Plan for today

• A brief history of cloud computing

• Introduce one specific commercial cloud

• Amazon Web Services (AWS)

• Elastic Compute Cloud (EC2)

• Elastic Block Store (EBS)

• Other services: Mechanical Turk, CloudFront, ...

• Next time: S3 and SimpleDB

NEXT

Page 69: © 2013 A. Haeberlen, Z. Ives Internet Basics Faults & Failures 1

72

Stay tuned

Next time you will learn about: Cloud storage