WhoWas: A Platform for Measuring Web Deployments on IaaS Clouds

Liang Wang*, Antonio Nappa+, Juan Caballero+, Thomas Ristenpart*, Aditya Akella*

* University of Wisconsin-Madison
+ IMDEA Software Institute

Motivation

An increasing number of services are using clouds

Understanding cloud usage patterns is important
- Design new services & applications
- Design provisioning & scaling algorithms

What is the usage pattern of a website?

How many instances are used by a website?

Do tenants leverage elasticity?

Is piratebay using EC2?

Are there OpenVPN servers in EC2?

Motivation

We need more measurement tools

Little research about how tenants use public clouds
- Deepfield, 2012: 1/3 of daily users and 1% of Internet traffic are associated with AWS
- He et al., IMC 2013: 4% of the Alexa top million are in EC2/Azure
  - Answers the question: Who is using public clouds?
  - Technique: Investigate DNS entries for Alexa top websites and network packet capture data
  - No insight into changes to deployment patterns over time
- Bermudez et al., INFOCOM 2013: Exploring the cloud from passive measurements: The Amazon AWS case

Contributions

We develop a new measurement platform, WhoWas, to facilitate measurement studies of public cloud services

WhoWas:
- High churn rates of IPs used by services each day
- Most web services use a single IP
- New software adopted slowly; outdated software popular
- Quantify growth in usage of EC2 & Azure
- Small number of malicious websites in clouds

The WhoWas Platform

Lightweight probing to associate content to IPs over time

[Diagram: WhoWas architecture. Labeled components: IP ranges (input), TCP SYN Probes, HTTP GET, Feature Generator, WhoWasDB, Clustering Engine, VPC Map, Analysis, Analysis APIs]

Probing limits:
- TCP SYN probes: at most 3 probes for an IP per day
- HTTP GET of http(s)://1.1.1.1/ for IP=1.1.1.1: at most two GET requests for an IP per day
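The per-IP daily limits on this slide can be enforced with a simple counter. The following is a minimal sketch, not the WhoWas implementation; the `ProbeBudget` class and its method names are hypothetical, and only the two limits come from the slide.

```python
from collections import defaultdict
from datetime import date

# Per-IP daily limits taken from the slide.
MAX_SYN_PROBES_PER_DAY = 3
MAX_GET_REQUESTS_PER_DAY = 2


class ProbeBudget:
    """Hypothetical helper: counts probes sent to each IP on each day."""

    def __init__(self):
        # (ip, day, kind) -> number of probes already sent
        self._sent = defaultdict(int)
        self._limits = {
            "syn": MAX_SYN_PROBES_PER_DAY,
            "get": MAX_GET_REQUESTS_PER_DAY,
        }

    def try_consume(self, ip: str, day: date, kind: str) -> bool:
        """Record one probe and return True if the daily limit allows it."""
        key = (ip, day, kind)
        if self._sent[key] >= self._limits[kind]:
            return False
        self._sent[key] += 1
        return True


budget = ProbeBudget()
today = date(2013, 10, 1)
syn_results = [budget.try_consume("1.1.1.1", today, "syn") for _ in range(4)]
get_results = [budget.try_consume("1.1.1.1", today, "get") for _ in range(3)]
print(syn_results)  # the fourth SYN probe is refused
print(get_results)  # the third GET request is refused
```

A prober would call `try_consume` before sending each packet or request and skip the IP when it returns False.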

Ethical Measurement Design

• Lightweight, low-frequency probing
• Robots.txt checking
• Note in the User-Agent
• IP exclusion list
• Collected data kept private
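Three of these safeguards (robots.txt checking, a note in the User-Agent, and an IP exclusion list) can be sketched with the Python standard library. The User-Agent string, the excluded network, and the function names below are assumptions made for illustration; they are not taken from WhoWas.

```python
import ipaddress
from urllib.robotparser import RobotFileParser

# Hypothetical User-Agent carrying a note about the measurement.
USER_AGENT = "ExampleMeasurementBot/0.1 (research probe; contact: admin@example.org)"

# Hypothetical exclusion list: networks whose owners asked not to be probed.
EXCLUDED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_excluded(ip: str) -> bool:
    """True if the IP falls inside any excluded network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in EXCLUDED_NETWORKS)


def robots_allows(robots_txt: str, url: str) -> bool:
    """Parse an already-downloaded robots.txt body and check the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(USER_AGENT, url)


def may_fetch(ip: str, robots_txt: str) -> bool:
    """Fetch the root page only if the IP is not excluded and robots.txt permits it."""
    if is_excluded(ip):
        return False
    return robots_allows(robots_txt, f"http://{ip}/")
```

Passing the robots.txt body in as a string keeps the check separate from the network fetch, so the policy can be tested offline.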

Data Collection & Datasets

- EC2: 4,702,208 IPs, Oct 2013 to Dec 2013, 51 rounds
- Azure: 495,872 IPs, Nov 2013 to Dec 2013, 46 rounds
- About 900 GB of data in total

[Figure: two time-series plots. EC2: dates 10/1/2013 to 12/30/2013, y-axis 1.02M to 1.16M. Azure: dates 10/31/2013 to 12/30/2013, y-axis 106K to 122K. x-axis: Date; y-axis label: No. of clusters. Plot annotations: 24.4%, 22.6%, 22.6%, and 24.3% of all IPs]

Overall growth in the number of IPs responding to probes: 4.9% in EC2 and 7.7% in Azure
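A growth figure of this kind is a relative change between two counts. The sketch below assumes the comparison is between the first and the last measurement round, which the slide does not state; the input numbers are made up for illustration and are not the measured counts.

```python
def growth_percent(first_round_count: int, last_round_count: int) -> float:
    """Relative change from the first to the last round, in percent."""
    if first_round_count <= 0:
        raise ValueError("first_round_count must be positive")
    return (last_round_count - first_round_count) / first_round_count * 100.0


# Hypothetical counts of responsive IPs, chosen only to show the arithmetic.
example = growth_percent(1000, 1049)
print(round(example, 1))
```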

WhoWas Engines--Clustering

WhoWas offers a new clustering heuristic

How to find IPs being operated by the same website?

Webpage Clustering

WhoWas Engines--Clustering

Feature Extractor (input: HTML contents)

Fingerprint (six-item tuple)
• Title
• Keywords
• Template
• Google Analytics ID
• Server version
• Simhash of HTML textual content

Each observation is stored as <IP, Round Number, Fingerprint>.

Clustering steps:
1. For two fingerprints, check whether title1=title2 & keyword1=keyword2 & template1=template2 & server1=server2 & GID1=GID2.
   - No: different clusters
   - Yes: same top-level cluster
2. Within a top-level cluster, use the simhash: unsupervised clustering + elbow method yields the final clusters.
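The two-level grouping can be sketched as follows. This is an illustration under stated assumptions, not the WhoWas clustering engine: the first level matches the five non-simhash features exactly, as on the slide, while the second level replaces the slide's unsupervised clustering + elbow method with a much simpler greedy split on a fixed Hamming-distance threshold (`max_distance`, a hypothetical parameter).

```python
from collections import defaultdict
from typing import NamedTuple


class Fingerprint(NamedTuple):
    title: str
    keywords: str
    template: str
    ga_id: str
    server: str
    simhash: int  # integer hash of the page's textual content


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two simhash values."""
    return bin(a ^ b).count("1")


def cluster(records, max_distance=3):
    """records: iterable of (ip, round_number, Fingerprint) tuples.

    Returns a list of clusters, each a list of records.
    """
    # Level 1: exact match on the five non-simhash features.
    top_level = defaultdict(list)
    for rec in records:
        fp = rec[2]
        key = (fp.title, fp.keywords, fp.template, fp.ga_id, fp.server)
        top_level[key].append(rec)

    # Level 2 (simplifying assumption): greedy split by simhash distance
    # to the first member of each sub-cluster.
    clusters = []
    for group in top_level.values():
        subclusters = []
        for rec in group:
            for sub in subclusters:
                if hamming(rec[2].simhash, sub[0][2].simhash) <= max_distance:
                    sub.append(rec)
                    break
            else:
                subclusters.append([rec])
        clusters.extend(subclusters)
    return clusters
```

A fixed threshold is easy to reason about but has to be tuned by hand, which is the step the elbow method automates.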

WhoWas Engines--Clustering

The number of clusters increased by 3.3% in EC2 and 6.2% in Azure

EC2: 1,767,072 simhashes, 243,164 clusters
Azure: 210,418 simhashes, 31,728 clusters
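Each simhash counted above summarizes the textual content of one page, so that near-duplicate pages tend to receive hashes differing in only a few bits. A minimal sketch of the general simhash technique follows; the tokenizer, the use of SHA-256 as the per-token hash, and the 64-bit width are assumptions made here, not details from the slides.

```python
import hashlib
import re


def simhash(text: str, bits: int = 64) -> int:
    """Compute a simhash over word tokens (bits must be a multiple of 8, at most 256)."""
    weights = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: bits // 8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(bits):
            weights[i] += 1 if (token_hash >> i) & 1 else -1
    # A bit of the result is set when the votes for it are positive.
    result = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            result |= 1 << i
    return result


page_a = simhash("Welcome to our shop for running shoes")
page_b = simhash("welcome to our shop for running shoes")
print(page_a == page_b)  # case differences do not change the hash
```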

WhoWas Engines--Clustering

About 80% of clusters use 1 IP; 0.1% use more than 50 IPs
Large clusters tend to leverage cloud elasticity

Top 5 clusters by average number of IP addresses used per round (EC2)

Total #IP   Mean #IP/Round   Min #IP   Max #IP
51,211      33,145           30,624    34,509
15,283      5,597            5,435     5,785
3,869       2,029            1,724     2,228
22,226      1,167            179       2,501
8,488       617              57        1,837
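The four columns of this table can be derived from per-round observations. The sketch below assumes input tuples of (cluster id, round number, IP), and assumes that rounds in which a cluster has no IP are simply absent from the mean; neither detail is stated on the slide.

```python
from collections import defaultdict


def ip_usage_stats(observations):
    """observations: iterable of (cluster_id, round_number, ip) tuples."""
    per_round = defaultdict(lambda: defaultdict(set))
    all_ips = defaultdict(set)
    for cluster_id, round_number, ip in observations:
        per_round[cluster_id][round_number].add(ip)
        all_ips[cluster_id].add(ip)

    stats = {}
    for cluster_id, rounds in per_round.items():
        counts = [len(ips) for ips in rounds.values()]
        stats[cluster_id] = {
            "total": len(all_ips[cluster_id]),  # distinct IPs over all rounds
            "mean": sum(counts) / len(counts),  # mean IPs per observed round
            "min": min(counts),
            "max": max(counts),
        }
    return stats
```

A large gap between "total" and "mean" indicates IP churn, and a large gap between "min" and "max" indicates that the cluster scales up and down over time.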

More Results from WhoWas

1. Feature Adoption
2. Malicious Activity
3. Cloud Availability
4. Software Adoption

Virtual Private Cloud Mapping

[Diagram: DNS lookups issued from inside an EC2 data center]
- Default DNS hostname = region-specific string + IP
- Host A, public IP = a (classic network): resolving Host A returns a private IP != a
- Host B, public IP = b (VPC network): resolving Host B always returns the public IP b
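The mapping idea can be sketched as a classifier over DNS answers, assuming that a lookup made from inside EC2 returns a private IP for a classic instance and the public IP for a VPC instance. The hostname layout in `default_hostname` follows the commonly seen EC2 public DNS naming and is an assumption here; the function names are hypothetical. The resolver is injectable so the logic can be exercised without network access.

```python
import ipaddress
import socket


def default_hostname(public_ip: str, region: str) -> str:
    """Assumed EC2 naming: 'ec2-' + dashed IP + region-specific suffix."""
    dashed = public_ip.replace(".", "-")
    # Assumption: us-east-1 uses the 'compute-1' suffix.
    if region == "us-east-1":
        suffix = "compute-1.amazonaws.com"
    else:
        suffix = f"{region}.compute.amazonaws.com"
    return f"ec2-{dashed}.{suffix}"


def classify(public_ip: str, region: str, resolve=socket.gethostbyname) -> str:
    """Return 'vpc', 'classic', or 'unknown' based on the DNS answer.

    Meaningful only when the lookup is issued from inside EC2.
    """
    resolved = resolve(default_hostname(public_ip, region))
    if resolved == public_ip:
        return "vpc"
    if ipaddress.ip_address(resolved).is_private:
        return "classic"
    return "unknown"
```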

In EC2, VPC usage increases whereas classic usage decreases

[Figure: Change over time in classic-only, VPC-only, and mixed clusters in EC2; legend: classic-only, VPC-only, mixed clusters]

Lifetime of malicious IPs is long

[Diagram: for each webpage from an IP stored in WhoWasDB, the URLs in the webpage are checked with the Safe Browsing API, and the IP is labeled malicious or benign accordingly]

- EC2: 1,393 malicious URLs, 196 malicious IPs
- Azure: 14 malicious URLs, 13 malicious IPs

[Figure: CDF of lifetime (days) of malicious IPs on EC2; x-axis 1 to 91 days, y-axis 0 to 1; annotations: "60% up for 7+ days" and "90+ days!"]
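The labeling step can be sketched as: extract the URLs from a stored webpage, look each one up, and mark the IP as malicious if any URL is flagged. The "any URL" rule is an assumption, and `is_flagged` is a hypothetical callable standing in for a blacklist lookup; it is not the Safe Browsing API.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects absolute http(s) URLs from href and src attributes."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value and value.startswith(("http://", "https://")):
                self.urls.append(value)


def extract_urls(html: str):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.urls


def label_ip(html: str, is_flagged):
    """Return ('malicious' or 'benign', list of flagged URLs) for one webpage."""
    flagged = [url for url in extract_urls(html) if is_flagged(url)]
    return ("malicious" if flagged else "benign"), flagged
```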

File hosting services are used for distributing malicious content

[Diagram: IP ranges are submitted to the VirusTotal API, which returns malicious activity history]

Domain                       # of URLs flagged as malicious
dl.dropboxusercontent.com    993
dl.dropbox.com               936
download-instantly.com       295
tr.im                        268
www.wishdownload.com         223

- EC2: 2,070 malicious IPs, 13,752 malicious URLs
- Azure: No malicious IPs!
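A per-domain tally like the one in this table can be produced by grouping flagged URLs by hostname. This is a minimal sketch; the input list and the function name are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_by_domain(flagged_urls):
    """Return (hostname, count) pairs sorted by descending count."""
    counts = Counter()
    for url in flagged_urls:
        host = urlsplit(url).hostname
        if host:  # skip strings without a parsable hostname
            counts[host] += 1
    return counts.most_common()
```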

Cloud Measurement Challenge and Future

Lower bound on number of IPs used by web services
- Only see a portion of web servers
- Only see a portion of web sites' pages

[Diagram: probing a frontend VM with public IP = 1.1.1.1. Legend: "Able to find" and "Fail to find". Cases shown: backend VM with no public IP; VPC VM; VM with no default HTTP(S) port; VM behind a firewall; VM hosting a default website and other websites; website that denies IP access]

Other results are in the paper!

Visit our website, www.cloudwhowas.org, to get more information!

Conclusion

WhoWas: new measurement platform
- Lightweight probing to associate content to IPs over time

Used WhoWas for several first-of-their-kind measurements:
- Growth rates of IP usage
- Identification of malicious websites
- Software adoption rate in clouds
- …

Questions?

www.cloudwhowas.org
