cloudclustering ankur dave*, wei lu†, jared jackson†, roger barga† *uc berkeley †microsoft...

24
CloudClustering Ankur Dave*, Wei Lu†, Jared Jackson†, Roger Barga† *UC Berkeley †Microsoft Research Toward an Iterative Data Processing Pattern on the Cloud

Upload: reynold-sharp

Post on 12-Jan-2016

213 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: CloudClustering Ankur Dave*, Wei Lu†, Jared Jackson†, Roger Barga† *UC Berkeley †Microsoft Research Toward an Iterative Data Processing Pattern on the

CloudClustering

Ankur Dave*, Wei Lu†, Jared Jackson†, Roger Barga†

*UC Berkeley†Microsoft Research

Toward an Iterative Data Processing Pattern on the Cloud

Page 2: CloudClustering Ankur Dave*, Wei Lu†, Jared Jackson†, Roger Barga† *UC Berkeley †Microsoft Research Toward an Iterative Data Processing Pattern on the

Wei Lu Jared Jackson Roger BargaAnkur Dave

Page 3: CloudClustering Ankur Dave*, Wei Lu†, Jared Jackson†, Roger Barga† *UC Berkeley †Microsoft Research Toward an Iterative Data Processing Pattern on the

Background

MapReduce and its successors reimplement communication and storage

But cloud services like EC2 and Azure natively provide these utilities

Can we build an efficient distributed application using only the primitives of the cloud?

Page 4: CloudClustering Ankur Dave*, Wei Lu†, Jared Jackson†, Roger Barga† *UC Berkeley †Microsoft Research Toward an Iterative Data Processing Pattern on the

Design Goals

• Efficient • Fault tolerant• Cloud-native

Page 5: CloudClustering Ankur Dave*, Wei Lu†, Jared Jackson†, Roger Barga† *UC Berkeley †Microsoft Research Toward an Iterative Data Processing Pattern on the

Outline

Windows AzureK-Means clustering algorithmCloudClustering architecture– Data locality– Buddy system

EvaluationRelated work

Page 6: CloudClustering Ankur Dave*, Wei Lu†, Jared Jackson†, Roger Barga† *UC Berkeley †Microsoft Research Toward an Iterative Data Processing Pattern on the

Windows AzureVM instances

Blob storage

Queue storage

Page 7: CloudClustering Ankur Dave*, Wei Lu†, Jared Jackson†, Roger Barga† *UC Berkeley †Microsoft Research Toward an Iterative Data Processing Pattern on the

Outline

Windows AzureK-Means clustering algorithmCloudClustering architecture– Data locality– Buddy system

EvaluationRelated work

Page 8: CloudClustering Ankur Dave*, Wei Lu†, Jared Jackson†, Roger Barga† *UC Berkeley †Microsoft Research Toward an Iterative Data Processing Pattern on the

K-Means algorithm

Iteratively groups points into clusters

Centroids

Initialize

Assign points to centroids

Recalculate centroids

Points moved?

Page 9: CloudClustering Ankur Dave*, Wei Lu†, Jared Jackson†, Roger Barga† *UC Berkeley †Microsoft Research Toward an Iterative Data Processing Pattern on the

K-Means algorithmInitialize

Partition work

Recalculate centroids

Points moved?

Assign points to centroids

Compute partial sums

Page 10: CloudClustering Ankur Dave*, Wei Lu†, Jared Jackson†, Roger Barga† *UC Berkeley †Microsoft Research Toward an Iterative Data Processing Pattern on the

Outline

Windows AzureK-Means clustering algorithmCloudClustering architecture– Data locality– Buddy system

EvaluationRelated work

Page 11: CloudClustering Ankur Dave*, Wei Lu†, Jared Jackson†, Roger Barga† *UC Berkeley †Microsoft Research Toward an Iterative Data Processing Pattern on the

Architecture

Centroids:

Initialize

Partition work

Recalculate centroids

Points moved?

Assign points to centroids

Compute partial sums

Page 12: CloudClustering Ankur Dave*, Wei Lu†, Jared Jackson†, Roger Barga† *UC Berkeley †Microsoft Research Toward an Iterative Data Processing Pattern on the

Outline

Windows AzureK-Means clustering algorithmCloudClustering architecture– Data locality– Buddy system

EvaluationRelated work

Page 13: CloudClustering Ankur Dave*, Wei Lu†, Jared Jackson†, Roger Barga† *UC Berkeley †Microsoft Research Toward an Iterative Data Processing Pattern on the

Data locality

Multiple-queue pattern unlocks data localityTradeoff: Complicates fault tolerance

Single-queue pattern is naturally fault-tolerantProblem: Not suitable for iterative computation

Page 14: CloudClustering Ankur Dave*, Wei Lu†, Jared Jackson†, Roger Barga† *UC Berkeley †Microsoft Research Toward an Iterative Data Processing Pattern on the

Outline

Windows AzureK-Means clustering algorithmCloudClustering architecture– Data locality– Buddy system

EvaluationRelated work

Page 15: CloudClustering Ankur Dave*, Wei Lu†, Jared Jackson†, Roger Barga† *UC Berkeley †Microsoft Research Toward an Iterative Data Processing Pattern on the

Handling failureInitialize

Partition work

Recalculate centroids

Points moved?

Assign points to centroids

Compute partial sums

Stall? Buddy system

Page 16: CloudClustering Ankur Dave*, Wei Lu†, Jared Jackson†, Roger Barga† *UC Berkeley †Microsoft Research Toward an Iterative Data Processing Pattern on the

Buddy system

Buddy system provides distributed fault detection and recovery

Page 17: CloudClustering Ankur Dave*, Wei Lu†, Jared Jackson†, Roger Barga† *UC Berkeley †Microsoft Research Toward an Iterative Data Processing Pattern on the

Buddy system

Spreading buddies across Azure fault domains provides increased resilience to

simultaneous failure

1

2

3

Fault domain

Page 18: CloudClustering Ankur Dave*, Wei Lu†, Jared Jackson†, Roger Barga† *UC Berkeley †Microsoft Research Toward an Iterative Data Processing Pattern on the

Buddy system

Cascaded failure detection reduces communication and improves resilience

vs.

Page 19: CloudClustering Ankur Dave*, Wei Lu†, Jared Jackson†, Roger Barga† *UC Berkeley †Microsoft Research Toward an Iterative Data Processing Pattern on the

Outline

Windows AzureK-Means clustering algorithmCloudClustering architecture– Data locality– Buddy system

EvaluationRelated work

Page 20: CloudClustering Ankur Dave*, Wei Lu†, Jared Jackson†, Roger Barga† *UC Berkeley †Microsoft Research Toward an Iterative Data Processing Pattern on the

Evaluation

Linear speedup with instance count

Page 21: CloudClustering Ankur Dave*, Wei Lu†, Jared Jackson†, Roger Barga† *UC Berkeley †Microsoft Research Toward an Iterative Data Processing Pattern on the

Evaluation

Sublinear speedup with instance sizeReason: I/O bandwidth doesn’t scale

Page 22: CloudClustering Ankur Dave*, Wei Lu†, Jared Jackson†, Roger Barga† *UC Berkeley †Microsoft Research Toward an Iterative Data Processing Pattern on the

Outline

Windows AzureK-Means clustering algorithmCloudClustering architecture– Data locality– Buddy system

EvaluationRelated work

Page 23: CloudClustering Ankur Dave*, Wei Lu†, Jared Jackson†, Roger Barga† *UC Berkeley †Microsoft Research Toward an Iterative Data Processing Pattern on the

Related work

• Existing frameworks (MapReduce, Dryad, Spark, MPI, …) all support K-Means, but reimplement reliable communication

• AzureBlast built directly on cloud services, but algorithm is not iterative

Page 24: CloudClustering Ankur Dave*, Wei Lu†, Jared Jackson†, Roger Barga† *UC Berkeley †Microsoft Research Toward an Iterative Data Processing Pattern on the

Conclusions

• CloudClustering shows that it's possible to build efficient, resilient applications using only the common cloud services

• Multiple-queue pattern unlocks data locality• Buddy system provides fault tolerance