Transcript
Page 1: Large dataset processing in the Cloud

Large dataset processing in the Cloud

Kevin Glenny and GridwiseTech team

Page 2: Large dataset processing in the Cloud

Simplified data oriented system

Internal or external data sources

applications working on data

Page 3: Large dataset processing in the Cloud

IT systems are constantly growing

Increased number of users

Increased number of applications

Increased amount of data

Page 4: Large dataset processing in the Cloud

IT systems are constantly growing

Infrastructure bottleneck

Page 5: Large dataset processing in the Cloud

Example

Electronics manufacturer

24/7 production

Report computation too long for decision making

2.5 million transactions daily

4TB data to manage

Page 6: Large dataset processing in the Cloud

What is Cloud computing?

"Transparent access to capabilities using a pay-per-use business model"

Benefits:

– Dynamic scaling

– Pay-for-use

– Off-shored administration

Page 7: Large dataset processing in the Cloud

What are the delivery models?

SaaS (Software as a Service)

– SalesForce.com, 63,000 clients

PaaS (Platform as a Service)

– Google App Engine (2008), Microsoft Azure (2008)

IaaS (Infrastructure as a Service)

– Amazon Elastic Compute Cloud, 8.2 million instances launched since 2006

Page 8: Large dataset processing in the Cloud

Application data processing

Database sharding (MySQL, PostgreSQL, etc.)

NoSQL (Google's BigTable, Amazon's Dynamo, etc.)

Data-grid (GigaSpaces XAP, Oracle Coherence, Infinispan, etc.)
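The database sharding option listed above can be illustrated with a minimal sketch of hash-based routing, one common way to decide which shard holds a record. This is not taken from the slides: the shard names, the `shard_for` and `route` helpers, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib

# Hypothetical shard identifiers; in practice these would be database connection targets.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index using a stable hash.

    A cryptographic digest is used instead of Python's built-in hash(),
    which is randomized per process for strings.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route(key: str) -> str:
    """Return the (hypothetical) shard name responsible for this key."""
    return SHARDS[shard_for(key, len(SHARDS))]


if __name__ == "__main__":
    for transaction_id in ["txn-1001", "txn-1002", "txn-1003"]:
        print(transaction_id, "->", route(transaction_id))
```

With simple modulo routing like this, changing the number of shards remaps most keys, which is one reason elastic setups often prefer consistent hashing or a data-grid that rebalances partitions itself.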

Page 9: Large dataset processing in the Cloud

Data-grid and sharding in the Cloud

All data processing and persistence in the Cloud

Achievements:

• Near real-time

• Dynamic scaling (application and resources)

• Pay-per-use

• Reduced administration

• High availability (HA)

Page 10: Large dataset processing in the Cloud

Remaining issues

Getting large datasets in and out of the Cloud

– Bandwidth limited on the client side

– Resort to mailing hard drives!

Performance: 2% to 50% slowdown

Data security/privacy - trust

SLAs – plan for the worst
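The first issue, client-side bandwidth, can be made concrete with a rough transfer-time estimate for the 4 TB dataset from the manufacturer example. The link speeds below are assumptions chosen for illustration, not figures from the slides, and the calculation ignores protocol overhead and contention. Under those assumptions, a 10 Mbit/s uplink would need roughly 37 days, which is why shipping hard drives can be faster.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(dataset_bytes: float, link_bits_per_second: float) -> float:
    """Idealized upload time in days: total bits divided by link rate."""
    seconds = dataset_bytes * 8 / link_bits_per_second
    return seconds / SECONDS_PER_DAY


# 4 TB from the example slide, taken as decimal terabytes (an assumption).
DATASET_BYTES = 4 * 10**12

# Hypothetical client uplink speeds in Mbit/s.
ASSUMED_LINKS_MBPS = [10, 100, 1000]

if __name__ == "__main__":
    for mbps in ASSUMED_LINKS_MBPS:
        days = transfer_days(DATASET_BYTES, mbps * 10**6)
        print(f"{mbps} Mbit/s: {days:.1f} days")
```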

Page 11: Large dataset processing in the Cloud

Conclusions

Datasets in data-oriented systems grow, causing bottlenecks

Datasets in the Cloud can be processed using scalable technologies

Challenges remain

Main – how to get the data to the Cloud?

