Platform as a Service Standard for Hadoop Environment

Posted on 11-Jul-2015

Presented by: Abhay Nitin Pai

Contents

Why a PaaS Standard for Hadoop?

Literature Survey

My Proposal

Conclusion

References

Why a PaaS Standard for Hadoop?

• Enterprise-level data is growing rapidly, driven especially by cloud applications.

• Storing, managing, and mining enterprise data requires tools such as Hadoop.

• Setting up Hadoop can be a tedious task.

• Currently, a Hadoop environment is offered as a service by AWS under the name Amazon Elastic MapReduce.

• Other IaaS providers have yet to offer a Hadoop environment as a service to their customers.

• This is therefore the right time to define a standard for deploying a Hadoop environment.

Literature Survey

• Cloud Storage vs. Traditional Storage [2]

• Cloud storage supports high-performance computing, transaction processing applications, and multiple types of network storage services.

• The cloud storage model can provide high security, high reliability, and high efficiency, and it is suitable for handling large numbers of users and complex business network environments.

• The cloud storage model not only provides traditional file access methods but also supports massive data management, and it provides public service support functions that facilitate the management and maintenance of data in cloud storage systems.

[Figures from [2] and [3]]

• The MapReduce algorithm was developed by Google and implemented by Yahoo! Inc. [1]

• Yahoo built its own Hadoop environment with 3,500 nodes and 25,000 VMs, on which it ran MapReduce over 25 PB of data [1]

(1 PB = 1 000 000 000 000 000 bytes!)

• Well-known virtualization tools include the Xen hypervisor, VMware vSphere, KVM, and Microsoft Hyper-V [1]

• Performance depends on the number of VMs rather than on the number of physical servers, because of the I/O controller in the hypervisor [1]

• The performance of I/O-intensive jobs is more sensitive to virtualization overhead than that of CPU-intensive jobs [1]

• For I/O-intensive jobs, the best practice is to increase the number of VMs [1]

• KVM shows about 7% write degradation and 0% read degradation [1]

• Xen shows about 15% degradation in both reads and writes [1]

• VMware reported an unquantified performance improvement [1]

• The point is…
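The MapReduce model referred to above [3] splits a job into a map phase that emits intermediate key/value pairs, a shuffle that groups those pairs by key, and a reduce phase that combines each group. The following is a minimal single-process Python sketch of the classic word-count example; it illustrates the model only and is not Hadoop's Java API. The function names are hypothetical, and in a real cluster each input split would be mapped on a different node or VM.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(split: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all counts emitted for one word."""
    return key, sum(values)


def word_count(splits: List[str]) -> Dict[str, int]:
    # In a cluster each split would be mapped in parallel on separate workers.
    intermediate = (pair for split in splits for pair in map_phase(split))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

For example, `word_count(["hadoop on vms", "hadoop as a service"])` returns `{'hadoop': 2, 'on': 1, 'vms': 1, 'as': 1, 'a': 1, 'service': 1}`. Because each reduce call only sees the values for one key, the reduce phase can also be spread across many workers.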

My Proposal

• Server-side daemon process

• Client-side common architecture

• Cloud controller

• XML, SOAP-like requests and responses, with CLI and API features

• VM templates

• Scripts for VMs

• Job store

• Job templates

• Benchmark reports
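The proposal calls for XML, SOAP-like requests and responses exposed through both a CLI and an API. One possible shape for this is sketched below with Python's standard `xml.etree.ElementTree` and `argparse` modules. The element names (`JobRequest`, `VMTemplate`, `JobTemplate`, `NumVMs`, `JobId`, `Status`), the command-line flags, and the function names are all assumptions for illustration rather than part of any defined standard; a real SOAP message would additionally wrap the payload in an `Envelope`/`Body`.

```python
import argparse
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def build_job_request(vm_template: str, job_template: str, num_vms: int) -> str:
    """Serialize a hypothetical job-submission request as an XML string."""
    root = ET.Element("JobRequest")  # hypothetical element names throughout
    ET.SubElement(root, "VMTemplate").text = vm_template
    ET.SubElement(root, "JobTemplate").text = job_template
    ET.SubElement(root, "NumVMs").text = str(num_vms)
    return ET.tostring(root, encoding="unicode")


def parse_job_response(xml_text: str) -> Dict[str, str]:
    """Extract the job id and status from a hypothetical XML response."""
    root = ET.fromstring(xml_text)
    return {
        "job_id": root.findtext("JobId", default=""),
        "status": root.findtext("Status", default="UNKNOWN"),
    }


def main(argv: Optional[List[str]] = None) -> str:
    """CLI front end: the same request builder backs both the CLI and the API."""
    parser = argparse.ArgumentParser(description="Build a Hadoop job request (sketch)")
    parser.add_argument("--vm-template", default="hadoop-worker")  # hypothetical flag
    parser.add_argument("--job-template", default="wordcount")     # hypothetical flag
    parser.add_argument("--num-vms", type=int, default=1)          # hypothetical flag
    args = parser.parse_args(argv)
    request = build_job_request(args.vm_template, args.job_template, args.num_vms)
    print(request)
    return request
```

Calling `main(["--num-vms", "4"])` prints `<JobRequest><VMTemplate>hadoop-worker</VMTemplate><JobTemplate>wordcount</JobTemplate><NumVMs>4</NumVMs></JobRequest>`. Keeping serialization in plain functions means a CLI and a programmatic API can share one request format.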

Architectural Components
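Components such as the cloud controller, VM templates, and job store named in the proposal could fit together roughly as follows. This is a minimal in-memory sketch under stated assumptions: the class names, their fields, and the first-in, first-out dispatch policy are hypothetical, and no real VMs are provisioned; dispatching only returns the names the VMs would be given.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List


@dataclass(frozen=True)
class VMTemplate:
    """Hypothetical VM template: the size used for each Hadoop node."""
    name: str
    vcpus: int
    memory_gb: int


@dataclass(frozen=True)
class Job:
    """A queued job: which job template to run, on how many VMs of which template."""
    job_id: int
    job_template: str
    vm_template: str
    num_vms: int


@dataclass
class CloudController:
    """Hypothetical controller that queues jobs in a FIFO job store and dispatches them."""
    vm_templates: Dict[str, VMTemplate]
    job_store: Deque[Job] = field(default_factory=deque)
    _next_id: int = 1

    def submit(self, job_template: str, vm_template: str, num_vms: int) -> int:
        # Reject requests that name an unknown VM template or ask for no VMs.
        if vm_template not in self.vm_templates:
            raise KeyError(f"unknown VM template: {vm_template}")
        if num_vms < 1:
            raise ValueError("num_vms must be at least 1")
        job = Job(self._next_id, job_template, vm_template, num_vms)
        self._next_id += 1
        self.job_store.append(job)
        return job.job_id

    def dispatch_next(self) -> List[str]:
        """Pop the oldest job and return the names of the VMs it would run on."""
        job = self.job_store.popleft()  # raises IndexError if the store is empty
        template = self.vm_templates[job.vm_template]
        return [f"job{job.job_id}-{template.name}-{i}" for i in range(job.num_vms)]
```

With `controller = CloudController({"small": VMTemplate("small", 2, 4)})`, calling `controller.submit("wordcount", "small", 3)` returns `1`, and `controller.dispatch_next()` then returns `['job1-small-0', 'job1-small-1', 'job1-small-2']`. FIFO is simply the easiest policy to show; a production controller might prioritize or pack jobs differently.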

Conclusion

• As of now, Hadoop is just one more platform to be provided as a service.

• In the future, any kind of application could be provided as a PaaS.

• This architecture would give a general standard for developing PaaS clusters.

• For a company or an individual, setting up a platform on which to deploy an application is a tedious job.

• Testing each and every PaaS offering from the various cloud providers is also not feasible.

• Switching to a different PaaS would require porting the entire application.

• Thus, a common architecture would fulfill this need.
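The argument that a common architecture spares an application from being ported when it moves between PaaS providers corresponds, in code, to programming against a shared interface and swapping only the provider adapter. The Python sketch below shows the idea with an abstract base class. `HadoopPaaS`, `InMemoryProvider`, and `run_pipeline` are hypothetical names, and `InMemoryProvider` is a stand-in rather than an adapter for any real cloud API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class HadoopPaaS(ABC):
    """Hypothetical common interface that every provider's Hadoop PaaS would implement."""

    @abstractmethod
    def submit_job(self, job_template: str, num_vms: int) -> str:
        """Start a job and return a provider-specific job identifier."""

    @abstractmethod
    def job_status(self, job_id: str) -> str:
        """Return the job's status, e.g. 'RUNNING' or 'COMPLETED'."""


class InMemoryProvider(HadoopPaaS):
    """Stand-in provider for illustration; a real adapter would call a cloud API."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self.jobs: Dict[str, str] = {}

    def submit_job(self, job_template: str, num_vms: int) -> str:
        job_id = f"{self.prefix}-{len(self.jobs) + 1}"
        self.jobs[job_id] = "COMPLETED"  # pretend every job finishes immediately
        return job_id

    def job_status(self, job_id: str) -> str:
        return self.jobs.get(job_id, "UNKNOWN")


def run_pipeline(paas: HadoopPaaS, job_templates: List[str]) -> List[str]:
    """Application code: depends only on the common interface, not on a provider."""
    job_ids = [paas.submit_job(t, num_vms=4) for t in job_templates]
    return [paas.job_status(job_id) for job_id in job_ids]
```

Both `run_pipeline(InMemoryProvider("provider-a"), ["wordcount", "sort"])` and the same call with `InMemoryProvider("provider-b")` return `['COMPLETED', 'COMPLETED']`; switching providers changes only which adapter is constructed, not the application code.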

References

1. "Design and Performance Evaluation for Hadoop Clusters on Virtualized Environments", Masakuni Ishii, Jungkyu Han, and Hiroyuki Makino, ICOIN 2013, IEEE 978-1-4673-5742-5.

2. "Research on Hadoop Based Enterprise File Cloud Storage System", Da-Wei Zhan, Fu-Quan Sun, Xu Cheng, and Chao Liu.

3. "MapReduce: Simplified Data Processing on Large Clusters", Jeffrey Dean and Sanjay Ghemawat, Google Inc.

4. https://developer.yahoo.com/hadoop/tutorial/

Some Interesting Facts
