an evaluation of distributed datastores using appscale cloud platform

Post on 02-Dec-2014

485 Views

Category:

Technology

0 Downloads

Preview:

Click to see full reader

DESCRIPTION

 

TRANSCRIPT

An Evaluation of Distributed Datastores UsingThe AppScale Cloud Platform

Presented By- Himanshu Ranjan Vaishnav

TE-42065 (Comp-I)

SEMINAR GUIDE - Prof. Mrs S. S. Sonawani

1

04/09/23

What is AppScale?

AppScale is an open-source implementation of the Google App Engine

cloud platform.

AppScale is an extension of the non-scalable software development kit

that Google makes available for testing and debugging applications.

App-Scale currently supports HBase, Hypertable, Cassandra, Voldemort,

MongoDB, MemcacheDB, Scalaris, and MySQL Cluster datastores.

2

04/09/23

What AppScale Does?

AppScale is a robust, open source implementation of the Google App

Engine APIs that executes over private virtualized cluster resources and

cloud infrastructures including Amazon Web Services and Eucalyptus.

Users can execute their existing Google App Engine applications over

AppScale without modification.

AppScale automates deployment and simplifies configuration of datastores

that implement the API and facilitates their comparison and evaluation on

end-to-end performance using real programs (Google App Engine

applications).

3

04/09/23

AppScale Features

• More Choices of data Stores

• Neptune Language

• MapReduce

• Fault Tolerance

And More

• App Engine Portability

4

04/09/23

Google App Engine

A software development platform

Platform-as-a-service (PaaS)

GAE Datastore

Big Table

A master/slave relationship

5

04/09/23

Continue….

GAE Datastore API provides the following primitives:

For eg.

• Put (k, v): Add key k and value v to table; creating a table if needed

• Get (k): Return value associated with key k

• Delete (k): Remove key k and its value

• Query (q): Perform query q using the Google Query Language (GQL) on a single table, returning a list of values

• Count (t): For a given query, returns the size of the list of values returned

6

04/09/23

Google App Engine APIs

Blobstore API

Channel API

Datastore API

Images API

Memcache API

Namespace API

Task Queue API

Users API

URL Fetch API

XMPP API

MapReduce Streaming API

EC2 API

7

04/09/23

AppScale deployment

AS – App Server

ALB – App Load Balancer

DBS – Data Base Slave Peer

DBM – Data Base Master Peer

8

04/09/23

Multi-tiered approach within AppScale

9

04/09/23

Database Services

Protocol Buffer Server (PBServer)

User/App Server (UAServer)

Blobstore service

Monitoring Services

Neptune

10

04/09/23

APPSCALE DISTRIBUTED DATABASE SUPPORT

Cassandra

HBase

Hypertable

MemcacheDB

MongoDB

Voldemort

MySQL

11

04/09/23

1. Cassandra

Facebook engineers designed, implemented, and released

A hybrid approach

Consistent

Written in the Java and exposes its API through the Thrift software

framework

Supports range queries

12

04/09/23

2. HBase

Developed and released by PowerSet

An official Hadoop subproject

Employs a master-slave distributed architecture

Provides flexible column support

Written primarily in Java, with a small portion of the code base in C

HBase is deployed over the Hadoop Distributed File System (HDFS)

13

04/09/23

3. Hypertable

Hypertable was developed by Zvents

Provide an open source version of Google’s BigTable

Written in C++

RangeServer

14

04/09/23

4. MemcacheDB

Developed by Open source developer Steve Chu

Employs a master-slave approach

Runs with a single master node and multiple replica nodes

Written in C and uses Berkeley DB

15

04/09/23

5. MongoDB

Developed and released by 10gen

Provide both the speed and scalability

Written in C++

Queries are performed using hashtable

16

04/09/23

6. Voldemort

Developed by and currently in use internally at LinkedIn

Eventual consistency

More Developer friendly

Written in Java and exposes its API via Thrift

17

04/09/23

7. MySQL

A well-known relational database

Employ MySQL Cluster

Provides concurrent access to the system

Written in C and C++

18

04/09/23

EVALUATION

Load tables in all databases with 1000 items

Test specifics:

– On Each database put, get, delete, no-op performed

– Considered- light load: one thread, medium load: three concurrent thread,

heavy thread: nine concurrent thread

– Repeat each experiment 5 times

Executes this application in an AppScale cloud

Each node executes with 2 virtual processors, 10GB of disk(max), 4GB of

memory

19

04/09/23

Experimental Results20

04/09/23

Limitations

Persistence

Blobstore Max File Size

Datastore

Task Queue

Mail

Follow a ”deploy on all nodes”

Limited distribution supported

Lake of retrieving the entire

table to run a query

Not released the source code of

the Java App Engine server

21

04/09/23

Future Work

Expand out of the web services domain

– Investigating opportunities in streaming

– Integrated MapReduce support for highperformance computing (HPC)

– Co-locate AppEngines and use shared memory

Additional databases:

– MongoDB, Scalaris, CouchDB

22

04/09/23

Continue…

Extending AppScale with new services for

- large-scale data analytics

- data

- computation intensive tasks

Cloud-agnostic

Integration of mobile device

23

04/09/23

CONCLUSION

Presents an open source implementation of the Google App Engine

(GAE) Datastore API with in a cloud platform called AppScale

The implementation unifies access to wide range of open source

distributed database technologies and automates their configuration

and deployment. However, each database differs in the degree to which

it implements the APIs.

24

04/09/23

DEMO

25

04/09/23

Thank YouAny Questions ??

26

04/09/23

top related