AppScale talk at SBonRails

Post on 01-Nov-2014

Category:

Technology

DESCRIPTION

These are the slides from my talk about the AppScale project at the SBonRails meetup. It covers AppScale, Google App Engine, and the research projects that have come out of them, including Neptune, a Ruby DSL focused on computation-heavy workloads.

TRANSCRIPT

The AppScale Project

Presented by Chris Bunch

(on behalf of the AppScale team)

March 7, 2011 @ sbonrails meetup

Thursday, March 10, 2011

Overview

• Google App Engine

• AppScale - now with 50% Ruby!

• Research Directions

• Neptune - A Ruby DSL for the cloud

Google App Engine

• A web framework introduced in 2008

• Python and Java supported

• Offers a Platform-as-a-Service: Use Google’s APIs to achieve scale

• Upload your app to Google

Quotas

Data Model

• Not relational - semi-structured schema

• Compare to models in Rails

• Exposes a get / put / delete / query interface
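A minimal sketch of what a get / put / delete / query interface over semi-structured data can look like. `TinyDatastore` and its method signatures are hypothetical, invented for illustration; this is an in-memory stand-in, not the App Engine Datastore API.

```ruby
# Hypothetical in-memory stand-in for a get / put / delete / query datastore.
# Entities are plain hashes, so two entities of the same kind need not share fields.
class TinyDatastore
  def initialize
    @entities = {} # maps [kind, key] => hash of properties
  end

  def put(kind, key, properties)
    @entities[[kind, key]] = properties.dup
    key
  end

  def get(kind, key)
    @entities[[kind, key]]
  end

  def delete(kind, key)
    @entities.delete([kind, key])
  end

  # Returns every entity of the given kind whose properties match all filters.
  def query(kind, filters = {})
    matches = @entities.select do |(entity_kind, _key), props|
      entity_kind == kind && filters.all? { |name, value| props[name] == value }
    end
    matches.values
  end
end

store = TinyDatastore.new
store.put("Talk", "appscale", { title: "The AppScale Project", year: 2011 })
store.put("Talk", "neptune", { title: "Neptune", language: "Ruby" })
ruby_talks = store.query("Talk", { language: "Ruby" })
```

The two "Talk" entities carry different fields, which is the sense in which the schema is semi-structured rather than relational.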

Storing Data

• Datastore API - Persistent storage

• Memcache API - Transient storage

• User can set expiration times

• Blobstore API - Store large files

• need to enable billing to use it

Be Social!

• Mail API - Send and receive e-mail

• XMPP API - Send and receive IMs

• Channel API - Creating persistent connections via XMPP

• Use for chat rooms, games, etc.

Background Tasks

• Cron API - Access a URL periodically

• Descriptive language: “every 5 minutes”, “every 1st Sun of Jan, Mar, Dec”, etc.

• Uses a separate cron.yaml file

• Taskqueue API - Within your app, fire off tasks to be done later

Dealing with Users

• Users API: Uses Google Accounts

• Don’t write that ‘forgot password’ page ever again!

• Authorization: via app.yaml:

• anyone, must login, or admin only

When Services Fail

• Originally: failures throw exceptions

• Just catch them all!

• Capabilities API: Check if a service is available

• Datastore, Memcache, and so on
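The two failure-handling styles above can be contrasted in a short sketch. Every class and method name here (`FakeCapabilities`, `FakeDatastore`, `enabled?`) is hypothetical and chosen for illustration; none of it is the real Capabilities API.

```ruby
# Hypothetical sketch: all names below are invented for illustration.
class ServiceUnavailable < StandardError; end

class FakeCapabilities
  def initialize(enabled)
    @enabled = enabled # e.g. { datastore: false, memcache: true }
  end

  def enabled?(service)
    @enabled.fetch(service, false)
  end
end

class FakeDatastore
  def initialize(capabilities)
    @capabilities = capabilities
  end

  def put(key, _value)
    raise ServiceUnavailable, "datastore is down" unless @capabilities.enabled?(:datastore)
    "stored #{key}"
  end
end

# Original style: attempt the call and catch the failure.
def save_with_rescue(datastore, key, value)
  datastore.put(key, value)
rescue ServiceUnavailable
  "try again later"
end

# Capabilities style: ask first, then decide what to do.
def save_with_check(capabilities, datastore, key, value)
  return "read-only mode" unless capabilities.enabled?(:datastore)
  datastore.put(key, value)
end

caps = FakeCapabilities.new({ datastore: false, memcache: true })
db = FakeDatastore.new(caps)
```

Checking first lets the app choose a degraded mode up front instead of discovering the outage halfway through a request.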

Deploying Your App

• Develop locally on SDK

• Stub implementations of most APIs

• Then deploy to Google

How to Scale

• Limitations on the programming model:

• No filesystem interaction

• 30 second limit per web request

• Language libraries must be on whitelist

• Sandboxed execution

Enter AppScale

• App Engine is easy to use

• but we really want to tinker with the internals!

• Need an open platform to experiment on

• test API implementations

• add new APIs

Enter AppScale

• Lots of NoSQL DBs out there

• Hard to compare DBs

• Configuration and deployment can be complex

• Need one-button deployment

Storing Data

• Datastore API - AppServers use a database-agnostic layer that sends requests to PBServer

• Named for data format: Protocol Buffers

• Memcache API - memcached

• Blobstore API - Custom server

Be Social!

• Mail API - sendmail (disabled by default)

• XMPP API - ejabberd

• Channel API - strophejs

Background Tasks

• Cron API - Uses Vixie Cron

• Taskqueue - Separate thread fetches web page

• Both make a single attempt

• Will replace with distributed, fault-tolerant versions
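A rough sketch of what "a separate thread, single attempt" means in practice. `SingleAttemptQueue` is a hypothetical class written for this example, not AppScale's Taskqueue code: a worker thread runs each queued task once and records failures instead of retrying them.

```ruby
# Hypothetical sketch of a single-attempt task queue.
class SingleAttemptQueue
  attr_reader :failed

  def initialize
    @tasks = Queue.new
    @failed = []
    @worker = Thread.new { run }
  end

  def add(name, &work)
    @tasks << [name, work]
  end

  # Signal the worker to stop and wait until it has drained the queue.
  def shutdown
    @tasks << :stop
    @worker.join
  end

  private

  def run
    loop do
      item = @tasks.pop
      break if item == :stop
      name, work = item
      begin
        work.call
      rescue StandardError
        @failed << name # no retry: the task is simply lost
      end
    end
  end
end

results = []
queue = SingleAttemptQueue.new
queue.add("ok") { results << "fetched" }
queue.add("broken") { raise "connection refused" }
queue.shutdown
```

A fault-tolerant replacement would have to persist the failed tasks and retry them, which this sketch deliberately does not do.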

Dealing with Users

• Users API: Defers users to AppLoadBalancer

• Password reset via command-line tools

• Authorization: no major changes here

Deploying Your App

• Develop locally on SDK

• Stub implementations of most APIs

• Then deploy to AppScale!

• Use your own cluster or via Amazon

• Command-line tools mirror Amazon’s

Deploying Your App

• run-instances: Start AppScale

• describe-instances: View cloud metadata

• upload-app: Deploy an App Engine app

• remove-app: Un-deploy an App Engine app

• terminate-instances: Stop AppScale

Deployment Models

• Cloud deployment: Amazon EC2 or Eucalyptus (the open source implementation of the EC2 APIs)

• Just specify how many machines you need

• Non-cloud deployment via Xen or KVM

AppController

• The brains of the outfit

• Runs on every node

• Handles configuration and deployment of all services (including other AppControllers)

• Written in Ruby

Load balancer

• Routes users to their app via nginx

• haproxy makes sure app servers are live

• Can’t assume the user has DNS:

• Thus we wrote the AppLoadBalancer

• Rails app that routes users to apps

• Performs authentication as well

AppLoadBalancer

App Server

• We modified the App Engine SDK

• Easier for Python (source included)

• Harder for Java (had to decompile)

• Removed non-scalable API implementations

• Goal: Use open source whenever possible

A Common Feature Request

Database Options

• Open source / open APIs / proprietary

• Master / slave v. peer-to-peer

• Differences in query languages

• Data model (key/val, semi-structured)

• In-memory or persistent

• Data consistency model

• Interfaces - REST / Thrift / libraries

In AppScale:

• BigTable clones:

• Master / slave relationship

• Master stores metadata

• Slaves store data

• Fault-tolerant to slave failure

• Partially tolerant to master failure

In AppScale:

• Variably consistent DBs

• Voldemort and Cassandra

• Both are peer-to-peer: no SPOF

• Voldemort: Specify consistency per table

• Cassandra: Specify consistency per request
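The difference between per-table and per-request consistency can be sketched with the general quorum rule: with N replicas, reads that contact R of them always overlap writes acknowledged by W of them when R + W > N. That rule is standard background rather than something the slides state, and the table names, settings, and helper methods below are hypothetical.

```ruby
# True when every read set must intersect every write set.
def overlapping_quorums?(replicas, reads, writes)
  reads + writes > replicas
end

# Per-table style: the setting is fixed when the table is defined.
# (Hypothetical table names and settings.)
TABLE_SETTINGS = {
  "sessions" => { replicas: 3, reads: 1, writes: 1 }, # fast, may return stale data
  "accounts" => { replicas: 3, reads: 2, writes: 2 }  # read and write sets overlap
}.freeze

def table_consistent?(table)
  s = TABLE_SETTINGS.fetch(table)
  overlapping_quorums?(s[:replicas], s[:reads], s[:writes])
end

# Per-request style: each call picks its own level.
LEVELS = {
  one: ->(_n) { 1 },
  quorum: ->(n) { n / 2 + 1 },
  all: ->(n) { n }
}.freeze

def request_consistent?(replicas, read_level, write_level)
  overlapping_quorums?(replicas,
                       LEVELS.fetch(read_level).call(replicas),
                       LEVELS.fetch(write_level).call(replicas))
end
```

With per-request levels, the same table can serve a cheap `:one` read for a page view and a `:quorum` read for a balance check.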

In AppScale:

• Relational:

• Not NoSQL but used like NoSQL

• Document-oriented:

• Targets append-heavy workloads

In AppScale:

• Key-value datastores:

• MemcacheDB: like memcached but persistent and replicated

• Scalaris: in-memory, no persistence

• SimpleDB: semi-structured but used as key-value (will update this in the future)

Research Ideas

• Placement support

• Monitoring

• Shared memory

• Cost modeling

• Hybrid cloud

• Active Cloud DB

• Disaster Recovery

• Neptune

Placement Support

Monitr

Shared memory

• Since AppServer + DB are co-located, reduce message overhead

• no serialization

• Leverage CoLoRs to do so across languages

• The AppServer (AS) is in Python or Java; the DB server (DBS) is in Python

• Can be orders-of-magnitude faster

Cost modeling

• Can we reproduce Google’s cost model?

• We can reproduce memory, network bandwidth in / out, size and types of data

• Can’t reproduce CPU - it’s based on Google’s load, which we can’t capture

• varies based on placement and time of day

Hybrid Cloud

Database Agnostic Transactions

• Want to support disparate DBs with ACID

• Leverage ZooKeeper for versioning

• And PBServer as the DB agnostic layer

• Needs strong consistency from DB itself

• And row-level atomicity on updates

Active Cloud DB

• Need a common interface to DBs

• But not just for Java / Python

• Named after Rails’ ActiveRecord

• Exposes REST interface for DB

• Included in AppScale 1.3

Disaster Recovery

• People are using App Engine as a production level environment

• Need a way to automatically back up data

• Can leverage this data for data analytics

• Need to also seamlessly switch to AppScale version if App Engine version goes down

Neptune

• Need a simple way to run compute-intensive jobs

• We have the code from the ‘net

• We have the resources - the cloud

• But the average user does not have the know-how

• Our solution: create a domain-specific language for configuring cloud apps

• Based on Ruby

Syntax

• It’s as easy as:

neptune :type => "mpi",
        :code => "MpiNQueens",
        :nodes_to_use => 8,
        :output => "/mpi/output-1.txt"
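The syntax above is ordinary Ruby: a method call whose last argument is a hash written without braces. A minimal sketch of that idea follows. `neptune_stub` and `REQUIRED_KEYS` are hypothetical; the stub only validates the job description and never contacts AppScale, and the required-key list is an assumption for MPI-style jobs, not the gem's actual rules.

```ruby
# Hypothetical stand-in for the real gem: validates the parameters and returns them.
REQUIRED_KEYS = [:type, :code, :nodes_to_use, :output].freeze

def neptune_stub(params)
  missing = REQUIRED_KEYS - params.keys
  raise ArgumentError, "missing parameters: #{missing.join(', ')}" unless missing.empty?
  params
end

# Same shape as the slide's example, with plain ASCII quotes.
job = neptune_stub :type => "mpi",
                   :code => "MpiNQueens",
                   :nodes_to_use => 8,
                   :output => "/mpi/output-1.txt"
```

Because the job description is just a hash, a DSL built this way needs no parser of its own; Ruby's own syntax does the work.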

Neptune Supports:

• Message Passing Interface (MPI)

• MapReduce

• Unified Parallel C (UPC)

• X10

• Erlang

Extensibility

• Experts can add support for other computational jobs

• Biochemists can run simulations via DFSP and dwSSA

• Embarrassingly parallel Monte Carlo simulations
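A small sketch of why Monte Carlo simulations parallelize so easily, using a textbook estimate of pi rather than the biochemical simulations named above. The function names are hypothetical, and the threads stand in for separate machines (in standard Ruby they will not actually run the arithmetic in parallel).

```ruby
# Each task uses its own seeded generator and shares no state with the others,
# so tasks can run anywhere, in any order.
def count_hits(samples, seed)
  rng = Random.new(seed)
  samples.times.count do
    x = rng.rand
    y = rng.rand
    x * x + y * y <= 1.0 # did the random point land inside the quarter circle?
  end
end

# Runs the tasks independently, then combines their counts in one final step.
def estimate_pi(tasks, samples_per_task)
  threads = (1..tasks).map do |seed|
    Thread.new { count_hits(samples_per_task, seed) }
  end
  hits = threads.sum(&:value)
  4.0 * hits / (tasks * samples_per_task)
end

estimate = estimate_pi(4, 50_000)
```

The only coordination is the final sum, which is why adding nodes to a job like this mostly just adds samples per second.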

Compiling Code

• You may not have the binaries, so compile from source!

• Auto-generates makefiles for beginners

neptune :type => "compile",
        :code => "/home/appscale/mpi_nqueens"

Installing Neptune

• Just use good old ‘gem’:

• gem install neptune

• Current version is 0.0.4, fully compatible with AppScale 1.5

• More info at our web page:

• http://neptune-lang.org

Wrapping It Up

• Thanks to the AppScale team, especially:

• Co-lead Navraj Chohan and advisor Professor Chandra Krintz

• Check us out on the web:

• http://appscale.cs.ucsb.edu

• http://code.google.com/p/appscale
