an overview of the amazon paas

All rights reserved. Copyright 2012, Transcend Computing.

An Overview of the Amazon PaaS

Platform-as-a-Service

One of the key characteristics of cloud computing is abstraction: hiding low-level complexity through automation so developers can focus on applications, which, at the end of the day, are what really matter to the business. Initially, this abstraction focused on compute, network and storage infrastructure, so-called infrastructure as a service (IaaS), which removed the time and complexity of configuring and provisioning infrastructure as the basis for deploying software. Now these abstractions have moved up the stack to encompass OS and middleware platforms (application servers, portals, message queues, etc.), which developers have traditionally set up by hand. Platform as a service (PaaS) enables the full realization of application-centric computing by abstracting away all of the complexity below the application tier. This improves business agility through faster deployment of applications and application changes. Today, developers are asking for their middleware to be delivered in this same as-a-service model.

PaaS is quickly becoming the preferred style for software development and delivery. Amazon has built upon its IaaS foundation to create a full-featured PaaS offering for application logic, integration, caching and database management.

Although this concept means different things to different people, common characteristics include:

- Middleware software has been re-written to take advantage of the elastic and resilient nature of modern computing (cheap servers with lots of memory, commodity operating systems, massive storage, horizontal scaling, replicated data, etc.)

- Metered pricing (pay-per-use) is preferred over traditional CPU licensing.

- The platforms auto-scale, auto-heal, auto-patch and auto-configure.

- The functions of the platform can be called remotely over an IP-based network with an API (HTTP, JSON, REST, XML, etc.).

- Application lifecycle management (ALM) tools such as version control, build management and deployment management are available as a service and integrated into the platform services.

In addition to changes in the characteristics of the platform itself, there may also be changes in how the software is delivered. In many cases, PaaS services are hosted by a public cloud provider that is responsible for the infrastructure, including servers, networking, power and data centers. While PaaS providers have traditionally been third-party managed hosting or cloud service providers, today large IT shops also deliver common, shared platform-as-a-service offerings. In this context, IT is a managed service provider in its own right, beholden to similar (and often more demanding) service level guarantees as a public PaaS provider.

The most highly contested attribute of PaaS is multi-tenancy, which describes the level and degree of computational sharing. Typically, low-sharing environments (data centers, servers, platforms, etc.) see lower efficiencies. Their owners have less purchasing power, so they have a weaker bargaining position and often must pay higher prices. Also, the lack of scale economies means that resources (like people and machines) yield lower utilization rates than you may find in a large shared environment. A downside of a multi-tenant environment is that a neighbor may use up large amounts of resources, negatively impacting your performance. Think of it like living in a condo: lawn service may be one of many valued conveniences, but it's the opposite of convenient when your neighbors use up the hot water! There are tradeoffs to shared environments. The Amazon offering has multiple levels of tenancy, implemented at various layers of its stack:

- At the infrastructure layer, a user can reserve a complete server and place their preferred platforms on it. This model is called dedicated instances.

- Also at the infrastructure layer, a user can put their platforms on regular EC2 instances where the sharing is at the hypervisor layer.


- A variation of the previous model is where AWS uses EC2 instances for sharing but locks down the hypervisor and maintains control over it. This is used in several of their PaaS services (RDS, ElastiCache, etc.)

- A final type of tenancy is when the computing model is completely hidden from the user. In this paper, we refer to this approach as encapsulated. In these cases, Amazon is responsible for the availability, scalability, security and other non-functional concerns of the platform.

Some purists may argue that the only kind of PaaS is one that is fully encapsulated. However, we have found that it is beneficial to have choices. For example, a service that provisions servers and platforms and exposes some of their details is valuable when you need to interact directly with the component. It allows developers to use existing engines like MySQL and Memcached. That said, it puts a larger burden on the developer to manage scaling, availability, data backups and so on.


Support Services

There are a number of services that don't technically fall into the PaaS category, nor are they naturally part of IaaS. Typically, these crosscutting services intersect with other services and add functional behavior or management value. Amazon examples include:

- CloudWatch: This is the Amazon monitoring service, which is used to collect data on the health of the other services, record the data and, if necessary, trigger events and alarms so that new actions can be taken. Amazon has also built agents for its existing services (RDS, SNS, etc.) to capture their health and report the findings to CloudWatch. This data is available to any user who provisions a platform service.

- CloudFormation: This is the Amazon orchestrated provisioning service, which is used to launch entire environments in a predictable and repeatable manner. For example, one might use CloudFormation to provision a multi-tiered application by giving the service a template that describes all of the components and their interdependencies. A single template might launch a load balancer, four compute instances and two databases, set up the host names, define auto-scaling properties, and so on. CloudFormation isn't a traditional piece of developer middleware, but it is commonly used to provision PaaS services as part of multi-tier application architectures.

- Autoscaling: As the name implies, autoscaling is a service that is used to increase or decrease the amount of computing resources applied to a task. The service uses data held in CloudWatch (the monitor) to determine whether a server is overloaded. When this is the case, the autoscaling service can launch new servers and register them with a load balancer so that incoming traffic is spread across them. Conversely, when load decreases, servers are spun down.

- Identity & Access Management: Security is another crosscutting concern that affects both IaaS and PaaS elements. All of the AWS services are integrated with the IAM service and make extensive use of its policy system.
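
The interplay between CloudWatch and Autoscaling can be illustrated with a short Python sketch. It assumes a single metric (average CPU utilization across the group) and hypothetical thresholds of 70% and 30%; the `ScalingGroup` class and its `evaluate` method are illustrative names only, not part of any AWS SDK. In the actual services, CloudWatch alarms trigger Auto Scaling policies, but the decision logic follows the same general shape.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ScalingGroup:
    """Hypothetical auto-scaled server pool driven by a monitored metric."""

    min_size: int
    max_size: int
    size: int
    scale_out_above: float = 70.0   # average CPU % that adds a server (assumed)
    scale_in_below: float = 30.0    # average CPU % that removes a server (assumed)
    history: list[tuple[float, int]] = field(default_factory=list)

    def evaluate(self, cpu_samples: list[float]) -> int:
        """Act like an alarm on recent monitoring samples and return the new size.

        Adds one server when the average exceeds scale_out_above, removes one
        when it drops below scale_in_below, and stays within [min_size, max_size].
        """
        avg = mean(cpu_samples)
        if avg > self.scale_out_above and self.size < self.max_size:
            self.size += 1
        elif avg < self.scale_in_below and self.size > self.min_size:
            self.size -= 1
        self.history.append((avg, self.size))
        return self.size


group = ScalingGroup(min_size=2, max_size=4, size=2)
for samples in ([85.0, 90.0, 80.0], [95.0, 92.0, 97.0], [99.0, 98.0], [12.0, 10.0]):
    group.evaluate(samples)
print(group.size)  # 3
```

Scaling one server at a time and bounding the group by `min_size` and `max_size` keeps a brief spike from launching a large fleet, while the gap between the two thresholds keeps the group from oscillating when load hovers near a single cut-off.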

Infrastructure Services

Amazon Web Services is perhaps best known for its IaaS offerings, including compute, network and storage (EC2, Route 53, Elastic Load Balancing, Security Groups and Virtual Private Cloud). Although these services are not in the scope of this paper, it is worth noting that most large systems developed today use IaaS and PaaS elements together.


The Amazon PaaS Services

Amazon's platform services can be categorized according to their contribution to the application architecture:

1. Application Logic-as-a-Service
2. Database-as-a-Service
3. Caching-as-a-Service
4. Integration-as-a-Service

Application Logic-as-a-Service

Today, application logic is typically written by hand in modern programming languages like Ruby, Java, PHP or C#. Each language also has frameworks or libraries that are used to accelerate development. For example, the Rails framework remains popular with Ruby developers, while Java developers commonly use servlet engines or Spring containers. It is common for a PaaS solution to embrace multiple programming languages and frameworks; Amazon is no different.

The primary service used to host and execute application logic is Elastic Beanstalk. This service originally focused exclusively on applications that were written for the Java Virtual Machine and could run inside an Apache Tomcat servlet engine. A user uploads a .war file (a packaged Java web application), and Beanstalk takes care of managing the JVM, patching Tomcat, adjusting configuration files, auto-scaling the service according to an SLA, managing the dev/test/stage/prod environments (roll forward and roll back) and controlling multiple versions of the user's software. Beanstalk applications often use the other platform services for integration, persistence, security, etc.

More recently, Beanstalk was extended to support PHP. In this scenario, the unit of deployment is the source code, not a compiled unit (like the Java .war file). To make source code transfer simple, Beanstalk also added support for the Git version control system. Development teams that already use Git can continue to do so and push their source branches to Beanstalk, where the source files are picked up and executed. Developers using an alternative version control system like SVN or CVS will need to take the extra step of bridging their current system with Git.

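
The roll-forward and roll-back behavior described above can be modeled as a per-environment history of version labels. This is a minimal sketch under that assumption; `Application`, `upload`, `deploy` and `rollback` are hypothetical names and do not reflect the Elastic Beanstalk API. The real service also manages the underlying servers, which is omitted here.

```python
from __future__ import annotations


class Application:
    """Hypothetical model of versioned deployments (not the Elastic Beanstalk API)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.versions: list[str] = []             # uploaded version labels, oldest first
        self.deployed: dict[str, list[str]] = {}  # environment -> labels deployed, in order

    def upload(self, label: str) -> None:
        """Register a new deployable unit, such as a .war file or a Git commit."""
        if label in self.versions:
            raise ValueError(f"version {label!r} already exists")
        self.versions.append(label)

    def deploy(self, env: str, label: str) -> None:
        """Roll an environment (dev, test, stage, prod, ...) forward to a version."""
        if label not in self.versions:
            raise KeyError(f"unknown version {label!r}")
        self.deployed.setdefault(env, []).append(label)

    def rollback(self, env: str) -> str:
        """Return an environment to the version it ran before the current one."""
        history = self.deployed.get(env, [])
        if len(history) < 2:
            raise RuntimeError(f"nothing to roll back to in {env!r}")
        history.pop()
        return history[-1]

    def current(self, env: str) -> str | None:
        """Version currently running in an environment, or None if never deployed."""
        history = self.deployed.get(env)
        return history[-1] if history else None


app = Application("storefront")
for label in ("v1", "v2"):
    app.upload(label)
app.deploy("test", "v1")
app.deploy("test", "v2")
app.deploy("prod", "v1")
app.deploy("prod", "v2")
app.rollback("prod")
print(app.current("prod"), app.current("test"))  # v1 v2
```

Because each environment keeps its own history, a version can be promoted from test to prod and later rolled back in prod without affecting test.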
Current criticisms of Elastic Beanstalk include the lack of additional language support (Ruby, Node.js, C#, etc.), the lack of a continuous build environment like Hudson/Jenkins, and the lack of integrated frameworks for functional, regression and stress testing. Despite these limitations, a growing number of third parties are filling the gaps, and Amazon continues to release updates at a frantic pace.

Database-as-a-Service

Amazon offers three native choices for databases, each with its own advantages and disadvantages. The earliest offering was SimpleDB, introduced as a simple way to store information persistently using key/value pairs. SimpleDB's claim to fame is that it really is easy to use, mostly because it lacks many of the more complicated features developers have come to expect in database management systems. It does excel from an administrative perspective: for example, data is automatically replicated and backed up for the user. The design of SimpleDB embraces encapsulated horizontal scalability, enabling applications to generate massive loads against the database without ever worrying about the number of CPUs, amount of memory or other physical resources provisioned behind the scenes.

Although SimpleDB satisfied many needs, most business applications use a relational database. Amazon responded with Relational Database Service (RDS). Unlike SimpleDB, RDS is not an encapsulated, horizontally scaling system, as this would require significant changes to the underlying database engines. Instead, RDS lets users provision a database through self-service and configure it to their needs. The service currently supports most of the popular editions and versions of MySQL and Oracle. Users can specify configuration settings for their database, including the size of the machine (CPUs and memory), backup and restore options, automatic patching of the database engine, publishing of monitoring data, and high-availability features like automatic recovery of the database in a remote data center if the original goes down.

The third database service offered by Amazon is DynamoDB. This offering is considered a NoSQL database, which means that it doesn't rely on SQL for data definition (create table, etc.) or data manipulation (select * where ...). Instead, DynamoDB offers a schema-less database management system. Many view this offering as a replacement for SimpleDB because it has a superset of the functionality while being delivered in the same encapsulated, horizontally scalable manner.
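
The encapsulated, schema-less model shared by SimpleDB and DynamoDB can be sketched as a key/value table whose partitions are hidden from the caller. This sketch assumes simple hash partitioning of item keys and ignores replication, backups and consistency; `PartitionedTable` is a hypothetical class, not the API of either service.

```python
from __future__ import annotations

import hashlib
from typing import Any


class PartitionedTable:
    """Hypothetical schema-less key/value table spread over hidden partitions."""

    def __init__(self, partitions: int = 4) -> None:
        # Each partition stands in for a separate storage node.
        self._partitions: list[dict[str, dict[str, Any]]] = [{} for _ in range(partitions)]

    def _partition_for(self, key: str) -> dict[str, dict[str, Any]]:
        # A stable hash (unlike Python's per-process salted hash()) always
        # maps the same key to the same partition.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self._partitions[int.from_bytes(digest[:8], "big") % len(self._partitions)]

    def put(self, key: str, item: dict[str, Any]) -> None:
        """Store an item; items need not share the same attributes."""
        self._partition_for(key)[key] = dict(item)

    def get(self, key: str) -> dict[str, Any] | None:
        """Fetch a copy of an item by key, or None if it does not exist."""
        item = self._partition_for(key).get(key)
        return dict(item) if item is not None else None

    def partition_sizes(self) -> list[int]:
        return [len(p) for p in self._partitions]


table = PartitionedTable(partitions=4)
table.put("user#1", {"name": "Ann", "email": "ann@example.com"})
table.put("user#2", {"name": "Bo", "roles": ["admin"]})  # different attributes, no schema change
print(table.get("user#2"))  # {'name': 'Bo', 'roles': ['admin']}
```

Constructing the table with more partitions spreads items over more nodes without changing the `put`/`get` interface, which is the property that lets an encapsulated service scale without the application tracking CPUs or memory. With plain modulo hashing, changing the partition count remaps most keys, so real systems generally use more elaborate placement schemes.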

Caching-as-a-Service

High-speed caching has become a mainstay of modern computing architectures. A properly implemented caching layer will both significantly reduce latency and increase data throughput.


Amazon offers an implementation of a clustered cache by wrapping one of the most popular open source solutions, Memcached. Users are able to launch a cache via self-service provisioning (API or portal). The memcached software is exposed to the developer, and commands can be issued directly against it. The ElastiCache service offers the ability to associate the caching software with various types of EC2 compute instances (number of CPUs, amount of memory to dedicate, etc.). Arrays of instances are combined to create a horizontal scaling effect. A set of associated nodes is known as a cache cluster, which can be managed as a single unit from a scaling and availability perspective. For instance, if a caching node locks up or goes down, the ElastiCache service will automatically replace it with a new node. If the cache is overloaded, alerts can be defined to grow the size of the cluster. Finally, the ElastiCache service manages the patching and maintenance of the memcached software. Software updates are applied according to user-specified parameters, typically associated with off-peak or after-hours maintenance windows.
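
The latency savings described above typically come from the cache-aside pattern: the application checks the cache first and only queries the database on a miss, storing the result with an expiry time. A minimal sketch, assuming a single-process dictionary in place of a memcached cluster and a hypothetical `load_price` function in place of a database query; `CacheAside` is an illustrative name, not part of ElastiCache or any memcached client.

```python
from __future__ import annotations

import time
from typing import Callable


class CacheAside:
    """Hypothetical cache-aside wrapper; a dict stands in for memcached nodes."""

    def __init__(self, load: Callable[[str], str], ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load            # slow path, e.g. a database query
        self._ttl = ttl              # seconds a cached value stays valid
        self._clock = clock
        self._entries: dict[str, tuple[str, float]] = {}  # key -> (value, expiry)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after the underlying data changes."""
        self._entries.pop(key, None)


db_calls: list[str] = []


def load_price(sku: str) -> str:
    db_calls.append(sku)  # stands in for a slow database query
    return f"price-of-{sku}"


cache = CacheAside(load_price, ttl=60.0)
cache.get("sku-1")
cache.get("sku-1")
print(cache.hits, cache.misses, len(db_calls))  # 1 1 1
```

The `invalidate` method matters as much as `get`: when the underlying data changes, the application must drop or overwrite the cached copy, otherwise readers see stale values until the TTL expires.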

Integration-as-a-Service

Amazon Web Services currently offers two types of integration services for system-to-system decoupling and messaging: Simple Notification Service (pub/sub communication) and Simple Queue Service (message queue). At this time, there is no mechanism for payload transformation or protocol mediation.

A key principle of system design is the decoupling of modules via messaging. AWS provides an event-based mechanism that allows a publisher to create a topic of interest and then publish messages related to that topic. Multiple users (or systems) can subscribe to the topic and receive a copy of any published message. Simple Notification Service (SNS) provides this pub/sub (publish/subscribe) capability inside the AWS cloud. The service is an encapsulated, horizontally scalable offering; Amazon does not indicate which messaging libraries it uses behind the service interface. Developers can call the service via SOAP- or REST-based commands and specify their delivery protocol of choice (HTTP, HTTPS, SMTP, SQS or SMS). After a message has been published to a topic, SNS sends it to all subscribers.

In its current state, SNS does not confirm receipt of individual messages, nor does it provide guarantees on the timeliness of delivery. SNS should be viewed as an Internet-scale pub/sub delivery system that provides best-effort service levels. It should not be used where guaranteed delivery (at least once, exactly once, at most once) is required, such as in financial transactions, unless additional guarantees are built around the core service. The service is considered massively scalable and provides high availability through intra-region redundancy and redundant replication of temporarily persisted objects. It leverages other AWS services such as CloudWatch for monitoring, CloudFormation for orchestration and Identity & Access Management for fine-grained access control.

A second integration service offered by AWS is Simple Queue Service (SQS), which allows developers to decouple two modules in terms of load over time. For example, if module A receives significant load in a short period of time, it can place work requests in a queue. Module B can then pull items off the queue and process them at its own pace. A common scenario is when modules have different owners (other companies, siloed applications, etc.) and need to communicate. Without a queue, the modules would be forced to communicate and process loads at the same pace; the message queue lets them work at different speeds, with work requests queuing up in front of the slower module. Using SQS, developers can use either a SOAP or RESTful interface to create, delete and inspect queues as well as to add or remove items from a queue. Messages can also be batched, allowing a group of messages to be processed together. Each message is locked while it's being processed, which prevents multiple consumers from accidentally processing the same item. SQS is an encapsulated, horizontally scalable offering; while this design enables massive scaling, the distributed design makes it more difficult to manage the state of messages across nodes in the cloud. AWS has chosen not to implement advanced queuing capabilities like FIFO (first-in-first-out) or priority queues.

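
The two integration styles can be combined, since SQS is one of the SNS delivery protocols listed above. The sketch below models a topic that copies each published message into every subscribed queue, and queues that lock a received message for a fixed period instead of removing it, so an unacknowledged message becomes visible again. It assumes in-memory data structures and caller-supplied timestamps; `Topic`, `Queue`, `receive` and `delete` are hypothetical names, not the SNS or SQS APIs (SQS calls its lock a visibility timeout).

```python
from __future__ import annotations

import itertools
from collections import deque
from dataclasses import dataclass


@dataclass(eq=False)
class Message:
    msg_id: int
    body: str
    invisible_until: float = 0.0  # the lock: hidden from other consumers until this time


class Queue:
    """Hypothetical in-memory queue with SQS-style message locking."""

    def __init__(self, lock_seconds: float = 30.0) -> None:
        self._messages: deque[Message] = deque()
        self._lock_seconds = lock_seconds
        self._ids = itertools.count(1)

    def send(self, body: str) -> None:
        self._messages.append(Message(next(self._ids), body))

    def receive(self, now: float, max_messages: int = 10) -> list[Message]:
        """Return up to max_messages visible messages (a batch), locking each one."""
        batch: list[Message] = []
        for msg in self._messages:
            if len(batch) == max_messages:
                break
            if msg.invisible_until <= now:
                msg.invisible_until = now + self._lock_seconds
                batch.append(msg)
        return batch

    def delete(self, msg: Message) -> None:
        """Acknowledge a processed message so it is never delivered again."""
        self._messages.remove(msg)


class Topic:
    """Hypothetical pub/sub topic that copies every message to each subscriber."""

    def __init__(self) -> None:
        self._subscribers: list[Queue] = []

    def subscribe(self, queue: Queue) -> None:
        self._subscribers.append(queue)

    def publish(self, body: str) -> None:
        for queue in self._subscribers:
            queue.send(body)


orders, audit = Queue(lock_seconds=30.0), Queue(lock_seconds=30.0)
topic = Topic()
topic.subscribe(orders)
topic.subscribe(audit)
topic.publish("order-1")
topic.publish("order-2")

batch = orders.receive(now=0.0)    # both messages, now locked for 30 s
orders.delete(batch[0])            # order-1 processed and acknowledged
print([m.body for m in orders.receive(now=45.0)])  # ['order-2'] (lock expired)
print([m.body for m in audit.receive(now=0.0)])    # ['order-1', 'order-2']
```

Because a message that is received but never deleted reappears after its lock expires, a consumer sees each message at least once rather than exactly once; work that must not be repeated, such as a financial transaction, needs its own de-duplication.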
The assumption is that users who want FIFO or priority ordering will extend the core service to include finer-grained management of message arrivals and departures. The decision to make SQS a massively scalable, highly available system may also have contributed to the decision not to support existing protocols like STOMP or AMQP. Although neither SQS nor SNS tries to meet the requirements of JMS (Java Message Service), they do satisfy several of its API mandates, and libraries are available.

Simple Workflow Service (SWF) is a recent addition to the AWS PaaS suite. Architects often break large, complex applications into multiple smaller modules. These modules are then called one at a time based on the results (or state) of the prior call. SWF manages the distributed state and facilitates the execution of multi-step applications. Because "workflow" is part of the service name, many assume this offering is for human workflow or BPM; this isn't the case. It could serve as the engine for a traditional BPM solution, but in its current low-level form it would be cumbersome to build end-user applications with it. Instead, it should be used as the coordinator of distributed execution of system tasks with dependencies, concurrency, pre-defined ordering and ordering based on state.
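
The coordinator role described above (dependencies, concurrency and state-based ordering) can be sketched as a small decider. This is a minimal sketch; `Workflow`, `decide` and `complete` are hypothetical names that only loosely mirror the decider concept and are not the SWF API. In SWF itself, deciders poll the service for decision tasks that carry the workflow's event history.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Workflow:
    """Hypothetical decider: schedules tasks once their dependencies complete."""

    depends_on: dict[str, set[str]]                                      # task -> tasks it waits for
    branches: dict[tuple[str, str], str] = field(default_factory=dict)  # (task, result) -> follow-up
    completed: dict[str, str] = field(default_factory=dict)             # task -> result
    scheduled: set[str] = field(default_factory=set)                    # started, not yet finished

    def decide(self) -> list[str]:
        """Return every task that may start now; several may run concurrently."""
        ready = sorted(
            task
            for task, deps in self.depends_on.items()
            if task not in self.completed
            and task not in self.scheduled
            and deps.issubset(self.completed)
        )
        self.scheduled.update(ready)
        return ready

    def complete(self, task: str, result: str) -> None:
        """Record a result; some results add a follow-up task (state-based ordering)."""
        self.scheduled.discard(task)
        self.completed[task] = result
        follow_up = self.branches.get((task, result))
        if follow_up is not None:
            self.depends_on.setdefault(follow_up, {task})

    def finished(self) -> bool:
        return set(self.depends_on).issubset(self.completed)


wf = Workflow(
    depends_on={
        "fetch": set(),
        "resize": {"fetch"},
        "thumbnail": {"fetch"},
        "publish": {"resize", "thumbnail"},
    },
    branches={("publish", "error"): "alert_admin"},
)
print(wf.decide())         # ['fetch']
wf.complete("fetch", "ok")
print(wf.decide())         # ['resize', 'thumbnail'] (may run concurrently)
```

Separating the decision logic from the workers that perform each task is what lets such a coordinator manage concurrency and branching without knowing how the tasks themselves are implemented.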

Summary findings

While the Amazon cloud is best known for the original EC2 infrastructure services, the majority of recent releases have been in the platform services space. This is consistent with the growing belief that IaaS is necessary but not sufficient; the real value in enabling application-centric computing models comes from innovations in the PaaS space. Although Amazon doesn't publish revenue figures for its cloud offering, many have developed models that project impressive usage and growth rates. Advanced users are increasingly expanding the breadth of the platform services they rely upon because of their convenience, accessibility and low price. Although we can't substantiate it with data, Transcend believes that Amazon currently has the largest PaaS offering when measured by annual revenue, total number of users or total compute hours. By virtually any measure, the AWS PaaS offering is a market leader. Perhaps more importantly, Amazon has demonstrated a strong commitment to this space and a desire to innovate and lead at progressively higher layers of the stack. Based on its impressive vision and unrivaled ability to execute, we believe Amazon will parlay its IaaS dominance into a similar position of strength in PaaS.


About Transcend

Transcend Computing is an innovator in Amazon Compatible Environments (ACE) for public, private and hybrid cloud computing. Transcend was formed to help developers, enterprises and managed service providers capitalize on the momentum of Amazon Web Services. StackStudio is a visual, drag-and-drop online development environment for assembling multi-tier application topologies using the Amazon CloudFormation format. Application stacks assembled with StackStudio are ready to run on Amazon Web Services (AWS) and on other public and private ACE platforms. These stacks can then be shared with other developers in StackPlace, an open social architecture community sponsored by Transcend Computing. StackPlace allows developers to create, contribute, consume and collaborate on ACE-compatible application topologies.
