TEKsystems Education Services Presents: Cloud Computing II
• VirtualBox (https://www.virtualbox.org/wiki/Downloads)
64-bit OS
• Classfiles (Large zip containing a VM)
• Unzip to a folder on your computer
Cloud Computing
What You Will Need…
Upon completion of this course, you will be able to:
• Explore advanced infrastructure topics such as:
• elasticity, availability, reliability, and orchestration
• Examine open source and private cloud offerings such as OpenStack
• Utilize distributed processing frameworks such as Apache Hadoop
• Create PaaS-based applications with OpenShift and others using Java, Python, and interface-driven approaches
• Explore security and identity access management techniques
Cloud Computing
Course Objectives
Session 1
• The State of the Cloud: Review, Key Players, Trends
• Private and Open Source Cloud Platforms
Session 2
• Working with Distributed Processing Frameworks
Cloud Computing
Course Agenda
Session 3
• Cloud Best Practices: Scalability, Availability, Elasticity
• Exploring PaaS with OpenShift
Session 4
• PaaS Continued
• Cloud Security Topics: Identity Access Management
Cloud Computing
Course Agenda
Cloud Computing
Instructor Introduction
• Name
• What you work on
• Reason for attending
Cloud Computing
Student Introductions
• Sign-in Sheet
• Training Manual
• Start / Stop Time
• Breaks and Lunch
• Questions and Answers
Facilities and Logistics
Module 1: The State of the Cloud
The State of the Cloud | Implementing IaaS | Distributed Processing | The PaaS Model | Security, Standards, and Governance
• Brief Review
• Cloud Principles
• Key Players, Products, and Services
• Trends
Cloud Computing
Overview
• Virtualization / Cloud Technologies
• Hypervisors, Computer Ring Security
• Horizontal vs. Vertical Scaling
• Type I, II, Paravirtualization, Full Virtualization,
Hardware Acceleration
• Virtual Appliances
Cloud Computing
Brief Review
• 3 primary modes for server virtualization:
• Full virtualization
• Paravirtualization
• Hardware acceleration
Cloud Computing
Virtualization Types
Ring 0: Kernel
Ring 1: Device Drivers
Ring 2: Device Drivers
Ring 3: Applications
• Amazon Web Services (AWS)
• Elastic Compute Cloud (EC2)
• Simple Storage Service (S3)
• Elastic Beanstalk
• SimpleDB
• Availability Zones
Cloud Computing
Brief Review (continued)
• Some believe in the 5-3-2 principle
• It defines (for clouds):
• 5 key characteristics
• 3 delivery methods
• 2 deployment models
• These can be summarized as follows…
Cloud Computing
Cloud Theory
Cloud Computing
5-3-2 Principle
5 key characteristics: On-demand self-service, Broad Network Access, Resource Pooling (location transparent), Rapid Elasticity, Metered Service (pay-per-use)
3 delivery methods: SaaS, PaaS, IaaS
2 deployment models: Public Cloud, Private Cloud
Cloud Computing
Public Cloud Key Players (IaaS)
Amazon Web Services
Terremark
HP
IBM SmartCloud
Cloud Computing
Public Cloud Key Players (PaaS)
Force.com
Wyaworks
Cloud Computing
Public Cloud Key Players (SaaS)
Intacct
Taleo
Rollbase
Cloud Computing
Private Cloud Key Players
Eucalyptus Systems
1. Cloud Management Services
– Cloud Federation Services
– One-click deployments, monitoring, alerts, scheduling, logging, auditing tools
2. Emphasis on better availability, stronger SLAs
3. More Open Source Competition
– OpenStack vs. Eucalyptus vs. Commercial Products
4. Fragmentation of PaaS markets
– Vendors providing different types of PaaS offerings
5. Big Data use within cloud services
Cloud Computing
5 Trends in the Cloud
Cloud Computing
Open Source Cloud-Related Products
DeltaCloud
• 5-3-2 defines cloud characteristics, service, and deployment models
• Hundreds of companies now provide products and services at all levels of the service model stack
• Open source tools continue to evolve and mature
– Tools are rapidly being adopted by larger companies and incorporated into commercial offerings
Cloud Computing
Summary
Module 2: Implementing IaaS
• Cloud Platforms, Private Clouds
• Open Source Cloud
• Distributed Processing
• Cloud Implementation Best Practices
• Cloud Orchestration
Cloud Computing
Overview
• Advantages of implementing private clouds:
– Increased utilization of assets
– Security (trust issues)
– Easier integration (with on-site systems)
– Control of the cloud operating environment
– Less likelihood of vendor lock-in
• Like public clouds, private clouds typically also meter client usage
Cloud Computing
Private Clouds
• Currently, several products have evolved that are competing for the open source cloud market
• Eucalyptus - Private IaaS cloud, compatible with EC2 & S3
• OpenNebula – a cloud virtualization platform
• CloudStack – Cloud.com's offering incorporating OpenStack technologies (code base)
• OpenStack
• Designed for private cloud and public cloud developments
• First released: Oct 2010
• Combined effort of Rackspace Cloud and NASA Nebula Cloud
• Provides storage, management tools, and virtualization capabilities
Cloud Computing
The Open Source Cloud
Cloud Computing
Eucalyptus Cloud Environment
Eucalyptus offers an open source cloud solution that lets users provision their own resources via a UI against a company's on-premises data center
• Another configuration recognized by the cloud community is a hybrid cloud
• A hybrid cloud combines elements of a private cloud and a public cloud
• One way to accomplish this is via Amazon's VPC (Virtual Private Cloud) service
• VPC use cases:
– Provisioning test environments
– New branch office/units (virtual desktops)
Cloud Computing
Virtual Private Cloud
Cloud Computing
Remote Cloud to Enterprise
[Diagram labels: Enterprise Data Center, Customer Gateway, VPN connection over the internet, VPN Gateway, VPC Subnet]
AWS VPC provides one cloud and VPN connection.
Within the cloud, define up to 20 subnets, using conventional CIDR notation
Subnets are connected in a star topology with a single virtual router between them
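The CIDR-based subnetting described above can be sketched with Python's standard `ipaddress` module. The `10.0.0.0/16` VPC range and the `/24` subnet size are assumptions chosen for illustration; they are not values from the course material.

```python
import ipaddress

# Hypothetical VPC address range (an assumption for illustration)
vpc = ipaddress.ip_network("10.0.0.0/16")

# Carve the VPC range into /24 blocks and keep the first 20,
# matching the per-VPC subnet limit mentioned above
subnets = list(vpc.subnets(new_prefix=24))[:20]

for subnet in subnets[:3]:
    # num_addresses counts every address in the block,
    # including the network and broadcast addresses
    print(subnet, subnet.num_addresses)
```

Each resulting block is written in the same CIDR notation that would be supplied when defining a subnet.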
• OpenStack is a collection of tools that deliver a highly scalable cloud operating system
• It is free to use under Apache 2.0 license
Cloud Computing
OpenStack Technologies
• Strong support from the business community
Cloud Computing
OpenStack Commercial Backing
• Requirements:
• Most Linux distros, targeted for Ubuntu
• Hypervisors: Xen, XenServer, Hyper-V, KVM, ESX
• Version Releases
• Austin (v1): Oct 2010
• Grizzly (v7): Apr 2013
• Havana (v8): Oct 2013
• Icehouse (v9): Apr 2014
• Juno (v10): Oct 2014
• Kilo (v11): Apr 2015
• Liberty (v12): Oct 2015
Cloud Computing
OpenStack Versions/Requirements
• Work from Public Clouds
• Install from a "vanilla" script
• Install from the ground up
• Install from build scripts
Cloud Computing
Installation Options
http://docs.openstack.org/juno/install-guide/install/apt/content/
http://docwiki.cisco.com/wiki/OpenStack
Cloud Computing
OpenStack Components
Cloud Computing
OpenStack Compute Architecture
• nova-api – often referred to as the cloud controller, initiates most orchestration efforts
• nova-scheduler – handles a VM creation request, determining where best to create it
• nova-compute – a worker that creates and destroys VMs
• glance-registry – stores image metadata
• nova-network – sets up networking tasks (e.g., IPs)
• nova-volume – similar to AWS EBS, maintains instance snapshots
Cloud Computing
Compute Daemons
OpenStack Nodes
• What is Neutron?
Cloud Computing
OpenStack Neutron
"Network-connectivity-as-a-service"
Set of supported plugins:
Open vSwitch, Cisco, Linux Bridge, Nicira NVP, Ryu, NEC OpenFlow, Big Switch / Floodlight REST Proxy, PLUMgrid, Hyper-V, Brocade, Midonet
• OpenStack has two main networking modes:
• Fixed IPs
• IPs are assigned to instances and are fixed until the instance terminates
• Flat & Flat DHCP Mode
• A single global network for new instances
• VLAN-DHCP Mode
• Segmentation mechanism where each tenant (project) can provide its own private network
• Floating IPs
• IP addresses that are dynamically assigned but can be reassigned to another instance at any time
Cloud Computing
OpenStack and Networking Modes
OpenStack VMs with Neutron
• Neutron provides a bridge adapter (br100 in the image) as a gateway to the VMs running on a particular host
OpenStack with Neutron
[Diagram labels: Horizon (browser), Nova-API (EC2 API, OpenStack API), REST API, Asynchronous Message Queue, Nova Compute, Nova Network, Neutron NW Mgr, Neutron API (POST, GET, PUT, DELETE), Neutron Plugin (Open vSwitch), Neutron Agent, allocate, hypervisors (KVM, Xen, …) via libvirt or XEN API, VM1, VM2]
• OpenStack Heat is responsible for orchestration within OpenStack
• It implements an orchestration engine to launch multiple composite cloud applications based on templates in the form of text files that can be treated like code.
– A Heat template describes the infrastructure for a cloud application in a text file
o servers, floating IPs, volumes, security groups, users
– When you need to change your infrastructure, simply modify the template and use it to update your existing stack.
Cloud Computing
OpenStack Heat
• Heat templates are text files written in YAML to describe a configuration:
heat_template_version: 2013-05-23
description: Simple template to deploy a single compute instance
resources:
  my_instance:
    type: OS::Nova::Server
    properties:
      image: cirros-0.3.3-x86_64
      flavor: m1.small
      key_name: my_key
      networks:
        - network: private-net
Cloud Computing
OpenStack Heat Templates
• OpenStack Ceilometer is responsible for collecting measurements of the utilization of the resources comprising deployed clouds
– physical and virtual resources
• Ceilometer persists these data points for subsequent retrieval and analysis, and triggers actions when defined criteria are met
– Meters, Samples, Statistics, Pipelines, and Alarms are used to organize Ceilometer functionality
Cloud Computing
OpenStack Ceilometer
• OpenStack Swift is the object store for OpenStack
• Stores objects (arbitrary blobs of data) and makes them available to users
• Swift is a widely used object storage system provided under the Apache 2 open source license
• Requests are made via HTTP using a RESTful API
– GET, PUT, POST, DELETE
Cloud Computing
OpenStack Swift
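As a rough sketch of the RESTful access described above, the following builds (but does not send) HTTP requests for an object using Python's standard library. Swift object URLs follow the `/v1/{account}/{container}/{object}` pattern and carry an `X-Auth-Token` header; the endpoint host, account name, and token below are hypothetical placeholders.

```python
import urllib.request

# Hypothetical endpoint, account, and token (assumptions for illustration)
SWIFT_URL = "http://swift.example.com:8080/v1/AUTH_demo"
TOKEN = "example-token"

def swift_request(method, container, obj=None, data=None):
    """Build (but do not send) a Swift REST request."""
    path = f"{SWIFT_URL}/{container}" + (f"/{obj}" if obj else "")
    return urllib.request.Request(
        path, data=data, method=method,
        headers={"X-Auth-Token": TOKEN},
    )

upload = swift_request("PUT", "photos", "cat.jpg", data=b"...image bytes...")
download = swift_request("GET", "photos", "cat.jpg")
remove = swift_request("DELETE", "photos", "cat.jpg")

for req in (upload, download, remove):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would issue the call against a real Swift endpoint.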
• Cinder is a volume manager
– Volumes are hard drives mounted on machines
– Under the most common scenario, Cinder volumes provide persistent storage to guest virtual machines
• VMs start and stop frequently
– The data in a volume can persist beyond the lifecycle of a VM
Cloud Computing
OpenStack Cinder
• Exploring OpenStack
• Refer to exercise 1 in the student exercises
Cloud Computing
Exercise 1
• OpenStack has a set of command line tools
• nova xxx
• keystone xxx
• neutron xxx
Example commands:
Cloud Computing
OpenStack Commands
Add a new user:
keystone user-create --name=admin --pass=ADMIN_PASS [email protected]
List users:
keystone user-list
Delete user:
keystone user-delete
Project:
openstack project create
openstack project delete
Cloud Computing
OpenStack Admin Console
OpenStack Dashboard project called Horizon
Cloud Computing
OpenStack Admin Console Cont.
• Projects - organizing of servers and resources
• Admin - Manage OpenStack
• Identity - Create and manage groups and users
Cloud Computing
OpenStack Admin VMs
• Images are software loads -- Operating Systems
• Flavors define types of machines
Cloud Computing
OpenStack Networks
• The console is an interface to define networks
• Routers
• IP ranges
• OpenStack can be complex to configure.
• Log files are often key to diagnosing a broken installation
• Log files are under: /var/log/<projectname>
– /var/log/nova
– /var/log/glance
– /var/log/cinder
– /var/log/keystone
Cloud Computing
OpenStack Resources
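When diagnosing an installation, a small script can collect error lines from each project's log directory. The sketch below is one hedged way to do that; the `find_errors` helper, the `*.log` file pattern, and the `ERROR` marker are assumptions, and the demo runs against a temporary directory rather than a real `/var/log`.

```python
import tempfile
from pathlib import Path

def find_errors(log_root, projects, marker="ERROR"):
    """Return {project: [matching lines]} for each project's *.log files."""
    hits = {}
    for project in projects:
        lines = []
        for log_file in sorted(Path(log_root, project).glob("*.log")):
            with open(log_file, errors="replace") as handle:
                lines += [ln.rstrip("\n") for ln in handle if marker in ln]
        hits[project] = lines
    return hits

# Demo against a throwaway directory that mimics /var/log/<projectname>
with tempfile.TemporaryDirectory() as root:
    nova_dir = Path(root, "nova")
    nova_dir.mkdir()
    (nova_dir / "nova-api.log").write_text(
        "INFO service started\nERROR connection refused\n"
    )
    report = find_errors(root, ["nova", "glance"])

print(report)
```

On a real controller node the call would be `find_errors("/var/log", ["nova", "glance", "cinder", "keystone"])`, typically run with enough privileges to read those files.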
Cloud Computing
Best Practices for Cloud Implementation
These are discussed…
• Assume the cloud will fail
• Utilize Elastic IP services
• When a server fails, elastic IPs can instantly remap to a set of servers
• Also useful for application upgrades/updates
• Incorporate multiple availability zones
• Eliminate single points of failure
• Use automated backups for databases
• Take snapshots of application instances
Cloud Computing
Designing for Failure
(see next slide for more)
• Accomplished at multiple levels: geographic, data center, application, and infrastructure
• Ensuring no single points of failure exist
Cloud Computing
High Availability
[Diagram: Infrastructure HA. Labels: LoadBal1, LoadBal2 (failover), WebServer1, WebServer2, Database1, Database2]
Creating a Load Balancer on AWS allows for defining availability zone and health check values
Create Placement Groups (clusters) on AWS to achieve this effect
• Rule of thumb: build for 30-40% capacity beyond estimated requirements
• Estimate by Peak Bandwidth, Concurrent Users, or Application Sizing
• Peak Bandwidth:
• Acquire estimates from monitoring software, monthly traffic logs
• Concurrent Connections
• How much bandwidth does each user consume on average?
• Application Size
• Page sizes and requests per page per sec
Cloud Computing
Capacity Planning
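The rule of thumb above can be worked as simple arithmetic. In this sketch the function names and the sample workload (2,000 concurrent users at 50 kbps each) are hypothetical; only the 30-40% headroom range comes from the slide.

```python
def peak_bandwidth_mbps(concurrent_users, avg_kbps_per_user):
    """Concurrent-user estimate: users times average bandwidth per user."""
    return concurrent_users * avg_kbps_per_user / 1000

def required_capacity(estimated_peak, headroom=0.35):
    """Estimated requirement plus a safety margin (30-40% per the rule of thumb)."""
    return estimated_peak * (1 + headroom)

# Hypothetical workload used only for illustration
estimate = peak_bandwidth_mbps(2000, 50)
low = required_capacity(estimate, 0.30)
high = required_capacity(estimate, 0.40)

print(f"estimated peak: {estimate:.0f} Mbps")
print(f"build for: {low:.0f} to {high:.0f} Mbps")
```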
• There are 3 ways to implement elasticity:
• Scaling at fixed intervals
• Event-based scaling
• On-demand scaling
• This requires a system that scales without human intervention
• Automate builds and deployments
– Use services to monitor system metrics
– Incorporate tools such as Chef, Puppet, CFEngine
Cloud Computing
Implementing Elasticity
New Terms:
Spin-up Elasticity – time it takes to spawn a new instance
Spin-down Elasticity – time it takes to shut down instances that are no longer needed
(on EC2 this is 1 minute and 1 hour respectively!)
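Event-based scaling can be pictured as a rule that reacts to a monitored metric. The following is a minimal sketch, not any vendor's algorithm; the `desired_instances` name, the CPU thresholds, and the fleet limits are all assumptions made for illustration.

```python
def desired_instances(current, cpu_percent, scale_up_at=70, scale_down_at=30,
                      minimum=1, maximum=10):
    """Add or remove one instance when the metric crosses a threshold."""
    if cpu_percent > scale_up_at:
        target = current + 1
    elif cpu_percent < scale_down_at:
        target = current - 1
    else:
        target = current
    # Never shrink below the minimum or grow past the maximum
    return max(minimum, min(maximum, target))

fleet = 2
for cpu in [85, 90, 50, 20, 10, 5]:
    fleet = desired_instances(fleet, cpu)
    print(f"cpu={cpu}% -> {fleet} instance(s)")
```

In practice the spin-up and spin-down times noted above determine how aggressive such thresholds can be.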
Provisioning: AWS CloudFormation, RightScale, Abiquo, enStratus
Configuration Management: Chef, Puppet
Monitoring: AWS CloudWatch, CloudKick, NetIQ, ScienceLogic, Zenoss
Automation / Orchestration: RightScale, Kaavo, Scalr, Morph
Cloud Computing
Cloud Infrastructure Management
• Many companies have products that automate cloud-based administrative tasks
• Examples:
• Verify proper authentication, provision new instance and storage resources, notify upon completion
• Or, automatically scale resources upon changes in load
Cloud Computing
Automation and Orchestration
Tools come in flavors:
- Config Mgt (Chef, Puppet, Juju)
- Mgt Console Based (RightScale, Abiquo, enStratus)
- Template-based (RightScale, CloudFormation)
Cloud Computing
Resource Orchestration
• Describes the coordination of services to allow for business process workflow
o provision/manage resources
o reproduce deployments and test environments
Cloud Computing
Automation Products
define allocation limits
Other automation tools/vendors: RightScale, AWS CloudFormation, Kaavo, Scalr, enStratus, Tidal Enterprise Orchestrator
• Create loosely coupled components
• Use REST-based services, asynchronous calls
• AWS recommends a "GrepTheWeb" approach
Cloud Computing
Decouple Components
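Loose coupling through asynchronous calls can be illustrated with a queue sitting between a producer and a consumer. This is a local, in-process sketch using Python's standard library; in a cloud deployment a hosted message queue would play the role of `jobs`, and the message strings here are invented for the example.

```python
import queue
import threading

jobs = queue.Queue()
results = []

def worker():
    """Consumer: processes messages without knowing who produced them."""
    while True:
        item = jobs.get()
        if item is None:          # sentinel: no more work
            jobs.task_done()
            break
        results.append(item.upper())
        jobs.task_done()

consumer = threading.Thread(target=worker)
consumer.start()

# Producer: enqueues work and moves on without waiting for each result
for message in ["resize image", "send email", "index page"]:
    jobs.put(message)
jobs.put(None)

jobs.join()
consumer.join()
print(results)
```

Because the producer only depends on the queue, either side can be replaced, restarted, or scaled out independently.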
• Data that changes infrequently should be cached on the edge
• Video, audio, CSS, PDFs, JavaScript files, static HTML
• Use content delivery services to cache and deliver (CDNs)
Cloud Computing
Static Data Close to the User
• Keep dynamic data as close as possible to your computing instances
• Reduces latencies
• Decreases costs, data in/out is metered by the GB
• In-cloud data transfer is free
• Perform processing within same availability zone
• Move data into the cloud before processing
– Use external services such as import/export services
Cloud Computing
Dynamic Data Closer to Instances
• Cloud solutions force a change in paradigm
• Updates to servers at 2:00am on Saturday are no longer necessary
• Run servers continuously in parallel
• Shift IP addresses to new instances as needed
• Shift them back afterwards
• Regression/Unit Tests require servers to be provisioned for only a short time
• On-premise servers may sit idle for most of the day
Cloud Computing
Think Parallel
• Many design rules have already evolved with respect to the cloud including: designing for failure, how to implement elasticity, and placing data close to where it will be used
• OpenStack is a newer contender in the private cloud open source market
• Orchestration and automation tools are appearing quickly to help simplify administrative tasks
Cloud Computing
Summary
• Exploring OpenStack Swift
• Refer to exercise 2 in the student exercises
Cloud Computing
Exercise 2
Module 3: Distributed Processing
• Distributed Processing
• Introducing Apache Hadoop
• MapReduce
Cloud Computing
Overview
• The last few years have seen companies store any and all information passing through their networks
– Shopping sites store much more information than just what users are purchasing
– Search sites store every possible piece of information
• The infrastructure required to store all this used to be cost prohibitive
– Storage costs have dropped
– Cloud providers are plentiful
– Data is a goldmine of value to most companies
Cloud Computing
The Data Explosion
• In cloud solutions running hundreds of instances, how are large computational tasks accomplished?
– Ex: Google search returns results in less than a second
• Ans: Via a distributed processing system
• A distributed processing system often utilizes a distributed file system
– NFS is the most well-known distributed file system
Cloud Computing
Distributed Processing
• NFS is inadequate to handle massively scalable architectures utilized in grids and clouds
– It is file-based, so a file is limited to the storage capacity of a single machine
• HDFS (Hadoop Distributed File System) is designed to overcome NFS shortcomings, taking advantage of large-scale clusters of nodes
– Breaks files you specify into 64 MB chunks (blocks)
– Distributes the blocks to machines in the cluster
– Replicates the blocks on two other machines along the way
Cloud Computing
Distributed File Systems
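The block splitting and replication described above can be sketched as follows. The round-robin placement is a deliberate simplification chosen for this example (it is not HDFS's actual placement policy), and the node names and 200 MB file size are hypothetical.

```python
import math

BLOCK_SIZE = 64 * 1024 * 1024   # 64 MB blocks, as described above
REPLICAS = 3                    # the original block plus two copies

def place_blocks(file_size, nodes):
    """Return {block_index: [nodes holding a replica]} using round-robin."""
    block_count = math.ceil(file_size / BLOCK_SIZE)
    placement = {}
    for block in range(block_count):
        placement[block] = [
            nodes[(block + r) % len(nodes)] for r in range(REPLICAS)
        ]
    return placement

# Hypothetical 200 MB file on a four-node cluster
layout = place_blocks(200 * 1024 * 1024, ["node1", "node2", "node3", "node4"])
for block, holders in layout.items():
    print(block, holders)
```

A 200 MB file needs four blocks (three full blocks plus a partial one), and each block lands on three distinct machines, so losing any single node leaves every block readable.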
• Apache Hadoop (introduced earlier) utilizes a
– Distributed file system (HDFS)
– MapReduce algorithm for reliable parallel processing
– Cassandra – a scalable multi-master database
– Avro – a data serialization system
– Pig – a dataflow language running on HDFS
– Hive – a data warehouse supporting SQL syntax which is converted into MapReduce jobs
– HBase – a distributed column (object) style database
Cloud Computing
The Hadoop "EcoSystem"
• Hadoop can be hosted on the AWS S3 file system
– MapReduce algorithms can be run on EC2 servers
– Data is read from server instances and written back to S3
Cloud Computing
Hadoop and MapReduce
Clusters of Yahoo Search servers running MapReduce Algorithms
• MapReduce was created by Google to solve the issue of searching massive amounts of data
– By 2014 there were over 1 billion web sites online
– Required thousands of servers
– Cost of servers became expensive so cheap servers (x86 architecture machines) were sought
– MapReduce was implemented across these thousands of machines
Cloud Computing
The Need for MapReduce
• MapReduce originates from functional programming
• map():
• reduce():
Cloud Computing
MapReduce - Functional Programming
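def doubleIt(val): return 2*val
results = list(map(doubleIt, [1, 2, 3]))   # list() is needed in Python 3, where map() is lazy
# results is [2, 4, 6]
from functools import reduce               # reduce() lives in functools in Python 3
def sum_reducer(val1, val2): return val1 + val2
print(reduce(sum_reducer, results))
# prints 12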
Cloud Computing
HDFS Architecture
NameNode
DataNode
DataNode
Namenode server (master) opens, closes, renames files and directories
Uses a master/slave architecture
Datanodes (slaves) perform read/write of blocks of data to clients
• Hadoop processes tasks using a master/slave architecture
Cloud Computing
Hadoop: HDFS & MapReduce
Cloud Computing
Hadoop Commands
• Because HDFS is a different file system than the native OS, it uses a different command set to manage it
– The hadoop script in the bin directory contains the commands
– The syntax is:
bin/hadoop moduleName -cmd args
where moduleName can be either dfs or dfsadmin for HDFS related tasks
Examples:
bin/hadoop dfs -ls /
bin/hadoop dfs -mkdir /task
Cloud Computing
Other Hadoop Commands
• Inserts myFile into HDFS calling it file2
– Note: if file2 is a directory, it will create file2/myFile
bin/hadoop dfs -put myFile file2
• Display contents of file2
bin/hadoop dfs -cat file2
• Retrieves a file from HDFS putting it in the local FS
bin/hadoop dfs -get file2 localFile2
• Three XML files are commonly used to help configure properties for Hadoop deployments:
• core-site.xml – contains info about location of namenode
• mapred-site.xml – job tracker info, list of data nodes, etc.
• hdfs-site.xml – path locations for datanodes where blocks will be stored
conf/hadoop-env.sh also contains environment specific details that may be edited (such as JAVA_HOME)
Cloud Computing
Hadoop Config Files
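Hadoop's site files share a simple shape: a `configuration` element holding `property` entries, each with a `name` and a `value`. The sketch below generates a minimal core-site.xml body with Python's standard library. `fs.default.name` is the property older Hadoop releases use for the namenode location (later releases call it `fs.defaultFS`); the `hdfs://localhost:9000` value and the `build_site_config` helper are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def build_site_config(properties):
    """Build a Hadoop-style <configuration> document from a dict."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

# Hypothetical namenode address for a single-machine setup
core_site = build_site_config({"fs.default.name": "hdfs://localhost:9000"})
print(core_site)
```

The same helper could emit mapred-site.xml or hdfs-site.xml by passing the properties appropriate to each file.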
• Datanodes are given the location of the namenode server in their config files (core-site.xml)
– When started, the datanodes contact the namenode, allowing them to be dynamically added to the list for job processing
Cloud Computing
Adding Nodes to the Cluster
Namenode
Datanode
Datanode
Datanode
• To run the example:
• Files in the input directory are read and counts of words are written to the output directory
• It is assumed that both inputs and outputs are stored in HDFS
– If your input is not in HDFS, but rather a local file system somewhere, copy the data into HDFS using:
Cloud Computing
Running the WordCounter
./hadoop jar hadoop-*-examples.jar wordcount [-m <#maps>] [-r <#reducers>] <in-dir> <out-dir>
./hadoop dfs -mkdir <hdfs-dir>
./hadoop dfs -copyFromLocal <local-dir> <hdfs-dir>
Cloud Computing
WordCounter
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every token in each input line
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts for each word.
    // The org.apache.hadoop.mapreduce API passes an Iterable; declaring an
    // Iterator here would not override reduce(), so the sums would never run.
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "wordcount");
        job.setJarByClass(WordCount.class);   // lets the cluster locate the job classes

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }
}
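The same map, group, and reduce flow can be traced in a few lines of plain Python, without a cluster. This is only an in-memory illustration of the idea; the function names and sample sentences are made up for the example, and Hadoop performs the grouping (shuffle) step across machines.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word, like the Mapper above."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Sum the counts for one word, like the Reducer above."""
    return key, sum(values)

lines = ["the cloud is elastic", "the cloud scales"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```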
• A number of providers offer cloud solutions ideally suited for private clouds
– Open Source options are ideal choices
• Hadoop is a framework for implementing computational services scaled across many servers
• MapReduce makes scalable operations possible
Cloud Computing
Summary
• Utilize the virtual CentOS image to configure and run an Apache Hadoop application
– Steps to configure and run:
o Launch guest OS
o Create a Hadoop user, create ssh keys
o Set up ssh (secure shell)
o Obtain hadoop, configure the XML files
o Set Hadoop environment (Java & Hadoop home)
o Run the daemons
o Import file into HDFS
o Run job, view results
Cloud Computing
Exercise 3 – Apache Hadoop
Module 4: The PaaS Model
• PaaS Subcategories
• Google App Engine
• Other PaaS Providers
• Private PaaS Considerations
Cloud Computing
Overview
• PaaS: environments that support development and runtime frameworks
– Development tools may be hosted with the PaaS provider
o Ex: Metadata aPaaS, or
– Created using local IDEs and deployed into the PaaS cloud
o Ex: Framework aPaaS
• As of 2015, cloud development solutions are growing faster than on-premise development
Cloud Computing
PaaS Overview
• PaaS can be broken into several subcategories:
– Application platform as a service (aPaaS)
o Instance aPaaS (Azure, Elastic Beanstalk)
o Framework aPaaS (GAE, Heroku, Djangy)
o Metadata aPaaS (Force.com, OrangeScape)
– Software infrastructure as a service (SIaaS)
o Offer partial cloud development environments, not full cloud platforms
o Ex: AWS SimpleDB, AWS SQS, MS SQL Data Services, Akamai, AWS CloudFront, AWS IAM
Cloud Computing
PaaS Subcategories
Cloud Computing
Public Cloud Key Players (PaaS)
Force.com
Wyaworks
• GAE is a platform for creating web applications
• No server setup, config, or management
• Python & Java are the primary languages supported
o Python 2.7 and Java 6 runtimes
o Any JVM-based languages can be used (JRuby, Groovy, Scala, etc.)
o SpringMVC, Struts 2, most Python web frameworks (including Django)
Cloud Computing
Google App Engine
http://cloud.google.com/appengine/
Cloud Computing
Open Source OpenShift
• OpenShift is a platform for hosting PaaS applications
• OpenShift Origin is an open source implementation of a PaaS product
• Can be installed and run
• http://www.openshift.org/
• Most commonly run within a Docker container
Cloud Computing
Red Hat OpenShift
• Red Hat OpenShift is a platform for hosting PaaS applications based on OpenShift Origin
• A hosted OpenShift implementation
• Create accounts, upload and install applications
• http://www.openshift.com/
• Many clients provide automated upload and deployments
Pricing: Free
Requirements:
• Apps deployed with Git
• Up to 3 applications
• No expiration
• Apps with dependencies must install "cartridges"
• Add-ons to support features
Cloud Computing
OpenShift Hosted Requirements
• Apps run within a secure sandbox environment
• Independent of hardware and OS physical locations
• Apps can read but not write to the file system
• Must use provided services for persistence
• Apps may only respond to standard HTTP requests using standard ports
• Code may only respond to web requests or scheduled tasks
• Cannot spawn subprocesses
Cloud Computing
PAAS Secure Sandbox
JEE (JBoss)
PHP
Python
Ruby
NodeJS
Drupal
WordPress
Cloud Computing
OpenShift Hosted Languages
• Current PaaS trends are toward development of more open solutions (non-framework specific)
• New PaaS clouds aim to avoid vendor lock-in
• The Open PaaS market has become competitive
• Red Hat OpenShift
• VMware Cloud Foundry
• OpenCloud CloudSwing
• DotCloud
Cloud Computing
Open PaaS
• A private PaaS is one that can be deployed into your own private cloud
• Cloudbees, Cumulogic offer private PaaS clouds
• Private PaaS solutions can bring large companies with hundreds/thousands of developers onto a common platform
• Saves infrastructure costs, dev-time costs
Cloud Computing
Considering A Private PaaS
• Heroku is a polyglot aPaaS cloud application solution
• It is a multi-tenant hosting environment
• Developers create apps in Java, Clojure, Scala, Python, Ruby, Node.js
• Uses a command-line interface and git (decentralized revision control system) to deploy apps into the cloud
Cloud Computing
Heroku
• Cloud Foundry is an open source PaaS platform released under Apache License 2.0
• VMware offers 3 products:
• Cloudfoundry.com – a service providing online PaaS cloud capabilities
• Cloudfoundry.org – a community where you can download the software for your own use
• Micro Cloudfoundry – a stand-alone version to locally develop solutions for deployment later
• Can develop Grails, Ruby, Java, Node.js
Cloud Computing
VMWare Cloud Foundry
• Another open cloud PaaS platform running on Amazon EC2
• Easy deployment using Python admin and command-line tools
• Wide range of language/DB choices
• Free account sign up
Cloud Computing
DotCloud
How to distribute applications
• The cloud provides a very flexible runtime for applications
– The application can be moved around and the environment scaled automatically
• Developers will often use local tools for writing and editing code
– Applications can be tested locally (or remotely) but the process has to be simple and repeatable
• Git is a very common tool for distribution of applications
Why Git?
• Git has many advantages over earlier systems such as CVS and Subversion
– More efficient, better workflow, etc.
– Distributed nature is implicit
– See the literature for an extensive list of reasons
• Best competitor: Mercurial
– Much less popular than Git
• Many cloud PaaS products are based on Git for version control
Why Git?
• Linus Torvalds used BitKeeper to manage Linux code
• Ran into BitKeeper licensing issue
– Liked functionality
– Looked at CVS as how not to do things
• April 5, 2005 - Linus sends out email showing first version
• June 15, 2005 - Git used for Linux version control
Using Git
• Git is an application that must be installed
• Git needs a repository
– git clone remoteurl
– git init myrepo
• Git stages and then commits your changes (2 steps)
– git add *
– git commit -m "My change message"
• PaaS services have become numerous enough to form several subcategories of offerings
• Google App Engine provides numerous APIs for application development
• Suffers from vendor lock-in
• Many new PaaS vendors are providing platforms that support multiple languages and open solutions to avoid vendor lock-in
Cloud Computing
Summary
• OpenShift & Python
• Now you do it!
Cloud Computing
Exercise 4
• OpenShift & WebApp Framework
• Now you do it!
Cloud Computing
Exercise 5
Module 5: Security Issues
• Security Concerns
• Authentication Techniques
• Identity Access Management
• Infrastructure Security
• Tackling Compliance
Cloud Computing
Overview
• Cloud security is a responsibility of both the cloud provider and the client
– The "cloud stack level" determines each role
• For example, AWS states that for EC2 they are responsible for:
– Physical
– Environmental
– Virtualization
Cloud Computing
Cloud-based Security Issues
[Diagram: services stack showing IaaS, PaaS, SaaS]
Moving down in the services stack, the client becomes more and more responsible for security!
• The service models have similarities and differences regarding security requirements:
• SaaS – policy controls, user access to application resources
• PaaS – data security, data encryption, data regulatory issues (compliance)
• IaaS – Virtual machine security, physical and environmental controls
Cloud Computing
Service Models and Security
Cloud Computing
Cloud Security Alliance
• CSA is an organization made up of many corporate representatives
– Goal is to promote best security practices within the cloud
– Define areas for cloud architecture, governing in the cloud, operating in the cloud
Cloud Computing
CSA Critical Areas of Focus
Cloud Computing Architectural Framework
Governance & Enterprise Risk Management
Legal & Electronic Discovery
Compliance & Audit
Information Lifecycle Management
Portability & Interoperability
Traditional Security, Business Continuity & Disaster Recovery
Data Center Operations
Incident Response, Notification, and Remediation
Application Security
Encryption & Key Management
Identity & Access Management
Virtualization
• The Cloud Cube classifies clouds in 4 dimensions
• Attempts to categorize clouds in order to assure better security standards
Cloud Computing
The Cloud Cube Model
[Diagram: Cloud Cube. Axes: Proprietary / Open, Perimeterized / De-perimeterized, External / Internal. Other labels: Data, Software, Users, Mgmt]
• Identity Access Management focuses on how users may access account resources
• Most vendors provide a proprietary interface to achieve this:
• Google - Google Provisioning API for App Engine/Apps
• AWS - uses their IAM service
• Microsoft - uses Windows Identity Foundation (WIF) API with MS Forefront
• OpenStack - uses OpenStack Identity (Keystone)
Cloud Computing
Identity Access APIs
• Keystone roles:
• Provide user management
• Keep track of what users can do
• Service Catalog
• Provide a catalog of available services and their endpoints
OpenStack Identity (Keystone)
keystone user-list
keystone user-create --name sally --pass sally --email s@...
keystone tenant-create --name AWCTenant
keystone role-create --name standard_user
keystone user-role-add --user sally --role standard_user
keystone service-list
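The relationship between users, tenants, roles, and the service catalog can be pictured with a toy in-memory model. This is only a sketch for discussion: it is not Keystone's API, and every function name, dictionary, and endpoint host below is invented for illustration.

```python
# Toy in-memory model of the ideas above. This is NOT Keystone's API;
# every name below is invented for illustration.
users = {}
tenants = set()
roles = set()
assignments = set()  # (user, role, tenant) triples

# Service catalog: service type -> endpoint URL (placeholder host).
service_catalog = {
    "identity": "http://controller.example.com:5000/v2.0",
    "compute": "http://controller.example.com:8774/v2",
}


def user_role_add(user, role, tenant):
    """Grant a role to a user within a tenant, refusing unknown names."""
    if user not in users or role not in roles or tenant not in tenants:
        raise KeyError("user, role, and tenant must be created first")
    assignments.add((user, role, tenant))


def user_has_role(user, role, tenant):
    """Answer 'what can this user do?' for one role in one tenant."""
    return (user, role, tenant) in assignments


# Same order as the CLI session: create user, tenant, role, then assign.
users["sally"] = {"enabled": True}
tenants.add("AWCTenant")
roles.add("standard_user")
user_role_add("sally", "standard_user", "AWCTenant")

print(user_has_role("sally", "standard_user", "AWCTenant"))
print(user_has_role("sally", "admin", "AWCTenant"))
print(service_catalog["identity"])
```

The point of the model is that a role only has meaning as a (user, role, tenant) triple, and that clients find other services by asking the catalog rather than hard-coding endpoints.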
• AWS provides several ways for clients to manage and limit access to account resources:
• AWS IAM (Identity and Access Management)
• Free service supports assigning individual usernames/passwords, access keys, MFA devices, temporary security credentials
• Provides complete API for programmatic security
• Key management, policy management, group management
• Multi-factor authentication (MFA)
• AWS IAM allows clients to grant credentials to individuals or groups
AWS Account Security
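To make the policy management bullet concrete: an IAM policy is a JSON document made of statements. The sketch below builds one in Python and checks requests against it with a deliberately simplified helper. The bucket name and the `is_allowed` function are assumptions for illustration; AWS's real evaluation logic (multiple policies, conditions, principals) is considerably richer than this.

```python
import json
from fnmatch import fnmatchcase

# A policy document in the IAM JSON policy format.
# The bucket name is a placeholder.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-bucket",
                "arn:aws:s3:::example-bucket/*",
            ],
        },
        {
            "Effect": "Deny",
            "Action": ["s3:DeleteObject"],
            "Resource": ["arn:aws:s3:::example-bucket/*"],
        },
    ],
}


def is_allowed(policy_doc, action, resource):
    """Toy check: an explicit Deny wins; otherwise a matching Allow is required."""
    allowed = False
    for stmt in policy_doc["Statement"]:
        action_match = any(fnmatchcase(action, p) for p in stmt["Action"])
        resource_match = any(fnmatchcase(resource, p) for p in stmt["Resource"])
        if action_match and resource_match:
            if stmt["Effect"] == "Deny":
                return False
            allowed = True
    return allowed  # nothing matched: denied by default


obj = "arn:aws:s3:::example-bucket/report.csv"
print(json.dumps(policy, indent=2))
print(is_allowed(policy, "s3:GetObject", obj))
print(is_allowed(policy, "s3:DeleteObject", obj))
print(is_allowed(policy, "s3:PutObject", obj))
```

Attaching a document like this to a group rather than to each individual is what lets credentials be granted to "individuals or groups" without repeating the rules.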
• Google's Provisioning API allows clients to create, update, delete user accounts, create security groups
• Uses REST-based URLs to perform operations
Google Provisioning API
// Create a user
AppsPropertyService service = new AppsPropertyService("myAppName");
GenericEntry entry = new GenericEntry();
entry.addProperty("email", "[email protected]");
entry.addProperty("password", "password");
entry.addProperty("firstName", "Bob");
entry.addProperty("lastName", "Smith");
service.insert(
    new URL("https://apps-apis.google.com/a/feeds/user/2.0/" + "mydomain"), entry);

// Retrieve all users, following the feed's "next" link page by page
URL feedUrl = new URL("https://apps-apis.google.com/a/feeds/user/2.0/" + "mydomain");
List<GenericEntry> allUsers = new ArrayList<GenericEntry>();
while (feedUrl != null) {
    GenericFeed feed = service.getFeed(feedUrl, GenericFeed.class);
    allUsers.addAll(feed.getEntries());
    feedUrl = (feed.getNextLink() == null) ? null :
        new URL(feed.getNextLink().getHref());
}

// Delete a user
service.delete(new URL("https://apps-apis.google.com/a/feeds/user/2.0/" + "mydomain" + "/" + "[email protected]"));
• Many companies will not allow corporate data to be hosted in a public cloud
• Regardless of the vendor's security promises, it is still corporate data, and full control means owning the hosting
• As companies become more comfortable with hosting, and as SLAs become stronger, hosting corporate applications and data in the cloud will become more common
• Standards need to be accepted and corporate policies need to reflect the cloud's distributed nature
Data and the Cloud
• As a step toward company control of distributed data, many companies have begun to enforce point-of-use encryption
• Information is encrypted before it is sent to a network location
• Information is decrypted locally before use
• All information being passed and stored is encrypted
• A cost is paid for all the encryption/decryption cycles
• The enforcement and acceptance of encryption allows a wider use of distributed data
Encrypt and decrypt
• Services with large user bases such as Google, Yahoo!, MSN, MySpace, Facebook, Twitter and others have all become identity platforms
– Utilizing their login mechanisms removes the need for users to keep registering for new services
– There are two types of authentication: delegated and federated
• Delegated Authentication uses the identity provider's mechanism for authentication
– Ex: Facebook, Twitter
– Attempts to bring SSO to reality
– Twitter stores usernames and passwords on behalf of other sites
o OAuth
Authentication & SSO
• Federated Authentication
– Users may use any authentication mechanism, as long as it is compatible
– Decentralized
– Allows for any identity provider to supply credentials
– OpenID is the best candidate for this implementation
• Differences between OpenID (Federated) and OAuth (Delegated)
Federated Authentication & OpenID
OpenID                     OAuth
Decentralized              Centralized
Provider may be Unknown    Provider Known
Shares Identity Only       Shares Additional Data Resources
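How Does OAuth v2.0 Work?
[Sequence diagram between User, Joe's Hardware Shop, and Facebook; steps in order:]
• User wishes to access resources
• Joe's Hardware Shop requests a temporary token
• User is redirected to Facebook login (if needed)
• User logs in with Facebook (might be logged in already)
• User is redirected to Joe's Hardware Shop
• Joe's Hardware Shop requests an access token
• Access token is provided
• Joe's Hardware Shop finds out all of your secrets
• Requested data is provided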
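The redirect-and-exchange steps can be sketched from the shop's side. This is a minimal illustration, written for Python 3, of the OAuth 2.0 authorization code grant, in which the short-lived value that is exchanged for an access token is called an authorization code. The endpoint URL, client ID, redirect URI, scope, and the code value `abc123` are all placeholders, not any real provider's values, and no network call is made.

```python
import secrets
from urllib.parse import urlencode, urlparse, parse_qs

# Placeholder values, for illustration only.
AUTHORIZE_URL = "https://idp.example.com/oauth/authorize"
CLIENT_ID = "joes-hardware-shop"
REDIRECT_URI = "https://joes-hardware.example.com/callback"


def build_authorization_url(state):
    """URL the shop redirects the user's browser to for login and consent."""
    params = {
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": "profile",
        "state": state,
    }
    return AUTHORIZE_URL + "?" + urlencode(params)


def parse_callback(callback_url, expected_state):
    """Pull the authorization code out of the redirect back to the shop."""
    query = parse_qs(urlparse(callback_url).query)
    if query.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch; possible forged request")
    return query["code"][0]


def build_token_request(code, client_secret):
    """Form fields the shop would POST to the provider's token endpoint."""
    return {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": REDIRECT_URI,
        "client_id": CLIENT_ID,
        "client_secret": client_secret,
    }


state = secrets.token_urlsafe(16)
auth_url = build_authorization_url(state)
# Simulate the provider redirecting the user back after login.
callback = REDIRECT_URI + "?" + urlencode({"code": "abc123", "state": state})
code = parse_callback(callback, state)
token_request = build_token_request(code, "not-a-real-secret")
print(auth_url)
print(token_request["grant_type"])
```

Note that the user's password never reaches Joe's Hardware Shop: the shop only ever sees the code and, later, the access token.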
• A number of threats specific to virtualization technologies have been identified:
• Blind spots
• Inter-VM Attacks
• Trust Levels
• Instant-on Gaps
Virtualization Threats
• Blind Spots
• Inability to "see" communications between VMs because the traffic stays within the software layer
• Inter-VM Attacks
• An attacker who compromises a VM breaks out of its isolation and attacks the hypervisor ("hyperjacking")
• Hypervisor can then be used to attack other VMs
Virtualization Threats
[Figure: four VMs on one hypervisor, with the blind spot and inter-VM attack paths marked]
• Varying Trust Levels
• Some servers host apps within VMs that contain mission-critical data, while others host non-mission-critical data
• Instant-on Gaps
• Clouds allow for the provisioning / de-provisioning of VMs
• VMs may lie dormant for long periods
• These VMs may become "out-of-date" with respect to security updates
Virtualization Threats
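One way to reason about instant-on gaps is to flag dormant VMs whose last security update is older than some threshold before they are started again. The sketch below assumes a hypothetical inventory format; the field names, VM names, dates, and the 30-day threshold are all invented for illustration, and a real deployment would pull this data from its own provisioning system.

```python
from datetime import date, timedelta

# Hypothetical inventory; field names and values are invented for this sketch.
inventory = [
    {"name": "web-01", "state": "running", "last_patched": date(2013, 5, 28)},
    {"name": "batch-07", "state": "stopped", "last_patched": date(2012, 11, 2)},
    {"name": "db-02", "state": "stopped", "last_patched": date(2013, 5, 15)},
]


def stale_dormant_vms(vms, today, max_age_days=30):
    """Names of stopped VMs whose last security update predates the cutoff."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        vm["name"]
        for vm in vms
        if vm["state"] == "stopped" and vm["last_patched"] < cutoff
    ]


needs_patching = stale_dormant_vms(inventory, today=date(2013, 6, 1))
print(needs_patching)
```

A check like this would typically run before a dormant VM is brought back online, so that it can be patched in isolation first.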
• Due to the diversity of the services offered, securing of PaaS & SaaS environments is difficult
• Some companies offer solutions for aggregating SaaS services through a proxy
Securing PaaS and SaaS Solutions
• Compliance comes down to who can view corporate data
• True compliance requires full control of data
• Google has fired employees for viewing Google App Engine client data
• Amazon assures clients that only a few necessary personnel can view the user organization's data
• Compliance standards:
– Statement on Auditing Standards (SAS 70)
– Payment Card Industry Data Security Standard (PCI DSS)
– Health Insurance Portability and Accountability Act (HIPAA)
Tackling Compliance
• The top security concerns include the lack of proper identity management controls
• AWS IAM and other identity management services provide access to automated user provisioning capabilities
• Numerous virtualization threats pose potential problems with the VM / hypervisor model, but VMs can be self-protected
Summary
Course Summary
• The State of the Cloud
• Implementing IaaS
• The PaaS Model
• Providing SaaS Solutions
• Security, Standards, and Governance
• Key Players (Cloud Providers)
• Cloud Best Practices
• Implementing Elasticity
• Improving Availability / Reliability
• Providing Failover
• Orchestration Techniques
• Automating Scalability
• OpenStack
• PaaS Subcategories
• Using GAE
What Did We Learn?
• Deploying into PaaS Environments
• Open PaaS Providers
• Trends in SaaS
• Securing IaaS, PaaS, SaaS services
• Evolving Authentication Techniques
• Virtualization Security Threats
• Identity Access Mgt
Reference Sources
• Please take the time to fill out an evaluation
• All evaluations are read and considered
Evaluations
Questions
Appendix A: Python Supplemental
Introducing Python
• High-level programming language
• Supports functional programming
• Can be object-oriented
• Automatic memory management
• Dynamic typing
• For this reason, it is often referred to as a scripting language, much like Ruby, JavaScript, Perl, Tcl
Python Versions
Version  Date  Comments
0.9      1991  Pre-1.0 release
1.3      1995
1.5      1999
1.6      2000
2.0      2000  Unicode support, list comprehensions
2.1      2001  New nested function scoping rules, warnings added
2.2      2001  Declare classes as subclasses, super() added, new rules for multiple inheritance
2.3      2002  Set class added, generators added
2.4      2004  Decorators added
2.5      2006  Conditional expressions, try/except/finally combo, with statement added
2.6      2008  print(), more string formatting methods
2.7      2010  Last 2.x release, several 3.0 backported features
3.0      2008  Not 2.x backward compatible, many updates
3.1      2009
Executing a Python Script
• There are 4 ways to execute Python scripts. We'll explore each of these:
• From within the Python shell
• From the OS command-line
• As a shell script file or by double-clicking
• From within an IDE or interactive environment
How Scripts Run Under the Hood
• Python scripts are not compiled to native machine code
• Scripts are translated into byte code
• This is so they will execute faster than raw source files
• Byte code files end with .pyc extensions
• The PVM will run .pyc files if they exist, otherwise it will create the byte code and execute it at runtime
[Figure: python translates myscript.py into myscript.pyc, which the PVM executes]
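The translation step can be observed directly from inside Python. The sketch below, written for Python 3, compiles a one-line source string into a code object, disassembles it, and then executes it; the file name passed to `compile()` is only a label and no file is read or written.

```python
import dis

source = "total = 1 + 2\n"

# Translate source text into a code object holding byte code.
# "myscript.py" is just a label used in tracebacks.
code_obj = compile(source, "myscript.py", "exec")

print(type(code_obj.co_code))  # the raw byte code
dis.dis(code_obj)              # human-readable listing of the instructions

# Hand the byte code to the interpreter to run.
namespace = {}
exec(code_obj, namespace)
print(namespace["total"])
```

The exact instruction listing printed by `dis` varies between Python versions, which is one reason byte code files are not portable across them.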
Built-In Types
Value      Description
int        integer
long       long integer
float      floating point
complex    complex numbers
bool       Booleans
str        strings
object     objects
function   functions
list       List sequences (arrays)
tuple      Tuple sequences (fixed arrays)
dict       Dictionaries (hashes)
file       File objects
This is a list of some of the built-in types in Python
Data Structures
• The primary types of Python data structures are:
• Sequences
– Strings
– Lists
– Tuples
• Dictionaries
Lists
• Lists are ordered sequences of objects
• Duplicates are allowed
my_list = []                  # empty list
my_list = [1, 3, 5]
my_list = [3.3, 'hello', Person()]
my_list = [3.3, 'hello', Person(), 3.3, Person()]
my_list = list('hello')       # general form: someList = list(sequence)
my_list = list()              # empty list
List Manipulation
• Lists can be concatenated
new_list = my_list + [1, 2, 3]    # makes a new list
• Lists can be appended or inserted
my_list.append('new value')       # added to end of list
my_list = [1, 2, 3]
my_list.insert(1, 'hello')        # [1, 'hello', 2, 3]
• Access lists using index notation:
print my_list[0]                  # 1
Functions
• Functions are commonly used within Python
• Additional features are introduced in chapter 3
• Functions are defined as follows
def funcName(arg0, arg1, arg2, ..., argN):
    statements
    return value
• List of parameters must be supplied, or () if none
• Return values are optional. A value of 'None' is returned when a return statement is omitted
• Function statements must be indented
Functions
• Functions must be defined before they can be called
def displayResults(customer, purchase_amount):
    # %.2f formats the amount with two decimal places
    print 'Customer: %s, amount: $%.2f' % (
        customer['surname'], purchase_amount)

displayResults({'surname' : 'Smith'}, 108.2)
Output: Customer: Smith, amount: $108.20
Modules
Modules are namespaces in Python
Physically, each .py file represents a module
Functions, variables, classes declared at the top of a module can be made available to other modules
These attributes can't be used until they are imported
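The import rule can be seen with a standard library module. The sketch below is written so it also runs on Python 3; it first tries to use `math` before importing it, then imports the whole module, and finally imports a single attribute under a local alias. The variable names are chosen only for this illustration.

```python
# Using a module's attributes before importing it fails with NameError.
try:
    math.sqrt(16)
    used_before_import = True
except NameError:
    used_before_import = False

# Import the whole module: its attributes are reached through the module name.
import math
root = math.sqrt(16)

# Import one attribute directly into this module's namespace, under an alias.
from math import pi as circle_constant

print(used_before_import)
print(root)
print(circle_constant == math.pi)
```

Because each module is its own namespace, `math.sqrt` and a function named `sqrt` defined in your own script can coexist without clashing.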
Object-oriented Python
• Python features many object-oriented capabilities including inheritance, constructors, overriding, encapsulation
class Person(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def display(self):
        print '%s is %i' % (self.name, self.age)

p1 = Person("Bob", 37)
p1.display()
print type(p1)
Constructors
• __init__() acts as the class constructor
• __del__() acts as a destructor, but destructors aren't commonly used
class Person:
    # self is not automatically received; in Python, it must be explicitly provided
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __str__(self):
        # Without self here, a local and then global name will be sought
        return self.name + " " + str(self.age)

p = Person('Bob', 37)    # self is implicitly passed
Overloading Constructors
• No way to overload constructors in Python
• Can implement type checking if needed
class Person:
    def __init__(self, name='', age=0):
        self.age = age
        if isinstance(name, str):
            self.name = name
        elif isinstance(name, dict):
            self.name = name['name']
            self.age = name['age']

    def __str__(self):
        return self.name + " " + str(self.age)

p1 = Person()
p2 = Person('John')
p3 = Person('Jim', 33)
p4 = Person({'name': 'Sally', 'age' : 43})
Inheritance
• The format for inheritance in Python is:
class Subclass(Superclass):
So, an Employee can inherit from Person as follows:
class Employee(Person):
    def __init__(self, name, age, salary, dept):
        Person.__init__(self, name, age)
        self.salary = salary
        self.dept = dept

    def __str__(self):
        # Parentheses allow the expression to continue on the next line
        return (Person.__str__(self) +
                ' {0} {1}'.format(self.salary, self.dept))

e1 = Employee('Sally', 43, 75000.00, 'HR')