Building a Just-in-Time Application Stack for Analysts


Upload: avere-systems

Post on 14-Feb-2017


TRANSCRIPT

Today’s Speaker

Scott Jeschonek
Director of Cloud Products
Avere Systems

Housekeeping

• Recording

• Attachments

• Questions

• Rating

Agenda

• Highlight challenges faced by today’s IT organizations, especially with analytics teams, when dealing with public clouds

• Focus for today largely on compute and data

• Discuss how to meet these challenges

• How to create a scalable compute environment in under 10 minutes

• How to leverage data both in and outside the public cloud

Public Clouds

Clouds Can Be Easy to Use

AWS EC2 Compute

Google GCP Compute

Microsoft Azure Compute

Each Cloud Offers

Clouds Can Do Many Things

• Virtual Machines/Compute

• Containers

• Storage

• Databases (various)

• Networking

• Tiered Applications

• Big Data Processing

• And more…too many to mention in a slide

Overall Benefits of Cloud, Tools and Integrators

• Cloud platform reduces fulfillment time for new resources

• Cloud platform removes permanence from resource allocation

• Cloud platform removes cost from resource allocation (CAPEX)

• Cloud platform increases capacity and flexibility

• Cloud services and tools decrease complexity and cost of ownership

In Fact It’s So Easy ….

…End users can set things up themselves.

Common End User Comments

• “I can’t wait for IT to give me resources.”

• “I don’t have to wait for IT to give me resources.”

• “There are too many requirements to use IT resources…I’ll just go to (enter public cloud name here).”

Liberating, but Still Liable

• Corporate or Institutional data

• Spending on behalf of corporation or institution equates to direct liability

• Security concerns remain, even if the environment is self-contained

• Costs can spiral out of control; budgets may not account for these spending events

Cloud - Extension of IT Resources

• Budget chargeback

• Networking(!)

• Security (of users, of data)

• Resource fulfillment

• Capacity planning (for budgets)

With the Right Tools, IT Can Make Cloud Magic

• On-demand services with automated chargeback

• Extension of existing automation capabilities

• Rapid allocation of new compute without CAPEX costs

• Significantly reduced fulfillment
  – From order, ship, unbox, rack & stack to “run automation”

Clouds Can Be Programmed

Myriad of Third Party Tools and Services

Cloud Compute Use Case Examples

• Analytical processing (either single or multi machine use cases)
  – Life Sciences Analytics / Quality Check (QC) / SNP analysis applications
  – Financial Risk Modeling
  – Rendering and Transcoding activities

• Build/Test environments

• Big Data applications such as Hadoop

• Application servers/services

• Or simply workstations on demand for temporary use
  – Example: Amazon WorkSpaces

Cloud Compute Usage Examples

[Diagrams: cloud compute usage patterns, as far as the slide labels allow them to be recovered]

• 100% cloud compute with local/SSD storage

• 100% cloud compute with local/SSD storage and cloud storage

• 100% cloud compute with local/SSD storage, reading on-premises NAS data over the WAN

• Extended compute (burst) into cloud: on-premises compute plus cloud compute with local/SSD storage, reading on-premises NAS data over the WAN

Data Considerations

• Considerations:
  – Is there a lot of data?
  – Are there multiple nodes acting on the data?
  – Is there to be a lot of writing (versus reading) of data?
  – Is the data sensitive?
  – Is there a scratch space requirement?
  – Will the data need to persist in the cloud?

Choices for Your Data

• Copy to local SSD or Persistent SSD/EBS on each node

• Locate / migrate data to object store bucket in cloud provider

• Run a file system in the compute environment and serve data as a NAS

• Use a caching layer in the compute environment and serve only requested data, leaving the data wherever it originated
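The last option, a caching layer that pulls only requested data, can be illustrated with a minimal read-through cache. This is a sketch of the general technique only, not of how Avere implements it; the class name `ReadThroughCache`, the `ORIGIN` dictionary standing in for an on-premises NAS, and the file paths are all hypothetical.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Serve files from a local cache, fetching from the origin only on a miss."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._cache: Dict[str, bytes] = {}
        self.origin_reads = 0  # counts how often the origin was actually contacted

    def read(self, path: str) -> bytes:
        if path not in self._cache:
            # Cache miss: go to the origin once and keep the result locally.
            self.origin_reads += 1
            self._cache[path] = self._fetch(path)
        return self._cache[path]


# Hypothetical stand-in for an on-premises NAS reached over the WAN.
ORIGIN = {
    "/proj/a.dat": b"alpha",
    "/proj/b.dat": b"beta",
    "/proj/c.dat": b"gamma",
}

cache = ReadThroughCache(lambda path: ORIGIN[path])
cache.read("/proj/a.dat")
cache.read("/proj/a.dat")  # second read is served from the cache
print(cache.origin_reads)
```

Only the file that was requested ever crosses the (simulated) WAN; `b.dat` and `c.dat` stay at the origin until a client asks for them.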

Avere vFXT – Caching File System in the Cloud

• Avere vFXT:
  – Highest performance
  – Scale-out NAS
  – Ideal for high core-count applications and large numbers of servers
  – Global namespace: one mount for various sources, including cloud and on-premises data
  – Scale up and down as demand requires
  – Only obtains data that has been requested by clients
  – Ideal for cloud bursting on-premises data to cloud compute
  – Scale = 10s of 1000s of cores

Avere CloudFusion: NAS-in-the-Cloud

• Avere CloudFusion
  – Single-node, low-cost caching NAS
  – Uses low-cost S3 storage as the backing store
    • Store significant data
  – Presents NFS or SMB
  – Supports multiple clients
    • For example, use it as your Amazon WorkSpaces storage
  – Use as scratch space

• Simple to configure

Advantages of a Caching Layer in Compute

• No persistent data in compute = lower cost

• Achieve high performance at low latencies

• Maintain data security by leaving it on-premises

• Abstract data sources between on-premises and cloud for a single file system experience

• Reduce complexity of compute environment by avoiding rewrite of any applications

Deploy a Stack

Deployment of Application Stack

• Among the many ways, we’ll start with those provided by the cloud providers themselves

• For compute, choose:
  – A pre-configured image (AMI, VM) with all necessary software
  – Multiple pre-configured images with all necessary software
  – Pre-configured images using Puppet or other CM tool for updates
  – A container or a set of containers in a cluster

• For networking, choose:
  – A configured VPN (for internet-based connectivity)
  – Cloud Provider peering connections
  – Direct connectivity through companies like Equinix
  – Security Group / Firewall / route configurations

Deployment of Application Stack (continued)

• For security, choose:
  – IAM in the public cloud
  – Service accounts / roles to restrict what the compute nodes can access

• For data, choose:
  – A caching / file system application
  – Program to copy / move data to the local nodes, triggered as part of the stack creation

The 10-Minute Stack

• AWS: CloudFormation Template (JSON / REST)

• Google Launcher / Deployment Manager Templates (YAML, Python)

• Microsoft Azure Resource Manager (JSON / REST)

Each provider offers extensive examples on its own site.

For AWS, wrappers such as Terraform and Troposphere reduce the complexity.

What You’ll Need

• Command-line tools (aws cli, gcloud, powershell)

• Text editor / code editor

• A Project / VPC / Network in the respective cloud
  – Assume that you will create multiple stacks but within an existing infrastructure framework
  – Use the commands and python/etc. to validate the network and security environments

• Image (AMI/Virtual Machine) or configuration management (e.g., Chef) for application image creation

• File System capability…we’ll use Avere
  – You’ll need python coding for this piece

AWS CloudFormation Templates
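The template shown on this slide did not survive in the transcript. As a stand-in, here is a minimal sketch of generating a CloudFormation template from Python, in the spirit of the wrapper tools mentioned earlier. The function name `build_template`, the logical IDs `AppNode0`/`AppNode1`, the placeholder AMI ID `ami-00000000`, and the instance type are illustrative assumptions, not values from the talk.

```python
import json


def build_template(image_id: str, instance_type: str, node_count: int) -> dict:
    """Return a CloudFormation template describing `node_count` EC2 instances."""
    resources = {}
    for i in range(node_count):
        # Logical IDs must be alphanumeric, hence "AppNode0", "AppNode1", ...
        resources["AppNode{}".format(i)] = {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": image_id,
                "InstanceType": instance_type,
                # The subnet is supplied at stack creation time as a parameter.
                "SubnetId": {"Ref": "SubnetId"},
            },
        }
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Just-in-time analytics compute nodes (illustrative sketch)",
        "Parameters": {"SubnetId": {"Type": "AWS::EC2::Subnet::Id"}},
        "Resources": resources,
    }


# "ami-00000000" is a placeholder, not a real image.
template = build_template("ami-00000000", "c4.large", node_count=2)
template_json = json.dumps(template, indent=2)
print(template_json)
```

Saved to a file such as `stack.json`, the output could then be passed to `aws cloudformation create-stack --stack-name <name> --template-body file://stack.json` together with a `--parameters` value for `SubnetId`.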

Google Deployment Manager

resources:
- name: vm-instance
  type: compute.v1.instance
  properties:
    disks:
    - deviceName: boot
      type: PERSISTENT
      boot: true
      autoDelete: true
      initializeParams:
        sourceImage: https://www.googleapis.com/compute/v1/projects/debian-cloud/global/images/debian-7-wheezy-v20150526
    machineType: https://www.googleapis.com/compute/v1/projects/myproject/zones/us-central1-f/machineTypes/f1-micro
    networkInterfaces:
    - network: $(ref.a-new-network.selfLink)
      accessConfigs:
      - name: External NAT
        type: ONE_TO_ONE_NAT
    zone: us-central1-f
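Deployment Manager also accepts Python templates, which expose a `GenerateConfig(context)` entry point and return the same resource structure as the YAML. The sketch below is one possible parameterized version of the configuration above; the property names `zone`, `machineType`, `sourceImage`, and `network` are assumptions chosen for this example, and the `fake_context` object exists only so the sketch can run locally without Deployment Manager.

```python
from types import SimpleNamespace

COMPUTE_URL = "https://www.googleapis.com/compute/v1/projects/"


def GenerateConfig(context):
    """Deployment Manager entry point: return the resources for one VM."""
    project = context.env["project"]
    zone = context.properties["zone"]
    machine_type_url = "{}{}/zones/{}/machineTypes/{}".format(
        COMPUTE_URL, project, zone, context.properties["machineType"]
    )
    return {
        "resources": [{
            "name": context.env["name"] + "-vm",
            "type": "compute.v1.instance",
            "properties": {
                "zone": zone,
                "machineType": machine_type_url,
                "disks": [{
                    "deviceName": "boot",
                    "type": "PERSISTENT",
                    "boot": True,
                    "autoDelete": True,
                    "initializeParams": {
                        "sourceImage": context.properties["sourceImage"],
                    },
                }],
                "networkInterfaces": [{
                    "network": context.properties["network"],
                    "accessConfigs": [
                        {"name": "External NAT", "type": "ONE_TO_ONE_NAT"},
                    ],
                }],
            },
        }]
    }


# Local stand-in for the context object that Deployment Manager passes in.
fake_context = SimpleNamespace(
    env={"project": "myproject", "name": "analytics"},
    properties={
        "zone": "us-central1-f",
        "machineType": "f1-micro",
        "sourceImage": COMPUTE_URL
        + "debian-cloud/global/images/debian-7-wheezy-v20150526",
        "network": "$(ref.a-new-network.selfLink)",
    },
)
config = GenerateConfig(fake_context)
print(config["resources"][0]["properties"]["machineType"])
```

Moving the zone, machine type, and image into properties lets one template serve several stacks, which matches the assumption of multiple stacks inside an existing infrastructure framework.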

What Will You Create with the Templates?

• All of the necessary security (if it does not already exist)
  – For example, if you require that your instances access object storage, then permission will need to be granted to the instance either directly (in Google’s case) or via IAM role (for AWS)

• Disks (volumes) for the machines (if using persistent)

• Network routes for new addresses or network/subnets

• Compute instances

• UserData can then be included in the templates to call extra configuration on the instances
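As an illustration of the UserData point, the sketch below renders a small boot script that mounts an NFS export on each application node and base64-encodes it, which is the form EC2 user data takes (CloudFormation's `Fn::Base64` does the same job inside a template). The function name `render_user_data`, the address `10.0.0.10`, the export `/data`, and the mount point `/mnt/vfxt` are hypothetical values for this example.

```python
import base64
import shlex


def render_user_data(nfs_address: str, export: str, mount_point: str) -> str:
    """Return a bash script that mounts `nfs_address:export` at `mount_point`."""
    source = shlex.quote("{}:{}".format(nfs_address, export))
    target = shlex.quote(mount_point)
    lines = [
        "#!/bin/bash",
        "set -euo pipefail",
        "mkdir -p {}".format(target),
        # hard + proto=tcp are standard NFS mount options; tune for your workload.
        "mount -t nfs -o hard,proto=tcp {} {}".format(source, target),
    ]
    return "\n".join(lines) + "\n"


# Hypothetical values: replace with the export address of your file server.
script = render_user_data("10.0.0.10", "/data", "/mnt/vfxt")
encoded = base64.b64encode(script.encode("utf-8")).decode("ascii")
print(script)
```

The `encoded` string is what would be supplied as the instance's user data if the script is passed through an API that expects base64 rather than through a template function.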

Deploying Avere with the Stack

• Leverage CloudFormation / Deployment Manager / Resource Manager to set up the initial nodes

• Add checks to ensure networking is configured properly
  – Cloud provider endpoint access is critical
    • GCS/S3 API endpoint for storage, EC2 or GCE endpoint for controlling IP address failover for vFXT

• Call XML-RPC library to complete configuration of
  – “Core filer” mappings
  – Client IP address configuration
  – Integration with AD or NIS
  – Configuration to on-premises NFS server
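One simple form the networking check could take is a TCP reachability test against the endpoints the stack depends on. This is a minimal sketch under that assumption; the function name `unreachable_endpoints` is hypothetical, and the demonstration connects to a throwaway local listener so that it runs without internet access.

```python
import socket
from typing import Iterable, List, Tuple

Endpoint = Tuple[str, int]


def unreachable_endpoints(
    endpoints: Iterable[Endpoint], timeout: float = 3.0
) -> List[Endpoint]:
    """Return the (host, port) pairs that did not accept a TCP connection."""
    failed = []
    for host, port in endpoints:
        try:
            # create_connection resolves the name and completes the TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            failed.append((host, port))
    return failed


# Demonstrate against a local listener so the sketch runs offline.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
open_port = listener.getsockname()[1]

local_result = unreachable_endpoints([("127.0.0.1", open_port)], timeout=1.0)
listener.close()
print(local_result)
```

In a real stack the list would hold the provider endpoints instead, for example `("s3.amazonaws.com", 443)` or `("storage.googleapis.com", 443)`, and a non-empty result would stop the deployment before the file system is configured. A successful TCP connection shows only that the route and firewall rules are in place; it does not verify credentials or IAM permissions.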

End State

[Diagram: end state, as far as the slide labels allow it to be recovered]

• Validated network (AWS: VPC; GCP: Project Network; Azure: Virtual Network)

• Avere vFXT in compute, configured with IP addresses, DNS, NTP, a mapping to the on-premises NAS, and an export for the global namespace

• On-premises NAS reached over the WAN through a NAT / Proxy / VPN / Router

• Application nodes with IAM roles applied and a mount point configured based on the Avere vFXT export addresses

Summary

• Cloud Tools abound for creating on-demand application stacks in your favorite cloud

• IT organizations can leverage these clouds and tools to maximize their customers’ capabilities and thus their satisfaction

• Leverage caching file systems running in the cloud to provide performance-based access to only relevant data, limiting the need to move large amounts of data into the cloud temporarily

Avere Systems
Scott Jeschonek

[email protected]

Thank you!