application optimized performance: choosing the right instance (cpn212) | aws re:invent 2013

Delivering Compelling Experiences: Choosing the Right Instance for Application Optimized Performance

Jason Waxman, GM & VP Cloud Platforms Group, Intel Corporation

November 14, 2013

Compelling Experiences drive Growth

Perceptual: Natural Interaction (Voice & Gestures, Personal assistant)

• 20X growth in speech driven mobile network traffic¹

• >22X in smartphones with gesture recognition features¹

Pervasive: Video/Media Content Delivery, Video Search

• 16X in mobile video traffic²

• 4X in servers for media/graphics³

Personal: Predictive Analytics, Improve healthcare

• 43% CAGR Big Data & Analytics Infrastructure⁴

The High Costs of a Bad Experience

On average, consumers tell 9 people about good experiences… and 16 about bad experiences¹

1. http://www.digitalservicecloud.com/resources/blog/good-customer-service.html

Delivering Compelling Experiences

What’s Required?

Diverse Needs, Common Themes

              CONSUMER EXPECTATIONS                    DEVELOPER REQUIREMENTS
AGILITY       Personal & Customized                    Fast Service Delivery; Elasticity/Scalability
RELIABILITY   Service Availability; Privacy/Security   Stable, Consistent; Privacy/Security
EFFICIENCY    Cost Effective                           Cost to Serve - Services & APIs reduce headcount required

Delivering the best experience

SCALE: TAKE ADVANTAGE OF AWS ELASTICITY (AGILITY)

OPTIMIZE: CHOOSE THE RIGHT INSTANCES (EFFICIENCY)

SECURE: MANAGE RISKS FOR INCREASED DURABILITY (RELIABILITY)

An Intel Company

Intel® Cloud Services Platform

Intel is using Amazon Web Services

Mashery API Management Service

• 175 Customers

• 60,000 Applications

• 215,000 Developers

• 500,000,000 API calls/day


Scaling enables responsiveness: Mashery relies on AWS elasticity

• Capacity Planning - Robust Development and QA environment to perform load tests and deploy proof-of-concept silos

• Modular Infrastructure - Loosely coupled single-purpose servers that scale horizontally

• Globally Distributed - Extend the infrastructure to where your customers are…every AWS Region, reaching every corner of the globe.

From 100 queries/sec to 100,000 queries/sec in a matter of minutes

100X NEWS VOLUMES FOR SOME CUSTOMERS AT PASSING OF CELEBRITY IN EARLY 2012

Kevin Baillie, CEO, Atomic Fiction

Company Overview

• Atomic Fiction crafts high-end visual effects (VFX) for film and television.

• Specialties include digital environments and character work

• Staff scales with projects, varying between 15 and 50 artists

• Known for high end work, medium volume, low cost

• Big company infrastructure, small company vibe

• Developing innovative approaches to reducing technological costs in order to free up resources for experienced artistic talent.

Our AWS Story

• Pixar-sized render farm in minutes. iMac-sized the next.

• Only pay for what we use

• No physical limits on computing = no physical limits on creativity

• Seamless experience through tools like ZYNC

• Decreased load, thus increased performance, on local filesystem

• Same price for faster artist turnaround

• 100 computers for 10 hours = $2,200

• 1000 computers for 1 hour = $2,200
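The two price points above follow from paying by the instance-hour: the bill depends on the product of machines and hours, not on how that product is split. A minimal sketch of the arithmetic, assuming a flat rate. The $2.20 per instance-hour is inferred from the slide's own figures ($2,200 for 1,000 instance-hours), not stated on it, and the function name is illustrative.

```python
def render_cost(instances: int, hours: int, rate_per_instance_hour: float) -> float:
    """Cost of a render job billed per instance-hour at a flat rate."""
    return instances * hours * rate_per_instance_hour


# Rate implied by the slide: $2,200 for 100 machines x 10 hours (an inference).
RATE = 2200 / (100 * 10)

narrow = render_cost(100, 10, RATE)  # 100 computers for 10 hours
wide = render_cost(1000, 1, RATE)    # 1000 computers for 1 hour

print(f"narrow: ${narrow:,.0f}  wide: ${wide:,.0f}")
```

Both calls cover the same 1,000 instance-hours, so the wide job costs the same and the artist gets the result ten times sooner.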

Problems we’re trying to solve

• The problem: how do we “unlimit” creativity while staying profitable?

• Freeing the creative so that directors can achieve their vision

• Artists need iterations in order to hit “the look” and stay on schedule

• Need security, task-appropriate stats, and unlimited availability on demand

• Choosing the right instance: speed vs memory vs cost

• c1.xlarge – 20 compute units, 7GB RAM

• Low cost per hour for lightweight compositing tasks

• cc2.8xlarge – 88 compute units, 60.5GB RAM

• Beefy RAM, good $ per compute unit cost proposition

• Working with our partners at ZYNC, implementation was plug & play!
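The speed vs memory vs cost trade-off above can be made concrete with two ratios per instance type: RAM per compute unit and dollars per compute-unit-hour. A small sketch, in which the compute units and RAM come from the slide but the hourly prices are hypothetical placeholders (the slide gives none) and the class and method names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    compute_units: float  # EC2 compute units, as quoted on the slide
    ram_gb: float         # RAM in GB, as quoted on the slide

    def ram_per_compute_unit(self) -> float:
        return self.ram_gb / self.compute_units

    def dollars_per_compute_unit_hour(self, price_per_hour: float) -> float:
        return price_per_hour / self.compute_units


C1_XLARGE = InstanceType("c1.xlarge", 20, 7.0)
CC2_8XLARGE = InstanceType("cc2.8xlarge", 88, 60.5)

# Hypothetical hourly prices for illustration only; substitute real rates.
EXAMPLE_PRICES = {"c1.xlarge": 0.60, "cc2.8xlarge": 2.40}

for inst in (C1_XLARGE, CC2_8XLARGE):
    price = EXAMPLE_PRICES[inst.name]
    print(
        inst.name,
        f"{inst.ram_per_compute_unit():.3f} GB/unit",
        f"${inst.dollars_per_compute_unit_hour(price):.4f}/unit-hour",
    )
```

With the slide's figures, cc2.8xlarge offers roughly twice the RAM per compute unit of c1.xlarge, which fits the "beefy RAM" description. Which type wins on cost depends entirely on the prices plugged in.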

Star Trek Into Darkness

Key Learnings/What’s Next?

• For Star Trek Into Darkness, we achieved exactly what JJ Abrams wanted

• Key findings:

• For over 80% of tasks, cc2.8xlarge was fastest & most cost effective

• Given higher per-hour costs, efficient utilization of high end instances is critical. Partial hours = wasted money!

• Burstability was critical for hitting deadlines. Ran between 0 and 400 instances simultaneously depending on the needs of the moment.

• Grew over 200% month-over-month two months in a row

• Our ideal instance would be inexpensive, with high compute power (Intel Xeon E5-2600 v2), medium memory (32-48GB) and an NVIDIA GPU.

• Next for us: moving even more of our workflow into the cloud!
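The "partial hours = wasted money" finding above can be sketched as follows. It assumes every started instance-hour is billed in full, which is what the slide implies but does not spell out. The function names are illustrative.

```python
import math


def billed_hours(runtime_hours: float) -> int:
    """Hours charged when each started hour is billed in full (assumed model)."""
    return math.ceil(runtime_hours)


def wasted_fraction(runtime_hours: float) -> float:
    """Share of the billed time during which the instance did no work."""
    billed = billed_hours(runtime_hours)
    return (billed - runtime_hours) / billed if billed else 0.0


# A single 10-minute task on its own instance pays for 50 idle minutes.
short_task = wasted_fraction(10 / 60)

# Six such tasks run back to back on one instance fill the billed hour.
packed = wasted_fraction(6 * 10 / 60)

print(f"single task wasted: {short_task:.0%}, packed wasted: {packed:.0%}")
```

Under this model, the more expensive the instance, the more it pays to queue work so that each instance shuts down close to an hour boundary.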







Optimize: Choose the right Instance

[Figure: datacenter workloads and EC2 instance types placed on a chart. Axis labels: "I/O Intensive" and "CPU & Memory Intensive". Scale: "Higher latency, lower throughput" to "Lower latency, higher throughput".

Workloads shown: E-Commerce; Dedicated Hosting; Enterprise Applications; High Performance Computing; Big Data; Content Delivery and Gaming; Graphics Rendering; Cold Storage; Low End Networking; Edge Routing; Storage De-dupe; Cloud RAN; Small Cell.

Instances shown: Micro Instance; M1 Standard Instance; M3 Standard Instance (E5-2670); C1 Compute Instance; Cluster Compute Instance (E5-2670); Cluster Graphics Instance (X5570); M2 Memory Optimized; CR1 Memory Optimized (E5-2670); Storage-Optimized (E5-2650); High Memory; G2 GPU Instance (E5-2670).]

Intel® Cloud Services Platform

• 175,000 Users

• 5M users by 2014

• Entire path of production is on AWS

• 1448 instances…

Our Wake Up Moment…

One month we spent $300K…60% of which we found later was wasted…

• We were spinning up instances and forgetting they were on

• We had larger instances than we actually needed

• Most instances never went over 10% utilization…

108hrs PER WEEK OF UNUTILIZED INSTANCES

$100K OF SAVINGS BY TURNING OFF NON PRODUCTION INSTANCES AFTER HOURS

>60% TOTAL SAVINGS
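The figures above can be checked with simple arithmetic. One schedule that reproduces the 108 unutilized hours per week is running non-production instances 12 hours a day, Monday to Friday. The slide does not say which schedule was used, so the 12x5 pattern is an assumption for illustration.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def off_hours_per_week(on_hours_per_day: int, on_days_per_week: int) -> int:
    """Hours per week an instance can be switched off under a fixed schedule."""
    return HOURS_PER_WEEK - on_hours_per_day * on_days_per_week


# Assumed schedule (not from the slide): 12 hours a day, 5 days a week.
idle = off_hours_per_week(12, 5)
idle_share = idle / HOURS_PER_WEEK

# From the slide: a $300K month, 60% of which was later found to be wasted.
wasted_dollars = 300_000 * 60 // 100

print(f"idle hours/week: {idle} ({idle_share:.0%}), wasted that month: ${wasted_dollars:,}")
```

Under that schedule an always-on non-production instance sits unused for nearly two thirds of the week.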

Optimize for Efficiency: Select the Right Instance

Keys to Success

• Analyze: “Trusted Advisor”

• Select the right type of instance for your workload

• Size & Features

• # of Instances

• Reserve instances where possible for cost efficiency

Steve Litster, PhD. Global Head of Scientific Computing

Novartis Institutes for Biomedical Research

Accelerating Science

Novartis Institutes for BioMedical Research (NIBR)

Unique research strategy driven by patient needs

World-class research organization with about 6000 scientists globally

Intensifying focus on molecular pathways shared by various diseases

Integration of clinical insights with mechanistic understanding of disease

Research-to-Development transition redefined through fast and rigorous “proof-of-concept” trials

Strategic alliances with academia and biotech strengthen preclinical pipeline

Requirements

Large Scale Computational Chemistry Simulation

Results in under a week

Flexible target

Ability to run multiple experiments “on-demand”

Challenges

Sustained access to 50000+ compute cores

Ability to monitor and re-launch jobs

No additional Capital Expenditure

Internal HPCC already running at capacity

Job Profile

Embarrassingly Parallel

CPU Bound

Low I/O, Memory and Network requirements
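Because the jobs are embarrassingly parallel, each one can be run, monitored, and relaunched without touching the others. The loop below is a minimal single-process sketch of that monitor-and-relaunch idea, not the tooling used in the talk. The function names and the simulated failure are hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def run_with_relaunch(
    job_ids: Iterable[int],
    run_job: Callable[[int], float],
    max_attempts: int = 3,
) -> Dict[int, float]:
    """Run independent jobs, relaunching any that raise, up to max_attempts each."""
    results: Dict[int, float] = {}
    pending: List[int] = list(job_ids)
    for _attempt in range(max_attempts):
        failed: List[int] = []
        for job_id in pending:
            try:
                results[job_id] = run_job(job_id)
            except RuntimeError:
                failed.append(job_id)  # monitor: remember what needs a relaunch
        pending = failed
        if not pending:
            break
    if pending:
        raise RuntimeError(f"jobs still failing after {max_attempts} attempts: {pending}")
    return results


# Stand-in for a real job: fails the first time it sees an odd id.
_seen = set()


def fake_docking_job(job_id: int) -> float:
    if job_id % 2 and job_id not in _seen:
        _seen.add(job_id)
        raise RuntimeError("simulated instance loss")
    return job_id * 0.5


scores = run_with_relaunch(range(6), fake_docking_job)
print(sorted(scores.items()))
```

Since no job depends on another, the inner loop could be spread across any number of workers without changing the retry logic.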

Accelerating the Science

Virtual Screening

[Figure: a target molecule with its binding site (the "lock") and candidate compound molecules (the "keys").]

The Cloud: Flexible Science on Flexible Infrastructure

Engineering the right infrastructure for a workload:

Software runs the same job many times across instance types

Measures the throughput and determines the $ per job

Use the instances that provide the best scientific ROI

CC2 instance (Intel Xeon® ‘Sandy Bridge’) ran best for this
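The selection procedure above (run the same job on each instance type, measure throughput, compute $ per job, pick the cheapest) can be sketched as below. The instance names, throughputs, and prices are hypothetical placeholders, not measurements from the talk.

```python
from typing import Dict, NamedTuple


class Benchmark(NamedTuple):
    jobs_per_hour: float   # measured throughput on one instance
    price_per_hour: float  # hourly price of that instance


def dollars_per_job(b: Benchmark) -> float:
    return b.price_per_hour / b.jobs_per_hour


def best_instance(results: Dict[str, Benchmark]) -> str:
    """Return the instance type with the lowest cost per completed job."""
    return min(results, key=lambda name: dollars_per_job(results[name]))


# Hypothetical measurements for illustration only.
measured = {
    "type-a": Benchmark(jobs_per_hour=40.0, price_per_hour=0.50),
    "type-b": Benchmark(jobs_per_hour=400.0, price_per_hour=2.40),
    "type-c": Benchmark(jobs_per_hour=90.0, price_per_hour=1.00),
}

choice = best_instance(measured)
for name, bench in measured.items():
    print(f"{name}: ${dollars_per_job(bench):.4f} per job")
print("best:", choice)
```

In this made-up data the instance with the highest hourly price is also the cheapest per job, the same pattern the talk reports for CC2.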

Metric                      Count
Compute Hours of Science    341,700 hours
Compute Days of Science     14,238 days
Compute Years of Science    39 years
AWS Instance Count-CC2      10,600 instances

Super Computing in the Cloud

$44 Million infrastructure

10 million compounds screened

39 Drug Design years in 11 hours for a cost of …$4,232

3 promising compounds identified
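The headline numbers can be cross-checked with a few lines of arithmetic. This sketch assumes 365-day years and that the $4,232 covers all 341,700 compute hours and all 10 million compounds. The per-hour and per-compound costs are derived here, not quoted from the slides.

```python
compute_hours = 341_700   # from the slide
cost_usd = 4_232          # from the slide
compounds = 10_000_000    # from the slide

compute_days = compute_hours / 24
compute_years = compute_days / 365  # assumes 365-day years

usd_per_compute_hour = cost_usd / compute_hours
usd_per_compound = cost_usd / compounds

print(f"{compute_days:,.1f} days, {compute_years:.1f} years")
print(f"${usd_per_compute_hour:.4f} per compute hour, ${usd_per_compound:.6f} per compound")
```

The conversion matches the slide's 14,238 days and 39 years after rounding, and works out to a little over one cent per compute hour.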

Key Learnings/What’s Next?

Diversity of Life Sciences brings unique challenges

Spend the time analyzing and tuning

Flexibility, Scalability and Performance

Time to rethink and retool

Challenge the Science and the Scientist

Collaboration

Future plans

Chemical Universe : 166 Billion cpds ≤ 17 atoms (Extreme scale CPU)

Next Generation Sequencing in the Cloud (Extreme CPU, Mem, I/O)

“Disruptive” Technologies-Imaging (x10 NGS requirements!)







Optimizing for Security Performance: higher performance saves cost

Intel Internal Benchmark: OpenSSL Performance

Instance Requirements for 400 Mbps

Instance type           Instances                          Cost
m1.xlarge               4 (note 2)                         $10K/year
m3.xlarge (w/AES-NI)    1 required, 2 deployed (note 1)    $5.6K/year

1 - Not required but added for redundancy

2 - Requirement is 3.2, but you can’t buy .2, so round up to 4

SAVINGS: 50-75% by upgrading from m1.xlarge to the more expensive m3.xlarge, because of AES-NI
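The sizing logic behind the footnotes is: round fractional demand up to whole instances, then add any spares. The sketch below uses one reading of the chart, namely 4 m1.xlarge at $10K/year in total against 2 m3.xlarge (1 required plus 1 spare) at $5.6K/year in total. That reading is an interpretation of a garbled slide, and the function name is illustrative.

```python
import math


def instances_needed(required: float, redundancy: int = 0) -> int:
    """Whole instances to buy: round fractional demand up, then add spares."""
    return math.ceil(required) + redundancy


m1_count = instances_needed(3.2)                # footnote 2: 3.2 rounds up to 4
m3_count = instances_needed(1.0, redundancy=1)  # footnote 1: one spare added

# Annual totals as read from the chart (an interpretation, see above).
m1_total, m3_total = 10_000, 5_600

savings_with_spare = 1 - m3_total / m1_total
savings_without_spare = 1 - (m3_total / m3_count) / m1_total

print(m1_count, m3_count)
print(f"savings: {savings_with_spare:.0%} with the spare, {savings_without_spare:.0%} without")
```

On this reading each m3.xlarge costs more per year than each m1.xlarge, yet the total drops because far fewer are needed. The two results land near the ends of the slide's rounded 50-75% range.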

NASDAQ OMX

WE LIST ~3300 GLOBAL COMPANIES WORTH $6 TRILLION IN MARKET CAP REPRESENTING DIVERSE INDUSTRIES AND MANY OF THE WORLD’S MOST WELL-KNOWN AND INNOVATIVE BRANDS

OUR TECHNOLOGY IS USED TO POWER MORE THAN 70 MARKETPLACES IN 50 COUNTRIES

OUR GLOBAL PLATFORM CAN HANDLE MORE THAN 1 MILLION MESSAGES/SECOND AT A MEDIAN SPEED OF SUB-55 MICROSECONDS

FinQloud R3 (Regulatory Record Retention)

Security

Elastic Durable/Available

Cost Effective Transparent

POWERED BY AMAZON WEB SERVICES

Cloud Computing Platform

Exclusively for Financial Services

NASDAQ OMX Security Protocols

• Data Classification: Define and enforce what is and is not approved for the Cloud

• Encryption: at all times (in flight and at rest); SSL for all data

• Audit: Any action someone does in R3 is audited and fully transparent to system admins and regulators

• AWS Built In Security: IAM, MFA, VPC, Direct Connect private circuits, routing/firewalls, etc.

* Highly confidential data must be encrypted and the keys must be stored in HSMs

Technology: What’s Next

Intel® Xeon® Processor

E5-2600 v2 Family

Software Defined Infrastructure

Rack Scale Architecture

Diversity of Datacenter Workloads

[Figure: datacenter workloads and EC2 instance types placed on a chart. Axis labels: "I/O Intensive" and "CPU & Memory Intensive". Scale: "Higher latency, lower throughput" to "Lower latency, higher throughput".

Workloads shown: E-Commerce; Dedicated Hosting; Enterprise Applications; High Performance Computing; Big Data; Content Delivery and Gaming; Graphics Rendering; Cold Storage; Low End Networking; Edge Routing; Storage De-dupe; Cloud RAN; Small Cell.

Instances shown: Micro Instance; M1 Standard Instance; M3 Standard Instance (E5-2670); C1 Compute Instance; Cluster Compute Instance (E5-2670); Cluster Graphics Instance (X5570); M2 Memory Optimized; CR1 Memory Optimized (E5-2670); Storage-Optimized (E5-2650); High Memory; G2 GPU Instance (E5-2670).

New on this chart: “New” EC2 C3 Compute Optimized w/ Latest Intel® Xeon® E5-2600v2 Processors (E5-2680v2); “New” EC2 Storage Optimized w/ Latest Intel® Xeon® E5-2600v2 Processors (E5-2670v2).]

Technology Matters

Amazon Web Services discloses instances based on Intel Xeon

New 2013 AWS C3 Compute-Optimized Instance

Powered by “NEW” Intel® Xeon® E5-2600v2 processor family

26K cores based on Intel® Xeon® processor E5-2680v2

SC’13: 484 TFLOPs*

*SC’13 Submission
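The two figures above give a rough per-core throughput. The sketch below divides them and compares the result with a theoretical per-core peak. The peak uses two assumptions that are not on the slide: a 2.8 GHz base clock and 8 double-precision FLOPs per cycle per core. "26K" is treated as exactly 26,000, so the result is approximate.

```python
rmax_tflops = 484.0  # from the slide (SC'13 submission)
cores = 26_000       # "26K" on the slide, so approximate

gflops_per_core = rmax_tflops * 1_000 / cores

# Assumptions, not from the slide: base clock and FLOPs per cycle per core.
ASSUMED_CLOCK_GHZ = 2.8
ASSUMED_FLOPS_PER_CYCLE = 8
peak_gflops_per_core = ASSUMED_CLOCK_GHZ * ASSUMED_FLOPS_PER_CYCLE

efficiency = gflops_per_core / peak_gflops_per_core

print(f"{gflops_per_core:.1f} GFLOPS per core, about {efficiency:.0%} of the assumed peak")
```

The efficiency figure is only as good as the assumed peak and the rounded core count.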

The Future of the platform: Intel Rack Scale Architecture Innovation

Orchestration: Increases Agility, Efficiency & Reliability

Platform Flexibility: Increase useful life and capacity

CPU / Mem Modules – Intel® Atom™ & Xeon®

Silicon Photonics & switch fabric: Extreme Compute and Network bandwidth

Storage – PCIe SSD & Caching: Increase utilization thru storage aggregation

Open Network Platform: Network platform – Flexible & Cost effective

The Future of Infrastructure: Intel’s Approach to Software Defined Infrastructure

EXPOSED & INTEGRATED TELEMETRY

Hardware and infrastructure attributes are exposed and integrated with orchestration software for deeper insight & optimal provisioning management

BROADEST ENABLED ECOSYSTEM

Integrated and optimized for all leading commercial and open source operating environments for more seamless Data Center operations

Rack Scale Architecture

Storage

• Scalable Intel® Atom® & Xeon® storage solutions

• SSDs with Cache Acceleration

• Lustre

• NVM/Crystal Ridge

Network

• Open Network Platforms

• Wind River OS

• DPDK

• Cave Creek

• Silicon Photonics

Compute

• Intel Xeon

• Atom® C2000

• Intel Xeon Phi

• Integrated graphics

• TXT

AMAZON WEB SERVICES (AWS)

Service Assurance Manager

PLATFORM AND ARCHITECTURAL LEADERSHIP

Standards-based compute, network and storage building blocks in Intel’s Rack Scale Architecture drive maximum infrastructure efficiency and flexibility

A world where the application defines the system


