(bdt201) big data and hpc state of the union | aws re:invent 2014

November 12, 2014 | Las Vegas, NV Ben Butler, Sr. Solutions Marketing Mgr., Big Data and HPC

DESCRIPTION

Leveraging big data and high performance computing (HPC) solutions enables your organization to make smarter and faster decisions that influence strategy, increase productivity, and ultimately grow your business. We kick off the Big Data and HPC track with the latest advancements in data analytics, databases, storage, and HPC at AWS. Hear customer success stories and discover how to put data to work in your own organization.

TRANSCRIPT

Page 1: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

November 12, 2014 | Las Vegas, NV

Ben Butler, Sr. Solutions Marketing Mgr., Big Data and HPC

Page 2: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Big data on AWS

Big data customer success stories

HPC on AWS

HPC Customer Presentation: Honda

AWS resources to get started

Big data and HPC track review: where to go next

Page 8: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
Page 9: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Generation

Collection and storage

Analytics and computation

Collaboration and sharing

Page 11: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

• IT/Application server logs

• Websites/Mobile apps/Ads

• Sensor data/IoT

• Social media, user content

GB, TB, PB, EB, ZB

Page 12: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Lower cost, higher throughput

Generation

Collection and storage

Analytics and computation

Collaboration and sharing

Page 13: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Highly constrained

Lower cost, higher throughput

Generation

Collection and storage

Analytics and computation

Collaboration and sharing

Page 14: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

What is Big Data?

collect, store, organize, analyze and share it

Page 15: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Big Data

Technologies and techniques for working productively with data, at any scale.

Page 16: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Accelerated

Generation

Collection and storage

Analytics and computation

Collaboration and sharing

Page 17: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

AWS Cloud

Big Data

Page 18: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Collect (Ingest)

Amazon Kinesis

AWS Import/Export

AWS Direct Connect

Store

Amazon Glacier

Amazon S3

Amazon DynamoDB

Analyze

Amazon Elastic MapReduce

Amazon EC2

Amazon Kinesis

Amazon Redshift

Share

Amazon S3

Amazon Redshift

AWS Data Pipeline

Page 19: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Real-time processing

High throughput; elastic

Easy to use

Amazon EMR, S3, Redshift, DynamoDB integrations

Amazon Kinesis

Page 20: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Store anything

Object storage

Scalable

99.999999999% durability

Amazon S3
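The durability figure on this slide is easier to picture as an expected loss rate. The sketch below is a rough illustration only: it assumes the 99.999999999% figure is an annual, per-object probability, and the object count of 10,000,000 is a hypothetical value chosen for the example. The function names are likewise made up for this illustration.

```python
from fractions import Fraction

# Durability from the slide, read here as an annual per-object probability
# (an assumption about how the figure is meant to be interpreted).
DURABILITY = Fraction("0.99999999999")


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year at the stated durability."""
    return object_count * (1 - DURABILITY)


def years_per_lost_object(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


if __name__ == "__main__":
    stored_objects = 10_000_000  # hypothetical object count
    print(expected_annual_losses(stored_objects))  # 1/10000
    print(years_per_lost_object(stored_objects))   # 10000
```

Exact fractions are used instead of floats because subtracting a number this close to 1 in floating point would introduce rounding noise.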

Page 21: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

NoSQL Database

Seamless scalability

Zero admin

Single-digit millisecond latency

Amazon DynamoDB

Page 22: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Relational data warehouse

Massively parallel

Petabyte scale

Fully managed

$1,000/TB/Year

Amazon Redshift

Page 23: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Hadoop/HDFS clusters

Hive, Pig, Impala, HBase

Easy to use; fully managed

Scale to thousands of nodes

Amazon Elastic MapReduce

Page 24: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Corporate Data Center

Elastic Data Center

Page 25: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Corporate Data Center

Elastic Data Center

Application data and logs for analysis pushed to Amazon S3

Page 26: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Corporate Data Center

Elastic Data Center

Amazon Elastic MapReduce name node to control analysis

Page 27: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Corporate Data Center

Elastic Data Center

Hadoop cluster started by Elastic MapReduce

Page 28: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Corporate Data Center

Elastic Data Center

Add hundreds to thousands of nodes

Page 29: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Corporate Data Center

Elastic Data Center

Disposed of when job completes

Page 30: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Corporate Data Center

Elastic Data Center

Results of analysis pulled back into your systems
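The workflow in the preceding slides (push data to Amazon S3, start a Hadoop cluster, scale it, dispose of it when the job completes, pull results back) can be sketched as a request for a transient cluster. This is a minimal sketch, not the configuration shown in the talk. The dictionary keys follow my understanding of the arguments accepted by boto3's EMR `run_job_flow` call and should be checked against the current API reference. The bucket paths, cluster name, step name, instance types, and release label are hypothetical. Nothing here contacts AWS; the function only builds the request.

```python
def transient_emr_request(input_uri, output_uri, jar_uri, release_label, core_nodes=2):
    """Build a request for a cluster that runs one step and then shuts down."""
    if core_nodes < 1:
        raise ValueError("need at least one core node")
    return {
        "Name": "transient-log-analysis",        # hypothetical cluster name
        "ReleaseLabel": release_label,            # placeholder, e.g. an "emr-x.y.z" label
        "Instances": {
            "MasterInstanceType": "m5.xlarge",    # assumed instance types
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 1 + core_nodes,      # one master plus N core nodes
            # False means the cluster is disposed of once the last step finishes.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "analyze-logs",           # hypothetical step name
                "ActionOnFailure": "TERMINATE_CLUSTER",
                # A custom Hadoop JAR that reads from and writes to Amazon S3.
                "HadoopJarStep": {"Jar": jar_uri, "Args": [input_uri, output_uri]},
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


if __name__ == "__main__":
    request = transient_emr_request(
        "s3://example-bucket/logs/",              # hypothetical paths
        "s3://example-bucket/results/",
        "s3://example-bucket/jobs/analysis.jar",
        "emr-x.y.z",
        core_nodes=100,
    )
    print(request["Instances"]["InstanceCount"])  # 101
```

With credentials configured, a dictionary like this would typically be passed as keyword arguments to the EMR client; results land in the output location on Amazon S3, from where they can be pulled back into on-premises systems.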

Page 31: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
Page 32: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Databricks sets new large-scale sort record with AWS

● Databricks, founded by the creators of Apache Spark

● Why AWS?

● EC2: fast access to large compute, SSD, 10 Gbps network

● Agility

Page 33: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Mobile / Cable

Telecom

Oil and Gas

Industrial Manufacturing

Retail/Consumer

Entertainment

Hospitality

Life Sciences

Scientific Exploration

Financial Services

Publishing

Media

Advertising

Online Media

Social Network

Gaming

Page 34: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Sling Uses AWS to Store and Analyze Terabytes of Data

“By using AWS, we can make decisions about new features and offers very quickly and very easily.”

Dmitry Dimov, Director, Online Services, Sling Media

• Needed to leverage terabytes of usage data to generate user insights and innovate to capture market share

• Using AWS made it possible for Sling to offer value-add product to its partners

• Stored terabytes of analytics data

• Enabled near real-time ad hoc analytics

• Capacity to scale database immediately

Page 35: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

“By using Amazon Redshift, we can process petabytes of data from thousands of marketing campaigns simultaneously while reducing operating expenses by 75%.”

Zhong Hong, VP, Infrastructure and Operations, VivaKi

Page 36: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

NDN Uses AWS to Serve 600 Million Videos to Worldwide Users

“Using AWS has enabled us to build a solid platform that has scaled quickly while becoming a source of profit for our customers.”

Eric Orme, NDN COO and CTO

• NDN, a global media exchange for publishers and content creators, enables 146 million users a month to see videos online

• Ingested and stored more than 100,000 video titles per month and served 600 million content plays a month

• Uses Amazon Kinesis to analyze over a billion user-generated events and page loads per day

Page 37: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Financial Times Uses AWS to Reduce Infrastructure Costs by 80%

“When our analysts first started to do queries on Amazon Redshift, they thought it was broken because it was working so fast.”

John O’Donovan, CTO, Financial Times

• Needed a way to increase speed, performance and flexibility of data analysis at a low cost

• Using AWS enabled FT to run queries 98% faster than previously, helping FT make business decisions quickly

• Easier to track and analyze trends

• Reduced infrastructure costs by 80% over traditional data center model

Page 38: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

NTT DOCOMO Delivers Voice Recognition Services to Over 60 Million Customers by Using AWS

“I cannot imagine NTT DOCOMO without the AWS Cloud.”

Minoru Etoh, Senior VP, NTT DOCOMO

• NTT DOCOMO, Inc. is the predominant mobile phone operator in Japan

• DOCOMO launched a popular voice recognition service and experienced large traffic spikes in its mobile network that impacted performance

• DOCOMO decided to migrate their whole environment to AWS last June

• The company built a voice recognition architecture able to scale easily to handle spikes in traffic and serve over 60 million customers

Page 39: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Kellogg Uses AWS to Save $900K Over 5 Years Compared with On-premises Infrastructure

“Using AWS saves us $900,000 in infrastructure costs alone, and lets us run dozens of simulations a day so we can reduce trade spend. It’s a win-win.”

Stover McIlwain, Senior Director, IT Infrastructure Engineering

• Needed a better way to track and model promotional costs (“trade spend”) to improve the bottom line, and needed to be able to run more than 1 trade-spend simulation/day

• By using SAP HANA on AWS, Kellogg estimates it will save $900,000 over 5 years versus traditional on-premises infrastructure alternatives

• The company can also run dozens of trade-spend simulations each day and decrease deployment time by 30x

Page 40: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Baylor College of Medicine Uses AWS to Accelerate Analysis and Discovery

“We are able to power ultra large-scale clinical studies that require computational infrastructure in a secure and compliant environment at a scale not previously possible.”

Omar Serang, Chief Cloud Officer, DNAnexus

• Stores more than 430 TB of genomic result data

• Analyzes the genome sequences of more than 14,000 individuals, 5 times faster than with the previous infrastructure

• Enables more than 200 scientists worldwide to share tools and data quickly

Page 41: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
Page 42: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

HG Data uses AWS to process billions of documents for BI monthly

“We used Amazon EMR to make running Hadoop clusters easy, and now we can de-dupe 10+ billion documents.”

Victor Moreira, CTO, HG Data

Page 43: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

HG Data architecture diagram (components):

Internet

Hadoop Document Crawler

Java Document Crawler on EC2

Packaging on EC2

Amazon S3

MongoDB Cluster on EC2

Hadoop ETL and Analytics

ElasticSearch Cluster on EC2

Hadoop Analytics

Java/Python Analytics

MySQL on RDS

HG API

HG WebApp

Direct Clients

Enterprise Partners

End Users

Client

Page 44: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
Page 45: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Take a typical big computation task…

Big Job

Page 46: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

…that is too big for an average cluster

(or simply takes too long to complete)…

Big Job

Cluster

Page 47: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

…optimization of algorithms can give some leverage…

Big Job

Cluster

Page 48: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

…and complete the task in hand…

Big Job

Cluster

Page 49: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Applying a large cluster…

Big Job

Cluster

Page 50: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

…can sometimes be overkill and too expensive

Big Job

Cluster

Page 51: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

AWS instance clusters can be balanced to

the job in hand…

Big Job

Page 52: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

…not too large…

Small Job

Page 53: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

…nor too small…

Bigger Job

Page 54: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

…with multiple clusters running at the same time
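The sizing idea in the preceding slides, a cluster balanced to each job rather than one fixed cluster for all jobs, can be sketched in a few lines. This is a minimal illustration that assumes a perfectly parallel job whose total work is known in core-hours. The function name `nodes_needed`, the job sizes, the 8-hour deadline, and the 16 cores per node are hypothetical values chosen for the example, not figures from the talk.

```python
import math


def nodes_needed(core_hours: float, deadline_hours: float, cores_per_node: int) -> int:
    """Smallest node count that finishes the work within the deadline.

    Assumes the job scales linearly with core count, which real
    workloads only approximate.
    """
    if deadline_hours <= 0 or cores_per_node <= 0:
        raise ValueError("deadline and cores per node must be positive")
    cores_required = core_hours / deadline_hours
    return max(1, math.ceil(cores_required / cores_per_node))


if __name__ == "__main__":
    # Hypothetical jobs, each given its own right-sized cluster.
    jobs = {"small job": 64, "big job": 4_000, "bigger job": 20_000}
    for name, core_hours in jobs.items():
        print(name, nodes_needed(core_hours, deadline_hours=8, cores_per_node=16))
```

Because each job gets its own cluster, the three clusters in this example could run at the same time instead of queuing behind one another.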

Page 55: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Why AWS for HPC?

Low cost with flexible pricing

Efficient clusters

Unlimited infrastructure

Faster time to results

Concurrent clusters on-demand

Increased collaboration

Page 56: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Popular HPC workloads on AWS

Transcoding and Encoding

Monte Carlo Simulations

Computational Chemistry

Government and Educational Research

Modeling and Simulation

Genome Processing

Page 57: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Scalability on AWS

Time: +00h

Scale Using Elastic Capacity

<10 cores

Page 58: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Scalability on AWS

Time: +24h

Scale Using Elastic Capacity

>1500 cores

Page 59: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Scalability on AWS

Time: +72h

Scale Using Elastic Capacity

<10 cores

Page 60: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Scalability on AWS

Time: +120h

Scale Using Elastic Capacity

>600 cores

Page 61: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Schrödinger and Cycle Computing: computational chemistry

Simulation by Mark Thompson of the University of Southern California to see which of 205,000 organic compounds could be used for photovoltaic cells for solar panel material.

Estimated computation time of 264 years completed in 18 hours.

• 156,314-core cluster across 8 regions

• 1.21 petaflops (Rpeak)

• $33,000, or 16¢ per molecule
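The figures on this slide can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted above. It assumes a 365-day year and, for the final ratio, that the 264-year estimate refers to single-core compute time, which the slide does not state.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a 365-day year

# Figures quoted on the slide.
molecules = 205_000
total_cost_usd = 33_000
estimated_years = 264
wall_clock_hours = 18
cores = 156_314

# Cost per compound screened, in cents.
cost_per_molecule_cents = 100 * total_cost_usd / molecules

# How much sooner the result arrived than the 264-year estimate.
speedup = estimated_years * HOURS_PER_YEAR / wall_clock_hours

# Capacity the cluster offered during the run versus the estimated work,
# if the estimate is read as single-core time (an assumption).
core_hours_available = cores * wall_clock_hours
estimated_core_hours = estimated_years * HOURS_PER_YEAR

print(f"{cost_per_molecule_cents:.1f} cents per molecule")  # 16.1 cents per molecule
print(f"{speedup:,.0f}x faster than the estimate")          # 128,480x faster than the estimate
print(f"{estimated_core_hours / core_hours_available:.0%} of available core-hours")
```

The per-molecule cost matches the 16¢ quoted on the slide, and under the single-core reading the estimated work fills most of the core-hours the cluster provided in 18 hours.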

Page 62: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Cost Benefits of HPC in the Cloud

Pay As You Go Model

Use only what you need

Multiple pricing models

On-Premises

Capital Expense Model

High upfront capital cost

High cost of ongoing support

Page 63: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Many pricing models to support different workloads

Reserved

Make a low, one-time payment and receive a significant discount on the hourly charge

For committed utilization

Free Tier

Get started on AWS with free usage and no commitment

For POCs and getting started

On-Demand

Pay for compute capacity by the hour with no long-term commitments

For spiky workloads, or to define needs

Spot

Bid for unused capacity, charged at a Spot price that fluctuates based on supply and demand

For time-insensitive or transient workloads

Dedicated

Launch instances within Amazon VPC that run on hardware dedicated to a single customer

For highly sensitive or compliance-related workloads
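The mapping from workload type to pricing model on this slide can be written down as a simple lookup. This is only an illustrative sketch: the trait keys (`proof_of_concept`, `spiky`, and so on) and the function name are hypothetical labels invented here to summarize the "For ..." lines on the slide, and real choices usually mix several models.

```python
# Workload trait (hypothetical label) -> pricing model named on the slide.
RECOMMENDED_MODEL = {
    "proof_of_concept": "Free Tier",  # POCs and getting started
    "spiky": "On-Demand",             # spiky workloads, or to define needs
    "committed": "Reserved",          # committed utilization
    "interruptible": "Spot",          # time-insensitive or transient workloads
    "compliance": "Dedicated",        # highly sensitive or compliance-related
}


def suggest_pricing_model(workload_trait: str) -> str:
    """Return the pricing model the slide associates with a workload trait."""
    try:
        return RECOMMENDED_MODEL[workload_trait]
    except KeyError:
        raise ValueError(f"unknown workload trait: {workload_trait!r}") from None


if __name__ == "__main__":
    for trait in RECOMMENDED_MODEL:
        print(f"{trait}: {suggest_pricing_model(trait)}")
```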

Page 64: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

When to consider running HPC workloads on AWS

New ideas

New HPC project

Proof of concept

New application features

Training models

Benchmarking algorithms

Remove the queue

Hardware refresh cycle

Reduce costs

Collaboration of results

Increase innovation speed

Reduce time to results

Improvement

Page 65: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

World’s Largest F500 Cloud Run

Transforming drive design to store the world’s data

New drive head design workloads

Submit jobs, orchestrate HPC clusters over VPC

Encrypt, route data to AWS, return results

Cluster: 70,908 cores with Spot Instances

EBS

Run 1 million drive head designs = 70.75 core-years

90x throughput: ran in 8 hours, not 30 days

3 days from idea to running

70,908 cores, 729 TFLOPS

c3, r3 with Intel E5-2670 v2

Cost: $5,594

Spot Instances
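The numbers on this slide are mutually consistent, as a short calculation shows. The sketch uses only figures from the slide and assumes a 365-day year. The "ideal" runtime ignores ramp-up, scheduling, and idle time, so it is a rough bound rather than a measurement.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a 365-day year

# Figures quoted on the slide.
core_years = 70.75
cores = 70_908
cost_usd = 5_594
cloud_hours = 8
in_house_days = 30

# Total work expressed in core-hours.
core_hours = core_years * HOURS_PER_YEAR

# Runtime if every core were busy from start to finish.
ideal_hours = core_hours / cores

# Throughput gain of 8 hours in the cloud versus 30 days in house.
throughput_gain = in_house_days * 24 / cloud_hours

# Effective price paid per core-hour on Spot Instances.
cost_per_core_hour = cost_usd / core_hours

print(f"{core_hours:,.0f} core-hours")             # 619,770 core-hours
print(f"{ideal_hours:.1f} hours at full use")
print(f"{throughput_gain:.0f}x throughput")        # 90x throughput
print(f"${cost_per_core_hour:.4f} per core-hour")
```

The 90x figure follows directly from 30 days versus 8 hours, and the total cost works out to roughly a cent per core-hour.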

Page 66: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
Page 67: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

New

Motorcycle

Products

ASIMO

Power Products

Honda Jet

UNI-CUB

MC-β

Automobile

Honda Smart Home System (HSHS)

Dreams are the source of our courage and energy

to meet every challenge without fear of failure.

FCX

(as of March 31, 2014)

(April 2013 to March 2014)

Page 68: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

North America

South America

Europe

China

Asia/Oceania

We had individual HPC resources at every R&D site.

Japan

Motorcycle

Power products

Fundamental research

Aircraft ENG

Automobile

Others

Page 69: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Europe

Japan

North America

Asia/Oceania

China

South America

Honda DC

We consolidated HPC resources.

Overall Optimization

Globalization

Page 70: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Use for a certain period

Parallel Transient clusters

Trial use

Need a lot of cores

High memory

Page 71: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Lead time

No complicated procedures and screening

Don’t have to worry about the availability of resources

Agility

Use the AWS API and start EC2 instances quickly

Stop them anytime you want, with pay-as-you-go pricing

Service

Choose from several EC2 instance types (including the new types)

EC2 Spot instances

Page 72: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Cluster manager

Data

Spot or

On demand

Computing nodes

Attached

Page 73: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Instance Usage

(Chart: instance number over time)

Page 74: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Anyway, use cloud

Accumulate knowledge and plan next step

Suggest improvement to

providers

Release new services

Page 76: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
Page 77: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
Page 78: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Solution Architects

Professional Services

Premium Support

AWS Partner Network (APN)

Page 79: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

AWS Architectures

Reference architecture diagrams

aws.amazon.com/architecture

Page 80: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

http://aws.amazon.com/marketplace

Big Data Case Studies

Learn from other AWS customers

aws.amazon.com/solutions/case-studies/big-data

Page 81: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

AWS Partner Network – Big Data Competency

Partner with an AWS Big Data expert

Page 82: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

AWS Marketplace

AWS Online Software Store

aws.amazon.com/marketplace

Shop the big data and HPC categories

Page 83: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

AWS Public Data Sets

Free access to big data sets

aws.amazon.com/publicdatasets

Page 84: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

AWS Big Data and HPC Test Drives

APN Partner-provided labs

aws.amazon.com/testdrive

Page 85: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Learn from AWS big data experts

Learn how to use Apache Storm and Amazon Kinesis to process streaming real-time data

blogs.aws.amazon.com/bigdata

Page 86: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

aws.amazon.com/training

Big Data Technology Fundamentals

Online Training

Big Data on AWS

Instructor-Led Training

Page 87: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
Page 88: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
Page 89: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

Visit the Big Data Kiosk at the AWS Booth in the Expo Room

Page 90: (BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014

http://bit.ly/awsevals

Learn more about Big Data and HPC on AWS:

aws.amazon.com/big-data

aws.amazon.com/hpc

Thank you!