automating big data technologies for faster time-to-value

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

November 1, 2017 | 11:00 AM PT

Automating Big Data Technologies for Faster Time-to-Value

Today’s Presenters

David Potes, Solutions Architect, Amazon Web Services

Minesh Patel, Technical Director, Qubole

Seth Myers, Senior Data Scientist, Demandbase

Today’s Agenda

1. An overview of AWS and AWS Marketplace, with an emphasis on AWS data lake solutions and Qubole

2. Overview of the Qubole solutions featured in our story

3. Challenges faced by Demandbase

4. The Demandbase success story with AWS and Qubole

5. Q&A/Discussion

Learning Objectives

1. How to dramatically reduce management complexities for analytics operations

2. How to reduce the costs of processing and analyzing data in a data lake on AWS

3. How to operate at the scale and efficiency of a large enterprise, with a small data team

Introduction to Data Lake Concepts

Unlocking Data

Most companies and organizations are embarking on ambitious innovation initiatives to unlock their data.

The data already exists but goes unused, or is locked away from complementary data sets in isolated data silos.

Enter Data Lake Architectures

A data lake is a new and increasingly popular architecture for storing and analyzing massive volumes of heterogeneous data.

Benefits of a Data Lake – All Data in One Place

Store and analyze all of your data, from all of your sources, in one centralized location.

“Why is the data distributed in many locations? Where is the single source of truth?”

Benefits of a Data Lake – Quick Ingest

Quickly ingest data without needing to force it into a pre-defined schema.

“How can I collect data quickly from various sources and store it efficiently?”

Benefits of a Data Lake – Storage vs Compute

Separating your storage and compute allows you to scale each component as required.

“How can I scale up with the volume of data being generated?”

Benefits of a Data Lake – Schema on Read

“Is there a way I can apply multiple analytics and processing frameworks to the same data?”

A data lake enables ad-hoc analysis by applying schemas on read, not on write.
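As a minimal sketch of schema on read, the Python below stores raw JSON lines untouched and lets each consumer project its own schema onto them when reading. The sample events, the field names, and the `read_with_schema` helper are hypothetical illustrations, not part of any AWS or Qubole API.

```python
import json

# Raw events as they might land in the lake: stored verbatim,
# with no schema enforced at write time (sample data is made up).
RAW_EVENTS = [
    '{"account": "acme", "page": "/pricing", "ms": "120"}',
    '{"account": "globex", "page": "/docs", "ms": "45", "referrer": "search"}',
]


def read_with_schema(lines, schema):
    """Parse each raw line and project it onto `schema`.

    `schema` maps a field name to a function that casts the raw value.
    Fields missing from a record become None; extra fields are ignored.
    """
    for line in lines:
        record = json.loads(line)
        yield {
            name: cast(record[name]) if name in record else None
            for name, cast in schema.items()
        }


# Two consumers apply two different schemas to the same stored data.
latency_view = list(read_with_schema(RAW_EVENTS, {"account": str, "ms": int}))
traffic_view = list(read_with_schema(RAW_EVENTS, {"page": str, "referrer": str}))

print(latency_view)
print(traffic_view)
```

Because the cast happens at read time, adding a new field to incoming events (such as `referrer` above) requires no change to data already stored.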

AWS Approach to Data Lake

Amazon S3 is the Data Lake

Designed Benefits of an Amazon S3 Data Lake

Fixed Cluster Data Lake

• Limited to the single tool contained on the cluster (e.g. Hadoop, a data warehouse, or Cassandra), while use cases and ecosystem tools change rapidly
• Expensive to add nodes to add storage capacity
• Expensive to replicate data against node loss
• Complexity in scaling local storage capacity
• Long refresh cycles to add additional storage equipment

Amazon S3 Data Lake

• Decouple storage and compute by making Amazon S3 object-based storage, not a fixed tool cluster, the data lake
• Flexibility to use any and all tools in the ecosystem: the right tool for the job
• Future-proof your architecture: as new use cases and new tools emerge, you can plug and play the current best of breed

Why Amazon S3 for Data Lake?

• Durable: designed for 11 9s of durability
• Available: designed for 99.99% availability
• High performance: multipart upload, Range GET
• Scalable: store as much as you need, scale storage and compute independently, no minimum usage commitments
• Integrated: Amazon EMR, Amazon Redshift, Amazon DynamoDB
• Easy to use: simple REST API, AWS SDKs, read-after-create consistency, event notification, lifecycle policies
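To make the durability and availability design targets concrete, here is a rough back-of-the-envelope calculation in Python. It assumes both figures are read as annual targets, and the object count of 10 million is only an illustrative input; the function names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60

# "11 9s" of durability means a design-target loss rate of 10^-11
# per object per year (assumption: the figure is an annual one).
ANNUAL_LOSS_RATE = 10 ** -11

# 99.99% availability leaves 0.01% of the year as allowed downtime.
UNAVAILABILITY = 1e-4


def expected_objects_lost_per_year(num_objects):
    """Expected number of objects lost in one year at the design target."""
    return num_objects * ANNUAL_LOSS_RATE


def allowed_downtime_minutes_per_year():
    """Minutes per year that fall outside the availability target."""
    return MINUTES_PER_YEAR * UNAVAILABILITY


losses = expected_objects_lost_per_year(10_000_000)
years_per_single_loss = 1 / losses
downtime = allowed_downtime_minutes_per_year()

print(f"Expected losses per year for 10M objects: {losses:.6f}")
print(f"Roughly one object lost every {years_per_single_loss:,.0f} years")
print(f"Allowed downtime: about {downtime:.1f} minutes per year")
```

These are design targets rather than guarantees, so the numbers indicate orders of magnitude, not measured behavior.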

Automating Complex Tasks

Qubole makes Big Data technologies swift and simple

About Qubole

One of the largest cloud-agnostic Big Data as a Service companies

Founded by the pioneers of “big data” @ Facebook and the creators of Apache Hive

Poll Question #1

What is the status of your big data initiative?

The Vision

Qubole Data Service

Amazon

Autonomous Data Management

Qubole Cloud Agents

Total Cost Savings Among Qubole Customers in 2016 and 2017

• Cluster Life Cycle Management: $150M. Amount saved by automatically terminating a cluster when inactive
• Workload-aware Autoscaling: $121M. Amount saved by predictively adjusting the number of nodes to meet demand
• Spot Shopper: $40M. Amount saved by utilizing Spot instances
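The three figures above can be totaled and compared with a few lines of Python. Only the dollar amounts come from the slide; the combined total and the percentage shares are derived here and were not stated in the presentation.

```python
# Savings reported on the slide, in millions of US dollars.
SAVINGS_MUSD = {
    "Cluster Life Cycle Management": 150,
    "Workload-aware Autoscaling": 121,
    "Spot Shopper": 40,
}

# Derived values (not stated in the source): combined total and
# each mechanism's share of it, rounded to one decimal place.
total_musd = sum(SAVINGS_MUSD.values())
shares_pct = {
    name: round(100 * amount / total_musd, 1)
    for name, amount in SAVINGS_MUSD.items()
}

print(f"Total: ${total_musd}M")
for name, pct in shares_pct.items():
    print(f"{name}: {pct}%")
```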

Architectural Diagram

Poll Question #2

What big data technology are you using or evaluating?

Why Qubole?

Demandbase Automates With Qubole

Demandbase provides more value for their B2B marketing customers by automating Big Data and Machine Learning operations.

Who is Demandbase?

Demandbase is a B2B marketing automation company that leverages artificial intelligence to automate all aspects of the advertising, selling, and marketing process.

The Challenge

• Many factors determine which accounts a business should target
    • Do they have a need/budget for the product?
    • Are they currently in-market for the product?
    • Do they have decision makers ready to buy?
• These insights must come from many different types of big datasets
• Demandbase’s previous account identification tool took multiple days to run
• Our clients could not iterate on or modify their strategies with such slow turnaround

The Data Used to Identify Accounts

• To determine an account’s need for the product
    • We have firmographic information on 14 million accounts
    • We’ve built a knowledge graph of all accounts using NLP technology that crawls 350 TB of web pages a month
• To determine if an account is in-market
    • We track 700 billion web interactions a year, each one mapped to employees across all accounts
• To identify decision makers
    • We are currently tracking over 100 million employees across all accounts

[Figure: account scoring overview. All 14M accounts are scored, with the top 5K available to the user; keywords are extracted from 700B web interactions; buyers at each account are identified from 100M+ contacts.]

The Solution

• The user requests a new list of accounts with a button press
    • 60 EC2 servers are spun up
    • A machine learning model is built using Spark and MLlib
    • For each of the 14 million accounts
        • Information about relevant web interactions, buyers, online content, etc. is fed into the machine learning model
        • The model scores each account
    • The top 5K accounts are pushed to the web app, along with relevant info
• From button press to new account list: 20 minutes
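The score-then-rank step described above can be sketched in plain Python. This is not Demandbase's implementation: the source runs on Spark and MLlib across 60 EC2 servers, whereas this toy version runs on one machine. The feature names, the fixed weights in `score_account`, and the random sample data are all hypothetical stand-ins for the trained model and the real inputs.

```python
import heapq
import random

# Hypothetical weights standing in for a trained model.
WEIGHTS = {"interactions": 0.5, "buyers": 0.3, "content_match": 0.2}


def score_account(features):
    """Score one account as a weighted sum of its (made-up) features."""
    return sum(weight * features[name] for name, weight in WEIGHTS.items())


def top_accounts(accounts, k):
    """Score every account and return the k best as (score, account_id) pairs.

    heapq.nlargest keeps only k candidates in memory at a time,
    so the full scored list never has to be sorted.
    """
    scored = (
        (score_account(features), account_id)
        for account_id, features in accounts.items()
    )
    return heapq.nlargest(k, scored)


# Toy data: 1,000 accounts instead of 14 million, seeded for repeatability.
rng = random.Random(0)
accounts = {
    f"account-{i}": {name: rng.random() for name in WEIGHTS}
    for i in range(1000)
}

top = top_accounts(accounts, 5)
for score, account_id in top:
    print(f"{account_id}: {score:.3f}")
```

Selecting the top K with a bounded heap rather than a full sort matters more as the account count grows, which is one reason the ranking step stays cheap even when scoring itself needs a cluster.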

Qubole Makes This Possible

• Qubole manages all of our EC2 instances
• So far, we’ve successfully tested 20 different concurrent models (20 × 60 EC2 servers)
• Qubole keeps our costs down through dynamic bidding and heterogeneous server clusters
• Our web app calls Qubole’s easy-to-implement Play API, which spins up the EC2 instances and deploys our Spark job
• With Qubole taking care of the infrastructure, we could focus on developing the machine learning
• Qubole allowed us to build a self-serve machine-learning-as-a-service solution

Next Steps and Further Information

• Try a pre-configured production-ready Qubole deployment on AWS Data Lake:

• https://aws.amazon.com/quickstart/architecture/qubole-on-data-lake-foundation/

• Buy on AWS Marketplace:

• https://aws.amazon.com/marketplace/pp/B06XX76R24

• Learn more about Qubole:

• https://www.qubole.com/products/qds-for-aws/

• Learn more about Demandbase:

• https://www.demandbase.com/technology/

• Try AWS:

• https://aws.amazon.com/

Q & A

Thank you!
