Serverless Analytics and ETL on AWS


Post on 20-May-2020


Page 1: Serverless Analytics and ETL on AWS presentation- AWS … · Serverless Analytics and ETL on AWS Daniel Haviv Analytics Specialist Solutions Architect AWS dhaviv@amazon.com. Data

©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved

Serverless Analytics and ETL on AWS

Daniel Haviv

Analytics Specialist Solutions Architect

AWS

dhaviv@amazon.com

Page 2

Data Lake

• Central repository for both structured and unstructured data

• High in capacity

• Cheap

• Accessible (API, CLI)

• Wide range of integrated tools

Page 3

Data Lake - HDFS

• HDFS is a good candidate, but it has its limitations:

  • High maintenance overhead (thousands of servers, tens of thousands of disks)

  • Not cheap (3 copies per file)

  • Usually serves one Hadoop cluster

Page 4

Why Amazon S3 for the Data Lake?

Durable

• Designed for 11 9s of durability

Available

• Designed for 99.99% availability

High performance

• Multiple upload

• Range GET

Scalable

• Store as much as you need

• Scale storage and compute independently

• No minimum usage commitments

Integrated

• Amazon Redshift / Spectrum

• Amazon EMR

• Amazon Athena

• AWS Lambda

Easy to use

• Simple REST API

• AWS SDKs

• Read-after-create consistency

• Event notification

• Lifecycle policies

Page 5

Athena

Page 6

Challenges Customers Faced

• Significant amount of work required to analyze data in Amazon S3

• Users often only have access to aggregated data sets

• Managing a Hadoop cluster or data warehouse requires expertise

Page 7

Introducing Amazon Athena

• Amazon Athena is an interactive query service that makes it easy to analyze data directly from Amazon S3 using standard SQL

Page 8

A Sample Pipeline

Page 9

A Sample Pipeline

Ad-hoc access to raw data using SQL

Page 10

A Sample Pipeline

Ad-hoc access to data using Athena

Athena can query aggregated datasets as well

Page 11

Athena is Serverless

• No infrastructure or administration

• Zero spin-up time

• Transparent upgrades

• Highly available

• You connect to a service endpoint or log into the console

Page 12

Amazon Athena is Easy To Use

• Log into the Console

• Create a table

  • Type in a Hive DDL statement

  • Use the console Add Table wizard

• Start querying

Page 13

Query Data Directly from Amazon S3

• No loading of data

• Query data in its raw format

  • Text, CSV, JSON, weblogs, AWS service logs

• Convert to an optimized format like ORC or Parquet for the best performance and lowest cost

• No ETL required

• Stream data directly from Amazon S3

• Take advantage of Amazon S3 durability and availability

Page 14

Use ANSI SQL

• Start writing ANSI SQL

• Support for complex joins, nested queries & window functions

• Support for complex data types (arrays, structs)

• Support for partitioning of data by any key (date, time, custom keys)

  • e.g., Year, Month, Day, Hour or Customer Key, Date

Page 15

Amazon Athena is Cost Effective

• Pay per query

• $5 per TB scanned from S3

• DDL queries and failed queries are free

• Save by using compression, columnar formats, partitions

Page 16

Who is Athena for?

• Anyone looking to process data stored in Amazon S3

  • Data coming from IoT devices, Apache logs, Omniture logs, CF logs, application logs

• Anyone who knows SQL

  • Both developers and analysts

• Ad-hoc exploration of data and data discovery

• Customers looking to build a data lake on Amazon S3

Page 17

Accessing Amazon Athena

• Through the console

• Via the AWS API

• Using JDBC/ODBC clients (either plain SQL clients or BI tools)
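As a rough sketch of the API route, the snippet below builds the parameters for Athena's StartQueryExecution call. The database name, the query, and the results location are placeholders chosen for illustration and do not come from the slides. The boto3 call itself is left as a comment because it needs AWS credentials.

```python
def build_query_request(sql, database, output_location):
    """Build keyword arguments for Athena's StartQueryExecution API call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Placeholder values (assumptions for illustration only).
request = build_query_request(
    sql="SELECT request_path, count(*) AS hits FROM access_logs GROUP BY request_path",
    database="default",
    output_location="s3://YOUR-S3-BUCKET-NAME/athena-results/",
)
print(request)

# With credentials configured, the request could be submitted like this:
#   import boto3
#   athena = boto3.client("athena")
#   response = athena.start_query_execution(**request)
#   print(response["QueryExecutionId"])
```

The call is asynchronous: it returns a query execution ID, and the results are fetched once the query has finished.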

Page 18

Creating Tables and Querying Data

Page 19

Example

    CREATE EXTERNAL TABLE access_logs
    (
      ip_address       STRING,
      request_time     TIMESTAMP,
      request_method   STRING,
      request_path     STRING,
      request_protocol STRING,
      response_code    STRING,
      response_size    STRING,
      referrer_host    STRING,
      user_agent       STRING
    )
    PARTITIONED BY (year STRING, month STRING, day STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE
    LOCATION 's3://YOUR-S3-BUCKET-NAME/access-log-processed/'

External = creates a view of this data. When you delete the table, the data is not deleted.

Page 20

Example


Location = where data is stored. In Athena this is mandated to be in Amazon S3.

Page 21

Example


Partitioning allows you to limit what your query runs on.
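To make the partitioning idea concrete, here is a small sketch that generates the S3 prefix for one day, the DDL that registers that partition, and a query restricted to it. It assumes the data under the table location is laid out as year=/month=/day= prefixes, which the slides do not state. The helper names and the example date are made up for illustration.

```python
from datetime import date

TABLE_LOCATION = "s3://YOUR-S3-BUCKET-NAME/access-log-processed/"


def partition_prefix(day):
    """Return the Hive-style S3 prefix for one day's partition."""
    return f"{TABLE_LOCATION}year={day:%Y}/month={day:%m}/day={day:%d}/"


def add_partition_ddl(table, day):
    """Return an ALTER TABLE statement that registers one day's partition."""
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (year='{day:%Y}', month='{day:%m}', day='{day:%d}') "
        f"LOCATION '{partition_prefix(day)}'"
    )


def query_for_day(table, day):
    """Return a query whose WHERE clause restricts it to a single partition."""
    return (
        f"SELECT count(*) FROM {table} "
        f"WHERE year='{day:%Y}' AND month='{day:%m}' AND day='{day:%d}'"
    )


example_day = date(2017, 5, 1)  # arbitrary example date
print(add_partition_ddl("access_logs", example_day))
print(query_for_day("access_logs", example_day))
```

Because the WHERE clause filters on the partition columns, only the objects under that day's prefix need to be scanned.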

Page 22

Pay By the Query - $5/TB Scanned

• Pay by the amount of data scanned per query

• Ways to save costs

• Compress

• Convert to Columnar format

• Use partitioning

Dataset                                 Size on Amazon S3       Query run time   Data scanned            Cost

Logs stored as text files               1 TB                    237 seconds      1.15 TB                 $5.75

Logs stored in Apache Parquet format*   130 GB                  5.13 seconds     2.69 GB                 $0.013

Savings                                 87% less with Parquet   34x faster       99% less data scanned   99.7% cheaper
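The cost column follows directly from the $5 per TB price. The sketch below redoes the arithmetic for the two rows. It assumes 1 TB = 1024 GB, which the slide does not specify, and the results agree with the slide's figures to within rounding.

```python
PRICE_PER_TB = 5.00   # USD per TB scanned, from the slide
GB_PER_TB = 1024      # assumption: binary units; the slide does not say


def query_cost(tb_scanned):
    """Cost in USD of a query that scans `tb_scanned` terabytes."""
    return tb_scanned * PRICE_PER_TB


text_scanned_tb = 1.15
parquet_scanned_tb = 2.69 / GB_PER_TB

text_cost = query_cost(text_scanned_tb)
parquet_cost = query_cost(parquet_scanned_tb)

print(f"Text files: ${text_cost:.2f}")
print(f"Parquet:    ${parquet_cost:.3f}")
print(f"Less data scanned: {1 - parquet_scanned_tb / text_scanned_tb:.1%}")
print(f"Less storage used: {1 - 130 / GB_PER_TB:.1%}")
```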

Page 23

Demo

Page 24

Glue

Page 25

Why would AWS get into the ETL space?

Page 26

We have lots of ETL partners

Amazon Redshift Partner Page for Data Integration

Fivetran

Page 27

The problem is:

70% of ETL jobs are hand-coded, with no use of ETL tools.

Page 28

Actually…

It’s over 90% in the cloud

Page 29

Why do we see so much hand-coding?

• Code is flexible

• Code is powerful

• You can unit test

• You can deploy with other code

• You know your dev tools

Page 30

AWS Glue automates the undifferentiated heavy lifting of ETL

Discover: Automatically discover and categorize your data, making it immediately searchable and queryable across data sources

Develop: Generate code to clean, enrich, and reliably move data between various data sources; you can also use your favorite tools to build ETL jobs

Deploy: Run your jobs on a serverless, fully managed, scale-out environment. No compute resources to provision or manage.

Page 31

AWS Glue: Components

Data Catalog

• Hive Metastore compatible with enhanced functionality

• Crawlers automatically extract metadata and create tables

• Integrated with Amazon Athena, Amazon Redshift Spectrum

Job Execution

• Run jobs on a serverless Spark platform

• Provides flexible scheduling

• Handles dependency resolution, monitoring and alerting

Job Authoring

• Auto-generates ETL code

• Built on open frameworks – Python and Spark

• Developer-centric – editing, debugging, sharing

Page 32

Main components of AWS Glue

Page 33

AWS Glue Data Catalog

Discover and organize your data

Page 34

Glue data catalog

Manage table metadata through a Hive metastore API or Hive SQL.

Supported by tools like Hive, Presto, Spark etc.

We added a few extensions:

• Search over metadata for data discovery

• Connection info – JDBC URLs, credentials

• Classification for identifying and parsing files

• Versioning of table metadata as schemas evolve and other metadata are updated

Populate using Hive DDL, bulk import, or automatically through Crawlers.

Page 35

Data Catalog: Crawlers

Automatically discover new data and extract schema definitions

• Detect schema changes and version tables

• Detect Hive style partitions on Amazon S3

Built-in classifiers for popular types; custom classifiers using Grok expressions

Run ad hoc or on a schedule; serverless – only pay when crawler runs

Crawlers automatically build your Data Catalog and keep it in sync

Page 36

AWS Glue Data Catalog

Bring in metadata from a variety of data sources (Amazon S3, Amazon Redshift, etc.) into a single categorized list that is searchable

Page 37

Data Catalog: Table details

Table schema

Table properties

Data statistics

Nested fields

Page 38

Data Catalog: Version control

List of table versions

Compare schema versions

Page 39

Data Catalog: Detecting partitions

Estimate schema similarity among files at each level to handle semi-structured logs, schema evolution…

[Figure: an S3 bucket hierarchy (month=Nov, then date=10 … date=15, then file 1 … file N) mapped to a table definition. Schema similarity scores are shown at each level (sim=.99, sim=.95, sim=.93). The table definition lists columns and types: month (str), date (str), col 1 (int), col 2 (float).]
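The path-parsing half of partition detection can be illustrated with a few lines of Python. This is only a sketch of how Hive-style name=value path segments map to partition columns. It does not attempt the schema-similarity estimation described on the slide, and the key names under the "logs/" prefix are hypothetical.

```python
def parse_partitions(key):
    """Extract Hive-style partition columns (name=value path segments) from an S3 key."""
    partitions = {}
    for segment in key.split("/")[:-1]:  # skip the file name
        name, sep, value = segment.partition("=")
        if sep and name and value:
            partitions[name] = value
    return partitions


# Hypothetical object keys mirroring the hierarchy in the figure.
keys = [
    "logs/month=Nov/date=10/file1.json",
    "logs/month=Nov/date=10/file2.json",
    "logs/month=Nov/date=15/file1.json",
]

# Collect the distinct partitions found across all keys.
found = sorted({tuple(parse_partitions(k).items()) for k in keys})
for partition in found:
    print(dict(partition))
```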

Page 40

Data Catalog: Automatic partition detection

Automatically register available partitions

Table partitions

Page 41

Job authoring in AWS Glue

You have choices on how to get started:

• Python code generated by AWS Glue

• Connect a notebook or IDE to AWS Glue

• Existing code brought into AWS Glue

Page 42

Job authoring: Automatic code generation

1. Customize the mappings

2. Glue generates transformation graph and Python code

3. Connect your notebook to development endpoints to customize your code

Page 43

Job authoring: ETL code

Human-readable, editable, and portable PySpark code

Flexible: Glue’s ETL library simplifies manipulating complex, semi-structured data

Customizable: Use native PySpark, import custom libraries, and/or leverage Glue’s libraries

Collaborative: share code snippets via GitHub, reuse code across jobs

Page 44

Job Authoring: Glue Dynamic Frames

[Figure: dynamic frame schema with fields A, B1, B2, C (X, Y), D [ ]]

Like Spark’s Data Frames, but better for:

• Cleaning and (re)-structuring semi-structured data sets, e.g. JSON, Avro, Apache logs ...

No upfront schema needed:

• Infers schema on-the-fly, enabling transformations in a single pass

Easy to handle the unexpected:

• Tracks new fields, and inconsistent changing data types with choices, e.g. integer or string

• Automatically marks and separates error records
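The idea of inferring a schema on the fly and recording a "choice" when a field arrives with more than one type can be sketched in plain Python. This is a conceptual illustration only, not Glue's implementation, and the sample records are invented.

```python
def infer_schema(records):
    """Map each field name to the set of Python type names seen for it."""
    schema = {}
    for record in records:
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


# Invented sample records with an inconsistent field and a late-arriving field.
records = [
    {"A": 1, "B": 10},
    {"A": 2, "B": "ten"},         # B arrives as a string this time
    {"A": 3, "B": 30, "E": 0.5},  # a new field E appears
]

schema = infer_schema(records)
# Fields with more than one observed type correspond to a "choice".
choices = {field for field, types in schema.items() if len(types) > 1}
print(schema)
print(choices)
```

A single pass over the records is enough: new fields are added as they appear, and conflicting types are kept side by side rather than causing a failure.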

Page 45

Job Authoring: Glue transforms

Adaptive and flexible

[Figure: ResolveChoice() takes a field B that has more than one type and resolves it by one of three options: project, cast, or separate into cols. ApplyMapping() maps a source schema with fields A and C (containing X and Y) to the columns A, X, Y.]
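Two of the resolution options named on the slide, cast and separate into columns, can be imitated on plain Python dictionaries. The functions below are a hypothetical illustration of the semantics, not Glue's ResolveChoice transform, and the column naming (B_int, B_str) is an assumption.

```python
def resolve_cast(records, field, target):
    """'cast' style: convert every value of `field` to `target`; failures become None."""
    out = []
    for record in records:
        row = dict(record)
        if field in row:
            try:
                row[field] = target(row[field])
            except (TypeError, ValueError):
                row[field] = None
        out.append(row)
    return out


def resolve_make_cols(records, field):
    """'separate into cols' style: one column per observed type, e.g. B_int, B_str."""
    out = []
    for record in records:
        row = dict(record)
        if field in row:
            value = row.pop(field)
            row[f"{field}_{type(value).__name__}"] = value
        out.append(row)
    return out


# Invented records where B is sometimes an int and sometimes a string.
records = [{"A": 1, "B": 10}, {"A": 2, "B": "20"}, {"A": 3, "B": "n/a"}]
print(resolve_cast(records, "B", int))
print(resolve_make_cols(records, "B"))
```

Casting yields a single clean column at the price of losing values that cannot be converted. Separating into columns keeps every value but widens the schema.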

Page 46

Job authoring: Relationalize() transform

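[Figure: a semi-structured schema (fields A, B, B, C with nested X and Y, and an array D [ ]) is converted to a relational schema: a main table with columns A, B, B, C.X, C.Y and a foreign key (FK), plus a second table with columns PK, Offset, Value.]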

• Transforms and adds new columns, types, and tables on-the-fly

• Tracks keys and foreign keys across runs

• SQL on the relational schema is orders of magnitude faster than JSON processing
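The shape of the output can be sketched in plain Python: nested fields become dotted columns, and an array moves to a child table linked by a key. This is a simplified illustration that handles one level of nesting and one array field. It is not Glue's Relationalize() implementation, and the child-table column names simply follow the figure.

```python
def relationalize(records, array_field):
    """Flatten nested dicts into dotted columns and move one array field to a child table."""
    root, child = [], []
    for pk, record in enumerate(records):
        row = {}
        for key, value in record.items():
            if key == array_field:
                row[key] = pk  # foreign key into the child table
                for offset, item in enumerate(value):
                    child.append({"PK": pk, "Offset": offset, "Value": item})
            elif isinstance(value, dict):
                for sub_key, sub_value in value.items():
                    row[f"{key}.{sub_key}"] = sub_value
            else:
                row[key] = value
        root.append(row)
    return root, child


# Invented sample records matching the A, B, C{X, Y}, D[] shape.
records = [
    {"A": 1, "B": "x", "C": {"X": 1.5, "Y": 2.5}, "D": [10, 20]},
    {"A": 2, "B": "y", "C": {"X": 3.5, "Y": 4.5}, "D": [30]},
]
root, child = relationalize(records, "D")
print(root)
print(child)
```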

Page 47

Job authoring: Glue transformations

Prebuilt transformation: Click and add to your job with simple configuration

Spigot writes sample data from DynamicFrame to S3 in JSON format

Expanding… more transformations to come

Page 48

Job authoring: Write your own scripts

Import custom libraries required by your code

Convert to a Spark Data Frame for complex SQL-based ETL

Convert back to Glue Dynamic Frame for semi-structured processing and AWS Glue connectors

Page 49

Job authoring: Developer endpoints

Environment to iteratively develop and test ETL code.

Connect your IDE or notebook (e.g. Zeppelin) to a Glue development endpoint.

When you are satisfied with the results you can create an ETL job that runs your code.

[Figure: Glue Spark environment, with a remote interpreter and an interpreter server]

Page 50

Job Authoring: Leveraging the community

No need to start from scratch.

Use Glue samples stored in GitHub to share, reuse, contribute: https://github.com/awslabs/aws-glue-samples

• Migration scripts to import existing Hive Metastore data into AWS Glue Data Catalog

• Examples of how to use Dynamic Frames and Relationalize() transform

• Examples of how to use arbitrary PySpark code with Glue’s Python ETL library

Download Glue’s Python ETL library to start developing code in your IDE: https://github.com/awslabs/aws-glue-libs

Page 51

Orchestration and resource management

Fully managed, serverless job execution

Page 52

Job execution: Scheduling and monitoring

Compose jobs globally with event-based dependencies

• Easy to reuse and leverage work across organization boundaries

Multiple triggering mechanisms

• Schedule-based: e.g., time of day

• Event-based: e.g., job completion

• On-demand: e.g., AWS Lambda

• More coming soon: Data Catalog based events, S3 notifications and Amazon CloudWatch events

Logs and alerts are available in Amazon CloudWatch

[Figure: example job graph with the jobs "Marketing: Ad-spend by customer segment", "Sales: Revenue by customer segment", "Central: ROI by customer segment" and "Weekly sales", connected by triggers labeled Event based, Lambda trigger, Schedule, and Data based.]
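An event-based trigger of the "job completion" kind boils down to a dependency check: a job becomes ready once everything upstream of it has succeeded. The sketch below shows that check in plain Python. The job names and the dependency graph are hypothetical, loosely modeled on the figure, and this is not how Glue triggers are defined.

```python
def ready_jobs(dependencies, succeeded, already_started):
    """Return jobs whose upstream jobs have all succeeded and that have not started yet."""
    return sorted(
        job
        for job, upstream in dependencies.items()
        if job not in already_started and all(u in succeeded for u in upstream)
    )


# Hypothetical job graph: the central ROI job waits for marketing and sales.
dependencies = {
    "marketing_ad_spend": [],
    "sales_revenue": [],
    "central_roi": ["marketing_ad_spend", "sales_revenue"],
}

started = {"marketing_ad_spend", "sales_revenue"}

# Only one upstream job has finished, so nothing new is ready.
print(ready_jobs(dependencies, {"marketing_ad_spend"}, started))

# Both upstream jobs have finished, so the central job can fire.
print(ready_jobs(dependencies, {"marketing_ad_spend", "sales_revenue"}, started))
```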

Page 53

Job execution: Job bookmarks

Glue keeps track of data that has already been processed by a previous run of an ETL job. This persisted state information is called a bookmark.

For example, you get new files every day in your S3 bucket. By default, AWS Glue keeps track of which files have been successfully processed by the job to prevent data duplication.

Option    Behavior

Enable    Pick up from where you left off

Disable   Ignore and process the entire dataset every time

Pause     Temporarily disable advancing the bookmark

[Figure: the "Marketing: Ad-spend by customer segment" job reading data objects]
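The three options can be mimicked with a set of already-processed file names. This is a deliberately simplified model written for illustration. The function and file names are hypothetical, and Glue's actual bookmark state is not described in the slides.

```python
def run_job(available_files, bookmark, option="enable"):
    """Return (files to process, updated bookmark) for one job run.

    `bookmark` is the set of files processed by earlier runs.
    """
    if option == "disable":
        # Ignore the bookmark and process the entire dataset.
        return sorted(available_files), bookmark
    to_process = sorted(set(available_files) - bookmark)
    if option == "pause":
        # Process new files but do not advance the bookmark.
        return to_process, bookmark
    # "enable": pick up from where the previous run left off.
    return to_process, bookmark | set(to_process)


day1 = ["a.json", "b.json"]
day2 = day1 + ["c.json"]

files, bookmark = run_job(day1, set())
print(files)                                    # first run sees both files
print(run_job(day2, bookmark)[0])               # enabled: only the new file
print(run_job(day2, bookmark, "disable")[0])    # disabled: everything again
```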

Page 54

Job execution: Serverless

Auto-configure VPC and role-based access

Customers can specify the capacity that gets allocated to each job

Automatically scale resources (on post-GA roadmap)

You pay only for the resources you consume while consuming them

There is no need to provision, configure, or manage servers

[Figure: compute instances and customer VPCs]

Page 55

Common use cases for AWS Glue

Page 56

Understand your data assets

Page 57

Instantly query your data lake on Amazon S3

Page 58

ETL data into your data warehouse

Page 59

Build event-driven ETL pipelines

Page 60

Thank you.