An orchestrator in the cloud: Azure Data Factory (by Carlos Sacristán)

2017 - Madrid. Overview of Azure Data Factory. Carlos Sacristán, Data & Analytics Solution Architect, Kabel. #GIBMad2017

Upload: jorge-millan-cabrera

Post on 11-Apr-2017


2017 - Madrid

Overview of Azure Data Factory

Carlos Sacristán, Data & Analytics Solution Architect, Kabel

#GIBMad2017

Who am I?

Carlos Sacristán, Data & Analytics Solution Architect, Kabel

[email protected]

https://twitter.com/sacrisql

+34 649 425 928

https://www.linkedin.com/in/csacristan/

Agenda

What is Azure Data Factory

ADF is a cloud-based data integration service that orchestrates and automates the movement and transformation of data

Think of it like a manufacturing factory running equipment to take the raw materials and transform them into finished goods

What is ADF

Mmmm… but we already have things like Integration Services or Stream Analytics

Two words…

Evolving approaches to analytics

Just four concepts

Linked Services

Datasets

Activities

Pipelines
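
The four concepts can be sketched as plain Python dictionaries. This is only an illustration: the key layout follows the ADF version 1 JSON authoring format as best recalled and should be checked against the official reference, and every name and value (StorageLinkedService, RawGameLogs, CleanGameLogs, GameLogsPipeline, the folder paths, the dates) is hypothetical. The helper unresolved_references is not part of ADF; it is an assumed convenience that checks that each activity only points at datasets that have been defined.

```python
import json

# Linked service: connection information for an external store or compute.
# The connection string is a placeholder, not a real value.
linked_service = {
    "name": "StorageLinkedService",
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {"connectionString": "<connection string>"},
    },
}


def blob_dataset(name: str, folder: str) -> dict:
    """Dataset: a named view of data that lives inside a linked service."""
    return {
        "name": name,
        "properties": {
            "type": "AzureBlob",
            "linkedServiceName": linked_service["name"],
            "typeProperties": {"folderPath": folder},
            "availability": {"frequency": "Hour", "interval": 1},
        },
    }


datasets = [
    blob_dataset("RawGameLogs", "logs/raw"),
    blob_dataset("CleanGameLogs", "logs/clean"),
]

# Activity: one processing step that consumes and produces datasets.
copy_activity = {
    "name": "CopyGameLogs",
    "type": "Copy",
    "inputs": [{"name": "RawGameLogs"}],
    "outputs": [{"name": "CleanGameLogs"}],
    "typeProperties": {
        "source": {"type": "BlobSource"},
        "sink": {"type": "BlobSink"},
    },
    "scheduler": {"frequency": "Hour", "interval": 1},
}

# Pipeline: a logical group of activities with an active period.
pipeline = {
    "name": "GameLogsPipeline",
    "properties": {
        "activities": [copy_activity],
        "start": "2017-04-01T00:00:00Z",
        "end": "2017-04-02T00:00:00Z",
    },
}


def unresolved_references(pipeline: dict, datasets: list) -> set:
    """Return dataset names used by activities but not defined in `datasets`."""
    known = {d["name"] for d in datasets}
    used = set()
    for activity in pipeline["properties"]["activities"]:
        for ref in activity["inputs"] + activity["outputs"]:
            used.add(ref["name"])
    return used - known


print(json.dumps(pipeline, indent=2))
print(unresolved_references(pipeline, datasets))
```

Keeping the definitions as data makes the dependency chain visible: the pipeline names activities, activities name datasets, and datasets name a linked service.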

Just one thing. Scheduling

Pipeline Active Periods

Activity Schedule

Dataset Availability
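
How the three settings interact can be sketched by cutting a pipeline's active period into windows at the cadence shared by the activity schedule and the dataset availability. This is a simplified model, not the service's implementation: it assumes the period starts on a window boundary and ignores any extra alignment options the service offers. The function name slices and the dates are hypothetical.

```python
from datetime import datetime, timedelta


def slices(start: datetime, end: datetime, interval: timedelta):
    """Yield (window_start, window_end) pairs that tile [start, end)."""
    current = start
    # Only emit windows that fit completely inside the active period.
    while current + interval <= end:
        yield current, current + interval
        current += interval


# Pipeline active period (hypothetical): six hours on 1 April 2017.
active_start = datetime(2017, 4, 1, 0)
active_end = datetime(2017, 4, 1, 6)

# Activity schedule and dataset availability both set to "every hour".
hourly = list(slices(active_start, active_end, timedelta(hours=1)))

for window_start, window_end in hourly:
    print(window_start.isoformat(), "->", window_end.isoformat())
```

Each printed window corresponds to one unit of work for the activity and one slice of its output dataset.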

Just one thing. Scheduling

So, recap: when is an Activity executed?
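
One way to read the recap, assuming the version 1 scheduling model, is that an activity runs for a given window when that window lies inside the pipeline's active period and every input dataset slice for the window is ready. The sketch below encodes just that rule; the Slice class, the activity_should_run function, and the dataset name are hypothetical, and the real service applies further conditions (retries, concurrency limits, and so on) that are left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Mapping


@dataclass(frozen=True)
class Slice:
    """One scheduling window of an activity (hypothetical helper type)."""
    start: datetime
    end: datetime


def activity_should_run(
    window: Slice,
    active_start: datetime,
    active_end: datetime,
    inputs_ready: Mapping[str, bool],
) -> bool:
    """Apply the simplified rule: inside the active period, all inputs ready."""
    inside_active_period = active_start <= window.start and window.end <= active_end
    all_inputs_ready = all(inputs_ready.values())
    return inside_active_period and all_inputs_ready


active_start = datetime(2017, 4, 1, 0)
active_end = datetime(2017, 4, 2, 0)
window = Slice(datetime(2017, 4, 1, 9), datetime(2017, 4, 1, 10))

print(activity_should_run(window, active_start, active_end, {"RawGameLogs": True}))
print(activity_should_run(window, active_start, active_end, {"RawGameLogs": False}))
```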

Process large-scale datasets with ADF and Azure Batch

[Diagram: Customer Churn example. Stages: Data Sources, Ingest, Transform & Analyze, Publish. Labels: Azure Blob Storage, Game Log Files, Customer Table, On Premises, Data Mart, Game Logs, Azure DB, Customer, Game Usage, Geocode, Transform, Combine, Analyze, Move, Visualize]

Data Set: a collection of files, a DB table, etc.

Activity: a processing step (Hadoop job, custom code, ML model, etc.)

Pipeline: a sequence of activities (logical group)
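
The legend's idea that a pipeline is a sequence of activities over data sets can be mimicked in a few lines of plain Python. This is a toy model, not ADF code: the ingest, transform, and publish functions, the run_pipeline helper, and the sample game log rows are all made up for illustration.

```python
from typing import Callable, Iterable, List

# An activity is modelled here as any function from one data set to another.
Activity = Callable[[list], list]


def ingest(rows: list) -> list:
    """Copy the raw game log rows unchanged."""
    return list(rows)


def transform(rows: list) -> list:
    """Combine rows into total minutes played per customer."""
    totals = {}
    for customer, minutes in rows:
        totals[customer] = totals.get(customer, 0) + minutes
    return sorted(totals.items())


def publish(rows: list) -> List[dict]:
    """Shape the result as records ready to load into a reporting table."""
    return [{"customer": customer, "minutes": minutes} for customer, minutes in rows]


def run_pipeline(activities: Iterable[Activity], data: list) -> list:
    """Run the activities in order, feeding each output to the next step."""
    for activity in activities:
        data = activity(data)
    return data


# Made-up sample data: (customer, minutes played).
game_logs = [("ana", 30), ("luis", 12), ("ana", 15)]
result = run_pipeline([ingest, transform, publish], game_logs)
print(result)
```

In the real service the data sets live in external stores and the activities run on managed compute; the sketch only shows the chaining.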

Microsoft Ignite

Thanks!
