A Continuously Deployed Hadoop Analytics Platform? Graham Gear, Director, Systems Engineering, APJ

Uploaded by: hadoop-summit

Posted on 07-Jan-2017

TRANSCRIPT

Page 1: A Continuously Deployed Hadoop Analytics Platform?

A Continuously Deployed Hadoop Analytics Platform? Graham Gear, Director, Systems Engineering, APJ

Page 2
Page 3

Logical Pilot Delivery Pipeline

Operations

Monitor

Provision

Automate

Production

Data Scientists

Hourly

100% Bugs

Production Data Science

Pre-Production Development

Page 4

Production

Workstation

Logical Nascent Delivery Pipeline

Data Engineers

Monthly

0% Bugs

Development

User Acceptance Test

Backup Data

Production Workload

Data Science

DS, Analysts, Apps

Monthly-Yearly

100% Bugs

Operations

Monitor

Provision

Automate

Development Production

Governance

Audit

Security

Lineage

Pre-Production

Page 5

SLA

Workstation

Logical Staged Delivery Pipeline

DevOps

Weekly

10% Bugs

System Smoke Test

Data Engineers

Weekly

0% Bugs

Development

DS, Analysts, Apps

Monthly-Yearly

90% Bugs

Operations

Monitor

Provision

Automate

Development Pre-Production Production

Governance

Audit

Security

Lineage

Production

User Acceptance Test

Backup Data

Production Workload

Data Science

Page 6

Non-SLA

SLA

Workstation

Logical Manual Delivery Pipeline

DevOps

Weekly

10% Bugs

System Smoke Test

Data Engineers

Weekly

0% Bugs

Development

User Acceptance Test

Backup Data

Disaster Recovery

Data Science

Data Scientists

Weekly-Monthly

60% Bugs

Operations

Monitor

Provision

Automate

Development Pre-Production Production

Governance

Audit

Security

Lineage

SLA

Analysts, Apps

Monthly-Yearly

30% Bugs

Production Workload

Page 7

Logical Continuous Delivery Pipeline

Test

Artifact Repo

Build

Acceptance Test

Release Artifact

Unit Suite Test

Bake Artifact

Deploy Pipeline

DevOps

Hourly–Daily

15% Bugs

System Smoke Test

Workstation

Source Repo

Release Tag

Data Engineers

Hourly

70% Bugs

Light Unit Test

Development

Non-SLA

User Acceptance Test

Backup Data

Disaster Recovery

Data Science

Data Scientists

Weekly-Monthly

15% Bugs

Acceptance Test

Operations

Monitor

Provision

Automate

Development Pre-Production Production

Governance

Audit

Security

Lineage

SLA

Analysts, Apps

Weekly-Monthly

0% Bugs

Production Workload
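The "N% Bugs" labels on the logical pipeline slides (pages 3 to 7) can be read as the share of defects first surfaced by each audience. Under that reading, which is an interpretation rather than something the slides define, a short Python sketch can tabulate the five pipelines and compute how many bugs are caught by the engineering roles (Data Engineers and DevOps) before data scientists, analysts or applications see them. The percentages and audience labels are transcribed from the slides; the `PIPELINES` table, `ENGINEERING_ROLES` set and `engineering_catch_rate` function are hypothetical.

```python
# Bug-share figures transcribed from the logical pipeline slides.
# Reading "N% Bugs" as "share of defects first found by this audience"
# is an assumption; the slides do not define the metric.
PIPELINES: dict[str, dict[str, int]] = {
    "Pilot": {"Data Scientists": 100},
    "Nascent": {"Data Engineers": 0, "DS, Analysts, Apps": 100},
    "Staged": {"DevOps": 10, "Data Engineers": 0, "DS, Analysts, Apps": 90},
    "Manual": {"DevOps": 10, "Data Engineers": 0,
               "Data Scientists": 60, "Analysts, Apps": 30},
    "Continuous": {"Data Engineers": 70, "DevOps": 15,
                   "Data Scientists": 15, "Analysts, Apps": 0},
}

# Audiences treated as "engineering" in this sketch (an assumption).
ENGINEERING_ROLES = {"Data Engineers", "DevOps"}


def engineering_catch_rate(shares: dict[str, int]) -> int:
    """Return the percentage of bugs surfaced by engineering roles."""
    if sum(shares.values()) != 100:
        raise ValueError("bug shares must sum to 100%")
    return sum(pct for role, pct in shares.items() if role in ENGINEERING_ROLES)


if __name__ == "__main__":
    for name, shares in PIPELINES.items():
        print(f"{name:<10} {engineering_catch_rate(shares):>3}% caught by engineering")
```

Under that reading, the continuous pipeline moves 85% of defect discovery to Data Engineers and DevOps, compared with at most 10% in the pilot, nascent, staged and manual pipelines.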

Page 8

Source Repo

Git

Gerrit

Physical Continuous Delivery Pipeline

Serial-tenant

<10 nodes

Physical, Cloud

Single-tenant

1 Laptop, Desktop

Physical

Development Pre-Production Production

Workstation

Synthetic Data

CDH Single Node

Eclipse, Maven

Linux, OS-X

Test

CDH Cluster

Synthetic Data

Build

Cloudera Director

Jenkins

Artifact Repo

Parcel Repository

Artifactory

Multi-tenant

>10 nodes

Physical, Cloud

Non-SLA Production Data

CDH Cluster

DS Workbench

Multi-tenant

>10 nodes

Physical, Cloud

SLA Production Data

CDH Cluster

Tableau JDBC

Operations

Cloudera BDR

Cloudera Manager

Governance

Cloudera Optimizer

Cloudera Navigator
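The physical pipeline slide pairs each environment with a tenancy model, a cluster size, a hosting option, a platform and a class of data. A minimal Python sketch, assuming those pairings are meant as placement rules, encodes the tiers as data and answers questions such as "may production data live on a workstation?". The `Tier` class, the tier keys and the `allowed_tiers` / `is_allowed` helpers are hypothetical; the attribute values are transcribed from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One environment tier from the physical pipeline slide (hypothetical model)."""
    stage: str
    tenancy: str
    size: str
    hosting: tuple[str, ...]
    platform: str
    data: str


# Attribute values are transcribed from the slide; reading them as
# enforceable placement rules is an assumption of this sketch.
TIERS: dict[str, Tier] = {
    "Workstation": Tier("Development", "Single-tenant", "1 Laptop, Desktop",
                        ("Physical",), "CDH Single Node", "Synthetic Data"),
    "Test": Tier("Pre-Production", "Serial-tenant", "<10 nodes",
                 ("Physical", "Cloud"), "CDH Cluster", "Synthetic Data"),
    "Non-SLA": Tier("Production", "Multi-tenant", ">10 nodes",
                    ("Physical", "Cloud"), "CDH Cluster", "Non-SLA Production Data"),
    "SLA": Tier("Production", "Multi-tenant", ">10 nodes",
                ("Physical", "Cloud"), "CDH Cluster", "SLA Production Data"),
}


def allowed_tiers(data: str) -> list[str]:
    """Return the tiers, in pipeline order, that may hold the given class of data."""
    return [name for name, tier in TIERS.items() if tier.data == data]


def is_allowed(tier_name: str, data: str) -> bool:
    """Check a proposed placement, e.g. SLA production data on a workstation."""
    return tier_name in allowed_tiers(data)
```

Keeping the tiers as data rather than scattering the rules through scripts lets every pipeline stage consult the same definition before provisioning or deploying.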

Page 9
Page 10

Source Repo

Git

Gerrit

Data Engineer Development Pipeline

Serial-tenant

<10 nodes

Physical, Cloud

Pre-Production Production

Test

CDH Cluster

Synthetic Data

Build

Cloudera Director

Jenkins

Artifact Repo

Parcel Repository

Artifactory

Multi-tenant

>10 nodes

Physical, Cloud

Non-SLA Production Data

CDH Cluster

DS Workbench

Multi-tenant

>10 nodes

Physical, Cloud

SLA Production Data

CDH Cluster

Tableau JDBC

Operations

Cloudera BDR

Cloudera Manager

Governance

Cloudera Optimizer

Cloudera Navigator

Single-tenant

1 Laptop, Desktop

Physical

Workstation

Synthetic Data

CDH Single Node

Eclipse, Maven

Linux, OS-X

Development

1.  Create a Maven module from a Maven archetype, providing a baseline project that encodes all corporate standards and targets a specific production version

2.  Develop a dataset ingest and preparation pipeline with Flume, Kafka, Hive and MapReduce, using Eclipse and Maven

3.  Build a suite of unit tests and synthetic data to exercise the codebase, identifying and resolving bugs
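Step 3 pairs unit tests with synthetic data. The workstation toolchain on the slide is Eclipse and Maven, so a real implementation would more likely be Java; the minimal Python sketch below only illustrates the idea. It generates reproducible synthetic records from a fixed seed and exercises a hypothetical preparation function, `prepare_record`, that parses one ingested `id,timestamp,value` line. The record format, both functions and the test class are assumptions, not part of the framework.

```python
import random
import unittest


def prepare_record(line: str) -> dict:
    """Hypothetical preparation step: parse 'id,timestamp,value' into a dict.

    Raises ValueError on malformed input so bad records can be quarantined.
    """
    fields = line.strip().split(",")
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}")
    record_id, timestamp, value = fields
    return {"id": record_id, "timestamp": int(timestamp), "value": float(value)}


def synthetic_lines(count: int, seed: int = 42) -> list[str]:
    """Generate reproducible synthetic input lines from a fixed seed."""
    rng = random.Random(seed)
    return [
        f"rec{i},{1_480_000_000 + rng.randint(0, 86_400)},{rng.uniform(0, 100):.2f}"
        for i in range(count)
    ]


class PrepareRecordTest(unittest.TestCase):
    def test_synthetic_lines_parse(self):
        # Every synthetic record should parse and stay within the generated range.
        for line in synthetic_lines(100):
            record = prepare_record(line)
            self.assertTrue(0.0 <= record["value"] <= 100.0)

    def test_malformed_line_rejected(self):
        with self.assertRaises(ValueError):
            prepare_record("rec1,not-enough")
```

Run the test case with `python -m unittest` against the module it lives in. The fixed seed keeps a failing case reproducible on any workstation.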

Page 11

Source Repo

Git

Gerrit

Data Engineer Source Pipeline

Serial-tenant

<10 nodes

Physical, Cloud

Pre-Production Production

Test

CDH Cluster

Synthetic Data

Build

Cloudera Director

Jenkins

Artifact Repo

Parcel Repository

Artifactory

Multi-tenant

>10 nodes

Physical, Cloud

Non-SLA Production Data

CDH Cluster

DS Workbench

Multi-tenant

>10 nodes

Physical, Cloud

SLA Production Data

CDH Cluster

Tableau JDBC

Operations

Cloudera BDR

Cloudera Manager

Governance

Cloudera Optimizer

Cloudera Navigator

Single-tenant

1 Laptop, Desktop

Physical

Workstation

Synthetic Data

CDH Single Node

Eclipse, Maven

Linux, OS-X

Development

1.  Via Maven, Gerrit and Git interactions, show the developer-initiated project source code stages:

•  Stage
•  Review
•  Commit
•  Release
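The four source stages form a strict sequence. One plausible mapping, assumed here rather than taken from the slide, is: stage with `git add`, review by pushing to Gerrit's `refs/for/<branch>`, commit when the approved change is submitted and merged into the target branch, and release by tagging. A small Python sketch models that ordering as a state machine; the `SourceChange` class is hypothetical, and it simply rejects any attempt to skip a stage, such as committing without review.

```python
from enum import Enum
from typing import List, Optional


class Stage(Enum):
    """The developer-initiated source stages from the slide, in order."""
    STAGE = 1
    REVIEW = 2
    COMMIT = 3
    RELEASE = 4


class SourceChange:
    """Hypothetical change record that may only advance one stage at a time."""

    def __init__(self, change_id: str) -> None:
        self.change_id = change_id
        self.stage: Optional[Stage] = None  # None means not yet staged
        self.history: List[Stage] = []

    def advance(self, target: Stage) -> None:
        """Move to target, refusing to skip or repeat a stage."""
        current = 0 if self.stage is None else self.stage.value
        if target.value != current + 1:
            from_name = self.stage.name if self.stage is not None else "UNSTAGED"
            raise ValueError(
                f"{self.change_id}: cannot move from {from_name} to {target.name}"
            )
        self.stage = target
        self.history.append(target)
```

In a real setup, Gerrit's review and submit rules enforce this ordering on the server; the sketch only makes the intended sequence explicit.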

Page 12

Source Repo

Git

Gerrit

Automated Bake Pipeline

Serial-tenant

<10 nodes

Physical, Cloud

Single-tenant

1 Laptop, Desktop

Physical

Production

Workstation

Synthetic Data

CDH Single Node

Eclipse, Maven

Linux, OS-X

Test

CDH Cluster

Synthetic Data

Build

Cloudera Director

Jenkins

Artifact Repo

Parcel Repository

Artifactory

Multi-tenant

>10 nodes

Physical, Cloud

Non-SLA Production Data

CDH Cluster

DS Workbench

Multi-tenant

>10 nodes

Physical, Cloud

SLA Production Data

CDH Cluster

Tableau JDBC

Operations

Cloudera BDR

Cloudera Manager

Governance

Cloudera Optimizer

Cloudera Navigator

1.  Simulate an automatically triggered Jenkins pipeline that runs the unit test suite, bakes the artifact and smoke tests it against a Director-provisioned Test cluster served by the Artifactory and Parcel repositories

Pre-Production Development
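The bake pipeline is a chain of gated stages: unit suite test, bake, then smoke test on the Test cluster. A minimal Python sketch of such a gate chain, assuming each stage can be reduced to a callable that returns pass or fail, stops at the first failing stage so a broken artifact is never released. The stage names follow the logical pipeline slide; `run_pipeline` and the stage functions are hypothetical stand-ins for Jenkins jobs.

```python
from typing import Callable, List, Tuple

# A stage is a (name, check) pair; check returns True on success.
PipelineStage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[PipelineStage]) -> Tuple[bool, List[str]]:
    """Run stages in order, stopping at the first failure.

    Returns (succeeded, names of the stages that ran).
    """
    ran: List[str] = []
    for name, check in stages:
        ran.append(name)
        if not check():
            return False, ran
    return True, ran


# Hypothetical stand-ins for the Jenkins-driven steps on the slide.
def unit_suite_test() -> bool:
    return True


def bake_artifact() -> bool:
    return True


def system_smoke_test() -> bool:
    return False  # simulate a smoke test failing on the Test cluster


if __name__ == "__main__":
    ok, ran = run_pipeline([
        ("Unit Suite Test", unit_suite_test),
        ("Bake Artifact", bake_artifact),
        ("System Smoke Test", system_smoke_test),
        ("Release Artifact", lambda: True),
    ])
    print(ok, ran)  # False ['Unit Suite Test', 'Bake Artifact', 'System Smoke Test']
```

Because the smoke test fails, "Release Artifact" never runs, which is the property a continuous delivery gate is meant to guarantee.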

Page 13

Automated Deploy & Test Pipeline

Serial-tenant

<10 nodes

Physical, Cloud

Single-tenant

1 Laptop, Desktop

Physical

Development

Workstation

Synthetic Data

CDH Single Node

Eclipse, Maven

Linux, OS-X

Test

CDH Cluster

Synthetic Data

Build

Cloudera Director

Jenkins

Artifact Repo

Parcel Repository

Artifactory

Multi-tenant

>10 nodes

Physical, Cloud

Non-SLA Production Data

CDH Cluster

DS Workbench

Multi-tenant

>10 nodes

Physical, Cloud

SLA Production Data

CDH Cluster

Tableau JDBC

Operations

Cloudera BDR

Cloudera Manager

Governance

Cloudera Optimizer

Cloudera Navigator

Source Repo

Git

Gerrit

1.  Show the deploy, smoke test and user acceptance test stages, creating the operational Cloudera Manager dashboards and Navigator metadata

Pre-Production Production
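A deploy stage is typically followed by a smoke test that waits for the freshly deployed services to report healthy. A minimal Python sketch, assuming a hypothetical `is_healthy` callable (in practice this might query service health through the Cloudera Manager API, which is not shown here), polls until the check passes or a deadline expires. The `wait_until_healthy` function and its parameters are hypothetical; the clock and sleep functions are injectable so the logic can be tested without real waiting.

```python
import time
from typing import Callable


def wait_until_healthy(
    is_healthy: Callable[[], bool],
    timeout_s: float = 300.0,
    interval_s: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_healthy() until it returns True or timeout_s elapses.

    Returns True if the deployment became healthy in time, else False.
    """
    deadline = clock() + timeout_s
    while True:
        if is_healthy():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Using a monotonic clock keeps the deadline immune to wall-clock adjustments on the build host, and a `False` result gives the pipeline a clear signal to stop before user acceptance testing.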

Page 14

Data Scientist & Analyst Dev Pipeline

Serial-tenant

<10 nodes

Physical, Cloud

Single-tenant

1 Laptop, Desktop

Physical

Development Pre-Production

Workstation

Synthetic Data

CDH Single Node

Eclipse, Maven

Linux, OS-X

Test

CDH Cluster

Synthetic Data

Multi-tenant

>10 nodes

Physical, Cloud

Non-SLA Production Data

CDH Cluster

DS Workbench

Multi-tenant

>10 nodes

Physical, Cloud

SLA Production Data

CDH Cluster

Tableau JDBC

Operations

Cloudera BDR

Cloudera Manager

Governance

Cloudera Optimizer

Cloudera Navigator

Source Repo

Git

Gerrit

Build

Cloudera Director

Jenkins

Artifact Repo

Parcel Repository

Artifactory

1.  Query the dataset using Impala via Hue, capture the SQL logs and feed them back into Optimizer and the project, showing dependency verification under schema evolution

2.  Analyse the dataset using Python and Ibis via the DS Workbench application, feeding the results back into the project, showing dependency checking during dataset evolution

Production
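Both steps check dependencies as a dataset's schema evolves. A minimal Python sketch, assuming the captured SQL logs have already been reduced to the set of `(table, column)` pairs each query reads (a parsing step not shown), reports which queries a proposed schema would break. The `broken_queries` function and the table, column and query names are hypothetical.

```python
from typing import Dict, Set, Tuple

Column = Tuple[str, str]  # (table, column)


def broken_queries(
    dependencies: Dict[str, Set[Column]],
    new_schema: Dict[str, Set[str]],
) -> Dict[str, Set[Column]]:
    """Map each query id to the columns it reads that the new schema lacks."""
    broken: Dict[str, Set[Column]] = {}
    for query_id, columns in dependencies.items():
        missing = {
            (table, column)
            for table, column in columns
            if column not in new_schema.get(table, set())
        }
        if missing:
            broken[query_id] = missing
    return broken


# Hypothetical example: "region" is renamed to "region_code" in the evolved schema.
deps = {
    "q_daily_totals": {("sales", "amount"), ("sales", "day")},
    "q_by_region": {("sales", "region"), ("sales", "amount")},
}
evolved = {"sales": {"amount", "day", "region_code"}}
print(broken_queries(deps, evolved))  # {'q_by_region': {('sales', 'region')}}
```

Running a check like this in the pipeline before a schema change is promoted turns a silent breakage in an analyst's query into a failed build.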

Page 15

Source Repo

Git

Gerrit

Application Delivery Pipeline

Serial-tenant

<10 nodes

Physical, Cloud

Single-tenant

1 Laptop, Desktop

Physical

Development Pre-Production Production

Workstation

Synthetic Data

CDH Single Node

Eclipse, Maven

Linux, OS-X

Test

CDH Cluster

Synthetic Data

Build

Cloudera Director

Jenkins

Artifact Repo

Parcel Repository

Artifactory

Multi-tenant

>10 nodes

Physical, Cloud

Non-SLA Production Data

CDH Cluster

DS Workbench

Multi-tenant

>10 nodes

Physical, Cloud

SLA Production Data

CDH Cluster

Tableau JDBC

Operations

Cloudera BDR

Cloudera Manager

Governance

Cloudera Optimizer

Cloudera Navigator

1.  Application rev pipeline: show a comparison to the previous version
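One simple way to compare an application revision with the previous one is to run both over the same input and diff summary metrics. A minimal Python sketch, assuming each run produces a dictionary of named numeric metrics, flags any metric whose relative change exceeds a tolerance, as well as metrics that appear in only one version. The `compare_runs` function, its tolerance default and any metric names are hypothetical.

```python
import math
from typing import Dict, List


def compare_runs(
    previous: Dict[str, float],
    current: Dict[str, float],
    rel_tol: float = 0.01,
) -> List[str]:
    """Return human-readable differences between two runs' metrics."""
    diffs: List[str] = []
    for name in sorted(previous.keys() | current.keys()):
        if name not in current:
            diffs.append(f"{name}: missing in current version")
        elif name not in previous:
            diffs.append(f"{name}: new in current version")
        elif not math.isclose(previous[name], current[name], rel_tol=rel_tol):
            diffs.append(f"{name}: {previous[name]} -> {current[name]}")
    return diffs
```

An empty result means the new revision reproduces the previous one within tolerance; a non-empty one is a prompt for review rather than proof of a bug, since some changes are intended.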

Page 16

Platform Delivery Pipeline

Serial-tenant

<10 nodes

Physical, Cloud

Development Pre-Production Production

Test

CDH Cluster

Synthetic Data

Build

Cloudera Director

Jenkins

Artifact Repo

Parcel Repository

Artifactory

Multi-tenant

>10 nodes

Physical, Cloud

Non-SLA Production Data

CDH Cluster

DS Workbench

Multi-tenant

>10 nodes

Physical, Cloud

SLA Production Data

CDH Cluster

Tableau JDBC

Operations

Cloudera BDR

Cloudera Manager

Governance

Cloudera Optimizer

Cloudera Navigator

Single-tenant

1 Laptop, Desktop

Physical

Workstation

Synthetic Data

CDH Single Node

Eclipse, Maven

Linux, OS-X

Source Repo

Git

Gerrit

1.  Platform rev pipeline: show a comparison to the previous version
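For a platform revision, the comparison is between component versions rather than application output. A minimal Python sketch, assuming each platform build is described by a hypothetical manifest that maps component names to dotted numeric version strings (for example, the parcels served from the parcel repository), classifies each change as an upgrade, downgrade, addition or removal. The `parse_version` and `diff_manifests` functions and the manifest format are assumptions.

```python
from typing import Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version such as '5.9.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def diff_manifests(old: Dict[str, str], new: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (component, change) pairs, sorted by component name."""
    changes: List[Tuple[str, str]] = []
    for name in sorted(old.keys() | new.keys()):
        if name not in new:
            changes.append((name, "removed"))
        elif name not in old:
            changes.append((name, "added"))
        else:
            before, after = parse_version(old[name]), parse_version(new[name])
            if after > before:
                changes.append((name, f"upgraded {old[name]} -> {new[name]}"))
            elif after < before:
                changes.append((name, f"downgraded {old[name]} -> {new[name]}"))
    return changes
```

Comparing integer tuples rather than raw strings matters here: as strings, "5.10.0" sorts before "5.9.0", which would report an upgrade as a downgrade.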

Page 17

Page 18
Page 19
Page 20
Page 21

Questions?

• Cloudera Framework Example: https://github.com/ggear/cloudera-framework

• Cloudera Parcel Maven Plugin: https://github.com/ggear/cloudera-parcel

• Cloudera Manager API: https://cloudera.github.io/cm_api/apidocs/v12/index.html

• Cloudera Navigator API: http://cloudera.github.io/navigator/apidocs/v3

• Cloudera Director: https://director.cloudera.com

• Cloudera Optimizer: https://optimizer.cloudera.com

[email protected]