a continuously deployed hadoop analytics platform?
TRANSCRIPT
A Continuously Deployed Hadoop Analytics Platform?
Graham Gear, Director, Systems Engineering, APJ
Logical Pilot Delivery Pipeline
[Diagram labels]
Environments: Workstation, Development, Pre-Production, Production
Roles (cadence, bugs): Data Scientists: hourly, 100% of bugs
Other labels: Data Science
Operations: Monitor, Provision, Automate
Logical Nascent Delivery Pipeline
[Diagram labels]
Environments: Workstation, Development, Pre-Production, Production
Roles (cadence, bugs): Data Engineers: monthly, 0% of bugs; DS, Analysts, Apps: monthly to yearly, 100% of bugs
Stages: User Acceptance Test
Other labels: Backup Data, Production Workload, Data Science, SLA
Operations: Monitor, Provision, Automate
Governance: Audit, Security, Lineage
Logical Staged Delivery Pipeline
[Diagram labels]
Environments: Workstation, Development, Pre-Production, Production
Roles (cadence, bugs): DevOps: weekly, 10% of bugs; Data Engineers: weekly, 0% of bugs; DS, Analysts, Apps: monthly to yearly, 90% of bugs
Stages: System Smoke Test, User Acceptance Test
Other labels: Backup Data, Production Workload, Data Science, Non-SLA, SLA
Operations: Monitor, Provision, Automate
Governance: Audit, Security, Lineage
Logical Manual Delivery Pipeline
[Diagram labels]
Environments: Workstation, Development, Pre-Production, Production
Roles (cadence, bugs): DevOps: weekly, 10% of bugs; Data Engineers: weekly, 0% of bugs; Data Scientists: weekly to monthly, 60% of bugs; Analysts, Apps: monthly to yearly, 30% of bugs
Stages: System Smoke Test, User Acceptance Test
Other labels: Backup Data, Disaster Recovery, Data Science, Production Workload, SLA
Operations: Monitor, Provision, Automate
Governance: Audit, Security, Lineage
Logical Continuous Delivery Pipeline
[Diagram labels]
Environments: Workstation, Development, Pre-Production, Production
Roles (cadence, bugs): DevOps: hourly to daily, 15% of bugs; Data Engineers: hourly, 70% of bugs; Data Scientists: weekly to monthly, 15% of bugs; Analysts, Apps: weekly to monthly, 0% of bugs
Pipeline stages and components: Source Repo, Release Tag, Light Unit Test, Build, Unit Suite Test, Test, Bake Artifact, Artifact Repo, System Smoke Test, Acceptance Test, Release Artifact, Deploy Pipeline, User Acceptance Test
Other labels: Backup Data, Disaster Recovery, Data Science, Production Workload, Non-SLA, SLA
Operations: Monitor, Provision, Automate
Governance: Audit, Security, Lineage
Physical Continuous Delivery Pipeline
[Diagram labels]
Environments: Development, Pre-Production, Production
Source Repo: Git, Gerrit
Build: Cloudera Director, Jenkins
Artifact Repo: Parcel Repository, Artifactory
Workstation: single-tenant; 1 laptop or desktop; physical; synthetic data; CDH single node; Eclipse, Maven; Linux, OS-X
Test: serial-tenant; <10 nodes; physical or cloud; synthetic data; CDH cluster
Non-SLA: multi-tenant; >10 nodes; physical or cloud; production data; CDH cluster; DS Workbench
SLA: multi-tenant; >10 nodes; physical or cloud; production data; CDH cluster; Tableau, JDBC
Operations: Cloudera BDR, Cloudera Manager
Governance: Cloudera Optimizer, Cloudera Navigator
Data Engineer Development Pipeline
[Diagram: physical continuous delivery pipeline, repeated from the Physical Continuous Delivery Pipeline slide]
1. Create a Maven module from a Maven Archetype, providing a baseline project encoding all corporate standards and targeting a specific production version
2. Develop a dataset ingest and preparation pipeline using Flume, Kafka, Hive and MapReduce using Eclipse and Maven
3. Build a suite of unit tests and synthetic data to exercise the code base, identifying and resolving bugs
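Step 3 can be illustrated with a minimal sketch. It is written in Python rather than the Java and Maven stack the slide names, and the record layout, the `make_synthetic_records` generator and the `prepare` function are all hypothetical stand-ins for a real preparation step. The point is only the shape: seeded synthetic data that includes deliberately bad records, plus unit tests that exercise the cleansing logic.

```python
import random
import unittest


def make_synthetic_records(count, seed=42):
    """Generate deterministic synthetic records, some with a missing reading."""
    rng = random.Random(seed)
    records = []
    for i in range(count):
        # Each record has a 20% chance of a missing reading, to exercise cleansing.
        reading = None if rng.random() < 0.2 else round(rng.uniform(0.0, 100.0), 2)
        records.append({"id": i, "sensor": f"sensor-{rng.randint(1, 3)}", "reading": reading})
    return records


def prepare(records):
    """Hypothetical preparation step: drop records that have no reading."""
    return [r for r in records if r["reading"] is not None]


class PrepareTest(unittest.TestCase):
    def test_drops_missing_readings(self):
        raw = make_synthetic_records(100)
        clean = prepare(raw)
        self.assertTrue(all(r["reading"] is not None for r in clean))
        self.assertLessEqual(len(clean), len(raw))

    def test_synthetic_data_is_repeatable(self):
        # The fixed seed makes every run see the same data, so failures reproduce.
        self.assertEqual(make_synthetic_records(10), make_synthetic_records(10))
```

Saved as a module, the tests can be run with `python -m unittest <module>`. Seeding the generator is the design choice that matters here: a bug found on the workstation can be replayed unchanged on the Test cluster.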
Data Engineer Source Pipeline
[Diagram: physical continuous delivery pipeline, repeated from the Physical Continuous Delivery Pipeline slide]
1. Via Maven, Gerrit and Git interactions, show developer-initiated project source code stages:
• Stage
• Review
• Commit
• Release
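The four stages above might map onto commands roughly as follows. This is a sketch under assumptions: the slide does not say which commands were demonstrated, so the mapping of each stage to Git, Gerrit and Maven commands is a guess, and the Gerrit host `gerrit.example.com`, the branch name and the commit message are hypothetical. The function only builds the command lists; it does not run them.

```python
def source_stage_commands(branch="master", message="Add ingest pipeline",
                          gerrit_host="gerrit.example.com"):
    """Return (stage, commands) pairs in the order the slide lists the stages.

    Assumed mapping (not confirmed by the slides):
      Stage   -> record the change locally with Git
      Review  -> push to Gerrit's refs/for/<branch> to open a review
      Commit  -> approve and submit the change through Gerrit's SSH interface
      Release -> cut a release with the Maven release plugin
    """
    return [
        ("Stage", [
            ["git", "add", "--all"],
            ["git", "commit", "-m", message],
        ]),
        ("Review", [
            ["git", "push", "origin", f"HEAD:refs/for/{branch}"],
        ]),
        ("Commit", [
            # 29418 is Gerrit's default SSH port; HEAD stands in for the change's commit.
            ["ssh", "-p", "29418", gerrit_host,
             "gerrit", "review", "--code-review", "+2", "--submit", "HEAD"],
        ]),
        ("Release", [
            ["mvn", "--batch-mode", "release:prepare", "release:perform"],
        ]),
    ]


for stage, commands in source_stage_commands(branch="develop"):
    for command in commands:
        print(f"{stage:8s} {' '.join(command)}")
```

Keeping the commands as data makes the sequence easy to review or to hand to a runner such as `subprocess.run` once the host and branch are real.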
Automated Bake Pipeline
[Diagram: physical continuous delivery pipeline, repeated from the Physical Continuous Delivery Pipeline slide]
1. Simulate an automatically triggered Jenkins unit test suite, bake and smoke test pipeline against a Director-provisioned Test cluster served by Artifactory and Parcel repositories
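The control flow of such a pipeline can be sketched in a few lines. This is not Jenkins configuration: `run_pipeline` is a hypothetical helper, and the stages are stand-in callables where a real job would invoke Maven, the bake step and checks against the Test cluster. It shows only the gating behaviour, where a failed stage stops everything after it.

```python
def run_pipeline(stages):
    """Run (name, callable) stages in order; stop at the first stage that fails."""
    results = []
    for name, stage in stages:
        passed = bool(stage())
        results.append((name, passed))
        if not passed:
            break  # later stages never run, so a bad artifact is not released
    return results


# Stand-in stages; the smoke test is made to fail to show the gate closing.
stages = [
    ("unit test suite", lambda: True),
    ("bake artifact", lambda: True),
    ("system smoke test", lambda: False),
    ("release artifact", lambda: True),
]

results = run_pipeline(stages)
for name, passed in results:
    print(f"{name}: {'passed' if passed else 'FAILED'}")
```

With the simulated smoke test failure, the release stage is never reached.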
Automated Deploy & Test Pipeline
[Diagram: physical continuous delivery pipeline, repeated from the Physical Continuous Delivery Pipeline slide]
1. Show deploy, smoke and user acceptance test stages, creating the operational Manager dashboards and Navigator metadata
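One possible form for the smoke stage is a check on service health after deploy. The sketch below assumes a payload shaped like a Cloudera Manager API service listing, with `items` entries carrying `name`, `serviceState` and `healthSummary`; those field names and values are an assumption to verify against the Manager API documentation, and the sample payload is illustrative, not captured from a cluster. No HTTP call is made.

```python
def unhealthy_services(services_payload):
    """Return names of services that are not started or not reporting GOOD health."""
    failing = []
    for service in services_payload.get("items", []):
        started = service.get("serviceState") == "STARTED"
        healthy = service.get("healthSummary") == "GOOD"
        if not (started and healthy):
            failing.append(service.get("name", "<unnamed>"))
    return failing


# Illustrative payload only; field names are assumed, values are made up.
sample = {
    "items": [
        {"name": "hdfs", "serviceState": "STARTED", "healthSummary": "GOOD"},
        {"name": "impala", "serviceState": "STARTED", "healthSummary": "CONCERNING"},
        {"name": "hive", "serviceState": "STOPPED", "healthSummary": "DISABLED"},
    ]
}

failing = unhealthy_services(sample)
smoke_test_passed = not failing
print("smoke test passed" if smoke_test_passed else f"smoke test failed: {failing}")
```

Treating anything other than started and GOOD as a failure is deliberately strict; a real gate might tolerate a CONCERNING state on a non-SLA cluster.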
Data Scientist & Analyst Dev Pipeline
[Diagram: physical continuous delivery pipeline, repeated from the Physical Continuous Delivery Pipeline slide]
1. Query dataset using Impala via Hue, capture SQL logs and feed them back into Optimizer and project, show dependency verification under schema evolution
2. Analyse dataset using Python and Ibis via the DS Workbench application feeding back into project, show dependency checking during dataset evolution
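The dependency verification in both steps can be reduced to a small check, sketched here under assumptions. It presumes the columns each captured query depends on have already been extracted (the slides do not say how); the query names, column names and schemas are hypothetical, and `broken_dependencies` is an illustrative helper, not part of Optimizer, Impala or Ibis.

```python
def broken_dependencies(query_columns, schema):
    """Map each query name to the sorted columns it needs that the schema lacks."""
    broken = {}
    for query, columns in query_columns.items():
        missing = sorted(set(columns) - set(schema))
        if missing:
            broken[query] = missing
    return broken


# Hypothetical columns referenced by captured queries, keyed by query name.
query_columns = {
    "daily_totals": ["event_date", "amount"],
    "top_customers": ["customer_id", "amount"],
}

schema_v1 = {"event_date": "string", "customer_id": "bigint", "amount": "double"}
# Evolved schema: customer_id has been renamed to account_id.
schema_v2 = {"event_date": "string", "account_id": "bigint", "amount": "double"}

print(broken_dependencies(query_columns, schema_v1))  # {}
print(broken_dependencies(query_columns, schema_v2))  # {'top_customers': ['customer_id']}
```

Run as a unit test in the project, a check like this fails the build when a schema change would break a query that analysts already rely on.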
Application Delivery Pipeline
[Diagram: physical continuous delivery pipeline, repeated from the Physical Continuous Delivery Pipeline slide]
1. Application rev pipeline, show comparison to previous version
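The slide does not say what is compared between versions. One plausible reading, sketched below as an assumption, is a comparison of run metrics from the same synthetic input under the previous and the new application version. The metric names and values are hypothetical, and `compare_versions` is an illustrative helper.

```python
def compare_versions(previous, current):
    """Return {metric: (previous, current)} for metrics that differ or exist on one side only."""
    changed = {}
    for metric in sorted(set(previous) | set(current)):
        before, after = previous.get(metric), current.get(metric)
        if before != after:
            changed[metric] = (before, after)
    return changed


# Hypothetical metrics from running each version over the same synthetic data.
previous = {"rows_ingested": 1000, "rows_rejected": 12, "partitions": 4}
current = {"rows_ingested": 1000, "rows_rejected": 0, "partitions": 4, "rows_deduplicated": 7}

for metric, (before, after) in compare_versions(previous, current).items():
    print(f"{metric}: {before} -> {after}")
```

A metric present in only one version is reported with `None` on the other side, which makes added or removed behaviour visible alongside changed values.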
Platform Delivery Pipeline
[Diagram: physical continuous delivery pipeline, repeated from the Physical Continuous Delivery Pipeline slide]
1. Platform rev pipeline, show comparison to previous version
Logical Continuous Delivery Pipeline
[Diagram: repeated from the earlier Logical Continuous Delivery Pipeline slide]
Questions?
• Cloudera Framework Example: https://github.com/ggear/cloudera-framework
• Cloudera Parcel Maven Plugin: https://github.com/ggear/cloudera-parcel
• Cloudera Manager API: https://cloudera.github.io/cm_api/apidocs/v12/index.html
• Cloudera Navigator API: http://cloudera.github.io/navigator/apidocs/v3
• Cloudera Director: https://director.cloudera.com
• Cloudera Optimizer: https://optimizer.cloudera.com