hadoop engineering bo_f_final

© Hortonworks Inc. 2011

Hadoop Engineering Best Practices

Raja Aluri, Release EngDeepesh Khandelwal, Quality EngRamya Sunil, Quality Eng


Agenda

• Source Mechanics

• Why do System Testing?

• Test Matrix

• Automated Testing Flow

• Test Planning

• Planning your own System Testing

• Q & A

Architecting the Future of Big Data


Apache Hortonworks Partner Source Mechanics

• Hortonworks Open Source Philosophy

• How we do Apache first development

• How we incorporate fixes or features that did not make into apache yet

• How we integrate our partner contributions to the source code

• Bookkeeping of the delta between apache and Hortonworks



Apache-Hortonworks-Partner Source flow


Partner

Apac

he R

ef

HD

P Re

f

Part

ner

HWX

Apac

he R

ef

HD

P

Apache Git

Had

oop

bran

ch-2

Had

oop

bran

ch-2

.4

Issue Type Course of Action

Normal Issue Patch in Apache first

Urgent Issue Patch in HWX Repo first

Read-Write Repository

Read-Only Repository

ContinuousMerges

ContinuousMerges

HDP Build CI

HDP Package

Repo

HDP Maven Repository

Publish Releases

QE Workflow for Testing


Unit Testing

• Test individual parts of the program in isolation, white-box testing• Homogeneous cluster, usually in-memory• One configuration, usually 1 operating system and unsecure• Limited dataset, usually few kilobytes


Unit testing component

A

Unit testing

component C

Unit testing

component B

?? ????

??

DB Interaction

Concurrent user

interaction

Third party connectors

??

??

??


System Testing

• Mimics production environment– Multiple nodes in the cluster– Multiple concurrent users– Different workloads

• Multiple configurations to test• Large dataset, more complex and richer• Encompasses different types of testing

– Functional– Performance, Stress and Reliability– High Availability– Backwards Compatibility– Integration testing– Third party connectors– Upgrade testing



System Testing cont...

• Heterogeneous testing– Cross version testing– Cross operating system testing– Hardware configs like Disk and CPU– Security settings, level of encryption



Test Matrix

• Total of ~15000+ configurations to test!


OS•CentOS•SuSE•Debian•Ubuntu•Windows

JDK•Oracle JDK•OpenJDK•Different version - 1.6.x, 1.7.x, 1.8.x

Security•Disabled•Enabled – MIT-only, AD-only, MIT-AD

•Ranger - enabled/disabled

Encryption•Wire encryption – enabled/disabled

•Transparent Data Encryption – enabled/disabled

DB•Mysql•Oracle•Postgres•MSSQL

File system•HDFS•WASB•Other vendor specific FSs

Others•Tez – enabled/disabled•Slider apps v/s standalone


Automated Testing Flow


Build Job

Apache Repos

Internal Commits

Staging Repo

QE Deploy Trigger

Provision VMs

Deploy HDP Stack

Test Setup & Execution

Test analysis

Continuous Integration Publishing Builds to staging repo

Installer deploying bits from staging repo to test cluster

Bug tracking system


Test Planning

20+ components in the HDP stack and growing!


Test plan

Internal developers

Apache jiras and

community forums

Product Management

Support tickets


Planning your own QATS



Typical user scenarios

• Fresh install• Upgrade stack, going from an earlier release to a newer one• Migration, changing distributions• Applying changes to an existing cluster

– Upgrading hardware in regards to CPU, memory, disks– Changing dependent software pieces like OS, JDK– Changing security settings like turning ON Kerberos, Encryption– Changing component configs in *-site.xml, enabling HA



Planning your own QATS


E2E automation

Preparation phase

• Collect requirements on the stack and workload

• Identify appropriate hardware

CI development phase

• Build in-house CI system for deployment and testing

Testing phase

• Build basic acceptance tests

• End to end automation for your application


Preparation Phase

• Collect the stack requirements– Identify all the stack components that will be installed including the third-party

applications, connectors– Identify the installer– Identify configs

• Hardware selection– Should be scaled appropriately to mimic production environment– Prefer multi-node than single-node with component services distributed

• Collect workload information– Use actual workload whenever possible– If not, simulate the workload, some tools available

– Use rumen to obtain jobtrace from existing clusters

– Use gridmix to generate workload

– Data set size and complexity– Number of concurrent users



CI Development phase

• Implement a CI system– Modularize CI system, eg individual Jenkins jobs for provision, deploy and test

• Determine the cadence of testing• Establish reporting


Provision cluster Deploy Test


Testing Phase

• Basic Acceptance Tests – Basic service check for individual deployed components– Basic acceptance tests to validate integrations

• Establish baseline – to track performance of pipeline components in future

• Compatibility tests (including apps, third party connectors, dashboards etc)

• E2E automation to simulate production workloads



Q & A



Thank You!


hadoop engineering bo_f_final

Engineering

hortonworks page

system testing q

automated testing flow

testing hardware configs

partner source flow

kilobytes page

level of encryption

quality eng page