Building Hadoop Based Big Data Environment Evans Ye @ TWHUG 2013/12/14

Upload: evans-ye

Post on 06-May-2015

Page 1: Building hadoop based big data environment

Building Hadoop Based Big Data Environment

Evans Ye @ TWHUG

2013/12/14

Page 2: Building hadoop based big data environment

Who am I

• Evans Ye @
• Dumbo Team
• http://dumbointaiwan.blogspot.tw/

04/11/2023 Copyright 2013 Trend Micro Inc.

Page 3: Building hadoop based big data environment

Agenda

• Building your own Hadoop version
• Hadoop deployment
• Hadoop release engineering
• The development environment
• Bigtop puppet

Page 4: Building hadoop based big data environment

Why build our own version

• Add your own patches at any time
  – The community has to maintain backward compatibility, which takes much more time and effort.
• Pull official patches into the version you currently run
  – You may not upgrade your Hadoop version frequently, but you may still need a specific patch.
• Flexibility and business-required features

Page 5: Building hadoop based big data environment

As a Beginner

Page 6: Building hadoop based big data environment

What’s your work?

Build Hadoop infrastructure

Page 7: Building hadoop based big data environment

I thought you just needed to yum install Hadoop…

Page 8: Building hadoop based big data environment

Brute force

• git clone
• Make some changes
• Build a binary tarball

core-site.xml, hdfs-site.xml, mapred-site.xml, …

How do you version control them?

Page 9: Building hadoop based big data environment

Bigtop

Page 10: Building hadoop based big data environment

How Bigtop helps you

• Apache Hadoop app developers:
  – Run a pseudo-distributed Hadoop cluster to test your code on.
• Vendors:
  – Build your own Apache Hadoop distribution, customized from Apache Bigtop bits.
• Packaging, deployment, integration testing

Page 11: Building hadoop based big data environment

Supported Linux distros

• Ubuntu 10.10
• CentOS 5/6
• Fedora 18
• Mageia 1
• openSUSE 12.2

Page 12: Building hadoop based big data environment

Build

• Build hadoop-common (see BUILDING.txt)
  – hadoop-common$ mvn package -Pdist,docs,src,native -Dtar
• Prepare your src tar in bigtop
• bigtop$ make hadoop-rpm
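The build steps on this slide can be chained in a small driver script. The following is a minimal sketch, not the presenter's tooling: the `hadoop-common` and `bigtop` working directories and the `run_build` helper are assumptions, and the "prepare your src tar" step is left as a comment because the slide does not say how it is done. By default the script only prints the commands (dry run).

```python
import subprocess

# Commands as given on the slide; the working directories are assumed checkout locations.
BUILD_STEPS = [
    ("hadoop-common", ["mvn", "package", "-Pdist,docs,src,native", "-Dtar"]),
    # The slide's "prepare your src tar in bigtop" step belongs here; its details are not given.
    ("bigtop", ["make", "hadoop-rpm"]),
]


def run_build(steps=BUILD_STEPS, dry_run=True):
    """Run each (directory, command) pair in order and return what was planned or run."""
    executed = []
    for workdir, command in steps:
        executed.append((workdir, " ".join(command)))
        if dry_run:
            print(f"[dry run] ({workdir}) $ {' '.join(command)}")
        else:
            # check=True raises CalledProcessError on a non-zero exit status,
            # so a failed Maven build stops the RPM build from starting.
            subprocess.run(command, cwd=workdir, check=True)
    return executed


planned = run_build()
```

Calling `run_build(dry_run=False)` from a directory that contains both checkouts would execute the commands for real.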

Page 13: Building hadoop based big data environment

Hadoop Deployment

Page 14: Building hadoop based big data environment

Configuration files

• Hadoop related config
  – core-site.xml
  – hdfs-site.xml
  – mapred-site.xml
  – log4j.properties
  – hadoop-env.sh
  – fair-scheduler.xml
  – rack-topology
  – hadoop-metrics.properties
  – taskcontroller.cfg

Page 15: Building hadoop based big data environment

Local directories

• Hadoop related files and directories
  – Namenode metadata
    • /name/1, /name/2
  – Datanode
    • /data/1, /data/2, /data/3, /data/4
  – Tasktracker
    • /mapred/1/local, /mapred/2/local
  – …
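To show how these directories end up in the configuration files, here is a hedged sketch that renders them as Hadoop-style `<configuration>` XML. The directory lists come from the slide; the property names (`dfs.name.dir`, `dfs.data.dir`, `mapred.local.dir`) are the Hadoop 1.x names and are an assumption about the version in use, and `render_site_xml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Directories from the slide, keyed by config file and (assumed) Hadoop 1.x property name.
LOCAL_DIRS = {
    "hdfs-site.xml": {
        "dfs.name.dir": ["/name/1", "/name/2"],
        "dfs.data.dir": ["/data/1", "/data/2", "/data/3", "/data/4"],
    },
    "mapred-site.xml": {
        "mapred.local.dir": ["/mapred/1/local", "/mapred/2/local"],
    },
}


def render_site_xml(properties):
    """Render {property: [dirs]} as a Hadoop <configuration> XML string."""
    root = ET.Element("configuration")
    for name, dirs in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        # Multiple directories are passed as one comma-separated value.
        ET.SubElement(prop, "value").text = ",".join(dirs)
    return ET.tostring(root, encoding="unicode")


rendered = {name: render_site_xml(props) for name, props in LOCAL_DIRS.items()}
for filename, xml_text in rendered.items():
    print(filename, xml_text)
```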

Page 16: Building hadoop based big data environment

More hadoop ecosystem

Page 17: Building hadoop based big data environment

Problems to solve

• Lots of nodes need to be configured
• The less human involvement, the fewer mistakes
• Configuration changes quite often
  – adjust the fair scheduler
  – enable/disable short-circuit reads
  – try more performance-tuning configurations

Page 18: Building hadoop based big data environment

Hadooppet

Page 19: Building hadoop based big data environment

What is Puppet?

• An IT automation tool that helps system administrators automate many repetitive tasks
• You only need to define the desired state
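The "desired state" idea can be illustrated without Puppet itself. The sketch below is plain Python, not Puppet code: each hypothetical `ensure_*` function checks the current state first and changes something only if it differs, so running it twice is harmless. The file name `example.conf` and its content are made up for the illustration.

```python
import os
import tempfile


def ensure_directory(path):
    """Make sure `path` is a directory; return True only if something changed."""
    if os.path.isdir(path):
        return False
    os.makedirs(path)
    return True


def ensure_file(path, content):
    """Make sure `path` holds exactly `content`; return True only if it was written."""
    try:
        with open(path) as handle:
            if handle.read() == content:
                return False
    except FileNotFoundError:
        pass
    with open(path, "w") as handle:
        handle.write(content)
    return True


def converge(base):
    """Apply the desired state under `base` and report which resources changed."""
    return {
        "data_dir": ensure_directory(os.path.join(base, "data", "1")),
        "conf_file": ensure_file(os.path.join(base, "example.conf"), "map_slots=5\n"),
    }


with tempfile.TemporaryDirectory() as base:
    first_run = converge(base)   # nothing exists yet, so both resources change
    second_run = converge(base)  # already in the desired state, so nothing changes
    print(first_run, second_run)
```

The second run reporting no changes is the property that makes this style safe to apply repeatedly across many nodes.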

Page 20: Building hadoop based big data environment

What is Hadooppet?

• A general Hadoop cluster deployment tool based on Puppet
• Kerberos / LDAP configured automatically
• A set of Hadoop / Kerberos management tools
• A set of sanity check scripts for Trend Hadoop-related services
• Configuration is managed on the puppetmaster

Page 21: Building hadoop based big data environment

Design

• Abstract environment-specific configuration into a single configuration file
• setup.sh
  – namenode_fqdns=("dev1.example.com" "dev2.example.com")
  – namenode_dirs=("/name/1" "/name/2")
  – namenode_heap=32g
  – map_slots=5
  – reduce_slots=3
  – …
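One way to picture this design is a function that fans a single settings dictionary out into per-file configuration entries. The sketch below is an assumption about how such a mapping could look, not Hadooppet's actual implementation: the setting names come from the slide, the target keys are the Hadoop 1.x names, and `derive_config` is a hypothetical helper. `namenode_fqdns` is listed but not mapped, because the slide does not show how it is wired in.

```python
# Settings mirror the setup.sh values on the slide (the elided entries are omitted).
SETTINGS = {
    "namenode_fqdns": ["dev1.example.com", "dev2.example.com"],  # not mapped in this sketch
    "namenode_dirs": ["/name/1", "/name/2"],
    "namenode_heap": "32g",
    "map_slots": 5,
    "reduce_slots": 3,
}


def derive_config(settings):
    """Map the environment settings onto per-file Hadoop 1.x configuration entries."""
    return {
        "hdfs-site.xml": {
            "dfs.name.dir": ",".join(settings["namenode_dirs"]),
        },
        "mapred-site.xml": {
            "mapred.tasktracker.map.tasks.maximum": str(settings["map_slots"]),
            "mapred.tasktracker.reduce.tasks.maximum": str(settings["reduce_slots"]),
        },
        "hadoop-env.sh": {
            "HADOOP_NAMENODE_OPTS": f"-Xmx{settings['namenode_heap']}",
        },
    }


config = derive_config(SETTINGS)
for filename, entries in config.items():
    for key, value in entries.items():
        print(f"{filename}: {key} = {value}")
```

Keeping every environment-specific value in one place means a new cluster only needs a new settings file, while the mapping logic stays unchanged.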

Page 22: Building hadoop based big data environment

Benefits

• Can be used to set up any kind of Hadoop cluster
• Minimizes downtime when doing a major version upgrade

  hadoop1              hadoop2
  Namenode             Active/Standby Namenode
  Secondarynamenode    Journalnodes, ZKFC

Page 23: Building hadoop based big data environment

Release Engineering

Page 24: Building hadoop based big data environment

Manually

• Build the src tarball in hadoop-common
• Build RPMs in bigtop
• Submit the build to the release yum repo
• yum update on the Hadoop cluster…

Page 25: Building hadoop based big data environment

Continuous Integration

• Set up a hadoop-common daily build
• Set up a Bigtop release build
  – should be manually triggered
• Set up a Hadooppet daily build
  – Run sanity checks on a REAL CLUSTER

Page 26: Building hadoop based big data environment

Virtualization

• Build a XenServer cluster

Page 27: Building hadoop based big data environment

Page 28: Building hadoop based big data environment

give-me-vm

• PyCon 2012
  – Small Python Tools for Software Release Engineering
• An automation tool to manage the VM lifecycle
• Uses the Python XenAPI
• Create a temporary VM for testing via self-service
• Destroy it when the testing is finished

Page 29: Building hadoop based big data environment

Build auto deployment on Hadooppet

• ./give_me_vm.py
• Set up passphraseless SSH between the VMs
• Set the hostname
• Install Hadooppet on the master
• Run the deployment
• Run sanity checks
• ./destroy_vm.py
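The create, deploy, check, destroy sequence maps naturally onto a context manager that guarantees cleanup. This is a minimal sketch under stated assumptions: `temporary_vms`, `run_pipeline`, and the fake backend are hypothetical names for illustration, and a real backend would wrap give_me_vm.py / destroy_vm.py or the Python XenAPI, whose interfaces are not shown in the slides.

```python
from contextlib import contextmanager


@contextmanager
def temporary_vms(create_vm, destroy_vm, count):
    """Create `count` VMs, yield them, and always destroy them afterwards."""
    vms = []
    try:
        for _ in range(count):
            vms.append(create_vm())
        yield vms
    finally:
        # Runs on success, on a failed step, and on a partially failed creation.
        for vm in vms:
            destroy_vm(vm)


def run_pipeline(create_vm, destroy_vm, steps, count=3):
    """Run each named step against the VMs in order; stop at the first failing step."""
    results = []
    with temporary_vms(create_vm, destroy_vm, count) as vms:
        for name, step in steps:
            ok = step(vms)
            results.append((name, ok))
            if not ok:
                break
    return results


# Stand-in backend so the sketch runs without a XenServer.
created, destroyed = [], []


def fake_create():
    vm = f"vm{len(created) + 1}"
    created.append(vm)
    return vm


STEPS = [
    ("set up ssh", lambda vms: True),
    ("set hostname", lambda vms: True),
    ("install hadooppet on master", lambda vms: True),
    ("run deployment", lambda vms: False),  # simulate a failed deployment
    ("run sanity checks", lambda vms: True),
]

results = run_pipeline(fake_create, destroyed.append, STEPS)
print(results, destroyed)
```

Even though the simulated deployment fails and the sanity checks never run, every temporary VM is still destroyed, which keeps a daily build from leaking VMs on the cluster.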

Page 30: Building hadoop based big data environment

Page 31: Building hadoop based big data environment

Development Environment

Page 32: Building hadoop based big data environment

For hadoop service developers…

• Not enough Hadoop clients for every developer
• Developers cannot reach the server side while developing Hadoop-related services
• Cannot experiment with new technologies such as Impala, Spark, and Flume
• CI for Hadoop-related services

Page 33: Building hadoop based big data environment

give-me-vm + Hadoop all-in-one VM

• Use Hadooppet to set up a pseudo-distributed Hadoop VM as a XenServer template
• Get a Hadoop all-in-one VM via give-me-vm
• Services integrate their CI tests with the Hadoop all-in-one VM

Page 34: Building hadoop based big data environment

Bigtop puppet

Page 35: Building hadoop based big data environment

Bigtop puppet

• Bigtop also has a set of puppet scripts to deploy the Hadoop ecosystem

Page 36: Building hadoop based big data environment

Bigtop puppet

• Preparation:
  – A VM with a JDK and Puppet installed
  – mkdir -p /data/{1,2}
  – git clone https://github.com/apache/bigtop.git
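Before running the Bigtop puppet scripts, the preparation items can be verified with a short preflight check. This is a hedged sketch: `missing_tools` and `make_data_dirs` are hypothetical helpers, the tool names checked (`java`, `puppet`, `git`) are assumptions about what is expected on PATH, and a temporary root is used so the example does not touch the real `/data`.

```python
import os
import shutil
import tempfile


def missing_tools(tools):
    """Return the entries of `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def make_data_dirs(root, names=("1", "2")):
    """Equivalent of `mkdir -p <root>/data/{1,2}`; returns the directory paths."""
    paths = [os.path.join(root, "data", name) for name in names]
    for path in paths:
        # exist_ok=True matches mkdir -p: no error if the directory is already there.
        os.makedirs(path, exist_ok=True)
    return paths


# Use root="/" on the real VM; a temporary root keeps this example side-effect free.
with tempfile.TemporaryDirectory() as root:
    data_dirs = make_data_dirs(root)
    all_created = all(os.path.isdir(path) for path in data_dirs)

print("missing tools:", missing_tools(["java", "puppet", "git"]))
```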

Page 37: Building hadoop based big data environment

Conclusion

• Many great deployment tools exist
  – Ambari, CM, ETU appliance
  – Choose a suitable distribution according to your business needs
• If you want to do it yourself
  – Bigtop can easily do the packaging for you
  – Leverage the Bigtop puppet modules for your deployment

Page 38: Building hadoop based big data environment

Questions?

Page 39: Building hadoop based big data environment

Thank you!