hw09 clouderas distribution for hadoop
Post on 14-Jun-2015
Cloudera’s Distribution for Hadoop
Oct 2, 2009
Todd Lipcon
(todd@cloudera.com)
What is CDH?
What’s a Distribution?
- How many of you get your apache httpd from apache.org?
- Pretty much everyone uses Linux distributions to get software
- CDH is a Hadoop distribution in the same way that Ubuntu is a Linux distribution
What is CDH?
- Apache Hadoop and its ecosystem, packaged up and easier to install
- RPM, Debian, and tarball installs
- Better Linux citizenship
- Maintained and tested patch series on top of upstream
- Ecosystem compatibility guarantees
What’s in CDH?
CDH - Included Packages
- Apache Hadoop (MR, HDFS, and Common)
- Apache Pig
- Apache Hive
- Cloudera Desktop
- HBase and ZooKeeper (contributed by HBase team)
- ... more to come
Installation Options
- APT and Yum repositories
  - apt-get install hadoop
  - yum install hadoop
- hadoop-conf-pseudo package to get started
- tarball
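As a rough illustration of the install options above, the following sketch picks between APT and Yum and prints the matching command rather than running it. The package names hadoop and hadoop-conf-pseudo are the ones shown on the slide; the function names, the use of sudo, and the assumption that the CDH repository is already configured are illustrative and not part of the talk.

```shell
#!/bin/sh
# Sketch only: prints the install command instead of running it.
# "hadoop" and "hadoop-conf-pseudo" are the package names from the slide;
# the CDH APT/Yum repository is assumed to be configured already.

# Report which of the two supported package managers is on PATH.
detect_pkg_manager() {
    if command -v apt-get >/dev/null 2>&1; then
        echo apt
    elif command -v yum >/dev/null 2>&1; then
        echo yum
    else
        echo none
    fi
}

# Print the install command for the given package manager (apt or yum).
cdh_install_cmd() {
    case $1 in
        apt) echo "sudo apt-get install hadoop hadoop-conf-pseudo" ;;
        yum) echo "sudo yum install hadoop hadoop-conf-pseudo" ;;
        *)   echo "no APT or Yum found" >&2; return 1 ;;
    esac
}

cdh_install_cmd "$(detect_pkg_manager)" || echo "use the tarball install instead"
```

On a machine with neither package manager, the script falls through to the tarball option listed on the slide.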
CDH on Amazon EC2
- hadoop-ec2 launch-cluster todd-cluster 20
- Support for HDFS on EBS volumes (better performance than S3)
- Cloudera Desktop automatically installed and launched
- Great if your data is already on EBS or S3
- Soon to come: VMware (vCloud) and Rackspace
Linux citizenship
- Hadoop should act like other software you’re used to
  - Configuration using alternatives in /etc
  - Logs in /var/log
  - Start/stop with init.d services
Patches in CDH
- Get bug fixes early
- Backport “Safe” new features
  - Sqoop, MRUnit
  - Fair Scheduler on 0.18
  - /metrics servlet
  - S3 fixes
  - etc...
- Backport “Really Safe” performance patches
What exactly am I getting?
- Hadoop in CDH is still Apache 2.0 licensed
- Read the changelog:
  ...hadoop-0.20/cloudera/CHANGES.cloudera.txt
- Read the patches:
  ...hadoop-0.20/cloudera/patches/
- Build it yourself:
  ...hadoop-0.20/cloudera/do-release-build
Is this a fork?
Is this a fork?
No way!
- All functionality patches submitted upstream (some build-system patches only apply to our build)
- We employ 2 committers full-time, plus several contributors
- We regularly meet and work with other community members from Yahoo!, Facebook, etc.
My one commercial plug... gotta pay the bills
- We provide paid support for CDH
  - Someone to call if your cluster is down
  - Access to knowledgeable Hadoop engineers
  - Configuration and tuning help
  - Process design reviews
  - Prioritize patches you need (and hot fixes for critical issues)
- </salesman>
Versions of CDH
Versions of CDH
- Debian versioning scheme
- stable
  - no new features, lots of “soak time”
  - comparable to RHEL 5, Ubuntu LTS, or Debian stable
  - recommended for critical production deployments
Versions of CDH
- Debian versioning scheme
- testing
  - considered usable - testing, not untested!
  - has whiz-bang features and newer versions
  - recommended for shops who like the bleeding edge, or for those in PoC/dev stage
Versions of CDH
- CDH1 (stable)
  - Released March ’09
  - Hadoop 0.18.3, Hive 0.3, Pig 0.2
  - Will become oldstable this winter
- CDH2 (testing)
  - Released June ’09
  - Hadoop 0.18.3, Hadoop 0.20.1, Pig 0.5, Hive 0.4, HBase 0.20
  - Can install 0.18 and 0.20 at the same time
  - Will become stable this winter
CDH2 Package Versioning
hadoop-0.18-0.18.3+65-1.cloudera.noarch.rpm
A hadoop package based on Apache Hadoop 0.18.3 with 65 patches
hadoop-0.20-0.20.0+4.4-1.cloudera.noarch.rpm
A hadoop package based on Apache Hadoop 0.20.0 with 4 patches in testing, 4 security/critical fixes
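To make the naming scheme concrete, here is a minimal sketch that splits one of these file names into its parts using POSIX parameter expansion. The function name parse_cdh_rpm and the assumed layout (name, then upstream version, a plus sign, the patch field, the release, and the architecture) are inferred from the two examples on the slide and are not an official CDH tool or specification.

```shell
#!/bin/sh
# Sketch: split a CDH2 RPM file name into its parts.
# Assumes the layout <name>-<upstream>+<patches>-<release>.<arch>.rpm,
# inferred from the two file names on the slide.
parse_cdh_rpm() {
    file=${1%.rpm}        # drop the .rpm suffix
    arch=${file##*.}      # after the last dot: noarch
    nvr=${file%.*}        # name-version-release
    release=${nvr##*-}    # after the last hyphen: 1.cloudera
    nv=${nvr%-*}          # name-version
    version=${nv##*-}     # e.g. 0.18.3+65
    name=${nv%-*}         # e.g. hadoop-0.18
    case $version in
        *+*) upstream=${version%%+*}; patches=${version#*+} ;;
        *)   upstream=$version; patches= ;;
    esac
    printf 'name=%s upstream=%s patches=%s release=%s arch=%s\n' \
        "$name" "$upstream" "$patches" "$release" "$arch"
}

parse_cdh_rpm hadoop-0.18-0.18.3+65-1.cloudera.noarch.rpm
parse_cdh_rpm hadoop-0.20-0.20.0+4.4-1.cloudera.noarch.rpm
```

For the first file name this reports upstream=0.18.3 and patches=65. For the second, the patch field comes back as 4.4 without further interpretation.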
Where do I get CDH?
http://archive.cloudera.com/
Questions?