Lessons in Apache Software integration
Roman Shaposhnik, rvs@apache.org, Cloudera Inc
archive.apachecon.com/na2013/presentations/28-Thursday...
$ whoami
● An open source (UNIX) software developer
– Linux kernel, C/C++ compilers, FFmpeg, Plan9
● A Hadoop guy
● Apache Software Foundation Incubator PMC
– [Bigtop], Hadoop Development Tools, Celix, Helix
● VP of Apache Bigtop
Apache Bigtop
“open-source software related to a system for integration, packaging, deployment and validation of a big data management software distribution based on Apache Hadoop”
Remember what Debian did to Linux?
GNU Software
Linux kernel
Bigtop is trying to do it with Hadoop
Hadoop Ecosystem (Pig, Hive, Mahout)
Hadoop (HDFS + MR)
CDH4 beta 1
What is missing, really?
● Bigdata management platform view
● A generalist 'yin' for specialist 'yang'
● Shared, community driven:
– Use cases
– Best practices
– Upcoming standards
– Integration with
One way of using ASF software:
$ wget http://apache.org/httpd.tar.gz
$ tar xzvf httpd.tar.gz
$ cd httpd
$ ./configure ; make
$ make install
ERROR: can't write to /usr/local/bin
$ sudo make install
A different way:
$ sudo apt-get install httpd
Would you like to also upgrade your conf?
An “ultimate” way:
$ bigtop launch-cluster --config ./hbase.ini
Aren't we already there?
$ whirr launch-cluster --config ./hbase.ini
Key challenges
● A really diverse set of components
● High churn APIs
● Asynchronous development cycles
● Combinatoric explosion of dependencies
● Java based
● Fundamentally distributed applications
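To make the "combinatoric explosion" point concrete, here is a minimal Python sketch that enumerates every stack an integrator would have to validate. The Hadoop and HBase version lists mirror the ones shown on the "Dependencies Inferno" slide; the `versions` table and the `combinations` helper are illustrative names, not part of Bigtop.

```python
from itertools import product

# Illustrative version matrix. Hadoop and HBase entries mirror the
# "Dependencies Inferno" slide; Hive is pinned to a single release.
versions = {
    "hadoop": ["1.0", "0.22", "0.23"],
    "hbase": ["0.92", "0.90"],
    "hive": ["0.8.1"],
}


def combinations(matrix):
    """Return every possible stack as a dict of component -> version."""
    names = sorted(matrix)
    return [
        dict(zip(names, combo))
        for combo in product(*(matrix[name] for name in names))
    ]


stacks = combinations(versions)
print(len(stacks))  # 3 Hadoop versions * 2 HBase versions * 1 Hive version = 6
```

Every extra component with more than one supported version multiplies the count again, which is why testing each combination by hand stops scaling quickly.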
ZooKeeper (coordination)
HUE (web based UI)
HBase YARN/MR1
HDFS (filesystem)
Pig (DQL) Hive (SQL) Impala (SQL)
Oozie
It is a jungle out there
ZooKeeper
Hadoop
HDFS
YARN
MR1
HTTPFS
HBase
Pig
Hive
Impala
Sqoop
Oozie
Whirr
Mahout
Flume
Giraph
Hama
Hue
Solr
Crunch
JDK/JRE
Kerberos
Ganglia
Nagios
JSVC
Tomcat
Utils
Postgres
HTTPD
Dependencies Inferno:
Hive 0.8.1
Hadoop (1.0, 0.22, 0.23)
HBase (0.92, 0.90)
A million dollar question:
$ tar xzvf hive-0.8.1.tar.gz
$ ls hive-0.8.1/lib
hbase-0.89.jar log4j-1.2.15.jar log4j-1.2.16.jar
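The listing above shows two copies of log4j shipped side by side. A minimal Python sketch of how such clashes could be spotted automatically follows; the `name-version.jar` naming pattern is an assumption, and `bundled_versions` and `conflicts` are hypothetical helpers, not Bigtop tooling.

```python
import re
from collections import defaultdict

# Assumes jars are named "<artifact>-<version>.jar", with the version
# starting with a digit. Real-world jar names do not always follow this.
JAR_RE = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.]*)\.jar$")


def bundled_versions(jar_names):
    """Map each artifact name to the set of versions found for it."""
    found = defaultdict(set)
    for jar in jar_names:
        match = JAR_RE.match(jar)
        if match:
            found[match.group("name")].add(match.group("version"))
    return found


def conflicts(jar_names):
    """Return only the artifacts bundled in more than one version."""
    return {
        name: sorted(vers)
        for name, vers in bundled_versions(jar_names).items()
        if len(vers) > 1
    }


# The three file names shown on the slide.
listing = ["hbase-0.89.jar", "log4j-1.2.15.jar", "log4j-1.2.16.jar"]
print(conflicts(listing))  # {'log4j': ['1.2.15', '1.2.16']}
```

The same listing also shows the other half of the problem: the bundled HBase jar is 0.89, which is neither of the HBase versions (0.92, 0.90) named on the slide.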
Lessons in cat herding
● Admitting the problem
● On origins of suffering
● You can't make “them” do “it”
● The real world is highly asynchronous
● The art of making friends
● YLH
The origin of suffering is attachment
● Don't get attached to your code
● Don't waste your time on ill-maintained code
● Don't second guess your users
● Do provide capabilities, not policies
● Do focus on specialization
● Do allow customization
You can't make “them” do “it”
● Don't expect common dependencies
● Don't expect agreement on use cases
● Don't ask – offer:
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>${hadoop.version}</version>
  <optional>true</optional>
  <classifier>hadoop-2.0.2</classifier>
</dependency>
Embrace asynchronous nature
● Don't expect flag days
● Don't expect agreement on releases
● Do practice Last Known Good Builds
[Diagram: a sequence of build sets combining components A, B, C and D at different versions (e.g. Av1, Bv2, Cv3, Dv4)]
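The "Last Known Good Builds" practice can be sketched in a few lines of Python: record each integration attempt with the component versions it used, and fall back to the most recent set that passed. The `history` data and the `last_known_good` helper below are illustrative assumptions, loosely modelled on the A/B/C/D components in the diagram, and do not describe how Bigtop's own CI records builds.

```python
def last_known_good(builds):
    """Return the component versions of the most recent passing build.

    `builds` is ordered oldest to newest; returns None if nothing passed.
    """
    for build in reversed(builds):
        if build["passed"]:
            return build["versions"]
    return None


# Hypothetical build history; version numbers are made up for illustration.
history = [
    {"versions": {"A": 1, "B": 2, "C": 3, "D": 2}, "passed": True},
    {"versions": {"A": 1, "B": 2, "C": 3, "D": 4}, "passed": True},
    # B moved ahead on its own schedule and broke the integration.
    {"versions": {"A": 1, "B": 22, "C": 3, "D": 4}, "passed": False},
]

print(last_known_good(history))  # {'A': 1, 'B': 2, 'C': 3, 'D': 4}
```

Downstream consumers keep building against the returned set while the broken combination is sorted out upstream, so nobody has to wait for a flag day.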
Make yourself indispensable
● Be nice
● Do provide glue code
● Do provide tons of automation
● Do provide missing testing
● Do participate in upstream communities:
– RC votes
– Release Planning
What does Bigtop offer:
● Community focused on all of the above
● Software for:
– Integration
– Build (make, Maven)
– Packaging (RPM, DEB)
– Deployment (Puppet)
– Testing (iTest)
● A continuous integration Jenkins server
Who's on-board?
● Cloudera
– CDH4 is 100% based on Bigtop (Hadoop v2)
● WANdisco
● TrendMicro
● Hortonworks, EMC, eBay, Intel (partially)
● Canonical
– Ubuntu Server: Hadoop and Bigdata blueprint
● Illumos (early stages of interest)
What's happening
● A special release: Bigtop 0.3.0-incubating
– Hadoop 1.0.1
● Last stable release: Bigtop 0.5.0
– Hadoop 2.0.2-alpha
● Next stable release: Bigtop 0.6.0
– End of Mar 2013 release
– Hadoop 2.0.3-beta (DANGER! DANGER!)
– Major focus on developers
What does Bigtop need?
● More of you!
– “Silicon Valley Hands-on Programming” http://www.meetup.com/HandsOnProgrammingEvents/
● More infrastructure for build/test
– EC2, Supercell, EMC magic cluster, CloudStack
● More integration tests
– Convince your bosses to commit to Bigtop
● Validate upstream release using Bigtop
How to get in touch
● Bigtop home @Apache:
– http://bigtop.apache.org/
● Hangout places:
– {dev,user}@bigtop.apache.org
– #bigtop on Freenode
● Roman Shaposhnik
– [email protected], [email protected]