My talk at LVEE 2016
![Page 1: My talk at LVEE 2016](https://reader031.vdocuments.us/reader031/viewer/2022030305/5872cdb21a28ab74188b46db/html5/thumbnails/1.jpg)
Using the Hadoop stack to build a cloud VAT declarations revising service
Alex Chistyakov, Git in Sky
Grodno, LVEE 2016
![Page 2: My talk at LVEE 2016](https://reader031.vdocuments.us/reader031/viewer/2022030305/5872cdb21a28ab74188b46db/html5/thumbnails/2.jpg)
Who I am
● Hello, my name is Alex
● Principal Engineer @ Git in Sky
● Hadoop operations engineer
● Former Java developer (not only Java and not so “former” in fact)
![Page 3: My talk at LVEE 2016](https://reader031.vdocuments.us/reader031/viewer/2022030305/5872cdb21a28ab74188b46db/html5/thumbnails/3.jpg)
Who are you?
● Linux and OSS enthusiasts?
● Software developers?
● DevOps engineers?
● Big data guys?
![Page 8: My talk at LVEE 2016](https://reader031.vdocuments.us/reader031/viewer/2022030305/5872cdb21a28ab74188b46db/html5/thumbnails/8.jpg)
Well, what is this all about?
● Configuring a Hadoop/HBase cluster is easy
● 1) Buy a lot of hardware
● 2) Configure the bloody cluster!
● 3) ???
● 4) PROFIT!!!
![Page 9: My talk at LVEE 2016](https://reader031.vdocuments.us/reader031/viewer/2022030305/5872cdb21a28ab74188b46db/html5/thumbnails/9.jpg)
Big Data is hard!
● A customer wants a number of environments for different purposes (dev, testing, staging & production)
● DevOps culture requires repeatability!
● (Observe a beautiful snowflake to the right)
● Business wants to reduce costs
![Page 11: My talk at LVEE 2016](https://reader031.vdocuments.us/reader031/viewer/2022030305/5872cdb21a28ab74188b46db/html5/thumbnails/11.jpg)
So, we need a detailed plan
● 1) Buy an enterprise subscription from Oracle
● ^ FAIL!
![Page 14: My talk at LVEE 2016](https://reader031.vdocuments.us/reader031/viewer/2022030305/5872cdb21a28ab74188b46db/html5/thumbnails/14.jpg)
So, we need a detailed plan
● 1) Read the manual on the product site
● 2) Configure everything manually
● ^ FAIL!
![Page 18: My talk at LVEE 2016](https://reader031.vdocuments.us/reader031/viewer/2022030305/5872cdb21a28ab74188b46db/html5/thumbnails/18.jpg)
So, we need a detailed plan
● 1) Take Cloudera distribution of Hadoop
● 2) Configure everything from a web interface
● 3) Don’t forget to buy an enterprise subscription
● 4) ^ MULTIPLE FAILS!!!
![Page 19: My talk at LVEE 2016](https://reader031.vdocuments.us/reader031/viewer/2022030305/5872cdb21a28ab74188b46db/html5/thumbnails/19.jpg)
A word on proprietary software
● Proprietary software is full of nasty bugs, period
![Page 20: My talk at LVEE 2016](https://reader031.vdocuments.us/reader031/viewer/2022030305/5872cdb21a28ab74188b46db/html5/thumbnails/20.jpg)
A word on open source software
● Open source software is awesome
![Page 22: My talk at LVEE 2016](https://reader031.vdocuments.us/reader031/viewer/2022030305/5872cdb21a28ab74188b46db/html5/thumbnails/22.jpg)
Software market in 2016
● It’s not “proprietary vs open source”
● It’s “open source vs open source”
![Page 23: My talk at LVEE 2016](https://reader031.vdocuments.us/reader031/viewer/2022030305/5872cdb21a28ab74188b46db/html5/thumbnails/23.jpg)
Open source vs open source
● Cloudera CDH vs vanilla Apache
![Page 28: My talk at LVEE 2016](https://reader031.vdocuments.us/reader031/viewer/2022030305/5872cdb21a28ab74188b46db/html5/thumbnails/28.jpg)
So, we need a detailed plan
● 1) Hire a DevOps engineer
● 2) Use Chef or something
● 3) Automate all the things
● 4) ???
● 5) PROFIT!!!
![Page 29: My talk at LVEE 2016](https://reader031.vdocuments.us/reader031/viewer/2022030305/5872cdb21a28ab74188b46db/html5/thumbnails/29.jpg)
100 reasons not to use Cloudera CDH
● Cloudera CDH obscures configuration
● Cloudera CDH generates textual configs from the DB
● Cloudera CDH is web-interface centric
● Cloudera CDH is a monolith with a vendor lock-in
![Page 30: My talk at LVEE 2016](https://reader031.vdocuments.us/reader031/viewer/2022030305/5872cdb21a28ab74188b46db/html5/thumbnails/30.jpg)
Our own little open source product
● Based on Ansible (Ansible is like Chef but awesome)
● https://github.com/gitinsky/ansible-hadoop-stack-howto
● https://github.com/gitinsky/ansible-role-*
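The idea behind a text-first setup (as opposed to configs generated from a database) can be illustrated roughly as follows. This is only a minimal Python sketch, not the code from the repositories above, which use Ansible: per-environment variables live in plain text and are rendered into the `*-site.xml` layout that Hadoop reads. The `ENVIRONMENTS` table, the hostnames, and the function names are hypothetical and exist only for illustration; `fs.defaultFS` is a standard Hadoop property.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-environment variables; the hostnames are made up.
# In a real setup these would live in version control next to the playbooks.
ENVIRONMENTS = {
    "dev": {"fs.defaultFS": "hdfs://dev-namenode.example:8020"},
    "production": {"fs.defaultFS": "hdfs://prod-namenode.example:8020"},
}


def render_site_xml(properties):
    """Render a dict of properties in the Hadoop *-site.xml layout."""
    root = ET.Element("configuration")
    # Sort the keys so that repeated runs produce identical, diff-friendly text.
    for name in sorted(properties):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(properties[name])
    return ET.tostring(root, encoding="unicode")


def render_environment(env):
    """Render the config for one named environment (hypothetical helper)."""
    return render_site_xml(ENVIRONMENTS[env])


if __name__ == "__main__":
    print(render_environment("dev"))
```

Because the same inputs always render the same text, every environment can be rebuilt and compared with ordinary diff tools, which is the repeatability the earlier slides ask for.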
![Page 33: My talk at LVEE 2016](https://reader031.vdocuments.us/reader031/viewer/2022030305/5872cdb21a28ab74188b46db/html5/thumbnails/33.jpg)
Problems
● Lack of documentation
● Lack of manpower
● Nobody uses our product (except us)
![Page 34: My talk at LVEE 2016](https://reader031.vdocuments.us/reader031/viewer/2022030305/5872cdb21a28ab74188b46db/html5/thumbnails/34.jpg)
What about the VAT service thing?
● Forget it, it’s not that relevant
![Page 35: My talk at LVEE 2016](https://reader031.vdocuments.us/reader031/viewer/2022030305/5872cdb21a28ab74188b46db/html5/thumbnails/35.jpg)
Conclusions
● Open source software is awesome
● But Cloudera CDH is not
● We can make open source software better
![Page 36: My talk at LVEE 2016](https://reader031.vdocuments.us/reader031/viewer/2022030305/5872cdb21a28ab74188b46db/html5/thumbnails/36.jpg)
So long, and thanks for all the fish!
● Ask your questions please
● Alex Chistyakov, Principal Engineer @ Git in Sky
● http://gitinsky.com
● http://meetup.com/DevOps-40