IT shouldn’t be a cost-center with Mesos Imran Shaikh Lead/Architect Blog http://elasticcompute.io @imranshaikh LinuxCon 2016

Upload: phamdieu

Post on 06-Mar-2018


Page 1: IT shouldn’t be a cost-center with Mesos - Schedschd.ws/hosted_files/lcccna2016/d6/Presentation_slide_linuxcon2016.… · IT shouldn’t be a cost-center with Mesos Imran Shaikh

IT shouldn’t be a cost-center with Mesos

Imran Shaikh
Lead/Architect

Blog: http://elasticcompute.io
@imranshaikh

LinuxCon 2016


About Me
• Lead/Architect
• Leading the Mesos & Containers initiative at YP
• Manages an infrastructure of thousands of servers across multiple data centers
• Has presented on containers and related solutions at several conferences:
  – USENIX LISA (2015)
  – MesosCon (2015, 2016)
  – SCALE, the Southern California Linux Expo (2016)


Agenda
• Ops as a cost-center
• Common Ops problems
  – Static provisioning
  – Wasted capacity
  – Maintenance window
  – Silo’ed teams
• DevOps as a solution
• DevOps in practice - Mesos
• How does Mesos solve these problems?
• How can you benefit?
• Future Ahead
• Q/A


Ops as cost-center
• Doesn't produce direct profit for the company
• Removing or scaling down Ops has a detrimental effect on the profit margin
• Typically, cost-centers are given autonomy
• Typically, Ops bosses are responsible for:
  – Managing financial performance
  – Keeping it under budget
  – Accounting for expenditures
• Over time, Ops is streamlined, processes are improved, etc.
  – Thereby reducing the overall cost
• The C-Suite bosses are happy to write the fat checks
• In the end, the C-Suite boss decides which cost-center grows and which one gets slashed
• This is basically what is happening at companies everywhere


Ops as cost-center
• Why is it hard to change?
  – The mindset is to keep the lights on
  – 70 to 80% of the budget goes into maintaining existing infrastructure and applications
  – That leaves very little room for the Ops org to pursue a new direction
• Since it cannot be changed (and there is a constant urge to improve):
  – Cloud computing, outsourcing and offshoring have become common
  – They made maintenance costs more predictable and easier to measure and manage
• The argument is that by doing so (Hiner, J. 2014):
  – Ops can focus more on exploring new vendors
  – Upgrading software
  – And looking for new solutions that could save the company money, leapfrog competitors, or break into new markets


Ops as cost-center
• I have a genuine problem with the word “cost-center” because I don’t consider myself a burden
• I consider myself as:
  – Providing value to you
  – Serving you
  – Wanting to improve your product
  – If I can’t replace you, I want to at least augment you
• If I can live by that motto, why can’t the department I work in do the same?


Common Ops problems


Problem 1: Static provisioning - resources


Static provisioning - resources

Consider your app the hero and the footprint it requires to run the villain. Initially, your villain is small and timid.

Fig: (Angry gorilla, n.d.).


As your product grows, so does your villain. You need to scale your resources vertically.


And if your product becomes mature enough, you have grown your villain so much that it becomes invincible.


And if you have more products, you have more villains.


And at some point, you will be running datacenters full of these villains.


So what do you do next? You hire people like me to manage these villains.


Problem 2: Static provisioning - people


Static provisioning - people

• SysAdmins
• DBAs
• SREs
• Network Engrs
• Tools Engineer


Problem 3: Wasted capacity


Wasted capacity
• When you run apps on dedicated hosts, only approx. 20% of the resources (CPU, memory, etc.) get utilized
• The rest goes to waste
• The reason is that there are no good isolation techniques for running multiple apps on a single host
• Making multi-tenant apps behave on the same node is a difficult challenge
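To put a rough number on the waste described above, here is a minimal Python sketch. Only the approximate 20% utilization comes from the slide; the fleet of 100 dedicated hosts, the 80% target utilization, and the helper name `hosts_needed` are illustrative assumptions.

```python
def hosts_needed(num_apps: int, per_app_percent: int, target_percent: int) -> int:
    """Hosts required if apps share machines up to target_percent utilization.

    Assumes every app uses per_app_percent of one host (a simplification).
    Integer percentages keep the arithmetic exact.
    """
    total_demand = num_apps * per_app_percent       # demand in "percent of a host"
    return -(-total_demand // target_percent)       # ceiling division


# Hypothetical fleet: 100 apps, each on its own dedicated host.
dedicated_hosts = 100
utilization_percent = 20        # approximate figure from the slide
target_percent = 80             # assumed safe ceiling for a shared host

shared_hosts = hosts_needed(dedicated_hosts, utilization_percent, target_percent)
idle_percent = 100 - utilization_percent

print(f"Dedicated hosts: {dedicated_hosts}, idle capacity on each: {idle_percent}%")
print(f"Shared hosts needed at {target_percent}% utilization: {shared_hosts}")
```

Under these assumptions the same workload fits on a quarter of the machines, provided the apps can be isolated well enough to share a host.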


Problem 4: Maintenance Window


Maintenance window
• Maintenance of infrastructure requires days, if not weeks, of planning
• A thorough launch plan is designed
• All the stakeholders are crammed into a war room to handle their respective parts
• A whole army of Ops people gets involved if something goes wrong
  – SysAdmins
  – DBAs
  – Network Engineers
  – Storage Admins
  – SREs
  – Operation Center
  – Developers
  – QAs
  – PMs
• Longer-than-expected downtime
  – Frustration
  – Tons of overtime pay
  – Less-than-favorable work/personal life balance


Problem 5: Silo’ed teams


Silo’ed teams

• The app a developer builds is completely different from what runs in prod
• Dev and Ops live in completely isolated worlds
• The Ops team massages the app, adding configuration, custom deployment tools, etc.
• Ops has designed checks, UIs, policies and processes to monitor, scale or view the performance of those apps


Problem 6: Ops Rigidity


Ops Rigidity
• Devs have no window into Ops and no idea what happens to their app in prod
• Devs are completely agnostic of production. There is no feedback mechanism.
• That results in poorly written apps.
• Running apps shouldn't be Ops' forte.
  – Devs know more about their apps
• Help Devs run and manage their apps. Ops should focus on securing and managing the underlying infrastructure and systems.
• Empower them. Don’t handicap them.


How to make Ops a profit-center?

• The answer is DevOps
• I know it is such a cliché and a management buzzword
• There are lots of theories and best practices floating around on how to go about it
• None of the DevOps best practices can yield immediate results


DevOps theories
• Dev & Ops collaboration
• Treating “Infrastructure as Code” (Riley, C. 2014)
• Using Automation
• Culture
• Using Sprints or Agile or Kanban for Ops work
• Using tools like Jenkins, Chef/Puppet, Vagrant, Docker, etcd, etc.


How does DevOps look in practice?

• All these theories sound great, but what is the practical solution?
• Which tool or suite of tools encompasses all the DevOps best practices?
• The answer is Mesos


static partitioning - resources

Fig: a grid of individual hosts under the product labels Product 1, Product 2, Product 3, Product 3.


!static partitioning - resources

Fig: Mesos, with the labeled workloads Apps, Message Queues, Build pipeline jobs, Map reduce jobs, Batch processing jobs, NoSQL db, RDBMS.
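The contrast between dedicated hosts and a shared pool can be sketched with a toy placement loop. This is a plain first-fit packer written for illustration; it is not how Mesos schedules work (Mesos hands resource offers to frameworks, which decide what to launch). The `Host` class, host sizes, and task sizes are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A host in the shared pool, tracking its remaining free resources."""
    cpus: int
    mem_mb: int
    tasks: list = field(default_factory=list)

    def fits(self, cpus: int, mem_mb: int) -> bool:
        return self.cpus >= cpus and self.mem_mb >= mem_mb

    def place(self, name: str, cpus: int, mem_mb: int) -> None:
        self.cpus -= cpus
        self.mem_mb -= mem_mb
        self.tasks.append(name)


def first_fit(tasks, hosts):
    """Place each (name, cpus, mem_mb) task on the first host with room.

    Returns the names of tasks that could not be placed anywhere.
    """
    unplaced = []
    for name, cpus, mem_mb in tasks:
        for host in hosts:
            if host.fits(cpus, mem_mb):
                host.place(name, cpus, mem_mb)
                break
        else:
            unplaced.append(name)
    return unplaced


# Five different kinds of workload share two hosts instead of five.
pool = [Host(cpus=8, mem_mb=16384) for _ in range(2)]
workloads = [
    ("web-app", 2, 2048),
    ("message-queue", 2, 4096),
    ("build-job", 4, 8192),
    ("nosql-db", 4, 8192),
    ("batch-job", 2, 1024),
]
leftover = first_fit(workloads, pool)

for index, host in enumerate(pool):
    print(f"host {index}: {host.tasks} (free: {host.cpus} cpus, {host.mem_mb} MB)")
print("unplaced:", leftover)
```

With static partitioning each of these workloads would get its own host; in the pooled model the same five fit on two machines with room to spare on the second.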


static partitioning - people

• Servers: Systems Administrators
• Storage: Storage Admins
• Applications: SREs
• Databases: DBAs
• Network: Network Engrs
• Code: Developers


!static partitioning - people

• Servers (Systems Administrators)
  – More secure
  – HA, Scalable, Fault tolerant
  – Manage the envt.
• Storage (Storage Admins)
  – Self serve storage
  – DFS, NFS or Block storage solutions
• Applications (SREs)
  – Visibility for devs
  – Service discovery solutions
• Databases (DBAs)
  – Persistent storage
  – Solutions that use persistent storage for DBs
• Network (Network Engrs)
  – Layer 3 virtual networking
  – Overlay networks
  – Solutions that provide IPs to containers
• Code (Developers)
  – More ops aware
  – Write better apps that perform in production
  – Auto-scaling


Maintenance Window

Fig: a grid of individual hosts under the product labels Product 1, Product 2, Product 3, Product 3.

• Notifications are sent out
• A whole army of Ops and Dev teams is huddled up
• Traffic is shifted
• Apps are bounced
• Revenue is lost
• And there is a whole lot of passing the buck & post-mortem analysis


!Maintenance window

Fig: Mesos, with the labeled workloads Apps, Message Queues, Build pipeline jobs, Map reduce jobs, Batch processing jobs, NoSQL db, RDBMS.


!Ops Rigidity
• In Mesos, everything gets opened up
• There is no Ops world
• Use Marathon to see what is running in production and the number of instances, and to scale apps
• Use Chronos to submit batch or cron jobs
• Run the build pipeline on Jenkins
• Run message queue brokers and scale them with the pool of resources you have
• Employ the ELK stack to view logs in real time
• Employ metrics solutions to see performance metrics of your apps (containers)
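As one illustration of the Marathon bullet above, the following Python sketch builds (but does not send) two requests against Marathon's REST API: a `POST` to `/v2/apps` to launch an app and a `PUT` to `/v2/apps/{appId}` to change its instance count. The Marathon address, the app id, and the command are hypothetical placeholders, and the helper names are made up for this example.

```python
import json
import urllib.request

# Hypothetical Marathon endpoint; replace with your own cluster's address.
MARATHON_URL = "http://marathon.example.com:8080"


def create_app_request(app_id, cmd, cpus, mem_mb, instances):
    """Build a request that asks Marathon to launch a new app."""
    body = {"id": app_id, "cmd": cmd, "cpus": cpus, "mem": mem_mb, "instances": instances}
    return urllib.request.Request(
        f"{MARATHON_URL}/v2/apps",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scale_app_request(app_id, instances):
    """Build a request that changes the number of running instances."""
    return urllib.request.Request(
        f"{MARATHON_URL}/v2/apps/{app_id.lstrip('/')}",
        data=json.dumps({"instances": instances}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


create = create_app_request("/web", "python3 -m http.server 8080", 0.5, 128, 2)
scale = scale_app_request("/web", 5)

for request in (create, scale):
    print(request.get_method(), request.full_url, request.data.decode("utf-8"))
```

Passing either request to `urllib.request.urlopen` would submit it to a real Marathon instance; the sketch stops short of that so it can run without a cluster.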


!Wasted Capacity
• With Mesos, there is no wasted capacity.
• You can technically run tens or hundreds of apps on a single machine
• To isolate the apps, containerize them
• With containers, you can rate-limit or meter CPU usage, memory usage, disk I/O, network bandwidth utilization, disk usage, etc.
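To make the rate-limiting point concrete, here is a small Python sketch that assembles a `docker run` command with per-container limits. Docker is used only because it appears in the talk's tool list; the slides do not say which containerizer runs these workloads, and the image name and limit values are hypothetical. The flags shown (`--cpu-shares`, `--memory`, `--blkio-weight`) cover CPU, memory and disk I/O only, not network bandwidth or disk usage.

```python
def docker_run_command(image, cpu_shares, memory_mb, blkio_weight):
    """Return a `docker run` argument list with basic resource limits.

    cpu_shares   -- relative CPU weight (Docker's default is 1024)
    memory_mb    -- hard memory limit in megabytes
    blkio_weight -- relative block I/O weight, which Docker restricts to 10..1000
    """
    if not 10 <= blkio_weight <= 1000:
        raise ValueError("blkio_weight must be between 10 and 1000")
    return [
        "docker", "run", "--detach",
        "--cpu-shares", str(cpu_shares),
        "--memory", f"{memory_mb}m",
        "--blkio-weight", str(blkio_weight),
        image,
    ]


# Hypothetical app: half the default CPU weight, 256 MB of memory.
command = docker_run_command("nginx", cpu_shares=512, memory_mb=256, blkio_weight=500)
print(" ".join(command))
```

The list form can be handed to `subprocess.run` on a machine with Docker installed; the sketch only prints it.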


!Silo’ed teams
• Now that everybody uses the same infrastructure and processes, there are no silo’ed teams.
• Ops and Dev also convert their existing apps to run on this unified cluster.
• They get a unified system to view or manage their apps
  – Logging
  – Metrics
  – Service discovery
  – App config store
  – Same security model to secure their apps
  – Same isolation techniques to run multi-tenant apps


How can you benefit?

• Now the next question you have is:
  – Does Mesos provide all these things out of the box? No
  – But there are enterprise solutions, such as Mesosphere’s DCOS, which will help you jump-start.
  – Or, if you have a heterogeneous environment like YP's, where you run all kinds of apps, develop this solution in-house


What are we doing at YP Engineering?

• Our engineering team has drunk the DevOps Kool-Aid
• In the beginning it tastes weird, like Dr. Pepper, but trust me, the taste grows on you quickly.
• We are doing all the crazy stuff you saw earlier
  – Centralized logging
  – Performance metrics
  – Application secrets
  – App config store
  – Service discovery
  – Persistent storage
  – Real-time analytics
• We have been running this DevOps’y infrastructure for more than a year
• Open source contributions: www.github.com/yp-engineering


Future Ahead
• The things we saw can be intimidating
• After all, we are talking about changing things we have been doing all these years
• But this is the future
  – Datacenter operating system is big
  – Containers are big
  – Isolation is big
  – DevOps is big
• If we can do that, our Ops will no longer be a COST-CENTER. It will become a PROFIT-CENTER, indeed!


REFERENCE LIST
• Hiner, J. (2014, October 1). IT as profit center versus cost center: State of the argument. Retrieved from http://www.zdnet.com/article/it-as-profit-center-versus-cost-center-state-of-the-argument/
• [Angry gorilla]. (n.d.). Retrieved from http://onedaylate.com/images/angry_gorilla.png
• Riley, C. (2014, May 5). Meet Infrastructure as Code. Retrieved from http://devops.com/2014/05/05/meet-infrastructure-code/
• Docker: http://www.docker.com
• Mesos: http://mesos.apache.org
• Mesosphere DCOS: https://mesosphere.com/product/
• Marathon: https://mesosphere.github.io/marathon/
• Chronos: https://mesos.github.io/chronos/
• Jenkins: https://jenkins.io/
• Chef: https://www.chef.io/chef/
• Puppet: https://puppet.com/
• Vagrant: https://www.vagrantup.com/
• Etcd: https://github.com/coreos/etcd


Thank you for listening!

Q/A

Imran Shaikh
Lead/Architect

Blog: http://elasticcompute.io
@imranshaikh

[email protected]
