openstack at ebsco

Post on 13-Apr-2017


OpenStack at Ebsco

Nate Baechtold, IT Architect

Ebsco Information Services

August 23, 2016


• The leading discovery service provider for libraries worldwide with more than 10,000 discovery customers in over 100 countries.

• Preeminent provider of online research content for libraries, including hundreds of research databases, historical archives, point-of-care medical reference, and corporate learning tools serving millions of end users at tens of thousands of institutions.

• Leading provider of electronic journals & books for libraries, with more than 360,000 serials, including more than 57,000 e-journals, as well as online access to more than 800,000 e-books.


What did we need?

• Self service infrastructure to all development teams.

• Full stack automation to all environments.

• Increase agility and productivity of operations and development teams.

• Lower costs by leveraging open source solutions.

• Provide a solution that integrates well with other products and allows other products and tools to easily integrate with it.


Why OpenStack?

• Easy to consume API that commoditizes infrastructure with the same methodology used by public clouds.

• Abstraction of underlying infrastructure allowing for configuration or hardware differences to not propagate to consumers and automation.

• Standardized interface for compute, network and storage

• When software supports OpenStack it tends to “just work”

• Allows us to build an IaaS platform fit for live services and safely hand out access to diverse teams through built in project isolation.

• Prefer to tell consumers that “if you break it then it is our fault” rather than giving them a long list of things that they should never do.


Current Scale

• 3 OpenStack clouds

• Approximately 1100 running instances

• Almost 500,000 instances created and destroyed since general availability

• 68% of workloads concentrated in development environments

• Around 1/3 of all virtualized workloads currently on OpenStack

Chart: Distribution By Running Instance across DevQa, Live DC 1 and Live DC 2 (68% in DevQa; the remaining 10% and 22% in the two live datacenters)


Design Philosophy

• Build a platform to run production applications.

• Multi-tenant at its core

• Should be able to safely support development and operations teams sharing the same cloud.

• All tools needed to build a highly available production application need to be available

• Good enough for development but not production is not an acceptable permanent state.

• Build general purpose solutions. Customize as little as possible.

• Provide an easy menu of infrastructure offerings

• Easy to use solution with safeguards to encourage experimentation

• Development is easier when you don’t need to worry about breaking the environment


Current Architecture

Ebsco Private Cloud Platform (diagram)

• OpenStack Cloud: Nova, Neutron, Cinder, Glance, Keystone, Heat, Ceilometer, Horizon

• Monitoring

• Operations

• Dashboards

• Load Balancing

What we learned…


Problems to Solve:

• Skills and training

• Selection of vendors and integrations

• Deployment

• Adoption

• Productionization


Skills and training:

• OpenStack skills are VERY hard to hire

• Administration requires good Linux experience

• Inexperienced administrators can cause huge amounts of damage

Our Experiences

• Internally develop a core group of OpenStack SMEs before progressing too far.

• Do not waste learning opportunities by relying too much on professional services.

• Look for candidates with strong Linux, networking, virtualization and Python skills rather than OpenStack experience.

• Give your team the time and opportunity to experiment and learn how OpenStack works.

• Vendor support lowers the amount of expertise you need to go to production.


Vendors and integrations:

• Tons of vendor integrations with varying degrees of quality

• Many established vendors

• Users need access to everything that they need to deploy and manage a highly available production application

Our Experiences

• Prefer products that align with OpenStack’s multi-tenancy model whenever possible.

• Focus on vendors building for cloud rather than trying to integrate it afterwards.

• Look at areas to improve everywhere in the stack. Re-evaluate your product decisions. There is high value when an integration is done right.

• You will not know how good a vendor’s integration is until you try it. There can be many hidden landmines with missing capabilities or API support.


Case Study – Existing Load Balancing

• Existing vendor had limited OpenStack knowledge and bare bones integration at the time.

• Actual quote from support after a bug was discovered (vendor specific lines edited)

• “For now, to avoid a failover, I would recommend to program the OpenStack not to delete IPs.”

• LBaaS v1 was extremely limited. Would not have covered all production use cases.

• Product did not support safe multi-tenancy. There were shared resources that were a point of failure.

• Prolonged evaluation period of 6-8 months resulting in rejection.


Case Study – Cloud Load Balancer (AVI)

• Installation involves providing OpenStack credentials and it handles the rest.

• Allowed us to make production grade load balancing generally available in development within a week and production within a month.

• Multi-tenancy model aligns with OpenStack Projects and with Keystone

• Nobody had to ask for access. If you had access to OpenStack then you had access to load balancing services.

• No fighting with permissions or concerns with preventing untrained users from damaging the environment.


Problems to Solve: Deployment

• Deployments take a long time and are complex

• Some OpenStack functionality is not ready for production

Our Experiences

• Align resources for storage, networking and datacenter teams and make sure that someone on each team will make troubleshooting installation issues a top priority.

• OpenStack requires tight integration with all of these elements. A slow troubleshooting feedback loop will have a very negative effect on the deployment.

• Understand what deployment choices are difficult to change afterwards and make sure that you got them right.

• Assume multiple tries to get a production ready configuration.


Problems to Solve: Adoption

• Adoption is one of the most critical elements to success.

Our Experiences

• Have a close relationship with your early adopters. They will help you increase the resiliency of your deployment.

• Regularly speak with them in person to help them understand OpenStack and to let them tell you about issues before they become a problem.

• Get deployments into your users' hands as soon as possible.

• Do not stall getting to production. Teams will not want to code to an API that they cannot use in production.

• Adoption will be limited until you can get production availability.

• Solving problems “just for development environments” is the wrong mentality.

• Early feedback is critical.


Problems to Solve: Productionization

• OpenStack provides building blocks but some assembly is required to build a product out of it.

• Monitoring and common operational tasks are not solved out of the box.

Our Experiences

• Monitor OpenStack by actually using OpenStack. Build instances and use OpenStack functionality to detect failures.

• OpenStack is very complex and understanding the effect of a failure can be difficult.

• If you monitor by using OpenStack you will catch most failures before your users do and know what functionality is impacted.

• Automate common operational and maintenance tasks.

• OpenStack HA is complex but needed for all environments.
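
The "monitor OpenStack by actually using OpenStack" idea can be sketched as a synthetic check that boots a throwaway instance, waits for it to become active, and deletes it again. This is only an illustration, not the check Ebsco ran: the function name, the result dictionary, and the `image_id`, `flavor_id` and `network_id` parameters are assumptions, and the `compute` argument is assumed to expose `create_server`, `wait_for_server` and `delete_server` in the style of the openstacksdk compute proxy.

```python
import time
import uuid


def run_boot_check(compute, image_id, flavor_id, network_id, timeout=300):
    """Boot a throwaway instance, wait for ACTIVE, then delete it.

    `compute` is assumed to behave like openstacksdk's `conn.compute`
    proxy (create_server / wait_for_server / delete_server).
    Returns a dict with a pass/fail flag, elapsed seconds and any error.
    """
    name = f"monitor-check-{uuid.uuid4().hex[:8]}"  # hypothetical naming scheme
    started = time.monotonic()
    server = None
    result = {"check": "boot_instance", "ok": False, "error": None}
    try:
        server = compute.create_server(
            name=name,
            image_id=image_id,
            flavor_id=flavor_id,
            networks=[{"uuid": network_id}],
        )
        compute.wait_for_server(server, status="ACTIVE", wait=timeout)
        result["ok"] = True
    except Exception as exc:
        # Any failure to build the instance counts as a failed check.
        result["error"] = f"{type(exc).__name__}: {exc}"
    finally:
        if server is not None:
            try:
                # Always clean up so failed checks do not leak instances.
                compute.delete_server(server)
            except Exception as exc:
                result["ok"] = False
                result["error"] = result["error"] or f"cleanup failed: {exc}"
        result["seconds"] = round(time.monotonic() - started, 2)
    return result
```

With openstacksdk installed, something like `run_boot_check(openstack.connect(cloud="devqa").compute, ...)` would be the intended call, where the cloud name `devqa` is a placeholder. Because the check exercises the API, scheduler, image, and network paths together, a failure points at user-visible functionality rather than at a single process being down.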

What we did…


Phased Environments…

Prototype

• Single machine all-in-one deployment

• Learn basics

• Validate direction

• Disposable environment

Interim

• Break apart compute and control

• Limited release to early adopters

• Get feedback and determine desired configuration

DevQa

• Highly available environment

• Treated like production

• General availability for development workloads

• Determine productionization tasks needed

Production

• Implement productionization tasks

• Deploy production clouds


What wound up happening…

Prototype Interim DevQa Production


Took too long to get to production…

• Critical team member left

• Finding a replacement took too long because we focused on hiring for an OpenStack skillset.

• Additional work on monitoring and operations automation was required before we were confident hosting production workloads.

• This work required skillsets that were not a part of the OpenStack team, as well as focused manpower.


Solution: Create a focus squad

• Kicked off a 6-week effort with a cross-functional team that had all required skills.

• This team would focus 100% on getting OpenStack to live.

• OpenStack tasks must be top priority for all team members.

• Director quote: “Set your email to out of office if you have to”

• The focused effort was incredibly efficient.

• Feedback loops for troubleshooting were massively reduced.

• Reduction of blocked tasks created a higher quality implementation.


What did the focus squad do?

• Created a reliable monitoring solution based on Zabbix and a Python framework for executing OpenStack checks.

• Created automated recovery for problems discovered in DevQa.

  • Automated compute node evacuation

  • Automated failed OpenStack service recovery

• Increased visibility into the environment with Zabbix and Grafana.

• Automated common operational tasks to push-button jobs in Rundeck.

  • Taking a compute or control node out of service

  • Restarting OpenStack services

• Deployed all production OpenStack, Zabbix and Rundeck infrastructure.
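
The deck does not show the Python check framework itself, so the following is only a guess at its general shape: a small registry of named checks whose results are flattened into `host key value` lines. The decorator, the item keys, the host name, and the output format are all assumptions made for illustration; the whitespace-separated line format is chosen because it resembles what the `zabbix_sender` input file expects, which should be verified against the Zabbix version in use.

```python
import time

# Registry of check functions, keyed by a hypothetical monitoring item key.
CHECKS = {}


def check(key):
    """Register a function as a named check."""
    def register(func):
        CHECKS[key] = func
        return func
    return register


def run_checks(host):
    """Run every registered check and return `host key value` lines.

    A check passes (status 1) if it returns without raising, else 0.
    """
    lines = []
    for key, func in sorted(CHECKS.items()):
        started = time.monotonic()
        try:
            func()
            status = 1
        except Exception:
            status = 0
        elapsed = time.monotonic() - started
        lines.append(f"{host} {key}.status {status}")
        lines.append(f"{host} {key}.seconds {elapsed:.2f}")
    return lines


@check("openstack.api.list_servers")
def list_servers_check():
    # Placeholder body: a real check would call the OpenStack API here.
    pass


@check("openstack.api.create_volume")
def create_volume_check():
    # Simulated failure so the example shows both outcomes.
    raise RuntimeError("volume stuck in 'creating'")


if __name__ == "__main__":
    print("\n".join(run_checks("cloud-devqa")))
```

Keeping each check as a plain function that either returns or raises makes it cheap to add new checks as new failure modes are discovered, and the same results can feed dashboards or automated recovery jobs.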


Tracking Success…

• Critical to getting continued commitment but hard to determine.

• We track the following metrics:

  • Instance count and resource usage

  • Number of teams and products leveraging OpenStack

  • The number of instances created and deleted

    • This can be a good indicator as to whether OpenStack was the right fit for your organization. Indicates people using automation as opposed to manual usage.
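
The metrics above can be derived from a simple log of instance lifecycle events. The sketch below assumes a hypothetical event shape of `(project, action)` tuples and a made-up sample; it is not how Ebsco collected the numbers, only one way to compute the same kind of indicators.

```python
from collections import Counter


def churn_summary(events, running):
    """Summarize instance lifecycle events.

    `events` is an iterable of (project, action) tuples where action is
    "create" or "delete"; `running` is the current running instance count.
    Both shapes are hypothetical, chosen only for this sketch.
    """
    events = list(events)  # allow generators to be scanned twice
    actions = Counter(action for _, action in events)
    projects = {project for project, _ in events}
    created = actions["create"]
    return {
        "created": created,
        "deleted": actions["delete"],
        "projects": len(projects),
        # Creations per running instance: a high value suggests automated,
        # disposable usage rather than hand-built long-lived servers.
        "creates_per_running": round(created / running, 1) if running else None,
    }


# Made-up sample data for illustration only.
sample = [
    ("search", "create"), ("search", "delete"),
    ("search", "create"), ("search", "delete"),
    ("billing", "create"),
]
print(churn_summary(sample, running=1))
```

Applied to the figures quoted earlier in the deck (almost 500,000 instances created and destroyed against roughly 1100 running), the same ratio comes out at several hundred creations per running instance, which is the kind of signal the slide describes as evidence of automated rather than manual use.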

Thank You

Questions?
