Introduction to Stacki - World's fastest Linux server provisioning tool

Introduction to Stacki Greg Bruno, PhD VP Engineering, StackIQ

Upload: suresh-paulraj

Post on 13-Apr-2017


Category:

Technology



TRANSCRIPT

Page 1: Introduction to Stacki - World's fastest Linux server provisioning Tool

Introduction to Stacki

Greg Bruno, PhD, VP Engineering, StackIQ

Page 2: Introduction to Stacki - World's fastest Linux server provisioning Tool

Open Source Stack Installer

Stacki is a very fast, ultra-reliable Linux server provisioning tool that works at scale, with zero prerequisites for taking systems from bare metal to a ping and a prompt.

Page 3: Introduction to Stacki - World's fastest Linux server provisioning Tool

PayPal

Page 4: Introduction to Stacki - World's fastest Linux server provisioning Tool

Hadoop @ PayPal

12 x 2TB SATA data drives

48 nodes each rack

1GBE-10GBE NICs

24 x 900GB 6G SAS 10K data drives

24 nodes each rack

10GBE NIC

8 x 4TB NR-SAS data drives

10 GBE NIC

Bay Area

Salt Lake City

Las Vegas

DATACENTERS

•  3,000 nodes and growing
•  60+ initial server racks
•  Heterogeneous HW across multiple DCs

Data Science Infrastructure Footprint

48 nodes each rack

Page 5: Introduction to Stacki - World's fastest Linux server provisioning Tool

Automation Challenge

Spinout creates some datacenter automation challenges …

•  Smaller team but even more to do
•  Rethink automation
•  Distributed systems have tons of local drives, which require time-consuming disk formatting and partitioning, and hardware RAID config on master nodes
•  New provisioning solution needs to easily, flexibly integrate w/ other commercial, open source, and homegrown management tools
•  Can 100s or 1000s of nodes be (re)provisioned as quickly as one or a few? (e.g., drive failures mean replacing entire host from O/S to disk to network to firmware to … etc)

Page 6: Introduction to Stacki - World's fastest Linux server provisioning Tool

Stacki @ PayPal

Ambari

HDP

Health Detection

Integration

IPMI/iLO

OS

Disk

Network

DHCP / DNS / TFTP

Ansible

- Disk Array Controller Configuration

- Disk Partitioning Configuration

“Stacki + Ansible = Happiness. :D” – Stacki mailing list 8/11/15

Page 7: Introduction to Stacki - World's fastest Linux server provisioning Tool

Quick, Early Success

14 Minutes* To Fully Provision 6 Racks of Bare Metal (288 Servers)

Includes wiping all disks, then fully partitioning & formatting ~3500 drives

And Now…

Upgrades all firmware automatically

Executes Ansible scripts on all hosts

Hadoop packages installed

* Versus hours with other hyperscale management tools, or days to weeks with traditional tools and processes
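The headline numbers on this slide can be cross-checked with simple arithmetic. The sketch below uses only figures quoted in the deck (6 racks, 48 nodes per rack, ~3500 drives, 14 minutes); the drive count is approximate on the slide, so the derived per-server figure is approximate too.

```python
# Figures quoted in the deck; "~3500" is approximate on the slide.
racks = 6
nodes_per_rack = 48
drives_total = 3500
minutes = 14

servers = racks * nodes_per_rack            # total servers provisioned
drives_per_server = drives_total / servers  # average data drives per server
servers_per_minute = servers / minutes      # average provisioning rate

print(f"servers: {servers}")
print(f"drives per server: {drives_per_server:.1f}")
print(f"servers per minute: {servers_per_minute:.1f}")
```

The result of about 12 drives per server is consistent with the "12 x 2TB SATA data drives" node type listed on the Hadoop @ PayPal slide.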

Page 8: Introduction to Stacki - World's fastest Linux server provisioning Tool

How We Solve the Problem

Page 9: Introduction to Stacki - World's fastest Linux server provisioning Tool

History

• San Diego Supercomputer Center
  •  1986 - National Science Foundation
  •  Along with NCSA, one of only two non-classified centers
  •  Mission: serve computational scientists

• Rocks
  •  2000 - First cluster group inside SDSC
  •  Version 1.0 released that November as open source
  •  10k+ clusters world-wide

• StackIQ
  •  2006 - Commercial support for Rocks
  •  2011 - Venture Backed
  •  Focus on next generation clustered systems (Data, Cloud)

• Stacki - 2015
  •  June – released as open source
  •  July – first hyper-scale user

Page 10: Introduction to Stacki - World's fastest Linux server provisioning Tool

Must Haves

• Make it – Automatic
  ◦  Think about it, test it, deploy it.
  ◦  People don’t scale, software does. Free your people: allow ops guys to be ops/analysis guys, and move them from a single-machine view to a global machine view.

• Make it – Repeatable
  ◦  State of the environment is guaranteed. Does not require homogeneity of hardware or functionality. Make compute environments homogeneous on heterogeneous hardware and software.
  ◦  Really, nothing is homogeneous. The environment may be, but the behavior of that environment on different machines, while predictable, will not be the same across all hardware. Stacki gets you flexibility and predictability.

• Make it – Reliable
  ◦  You always get what you want when you want it. You can make reasonable estimates of need because you’ve made the environment predictable and repeatable. Just like science!

• Make it – Comprehensive
  ◦  Manage application layer(s) down to kernels and device configuration with one tool. Never hit the network unconfigured.
  ◦  Provide turn-key deployment with reasonable default settings and the ability to customize / re-wire as desired.

Page 11: Introduction to Stacki - World's fastest Linux server provisioning Tool

Stacki Positioning

DevOps / Configuration Tool

DHCP / DNS / TFTP

Network

Disk

OS

In-house developed deployment tools

- Disk Array Controller Configuration

- Disk Partitioning Configuration

Page 12: Introduction to Stacki - World's fastest Linux server provisioning Tool

Datacenter Architecture

[Diagram: a Frontend and four Backend hosts, each attached to the Network through its em1 interface]

Page 13: Introduction to Stacki - World's fastest Linux server provisioning Tool

Download and Boot the ISO

Go to www.stacki.com and download the ISO
◦  It’s 1.8 GB
◦  “stacki” pallet plus stripped down CentOS 6.6

Boot the ISO on the host that will be your frontend

Page 14: Introduction to Stacki - World's fastest Linux server provisioning Tool
Page 15: Introduction to Stacki - World's fastest Linux server provisioning Tool
Page 16: Introduction to Stacki - World's fastest Linux server provisioning Tool
Page 17: Introduction to Stacki - World's fastest Linux server provisioning Tool
Page 18: Introduction to Stacki - World's fastest Linux server provisioning Tool
Page 19: Introduction to Stacki - World's fastest Linux server provisioning Tool
Page 20: Introduction to Stacki - World's fastest Linux server provisioning Tool

Frontend Services

Services to build backend nodes
◦  DHCP
◦  TFTP
◦  Named (optional)

Services to access backend nodes
◦  SSH key management
◦  Parallel execution shell

Page 21: Introduction to Stacki - World's fastest Linux server provisioning Tool

Host Configuration Spreadsheet

Page 22: Introduction to Stacki - World's fastest Linux server provisioning Tool

[Diagram: a Frontend and four Backend hosts, each attached to the Network through its em1 interface]

Backend Installation

Save your Host Configuration spreadsheet as a CSV

Import CSV on frontend
◦  “stack load hostfile file=hosts.csv”

Tell backend nodes to install on their next PXE boot
◦  “stack set host boot backend action=install”

PXE boot all backend nodes

Done!
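The steps above can be sketched as a small script that writes a host CSV and lists the two frontend commands in order. This is only an illustration: the spreadsheet itself is not reproduced in this transcript, so the column names, host names, IP addresses, and MAC addresses below are hypothetical placeholders. Only the two `stack` commands are taken from the slide, and the script prints them rather than running them.

```python
import csv
import io

# Hypothetical column names and host data, for illustration only; use the
# headers from the real Host Configuration spreadsheet.
COLUMNS = ["NAME", "APPLIANCE", "RACK", "RANK", "IP", "MAC", "INTERFACE"]


def build_hostfile(hosts):
    """Return CSV text with one header row and one row per host."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(hosts)
    return buffer.getvalue()


def install_commands(csv_path):
    """Return the two frontend commands quoted on the slide, in order."""
    return [
        f"stack load hostfile file={csv_path}",
        "stack set host boot backend action=install",
    ]


# Four made-up backend hosts, matching the four backends in the diagram.
hosts = [
    {"NAME": f"backend-0-{rank}", "APPLIANCE": "backend", "RACK": 0,
     "RANK": rank, "IP": f"10.1.1.{10 + rank}",
     "MAC": f"00:00:00:00:00:{rank + 1:02x}", "INTERFACE": "em1"}
    for rank in range(4)
]

hostfile_text = build_hostfile(hosts)
commands = install_commands("hosts.csv")
print(hostfile_text)
print("\n".join(commands))
```

Generating the CSV from code rather than by hand keeps the host list repeatable, which fits the "Make it – Repeatable" requirement earlier in the deck.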

Page 23: Introduction to Stacki - World's fastest Linux server provisioning Tool

BitTorrent-Inspired Package Installation

Stacki

Page 24: Introduction to Stacki - World's fastest Linux server provisioning Tool

Customizing Your Hosts

Page 25: Introduction to Stacki - World's fastest Linux server provisioning Tool

Advanced Networking

Via Host Configuration spreadsheet, you can configure:
◦  Bonded interfaces
◦  VLANs
◦  Bridging
◦  Any combo of the above

Manage hosts in multiple subnets
◦  Build a single cluster from hosts in multiple subnets
◦  Manage hosts in multiple datacenters

Page 26: Introduction to Stacki - World's fastest Linux server provisioning Tool

Host Configuration Spreadsheet

Page 27: Introduction to Stacki - World's fastest Linux server provisioning Tool

Disk Controller Configuration Spreadsheet

Page 28: Introduction to Stacki - World's fastest Linux server provisioning Tool

Disk Partition Configuration Spreadsheet

Page 29: Introduction to Stacki - World's fastest Linux server provisioning Tool

Multiple Distributions

A frontend houses a default distribution
◦  Based on stripped down CentOS 6.6 or 7.1
◦  Used to build backend nodes

Can add any number of new distributions to a frontend
◦  E.g., RHEL 6.x based distro, CentOS 6.5, etc.

Assign any backend node to any distro

Page 30: Introduction to Stacki - World's fastest Linux server provisioning Tool

Why is this hard and important?

Page 31: Introduction to Stacki - World's fastest Linux server provisioning Tool

Datacenter Architecture

[Diagram: a Frontend and four Backend hosts, each attached to the Network through its em1 interface]

Page 32: Introduction to Stacki - World's fastest Linux server provisioning Tool

Datacenter Host Software Stack

DevOps / Configuration Tool

DHCP / DNS / TFTP

Network

Disk

OS

In-house developed deployment tools

- Disk Array Controller Configuration

- Disk Partitioning Configuration

Page 33: Introduction to Stacki - World's fastest Linux server provisioning Tool

The “Step 0” Problem

Check namenodes are empty
Format/start HDFS
Create all directories
Create all metastores
Start services (HBase, Hive, Oozie, Sqoop, Impala, etc.)
Deploy client configuration
Configure database
Setup/assign monitors (activity, services, and host)
Test database connections
Validate/resolve hostnames
Consistent host timezones
No bad kernel versions running
(CDH) version consistency
Java version consistency
Daemons versions consistency
Mgmt Agents versions consistency
Host specification/SSH ports
MUCH MORE …

DHCP Server/Client setup
TFTP/PXE configuration
Server OS installation
Node OS Install
RAID configuration
Boot configuration
System/data disk partitioning
Monitoring system setup and config
Lights Out/IPMI setup
User accounts added and synced
SSH keys on all hosts
Network node configuration
Config Mgmt install and configuration
Route configuration
OS upgrades/updates
Site specific software and configuration
Host specification/SSH ports
Security
Firewall setup
Cluster Mgmt utility
Database install and config
Multiple network config
Package installation
MUCH MORE …

Page 34: Introduction to Stacki - World's fastest Linux server provisioning Tool

Clusters are Different

Adding new servers does require coordination

Newly added servers must:
•  Have same software stack as original servers
•  Have same configuration as original servers
•  Know about original servers

And, original servers must:
•  Know about new servers

Result: The management complexity added to the Operations staff is “exponential”
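One hedged way to see why cluster complexity outpaces a general data center is to count "knows about" relationships. The model below is an assumption made for illustration, not something stated on the slide: in a general data center each server is managed on its own, while in a cluster every server must know about every other server. Under that model the count grows quadratically, which is steep but not literally exponential; the slide itself puts "exponential" in quotes.

```python
def independent_tasks(n):
    """General data center model: each server is managed on its own."""
    return n


def awareness_links(n):
    """Cluster model: every server must know about every other server."""
    return n * (n - 1)


# Compare the two models at a few sizes mentioned in the deck.
for n in (4, 48, 288):
    print(n, independent_tasks(n), awareness_links(n))
```

In this model, adding one server to an existing cluster of n servers adds 2n new relationships: the new server learns about n originals, and each of the n originals learns about the new one.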

Page 35: Introduction to Stacki - World's fastest Linux server provisioning Tool

Exponential Complexity

[Figure: Management Complexity versus Number of Servers, with one curve for General Data Center and one for Clusters]

Page 36: Introduction to Stacki - World's fastest Linux server provisioning Tool

The Pain Curve

[Figure: Management Complexity versus Number of Servers, with one curve for General Data Center and one for Clusters, and a region labeled PAIN]

Page 37: Introduction to Stacki - World's fastest Linux server provisioning Tool

The Pain Threshold

The pain threshold differs for every organization

Function of:
•  cluster(s) size
•  number of people in Operations
•  Operations staff cluster expertise

Page 38: Introduction to Stacki - World's fastest Linux server provisioning Tool

Moore’s Law

[Figure: Density versus Time (Years), showing an 18-month doubling]

Page 39: Introduction to Stacki - World's fastest Linux server provisioning Tool

Moore’s Law and Infrastructure Value

Page 40: Introduction to Stacki - World's fastest Linux server provisioning Tool

What it Means for You

[Figure: Value (%) versus Time (Years); 90% value at 3 months, 50% value at 18 months]
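The two points called out on this chart are consistent with a simple halving model. As an assumption for illustration, suppose the relative value of hardware halves every 18 months, mirroring the 18-month doubling on the Moore's Law slide. Then the value remaining after t months is 0.5 raised to the power t/18, which can be evaluated directly:

```python
# Assumption: value halves once per 18-month doubling period.
HALF_LIFE_MONTHS = 18


def remaining_value(months):
    """Fraction of the original value left after the given number of months."""
    return 0.5 ** (months / HALF_LIFE_MONTHS)


for months in (0, 3, 18, 36):
    print(f"{months:>2} months: {remaining_value(months):.0%}")
```

Under this model about 89% of the value remains at 3 months, close to the slide's 90%, and exactly 50% remains at 18 months.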

Page 41: Introduction to Stacki - World's fastest Linux server provisioning Tool

Time is Money

The clock starts ticking when hosts land on your loading dock

Without your applications online, you have a paperweight that consumes power, cooling, and management’s attention

Page 42: Introduction to Stacki - World's fastest Linux server provisioning Tool

Try It Out

Page 43: Introduction to Stacki - World's fastest Linux server provisioning Tool

stacki.com

Download - www.stacki.com

Source & Docs - github.com/StackIQ/stacki/wiki

Discuss - groups.google.com/forum/#!forum/stacki

Page 44: Introduction to Stacki - World's fastest Linux server provisioning Tool

PayPal’s Options

Bring what we used at former parent company eBay with us: Not Possible

Build our own soup-to-nuts bespoke bare metal provisioning tool: Not Optimal

Find the perfect open source tool that we can use and grow with: Not Likely

Page 45: Introduction to Stacki - World's fastest Linux server provisioning Tool

Quick, Early Success

2 Weeks Instead of 2 Years To Build a Scale-out Management Solution

PayPal development with Stacki includes:

1.  Installed Stacki Frontend (base management server). Ran test installations of backend servers:
    1.  Single Server test
    2.  Full Rack test (48 nodes)
2.  Updated distribution (CentOS 6.6) to install additional packages
3.  Integrated IPMI information into Stacki
    1.  Can now ssh into all IPMI consoles from the Stacki frontend host using <hostname>.ipmi
4.  Re-ran with PayPal kickstart changes/additions and was able to image 6 racks in 14 minutes, including:
    1.  Nuking disks/partitions and running a full format of all data drives
5.  Updated the Stacki post-boot piece to do the following:
    1.  Upgrade firmware if host needs it
    2.  Run PayPal Ansible playbook, which:
        1.  Installs additional packages
        2.  Creates user accounts
        3.  Disables unused services
        4.  Sets up resolver/ntp/syslog-ng/sudoers/limits.d/sysctl/etc.
        5.  Installs/configures Ambari agents
        6.  Checks data drive mounts, fstab
        7.  Prepares the rack to be added to a Hadoop cluster

Page 46: Introduction to Stacki - World's fastest Linux server provisioning Tool

DevOps Agnostic

DevOps / Configuration Tool

DHCP / DNS / TFTP

Network

Disk

OS

In-house developed deployment tools

- Disk Array Controller Configuration

- Disk Partitioning Configuration

Page 47: Introduction to Stacki - World's fastest Linux server provisioning Tool

The “Step 0” Problem

Check namenodes are empty
Format/start HDFS
Create all directories
Create all metastores
Start services (HBase, Hive, Oozie, Sqoop, Impala, etc.)
Deploy client configuration
Configure database
Setup/assign monitors (activity, services, and host)
Test database connections
Validate/resolve hostnames
Consistent host timezones
No bad kernel versions running
(CDH) version consistency
Java version consistency
Daemons versions consistency
Mgmt Agents versions consistency
Host specification/SSH ports
MUCH MORE …

DHCP Server/Client setup
TFTP/PXE configuration
Server OS installation
Node OS Install
RAID configuration
Boot configuration
System/data disk partitioning
Monitoring system setup and config
Lights Out/IPMI setup
User accounts added and synced
SSH keys on all hosts
Network node configuration
Config Mgmt install and configuration
Route configuration
OS upgrades/updates
Site specific software and configuration
Host specification/SSH ports
Security
Firewall setup
Cluster Mgmt utility
Database install and config
Multiple network config
Package installation
MUCH MORE …

App Config

Site Config

HW Install

System Performance Validation
Bare Metal Installers
Hadoop Mgmt Tool
Upgrades/Patching
Disk Configuration
Monitoring Tool
Configuration Tool
Network/Site Config Tools
Systems Mgmt Tool
Others …

MANUAL

SEMI-AUTOMATED TOOLCHAIN (w/o StackIQ)

FULLY AUTOMATED (w/ StackIQ)

Page 48: Introduction to Stacki - World's fastest Linux server provisioning Tool

StackIQ Boss

Page 49: Introduction to Stacki - World's fastest Linux server provisioning Tool

Configuration Database

•  Server appliance types (e.g. data, namenode, tomcat, …)
•  Number of CPUs
•  Disk partitioning
•  Hardware RAID config
•  PCI bus information
•  …
•  And other System Attributes

Page 50: Introduction to Stacki - World's fastest Linux server provisioning Tool

Attributes

•  Global
   ◦  stack set attr
•  Appliance
   ◦  stack set appliance attr
•  OS
   ◦  stack set os attr
•  Host
   ◦  stack set host attr
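The four scopes above suggest a layered lookup in which a more specific scope overrides a more general one. The slide does not state the precedence, so the sketch below assumes the order in which the scopes are listed (Global, then Appliance, then OS, then Host, with Host winning). Stacki's actual precedence may differ, and the attribute names and values are hypothetical.

```python
from collections import ChainMap


def resolve_attrs(global_attrs, appliance_attrs, os_attrs, host_attrs):
    """Merge attribute scopes into one dict, most specific scope first.

    Assumed precedence (not confirmed by the slide): host > os > appliance > global.
    ChainMap searches its maps left to right and returns the first match.
    """
    return dict(ChainMap(host_attrs, os_attrs, appliance_attrs, global_attrs))


# Hypothetical attribute names and values, for illustration only.
resolved = resolve_attrs(
    global_attrs={"timezone": "UTC", "nukedisks": "false"},
    appliance_attrs={"role": "datanode"},
    os_attrs={"release": "6.6"},
    host_attrs={"nukedisks": "true"},
)
print(resolved)
```

Here the host-level `nukedisks` value overrides the global default, while attributes set in only one scope pass through unchanged.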

Page 51: Introduction to Stacki - World's fastest Linux server provisioning Tool

Kickstart Profiles

Page 52: Introduction to Stacki - World's fastest Linux server provisioning Tool

Zoom In

Page 53: Introduction to Stacki - World's fastest Linux server provisioning Tool

Starting from the Empty Set

{ }

Page 54: Introduction to Stacki - World's fastest Linux server provisioning Tool

{ os }

© 2009 UC Regents

Page 55: Introduction to Stacki - World's fastest Linux server provisioning Tool

{ os, core }


Page 56: Introduction to Stacki - World's fastest Linux server provisioning Tool

{ os, core, kernel }


Page 57: Introduction to Stacki - World's fastest Linux server provisioning Tool

{ os, core, kernel, mapr }


Page 58: Introduction to Stacki - World's fastest Linux server provisioning Tool

Manage the Deltas

{os, core, kernel, mapr} {os, core, kernel, horton}
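The two stacks above differ in a single element, which is the point of "managing the deltas". A minimal sketch using Python sets, with the element names taken from the slide, makes the shared base and the per-stack delta explicit:

```python
# The two software stacks shown on the slide.
mapr_stack = {"os", "core", "kernel", "mapr"}
horton_stack = {"os", "core", "kernel", "horton"}

shared = mapr_stack & horton_stack       # common base, built and tested once
only_mapr = mapr_stack - horton_stack    # delta specific to the first stack
only_horton = horton_stack - mapr_stack  # delta specific to the second stack

print(sorted(shared), sorted(only_mapr), sorted(only_horton))
```

Only the one-element deltas need separate handling; everything in the shared base is identical across both stacks.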


Page 59: Introduction to Stacki - World's fastest Linux server provisioning Tool

stacki.com

@masonkatz