online schema changes for maximizing uptime

34
Online Schema Changes for Maximizing Uptime PALOMINODB OPERATIONAL EXCELLENCE FOR DATABASES David Turner Ben Black www.palominodb.com

Upload: pythian

Post on 27-Jun-2015

948 views

Category:

Technology


0 download

DESCRIPTION

PalominoDB's David Turner and Ben Black cover common operations implemented in production and how you can minimize downtime and customer impact

TRANSCRIPT

Page 1: Online Schema Changes for Maximizing Uptime

Online Schema Changes

for Maximizing Uptime

PALOMINODB OPERATIONAL EXCELLENCE

FOR DATABASES

David Turner

Ben Black

www.palominodb.com

Page 2: Online Schema Changes for Maximizing Uptime

Palominodb: Who are we and what do we do?

• Proactive database awesomeness so you

can focus on other things

• Systems and Performance Optimization

• Operational Best Practices

• Devops

• Ask us how we like working from anywhere

Page 3: Online Schema Changes for Maximizing Uptime

Online Schema Changes

• Best practices

• Online Schema Tools

• PT Online Schema Change

• Innodb Online DDL

• Case Study

Page 4: Online Schema Changes for Maximizing Uptime

Best Practices

• Backups

• Benchmark o Start small

o Step up

o Time

o Performance_schema and Information_schema

Page 5: Online Schema Changes for Maximizing Uptime

Repository

• Prevent

o Broken Replication

o Data loss

o Broken applications

Page 6: Online Schema Changes for Maximizing Uptime

Creating the repository

• Dump

• Script

Page 7: Online Schema Changes for Maximizing Uptime

Syncing the repository

• Migrations

• Schema Changes

• Scheduled

Page 8: Online Schema Changes for Maximizing Uptime

Problems

• Schema changes

• Fragmentation

Page 9: Online Schema Changes for Maximizing Uptime

You want to do what to that 40G

table?

• Down for Maintenance/blocking DDL

• Disable writes

• Remaster

• Facebook OSC

• Large Hadron Migrator

• Openark-kit

• Pt Online Schema Change

• MySQL Innodb DDL - IOD

Page 10: Online Schema Changes for Maximizing Uptime

OSC Tools or how I learned

to love the alter...

• copy original table structure to new table

• alter new table

• create triggers to copy dml from original

table to new table

• copy data in original table to new table in

chunks

• swap table names and drop original table

Page 11: Online Schema Changes for Maximizing Uptime

pt-online-schema-change

Percona toolkit is your friend.

Mature product

Pay attention to version you are using!!!

FK issues, log_bin with 2.0

screen - use it along with tee and time

Page 12: Online Schema Changes for Maximizing Uptime

It's all ball bearings nowadays

--progress time,10 (default 30 seconds)

--max-lag

--recursion-method (how to find slaves)

show processlist

show hosts

dsn table (great for ignoring some slaves)

Page 13: Online Schema Changes for Maximizing Uptime

What could possibly go wrong?

• PK/UK required

• FK names will change on altered table

• FK's reference table to alter

--alter-foreign-keys-method drop_swap

sets foreign_key_checks=0

drops original table (hardlink!!!)

renames new table

Page 14: Online Schema Changes for Maximizing Uptime

What could possibly go wrong?

• non xfs table drops (create the hardlink!)

o Can even cause innodb to crash mysqld

• PKs with gaps

• Largest table to alter vs free disk space

• Disk space (2x for RBR)

• Global mutexes (table drops)

• table metadata locks (triggers)

Page 15: Online Schema Changes for Maximizing Uptime

NOOOOOO!!

• Running PT-OSC against a slave with RBR

Replication started erroring after pt-osc

why???

• And how about syncing a table using pt-osc

with RBR?

Page 16: Online Schema Changes for Maximizing Uptime

Set it and forget it???

• watch

o w

o df -h

o ls -alh ibd files

o mysql -e"show processlist;"|egrep -v "Sleep|repl"

o slave lag

• How is it affecting the application?

• max-lag, max-load,

Page 17: Online Schema Changes for Maximizing Uptime

Features we like

• Throttling

• Nibbling

• Reporting

• Replicating

• Reorgs

Page 18: Online Schema Changes for Maximizing Uptime

Innodb Online DDL - IOD

• Looking back 5.1 and 5.5

o Fast index creation

• Enhancements

o Online DDL

o Inplace

Page 19: Online Schema Changes for Maximizing Uptime

Silver Bullets

http://dev.mysql.com/doc/refman/5.6/en/innodb-

create-index-overview.html#innodb-online-

ddl-summary-grid

• Drop and add Index together

• Temporary table index creation

Page 20: Online Schema Changes for Maximizing Uptime

Alter Online lock modes

• Exclusive

• Shared

• Default

• None

* Note

Page 21: Online Schema Changes for Maximizing Uptime

Alter Online Algorithms

• Inplace

• Copy

Page 22: Online Schema Changes for Maximizing Uptime

Grouping DDL

• Copy

• Sequence

• Lock

Page 23: Online Schema Changes for Maximizing Uptime

IOD Behavior

• Metadata

• Data volume

• Type of index

• Foreign keys

• Partitioning

o Add partition, drop partition

o Truncate partition

o Add partition and coalesce partition

• Algorithm

• Autoincrement

• Locking

Page 24: Online Schema Changes for Maximizing Uptime

IOD Errors

• Locking

• Timeouts

• Tmpdir

• DML

Page 25: Online Schema Changes for Maximizing Uptime

IOD Crash Recovery

• Secondary Indexes

• Clustered Indexes (don't crash!)

Page 26: Online Schema Changes for Maximizing Uptime

Of Interest

• Tmpdir

• Dropping Indexes with Foreign keys

• Cascade on

• Inconsistent .frm

Page 27: Online Schema Changes for Maximizing Uptime

Monitoring

• Rows Affected

• Performance_schema and

Information_schema

• Progress

• Replication lag

Page 28: Online Schema Changes for Maximizing Uptime

IOD Server parms

• innodb_online_alter_log_max_size

• sort_buffer_size

Page 29: Online Schema Changes for Maximizing Uptime

Case Study

• Ticketing system

Page 30: Online Schema Changes for Maximizing Uptime

Architecture

• 1 master

• 1 active slave

• 1 analytics/indexing slave

• 1 backup/reorg slave

Page 31: Online Schema Changes for Maximizing Uptime

Issues

• Growth

• Rebalancing Shards

• Deletions

• Reorgs

Page 32: Online Schema Changes for Maximizing Uptime

Timeline

• Difficult failovers

• Ticket Growth

• FB OSC

• PT OSC

• Reorgs

Page 33: Online Schema Changes for Maximizing Uptime
Page 34: Online Schema Changes for Maximizing Uptime

Questions

David Turner

[email protected]

Ben Black

[email protected]

www.palominodb.com