online schema changes for maximizing uptime

Post on 27-Jun-2015

948 Views

Category:

Technology

0 Downloads

Preview:

Click to see full reader

DESCRIPTION

PalominoDB's David Turner and Ben Black cover common operations implemented in production and how you can minimize downtime and customer impact

TRANSCRIPT

Online Schema Changes

for Maximizing Uptime

PALOMINODB OPERATIONAL EXCELLENCE

FOR DATABASES

David Turner

Ben Black

www.palominodb.com

Palominodb: Who are we and what do we do?

• Proactive database awesomeness so you

can focus on other things

• Systems and Performance Optimization

• Operational Best Practices

• Devops

• Ask us how we like working from anywhere

Online Schema Changes

• Best practices

• Online Schema Tools

• PT Online Schema Change

• Innodb Online DDL

• Case Study

Best Practices

• Backups

• Benchmark o Start small

o Step up

o Time

o Performance_schema and Information_schema

Repository

• Prevent

o Broken Replication

o Data loss

o Broken applications

Creating the repository

• Dump

• Script

Syncing the repository

• Migrations

• Schema Changes

• Scheduled

Problems

• Schema changes

• Fragmentation

You want to do what to that 40G

table?

• Down for Maintenance/blocking DDL

• Disable writes

• Remaster

• Facebook OSC

• Large Hadron Migrator

• Openark-kit

• Pt Online Schema Change

• MySQL Innodb DDL - IOD

OSC Tools or how I learned

to love the alter...

• copy original table structure to new table

• alter new table

• create triggers to copy dml from original

table to new table

• copy data in original table to new table in

chunks

• swap table names and drop original table

pt-online-schema-change

Percona toolkit is your friend.

Mature product

Pay attention to version you are using!!!

FK issues, log_bin with 2.0

screen - use it along with tee and time

It's all ball bearings nowadays

--progress time,10 (default 30 seconds)

--max-lag

--recursion-method (how to find slaves)

show processlist

show hosts

dsn table (great for ignoring some slaves)

What could possibly go wrong?

• PK/UK required

• FK names will change on altered table

• FK's reference table to alter

--alter-foreign-keys-method drop_swap

sets foreign_key_checks=0

drops original table (hardlink!!!)

renames new table

What could possibly go wrong?

• non xfs table drops (create the hardlink!)

o Can even cause innodb to crash mysqld

• PKs with gaps

• Largest table to alter vs free disk space

• Disk space (2x for RBR)

• Global mutexes (table drops)

• table metadata locks (triggers)

NOOOOOO!!

• Running PT-OSC against a slave with RBR

Replication started erroring after pt-osc

why???

• And how about syncing a table using pt-osc

with RBR?

Set it and forget it???

• watch

o w

o df -h

o ls -alh ibd files

o mysql -e"show processlist;"|egrep -v "Sleep|repl"

o slave lag

• How is it affecting the application?

• max-lag, max-load,

Features we like

• Throttling

• Nibbling

• Reporting

• Replicating

• Reorgs

Innodb Online DDL - IOD

• Looking back 5.1 and 5.5

o Fast index creation

• Enhancements

o Online DDL

o Inplace

Silver Bullets

http://dev.mysql.com/doc/refman/5.6/en/innodb-

create-index-overview.html#innodb-online-

ddl-summary-grid

• Drop and add Index together

• Temporary table index creation

Alter Online lock modes

• Exclusive

• Shared

• Default

• None

* Note

Alter Online Algorithms

• Inplace

• Copy

Grouping DDL

• Copy

• Sequence

• Lock

IOD Behavior

• Metadata

• Data volume

• Type of index

• Foreign keys

• Partitioning

o Add partition, drop partition

o Truncate partition

o Add partition and coalesce partition

• Algorithm

• Autoincrement

• Locking

IOD Errors

• Locking

• Timeouts

• Tmpdir

• DML

IOD Crash Recovery

• Secondary Indexes

• Clustered Indexes (don't crash!)

Of Interest

• Tmpdir

• Dropping Indexes with Foreign keys

• Cascade on

• Inconsistent .frm

Monitoring

• Rows Affected

• Performance_schema and

Information_schema

• Progress

• Replication lag

IOD Server parms

• innodb_online_alter_log_max_size

• sort_buffer_size

Case Study

• Ticketing system

Architecture

• 1 master

• 1 active slave

• 1 analytics/indexing slave

• 1 backup/reorg slave

Issues

• Growth

• Rebalancing Shards

• Deletions

• Reorgs

Timeline

• Difficult failovers

• Ticket Growth

• FB OSC

• PT OSC

• Reorgs

Questions

David Turner

dturner@palominodb.com

Ben Black

ben@palominodb.com

www.palominodb.com

top related