badcamp 2008 db sync

85
Database Synchronization Shaun Haber Warner Bros. Records

Upload: shaun-haber

Post on 08-Dec-2014

2.386 views

Category:

Technology


0 download

DESCRIPTION

http://badcamp.net/session/database-synchronization

TRANSCRIPT

Page 1: BADCamp 2008 DB Sync

Database Synchronization

Shaun HaberWarner Bros. Records

Page 2: BADCamp 2008 DB Sync

What is it?

• Merging content between a dev site and a production site

Page 3: BADCamp 2008 DB Sync

Disclaimer

• No single answer

• No “Drupally” solution

• Not exclusive to Drupal

• Not magic

Page 4: BADCamp 2008 DB Sync

Who the hell am I?

Page 5: BADCamp 2008 DB Sync
Page 6: BADCamp 2008 DB Sync
Page 7: BADCamp 2008 DB Sync

Warner Music Group

Page 8: BADCamp 2008 DB Sync

Warner Bros. Records

• Subsidiary of Warner Music Group

• Family of labels (Reprise, Sire, etc.)

• Over 100 artists

• Top-selling albums

• It’s music biz after all!

Page 9: BADCamp 2008 DB Sync

So what?

Page 10: BADCamp 2008 DB Sync

WBR Tech

• Only label with an in-house Tech team

• “Start-up” mentality

• Fast-paced, hectic, and fun!

• We use Drupal... religiously

Page 11: BADCamp 2008 DB Sync

93 Drupal Sites1 new site every week

Page 12: BADCamp 2008 DB Sync

Launching like crazy!

Page 13: BADCamp 2008 DB Sync
Page 14: BADCamp 2008 DB Sync

Source: http://flickr.com/photos/krosinsky/2848288562/

Web sites in the wild!

Page 15: BADCamp 2008 DB Sync

Websites in the wild

• Always collecting new data!

Page 16: BADCamp 2008 DB Sync

Time

Data Launch

Page 17: BADCamp 2008 DB Sync

Not a bad thing, obviously

• Want websites to grow

• More users + more data = PROFIT

Page 18: BADCamp 2008 DB Sync

But...

• How do we keep the site updated?

- New content

- New features

- Code fixes

- <insert your own update here>

Page 19: BADCamp 2008 DB Sync

Source: http://flickr.com/photos/nimboo/132386298

Minor updates

Major updates

Page 20: BADCamp 2008 DB Sync

Minor Updates

• CSS tweak

• template.php change

• Add a new Block

• Change settings on a View

• Install a new module

Page 21: BADCamp 2008 DB Sync

Major Updates

• Schema changes

• Information re-architecture

• Significant configuration changes

• User flow changes

• New theme integration

Page 22: BADCamp 2008 DB Sync

Maintain a separate Dev site!

Strategy?

Page 23: BADCamp 2008 DB Sync

Time

Dev server

Prod server

New

Page 24: BADCamp 2008 DB Sync

Time

Dev server

Prod server

New QA

Page 25: BADCamp 2008 DB Sync

Time

Dev server

Prod server

New QA

Prod

Dev

Page 26: BADCamp 2008 DB Sync

Time

Dev server

Prod server

New QA

Prod

Dev

Prod

Page 27: BADCamp 2008 DB Sync

Time

Dev server

Prod server

New QA

Prod

Dev

Prod

Dev

Page 28: BADCamp 2008 DB Sync

Time

Dev server

Prod server

New QA

Prod

Dev

Prod

Dev

?

Page 29: BADCamp 2008 DB Sync

Syncing Databases Sucks

Code Easy

Files Easy

Database Hard

Page 30: BADCamp 2008 DB Sync

Time

Dev server

Prod server

New QA

Prod

Dev

Prod

Dev

Page 31: BADCamp 2008 DB Sync

Time

Dev server

Prod server

New QA

Prod

Dev

Prod

Dev

Prod 2.0

Page 32: BADCamp 2008 DB Sync

Order of Events

1. Develop a new site

2. Launch site

3. Take snapshot of prod site

4. Develop on snapshot

5. Magic? => Relaunch new version of site

Page 33: BADCamp 2008 DB Sync

But it’s not Magic!1. Take dev site down

2. Shift sequenced IDs on Dev

3. Take prod site down

4. Merge content from Prod to Dev

5. QA “new” dev site

6. Copy dev site to prod site

7. Bring “new” prod site live

Page 34: BADCamp 2008 DB Sync

It’s Database Surgery!

Source: http://flickr.com/photos/interplast/6339098/

Page 35: BADCamp 2008 DB Sync

2 Step Process

• Step 1 - Shift Sequenced IDs

• Step 2 - Merge content

Page 36: BADCamp 2008 DB Sync

1

2

3

Page 37: BADCamp 2008 DB Sync

1

2

3

1

2

3

Page 38: BADCamp 2008 DB Sync

1

2

3

1

2

3

4

5

4

5

6

Page 39: BADCamp 2008 DB Sync

1

2

3

1

2

3

10

11

4

5

6

Page 40: BADCamp 2008 DB Sync

1

2

3

1

2

3

10

11

4

5

6

4

5

6

Page 41: BADCamp 2008 DB Sync

1

2

3

1

2

3

10

11

4

5

6

4

5

6

Page 42: BADCamp 2008 DB Sync

1

3

1

2a

3

10

11

4

5

6

4

5

6

2a

Page 43: BADCamp 2008 DB Sync

1

3

1

2a

3

10

11

4

5

6

4

5

6

2a

Page 44: BADCamp 2008 DB Sync

Step 1 - Shifting IDs

• comments_cid

• files_fid

• node_revisions_vid

• node_nid

• users_uid

Page 45: BADCamp 2008 DB Sync

Need to know

• Highest common ID between Dev and Prod

• Delta value to shift

• Reference of known tables and fields

Page 46: BADCamp 2008 DB Sync

Highest Common ID

• Top item on the “stack” at time of the snapshot.

1

2

3

1

2

33

Page 47: BADCamp 2008 DB Sync

Delta value

• Amount to shift the conflicted items, with extra padding

3

10

11

7

Page 48: BADCamp 2008 DB Sync

UPDATE tableSET id = id + $delta WHERE id > $common

Page 49: BADCamp 2008 DB Sync

And that’s it for Step 1

Page 50: BADCamp 2008 DB Sync

Actually, it’s MUCH more complicated...

Page 51: BADCamp 2008 DB Sync

What tables have nid?comments.nidcontent_field_* nid.field_*_nidcontent_type_* nid.field_*_nidfiles.nidforum.nidforward_log.nidhistory.nidnode.nidnode_access.nidnode_comment_statistics.nidnode_counter.nidnode_revisions.nidnodefamily.parent_nid, child_nidpanels_node.nid

poll.nidpoll_choices.nidpoll_votes.nidterm_node.niduc_cart_products.niduc_order_products.niduc_product_features.niduc_products.niduc_roles_products.nidusernode.nidwebform.nidwebform_component.nidwebform_submissions.nidwebform_submitted_data.nid

Page 52: BADCamp 2008 DB Sync

Also...

• Special tables:

• location, sequences, url_alias, etc.

• node-nid.tpl.php

• Serialized PHP variables in DB

• PHP code in DB

• URLs in DB or elsewhere (e.g., /node/123)

Page 53: BADCamp 2008 DB Sync

Well shit!

Page 54: BADCamp 2008 DB Sync

Do the best we can!

• Reference of all known tables

• Reference of all known sequence fields

• Reference of all known “special cases”

• Automate as much as possible

Page 55: BADCamp 2008 DB Sync

Scripting Time!

Page 56: BADCamp 2008 DB Sync

Check for unknown tables

$rs = db_query(“SHOW TABLES”);

while ($row = db_fetch_row($rs)) { if (!is_known_table($row[0]) { log_unknown_table($row[0]); }}

if (found_unknown_tables()) { print_unknown_tables(); exit;}

Page 57: BADCamp 2008 DB Sync

Store all known tables in a txt file

accessaccesslogaudio_widget_thumbnailaudio_widget_trackauthmapblocksblocks_rolesboxesbuddylistbuddylist_buddy_group

buddylist_groupsbuddylist_pending_requestscache*commentscontactcontent_field_*content_type_*devel_queriesdevel_times...

Page 58: BADCamp 2008 DB Sync
Page 59: BADCamp 2008 DB Sync

Store all fields in separate txt files

comments.nidcontent_field_* nid.field_*_nidcontent_type_* nid.field_*_nidfiles.nidforum.nidforward_log.nidhistory.nidnode.nidnode_access.nid

node_comment_statistics.nidnode_counter.nidnode_revisions.nidnodefamily.parent_nid, child_nidpanels_node.nidpoll.nidpoll_choices.nidpoll_votes.nid...

Page 60: BADCamp 2008 DB Sync

Now we can shift IDs!

• Iterate thru DB tables

• If table has known fields, shift IDs (remember that SQL command?)

• Rinse and repeat for each sequenced ID

Page 61: BADCamp 2008 DB Sync

UPDATE tableSET id = id + $delta WHERE id > $common

Page 62: BADCamp 2008 DB Sync

Special Cases

Page 63: BADCamp 2008 DB Sync

Sequences table

• Simply reset the value to new highest ID

• Do this after shifting IDs in the “primary” table (node.nid, user.uid, etc.)

Page 64: BADCamp 2008 DB Sync

UPDATE sequencesSET `$seq` = $max

Page 65: BADCamp 2008 DB Sync

Location table

• Stores ID val in column `eid`

• Stores sequence type in column `type`

• type = node, user

Page 66: BADCamp 2008 DB Sync

UPDATE location SET `eid` = `eid` + $delta WHERE `eid` > $commonAND `type` = $type

Page 67: BADCamp 2008 DB Sync

Url_alias table

• ID values are embedded as strings

• Use pattern matching to parse the ID

• node: node/nid

• user: user/uid, blog/uid

• Add the delta, update new alias

Page 68: BADCamp 2008 DB Sync

Pseudo-code

SELECT * FROM url_alias WHERE src LIKE ‘node/%’

preg_match('/node\/([0-9]*)/', $src, $matches)

$id = $matches[1]

$id = $id + $delta

UPDATE url_alias SET src = 'node/$id' WHERE pid = $pid

Page 69: BADCamp 2008 DB Sync

Manually

• Rename any node-nid.tpl.php files

• Search for ID vals in DB:

• Eval’ed PHP code

• Serialized PHP code

• URLs

• anything else?

Page 70: BADCamp 2008 DB Sync

Step 1 Recap

• Maintain indexes for tables and fields

• Automate using the indexes

• Review indexes before each shift

• Inspect for manual cases after each shift

• Document every new case you find!

Page 71: BADCamp 2008 DB Sync

At least most of this can be automated!

Page 72: BADCamp 2008 DB Sync

Step 2 - Merging Content

Page 73: BADCamp 2008 DB Sync

Merging Content

3 3

10

4

5

6

4

5

6

Page 74: BADCamp 2008 DB Sync

What to merge?

• Content

• Really, just the content

• No variables, settings, etc.

Page 75: BADCamp 2008 DB Sync

Need to know

• Highest Common ID (same from Step 1)

• Reference of tables

Page 76: BADCamp 2008 DB Sync

Process

• Iterate thru Prod tables:

• Skip

• INSERT IGNORE (I)

• REPLACE (R)

• DROP and INSERT (A)

Page 77: BADCamp 2008 DB Sync
Page 78: BADCamp 2008 DB Sync

Special Cases

• Url_alias table

• Sequences table

• Some nodes

Page 79: BADCamp 2008 DB Sync

Url_Alias table

• Don’t go by pid

• REPLACE INTO url_alias SET src = '$src', dst = '$dst'

Page 80: BADCamp 2008 DB Sync

Sequences table

• Manually inspect sequence values!

Page 81: BADCamp 2008 DB Sync

Node timestamps

• Get timestamp of Highest Common nid

• Check for older nodes on Prod that have been modified recently

Page 82: BADCamp 2008 DB Sync

SELECT nidFROM nodeWHERE changed > $timestampAND nid > $common

Replace on Dev with

Page 83: BADCamp 2008 DB Sync

That’s it... for now.

Page 84: BADCamp 2008 DB Sync

Future

• Share sequences table between Dev and Prod

• Even/odd IDs (Drupal 6+)

• Macro recordings and playbacks